Extend the FileCollector's API with addDirectory which adds a directory
and its contents to the VFS mapping.
Differential revision: https://reviews.llvm.org/D76671
The current implementation of the JSONWriter does not support writing
out directory entries. Earlier today I added a unit test to illustrate
the problem. When an entry is added to the YAMLVFSWriter and the path is
a directory, it will incorrectly emit the directory as a file, and any
files inside that directory will not be found by the VFS.
It's possible to partially work around the issue by only adding "leaf
nodes" (files) to the YAMLVFSWriter. However, this doesn't work for
representing empty directories. This is a problem for clients of the VFS
that want to iterate over a directory. The directory not being there is
not the same as the directory being empty.
This is not just a hypothetical problem. The FileCollector for example
does not differentiate between file and directory paths. I temporarily
worked around the issue for LLDB by ignoring directories, but I suspect
this will prove problematic sooner rather than later.
This patch fixes the issue by extending the JSONWriter to support
writing out directory entries. We store whether an entry should be
emitted as a file or directory.
Differential revision: https://reviews.llvm.org/D76670
Before this patch, it wasn't possible to extend the ThinLTO threads to all SMT/CMT threads in the system. Only one thread per core was allowed, instructed by usage of llvm::heavyweight_hardware_concurrency() in the ThinLTO code. Any number passed to the LLD flag /opt:lldltojobs=..., or any other ThinLTO-specific flag, was previously interpreted in the context of llvm::heavyweight_hardware_concurrency(), which means SMT disabled.
One can now say in LLD:
/opt:lldltojobs=0 -- Use one std::thread / hardware core in the system (no SMT). Default value if flag not specified.
/opt:lldltojobs=N -- Limit usage to N threads, regardless of usage of heavyweight_hardware_concurrency().
/opt:lldltojobs=all -- Use all hardware threads in the system. Equivalent to /opt:lldltojobs=$(nproc) on Linux and /opt:lldltojobs=%NUMBER_OF_PROCESSORS% on Windows. When an affinity mask is set for the process, threads will be created only for the cores selected by the mask.
When N > number-of-hardware-threads-in-the-system, the threads in the thread pool will be dispatched equally on all CPU sockets (tested only on Windows).
When N <= number-of-hardware-threads-on-a-CPU-socket, the threads will remain on the CPU socket where the process started (only on Windows).
Differential Revision: https://reviews.llvm.org/D75153
When Clang crashes a useful message is output:
"PLEASE submit a bug report to https://bugs.llvm.org/ and include the
crash backtrace, preprocessed source, and associated run script."
A similar message is now output for all tools.
Differential Revision: https://reviews.llvm.org/D74324
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html
In detail this patch
- march options for armv8.6-a
- BFloat16 assembly
This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.
Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson
Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson
Reviewed By: SjoerdMeijer
Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D76062
The algorithm supports both assigning a fixed offset to a field prior to
layout and allowing fields to have sizes that aren't multiples of their
required alignments. This means that the well-known algorithm of sorting
by decreasing alignment isn't always good enough. Still, we start with
that, and only if that leaves padding around do we fall back on a greedy
padding-minimizing algorithm.
There is no known efficient algorithm for producing a guaranteed-minimal
layout in all cases. In fact, allowing arbitrary fixed-offset fields means
there's a straightforward reduction from bin-packing, making this NP-hard.
But as usual with such problems, we can still efficiently produce adequate
solutions to the cases that matter most to us.
I intend to use this in coroutine frame layout, where the retcon lowerings
very badly want to minimize total space usage, and where the switch lowering
can indeed produce a header with interior padding if the promise field is
highly-aligned. But it may be useful in a much wider variety of situations.
Summary:
When building a large Xcode project with multiple module dependencies, and mixed Objective-C & Swift, I observed a large number of clang processes stalling at zero CPU for 30+ seconds throughout the build. This was especially prevalent on my 18-core iMac Pro.
After some sampling, the major cause appears to be the lock file implementation for precompiled modules in the module cache. When the lock is heavily contended by multiple clang processes, the exponential backoff runs in lockstep, with some of the processes sleeping for 30+ seconds in order to acquire the file lock.
In the attached patch, I implemented a more aggressive polling mechanism that limits the sleep interval to a max of 500ms, and randomizes the wait time. I preserved a limited form of exponential backoff. I also updated the code to use cross-platform timing, thread sleep, and random number capabilities available in C++11.
On iMac Pro (2.3 GHz Intel Xeon W, 18 core):
Xcode 11.1 bundled clang:
502.2 seconds (average of 5 runs)
Custom clang build with LockFileManager patch applied:
276.6 seconds (average of 5 runs)
This is a 1.82x speedup for this use case.
On MacBook Pro (4 core 3.1GHz Intel i7):
Xcode 11.1 bundled clang:
539.4 seconds (average of 2 runs)
Custom clang build with LockFileManager patch applied:
509.5 seconds (average of 2 runs)
As expected, machines with fewer cores benefit less from this change.
```
Call graph:
2992 Thread_393602 DispatchQueue_1: com.apple.main-thread (serial)
2992 start (in libdyld.dylib) + 1 [0x7fff6a1683d5]
2992 main (in clang) + 297 [0x1097a1059]
2992 driver_main(int, char const**) (in clang) + 2803 [0x1097a5513]
2992 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (in clang) + 1608 [0x1097a7cc8]
2992 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (in clang) + 3299 [0x1097dace3]
2992 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (in clang) + 509 [0x1097dcc1d]
2992 clang::FrontendAction::Execute() (in clang) + 42 [0x109818b3a]
2992 clang::ParseAST(clang::Sema&, bool, bool) (in clang) + 185 [0x10981b369]
2992 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 37 [0x10983e9b5]
2992 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) (in clang) + 141 [0x10983ecfd]
2992 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (in clang) + 695 [0x10983f3b7]
2992 clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&) (in clang) + 637 [0x10a9be9bd]
2992 clang::Parser::ParseModuleImport(clang::SourceLocation) (in clang) + 170 [0x10c4841ba]
2992 clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >&, bool) (in clang) + 503 [0x10c485267]
2992 clang::Preprocessor::Lex(clang::Token&) (in clang) + 316 [0x1098285cc]
2992 clang::Preprocessor::LexAfterModuleImport(clang::Token&) (in clang) + 690 [0x10cc7af62]
2992 clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool) (in clang) + 7989 [0x10bba6535]
2992 compileAndLoadModule(clang::CompilerInstance&, clang::SourceLocation, clang::SourceLocation, clang::Module*, llvm::StringRef) (in clang) + 296 [0x10bba8318]
2992 llvm::LockFileManager::waitForUnlock() (in clang) + 91 [0x10b6953ab]
2992 nanosleep (in libsystem_c.dylib) + 199 [0x7fff6a22c914]
2992 __semwait_signal (in libsystem_kernel.dylib) + 10 [0x7fff6a2a0f32]
```
Differential Revision: https://reviews.llvm.org/D69575
Check the path length limit against the length of the UTF-16 version of
the input rather than the UTF-8 equivalent, as the UTF-16 length may be
shorter. Move widenPath from the llvm::sys::path namespace in Path.h to
the llvm::sys::windows namespace in WindowsSupport.h. Only use the
reduced path length limit for create directory. Canonicalize using
sys::path::remove_dots().
Differential Revision: https://reviews.llvm.org/D75372
Currently, when building with the Unix support library and `isatty` does
not exist for the target platform (i.e. `HAVE_ISATTY` is false),
compilation of the file `raw_ostream.cpp` will fail due to direct use of
`isatty` in the function `raw_fd_ostream::preferred_buffer_size()`.
Use is_displayed() to fix the problem.
Reviewed By: probinson, MaskRay
Differential Revision: https://reviews.llvm.org/D75278
This adds the --debug-vars option to llvm-objdump, which prints
locations (registers/memory) of source-level variables alongside the
disassembly based on DWARF info. A vertical line is printed for each
live-range, with a label at the top giving the variable name and
location, and the position and length of the line indicating the program
counter range in which it is valid.
Currently, this only works for object files, not executables or shared
libraries.
Differential revision: https://reviews.llvm.org/D70720
After a crash catched by the CrashRecoveryContext, this patch prevents from accessing dangling pointers in TimerGroup structures before the clang tool exits. Previously, the default TimerGroup had internal linked lists which were still pointing to old Timer or TimerGroup instances, which lived in stack frames released by the CrashRecoveryContext.
Fixes PR45164.
Differential Revision: https://reviews.llvm.org/D76099
Behavior of IEEEFloat::roundToIntegral is aligned with IEEE-754
operation roundToIntegralExact. In partucular this function now:
- returns opInvalid for signaling NaNs,
- returns opInexact if the result of rounding differs from argument.
Differential Revision: https://reviews.llvm.org/D75246
Added a write method for TimeTrace that takes two strings representing
file names. The first is any file name that may have been provided by the
user via `time-trace-file` flag, and the second is a fallback that should
be configured by the caller. This method makes it cleaner to write the
trace output because there is no longer a need to check file names at the
caller and simplifies future TimeTrace usages.
Reviewed By: modocache
Differential Revision: https://reviews.llvm.org/D74514
* Delete boilerplate
* Change functions to return `Error`
* Test parsing errors
* Update callers of ARMAttributeParser::parse() to check the `Error` return value.
Since this patch touches nearly everything in the file, I apply
http://llvm.org/docs/Proposals/VariableNames.html and change variable
names to lower case.
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D75015
and follow-ups:
a2ca1c2d "build: disable zlib by default on Windows"
2181bf40 "[CMake] Link against ZLIB::ZLIB"
1079c68a "Attempt to fix ZLIB CMake logic on Windows"
This changed the output of llvm-config --system-libs, and more
importantly it broke stand-alone builds. Instead of piling on more fix
attempts, let's revert this to reduce the risk of more breakages.
This patch upstreams support for the ARM Armv8.1m cpu Cortex-M55.
In detail adding support for:
- mcpu option in clang
- Arm Target Features in clang
- llvm Arm TargetParser definitions
details of the CPU can be found here:
https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55
Reviewers: chill
Reviewed By: chill
Subscribers: dmgreen, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74966
Lots of headers pass around MemoryBuffer objects, but very few open
them. Let those that do include FileSystem.h.
Saves ~250 includes of Chrono.h & FileSystem.h:
$ diff -u thedeps-before.txt thedeps-after.txt | grep '^[-+] ' | sort | uniq -c | sort -nr
254 - ../llvm/include/llvm/Support/FileSystem.h
253 - ../llvm/include/llvm/Support/Chrono.h
237 - ../llvm/include/llvm/Support/NativeFormatting.h
237 - ../llvm/include/llvm/Support/FormatProviders.h
192 - ../llvm/include/llvm/ADT/StringSwitch.h
190 - ../llvm/include/llvm/Support/FormatVariadicDetails.h
...
This requires duplicating the file_t typedef, which is unfortunate. I
sunk the choice of mapping mode down into the cpp file using variable
template specializations instead of class members in headers.
llvm-ar is using CompareStringOrdinal which is available
only starting with Windows Vista (WINVER 0x600).
Fix this by hoising WindowsSupport.h, which sets _WIN32_WINNT
to 0x0601, up to llvm/include/llvm/Support and use it in llvm-ar.
Patch by Cristian Adam!
Differential revision: https://reviews.llvm.org/D74599
Summary: Include the offset at which this happened.
Reviewers: dblaikie, jhenderson
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75265
MathExtras.h was just wrapping SwapByteOrder.h functionality, so have
the callers use it directly. Use the MathExtras.h name (ByteSwap_NN) as
the standard naming, since it appears to be the most popular.
Summary:
These modificaitons will be used in D74883.
Fixed length C strings can have trailing NULLs or sometimes spaces (BSD archive files), so the fixed length C string defaults to stripping trailing NULLs, but can have the arguments specify to remove one or more kinds of spaces if needed. This is used to extract fixed length C strings from ELF NOTEs in D74883.
Reviewers: labath, dblaikie, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74991
This patch upstreams support for the AArch64 Armv8-A cpu Cortex-A34.
In detail adding support for:
- mcpu option in clang
- AArch64 Target Features in clang
- llvm AArch64 TargetParser definitions
details of the cpu can be found here:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a34
Reviewers: SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74483
Change-Id: Ida101fc544ca183a0a0e61a1277c8957855fde0b
The CheckAtomic module performs two tests to determine if passing
'-latomic' to the linker is required: one for 64-bit atomics, and
another for non-64-bit atomics. Include the missing check for 64-bit
atomics.
Reviewers: beanz, compnerd
Reviewed By: beanz, compnerd
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69444
As noted on D74621, the bswap intrinsic has a self imposed limitation that the type's bitwidth must be divisible by 16, but there's no reason that APInt::byteSwap must have the same limitation, given that it can already handle any byte width.
Summary:
this review is extracted from D74308.
It creates two error handlers which allow to redefine error
reporting routine and should be used for all places
where errors are reported:
std::function<void(Error)> RecoverableErrorHandler = defaultErrorHandler;
std::function<void(Error)> WarningHandler = defaultWarningHandler;
It also creates accessors to above handlers which should be used to
report errors.
function_ref<void(Error)> getRecoverableErrorHandler() {
return RecoverableErrorHandler;
}
function_ref<void(Error)> getWarningHandler() { return WarningHandler; }
It patches all error reporting places inside DWARFContext and DWARLinker.
Reviewers: jhenderson, dblaikie, probinson, aprantl, JDevlieghere
Reviewed By: jhenderson, JDevlieghere
Subscribers: hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D74481
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This reverts commit 80a34ae311 with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.