Summary:
The previous patch reworked some handling of temporary files which
exposed some bugs related to capturing local state by reference in the
callback lambda. Squashing this by copying in everything instead. There
was also a problem where the argument name was changed for
`--bitcode-library=` but clang still used `--target-library=`.
Summary:
This patch reworks the command line argument handling in the linker
wrapper from using the LLVM `cl` interface to using the `Option`
interface with TableGen. This has several benefits compared to the old
method.
The linker wrapper uses some of the linker's own arguments, such as the
libraries and input files, and this interface allows us to parse them
properly. Additionally we can now easily set up aliases to
the linker wrapper arguments and pass them in the linker input directly.
That is, pass an option like `cuda-path=` as `--offload-arg=cuda-path=`
in the linker's inputs. This will allow us to handle offloading
compilation in the linker itself some day. Finally, this is also a much
cleaner interface for passing arguments to the individual device linking
jobs.
Summary:
A previous patch added a new ELF section type for LLVM offloading. We
should use this when extracting the offloading sections rather than
checking the string. This patch also removes the implicit support for
COFF and MACH-O because we don't support those currently and they should
not be included.
Currently we use the `embedBufferInModule` function to store binary
strings containing device offloading data inside the host object to
create a fatbinary. In the case of LTO, we need to extract this object
from the LLVM-IR. This patch adds a metadata node for the embedded
objects containing the embedded pointers and the sections they were
stored at. This should create a cleaner interface for identifying these
values.
In the future it may be worthwhile to also encode an `ID` in the
metadata corresponding to the object's special section type if relevant.
This would allow us to extract the data from an object file and LLVM-IR
using the same ID.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D129033
We use LLD to perform AMDGPU linking. This linker accepts some arguments
through the `-plugin-opt` facilities. These options match what `Clang`
will output when given the same input.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D128923
When doing CTU analysis setup you pre-compile .cpp to .ast and then
you run clang-extdef-mapping on the .cpp file as well. This is a
pretty slow process since we have to recompile the file each time.
With this patch you can now run clang-extdef-mapping directly on
the .ast file. That saves a lot of time.
I tried this on llvm/lib/AsmParser/Parser.cpp and running
extdef-mapping on the .cpp file took 5.4s on my machine.
While running it on the .ast file it took 2s.
This can save a lot of time for the setup phase of CTU analysis.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D128704
Summary:
Currently we just check the extension to set the image kind. This
incorrectly labels the `.o` files created during LTO as object files.
This patch simply adds a check for the bitcode magic bytes instead.
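As a rough illustration of checking content rather than the extension, the sketch below classifies an input by its leading bytes. The function name and the returned labels are hypothetical; the magic values are the raw bitcode magic (`BC` 0xC0 0xDE) and the bitcode wrapper magic (0x0B17C0DE, little-endian). This is not the code from the patch.

```python
RAW_BITCODE_MAGIC = b"BC\xc0\xde"        # 'B' 'C' 0xC0 0xDE
WRAPPER_MAGIC = b"\xde\xc0\x17\x0b"      # 0x0B17C0DE stored little-endian


def image_kind(name: str, data: bytes) -> str:
    """Classify an input by its contents, falling back to the extension."""
    if data.startswith((RAW_BITCODE_MAGIC, WRAPPER_MAGIC)):
        return "bitcode"
    if name.endswith(".o"):
        return "object"
    return "unknown"
```

With this ordering, a `.o` file produced during LTO that actually holds bitcode is labelled as bitcode rather than as an object file.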
This patch gives basic parsing and semantic support for
"parallel masked taskloop simd" construct introduced in
OpenMP 5.1 (section 2.16.10)
Differential Revision: https://reviews.llvm.org/D128946
This patch gives basic parsing and semantic support for
"parallel masked taskloop" construct introduced in
OpenMP 5.1 (section 2.16.9)
Differential Revision: https://reviews.llvm.org/D128834
Summary:
We don't currently support other variable types, like managed or
surface. This patch simply adds code that checks the flags and does
nothing. This prevents us from registering a surface as a variable as we
do now. In the future, registering these will require adding the flags
to the entry struct.
This patch gives basic parsing and semantic support for
"masked taskloop simd" construct introduced in OpenMP 5.1 (section 2.16.8)
Differential Revision: https://reviews.llvm.org/D128693
In interactive C++ it is convenient to roll back to a previous state of the
compiler. For example:
clang-repl> int x = 42;
clang-repl> %undo
clang-repl> float x = 24 // not an error
To support this, the patch extends the functionality used to recover from
errors and adds functionality to recover the low-level execution infrastructure.
The current implementation is based on watermarks. It exploits the fact that
at each incremental input the underlying compiler infrastructure is in a valid
state. We can only go N incremental inputs back to a previous valid state. We do
not need and do not do any further dependency tracking.
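The watermark idea can be sketched in a few lines. The class and method names below are hypothetical and the model only tracks the list of accepted inputs; the real implementation also has to roll back compiler and execution state.

```python
class IncrementalSession:
    """Toy model of watermark-based undo over incremental inputs."""

    def __init__(self):
        self._inputs = []  # one entry per accepted incremental input

    def parse(self, code: str) -> int:
        """Accept an input and return the new watermark (input count)."""
        self._inputs.append(code)
        return len(self._inputs)

    def undo(self, n: int = 1) -> None:
        """Roll back the last n inputs; no dependency tracking is done."""
        if n < 1 or n > len(self._inputs):
            raise ValueError("cannot undo that many inputs")
        del self._inputs[len(self._inputs) - n:]

    def history(self):
        return list(self._inputs)
```

Because every watermark corresponds to a valid state, undoing is just truncation back to an earlier watermark.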
This patch was co-developed with V. Vassilev, relies on the past work of Purva
Chaudhari in clang-repl and is inspired by the past work on the same feature
in the Cling interpreter.
Co-authored-by: Purva-Chaudhari <purva.chaudhari02@gmail.com>
Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>
Signed-off-by: Jun Zhang <jun@junz.org>
This patch mainly handles treating `begin` as block openers.
While and for statements will be handled in another patch.
Reviewed By: HazardyKnusperkeks
Differential Revision: https://reviews.llvm.org/D123450
This patch gives basic parsing and semantic support for "masked taskloop"
construct introduced in OpenMP 5.1 (section 2.16.7)
Differential Revision: https://reviews.llvm.org/D128478
This patch implements soft reset and adds tests for soft reset success of the
diagnostics engine. This allows us to recover from errors in clang-repl without
resetting the pragma handlers' state.
Differential revision: https://reviews.llvm.org/D126183
Summary:
This patch adds some new sanity checks to make sure that the sizes of
the offsets are within the bounds of the file or what is expected by the
binary. This also improves the error handling of the version structure
to be built into the binary itself so we can change it more easily.
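A minimal sketch of the kind of bounds check described above, assuming each region in the binary is described by an offset and a size. The function and parameter names are hypothetical and do not mirror the actual header fields.

```python
def check_region(name: str, offset: int, size: int, file_size: int) -> None:
    """Raise ValueError unless [offset, offset + size) lies inside the file."""
    if offset < 0 or size < 0:
        raise ValueError(f"{name}: negative offset or size")
    # Compare against the remaining space so that a huge size cannot
    # wrap around in a fixed-width implementation of the same check.
    if offset > file_size or size > file_size - offset:
        raise ValueError(f"{name}: region exceeds file of {file_size} bytes")
```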
The target features are necessary for correctly compiling most programs
in LTO mode. Currently, these are derived in clang at link time and
passed as an argument to the linker wrapper. This is problematic because
it requires knowing the required toolchain at link time, which should
not be necessary. Instead, these features should be embedded into the
offloading binary so we can unify them in the linker wrapper for LTO.
This also required changing the offload packager to interpret multiple
arguments as concatenation with a comma. This is so we can still use the
`,` separator for the argument list.
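One way to read the concatenation rule is sketched below: a repeated key in a comma-separated `key=value` list is folded into a single comma-joined value. The key names and the function are illustrative assumptions, not the packager's actual parser.

```python
def parse_image_args(spec: str) -> dict:
    """Parse 'k=v,k=v,...'; a repeated key is joined with a comma."""
    fields = {}
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        if key in fields:
            fields[key] += "," + value  # concatenate repeated arguments
        else:
            fields[key] = value
    return fields
```

For example, `feature=+ptx76,feature=+sm_70` yields the single value `+ptx76,+sm_70` while `,` still separates the arguments themselves.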
Depends on D127246
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D127686
Summary:
Currently we use temporary files to write the intermediate results to.
However, these are stored as regular strings and we do a few unnecessary
copies and conversions of them. This patch simply replaces these strings
with a reference to the filename stored in the list of temporary files.
The temporary files will stay alive during the whole linking phase and
have stable pointers, so we should be able to cheaply pass references to
them rather than copying them every time.
Summary:
A recent patch added some new code paths to the linker wrapper. Older
compilers seem to have problems with returning errors wrapped in
an Expected type without explicitly moving them. This caused failures in
some of the buildbots. This patch fixes that.
The linker wrapper currently eagerly extracts all identified offloading
binaries to a file. This isn't ideal because we will soon open these
files again to examine their symbols for LTO and other things.
Additionally, we may not use every extracted file in the case of static
libraries. This would be very noisy in the case of static libraries that
may contain code for several targets not participating in the current
link.
Recent changes allow us to treat an Offloading binary as a standard
binary class. So that allows us to use an OwningBinary to model the
file. Now we keep it in memory and only write it once we know which
files will be participating in the final link job. This also reworks a
lot of the structure around how we handle this by removing the old
DeviceFile class.
The main benefit from this is that the following doesn't output 32+ files and
instead will only output a single temp file for the linked module.
```
$ clang input.c -fopenmp --offload-arch=sm_70 -foffload-lto -save-temps
```
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D127246
Previously, omitting unnecessary DWARF unwinds was only done in two
cases:
* For Darwin + aarch64, if no DWARF unwind info is needed for all the
functions in a TU, then the `__eh_frame` section would be omitted
entirely. If any one function needed DWARF unwind, then MC would emit
DWARF unwind entries for all the functions in the TU.
* For watchOS, MC would omit DWARF unwind on a per-function basis, as
long as compact unwind was available for that function.
This diff makes it so that we omit DWARF unwind on a per-function basis
for Darwin + aarch64 as well. In addition, we introduce the flag
`--emit-dwarf-unwind=` which can toggle between `always`,
`no-compact-unwind` (only emit DWARF when CU cannot be emitted for a
given function), and the target platform `default`. `no-compact-unwind`
is particularly useful for newer x86_64 platforms: we don't want to omit
DWARF unwind for x86_64 in general due to possible backwards compat
issues, but we should make it possible for people to opt into this
behavior if they are only targeting newer platforms.
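The per-function decision can be summarised as a small truth table. The sketch below is an interpretation of the description above, not MC's code. The function name and the `platform_prefers_compact` parameter are assumptions that stand in for the target's default, true for Darwin + aarch64 and watchOS and false for x86_64.

```python
def emits_dwarf_unwind(mode: str, compact_unwind_ok: bool,
                       platform_prefers_compact: bool) -> bool:
    """Decide whether one function gets a DWARF unwind entry."""
    if mode == "always":
        return True
    if mode == "no-compact-unwind":
        # Only emit DWARF when compact unwind cannot encode the function.
        return not compact_unwind_ok
    if mode == "default":
        # Defer to the platform: omit DWARF only where the platform
        # prefers compact unwind and it is available for this function.
        return not (platform_prefers_compact and compact_unwind_ok)
    raise ValueError(f"unknown mode {mode!r}")
```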
**Motivation:** I'm working on adding support for `__eh_frame` to LLD,
but I'm concerned that we would suffer a perf hit. Processing compact
unwind is already expensive, and that's a simpler format than EH frames.
Given that MC currently produces one EH frame entry for every compact
unwind entry, I don't think processing them will be cheap. I tried to do
something clever on LLD's end to drop the unnecessary EH frames at parse
time, but this made the code significantly more complex. So I'm looking
at fixing this at the MC level instead.
**Addendum:** It turns out that there was a latent bug in the X86
backend when `OmitDwarfIfHaveCompactUnwind` is naively enabled, which is
not too surprising given that this combination has not been heretofore
used.
For functions that have unwind info that cannot be encoded with CU, MC
would end up dropping both the compact unwind entry (OK; existing
behavior) as well as the DWARF entries (not OK). This diff fixes things
so that we emit the DWARF entry, as well as a CU entry with encoding
`UNWIND_X86_MODE_DWARF` -- this basically tells the unwinder to look for
the DWARF entry. I'm not 100% sure the `UNWIND_X86_MODE_DWARF` CU entry
is necessary, this was the simplest fix. ld64 seems to be able to handle
both the absence and presence of this CU entry. Ultimately ld64 (and
LLD) will synthesize `UNWIND_X86_MODE_DWARF` if it is absent, so there
is no impact to the final binary size.
Reviewed By: davide, lhames
Differential Revision: https://reviews.llvm.org/D122258
When running scan-build-py's analyze-build script with output format set
to sarif & html it wants to print a message on how to look at the
defects mentioning the directory name twice.
But the path argument was only given once to the logging function,
causing "TypeError: not enough arguments for format string" exception.
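The failure mode is easy to reproduce with plain `%` formatting, which is also what the `logging` module applies to its arguments. The message text below is made up for illustration; only the two `%s` placeholders matter.

```python
# Hypothetical message with two placeholders for the same directory.
TEMPLATE = "Run 'scan-view %s' to examine bug reports, or open '%s'."


def render_buggy(path: str) -> str:
    # Only one argument for two placeholders: raises TypeError.
    return TEMPLATE % path


def render_fixed(path: str) -> str:
    # Pass the path once per placeholder.
    return TEMPLATE % (path, path)
```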
Differential Revision: https://reviews.llvm.org/D126974
Building on D126796, this commit adds the infrastructure for being able
to print out descriptions of what each warning does.
Differential Revision: https://reviews.llvm.org/D126832
Summary:
The OffloadingBinary uses a convenience struct to help manage the memory
that will be serialized using the binary format. This currently uses a
reference to an existing buffer, but this should own the memory instead
so it is easier to work with seeing as its only current use requires
saving the buffer anyway.
This patch adds an llvm-driver multicall tool that can combine multiple
LLVM-based tools. The build infrastructure is enabled for a tool by
adding the GENERATE_DRIVER option to the add_llvm_executable CMake
call, and changing the tool's main function to a canonicalized
tool_name_main format (i.e. llvm_ar_main, clang_main, etc...).
As currently implemented llvm-driver contains dsymutil, llvm-ar,
llvm-cxxfilt, llvm-objcopy, and clang (if clang is included in the
build).
llvm-driver can be enabled from builds by setting
LLVM_TOOL_LLVM_DRIVER_BUILD=On.
There are several limitations in the current implementation, which can
be addressed in subsequent patches:
(1) the multicall binary cannot currently properly handle
multi-dispatch tools. This means symlinking llvm-ranlib to llvm-driver
will not properly result in llvm-ar's main being called.
(2) the multicall binary cannot be comprised of tools containing
conflicting cl::opt options as the global cl::opt option list cannot
contain duplicates.
These limitations can be addressed in subsequent patches.
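The dispatch scheme and limitation (1) can be modelled roughly as follows. The table of tools and its stub entry points are hypothetical; the only detail taken from the description is the `tool_name_main` naming convention.

```python
import os


def entry_point_name(argv0: str) -> str:
    """Map an invocation name such as 'llvm-ar' to 'llvm_ar_main'."""
    tool = os.path.basename(argv0)
    return tool.replace("-", "_") + "_main"


# Hypothetical registry standing in for the tools linked into the driver.
TOOL_MAINS = {
    "llvm_ar_main": lambda args: "ran llvm-ar",
    "clang_main": lambda args: "ran clang",
}


def dispatch(argv):
    """Call the entry point selected by the name the binary was run as."""
    name = entry_point_name(argv[0])
    if name not in TOOL_MAINS:
        raise KeyError(f"no tool registered as {name}")
    return TOOL_MAINS[name](argv[1:])
```

In this model a symlink named `llvm-ranlib` resolves to `llvm_ranlib_main`, which is not registered, so it never reaches llvm-ar's main.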
Differential revision: https://reviews.llvm.org/D109977
The "#!" lines in all scan-build-py scripts were using just bare
"/usr/bin/python" which according to PEP-0394 can be either python3,
python2 or not exist at all.
E.g. in the latest Debian and Ubuntu releases "/usr/bin/python" does not
exist at all by default and the user must install the python-is-python2 or
python-is-python3 package to get the bare versionless "python"
command.
Until recently (70b06fe8a1 "scan-build-py: Force the opening in utf-8"
changed "libscanbuild") these scripts worked in both python2 and
python3, but now they (rightfully) are python3 only, and broke on
systems where the "python" command means python2.
By changing the "#!" to be "python3" it is not only explicit that the
scripts require python3 it also works on systems where "python" command
is python2 or nonexistent.
Differential Revision: https://reviews.llvm.org/D126804
`-gen-reproducer` causes crash reproduction to be emitted
even when clang didn't crash, and now can optionally take an
argument of never, on-crash (default), on-error and always.
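The four values can be read as increasing levels of eagerness. The sketch below encodes that reading and assumes that `on-error` also covers crashes, which the description does not state explicitly. The function name is hypothetical.

```python
def should_emit_reproducer(mode: str, crashed: bool, had_error: bool) -> bool:
    """Decide whether a reproducer is written for one compilation."""
    if mode == "never":
        return False
    if mode == "on-crash":  # the default
        return crashed
    if mode == "on-error":
        # Assumption: a crash counts as at least an error.
        return crashed or had_error
    if mode == "always":
        return True
    raise ValueError(f"unknown mode {mode!r}")
```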
Differential revision: https://reviews.llvm.org/D120201
musl-libc doesn't support dladdr in statically linked binaries:
> Are you using static or dynamic linking? If static, dladdr is just a
> stub that always fails. It could be implemented to work under some
> conditions, but it would be highly dependent on what options you
> compile the binary with, since by default static binaries do not
> contain the bloat that would be needed to perform introspection.
Source: https://www.openwall.com/lists/musl/2013/01/15/25 (in response
to a bug report).
Libclang unfortunately uses dladdr to find the ResourcesPath so will
fail if it is linked statically on Alpine Linux. This patch fixes this
issue by falling back to getMainExecutable if dladdr returns an error.
Reference: https://github.com/llvm/llvm-project/issues/40641#issuecomment-981011427
Differential Revision: https://reviews.llvm.org/D124815
-gen-reproducer causes crash reproduction to be emitted even
when clang didn't crash, and now can optionally take an argument
of never, on-crash (default), on-error and always.
Differential revision: https://reviews.llvm.org/D120201
This is first of a series of patches for making the special lexing for dependency scanning a first-class feature of the `Preprocessor` and `Lexer`.
This patch only includes NFC renaming changes to make reviewing of the functionality changing parts easier.
Differential Revision: https://reviews.llvm.org/D125484
This adds support for `#pragma omp atomic compare fail`. It has parser and AST support for now.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D123235
Summary:
The linker wrapper supports embedding bitcode images instead of linked
device images to facilitate JIT in the device runtime. However, we were
incorrectly passing in the file twice when this option was set. This
patch makes sure we only use the intermediate result of the LTO pass and
don't add the final output to the full job.
In the future we will want to add both of these and handle that
accordingly to allow the runtime to either use the AoT compiled version
or JIT compile the bitcode version if available.
This commit builds upon recently added indexing support for C++ concepts
from https://reviews.llvm.org/D124441 by extending libclang to
support indexing and visiting concepts, constraints and requires
expressions as well.
Differential Revision: https://reviews.llvm.org/D126031
We use the clang-linker-wrapper to perform device linking of embedded
offloading object files. This is done by generating those jobs inside of
the linker-wrapper itself. This patch adds an argument in Clang and the
linker-wrapper that allows users to forward input to the device linking
phase. This can either be done for every device linker, or for a
specific target triple. We use the `-Xoffload-linker <arg>` and the
`-Xoffload-linker-<triple> <arg>` syntax to accomplish this.
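The selection rule can be sketched as a filter over the forwarded arguments. The representation below, a list of `(triple, arg)` pairs with `None` meaning "every device linker", is an assumption made for illustration and not how the driver stores them.

```python
def device_linker_args(forwarded, triple):
    """Return the forwarded arguments that apply to one device linker.

    `forwarded` holds (triple_or_None, arg) pairs: None models
    `-Xoffload-linker <arg>`, a triple models `-Xoffload-linker-<triple> <arg>`.
    """
    return [arg for target, arg in forwarded
            if target is None or target == triple]
```

For instance, an argument forwarded for `amdgcn-amd-amdhsa` is dropped when building the `nvptx64-nvidia-cuda` link job, while an untargeted argument reaches both.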
Reviewed By: markdewing, tra
Differential Revision: https://reviews.llvm.org/D126226
Allows emitting `define amdgpu_kernel void @func()` IR from C or C++.
This replaces the current workflow which is to write a stub in opencl that
calls an external C function implemented in C++ combined through llvm-link.
Calling the resulting function still requires a manual implementation of the
ABI from the host side. The primary application is for more rapid debugging
of the amdgpu backend by permuting a C or C++ test file instead of manually
updating an IR file.
Implementation closely follows D54425. Non-amd reviewers from there.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D125970
On Apple Silicon Macs, using a Darwin thread priority of PRIO_DARWIN_BG seems to
map directly to the QoS class Background. With this priority, the thread is
confined to efficiency cores only, which makes background indexing take forever.
Introduce a new ThreadPriority "Low" that sits in the middle between Background
and Default, and maps to QoS class "Utility" on Mac. Make this new priority the
default for indexing. This makes the thread run on all cores, but still lowers
priority enough to keep the machine responsive, and not interfere with
user-initiated actions.
I didn't change the implementations for Windows and Linux; on these systems,
both ThreadPriority::Background and ThreadPriority::Low map to the same thread
priority. This could be changed as a followup (e.g. by using SCHED_BATCH for Low
on Linux).
See also https://github.com/clangd/clangd/issues/1119.
Reviewed By: sammccall, dgoldman
Differential Revision: https://reviews.llvm.org/D124715
Summary:
Static libraries need to be handled differently from regular input
files, namely they are loaded lazily. Previously we used a flag to
indicate a file came from a static library. This patch simplifies this
by simply keeping a different array that contains the static libraries
so we don't need to parse them out again.
Summary:
The linker wrapper previously had functionality to strip the sections
manually. We don't use this at all because this is much better done by
the linker via the `SHF_EXCLUDE` flag. This patch simply removes the
support for this feature to simplify the code.
Summary:
We use embedded binaries to extract offloading device code from the host
fatbinary. This uses a binary format whose necessary alignment is
eight bytes. The alignment is included within the ELF section type so
the data extracted from the ELF should always be aligned at that amount.
However, if this file was extracted from a static archive, it was being
sent as an offset in the archive file which did not have the same
alignment guarantees as the ELF file. This was causing errors in the
UB-sanitizer build as it would occasionally try to access a misaligned
address. To fix this, I simply copy the memory directly to a new buffer
which is guaranteed to have worst-case alignment of 16 in the case that
it's not properly aligned.
In order to do offloading compilation we need to embed files into the
host and create fatbinaries. Clang uses a special binary format to
bundle several files along with their metadata into a single binary
image. This is currently performed using the `-fembed-offload-binary`
option. However this is not very extensible since it requires changing
the command flag every time we want to add something and makes optional
arguments difficult. This patch introduces a new tool called
`clang-offload-packager` that behaves similarly to CUDA's `fatbinary`.
This tool takes several input files with metadata and embeds it into a
single image that can then be embedded in the host.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D125165
This patch adds the necessary code generation to create the wrapper code
that registers all the globals in CUDA. We create the necessary
functions and iterate through the list of
`__start_cuda_offloading_entries` to find which globals must be
registered. This is very similar to the code generation done currently
in Clang for non-rdc builds, but here we are registering a fully linked
fatbinary and finding the globals via the above sections.
With this we should be able to fully support basic RDC / LTO building of CUDA
code.
It's also worth noting that this does not include the necessary PTX to JIT the
image, so to use this support the offloading architecture must match the
system's architecture.
Depends on D123810
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123812
This patch adds the initial support for wrapping CUDA images. This
requires changing some of the logic for how we bundle images. We now
need to copy the image for all kinds that are active for the
architecture. Then we need to run a separate wrapping job if the Kind is
Cuda. For cuda wrapping we need to use the `fatbinary` program from the
CUDA SDK to bundle all the binaries together. This is then passed to a
new function to perform the actual module code generation that will be
implemented in a later patch.
Depends on D120273 D123471
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D123810
This is generally a better default for tools other than the compiler, which
shouldn't assume a PCH file on disk is something they can consume.
Preserve the old behavior in places associated with libclang/c-index-test
(including ASTUnit) as there are tests relying on it and most important
consumers are out-of-tree. It's unclear whether the tests are specifically
trying to test this functionality, and what the downstream implications of
removing it are. Hopefully someone more familiar can clean this up in future.
Differential Revision: https://reviews.llvm.org/D125149
It should be useful for clang-fuzzer itself, though my own motivation is
to use this in fuzzing clang-pseudo. (clang-tools-extra/pseudo/fuzzer).
Differential Revision: https://reviews.llvm.org/D125166
All llvm-project fuzzers use this library to parse command-line arguments.
Many of them don't deal with LLVM IR or modules in any way. Bundling those
functions in one library forces build dependencies that don't need to be there.
Among other things, this means check-clang-pseudo no longer depends on most of
LLVM.
Differential Revision: https://reviews.llvm.org/D125081
Currently we handle static libraries like any other object in the
linker wrapper. However, this does not preserve the semantics that
dictate static libraries should be lazily loaded as the symbols are
needed. This allows us to ignore linking in architectures that are not
used by the main application being compiled. This patch adds the basic
support for detecting if a file came from a static library, and only
including it in the link job if it's used by other object files.
This patch only adds the basic support; to be more correct we should
check the symbols and only include the library if the link job contains
symbols that are needed. Ideally we could just put this on the linker
itself, but nvlink doesn't seem to support `.a` files.
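The basic rule can be sketched as follows, assuming it means that an archive member is linked only when some regular object already targets the same triple and architecture. The tuple layout and function name are hypothetical, and no symbol resolution is modelled.

```python
def select_link_inputs(objects, archive_members):
    """Pick the device files that take part in the link.

    Every input is a (name, triple, arch) tuple. Regular objects are
    always linked; archive members are linked only if a regular object
    already uses the same (triple, arch) pair.
    """
    active = {(triple, arch) for _, triple, arch in objects}
    selected = list(objects)
    selected += [m for m in archive_members if (m[1], m[2]) in active]
    return selected
```

A library that carries code for several targets therefore contributes nothing for architectures the main application does not use.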
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D125092
Summary:
A previous patch merged the command execution and printing into a helper
function. The old printing code wasn't removed causing each to be
printed twice.
After basic support for embedding and handling CUDA files was added to
the new driver, we should be able to call CUDA functions from OpenMP
code. This patch makes the necessary changes to successfully link in CUDA
programs that were compiled using the new driver. With this patch it
should be possible to compile device-only CUDA code (no kernels) and
call it from OpenMP as follows:
```
$ clang++ cuda.cu -fopenmp-new-driver -offload-arch=sm_70 -c
$ clang++ openmp.cpp cuda.o -fopenmp-new-driver -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_70
```
Currently this requires using a host variant to suppress the generation
of a CPU-side fallback call.
Depends on D120272
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120273
Delete the output streams coming from
CompilerInstance::createOutputFile() and friends once writes are
finished. Concretely, replacing `OS->flush()` with `OS.reset()` in:
- `ExtractAPIAction::EndSourceFileAction()`
- `PrecompiledPreambleAction::setEmittedPreamblePCH()`
- `cc1_main()`'s support for `-ftime-trace`
This fixes theoretical bugs related to proxy streams, which may have
cleanups to run in their destructor. For example, a proxy that
CompilerInstance sometimes uses is `buffer_ostream`, which wraps a
`raw_ostream` lacking pwrite support and adds it. `flush()` does not
promise that output is complete; `buffer_ostream` needs to wait until
the destructor to forward anything so that it can service later calls to
`pwrite()`. If the destructor isn't called then the proxied stream
hasn't received any content.
This also protects against some logic bugs, triggering a null
dereference on a later attempt to write to the stream.
No tests, since in practice these particular code paths never use
`buffer_ostream`; you need to be writing a binary file to a
pipe (such as stdout) to hit it, but `-extract-api` writes a text file
and the other two use computed filenames that will never (in practice)
be a pipe. This is effectively NFC, for now.
But I have some other patches in the works that add guard rails,
crashing if the stream hasn't been destructed by the time the
CompilerInstance is told to keep the output file, since in most cases
this is a problem.
Differential Revision: https://reviews.llvm.org/D124635
This is to improve maintenance a bit and remove need to maintain the additional option and related code-paths.
Differential Revision: https://reviews.llvm.org/D124558
Summary:
A previous patch updated the path searching in the linker wrapper. I
made an error and caused `lld`, which is necessary to link AMDGPU
images, to not be found on some systems. This patch fixes this by
correctly searching that linker-wrapper's binary path first again.
Add support for concepts and requires expression in the clang index.
Generate USRs for concepts.
Also change how `RecursiveASTVisitor` handles return type requirement in
requires expressions. The new code unpacks the synthetic template parameter
list used for storing the actual expression. This simplifies
implementation of the indexing. No code seems to depend on the original
traversal anyway and the synthesized template parameter list is easily
accessible from inside the requires expression if needed.
Add tests in the clangd codebase.
Fixes https://github.com/clangd/clangd/issues/1103.
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D124441
When we do LTO we consider ourselves to have whole program visibility if
every single input file we have contains LLVM bitcode. If we have whole
program visibility then we can create a single image and utilize CUDA's
non-RDC mode by not passing `-c` to `ptxas` and ignoring the `nvlink`
job. This should be faster for some situations and also saves us the
time executing `nvlink`.
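The decision can be sketched as below, using the raw bitcode magic bytes to stand in for "contains LLVM bitcode". The returned job descriptions are simplified labels rather than real command lines, and the function name is hypothetical.

```python
BITCODE_MAGIC = b"BC\xc0\xde"


def nvptx_jobs(inputs):
    """Choose the device jobs for a list of input file contents (bytes)."""
    whole_program = bool(inputs) and all(
        data.startswith(BITCODE_MAGIC) for data in inputs)
    if whole_program:
        # Non-RDC mode: ptxas produces the final image, nvlink is skipped.
        return ["ptxas"]
    # Otherwise compile relocatable code and link it with nvlink.
    return ["ptxas -c", "nvlink"]
```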
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D124292
This patch extends cc1as to export the build version load command with
LC_VERSION_MIN_MACOSX.
This is especially important for Mac Catalyst as Mac Catalyst uses
macOS's compiler-rt built-ins.
Differential Revision: https://reviews.llvm.org/D121868
The linker wrapper is used to perform linking and wrapping of embedded
device object files. Currently its internals are not able to be tested
easily. This patch adds the `--dry-run` and `--print-wrapped-module`
options to investigate the link jobs that will be run along with the
wrapped code that will be created to register the binaries.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D124039
The previous patch introduced the offloading binary format so we can
store some metadata along with the binary image. This patch introduces
using this inside the linker wrapper and Clang instead of the previous
method that embedded the metadata in the section name.
Differential Revision: https://reviews.llvm.org/D122683
Summary:
The changes in D122987 ensure that the offloading sections always have
the SHF_EXCLUDE flag. This means that we do not need to manually strip
these sections for ELF or COFF targets.