This patch completely removes the old OpenMP device runtime. Previously,
the old runtime had the prefix `libomptarget-new-` and the old runtime
was simply called `libomptarget-`. This patch makes the formerly new
runtime the only runtime available. The entire project has been deleted,
and all references to the `libomptarget-new` runtime has been replaced
with `libomptarget-`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D118934
The linker wrapper tool uses the 'nvlink' and 'ptxas' binaries to link
and assemble device files. Previously we searched for this using the
binaries in the user's path. This didn't work in cases where the user
passed in a specific Cuda path to Clang. This patch changes the linker
wrapper to accept an argument for the Cuda path we can get from Clang.
This should fix#53573.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D118944
Summary:
This patch implements the `-save-temps` flag for the linker wrapper.
This allows the user to inspect the intermeditary outpout that the
linker wrapper creates.
This patch adds support for a few extra flags in the linker wrapper,
such as debugging flags, verbose output, and passing arguments to ptxas. We also
now forward pass remarks to the LLVM backend so they will show up in the LTO
passes.
Depends on D117049
Differential Revision: https://reviews.llvm.org/D117156
Summary;
This patch adds support for embedding device images in the linker
wrapper tool. This will be used for performing JIT functionality in the
future.
Depends on D117048
Differential Revision: https://reviews.llvm.org/D117049
Summary:
This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visiblity of the whole module.
Fix some failings where the implicit conversion of an Error to an
Expected triggered the deleted copy constructor.
Depends on D116675
Differential revision: https://reviews.llvm.org/D117048
This patch implements the fist support for handling LTO in the
offloading pipeline. The flag `-foffload-lto` is used to control if
bitcode is embedded into the device. If bitcode is found in the device,
the extracted files will be sent to the LTO pipeline to be linked and
sent to the backend. This implementation does not separately link the
device bitcode libraries yet.
Depends on D116675
Differential Revision: https://reviews.llvm.org/D116975
This patch introduces a linker wrapper tool that allows us to preprocess
files before they are sent to the linker. This adds a dummy action and
job to the driver stage that builds the linker command as usual and then
replaces the command line with the wrapper tool.
Depends on D116543
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116544
This patch adds support for embedding the device object files into the
host IR to create a fat binary. Each offloading file will be inserted
into a section with the following naming format
`.llvm.offloading.<triple>.<arch>.<filename>`.
Depends on D116542
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116543
This patch introduces the `-fopenmp-new-driver` option which instructs
the compiler to use a new driver scheme for producing offloading code.
In this scheme we create a complete offloading object file and then pass
it as input to the host compilation phase. This will allow us to embed
the object code in the backend phase.
This is the start of a series of commits to rework the OpenMP offloading driver
pipeline. The goal of this is to simplify the steps required for creating an
offloading program. This patch changes the driver's configuration to simply pass
the device file back to the host as an input so it can be embedded as an LLVM IR
global during the backend, then simply passes that object file to the linker.
This driver implementation will currently create the following phases,
```
$ clang input.c -fopenmp -fopenmp-targets=nvptx64 -fopenmp-new-driver -ccc-print-phases
+- 0: input, "input.c", c, (host-openmp)
+- 1: preprocessor, {0}, cpp-output, (host-openmp)
+- 2: compiler, {1}, ir, (host-openmp)
| | +- 3: input, "input.c", c, (device-openmp)
| | +- 4: preprocessor, {3}, cpp-output, (device-openmp)
| |- 5: compiler, {4}, ir, (device-openmp)
| +- 6: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (nvptx64)" {5}, ir
| +- 7: backend, {6}, assembler, (device-openmp)
|- 8: assembler, {7}, object, (device-openmp)
+- 9: offload, "host-openmp (x86_64-unknown-linux-gnu)" {2}, "device-openmp (nvptx64)" {8}, ir
+- 10: backend, {9}, assembler, (host-openmp)
+- 11: assembler, {10}, object, (host-openmp)
12: clang-linker-wrapper, {11}, image, (host-openmp)
```
Which will map to the following bindings
```
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["input.c"], output: "/tmp/input-bae62e.bc"
# "nvptx64" - "clang", inputs: ["input.c", "/tmp/input-bae62e.bc"], output: "/tmp/input-76784e.s"
# "nvptx64" - "NVPTX::Assembler", inputs: ["/tmp/input-76784e.s"], output: "/tmp/input-8f29db.o"
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/input-bae62e.bc", "/tmp/input-8f29db.o"], output: "/tmp/input-545450.o"
# "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["/tmp/input-545450.o"], output: "a.out"
```
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116541
Branch protection in M-class is supported by
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either by
command-line (e.g -mbranch-protection=bti) or by target attribute
in source code (e.g. __attribute__((target("branch-protection=..."))) )
will generate a warning.
In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
GCC added -gz=zlib-gnu in 2014 for -gz meaning change (.zdebug =>
SHF_COMPRESSED) and the legacy zlib-gnu hasn't gain adoption.
According to Debian Code Search (`gz=zlib-gnu`), no project uses -gz=zlib-gnu
(valgrind has a configure to use -gz=zlib). Any possible -gz=zlib-gnu user can
switch to -gz smoothly (supported by integrated assemblers for many years;
binutils 2.26).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D117744
This method introduces new CMake variable
PPC_LINUX_DEFAULT_IEEELONGDOUBLE (false by default) to enable fp128 as
default long double format.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D118110
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instruction on function's prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end being reachable through PLT entries, that will use an indirect branch for such. Because this cannot be determined during compilation-time, the compiler currently emits ENDBRs to every non-local-linkage function.
Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].
This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement in when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address taken functions, ignoring non-address taken functions that are don't have local linkage.
A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.
[1] - https://lkml.org/lkml/2021/11/22/591
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D116070
This patch changes the special-case handling of visibility when
compiling for an OpenMP target offloading device. This was orignally
added as a precaution against the bug encountered in PR41826 when
symbols in the device were being preempted by shared library symbols.
This should instead be done by making the visibility protected by default.
With protected visibility we are asserting that the symbols on the device
will never be preempted or preempt another symbol pending a shared library
load.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D117806
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019
The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).
Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)
The outcome is that we can finally use Live++ or Recode along with clang-cl.
NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.
Depends on D43002, D80833 and D81301 for the full feature.
Differential Revision: https://reviews.llvm.org/D116511
This was intended to be fixed by D98856, but that only seemed to have
the desired behaviour when compiling to assembly using `-S`, not when
compiling into an object file or executable. Given that this was not
the intention of D98856, this patch fixes the behaviour.
Added warning for potential cases of
unaligned access when option
-mno-unaligned-access has been specified
Differential Revision: https://reviews.llvm.org/D116221
Early revisions of the VR4300 have a hardware bug where two consecutive
multiplications can produce an incorrect result in the second multiply.
This revision adds the `-mfix4300` flag to llvm (and clang) which, when
passed, provides a software fix for this issue.
More precise description of the "mulmul" bug:
```
mul.[s,d] fd,fs,ft
mul.[s,d] fd,fs,ft or [D]MULT[U] rs,rt
```
When the above sequence is executed by the CPU, if at least one of the
source operands of the first mul instruction happens to be `sNaN`, `0`
or `Infinity`, then the second mul instruction may produce an incorrect
result. This can happen both if the two mul instructions are next to each
other and if the first one is in a delay slot and the second is the first
instruction of the branch target.
Description of the fix:
This fix adds a backend pass to llvm which scans for mul instructions in
each basic block and inserts a nop whenever the following conditions are
met:
- The current instruction is a single or double-precision floating-point
mul instruction.
- The next instruction is either a mul instruction (any kind) or a branch
instruction.
Differential Revision: https://reviews.llvm.org/D116238
The Linux kernel has a make macro called cc-option that invokes the
compiler with an option in isolation to see if it is supported before
adding it to CFLAGS. The exit code of the compiler is used to determine
if the flag is supported and should be added to the compiler invocation.
A call to cc-option with '-mno-outline-atomics' was added to prevent
linking errors with newer GCC versions but this call succeeds with a
non-AArch64 target because there is no warning from clang with
'-mno-outline-atomics', just '-moutline-atomics'. Because the call
succeeds and adds '-mno-outline-atomics' to the compiler invocation,
there is a warning from LLVM because the 'outline-atomics target
feature is only supported by the AArch64 backend.
$ echo | clang -target x86_64 -moutline-atomics -Werror -x c -c -o /dev/null -
clang-14: error: The 'x86_64' architecture does not support -moutline-atomics; flag ignored [-Werror,-Woption-ignored]
$ echo $?
1
$ echo | clang -target x86_64 -mno-outline-atomics -Werror -x c -c -o /dev/null -
'-outline-atomics' is not a recognized feature for this target (ignoring feature)
$ echo $?
0
This does not match GCC's behavior, which errors when the flag is added
to a non-AArch64 target.
$ echo | gcc -moutline-atomics -x c -c -o /dev/null -
gcc: error: unrecognized command-line option ‘-moutline-atomics’; did you mean ‘-finline-atomics’?
$ echo | gcc -mno-outline-atomics -x c -c -o /dev/null -
gcc: error: unrecognized command-line option ‘-mno-outline-atomics’; did you mean ‘-fno-inline-atomics’?
$ echo | aarch64-linux-gnu-gcc -moutline-atomics -x c -c -o /dev/null -
$ echo | aarch64-linux-gnu-gcc -mno-outline-atomics -x c -c -o /dev/null -
To get closer to GCC's behavior, issue a warning when
'-mno-outline-atomics' is used without an AArch64 triple and do not add
'{-,+}outline-atomic" to the list of target features in these cases.
Link: https://github.com/ClangBuiltLinux/linux/issues/1552
Reviewed By: melver, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D116128
This patch adds a toolchain (TC) for SPIR-V along with the
following changes in Driver and base ToolChain and Tool.
This is required to provide a mechanism in clang to bypass
SPIR-V backend in LLVM for SPIR-V until it lands in LLVM and
matures.
The SPIR-V code is generated by the SPIRV-LLVM translator tool
named 'llvm-spirv' that is sought in 'PATH'.
The compilation phases/actions should be bound for SPIR-V in
the meantime as following:
compile -> tools::Clang
backend -> tools::SPIRV::Translator
assemble -> tools::SPIRV::Translator
However, Driver’s ToolSelector collapses compile-backend-assemble
and compile-backend sequences to tools::Clang. To prevent this,
added new {use,has}IntegratedBackend properties in ToolChain and
Tool to which the ToolSelector reacts on, and which SPIR-V TC
overrides.
Linking of multiple input files is currently not supported but
can be added separately.
Differential Revision: https://reviews.llvm.org/D112410
Co-authored-by: Henry Linjamäki <henry.linjamaki@parmance.com>
Reland integrates build fixes & further review suggestions.
Thanks to @zturner for the initial S_OBJNAME patch!
Differential Revision: https://reviews.llvm.org/D43002
Also revert all subsequent fixes:
- abd1cbf5e5 [Clang] Disable debug-info-objname.cpp test on Unix until I sort out the issue.
- 00ec441253 [Clang] debug-info-objname.cpp test: explictly encode a x86 target when using %clang_cl to avoid falling back to a native CPU triple.
- cd407f6e52 [Clang] Fix build by restricting debug-info-objname.cpp test to x86.
This patch refactors the HIP tool chain for new HIP tool chain, HIPSPV
tool chain, which is added in the follow up patch part 2.
Rename HIPToolChain to HIPAMDToolChain and Renames HIP.* files to HIPAMD.*.
Introduce HIPUtility.* file where common HIP utilities, shared among HIP
tool chain implementations, are placed in.
Move constructHIPFatbinCommand() and
constructGenerateObjFileFromHIPFatBinary() to HIPUtility. HIPSPV tool
chain is going to use them.
Tweak bundle target ID in constructHIPFatbinCommand(): extra dashes are
dropped if the Target ID is empty and 'hip' offload kind is made default
for non-AMD targets.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Eric Christopher
Differential Revision: https://reviews.llvm.org/D110549
When this pass was originally implemented, the fix pass was enabled
using a llvm command-line flag. This works fine, except in the case of
LTO, where the flag is not passed into the linker plugin in order to
enable the function pass in the LTO backend.
Now LTO exists, the expectation now is to use target features rather
than command-line arguments to control code generation, as this ensures
that different command-line arguments in different files are correctly
represented, and target-features always get to the LTO plugin as they
are encoded into LLVM IR.
The fall-out of this change is that the fix pass has to always be added
to the backend pass pipeline, so now it makes no changes if the function
does not have the right target feature to enable it. This should make a
minimal difference to compile time.
One advantage is it's now much easier to enable when compiling for a
Cortex-A53, as CPUs imply their own individual sets of target-features,
in a more fine-grained way. I haven't done this yet, but it is an
option, if the fix should be enabled in more places.
Existing tests of the user interface are unaffected, the changes are to
reflect that the argument is now turned into a target feature.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D114703
The new runtime is currently broken for AMD offloading. This patch makes
the default the old runtime only for the AMD target.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D114965
This patch changes the `-fopenmp-target-new-runtime` option which controls if
the new or old device runtime is used to be true by default. Disabling this to
use the old runtime now requires using `-fno-openmp-target-new-runtime`.
Reviewed By: JonChesterfield, tianshilei1992, gregrodgers, ronlieb
Differential Revision: https://reviews.llvm.org/D114890
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.
These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):
- command-line options are "translated" to module-level LLVM IR
attributes (metadata).
- functions have PAC/BTI specific attributes iff the
__attribute__((target("branch-protection=...))) was used in the function
declaration.
- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:
branch-protection ::= "-mbranch-protection=" <protection>
protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
| <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.
- Handle _attribute_((target(("branch-protection=..."))) for AArch32 with the
same grammer as the commandline options.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: vhscampos
Differential Revision: https://reviews.llvm.org/D112421
Allow toggling of -fnew-infallible so last instance takes precedence
Testing:
ninja check-all
Reviewed By: bruno
Differential Revision: https://reviews.llvm.org/D113523
[NFC] As part of using inclusive language within the llvm project, this patch
renames master flag to main flag in these comments.
Reviewed By: ZarkoCA
Differential Revision: https://reviews.llvm.org/D114090
From GCC's manpage:
-fplugin-arg-name-key=value
Define an argument called key with a value of value for the
plugin called name.
Since we don't have a key-value pair similar to gcc's plugin_argument
struct, simply accept key=value here anyway and pass it along as-is to
plugins.
This translates to the already existing '-plugin-arg-pluginname arg'
that clang cc1 accepts.
There is an ambiguity here because in clang, both the plugin name
as well as the option name can contain dashes, so when e.g. passing
-fplugin-arg-foo-bar-foo
it is not clear whether the plugin is foo-bar and the option is foo,
or the plugin is foo and the option is bar-foo. GCC solves this by
interpreting all dashes as part of the option name. So dashes can't be
part of the plugin name in this case.
Differential Revision: https://reviews.llvm.org/D113250
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112413
ld.lld used by Android ignores .note.GNU-stack and defaults to noexecstack,
so the `-z noexecstack` linker option is unneeded.
The `--noexecstack` assembler option is unneeded because AsmPrinter.cpp
prints `.section .note.GNU-stack,"",@progbits` (when `llvm.init.trampoline` is unused),
so the assembler won't synthesize an executable .note.GNU-stack.
Reviewed By: danalbert
Differential Revision: https://reviews.llvm.org/D113840
With this,
void f() { __asm__("mov eax, ebx"); }
now compiles with clang with -masm=intel.
This matches gcc.
The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.
One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):
__asm__("mov eax, ebx");
__asm__(".att_syntax\nmovl %ebx, %eax");
__asm__("mov eax, ebx");
This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:
bt{l %[Offset],%[Base] | %[Base],%[Offset]}
This works in LLVM after D113932 and D113894, so use that.
(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: When LLVM interpolates in parameters like `%0`, it uses at&t or
intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet happens to late: The interpolated-in parameter is already
in intel style, and then won't parse in the switched `.att_syntax`.)
It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.
Fixes PR21401 and PR20241.
Differential Revision: https://reviews.llvm.org/D113707
Change the error message to use ignorelist, and changed some variable and function
names in related code and test.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D113189
The driver uses class SanitizerArgs to store parsed sanitizer arguments. It keeps a cached
SanitizerArgs object in ToolChain and uses it for different jobs. This does not work if
the sanitizer options are different for different jobs, which could happen when an
offloading toolchain translates the options for different jobs.
To fix this, SanitizerArgs should be created by using the actual arguments passed
to jobs instead of the original arguments passed to the driver, since the toolchain
may change the original arguments. And the sanitizer arguments should be diagnose
once.
This patch also fixes HIP toolchain for handling -fgpu-sanitize: a warning is emitted
for GPU's not supporting sanitizer and skipped. This is for backward compatibility
with existing -fsanitize options. -fgpu-sanitize is also turned on by default.
Reviewed by: Artem Belevich, Evgenii Stepanov
Differential Revision: https://reviews.llvm.org/D111443
Implement support for loading the stack canary from a memory location held in
the TLS register, with an optional offset applied. This is used by the Linux
kernel to implement per-task stack canaries, which is impossible on SMP systems
when using a global variable for the stack canary.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112768
Add new triple and target info for ‘spirv32’ and ‘spirv64’ and,
thus, enabling clang (LLVM IR) code emission to SPIR-V target.
The target for SPIR-V is mostly reused from SPIR by derivation
from a common base class since IR output for SPIR-V is mostly
the same as SPIR. Some refactoring are made accordingly.
Added and updated tests for parts that are different between
SPIR and SPIR-V.
Patch by linjamaki (Henry Linjamäki)!
Differential Revision: https://reviews.llvm.org/D109144
Trying to update some options that don't at least have an inclusive language version.
This patch adds `objcmt-allowlist-dir-path` as a default alternative.
Reviewed By: akyrtzi
Differential Revision: https://reviews.llvm.org/D112591
This reverts commit 2d7fba5f95.
The patch was reverted because it caused regression with rocThrust
due to ambiguity of template specialization.
For details please see https://reviews.llvm.org/D109496
A resolution to the ambiguity issues created by P0522, which is a DR solving
CWG 150, did not come as expected, so we are just going to accept the change,
and watch how users digest it.
For now we deprecate the flag with a warning, and make it on by default.
We don't remove the flag completely in order to give users a chance to
work around any problems by disabling it.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D109496
This patch splits the existing SveVectorBits LangOpt into VScaleMin and
VScaleMax LangOpts such that we can represent such an option. The cc1
option has also been split into -mvscale-{min,max}=<n> options so that the
cc1 arguments better reflect the vscale_range IR attribute.
Differential Revision: https://reviews.llvm.org/D111790
Representation of the file's last modification time depends on the file
system and isn't guaranteed to be in seconds. Cast to seconds explicitly
and tighten the test case to check the magnitude of the calculated
value, so we can catch passing milliseconds or nanoseconds.
rdar://83915615
Differential Revision: https://reviews.llvm.org/D111205
This patch ensures that we always tune for a given CPU on AArch64
targets when the user specifies the "-mtune=xyz" flag. In the
AArch64Subtarget if the tune flag is unset we use the CPU value
instead.
I've updated the release notes here:
llvm/docs/ReleaseNotes.rst
and added tests here:
clang/test/Driver/aarch64-mtune.c
Differential Revision: https://reviews.llvm.org/D110258
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.
Reviewed By: yaxunl, #amdgpu
Differential Revision: https://reviews.llvm.org/D109707
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.
Reviewed By: yaxunl, #amdgpu
Differential Revision: https://reviews.llvm.org/D109707
This reland commit 1131b1eb35, which
adds support to __attribute__((availability)) annotation for Fuchsia
platform. This patch also adds '-ffuchsia-api-level' to allow specify
Fuchsia API level from the command line.
Differential Revision: https://reviews.llvm.org/D108592
This patch adds support to __attribute__((availability)) annotation for
Fuchsia platform. This patch also adds '-ffuchsia-api-level' to allow
specify Fuchsia API level from the command line.
Differential Revision: https://reviews.llvm.org/D108592
An archive containing device code object files can be passed to
clang command line for linking. For each given offload target
it creates a device specific archives which is either passed to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. -L/-l flags are used to specify these fat archives on the command
line. E.g.
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
It currently doesn't support linking an archive directly, like:
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a
Linking with x86 offload also does not work.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D105191
This patch adds two flags to be supported for the new runtime. The flags
are `-fopenmp-assume-threads-oversubscription` and
-fopenmp-assume-teams-oversubscription`. These add global values that
can be checked by the work sharing runtime functions to make better
judgements about how to distribute work between the threads.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111348
An archive containing device code object files can be passed to
clang command line for linking. For each given offload target
it creates a device specific archives which is either passed to llvm-link
if the target is amdgpu, or to clang-nvlink-wrapper if the target is
nvptx. -L/-l flags are used to specify these fat archives on the command
line. E.g.
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp -L. -lmylib
It currently doesn't support linking an archive directly, like:
clang++ -fopenmp -fopenmp-targets=nvptx64 main.cpp libmylib.a
Linking with x86 offload also does not work.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D105191
clang-cl maps /wdNNNN to -Wno-flags for a few warnings that map
cleanly from cl.exe concepts to clang concepts.
This patch adds support for the same numbers to
`#pragma warning(disable : NNNN)`. It also lets
`#pragma warning(push)` and `#pragma warning(pop)` have an effect,
since these are used together with `warning(disable)`.
The optional numeric argument to `warning(push)` is ignored,
as are the other non-`disable` `pragma warning()` arguments.
(Supporting `error` would be easy, but we also don't support
`/we`, and those should probably be added together.)
The motivating example is that a bunch of code (including in LLVM)
uses this idiom to locally disable warnings about calls to deprecated
functions in Windows-only code, and 4996 maps nicely to
-Wno-deprecated-declarations:
#pragma warning(push)
#pragma warning(disable: 4996)
f();
#pragma warning(pop)
Implementation-wise:
- Move `/wd` flag handling from Options.td to actual Driver-level code
- Extract the function mapping cl.exe IDs to warning groups to the
new file clang/lib/Basic/CLWarnings.cpp
- Create a diag::Group enum so that CLWarnings.cpp can refer to
existing groups by ID (and give DllexportExplicitInstantiationDecl
a named group), and add a function to map a diag::Group to the
spelling of it's associated commandline flag
- Call that new function from PragmaWarningHandler
Differential Revision: https://reviews.llvm.org/D110668
On AIX, we relied on LTO to merge the csects for profiling data/counter
sections.
AIX binder now get the namedcsect support to support the merging,
so now we can enable PGO without LTO with the new binder.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D110671
This matches GCC.
Change the CC1 option to encode the unwind table level (1: needed by exceptions,
2: asynchronous) so that we can support two modes in the future.
This is to build the foundation of a new debug info feature to use only
the base name of template as its debug info name (eg: "t1" instead of
the full "t1<int>"). The intent being that a consumer can still retrieve
all that information from the DW_TAG_template_*_parameters.
So gno-simple-template-names is business as usual/previously ("t1<int>")
=simple is the simplified name ("t1")
=mangled is a special mode to communicate the full information, but
also indicate that the name should be able to be simplified. The data
is encoded as "_STNt1|<int>" which will be matched with an
llvm-dwarfdump --verify feature to deconstruct this name, rebuild the
original name, and then try to rebuild the simple name via the DWARF
tags - then compare the latter and the former to ensure that all the
data necessary to fully rebuild the name is present.
Summary:
Introduce a new frontend flag `-fswift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:
* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.
Differential Revision: https://reviews.llvm.org/D109451
This patch introduces the flags `-fopenmp-target-debug` and
`-fopenmp-target-debug=` to set the value of a global in the device.
This will be used to enable or disable debugging features statically in
the device runtime library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109544
- Make flto an alias of flto=full.
- Make foffload-lto an alias of foffload-lto=full.
- Make flto_EQ_jobserver, flto_EQ_auto aliases of flto=full,
since they are being treated as full lto right now.
- Clean up the code for parseLTOMode and setLTOMode.
- Replace uses of OPT_flto with OPT_flto_EQ since they alias now.
Differential Revision: https://reviews.llvm.org/D108881
Change-Id: I5d867db83a680434fba5c8d85c9a83135d3b81ee
- Make flto an alias of flto=full.
- Make foffload-lto an alias of foffload-lto=full.
- Make flto_EQ_jobserver, flto_EQ_auto aliases of flto=full,
since they are being treated as full lto right now.
- Clean up the code for parseLTOMode and setLTOMode.
- Replace uses of OPT_flto with OPT_flto_EQ since they alias now.
Change-Id: Iea5338c20cb800b43529b20745e92600e2cfd2b1
Earlier BundleEntryID used to be <OffloadKind>-<Triple>-<GPUArch>.
This used to work because the clang-offload-bundler didn't need
GPUArch explicitly for any bundling/unbundling action. With
unbundleArchive it needs GPUArch to ensure compatibility between
device specific code objects. D93525 enforced triples to have
separators for all 4 components irrespective of number of
components, like "amdgcn-amd-amdhsa--". It was required to
to correctly parse a possible 4th environment component or a GPU.
But, this condition is breaking backward compatibility with
archive libraries compiled with compilers older than D93525.
This patch allows triples to have any number of components with
and without extra separator for empty environment field. Thus,
both the following bundle entry IDs are same:
openmp-amdgcn-amd-amdhsa--gfx906
openmp-amdgcn-amd-amdhsa-gfx906
Reviewed By: yaxunl, grokos
Differential Revision: https://reviews.llvm.org/D106809
Now prints the list of known archs. This requires plumbing a Driver
arg through a few functions.
Also add two more convenience insert() overlods to StringMap.
Differential Revision: https://reviews.llvm.org/D109105
The intent of this patch is to add support of -fp-model=[source|double|extended] to allow
the compiler to use a wider type for intermediate floating point calculations. As a side
effect to that, the value of FLT_EVAL_METHOD is changed according to the pragma
float_control.
Unfortunately some issue was uncovered with this change in preprocessing. See details in
https://reviews.llvm.org/D93769 . We are therefore reverting this patch until we find a way
to reconcile the value of FLT_EVAL_METHOD, the pragma and the -E flow.
This reverts commit 66ddac22e2.
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier. The next patch in this series, D106510, implements OpenMP
runtime support.
Consider the following example:
```
#pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
{
foo(); // might have map(delete: x)
#pragma omp target map(present, alloc: x) // x is guaranteed to be present
printf("%d\n", x);
}
```
The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`. Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail. (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)
Justification for inclusion in Clang and LLVM's OpenMP runtime:
* The `ompx_hold` modifier supports OpenACC functionality (structured
reference count) that cannot be achieved in standard OpenMP, as of
5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
runtime implementation are required for the Clang OpenACC support
being developed as part of the ECP Clacc project, which translates
OpenACC to OpenMP at the directive AST level. These patches are the
first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
in the runtime implementation. That syntactic support makes the
tests more readable than low-level runtime calls can. Moreover,
upstream Flang and Clang do not yet support OpenACC syntax
sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
of concerns between OpenACC and OpenMP development in LLVM. That
is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
extended OpenMP implementation and test suite without directly
considering OpenACC's language and execution model, which can be
handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
above example.
See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC. For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.
Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.
Reviewed By: ABataev, jdoerfert, protze.joachim, grokos
Differential Revision: https://reviews.llvm.org/D106509
-fstack-clash-protection was added in Clang commit e67cbac812 but was
enabled only on Linux. Allow it on FreeBSD as well, as it works fine.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D108571
Previoulsy debug-info-for-profiling and pseudo-probe-for-profiling are mutual exclusive because they compete the dwarf discrimnator for callsites on the IR. This changes allows to use the two switches together. The side effect is that callsite discriminators will be taken by pseudo probe, while discriminators for other instructions are still available for AutoFDO use. This is less than ideal, however, it still allows us a chance to smoothly transition from AutoFDO to CSSPGO, by collecting both profiles from a CSSPGO binary.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D107876
Temporary files created by the offloading device toolchain are not removed
after compilation when using a two-step compilation. The offload-bundler uses a
different filename for the device binary than the `.o` file present in the
Job's input list. This is not listed as a temporary file so it is never
removed. This patch explicitly adds the device binary as a temporary file to
consume it. This fixes PR50336.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D107668
GCC supports multiple forms of -falign-loops=.
-falign-loops= is currently ignored in Clang.
This patch implements the simplest but the most useful form where N is a
power of 2.
The underlying implementation uses a `llvm::TargetOptions` option for now.
Bitcode generation ignores this option.
Differential Revision: https://reviews.llvm.org/D106701
The declaration for the global new function in C++ is generated in the compiler front-end. When examining exception propagation, we found that this is the largest root throw site propagator requiring unwind code to be generated for callers up the stack. Allowing this to be handled immediately with termination stops upward propagation and leads to significantly less landing pads generated. This in turns leads to a performance and .text size win.
With `-fnew-infallible` this annotates the declaration with `throw()` and `__attribute__((returns_nonnull))`. `throw()` allows the compiler to assume exceptions do not propagate out of new and eliminate it as a root throw site. Note that the definition of global new is user-replaceable so users should ensure that the one used follows these semantics.
Measuring internally, we're seeing at 0.5% CPU win in one of our large internal FB workload. Measuring on clang self-build (cd0a1226b5) we get:
thinlto/
"dwarfehprepare.NumCleanupLandingPadsRemaining": 153494,
"dwarfehprepare.NumNoUnwind": 26309,
thinlto_newinfallible/
"dwarfehprepare.NumCleanupLandingPadsRemaining": 143660,
"dwarfehprepare.NumNoUnwind": 28744,
a 1-143660/153494 = 6.4% reduction in landing pads and a 28744/26309 = 9.3% increase in the number of nounwind functions.
Testing:
ninja check-all
new test case to make sure these attributes are added correctly to global new.
Reviewed By: urnathan
Differential Revision: https://reviews.llvm.org/D105225
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
Definition of `__cpp_threadsafe_static_init` macro is controlled by
language option Opts.ThreadsafeStatics. This patch sets language
option to false by default in OpenCL mode, resulting in macro
`__cpp_threadsafe_static_init` being undefined. Default value can be
overridden using command line option -fthreadsafe-statics.
Change is supposed to address portability because not all OpenCL
vendors support thread safe implementation of static initialization.
Fixes llvm.org/PR48012
Differential Revision: https://reviews.llvm.org/D107163
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
The Intel compiler ICC supports the option "-fp-model=(source|double|extended)"
which causes the compiler to use a wider type for intermediate floating point
calculations. Also supported is a way to embed this effect in the source
program with #pragma float_control(source|double|extended).
This patch extends pragma float_control syntax, and also adds support
for a new floating point option "-ffp-eval-method=(source|double|extended)".
source: intermediate results use source precision
double: intermediate results use double precision
extended: intermediate results use extended precision
Reviewed By: Aaron Ballman
Differential Revision: https://reviews.llvm.org/D93769
Change the ffp-model=precise to enables -ffp-contract=on (previously
-ffp-model=precise enabled -ffp-contract=fast). This is a follow-up
to Andy Kaylor's comments in the llvm-dev discussion "Floating Point
semantic modes". From the same email thread, I put Andy's distillation
of floating point options and floating point modes into UsersManual.rst
Also fixes bugs.llvm.org/show_bug.cgi?id=50222
I had to revert this a few times because of failures on the x86-64
buildbot but I think we finally have that fixed by LNT/79f2b03c51.
Reviewed By: rjmccall, andrew.kaylor
Differential Revision: https://reviews.llvm.org/D74436
Moving `InputInfo.h` from `lib/Driver/` into `include/Driver` to be able to expose it in an API consumed from outside of `clangDriver`.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D106787
Constructor homing reduces the amount of class type info that is emitted
by emitting conmplete type info for a class only when a constructor for
that class is emitted.
This will mainly reduce the amount of duplicate debug info in object
files. In Chrome enabling ctor homing decreased total build directory sizes
by about 30%.
It's also expected that some class types (such as unused classes)
will no longer be emitted in the debug info. This is fine, since we wouldn't
expect to need these types when debugging.
In some cases (e.g. libc++, https://reviews.llvm.org/D98750), classes
are used without calling the constructor. Since this is technically
undefined behavior, enabling constructor homing should be fine.
However Clang now has an attribute
`__attribute__((standalone_debug))` that can be used on classes to
ignore ctor homing.
Bug: https://bugs.llvm.org/show_bug.cgi?id=46537
Differential Revision: https://reviews.llvm.org/D106084
This patch adds the -fminimize-whitespace with the following effects:
* If combined with -E, remove as much non-line-breaking whitespace as
possible.
* If combined with -E -P, removes as much whitespace as possible,
including line-breaks.
The motivation is to reduce the amount of insignificant changes in the
preprocessed output with source files where only whitespace has been
changed (add/remove comments, clang-format, etc.) which is in particular
useful with ccache.
A patch for ccache for using this flag has been proposed to ccache as well:
https://github.com/ccache/ccache/pull/815, which will use
-fnormalize-whitespace when clang-13 has been detected, and additionally
uses -P in "unify_mode". ccache already had a unify_mode in an older
version which was removed because of problems that using the
preprocessor itself does not have (such that the custom tokenizer did
not recognize C++11 raw strings).
This patch slightly reorganizes which part is responsible for adding
newlines that are required for semantics. It is now either
startNewLineIfNeeded() or MoveToLine() but never both; this avoids the
ShouldUpdateCurrentLine workaround and avoids redundant lines being
inserted in some cases. It also fixes a mandatory newline not inserted
after a _Pragma("...") that is expanded into a #pragma.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D104601
This patch makes the changes in the driver that converts the medium code
model to large.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D106371
Emit the unsupported option error until the Clang's library integration support for 128-bit long double is available for AIX.
Reviewed By: Whitney, cebowleratibm
Differential Revision: https://reviews.llvm.org/D106074
Change the ffp-model=precise to enables -ffp-contract=on (previously
-ffp-model=precise enabled -ffp-contract=fast). This is a follow-up
to Andy Kaylor's comments in the llvm-dev discussion "Floating Point
semantic modes". From the same email thread, I put Andy's distillation
of floating point options and floating point modes into UsersManual.rst
Also fixes bugs.llvm.org/show_bug.cgi?id=50222
Reviewed By: rjmccall, andrew.kaylor
Differential Revision: https://reviews.llvm.org/D74436
With this patch, OpenMP on AMDGCN will use the math functions
provided by ROCm ocml library. Linking device code to the ocml will be
done in the next patch.
Reviewed By: JonChesterfield, jdoerfert, scchan
Differential Revision: https://reviews.llvm.org/D104904
The Intel compiler ICC supports the option "-fp-model=(source|double|extended)"
which causes the compiler to use a wider type for intermediate floating point
calculations. Also supported is a way to embed this effect in the source
program with #pragma float_control(source|double|extended).
This patch extends pragma float_control syntax, and also adds support
for a new floating point option "-ffp-eval-method=(source|double|extended)".
source: intermediate results use source precision
double: intermediate results use double precision
extended: intermediate results use extended precision
Reviewed By: Aaron Ballman
Differential Revision: https://reviews.llvm.org/D93769
This diff changes llvm-ifs to use unified IFS file format
and perform other renaming changes in preparation for the
merging between elfabi/ifs.
Differential Revision: https://reviews.llvm.org/D99810
Turning on -funique-internal-linkage-names when -fpseudo-probe-for-profiling is on, unless -fno-unique-internal-linkage-names is specified.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D106193
While GNU as only allows the directory form of the .file directive for DWARF v5,
the integrated assembler prefers the directory form on all DWARF versions
(-fdwarf-directory-asm).
We currently set CC1 -fno-dwarf-directory-asm for -fno-integrated-as -gdwarf-5
which may cause the directory entry 0 and the filename entry 0 to be incorrect
(see D105662 and the example below). This patch makes -fno-integrated-as -gdwarf-5 use
-fdwarf-directory-asm as well.
```
cd /tmp/c
before
% clang -g -gdwarf-5 -fno-integrated-as e/a.c -S -o - | grep '\.file.*0'
.file 0 "/tmp/c/e/a.c" md5 0x97e31cee64b4e58a4af8787512d735b6
% clang -g -gdwarf-5 -fno-integrated-as e/a.c -c
% llvm-dwarfdump a.o | grep include_directories
include_directories[ 0] = "/tmp/c/e"
after
% clang -g -gdwarf-5 -fno-integrated-as e/a.c -S -o - | grep '\.file.*0'
.file 0 "/tmp/c" "e/a.c" md5 0x97e31cee64b4e58a4af8787512d735b6
% clang -g -gdwarf-5 -fno-integrated-as e/a.c -c
% llvm-dwarfdump a.o | grep include_directories
include_directories[ 0] = "/tmp/c"
```
Reviewed By: #debug-info, dblaikie, osandov
Differential Revision: https://reviews.llvm.org/D105835
D105314 added the abibility choose to use AsmParser for parsing inline
asm. -no-intergrated-as will override this default if specified
explicitly.
If toolchain choose to use MCAsmParser for inline asm, don't pass
the option to disable integrated-as explictly unless set by user.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D105512
We should not error out on non-x86 targets if `-fbasic-block-sections=none` is in effect.
Also, filter it out for GPU-side compilations, as we do with other options not
supported on the GPU.
Differential Revision: https://reviews.llvm.org/D105226
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.
Reviewed By: aaron.ballman, kpn
Differential Revision: https://reviews.llvm.org/D100118
This patch adds unbundling support of an archive file. It takes an
archive file along with a set of offload targets as input.
Output is a device specific archive for each given offload target.
Input archive contains bundled code objects bundled using
clang-offload-bundler. Each generated device specific archive contains
a set of device code object files which are named as
<Parent Bundle Name>-<CodeObject-GPUArch>.
Entries in input archive can be of any binary type which is
supported by clang-offload-bundler, like *.bc. Output archives will
contain files in same type.
Example Usuage:
clang-offload-bundler --unbundle --inputs=lib-generic.a -type=a
-targets=openmp-amdgcn-amdhsa--gfx906,openmp-amdgcn-amdhsa--gfx908
-outputs=devicelib-gfx906.a,deviceLib-gfx908.a
Reviewed By: jdoerfert, yaxunl
Differential Revision: https://reviews.llvm.org/D93525
Added the option `-altivec-src-compat=[mixed,gcc,xl]`. The default at this time is `mixed`.
The default behavior for clang is for all vector compares to return a scalar unless the vectors being
compared are vector bool or vector pixel. In that case the compare returns a
vector. With the gcc case all vector compares return vectors and in the xl case
all vector compares return scalars.
This patch does not change the default behavior of clang.
This option will be used in future patches to implement behaviour compatibility for the vector bool/pixel types.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D103615
This reverts commit c3fe847f9d.
Tests fail in non-asserts builds because they assume named IR, by the
looks of it (testing for the "entry" label, for instance). I don't know
enough about the update_cc_test_checks.py stuff to know how to manually
fix these tests, so reverting for now.
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.
Reviewed By: aaron.ballman, kpn
Differential Revision: https://reviews.llvm.org/D100118
Added the option `-altivec-src-compat=[mixed,gcc,xl]`. The default at this time is `mixed`.
The default behavior for clang is for all vector compares to return a scalar unless the vectors being
compared are vector bool or vector pixel. In that case the compare returns a
vector. With the gcc case all vector compares return vectors and in the xl case
all vector compares return scalars.
This patch does not change the default behavior of clang.
This option will be used in future patches to implement behaviour compatibility for the vector bool/pixel types.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D103615
When clang driver is used with -save-temps to compile OpenCL program,
clang driver first launches clang -cc1 -E to generate preprocessor expansion output,
then launches clang -cc1 with the generated preprocessor expansion output as input
to generate LLVM IR.
Currently clang by default passes "-finclude-default-header" "-fdeclare-opencl-builtins"
in both steps, which causes default header included again in the second step, which
causes error.
This patch let clang not to include default header when input type is preprocessor expansion
output, which fixes the issue.
Reviewed by: Anastasia Stulova
Differential Revision: https://reviews.llvm.org/D104800
This is mostly a mechanical change, but a testcase that contains
parts of the StringRef class (clang/test/Analysis/llvm-conventions.cpp)
isn't touched.
Only LLVM-based instrumentation profile is supported on AIX.
And it currently must be used with full LTO.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D104803
The default Altivec ABI was implemented but the clang error for specifying
its use still remains. Users could get around this but not specifying the
type of Altivec ABI but we need to remove the error.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D102094
Summary:
The changes introduced in D97680 turns this command line option into a no-op so
it can be removed entirely.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D102940
This reverts commit a1449a10db.
Seems like my changes to LNT had no effect -- puzzled.
The 21 tests pass on my sandbox with the clang patch but are
failing in exec time in the bot
This patch changes the ffp-model=precise to enables -ffp-contract=on
(previously -ffp-model=precise enabled -ffp-contract=fast). This is a
follow-up to Andy Kaylor's comments in the llvm-dev discussion
"Floating Point semantic modes". From the same email thread, I put
Andy's distillation of floating point options and floating point modes
into UsersManual.rst
Differential Revision: https://reviews.llvm.org/D74436
Support -Wno-frame-larger-than (with no =) and make it properly
interoperate with -Wframe-larger-than. Reject -Wframe-larger-than with
no argument.
We continue to support Clang's old spelling, -Wframe-larger-than=, for
compatibility with existing users of that facility.
In passing, stop the driver from accepting and ignoring
-fwarn-stack-size and make it a cc1-only flag as intended.
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.
-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.
Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.
Reviewed By: rsmith, qcolombet
Differential Revision: https://reviews.llvm.org/D103928
This patch adds the command line option -foperator-names which acts as the opposite of -fno-operator-names. With this command line option it is possible to reenable C++ operator keywords on the command line if -fno-operator-names had previously been passed.
Differential Revision: https://reviews.llvm.org/D103749
This patch changes the ffp-model=precise to enables -ffp-contract=on
(previously -ffp-model=precise enabled -ffp-contract=fast). This is a
follow-up to Andy Kaylor's comments in the llvm-dev discussion
"Floating Point semantic modes". From the same email thread, I put
Andy's distillation of floating point options and floating point modes
into UsersManual.rst
Differential Revision: https://reviews.llvm.org/D74436
This fixed PR#48894 for AArch64. The issue has been fixed for Arm in
https://reviews.llvm.org/D95872
The following rules apply to -Wa,-march with this change:
- Only compiler options apply to non assembly files
- Compiler and assembler options apply to assembly files
- For assembly files, we prefer the assembler option(s) if we have both kinds of option
- Of the options that apply (or are preferred), the last value wins (it's not additive)
Reviewed By: DavidSpickett, nickdesaulniers
Differential Revision: https://reviews.llvm.org/D103184
This patch fixes a Windows -EHa crash induced by previous commit 797ad70152.
The crash was caused by "LifetimeMarker" scope (with option -O2) that should not be considered as SEH Scope.
This change also turns off -fasync-exceptions by default under -EHa option for now.
Differential Revision: https://reviews.llvm.org/D103664#2799944
A recent change (D99683) to support ThinLTO for HIP caused a regression
when compiling cuda code with -flto=thin -fwhole-program-vtables.
Specifically, we now get an error:
error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
This error is coming from the device offload cc1 action being set up for
the cuda compile, for which -flto=thin doesn't apply and gets dropped.
This is a regression, but points to a potential issue that was silently
occurring before the patch, details below.
Before D99683, the check for fwhole-program-vtables in the driver looked
like:
if (WholeProgramVTables) {
if (!D.isUsingLTO())
D.Diag(diag::err_drv_argument_only_allowed_with)
<< "-fwhole-program-vtables"
<< "-flto";
CmdArgs.push_back("-fwhole-program-vtables");
}
And D.isUsingLTO() returned true since we have -flto=thin. However,
because the cuda cc1 compile is doing device offloading, which didn't
support any LTO, there was other code that suppressed -flto* options
from being passed to the cc1 invocation. So the cc1 invocation silently
had -fwhole-program-vtables without any -flto*. This seems potentially
problematic, since if we had any virtual calls we would get type test
assume sequences without the corresponding LTO pass that handles them.
However, with the patch, which adds support for device offloading LTO
option -foffload-lto=thin, the code has changed so that we set a bool
IsUsingLTO based on either -flto* or -foffload-lto*, depending on
whether this is the device offloading action. For the device offload
action in our compile, since we don't have -foffload-lto, IsUsingLTO is
false, and the check for LTO with -fwhole-program-vtables now fails.
What we should do is only pass through -fwhole-program-vtables to the
cc1 invocation that has LTO enabled (either the device offload action
with -foffload-lto, or the non-device offload action with -flto), and
otherwise drop the -fwhole-program-vtables for the non-LTO action.
Then we should error only if we have -fwhole-program-vtables without any
-f*lto* options.
Differential Revision: https://reviews.llvm.org/D103579
All fuchsia targets will now use the relative-vtables ABI by default.
Also remove -fexperimental-relative-c++-abi-vtables from test RUNs targeting fuchsia.
Differential Revision: https://reviews.llvm.org/D102374
VS 2019 16.11 (just released in Preview) is adding support for the
/std:c++20 option and bumping /std:c++latest to "post-c++20". This
updates clang-cl to match.
Differential revision: https://reviews.llvm.org/D103155
Since 4468e5b899 clang will prefer
the last one it finds of "-mimplicit-it" or "-Wa,-mimplicit-it".
Due to a mistake in that patch the compiler argument "-mimplicit-it"
was never marked as used, even if it was the last one and was passed
to llvm.
Move the Claim call back to the start of the loop and update
the testing to check we don't get any unused argument warnings.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D103086
Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for AMDGPU target.
AMDGPU target does not support codegen of object files containing
call of external functions, therefore the LLVM module passed to
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow function importer to import
functions with noinline attribute.
HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.
Reviewed by: Teresa Johnson, Artem Belevich
Differential Revision: https://reviews.llvm.org/D99683
If multiple instances of the -arm-implicit-it option is passed to
the backend, it errors out.
Also fix cases where there are multiple -Wa,-mimplicit-it; the existing
tests indicate that the last one specified takes effect, while in
practice it passed double options, which didn't work as intended.
Differential Revision: https://reviews.llvm.org/D102812
Instead of ignoring flto=auto and -flto=jobserver, treat them as -flto
and pass -flto=full along.
Differential Revision: https://reviews.llvm.org/D102479
This reverts commit 2919222d80.
That commit broke backwards compatibility. Additionally, the
replacement, -Wa,-mimplicit-it, isn't yet supported by any stable
release of Clang.
See D102812 for a fix for the error cases when callers specify both
-mimplicit-it and -Wa,-mimplicit-it.
This is a GNU as and Clang cc1as option, not a GCC option.
Users should specify `-Wa,-mimplicit-it=` instead.
Note: mixing the -m option and the -Wa, option doesn't work
`-Wa,-mimplicit-it=never -mimplicit-it=always` =>
`clang (LLVM option parsing): for the --arm-implicit-it option: may only occur zero or one times!`
Reviewed By: nickdesaulniers, raj.khem
Differential Revision: https://reviews.llvm.org/D102568
Currently, we have support for SYCL 1.2.1 (also known as SYCL 2017).
This patch introduces the start of support for SYCL 2020 mode, which is
the latest SYCL standard available at (https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html).
This sets the default SYCL to be 2020 in the driver, and introduces the
notion of a "default" version (set to 2020) when cc1 is in SYCL mode
but there was no explicit -sycl-std= specified on the command line.
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.
The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.htmlhttps://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:
1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0
to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).
Address pr/47341 for aarch64.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen
Differential Revision: https://reviews.llvm.org/D100919
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.
<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>
"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. While "dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100509
Previously clang would print a binary blob into the bundled file
for amdgcn. With this patch, it will instead print textual IR as
expected.
Reviewed By: JonChesterfield, ronlieb
Differential Revision: https://reviews.llvm.org/D102065
Change-Id: I10c0127ab7357787769fdf9a2edd4b3071e790a1
-fno-semantic-interposition (only effective with -fpic) can optimize default
visibility external linkage (non-ifunc-non-COMDAT) variable access and function
calls to avoid GOT/PLT, by using local aliases, e.g.
```
int var;
__attribute__((optnone)) int fun(int x) { return x * x; }
int test() { return fun(var); }
```
-fpic (var and fun are dso_preemptable)
```
test:
.LBB1_1:
auipc a0, %got_pcrel_hi(var)
ld a0, %pcrel_lo(.LBB1_1)(a0)
lw a0, 0(a0)
// fun is preemptible by default in ld -shared mode. ld will create a PLT.
tail fun@plt
```
vs -fpic -fno-semantic-interposition (var and fun are dso_local)
```
test:
.Ltest$local:
.LBB1_1:
auipc a0, %pcrel_hi(.Lvar$local)
addi a0, a0, %pcrel_lo(.LBB1_1)
lw a0, 0(a0)
// The assembler either resolves .Lfun$local at assembly time (-mno-relax
// -fno-function-sections), or produces a relocation referencing a non-preemptible
// local symbol (which can avoid PLT).
tail .Lfun$local
```
Note: Clang's default -fpic is more aggressive than GCC -fpic: interprocedural
optimizations (including inlining) are available but local aliases are not used.
-fpic -fsemantic-interposition can disable interprocedural optimizations.
Depends on D101875
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D101876
-fno-semantic-interposition (only effective with -fpic) can optimize default
visibility external linkage (non-ifunc-non-COMDAT) variable access and function
calls to avoid GOT/PLT, by using local aliases, e.g.
```
int var;
__attribute__((optnone)) int fun(int x) { return x * x; }
int test() { return fun(var); }
```
-fpic (var and fun are dso_preemptable)
```
test: // @test
adrp x8, :got:var
ldr x8, [x8, :got_lo12:var]
ldr w0, [x8]
// fun is preemptible by default in ld -shared mode. ld will create a PLT.
b fun
```
vs -fpic -fno-semantic-interposition (var and fun are dso_local)
```
test: // @test
.Ltest$local:
adrp x8, .Lvar$local
ldr w0, [x8, :lo12:.Lvar$local]
// The assembler either resolves .Lfun$local at assembly time, or produces a
// relocation referencing a non-preemptible section symbol (which can avoid PLT).
b .Lfun$local
```
Note: Clang's default -fpic is more aggressive than GCC -fpic: interprocedural
optimizations (including inlining) are available but local aliases are not used.
-fpic -fsemantic-interposition can disable interprocedural optimizations.
Depends on D101872
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D101873
Previously clang would print a binary blob into the bundled file
for amdgcn. With this patch, it will instead print textual IR as
expected.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D102065
This implements the flag proposed in RFC
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.
The goal is to add a way to override the default target C++ ABI through a
compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.
In this patch:
- Store -fc++-abi= in a LangOpt. This isn't stored in a CodeGenOpt because
there are instances outside of codegen where Clang needs to know what the
ABI is (particularly through ASTContext::createCXXABI), and we should be
able to override the target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
through this flag.
- Create a .def file for these ABIs to make it easier to check flag values.
- Add an error for diagnosing bad ABI flag values.
Differential Revision: https://reviews.llvm.org/D85802
GCC supports negative values for -mstack-protector-guard-offset=, this
should be a signed value. Pre-req to D100919.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101325
[clang][amdgpu] Use implicit code object version
At present, clang always passes amdhsa-code-object-version on to -cc1. That is
great for certainty over what object version is being used when debugging.
Unfortunately, the command line argument is in AMDGPUBaseInfo.cpp in the amdgpu
target. If clang is used with an llvm compiled with DLLVM_TARGETS_TO_BUILD
that excludes amdgpu, this will be diagnosed (as discovered via D98658):
- Unknown command line argument '--amdhsa-code-object-version=4'
This means that clang, built only for X86, can be used to compile the nvptx
devicertl for openmp but not the amdgpu one. That would shortly spawn fragile
logic in the devicertl cmake to try to guess whether the clang used will work.
This change omits the amdhsa-code-object-version parameter when it matches the
default that AMDGPUBaseInfo.cpp specifies, with a comment to indicate why. As
this is the only part of clang's codegen for amdgpu that depends on the target
in the back end it suffices to build the openmp runtime on most (all?) systems.
It is a non-functional change, though observable in the updated tests and when
compiling with -###. It may cause minor disruption to the amd-stg-open branch.
Revision of D98746, builds on refactor in D101077
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D101095
[clang][nfc] Split getOrCheckAMDGPUCodeObjectVersion
Separates detection of deprecated or invalid code object version from
returning the version. Written to avoid any behaviour change.
Precursor to a revision of D98746.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D101077
This is a user-facing option, so it doesn't make sense for it to be cc1
only.
Follow-up to D100420
Differential revision: https://reviews.llvm.org/D100759
This allows frontend and backend diagnostic files to all go into the
same place. Have it control the Windows (mini-)dump location.
Differential Revision: https://reviews.llvm.org/D99199
Programmers would like to be able to test direct methods by calling them from a
different linkage unit or mocking them, both of which are impossible. This
patch adds a flag that effectively disables the attribute, which will fix this
when enabled in testable builds. rdar://71190891
Differential revision: https://reviews.llvm.org/D95845
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.
Solution:
This patch adds two new flags
- OF_CRLF which indicates that CRLF translation is used.
- OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.
Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.
So this is the behaviour per platform with my patch:
z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode
Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return
The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
if (Flags & OF_CRLF)
CrtOpenFlags |= _O_TEXT;
```
These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99426
In order to bring up scalable vector support in LLVM incrementally,
we introduced behaviour to emit a warning, instead of an error, when
asking the wrong question of a scalable vector, like asking for the
fixed number of elements.
This patch puts that behaviour under a flag. The default behaviour is
that the compiler will always error, which means that all LLVM unit
tests and regression tests will now fail when a code-path is taken that
still uses the wrong interface.
The behaviour to demote an error to a warning can be individually enabled
for tools that want to support experimental use of scalable vectors.
This patch enables that behaviour when driving compilation from Clang.
This means that for users who want to try out scalable-vector support,
fixed-width codegen support, or build user-code with scalable vector
intrinsics, Clang will not crash and burn when the compiler encounters
such a case.
This allows us to do away with the following pattern in many of the SVE tests:
RUN: .... 2>%t
RUN: cat %t | FileCheck --check-prefix=WARN
WARN-NOT: warning: ...
The behaviour to emit warnings is only temporary and we expect this flag
to be removed in the future when scalable vector support is more stable.
This patch also has fixes the following tests:
unittests:
ScalableVectorMVTsTest.SizeQueries
SelectionDAGAddressAnalysisTest.unknownSizeFrameObjects
AArch64SelectionDAGTest.computeKnownBitsSVE_ZERO_EXTEND_VECTOR_INREG
regression tests:
Transforms/InstCombine/vscale_gep.ll
Reviewed By: paulwalker-arm, ctetreau
Differential Revision: https://reviews.llvm.org/D98856
For DBX, it does not handle column info well. Set -gno-column-info
by default for DBX.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D99703
Based on this debugger type, for now, we plan to:
1: use inline string by default for XCOFF DWARF
2: generate no column info for debug line table.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99400
The contents of the string returned by getenv() is not guaranteed across calls to getenv(). The code to handle the CC_PRINT etc env vars calls getenv() and saves the results in just a char *. The string returned by getenv() needs to be copied and saved. Switching the type of the strings from char * to std::string will do this and manage the alloated memory.
Differential Revision: https://reviews.llvm.org/D98554
Summary: Add -fno-split-stack and rename CC1 option from `-split-stacks`
to `-fsplit-stack`.
Test Plan: check-all
Differential Revision: https://reviews.llvm.org/D99245
This patch sets the OF_Text flag correctly for the json file created in Clang::DumpCompilationDatabaseFragmentToDir.
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D99200
SYCL compilations initiated by the driver will spawn off one or more
frontend compilation jobs (one for device and one for host). This patch
reworks the driver options to make upstreaming this from the downstream
SYCL fork easier.
This patch introduces a language option to identify host executions
(SYCLIsHost) and a -cc1 frontend option to enable this mode. -fsycl and
-fno-sycl become driver-only options that are rejected when passed to
-cc1. This is because the frontend and beyond should be looking at
whether the user is doing a device or host compilation specifically.
Because the frontend should only ever be in one mode or the other,
-fsycl-is-device and -fsycl-is-host are mutually exclusive options.