This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.
‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.
The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.
A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.
Tests are added for checking the HIPSPV tool chain.
[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader
Differential Revision: https://reviews.llvm.org/D110622
The UpperBound of RVV type in debug info should be elements count minus one,
as the LowerBound start from zero.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D115430
Summary: This patch records the access flag for
class/struct/union types in the clang part.
The summary of binary size change and debug info size change due to the DW_AT_accessibility attribute are as the following table. They are built with flags of `clang -O0 -g` (no -gz).
| section | before | after | change | % |
| .debug_loc | 929821 | 929821 |0|0|
|.debug_abbrev | 5885289 | 5971547 |+86258|+1.466%|
|.debug_info | 497613455 | 498122074 |+508619|+0.102%|
|.debug_ranges | 45731664 | 45731664 |0|0|
|.debug_str | 233842595 | 233839388 |-3207| -0.001%|
|.debug_line | 149773166 | 149764583 |-8583|-0.006%|
|total (debug) |933775990 |934359077|+583087 |+0.062%|
|total (binary) |1394617288 | 1395200024| +582736|+0.042%|
Reviewed By: dblaikie, shchenz
Differential Revision: https://reviews.llvm.org/D115503
This is the last cleanup step resulting from D115804 .
Now that clang uses intrinsics when we're in the special FP mode,
we don't need a function attribute as an indicator to the backend.
The LLVM part of the change is in D115885.
Differential Revision: https://reviews.llvm.org/D115886
Fix a mistake in 9bf917394eba3ba4df77cc17690c6d04f4e9d57f: sret
arguments use ConvertType, not ConvertTypeForMem, see the handling
in CodeGenTypes::GetFunctionType().
This fixes fp-matrix-pragma.c on s390x.
For aggregates, we need to store the element type to be able to
reconstruct the aggregate Address. This increases the size of this
packed structure (as the second value is already used for alignment
in this case), but I did not observe any compile-time or memory
usage regression from this change.
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D115693
With opaque pointers the pointer cast may be a no-op, such that
var and castedAddr are the same. However, we still need to update
the map entry as the underlying global changed. We could explicitly
check whether the global was replaced, but we may as well just
always update the entry.
We got an unintended consequence of the optimizer getting smarter when
compiling in a non-standard mode, and there's no good way to inhibit
those optimizations at a later stage. The test is based on an example
linked from D92270.
We allow the "no-strict-float-cast-overflow" exception to normal C
cast rules to preserve legacy code that does not expect overflowing
casts from FP to int to produce UB. See D46236 for details.
Differential Revision: https://reviews.llvm.org/D115804
Store the pointer element type inside LValue so that we can
preserve it when converting it back into an Address. Storing the
pointer element type might not be strictly required here in that
we could probably re-derive it from the QualType (which would
require CGF access though), but storing it seems like the simpler
solution.
The global register case is special and does not store an element
type, as the value is not a pointer type in that case and it's not
possible to create an Address from it.
This is the main remaining part from D103465.
Differential Revision: https://reviews.llvm.org/D115791
CreateElementBitCast() can preserve the pointer element type in
the presence of opaque pointers, so use it in place of CreateBitCast()
in some places. This also sometimes simplifies the code a bit.
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.
For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
Explicitly track the pointer element type in Address, rather than
deriving it from the pointer type, which will no longer be possible
with opaque pointers. This just adds the basic facility, for now
everything is still going through the deprecated constructors.
I had to adjust one place in the LValue implementation to satisfy
the new assertions: Global registers are represented as a
MetadataAsValue, which does not have a pointer type. We should
avoid using Address in this case.
This implements a part of D103465.
Differential Revision: https://reviews.llvm.org/D115725
In 32bit mode, attaching TBAA metadata to the store following the call
to inline assembler results in describing the wrong type by making a
fake lvalue(i.e., whatever the inline assembler happens to leave in
EAX:EDX.) Even if inline assembler somehow describes the correct type,
setting TBAA information on return type of call to inline assembler is
likely not correct, since TBAA rules need not apply to inline assembler.
Differential Revision: https://reviews.llvm.org/D115320
This no longer allows creating an invalid Address through the regular
constructor. There were only two places that did this (AggValueSlot
and EHCleanupScope) which did this by converting a potential nullptr
into an Address. I've fixed both of these by directly storing an
Address instead.
This is intended as a bit of preliminary cleanup for D103465.
Differential Revision: https://reviews.llvm.org/D115630
This reverts commit 800bf8ed29.
The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.
Differential Revision: https://reviews.llvm.org/D115689
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.
Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114565
There are instances where clang codegen creates stores to
address space 4 in ctors, which causes a crash in llc.
This store was being optimized out at opt levels > 0.
For example:
pragma omp declare target
static const double log_smallx = log2(smallx);
pragma omp end declare target
This patch ensures that any global const that does not
have constant initialization stays in address space 1.
Note - a second patch is in the works where all global
constants are placed in address space 1 during
codegen and then the opt pass InferAdressSpaces
will promote to address space 4 where necessary.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D115661
This reverts commit 2b554920f1.
This change causes tsan test timeout on x86_64-linux-autoconf.
The timeout can be reproduced by:
git clone https://github.com/llvm/llvm-zorg.git
BUILDBOT_CLOBBER= BUILDBOT_REVISION=eef8f3f85679c5b1ae725bade1c23ab7bb6b924f llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_standard.sh
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.
In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D113623
I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.
Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D115222
Update `OpenMPIRBuilder::collapseLoops()` to call `resize()` instead of
`set_size()`. The latter asserts on capacity limits and cannot grow,
which seems likely to be unintentional here (if it is, I think a local
assertion would be good for clarity).
Also update `CodeGenFunction::EmitOMPCollapsedCanonicalLoopNest()` to
use `pop_back_n()` instead of `set_size()`.
Differential Revision: https://reviews.llvm.org/D115378
This patch translates HIP kernels to SPIR-V kernels when the HIP
compilation mode is targeting SPIR-S. This involves:
* Setting Cuda calling convention to CC_OpenCLKernel (which maps to
SPIR_KERNEL in LLVM IR later on).
* Coercing pointer arguments with default address space (AS) qualifier
to CrossWorkGroup AS (__global in OpenCL). HIPSPV's device code is
ultimately SPIR-V for OpenCL execution environment (as
starter/default) where Generic or Function (OpenCL's private) is not
supported as storage class for kernel pointer types. This leaves the
CrossWorkGroup to be the only reasonable choice for HIP buffers.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D109818
This patch fixes issues for -fgpu-rdc for Windows MSVC
toolchain:
Fix COFF specific section flags and remove section types
in llvm-mc input file for Windows.
Escape fatbin path in llvm-mc input file.
Add -triple option to llvm-mc.
Put __hip_gpubin_handle in comdat when it has linkonce_odr
linkage.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D115039
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:
* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.
I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.
Don't change the API of the corresponding OpenCL builtins.
Differential Revision: https://reviews.llvm.org/D115032
With C++17 the exception specification has been made part of the
function type, and therefore part of mangled type names.
However, it's valid to convert function pointers with an exception
specification to function pointers with the same argument and return
types but without an exception specification, which means that e.g. a
function of type "void () noexcept" can be called through a pointer
of type "void ()". We must therefore consider the two types to be
compatible for CFI purposes.
We can do this by stripping the exception specification before mangling
the type name, which is what this patch does.
Differential Revision: https://reviews.llvm.org/D115015
This does similar thing to 6b1341e, but fixes single element 128-bit
float type: `struct { long double x; }`.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D114937
Glibc 2.32 and newer uses these symbol names to support IEEE-754 128-bit
float. GCC transforms name of these builtins to align with Glibc header
behavior.
Since Clang doesn't have all GCC-compatible builtins implemented, this
patch only mutates the implemented part.
Note nexttoward is a special case (no nexttowardf128) so it's also
handled here.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112401
If clang's output is set to bitcode and LTO is enabled, clang would
unconditionally add the flag to the module. Unfortunately, if the input were a
bitcode or IR file and had the flag set, this would result in two copies of the
flag, which is illegal IR. Guard the setting of the flag by checking whether it
already exists. This follows existing practice for the related "ThinLTO" module
flag.
Differential Revision: https://reviews.llvm.org/D112177
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.
These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):
- command-line options are "translated" to module-level LLVM IR
attributes (metadata).
- functions have PAC/BTI specific attributes iff the
__attribute__((target("branch-protection=...))) was used in the function
declaration.
- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:
branch-protection ::= "-mbranch-protection=" <protection>
protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
| <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.
- Handle _attribute_((target(("branch-protection=..."))) for AArch32 with the
same grammer as the commandline options.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: vhscampos
Differential Revision: https://reviews.llvm.org/D112421
See discussion in D51650, this change was a little aggressive in an
error while doing a 'while we were here', so this removes that error
condition, as it is apparently useful.
This reverts commit bb4934601d.
Currently variables appearing inside private/firstprivate/lastprivate
clause of openmp task construct are not visible inside lldb debugger.
This is because compiler does not generate debug info for it.
Please consider the testcase debug_private.c attached with patch.
```
28 #pragma omp task shared(res) private(priv1, priv2) firstprivate(fpriv)
29 {
30 priv1 = n;
31 priv2 = n + 2;
32 printf("Task n=%d,priv1=%d,priv2=%d,fpriv=%d\n",n,priv1,priv2,fpriv);
33
-> 34 res = priv1 + priv2 + fpriv + foo(n - 1);
35 }
36 #pragma omp taskwait
37 return res;
(lldb) p priv1
error: <user expression 0>:1:1: use of undeclared identifier 'priv1'
priv1
^
(lldb) p priv2
error: <user expression 1>:1:1: use of undeclared identifier 'priv2'
priv2
^
(lldb) p fpriv
error: <user expression 2>:1:1: use of undeclared identifier 'fpriv'
fpriv
^
```
After the current patch, lldb is able to show the variables
```
(lldb) p priv1
(int) $0 = 10
(lldb) p priv2
(int) $1 = 12
(lldb) p fpriv
(int) $2 = 14
```
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D114504
Add an AtomicScopeModel for HIP and support for OpenCL builtins
that are missing in HIP.
Patch by: Michael Liao
Revised by: Anshil Ghandi
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D113925
Eachempati.
This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D113540
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112413
With this,
void f() { __asm__("mov eax, ebx"); }
now compiles with clang with -masm=intel.
This matches gcc.
The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.
One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):
__asm__("mov eax, ebx");
__asm__(".att_syntax\nmovl %ebx, %eax");
__asm__("mov eax, ebx");
This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:
bt{l %[Offset],%[Base] | %[Base],%[Offset]}
This works in LLVM after D113932 and D113894, so use that.
(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: When LLVM interpolates in parameters like `%0`, it uses at&t or
intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet happens to late: The interpolated-in parameter is already
in intel style, and then won't parse in the switched `.att_syntax`.)
It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.
Fixes PR21401 and PR20241.
Differential Revision: https://reviews.llvm.org/D113707
Calls to MMA builtins that take pointer to void
do not accept other pointers/arrays whereas normal
functions with the same parameter do. This patch
allows MMA built-ins to accept non-void pointers
and arrays.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D113306
category is empty
Currently, if we create a category in ObjC that is empty, we still emit
runtime metadata for that category. This is a scenario that could
commonly be run into when using __attribute__((objc_direct_members)),
which elides the need for much of the category metadata. This is
slightly wasteful and can be easily skipped by checking the category
metadata contents during CodeGen.
rdar://66177182
Differential Revision: https://reviews.llvm.org/D113455
As discussed here: https://lwn.net/Articles/691932/
GCC6.0 adds target_clones multiversioning. This functionality is
an odd cross between the cpu_dispatch and 'target' MV, but is compatible
with neither.
This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.
The mangling however, is... ODD to say the least. The mangling format
is:
<normal_mangling>.<option string>.<option ordinal>.
Differential Revision:https://reviews.llvm.org/D51650
Per C++17 [except.spec], 'throw()' has become equivalent to
'noexcept', and should therefore call std::terminate, not
std::unexpected.
Differential Revision: https://reviews.llvm.org/D113517
this patch - https://reviews.llvm.org/D110337 changes the way how hostcall
hidden argument is emitted for printf, but the sanitized kernels also use
hostcall buffer to report a error for invalid memory access, which is not
handled by the above patch and it leads to vdi runtime error:
Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D112820
Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.
If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.
This patch allows putting instantiated template kernels into comdat sections.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D112492
After the changes introduced by D106799 it is possible to tag
outlined function with both AlwaysInline and NoInline attributes using
-fno-inline command line options.
This issue is similiar to D107649.
Differential Revision: https://reviews.llvm.org/D112645
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.
This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112680
at the start of the entry block, which in turn would aid better code transformation/optimization.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110257
This patch removes the assumption propagation that was added in D110655
primarily to get assumption informatino on opaque call sites for
optimizations. The analysis done in D111445 allows us to do this more
intelligently in the back-end.
Depends on D111445
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111463
add tracing for loads and stores.
The primary goal is to have more options for data-flow-guided fuzzing,
i.e. use data flow insights to perform better mutations or more agressive corpus expansion.
But the feature is general puspose, could be used for other things too.
Pipe the flag though clang and clang driver, same as for the other SanitizerCoverage flags.
While at it, change some plain arrays into std::array.
Tests: clang flags test, LLVM IR test, compiler-rt executable test.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D113447
Previously, the Backend_Emit{Nothing,BC,LL} modes did
not run the LLVM verifier since it is usually added via
the TargetMachine::addPassesToEmitFile method according
to the DisableVerify parameter. This is called from
EmitAssemblyHelper::AddEmitPasses, which is only relevant
for BackendAction-s that require CodeGen.
Note:
* In these particular situations the verifier is added
to the optimization pipeline rather than the codegen
pipeline so that it runs prior to the BC/LL emission
pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously,
this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that
the clang/test/CodeGen/ifunc.c test would pass:
the emitIFuncDefinition incorrectly passed the
GlobalDecl of the IFunc itself to the call to
GetOrCreateLLVMFunction for creating the resolver.
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D113352
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
https://reviews.llvm.org/D92808 made clang use the operand bundle
instead of emitting retainRV/claimRV calls on arm64. This commit makes
changes to clang that are needed to use the operand bundle on x86-64.
Differential Revision: https://reviews.llvm.org/D111331
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.
This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.
The exact set of changes to check-openmp probably needs revision before commit
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112680
Add new triple and target info for ‘spirv32’ and ‘spirv64’ and,
thus, enabling clang (LLVM IR) code emission to SPIR-V target.
The target for SPIR-V is mostly reused from SPIR by derivation
from a common base class since IR output for SPIR-V is mostly
the same as SPIR. Some refactoring are made accordingly.
Added and updated tests for parts that are different between
SPIR and SPIR-V.
Patch by linjamaki (Henry Linjamäki)!
Differential Revision: https://reviews.llvm.org/D109144
Nathan Chancellor reported a crash due to commit
3466e00716 (Reland "[Attr] support btf_type_tag attribute").
The following test can reproduce the crash:
$ cat efi.i
typedef unsigned long efi_query_variable_info_t(int);
typedef struct {
struct {
efi_query_variable_info_t __attribute__((regparm(0))) * query_variable_info;
};
} efi_runtime_services_t;
efi_runtime_services_t efi_0;
$ clang -m32 -O2 -g -c -o /dev/null efi.i
The reason is that FunctionTypeLoc.getParam(Idx) may return a
nullptr which should be checked before dereferencing the
result pointer. This patch fixed this issue.
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This is to revert commit f95bd18b5f (Revert "[Attr] support
btf_type_tag attribute") plus a bug fix.
Previous change failed to handle cases like below:
$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
In such cases, during clang IR generation, for function a(),
CGCodeGen has numParams = 1 for FunctionType. But for
FunctionTypeLoc we have FuncTypeLoc.NumParams = 0. By using
FunctionType.numParams as the bound to access FuncTypeLoc
params, a random crash is triggered. The bug fix is to
check against FuncTypeLoc.NumParams before accessing
FuncTypeLoc.getParam(Idx).
Differential Revision: https://reviews.llvm.org/D111199
This reverts commits 737e4216c5 and
ce7ac9e66a.
After those commits, the compiler can crash with a reduced
testcase like this:
$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).
If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can
FAM.registerPass([&] { return std::move(MyAA); });
before calling
PB.registerFunctionAnalyses(FAM);
For example, LTOBackend.cpp and NewPMDriver.cpp do this.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D113210
This patch added clang codegen and llvm support
for btf_type_tag support. Currently, btf_type_tag
attribute info is preserved in DebugInfo IR only for
pointer types associated with typedef, global variable
and function declaration. Eventually, such information
is emitted to dwarf.
The following is an example:
$ cat test.c
#define __tag __attribute__((btf_type_tag("tag")))
int __tag *g;
$ clang -O2 -g -c test.c
$ llvm-dwarfdump --debug-info test.o
...
0x0000001e: DW_TAG_variable
DW_AT_name ("g")
DW_AT_type (0x00000033 "int *")
DW_AT_external (true)
DW_AT_decl_file ("/home/yhs/test.c")
DW_AT_decl_line (2)
DW_AT_location (DW_OP_addr 0x0)
0x00000033: DW_TAG_pointer_type
DW_AT_type (0x00000042 "int")
0x00000038: DW_TAG_LLVM_annotation
DW_AT_name ("btf_type_tag")
DW_AT_const_value ("tag")
0x00000041: NULL
0x00000042: DW_TAG_base_type
DW_AT_name ("int")
DW_AT_encoding (DW_ATE_signed)
DW_AT_byte_size (0x04)
0x00000049: NULL
Basically, a DW_TAG_LLVM_annotation tag will be inserted
under DW_TAG_pointer_type tag if that pointer has a btf_type_tag
associated with it.
Differential Revision: https://reviews.llvm.org/D111199
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D113080
This patch implements __builtin_reduce_max and __builtin_reduce_min as
specified in D111529.
The order of operations does not matter for min or max reductions and
they can be directly lowered to the corresponding
llvm.vector.reduce.{fmin,fmax,umin,umax,smin,smax} intrinsic calls.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D112001
Basically, inline builtin definition are shadowed by externally visible
redefinition. This matches GCC behavior.
The implementation has to workaround the fact that:
1. inline builtin are renamed at callsite during codegen, but
2. they may be shadowed by a later external definition
As a consequence, during codegen, we need to walk redecls and eventually rewrite
some call sites, which is totally inelegant.
Differential Revision: https://reviews.llvm.org/D112059
There's a nuanced check about when to use suffixes on these integer
non-type-template-parameters, but when rebuilding names for
-gsimple-template-names there isn't enough data in the DWARF to
determine when to use suffixes or not. So turn on suffixes always to
make it easy to match up names in llvm-dwarfdump --verify.
I /think/ if we correctly modelled auto non-type-template parameters
maybe we could put suffixes only on those. But there's also some logic
in Clang that puts the suffixes on overloaded functions - at least
that's what the parameter says (see D77598 and printTemplateArguments
"TemplOverloaded" parameter) - but I think maybe it's for anything that
/can/ be overloaded, not necessarily only the things that are overloaded
(the argument value is hardcoded at the various callsites, doesn't seem
to depend on overload resolution/searching for overloaded functions). So
maybe with "auto" modeled more accurately, and differentiating between
function templates (always using type suffixes there) and class/variable
templates (only using the suffix for "auto" types) we could correctly
use integer type suffixes only in the minimal set of cases.
But that seems all too much fuss, so let's just put integer type
suffixes everywhere always in the debug info of integer non-type
template parameters in template names.
(more context:
* https://reviews.llvm.org/D77598#inline-1057607
* https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ )
Differential Revision: https://reviews.llvm.org/D111477
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112349
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.
A follow-up patch will remove createReplacementInstr.
Differential Revision: https://reviews.llvm.org/D112791
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.
These are only exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112186
Add `__c11_atomic_fetch_nand` builtin to language extensions and support `__atomic_fetch_nand` libcall in compiler-rt.
Reviewed By: theraven
Differential Revision: https://reviews.llvm.org/D112400
This patch implements __builtin_elementwise_abs as specified in
D111529.
Reviewed By: aaron.ballman, scanon
Differential Revision: https://reviews.llvm.org/D111986
This patch supports the atomic construct (read and write) following
section 2.17.7 of OpenMP 5.0 standard. Also added tests and
verifier for the same.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D111992
This patch implements __builtin_elementwise_max and
__builtin_elementwise_min, as specified in D111529.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D111985
Adds initial parsing and sema for the 'append_args' clause.
Note that an AST clause is not created as it instead adds its values
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D111854
This reverts commit 121b2252de.
The following code causes a crash in some circumstances:
struct k {
~k() __attribute__((annotate(""))) {}
};
void m() { k(); }
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html
The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.
This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.
I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108872
Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112146
Some downstream users have plugins that -clear-ast-before-backend may
affect. Add an option to opt out.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D112100
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.
Differential Revision: https://reviews.llvm.org/D112022
Emit __clangast in custom section instead of named data segment
to find it while iterating sections.
This could be avoided if all data segements (the wasm sense) were
represented as their own sections (in the llvm sense).
This can be resolved by https://github.com/WebAssembly/tool-conventions/issues/138
And the on-disk hashtable in clangast needs to be aligned by 4 bytes,
so add paddings in name length field in custom section header.
The length of clangast section name can be represented in 1 byte
by leb128, and possible maximum pads are 3 bytes, so the section
name length won't be invalid in theory.
Fixes https://bugs.llvm.org/show_bug.cgi?id=35928
Differential Revision: https://reviews.llvm.org/D74531
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.
Patch By: noajshu
Differential Revision: https://reviews.llvm.org/D111371
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
Previously without -disable-free, -clear-ast-before-backend would crash in ~ASTContext() due to various reasons.
This works around that by doing a lot of the cleanup ahead of the destructor so that the destructor doesn't actually do any manual cleanup if we've already cleaned up beforehand.
This actually does save a measurable amount of memory with -clear-ast-before-backend, although at an almost unnoticeable runtime cost:
https://llvm-compile-time-tracker.com/compare.php?from=5d755b32f2775b9219f6d6e2feda5e1417dc993b&to=58ef1c7ad7e2ad45f9c97597905a8cf05a26258c&stat=max-rss
Previously we weren't doing any cleanup with -disable-free, so I tried measuring the impact of always doing the cleanup and didn't measure anything noticeable on llvm-compile-time-tracker.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D111767
Not all constants are emitted within the context of a function, so use
the module's ASTContext instead because 1) that's the same as the
current function ASTContext, and 2) the module can never be null.
Fixes PR50787.
Adds initial parsing and sema for the 'adjust_args' clause.
Note that an AST clause is not created as it instead adds its expressions
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D99905
This reverts commit 97f0c63783.
As discussed in https://reviews.llvm.org/D110684, it increased the
compile time and the binary size of clang more than 1%. I reverted
this patch first to think about a better way to do it.
This implements the new implicit conversion sequence to an incomplete
(unbounded) array type. It is mostly Richard Smith's work, updated to
trunk, testcases added and a few bugs fixed found in such testing.
It is not a complete implementation of p0388.
Differential Revision: https://reviews.llvm.org/D102645
Current btf_tag is applied to declaration only.
Per discussion in https://reviews.llvm.org/D111199,
we plan to introduce btf_type_tag attribute for types.
So rename btf_tag to btf_decl_tag to make it easily
differentiable from btf_type_tag.
Differential Revision: https://reviews.llvm.org/D111588
Sequel patch to https://reviews.llvm.org/D111293.
Remove call to CodeGenFunction::InitTempAlloca() from OpenMP related
codegen part.
Also remove the metadata `!llvm.access.group` from the updated lit
tests.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D111316
In the original design, we levarage _mt intrinsics to define macros for
_m intrinsics. Such as,
```
__builtin_rvv_vadd_vv_i8m1_mt((vbool8_t)(op0), (vint8m1_t)(op1), (vint8m1_t)(op2), (vint8m1_t)(op3), (size_t)(op4), (size_t)VE_TAIL_AGNOSTIC)
```
However, we could not define generic interface for mask intrinsics any
more due to clang_builtin_alias only accepts clang builtins as its
argument.
In the example,
```
__rvv_overloaded
__attribute__((clang_builtin_alias(__builtin_rvv_vadd_vv_i8m1_mt)))
vint8m1_t vadd(vbool8_t op0, vint8m1_t op1, vint8m1_t op2, vint8m1_t
op3, size_t op4, size_t op5);
```
op5 is the tail policy argument. When users want to use vadd generic
interface for masked vector add, they need to specify tail policy in the
previous design. In this patch, we define _m intrinsics as clang
builtins to solve the problem.
Differential Revision: https://reviews.llvm.org/D110684
When AnnotateAttr is on a function, AddGlobalAnnotations is only called
in CodeGenModule::EmitGlobalFunctionDefinition which means AnnotateAttr
on function declaration without function body will be ignored.
The patch will move AddGlobalAnnotations to
CodeGenModule::SetFunctionAttributes, so with or without function body,
the AnnotateAttr will get code gen for a function.
It'll help case when AnnotateAttr is on external function, and the
AnnotateAttr will be consumed in IR level.
For example, a pass to collect num of uses for functions with
__attribute((annotate("count_use"))) after optimizations,
As long as there's __attribute((annotate("count_use"))), function with
or without function body should be counted.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D111109
Patch by: python3kgae (Xiang Li)
CodeGenFunction::InitTempAlloca() inits the static alloca within the
entry block which may *not* necessarily be correct always.
For example, the current instruction insertion point (pointed by the
instruction builder) could be a program point which is hit multiple
times during the program execution, and it is expected that the static
alloca is initialized every time the program point is hit.
Hence remove CodeGenFunction::InitTempAlloca(), and initialize the
static alloca where the instruction insertion point is at the moment.
This patch, as a starting attempt, removes the calls to
CodeGenFunction::InitTempAlloca() which do not have any side effect on
the lit tests.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D111293
In this case, we know statically that we're destroying the most-derived
class, so the vptr must already point to the current class and never
needs to be updated.
Currently, there're multiple float types that can be represented by
__attribute__((mode(xx))). It's parsed, and then a corresponding type is
created if available.
This refactor moves the enum for mode into a global enum class visible
to ASTContext.
Reviewed By: aaron.ballman, erichkeane
Differential Revision: https://reviews.llvm.org/D111391
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patchs also unified the interface for getting the thread numbers in the
frontend.
Originally authored by jdoerfert.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D111475
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
Previously if you passed an absolute path to clang, where only part of
the path to the file was remapped, it would result in the file's DIFile
being stored with a duplicate path, for example:
```
!DIFile(filename: "./ios/Sources/bar.c", directory: "./ios/Sources")
```
This change handles absolute paths, specifically in the case they are
remapped to something relative, and uses the dirname for the directory,
and basename for the filename.
This also adds a test verifying this behavior for more standard uses as
well.
Differential Revision: https://reviews.llvm.org/D111352
non-Darwin ObjC runtimes:
- Use the same logic the Darwin runtime does for inferring that a
receiver is non-null and therefore doesn't require null checks.
Previously we weren't skipping these for non-super dispatch.
- Emit a null check when there's a consumed parameter so that we can
destroy the argument if the call doesn't happen. This mostly
involves extracting some common logic from the Darwin-runtime code.
- Generate a zero aggregate by zeroing the same memory that was used
in the method call instead of zeroing separate memory and then
merging them with a phi. This uses less memory and avoids unnecessary
copies.
- Emit zero initialization, and generate zero values in phis, using
the proper zero-value routines instead of assuming that the zero
value of the result type has a bitwise-zero representation.
This patch adds two flags to be supported for the new runtime. The flags
are `-fopenmp-assume-threads-oversubscription` and
-fopenmp-assume-teams-oversubscription`. These add global values that
can be checked by the work sharing runtime functions to make better
judgements about how to distribute work between the threads.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111348
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
This reverts c7f16ab3e3 / r109694 - which
suggested this was done to improve consistency with the gdb test suite.
Possible that at the time GCC did not canonicalize integer types, and so
matching types was important for cross-compiler validity, or that it was
only a case of over-constrained test cases that printed out/tested the
exact names of integer types.
In any case neither issue seems to exist today based on my limited
testing - both gdb and lldb canonicalize integer types (in a way that
happens to match Clang's preferred naming, incidentally) and so never
print the original text name produced in the DWARF by GCC or Clang.
This canonicalization appears to be in `integer_types_same_name_p` for
GDB and in `TypeSystemClang::GetBasicTypeEnumeration` for lldb.
(I tested this with one translation unit defining 3 variables - `long`,
`long (*)()`, and `int (*)()`, and another translation unit that had
main, and a function that took `long (*)()` as a parameter - then
compiled them with mismatched compilers (either GCC+Clang, or
Clang+(Clang with this patch applied)) and no matter the combination,
despite the debug info for one CU naming the type "long int" and the
other naming it "long", both debuggers printed out the name as "long"
and were able to correctly perform overload resolution and pass the
`long int (*)()` variable to the `long (*)()` function parameter)
Did find one hiccup, identified by the lldb test suite - that CodeView
was relying on these names to map them to builtin types in that format.
So added some handling for that in LLVM. (these could be split out into
separate patches, but seems small enough to not warrant it - will do
that if there ends up needing any reverti/revisiting)
Differential Revision: https://reviews.llvm.org/D110455
adjustMemberOfForLambdaCaptures.
The problem is happening when user passes lambda function with reference
type in the map clause.
The natural of the problem when processing generateInfoForCapture,
the BasePointer is generated with new load for a lambda variable with
reference type. It is not expected in adjustMemberOfForLambdaCaptures.
One way to fix this is to skipping call to generateInfoForCapture for
map(to:lambda). The map info will be generated later in the call to
generateDefaultMapInfo samiler as firsprivate clase.
This to fix https://bugs.llvm.org/show_bug.cgi?id=52071
Differential Revision:https://reviews.llvm.org/D111115
This is to save memory for Clang compiles.
Measuring building PassBuilder.cpp under /usr/bin/time, max rss goes from 0.93GB to 0.7GB.
This does not turn it by default yet.
I've turned on the option locally and run it over a good amount of files without any issues.
For more background, see
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D111105
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Insert OMPLoopTransformationDirective between OMPLoopBasedDirective and the loop transformations OMPTileDirective and OMPUnrollDirective. This simplifies handling of loop transformations not requiring distinguishing between OMPTileDirective and OMPUnrollDirective anymore.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111119
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
Modify the IfStmt node to suppoort constant evaluated expressions.
Add a new ExpressionEvaluationContext::ImmediateFunctionContext to
keep track of immediate function contexts.
This proved easier/better/probably more efficient than walking the AST
backward as it allows diagnosing nested if consteval statements.
We keep a map from function name to source location so we don't have to
do it via looking up a source location from the AST. However, since
function names can be long, we actually use a hash of the function name
as the key.
Additionally, we can't rely on Clang's printing of function names via
the AST, so we just demangle the name instead.
This is necessary to implement
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D110665
Per the GCC info page:
If the function is declared 'extern', then this definition of the
function is used only for inlining. In no case is the function
compiled as a standalone function, not even if you take its address
explicitly. Such an address becomes an external reference, as if
you had only declared the function, and had not defined it.
Respect that behavior for inline builtins: keep the original definition, and
generate a copy of the declaration suffixed by '.inline' that's only referenced
in direct call.
This fixes holes in c3717b6858.
Differential Revision: https://reviews.llvm.org/D111009
by Raul Penacoba.
The size of kmp_depend_info and the number of dependencies are computed multiplying the iterator sizes, which not right.
Now size is computed as:
itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN
where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111045
This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110771
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.
Differential Revision: https://reviews.llvm.org/D110808
This patch adds OpenMP assumption attributes to call sites in applicable
regions. Currently this applies the caller's assumption attributes to
any calls contained within it. So, if a call occurs inside an OpenMP
assumes region to a function outside that region, we will assume that
call respects the assumptions. This is primarily useful for inline
assembly calls used heavily in the OpenMP GPU device runtime, which
allows us to then make judgements about what the ASM will do.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110655
This patch is in a series of patches to provide builtins for compatibility with
the XL compiler. This patch implements the software divide builtin as
wrappers for a floating point divide. XL provided these builtins because it
didn't produce software estimates by default at `-Ofast`. When compiled
with `-Ofast` these builtins will produce the software estimate for divide.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D106959
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a non-vec3 type to a vec3 type, even though a
conversion to vec3 was performed. This resulted in creation of
invalid store instructions.
Differential Revision: https://reviews.llvm.org/D108470
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Previous revisions didn't properly declare the new dependencies.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
(This is a recommit of 3d6f49a569 that should no longer break validation since
bd379915de).
It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.
Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.
Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
This patch adds the following built-ins:
__builtin_vsx_build_pair
__builtin_mma_build_acc
Reviewed By: #powerpc, nemanjai, lei
Differential Revision: https://reviews.llvm.org/D107647
This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110429
This excludes certain names that can't be rebuilt from the available
DWARF:
* Atomic types - no DWARF differentiating int from atomic int.
* Vector types - enough DWARF (an attribute on the array type) to do
this, but I haven't written the extra code to add the attributes
required for this
* Lambdas - ambiguous with any other unnamed class
* Unnamed classes/enums - would need column info for the type in
addition to file/line number
* noexcept function types - not encoded in DWARF
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D110209
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics
We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.
This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)
In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110029
The matrix extension requires the indices for matrix subscript
expression to be valid and it is UB otherwise.
extract/insertelement produce poison if the index is invalid, which
limits the optimizer to not be bale to scalarize load/extract pairs for
example, which causes very suboptimal code to be generated when using
matrix subscript expressions with variable indices for large matrixes.
This patch updates IRGen to emit assumes to for index expression to
convey the information that the index must be valid.
This also adjusts the order in which operations are emitted slightly, so
indices & assumes are added before the load of the matrix value.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D102478
Using the preferred name creates a mismatch between the textual name of
a type and the DWARF tags describing the parameters as well as possible
inconsistency between DWARF producers (like Clang and GCC, or
older/newer Clang versions, etc).
See PR51862.
The consumers of the Elidable flag in CXXConstructExpr assume that
an elidable construction just goes through a single copy/move construction,
so that the source object is immediately passed as an argument and is the same
type as the parameter itself.
With the implementation of P2266 and after some adjustments to the
implementation of P1825, we started (correctly, as per standard)
allowing more cases where the copy initialization goes through
user defined conversions.
With this patch we stop using this flag in NRVO contexts, to preserve code
that relies on that assumption.
This causes no known functional changes, we just stop firing some asserts
in a cople of included test cases.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D109800
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
D109607 results in a regression in llvm-test-suite.
The reason is we didn't check the size of SourceTy, so that we will
return wrong SSE type when SourceTy is overlapped.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D110037
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h
Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
This fixes a bug in clang where, when clang sees a switch with a
fallthrough to a default like this:
static void funcA(void) {}
static void funcB(void) {}
int main(int argc, char **argv) {
switch (argc) {
case 0:
funcA();
break;
case 10:
default:
funcB();
break;
}
}
It does not add a proper debug location for that switch case, such as
case 10: above.
Patch by Shubham Rastogi!
Differential Revision: https://reviews.llvm.org/D109940
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h
Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h
Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
Use irbuilder as default and remove redundant Clang codegen for masked construct and master construct.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D100874
We previously made all multiversioning resolvers/ifuncs have weak
ODR linkage in IR, since we NEED to emit the whole resolver every time
we see a call, but it is not necessarily the place where all the
definitions live.
HOWEVER, when doing so, we neglected the case where the versions have
internal linkage. This patch ensures we do this, so you don't get weird
behavior with static functions.
D105263 adds support for _Float16 type. It introduced a bug (pr51813) that generates a <4 x half> type instead the default double when passing blank structure by SSE registers.
Although I doubt it may expose a bug somewhere other than D105263, it's good to avoid return half type when no half type in arguments.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D109607
Summary:
Introduce a new frontend flag `-fswift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:
* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.
Differential Revision: https://reviews.llvm.org/D109451
Storing the vtable field of an object should use the same address space as
the this pointer. Currently it is assumed to be addr space 0 but this may not
be true.
This assumption (added in 054cc3b1b4) caused
issues for the out-of-tree CHERI targets.
Reviewed by: John McCall, Alexander Richardson
Differential Revision: https://reviews.llvm.org/D109841
Remove the previous error and add support for special handling of small
complex types as in PPC64 ELF ABI. As in, generate code to load from
varargs location and pack it in a temp variable, then return a pointer to
the struct.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D106393
Split ThreadSanitizerPass into ThreadSanitizerPass (as a function
pass) and ModuleThreadSanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.
Split MemorySanitizerPass into MemorySanitizerPass (as a function
pass) and ModuleMemorySanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.
This is a follow-up to D105006 and D105007.
This change puts the functionality in commit
c5792aa90f behind a flag that is off by
default. The original commit is not in Apple's Clang fork (and blocks
are an Apple extension in the first place), and there is one known
issue that needs to be addressed before it can be enabled safely.
Differential Revision: https://reviews.llvm.org/D108243
eg: t1<void () const> - DWARF doesn't have a particularly nice way to
encode this, for real member function types (like `void (t1::*)()
const`) the const-ness is encoded in the type of the artificial first
parameter. But `void () const` has no parameters, so encode it like a
normal const-qualified type, using DW_TAG_const_type. (similarly for
restrict and volatile)
Reference qualifiers (& and &&) coming in a separate commit shortly.
This patch introduces the flags `-fopenmp-target-debug` and
`-fopenmp-target-debug=` to set the value of a global in the device.
This will be used to enable or disable debugging features statically in
the device runtime library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D109544
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
integer 0/1 for the operand of bundle "clang.arc.attachedcall"
This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.
Differential Revision: https://reviews.llvm.org/D102996
Currently, we have no front-end type for ppc_fp128 type in IR. PowerPC
target generates ppc_fp128 type from long double now, but there's option
(-mabi=(ieee|ibm)longdouble) to control it and we're going to do
transition from IBM extended double-double ppc_fp128 to IEEE fp128 in
the future.
This patch adds type __ibm128 which always represents ppc_fp128 in IR,
as what GCC did for that type. Without this type in Clang, compilation
will fail if compiling against future version of libstdcxx (which uses
__ibm128 in headers).
Although all operations in backend for __ibm128 is done by software,
only PowerPC enables support for it.
There's something not implemented in this commit, which can be done in
future ones:
- Literal suffix for __ibm128 type. w/W is suitable as GCC documented.
- __attribute__((mode(IF))) should be for __ibm128.
- Complex __ibm128 type.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D93377
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.
With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
Add support for ordered directive in the OpenMPIRBuilder.
This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.
Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").
Reviewed By: Meinersbur, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107430
I discovered this quirk when working on some DWARF - AST printing prints
type template parameters fully qualified, but printed template template
parameters the way they were written syntactically, or wholely
unqualified - instead, we should print them consistently with the way we
print type template parameters: fully qualified.
The one place this got weird was for partial specializations like in
ast-print-temp-class.cpp - hence the need for checking for
TemplateNameDependenceScope::DependentInstantiation template template
parameters. (not 100% sure that's the right solution to that, though -
open to ideas)
Differential Revision: https://reviews.llvm.org/D108794
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.
While the end result of discussion may lead back to the current design,
it may also not lead to the current design.
Therefore i take it upon myself
to revert the tree back to last known good state.
This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
"LLVMFrontendOpenMP" of type SHARED_LIBRARY
depends on "LLVMPasses" (weak)
"LLVMipo" of type SHARED_LIBRARY
depends on "LLVMFrontendOpenMP" (weak)
"LLVMCoroutines" of type SHARED_LIBRARY
depends on "LLVMipo" (weak)
"LLVMPasses" of type SHARED_LIBRARY
depends on "LLVMCoroutines" (weak)
depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed. Build files cannot be regenerated correctly.
```
This reverts commit 707ce34b06.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.
With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
Discovered in SYCL, the field annotations were always cast to an i8*,
which is an invalid bitcast for a pointer type with an address space.
This patch makes sure that we create an intrinsic that takes a pointer
to the correct address-space and properly do our casts.
Differential Revision: https://reviews.llvm.org/D109003
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier. The next patch in this series, D106510, implements OpenMP
runtime support.
Consider the following example:
```
#pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
{
foo(); // might have map(delete: x)
#pragma omp target map(present, alloc: x) // x is guaranteed to be present
printf("%d\n", x);
}
```
The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`. Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail. (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)
Justification for inclusion in Clang and LLVM's OpenMP runtime:
* The `ompx_hold` modifier supports OpenACC functionality (structured
reference count) that cannot be achieved in standard OpenMP, as of
5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
runtime implementation are required for the Clang OpenACC support
being developed as part of the ECP Clacc project, which translates
OpenACC to OpenMP at the directive AST level. These patches are the
first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
in the runtime implementation. That syntactic support makes the
tests more readable than low-level runtime calls can. Moreover,
upstream Flang and Clang do not yet support OpenACC syntax
sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
of concerns between OpenACC and OpenMP development in LLVM. That
is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
extended OpenMP implementation and test suite without directly
considering OpenACC's language and execution model, which can be
handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
above example.
See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC. For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.
Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.
Reviewed By: ABataev, jdoerfert, protze.joachim, grokos
Differential Revision: https://reviews.llvm.org/D106509
This change defines a helper function getOpenCLCompatibleVersion()
inside LangOptions class. The function contains mapping between
C++ for OpenCL versions and their corresponding compatible OpenCL
versions. This mapping function should be updated each time a new
C++ for OpenCL language version is introduced. The helper function
is expected to simplify conditions on OpenCL C and C++ for OpenCL
versions inside compiler code.
Code refactoring performed.
Differential Revision: https://reviews.llvm.org/D108693
Streamline template arguments across types, variables, and functions -
for convenient reuse in experiments related to template argument list
reconstitution (not including template argument lists in the "name" of
those entities, and leaving it to debug info consumers to rebuild the
full template name from the semantic descriptions of the argument lists)
But the change seems like a good refactoring/cleanup anyway.
I'd certainly be open to suggestions about how this might be more
streamlined - like is there no generic way to query template argument
lists across the 3 kinds of entities, rather than needing special case
code?
Extend the information preserved in `TypeInfo` by replacing the `AlignIsRequired` bool flag with a three-valued enum, the enum also indicates where the alignment attribute come from, which could be helpful in determining whether the attribute should overrule.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D108858
Generate btf_tag annotations for DILocalVariable. The annotations
are represented as an DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106620
Generate btf_tag annotations for DIGlobalVariable. The annotations
are represented as an DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106619
Generate btf_tag annotations for DISubprograms. The annotations
are represented as an DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106618
Klocwork static code analysis exposed this concern:
Pointer 'SubExpr' returned from call to getSubExpr() function which may
return NULL from 'cast_or_null<Expr>(Operand)', which will be
dereferenced in the statement following it
Add an assert on SubExpr to make it clear this pointer cannot be null.
The existing code attempting to bitcast from a value in the default globals AS
to i8 addrspace(0)* was triggering an assertion failure in our downstream fork.
I found this while compiling poppler for CHERI-RISC-V (we use AS200 for all
globals). The test case uses AMDGPU since that is one of the in-tree targets
with a non-zero default globals address space.
The new test previously triggered a "Invalid constantexpr bitcast!" assertion
and now correctly generates code with addrspace(1) pointers.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D105972
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.
They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.
While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.
These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.
To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr). Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.
The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.
The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.
Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D106030
NVPTX does not allow dots in the identifier, so ptxas errors out with
fatal : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.
Differential Revision: https://reviews.llvm.org/D108456
8ace121305 introduced a regression for code that explicitly ignores the
-Wframe-larger-than= warning. Make sure we don't generate the
warn-stack-size attribute for that case.
Differential Revision: https://reviews.llvm.org/D108686
Previously when emitting a C++ guarded initializer, we tried to work out what
the enclosing function would be used for and added it to the COMDAT containing
the variable if we thought that doing so would be correct. But this was done
from a context in which we didn't -- and realistically couldn't -- correctly
infer how the enclosing function would be used.
Instead, add the initialization function to a COMDAT from the code that
creates it, in the case where it makes sense to do so: when we know that
the one and only reference to the initialization function is in
@llvm.global.ctors and that reference is in the same COMDAT.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D108680
CodeGenAction::ExecuteAction creates a BackendConsumer for the
purpose of handling diagnostics. The BackendConsumer's
DiagnosticHandlerImpl method expects CurLinkModule to be set,
but this did not happen on the code path that goes through
ExecuteAction. This change makes it so that the BackendConsumer
constructor used by ExecuteAction requires the Module to be
specified and passes the appropriate module in ExecuteAction.
The change also adds a test that fails without this change
and passes with it. To make the test work, the FIXME in the
handling of DK_Linker diagnostics was addressed so that warnings
and notes are no longer silently discarded. Since this introduces
a new warning diagnostic, a flag to control it (-Wlinker-warnings)
has also been added.
Reviewed By: xur
Differential Revision: https://reviews.llvm.org/D108603
...instead of redeclaring them in clang's own X86Target.def. They were already
required to be in sync (IIUC), so no reason to maintain two identical lists.
Reviewed By: erichkeane, craig.topper
Differential Revision: https://reviews.llvm.org/D108151
Remove redundant fields and replace pointer with virtual function
Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.
This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108380
This function was defaulting to use the ABI alignment for the LLVM
type. Here we change to use the preferred alignment. This will allow
unification with GetTempAlloca, which if alignment isn't specified, uses
the preferred alignment.
Differential Revision: https://reviews.llvm.org/D108450
Pass a LangAS instead of a target address space to
GetOrCreateLLVMGlobal, to remove a place where the frontend assumes that
target address space 0 is special.
Differential Revision: https://reviews.llvm.org/D108445
Mapping expressions that have `this` as their base expression aren't
considered a valid base variable and the rest of the runtime expects
this. However, if we have an expression with no value declaration we can
try to extract it manually to provide more helpful debuggin information.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108483
Generate btf_tag annotations for record fields. The annotations
are represented as an DINodeArray in DebugInfo.
Differential Revision: https://reviews.llvm.org/D106616
Partially reverts 85157c0079, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.
Differential Revision: https://reviews.llvm.org/D108387
Remove redundant fields and replace pointer with virtual function
Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.
This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D108380
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.
The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.
Differential Revision: https://reviews.llvm.org/D108029
Clang patch D106614 added attribute btf_tag support. This patch
generates btf_tag annotations for DIComposite types.
Each btf_tag annotation is represented as a 2D array of
meta strings. Each record may have more than one
btf_tag annotations.
Differential Revision: https://reviews.llvm.org/D106615
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.
Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D108339
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a vec3 type to a non-vec3 type, even though a
conversion to vec4 was performed. This resulted in creation of
invalid store instructions.
Differential Revision: https://reviews.llvm.org/D107963
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).
A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"
Differential Revision: https://reviews.llvm.org/D105007
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.
To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
-disable-ra-fsprofile-loader=false \
-disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.
Differential Revision: https://reviews.llvm.org/D107878
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D106277
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.
CanonicalLoopInfo are now invalidated after it does not describe canonical loop anymore and asserts when trying to use it afterwards.
In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid that the impression that they insert something from scratch at that location where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0
Also see discussion in D105706.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D107540
- They need to be preserved even if there's no reference within the
device code as the host code may need to initialize them based on the
application logic.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D107718
We were using an OpaqueValueExpr allocated on the stack to store
the size of a VLA. Because the VLASizeMap in CodegenFunction
uses the address of the expression to avoid recomputing VLAs,
we were accidentally reusing an earlier llvm::Value. This led to
invalid LLVM IR.
This is a temporary solution until VLASizeMap can be pushed and popped
based on the context.
Differential Revision: https://reviews.llvm.org/D107666
After D94315 we add the `NoInline` attribute to the outlined function to handle
data environments in the OpenMP if clause. This conflicted with the `AlwaysInline`
attribute added to the outlined function. for better performance in D106799.
The data environments should ideally not require NoInline, but for now this
fixes PR51349.
Reviewed By: mikerice
Differential Revision: https://reviews.llvm.org/D107649
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
GCC supports multiple forms of -falign-loops=.
-falign-loops= is currently ignored in Clang.
This patch implements the simplest but the most useful form where N is a
power of 2.
The underlying implementation uses a `llvm::TargetOptions` option for now.
Bitcode generation ignores this option.
Differential Revision: https://reviews.llvm.org/D106701
Implement target builtins for gfx90a including fadd64, fadd32, add2h,
max and min on various global, flat and ds address spaces for which
intrinsics are implemented.
Differential Revision: https://reviews.llvm.org/D106909
This matches the behavior of GCC.
Patch does not change remapping logic itself, so adding one simple smoke test should be enough.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107393
For fixed SVE types, predicates are represented using vectors of i8,
where as for scalable types they are represented using vectors of i1. We
can avoid going through memory for casts between these by bitcasting the
i1 scalable vectors to/from a scalable i8 vector of matching size, which
can then use the existing vector insert/extract logic.
Differential Revision: https://reviews.llvm.org/D106860
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It however is not suitable if floating
point exceptions are tracked. Corresponding IEEE 754 operation and C
function must never raise FP exception, even if the argument is a
signaling NaN. Compare instructions usually does not have such
property, they raise 'invalid' exception in such case. So this
mechanism is unsuitable when exception behavior is strict. In
particular it could result in unexpected trapping if argument is SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, however some can do the check in more efficient way.
* Solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Now only SystemZ implements this hook and it
generates a call to target specific intrinsic function.
Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. It complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang, in this case treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
In LLVM IR terms the ACLE type 'data512_t' is essentially an aggregate
type { [8 x i64] }. When emitting code for inline assembly operands,
clang tries to scalarize aggregate types to an integer of the equivalent
length, otherwise it passes them by-reference. This patch adds a target
hook to tell whether a given inline assembly operand is scalarizable
so that clang can emit code to pass/return it by-value.
Differential Revision: https://reviews.llvm.org/D94098
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D107025
@kpn pointed out that the global variable initialization functions didn't
have the "strictfp" metadata set correctly, and @rjmccall said that there
was buggy code in SetFPModel and StartFunction, this patch is to solve
those problems. When Sema creates a FunctionDecl, it sets the
FunctionDeclBits.UsesFPIntrin to "true" if the lexical FP settings
(i.e. a combination of command line options and #pragma float_control
settings) correspond to ConstrainedFP mode. That bit is used when CodeGen
starts codegen for a llvm function, and it translates into the
"strictfp" function attribute. See bugs.llvm.org/show_bug.cgi?id=44571
Reviewed By: Aaron Ballman
Differential Revision: https://reviews.llvm.org/D102343
On ELF, an SHT_INIT_ARRAY outside a section group is a GC root. The current
codegen abuses SHT_INIT_ARRAY in a section group to mean a GC root.
On PE/COFF, the dynamic initialization for `__declspec(selectany)` in a comdat
can be garbage collected by `-opt:ref`.
Call `addUsedGlobal` for the two cases to fix the abuse/bug.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D106925
The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.
In commit D106033 we have the optimization phase. This commit adds the attributes to
the outlined function for the grid size. the two attributes are `omp_target_num_teams` and
`omp_target_thread_limit`. These values are added as long as they are constant.
Two functions are created `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions `emitNumTeamsForTargetDirective`
and `emitNumThreadsForTargetDirective` identify the expresion and emit the code.
However, for the Device version of the outlined function, we cannot emit anything.
Therefore, this is a first attempt to separate emision of code from deduction of the
values.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106298
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.
Differential Revision: https://reviews.llvm.org/D106724
Allegedly the DWARF backend ignores this field of DIEnumerator, but we
set it nonetheless in case we decide to use it in the future.
Alternatively, we could remove it, but it is simpler to pass down the
signed bit as it is in the AST for now.
Implemented to address comments on D106585
This patch adds the always inline attribute to the outlined functions generated
by OpenMP regions. Because there is only a single instance of this function and
it always has internal linkage it is safe to inline in every instance it is
created. This could potentially lead to performance degredation due to
inflated register counts in the parallel region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106799
DIEnumerator stores an APInt as of April 2020, so now we don't need to
truncate the enumerator value to 64 bits. Fixes assertions during IRGen.
Split from D105320, thanks to Matheus Izvekov for the test case and
report.
Differential Revision: https://reviews.llvm.org/D106585
XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a
sequence of 1 to 16 bytes in big endian order, right justified in the
vector register (regardless of target endianness).
This is equivalent to vec_xl_len_r/vec_xst_len_r which are only
available on Power9.
This patch simply uses the Power9 functions when compiled for Power9,
but provides a more general implementation for Power8.
Differential revision: https://reviews.llvm.org/D106757
In OpenMP 5.1:
> If the `write` or `update` clause is specifieded, the atomic operation is not an atomic conditional update for which the comparison fails, and the effective memory ordering is `release`, `acq_rel`, or `seq_cst`, the strong flush on entry to the atomic operation is also a release flush. If the `read` or `update` clause is specified and the effective memory ordering is `acquire`, `acq_rel`, or `seq_cst` then the strong flush on exit from the atomic operation is also an acquire flush.
In OpenMP 5.0:
> If the `write`, `update`, or **`capture`** clause is specified and the `release`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on entry to the atomic operation is also a release flush. If the `read` or `capture` clause is specified and the `acquire`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on exit from the atomic operation is also an acquire flush.
From my understanding, in OpenMP 5.1, `capture` is removed from the requirement for flush, therefore we don't have to enforce it.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D100768
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.
Differential Revision: https://reviews.llvm.org/D106612
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.
To solve this issue, clang emits a reference to a bicode library
function which calls all ASAN functions which need to be
preserved. This basically force all ASAN functions to be
linked in.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D106315
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.
!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.
Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.
The compiler will tolerate both of i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)
Original version of the patch by Mikhail Maltsev.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D105491
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106496
These builtins were added to capture the fact that the underlying Wasm
instructions return i32s and implicitly sign or zero extend the extracted lanes
in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations
during code gen that these low-level details do not need to be exposed to users.
This commit replaces the use of the builtins in wasm_simd128.h with normal
target-independent vector code. As a result, we can switch the relevant
intrinsics to use functions rather than macros and can use more user-friendly
return types rather than trying to precisely expose the underlying Wasm types.
Note, however, that the generated LLVM IR is no different after this change.
Differential Revision: https://reviews.llvm.org/D106500
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.
Differential Revision: https://reviews.llvm.org/D106400
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations
Reviewed By: #powerpc, nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D103986
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.
NFC: this patch introduces typedefs for the integer type used by
SourceLocation and makes all the boring changes to use the typedefs
everywhere, but for the moment, they are unconditionally defined to
uint32_t.
Patch originally by Mikhail Maltsev.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D105492
Implemented builtins for mtmsr, mfspr, mtspr on PowerPC;
the patch is intended for XL Compatibility.
Differential revision: https://reviews.llvm.org/D106130
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.
Differential revision: https://reviews.llvm.org/D105946
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D105194
This patch fixes `__builtin_ppc_recipdivf`, `__builtin_ppc_recipdivd`,
`__builtin_ppc_rsqrtf`, and `__builtin_ppc_rsqrtd`. FastMathFlags are
set to fast immediately before emitting these builtins. Now the flags
are restored to their previous values after the builtins are emitted.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D105984
Added a number of different builtins that exist in the XL compiler. Most of
these builtins already exist in clang under a different name.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D104386
This patch is in a series of patches to provide builtins for
compatibility with the XL compiler. This patch adds software divide
builtins with no checking. These builtins are each emitted as a fast
fdiv.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D106150
Summary:
The AIX linker will produce errors on unresolved weak symbols. Change the
generated code to not check for the initialization function but just call
it and ensure that it always exists. Also, the AIX atexit routine has a
different name (and signature) so call it correctly. Update the lit tests
to test on AIX appropriately.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: hubert.reinterpretcast (Hubert Tong)
Differential Revision: https://reviews.llvm.org/D104420
It's noteworthy that GCC has the same bug here, which is a bit
surprising. Both Clang and GCC's bug is only for function template
arguments that are themselves templates with default template arguments
(f1<t1<int[, missing_default_here]>>). Probably because function name
matching isn't generally necessary - whereas type matching is necessary
for DWARF consumers to associate declarations and definitions across
translation units, so the bug's been addressed there already - but
continued to exist for function templates since it's fairly benign
there.
I came across this while working on a change that could reconstitute
these pretty printed names based on the rest of the DWARF, reducing the
size of the DWARF by not having to encode all the template parameters in
the name string. That reconstitution code can't tell the difference
between a defaulted argument or not, so couldn't create the current
buggy-ish output.
Making the names more consistent between direct and indirect references,
and between function and class templates seems all to the good.
(I fixed the function template version of this a few years back in
9fdd09a4cc - clearly I should've looked
more closely and generalized the code better so it only had to be fixed
once - well, doing that here now)
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
Remove uses of to-be-deprecated API.
Unfortunately this one mostly just makes the use of
getPointerElementType() explicit, as the correct type to use
wasn't immediately available (deriving it from QualType is left
as an excercise to the reader).
Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102107
This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`).
The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands.
Reviewed By: #powerpc, qiucf
Differential Revision: https://reviews.llvm.org/D105957
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and instrisics for population
count, reversed load and store related operations.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D106021
This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins.
Reviewed By: #powerpc, nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D105360
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and emit target independent
code for rotate related operations.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D104744
This patch make coroutine passes run by default in LLVM pipeline. Now
the clang and opt could handle IR inputs containing coroutine intrinsics
without special options.
It should be fine. On the one hand, the coroutine passes seems to be stable
since there are already many projects using coroutine feature.
On the other hand, the coroutine passes should do nothing for IR who doesn't
contain coroutine intrinsic.
Test Plan: check-llvm
Reviewed by: lxfind, aeubanks
Differential Revision: https://reviews.llvm.org/D105877
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.
Differential Revision: https://reviews.llvm.org/D106019
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.
Differential Revision: https://reviews.llvm.org/D105950
LDARX and LWARX sometimes gets optimized out by the compiler
when it is critical to the correctness of the code. This inline asm generation
ensures that it preserved.
Differential Revision: https://reviews.llvm.org/D105754
Properties that were declared `@property(copy, nonatomic) id foo` make an
unnecessary call to objc_get_property(). This call can be replaced with a
direct access to the backing variable identical to how a `@property(nonatomic)
id foo` would do it.
This reduces codegen by 4 bytes (x86_64/arm64) and removes a cross linkage unit
function call per property declared as copy/nonatomic.
Differential Revision: https://reviews.llvm.org/D105311
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.
Differential Revision: https://reviews.llvm.org/D105755
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.
The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.
The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.
Differential Revision: https://reviews.llvm.org/D105675
If the base is used in a map clause and later we have a memberexpr with
this base, and the member is a pointer, and this pointer is dereferenced
anyhow (subscript, array section, dereference, etc.), such components
should be considered as overlapped, otherwise it may lead to incorrect
size computations, since we try to map a pointee as a part of the whole
struct, which is not true for the pointer members.
Differential Revision: https://reviews.llvm.org/D105562
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D95561
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
No need to emit private copyfor firstprivate constants in target
regions, we can use the original copy instead.
Differential Revision: https://reviews.llvm.org/D105647
In preparation for dropping support for it. I've replaced it with
a proper type where the correct type was obvious and left an
explicit getPointerElementType() where it wasn't.
This fixes a gap in the `overloadable` attribute support (K&R declared
functions would get mangled symbol names, but that name wouldn't be
represented in the debug info linkage name field for the function) and
in -funique-internal-linkage-names (this came up in review discussion on
D98799) where K&R static declarations would not get the uniqued linkage
names.
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.
Differential Revision: https://reviews.llvm.org/D105395
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.
Reviewed By: aaron.ballman, kpn
Differential Revision: https://reviews.llvm.org/D100118
functions implicitly generated by the compiler
These fake functions would cause clang to crash if the changes proposed
in https://reviews.llvm.org/D98799 were made.
the call already has the operand bundle
This bug was causing the call to `replaceAllUsesWith` to crash because
the old call instruction and the new call instruction were the same.
rdar://74957948
Differential Revision: https://reviews.llvm.org/D97824
Fix suggested by Yuanfang Chen:
Non-distinct debuginfo is attached to the function due to the undecorated declaration. Later, when seeing the function definition and `nodebug` attribute, the non-distinct debuginfo should be cleared.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D104777
According to AVR ABI (https://gcc.gnu.org/wiki/avr-gcc), returned struct value
within size 1-8 bytes should be returned directly (via register r18-r25), while
larger ones should be returned via an implicit struct pointer argument.
Reviewed By: dylanmckay
Differential Revision: https://reviews.llvm.org/D99237
C++ constructors/destructors need to go through a different constructor to construct a GlobalDecl object in order to retrieve their linkage type. This causes an assert failure in the default constructor of GlobalDecl. I'm chaning it to using the exsiting GlobalDecl object.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102356
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.
Reviewed By: aaron.ballman, kpn
Differential Revision: https://reviews.llvm.org/D100118
We need to mask the immediate to the width of a single vector
rather than 2 vectors. If we use the width of 2 vectors then
any shift larger than the length of 1 vector is going to overflow
the shuffle indices.
Fixes PR50895.
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102361
Note regarding C++ for OpenCL:
When compiling C++ for OpenCL, DW_LANG_C_plus_plus* is emitted.
There is no DWARF language code defined for C++ for OpenCL as of yet,
but DWARF issue 210514.1 has been raised to request one.
In the mean time, continuing to emit DW_LANG_C_plus_plus* for C++ for
OpenCL allows the potential to distinguish between C++ for OpenCL and
OpenCL C in !DICompileUnit nodes, whereas using DW_LANG_OpenCL for
C++ for OpenCL would prevent this.
This change therefore leaves C++ for OpenCL as-is.
Reviewed By: shchenz, Anastasia
Differential Revision: https://reviews.llvm.org/D104118
This is mostly a mechanical change, but a testcase that contains
parts of the StringRef class (clang/test/Analysis/llvm-conventions.cpp)
isn't touched.
copy/dispose helper functions
We found out that these fake functions would cause clang to crash if the
changes proposed in https://reviews.llvm.org/D98799 were made.
The original patch was reverted in f681fd927e
because debug locations were missing in the body of the block byref
helper functions. This patch fixes the bug by calling CreateArtificial
after the calls to StartFunction.
Differential Revision: https://reviews.llvm.org/D104082
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.
This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D97680
The codegen for simd constructs was affected by the presence (or
absence) of the 'monotonic' schedule modifier for worksharing
loops. The modifier is only intended to apply to the scheduling of
chunks for a thread, not iterations of a loop inside a chunk.
In addition, the monotonic modifier was applied to worksharing loops
by default if no schedule clause was present; the referenced part of
the OpenMP 4.5 spec in the code (section 2.7.1) only applies if the
user specified a schedule clause with a static kind but no modifier.
Without a user-specified schedule clause we should default to
nonmonotonic scheduling.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103793
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
Also:
- add driver test (fsanitize-use-after-return.c)
- add basic IR test (asan-use-after-return.cpp)
- (NFC) cleaned up logic for generating table of __asan_stack_malloc
depending on flag.
for issue: https://github.com/google/sanitizers/issues/1394
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D104076
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
Differential Revision: https://reviews.llvm.org/D103888
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.
-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.
Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.
Reviewed By: rsmith, qcolombet
Differential Revision: https://reviews.llvm.org/D103928
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D99459
Clang will create a global value put in constant memory if an aggregate value
is declared firstprivate in the target device. The symbol name only uses the
name of the firstprivate variable, so symbol name conflicts will occur if the
variable is allowed to have different types through templates. An example of
this behvaiour is shown in https://godbolt.org/z/EsMjYh47n. This patch adds the
mangled type name to the symbol to avoid such naming conflicts. This fixes
https://bugs.llvm.org/show_bug.cgi?id=50642.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103995
This renames the expression value categories from rvalue to prvalue,
keeping nomenclature consistent with C++11 onwards.
C++ has the most complicated taxonomy here, and every other language
only uses a subset of it, so it's less confusing to use the C++ names
consistently, and mentally remap to the C names when working on that
context (prvalue -> rvalue, no xvalues, etc).
Renames:
* VK_RValue -> VK_PRValue
* Expr::isRValue -> Expr::isPRValue
* SK_QualificationConversionRValue -> SK_QualificationConversionPRValue
* JSON AST Dumper Expression nodes value category: "rvalue" -> "prvalue"
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D103720
This implements the 'using enum maybe-qualified-enum-tag ;' part of
1099. It introduces a new 'UsingEnumDecl', subclassed from
'BaseUsingDecl'. Much of the diff is the boilerplate needed to get the
new class set up.
There is one case where we accept ill-formed, but I believe this is
merely an extended case of an existing bug, so consider it
orthogonal. AFAICT in class-scope the c++20 rule is that no 2 using
decls can bring in the same target decl ([namespace.udecl]/8). But we
already accept:
struct A { enum { a }; };
struct B : A { using A::a; };
struct C : B { using A::a;
using B::a; }; // same enumerator
this patch permits mixtures of 'using enum Bob;' and 'using Bob::member;' in the same way.
Differential Revision: https://reviews.llvm.org/D102241
vtbl itself is in default global address space. When clang emits
ctor, it gets a pointer to the vtbl field based on the this pointer,
then stores vtbl to the pointer.
Since this pointer can point to any address space (e.g. an object
created in stack), this pointer points to default address space, therefore
the pointer to vtbl field in this object should also be in default
address space.
Currently, clang incorrectly casts the pointer to vtbl field in this object
to global address space. This caused assertions in backend.
This patch fixes that by removing the incorrect addr space cast.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D103835
This fixes inconsistencies in the ms_abi.c testcase.
Also add a couple cases of missing double pointers in the windows part
of the testcase; the outcome of building that testcase on windows hasn't
changed, but the previous form of the test was imprecise (checking
for "%[[STRUCT_FOO]]*" when clang actually generates "%[[STRUCT_FOO]]**"),
which still used to match.
Ideally this would share code with the native Windows case, but
X86_64ABIInfo and WinX86_64ABIInfo aren't superclasses/subclasses of
each other so it's impractical, and the code to share currently only
consists of a couple lines.
Differential Revision: https://reviews.llvm.org/D103837
On x86_64 mingw, long doubles are always passed indirectly as
arguments (see an existing case in WinX86_64ABIInfo::classify);
generalize the existing code for reading varargs - any non-aggregate
type that is larger than 64 bits (which would be both long double
in mingw, and __int128) are passed indirectly too.
This makes reading varargs consistent with how they're passed,
fixing interop with both gcc and clang callers, for long double
and __int128.
Differential Revision: https://reviews.llvm.org/D103452
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
Size of type of the dependence flag changed from 1 to 4 bytes in clang.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.
Differential Revision: https://reviews.llvm.org/D97085
If the memory object is scalable type, we do not know the exact size of
it at compile time. Set the size of lifetime marker to unknown if the
object is scalable one.
Differential Revision: https://reviews.llvm.org/D102822
This is a pre-patch for adding using-enum support. It breaks out
the shadow decl handling of UsingDecl to a new intermediate base
class, BaseUsingDecl, altering the decl hierarchy to
def BaseUsing : DeclNode<Named, "", 1>;
def Using : DeclNode<BaseUsing>;
def UsingPack : DeclNode<Named>;
def UsingShadow : DeclNode<Named>;
def ConstructorUsingShadow : DeclNode<UsingShadow>;
Differential Revision: https://reviews.llvm.org/D101777
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.
Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.
Differential Revision: https://reviews.llvm.org/D103082
This fixes PR49198: Wrong usage of __dso_handle in user code leads to
a compiler crash.
When Init is an address of the global itself, we need to track it
across RAUW. Otherwise the initializer can be destroyed if the global
is replaced.
Differential Revision: https://reviews.llvm.org/D101156
This patch fixes a Windows -EHa crash induced by previous commit 797ad70152.
The crash was caused by "LifetimeMarker" scope (with option -O2) that should not be considered as SEH Scope.
This change also turns off -fasync-exceptions by default under -EHa option for now.
Differential Revision: https://reviews.llvm.org/D103664#2799944
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.
Differential Revision: https://reviews.llvm.org/D103646
The PreInits of a loop transformation (atm moment only tile) include the computation of the trip count. The trip count is needed by any loop-associated directives that consumes the transformation-generated loop. Hence, we must ensure that the PreInits of consumed loop transformations are emitted with the consuming directive.
This is done by addinging the inner loop transformation's PreInits to the outer loop-directive's PreInits. The outer loop-directive will consume the de-sugared AST such that the inner PreInits are not emitted twice. The PreInits of a loop transformation are still emitted directly if its generated loop(s) are not associated with another loop-associated directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D102180
This attribute applies to a using declaration, and permits importing a
declaration without knowing if that declaration exists. This is useful
for libc++ C wrapper headers that re-export declarations in std::, in
cases where the base C library doesn't provide all declarations.
This attribute was proposed in http://lists.llvm.org/pipermail/cfe-dev/2020-June/066038.html.
rdar://69313357
Differential Revision: https://reviews.llvm.org/D90188
This doesn't actually have any effect: we only call this code with
SequentiallyConsistent orderings. But delete it anyway for consistency
with other recent changes.
These actually can be automatically imported from another DLL. (This
works properly as long as the actual implementation of emutls is
linked dynamically from e.g. libgcc; if the implementation comes from
compiler-rt or a statically linked libgcc, it doesn't work as intended.)
This fixes PR50146 and https://github.com/msys2/MINGW-packages/issues/8706
(fixing calling std::call_once in a dynamically linked libstdc++);
since f731839584 the dso_local attribute
on the TLS variable affected the actual generated code for accessing
the emutls variable.
The dso_local attribute on the emutls variable made those accesses to
use 32 bit relative addressing in code, which requires runtime pseudo
relocations in the text section, and breaks entirely if the actual
other variable ends up loaded too far away in the virtual address
space.
Differential Revision: https://reviews.llvm.org/D102970
The original version of this was reverted, and @rjmcall provided some
advice to architect a new solution. This is that solution.
This implements a builtin to provide a unique name that is stable across
compilations of this TU for the purposes of implementing the library
component of the unnamed kernel feature of SYCL. It does this by
running the Itanium mangler with a few modifications.
Because it is somewhat common to wrap non-kernel-related lambdas in
macros that aren't present on the device (such as for logging), this
uniquely generates an ID for all lambdas involved in the naming of a
kernel. It uses the lambda-mangling number to do this, except replaces
this with its own number (starting at 10000 for readabililty reasons)
for lambdas used to name a kernel.
Additionally, this implements itself as constexpr with a slight catch:
if a name would be invalidated by the use of this lambda in a later
kernel invocation, it is diagnosed as an error (see the Sema tests).
Differential Revision: https://reviews.llvm.org/D103112
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.
Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.
Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035
Reviewed By: vitalybuka, morehouse
Differential Revision: https://reviews.llvm.org/D102772
When -gstrict-dwarf is specified, generate DW_TAG_rvalue_reference_type
at DWARF 4 or above
Reviewed By: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D100630
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
variables emitted on both host and device side with different addresses
when ODR-used by host function should not cause device side counter-part
to be force emitted.
This fixes the regression caused by https://reviews.llvm.org/D102237
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102801
It turns out we have not correctly supported exception spec all along in
Emscripten EH. Emscripten EH supports `throw()` but not `throw` with
types. See https://bugs.llvm.org/show_bug.cgi?id=50396.
Wasm EH also only supports `throw()` but not `throw` with types, and we
have been printing a warning message for the latter. This prints the
same warning message for `throw` with types when Emscripten EH is used,
or more precisely, when Wasm EH is not used. (So this will print the
warning messsage even when `-fno-exceptions` is used but I think that
should be fine. It's cumbersome to do a complilcated option checking in
CGException.cpp and options checkings are mostly done in elsewhere.)
Reviewed By: dschuff, kripken
Differential Revision: https://reviews.llvm.org/D102791
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.
This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.
PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.
Recording the address of every function with code coverage drastically
increases code size. Consider this program:
void foo();
void bar();
inline void inlineMe(int x) {
if (x > 0)
foo();
else
bar();
}
int getVal();
int main() { inlineMe(getVal()); }
With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.
One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.
Differential Revision: https://reviews.llvm.org/D102818
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.
GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).
This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project tiself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.
Example:
```
// a.c
int var;
int *addr() { return var; }
// old: cannot be interposed
movslq .Lvar$local(%rip), %rax
// new: can be interposed
movq var@GOTPCREL(%rip), %rax
movslq (%rax), %rax
```
The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102583
To bring D99599's implementation in line with the existing
PrintPassInstrumentation, and to fix a FIXME, add more customizability
to PrintPassInstrumentation.
Introduce three new options. The first takes over the existing
"-debug-pass-manager-verbose" cl::opt.
The second and third option are specific to -fdebug-pass-structure. They
allow indentation, and also don't print analysis queries.
To avoid more golden file tests than necessary, prune down the
-fdebug-pass-structure tests.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102196
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others. If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.
The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.htmlhttps://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
I wouldn't recommend writing code like the testcase; a function
parameter isn't atomic, so using an atomic type doesn't really make
sense. But it's valid, so clang shouldn't crash on it.
The code was assuming hasAggregateEvaluationKind(Ty) implies Ty is a
RecordType, which isn't true. Just use isRecordType() instead.
Differential Revision: https://reviews.llvm.org/D102015
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:
1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0
to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).
Address pr/47341 for aarch64.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen
Differential Revision: https://reviews.llvm.org/D100919
Adding lowering support for bitreverse.
Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D102397
`__block` variables used to be always stored on the head instead of stack.
D51564 allowed `__block` variables to the stored on the stack like normal
variablesif they not captured by any escaping block, but the debug-info
generation code wasn't made aware of it so we still unconditionally emit DWARF
expressions pointing to the heap.
This patch makes CGDebugInfo use the `EscapingByref` introduced in D51564 that
tracks whether the `__block` variable is actually on the heap. If it's stored on
the stack instead we just use the debug info we would generate for normal
variables instead.
Reviewed By: ahatanak, aprantl
Differential Revision: https://reviews.llvm.org/D99946
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.
<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>
"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. While "dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D100509
Support for Darwin's libsystem_m's vector functions has been added to
LLVM in 93a9a8a8d9.
This patch adds support for -fveclib=Darwin_libsystem_m to Clang.
Reviewed By: arphaman
Differential Revision: https://reviews.llvm.org/D102489
I've taken the following steps to add unwinding support from inline assembly:
1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:
```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
to label %exit unwind label %uexit
```
2.) Add Bitcode writing/reading support + LLVM-IR parsing.
3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.
4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.
5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.
6.) Don't allow unwinding callbr.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D95745
The original change was reverted because it was discovered
that clang mishandles thunks, and they receive wrong
attributes for their this/return types - the ones for the function
they will call, not the ones they have.
While i have tried to fix this in https://reviews.llvm.org/D100388
that patch has been up and stuck for a month now,
with little signs of progress.
So while it will be good to solve this for real,
for now we can simply avoid introducing the bug,
by not annotating this/return for thunks.
This reverts commit 6270b3a1ea,
relanding 0aa0458f14.
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.
While it would be good to fix this properly,
and keep annotating them on thunks,
i've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch is stuck for a month now.
So for now, as a stopgap measure, subj.
Original commit message:
In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have
mentioned our plans to make some of the incremental compilation facilities
available in llvm mainline.
This patch proposes a minimal version of a repl, clang-repl, which enables
interpreter-like interaction for C++. For instance:
./bin/clang-repl
clang-repl> int i = 42;
clang-repl> extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=42
clang-repl> quit
The patch allows very limited functionality, for example, it crashes on invalid
C++. The design of the proposed patch follows closely the design of cling. The
idea is to gather feedback and gradually evolve both clang-repl and cling to
what the community agrees upon.
The IncrementalParser class is responsible for driving the clang parser and
codegen and allows the compiler infrastructure to process more than one input.
Every input adds to the “ever-growing” translation unit. That model is enabled
by an IncrementalAction which prevents teardown when HandleTranslationUnit.
The IncrementalExecutor class hides some of the underlying implementation
details of the concrete JIT infrastructure. It exposes the minimal set of
functionality required by our incremental compiler/interpreter.
The Transaction class keeps track of the AST and the LLVM IR for each
incremental input. That tracking information will be later used to implement
error recovery.
The Interpreter class orchestrates the IncrementalParser and the
IncrementalExecutor to model interpreter-like behavior. It provides the public
API which can be used (in future) when using the interpreter library.
Differential revision: https://reviews.llvm.org/D96033
This reverts commit 44a4000181.
We are seeing build failures due to missing dependency to libSupport and
CMake Error at tools/clang/tools/clang-repl/cmake_install.cmake
file INSTALL cannot find
In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have
mentioned our plans to make some of the incremental compilation facilities
available in llvm mainline.
This patch proposes a minimal version of a repl, clang-repl, which enables
interpreter-like interaction for C++. For instance:
./bin/clang-repl
clang-repl> int i = 42;
clang-repl> extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=42
clang-repl> quit
The patch allows very limited functionality, for example, it crashes on invalid
C++. The design of the proposed patch follows closely the design of cling. The
idea is to gather feedback and gradually evolve both clang-repl and cling to
what the community agrees upon.
The IncrementalParser class is responsible for driving the clang parser and
codegen and allows the compiler infrastructure to process more than one input.
Every input adds to the “ever-growing” translation unit. That model is enabled
by an IncrementalAction which prevents teardown when HandleTranslationUnit.
The IncrementalExecutor class hides some of the underlying implementation
details of the concrete JIT infrastructure. It exposes the minimal set of
functionality required by our incremental compiler/interpreter.
The Transaction class keeps track of the AST and the LLVM IR for each
incremental input. That tracking information will be later used to implement
error recovery.
The Interpreter class orchestrates the IncrementalParser and the
IncrementalExecutor to model interpreter-like behavior. It provides the public
API which can be used (in future) when using the interpreter library.
Differential revision: https://reviews.llvm.org/D96033
Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:
https://godbolt.org/z/fneEfferY
This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.
It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102237
Analogously to https://reviews.llvm.org/D98794 this patch uses the
`alignstack` attribute to fix incorrect passing of homogeneous
aggregate (HA) arguments on AArch32. The EABI/AAPCS was recently
updated to clarify how VFP co-processor candidates are aligned:
4488e34998
Differential Revision: https://reviews.llvm.org/D100853
Follow the more general patch for now, do not try to SPMDize the kernel
if the variable is used and local.
Differential Revision: https://reviews.llvm.org/D101911
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D102093
Commit 5baea05601 set the CurCodeDecl
because it was needed to pass the assert in CodeGenFunction::EmitLValueForLambdaField,
But this was not right to do as CodeGenFunction::FinishFunction passes it to EmitEndEHSpec
and cause corruption of the EHStack.
Revert the part of the commit that changes the CurCodeDecl, and instead
adjust the assert to check for a null CurCodeDecl.
Differential Revision: https://reviews.llvm.org/D102027
Follow up on 431e3138a and complete the other possible combinations.
Besides enforcing the new behavior, it also mitigates TSAN false positives when
combining orders that used to be stronger.
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.
I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.
Highlights:
- new diagnosis for restrictions specificed in the standard,
- delayed emission of globals not mentioned in an explicit
list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the
device,
- no explicit parsing of declarations in-between `omp [begin] declare
variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over
implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted
global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.
After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
Codegen for OpeMP copyin has non-deterministic IR output due to the unspecified evaluation order in a codegen conditional branch, which makes automatic test generation unreliable. This patch refactors codegen code to avoid this non-determinism.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101952
This implements the flag proposed in RFC
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.
The goal is to add a way to override the default target C++ ABI through a
compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.
In this patch:
- Store -fc++-abi= in a LangOpt. This isn't stored in a CodeGenOpt because
there are instances outside of codegen where Clang needs to know what the
ABI is (particularly through ASTContext::createCXXABI), and we should be
able to override the target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
through this flag.
- Create a .def file for these ABIs to make it easier to check flag values.
- Add an error for diagnosing bad ABI flag values.
Differential Revision: https://reviews.llvm.org/D85802
If a return value is explicitly rounded to 64 bits, an additional zext
instruction is emitted, and in some cases it prevents tail call
optimization.
As discussed in D100225, this rounding is not necessary and can be
disabled.
Differential Revision: https://reviews.llvm.org/D100591
Warn when a declaration uses an identifier that doesn't obey the reserved
identifier rule from C and/or C++.
Differential Revision: https://reviews.llvm.org/D93095
The use of llvm::sort causes periodic failures on the bot with EXPENSIVE_CHECKS enabled,
as the regular sort pre-shuffles the array in the expensive checks mode, leading to a
non-deterministic test result which causes the CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp
testcase to fail on the bot (http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/).
Add function to create the offload_maptypes and the offload_mapnames globals. These two functions
are used in clang. They will be used in the Flang/MLIR lowering as well.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D101503
In matrix type casts, we were doing bitcast when the matrices had the same size. This was incorrect and this patch fixes that.
Also added some new CodeGen tests for signed <-> usigned conversions
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D101754
AMDGPU backend need to know whether floating point opcodes that support exception
flag gathering quiet and propagate signaling NaN inputs per IEEE754-2008, which is
conveyed by a function attribute "amdgpu-ieee". "amdgpu-ieee"="false" turns this off.
Without this function attribute backend assumes it is on for compute functions.
-mamdgpu-ieee and -mno-amdgpu-ieee are added to Clang to control this function attribute.
By default it is on. -mno-amdgpu-ieee requires -fno-honor-nans or equivalent.
Reviewed by: Matt Arsenault
Differential Revision: https://reviews.llvm.org/D77013
This adds the long overdue implementations of these functions
that have been part of the ABI document and are now part of
the "Power Vector Intrinsic Programming Reference" (PVIPR).
The approach is to add new builtins and to emit code with
the fast flag regardless of whether fastmath was specified
on the command line.
Differential revision: https://reviews.llvm.org/D101209
This patch fixes a bug from D89802. For example, without it, Clang
generates x as the debug map name for both x and y in the following
example:
```
#pragma omp target map(to: x, y)
x = y = 1;
```
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D101564
Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).
This allows us to simplify the code dealing with adding mustprogress a
bit.
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D96418
isn't an ExprWithCleanups
This patch fixes a bug where a temporary ObjC pointer is released before
the end of the full expression.
This fixes PR50043.
rdar://77030453
Differential Revision: https://reviews.llvm.org/D101502
This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.
Co-author: @anchu-rajendran
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91054
The Neon vadd intrinsics were added to the ARMSIMD intrinsic map,
however due to being defined under an AArch64 guard in arm_neon.td,
were not previously useable on ARM. This change rectifies that.
It is important to note that poly128 is not valid on ARM, thus it was
extracted out of the original arm_neon.td definition and separated
for the sake of AArch64.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D100772
Reverts parts of https://reviews.llvm.org/D17183, but keeps the
resetDataLayout() API and adds an assert that checks that datalayout string and
user label prefix are in sync.
Approach 1 in https://reviews.llvm.org/D17183#2653279
Reduces number of TUs build for 'clang-format' from 689 to 575.
I also implemented approach 2 in D100764. If someone feels motivated
to make us use DataLayout more, it's easy to revert this change here
and go with D100764 instead. I don't plan on doing more work in this
area though, so I prefer going with the smaller, more self-consistent change.
Differential Revision: https://reviews.llvm.org/D100776
Commit e3d8ee35e4 ("reland "[DebugInfo] Support to emit debugInfo
for extern variables"") added support to emit debugInfo for
extern variables if requested by the target. Currently, only
BPF target enables this feature by default.
As BPF ecosystem grows, callback function started to get
support, e.g., recently bpf_for_each_map_elem() is introduced
(https://lwn.net/Articles/846504/) with a callback function as an
argument. In the future we may have something like below as
a demonstration of use case :
extern int do_work(int);
long bpf_helper(void *callback_fn, void *callback_ctx, ...);
long prog_main() {
struct { ... } ctx = { ... };
return bpf_helper(&do_work, &ctx, ...);
}
Basically bpf helper may have a callback function and the
callback function is defined in another file or in the kernel.
In this case, we would like to know the debuginfo types for
do_work(), so the verifier can proper verify the safety of
bpf_helper() call.
For the following example,
extern int do_work(int);
long bpf_helper(void *callback_fn);
long prog() {
return bpf_helper(&do_work);
}
Currently, there is no debuginfo generated for extern function do_work().
In the IR, we have,
...
define dso_local i64 @prog() local_unnamed_addr #0 !dbg !7 {
entry:
%call = tail call i64 @bpf_helper(i8* bitcast (i32 (i32)* @do_work to i8*)) #2, !dbg !11
ret i64 %call, !dbg !12
}
...
declare dso_local i32 @do_work(i32) #1
...
This patch added support for the above callback function use case, and
the generated IR looks like below:
...
declare !dbg !17 dso_local i32 @do_work(i32) #1
...
!17 = !DISubprogram(name: "do_work", scope: !1, file: !1, line: 1, type: !18, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !2)
!18 = !DISubroutineType(types: !19)
!19 = !{!20, !20}
!20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
The TargetInfo.allowDebugInfoForExternalVar is renamed to
TargetInfo.allowDebugInfoForExternalRef as now it guards
both extern variable and extern function debuginfo generation.
Differential Revision: https://reviews.llvm.org/D100567
Default address space (applies when no explicit address space was
specified) maps to generic (4) address space.
Added SYCL named address spaces `sycl_global`, `sycl_local` and
`sycl_private` defined as sub-sets of the default address space.
Static variables without address space now reside in global address
space when compile for SPIR target, unless they have an explicit address
space qualifier in source code.
Differential Revision: https://reviews.llvm.org/D89909
Adds new intrinsics for instructions that are in the final SIMD spec but did not
previously have intrinsics. Also updates the names of existing intrinsics to
reflect the final names of the underlying instructions in the spec. Keeps the
old names as deprecated functions to ease the transition to the new names.
Differential Revision: https://reviews.llvm.org/D101112
LLVM should be smarter about *known* malloc's alignment and this knowledge may enable other optimizations.
Originally started as LLVM patch - https://reviews.llvm.org/D100862 but this logic should be really in Clang.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D100879
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
From https://bugs.llvm.org/show_bug.cgi?id=49739:
Currently, `#pragma clang fp` are ignored for matrix types.
For the code below, the `contract` fast-math flag should be added to the generated call to `llvm.matrix.multiply` and `fadd`
```
typedef float fx2x2_t __attribute__((matrix_type(2, 2)));
void foo(fx2x2_t &A, fx2x2_t &C, fx2x2_t &B) {
#pragma clang fp contract(fast)
C = A*B + C;
}
```
Reviewed By: fhahn, mibintc
Differential Revision: https://reviews.llvm.org/D100834
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
The implicitly generated mappings for allocation/deallocation in mappers
runtime should be mapped as implicit, also no need to clear member_of
flag to avoid ref counter increment. Also, the ref counter should not be
incremented for the very first element that comes from the mapper
function.
Differential Revision: https://reviews.llvm.org/D100673
As reported in PR50025, sometimes we would end up not emitting functions
needed by inline multiversioned variants. This is because we typically
use the 'deferred decl' mechanism to emit these. However, the variants
are emitted after that typically happens. This fixes that by ensuring
we re-run deferred decls after this happens. Also, the multiversion
emission is done recursively to ensure that MV functions that require
other MV functions to be emitted get emitted.
This reverts commit fa6b54c44a.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920
To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.
Reviewed By: rjmccall, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D100282
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920
Differential Revision: https://reviews.llvm.org/D100282
Add device variables to llvm.compiler.used if they are
ODR-used by either host or device functions.
This is necessary to prevent them from being
eliminated by whole-program optimization
where the compiler has no way to know a device
variable is used by some host code.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D98814
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalize the check, with the following
behavior:
no attributes or attribute set to "false" => return false
attribute set to "true" => return true
Differential Revision: https://reviews.llvm.org/D99299
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.
Differential Revision: https://reviews.llvm.org/D100596
For combined worksharing directives need to emit the temp arrays outside
of the parallel region and update them in the master thread only.
Differential Revision: https://reviews.llvm.org/D100187
This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.
The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.
Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.
Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D99517
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794
The documentation says that for variadic functions, all composites
are treated similarly, no special handling of HFAs/HVAs, not even
for the fixed arguments of a variadic function.
Differential Revision: https://reviews.llvm.org/D100467
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.
Differential Revision: https://reviews.llvm.org/D100402
Add a custom DAG combine and ISD opcode for detecting patterns like
(uint_to_fp (extract_subvector ...))
before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.
Differential Revision: https://reviews.llvm.org/D100425
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.
For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.
Differential Revision: https://reviews.llvm.org/D100411
Aggregate types over 16 bytes are passed by reference.
Contrary to the x86_64 ABI, smaller structs with an odd (non power
of two) are padded and passed in registers.
Differential Revision: https://reviews.llvm.org/D100374
According to i386 System V ABI:
1. when __m256 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 32 byte boundary at the time of the call.
2. when __m512 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 64 byte boundary at the time of the call.
The current method of clang passing __m512 parameter are as follow:
1. when target supports avx512, passing it with 64 byte alignment;
2. when target supports avx, passing it with 32 byte alignment;
3. Otherwise, passing it with 16 byte alignment.
Passing __m256 parameter are as follow:
1. when target supports avx or avx512, passing it with 32 byte alignment;
2. Otherwise, passing it with 16 byte alignment.
This pach will passing __m128/__m256/__m512 following i386 System V ABI and
apply it to Linux only since other System V OS (e.g Darwin, PS4 and FreeBSD) don't
want to spend any effort dealing with the ramifications of ABI breaks at present.
Differential Revision: https://reviews.llvm.org/D78564
The existing Windows Itanium patches for dllimport/export
behaviour w.r.t vtables/rtti can't be adopted for PS4 due to
backwards compatibility reasons (see comments on
https://reviews.llvm.org/D90299).
This commit adds our PS4 scheme for this to Clang.
Differential Revision: https://reviews.llvm.org/D93203
Summary: The tags DW_LANG_C_plus_plus_14 and DW_LANG_C_plus_plus_11, introduced in Dwarf-5, are unexpected in previous versions. Fixing the mismathing doesn't have any drawbacks for any other debuggers, but helps dbx.
Reviewed By: aprantl, shchenz
Differential Revision: https://reviews.llvm.org/D99250
The first one is the real parameters of the coroutine function, the
other one just for copying parameters to the coroutine frame.
Considering the following c++ code:
```
struct coro {
...
};
coro foo(struct test & t) {
...
co_await suspend_always();
...
co_await suspend_always();
...
co_await suspend_always();
}
int main(int argc, char *argv[]) {
auto c = foo(...);
c.handle.resume();
...
}
```
Function foo is the standard coroutine function, and it has only
one parameter named t (ignoring this at first),
when we use the llvm code to compile this function, we can get the
following ir:
```
!2921 = distinct !DISubprogram(name: "foo", linkageName:
"_ZN6Object3fooE4test", scope: !2211, file: !45, li\
ne: 48, type: !2329, scopeLine: 48, flags: DIFlagPrototyped |
DIFlagAllCallsDescribed, spFlags: DISPFlagDefi\
nition | DISPFlagOptimized, unit: !44, declaration: !2328,
retainedNodes: !2922)
!2924 = !DILocalVariable(name: "t", arg: 2, scope: !2921, file: !45,
line: 48, type: !838)
...
!2926 = !DILocalVariable(name: "t", scope: !2921, type: !838, flags:
DIFlagArtificial)
```
We can find there are two `the same` DIVariable named t in the same
dwarf scope for foo.resume.
And when we try to use llvm-dwarfdump to dump the dwarf info of this
elf, we get the following output:
```
0x00006684: DW_TAG_subprogram
DW_AT_low_pc (0x00000000004013a0)
DW_AT_high_pc (0x00000000004013a8)
DW_AT_frame_base (DW_OP_reg7 RSP)
DW_AT_object_pointer (0x0000669c)
DW_AT_GNU_all_call_sites (true)
DW_AT_specification (0x00005b5c "_ZN6Object3fooE4test")
0x000066a5: DW_TAG_formal_parameter
DW_AT_name ("t")
DW_AT_decl_file ("/disk1/yifeng.dongyifeng/my_code/llvm/build/bin/coro-debug-1.cpp")
DW_AT_decl_line (48)
DW_AT_type (0x00004146 "test")
0x000066ba: DW_TAG_variable
DW_AT_name ("t")
DW_AT_type (0x00004146 "test")
DW_AT_artificial (true)
```
The elf also has two 't' in the same scope.
But unluckily, it might let the debugger
confused. And failed to print parameters for O0 or above.
This patch will make coroutine parameters and move
parameters use the same DIVar and try to fix the problems
that I mentioned before.
Test Plan: check-clang
Reviewed By: aprantl, jmorse
Differential Revision: https://reviews.llvm.org/D97533
This implements C-style type conversions for matrix types, as specified
in clang/docs/MatrixTypes.rst.
Fixes PR47141.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D99037
This ensures these types have distinct names if they are distinct types
(eg: if one is an instantiation with a type in one inline namespace, and
another from a type with the same simple name, but in a different inline
namespace).
As it is being noted in D99249, lack of alignment information on `this`
has been preventing LICM from happening.
For some time now, lack of alignment attribute does *not* imply
natural alignment, but an alignment of `1`.
Also, we used to treat dereferenceable as implying alignment,
but we no longer do, so it's a bugfix.
Differential Revision: https://reviews.llvm.org/D99790
Recently atomicrmw started to support fadd/fsub:
https://reviews.llvm.org/D53965
However clang atomic builtins fetch add/sub still does not support
emitting atomicrmw fadd/fsub.
This patch adds that.
Reviewed by: John McCall, Artem Belevich, Matt Arsenault, JF Bastien,
James Y Knight, Louis Dionne, Olivier Giroux
Differential Revision: https://reviews.llvm.org/D71726
The reason for the NewPM redesign is described in the commit
cba3e783389a: [NewPM] Disable PreservedCFGChecker ...
The checker introduces an internal custom CFG analysis that tracks
current up-to date CFG snapshot. The analysis is invalidated along
any other CFG related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a functional pass exit then the checker
asserts that the CFG snapshot taken from this analysis is equals to
a snapshot of the current CFG.
Along the way:
- the function CFG::printDiff() is simplified by removing function
name calculation. The name is printed by the caller;
- fixed CFG invalidated condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
optional parameter of type FunctionAnalysisManager*, which is
needed by the checker to get the custom CFG analysis;
- several PM related tests updated to explicitly set
-verify-cfg-preserved=1 as they need.
This patch is safe to land as the CFGChecker is left switched off
(the options -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.
Reviewed By: skatkov, kuhar
Differential Revision: https://reviews.llvm.org/D91327
These all pass 1 type to getIntrinsic. So rather than assigning
IntrinsicTypes for each builtin which invokes the SmallVector
constructor, just select the intrinsic ID with a switch and
share a single assignment of IntrinsicTypes.
Head files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
clmul
clmulh
clmulr
Differential Revision: https://reviews.llvm.org/D99711
Forgot to amend the Author.
Original commit message:
Header files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
orc.b
Differential Revision: https://reviews.llvm.org/D99320
Implementation for RISC-V Zbr extension intrinsic.
Header files are included in separate patch in case the name needs to be changed
RV32 / 64:
crc32b
crc32h
crc32w
crc32cb
crc32ch
crc32cw
RV64 Only:
crc32d
crc32cd
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99009
Summary:
Currently the mapping names are not passed to the mapper components that set up
the array region. This means array mappings will not have their names availible
in the runtime. This patch fixes this by passing the argument name to the region
correctly. This means that the mapped variable's name will be the declared
mapper that placed it on the device.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D99681
Calling `ParseCommandLineOptions` should only be called from `main` as the
CommandLine setup code isn't thread-safe. As BackendUtil is part of the
generic Clang FrontendAction logic, a process which has several threads executing
Clang FrontendActions will randomly crash in the unsafe setup code.
This patch avoids calling the function unless either the debug-pass option or
limit-float-precision option is set. Without these two options set the
`ParseCommandLineOptions` call doesn't do anything beside parsing
the command line `clang` which doesn't set any options.
See also D99652 where LLDB received a workaround for this crash.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D99740
Need to bitcast the function pointer passed as a parameter to the real
type to avoid possible problem with calling conventions.
Differential Revision: https://reviews.llvm.org/D99521
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.
Differential Revision: https://reviews.llvm.org/D99623
another one for distributed mode.
Currently during module importing, ThinLTO opens all the source modules,
collect functions to be imported and append them to the destination module,
then leave all the modules open through out the lto backend pipeline. This
patch refactors it in the way that one source module will be closed before
another source module is opened. All the source modules will be closed after
importing phase is done. It will save some amount of memory when there are
many source modules to be imported.
Note that this patch only changes the distributed thinlto mode. For in
process thinlto mode, one source module is shared acorss different thinlto
backend threads so it is not changed in this patch.
Differential Revision: https://reviews.llvm.org/D99554
Need to cast the argument for the debug wrapper function call to the
corresponding parameter type to avoid crash.
Differential Revision: https://reviews.llvm.org/D99617
If no initializer-clause is specified, the private variables will be
initialized following the rules for initialization of objects with static
storage duration.
Need to adjust the implementation to the current version of the
standard.
Differential Revision: https://reviews.llvm.org/D99539
Since the introduction of class properties in Objective-C it is possible to declare a class and an instance
property with the same identifier in an interface/protocol.
Right now Clang just generates debug information for whatever property comes first in the source file.
The second property is ignored as it's filtered out by the set of already emitted properties (which is just
using the identifier of the property to check for equivalence). I don't think generating debug info in this case
was never supported as the identifier filter is in place since 7123bca7fb
(which precedes the introduction of class properties).
This patch expands the filter to take in account identifier + whether the property is class/instance. This
ensures that both properties are emitted in this special situation.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99512
The `noinline` for non-SPMD parallel functions is probably not necessary
but as long as we use it we should put it on the outermost parallel
function, which is the wrapper, not the actual outlined function.
Resolves PR49752
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D99506
The original issue is caused by the fact that the variable is allocated
with incorrect type i1 instead of i8. This causes the bitcasting of the
declaration to i8 type and the bitcast expression does not match the
original variable.
To fix the problem, the UndefValue initializer and the original
variable should be emitted with type i8, not i1.
Differential Revision: https://reviews.llvm.org/D99297
Need to insert a basic block during generation of the target region to
avoid crash for the GPU to be able always calling a cleanup action.
This cleanup action is required for the correct emission of the target
region for the GPU.
Differential Revision: https://reviews.llvm.org/D99445
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
When emitting a function body there needs to be a instr profiling counter emitted. Otherwise instr profiling won't work for this function.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D98135
tl;dr Correct implementation of Corouintes requires having lifetime intrinsics available.
Coroutine functions are functions that can be suspended and resumed latter. To do so, data that need to stay alive after suspension must be put on the heap (i.e. the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figure out whether it should be put on the stack or the frame.
In most cases, for data that we are unable to accurately analyze lifetime, we can just conservatively put them on the heap.
Unfortunately, there exists a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze those data's lifetime.
To dig into more details, there exists cases where at certain code points, the current coroutine frame may have already been destroyed. Hence no frame access would be allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutine:
```
auto tmp = await_suspend();
__builtin_coro_resume(tmp.address());
return;
```
In the above code example, `await_suspend()` returns a new coroutine handle, which we will obtain the address and then resume that coroutine. This essentially "transfered" from the current coroutine to a different coroutine.
During the call to `await_suspend()`, the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards.
However when LLVM is emitting IR for the above code, it needs to emit an AllocaInst for `tmp`. It will then call the `address` function on tmp. `address` function is a member function of coroutine, and there is no way for the LLVM optimizer to know that it does not capture the `tmp` pointer. So when the optimizer looks at it, it has to conservatively assume that `tmp` may escape and hence put it on the heap. Furthermore, in some cases `address` call would be inlined, which will generate a bunch of store/load instructions that move the `tmp` pointer around. Those stores will also make the compiler to think that `tmp` might escape.
To summarize, it's really difficult for the mid-end to figure out that the `tmp` data is short-lived.
I made some attempt in D98638, but it appears to be way too complex and is basically doing the same thing as inserting lifetime intrinsics in coroutines.
Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893
Differential Revision: https://reviews.llvm.org/D99227
Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be used to get symbol address of kernels
or variables by mangled name in dynamically loaded
bundled code objects at run time.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D99301
In order to test the preservation of the original Debug Info metadata
in your projects, a front end option could be very useful, since users
usually report that a concrete entity (e.g. variable x, or function fn2())
is missing debug info. The [0] is an example of running the utility
on GDB Project.
This depends on: D82546 and D82545.
Differential Revision: https://reviews.llvm.org/D82547
If emit inlined region for master/critical directives, no need to clear
lambda/block context data, otherwise the variables cannot be found and
it causes a crash at compile time.
Differential Revision: https://reviews.llvm.org/D99280
There are a number of functions in altivec.h that use
vector __int128 which isn't supported on AIX. Those functions
need to be guarded for targets that don't support the type.
Furthermore, the functions that produce quadword instructions
without using the type need a builtin. This patch adds the
macro guards to altivec.h using the __SIZEOF_INT128__ which
is only defined on targets that support the __int128 type.
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.
Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.
Differential Revision: https://reviews.llvm.org/D98030
08196e0b2e exposed LowerExpectIntrinsic's
internal implementation detail in the form of
LikelyBranchWeight/UnlikelyBranchWeight options to the outside.
While this isn't incorrect from the results viewpoint,
this is suboptimal from the layering viewpoint,
and causes confusion - should transforms also use those weights,
or should they use something else, D98898?
So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight
internal again, and fixing all the code that used it directly,
which currently is only clang codegen, thankfully,
to emit proper @llvm.expect intrinsics instead.
Upon reviewing D98898 i've come to realization that these are
implementation detail of LowerExpectIntrinsicPass,
and they should not be exposed to outside of it.
This reverts commit ee8b53815d.
This makes the settings available for use in other passes by housing
them within the Support lib, but NFC otherwise.
See D98898 for the proposed usage in SimplifyCFG
(where this change was originally included).
Differential Revision: https://reviews.llvm.org/D98945
C functions may be declared and defined in different prototypes like below. This patch unifies the checks for mangling names in symbol linkage name emission and debug linkage name emission so that the two names are consistent.
static int go(int);
static int go(a) int a;
{
return a;
}
Test Plan:
Differential Revision: https://reviews.llvm.org/D98799
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
This reverts commit 809a1e0ffd.
Mach-O doesn't support dso_local and this change broke XNU because of the use of dso_local.
Differential Revision: https://reviews.llvm.org/D98458
The condition variable is in scope in the loop increment, so we need to
emit the jump destination from wthin the scope of the condition
variable.
For GCC compatibility (and compatibility with real-world 'FOR_EACH'
macros), 'continue' is permitted in a statement expression within the
condition of a for loop, though, so there are two cases here:
* If the for loop has no condition variable, we can emit the jump
destination before emitting the condition.
* If the for loop has a condition variable, we must defer emitting the
jump destination until after emitting the variable. We diagnose a
'continue' appearing in the initializer of the condition variable,
because it would jump past the initializer into the scope of that
variable.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D98816
Added basic parsing/sema/serialization support for interop directive.
Support for the 'init' clause.
Differential Revision: https://reviews.llvm.org/D98558
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
Differential Revision: https://reviews.llvm.org/D98487
The idiom:
```
DeclContext::lookup_result R = DeclContext::lookup(Name);
for (auto *D : R) {...}
```
is not safe when in the loop body we trigger deserialization from an AST file.
The deserialization can insert new declarations in the StoredDeclsList whose
underlying type is a vector. When the vector decides to reallocate its storage
the pointer we hold becomes invalid.
This patch replaces a SmallVector with an singly-linked list. The current
approach stores a SmallVector<NamedDecl*, 4> which is around 8 pointers.
The linked list is 3, 5, or 7. We do better in terms of memory usage for small
cases (and worse in terms of locality -- the linked list entries won't be near
each other, but will be near their corresponding declarations, and we were going
to fetch those memory pages anyway). For larger cases: the vector uses a
doubling strategy for reallocation, so will generally be between half-full and
full. Let's say it's 75% full on average, so there's N * 4/3 + 4 pointers' worth
of space allocated currently and will be 2N pointers with the linked list. So we
break even when there are N=6 entries and slightly lose in terms of memory usage
after that. We suspect that's still a win on average.
Thanks to @rsmith!
Differential revision: https://reviews.llvm.org/D91524
`CodeGenFunction::EmitRuntimeCall` automatically sets the right calling
convention for the callee so we can avoid setting it ourselves.
As requested in https://reviews.llvm.org/D98411
Reviewed by: anastasia
Differential Revision: https://reviews.llvm.org/D98705
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.
Differential Revision: https://reviews.llvm.org/D98472
Recognize BI__builtin_isinf and BI__builtin_isfinite (and a few other opcodes
for finite) in testFPKind() and handle with TDC.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D97901
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.
These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.
The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838
Differential Revision: https://reviews.llvm.org/D98264
Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
`__translate_sampler_initializer` has a calling convention of
`spir_func`, but clang generated calls to it using the default CC.
Instruction Combining was lowering these mismatching calling conventions
to `store i1* undef` which itself was subsequently lowered to a trap
instruction by simplifyCFG resulting in runtime `SIGILL`
There are arguably two bugs here: but whether there's any wisdom in
converting an obviously invalid call into a runtime crash over aborting
with a sensible error message will require further discussion. So for
now it's enough to set the right calling convention on the runtime
helper.
Reviewed By: svenh, bader
Differential Revision: https://reviews.llvm.org/D98411
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
In order to prevent further building issues related to the usage of SmallVector
in other compilation unit, this patch adjusts the llvm.h header as a workaround
instead.
Besides, this patch reverts previous workarounds:
1. Revert "[NFC] Use llvm::SmallVector to workaround XL compiler problem on AIX"
This reverts commit 561fb7f60a.
2.Revert "[clang][cli] Fix build failure in CompilerInvocation"
This reverts commit 8dc70bdcd0.
Differential Revision: https://reviews.llvm.org/D98552
This was motivated by the fact that constructor type homing (debug info
optimization that we want to turn on by default) drops some libc++ types,
so an attribute would allow us to override constructor homing and emit
them anyway. I'm currently looking into the particular libc++ issue, but
even if we do fix that, this issue might come up elsewhere and it might be
nice to have this.
As I've implemented it now, the attribute isn't specific to the
constructor homing optimization and overrides all of the debug info
optimizations.
Open to discussion about naming, specifics on what the attribute should do, etc.
Differential Revision: https://reviews.llvm.org/D97411
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
D96109 added support for unique internal linkage names for both internal
linkage functions and global variables. There was a lot of discussion on how to
get the demangling right for functions but I completely missed the point that
demanglers do not support suffixes for global vars. For example:
$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123
The demangling for functions works as expected.
I am not sure of the impact of this. I don't understand how debuggers and other
tools depend on the correctness of global variable demangling so I am
pre-emptively disabling it until we can get the demangling support added.
Importantly, uniquefying global variables is not needed right now as we do not
do profile attribution to global vars based on sampling. It was added for
completeness and so this feature is not exactly missed.
Differential Revision: https://reviews.llvm.org/D98392
This patch extends the matrix spec to allow matrix-by-scalar division.
Originally support for `/` was left out to avoid ambiguity for the
matrix-matrix version of `/`, which could either be elementwise or
specified as matrix multiplication M1 * (1/M2).
For the matrix-scalar version, no ambiguity exists; `*` is also
an elementwise operation in that case. Matrix-by-scalar division
is commonly supported by systems including Matlab, Mathematica
or NumPy.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97857
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.
Reviewed By: jdoerfert cchen
Differential Revision: https://reviews.llvm.org/D98358
These are incompatible with opaque pointers. This is in preparation
of dropping this API on the IRBuilder side as well.
Instead explicitly pass the loaded type.
Commit 1b04bdc2f3 added support for
capturing the 'this' pointer in a SEH context (__finally or __except),
But the case in which the 'this' pointer is part of a lambda capture
was not handled properly
Differential Revision: https://reviews.llvm.org/D97687
Demonstrate how to generate vadd/vfadd intrinsic functions
1. add -gen-riscv-vector-builtins for clang builtins.
2. add -gen-riscv-vector-builtin-codegen for clang codegen.
3. add -gen-riscv-vector-header for riscv_vector.h. It also generates
ifdef directives with extension checking, base on D94403.
4. add -gen-riscv-vector-generic-header for riscv_vector_generic.h.
Generate overloading version Header for generic api.
https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#c11-generic-interface
5. update tblgen doc for riscv related options.
riscv_vector.td also defines some unused type transformers for vadd,
because I think it could demonstrate how tranfer type work and we need
them for the whole intrinsic functions implementation in the future.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: jrtc27, craig.topper, HsiangKai, Jim, Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D95016
It's available both in CodeGenOptions and in LangOptions, and LangOptions
implementation is slightly better as it uses a StringRef instead of a char
pointer, so use it.
Differential Revision: https://reviews.llvm.org/D98175
LLVM is recommending to use SmallVector (that is, omitting the N), in the
absence of a well-motivated choice for the number of inlined elements N.
However, this doesn't work well with XL compiler on AIX since some header(s)
aren't properly picked up with it. We need to take a further look into the real
issue underneath and fix it in a later patch.
But currently we'd like to use this patch to unblock the build compiler issue
first.
Differential Revision: https://reviews.llvm.org/D98265
SUMMARY:
n the patch https://reviews.llvm.org/D87451 "add new option -mignore-xcoff-visibility"
we did as "The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR."
in these patch we let -mignore-xcoff-visibility effect on generating IR too. the new feature only work on AIX OS
Reviewer: Jason Liu,
Differential Revision: https://reviews.llvm.org/D89986
This is the first patch supporting M68k in Clang
- Register M68k as a target
- Target specific CodeGen support
- Target specific attribute support
Authors: myhsu, m4yers, glaubitz
Differential Revision: https://reviews.llvm.org/D88393
The option -funique-internal-linkage-names was added in D73307 and D78243 as a
LLVM early pass to insert a unique suffix to internal linkage functions and
vars. The unique suffix was the hash of the module path. However, we found
that this can be done more cleanly in clang early and the fixes that need to
be done later can be completely avoided. The fixes in particular are trying
to modify the DW_AT_linkage_name and finding the right place to insert the
pass.
This patch ressurects the original implementation proposed in D73307 which
was reviewed and then ditched in favor of the pass based approach.
Differential Revision: https://reviews.llvm.org/D96109
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
* Only the worksharing-loop directive.
* Recognizes only the nowait clause.
* No loop nests with more than one loop.
* Untested with templates, exceptions.
* Semantic checking left to the existing infrastructure.
This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
* The distance function: How many loop iterations there will be before entering the loop nest.
* The loop variable function: Conversion from a logical iteration number to the loop variable.
These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.
The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.
For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.
Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D94973
Background:
Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses
Itanium-based libraries and ABIs with some modifications.
`__clang_call_terminate` is a wrapper generated in Clang's Itanium C++
ABI implementation. It contains this code, in C-style pseudocode:
```
void __clang_call_terminate(void *exn) {
__cxa_begin_catch(exn);
std::terminate();
}
```
So this function is a wrapper to call `__cxa_begin_catch` on the
exception pointer before termination.
In Itanium ABI, this function is called when another exception is thrown
while processing an exception. The pointer for this second, violating
exception is passed as the argument of this `__clang_call_terminate`,
which calls `__cxa_begin_catch` with that pointer and calls
`std::terminate` to terminate the program.
The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch`
says,
```
When the personality routine encounters a termination condition, it
will call __cxa_begin_catch() to mark the exception as handled and then
call terminate(), which shall not return to its caller.
```
In wasm EH's Clang implementation, this function is called from
cleanuppads that terminates the program, which we also call terminate
pads. Cleanuppads normally don't access the thrown exception and the
wasm backend converts them to `catch_all` blocks. But because we need
the exception pointer in this cleanuppad, we generate
`wasm.get.exception` intrinsic (which will eventually be lowered to
`catch` instruction) as we do in the catchpads. But because terminate
pads are cleanup pads and should run even when a foreign exception is
thrown, so what we have been doing is:
1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure
terminate pads are in this simple shape:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
```
2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the
pipeline, we attach a `catch_all` to terminate pads, so they will be in
this form:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
catch_all
call @std::terminate()
unreachable
```
In `catch_all` part, we don't have the exception pointer, so we call
`std::terminate()` directly. The reason we ran HandleEHTerminatePads at
the end of the pipeline, separate from LateEHPrepare, was it was
convenient to assume there was only a single `catch` part per `try`
during CFGSort and CFGStackify.
---
Problem:
While it thinks terminate pads could have been possibly split or calls
to `__clang_call_terminate` could have been duplicated,
`WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate
pads contain no more than calls to `__clang_call_terminate` and
`unreachable` instruction. I assumed that because in LLVM very limited
forms of transformations are done to catchpads and cleanuppads to
maintain the scoping structure. But it turned out to be incorrect;
passes can merge cleanuppads into one, including terminate pads, as long
as the new code has a correct scoping structure. One pass that does this
I observed was `SimplifyCFG`, but there can be more. After this
transformation, a single cleanuppad can contain any number of other
instructions with the call to `__clang_call_terminate` and can span many
BBs. It wouldn't be practical to duplicate all these BBs within the
cleanuppad to generate the equivalent `catch_all` blocks, only with
calls to `__clang_call_terminate` replaced by calls to `std::terminate`.
Unless we do more complicated transformation to split those calls to
`__clang_call_terminate` into a separate cleanuppad, it is tricky to
solve.
---
Solution (?):
This CL just disables the generation and use of `__clang_call_terminate`
and calls `std::terminate()` directly in its place.
The possible downside of this approach can be, because the Itanium ABI
intended to "mark" the violating exception handled, we don't do that
anymore. What `__cxa_begin_catch` actually does is increment the
exception's handler count and decrement the uncaught exception count,
which in my opinion do not matter much given that we are about to
terminate the program anyway. Also it does not affect info like stack
traces that can be possibly shown to developers.
And while we use a variant of Itanium EH ABI, we can make some
deviations if we choose to; we are already different in that in the
current version of the EH spec we don't support two-phase unwinding. We
can possibly consider a more complicated transformation later to
reenable this, but I don't think that has high priority.
Changes in this CL contains:
- In Clang, we don't generate a call to `wasm.get.exception()` intrinsic
and `__clang_call_terminate` function in terminate pads anymore; we
simply generate calls to `std::terminate()`, which is the default
implementation of `CGCXXABI::emitTerminateForUnexpectedException`.
- Remove `WebAssembly::ensureSingleBBTermPads() function and
`WebAssemblyHandleEHTerminatePads` pass, because terminate pads are
already `catch_all` now (because they don't need the exception
pointer) and we don't need these transformations anymore.
- Change tests to use `std::terminate` directly. Also removes tests that
tested `LateEHPrepare::ensureSingleBBTermPads` and
`HandleEHTerminatePads` pass.
- Drive-by fix: Add some function attributes to EH intrinsic
declarations
Fixes https://github.com/emscripten-core/emscripten/issues/13582.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97834
Use a WeakTrackingVH to cope with the stmt emission logic that cleans up
unreachable blocks. This invalidates the reference to the deferred
replacement placeholder. Cope with it.
Fixes PR25102 (from 2015!)
This change adds a new IR noundef attribute, which denotes when a function call argument or return val may never contain uninitialized bits.
In MemorySanitizer, this attribute enables optimizations which decrease instrumented code size by up to 17% (measured with an instrumented build of clang) . I'll introduce the change allowing msan to take advantage of this information in a separate patch.
Differential Revision: https://reviews.llvm.org/D81678
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
If the file in line directive does not exist on the system we need, to
use the original file to get its file id.
Differential Revision: https://reviews.llvm.org/D97945
unsigned variable 'IntNo' has been declared but not been defined inside function
EmitWebAssemblyBuiltinExpr().
static code analysis tool complains about uninitialized variable "IntNo" since
this enters to default branch without setting any intrinsics and calls Function
*Callee = CGM.getIntrinsic(IntNo).
This patch fixes the problem by adding default cases in switch statements.
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.
Differential Revision: https://reviews.llvm.org/D92195
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of
a global temporary before updating the global temporary map. Emitting the
initializer could recurse back into `GetAddrOfGlobalTemporary` for the same
temporary, resulting in an infinite recursion.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97733
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.
* Kill LLVMContext inlineasm diagnose handler and migrate it to use
DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
handler; if LLVMContext is not available, MCContext uses its own default
diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D97449
Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.
This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
Simply make sure that the CodeGenFunction::CXXThisValue and CXXABIThisValue
are correctly initialized to the recovered value.
For lambda capture, we also need to make sure to fill the LambdaCaptureFields
Differential Revision: https://reviews.llvm.org/D97534
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
`__attribute__((retain))` function/variable to prevent linker garbage
collection. (See AttrDocs.td for the linker support).
This patch adds `retain` functions/variables to the `llvm.used` list, which has
the desired linker GC semantics. Note: `retain` does not imply `used`,
so an unused function/variable can be dropped by Sema.
Before 'retain' was introduced, previous ELF solutions require inline asm or
linker tricks, e.g. `asm volatile(".reloc 0, R_X86_64_NONE, target");`
(architecture dependent) or define a non-local symbol in the section and use
`ld -u`. There was no elegant source-level solution.
With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.
Differential Revision: https://reviews.llvm.org/D97447
An global value in the `llvm.used` list does not have GC root semantics on ELF targets.
This will be changed in a subsequent backend patch.
Change some `llvm.used` in the ELF code path to use `llvm.compiler.used` to
prevent undesired GC root semantics.
Change one extern "C" alias (due to `__attribute__((used))` in extern "C") to use `llvm.compiler.used` on all targets.
GNU ld has a rule "`__start_/__stop_` references from a live input section retain the associated C identifier name sections",
which LLD may drop entirely (currently refined to exclude SHF_LINK_ORDER/SHF_GROUP) in a future release (the rule makes it clumsy to GC metadata sections; D96914 added a way to try the potential future behavior).
For `llvm.used` global values defined in a C identifier name section, keep using `llvm.used` so that
the future LLD change will not affect them.
rnk kindly categorized the changes:
```
ObjC/blocks: this wants GC root semantics, since ObjC mainly runs on Mac.
MS C++ ABI stuff: wants GC root semantics, no change
OpenMP: unsure, but GC root semantics probably don't hurt
CodeGenModule: affected in this patch to *not* use GC root semantics so that __attribute__((used)) behavior remains the same on ELF, plus two other minor use cases that don't want GC semantics
Coverage: Probably want GC root semantics
CGExpr.cpp: refers to LTO, wants GC root
CGDeclCXX.cpp: one is MS ABI specific, so yes GC root, one is some other C++ init functionality, which should form GC roots (C++ initializers can have side effects and must run)
CGDecl.cpp: Changed in this patch for __attribute__((used))
```
Differential Revision: https://reviews.llvm.org/D97446
These flags affect coverage mapping (-fcoverage-mapping), not
-fprofile-[instr-]generate so it makes more sense to use the
-fcoverage-* prefix.
Differential Revision: https://reviews.llvm.org/D97434
And then push those change throughout LLVM.
Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).
The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.
Differential Revision: https://reviews.llvm.org/D97223
The new `-fsanitize-address-destructor-kind=` option allows control over how module
destructors are emitted by ASan.
The new option is consumed by both the driver and the frontend and is propagated into
codegen options by the frontend.
Both the legacy and new pass manager code have been updated to consume the new option
from the codegen options.
It would be nice if the new utility functions (`AsanDtorKindToString` and
`AsanDtorKindFromString`) could live in LLVM instead of Clang so they could be
consumed by other language frontends. Unfortunately that doesn't work because
the clang driver doesn't link against the LLVM instrumentation library.
rdar://71609176
Differential Revision: https://reviews.llvm.org/D96572
After a revision of D96274 changed `DiagnosticOptions` to not store all remark arguments **as-written**, it is no longer possible to reconstruct the arguments accurately from the class.
This is caused by the fact that for `-Rpass=regexp` and friends, `DiagnosticOptions` store only the group name `pass` and not `regexp`. This is the same representation used for the plain `-Rpass` argument.
Note that each argument must be generated exactly once in `CompilerInvocation::generateCC1CommandLine`, otherwise each subsequent call would produce more arguments than the previous one. Currently this works out because of the way `RoundTrip` splits the responsibilities for certain arguments based on what arguments were queried during parsing. However, this invariant breaks when we move to single round-trip for the whole `CompilerInvocation`.
This patch ensures that for one `-Rpass=regexp` argument, we don't generate two arguments (`-Rpass` from `DiagnosticOptions` and `-Rpass=regexp` from `CodeGenOptions`) by shifting the responsibility for handling both cases to `CodeGenOptions`. To distinguish between the cases correctly, additional information is stored in `CodeGenOptions`.
The `CodeGenOptions` parser of `-Rpass[=regexp]` arguments also looks at `-Rno-pass` and `-R[no-]everything`, which is necessary for generating the correct argument regardless of the ordering of `CodeGenOptions`/`DiagnosticOptions` parsing/generation.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D96847
For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.
Reviewed by: Artem Belevich, and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
-O1 and above do dont call real optimizer pipeline in ThinLTO PreLink.
Also clang can't add PostLink OptimizerLastEPCallbacks for in-process ThinLTO.
This results in missing sanitizer passes with ThinLTO.
Simple working solution is just call OptimizerLastEPCallbacks
at the end of buildThinLTOPreLinkDefaultPipeline.
Differential Revision: https://reviews.llvm.org/D96320
Patch takes advantage of the implicit default behavior to reduce the number of attributes, which in turns reduces compilation time.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D97116
Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.
This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D96195
Demonstrate how to add RISC-V V builtins and lower them to IR intrinsics for V extension.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93446
Previously, the definition was so-marked, but the declaration was
not. This resulted in LLVM's dwarf emission treating the function as
being external, and incorrectly emitting DW_AT_external.
Differential Revision: https://reviews.llvm.org/D96044
This patch responds to a comment from @vitalybuka in D96203: suggestion to
do the change incrementally, and start by modifying this file name. I modified
the file name and made the other changes that follow from that rename.
Reviewers: vitalybuka, echristo, MaskRay, jansvoboda11, aaron.ballman
Differential Revision: https://reviews.llvm.org/D96974
This patch adds the following SHA3 Intrinsics:
vsha512hq_u64,
vsha512h2q_u64,
vsha512su0q_u64,
vsha512su1q_u64
veor3q_u8
veor3q_u16
veor3q_u32
veor3q_u64
veor3q_s8
veor3q_s16
veor3q_s32
veor3q_s64
vrax1q_u64
vxarq_u64
vbcaxq_u8
vbcaxq_u16
vbcaxq_u32
vbcaxq_u64
vbcaxq_s8
vbcaxq_s16
vbcaxq_s32
vbcaxq_s64
Note need to include +sha3 and +crypto when building from the front-end
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D96381
When WPD is enabled, via WholeProgramVTables, emit type metadata for
available_externally vtables. Additionally, add the vtables to the
llvm.compiler.used global so that they are not prematurely eliminated
(before *LTO analysis).
This is needed to avoid devirtualizing calls to a function overriding a
class defined in a header file but with a strong definition in a shared
library. Without type metadata on the available_externally vtables from
the header, the WPD analysis never sees what a derived class is
overriding. Even if the available_externally base class functions are
pure virtual, because shared library definitions are already treated
conservatively (committed patches D91583, D96721, and D96722) we will
not devirtualize, which would be unsafe since the library might contain
overrides that aren't visible to the LTO unit.
An example is std::error_category, which is overridden in LLVM
and causing failures after a self build with WPD enabled, because
libstdc++ contains hidden overrides of the virtual base class methods.
Differential Revision: https://reviews.llvm.org/D96919
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
The recent commit 00a6254 "Stop traping on sNaN in builtin_isnan" changed the
lowering in constrained FP mode of builtin_isnan from an FP comparison to
integer operations to avoid trapping.
SystemZ has a special instruction "Test Data Class" which is the preferred
way to do this check. This patch adds a new target hook "testFPKind()" that
lets SystemZ emit the s390_tdc intrinsic instead.
testFPKind() takes the BuiltinID as an argument and is expected to soon
handle more opcodes than just 'builtin_isnan'.
Review: Thomas Preud'homme, Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D96568
Add the types for the RISC-V V extension builtins.
These types will be used by the RISC-V V intrinsics which require
types of the form <vscale x 1 x i64>(LMUL=1 element size=64) or
<vscale x 4 x i32>(LMUL=2 element size=32), etc. The vector_size
attribute does not work for us as it doesn't create a scalable
vector type. We want these types to be opaque and have no operators
defined for them. We want them to be sizeless. This makes them
similar to the ARM SVE builtin types. But we will have quite a bit
more types. This patch adds around 60. Later patches will add
another 230 or so types representing tuples of these types similar
to the x2/x3/x4 types in ARM SVE. But with extra complexity that
these types are combined with the LMUL concept that is unique to
RISCV.
For more background see this RFC
http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html
Authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D92715
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.
Differential Revision: https://reviews.llvm.org/D96597
OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.
Differential Revision: https://reviews.llvm.org/D86119
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.
This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.
A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.
I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).
Differential Revision: https://reviews.llvm.org/D76342
This takes advantage of the implicit default behavior to reduce the number of
attributes, which in turns reduces compilation time. I've observed -3% in
instruction count when compiling sqlite3 amalgamation with -O0
Differential Revision: https://reviews.llvm.org/D96400
This is a follow up of D92940.
We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93179
This patch ensures that vector predication and vectorization width
pragmas work together correctly/as expected. Specifically, this patch
fixes the issue that when vectorization_width > 1, the vector
predication behaviour (this would matter if it has NOT been disabled
explicitly by a pragma) was getting ignored, which was incorrect.
The fix here removes the dependence of vector predication on the
vectorization width. The loop metadata corresponding to clang loop
pragma vectorize_predicate is always emitted, if the pragma is
specified, even if vectorization is disabled by vectorize_width(1)
or vectorize(disable) since the option is also used for interleaving
by the LoopVectorize pass.
Reviewed By: dmgreen, Meinersbur
Differential Revision: https://reviews.llvm.org/D94779
This patch adds 2 new options to control when Clang adds `mustprogress`:
1. -ffinite-loops: assume all loops are finite; mustprogress is added
to all loops, regardless of the selected language standard.
2. -fno-finite-loops: assume no loop is finite; mustprogress is not
added to any loop or function. We could add mustprogress to
functions without loops, but we would have to detect that in Clang,
which is probably not worth it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D96419
class types.
The goal is to provide a way to bypass constructor homing when emitting
class definitions and force class definitions in the debug info.
Not sure about the wording of the attribute, or whether it should be
specific to classes with constructors
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
EarlyCSEPass called after msan redices code size by about 10%.
Similar optimization exists for legacy pass manager in
addGeneralOptsForMemorySanitizer.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D96406
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.
Differential Revision: https://reviews.llvm.org/D95974
Signed prefix is removed and the single word spelling is
printed for the scalar types.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96161
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
As Itanium ABI[http://itanium-cxx-abi.github.io/cxx-abi/abi.html#once-ctor]
points out:
"The size of the guard variable is 64 bits. The first byte (i.e. the byte at
the address of the full variable) shall contain the value 0 prior to
initialization of the associated variable, and 1 after initialization is complete."
Differential Revision: https://reviews.llvm.org/D95822
Pipe element type spelling for arg info metadata
should follow the same behavior as normal type spelling.
We should only use the canonical type spelling in the
base type field.
This patch also removed duplication in type handling.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96151
When a function or a file is excluded using -fprofile-list= option,
don't emit coverage mapping as doing so confuses users since those
functions would always have zero count. This also reduces the binary
size considerably in cases where only a few functions or files are
being instrumented.
Differential Revision: https://reviews.llvm.org/D96000
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variable of external device variables are always
emitted as undefined symbols, which need to resolve to a global
symbols.
Managed variables need to be emitted as undefined symbols
in device compilations.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95901
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D95948
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.
[hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
This reverts commit 4874ff0241.
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
Clang usually propagates counter mapping region for conditions of `if`, `while`,
`for`, etc from parent counter. We should do the same for condition of conditional operator.
Differential Revision: https://reviews.llvm.org/D95918
Extract registering device variable to CUDA runtime codegen function since it
will be called in multiple places.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95558
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the X86 specific builtins.
Differential Revision: https://reviews.llvm.org/D94614
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
Normally, Clang will not make dllimport functions available for inlining
if they reference non-imported symbols, as this can lead to confusing
link errors. But if the function is marked always_inline, the user
presumably knows what they're doing and the attribute should be honored.
Differential revision: https://reviews.llvm.org/D95673
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
with fix to test case and stringrefs.
Currently (for codeview) lambdas have a string like `<lambda_0>` in
their mangled name, and don't have any display name. This change uses the
`<lambda_0>` as the display name, which helps distinguish between lambdas
in -gline-tables-only, since there are no linkage names there.
It also changes how we display lambda names; previously we used
`<unnamed-tag>`; now it will show `<lambda_0>`.
I added a function to the mangling context code to create this string;
for Itanium it just returns an empty string.
Bug: https://bugs.llvm.org/show_bug.cgi?id=48432
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95187
This reverts 9b21d4b943
Currently (for codeview) lambdas have a string like `<lambda_0>` in
their mangled name, and don't have any display name. This change uses the
`<lambda_0>` as the display name, which helps distinguish between lambdas
in -gline-tables-only, since there are no linkage names there.
It also changes how we display lambda names; previously we used
`<unnamed-tag>`; now it will show `<lambda_0>`.
I added a function to the mangling context code to create this string;
for Itanium it just returns an empty string.
Bug: https://bugs.llvm.org/show_bug.cgi?id=48432
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95187
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.
In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D94735
`getLineNumber()` picks CurLoc if the parameter is invalid. This appears to
mainly work around missing SourceLocation information for some constructs, but
sometimes adds unintended locations.
* For `CodeGenObjC/debug-info-blocks.m`, `CurLoc` has been advanced to the closing brace. The debug line of `ImplicitVarParameter` is set to the line of `}` because this implicit parameter has an invalid `SourceLocation`. The debug line is a bit arbitrary - perhaps the location of `^{` is better.
* The file/line of Clang synthesized `__va_list_tag` is arbitrarily attached a `#include` line. D94735
Drop the special case to make getLineNumber less magic and add CurLoc fallback in its callers instead.
Tested with stage 2 -DCMAKE_BUILD_TYPE=Debug clang, byte identical.
Reviewed By: #debug-info, aprantl
Differential Revision: https://reviews.llvm.org/D94391
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Whenever we enter a new OpenMP data environment we want to enter a
function to simplify reasoning. Later we probably want to remove the
entire specialization wrt. the if clause and pass the result to the
runtime, for now this should fix PR48686.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D94315
or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end annotates calls with attribute "clang.arc.rv"="retain"
or "clang.arc.rv"="claim", which indicates the call is implicitly
followed by a marker instruction and a retainRV/claimRV call that
consumes the call result. This is currently done only when the target
is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the
annotated calls in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the annotated
calls. It doesn't remove the attribute on the call since the backend
needs it to emit the marker instruction. The retainRV/claimRV calls
are emitted late in the pipeline to prevent optimization passes from
transforming the IR in a way that makes it harder for the ARC
middle-end passes to figure out the def-use relationship between the
call and the retainRV/claimRV calls (which is the cause of PR31925).
- The function inliner removes the autoreleaseRV call in the callee that
returns the result if nothing in the callee prevents it from being
paired up with the calls annotated with "clang.arc.rv"="retain/claim"
in the caller. If the call is annotated with "claim", a release call
is inserted since autoreleaseRV+claimRV is equivalent to a release. If
it cannot find an autoreleaseRV call, it tries to transfer the
attributes to a function call in the callee. This is important since
ARC optimizer can remove the autoreleaseRV call returning the callee
result, which makes it impossible to pair it up with the retainRV or
claimRV call in the caller. If that fails, it simply emits a retain
call in the IR if the call is annotated with "retain" and does nothing
if it's annotated with "claim".
- This patch teaches dead argument elimination pass not to change the
return type of a function if any of the calls to the function are
annotated with attribute "clang.arc.rv". This is necessary since the
pass can incorrectly determine nothing in the IR uses the function
return, which can happen since the front-end no longer explicitly
emits retainRV/claimRV calls in the IR, and change its return type to
'void'.
Future work:
- Use the attribute on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the attributes.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
In the PPC32 SVR4 ABI, a va_list has copies of registers from the function call.
va_arg looked in the wrong registers for (the pointer representation of) an
object in Objective-C, and for some types in C++. Fix va_arg to look in the
general-purpose registers, not the floating-point registers. Also fix va_arg
for some C++ types, like a member function pointer, that are aggregates for
the ABI.
Anthony Richardby found the problem in Objective-C. Eli Friedman suggested
part of this fix.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47921
Reviewed By: efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D90329
Most of CGExprConstant.cpp is using the CharUnits abstraction
and is using getCharWidth() (directly of indirectly) when converting
between size of a char and size in bits. This patch is making that
abstraction more consistent by adding CharTy to the CodeGenTypeCache
(honoring getCharWidth() when mapping from char to LLVM IR types,
instead of using Int8Ty directly).
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94979
When using getByteArrayType the requested size is calculated in
char units, but the type used for the array was hardcoded to the
Int8Ty. This patch is using getCharWIdth a bit more consistently
by using getIntNTy in combination with getCharWidth, instead
of explictly using getInt8Ty.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94977
This patch implements codegen for __managed__ variable attribute for HIP.
Diagnostics will be added later.
Differential Revision: https://reviews.llvm.org/D94814
check
This patch fixes a bug in emitARCOperationAfterCall where it inserts the
fall-back call after a bitcast instruction and then replaces the
bitcast's operand with the result of the fall-back call. The generated
IR without this patch looks like this:
msgSend.call: ; preds = %entry
%call = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend
br label %msgSend.cont
msgSend.null-receiver: ; preds = %entry
call void @llvm.objc.release(i8* %4)
br label %msgSend.cont
msgSend.cont:
%8 = phi i8* [ %call, %msgSend.call ], [ null, %msgSend.null-receiver ]
%9 = bitcast i8* %10 to %0*
%10 = call i8* @llvm.objc.retain(i8* %8)
Notice that `%9 = bitcast i8* %10` to %0* is taking operand %10 which is
defined after it.
To fix the bug, this patch modifies the insert point to point to the
bitcast instruction so that the fall-back call is inserted before the
bitcast. In addition, it teaches the function to look at phi
instructions that are generated when there is a check for a null
receiver and insert the retainRV/claimRV instruction right after the
call instead of inserting a fall-back call right after the phi
instruction.
rdar://73360225
Differential Revision: https://reviews.llvm.org/D95181
CodeGenModule::EmitNullConstant() creates constants with their "in memory"
type, not their "in vregs" type. The one place where this difference matters is
when the type is _Bool, as that is an i1 when in vregs and an i8 in memory.
Fixes: rdar://73361264
Summary:
The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D94806
Combined with 'da98651 - Revert "DR2064:
decltype(E) is only a dependent', this change (5a391d3) caused verifier
errors when building Chromium. See https://crbug.com/1168494#c1 for a
reproducer.
Additionally it reverts changes that were dependent on this one, see
below.
> Following up on PR48517, fix handling of template arguments that refer
> to dependent declarations.
>
> Treat an id-expression that names a local variable in a templated
> function as being instantiation-dependent.
>
> This addresses a language defect whereby a reference to a dependent
> declaration can be formed without any construct being value-dependent.
> Fixing that through value-dependence turns out to be problematic, so
> instead this patch takes the approach (proposed on the core reflector)
> of allowing the use of pointers or references to (but not values of)
> dependent declarations inside value-dependent expressions, and instead
> treating template arguments as dependent if they evaluate to a constant
> involving such dependent declarations.
>
> This ends up affecting a bunch of OpenMP tests, due to OpenMP
> imprecisely handling instantiation-dependent constructs, bailing out
> early instead of processing dependent constructs to the extent possible
> when handling the template.
>
> Previously committed as 8c1f2d15b8, and
> reverted because a dependency commit was reverted.
This reverts commit 5a391d38ac.
It also restores clang/test/SemaCXX/coroutines.cpp to its state before
da986511fb.
Revert "[c++20] P1907R1: Support for generalized non-type template arguments of scalar type."
> Previously committed as 9e08e51a20, and
> reverted because a dependency commit was reverted. This incorporates the
> following follow-on commits that were also reverted:
>
> 7e84aa1b81 by Simon Pilgrim
> ed13d8c667 by me
> 95c7b6cadb by Sam McCall
> 430d5d8429 by Dave Zarzycki
This reverts commit 4b574008ae.
Revert "[msabi] Mangle a template argument referring to array-to-pointer decay"
> [msabi] Mangle a template argument referring to array-to-pointer decay
> applied to an array the same as the array itself.
>
> This follows MS ABI, and corrects a regression from the implementation
> of generalized non-type template parameters, where we "forgot" how to
> mangle this case.
This reverts commit 18e093faf7.
OMP_MAP_TARGET_PARAM flag is used to mark the data that shoud be passed
as arguments to the target kernels, nothing else. But the compiler still
marks the data with OMP_MAP_TARGET_PARAM flags even if the data is
passed to the data movement directives, like target data, target update
etc. This flag is just ignored for this directives and the compiler does
not need to emit it.
Reviewed By: cchen
Differential Revision: https://reviews.llvm.org/D91261
D94745 rewrites the `deviceRTLs` using OpenMP and compiles it by directly
calling the device compilation. `clang` crashes because entry in
`OffloadEntriesDeviceGlobalVar` is unintialized. Current design supposes the
device compilation can only be invoked after host compilation with the host IR
such that `clang` can initialize `OffloadEntriesDeviceGlobalVar` from host IR.
This avoids us using device compilation directly, especially when we only have
code wrapped into `declare target` which are all device code. The same issue
also exists for `OffloadEntriesInfoManager`.
In this patch, we simply initialized an entry if it is not in the maps. Not sure
we need an option to tell the device compiler that it is invoked standalone.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94871
Previously committed as 9e08e51a20, and
reverted because a dependency commit was reverted. This incorporates the
following follow-on commits that were also reverted:
7e84aa1b81 by Simon Pilgrim
ed13d8c667 by me
95c7b6cadb by Sam McCall
430d5d8429 by Dave Zarzycki
for function scopes, rather than using the qualified name.
In line-tables-only mode, we used to emit qualified names as the display name for functions when using CodeView.
This patch changes to emitting the parent scopes instead, with forward declarations for class types.
The total object file size ends up being slightly smaller than if we use the full qualified names.
Differential Revision: https://reviews.llvm.org/D94639
Under -mabi=ieeelongdouble on PowerPC, IEEE-quad floating point semantic
is used for long double. This patch mutates call to related builtins
into f128 version on PowerPC. And in theory, this should be applied to
other targets when their backend supports IEEE 128-bit style libcalls.
GCC already has these mutations except nansl, which is not available on
PowerPC along with other variants (nans, nansf).
Reviewed By: RKSimon, nemanjai
Differential Revision: https://reviews.llvm.org/D92080
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
Move nomerge attribute from function declaration/definition to callsites to
allow virtual function calls attach the attribute.
Differential Revision: https://reviews.llvm.org/D94537
The patch adds the required methods to FixedPointBuilder
for converting between fixed-point and floating point,
and uses them from Clang.
This depends on D54749.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D86632
VLST return values are coerced to VLATs in the function epilog for
consistency with the VLAT ABI. Previously, this coercion was done
through memory. It is preferable to use the
llvm.experimental.vector.insert intrinsic to avoid going through memory
here.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D94290
ELF -fno-pic sets dso_local on a function declaration to allow direct accesses
when taking its address (similar to a data symbol). The emitted code follows the
traditional GCC/Clang -fno-pic behavior: an absolute relocation is produced.
If the function is not defined in the executable, a canonical PLT entry will be
needed at link time. This is similar to a copy relocation and is incompatible
with (-Bsymbolic or --dynamic-list linked shared objects / protected symbols in
a shared object).
This patch gives -fno-pic code a way to avoid such a canonical PLT entry.
The FIXME was about a generalization for -fpie -mpie-copy-relocations (now -fpie
-fdirect-access-external-data). While we could set dso_local to avoid GOT when
taking the address of a function declaration (there is an ignorable difference
about R_386_PC32 vs R_386_PLT32 on i386), it likely does not provide any benefit
and can just cause trouble, so we don't make the generalization.
D92633 added -f[no-]direct-access-external-data to supersede -m[no-]pie-copy-relocations.
(The option works for -fpie but is a no-op for -fno-pic and -fpic.)
This patch makes -fno-pic -fno-direct-access-external-data drop dso_local from
global variable declarations. This usually causes the backend to emit a GOT
indirection for external data access. With a GOT relocation, the subsequent
-no-pie link will not have copy relocation even if the data symbol turns out to
be defined by a shared object.
Differential Revision: https://reviews.llvm.org/D92714
GCC r218397 "x86-64: Optimize access to globals in PIE with copy reloc" made
-fpie code emit R_X86_64_PC32 to reference external data symbols by default.
Clang adopted -mpie-copy-relocations D19996 as a flexible alternative.
The name -mpie-copy-relocations can be improved [1] and does not capture the
idea that this option can apply to -fno-pic and -fpic [2], so this patch
introduces -f[no-]direct-access-external-data and makes -mpie-copy-relocations
their aliases for compatibility.
[1]
For
```
extern int var;
int get() { return var; }
```
if var is defined in another translation unit in the link unit, there is no copy
relocation.
[2]
-fno-pic -fno-direct-access-external-data is useful to avoid copy relocations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888
If a shared object is linked with -Bsymbolic or --dynamic-list and exports a
data symbol, normally the data symbol cannot be accessed by -fno-pic code
(because by default an absolute relocation is produced which will lead to a copy
relocation). -fno-direct-access-external-data can prevent copy relocations.
-fpic -fdirect-access-external-data can avoid GOT indirection. This is like the
undefined counterpart of -fno-semantic-interposition. However, the user should
define var in another translation unit and link with -Bsymbolic or
--dynamic-list, otherwise the linker will error in a -shared link. Generally
the user has better tools for their goal but I want to mention that this
combination is valid.
On COFF, the behavior is like always -fdirect-access-external-data.
`__declspec(dllimport)` is needed to enable indirect access.
There is currently no plan to affect non-ELF behaviors or -fpic behaviors.
-fno-pic -fno-direct-access-external-data will be implemented in the subsequent patch.
GCC feature request https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D92633
Like @aprantl suggested, modify to use the canonicalized DIFile, if we
don't know the loc info and filename for the compiler generated
functions for example static initialization functions.
Reviewed By: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D87147
Match the legacy PM in running various ObjC ARC passes.
This requires making some module passes into function passes. These were
initially ported as module passes since they add function declarations
(e.g. https://reviews.llvm.org/D86178), but that's still up for debate
and other passes do so.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D93743
@ikudrin enabled support for dwarf64 in D87011. Adding a clang flag so it can be used through that compilation pass.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D90507
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
foo();
} catch (int n) {
...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
This patch adds support for two new variants of the vectorize_width
pragma:
1. vectorize_width(X[, fixed|scalable]) where an optional second
parameter is passed to the vectorize_width pragma, which indicates if
the user wishes to use fixed width or scalable vectorization. For
example the user can now write something like:
#pragma clang loop vectorize_width(4, fixed)
or
#pragma clang loop vectorize_width(4, scalable)
In the absence of a second parameter it is assumed the user wants
fixed width vectorization, in order to maintain compatibility with
existing code.
2. vectorize_width(fixed|scalable) where the width is left unspecified,
but the user hints what type of vectorization they prefer, either
fixed width or scalable.
I have implemented this by making use of the LLVM loop hint attribute:
llvm.loop.vectorize.scalable.enable
Tests were added to
clang/test/CodeGenCXX/pragma-loop.cpp
for both the 'fixed' and 'scalable' optional parameter.
See this thread for context: http://lists.llvm.org/pipermail/cfe-dev/2020-November/067262.html
Differential Revision: https://reviews.llvm.org/D89031
This reduces the number of `WinX86_64ABIInfo::classify` call sites from
3 to 1. The call sites were similar, but passed different values for
FreeSSERegs. Use variables instead of `if`s to manage that argument.