This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.
Co-author: @anchu-rajendran
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91054
The Neon vadd intrinsics were added to the ARMSIMD intrinsic map,
however due to being defined under an AArch64 guard in arm_neon.td,
were not previously useable on ARM. This change rectifies that.
It is important to note that poly128 is not valid on ARM, thus it was
extracted out of the original arm_neon.td definition and separated
for the sake of AArch64.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D100772
Reverts parts of https://reviews.llvm.org/D17183, but keeps the
resetDataLayout() API and adds an assert that checks that datalayout string and
user label prefix are in sync.
Approach 1 in https://reviews.llvm.org/D17183#2653279
Reduces number of TUs build for 'clang-format' from 689 to 575.
I also implemented approach 2 in D100764. If someone feels motivated
to make us use DataLayout more, it's easy to revert this change here
and go with D100764 instead. I don't plan on doing more work in this
area though, so I prefer going with the smaller, more self-consistent change.
Differential Revision: https://reviews.llvm.org/D100776
Commit e3d8ee35e4 ("reland "[DebugInfo] Support to emit debugInfo
for extern variables"") added support to emit debugInfo for
extern variables if requested by the target. Currently, only
BPF target enables this feature by default.
As BPF ecosystem grows, callback function started to get
support, e.g., recently bpf_for_each_map_elem() is introduced
(https://lwn.net/Articles/846504/) with a callback function as an
argument. In the future we may have something like below as
a demonstration of use case :
extern int do_work(int);
long bpf_helper(void *callback_fn, void *callback_ctx, ...);
long prog_main() {
struct { ... } ctx = { ... };
return bpf_helper(&do_work, &ctx, ...);
}
Basically bpf helper may have a callback function and the
callback function is defined in another file or in the kernel.
In this case, we would like to know the debuginfo types for
do_work(), so the verifier can proper verify the safety of
bpf_helper() call.
For the following example,
extern int do_work(int);
long bpf_helper(void *callback_fn);
long prog() {
return bpf_helper(&do_work);
}
Currently, there is no debuginfo generated for extern function do_work().
In the IR, we have,
...
define dso_local i64 @prog() local_unnamed_addr #0 !dbg !7 {
entry:
%call = tail call i64 @bpf_helper(i8* bitcast (i32 (i32)* @do_work to i8*)) #2, !dbg !11
ret i64 %call, !dbg !12
}
...
declare dso_local i32 @do_work(i32) #1
...
This patch added support for the above callback function use case, and
the generated IR looks like below:
...
declare !dbg !17 dso_local i32 @do_work(i32) #1
...
!17 = !DISubprogram(name: "do_work", scope: !1, file: !1, line: 1, type: !18, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !2)
!18 = !DISubroutineType(types: !19)
!19 = !{!20, !20}
!20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
The TargetInfo.allowDebugInfoForExternalVar is renamed to
TargetInfo.allowDebugInfoForExternalRef as now it guards
both extern variable and extern function debuginfo generation.
Differential Revision: https://reviews.llvm.org/D100567
Default address space (applies when no explicit address space was
specified) maps to generic (4) address space.
Added SYCL named address spaces `sycl_global`, `sycl_local` and
`sycl_private` defined as sub-sets of the default address space.
Static variables without address space now reside in global address
space when compile for SPIR target, unless they have an explicit address
space qualifier in source code.
Differential Revision: https://reviews.llvm.org/D89909
Adds new intrinsics for instructions that are in the final SIMD spec but did not
previously have intrinsics. Also updates the names of existing intrinsics to
reflect the final names of the underlying instructions in the spec. Keeps the
old names as deprecated functions to ease the transition to the new names.
Differential Revision: https://reviews.llvm.org/D101112
LLVM should be smarter about *known* malloc's alignment and this knowledge may enable other optimizations.
Originally started as LLVM patch - https://reviews.llvm.org/D100862 but this logic should be really in Clang.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D100879
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
From https://bugs.llvm.org/show_bug.cgi?id=49739:
Currently, `#pragma clang fp` are ignored for matrix types.
For the code below, the `contract` fast-math flag should be added to the generated call to `llvm.matrix.multiply` and `fadd`
```
typedef float fx2x2_t __attribute__((matrix_type(2, 2)));
void foo(fx2x2_t &A, fx2x2_t &C, fx2x2_t &B) {
#pragma clang fp contract(fast)
C = A*B + C;
}
```
Reviewed By: fhahn, mibintc
Differential Revision: https://reviews.llvm.org/D100834
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
The implicitly generated mappings for allocation/deallocation in mappers
runtime should be mapped as implicit, also no need to clear member_of
flag to avoid ref counter increment. Also, the ref counter should not be
incremented for the very first element that comes from the mapper
function.
Differential Revision: https://reviews.llvm.org/D100673
As reported in PR50025, sometimes we would end up not emitting functions
needed by inline multiversioned variants. This is because we typically
use the 'deferred decl' mechanism to emit these. However, the variants
are emitted after that typically happens. This fixes that by ensuring
we re-run deferred decls after this happens. Also, the multiversion
emission is done recursively to ensure that MV functions that require
other MV functions to be emitted get emitted.
This reverts commit fa6b54c44a.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920
To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.
Reviewed By: rjmccall, ChuanqiXu
Differential Revision: https://reviews.llvm.org/D100282
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920
Differential Revision: https://reviews.llvm.org/D100282
Add device variables to llvm.compiler.used if they are
ODR-used by either host or device functions.
This is necessary to prevent them from being
eliminated by whole-program optimization
where the compiler has no way to know a device
variable is used by some host code.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D98814
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalize the check, with the following
behavior:
no attributes or attribute set to "false" => return false
attribute set to "true" => return true
Differential Revision: https://reviews.llvm.org/D99299
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.
Differential Revision: https://reviews.llvm.org/D100596
For combined worksharing directives need to emit the temp arrays outside
of the parallel region and update them in the master thread only.
Differential Revision: https://reviews.llvm.org/D100187
This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.
The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.
Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.
Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D99517
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example
struct S {
__attribute__ ((__aligned__(16))) double v[4];
};
Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)
Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.
This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.
The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.
For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.
On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.
Patch by Momchil Velikov and Lucas Prates.
Differential Revision: https://reviews.llvm.org/D98794
The documentation says that for variadic functions, all composites
are treated similarly, no special handling of HFAs/HVAs, not even
for the fixed arguments of a variadic function.
Differential Revision: https://reviews.llvm.org/D100467
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.
Differential Revision: https://reviews.llvm.org/D100402
Add a custom DAG combine and ISD opcode for detecting patterns like
(uint_to_fp (extract_subvector ...))
before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.
Differential Revision: https://reviews.llvm.org/D100425
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.
For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.
Differential Revision: https://reviews.llvm.org/D100411
Aggregate types over 16 bytes are passed by reference.
Contrary to the x86_64 ABI, smaller structs with an odd (non power
of two) are padded and passed in registers.
Differential Revision: https://reviews.llvm.org/D100374
According to i386 System V ABI:
1. when __m256 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 32 byte boundary at the time of the call.
2. when __m512 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 64 byte boundary at the time of the call.
The current method of clang passing __m512 parameter are as follow:
1. when target supports avx512, passing it with 64 byte alignment;
2. when target supports avx, passing it with 32 byte alignment;
3. Otherwise, passing it with 16 byte alignment.
Passing __m256 parameter are as follow:
1. when target supports avx or avx512, passing it with 32 byte alignment;
2. Otherwise, passing it with 16 byte alignment.
This pach will passing __m128/__m256/__m512 following i386 System V ABI and
apply it to Linux only since other System V OS (e.g Darwin, PS4 and FreeBSD) don't
want to spend any effort dealing with the ramifications of ABI breaks at present.
Differential Revision: https://reviews.llvm.org/D78564
The existing Windows Itanium patches for dllimport/export
behaviour w.r.t vtables/rtti can't be adopted for PS4 due to
backwards compatibility reasons (see comments on
https://reviews.llvm.org/D90299).
This commit adds our PS4 scheme for this to Clang.
Differential Revision: https://reviews.llvm.org/D93203
Summary: The tags DW_LANG_C_plus_plus_14 and DW_LANG_C_plus_plus_11, introduced in Dwarf-5, are unexpected in previous versions. Fixing the mismathing doesn't have any drawbacks for any other debuggers, but helps dbx.
Reviewed By: aprantl, shchenz
Differential Revision: https://reviews.llvm.org/D99250
The first one is the real parameters of the coroutine function, the
other one just for copying parameters to the coroutine frame.
Considering the following c++ code:
```
struct coro {
...
};
coro foo(struct test & t) {
...
co_await suspend_always();
...
co_await suspend_always();
...
co_await suspend_always();
}
int main(int argc, char *argv[]) {
auto c = foo(...);
c.handle.resume();
...
}
```
Function foo is the standard coroutine function, and it has only
one parameter named t (ignoring this at first),
when we use the llvm code to compile this function, we can get the
following ir:
```
!2921 = distinct !DISubprogram(name: "foo", linkageName:
"_ZN6Object3fooE4test", scope: !2211, file: !45, li\
ne: 48, type: !2329, scopeLine: 48, flags: DIFlagPrototyped |
DIFlagAllCallsDescribed, spFlags: DISPFlagDefi\
nition | DISPFlagOptimized, unit: !44, declaration: !2328,
retainedNodes: !2922)
!2924 = !DILocalVariable(name: "t", arg: 2, scope: !2921, file: !45,
line: 48, type: !838)
...
!2926 = !DILocalVariable(name: "t", scope: !2921, type: !838, flags:
DIFlagArtificial)
```
We can find there are two `the same` DIVariable named t in the same
dwarf scope for foo.resume.
And when we try to use llvm-dwarfdump to dump the dwarf info of this
elf, we get the following output:
```
0x00006684: DW_TAG_subprogram
DW_AT_low_pc (0x00000000004013a0)
DW_AT_high_pc (0x00000000004013a8)
DW_AT_frame_base (DW_OP_reg7 RSP)
DW_AT_object_pointer (0x0000669c)
DW_AT_GNU_all_call_sites (true)
DW_AT_specification (0x00005b5c "_ZN6Object3fooE4test")
0x000066a5: DW_TAG_formal_parameter
DW_AT_name ("t")
DW_AT_decl_file ("/disk1/yifeng.dongyifeng/my_code/llvm/build/bin/coro-debug-1.cpp")
DW_AT_decl_line (48)
DW_AT_type (0x00004146 "test")
0x000066ba: DW_TAG_variable
DW_AT_name ("t")
DW_AT_type (0x00004146 "test")
DW_AT_artificial (true)
```
The elf also has two 't' in the same scope.
But unluckily, it might let the debugger
confused. And failed to print parameters for O0 or above.
This patch will make coroutine parameters and move
parameters use the same DIVar and try to fix the problems
that I mentioned before.
Test Plan: check-clang
Reviewed By: aprantl, jmorse
Differential Revision: https://reviews.llvm.org/D97533
This implements C-style type conversions for matrix types, as specified
in clang/docs/MatrixTypes.rst.
Fixes PR47141.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D99037
This ensures these types have distinct names if they are distinct types
(eg: if one is an instantiation with a type in one inline namespace, and
another from a type with the same simple name, but in a different inline
namespace).
As it is being noted in D99249, lack of alignment information on `this`
has been preventing LICM from happening.
For some time now, lack of alignment attribute does *not* imply
natural alignment, but an alignment of `1`.
Also, we used to treat dereferenceable as implying alignment,
but we no longer do, so it's a bugfix.
Differential Revision: https://reviews.llvm.org/D99790
Recently atomicrmw started to support fadd/fsub:
https://reviews.llvm.org/D53965
However clang atomic builtins fetch add/sub still does not support
emitting atomicrmw fadd/fsub.
This patch adds that.
Reviewed by: John McCall, Artem Belevich, Matt Arsenault, JF Bastien,
James Y Knight, Louis Dionne, Olivier Giroux
Differential Revision: https://reviews.llvm.org/D71726
The reason for the NewPM redesign is described in the commit
cba3e783389a: [NewPM] Disable PreservedCFGChecker ...
The checker introduces an internal custom CFG analysis that tracks
current up-to date CFG snapshot. The analysis is invalidated along
any other CFG related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a functional pass exit then the checker
asserts that the CFG snapshot taken from this analysis is equals to
a snapshot of the current CFG.
Along the way:
- the function CFG::printDiff() is simplified by removing function
name calculation. The name is printed by the caller;
- fixed CFG invalidated condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
optional parameter of type FunctionAnalysisManager*, which is
needed by the checker to get the custom CFG analysis;
- several PM related tests updated to explicitly set
-verify-cfg-preserved=1 as they need.
This patch is safe to land as the CFGChecker is left switched off
(the options -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.
Reviewed By: skatkov, kuhar
Differential Revision: https://reviews.llvm.org/D91327
These all pass 1 type to getIntrinsic. So rather than assigning
IntrinsicTypes for each builtin which invokes the SmallVector
constructor, just select the intrinsic ID with a switch and
share a single assignment of IntrinsicTypes.
Head files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
clmul
clmulh
clmulr
Differential Revision: https://reviews.llvm.org/D99711
Forgot to amend the Author.
Original commit message:
Header files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
orc.b
Differential Revision: https://reviews.llvm.org/D99320
Implementation for RISC-V Zbr extension intrinsic.
Header files are included in separate patch in case the name needs to be changed
RV32 / 64:
crc32b
crc32h
crc32w
crc32cb
crc32ch
crc32cw
RV64 Only:
crc32d
crc32cd
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99009
Summary:
Currently the mapping names are not passed to the mapper components that set up
the array region. This means array mappings will not have their names availible
in the runtime. This patch fixes this by passing the argument name to the region
correctly. This means that the mapped variable's name will be the declared
mapper that placed it on the device.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D99681
Calling `ParseCommandLineOptions` should only be called from `main` as the
CommandLine setup code isn't thread-safe. As BackendUtil is part of the
generic Clang FrontendAction logic, a process which has several threads executing
Clang FrontendActions will randomly crash in the unsafe setup code.
This patch avoids calling the function unless either the debug-pass option or
limit-float-precision option is set. Without these two options set the
`ParseCommandLineOptions` call doesn't do anything beside parsing
the command line `clang` which doesn't set any options.
See also D99652 where LLDB received a workaround for this crash.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D99740
Need to bitcast the function pointer passed as a parameter to the real
type to avoid possible problem with calling conventions.
Differential Revision: https://reviews.llvm.org/D99521
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.
Differential Revision: https://reviews.llvm.org/D99623
another one for distributed mode.
Currently during module importing, ThinLTO opens all the source modules,
collect functions to be imported and append them to the destination module,
then leave all the modules open through out the lto backend pipeline. This
patch refactors it in the way that one source module will be closed before
another source module is opened. All the source modules will be closed after
importing phase is done. It will save some amount of memory when there are
many source modules to be imported.
Note that this patch only changes the distributed thinlto mode. For in
process thinlto mode, one source module is shared acorss different thinlto
backend threads so it is not changed in this patch.
Differential Revision: https://reviews.llvm.org/D99554
Need to cast the argument for the debug wrapper function call to the
corresponding parameter type to avoid crash.
Differential Revision: https://reviews.llvm.org/D99617
If no initializer-clause is specified, the private variables will be
initialized following the rules for initialization of objects with static
storage duration.
Need to adjust the implementation to the current version of the
standard.
Differential Revision: https://reviews.llvm.org/D99539
Since the introduction of class properties in Objective-C it is possible to declare a class and an instance
property with the same identifier in an interface/protocol.
Right now Clang just generates debug information for whatever property comes first in the source file.
The second property is ignored as it's filtered out by the set of already emitted properties (which is just
using the identifier of the property to check for equivalence). I don't think generating debug info in this case
was never supported as the identifier filter is in place since 7123bca7fb
(which precedes the introduction of class properties).
This patch expands the filter to take in account identifier + whether the property is class/instance. This
ensures that both properties are emitted in this special situation.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D99512
The `noinline` for non-SPMD parallel functions is probably not necessary
but as long as we use it we should put it on the outermost parallel
function, which is the wrapper, not the actual outlined function.
Resolves PR49752
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D99506
The original issue is caused by the fact that the variable is allocated
with incorrect type i1 instead of i8. This causes the bitcasting of the
declaration to i8 type and the bitcast expression does not match the
original variable.
To fix the problem, the UndefValue initializer and the original
variable should be emitted with type i8, not i1.
Differential Revision: https://reviews.llvm.org/D99297
Need to insert a basic block during generation of the target region to
avoid crash for the GPU to be able always calling a cleanup action.
This cleanup action is required for the correct emission of the target
region for the GPU.
Differential Revision: https://reviews.llvm.org/D99445
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
When emitting a function body there needs to be a instr profiling counter emitted. Otherwise instr profiling won't work for this function.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D98135
tl;dr Correct implementation of Corouintes requires having lifetime intrinsics available.
Coroutine functions are functions that can be suspended and resumed latter. To do so, data that need to stay alive after suspension must be put on the heap (i.e. the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figure out whether it should be put on the stack or the frame.
In most cases, for data that we are unable to accurately analyze lifetime, we can just conservatively put them on the heap.
Unfortunately, there exists a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze those data's lifetime.
To dig into more details, there exists cases where at certain code points, the current coroutine frame may have already been destroyed. Hence no frame access would be allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutine:
```
auto tmp = await_suspend();
__builtin_coro_resume(tmp.address());
return;
```
In the above code example, `await_suspend()` returns a new coroutine handle, which we will obtain the address and then resume that coroutine. This essentially "transfered" from the current coroutine to a different coroutine.
During the call to `await_suspend()`, the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards.
However when LLVM is emitting IR for the above code, it needs to emit an AllocaInst for `tmp`. It will then call the `address` function on tmp. `address` function is a member function of coroutine, and there is no way for the LLVM optimizer to know that it does not capture the `tmp` pointer. So when the optimizer looks at it, it has to conservatively assume that `tmp` may escape and hence put it on the heap. Furthermore, in some cases `address` call would be inlined, which will generate a bunch of store/load instructions that move the `tmp` pointer around. Those stores will also make the compiler to think that `tmp` might escape.
To summarize, it's really difficult for the mid-end to figure out that the `tmp` data is short-lived.
I made some attempt in D98638, but it appears to be way too complex and is basically doing the same thing as inserting lifetime intrinsics in coroutines.
Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893
Differential Revision: https://reviews.llvm.org/D99227
Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be used to get symbol address of kernels
or variables by mangled name in dynamically loaded
bundled code objects at run time.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D99301
In order to test the preservation of the original Debug Info metadata
in your projects, a front end option could be very useful, since users
usually report that a concrete entity (e.g. variable x, or function fn2())
is missing debug info. The [0] is an example of running the utility
on GDB Project.
This depends on: D82546 and D82545.
Differential Revision: https://reviews.llvm.org/D82547
If emit inlined region for master/critical directives, no need to clear
lambda/block context data, otherwise the variables cannot be found and
it causes a crash at compile time.
Differential Revision: https://reviews.llvm.org/D99280
There are a number of functions in altivec.h that use
vector __int128 which isn't supported on AIX. Those functions
need to be guarded for targets that don't support the type.
Furthermore, the functions that produce quadword instructions
without using the type need a builtin. This patch adds the
macro guards to altivec.h using the __SIZEOF_INT128__ which
is only defined on targets that support the __int128 type.
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.
Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.
Differential Revision: https://reviews.llvm.org/D98030
08196e0b2e exposed LowerExpectIntrinsic's
internal implementation detail in the form of
LikelyBranchWeight/UnlikelyBranchWeight options to the outside.
While this isn't incorrect from the results viewpoint,
this is suboptimal from the layering viewpoint,
and causes confusion - should transforms also use those weights,
or should they use something else, D98898?
So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight
internal again, and fixing all the code that used it directly,
which currently is only clang codegen, thankfully,
to emit proper @llvm.expect intrinsics instead.
Upon reviewing D98898 i've come to realization that these are
implementation detail of LowerExpectIntrinsicPass,
and they should not be exposed to outside of it.
This reverts commit ee8b53815d.
This makes the settings available for use in other passes by housing
them within the Support lib, but NFC otherwise.
See D98898 for the proposed usage in SimplifyCFG
(where this change was originally included).
Differential Revision: https://reviews.llvm.org/D98945
C functions may be declared and defined in different prototypes like below. This patch unifies the checks for mangling names in symbol linkage name emission and debug linkage name emission so that the two names are consistent.
static int go(int);
static int go(a) int a;
{
return a;
}
Test Plan:
Differential Revision: https://reviews.llvm.org/D98799
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
This reverts commit 809a1e0ffd.
Mach-O doesn't support dso_local and this change broke XNU because of the use of dso_local.
Differential Revision: https://reviews.llvm.org/D98458
The condition variable is in scope in the loop increment, so we need to
emit the jump destination from wthin the scope of the condition
variable.
For GCC compatibility (and compatibility with real-world 'FOR_EACH'
macros), 'continue' is permitted in a statement expression within the
condition of a for loop, though, so there are two cases here:
* If the for loop has no condition variable, we can emit the jump
destination before emitting the condition.
* If the for loop has a condition variable, we must defer emitting the
jump destination until after emitting the variable. We diagnose a
'continue' appearing in the initializer of the condition variable,
because it would jump past the initializer into the scope of that
variable.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D98816
Added basic parsing/sema/serialization support for interop directive.
Support for the 'init' clause.
Differential Revision: https://reviews.llvm.org/D98558
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
Differential Revision: https://reviews.llvm.org/D98487
The idiom:
```
DeclContext::lookup_result R = DeclContext::lookup(Name);
for (auto *D : R) {...}
```
is not safe when in the loop body we trigger deserialization from an AST file.
The deserialization can insert new declarations in the StoredDeclsList whose
underlying type is a vector. When the vector decides to reallocate its storage
the pointer we hold becomes invalid.
This patch replaces a SmallVector with an singly-linked list. The current
approach stores a SmallVector<NamedDecl*, 4> which is around 8 pointers.
The linked list is 3, 5, or 7. We do better in terms of memory usage for small
cases (and worse in terms of locality -- the linked list entries won't be near
each other, but will be near their corresponding declarations, and we were going
to fetch those memory pages anyway). For larger cases: the vector uses a
doubling strategy for reallocation, so will generally be between half-full and
full. Let's say it's 75% full on average, so there's N * 4/3 + 4 pointers' worth
of space allocated currently and will be 2N pointers with the linked list. So we
break even when there are N=6 entries and slightly lose in terms of memory usage
after that. We suspect that's still a win on average.
Thanks to @rsmith!
Differential revision: https://reviews.llvm.org/D91524
`CodeGenFunction::EmitRuntimeCall` automatically sets the right calling
convention for the callee so we can avoid setting it ourselves.
As requested in https://reviews.llvm.org/D98411
Reviewed by: anastasia
Differential Revision: https://reviews.llvm.org/D98705
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.
Differential Revision: https://reviews.llvm.org/D98472
Recognize BI__builtin_isinf and BI__builtin_isfinite (and a few other opcodes
for finite) in testFPKind() and handle with TDC.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D97901
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.
These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.
The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838
Differential Revision: https://reviews.llvm.org/D98264
Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
`__translate_sampler_initializer` has a calling convention of
`spir_func`, but clang generated calls to it using the default CC.
Instruction Combining was lowering these mismatching calling conventions
to `store i1* undef` which itself was subsequently lowered to a trap
instruction by simplifyCFG resulting in runtime `SIGILL`
There are arguably two bugs here: but whether there's any wisdom in
converting an obviously invalid call into a runtime crash over aborting
with a sensible error message will require further discussion. So for
now it's enough to set the right calling convention on the runtime
helper.
Reviewed By: svenh, bader
Differential Revision: https://reviews.llvm.org/D98411
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
In order to prevent further building issues related to the usage of SmallVector
in other compilation unit, this patch adjusts the llvm.h header as a workaround
instead.
Besides, this patch reverts previous workarounds:
1. Revert "[NFC] Use llvm::SmallVector to workaround XL compiler problem on AIX"
This reverts commit 561fb7f60a.
2.Revert "[clang][cli] Fix build failure in CompilerInvocation"
This reverts commit 8dc70bdcd0.
Differential Revision: https://reviews.llvm.org/D98552
This was motivated by the fact that constructor type homing (debug info
optimization that we want to turn on by default) drops some libc++ types,
so an attribute would allow us to override constructor homing and emit
them anyway. I'm currently looking into the particular libc++ issue, but
even if we do fix that, this issue might come up elsewhere and it might be
nice to have this.
As I've implemented it now, the attribute isn't specific to the
constructor homing optimization and overrides all of the debug info
optimizations.
Open to discussion about naming, specifics on what the attribute should do, etc.
Differential Revision: https://reviews.llvm.org/D97411
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
D96109 added support for unique internal linkage names for both internal
linkage functions and global variables. There was a lot of discussion on how to
get the demangling right for functions but I completely missed the point that
demanglers do not support suffixes for global vars. For example:
$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123
The demangling for functions works as expected.
I am not sure of the impact of this. I don't understand how debuggers and other
tools depend on the correctness of global variable demangling so I am
pre-emptively disabling it until we can get the demangling support added.
Importantly, uniquefying global variables is not needed right now as we do not
do profile attribution to global vars based on sampling. It was added for
completeness and so this feature is not exactly missed.
Differential Revision: https://reviews.llvm.org/D98392
This patch extends the matrix spec to allow matrix-by-scalar division.
Originally support for `/` was left out to avoid ambiguity for the
matrix-matrix version of `/`, which could either be elementwise or
specified as matrix multiplication M1 * (1/M2).
For the matrix-scalar version, no ambiguity exists; `*` is also
an elementwise operation in that case. Matrix-by-scalar division
is commonly supported by systems including Matlab, Mathematica
or NumPy.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97857
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.
Reviewed By: jdoerfert cchen
Differential Revision: https://reviews.llvm.org/D98358
These are incompatible with opaque pointers. This is in preparation
of dropping this API on the IRBuilder side as well.
Instead explicitly pass the loaded type.
Commit 1b04bdc2f3 added support for
capturing the 'this' pointer in a SEH context (__finally or __except),
But the case in which the 'this' pointer is part of a lambda capture
was not handled properly
Differential Revision: https://reviews.llvm.org/D97687
Demonstrate how to generate vadd/vfadd intrinsic functions
1. add -gen-riscv-vector-builtins for clang builtins.
2. add -gen-riscv-vector-builtin-codegen for clang codegen.
3. add -gen-riscv-vector-header for riscv_vector.h. It also generates
ifdef directives with extension checking, base on D94403.
4. add -gen-riscv-vector-generic-header for riscv_vector_generic.h.
Generate overloading version Header for generic api.
https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#c11-generic-interface
5. update tblgen doc for riscv related options.
riscv_vector.td also defines some unused type transformers for vadd,
because I think it could demonstrate how tranfer type work and we need
them for the whole intrinsic functions implementation in the future.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: jrtc27, craig.topper, HsiangKai, Jim, Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D95016
It's available both in CodeGenOptions and in LangOptions, and LangOptions
implementation is slightly better as it uses a StringRef instead of a char
pointer, so use it.
Differential Revision: https://reviews.llvm.org/D98175
LLVM is recommending to use SmallVector (that is, omitting the N), in the
absence of a well-motivated choice for the number of inlined elements N.
However, this doesn't work well with XL compiler on AIX since some header(s)
aren't properly picked up with it. We need to take a further look into the real
issue underneath and fix it in a later patch.
But currently we'd like to use this patch to unblock the build compiler issue
first.
Differential Revision: https://reviews.llvm.org/D98265
SUMMARY:
n the patch https://reviews.llvm.org/D87451 "add new option -mignore-xcoff-visibility"
we did as "The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR."
in these patch we let -mignore-xcoff-visibility effect on generating IR too. the new feature only work on AIX OS
Reviewer: Jason Liu,
Differential Revision: https://reviews.llvm.org/D89986
This is the first patch supporting M68k in Clang
- Register M68k as a target
- Target specific CodeGen support
- Target specific attribute support
Authors: myhsu, m4yers, glaubitz
Differential Revision: https://reviews.llvm.org/D88393
The option -funique-internal-linkage-names was added in D73307 and D78243 as a
LLVM early pass to insert a unique suffix to internal linkage functions and
vars. The unique suffix was the hash of the module path. However, we found
that this can be done more cleanly in clang early and the fixes that need to
be done later can be completely avoided. The fixes in particular are trying
to modify the DW_AT_linkage_name and finding the right place to insert the
pass.
This patch ressurects the original implementation proposed in D73307 which
was reviewed and then ditched in favor of the pass based approach.
Differential Revision: https://reviews.llvm.org/D96109
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
* Only the worksharing-loop directive.
* Recognizes only the nowait clause.
* No loop nests with more than one loop.
* Untested with templates, exceptions.
* Semantic checking left to the existing infrastructure.
This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
* The distance function: How many loop iterations there will be before entering the loop nest.
* The loop variable function: Conversion from a logical iteration number to the loop variable.
These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.
The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.
For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.
Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D94973
Background:
Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses
Itanium-based libraries and ABIs with some modifications.
`__clang_call_terminate` is a wrapper generated in Clang's Itanium C++
ABI implementation. It contains this code, in C-style pseudocode:
```
void __clang_call_terminate(void *exn) {
__cxa_begin_catch(exn);
std::terminate();
}
```
So this function is a wrapper to call `__cxa_begin_catch` on the
exception pointer before termination.
In Itanium ABI, this function is called when another exception is thrown
while processing an exception. The pointer for this second, violating
exception is passed as the argument of this `__clang_call_terminate`,
which calls `__cxa_begin_catch` with that pointer and calls
`std::terminate` to terminate the program.
The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch`
says,
```
When the personality routine encounters a termination condition, it
will call __cxa_begin_catch() to mark the exception as handled and then
call terminate(), which shall not return to its caller.
```
In wasm EH's Clang implementation, this function is called from
cleanuppads that terminates the program, which we also call terminate
pads. Cleanuppads normally don't access the thrown exception and the
wasm backend converts them to `catch_all` blocks. But because we need
the exception pointer in this cleanuppad, we generate
`wasm.get.exception` intrinsic (which will eventually be lowered to
`catch` instruction) as we do in the catchpads. But because terminate
pads are cleanup pads and should run even when a foreign exception is
thrown, so what we have been doing is:
1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure
terminate pads are in this simple shape:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
```
2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the
pipeline, we attach a `catch_all` to terminate pads, so they will be in
this form:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
catch_all
call @std::terminate()
unreachable
```
In `catch_all` part, we don't have the exception pointer, so we call
`std::terminate()` directly. The reason we ran HandleEHTerminatePads at
the end of the pipeline, separate from LateEHPrepare, was it was
convenient to assume there was only a single `catch` part per `try`
during CFGSort and CFGStackify.
---
Problem:
While it thinks terminate pads could have been possibly split or calls
to `__clang_call_terminate` could have been duplicated,
`WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate
pads contain no more than calls to `__clang_call_terminate` and
`unreachable` instruction. I assumed that because in LLVM very limited
forms of transformations are done to catchpads and cleanuppads to
maintain the scoping structure. But it turned out to be incorrect;
passes can merge cleanuppads into one, including terminate pads, as long
as the new code has a correct scoping structure. One pass that does this
I observed was `SimplifyCFG`, but there can be more. After this
transformation, a single cleanuppad can contain any number of other
instructions with the call to `__clang_call_terminate` and can span many
BBs. It wouldn't be practical to duplicate all these BBs within the
cleanuppad to generate the equivalent `catch_all` blocks, only with
calls to `__clang_call_terminate` replaced by calls to `std::terminate`.
Unless we do more complicated transformation to split those calls to
`__clang_call_terminate` into a separate cleanuppad, it is tricky to
solve.
---
Solution (?):
This CL just disables the generation and use of `__clang_call_terminate`
and calls `std::terminate()` directly in its place.
The possible downside of this approach can be, because the Itanium ABI
intended to "mark" the violating exception handled, we don't do that
anymore. What `__cxa_begin_catch` actually does is increment the
exception's handler count and decrement the uncaught exception count,
which in my opinion do not matter much given that we are about to
terminate the program anyway. Also it does not affect info like stack
traces that can be possibly shown to developers.
And while we use a variant of Itanium EH ABI, we can make some
deviations if we choose to; we are already different in that in the
current version of the EH spec we don't support two-phase unwinding. We
can possibly consider a more complicated transformation later to
reenable this, but I don't think that has high priority.
Changes in this CL contains:
- In Clang, we don't generate a call to `wasm.get.exception()` intrinsic
and `__clang_call_terminate` function in terminate pads anymore; we
simply generate calls to `std::terminate()`, which is the default
implementation of `CGCXXABI::emitTerminateForUnexpectedException`.
- Remove `WebAssembly::ensureSingleBBTermPads() function and
`WebAssemblyHandleEHTerminatePads` pass, because terminate pads are
already `catch_all` now (because they don't need the exception
pointer) and we don't need these transformations anymore.
- Change tests to use `std::terminate` directly. Also removes tests that
tested `LateEHPrepare::ensureSingleBBTermPads` and
`HandleEHTerminatePads` pass.
- Drive-by fix: Add some function attributes to EH intrinsic
declarations
Fixes https://github.com/emscripten-core/emscripten/issues/13582.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97834
Use a WeakTrackingVH to cope with the stmt emission logic that cleans up
unreachable blocks. This invalidates the reference to the deferred
replacement placeholder. Cope with it.
Fixes PR25102 (from 2015!)
This change adds a new IR noundef attribute, which denotes when a function call argument or return val may never contain uninitialized bits.
In MemorySanitizer, this attribute enables optimizations which decrease instrumented code size by up to 17% (measured with an instrumented build of clang) . I'll introduce the change allowing msan to take advantage of this information in a separate patch.
Differential Revision: https://reviews.llvm.org/D81678
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
If the file in line directive does not exist on the system we need, to
use the original file to get its file id.
Differential Revision: https://reviews.llvm.org/D97945
unsigned variable 'IntNo' has been declared but not been defined inside function
EmitWebAssemblyBuiltinExpr().
static code analysis tool complains about uninitialized variable "IntNo" since
this enters to default branch without setting any intrinsics and calls Function
*Callee = CGM.getIntrinsic(IntNo).
This patch fixes the problem by adding default cases in switch statements.
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.
Differential Revision: https://reviews.llvm.org/D92195
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of
a global temporary before updating the global temporary map. Emitting the
initializer could recurse back into `GetAddrOfGlobalTemporary` for the same
temporary, resulting in an infinite recursion.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97733
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.
* Kill LLVMContext inlineasm diagnose handler and migrate it to use
DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
handler; if LLVMContext is not available, MCContext uses its own default
diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D97449
Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.
This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
Simply make sure that the CodeGenFunction::CXXThisValue and CXXABIThisValue
are correctly initialized to the recovered value.
For lambda capture, we also need to make sure to fill the LambdaCaptureFields
Differential Revision: https://reviews.llvm.org/D97534
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
`__attribute__((retain))` function/variable to prevent linker garbage
collection. (See AttrDocs.td for the linker support).
This patch adds `retain` functions/variables to the `llvm.used` list, which has
the desired linker GC semantics. Note: `retain` does not imply `used`,
so an unused function/variable can be dropped by Sema.
Before 'retain' was introduced, previous ELF solutions require inline asm or
linker tricks, e.g. `asm volatile(".reloc 0, R_X86_64_NONE, target");`
(architecture dependent) or define a non-local symbol in the section and use
`ld -u`. There was no elegant source-level solution.
With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.
Differential Revision: https://reviews.llvm.org/D97447
An global value in the `llvm.used` list does not have GC root semantics on ELF targets.
This will be changed in a subsequent backend patch.
Change some `llvm.used` in the ELF code path to use `llvm.compiler.used` to
prevent undesired GC root semantics.
Change one extern "C" alias (due to `__attribute__((used))` in extern "C") to use `llvm.compiler.used` on all targets.
GNU ld has a rule "`__start_/__stop_` references from a live input section retain the associated C identifier name sections",
which LLD may drop entirely (currently refined to exclude SHF_LINK_ORDER/SHF_GROUP) in a future release (the rule makes it clumsy to GC metadata sections; D96914 added a way to try the potential future behavior).
For `llvm.used` global values defined in a C identifier name section, keep using `llvm.used` so that
the future LLD change will not affect them.
rnk kindly categorized the changes:
```
ObjC/blocks: this wants GC root semantics, since ObjC mainly runs on Mac.
MS C++ ABI stuff: wants GC root semantics, no change
OpenMP: unsure, but GC root semantics probably don't hurt
CodeGenModule: affected in this patch to *not* use GC root semantics so that __attribute__((used)) behavior remains the same on ELF, plus two other minor use cases that don't want GC semantics
Coverage: Probably want GC root semantics
CGExpr.cpp: refers to LTO, wants GC root
CGDeclCXX.cpp: one is MS ABI specific, so yes GC root, one is some other C++ init functionality, which should form GC roots (C++ initializers can have side effects and must run)
CGDecl.cpp: Changed in this patch for __attribute__((used))
```
Differential Revision: https://reviews.llvm.org/D97446
These flags affect coverage mapping (-fcoverage-mapping), not
-fprofile-[instr-]generate so it makes more sense to use the
-fcoverage-* prefix.
Differential Revision: https://reviews.llvm.org/D97434
And then push those change throughout LLVM.
Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).
The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.
Differential Revision: https://reviews.llvm.org/D97223
The new `-fsanitize-address-destructor-kind=` option allows control over how module
destructors are emitted by ASan.
The new option is consumed by both the driver and the frontend and is propagated into
codegen options by the frontend.
Both the legacy and new pass manager code have been updated to consume the new option
from the codegen options.
It would be nice if the new utility functions (`AsanDtorKindToString` and
`AsanDtorKindFromString`) could live in LLVM instead of Clang so they could be
consumed by other language frontends. Unfortunately that doesn't work because
the clang driver doesn't link against the LLVM instrumentation library.
rdar://71609176
Differential Revision: https://reviews.llvm.org/D96572
After a revision of D96274 changed `DiagnosticOptions` to not store all remark arguments **as-written**, it is no longer possible to reconstruct the arguments accurately from the class.
This is caused by the fact that for `-Rpass=regexp` and friends, `DiagnosticOptions` store only the group name `pass` and not `regexp`. This is the same representation used for the plain `-Rpass` argument.
Note that each argument must be generated exactly once in `CompilerInvocation::generateCC1CommandLine`, otherwise each subsequent call would produce more arguments than the previous one. Currently this works out because of the way `RoundTrip` splits the responsibilities for certain arguments based on what arguments were queried during parsing. However, this invariant breaks when we move to single round-trip for the whole `CompilerInvocation`.
This patch ensures that for one `-Rpass=regexp` argument, we don't generate two arguments (`-Rpass` from `DiagnosticOptions` and `-Rpass=regexp` from `CodeGenOptions`) by shifting the responsibility for handling both cases to `CodeGenOptions`. To distinguish between the cases correctly, additional information is stored in `CodeGenOptions`.
The `CodeGenOptions` parser of `-Rpass[=regexp]` arguments also looks at `-Rno-pass` and `-R[no-]everything`, which is necessary for generating the correct argument regardless of the ordering of `CodeGenOptions`/`DiagnosticOptions` parsing/generation.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D96847
For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.
Reviewed by: Artem Belevich, and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
-O1 and above do dont call real optimizer pipeline in ThinLTO PreLink.
Also clang can't add PostLink OptimizerLastEPCallbacks for in-process ThinLTO.
This results in missing sanitizer passes with ThinLTO.
Simple working solution is just call OptimizerLastEPCallbacks
at the end of buildThinLTOPreLinkDefaultPipeline.
Differential Revision: https://reviews.llvm.org/D96320
Patch takes advantage of the implicit default behavior to reduce the number of attributes, which in turns reduces compilation time.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D97116
Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.
This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D96195
Demonstrate how to add RISC-V V builtins and lower them to IR intrinsics for V extension.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93446
Previously, the definition was so-marked, but the declaration was
not. This resulted in LLVM's dwarf emission treating the function as
being external, and incorrectly emitting DW_AT_external.
Differential Revision: https://reviews.llvm.org/D96044
This patch responds to a comment from @vitalybuka in D96203: suggestion to
do the change incrementally, and start by modifying this file name. I modified
the file name and made the other changes that follow from that rename.
Reviewers: vitalybuka, echristo, MaskRay, jansvoboda11, aaron.ballman
Differential Revision: https://reviews.llvm.org/D96974
This patch adds the following SHA3 Intrinsics:
vsha512hq_u64,
vsha512h2q_u64,
vsha512su0q_u64,
vsha512su1q_u64
veor3q_u8
veor3q_u16
veor3q_u32
veor3q_u64
veor3q_s8
veor3q_s16
veor3q_s32
veor3q_s64
vrax1q_u64
vxarq_u64
vbcaxq_u8
vbcaxq_u16
vbcaxq_u32
vbcaxq_u64
vbcaxq_s8
vbcaxq_s16
vbcaxq_s32
vbcaxq_s64
Note need to include +sha3 and +crypto when building from the front-end
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D96381
When WPD is enabled, via WholeProgramVTables, emit type metadata for
available_externally vtables. Additionally, add the vtables to the
llvm.compiler.used global so that they are not prematurely eliminated
(before *LTO analysis).
This is needed to avoid devirtualizing calls to a function overriding a
class defined in a header file but with a strong definition in a shared
library. Without type metadata on the available_externally vtables from
the header, the WPD analysis never sees what a derived class is
overriding. Even if the available_externally base class functions are
pure virtual, because shared library definitions are already treated
conservatively (committed patches D91583, D96721, and D96722) we will
not devirtualize, which would be unsafe since the library might contain
overrides that aren't visible to the LTO unit.
An example is std::error_category, which is overridden in LLVM
and causing failures after a self build with WPD enabled, because
libstdc++ contains hidden overrides of the virtual base class methods.
Differential Revision: https://reviews.llvm.org/D96919
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
The recent commit 00a6254 "Stop traping on sNaN in builtin_isnan" changed the
lowering in constrained FP mode of builtin_isnan from an FP comparison to
integer operations to avoid trapping.
SystemZ has a special instruction "Test Data Class" which is the preferred
way to do this check. This patch adds a new target hook "testFPKind()" that
lets SystemZ emit the s390_tdc intrinsic instead.
testFPKind() takes the BuiltinID as an argument and is expected to soon
handle more opcodes than just 'builtin_isnan'.
Review: Thomas Preud'homme, Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D96568
Add the types for the RISC-V V extension builtins.
These types will be used by the RISC-V V intrinsics which require
types of the form <vscale x 1 x i64>(LMUL=1 element size=64) or
<vscale x 4 x i32>(LMUL=2 element size=32), etc. The vector_size
attribute does not work for us as it doesn't create a scalable
vector type. We want these types to be opaque and have no operators
defined for them. We want them to be sizeless. This makes them
similar to the ARM SVE builtin types. But we will have quite a bit
more types. This patch adds around 60. Later patches will add
another 230 or so types representing tuples of these types similar
to the x2/x3/x4 types in ARM SVE. But with extra complexity that
these types are combined with the LMUL concept that is unique to
RISCV.
For more background see this RFC
http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html
Authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D92715
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.
Differential Revision: https://reviews.llvm.org/D96597
OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.
Differential Revision: https://reviews.llvm.org/D86119
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.
This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.
A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.
I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).
Differential Revision: https://reviews.llvm.org/D76342
This takes advantage of the implicit default behavior to reduce the number of
attributes, which in turns reduces compilation time. I've observed -3% in
instruction count when compiling sqlite3 amalgamation with -O0
Differential Revision: https://reviews.llvm.org/D96400
This is a follow up of D92940.
We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93179
This patch ensures that vector predication and vectorization width
pragmas work together correctly/as expected. Specifically, this patch
fixes the issue that when vectorization_width > 1, the vector
predication behaviour (this would matter if it has NOT been disabled
explicitly by a pragma) was getting ignored, which was incorrect.
The fix here removes the dependence of vector predication on the
vectorization width. The loop metadata corresponding to clang loop
pragma vectorize_predicate is always emitted, if the pragma is
specified, even if vectorization is disabled by vectorize_width(1)
or vectorize(disable) since the option is also used for interleaving
by the LoopVectorize pass.
Reviewed By: dmgreen, Meinersbur
Differential Revision: https://reviews.llvm.org/D94779
This patch adds 2 new options to control when Clang adds `mustprogress`:
1. -ffinite-loops: assume all loops are finite; mustprogress is added
to all loops, regardless of the selected language standard.
2. -fno-finite-loops: assume no loop is finite; mustprogress is not
added to any loop or function. We could add mustprogress to
functions without loops, but we would have to detect that in Clang,
which is probably not worth it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D96419
class types.
The goal is to provide a way to bypass constructor homing when emitting
class definitions and force class definitions in the debug info.
Not sure about the wording of the attribute, or whether it should be
specific to classes with constructors
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
EarlyCSEPass called after msan redices code size by about 10%.
Similar optimization exists for legacy pass manager in
addGeneralOptsForMemorySanitizer.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D96406
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.
Differential Revision: https://reviews.llvm.org/D95974
Signed prefix is removed and the single word spelling is
printed for the scalar types.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96161
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
As Itanium ABI[http://itanium-cxx-abi.github.io/cxx-abi/abi.html#once-ctor]
points out:
"The size of the guard variable is 64 bits. The first byte (i.e. the byte at
the address of the full variable) shall contain the value 0 prior to
initialization of the associated variable, and 1 after initialization is complete."
Differential Revision: https://reviews.llvm.org/D95822
Pipe element type spelling for arg info metadata
should follow the same behavior as normal type spelling.
We should only use the canonical type spelling in the
base type field.
This patch also removed duplication in type handling.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96151
When a function or a file is excluded using -fprofile-list= option,
don't emit coverage mapping as doing so confuses users since those
functions would always have zero count. This also reduces the binary
size considerably in cases where only a few functions or files are
being instrumented.
Differential Revision: https://reviews.llvm.org/D96000
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variable of external device variables are always
emitted as undefined symbols, which need to resolve to a global
symbols.
Managed variables need to be emitted as undefined symbols
in device compilations.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95901
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D95948
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.
[hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
This reverts commit 4874ff0241.
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
Clang usually propagates counter mapping region for conditions of `if`, `while`,
`for`, etc from parent counter. We should do the same for condition of conditional operator.
Differential Revision: https://reviews.llvm.org/D95918
Extract registering device variable to CUDA runtime codegen function since it
will be called in multiple places.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95558
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the X86 specific builtins.
Differential Revision: https://reviews.llvm.org/D94614
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
Normally, Clang will not make dllimport functions available for inlining
if they reference non-imported symbols, as this can lead to confusing
link errors. But if the function is marked always_inline, the user
presumably knows what they're doing and the attribute should be honored.
Differential revision: https://reviews.llvm.org/D95673
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
with fix to test case and stringrefs.
Currently (for codeview) lambdas have a string like `<lambda_0>` in
their mangled name, and don't have any display name. This change uses the
`<lambda_0>` as the display name, which helps distinguish between lambdas
in -gline-tables-only, since there are no linkage names there.
It also changes how we display lambda names; previously we used
`<unnamed-tag>`; now it will show `<lambda_0>`.
I added a function to the mangling context code to create this string;
for Itanium it just returns an empty string.
Bug: https://bugs.llvm.org/show_bug.cgi?id=48432
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95187
This reverts 9b21d4b943
Currently (for codeview) lambdas have a string like `<lambda_0>` in
their mangled name, and don't have any display name. This change uses the
`<lambda_0>` as the display name, which helps distinguish between lambdas
in -gline-tables-only, since there are no linkage names there.
It also changes how we display lambda names; previously we used
`<unnamed-tag>`; now it will show `<lambda_0>`.
I added a function to the mangling context code to create this string;
for Itanium it just returns an empty string.
Bug: https://bugs.llvm.org/show_bug.cgi?id=48432
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95187
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.
In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D94735
`getLineNumber()` picks CurLoc if the parameter is invalid. This appears to
mainly work around missing SourceLocation information for some constructs, but
sometimes adds unintended locations.
* For `CodeGenObjC/debug-info-blocks.m`, `CurLoc` has been advanced to the closing brace. The debug line of `ImplicitVarParameter` is set to the line of `}` because this implicit parameter has an invalid `SourceLocation`. The debug line is a bit arbitrary - perhaps the location of `^{` is better.
* The file/line of Clang synthesized `__va_list_tag` is arbitrarily attached a `#include` line. D94735
Drop the special case to make getLineNumber less magic and add CurLoc fallback in its callers instead.
Tested with stage 2 -DCMAKE_BUILD_TYPE=Debug clang, byte identical.
Reviewed By: #debug-info, aprantl
Differential Revision: https://reviews.llvm.org/D94391
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Whenever we enter a new OpenMP data environment we want to enter a
function to simplify reasoning. Later we probably want to remove the
entire specialization wrt. the if clause and pass the result to the
runtime, for now this should fix PR48686.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D94315
or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end annotates calls with attribute "clang.arc.rv"="retain"
or "clang.arc.rv"="claim", which indicates the call is implicitly
followed by a marker instruction and a retainRV/claimRV call that
consumes the call result. This is currently done only when the target
is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the
annotated calls in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the annotated
calls. It doesn't remove the attribute on the call since the backend
needs it to emit the marker instruction. The retainRV/claimRV calls
are emitted late in the pipeline to prevent optimization passes from
transforming the IR in a way that makes it harder for the ARC
middle-end passes to figure out the def-use relationship between the
call and the retainRV/claimRV calls (which is the cause of PR31925).
- The function inliner removes the autoreleaseRV call in the callee that
returns the result if nothing in the callee prevents it from being
paired up with the calls annotated with "clang.arc.rv"="retain/claim"
in the caller. If the call is annotated with "claim", a release call
is inserted since autoreleaseRV+claimRV is equivalent to a release. If
it cannot find an autoreleaseRV call, it tries to transfer the
attributes to a function call in the callee. This is important since
ARC optimizer can remove the autoreleaseRV call returning the callee
result, which makes it impossible to pair it up with the retainRV or
claimRV call in the caller. If that fails, it simply emits a retain
call in the IR if the call is annotated with "retain" and does nothing
if it's annotated with "claim".
- This patch teaches dead argument elimination pass not to change the
return type of a function if any of the calls to the function are
annotated with attribute "clang.arc.rv". This is necessary since the
pass can incorrectly determine nothing in the IR uses the function
return, which can happen since the front-end no longer explicitly
emits retainRV/claimRV calls in the IR, and change its return type to
'void'.
Future work:
- Use the attribute on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the attributes.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
In the PPC32 SVR4 ABI, a va_list has copies of registers from the function call.
va_arg looked in the wrong registers for (the pointer representation of) an
object in Objective-C, and for some types in C++. Fix va_arg to look in the
general-purpose registers, not the floating-point registers. Also fix va_arg
for some C++ types, like a member function pointer, that are aggregates for
the ABI.
Anthony Richardby found the problem in Objective-C. Eli Friedman suggested
part of this fix.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47921
Reviewed By: efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D90329
Most of CGExprConstant.cpp is using the CharUnits abstraction
and is using getCharWidth() (directly of indirectly) when converting
between size of a char and size in bits. This patch is making that
abstraction more consistent by adding CharTy to the CodeGenTypeCache
(honoring getCharWidth() when mapping from char to LLVM IR types,
instead of using Int8Ty directly).
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94979
When using getByteArrayType the requested size is calculated in
char units, but the type used for the array was hardcoded to the
Int8Ty. This patch is using getCharWIdth a bit more consistently
by using getIntNTy in combination with getCharWidth, instead
of explictly using getInt8Ty.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94977
This patch implements codegen for __managed__ variable attribute for HIP.
Diagnostics will be added later.
Differential Revision: https://reviews.llvm.org/D94814
check
This patch fixes a bug in emitARCOperationAfterCall where it inserts the
fall-back call after a bitcast instruction and then replaces the
bitcast's operand with the result of the fall-back call. The generated
IR without this patch looks like this:
msgSend.call: ; preds = %entry
%call = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend
br label %msgSend.cont
msgSend.null-receiver: ; preds = %entry
call void @llvm.objc.release(i8* %4)
br label %msgSend.cont
msgSend.cont:
%8 = phi i8* [ %call, %msgSend.call ], [ null, %msgSend.null-receiver ]
%9 = bitcast i8* %10 to %0*
%10 = call i8* @llvm.objc.retain(i8* %8)
Notice that `%9 = bitcast i8* %10` to %0* is taking operand %10 which is
defined after it.
To fix the bug, this patch modifies the insert point to point to the
bitcast instruction so that the fall-back call is inserted before the
bitcast. In addition, it teaches the function to look at phi
instructions that are generated when there is a check for a null
receiver and insert the retainRV/claimRV instruction right after the
call instead of inserting a fall-back call right after the phi
instruction.
rdar://73360225
Differential Revision: https://reviews.llvm.org/D95181
CodeGenModule::EmitNullConstant() creates constants with their "in memory"
type, not their "in vregs" type. The one place where this difference matters is
when the type is _Bool, as that is an i1 when in vregs and an i8 in memory.
Fixes: rdar://73361264
Summary:
The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D94806
Combined with 'da98651 - Revert "DR2064:
decltype(E) is only a dependent', this change (5a391d3) caused verifier
errors when building Chromium. See https://crbug.com/1168494#c1 for a
reproducer.
Additionally it reverts changes that were dependent on this one, see
below.
> Following up on PR48517, fix handling of template arguments that refer
> to dependent declarations.
>
> Treat an id-expression that names a local variable in a templated
> function as being instantiation-dependent.
>
> This addresses a language defect whereby a reference to a dependent
> declaration can be formed without any construct being value-dependent.
> Fixing that through value-dependence turns out to be problematic, so
> instead this patch takes the approach (proposed on the core reflector)
> of allowing the use of pointers or references to (but not values of)
> dependent declarations inside value-dependent expressions, and instead
> treating template arguments as dependent if they evaluate to a constant
> involving such dependent declarations.
>
> This ends up affecting a bunch of OpenMP tests, due to OpenMP
> imprecisely handling instantiation-dependent constructs, bailing out
> early instead of processing dependent constructs to the extent possible
> when handling the template.
>
> Previously committed as 8c1f2d15b8, and
> reverted because a dependency commit was reverted.
This reverts commit 5a391d38ac.
It also restores clang/test/SemaCXX/coroutines.cpp to its state before
da986511fb.
Revert "[c++20] P1907R1: Support for generalized non-type template arguments of scalar type."
> Previously committed as 9e08e51a20, and
> reverted because a dependency commit was reverted. This incorporates the
> following follow-on commits that were also reverted:
>
> 7e84aa1b81 by Simon Pilgrim
> ed13d8c667 by me
> 95c7b6cadb by Sam McCall
> 430d5d8429 by Dave Zarzycki
This reverts commit 4b574008ae.
Revert "[msabi] Mangle a template argument referring to array-to-pointer decay"
> [msabi] Mangle a template argument referring to array-to-pointer decay
> applied to an array the same as the array itself.
>
> This follows MS ABI, and corrects a regression from the implementation
> of generalized non-type template parameters, where we "forgot" how to
> mangle this case.
This reverts commit 18e093faf7.
OMP_MAP_TARGET_PARAM flag is used to mark the data that shoud be passed
as arguments to the target kernels, nothing else. But the compiler still
marks the data with OMP_MAP_TARGET_PARAM flags even if the data is
passed to the data movement directives, like target data, target update
etc. This flag is just ignored for this directives and the compiler does
not need to emit it.
Reviewed By: cchen
Differential Revision: https://reviews.llvm.org/D91261
D94745 rewrites the `deviceRTLs` using OpenMP and compiles it by directly
calling the device compilation. `clang` crashes because entry in
`OffloadEntriesDeviceGlobalVar` is unintialized. Current design supposes the
device compilation can only be invoked after host compilation with the host IR
such that `clang` can initialize `OffloadEntriesDeviceGlobalVar` from host IR.
This avoids us using device compilation directly, especially when we only have
code wrapped into `declare target` which are all device code. The same issue
also exists for `OffloadEntriesInfoManager`.
In this patch, we simply initialized an entry if it is not in the maps. Not sure
we need an option to tell the device compiler that it is invoked standalone.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94871
Previously committed as 9e08e51a20, and
reverted because a dependency commit was reverted. This incorporates the
following follow-on commits that were also reverted:
7e84aa1b81 by Simon Pilgrim
ed13d8c667 by me
95c7b6cadb by Sam McCall
430d5d8429 by Dave Zarzycki
for function scopes, rather than using the qualified name.
In line-tables-only mode, we used to emit qualified names as the display name for functions when using CodeView.
This patch changes to emitting the parent scopes instead, with forward declarations for class types.
The total object file size ends up being slightly smaller than if we use the full qualified names.
Differential Revision: https://reviews.llvm.org/D94639
Under -mabi=ieeelongdouble on PowerPC, IEEE-quad floating point semantic
is used for long double. This patch mutates call to related builtins
into f128 version on PowerPC. And in theory, this should be applied to
other targets when their backend supports IEEE 128-bit style libcalls.
GCC already has these mutations except nansl, which is not available on
PowerPC along with other variants (nans, nansf).
Reviewed By: RKSimon, nemanjai
Differential Revision: https://reviews.llvm.org/D92080
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
Move nomerge attribute from function declaration/definition to callsites to
allow virtual function calls attach the attribute.
Differential Revision: https://reviews.llvm.org/D94537
The patch adds the required methods to FixedPointBuilder
for converting between fixed-point and floating point,
and uses them from Clang.
This depends on D54749.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D86632
VLST return values are coerced to VLATs in the function epilog for
consistency with the VLAT ABI. Previously, this coercion was done
through memory. It is preferable to use the
llvm.experimental.vector.insert intrinsic to avoid going through memory
here.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D94290
ELF -fno-pic sets dso_local on a function declaration to allow direct accesses
when taking its address (similar to a data symbol). The emitted code follows the
traditional GCC/Clang -fno-pic behavior: an absolute relocation is produced.
If the function is not defined in the executable, a canonical PLT entry will be
needed at link time. This is similar to a copy relocation and is incompatible
with (-Bsymbolic or --dynamic-list linked shared objects / protected symbols in
a shared object).
This patch gives -fno-pic code a way to avoid such a canonical PLT entry.
The FIXME was about a generalization for -fpie -mpie-copy-relocations (now -fpie
-fdirect-access-external-data). While we could set dso_local to avoid GOT when
taking the address of a function declaration (there is an ignorable difference
about R_386_PC32 vs R_386_PLT32 on i386), it likely does not provide any benefit
and can just cause trouble, so we don't make the generalization.
D92633 added -f[no-]direct-access-external-data to supersede -m[no-]pie-copy-relocations.
(The option works for -fpie but is a no-op for -fno-pic and -fpic.)
This patch makes -fno-pic -fno-direct-access-external-data drop dso_local from
global variable declarations. This usually causes the backend to emit a GOT
indirection for external data access. With a GOT relocation, the subsequent
-no-pie link will not have copy relocation even if the data symbol turns out to
be defined by a shared object.
Differential Revision: https://reviews.llvm.org/D92714
GCC r218397 "x86-64: Optimize access to globals in PIE with copy reloc" made
-fpie code emit R_X86_64_PC32 to reference external data symbols by default.
Clang adopted -mpie-copy-relocations D19996 as a flexible alternative.
The name -mpie-copy-relocations can be improved [1] and does not capture the
idea that this option can apply to -fno-pic and -fpic [2], so this patch
introduces -f[no-]direct-access-external-data and makes -mpie-copy-relocations
their aliases for compatibility.
[1]
For
```
extern int var;
int get() { return var; }
```
if var is defined in another translation unit in the link unit, there is no copy
relocation.
[2]
-fno-pic -fno-direct-access-external-data is useful to avoid copy relocations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888
If a shared object is linked with -Bsymbolic or --dynamic-list and exports a
data symbol, normally the data symbol cannot be accessed by -fno-pic code
(because by default an absolute relocation is produced which will lead to a copy
relocation). -fno-direct-access-external-data can prevent copy relocations.
-fpic -fdirect-access-external-data can avoid GOT indirection. This is like the
undefined counterpart of -fno-semantic-interposition. However, the user should
define var in another translation unit and link with -Bsymbolic or
--dynamic-list, otherwise the linker will error in a -shared link. Generally
the user has better tools for their goal but I want to mention that this
combination is valid.
On COFF, the behavior is like always -fdirect-access-external-data.
`__declspec(dllimport)` is needed to enable indirect access.
There is currently no plan to affect non-ELF behaviors or -fpic behaviors.
-fno-pic -fno-direct-access-external-data will be implemented in the subsequent patch.
GCC feature request https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D92633
Like @aprantl suggested, modify to use the canonicalized DIFile, if we
don't know the loc info and filename for the compiler generated
functions for example static initialization functions.
Reviewed By: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D87147
Match the legacy PM in running various ObjC ARC passes.
This requires making some module passes into function passes. These were
initially ported as module passes since they add function declarations
(e.g. https://reviews.llvm.org/D86178), but that's still up for debate
and other passes do so.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D93743
@ikudrin enabled support for dwarf64 in D87011. Adding a clang flag so it can be used through that compilation pass.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D90507
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
foo();
} catch (int n) {
...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
This patch adds support for two new variants of the vectorize_width
pragma:
1. vectorize_width(X[, fixed|scalable]) where an optional second
parameter is passed to the vectorize_width pragma, which indicates if
the user wishes to use fixed width or scalable vectorization. For
example the user can now write something like:
#pragma clang loop vectorize_width(4, fixed)
or
#pragma clang loop vectorize_width(4, scalable)
In the absence of a second parameter it is assumed the user wants
fixed width vectorization, in order to maintain compatibility with
existing code.
2. vectorize_width(fixed|scalable) where the width is left unspecified,
but the user hints what type of vectorization they prefer, either
fixed width or scalable.
I have implemented this by making use of the LLVM loop hint attribute:
llvm.loop.vectorize.scalable.enable
Tests were added to
clang/test/CodeGenCXX/pragma-loop.cpp
for both the 'fixed' and 'scalable' optional parameter.
See this thread for context: http://lists.llvm.org/pipermail/cfe-dev/2020-November/067262.html
Differential Revision: https://reviews.llvm.org/D89031
This reduces the number of `WinX86_64ABIInfo::classify` call sites from
3 to 1. The call sites were similar, but passed different values for
FreeSSERegs. Use variables instead of `if`s to manage that argument.
getAs<> can return null if the cast is invalid, which can lead to null pointer deferences. Use castAs<> instead which will assert that the cast is valid.
This is an enhancement to LLVM Source-Based Code Coverage in clang to track how
many times individual branch-generating conditions are taken (evaluate to TRUE)
and not taken (evaluate to FALSE). Individual conditions may comprise larger
boolean expressions using boolean logical operators. This functionality is
very similar to what is supported by GCOV except that it is very closely
anchored to the ASTs.
Differential Revision: https://reviews.llvm.org/D84467
VLST arguments are coerced to VLATs at the function boundary for
consistency with the VLAT ABI. They are then bitcast back to VLSTs in
the function prolog. Previously, this conversion is done through memory.
With the introduction of the llvm.vector.{insert,extract} intrinsic, we
can avoid going through memory here.
Depends on D92761
Differential Revision: https://reviews.llvm.org/D92762
As a follow-up to D93656, I'm switching the Clang UniqueInternalLinkageNamesPass scheduling to using the LLVM one with newpm.
Test Plan:
Reviewed By: aeubanks, tmsriram
Differential Revision: https://reviews.llvm.org/D94019
[libomptarget][amdgpu] Call into deviceRTL instead of ockl
Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D93356
Add powerpcle support to clang.
For FreeBSD, assume a freestanding environment for now, as we only need it in the first place to build loader, which runs in the OpenFirmware environment instead of the FreeBSD environment.
For Linux, recognize glibc and musl environments to match current usage in Void Linux PPC.
Adjust driver to match current binutils behavior regarding machine naming.
Adjust and expand tests.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93919
The idea is that the CC1 default for ELF should set dso_local on default
visibility external linkage definitions in the default -mrelocation-model pic
mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar.
The refactoring is made available by 2820a2ca3a.
Currently only x86 supports local aliases. We move the decision to the driver.
There are three CC1 states:
* -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable.
* (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local
* -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out.
Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.
* static relocation model: always
* other relocation models: if isStrongDefinitionForLinker
This will make LLVM IR emitted for COFF/Mach-O and executable ELF similar.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This simplifies TargetMachine::shouldAssumeDSOLocal and and gives frontend the
decision to use dso_local. For LLVM synthesized functions/globals, they may lose
inferred dso_local but such optimizations are probably not very useful.
Note: the hasComdat() condition in canBenefitFromLocalAlias (D77429) may be dead now.
(llvm/CodeGen/X86/semantic-interposition-comdat.ll)
(Investigate whether we need test coverage when Fuchsia C++ ABI is clearer)
UBSan was using the complete-object align rather than nv alignment
when checking the "this" pointer of a method.
Furthermore, CGF.CXXABIThisAlignment was also being set incorrectly,
due to an incorrectly negated test. The latter doesn't appear to have
had any impact, due to it not really being used anywhere.
Differential Revision: https://reviews.llvm.org/D93072
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.
Depends on D93771.
Differential Revision: https://reviews.llvm.org/D93775
function when the receiver is nil
Callee-destroyed arguments to a method have to be destroyed in the
caller function when the receiver is nil as the method doesn't get
executed. This fixes PR48207.
rdar://71808391
Differential Revision: https://reviews.llvm.org/D93273
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
Add a special case for handling __builtin_mul_overflow with unsigned
inputs and a signed output to avoid emitting the __muloti4 library
call on x86_64. __muloti4 is not implemented in libgcc, so avoiding
this call fixes compilation of some programs that call
__builtin_mul_overflow with these arguments.
For example, this fixes the build of cpio with clang, which includes code from
gnulib that calls __builtin_mul_overflow with these argument types.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D84405
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
This change makes use of the llvm.vector.extract intrinsic to avoid
going through memory when performing bitcasts between vector-length
agnostic types and vector-length specific types.
Depends on D91362
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D92761
The `assume` attribute is a way to provide additional, arbitrary
information to the optimizer. For now, assumptions are restricted to
strings which will be accumulated for a function and emitted as comma
separated string function attribute. The key of the LLVM-IR function
attribute is `llvm.assume`. Similar to `llvm.assume` and
`__builtin_assume`, the `assume` attribute provides a user defined
assumption to the compiler.
A follow up patch will introduce an LLVM-core API to query the
assumptions attached to a function. We also expect to add more options,
e.g., expression arguments, to the `assume` attribute later on.
The `omp [begin] asssumes` pragma will leverage this attribute and
expose the functionality in the absence of OpenMP.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D91979
This patch enables the Clang type __vector_pair and its associated LLVM
intrinsics even when MMA is disabled. With this patch, the type is now controlled
by the PPC paired-vector-memops option. The builtins and intrinsics will be
renamed to drop the mma prefix in another patch.
Differential Revision: https://reviews.llvm.org/D91819
This abstracts away the members that are being replaced in a follow-up patch.
Depends on D83979.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D93214
This patch adds the functionality to compare BFI counts with real
profile
counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.
Differential Revision: https://reviews.llvm.org/D91813
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.
Differential Revision: https://reviews.llvm.org/D92940
Background: Call to library arithmetic functions for div is emitted by the
compiler and it set wrong “C” calling convention for calls to these functions,
whereas library functions are declared with `spir_function` calling convention.
InstCombine optimization replaces such calls with “unreachable” instruction.
It looks like clang lacks SPIRABIInfo class which should specify default
calling conventions for “system” function calls. SPIR supports only
SPIR_FUNC and SPIR_KERNEL calling convention.
Reviewers: Erich Keane, Anastasia
Differential Revision: https://reviews.llvm.org/D92721
I tried to put it in the same place in the pipeline as the legacy PM.
Fixes PR48399.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D93002
This patch adds vcmla and the rotated variants as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]
The *_lane_* are still missing, but they can be added separately.
This patch only adds the builtin mapping for AArch64.
[1] https://developer.arm.com/documentation/ihi0073/latest
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92930
The new PM is considered stable and many downstream groups have adopted it (some
have adopted it for more than two years). Add -f[no-]legacy-pass-manager to reflect the
fact that it is no longer experimental and the legacy pass manager is something we strive to retire.
In the future, when the legacy PM eventually goes away,
-fno-experimental-new-pass-manager and -flegacy-pass-manager will be removed.
This patch also changes -f[no-]legacy-pass-manager to pass `-plugin-opt={new,legacy}-pass-manager` to the linker (supported by both ld.lld and LLVMgold.so) when -flto/-flto=thin is specified
Reviewed By: aeubanks, rsmith
Differential Revision: https://reviews.llvm.org/D92915
Swiftcall does it's own target-independent argument type classification,
since it is not designed to be ABI compatible with anything local on the
target that isn't LLVM-based. This means it never uses inalloca.
However, we have duplicate logic for checking for inalloca parameters
that runs before call argument setup. This logic needs to know ahead of
time if inalloca will be used later, and we can't move the
CGFunctionInfo calculation earlier.
This change gets the calling convention from either the
FunctionProtoType or ObjCMethodDecl, checks if it is swift, and if so
skips the stackbase setup.
Depends on D92883.
Differential Revision: https://reviews.llvm.org/D92944
This template exists to abstract over FunctionPrototype and
ObjCMethodDecl, which have similar APIs for storing parameter types. In
place of a template, use a PointerUnion with two cases to handle this.
Hopefully this improves readability, since the type of the prototype is
easier to discover. This allows me to sink this code, which is mostly
assertions, out of the header file and into the cpp file. I can also
simplify the overloaded methods for computing isGenericMethod, and get
rid of the second EmitCallArgs overload.
Differential Revision: https://reviews.llvm.org/D92883
Currently, -ftime-report + new pass manager emits one line of report for each
pass run. This potentially causes huge output text especially with regular LTO
or large single file (Obeserved in private tests and was reported in D51276).
The behaviour of -ftime-report + legacy pass manager is
emitting one line of report for each pass object which has relatively reasonable
text output size. This patch adds a flag `-ftime-report=` to control time report
aggregation for new pass manager.
The flag is for new pass manager only. Using it with legacy pass manager gives
an error. It is a driver and cc1 flag. `per-pass` is the new default so
`-ftime-report` is aliased to `-ftime-report=per-pass`. Before this patch,
functionality-wise `-ftime-report` is aliased to `-ftime-report=per-pass-run`.
* Adds an boolean variable TimePassesHandler::PerRun to control per-pass vs per-pass-run.
* Adds a new clang CodeGen flag CodeGenOptions::TimePassesPerRun to work with the existing CodeGenOptions::TimePasses.
* Remove FrontendOptions::ShowTimers, its uses are replaced by the existing CodeGenOptions::TimePasses.
* Remove FrontendTimesIsEnabled (It was introduced in D45619 which was largely reverted.)
Differential Revision: https://reviews.llvm.org/D92436
This test shows we're in some cases not getting strictfp information from
the AST. Correct that.
Differential Revision: https://reviews.llvm.org/D92596
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
getFieldOffset(layoutStartOffset) is expected to point to the first trivial
field or the one which follows non-trivial. So it must be byte aligned already.
However this is not obvious without assumptions about callers.
This patch will avoid the need in such assumptions.
Depends on D92727.
Differential Revision: https://reviews.llvm.org/D92728
Such fields will likely have offset zero making
__sanitizer_dtor_callback poisoning wrong regions.
E.g. it can poison base class member from derived class constructor.
Differential Revision: https://reviews.llvm.org/D92727
PPCMCInstLower does not actually call shouldAssumeDSOLocal for ppc32 so this is dead.
Actually Clang ppc32 does produce a pair of absolute relocations which match GCC.
This also fixes a comment (R_PPC_COPY and R_PPC64_COPY do exist).
shouldRTTIBeUnique() returns false for iOS64CXXABI, which causes
RTTI objects to be emitted hidden. Update two tests that didn't
expect this to happen for the default triple.
Also rename iOS64CXXABI to AppleARM64CXXABI, since it's used for
arm64-apple-macos triples too.
Part of PR46644.
Differential Revision: https://reviews.llvm.org/D91904
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.
One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.
With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.
To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.
Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.
Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91756
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
It doesn't use the GCC personality rountine, because the
interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D91455
In C++ when a reference variable is captured by copy, the lambda
is supposed to make a copy of the referenced variable in the captures
and refer to the copy in the lambda. Therefore, it is valid to capture
a reference to a host global variable in a device lambda since the
device lambda will refer to the copy of the host global variable instead
of access the host global variable directly.
However, clang tries to avoid capturing of reference to a host global variable
if it determines the use of the reference variable in the lambda function is
not odr-use. Clang also tries to emit load of the reference to a global variable
as load of the global variable if it determines that the reference variable is
a compile-time constant.
For a device lambda to capture a reference variable to host global variable
and use the captured value, clang needs to be taught that in such cases the use of the reference
variable is odr-use and the reference variable is not compile-time constant.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D91088
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.
Reviewed By: jdoerfert, llitchev
Differential Revision: https://reviews.llvm.org/D92189
Commit 6b1341eb fixed alignment for 128-bit FP types on PowerPC.
However, the quadword alignment adjustment shouldn't be applied to IBM
extended double (ppc_fp128 in IR) values.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92278
Methods synthesized from declared properties were being added to the
method lists twice. This came from the change to list them in the
class's method list, which missed removing the place in CGObjCGNU that
added them again.
Reviewed By: lanza
Differential Revision: https://reviews.llvm.org/D91874
Thanks to D77248, we can bypass the use of stubs altogether and use PLT
relocations if they are available for the target. LLVM and LLD support the
R_AARCH64_PLT32 relocation, so we can also guarantee a static PLT relocation on AArch64.
Not emitting these stubs saves a lot of extra binary size.
Differential Revision: https://reviews.llvm.org/D83812
After D17993, with -fno-delete-null-pointer-checks we add the dereferenceable attribute to the `this` pointer.
We have observed that one internal target which worked before fails even with -fno-delete-null-pointer-checks.
Switching to dereferenceable_or_null fixes the problem.
dereferenceable currently does not always respect NullPointerIsValid and may
imply nonnull and lead to aggressive optimization. The optimization may be
related to `CallBase::isReturnNonNull`, `Argument::hasNonNullAttr`, or
`Value::getPointerDereferenceableBytes`. See D66664 and D66618 for some discussions.
Reviewed By: bkramer, rsmith
Differential Revision: https://reviews.llvm.org/D92297
This change introduces a new clang switch `-fpseudo-probe-for-profiling` to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.
The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86502
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the non-target specific
builtins.
Differential Revision: https://reviews.llvm.org/D92122
This patch enables vector type arguments on AIX. All non-aggregate Altivec vector types are 16bytes in size and are 16byte aligned.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D92117
This code got quite twisted because we consider some MSVC builtins to be
target agnostic, and some to be target specific. Target specific
intrinsics have a pattern of doing up-front argument evaluation, while
general intrinsics do not evaluate their arguments up front. As we tried
to share codepaths between the target-specific and target-agnostic
handling, we ended up doing double evaluation.
Instead, have each target handle MSVC intrinsics consistently before up
front argument evaluation. This requires passing less data around and is
more consistent with target independent intrinsic handling.
See D50979 for past examples of this bug. I noticed this while looking
into adding some more intrinsics.
Differential Revision: https://reviews.llvm.org/D92061
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.
Reviewed By: Xiangling_L, DiggerLin
Differential Revision: https://reviews.llvm.org/D89684
Add a Visited set to avoid repeatedly processing the same base classes
in complex class hierarchies. This cut down the compile time of one
source file from >12min to ~1min.
Differential Revision: https://reviews.llvm.org/D91676
Recently HIP toolchain made a change to use clang instead of opt/llc to do compilation
(https://reviews.llvm.org/D81861). The intention is to make HIP toolchain canonical like
other toolchains.
However, this change introduced an unintentional change regarding backend fp fuse
option, which caused regressions in some HIP applications.
Basically before the change, HIP toolchain used clang to generate bitcode, then use
opt/llc to optimize bitcode and generate ISA. As such, the amdgpu backend takes
the default fp fuse mode which is 'Standard'. This mode respect contract flag of
fmul/fadd instructions and do not fuse fmul/fadd instructions without contract flag.
However, after the change, HIP toolchain now use clang to generate IR, do optimization,
and generate ISA as one process. Now amdgpu backend fp fuse option is determined
by -ffp-contract option, which is 'fast' by default. And this -ffp-contract=fast language option
is translated to 'Fast' fp fuse option in backend. Suddenly backend starts to fuse fmul/fadd
instructions without contract flag.
This causes wrong result for some device library functions, e.g. tan(-1e20), which should
return 0.8446, now returns -0.933. What is worse is that since backend with 'Fast' fp fuse
option does not respect contract flag, there is no way to use #pragma clang fp contract
directive to enforce fp contract requirements.
This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract
and use it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in frontend but
let the backend to use 'Standard' fp fuse option. 'fast-honor-pragmas' is useful since 'Fast'
fp fuse option in backend does not honor contract flag, it is of little use to HIP
applications since all code with #pragma STDC FP_CONTRACT or any IR from a
source compiled with -ffp-contract=on is broken.
Differential Revision: https://reviews.llvm.org/D90174
Ensure that the DSO Locality of the globals in the IR is derived from
their final visibility when using -fvisibility-from-dllstorageclass.
To accomplish this we reset the DSO locality of globals (before
setting their visibility from their dllstorageclass) at the end of
IRGen in Clang. This removes any effects that visibility options or
annotations may have had on the DSO locality.
The resulting DSO locality of the globals will be pessimistic
w.r.t. to the normal compiler IRGen.
Differential Revision: https://reviews.llvm.org/D91779
After fix for PR48174 the base pointer for pointer-based
array-sections/array-subscripts will be emitted as `&ptr[idx]`, but
actually it should be just `ptr`, i.e. the address stored in the ponter
to point correctly to the beginning of the array. Currently it may lead
to a crash in the runtime.
Differential Revision: https://reviews.llvm.org/D91805
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D84345
This moves handling of alwaysinline, coroutines, matrix lowering, PGO,
and LTO-required passes into PassBuilder. Much of this is replicated
between Clang and opt. Other out-of-tree users also replicate some of
this, such as Rust [1] replicating the alwaysinline, LTO, and PGO
passes.
The LTO passes are also now run in
build(Thin)LTOPreLinkDefaultPipeline() since they are semantically
required for (Thin)LTO.
[1]: f5230fbf76/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L896)
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D91585
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.
Reviewers: jdoerfert grokos
Differential Revision: https://reviews.llvm.org/D87946
According to ELF v2 ABI, both IEEE 128-bit and IBM extended floating
point variables should be quad-word (16 bytes) aligned. Previously, only
vector types are considered aligned as quad-word on PowerPC.
This patch will fix incorrectness of IEEE 128-bit float argument in
va_arg cases.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D91596
Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D89802
The compiler should treat array subscript with base pointer as a first
pointer in complex data, it is used only for member expression with base
pointer.
Differential Revision: https://reviews.llvm.org/D91660
Matrix types in memory are represented as arrays, but accessed through
vector pointers, with the alignment specified on the access operation.
For inline assembly, update pointer arguments to use vector pointers.
Otherwise there will be a mis-match if the matrix is also an
input-argument which is represented as vector.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D91631
If the data member pointer is mapped, the compiler tries to optimize the
mapping of such data by discarding explicit mapping flags and trying to
emit combined data instead. In some cases, this optimization is not
quite correctly implemented and it leads to a program crash at the
runtime. Instead, if the data member is mapped, just emit it as is and
do not emit combined mapping flags for it.
Differential Revision: https://reviews.llvm.org/D91552
Add an option -munsafe-fp-atomics for AMDGPU target.
When enabled, clang adds function attribute "amdgpu-unsafe-fp-atomics"
to any functions for amdgpu target. This allows amdgpu backend to use
unsafe fp atomic instructions in these functions.
Differential Revision: https://reviews.llvm.org/D91546
arguments.
* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
attributes, where needed
* Updates Clang12 patch notes mentioning this change
Reviewed-by: rsmith, jdoerfert
Differential Revision: https://reviews.llvm.org/D17993
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.
As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).
There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.
This patch is based on a patch by Francis Visoiu Mistrih.
Reviewed By: thegameg, paquette
Differential Revision: https://reviews.llvm.org/D91417
parameters.
It appears that LLVM isn't able to generate a DW_AT_const_value for a
constant of class type, but if it could, we'd match GCC's debug info in
this case, and in the interim we no longer crash.
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.
I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.
This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
The code has a few sequence that looked like:
Ops.push_back(Ops[0]);
Ops.erase(Ops.begin());
And are equivalent to:
std::rotate(Ops.begin(), Ops.begin() + 1, Ops.end());
The latter has the advantage of never reallocating the vector, which
would be a bug in the original code as push_back would read from the
memory it deallocated.
Make it required. Since it's a module pass, optnone won't test it, so
extend the clang test to also use opt-bisect now that it's supported.
14/16 check-dfsan tests failed with NPM enabled, now all pass.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D91385
- If an aggregate argument is indirectly accessed within kernels, direct
passing results in unpromotable `alloca`, which degrade performance
significantly. InferAddrSpace pass is enhanced in
[D91121](https://reviews.llvm.org/D91121) to take the assumption that
generic pointers loaded from the constant memory could be regarded
global ones. The need for the coercion on aggregate arguments is
mitigated.
Differential Revision: https://reviews.llvm.org/D89980
This removes lots of duplicated code which was necessary before
https://reviews.llvm.org/D89158.
Now we can use PassBuilder::runRegisteredEPCallbacks().
This is mostly sanitizers.
There is likely more that can be done to simplify, but let's start with this.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D90870
Need to check if there are map types for the components before trying to
access them when trying to modify type mappings for combined partial
mappings.
Differential Revision: https://reviews.llvm.org/D91370
For dllexported default constructors with default arguments, we export
default constructor closures which pass in the default args. (See D8331
for a good explanation.)
For templates, that means those default args must be instantiated even
if the function isn't called. That is done by the
InstantiateDefaultCtorDefaultArgs() function, but it wasn't done for
explicit specializations, causing asserts (see bug).
Differential revision: https://reviews.llvm.org/D91089
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
Reviewed By: asbirlea, Meinersbur
Differential Revision: https://reviews.llvm.org/D89158
except where they are necessary to disambiguate the target.
This substantially improves diagnostics from the standard library,
which are otherwise full of `::__1::` noise.
This enables a method sending an autorelease message to an object and
returning the object in MRR to avoid adding the object to an autorelease
pool if a call to objc_retainAutoreleasedReturnValue in the caller
function accepts the hand off of the retain count.
rdar://problem/50678052
Differential Revision: https://reviews.llvm.org/D91111
mangling support for non-type template parameters of class type and
template parameter objects.
The Itanium side of this follows the approach I proposed in
https://github.com/itanium-cxx-abi/cxx-abi/issues/47 on 2020-09-06.
The MSVC side of this was determined empirically by observing MSVC's
output.
Differential Revision: https://reviews.llvm.org/D89998
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
Reviewed By: mehdi_amini, fghanim
Differential Revision: https://reviews.llvm.org/D91109
D86841 had an error where for statements with no conditional were
required to make progress. This is not true, this patch removes that
line, and adds regression tests.
Differential Revision: https://reviews.llvm.org/D91075
In order not to modify the `tgt_target_data_update` information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload `arg` when
the maptype is set as `OMP_MAP_DESCRIPTOR`. The origin `arg` is for
passing the pointer information, however, the overloaded `arg` is an
array of descriptor_dim:
struct descriptor_dim {
int64_t offset;
int64_t count;
int64_t stride
};
and the array size is the same as dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
`arg_size` parameter by using dimension size.
For supporting `stride` in array section, we use a dummy dimension in
descriptor to store the unit size. The formula for counting the stride
in dimension D_n: `unit size * (D_0 * D_1 ... * D_n-1) * D_n.stride`.
Demonstrate how it works:
```
double arr[3][4][5];
D0: { offset = 0, count = 1, stride = 8 } // offset, count, dimension size always be 0, 1, 1 for this extra dimension, stride is the unit size
D1: { offset = 0, count = 2, stride = 8 * 1 * 2 = 16 } // stride = unit size * (product of dimension size of D0) * D1.stride = 4 * 1 * 2 = 8
D2: { offset = 2, count = 2, stride = 8 * (1 * 5) * 1 = 40 } // stride = unit size * (product of dimension size of D0, D1) * D2.stride = 4 * 5 * 1 = 20
D3: { offset = 0, count = 2, stride = 8 * (1 * 5 * 4) * 2 = 320 } // stride = unit size * (product of dimension size of D0, D1, D2) * D3.stride = 4 * 25 * 2 = 200
// X here means we need to offload this data, therefore, runtime will transfer
// data from offset 80, 96, 120, 136, 400, 416, 440, 456
// Runtime patch: https://reviews.llvm.org/D82245
// OOOOO OOOOO OOOOO
// OOOOO OOOOO OOOOO
// XOXOO OOOOO XOXOO
// XOXOO OOOOO XOXOO
```
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84192
The strictfp metadata was added to the casting AST nodes in D85960, but
we aren't using that metadata yet. This patch adds that support.
In order to avoid lots of ad-hoc passing around of the strictfp bits I
updated the IRBuilder when moving from a function that has the Expr* to a
function that lacks it. I believe we should switch to this pattern to keep
the strictfp support from being overly invasive.
For the purpose of testing that we're picking up the right metadata, I
also made my tests use a pragma to make the AST's strictfp metadata not
match the global strictfp metadata. This exposes issues that we need to
deal with in subsequent patches, and I believe this is the right method
for most all of our clang strictfp tests.
Differential Revision: https://reviews.llvm.org/D88913
For the language C++ the keyword __unaligned (a Microsoft extension) had no effect on pointers.
The reason, why there was a difference between C and C++ for the keyword __unaligned:
For C, the Method getAsCXXREcordDecl() returns nullptr. That guarantees that hasUnaligned() is called.
If the language is C++, it is not guaranteed, that hasUnaligend() is called and evaluated.
Here are some links:
The Bug: https://bugs.llvm.org/show_bug.cgi?id=47499
Thread on the cfe-dev mailing list: http://lists.llvm.org/pipermail/cfe-dev/2020-September/066783.html
Diff, that introduced the check hasUnaligned() in getNaturalTypeAlignment(): https://reviews.llvm.org/D30166
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D90630
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
In order to prevent TargetMachines from adding unnecessary optimization
passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
changed to take an OptimizationLevel, but that will be done separately.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89158
Since C++11, the C++ standard has a forward progress guarantee
[intro.progress], so all such functions must have the `mustprogress`
requirement. In addition, from C11 and onwards, loops without a non-zero
constant conditional or no conditional are also required to make
progress (C11 6.8.5p6). This patch implements these attribute deductions
so they can be used by the optimization passes.
Differential Revision: https://reviews.llvm.org/D86841
Clang now asserts for the below case:
```
void clang::CodeGen::CGOpenMPRuntime::createOffloadEntriesAndInfoMetadata(): Assertion `std::get<0>(E) && "All ordered entries must exist!"' failed.
```
The reason why Clang hit the assert is because in
`emitTargetDataCalls`, both `BeginThenGen` and `BeginElseGen` call
`registerTargetRegionEntryInfo` and try to register the Entry in
OffloadEntriesTargetRegion with same key. If changing the expression in
if clause to any constant expression, then the assert disappear. (https://godbolt.org/z/TW7haj)
The assert itself is to avoid
user from accessing elements out of bound inside `OrderedEntries` in
`createOffloadEntriesAndInfoMetadata`.
In this patch, I add a check in `registerTargetRegionEntryInfo` to avoid
register the target region more than once.
A test case that triggers assert: https://godbolt.org/z/4cnGW8
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D90704
Since glibc has supported math library functions conforming IEEE 128-bit
floating point types on some platform (like ppc64le), we can fix clang's
math builtins missing this type.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D90593
Add MMA builtin decoding. These builtins use the new PowerPC-specific types __vector_pair and __vector_quad.
So to avoid pervasive changes, we use custom type descriptors and custom decoding for these builtins.
We also use custom code generation to expand builtin calls with pointers to simpler intrinsic calls with non-pointer types.
Differential Revision: https://reviews.llvm.org/D81748
415f7ee883 had a silly typo introduced when I inlined some
code into a loop from its own function.
Original commit message:
For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.
To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.
Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:
https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)
This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.
Differential Revision: https://reviews.llvm.org/D89970
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Currently for explicit template function instantiation in CUDA/HIP device
compilation clang emits instantiated kernel with external linkage
and instantiated device function with internal linkage.
This is fine for -fno-gpu-rdc since there is only one TU.
However this causes duplicate symbols for kernels for -fgpu-rdc if
the same instantiation happen in multiple TU. Or missing symbols
if a device function calls an explicitly instantiated template function
in a different TU.
To make explicit template function instantiation work for
-fgpu-rdc we need to follow the C++ linkage paradigm, i.e.
use weak_odr linkage.
Differential Revision: https://reviews.llvm.org/D90311
The __isPlatformVersionAtLeast routine is an implementation of `if (@available)` check
that uses the _availability_version_check API on Darwin that's supported on
macOS 10.15, iOS 13, tvOS 13 and watchOS 6.
Differential Revision: https://reviews.llvm.org/D90367
415f7ee883 had LIT test failures on any build where the clang executable
was not called "clang". I have adjusted the LIT CHECKs to remove the
binary name to fix this.
Original commit message:
For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.
To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.
Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:
https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)
This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.
Differential Revision: https://reviews.llvm.org/D89970
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.
Additionally, always pass a default filename (in $cwd if a directory
name is not specified vi the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.
Change the existing memprof tests to specify log_path=stderr when that
was being relied on.
Depends on D89086.
Differential Revision: https://reviews.llvm.org/D89087
The attribute has no effect on a do statement since the path of execution
will always include its substatement.
It adds a diagnostic when the attribute is used on an infinite while loop
since the codegen omits the branch here. Since the likelihood attributes
have no effect on a do statement no diagnostic will be issued for
do [[unlikely]] {...} while(0);
Differential Revision: https://reviews.llvm.org/D89899
Make DebugLogging a member variable so that users of PassBuilder don't
need to pass it around so much.
Move call to TargetMachine::registerPassBuilderCallbacks() within
PassBuilder so users don't need to remember to call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90437
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.
Differential Revision: https://reviews.llvm.org/D90230
This patch is mainly doing two things:
1. Adding support for parentheses, making the combination of target features
more diverse;
2. Making the priority of ’,‘ is higher than that of '|' by default. So I need
to make some change with PTX Builtin function.
Differential Revision: https://reviews.llvm.org/D89184
[AMDGPU] Add __builtin_amdgcn_grid_size
Similar to D76772, loads the data from the dispatch pointer. Marked invariant.
Patch also updates the openmp devicertl to use this builtin.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D90251
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.
This changes CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47580
Differential Revision: https://reviews.llvm.org/D89072
This reverts commit 504615353f.
Previously we added support for target nowait, but target data nowait
has not been supported yet. In this patch, target data nowait will also be
wrapped into a task.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90099
Define the __vector_pair and __vector_quad types that are used to manipulate
the new accumulator registers introduced by MMA on PowerPC. Because these two
types are specific to PowerPC, they are defined in a separate new file so it
will be easier to add other PowerPC specific types if we need to in the future.
Differential Revision: https://reviews.llvm.org/D81508
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
[libomptarget][nvptx] Undef, weak shared variables
Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.
Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90248
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.
Reviewers: jdoervert
Differential Revision: https://reviews.llvm.org/D89802
In current implementation, if it requires an outer task, the mapper array will be privatized no matter whether it has mapper. In fact, when there is no mapper, the mapper array only contains number of nullptr. In the libomptarget, the use of mapper array is `if (mappers_array && mappers_array[i])`, which means we can directly set mapper array to nullptr if there is no mapper. This can avoid unnecessary data copy.
In this patch, the data privatization will not be emitted if the mapper array is nullptr. When it comes to the emit of task body, the nullptr will be used directly.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90101
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will be still copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes depend clause into account. In this patch, the nowait clause is also included.
There're two things that will be done in another patches:
1. target data nowait has not been supported yet. D90099 added the support.
2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89844
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.
I changed CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47580
Differential Revision: https://reviews.llvm.org/D89072
This is a long-delayed follow-up to
5e5b85098d.
`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.
Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.
Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.
Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.
Differential Revision: https://reviews.llvm.org/D90079
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D88645
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
non-type template parameters.
Create a unique TemplateParamObjectDecl instance for each such value,
representing the globally unique template parameter object to which the
template parameter refers.
No IR generation support yet; that will follow in a separate patch.
assembly operands."
Earlyclobbers are now excepted from this change (original commit: c78da03).
Review: Ulrich Weigand, Nick Desaulniers
Differential Revision: https://reviews.llvm.org/D87279
The patch adjusts the existing `llvm::DenseMap<unsigned, T>` and
`llvm::DenseSet<unsigned>` objects that store source locations, so
that they use `SourceLocation` directly instead of `unsigned`.
This patch relies on the `DenseMapInfo` trait added in D89719.
It also replaces the construction of `SourceLocation` objects from
the constants -1 and -2 with calls to the trait's methods `getEmptyKey`
and `getTombstoneKey` where appropriate.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D69840
This fixes miscomputation of __builtin_constant_evaluated in the
initializer of a variable that's not usable in constant expressions, but
is readable when constant-folding.
If evaluation of a constant initializer fails, we throw away the
evaluated result instead of keeping it as a non-constant-initializer
value for the variable, because it might not be a correct value.
To avoid regressions for initializers that are foldable but not formally
constant initializers, we now try constant-evaluating some globals in
C++ twice: once to check for a constant initializer (in an mode where
is_constannt_evaluated returns true) and again to determine the runtime
value if the initializer is not a constant initializer.
The name is unfortunate because it is similar to the driver option -ftest-coverage.
It turns out aside from one occurrence in a test, this option is not used.
Instead of framing the interface around whether the variable is an ICE
(which is only interesting in C++98), primarily track whether the
initializer is a constant initializer (which is interesting in all C++
language modes).
No functionality change intended.
for which it matters.
This is a step towards separating checking for a constant initializer
(in which std::is_constant_evaluated returns true) and any other
evaluation of a variable initializer (in which it returns false).
Recently commit D78699 (commit 26cfb6e562), fixed clang's behavior with respect
to passing a union type through a register to correctly follow the ABI. However,
this is an ABI breaking change with earlier versions of the clang compiler, so we
should add an -fclang-abi-compat option to address this. Additionally, the PS4 ABI
requires the older behavior, so that is added as well.
This change adds a Ver11 value to the ClangABI enum that when it is set (or the
target is the PS4 triple), we skip the ABI fix introduced in D78699.
Differential Revision: https://reviews.llvm.org/D89747
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.
> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
This reverts commit 273c299d5d.
Implementing the likelihood attributes for the iteration statements adds
a new helper function. This function can't be const qualified since
these non-modifying members aren't const qualified.
This implements the likelihood attribute for the switch statement. Based on the
discussion in D85091 and D86559 it only handles the attribute when placed on
the case labels or the default labels.
It also marks the likelihood attribute as feature complete. There are more QoI
patches in the pipeline.
Differential Revision: https://reviews.llvm.org/D89210
initialization a little smarter.
Look through casts that preserve zero-ness when determining if an
initializer is zero, so that we can handle cases like an {0} initializer
whose corresponding field is a type other than 'int'.
This patch makes sure that the instance of TypeSize comparison operator
is done with a fixed type size.
Differential Revision: https://reviews.llvm.org/D89312
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).
To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.
Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with
specialized emitting (e.g. memcpy, various math functions) still do not work.
This patch makes these functions work for `asm()` and `#pragma redefine_extname`.
glibc uses `asm()` to redirect internal libc function calls to hidden aliases.
Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall. In GCC -O1 or above, `abs` can be optimized out but we can't.
Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example:
double sin(double x) asm("real_sin");
double f(double d) { return __builtin_sin(d); }
---
According to @rsmith, the following three statements cannot be simultaneously true:
(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.
People do want (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.
So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make
the assembly level redirection.
(Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable).
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D88712
This reverts commits 683b308c07 and
8487bfd4e9.
We will go for a more restricted approach that does not give freedom to
everyone to change ABIs on whichever platform.
See the discussion on https://reviews.llvm.org/D85802.
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.
Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.
Differential Revision: https://reviews.llvm.org/D89366
Using TypeSize::getFixedSize() instead of relying upon the implicit
TypeSize->uint64_cast as the type is always fixed width.
Differential Revision: https://reviews.llvm.org/D89313
Update `clang/lib/CodeGen` to use a `MemoryBufferRef` from
`getBufferOrNone` instead of `MemoryBuffer*` from `getBuffer`. No
functionality change here.
Differential Revision: https://reviews.llvm.org/D89411
This implements the flag proposed in RFC http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.
The goal is to add a way to override the default target C++ ABI through
a compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.
In this patch:
- Store `-fc++-abi=` in a LangOpt. This isn't stored in a
CodeGenOpt because there are instances outside of codegen where Clang
needs to know what the ABI is (particularly through
ASTContext::createCXXABI), and we should be able to override the
target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
through this flag.
- Create a .def file for these ABIs to make it easier to check flag
values.
- Add an error for diagnosing bad ABI flag values.
Differential Revision: https://reviews.llvm.org/D85802
Change EmitAsmStmt() to
- Not tie physregs with the "+r" constraint, but instead add the hard
register as an input constraint. This makes "+r" and "=r":"r" look the same
in the output.
Background: Macro intensive user code may contain inline assembly
statements with multiple operands constrained to the same physreg. Such a
case (with the operand constraints "+r" : "r") currently triggers the
TwoAddressInstructionPass assertion against any extra use of a tied
register. Furthermore, TwoAddress will insert a COPY to that physreg even
though isel has already done so (for the non-tied use), which may lead to a
second redundant instruction currently. A simple fix for this is to not
emit tied physreg uses in the first place for the "+r" constraint, which is
what this patch does.
- Give an error on multiple outputs to the same physical register.
This should be reported and this is also what GCC does.
Review: Ulrich Weigand, Aaron Ballman, Jennifer Yu, Craig Topper
Differential Revision: https://reviews.llvm.org/D87279
Followup to D85191.
This changes getTypeInfoInChars to return a TypeInfoChars
struct instead of a std::pair of CharUnits. This lets the
interface match getTypeInfo more closely.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86447
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:
```
struct S1 {
short a : 1;
}
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- avoid overlapping zero-length bit-fields.
Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D72932
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.
The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.
Differential Revision: https://reviews.llvm.org/D87604
References to different declarations of the same entity aren't different
values, so shouldn't have different representations.
Recommit of e6393ee813, most recently
reverted in 9a33f027ac due to a bug caused
by ObjCInterfaceDecls not propagating availability attributes along
their redeclaration chains; that bug was fixed in
e2d4174e9c.
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
For example:
union M256 {
double d;
__m256 m;
};
extern void foo1(union M256 A);
union M256 m1;
void test() {
foo1(m1);
}
clang will pass m1 through stack which does not follow the ABI.
Differential Revision: https://reviews.llvm.org/D78699
Previously, when clang was compiled with -DLLVM_ENABLE_ASSERTIONS=ON, the added tests were displaying:
inlinable function call in a function with debug info must have a !dbg location
call void @"??1?$c@UB@@@@QEAA@XZ"(%struct.c* @"?f@?1??d@@YAPEAU?$c@UB@@@@XZ@4U2@A")
fatal error: error in backend: Broken module found, compilation aborted!
Stack dump:
0. Program arguments: <f:\svn\buildninja\bin\clang -cc1 -emit-llvm debug-info-no-location.cpp> -gcodeview -debug-info-kind=limited
1. <eof> parser at end of file
2. Per-function optimization
Fixes PR43012
Differential Revision: https://reviews.llvm.org/D66328
Move it as an EP callback (-O[123]) or in addSanitizersAtO0.
This makes it not run in ThinLTO pre-link (like the other sanitizers),
so don't check LTO runs in hwasan-new-pm.c. Changing its position also
seems to change the generated IR. I think we just need to make sure the
pass runs.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D88936
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def
Reviewers: jdoerfert
Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl
Tags: #OpenMP #LLVM #clang
Differential Revision: https://reviews.llvm.org/D88430
SUMMARY:
In IBM compiler xlclang , there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.
We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to ASM or XCOFF object file.
The option only work on the AIX OS, for other non-AIX OS using the option will report an unsupported options error.
In AIX OS:
1.1 the option -mignore-xcoff-visibility is enabled by default , if there is not -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command .
1.2 if there is -fvisibility=* explicitly but not -mignore-xcoff-visibility explicitly in the clang command. it will generate visibility attributes.
1.3 if there are both -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command. The option "-mignore-xcoff-visibility" wins , it do not emit the visibility attribute.
The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR.
Reviewer: daltenty,Jason Liu
Differential Revision: https://reviews.llvm.org/D87451
D17779: host-side shadow variables of external declarations of device-side
global variables have internal linkage and are referenced by
`__cuda_register_globals`.
nvcc from CUDA 11 does not allow `__device__ inline` or `__device__ constexpr`
(C++17 inline variables) but clang has incorrectly supported them for a while:
```
error: A __device__ variable cannot be marked constexpr
error: An inline __device__/__constant__/__managed__ variable must have internal linkage when the program is compiled in whole program mode (-rdc=false)
```
If such a variable (which has a comdat group) is discarded (a copy from another
translation unit is prevailing and selected), accessing the variable from
outside the section group (`__cuda_register_globals`) is a violation of the ELF
specification and will be rejected by linkers:
> A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
As a workaround, don't register such inline variables for now.
(If we register the variables in all TUs, we will keep multiple instances of the shadow and break the C++ semantics for inline variables).
We should reject such variables in Sema but our internal users need some time to migrate.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D88786