Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
This reverts commit 809a1e0ffd.
Mach-O doesn't support dso_local and this change broke XNU because of the use of dso_local.
Differential Revision: https://reviews.llvm.org/D98458
The condition variable is in scope in the loop increment, so we need to
emit the jump destination from wthin the scope of the condition
variable.
For GCC compatibility (and compatibility with real-world 'FOR_EACH'
macros), 'continue' is permitted in a statement expression within the
condition of a for loop, though, so there are two cases here:
* If the for loop has no condition variable, we can emit the jump
destination before emitting the condition.
* If the for loop has a condition variable, we must defer emitting the
jump destination until after emitting the variable. We diagnose a
'continue' appearing in the initializer of the condition variable,
because it would jump past the initializer into the scope of that
variable.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D98816
Added basic parsing/sema/serialization support for interop directive.
Support for the 'init' clause.
Differential Revision: https://reviews.llvm.org/D98558
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
Differential Revision: https://reviews.llvm.org/D98487
The idiom:
```
DeclContext::lookup_result R = DeclContext::lookup(Name);
for (auto *D : R) {...}
```
is not safe when in the loop body we trigger deserialization from an AST file.
The deserialization can insert new declarations in the StoredDeclsList whose
underlying type is a vector. When the vector decides to reallocate its storage
the pointer we hold becomes invalid.
This patch replaces a SmallVector with an singly-linked list. The current
approach stores a SmallVector<NamedDecl*, 4> which is around 8 pointers.
The linked list is 3, 5, or 7. We do better in terms of memory usage for small
cases (and worse in terms of locality -- the linked list entries won't be near
each other, but will be near their corresponding declarations, and we were going
to fetch those memory pages anyway). For larger cases: the vector uses a
doubling strategy for reallocation, so will generally be between half-full and
full. Let's say it's 75% full on average, so there's N * 4/3 + 4 pointers' worth
of space allocated currently and will be 2N pointers with the linked list. So we
break even when there are N=6 entries and slightly lose in terms of memory usage
after that. We suspect that's still a win on average.
Thanks to @rsmith!
Differential revision: https://reviews.llvm.org/D91524
`CodeGenFunction::EmitRuntimeCall` automatically sets the right calling
convention for the callee so we can avoid setting it ourselves.
As requested in https://reviews.llvm.org/D98411
Reviewed by: anastasia
Differential Revision: https://reviews.llvm.org/D98705
The MSVC runtime library doesn't have a definition for wmemchr,
so provide an inline implementation.
Differential Revision: https://reviews.llvm.org/D98472
Recognize BI__builtin_isinf and BI__builtin_isfinite (and a few other opcodes
for finite) in testFPKind() and handle with TDC.
Review: Ulrich Weigand.
Differential Revision: https://reviews.llvm.org/D97901
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.
These intrinsics store the random number in their pointer argument and return a status
code if the generation succeeded. The difference between __rndr __rndrrs, is that the latter
intrinsic reseeds the random number generator.
The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838
Differential Revision: https://reviews.llvm.org/D98264
Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
`__translate_sampler_initializer` has a calling convention of
`spir_func`, but clang generated calls to it using the default CC.
Instruction Combining was lowering these mismatching calling conventions
to `store i1* undef` which itself was subsequently lowered to a trap
instruction by simplifyCFG resulting in runtime `SIGILL`
There are arguably two bugs here: but whether there's any wisdom in
converting an obviously invalid call into a runtime crash over aborting
with a sensible error message will require further discussion. So for
now it's enough to set the right calling convention on the runtime
helper.
Reviewed By: svenh, bader
Differential Revision: https://reviews.llvm.org/D98411
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
In order to prevent further building issues related to the usage of SmallVector
in other compilation unit, this patch adjusts the llvm.h header as a workaround
instead.
Besides, this patch reverts previous workarounds:
1. Revert "[NFC] Use llvm::SmallVector to workaround XL compiler problem on AIX"
This reverts commit 561fb7f60a.
2.Revert "[clang][cli] Fix build failure in CompilerInvocation"
This reverts commit 8dc70bdcd0.
Differential Revision: https://reviews.llvm.org/D98552
This was motivated by the fact that constructor type homing (debug info
optimization that we want to turn on by default) drops some libc++ types,
so an attribute would allow us to override constructor homing and emit
them anyway. I'm currently looking into the particular libc++ issue, but
even if we do fix that, this issue might come up elsewhere and it might be
nice to have this.
As I've implemented it now, the attribute isn't specific to the
constructor homing optimization and overrides all of the debug info
optimizations.
Open to discussion about naming, specifics on what the attribute should do, etc.
Differential Revision: https://reviews.llvm.org/D97411
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
D96109 added support for unique internal linkage names for both internal
linkage functions and global variables. There was a lot of discussion on how to
get the demangling right for functions but I completely missed the point that
demanglers do not support suffixes for global vars. For example:
$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123
The demangling for functions works as expected.
I am not sure of the impact of this. I don't understand how debuggers and other
tools depend on the correctness of global variable demangling so I am
pre-emptively disabling it until we can get the demangling support added.
Importantly, uniquefying global variables is not needed right now as we do not
do profile attribution to global vars based on sampling. It was added for
completeness and so this feature is not exactly missed.
Differential Revision: https://reviews.llvm.org/D98392
This patch extends the matrix spec to allow matrix-by-scalar division.
Originally support for `/` was left out to avoid ambiguity for the
matrix-matrix version of `/`, which could either be elementwise or
specified as matrix multiplication M1 * (1/M2).
For the matrix-scalar version, no ambiguity exists; `*` is also
an elementwise operation in that case. Matrix-by-scalar division
is commonly supported by systems including Matlab, Mathematica
or NumPy.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97857
Summary:
The changes introduced in D87946 changed the API for libomptarget
functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x
but was not given a backward-compatible interface. This change will
require people using Clang 13.x or 12.x to recompile their offloading
programs.
Reviewed By: jdoerfert cchen
Differential Revision: https://reviews.llvm.org/D98358
These are incompatible with opaque pointers. This is in preparation
of dropping this API on the IRBuilder side as well.
Instead explicitly pass the loaded type.
Commit 1b04bdc2f3 added support for
capturing the 'this' pointer in a SEH context (__finally or __except),
But the case in which the 'this' pointer is part of a lambda capture
was not handled properly
Differential Revision: https://reviews.llvm.org/D97687
Demonstrate how to generate vadd/vfadd intrinsic functions
1. add -gen-riscv-vector-builtins for clang builtins.
2. add -gen-riscv-vector-builtin-codegen for clang codegen.
3. add -gen-riscv-vector-header for riscv_vector.h. It also generates
ifdef directives with extension checking, base on D94403.
4. add -gen-riscv-vector-generic-header for riscv_vector_generic.h.
Generate overloading version Header for generic api.
https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#c11-generic-interface
5. update tblgen doc for riscv related options.
riscv_vector.td also defines some unused type transformers for vadd,
because I think it could demonstrate how tranfer type work and we need
them for the whole intrinsic functions implementation in the future.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: jrtc27, craig.topper, HsiangKai, Jim, Paul-C-Anagnostopoulos
Differential Revision: https://reviews.llvm.org/D95016
It's available both in CodeGenOptions and in LangOptions, and LangOptions
implementation is slightly better as it uses a StringRef instead of a char
pointer, so use it.
Differential Revision: https://reviews.llvm.org/D98175
LLVM is recommending to use SmallVector (that is, omitting the N), in the
absence of a well-motivated choice for the number of inlined elements N.
However, this doesn't work well with XL compiler on AIX since some header(s)
aren't properly picked up with it. We need to take a further look into the real
issue underneath and fix it in a later patch.
But currently we'd like to use this patch to unblock the build compiler issue
first.
Differential Revision: https://reviews.llvm.org/D98265
SUMMARY:
n the patch https://reviews.llvm.org/D87451 "add new option -mignore-xcoff-visibility"
we did as "The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR."
in these patch we let -mignore-xcoff-visibility effect on generating IR too. the new feature only work on AIX OS
Reviewer: Jason Liu,
Differential Revision: https://reviews.llvm.org/D89986
This is the first patch supporting M68k in Clang
- Register M68k as a target
- Target specific CodeGen support
- Target specific attribute support
Authors: myhsu, m4yers, glaubitz
Differential Revision: https://reviews.llvm.org/D88393
The option -funique-internal-linkage-names was added in D73307 and D78243 as a
LLVM early pass to insert a unique suffix to internal linkage functions and
vars. The unique suffix was the hash of the module path. However, we found
that this can be done more cleanly in clang early and the fixes that need to
be done later can be completely avoided. The fixes in particular are trying
to modify the DW_AT_linkage_name and finding the right place to insert the
pass.
This patch ressurects the original implementation proposed in D73307 which
was reviewed and then ditched in favor of the pass based approach.
Differential Revision: https://reviews.llvm.org/D96109
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
* Only the worksharing-loop directive.
* Recognizes only the nowait clause.
* No loop nests with more than one loop.
* Untested with templates, exceptions.
* Semantic checking left to the existing infrastructure.
This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
* The distance function: How many loop iterations there will be before entering the loop nest.
* The loop variable function: Conversion from a logical iteration number to the loop variable.
These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.
The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.
For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.
Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D94973
Background:
Wasm EH, while using Windows EH (catchpad/cleanuppad based) IR, uses
Itanium-based libraries and ABIs with some modifications.
`__clang_call_terminate` is a wrapper generated in Clang's Itanium C++
ABI implementation. It contains this code, in C-style pseudocode:
```
void __clang_call_terminate(void *exn) {
__cxa_begin_catch(exn);
std::terminate();
}
```
So this function is a wrapper to call `__cxa_begin_catch` on the
exception pointer before termination.
In Itanium ABI, this function is called when another exception is thrown
while processing an exception. The pointer for this second, violating
exception is passed as the argument of this `__clang_call_terminate`,
which calls `__cxa_begin_catch` with that pointer and calls
`std::terminate` to terminate the program.
The spec (https://libcxxabi.llvm.org/spec.html) for `__cxa_begin_catch`
says,
```
When the personality routine encounters a termination condition, it
will call __cxa_begin_catch() to mark the exception as handled and then
call terminate(), which shall not return to its caller.
```
In wasm EH's Clang implementation, this function is called from
cleanuppads that terminates the program, which we also call terminate
pads. Cleanuppads normally don't access the thrown exception and the
wasm backend converts them to `catch_all` blocks. But because we need
the exception pointer in this cleanuppad, we generate
`wasm.get.exception` intrinsic (which will eventually be lowered to
`catch` instruction) as we do in the catchpads. But because terminate
pads are cleanup pads and should run even when a foreign exception is
thrown, so what we have been doing is:
1. In `WebAssemblyLateEHPrepare::ensureSingleBBTermPads()`, we make sure
terminate pads are in this simple shape:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
```
2. In `WebAssemblyHandleEHTerminatePads` pass at the end of the
pipeline, we attach a `catch_all` to terminate pads, so they will be in
this form:
```
%exn = catch
call @__clang_call_terminate(%exn)
unreachable
catch_all
call @std::terminate()
unreachable
```
In `catch_all` part, we don't have the exception pointer, so we call
`std::terminate()` directly. The reason we ran HandleEHTerminatePads at
the end of the pipeline, separate from LateEHPrepare, was it was
convenient to assume there was only a single `catch` part per `try`
during CFGSort and CFGStackify.
---
Problem:
While it thinks terminate pads could have been possibly split or calls
to `__clang_call_terminate` could have been duplicated,
`WebAssemblyLateEHPrepare::ensureSingleBBTermPads()` assumes terminate
pads contain no more than calls to `__clang_call_terminate` and
`unreachable` instruction. I assumed that because in LLVM very limited
forms of transformations are done to catchpads and cleanuppads to
maintain the scoping structure. But it turned out to be incorrect;
passes can merge cleanuppads into one, including terminate pads, as long
as the new code has a correct scoping structure. One pass that does this
I observed was `SimplifyCFG`, but there can be more. After this
transformation, a single cleanuppad can contain any number of other
instructions with the call to `__clang_call_terminate` and can span many
BBs. It wouldn't be practical to duplicate all these BBs within the
cleanuppad to generate the equivalent `catch_all` blocks, only with
calls to `__clang_call_terminate` replaced by calls to `std::terminate`.
Unless we do more complicated transformation to split those calls to
`__clang_call_terminate` into a separate cleanuppad, it is tricky to
solve.
---
Solution (?):
This CL just disables the generation and use of `__clang_call_terminate`
and calls `std::terminate()` directly in its place.
The possible downside of this approach can be, because the Itanium ABI
intended to "mark" the violating exception handled, we don't do that
anymore. What `__cxa_begin_catch` actually does is increment the
exception's handler count and decrement the uncaught exception count,
which in my opinion do not matter much given that we are about to
terminate the program anyway. Also it does not affect info like stack
traces that can be possibly shown to developers.
And while we use a variant of Itanium EH ABI, we can make some
deviations if we choose to; we are already different in that in the
current version of the EH spec we don't support two-phase unwinding. We
can possibly consider a more complicated transformation later to
reenable this, but I don't think that has high priority.
Changes in this CL contains:
- In Clang, we don't generate a call to `wasm.get.exception()` intrinsic
and `__clang_call_terminate` function in terminate pads anymore; we
simply generate calls to `std::terminate()`, which is the default
implementation of `CGCXXABI::emitTerminateForUnexpectedException`.
- Remove `WebAssembly::ensureSingleBBTermPads() function and
`WebAssemblyHandleEHTerminatePads` pass, because terminate pads are
already `catch_all` now (because they don't need the exception
pointer) and we don't need these transformations anymore.
- Change tests to use `std::terminate` directly. Also removes tests that
tested `LateEHPrepare::ensureSingleBBTermPads` and
`HandleEHTerminatePads` pass.
- Drive-by fix: Add some function attributes to EH intrinsic
declarations
Fixes https://github.com/emscripten-core/emscripten/issues/13582.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97834
Use a WeakTrackingVH to cope with the stmt emission logic that cleans up
unreachable blocks. This invalidates the reference to the deferred
replacement placeholder. Cope with it.
Fixes PR25102 (from 2015!)
This change adds a new IR noundef attribute, which denotes when a function call argument or return val may never contain uninitialized bits.
In MemorySanitizer, this attribute enables optimizations which decrease instrumented code size by up to 17% (measured with an instrumented build of clang) . I'll introduce the change allowing msan to take advantage of this information in a separate patch.
Differential Revision: https://reviews.llvm.org/D81678
explicitly emitting retainRV or claimRV calls in the IR
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
If the file in line directive does not exist on the system we need, to
use the original file to get its file id.
Differential Revision: https://reviews.llvm.org/D97945
unsigned variable 'IntNo' has been declared but not been defined inside function
EmitWebAssemblyBuiltinExpr().
static code analysis tool complains about uninitialized variable "IntNo" since
this enters to default branch without setting any intrinsics and calls Function
*Callee = CGM.getIntrinsic(IntNo).
This patch fixes the problem by adding default cases in switch statements.
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.
> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
> which indicates the call is implicitly followed by a marker
> instruction and an implicit retainRV/claimRV call that consumes the
> call result. In addition, it emits a call to
> @llvm.objc.clang.arc.noop.use, which consumes the call result, to
> prevent the middle-end passes from changing the return type of the
> called function. This is currently done only when the target is arm64
> and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
> with the operand bundle in the IR and removes the inserted calls after
> processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
> operand bundle. It doesn't remove the operand bundle on the call since
> the backend needs it to emit the marker instruction. The retainRV and
> claimRV calls are emitted late in the pipeline to prevent optimization
> passes from transforming the IR in a way that makes it harder for the
> ARC middle-end passes to figure out the def-use relationship between
> the call and the retainRV/claimRV calls (which is the cause of
> PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
> nothing in the callee prevents it from being paired up with the
> retainRV/claimRV call in the caller. It then inserts a release call if
> claimRV is attached to the call since autoreleaseRV+claimRV is
> equivalent to a release. If it cannot find an autoreleaseRV call, it
> tries to transfer the operand bundle to a function call in the callee.
> This is important since the ARC optimizer can remove the autoreleaseRV
> returning the callee result, which makes it impossible to pair it up
> with the retainRV/claimRV call in the caller. If that fails, it simply
> emits a retain call in the IR if retainRV is attached to the call and
> does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
> constant value if the call has the operand bundle. This ensures the
> call always has at least one user (the call to
> @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
> multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
> calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808
This reverts commit ed4718eccb.
__builtin_isinf currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: mibintc
Differential Revision: https://reviews.llvm.org/D97125
If the mapped structure has data members, which have 'default' mappers,
need to map these members individually using their 'default' mappers.
Differential Revision: https://reviews.llvm.org/D92195
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of
a global temporary before updating the global temporary map. Emitting the
initializer could recurse back into `GetAddrOfGlobalTemporary` for the same
temporary, resulting in an infinite recursion.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97733
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.
* Kill LLVMContext inlineasm diagnose handler and migrate it to use
DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
handler; if LLVMContext is not available, MCContext uses its own default
diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D97449
Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.
This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
This seems to be more of a Clang thing rather than a generic LLVM thing,
so this moves it out of LLVM pipelines and as Clang extension hooks into
LLVM pipelines.
Move the post-inline EEInstrumentation out of the backend pipeline and
into a late pass, similar to other sanitizer passes. It doesn't fit
into the codegen pipeline.
Also fix up EntryExitInstrumentation not running at -O0 under the new
PM. PR49143
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D97608
Simply make sure that the CodeGenFunction::CXXThisValue and CXXABIThisValue
are correctly initialized to the recovered value.
For lambda capture, we also need to make sure to fill the LambdaCaptureFields
Differential Revision: https://reviews.llvm.org/D97534
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
`__attribute__((retain))` function/variable to prevent linker garbage
collection. (See AttrDocs.td for the linker support).
This patch adds `retain` functions/variables to the `llvm.used` list, which has
the desired linker GC semantics. Note: `retain` does not imply `used`,
so an unused function/variable can be dropped by Sema.
Before 'retain' was introduced, previous ELF solutions require inline asm or
linker tricks, e.g. `asm volatile(".reloc 0, R_X86_64_NONE, target");`
(architecture dependent) or define a non-local symbol in the section and use
`ld -u`. There was no elegant source-level solution.
With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.
Differential Revision: https://reviews.llvm.org/D97447
An global value in the `llvm.used` list does not have GC root semantics on ELF targets.
This will be changed in a subsequent backend patch.
Change some `llvm.used` in the ELF code path to use `llvm.compiler.used` to
prevent undesired GC root semantics.
Change one extern "C" alias (due to `__attribute__((used))` in extern "C") to use `llvm.compiler.used` on all targets.
GNU ld has a rule "`__start_/__stop_` references from a live input section retain the associated C identifier name sections",
which LLD may drop entirely (currently refined to exclude SHF_LINK_ORDER/SHF_GROUP) in a future release (the rule makes it clumsy to GC metadata sections; D96914 added a way to try the potential future behavior).
For `llvm.used` global values defined in a C identifier name section, keep using `llvm.used` so that
the future LLD change will not affect them.
rnk kindly categorized the changes:
```
ObjC/blocks: this wants GC root semantics, since ObjC mainly runs on Mac.
MS C++ ABI stuff: wants GC root semantics, no change
OpenMP: unsure, but GC root semantics probably don't hurt
CodeGenModule: affected in this patch to *not* use GC root semantics so that __attribute__((used)) behavior remains the same on ELF, plus two other minor use cases that don't want GC semantics
Coverage: Probably want GC root semantics
CGExpr.cpp: refers to LTO, wants GC root
CGDeclCXX.cpp: one is MS ABI specific, so yes GC root, one is some other C++ init functionality, which should form GC roots (C++ initializers can have side effects and must run)
CGDecl.cpp: Changed in this patch for __attribute__((used))
```
Differential Revision: https://reviews.llvm.org/D97446
These flags affect coverage mapping (-fcoverage-mapping), not
-fprofile-[instr-]generate so it makes more sense to use the
-fcoverage-* prefix.
Differential Revision: https://reviews.llvm.org/D97434
And then push those change throughout LLVM.
Keep the old signature in Clang's CGBuilder for now -- that will be
updated in a follow-on patch (D97224).
The MLIR LLVM-IR dialect is not updated to support the new alignment
attribute, but preserves its existing behavior.
Differential Revision: https://reviews.llvm.org/D97223
The new `-fsanitize-address-destructor-kind=` option allows control over how module
destructors are emitted by ASan.
The new option is consumed by both the driver and the frontend and is propagated into
codegen options by the frontend.
Both the legacy and new pass manager code have been updated to consume the new option
from the codegen options.
It would be nice if the new utility functions (`AsanDtorKindToString` and
`AsanDtorKindFromString`) could live in LLVM instead of Clang so they could be
consumed by other language frontends. Unfortunately that doesn't work because
the clang driver doesn't link against the LLVM instrumentation library.
rdar://71609176
Differential Revision: https://reviews.llvm.org/D96572
After a revision of D96274 changed `DiagnosticOptions` to not store all remark arguments **as-written**, it is no longer possible to reconstruct the arguments accurately from the class.
This is caused by the fact that for `-Rpass=regexp` and friends, `DiagnosticOptions` store only the group name `pass` and not `regexp`. This is the same representation used for the plain `-Rpass` argument.
Note that each argument must be generated exactly once in `CompilerInvocation::generateCC1CommandLine`, otherwise each subsequent call would produce more arguments than the previous one. Currently this works out because of the way `RoundTrip` splits the responsibilities for certain arguments based on what arguments were queried during parsing. However, this invariant breaks when we move to single round-trip for the whole `CompilerInvocation`.
This patch ensures that for one `-Rpass=regexp` argument, we don't generate two arguments (`-Rpass` from `DiagnosticOptions` and `-Rpass=regexp` from `CodeGenOptions`) by shifting the responsibility for handling both cases to `CodeGenOptions`. To distinguish between the cases correctly, additional information is stored in `CodeGenOptions`.
The `CodeGenOptions` parser of `-Rpass[=regexp]` arguments also looks at `-Rno-pass` and `-R[no-]everything`, which is necessary for generating the correct argument regardless of the ordering of `CodeGenOptions`/`DiagnosticOptions` parsing/generation.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D96847
For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.
Reviewed by: Artem Belevich, and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
-O1 and above do dont call real optimizer pipeline in ThinLTO PreLink.
Also clang can't add PostLink OptimizerLastEPCallbacks for in-process ThinLTO.
This results in missing sanitizer passes with ThinLTO.
Simple working solution is just call OptimizerLastEPCallbacks
at the end of buildThinLTOPreLinkDefaultPipeline.
Differential Revision: https://reviews.llvm.org/D96320
Patch takes advantage of the implicit default behavior to reduce the number of attributes, which in turns reduces compilation time.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D97116
Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.
This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D96195
Demonstrate how to add RISC-V V builtins and lower them to IR intrinsics for V extension.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93446
Previously, the definition was so-marked, but the declaration was
not. This resulted in LLVM's dwarf emission treating the function as
being external, and incorrectly emitting DW_AT_external.
Differential Revision: https://reviews.llvm.org/D96044
This patch responds to a comment from @vitalybuka in D96203: suggestion to
do the change incrementally, and start by modifying this file name. I modified
the file name and made the other changes that follow from that rename.
Reviewers: vitalybuka, echristo, MaskRay, jansvoboda11, aaron.ballman
Differential Revision: https://reviews.llvm.org/D96974
This patch adds the following SHA3 Intrinsics:
vsha512hq_u64,
vsha512h2q_u64,
vsha512su0q_u64,
vsha512su1q_u64
veor3q_u8
veor3q_u16
veor3q_u32
veor3q_u64
veor3q_s8
veor3q_s16
veor3q_s32
veor3q_s64
vrax1q_u64
vxarq_u64
vbcaxq_u8
vbcaxq_u16
vbcaxq_u32
vbcaxq_u64
vbcaxq_s8
vbcaxq_s16
vbcaxq_s32
vbcaxq_s64
Note need to include +sha3 and +crypto when building from the front-end
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D96381
When WPD is enabled, via WholeProgramVTables, emit type metadata for
available_externally vtables. Additionally, add the vtables to the
llvm.compiler.used global so that they are not prematurely eliminated
(before *LTO analysis).
This is needed to avoid devirtualizing calls to a function overriding a
class defined in a header file but with a strong definition in a shared
library. Without type metadata on the available_externally vtables from
the header, the WPD analysis never sees what a derived class is
overriding. Even if the available_externally base class functions are
pure virtual, because shared library definitions are already treated
conservatively (committed patches D91583, D96721, and D96722) we will
not devirtualize, which would be unsafe since the library might contain
overrides that aren't visible to the LTO unit.
An example is std::error_category, which is overridden in LLVM
and causing failures after a self build with WPD enabled, because
libstdc++ contains hidden overrides of the virtual base class methods.
Differential Revision: https://reviews.llvm.org/D96919
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
We currently always store absolute filenames in coverage mapping. This
is problematic for several reasons. It poses a problem for distributed
compilation as source location might vary across machines. We are also
duplicating the path prefix potentially wasting space.
This change modifies how we store filenames in coverage mapping. Rather
than absolute paths, it stores the compilation directory and file paths
as given to the compiler, either relative or absolute. Later when
reading the coverage mapping information, we recombine relative paths
with the working directory. This approach is similar to handling
ofDW_AT_comp_dir in DWARF.
Finally, we also provide a new option, -fprofile-compilation-dir akin
to -fdebug-compilation-dir which can be used to manually override the
compilation directory which is useful in distributed compilation cases.
Differential Revision: https://reviews.llvm.org/D95753
The recent commit 00a6254 "Stop traping on sNaN in builtin_isnan" changed the
lowering in constrained FP mode of builtin_isnan from an FP comparison to
integer operations to avoid trapping.
SystemZ has a special instruction "Test Data Class" which is the preferred
way to do this check. This patch adds a new target hook "testFPKind()" that
lets SystemZ emit the s390_tdc intrinsic instead.
testFPKind() takes the BuiltinID as an argument and is expected to soon
handle more opcodes than just 'builtin_isnan'.
Review: Thomas Preud'homme, Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D96568
Add the types for the RISC-V V extension builtins.
These types will be used by the RISC-V V intrinsics which require
types of the form <vscale x 1 x i64>(LMUL=1 element size=64) or
<vscale x 4 x i32>(LMUL=2 element size=32), etc. The vector_size
attribute does not work for us as it doesn't create a scalable
vector type. We want these types to be opaque and have no operators
defined for them. We want them to be sizeless. This makes them
similar to the ARM SVE builtin types. But we will have quite a bit
more types. This patch adds around 60. Later patches will add
another 230 or so types representing tuples of these types similar
to the x2/x3/x4 types in ARM SVE. But with extra complexity that
these types are combined with the LMUL concept that is unique to
RISCV.
For more background see this RFC
http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html
Authored-by: Roger Ferrer Ibanez <roger.ferrer@bsc.es>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D92715
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.
Differential Revision: https://reviews.llvm.org/D96597
OpenMP 5.0 removed a lot of restriction for overlapped mapped items
comparing to OpenMP 4.5. Patch restricts the checks for overlapped data
mappings only for OpenMP 4.5 and less and reorders mapping of the
arguments so, that present and alloc mappings are processed first and
then all others.
Differential Revision: https://reviews.llvm.org/D86119
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.
This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.
A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.
I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).
Differential Revision: https://reviews.llvm.org/D76342
This takes advantage of the implicit default behavior to reduce the number of
attributes, which in turns reduces compilation time. I've observed -3% in
instruction count when compiling sqlite3 amalgamation with -O0
Differential Revision: https://reviews.llvm.org/D96400
This is a follow up of D92940.
We have successfully converted fadd/fmul _mm_reduce_* intrinsics to
llvm.reduction + reassoc flag. We can do the same approach for fmin/fmax
too, i.e. llvm.reduction + nnan flag.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93179
This patch ensures that vector predication and vectorization width
pragmas work together correctly/as expected. Specifically, this patch
fixes the issue that when vectorization_width > 1, the vector
predication behaviour (this would matter if it has NOT been disabled
explicitly by a pragma) was getting ignored, which was incorrect.
The fix here removes the dependence of vector predication on the
vectorization width. The loop metadata corresponding to clang loop
pragma vectorize_predicate is always emitted, if the pragma is
specified, even if vectorization is disabled by vectorize_width(1)
or vectorize(disable) since the option is also used for interleaving
by the LoopVectorize pass.
Reviewed By: dmgreen, Meinersbur
Differential Revision: https://reviews.llvm.org/D94779
This patch adds 2 new options to control when Clang adds `mustprogress`:
1. -ffinite-loops: assume all loops are finite; mustprogress is added
to all loops, regardless of the selected language standard.
2. -fno-finite-loops: assume no loop is finite; mustprogress is not
added to any loop or function. We could add mustprogress to
functions without loops, but we would have to detect that in Clang,
which is probably not worth it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D96419
class types.
The goal is to provide a way to bypass constructor homing when emitting
class definitions and force class definitions in the debug info.
Not sure about the wording of the attribute, or whether it should be
specific to classes with constructors
explicitly emitting retainRV or claimRV calls in the IR
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
EarlyCSEPass called after msan redices code size by about 10%.
Similar optimization exists for legacy pass manager in
addGeneralOptsForMemorySanitizer.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D96406
The patch only plumbs through the option necessary for targeting sm_86 GPUs w/o
adding any new functionality.
Differential Revision: https://reviews.llvm.org/D95974
Signed prefix is removed and the single word spelling is
printed for the scalar types.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96161
Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D96231
As Itanium ABI[http://itanium-cxx-abi.github.io/cxx-abi/abi.html#once-ctor]
points out:
"The size of the guard variable is 64 bits. The first byte (i.e. the byte at
the address of the full variable) shall contain the value 0 prior to
initialization of the associated variable, and 1 after initialization is complete."
Differential Revision: https://reviews.llvm.org/D95822
Pipe element type spelling for arg info metadata
should follow the same behavior as normal type spelling.
We should only use the canonical type spelling in the
base type field.
This patch also removed duplication in type handling.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96151
When a function or a file is excluded using -fprofile-list= option,
don't emit coverage mapping as doing so confuses users since those
functions would always have zero count. This also reduces the binary
size considerably in cases where only a few functions or files are
being instrumented.
Differential Revision: https://reviews.llvm.org/D96000
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variable of external device variables are always
emitted as undefined symbols, which need to resolve to a global
symbols.
Managed variables need to be emitted as undefined symbols
in device compilations.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95901
__builtin_isnan currently generates a floating-point compare operation
which triggers a trap when faced with a signaling NaN in StrictFP mode.
This commit uses integer operations instead to not generate any trap in
such a case.
Reviewed By: kpn
Differential Revision: https://reviews.llvm.org/D95948
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.
[hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
This reverts commit 4874ff0241.
emitting retainRV or claimRV calls in the IR
This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.
Original commit message:
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
emitting retainRV or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.rv" to calls, which
indicates the call is implicitly followed by a marker instruction and
an implicit retainRV/claimRV call that consumes the call result. In
addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
consumes the call result, to prevent the middle-end passes from changing
the return type of the called function. This is currently done only when
the target is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
the call is annotated with claimRV since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if the implicit call is a call to
retainRV and does nothing if it's a call to claimRV.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D69322
Clang usually propagates counter mapping region for conditions of `if`, `while`,
`for`, etc from parent counter. We should do the same for condition of conditional operator.
Differential Revision: https://reviews.llvm.org/D95918
Extract registering device variable to CUDA runtime codegen function since it
will be called in multiple places.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95558
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the X86 specific builtins.
Differential Revision: https://reviews.llvm.org/D94614
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
Normally, Clang will not make dllimport functions available for inlining
if they reference non-imported symbols, as this can lead to confusing
link errors. But if the function is marked always_inline, the user
presumably knows what they're doing and the attribute should be honored.
Differential revision: https://reviews.llvm.org/D95673
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.
The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.
Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.
Differential Revision: https://reviews.llvm.org/D76802
with fix to test case and stringrefs.
Currently (for codeview) lambdas have a string like `<lambda_0>` in
their mangled name, and don't have any display name. This change uses the
`<lambda_0>` as the display name, which helps distinguish between lambdas
in -gline-tables-only, since there are no linkage names there.
It also changes how we display lambda names; previously we used
`<unnamed-tag>`; now it will show `<lambda_0>`.
I added a function to the mangling context code to create this string;
for Itanium it just returns an empty string.
Bug: https://bugs.llvm.org/show_bug.cgi?id=48432
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95187
This reverts 9b21d4b943
Currently (for codeview) lambdas have a string like `<lambda_0>` in
their mangled name, and don't have any display name. This change uses the
`<lambda_0>` as the display name, which helps distinguish between lambdas
in -gline-tables-only, since there are no linkage names there.
It also changes how we display lambda names; previously we used
`<unnamed-tag>`; now it will show `<lambda_0>`.
I added a function to the mangling context code to create this string;
for Itanium it just returns an empty string.
Bug: https://bugs.llvm.org/show_bug.cgi?id=48432
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D95187
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.
In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.
Reviewed By: #debug-info, dblaikie
Differential Revision: https://reviews.llvm.org/D94735
`getLineNumber()` picks CurLoc if the parameter is invalid. This appears to
mainly work around missing SourceLocation information for some constructs, but
sometimes adds unintended locations.
* For `CodeGenObjC/debug-info-blocks.m`, `CurLoc` has been advanced to the closing brace. The debug line of `ImplicitVarParameter` is set to the line of `}` because this implicit parameter has an invalid `SourceLocation`. The debug line is a bit arbitrary - perhaps the location of `^{` is better.
* The file/line of Clang synthesized `__va_list_tag` is arbitrarily attached a `#include` line. D94735
Drop the special case to make getLineNumber less magic and add CurLoc fallback in its callers instead.
Tested with stage 2 -DCMAKE_BUILD_TYPE=Debug clang, byte identical.
Reviewed By: #debug-info, aprantl
Differential Revision: https://reviews.llvm.org/D94391
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Whenever we enter a new OpenMP data environment we want to enter a
function to simplify reasoning. Later we probably want to remove the
entire specialization wrt. the if clause and pass the result to the
runtime, for now this should fix PR48686.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D94315
or claimRV calls in the IR
Background:
This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end annotates calls with attribute "clang.arc.rv"="retain"
or "clang.arc.rv"="claim", which indicates the call is implicitly
followed by a marker instruction and a retainRV/claimRV call that
consumes the call result. This is currently done only when the target
is arm64 and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the
annotated calls in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the annotated
calls. It doesn't remove the attribute on the call since the backend
needs it to emit the marker instruction. The retainRV/claimRV calls
are emitted late in the pipeline to prevent optimization passes from
transforming the IR in a way that makes it harder for the ARC
middle-end passes to figure out the def-use relationship between the
call and the retainRV/claimRV calls (which is the cause of PR31925).
- The function inliner removes the autoreleaseRV call in the callee that
returns the result if nothing in the callee prevents it from being
paired up with the calls annotated with "clang.arc.rv"="retain/claim"
in the caller. If the call is annotated with "claim", a release call
is inserted since autoreleaseRV+claimRV is equivalent to a release. If
it cannot find an autoreleaseRV call, it tries to transfer the
attributes to a function call in the callee. This is important since
ARC optimizer can remove the autoreleaseRV call returning the callee
result, which makes it impossible to pair it up with the retainRV or
claimRV call in the caller. If that fails, it simply emits a retain
call in the IR if the call is annotated with "retain" and does nothing
if it's annotated with "claim".
- This patch teaches dead argument elimination pass not to change the
return type of a function if any of the calls to the function are
annotated with attribute "clang.arc.rv". This is necessary since the
pass can incorrectly determine nothing in the IR uses the function
return, which can happen since the front-end no longer explicitly
emits retainRV/claimRV calls in the IR, and change its return type to
'void'.
Future work:
- Use the attribute on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls annotated with the attributes.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
In the PPC32 SVR4 ABI, a va_list has copies of registers from the function call.
va_arg looked in the wrong registers for (the pointer representation of) an
object in Objective-C, and for some types in C++. Fix va_arg to look in the
general-purpose registers, not the floating-point registers. Also fix va_arg
for some C++ types, like a member function pointer, that are aggregates for
the ABI.
Anthony Richardby found the problem in Objective-C. Eli Friedman suggested
part of this fix.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47921
Reviewed By: efriedma, nemanjai
Differential Revision: https://reviews.llvm.org/D90329
Most of CGExprConstant.cpp is using the CharUnits abstraction
and is using getCharWidth() (directly of indirectly) when converting
between size of a char and size in bits. This patch is making that
abstraction more consistent by adding CharTy to the CodeGenTypeCache
(honoring getCharWidth() when mapping from char to LLVM IR types,
instead of using Int8Ty directly).
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94979
When using getByteArrayType the requested size is calculated in
char units, but the type used for the array was hardcoded to the
Int8Ty. This patch is using getCharWIdth a bit more consistently
by using getIntNTy in combination with getCharWidth, instead
of explictly using getInt8Ty.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94977
This patch implements codegen for __managed__ variable attribute for HIP.
Diagnostics will be added later.
Differential Revision: https://reviews.llvm.org/D94814
check
This patch fixes a bug in emitARCOperationAfterCall where it inserts the
fall-back call after a bitcast instruction and then replaces the
bitcast's operand with the result of the fall-back call. The generated
IR without this patch looks like this:
msgSend.call: ; preds = %entry
%call = call i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend
br label %msgSend.cont
msgSend.null-receiver: ; preds = %entry
call void @llvm.objc.release(i8* %4)
br label %msgSend.cont
msgSend.cont:
%8 = phi i8* [ %call, %msgSend.call ], [ null, %msgSend.null-receiver ]
%9 = bitcast i8* %10 to %0*
%10 = call i8* @llvm.objc.retain(i8* %8)
Notice that `%9 = bitcast i8* %10` to %0* is taking operand %10 which is
defined after it.
To fix the bug, this patch modifies the insert point to point to the
bitcast instruction so that the fall-back call is inserted before the
bitcast. In addition, it teaches the function to look at phi
instructions that are generated when there is a check for a null
receiver and insert the retainRV/claimRV instruction right after the
call instead of inserting a fall-back call right after the phi
instruction.
rdar://73360225
Differential Revision: https://reviews.llvm.org/D95181
CodeGenModule::EmitNullConstant() creates constants with their "in memory"
type, not their "in vregs" type. The one place where this difference matters is
when the type is _Bool, as that is an i1 when in vregs and an i8 in memory.
Fixes: rdar://73361264
Summary:
The custom mapper API did not previously support the mapping names added previously. This means they were not present if a user requested debugging information while using the mapper functions. This adds basic support for passing the mapped names to the runtime library.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D94806
Combined with 'da98651 - Revert "DR2064:
decltype(E) is only a dependent', this change (5a391d3) caused verifier
errors when building Chromium. See https://crbug.com/1168494#c1 for a
reproducer.
Additionally it reverts changes that were dependent on this one, see
below.
> Following up on PR48517, fix handling of template arguments that refer
> to dependent declarations.
>
> Treat an id-expression that names a local variable in a templated
> function as being instantiation-dependent.
>
> This addresses a language defect whereby a reference to a dependent
> declaration can be formed without any construct being value-dependent.
> Fixing that through value-dependence turns out to be problematic, so
> instead this patch takes the approach (proposed on the core reflector)
> of allowing the use of pointers or references to (but not values of)
> dependent declarations inside value-dependent expressions, and instead
> treating template arguments as dependent if they evaluate to a constant
> involving such dependent declarations.
>
> This ends up affecting a bunch of OpenMP tests, due to OpenMP
> imprecisely handling instantiation-dependent constructs, bailing out
> early instead of processing dependent constructs to the extent possible
> when handling the template.
>
> Previously committed as 8c1f2d15b8, and
> reverted because a dependency commit was reverted.
This reverts commit 5a391d38ac.
It also restores clang/test/SemaCXX/coroutines.cpp to its state before
da986511fb.
Revert "[c++20] P1907R1: Support for generalized non-type template arguments of scalar type."
> Previously committed as 9e08e51a20, and
> reverted because a dependency commit was reverted. This incorporates the
> following follow-on commits that were also reverted:
>
> 7e84aa1b81 by Simon Pilgrim
> ed13d8c667 by me
> 95c7b6cadb by Sam McCall
> 430d5d8429 by Dave Zarzycki
This reverts commit 4b574008ae.
Revert "[msabi] Mangle a template argument referring to array-to-pointer decay"
> [msabi] Mangle a template argument referring to array-to-pointer decay
> applied to an array the same as the array itself.
>
> This follows MS ABI, and corrects a regression from the implementation
> of generalized non-type template parameters, where we "forgot" how to
> mangle this case.
This reverts commit 18e093faf7.
OMP_MAP_TARGET_PARAM flag is used to mark the data that shoud be passed
as arguments to the target kernels, nothing else. But the compiler still
marks the data with OMP_MAP_TARGET_PARAM flags even if the data is
passed to the data movement directives, like target data, target update
etc. This flag is just ignored for this directives and the compiler does
not need to emit it.
Reviewed By: cchen
Differential Revision: https://reviews.llvm.org/D91261
D94745 rewrites the `deviceRTLs` using OpenMP and compiles it by directly
calling the device compilation. `clang` crashes because entry in
`OffloadEntriesDeviceGlobalVar` is unintialized. Current design supposes the
device compilation can only be invoked after host compilation with the host IR
such that `clang` can initialize `OffloadEntriesDeviceGlobalVar` from host IR.
This avoids us using device compilation directly, especially when we only have
code wrapped into `declare target` which are all device code. The same issue
also exists for `OffloadEntriesInfoManager`.
In this patch, we simply initialized an entry if it is not in the maps. Not sure
we need an option to tell the device compiler that it is invoked standalone.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94871
Previously committed as 9e08e51a20, and
reverted because a dependency commit was reverted. This incorporates the
following follow-on commits that were also reverted:
7e84aa1b81 by Simon Pilgrim
ed13d8c667 by me
95c7b6cadb by Sam McCall
430d5d8429 by Dave Zarzycki
for function scopes, rather than using the qualified name.
In line-tables-only mode, we used to emit qualified names as the display name for functions when using CodeView.
This patch changes to emitting the parent scopes instead, with forward declarations for class types.
The total object file size ends up being slightly smaller than if we use the full qualified names.
Differential Revision: https://reviews.llvm.org/D94639
Under -mabi=ieeelongdouble on PowerPC, IEEE-quad floating point semantic
is used for long double. This patch mutates call to related builtins
into f128 version on PowerPC. And in theory, this should be applied to
other targets when their backend supports IEEE 128-bit style libcalls.
GCC already has these mutations except nansl, which is not available on
PowerPC along with other variants (nans, nansf).
Reviewed By: RKSimon, nemanjai
Differential Revision: https://reviews.llvm.org/D92080
This introduces the ARMv8.7-A LS64 extension's intrinsics for 64 bytes
atomic loads and stores: `__arm_ld64b`, `__arm_st64b`, `__arm_st64bv`,
and `__arm_st64bv0`. These are selected into the LS64 instructions
LD64B, ST64B, ST64BV and ST64BV0, respectively.
Based on patches written by Simon Tatham.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D93232
Move nomerge attribute from function declaration/definition to callsites to
allow virtual function calls attach the attribute.
Differential Revision: https://reviews.llvm.org/D94537
The patch adds the required methods to FixedPointBuilder
for converting between fixed-point and floating point,
and uses them from Clang.
This depends on D54749.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D86632
VLST return values are coerced to VLATs in the function epilog for
consistency with the VLAT ABI. Previously, this coercion was done
through memory. It is preferable to use the
llvm.experimental.vector.insert intrinsic to avoid going through memory
here.
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D94290
ELF -fno-pic sets dso_local on a function declaration to allow direct accesses
when taking its address (similar to a data symbol). The emitted code follows the
traditional GCC/Clang -fno-pic behavior: an absolute relocation is produced.
If the function is not defined in the executable, a canonical PLT entry will be
needed at link time. This is similar to a copy relocation and is incompatible
with (-Bsymbolic or --dynamic-list linked shared objects / protected symbols in
a shared object).
This patch gives -fno-pic code a way to avoid such a canonical PLT entry.
The FIXME was about a generalization for -fpie -mpie-copy-relocations (now -fpie
-fdirect-access-external-data). While we could set dso_local to avoid GOT when
taking the address of a function declaration (there is an ignorable difference
about R_386_PC32 vs R_386_PLT32 on i386), it likely does not provide any benefit
and can just cause trouble, so we don't make the generalization.
D92633 added -f[no-]direct-access-external-data to supersede -m[no-]pie-copy-relocations.
(The option works for -fpie but is a no-op for -fno-pic and -fpic.)
This patch makes -fno-pic -fno-direct-access-external-data drop dso_local from
global variable declarations. This usually causes the backend to emit a GOT
indirection for external data access. With a GOT relocation, the subsequent
-no-pie link will not have copy relocation even if the data symbol turns out to
be defined by a shared object.
Differential Revision: https://reviews.llvm.org/D92714
GCC r218397 "x86-64: Optimize access to globals in PIE with copy reloc" made
-fpie code emit R_X86_64_PC32 to reference external data symbols by default.
Clang adopted -mpie-copy-relocations D19996 as a flexible alternative.
The name -mpie-copy-relocations can be improved [1] and does not capture the
idea that this option can apply to -fno-pic and -fpic [2], so this patch
introduces -f[no-]direct-access-external-data and makes -mpie-copy-relocations
their aliases for compatibility.
[1]
For
```
extern int var;
int get() { return var; }
```
if var is defined in another translation unit in the link unit, there is no copy
relocation.
[2]
-fno-pic -fno-direct-access-external-data is useful to avoid copy relocations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888
If a shared object is linked with -Bsymbolic or --dynamic-list and exports a
data symbol, normally the data symbol cannot be accessed by -fno-pic code
(because by default an absolute relocation is produced which will lead to a copy
relocation). -fno-direct-access-external-data can prevent copy relocations.
-fpic -fdirect-access-external-data can avoid GOT indirection. This is like the
undefined counterpart of -fno-semantic-interposition. However, the user should
define var in another translation unit and link with -Bsymbolic or
--dynamic-list, otherwise the linker will error in a -shared link. Generally
the user has better tools for their goal but I want to mention that this
combination is valid.
On COFF, the behavior is like always -fdirect-access-external-data.
`__declspec(dllimport)` is needed to enable indirect access.
There is currently no plan to affect non-ELF behaviors or -fpic behaviors.
-fno-pic -fno-direct-access-external-data will be implemented in the subsequent patch.
GCC feature request https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D92633
Like @aprantl suggested, modify to use the canonicalized DIFile, if we
don't know the loc info and filename for the compiler generated
functions for example static initialization functions.
Reviewed By: dblaikie, aprantl
Differential Revision: https://reviews.llvm.org/D87147
Match the legacy PM in running various ObjC ARC passes.
This requires making some module passes into function passes. These were
initially ported as module passes since they add function declarations
(e.g. https://reviews.llvm.org/D86178), but that's still up for debate
and other passes do so.
Reviewed By: ahatanak
Differential Revision: https://reviews.llvm.org/D93743
@ikudrin enabled support for dwarf64 in D87011. Adding a clang flag so it can be used through that compilation pass.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D90507
`wasm_rethrow_in_catch` intrinsic and builtin are used in order to
rethrow an exception when the exception is caught but there is no
matching clause within the current `catch`. For example,
```
try {
foo();
} catch (int n) {
...
}
```
If the caught exception does not correspond to C++ `int` type, it should
be rethrown. These intrinsic/builtin were renamed `rethrow_in_catch`
because at the time I thought there would be another intrinsic for C++'s
`throw` keyword, which rethrows an exception. It turned out that `throw`
keyword doesn't require wasm's `rethrow` instruction, so we rename
`rethrow_in_catch` to just `rethrow` here.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D94038
This patch adds support for two new variants of the vectorize_width
pragma:
1. vectorize_width(X[, fixed|scalable]) where an optional second
parameter is passed to the vectorize_width pragma, which indicates if
the user wishes to use fixed width or scalable vectorization. For
example the user can now write something like:
#pragma clang loop vectorize_width(4, fixed)
or
#pragma clang loop vectorize_width(4, scalable)
In the absence of a second parameter it is assumed the user wants
fixed width vectorization, in order to maintain compatibility with
existing code.
2. vectorize_width(fixed|scalable) where the width is left unspecified,
but the user hints what type of vectorization they prefer, either
fixed width or scalable.
I have implemented this by making use of the LLVM loop hint attribute:
llvm.loop.vectorize.scalable.enable
Tests were added to
clang/test/CodeGenCXX/pragma-loop.cpp
for both the 'fixed' and 'scalable' optional parameter.
See this thread for context: http://lists.llvm.org/pipermail/cfe-dev/2020-November/067262.html
Differential Revision: https://reviews.llvm.org/D89031
This reduces the number of `WinX86_64ABIInfo::classify` call sites from
3 to 1. The call sites were similar, but passed different values for
FreeSSERegs. Use variables instead of `if`s to manage that argument.
getAs<> can return null if the cast is invalid, which can lead to null pointer deferences. Use castAs<> instead which will assert that the cast is valid.
This is an enhancement to LLVM Source-Based Code Coverage in clang to track how
many times individual branch-generating conditions are taken (evaluate to TRUE)
and not taken (evaluate to FALSE). Individual conditions may comprise larger
boolean expressions using boolean logical operators. This functionality is
very similar to what is supported by GCOV except that it is very closely
anchored to the ASTs.
Differential Revision: https://reviews.llvm.org/D84467
VLST arguments are coerced to VLATs at the function boundary for
consistency with the VLAT ABI. They are then bitcast back to VLSTs in
the function prolog. Previously, this conversion is done through memory.
With the introduction of the llvm.vector.{insert,extract} intrinsic, we
can avoid going through memory here.
Depends on D92761
Differential Revision: https://reviews.llvm.org/D92762
As a follow-up to D93656, I'm switching the Clang UniqueInternalLinkageNamesPass scheduling to using the LLVM one with newpm.
Test Plan:
Reviewed By: aeubanks, tmsriram
Differential Revision: https://reviews.llvm.org/D94019
[libomptarget][amdgpu] Call into deviceRTL instead of ockl
Amdgpu codegen presently emits a call into ockl. The same functionality
is already present in the deviceRTL. Adds an amdgpu specific entry point
to avoid the dependency. This lets simple openmp code (specifically, that
which doesn't use libm) run without rocm device libraries installed.
Reviewed By: ronlieb
Differential Revision: https://reviews.llvm.org/D93356
Add powerpcle support to clang.
For FreeBSD, assume a freestanding environment for now, as we only need it in the first place to build loader, which runs in the OpenFirmware environment instead of the FreeBSD environment.
For Linux, recognize glibc and musl environments to match current usage in Void Linux PPC.
Adjust driver to match current binutils behavior regarding machine naming.
Adjust and expand tests.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D93919
The idea is that the CC1 default for ELF should set dso_local on default
visibility external linkage definitions in the default -mrelocation-model pic
mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar.
The refactoring is made available by 2820a2ca3a.
Currently only x86 supports local aliases. We move the decision to the driver.
There are three CC1 states:
* -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable.
* (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local
* -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out.
Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.
* static relocation model: always
* other relocation models: if isStrongDefinitionForLinker
This will make LLVM IR emitted for COFF/Mach-O and executable ELF similar.
As mentioned in D93793, there are quite a few places where unary `IRBuilder::CreateShuffleVector(X, Mask)` can be used
instead of `IRBuilder::CreateShuffleVector(X, Undef, Mask)`.
Let's update them.
Actually, it would have been more natural if the patches were made in this order:
(1) let them use unary CreateShuffleVector first
(2) update IRBuilder::CreateShuffleVector to use poison as a placeholder value (D93793)
The order is swapped, but in terms of correctness it is still fine.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93923
This simplifies TargetMachine::shouldAssumeDSOLocal and and gives frontend the
decision to use dso_local. For LLVM synthesized functions/globals, they may lose
inferred dso_local but such optimizations are probably not very useful.
Note: the hasComdat() condition in canBenefitFromLocalAlias (D77429) may be dead now.
(llvm/CodeGen/X86/semantic-interposition-comdat.ll)
(Investigate whether we need test coverage when Fuchsia C++ ABI is clearer)
UBSan was using the complete-object align rather than nv alignment
when checking the "this" pointer of a method.
Furthermore, CGF.CXXABIThisAlignment was also being set incorrectly,
due to an incorrectly negated test. The latter doesn't appear to have
had any impact, due to it not really being used anywhere.
Differential Revision: https://reviews.llvm.org/D93072
As proposed in https://github.com/WebAssembly/simd/pull/380. This commit makes
the new instructions available only via clang builtins and LLVM intrinsics to
make their use opt-in while they are still being evaluated for inclusion in the
SIMD proposal.
Depends on D93771.
Differential Revision: https://reviews.llvm.org/D93775
function when the receiver is nil
Callee-destroyed arguments to a method have to be destroyed in the
caller function when the receiver is nil as the method doesn't get
executed. This fixes PR48207.
rdar://71808391
Differential Revision: https://reviews.llvm.org/D93273
Clang FE currently has hot/cold function attribute. But we only have
cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) user annotated function attribute will used in setting function
section prefix/suffix.
(2) hot attribute overwrites profile count based hotness.
(3) profile count based hotness overwrite user annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
Add a special case for handling __builtin_mul_overflow with unsigned
inputs and a signed output to avoid emitting the __muloti4 library
call on x86_64. __muloti4 is not implemented in libgcc, so avoiding
this call fixes compilation of some programs that call
__builtin_mul_overflow with these arguments.
For example, this fixes the build of cpio with clang, which includes code from
gnulib that calls __builtin_mul_overflow with these argument types.
Reviewed By: vsk
Differential Revision: https://reviews.llvm.org/D84405
On PPC, the vector pair instructions are independent from MMA.
This patch renames the vector pair LLVM intrinsics and Clang builtins to replace the _mma_ prefix by _vsx_ in their names.
We also move the vector pair type/intrinsic/builtin tests to their own files.
Differential Revision: https://reviews.llvm.org/D91974
This change makes use of the llvm.vector.extract intrinsic to avoid
going through memory when performing bitcasts between vector-length
agnostic types and vector-length specific types.
Depends on D91362
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D92761
The `assume` attribute is a way to provide additional, arbitrary
information to the optimizer. For now, assumptions are restricted to
strings which will be accumulated for a function and emitted as comma
separated string function attribute. The key of the LLVM-IR function
attribute is `llvm.assume`. Similar to `llvm.assume` and
`__builtin_assume`, the `assume` attribute provides a user defined
assumption to the compiler.
A follow up patch will introduce an LLVM-core API to query the
assumptions attached to a function. We also expect to add more options,
e.g., expression arguments, to the `assume` attribute later on.
The `omp [begin] asssumes` pragma will leverage this attribute and
expose the functionality in the absence of OpenMP.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D91979
This patch enables the Clang type __vector_pair and its associated LLVM
intrinsics even when MMA is disabled. With this patch, the type is now controlled
by the PPC paired-vector-memops option. The builtins and intrinsics will be
renamed to drop the mma prefix in another patch.
Differential Revision: https://reviews.llvm.org/D91819
This abstracts away the members that are being replaced in a follow-up patch.
Depends on D83979.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D93214
This patch adds the functionality to compare BFI counts with real
profile
counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.
Differential Revision: https://reviews.llvm.org/D91813
Followup to D87604, having confirmed on PR47506 that we can use the llvm codegen expansion for fadd/fmul as well.
Differential Revision: https://reviews.llvm.org/D92940
Background: Call to library arithmetic functions for div is emitted by the
compiler and it set wrong “C” calling convention for calls to these functions,
whereas library functions are declared with `spir_function` calling convention.
InstCombine optimization replaces such calls with “unreachable” instruction.
It looks like clang lacks SPIRABIInfo class which should specify default
calling conventions for “system” function calls. SPIR supports only
SPIR_FUNC and SPIR_KERNEL calling convention.
Reviewers: Erich Keane, Anastasia
Differential Revision: https://reviews.llvm.org/D92721
I tried to put it in the same place in the pipeline as the legacy PM.
Fixes PR48399.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D93002
This patch adds vcmla and the rotated variants as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]
The *_lane_* are still missing, but they can be added separately.
This patch only adds the builtin mapping for AArch64.
[1] https://developer.arm.com/documentation/ihi0073/latest
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92930
The new PM is considered stable and many downstream groups have adopted it (some
have adopted it for more than two years). Add -f[no-]legacy-pass-manager to reflect the
fact that it is no longer experimental and the legacy pass manager is something we strive to retire.
In the future, when the legacy PM eventually goes away,
-fno-experimental-new-pass-manager and -flegacy-pass-manager will be removed.
This patch also changes -f[no-]legacy-pass-manager to pass `-plugin-opt={new,legacy}-pass-manager` to the linker (supported by both ld.lld and LLVMgold.so) when -flto/-flto=thin is specified
Reviewed By: aeubanks, rsmith
Differential Revision: https://reviews.llvm.org/D92915
Swiftcall does it's own target-independent argument type classification,
since it is not designed to be ABI compatible with anything local on the
target that isn't LLVM-based. This means it never uses inalloca.
However, we have duplicate logic for checking for inalloca parameters
that runs before call argument setup. This logic needs to know ahead of
time if inalloca will be used later, and we can't move the
CGFunctionInfo calculation earlier.
This change gets the calling convention from either the
FunctionProtoType or ObjCMethodDecl, checks if it is swift, and if so
skips the stackbase setup.
Depends on D92883.
Differential Revision: https://reviews.llvm.org/D92944
This template exists to abstract over FunctionPrototype and
ObjCMethodDecl, which have similar APIs for storing parameter types. In
place of a template, use a PointerUnion with two cases to handle this.
Hopefully this improves readability, since the type of the prototype is
easier to discover. This allows me to sink this code, which is mostly
assertions, out of the header file and into the cpp file. I can also
simplify the overloaded methods for computing isGenericMethod, and get
rid of the second EmitCallArgs overload.
Differential Revision: https://reviews.llvm.org/D92883
Currently, -ftime-report + new pass manager emits one line of report for each
pass run. This potentially causes huge output text especially with regular LTO
or large single file (Obeserved in private tests and was reported in D51276).
The behaviour of -ftime-report + legacy pass manager is
emitting one line of report for each pass object which has relatively reasonable
text output size. This patch adds a flag `-ftime-report=` to control time report
aggregation for new pass manager.
The flag is for new pass manager only. Using it with legacy pass manager gives
an error. It is a driver and cc1 flag. `per-pass` is the new default so
`-ftime-report` is aliased to `-ftime-report=per-pass`. Before this patch,
functionality-wise `-ftime-report` is aliased to `-ftime-report=per-pass-run`.
* Adds an boolean variable TimePassesHandler::PerRun to control per-pass vs per-pass-run.
* Adds a new clang CodeGen flag CodeGenOptions::TimePassesPerRun to work with the existing CodeGenOptions::TimePasses.
* Remove FrontendOptions::ShowTimers, its uses are replaced by the existing CodeGenOptions::TimePasses.
* Remove FrontendTimesIsEnabled (It was introduced in D45619 which was largely reverted.)
Differential Revision: https://reviews.llvm.org/D92436
This test shows we're in some cases not getting strictfp information from
the AST. Correct that.
Differential Revision: https://reviews.llvm.org/D92596
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
getFieldOffset(layoutStartOffset) is expected to point to the first trivial
field or the one which follows non-trivial. So it must be byte aligned already.
However this is not obvious without assumptions about callers.
This patch will avoid the need in such assumptions.
Depends on D92727.
Differential Revision: https://reviews.llvm.org/D92728
Such fields will likely have offset zero making
__sanitizer_dtor_callback poisoning wrong regions.
E.g. it can poison base class member from derived class constructor.
Differential Revision: https://reviews.llvm.org/D92727
PPCMCInstLower does not actually call shouldAssumeDSOLocal for ppc32 so this is dead.
Actually Clang ppc32 does produce a pair of absolute relocations which match GCC.
This also fixes a comment (R_PPC_COPY and R_PPC64_COPY do exist).
shouldRTTIBeUnique() returns false for iOS64CXXABI, which causes
RTTI objects to be emitted hidden. Update two tests that didn't
expect this to happen for the default triple.
Also rename iOS64CXXABI to AppleARM64CXXABI, since it's used for
arm64-apple-macos triples too.
Part of PR46644.
Differential Revision: https://reviews.llvm.org/D91904
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.
One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.
With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.
To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.
Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.
Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91756
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
It doesn't use the GCC personality rountine, because the
interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.
Reviewed By: daltenty, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D91455
In C++ when a reference variable is captured by copy, the lambda
is supposed to make a copy of the referenced variable in the captures
and refer to the copy in the lambda. Therefore, it is valid to capture
a reference to a host global variable in a device lambda since the
device lambda will refer to the copy of the host global variable instead
of access the host global variable directly.
However, clang tries to avoid capturing of reference to a host global variable
if it determines the use of the reference variable in the lambda function is
not odr-use. Clang also tries to emit load of the reference to a global variable
as load of the global variable if it determines that the reference variable is
a compile-time constant.
For a device lambda to capture a reference variable to host global variable
and use the captured value, clang needs to be taught that in such cases the use of the reference
variable is odr-use and the reference variable is not compile-time constant.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D91088
OpenMPIRBuilder::createParallel outlines the body region of the parallel
construct into a new function that accepts any value previously defined outside
the region as a function argument. This function is called back by OpenMP
runtime function __kmpc_fork_call, which expects trailing arguments to be
pointers. If the region uses a value that is not of a pointer type, e.g. a
struct, the produced code would be invalid. In such cases, make createParallel
emit IR that stores the value on stack and pass the pointer to the outlined
function instead. The outlined function then loads the value back and uses as
normal.
Reviewed By: jdoerfert, llitchev
Differential Revision: https://reviews.llvm.org/D92189
Commit 6b1341eb fixed alignment for 128-bit FP types on PowerPC.
However, the quadword alignment adjustment shouldn't be applied to IBM
extended double (ppc_fp128 in IR) values.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92278
Methods synthesized from declared properties were being added to the
method lists twice. This came from the change to list them in the
class's method list, which missed removing the place in CGObjCGNU that
added them again.
Reviewed By: lanza
Differential Revision: https://reviews.llvm.org/D91874
Thanks to D77248, we can bypass the use of stubs altogether and use PLT
relocations if they are available for the target. LLVM and LLD support the
R_AARCH64_PLT32 relocation, so we can also guarantee a static PLT relocation on AArch64.
Not emitting these stubs saves a lot of extra binary size.
Differential Revision: https://reviews.llvm.org/D83812
After D17993, with -fno-delete-null-pointer-checks we add the dereferenceable attribute to the `this` pointer.
We have observed that one internal target which worked before fails even with -fno-delete-null-pointer-checks.
Switching to dereferenceable_or_null fixes the problem.
dereferenceable currently does not always respect NullPointerIsValid and may
imply nonnull and lead to aggressive optimization. The optimization may be
related to `CallBase::isReturnNonNull`, `Argument::hasNonNullAttr`, or
`Value::getPointerDereferenceableBytes`. See D66664 and D66618 for some discussions.
Reviewed By: bkramer, rsmith
Differential Revision: https://reviews.llvm.org/D92297
This change introduces a new clang switch `-fpseudo-probe-for-profiling` to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.
One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.
The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D86502
Currently clang is not correctly retrieving from the AST the metadata for
constrained FP builtins. This patch fixes that for the non-target specific
builtins.
Differential Revision: https://reviews.llvm.org/D92122
This patch enables vector type arguments on AIX. All non-aggregate Altivec vector types are 16bytes in size and are 16byte aligned.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D92117
This code got quite twisted because we consider some MSVC builtins to be
target agnostic, and some to be target specific. Target specific
intrinsics have a pattern of doing up-front argument evaluation, while
general intrinsics do not evaluate their arguments up front. As we tried
to share codepaths between the target-specific and target-agnostic
handling, we ended up doing double evaluation.
Instead, have each target handle MSVC intrinsics consistently before up
front argument evaluation. This requires passing less data around and is
more consistent with target independent intrinsic handling.
See D50979 for past examples of this bug. I noticed this while looking
into adding some more intrinsics.
Differential Revision: https://reviews.llvm.org/D92061
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.
Reviewed By: Xiangling_L, DiggerLin
Differential Revision: https://reviews.llvm.org/D89684
Add a Visited set to avoid repeatedly processing the same base classes
in complex class hierarchies. This cut down the compile time of one
source file from >12min to ~1min.
Differential Revision: https://reviews.llvm.org/D91676
Recently HIP toolchain made a change to use clang instead of opt/llc to do compilation
(https://reviews.llvm.org/D81861). The intention is to make HIP toolchain canonical like
other toolchains.
However, this change introduced an unintentional change regarding backend fp fuse
option, which caused regressions in some HIP applications.
Basically before the change, HIP toolchain used clang to generate bitcode, then use
opt/llc to optimize bitcode and generate ISA. As such, the amdgpu backend takes
the default fp fuse mode which is 'Standard'. This mode respect contract flag of
fmul/fadd instructions and do not fuse fmul/fadd instructions without contract flag.
However, after the change, HIP toolchain now use clang to generate IR, do optimization,
and generate ISA as one process. Now amdgpu backend fp fuse option is determined
by -ffp-contract option, which is 'fast' by default. And this -ffp-contract=fast language option
is translated to 'Fast' fp fuse option in backend. Suddenly backend starts to fuse fmul/fadd
instructions without contract flag.
This causes wrong result for some device library functions, e.g. tan(-1e20), which should
return 0.8446, now returns -0.933. What is worse is that since backend with 'Fast' fp fuse
option does not respect contract flag, there is no way to use #pragma clang fp contract
directive to enforce fp contract requirements.
This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract
and use it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in frontend but
let the backend to use 'Standard' fp fuse option. 'fast-honor-pragmas' is useful since 'Fast'
fp fuse option in backend does not honor contract flag, it is of little use to HIP
applications since all code with #pragma STDC FP_CONTRACT or any IR from a
source compiled with -ffp-contract=on is broken.
Differential Revision: https://reviews.llvm.org/D90174
Ensure that the DSO Locality of the globals in the IR is derived from
their final visibility when using -fvisibility-from-dllstorageclass.
To accomplish this we reset the DSO locality of globals (before
setting their visibility from their dllstorageclass) at the end of
IRGen in Clang. This removes any effects that visibility options or
annotations may have had on the DSO locality.
The resulting DSO locality of the globals will be pessimistic
w.r.t. to the normal compiler IRGen.
Differential Revision: https://reviews.llvm.org/D91779
After fix for PR48174 the base pointer for pointer-based
array-sections/array-subscripts will be emitted as `&ptr[idx]`, but
actually it should be just `ptr`, i.e. the address stored in the ponter
to point correctly to the beginning of the array. Currently it may lead
to a crash in the runtime.
Differential Revision: https://reviews.llvm.org/D91805
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.
Reviewed by: arsenm
Differential Revision: https://reviews.llvm.org/D84345
This moves handling of alwaysinline, coroutines, matrix lowering, PGO,
and LTO-required passes into PassBuilder. Much of this is replicated
between Clang and opt. Other out-of-tree users also replicate some of
this, such as Rust [1] replicating the alwaysinline, LTO, and PGO
passes.
The LTO passes are also now run in
build(Thin)LTOPreLinkDefaultPipeline() since they are semantically
required for (Thin)LTO.
[1]: f5230fbf76/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L896)
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D91585
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.
Reviewers: jdoerfert grokos
Differential Revision: https://reviews.llvm.org/D87946
According to ELF v2 ABI, both IEEE 128-bit and IBM extended floating
point variables should be quad-word (16 bytes) aligned. Previously, only
vector types are considered aligned as quad-word on PowerPC.
This patch will fix incorrectness of IEEE 128-bit float argument in
va_arg cases.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D91596
Summary:
This patch adds support for passing in the original delcaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information in passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;"
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D89802
The compiler should treat array subscript with base pointer as a first
pointer in complex data, it is used only for member expression with base
pointer.
Differential Revision: https://reviews.llvm.org/D91660
Matrix types in memory are represented as arrays, but accessed through
vector pointers, with the alignment specified on the access operation.
For inline assembly, update pointer arguments to use vector pointers.
Otherwise there will be a mis-match if the matrix is also an
input-argument which is represented as vector.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D91631
If the data member pointer is mapped, the compiler tries to optimize the
mapping of such data by discarding explicit mapping flags and trying to
emit combined data instead. In some cases, this optimization is not
quite correctly implemented and it leads to a program crash at the
runtime. Instead, if the data member is mapped, just emit it as is and
do not emit combined mapping flags for it.
Differential Revision: https://reviews.llvm.org/D91552
Add an option -munsafe-fp-atomics for AMDGPU target.
When enabled, clang adds function attribute "amdgpu-unsafe-fp-atomics"
to any functions for amdgpu target. This allows amdgpu backend to use
unsafe fp atomic instructions in these functions.
Differential Revision: https://reviews.llvm.org/D91546
arguments.
* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
attributes, where needed
* Updates Clang12 patch notes mentioning this change
Reviewed-by: rsmith, jdoerfert
Differential Revision: https://reviews.llvm.org/D17993
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.
As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).
There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.
This patch is based on a patch by Francis Visoiu Mistrih.
Reviewed By: thegameg, paquette
Differential Revision: https://reviews.llvm.org/D91417
parameters.
It appears that LLVM isn't able to generate a DW_AT_const_value for a
constant of class type, but if it could, we'd match GCC's debug info in
this case, and in the interim we no longer crash.
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bugreport was up for a significant amount of timer,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.
I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.
This reverts commit 7bdad08429,
and some of it's follow-ups, that don't stand on their own.
The code has a few sequence that looked like:
Ops.push_back(Ops[0]);
Ops.erase(Ops.begin());
And are equivalent to:
std::rotate(Ops.begin(), Ops.begin() + 1, Ops.end());
The latter has the advantage of never reallocating the vector, which
would be a bug in the original code as push_back would read from the
memory it deallocated.
Make it required. Since it's a module pass, optnone won't test it, so
extend the clang test to also use opt-bisect now that it's supported.
14/16 check-dfsan tests failed with NPM enabled, now all pass.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D91385
- If an aggregate argument is indirectly accessed within kernels, direct
passing results in unpromotable `alloca`, which degrade performance
significantly. InferAddrSpace pass is enhanced in
[D91121](https://reviews.llvm.org/D91121) to take the assumption that
generic pointers loaded from the constant memory could be regarded
global ones. The need for the coercion on aggregate arguments is
mitigated.
Differential Revision: https://reviews.llvm.org/D89980
This removes lots of duplicated code which was necessary before
https://reviews.llvm.org/D89158.
Now we can use PassBuilder::runRegisteredEPCallbacks().
This is mostly sanitizers.
There is likely more that can be done to simplify, but let's start with this.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D90870
Need to check if there are map types for the components before trying to
access them when trying to modify type mappings for combined partial
mappings.
Differential Revision: https://reviews.llvm.org/D91370
For dllexported default constructors with default arguments, we export
default constructor closures which pass in the default args. (See D8331
for a good explanation.)
For templates, that means those default args must be instantiated even
if the function isn't called. That is done by the
InstantiateDefaultCtorDefaultArgs() function, but it wasn't done for
explicit specializations, causing asserts (see bug).
Differential revision: https://reviews.llvm.org/D91089
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
Reviewed By: asbirlea, Meinersbur
Differential Revision: https://reviews.llvm.org/D89158
except where they are necessary to disambiguate the target.
This substantially improves diagnostics from the standard library,
which are otherwise full of `::__1::` noise.
This enables a method sending an autorelease message to an object and
returning the object in MRR to avoid adding the object to an autorelease
pool if a call to objc_retainAutoreleasedReturnValue in the caller
function accepts the hand off of the retain count.
rdar://problem/50678052
Differential Revision: https://reviews.llvm.org/D91111
mangling support for non-type template parameters of class type and
template parameter objects.
The Itanium side of this follows the approach I proposed in
https://github.com/itanium-cxx-abi/cxx-abi/issues/47 on 2020-09-06.
The MSVC side of this was determined empirically by observing MSVC's
output.
Differential Revision: https://reviews.llvm.org/D89998
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.
This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.
I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.
Reviewed By: mehdi_amini, fghanim
Differential Revision: https://reviews.llvm.org/D91109
D86841 had an error where for statements with no conditional were
required to make progress. This is not true, this patch removes that
line, and adds regression tests.
Differential Revision: https://reviews.llvm.org/D91075
In order not to modify the `tgt_target_data_update` information but still be
able to pass the extra information for non-contiguous map item (offset,
count, and stride for each dimension), this patch overload `arg` when
the maptype is set as `OMP_MAP_DESCRIPTOR`. The origin `arg` is for
passing the pointer information, however, the overloaded `arg` is an
array of descriptor_dim:
struct descriptor_dim {
int64_t offset;
int64_t count;
int64_t stride
};
and the array size is the same as dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
`arg_size` parameter by using dimension size.
For supporting `stride` in array section, we use a dummy dimension in
descriptor to store the unit size. The formula for counting the stride
in dimension D_n: `unit size * (D_0 * D_1 ... * D_n-1) * D_n.stride`.
Demonstrate how it works:
```
double arr[3][4][5];
D0: { offset = 0, count = 1, stride = 8 } // offset, count, dimension size always be 0, 1, 1 for this extra dimension, stride is the unit size
D1: { offset = 0, count = 2, stride = 8 * 1 * 2 = 16 } // stride = unit size * (product of dimension size of D0) * D1.stride = 4 * 1 * 2 = 8
D2: { offset = 2, count = 2, stride = 8 * (1 * 5) * 1 = 40 } // stride = unit size * (product of dimension size of D0, D1) * D2.stride = 4 * 5 * 1 = 20
D3: { offset = 0, count = 2, stride = 8 * (1 * 5 * 4) * 2 = 320 } // stride = unit size * (product of dimension size of D0, D1, D2) * D3.stride = 4 * 25 * 2 = 200
// X here means we need to offload this data, therefore, runtime will transfer
// data from offset 80, 96, 120, 136, 400, 416, 440, 456
// Runtime patch: https://reviews.llvm.org/D82245
// OOOOO OOOOO OOOOO
// OOOOO OOOOO OOOOO
// XOXOO OOOOO XOXOO
// XOXOO OOOOO XOXOO
```
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84192
The strictfp metadata was added to the casting AST nodes in D85960, but
we aren't using that metadata yet. This patch adds that support.
In order to avoid lots of ad-hoc passing around of the strictfp bits I
updated the IRBuilder when moving from a function that has the Expr* to a
function that lacks it. I believe we should switch to this pattern to keep
the strictfp support from being overly invasive.
For the purpose of testing that we're picking up the right metadata, I
also made my tests use a pragma to make the AST's strictfp metadata not
match the global strictfp metadata. This exposes issues that we need to
deal with in subsequent patches, and I believe this is the right method
for most all of our clang strictfp tests.
Differential Revision: https://reviews.llvm.org/D88913
For the language C++ the keyword __unaligned (a Microsoft extension) had no effect on pointers.
The reason, why there was a difference between C and C++ for the keyword __unaligned:
For C, the Method getAsCXXREcordDecl() returns nullptr. That guarantees that hasUnaligned() is called.
If the language is C++, it is not guaranteed, that hasUnaligend() is called and evaluated.
Here are some links:
The Bug: https://bugs.llvm.org/show_bug.cgi?id=47499
Thread on the cfe-dev mailing list: http://lists.llvm.org/pipermail/cfe-dev/2020-September/066783.html
Diff, that introduced the check hasUnaligned() in getNaturalTypeAlignment(): https://reviews.llvm.org/D30166
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D90630
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
In order to prevent TargetMachines from adding unnecessary optimization
passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
changed to take an OptimizationLevel, but that will be done separately.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89158
Since C++11, the C++ standard has a forward progress guarantee
[intro.progress], so all such functions must have the `mustprogress`
requirement. In addition, from C11 and onwards, loops without a non-zero
constant conditional or no conditional are also required to make
progress (C11 6.8.5p6). This patch implements these attribute deductions
so they can be used by the optimization passes.
Differential Revision: https://reviews.llvm.org/D86841
Clang now asserts for the below case:
```
void clang::CodeGen::CGOpenMPRuntime::createOffloadEntriesAndInfoMetadata(): Assertion `std::get<0>(E) && "All ordered entries must exist!"' failed.
```
The reason why Clang hit the assert is because in
`emitTargetDataCalls`, both `BeginThenGen` and `BeginElseGen` call
`registerTargetRegionEntryInfo` and try to register the Entry in
OffloadEntriesTargetRegion with same key. If changing the expression in
if clause to any constant expression, then the assert disappear. (https://godbolt.org/z/TW7haj)
The assert itself is to avoid
user from accessing elements out of bound inside `OrderedEntries` in
`createOffloadEntriesAndInfoMetadata`.
In this patch, I add a check in `registerTargetRegionEntryInfo` to avoid
register the target region more than once.
A test case that triggers assert: https://godbolt.org/z/4cnGW8
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D90704
Since glibc has supported math library functions conforming IEEE 128-bit
floating point types on some platform (like ppc64le), we can fix clang's
math builtins missing this type.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D90593
Add MMA builtin decoding. These builtins use the new PowerPC-specific types __vector_pair and __vector_quad.
So to avoid pervasive changes, we use custom type descriptors and custom decoding for these builtins.
We also use custom code generation to expand builtin calls with pointers to simpler intrinsic calls with non-pointer types.
Differential Revision: https://reviews.llvm.org/D81748
415f7ee883 had a silly typo introduced when I inlined some
code into a loop from its own function.
Original commit message:
For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.
To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.
Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:
https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)
This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.
Differential Revision: https://reviews.llvm.org/D89970
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
Currently for explicit template function instantiation in CUDA/HIP device
compilation clang emits instantiated kernel with external linkage
and instantiated device function with internal linkage.
This is fine for -fno-gpu-rdc since there is only one TU.
However this causes duplicate symbols for kernels for -fgpu-rdc if
the same instantiation happen in multiple TU. Or missing symbols
if a device function calls an explicitly instantiated template function
in a different TU.
To make explicit template function instantiation work for
-fgpu-rdc we need to follow the C++ linkage paradigm, i.e.
use weak_odr linkage.
Differential Revision: https://reviews.llvm.org/D90311
The __isPlatformVersionAtLeast routine is an implementation of `if (@available)` check
that uses the _availability_version_check API on Darwin that's supported on
macOS 10.15, iOS 13, tvOS 13 and watchOS 6.
Differential Revision: https://reviews.llvm.org/D90367
415f7ee883 had LIT test failures on any build where the clang executable
was not called "clang". I have adjusted the LIT CHECKs to remove the
binary name to fix this.
Original commit message:
For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.
To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.
Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:
https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)
This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.
Differential Revision: https://reviews.llvm.org/D89970
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.
Additionally, always pass a default filename (in $cwd if a directory
name is not specified vi the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.
Change the existing memprof tests to specify log_path=stderr when that
was being relied on.
Depends on D89086.
Differential Revision: https://reviews.llvm.org/D89087
The attribute has no effect on a do statement since the path of execution
will always include its substatement.
It adds a diagnostic when the attribute is used on an infinite while loop
since the codegen omits the branch here. Since the likelihood attributes
have no effect on a do statement no diagnostic will be issued for
do [[unlikely]] {...} while(0);
Differential Revision: https://reviews.llvm.org/D89899
Make DebugLogging a member variable so that users of PassBuilder don't
need to pass it around so much.
Move call to TargetMachine::registerPassBuilderCallbacks() within
PassBuilder so users don't need to remember to call it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90437
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.
Differential Revision: https://reviews.llvm.org/D90230
This patch is mainly doing two things:
1. Adding support for parentheses, making the combination of target features
more diverse;
2. Making the priority of ’,‘ is higher than that of '|' by default. So I need
to make some change with PTX Builtin function.
Differential Revision: https://reviews.llvm.org/D89184
[AMDGPU] Add __builtin_amdgcn_grid_size
Similar to D76772, loads the data from the dispatch pointer. Marked invariant.
Patch also updates the openmp devicertl to use this builtin.
Reviewed By: yaxunl
Differential Revision: https://reviews.llvm.org/D90251
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.
This changes CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47580
Differential Revision: https://reviews.llvm.org/D89072
This reverts commit 504615353f.
Previously we added support for target nowait, but target data nowait
has not been supported yet. In this patch, target data nowait will also be
wrapped into a task.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90099
Define the __vector_pair and __vector_quad types that are used to manipulate
the new accumulator registers introduced by MMA on PowerPC. Because these two
types are specific to PowerPC, they are defined in a separate new file so it
will be easier to add other PowerPC specific types if we need to in the future.
Differential Revision: https://reviews.llvm.org/D81508
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.
Differential Revision: https://reviews.llvm.org/D90253
[libomptarget][nvptx] Undef, weak shared variables
Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.
Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90248
Summary:
This patch adds support for passing in the original delcaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information in passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.
Reviewers: jdoervert
Differential Revision: https://reviews.llvm.org/D89802
In current implementation, if it requires an outer task, the mapper array will be privatized no matter whether it has mapper. In fact, when there is no mapper, the mapper array only contains number of nullptr. In the libomptarget, the use of mapper array is `if (mappers_array && mappers_array[i])`, which means we can directly set mapper array to nullptr if there is no mapper. This can avoid unnecessary data copy.
In this patch, the data privatization will not be emitted if the mapper array is nullptr. When it comes to the emit of task body, the nullptr will be used directly.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90101
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D88609
The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will be still copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes depend clause into account. In this patch, the nowait clause is also included.
There're two things that will be done in another patches:
1. target data nowait has not been supported yet. D90099 added the support.
2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D89844
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.
I changed CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47580
Differential Revision: https://reviews.llvm.org/D89072
This is a long-delayed follow-up to
5e5b85098d.
`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.
Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.
Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.
Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.
Differential Revision: https://reviews.llvm.org/D90079
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D88645
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.
While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u
Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
Fixes pr/47479.
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D87956
non-type template parameters.
Create a unique TemplateParamObjectDecl instance for each such value,
representing the globally unique template parameter object to which the
template parameter refers.
No IR generation support yet; that will follow in a separate patch.
assembly operands."
Earlyclobbers are now excepted from this change (original commit: c78da03).
Review: Ulrich Weigand, Nick Desaulniers
Differential Revision: https://reviews.llvm.org/D87279
The patch adjusts the existing `llvm::DenseMap<unsigned, T>` and
`llvm::DenseSet<unsigned>` objects that store source locations, so
that they use `SourceLocation` directly instead of `unsigned`.
This patch relies on the `DenseMapInfo` trait added in D89719.
It also replaces the construction of `SourceLocation` objects from
the constants -1 and -2 with calls to the trait's methods `getEmptyKey`
and `getTombstoneKey` where appropriate.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D69840
This fixes miscomputation of __builtin_constant_evaluated in the
initializer of a variable that's not usable in constant expressions, but
is readable when constant-folding.
If evaluation of a constant initializer fails, we throw away the
evaluated result instead of keeping it as a non-constant-initializer
value for the variable, because it might not be a correct value.
To avoid regressions for initializers that are foldable but not formally
constant initializers, we now try constant-evaluating some globals in
C++ twice: once to check for a constant initializer (in an mode where
is_constannt_evaluated returns true) and again to determine the runtime
value if the initializer is not a constant initializer.
The name is unfortunate because it is similar to the driver option -ftest-coverage.
It turns out aside from one occurrence in a test, this option is not used.
Instead of framing the interface around whether the variable is an ICE
(which is only interesting in C++98), primarily track whether the
initializer is a constant initializer (which is interesting in all C++
language modes).
No functionality change intended.
for which it matters.
This is a step towards separating checking for a constant initializer
(in which std::is_constant_evaluated returns true) and any other
evaluation of a variable initializer (in which it returns false).
Recently commit D78699 (commit 26cfb6e562), fixed clang's behavior with respect
to passing a union type through a register to correctly follow the ABI. However,
this is an ABI breaking change with earlier versions of the clang compiler, so we
should add an -fclang-abi-compat option to address this. Additionally, the PS4 ABI
requires the older behavior, so that is added as well.
This change adds a Ver11 value to the ClangABI enum that when it is set (or the
target is the PS4 triple), we skip the ABI fix introduced in D78699.
Differential Revision: https://reviews.llvm.org/D89747
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.
> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
This reverts commit 273c299d5d.
Implementing the likelihood attributes for the iteration statements adds
a new helper function. This function can't be const qualified since
these non-modifying members aren't const qualified.
This implements the likelihood attribute for the switch statement. Based on the
discussion in D85091 and D86559 it only handles the attribute when placed on
the case labels or the default labels.
It also marks the likelihood attribute as feature complete. There are more QoI
patches in the pipeline.
Differential Revision: https://reviews.llvm.org/D89210
initialization a little smarter.
Look through casts that preserve zero-ness when determining if an
initializer is zero, so that we can handle cases like an {0} initializer
whose corresponding field is a type other than 'int'.
This patch makes sure that the instance of TypeSize comparison operator
is done with a fixed type size.
Differential Revision: https://reviews.llvm.org/D89312
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).
To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.
Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with
specialized emitting (e.g. memcpy, various math functions) still do not work.
This patch makes these functions work for `asm()` and `#pragma redefine_extname`.
glibc uses `asm()` to redirect internal libc function calls to hidden aliases.
Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall. In GCC -O1 or above, `abs` can be optimized out but we can't.
Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example:
double sin(double x) asm("real_sin");
double f(double d) { return __builtin_sin(d); }
---
According to @rsmith, the following three statements cannot be simultaneously true:
(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.
People do want (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.
So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make
the assembly level redirection.
(Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable).
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D88712
This reverts commits 683b308c07 and
8487bfd4e9.
We will go for a more restricted approach that does not give freedom to
everyone to change ABIs on whichever platform.
See the discussion on https://reviews.llvm.org/D85802.
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.
Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.
Differential Revision: https://reviews.llvm.org/D89366
Using TypeSize::getFixedSize() instead of relying upon the implicit
TypeSize->uint64_cast as the type is always fixed width.
Differential Revision: https://reviews.llvm.org/D89313
Update `clang/lib/CodeGen` to use a `MemoryBufferRef` from
`getBufferOrNone` instead of `MemoryBuffer*` from `getBuffer`. No
functionality change here.
Differential Revision: https://reviews.llvm.org/D89411
This implements the flag proposed in RFC http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.
The goal is to add a way to override the default target C++ ABI through
a compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.
In this patch:
- Store `-fc++-abi=` in a LangOpt. This isn't stored in a
CodeGenOpt because there are instances outside of codegen where Clang
needs to know what the ABI is (particularly through
ASTContext::createCXXABI), and we should be able to override the
target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
through this flag.
- Create a .def file for these ABIs to make it easier to check flag
values.
- Add an error for diagnosing bad ABI flag values.
Differential Revision: https://reviews.llvm.org/D85802
Change EmitAsmStmt() to
- Not tie physregs with the "+r" constraint, but instead add the hard
register as an input constraint. This makes "+r" and "=r":"r" look the same
in the output.
Background: Macro intensive user code may contain inline assembly
statements with multiple operands constrained to the same physreg. Such a
case (with the operand constraints "+r" : "r") currently triggers the
TwoAddressInstructionPass assertion against any extra use of a tied
register. Furthermore, TwoAddress will insert a COPY to that physreg even
though isel has already done so (for the non-tied use), which may lead to a
second redundant instruction currently. A simple fix for this is to not
emit tied physreg uses in the first place for the "+r" constraint, which is
what this patch does.
- Give an error on multiple outputs to the same physical register.
This should be reported and this is also what GCC does.
Review: Ulrich Weigand, Aaron Ballman, Jennifer Yu, Craig Topper
Differential Revision: https://reviews.llvm.org/D87279
Followup to D85191.
This changes getTypeInfoInChars to return a TypeInfoChars
struct instead of a std::pair of CharUnits. This lets the
interface match getTypeInfo more closely.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86447
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:
```
struct S1 {
short a : 1;
}
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- avoid overlapping zero-length bit-fields.
Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.
Reviewed By: ostannard
Differential Revision: https://reviews.llvm.org/D72932
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.
The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.
Differential Revision: https://reviews.llvm.org/D87604
References to different declarations of the same entity aren't different
values, so shouldn't have different representations.
Recommit of e6393ee813, most recently
reverted in 9a33f027ac due to a bug caused
by ObjCInterfaceDecls not propagating availability attributes along
their redeclaration chains; that bug was fixed in
e2d4174e9c.
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
For example:
union M256 {
double d;
__m256 m;
};
extern void foo1(union M256 A);
union M256 m1;
void test() {
foo1(m1);
}
clang will pass m1 through stack which does not follow the ABI.
Differential Revision: https://reviews.llvm.org/D78699
Previously, when clang was compiled with -DLLVM_ENABLE_ASSERTIONS=ON, the added tests were displaying:
inlinable function call in a function with debug info must have a !dbg location
call void @"??1?$c@UB@@@@QEAA@XZ"(%struct.c* @"?f@?1??d@@YAPEAU?$c@UB@@@@XZ@4U2@A")
fatal error: error in backend: Broken module found, compilation aborted!
Stack dump:
0. Program arguments: <f:\svn\buildninja\bin\clang -cc1 -emit-llvm debug-info-no-location.cpp> -gcodeview -debug-info-kind=limited
1. <eof> parser at end of file
2. Per-function optimization
Fixes PR43012
Differential Revision: https://reviews.llvm.org/D66328
Move it as an EP callback (-O[123]) or in addSanitizersAtO0.
This makes it not run in ThinLTO pre-link (like the other sanitizers),
so don't check LTO runs in hwasan-new-pm.c. Changing its position also
seems to change the generated IR. I think we just need to make sure the
pass runs.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D88936
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def
Reviewers: jdoerfert
Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl
Tags: #OpenMP #LLVM #clang
Differential Revision: https://reviews.llvm.org/D88430
SUMMARY:
In IBM compiler xlclang , there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.
We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to ASM or XCOFF object file.
The option only work on the AIX OS, for other non-AIX OS using the option will report an unsupported options error.
In AIX OS:
1.1 the option -mignore-xcoff-visibility is enabled by default , if there is not -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command .
1.2 if there is -fvisibility=* explicitly but not -mignore-xcoff-visibility explicitly in the clang command. it will generate visibility attributes.
1.3 if there are both -fvisibility=* and -mignore-xcoff-visibility explicitly in the clang command. The option "-mignore-xcoff-visibility" wins , it do not emit the visibility attribute.
The option -mignore-xcoff-visibility has no effect on visibility attribute when compile with -emit-llvm option to generated LLVM IR.
Reviewer: daltenty,Jason Liu
Differential Revision: https://reviews.llvm.org/D87451
D17779: host-side shadow variables of external declarations of device-side
global variables have internal linkage and are referenced by
`__cuda_register_globals`.
nvcc from CUDA 11 does not allow `__device__ inline` or `__device__ constexpr`
(C++17 inline variables) but clang has incorrectly supported them for a while:
```
error: A __device__ variable cannot be marked constexpr
error: An inline __device__/__constant__/__managed__ variable must have internal linkage when the program is compiled in whole program mode (-rdc=false)
```
If such a variable (which has a comdat group) is discarded (a copy from another
translation unit is prevailing and selected), accessing the variable from
outside the section group (`__cuda_register_globals`) is a violation of the ELF
specification and will be rejected by linkers:
> A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.
As a workaround, don't register such inline variables for now.
(If we register the variables in all TUs, we will keep multiple instances of the shadow and break the C++ semantics for inline variables).
We should reject such variables in Sema but our internal users need some time to migrate.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D88786
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to _m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
Bruno De Fraine discovered some issues with D85091. The branch weights
generated for `logical not` and `ternary conditional` were wrong. The
`logical and` and `logical or` differed from the code generated of
`__builtin_predict`.
Adjusted the generated code for the likelihood to match
`__builtin_predict`. The patch is based on Bruno's suggestions.
Differential Revision: https://reviews.llvm.org/D88363
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.
This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.
The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.
Differential Revision: https://reviews.llvm.org/D88524
We now recognize this function as a builtin despite it having an
unexpected number of parameters; make sure we don't enforce that it has
only 1 argument for its 2 parameters.
To facilitate faster loading of device binaries and share them among processes,
HIP runtime favors their alignment being 4096 bytes. HIP runtime can load
unaligned device binaries, however, aligning them at 4096 bytes results in
faster loading and less shared memory usage.
This patch adds an option -bundle-align to clang-offload-bundler which allows
bundles to be aligned at specified alignment. By default it is 1, which is NFC
compared to existing format.
This patch then aligns embedded fat binary and device binary inside fat binary
at 4096 bytes.
It has been verified this change does not cause significant overall file size increase
for typical HIP applications (less than 1%).
Differential Revision: https://reviews.llvm.org/D88734
Summary:
Motivated by the new objc_direct attribute, this change adds a new
attribute that remotes metadata from Protocols that the programmer knows
isn't going to be used at runtime. We simply have the frontend skip
generating any protocol metadata entries (e.g. OBJC_CLASS_NAME,
_OBJC_$_PROTOCOL_INSTANCE_METHDOS, _OBJC_PROTOCOL, etc) for a protocol
marked with `__attribute__((objc_non_runtime_protocol))`.
There are a few APIs used to retrieve a protocol at runtime.
`@protocol(SomeProtocol)` will now error out of the requested protocol
is marked with attribute. `objc_getProtocol` will return `NULL` which
is consistent with the behavior of a non-existing protocol.
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75574
- `-cl-fp32-correctly-rounded-divide-sqrt` is already handled in a
per-instruction manner by annotating the accuracy required. There's no
need to add that fn-attr. So far, there's no in-tree backend handling
that attr and that OpenCL specific option.
- In case that out-of-tree backends are broken, this change could be
reverted if those backends could not be fixed.
Differential Revision: https://reviews.llvm.org/D88424
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D88138
Failing tests on Arm due to the tests automatically populating
incomatible pointer width architectures. Reverting until the tests are
updated. Failing tests:
OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp
This reverts commit 90eaedda9b.
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to consolidate
more OpenMP code generation into the OMPIRBuilder. This patch also
invalidates specifying target architectures with conflicting pointer
sizes.
Reviewers: jdoerfert
Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl
Tags: #OpenMP #Clang #LLVM
Differential Revision: https://reviews.llvm.org/D88430
On some targets, preferred alignment is larger than ABI alignment in some cases. For example,
on AIX we have special power alignment rules which would cause that. Previously, to support
those cases, we added a “PreferredAlignment” field in the `RecordLayout` to store the AIX
special alignment values in “PreferredAlignment” as the community suggested.
However, that patch alone is not enough. There are places in the Clang where `PreferredAlignment`
should have been used instead of ABI-specified alignment. This patch is aimed at fixing those
spots.
Differential Revision: https://reviews.llvm.org/D86790
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
Add class types to the retained types list to make sure they
don't get dropped if the constructor is optimized out later.
Differential Revision: https://reviews.llvm.org/D88522
- Fix a memory leak accidentally introduced yesterday by using CodeGen's
existing mangling context instead of creating a new context afresh.
- Move GNU-runtime ObjC method mangling into the AST mangler; this will
eventually be necessary to support direct methods there, but is also
just the right architecture.
- Make the Apple-runtime method mangling work properly when given an
interface declaration, fixing a bug (which had solidified into a test)
where mangling a category method from the interface could cause it to
be mangled as if the category name was a class name. (Category names
are namespaced within their class and have no global meaning.)
- Fix a code cross-reference in dsymutil.
Based on a patch by Ellis Hoag.
GCC 7 introduced -fprofile-update={atomic,prefer-atomic} (prefer-atomic is for
best efforts (some targets do not support atomics)) to increment counters
atomically, which is exactly what we have done with -fprofile-instr-generate
(D50867) and -fprofile-arcs (b5ef137c11).
This patch adds the option to clang to surface the internal options at driver level.
GCC 7 also turned on -fprofile-update=prefer-atomic when -pthread is specified,
but it has performance regression
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307). So we don't follow suit.
Differential Revision: https://reviews.llvm.org/D87737
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
Instead of expliciting emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.
If the flag can't be used directly, the backend will emit a setcc.
Differential Revision: https://reviews.llvm.org/D87888
Extend -fsanitize=nullability-arg to handle call sites which accept C++
member pointers.
rdar://62476022
Differential Revision: https://reviews.llvm.org/D88336
- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option
and `correctly-rounded-divide-sqrt-fp-math` should be added for OpenCL
at most.
Differential revision: https://reviews.llvm.org/D88303
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:
(MinSize * Vscale) / SomeValue
This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.
Differential Revision: https://reviews.llvm.org/D87700
References to different declarations of the same entity aren't different
values, so shouldn't have different representations.
Recommit of e6393ee813 with fixed handling
for weak declarations. We now look for attributes on the most recent
declaration when determining whether a declaration is weak. (Second
recommit with further fixes for mishandling of weak declarations. Our
behavior here is fundamentally unsound -- see PR47663 -- but this
approach attempts to not make things worse.)
Previously for nowait target, CG emitted a function call to `__tgt_target_nowait`, etc. However, in OpenMP RTL, these functions just directly call the no-nowait version, which means nowait is not working as expected.
OpenMP specification says a target is acutally a target task, which is an untied and detachable task. It is natural to go to the direction that generates a task for a nowait target. However, OpenMP task has a problem that it must be within to a parallel region; otherwise the task will be executed immediately. As a result, if we directly wrap to a regular task, the `target nowait` outside of a parallel region is still a synchronous version.
In D77609, I added the support for unshackled task in OpenMP RTL. Basically, unshackled task is a task that is not bound to any parallel region. So all nowait target will be tranformed into an unshackled task. In order to distinguish from regular task, a new flag bit is set for unshackled task. This flag will be used by RTL for later process.
Since all target tasks are allocated via `__kmpc_omp_target_task_alloc`, and in current `libomptarget`, `__kmpc_omp_target_task_alloc` just calls `__kmpc_omp_task_alloc`. Therefore, we can modify the flag in `__kmpc_omp_target_task_alloc` so that we don't need to modify the FE too much. If users choose to opt out the feature, they just need to use a RTL w/o support of unshackled threads.
As a result, in this patch, the `target nowait` region is simply wrapped into a regular task. Later once we have RTL support for unshackled tasks, the wrapped tasks can be executed by unshackled threads w/o changes in the FE.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D78075
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions, generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.
This patch record the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:
* "sign-return-address", with non-zero value means generate code to
sign return addresses (PAC-RET), zero value means disable PAC-RET.
* "sign-return-address-all", with non-zero value means enable PAC-RET
for all functions, zero value means enable PAC-RET only for
functions, which spill LR.
* "sign-return-address-with-bkey", with non-zero value means use B-key
for signing, zero value mean use A-key.
This set of attributes are always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value 0) in order to be able to check for conflicts when combining
module attributed during LTO.
Module-level attributes are overridden by function level attributes.
All the decision making about whether to not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo, there shouldn't be
any places left, other than AArch64FunctionInfo, which directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will-be handled by a separate patch.
Differential Revision: https://reviews.llvm.org/D85649
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.
Differential Revision: https://reviews.llvm.org/D87953
Passing them directly is likely to be non-conforming, since it usually
involves copying the bytes of the record. For unknown architectures, we
don't know what MSVC does or will do, but we should at least try to
conform as well as we can.
Regardless of the target architecture, we should always use the C rules
(RAA_Default) for records that "canBePassedInRegisters". Those are
trivially copyable things, and things marked with [[trivial_abi]].
This should be NFC, although it changes where the final decision about
x86_32 overaligned records is made. The current x86_32 C rules say that
overaligned things are passed indirectly, so there is no functional
difference.
constructors.
This changes the code to avoid using constructor homing for aggregate
classes and classes with trivial default constructors, instead of trying
to loop through the constructors.
Differential Revision: https://reviews.llvm.org/D87808
As mentioned in the bug report, tryEmitPrivate chokes on the
MaterializeTemporaryExpr in the reproducers, since it assumes that if
there are elements, than it must be a ConstantArrayType. However, the
MaterializeTemporaryExpr (which matches exactly the AST when it is NOT a
global/static) has an incomplete array type.
This changes the section where the number-of-elements is non-zero to
properly handle non-CAT types by just extracting it as an array type
(since all we needed was the element type out of it).
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.
These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, the converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.
Differential Revision: https://reviews.llvm.org/D83500
I believe the inline asm emitted here should have a memory clobber since it writes to memory.
It was also missing the dirflag clobber that we use by default along with flags and fpsr. To avoid missing defaults in the future, get the default list from the target
Differential Revision: https://reviews.llvm.org/D88121
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
Differential Revision: https://reviews.llvm.org/D87910
D87921 was reverted in commit b89059a313
as it was causing an unknown llvm PPC bot failure. Reapplying the patch
after confirming that this is not responsible. Build bot failure:
https://reviews.llvm.org/D87921#2286644 which caused the revert.
The wrong placement of add pass with optimizations led to
-funique-internal-linkage-names being disabled.
Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make it
work correctly with -O2 and new pass manager. Updated the tests to explicitly
check O0 and O1.
Differential Revision: https://reviews.llvm.org/D87921
A static device variable may be accessed in host code through
cudaMemCpyFromSymbol etc. Currently clang does not
emit the static device variable if it is only referenced by
host code, which causes host code to fail at run time.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D88115
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.
Differential Revision: https://reviews.llvm.org/D87949
This updates the C++ ABI argument classification code to use the logic
from D72114, fixing an ABI incompatibility with MSVC.
Part of PR44395.
Differential Revision: https://reviews.llvm.org/D87923
Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make
it work correctly with -O2 and new pass manager. Updated the tests to
explicitly check O0 and O2.
Previously, the addPass was placed before BackendUtil.cpp#L1373 which is wrong
as MPM gets assigned at this point and any additions to the pass vector before
this is wrong. This change just moves it after MPM is assigned and places it at
a point where O0 and O0+ can share it.
Differential Revision: https://reviews.llvm.org/D87921
This patch fixes the problem that user-defined mapper array is not correctly privatized inside a task. This problem causes openmp/libomptarget/test/offloading/target_depend_nowait.cpp fails.
Differential Revision: https://reviews.llvm.org/D84470
- After loading builtin bitcode for linking, skip adding default
function attributes on LLVM intrinsics as their attributes are
well-defined and retrieved directly from internal definitions. Adding
extra attributes on intrinsics results in inconsistent result when
`-save-temps` is present. Also, that makes few optimizations
conservative.
Differential Revision: https://reviews.llvm.org/D87761
This will embed bitcode after (Thin)LTO merge, but before optimizations.
In the case the thinlto backend is called from clang, the .llvmcmd
section is also produced. Doing so in the case where the caller is the
linker doesn't yet have a motivation, and would require plumbing through
command line args.
Differential Revision: https://reviews.llvm.org/D87636
Need to map the component as TO instead of the literal, because need to
pass a reference to a component if the pointer is overaligned.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84887
This patch adds a command line flag for the machine function splitter
(added in rG94faadaca4e1).
-fsplit-machine-functions
Split machine functions using profile information (x86 ELF). On
other targets an error is emitted. If profile information is not
provided a warning is emitted notifying the user that profile
information is required.
Differential Revision: https://reviews.llvm.org/D87047
Local vars, marked with pragma allocate, mustbe allocate by the call of
the runtime function and cannot be allocated as other local variables.
Instead, we allocate a space for the pointer in private record and store
the address, returned by kmpc_alloc call in this pointer.
So, for untied tasks
```
#pragma omp task untied
{
S s;
#pragma omp allocate(s) allocator(allocator)
s = x;
}
```
compiler generates something like this:
```
struct task_with_privates {
S *ptr;
};
void entry(task_with_privates *p) {
S *s = p->s;
switch(partid) {
case 1:
p->s = (S*)kmpc_alloc();
kmpc_omp_task();
br exit;
case 2:
*s = x;
kmpc_omp_task();
br exit;
case 2:
~S(s);
kmpc_free((void*)s);
br exit;
}
exit:
}
```
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86558
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics.
This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern.
Differential Revision: https://reviews.llvm.org/D87603
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.
Differential Revision: https://reviews.llvm.org/D87622
This is recommit of 6c8041aa0f, reverted in de044f7562 because of some
fails. Original commit message is below.
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.
Differential Revision: https://reviews.llvm.org/D85960
After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using using noalias is problematic for
current C++, but should be allowed for C-only code.
This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.
In targets that allocate indirect arguments in the called function, this
enables more agressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.
[1] : http://lists.llvm.org/pipermail/cfe-dev/2020-July/066353.html
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D85473
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.
Differential Revision: https://reviews.llvm.org/D85960
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.
Differential Revision: https://reviews.llvm.org/D85091
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the widht of their
declarative type. In such case:
```
struct S1 {
short a : 1;
}
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
Patch D16586 was updated to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- avoid overlapping zero-length bit-fields.
Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.
Differential Revision: https://reviews.llvm.org/D72932
The following people contributed to this patch:
- Diogo Sampaio
- Ties Stuij
Fixes issue noticed by static analysis where we have a copy+paste typo, testing ScheduleKind.M1 twice instead of ScheduleKind.M2.
Differential Revision: https://reviews.llvm.org/D87250
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm.abs.* intrinsics.
This patch updates the SSE/AVX ABS vector intrinsics to emit the generic equivalents instead of the icmp+sub+select code pattern.
Differential Revision: https://reviews.llvm.org/D87101
This change groups
* Rename: `ignoreParenBaseCasts` -> `IgnoreParenBaseCasts` for uniformity
* Rename: `IgnoreConversionOperator` -> `IgnoreConversionOperatorSingleStep` for uniformity
* Inline `IgnoreNoopCastsSingleStep` into a lambda inside `IgnoreNoopCasts`
* Refactor `IgnoreUnlessSpelledInSource` to make adequate use of `IgnoreExprNodes`
Differential Revision: https://reviews.llvm.org/D86880
This adds the size to forward declared class DITypes, if the size is known.
Fixes an issue where we determine whether to emit fragments based on the
type size, so fragments would sometimes be incorrectly emitted if there
was no size.
Bug: https://bugs.llvm.org/show_bug.cgi?id=47338
Differential Revision: https://reviews.llvm.org/D87062
Temporarily revert commit 04abbb3a78
due to regressions in some HIP apps due backend issues revealed by
this change.
Will re-commit it when backend issues are fixed.
This assert doesn't really make sense for functions in general, since they
start life as declarations, and there isn't really any reason to require them
to be defined before attributes are applied to them.
rdar://67895846
Previously, clang was crashing on the attached test because the EH cleanup for
the block capture was incorrectly emitted under the assumption that the
expression wasn't conditionally evaluated. This was because before 9a52de00260,
pushLifetimeExtendedDestroy was mainly used with C++ automatic lifetime
extension, where a conditionally evaluated expression wasn't possible. Now that
we're using this path for block captures, we need to handle this case.
rdar://66250047
Differential revision: https://reviews.llvm.org/D86854
This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. Test was failing due
to IR differences with the new pass manager which broke the Fuchsia
builder [1]. Reverted in 2e7041f.
[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375
Original summary:
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.
VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:
* Implicit casting between VLA <-> VLS types.
* Coercion of VLS types in function args/return.
* Mangling of VLS types.
Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.
The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:
__SVE_VLS<typename, unsigned>
where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:
#if __ARM_FEATURE_SVE_BITS==512
// Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
// Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
#endif
The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.
[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85743
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
It's not undefined behavior for an unsigned left shift to overflow (i.e. to
shift bits out), but it has been the source of bugs and exploits in certain
codebases in the past. As we do in other parts of UBSan, this patch adds a
dynamic checker which acts beyond UBSan and checks other sources of errors. The
option is enabled as part of -fsanitize=integer.
The flag is named: -fsanitize=unsigned-shift-base
This matches shift-base and shift-exponent flags.
<rdar://problem/46129047>
Differential Revision: https://reviews.llvm.org/D86000
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).
This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.
The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.
Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.
Differential Revision: https://reviews.llvm.org/D85948
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.
VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:
* Implicit casting between VLA <-> VLS types.
* Coercion of VLS types in function args/return.
* Mangling of VLS types.
Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.
The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:
__SVE_VLS<typename, unsigned>
where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:
#if __ARM_FEATURE_SVE_BITS==512
// Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
// Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
#endif
The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.
[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85743
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64bit granules in an SVE vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86101
For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.
This reverts commit 05777ab941.
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and same LLVM CodeGen option,
to pick which variable location tracking solution to use.
Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".
Differential Revision: https://reviews.llvm.org/D83048
For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.
Also added a run line to the template specialization test. I guess
all the -debug-info-kind=limited tests should still pass with =constructor,
but it's probably unnecessary to test for all of those.
Differential Revision: https://reviews.llvm.org/D86491
The orignal patch with the missing 'REQUIRES: asserts' as there is a debug-only
flag used in the test.
Original summary:
D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5
uint32_t. However, it didn't update the code in ObjectFilePCHContainerOperations
that creates the dwoID in the module from the ASTFileSignature
(`Buffer->Signature` being the array subclass that is now `std::array<uint8_t,
20>` instead of `std::array<uint32_t, 5>`).
```
uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```
This code works with the old ASTFileSignature (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the
ASTFileSignature and only partly filled the Signature uint64_t.
This caused that the dwoID in the module ref and the dwoID in the actual module
no longer match (which in turns causes that LLDB keeps warning about the dwoID's
not matching when debugging -gmodules-compiled binaries).
This patch just unifies the logic for turning the ASTFileSignature into an
uint64_t which makes the dwoID match again (and should prevent issues like that
in the future).
Reviewed By: aprantl, dang
Differential Revision: https://reviews.llvm.org/D84013
This changes the methods in CGExprScalar to use
FixedPointBuilder to generate IR for fixed-point
conversions and operations.
Since FixedPointBuilder emits padded operations slightly
differently than the original code, some tests change.
Reviewed By: leonardchan
Differential Revision: https://reviews.llvm.org/D86282
This relands D84013 but with a test that relies on less shell features to
hopefully make the test pass on Fuchsia (where the test from the previous patch
version strangely failed with a plain "Exit code 1").
Original summary:
D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5 uint32_t.
However, it didn't update the code in ObjectFilePCHContainerOperations that creates
the dwoID in the module from the ASTFileSignature (`Buffer->Signature` being the
array subclass that is now `std::array<uint8_t, 20>` instead of `std::array<uint32_t, 5>`).
```
uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```
This code works with the old ASTFileSignature (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the ASTFileSignature
and only partly filled the Signature uint64_t.
This caused that the dwoID in the module ref and the dwoID in the actual module no
longer match (which in turns causes that LLDB keeps warning about the dwoID's not
matching when debugging -gmodules-compiled binaries).
This patch just unifies the logic for turning the ASTFileSignature into an uint64_t which
makes the dwoID match again (and should prevent issues like that in the future).
Reviewed By: aprantl, dang
Differential Revision: https://reviews.llvm.org/D84013
D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5
uint32_t. However, it didn't update the code in ObjectFilePCHContainerOperations
that creates the dwoID in the module from the ASTFileSignature
(`Buffer->Signature` being the array subclass that is now `std::array<uint8_t,
20>` instead of `std::array<uint32_t, 5>`).
```
uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```
This code works with the old ASTFileSignature (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the
ASTFileSignature and only partly filled the Signature uint64_t.
This caused that the dwoID in the module ref and the dwoID in the actual module
no longer match (which in turns causes that LLDB keeps warning about the dwoID's
not matching when debugging -gmodules-compiled binaries).
This patch just unifies the logic for turning the ASTFileSignature into an
uint64_t which makes the dwoID match again (and should prevent issues like that
in the future).
Reviewed By: aprantl, dang
Differential Revision: https://reviews.llvm.org/D84013
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.
This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html
Reviewed By: leonardchan, rjmccall
Differential Revision: https://reviews.llvm.org/D85312
This adds parsing and codegen support for tune in target attribute.
I've implemented this so that arch in the target attribute implicitly disables tune from the command line. I'm not sure what gcc does here. But since -march implies -mtune. I assume 'arch' in the target attribute implies tune in the target attribute.
Differential Revision: https://reviews.llvm.org/D86187
This function returns a struct `BuiltinVectorTypeInfo` that contains
the builtin vector's element type, element count and number of vectors
(used for vector tuples).
Reviewed By: c-rhodes
Differential Revision: https://reviews.llvm.org/D86100
Building on the backend support from D85165. This parses the command line option in the driver, passes it on to CC1 and adds a function attribute.
-Still need to support tune on the target attribute.
-Need to use "generic" as the tuning by default. But need to change generic in the backend first.
-Need to set tune if march is specified and mtune isn't.
-May need to disable getHostCPUName's ability to guess CPU name from features when it doesn't have a family/model match for mtune=native. That's what gcc appears to do.
Differential Revision: https://reviews.llvm.org/D85384
When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.
Verified on the reproducer of PR46156 and QMCPack.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D86037
These functions won't ever unwind. This is useful for MemorySanitizer
as it simplifies handling __atomic_load in particular.
Differential Revision: https://reviews.llvm.org/D85573
We can use this to remove some calls to initFeatureMap from Sema
and CodeGen when a function doesn't have a target attribute.
This reduces compile time of the linux kernel where this map
is needed to diagnose some inline assembly constraints based
on whether sse, avx, or avx512 is enabled.
Differential Revision: https://reviews.llvm.org/D85807
Need to call getRawStmt() function instead, when trying to get inner
associated statement for the executable directive. Not all directives
use captured statements.
In untied tasks, need to allocate the space for local variales, declared
in task region, when the memory for task data is allocated. THe function
can be interrupted and we can exit from the function in untied task
switch. Need to keep the state of the local variables in this case.
Also, the compiler should not call cleanup when exiting in untied task
switch until the real exit out of the declaration scope is met during
execution.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D84457
If the arguments are mapped, but are actually not used in the target
region, the compiler still adds attribute TGT_OMP_TARGET_PARAM for such
arguments. It makes the libomptarget to add such parameters to the list
of arguments, passed to the kernel at the runtime, and may lead to
incorrect results/crashes during execution.
Differential Revision: https://reviews.llvm.org/D85755
Summary:
In untied tasks, need to allocate the space for local variales, declared
in task region, when the memory for task data is allocated. THe function
can be interrupted and we can exit from the function in untied task
switch. Need to keep the state of the local variables in this case.
Also, the compiler should not call cleanup when exiting in untied task
switch until the real exit out of the declaration scope is met during
execution.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D84457
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as default binary format if zos is choosen as
operating system. No further functionality is added.
Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay
Reviewed By: efriedma, tahonermann, hubert.reinterpertcast
Differential Revision: https://reviews.llvm.org/D82081
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D80735
Fixes pr/11710.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Resubmit after breaking Windows and OSX builds.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D80242
- Skip generating profile data on `__global__` function in the host
compilation. It's a host-side stub function only and don't have
profile instrumentation generated on the real function body. The extra
profile data results in the malformed instrumentation profile data.
- Skip generating region mapping on functions in the wrong-side, i.e.,
+ For the device compilation, skip host-only functions; and,
+ For the host compilation, skip device-only functions (including
`__global__` functions.)
- As the device-side profiling is not ready yet, only host-side profile
code generation is checked.
Differential Revision: https://reviews.llvm.org/D85276
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.
On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.
Differential Revision: https://reviews.llvm.org/D84534
Add address space to indirect abi info and use it for kernels.
Previously, indirect arguments assumed assumed a stack passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.
Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).
This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.
This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.
Summary:
Introduced OMPChildren class to handle all associated clauses, statement
and child expressions/statements. It allows to represent some directives
more correctly (like flush, depobj etc. with pseudo clauses, ordered
depend directives, which are standalone, and target data directives).
Also, it will make easier to avoid using of CapturedStmt in directives,
if required (atomic, tile etc. directives).
Also, it simplifies serialization/deserialization of the
executable/declarative directives.
Reduces number of allocation operations for mapper declarations.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, jfb, cfe-commits, sstefan1, aaron.ballman, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83261
The second argument of getNaturalAlignIndirect() was `bool ByRef`, but
the implementation was just delegating to getIndirect() with `ByRef`
passed unchanged to `bool ByVal` parameter of getIndirect().
Fix a couple of /*ByRef=*/ comments as well.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D85113
The _ExtInt(1) in getTypeForMem was hitting the bool logic for expanding
to an 8 bit value. The result was an assert, or store i1 %0, i8* %2, align 1
since the parameter IS an i1. This patch changes the 'forMem' test to
exclude ext-int from the bool test.
Without this patch, the following example fails but shouldn't
according to OpenMP TR8:
```
#pragma omp target enter data map(alloc:i)
#pragma omp target data map(present, alloc: i)
{
#pragma omp target exit data map(delete:i)
} // fails presence check here
```
OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:
> If the map clause appears on a target, target data, target enter
> data or target exit data construct with a present map-type-modifier
> then on entry to the region if the corresponding list item does not
> appear in the device data environment an error occurs and the
> program terminates.
There is no corresponding statement about the exit from a region.
Thus, the `present` modifier should:
1. Check for presence upon entry into any region, including a `target
exit data` region. This behavior is already implemented correctly.
2. Should not check for presence upon exit from any region, including
a `target` or `target data` region. Without this patch, this
behavior is not implemented correctly, breaking the above example.
In the case of `target data`, this patch fixes the latter behavior by
removing the `present` modifier from the map types Clang generates for
the runtime call at the end of the region.
In the case of `target`, we have not found a valid OpenMP program for
which such a fix would matter. It appears that, if a program can
guarantee that data is present at the beginning of a `target` region
so that there's no error there, that data is also guaranteed to be
present at the end. This patch adds a comment to the runtime to
document this case.
Reviewed By: grokos, RaviNarayanaswamy, ABataev
Differential Revision: https://reviews.llvm.org/D84422
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
if (obj is a lvalue)
llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type
else
llvm.bpf.btf.type.id(obj, 0, flag) !type
The purpose of the 2nd argument is to differentiate
__builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
__builtin_btf_type_id(obj.ptr, flag)
Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
- obj is the same .e.g., nullptr
- flag is the same
- metadata type is different, e.g., typedef of struct "s"
and strust "s".
In the above, we don't want CSE since their metadata is different.
This patch change IR builtin to
llvm.bpf.btf.type.id(seq_num, flag) !type
and seq_num is always increasing. This will prevent potential
llvm CSE.
Also report an error if the type name is empty for
remote relocation since remote relocation needs non-empty
type name to do relocation against vmlinux.
Differential Revision: https://reviews.llvm.org/D85174
This patch added the following additional compile-once
run-everywhere (CO-RE) relocations:
- existence/size of typedef, struct/union or enum type
- enum value and enum value existence
These additional relocations will make CO-RE bpf programs more
adaptive for potential kernel internal data structure changes.
For existence/size relocations, the following two code patterns
are supported:
1. uint32_t __builtin_preserve_type_info(*(<type> *)0, flag);
2. <type> var;
uint32_t __builtin_preserve_field_info(var, flag);
flag = 0 for existence relocation and flag = 1 for size relocation.
For enum value existence and enum value relocations, the following code
pattern is supported:
uint64_t __builtin_preserve_enum_value(*(<enum_type> *)<enum_value>,
flag);
flag = 0 means existence relocation and flag = 1 for enum value.
relocation. In the above <enum_type> can be an enum type or
a typedef to enum type. The <enum_value> needs to be an enumerator
value from the same enum type. The return type is uint64_t to
permit potential 64bit enumerator values.
Differential Revision: https://reviews.llvm.org/D83242
In order to follow NEC Aurora SX VE ABI correctly, change to sign/zero
extend integer arguments and return values smaller than 64 bits in clang.
Also update regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D85071
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
notail on x86-64
This is needed because the epilogue code inserted before tail calls on
x86-64 breaks the handshake between the caller and callee.
Calls to objc_retainAutoreleasedReturnValue used to have the same
problem, which was fixed in https://reviews.llvm.org/D59656.
rdar://problem/66029552
Differential Revision: https://reviews.llvm.org/D84540
Provides AMDGCN and NVPTX specific specialization of getGPUWarpSize,
getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN
codegen for these methods in generic and simd modes. Also changes the
precondition in InitTempAlloca to be slightly more permissive. Useful for
AMDGCN OpenMP codegen where allocas are created with a cast to an
address space.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84260
fptosi/fptoui have similar, but not identical, semantics. In
particular, the behavior on overflow is different.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit. (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)
Differential Revision: https://reviews.llvm.org/D84703
We previously used a non-aggregate RValue to represent the passed value,
which violated the assumptions of call arg lowering in some cases, in
particular on 32-bit Windows, where we'd end up producing an FCA store
with TBAA metadata, that the IR verifier would reject.
When we use the OpenMPIRBuilder for the parallel region we need to also
use it to get the thread ID (among other things) in the body. This is
because CGOpenMPRuntime::getThreadID() and
CGOpenMPRuntime::emitUpdateLocation implicitly assumes that if they are
called from within a parallel region there is a certain structure to the
code and certain members of the OMPRegionInfo are initialized. It might
make sense to initialize them even if we use the OpenMPIRBuilder but we
would preferably get rid of such state instead.
Bug reported by Anchu Rajendran Sudhakumari.
Depends on D82470.
Reviewed By: anchu-rajendran
Differential Revision: https://reviews.llvm.org/D82822
We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.
Reviewed By: fghanim, SouraVX
Differential Revision: https://reviews.llvm.org/D82470
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D84767
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.
Differential Revision: https://reviews.llvm.org/D84767
Previously ctor homing was omitting debug info for classes if they
have both trival and nontrivial constructors, but we should only omit debug
info if the class doesn't have any trivial constructors.
retained types list.
bug: https://bugs.llvm.org/show_bug.cgi?id=46537
Differential Revision: https://reviews.llvm.org/D84870
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84711
This patch introduces 2 new address spaces in OpenCL: global_device and global_host
which are a subset of a global address space, so the address space scheme will be
looking like:
```
generic->global->host
->device
->private
->local
constant
```
Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)
Before this patch USM pointer could be only in opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.
Patch by Dmitry Sidorov.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D82174
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives. The
next patch in this series implements OpenMP runtime support.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D84711
types.
We previously did not treat a function type as dependent if it had a
parameter pack with a non-dependent type -- such a function type depends
on the arity of the pack so is dependent even though none of the
parameter types is dependent. In order to properly handle this, we now
treat pack expansion types as always being dependent types (depending on
at least the pack arity), and always canonically being pack expansion
types, even in the unusual case when the pattern is not a dependent
type. This does mean that we can have canonical types that are pack
expansions that contain no unexpanded packs, which is unfortunate but
not inaccurate.
We also previously did not treat a typedef type as
instantiation-dependent if its canonical type was not
instantiation-dependent. That's wrong because instantiation-dependence
is a property of the type sugar, not of the type; an
instantiation-dependent type can have a non-instantiation-dependent
canonical type.
This patch implements Clang front end support for the OpenMP TR8
`present` map type modifier. The next patch in this series implements
OpenMP runtime support.
This patch does not attempt to implement TR8 sec. 2.22.7.1 "map
Clause", p. 319, L14-16:
> If a map clause with a present map-type-modifier is present in a map
> clause, then the effect of the clause is ordered before all other
> map clauses that do not have the present modifier.
Compare to L10-11, which Clang does not appear to implement yet:
> For a given construct, the effect of a map clause with the to, from,
> or tofrom map-type is ordered before the effect of a map clause with
> the alloc, release, or delete map-type.
This patch also does not implement the `present` implicit-behavior for
`defaultmap` or the `present` motion-modifier for `target update`.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D83061
Pass LowerMatrixIntrinsics wasn't running yet running under the new pass
manager, and this adds LowerMatrixIntrinsics to the pipeline (to the
same place as where it is running in the old PM).
Differential Revision: https://reviews.llvm.org/D84180
Reapply 49e5f603d4
which had been reverted in c94332919b.
Originally reverted because I hadn't updated it in quite a while when I
got around to committing it, so there were a bunch of missing changes to
new code since I'd written the patch.
Reviewers: aaron.ballman
Differential Revision: https://reviews.llvm.org/D76646
Sometimes we also want to avoid merging inline assembly. This patch add
the nomerge function attribute to inline assembly.
Reviewed By: zequanwu
Differential Revision: https://reviews.llvm.org/D84225
Summary:
Need to avoid an optimization for base pointer mapping for target data
directives.
Reviewers: jdoerfert, ye-luo
Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D84182
OptNoneInstrumentation is part of StandardInstrumentations. It skips
functions (or loops) that are marked optnone.
The feature of skipping optional passes for optnone functions under NPM
is gated on a -enable-npm-optnone flag. Currently it is by default
false. That is because we still need to mark all required passes to be
required. Otherwise optnone functions will start having incorrect
semantics. After that is done in following changes, we can remove the
flag and always enable this.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D83519
Use common interface for accessing target specific GPU grid values in NVPTX
OpenMP codegen as proposed in https://reviews.llvm.org/D80917
Originally authored by Greg Rodgers (@gregrodgers).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D83492
The `intrinsics_gen` target exists in the CMake exports since r309389
(see LLVMConfig.cmake.in), hence projects can depend on `intrinsics_gen`
even it they are built separately from LLVM.
Reviewed By: MaskRay, JDevlieghere
Differential Revision: https://reviews.llvm.org/D83454
On AIX, the semantic of global_dtors contains __sterm functions associated with C++
cleanup actions and user-declared __attribute__((destructor)) functions. We should
never merely register __sterm with atexit(), so currently
-fregister_global_dtors_with_atexit does not work well on AIX: It would cause
finalization actions to not occur when unloading shared libraries. We need to figure
out a way to handle that when we start supporting user-declared
__attribute__((destructor)) functions.
Currently we report_fatal_error on this option temporarily.
Differential Revision: https://reviews.llvm.org/D83974
Refactors CGOpenMPRuntimeNVPTX as CGOpenMPRuntimeGPU to make it a
generalization for OpenMP GPU Codegen. Target specific specialized
methods for NVPTX are defined in class CGOpenMPRuntimeNVPTX. This
paves the way for a clean and maintainable extension to more GPU
targets for OpenMP Codegen.
For original author (git blame) list of CGOpenMPRuntimeGPU code,
look in history of CGOpenMPRuntimeNVPTX.cpp and .h, after this commit.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D83723
This patch fixes the compilation warnings that L is not a reference.
Thanks to Lingda Li for providing the patch.
Differential Revision: https://reviews.llvm.org/D83959
Set the debug location for static init related functions(__dtor
and __finalize) so we can generate valid debug info on AIX by invoking
-g with clang or -debug-info-kind=limited with clang_cc1.
This also works for any other future targets who may use sinit and
sterm functions for static initialization, where a direct call to
dtor will be generated within finalize function body.
This patch also aims at validating that the debug info generated
is correct for AIX sinit related functions.
Differential Revision: https://reviews.llvm.org/D83702
This patch implements the code generation to use OpenMP 5.0 declare mapper (a.k.a. user-defined mapper) constructs.
Patch written by Lingda Li.
Differential Revision: https://reviews.llvm.org/D67833
llvm function is marked nounwind
This fixes cases where an invoke is emitted, despite the called llvm
function being marked nounwind, because ConstructAttributeList failed to
add the attribute to the attribute list. llvm optimization passes turn
invokes into calls and optimize away the exception handling code, but
it's better to avoid emitting the code in the front-end if the called
function is known not to raise an exception.
Differential Revision: https://reviews.llvm.org/D83906
Summary:
If user-defined reductions with the initializer are used with classes,
the compiler misses the constructor call when trying to create a private
copy of the reduction variable.
Reviewers: jdoerfert
Subscribers: cfe-commits, yaxunl, guansong, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83334
Summary:
If the original variable is marked for allocation in the different
address space using #pragma omp allocate, need to cast the allocated
variable to its original type with the original address space.
Otherwise, the compiler may crash trying to bitcast the type of the new
allocated variable to the original type in some cases, like passing this
variable as an argument in function calls.
Reviewers: jdoerfert
Subscribers: jholewinski, cfe-commits, yaxunl, guansong, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83696
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
Check that the implicit cast from `id` used to construct the element
variable in an ObjC for-in statement is valid.
This check is included as part of a new `objc-cast` sanitizer, outside
of the main 'undefined' group, as (IIUC) the behavior it's checking for
is not technically UB.
The check can be extended to cover other kinds of invalid casts in ObjC.
Partially addresses: rdar://12903059, rdar://9542496
Differential Revision: https://reviews.llvm.org/D71491
Summary:
Need to privatize addresses of the captured variables when trying to
emit the body of the target data directive in no target codegen mode.
Reviewers: jdoerfert
Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin
Tags: #clang
Differential Revision: https://reviews.llvm.org/D83478
There is a version that just tests (also called
isIntegerConstantExpression) & whereas this version is specifically used
when the value is of interest (a few call sites were actually refactored
to calling the test-only version) so let's make the API look more like
it.
Reviewers: aaron.ballman
Differential Revision: https://reviews.llvm.org/D76646
Remove _COMPAT. Drop the ARCHNAME. Remove the non-COMPAT versions
that are no longer needed.
We now only use these macros in places where we need compatibility
with libgcc/compiler-rt. So we don't need to call out _COMPAT
specifically.
This change fixed a SEH bug (exposed by test58 & test61 in MSVC test xcpt4u.c);
when an Except-filter is located inside a finally, the frame-pointer generated today
via intrinsic @llvm.eh.recoverfp is the frame-pointer of the immediate
parent _finally, not the frame-ptr of outermost host function.
The fix is to retrieve the Establisher's frame-pointer that was previously saved in
parent's frame.
The prolog of a filter inside a _finally should be like code below:
%0 = call i8* @llvm.eh.recoverfp(i8* bitcast (@"?fin$0@0@main@@"), i8*%frame_pointer)
%1 = call i8* @llvm.localrecover(i8* bitcast (@"?fin$0@0@main@@"), i8*%0, i32 0)
%2 = bitcast i8* %1 to i8**
%3 = load i8*, i8** %2, align 8
Differential Revision: https://reviews.llvm.org/D77982
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D83268
Do not detect device library by default in rocm detector.
Only detect device library in Rocm and HIP toolchain.
Separate detection of HIP runtime and Rocm device library.
Detect rocm path by version file in host toolchains.
Also added detecting rocm version and printing rocm
installation path and version with -v.
Fixed include path and device library detection for
ROCm 3.5.
Added --hip-version option. Renamed --hip-device-lib-path
to --rocm-device-lib-path.
Fixed default value for -fhip-new-launch-api.
Added default -std option for HIP.
Differential Revision: https://reviews.llvm.org/D82930
thunk's return value slot directly when the return type is an aggregate
instead of doing so via a temporary
This fixes PR45997 (https://bugs.llvm.org/show_bug.cgi?id=45997), which
is caused by a bug that has existed since we started passing and
returning C++ structs with ObjC strong pointer members (see
https://reviews.llvm.org/D44908) or structs annotated with trivial_abi
directly.
rdar://problem/63740936
Differential Revision: https://reviews.llvm.org/D82513
Currently, Clang previously diagnosed this code by default:
void f(int a[static 0]);
saying that "static has no effect on zero-length arrays", which was
accurate.
However, static array extents require that the caller of the function
pass a nonnull pointer to an array of *at least* that number of
elements, but it can pass more (see C17 6.7.6.3p6). Given that we allow
zero-sized arrays as a GNU extension and that it's valid to pass more
elements than specified by the static array extent, we now support
zero-sized static array extents with the usual semantics because it can
be useful in cases like:
void my_bzero(char p[static 0], int n);
my_bzero(&c+1, 0); //ok
my_bzero(t+k,n-k); //ok, pattern from actual code
Many platform ABIs have special support for passing aggregates that
either just contain a single member of floatint-point type, or else
a homogeneous set of members of the same floating-point type.
When making this determination, any extra "empty" members of the
aggregate type will typically be ignored. However, in C++ (at least
in all prior versions), no data member would actually count as empty,
even if it's type is an empty record -- it would still be considered
to take up at least one byte of space, and therefore make those ABI
special cases not apply.
This is now changing in C++20, which introduced the [[no_unique_address]]
attribute. Members of empty record type, if they also carry this
attribute, now do *not* take up any space in the type, and therefore
the ABI special cases for single-element or homogeneous aggregates
should apply.
The C++ Itanium ABI has been updated accordingly, and GCC 10 has
added support for this new case. This patch now adds support to
LLVM. This is cross-platform; it affects all platforms that use
the single-element or homogeneous aggregate ABI special case and
implement this using any of the following common subroutines
in lib/CodeGen/TargetInfo.cpp:
isEmptyField
isEmptyRecord
isSingleElementStruct
isHomogeneousAggregate
Summary:
D82193 exposed a problem with global type definitions in
`OMPConstants.h`. This causes a race when running in thinLTO mode.
Types now live inside of OpenMPIRBuilder to prevent this from happening.
Reviewers: jdoerfert
Subscribers: yaxunl, hiraditya, guansong, dexonsmith, aaron.ballman, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D83176
The SystemZ ABI specifies that aggregate types with just a single
member of floating-point type shall be passed as if they were just
a scalar of that type. This applies to both struct and class types
(but not unions).
However, the current ABI support code in clang only checks this
case for struct types, which means that for class types, generated
code does not adhere to the platform ABI.
Fixed by accepting both struct and class types in the
SystemZABIInfo::GetSingleElementType routine.
When using __sync_nand_and_fetch with __int128, a problem is found that
the wrong value for the 'invert' value gets emitted to the xor in case
where the int size is greater than 64 bits.
This is because uses of llvm::ConstantInt::get which zero extends the
greater than 64 bits, so instead -1 that we require, it end up
getting 18446744073709551615
This patch replaces the call to llvm::ConstantInt::get with the call
to llvm::Constant::getAllOnesValue which works for all integer types.
Reviewers: jfp, erichkeane, rjmccall, hfinkel
Differential Revision: https://reviews.llvm.org/D82832