When generating __clang_call_terminate use SetLLVMFunctionAttributes
to set the default function attributes, like we do for all the other
functions generated by clang. This fixes a problem where target
features from the command line weren't being applied to this function.
Differential Revision: https://reviews.llvm.org/D138679
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6, ABataev
Differential Revision: https://reviews.llvm.org/D102107
I used a script to reuse existing check lines rather than creating new
ones. There are more opportunities to reduce the line count but the
"check generated functions" logic makes that somewhat tricky.
FWIW, we really should redo the update script with all these use cases
in mind...
Differential Revision: https://reviews.llvm.org/D128686
This adds -no-opaque-pointers to clang tests whose output will
change when opaque pointers are enabled by default. This is
intended to be part of the migration approach described in
https://discourse.llvm.org/t/enabling-opaque-pointers-by-default/61322/9.
The patch has been produced by replacing %clang_cc1 with
%clang_cc1 -no-opaque-pointers for tests that fail with opaque
pointers enabled. Worth noting that this doesn't cover all tests,
there's a remaining ~40 tests not using %clang_cc1 that will need
a followup change.
Differential Revision: https://reviews.llvm.org/D123115
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.
Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.
After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.
(Commit attempt 2: check InnermostEHScope != stable_end() before accessing it.)
Differential Revision: https://reviews.llvm.org/D113620
When adding new attributes, existing attributes are dropped. While
this appears to be a longstanding issue, this was highlighted by D105169
which dropped a lot of attributes due to adding the new noundef
attribute.
Ahmed Bougacha (@ab) tracked down the issue and provided the fix in
CGCall.cpp. I bundled it up and updated the tests.
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.
Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.
After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.
Differential Revision: https://reviews.llvm.org/D113620
Alternative to D116817.
This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.
This is the used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116935
The fold for merging a GEP of GEP into a single GEP currently bails
if doing so would result in notional overindexing. The justification
given in the comment above this check is dangerously incorrect: GEPs
with notional overindexing are perfectly fine, and if some code
treats them incorrectly, then that code is broken, not the GEP.
Such a GEP might legally appear in source IR, so only preventing
its creation cannot be sufficient. (The constant folder also ends
up canonicalizing the GEP to remove the notional overindexing, but
that's neither here nor there.)
This check dates back to
bd4fef4a89,
and as far as I can tell the original issue this was trying to
patch around has since been resolved.
Differential Revision: https://reviews.llvm.org/D116587
MakeNaturalAlignAddrLValue() expects the pointee type, but the
pointer type was passed. As a result, the natural alignment of
the pointer (usually 8) was always used in place of the natural
alignment of the value type.
Differential Revision: https://reviews.llvm.org/D116171
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert, jhuber6
Differential Revision: https://reviews.llvm.org/D102107
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102107
If a test does not contain an " simd" but -fopenmp-simd RUN lines we can
just check that we do not create __kmpc|__tgt calls.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101973
The original change was reverted because it was discovered
that clang mishandles thunks, and they receive wrong
attributes for their this/return types - the ones for the function
they will call, not the ones they have.
While i have tried to fix this in https://reviews.llvm.org/D100388
that patch has been up and stuck for a month now,
with little signs of progress.
So while it will be good to solve this for real,
for now we can simply avoid introducing the bug,
by not annotating this/return for thunks.
This reverts commit 6270b3a1ea,
relanding 0aa0458f14.
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.
I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.
Highlights:
- new diagnosis for restrictions specificed in the standard,
- delayed emission of globals not mentioned in an explicit
list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the
device,
- no explicit parsing of declarations in-between `omp [begin] declare
variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over
implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted
global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.
After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.
To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.
To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
arguments.
* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
attributes, where needed
* Updates Clang12 patch notes mentioning this change
Reviewed-by: rsmith, jdoerfert
Differential Revision: https://reviews.llvm.org/D17993
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D80735
llvm function is marked nounwind
This fixes cases where an invoke is emitted, despite the called llvm
function being marked nounwind, because ConstructAttributeList failed to
add the attribute to the attribute list. llvm optimization passes turn
invokes into calls and optimize away the exception handling code, but
it's better to avoid emitting the code in the front-end if the called
function is known not to raise an exception.
Differential Revision: https://reviews.llvm.org/D83906
Summary:
When -fopenmp option is specified then version 5.0 will be set as
default.
Reviewers: gregrodgers, jdoerfert, ABataev
Reviewed By: ABataev
Subscribers: pdhaliwal, yaxunl, guansong, sstefan1, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D81098
Summary:
Clang -fpic defaults to -fno-semantic-interposition (GCC -fpic defaults
to -fsemantic-interposition).
Users need to specify -fsemantic-interposition to get semantic
interposition behavior.
Semantic interposition is currently a best-effort feature. There may
still be some cases where it is not handled well.
Reviewers: peter.smith, rnk, serge-sans-paille, sfertile, jfb, jdoerfert
Subscribers: dschuff, jyknight, dylanmckay, nemanjai, jvesely, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, arphaman, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73865
For consistency with normal instructions and clarity when reading IR,
it's best to print the %0, %1, ... names of function arguments in
definitions.
Also modifies the parser to accept IR in that form for obvious reasons.
llvm-svn: 367755
Fixed emission of the __kmpc_global_thread_num() so that it is not
messed up with alloca instructions anymore. Plus, fixes emission of the
__kmpc_global_thread_num() functions in the target outlined regions so
that they are not called before runtime is initialized.
llvm-svn: 343856
Currently ident_t objects are created const when debug info is not
enabled, but the libittnotify libray in the OpenMP runtime writes to
the reserved_2 field (See __kmp_itt_region_forking in
openmp/runtime/src/kmp_itt.inl). Now create ident_t objects non-const.
Differential Revision: https://reviews.llvm.org/D51331
llvm-svn: 340934
Summary:
Upstream LLVM is changing the the prototypes of the @llvm.memcpy/memmove/memset
intrinsics. This change updates the Clang tests for this change.
The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.
This change removes the alignment argument in favour of placing the alignment
attribute on the source and destination pointers of the memory intrinsic call.
For example, code which used to read:
call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)
At this time the source and destination alignments must be the same (Step 1).
Step 2 of the change, to be landed shortly, will relax that contraint and allow
the source and destination to have different alignments.
llvm-svn: 322964
only.
Added support for -fopenmp-simd option that allows compilation of
simd-based constructs without emission of OpenMP runtime calls.
llvm-svn: 321560
If the thread id is requested in windows mode within funclets, we may
generate incorrect function call that could lead to broken codegen.
llvm-svn: 317208
If exceptions are enabled, there may be a problem with the codegen of
the finalization functions from OpenMP runtime. It happens because of
the problem with the getting of thread identifier value. Patch tries to
fix it by using the result of the call of function
__kmpc_global_thread_num() rather than loading of value of outlined
function parameter.
llvm-svn: 311007
Summary:
This patch fixes an issue detected when firstprivate variables are passed to an OpenMP outlined function vararg list. Currently they are not compatible with what the runtime library expects causing malfunction in some targets.
This patch fixes the issue by moving the casting logic already in place for offloading to the common code that creates the outline function and arguments and updates the regression tests accordingly.
Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev
Subscribers: cfe-commits, caomhin
Differential Revision: http://reviews.llvm.org/D21150
llvm-svn: 272900