Commit Graph

14172 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu cb08558caa [HIP] Fix regressions due to fp contract change
Recently HIP toolchain made a change to use clang instead of opt/llc to do compilation
(https://reviews.llvm.org/D81861). The intention is to make HIP toolchain canonical like
other toolchains.

However, this change introduced an unintentional change regarding backend fp fuse
option, which caused regressions in some HIP applications.

Basically, before the change, the HIP toolchain used clang to generate bitcode, then used
opt/llc to optimize the bitcode and generate ISA. As such, the amdgpu backend took
the default fp fuse mode, which is 'Standard'. This mode respects the contract flag of
fmul/fadd instructions and does not fuse fmul/fadd instructions without the contract flag.

However, after the change, the HIP toolchain uses clang to generate IR, optimize it,
and generate ISA as one process. Now the amdgpu backend fp fuse option is determined
by the -ffp-contract option, which is 'fast' by default, and this -ffp-contract=fast language option
is translated to the 'Fast' fp fuse option in the backend. Suddenly the backend starts to fuse
fmul/fadd instructions without the contract flag.

This causes wrong results for some device library functions, e.g. tan(-1e20), which should
return 0.8446 but now returns -0.933. What is worse, since a backend with the 'Fast' fp fuse
option does not respect the contract flag, there is no way to use the #pragma clang fp contract
directive to enforce fp contract requirements.

This patch fixes the regression by introducing a new value, 'fast-honor-pragmas', for -ffp-contract
and using it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in the frontend but
lets the backend use the 'Standard' fp fuse option. This matters because the 'Fast' fp fuse
option in the backend does not honor the contract flag, which makes it of little use to HIP
applications: it breaks all code that uses #pragma STDC FP_CONTRACT, and any IR from a
source compiled with -ffp-contract=on.
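
A minimal sketch of the intended effect (hypothetical function, not taken from the patch):

```
// With -ffp-contract=fast-honor-pragmas the backend stays in the 'Standard'
// fuse mode, so this pragma reliably prevents a * x + b from being contracted
// into an FMA; under plain 'fast' the backend could still fuse it.
double muladd_uncontracted(double a, double x, double b) {
#pragma clang fp contract(off)
  return a * x + b;
}
```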

Differential Revision: https://reviews.llvm.org/D90174
2020-11-24 08:10:06 -05:00
Ben Dunbobbin e42021d5cc [Clang][-fvisibility-from-dllstorageclass] Set DSO Locality from final visibility
Ensure that the DSO Locality of the globals in the IR is derived from
their final visibility when using -fvisibility-from-dllstorageclass.

To accomplish this we reset the DSO locality of globals (before
setting their visibility from their dllstorageclass) at the end of
IRGen in Clang. This removes any effects that visibility options or
annotations may have had on the DSO locality.

The resulting DSO locality of the globals will be pessimistic
w.r.t. the normal compiler IRGen.

Differential Revision: https://reviews.llvm.org/D91779
2020-11-24 00:32:14 +00:00
Zequan Wu 15a3ae1ab1 [Clang] Add __STDCPP_THREADS__ to standard predefine macros
According to https://eel.is/c++draft/cpp.predefined#2.6, `__STDCPP_THREADS__` is a predefined macro.
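
A minimal sketch of how the macro is typically consulted:

```
#include <iostream>

int main() {
#ifdef __STDCPP_THREADS__
  // Defined (to 1) when the program can have more than one thread of execution.
  std::cout << "__STDCPP_THREADS__ = " << __STDCPP_THREADS__ << '\n';
#else
  std::cout << "no thread support advertised\n";
#endif
}
```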

Differential Revision: https://reviews.llvm.org/D91747
2020-11-22 16:05:53 -08:00
Alexey Bataev c964f30814 [OPENMP]Use the real pointer value as base, not indexed value.
After the fix for PR48174, the base pointer for pointer-based
array-sections/array-subscripts will be emitted as `&ptr[idx]`, but
actually it should be just `ptr`, i.e. the address stored in the pointer,
to point correctly to the beginning of the array. Currently it may lead
to a crash in the runtime.

Differential Revision: https://reviews.llvm.org/D91805
2020-11-20 11:34:14 -08:00
Alex Richardson 51e09e1d5a [AMDGPU] Set the default globals address space to 1
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already present, to ensure bitcode backwards compatibility.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D84345
2020-11-20 15:46:53 +00:00
Arthur Eubanks 72badbcdcc [NPM] Move more O0 pass building into PassBuilder
This moves handling of alwaysinline, coroutines, matrix lowering, PGO,
and LTO-required passes into PassBuilder. Much of this is replicated
between Clang and opt. Other out-of-tree users also replicate some of
this, such as Rust [1] replicating the alwaysinline, LTO, and PGO
passes.

The LTO passes are also now run in
build(Thin)LTOPreLinkDefaultPipeline() since they are semantically
required for (Thin)LTO.

[1]: f5230fbf76/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp (L896)

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D91585
2020-11-19 11:22:23 -08:00
Joseph Huber da8bec47ab [OpenMP] Add Location Fields to Libomptarget Runtime for Debugging
Summary:
Add support for passing source locations to libomptarget runtime functions using the ident_t struct present in the rest of the libomp API. This will allow the runtime system to give much more insightful error messages and debugging values.

Reviewers: jdoerfert, grokos

Differential Revision: https://reviews.llvm.org/D87946
2020-11-19 12:01:53 -05:00
Xiangling Liao 17497ec514 [AIX][FE] Support constructor/destructor attribute
Support __attribute__((constructor)) and __attribute__((destructor)) on AIX
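
A minimal sketch of what this enables (hook names are illustrative):

```
#include <cstdio>

// Run before main() starts and after it returns, respectively.
__attribute__((constructor)) static void init_hook() { std::puts("init"); }
__attribute__((destructor)) static void fini_hook() { std::puts("fini"); }

int main() { std::puts("main"); }
```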

Differential Revision: https://reviews.llvm.org/D90892
2020-11-19 09:24:01 -05:00
Qiu Chaofan 6b1341eb5b [PowerPC] [Clang] Fix alignment of 128-bit float types
According to ELF v2 ABI, both IEEE 128-bit and IBM extended floating
point variables should be quad-word (16 bytes) aligned. Previously, only
vector types were considered quad-word aligned on PowerPC.

This patch fixes incorrect handling of IEEE 128-bit float arguments in
va_arg cases.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D91596
2020-11-19 14:22:14 +08:00
Joseph Huber 97e55cfef5 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped; otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;".

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-11-18 15:28:39 -05:00
Alexey Bataev 5ba324ccad [OPENMP]Fix PR48174: compile-time crash with target enter data on a global struct.
The compiler should treat an array subscript with a base pointer as a first
pointer in complex data; it is used only for member expressions with a base
pointer.

Differential Revision: https://reviews.llvm.org/D91660
2020-11-18 07:48:58 -08:00
Florian Hahn 680931af27
[Matrix] Adjust matrix pointer type for inline asm arguments.
Matrix types in memory are represented as arrays, but accessed through
vector pointers, with the alignment specified on the access operation.

For inline assembly, update pointer arguments to use vector pointers.
Otherwise there will be a mismatch if the matrix is also an
input argument, which is represented as a vector.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D91631
2020-11-18 11:44:11 +00:00
Nick Desaulniers f4c6080ab8 Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"
This reverts commit b7926ce6d7.

Going with a simpler approach.
2020-11-17 17:27:14 -08:00
Alexey Bataev 5292187a2d [OPENMP]Fix PR48076: mapping of data member pointer.
If the data member pointer is mapped, the compiler tries to optimize the
mapping of such data by discarding explicit mapping flags and trying to
emit combined data instead. In some cases, this optimization is not
quite correctly implemented and it leads to a program crash at the
runtime. Instead, if the data member is mapped, just emit it as is and
do not emit combined mapping flags for it.

Differential Revision: https://reviews.llvm.org/D91552
2020-11-17 07:18:32 -08:00
Yaxun (Sam) Liu 3f4b5893ef [AMDGPU] Add option -munsafe-fp-atomics
Add an option -munsafe-fp-atomics for AMDGPU target.

When enabled, clang adds the function attribute "amdgpu-unsafe-fp-atomics"
to all functions for the amdgpu target. This allows the amdgpu backend to use
unsafe fp atomic instructions in these functions.

Differential Revision: https://reviews.llvm.org/D91546
2020-11-16 21:52:12 -05:00
CJ Johnson 69cd776e1e [CodeGen] Apply 'nonnull' and 'dereferenceable(N)' to 'this' pointer
arguments.

* Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments
* Gates 'nonnull' on -f(no-)delete-null-pointer-checks
* Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to
  explicitly test the behavior of this change
* Refactors hundreds of over-constrained clang tests to permit these
  attributes, where needed
* Updates the Clang 12 release notes to mention this change

Reviewed-by: rsmith, jdoerfert

Differential Revision: https://reviews.llvm.org/D17993
2020-11-16 17:39:17 -08:00
Florian Hahn ca2e7e5999 [IRGen] Add !annotation metadata for auto-init stores.
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.

As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).

There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.

This patch is based on a patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, paquette

Differential Revision: https://reviews.llvm.org/D91417
2020-11-16 10:37:02 +00:00
Richard Smith dc58cd1480 PR48169: Fix crash generating debug info for class non-type template
parameters.

It appears that LLVM isn't able to generate a DW_AT_const_value for a
constant of class type, but if it could, we'd match GCC's debug info in
this case, and in the interim we no longer crash.
2020-11-15 17:43:26 -08:00
Roman Lebedev 6861d938e5
Revert "clang-misexpect: Profile Guided Validation of Performance Annotations in LLVM"
See discussion in https://bugs.llvm.org/show_bug.cgi?id=45073 / https://reviews.llvm.org/D66324#2334485
the implementation is known-broken for certain inputs,
the bug report was open for a significant amount of time,
and there has been no activity to address it.
Therefore, just completely rip out all of misexpect handling.

I suspect, fixing it requires redesigning the internals of MD_misexpect.
Should anyone commit to fixing the implementation problem,
starting from clean slate may be better anyways.

This reverts commit 7bdad08429
and some of its follow-ups that don't stand on their own.
2020-11-14 13:12:38 +03:00
Mehdi Amini 42e88bd6b1 Replace sequences of v.push_back(v[i]); v.erase(&v[i]); with std::rotate (NFC)
The code had a few sequences that looked like:

    Ops.push_back(Ops[0]);
    Ops.erase(Ops.begin());

which are equivalent to:

    std::rotate(Ops.begin(), Ops.begin() + 1, Ops.end());

The latter has the advantage of never reallocating the vector; a
reallocation would be a bug in the original code, as push_back would read
from the memory it had just deallocated.
2020-11-14 00:55:33 +00:00
Arthur Eubanks 6e098189db [DFSan][NewPM] Handle dfsan under NPM
Make it required. Since it's a module pass, optnone won't test it, so
extend the clang test to also use opt-bisect now that it's supported.

14/16 check-dfsan tests failed with NPM enabled, now all pass.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D91385
2020-11-13 13:41:38 -08:00
Heejin Ahn 902ea588ea [WebAssembly] Rename atomic.notify and *.atomic.wait
- atomic.notify -> memory.atomic.notify
- i32.atomic.wait -> memory.atomic.wait32
- i64.atomic.wait -> memory.atomic.wait64

See https://github.com/WebAssembly/threads/pull/149.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D91447
2020-11-13 12:04:48 -08:00
Baptiste Saleil 3f78605a8c [PowerPC] Add paired vector load and store builtins and intrinsics
This patch adds the Clang builtins and LLVM intrinsics to load and store vector pairs.

Differential Revision: https://reviews.llvm.org/D90799
2020-11-13 12:35:10 -06:00
Michael Liao 8920ef06a1 [hip] Remove the coercion on aggregate kernel arguments.
- If an aggregate argument is indirectly accessed within kernels, direct
  passing results in an unpromotable `alloca`, which degrades performance
  significantly. The InferAddrSpace pass is enhanced in
  [D91121](https://reviews.llvm.org/D91121) to assume that
  generic pointers loaded from constant memory can be regarded as
  global ones. The need for the coercion on aggregate arguments is thus
  mitigated.

Differential Revision: https://reviews.llvm.org/D89980
2020-11-12 21:19:30 -05:00
Arthur Eubanks 3a7b57b7ca [NFC][NewPM] Reuse PassBuilder callbacks with -O0
This removes lots of duplicated code which was necessary before
https://reviews.llvm.org/D89158.
Now we can use PassBuilder::runRegisteredEPCallbacks().
This is mostly sanitizers.

There is likely more that can be done to simplify, but let's start with this.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90870
2020-11-12 12:42:59 -08:00
Alexey Bataev 3c6b457bee [OPENMP]Fix PR48076: Check map types array before accessing its front.
Need to check if there are map types for the components before trying to
access them when trying to modify type mappings for combined partial
mappings.

Differential Revision: https://reviews.llvm.org/D91370
2020-11-12 12:00:29 -08:00
Hans Wennborg a088766508 [dllexport] Instantiate default ctor default args for explicit specializations (PR45811)
For dllexported default constructors with default arguments, we export
default constructor closures which pass in the default args. (See D8331
for a good explanation.)

For templates, that means those default args must be instantiated even
if the function isn't called. That is done by the
InstantiateDefaultCtorDefaultArgs() function, but it wasn't done for
explicit specializations, causing asserts (see bug).

Differential revision: https://reviews.llvm.org/D91089
2020-11-12 13:29:34 +01:00
Arthur Eubanks b6ccff3d5f [NewPM] Provide method to run all pipeline callbacks, used for -O0
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.

This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.

Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.

Tests are a continuation of those added in
https://reviews.llvm.org/D89083.

Reviewed By: asbirlea, Meinersbur

Differential Revision: https://reviews.llvm.org/D89158
2020-11-11 15:10:27 -08:00
Richard Smith 5f12f4ff90 Suppress printing of inline namespace names in diagnostics by default,
except where they are necessary to disambiguate the target.

This substantially improves diagnostics from the standard library,
which are otherwise full of `::__1::` noise.
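
A minimal sketch of the pattern, with illustrative names (not from the patch):

```
// 'lib' wraps its symbols in an inline namespace, as libc++ does with std::__1.
namespace lib {
inline namespace v1 {
struct Widget {};
} // namespace v1
} // namespace lib

// Diagnostics involving this type now print 'lib::Widget' rather than
// 'lib::v1::Widget', unless the inline namespace is needed to disambiguate.
lib::Widget make_widget() { return {}; }
```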
2020-11-11 15:05:51 -08:00
Akira Hatanaka 874b0a0b9d [CodeGen] Mark calls to objc_autorelease as tail
This enables a method sending an autorelease message to an object and
returning the object in MRR to avoid adding the object to an autorelease
pool if a call to objc_retainAutoreleasedReturnValue in the caller
function accepts the hand off of the retain count.

rdar://problem/50678052

Differential Revision: https://reviews.llvm.org/D91111
2020-11-10 13:48:25 -08:00
Richard Smith b637148ecb [c++20] For P0732R2 / P1907R1: Basic code generation and name
mangling support for non-type template parameters of class type and
template parameter objects.

The Itanium side of this follows the approach I proposed in
https://github.com/itanium-cxx-abi/cxx-abi/issues/47 on 2020-09-06.

The MSVC side of this was determined empirically by observing MSVC's
output.

Differential Revision: https://reviews.llvm.org/D89998
2020-11-09 22:10:27 -08:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has method names starting with lower-case letters, as all other OpenMPIRBuilder methods already do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Fangrui Song e625f9c5d1 -fbasic-block-sections=list=: Suppress output if failed to open the file
Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D90815
2020-11-09 09:26:37 -08:00
Atmn Patel fd3cad7a60 [clang] Fix ForStmt mustprogress handling
D86841 had an error where for statements with no conditional were
required to make progress. This is not true, this patch removes that
line, and adds regression tests.

Differential Revision: https://reviews.llvm.org/D91075
2020-11-09 11:38:06 -05:00
Tyker d093401a26 [NFC] Remove string parameter of annotation attribute from AST children.
This simplifies using annotation attributes when using clang as a library.
2020-11-09 16:39:59 +01:00
Simon Pilgrim 8930032f53 Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better to use cast<> and have it assert that it's the correct type than to dereference a null result.
2020-11-08 13:06:07 +00:00
Arthur Eubanks 226e179f74 Revert "[NewPM] Provide method to run all pipeline callbacks, used for -O0"
This reverts commit ae38540042.
As well as some follow-up test fixes.

The original change causes new-pass-manager.ll to fail when polly is enabled.
2020-11-08 00:32:35 -08:00
Fangrui Song f2e479db92 [OpenMP] Fix -Wmisleading-indentation after D84192 2020-11-06 20:09:43 -08:00
cchen 0cab91140f [OpenMP5.0] map item can be non-contiguous for target update
In order not to modify the `tgt_target_data_update` information but still be
able to pass the extra information for a non-contiguous map item (offset,
count, and stride for each dimension), this patch overloads `arg` when
the maptype is set as `OMP_MAP_DESCRIPTOR`. The original `arg` is for
passing the pointer information; however, the overloaded `arg` is an
array of descriptor_dim:

struct descriptor_dim {
  int64_t offset;
  int64_t count;
  int64_t stride;
};

and the array size is the same as the dimension size. In addition, since we
have count and stride information in descriptor_dim, we can replace/overload the
`arg_size` parameter by using the dimension size.

For supporting `stride` in an array section, we use a dummy dimension in the
descriptor to store the unit size. The formula for computing the stride of
dimension D_n is: `unit size * (size of D_0 * size of D_1 * ... * size of D_n-1) * D_n.stride`.

Demonstrate how it works:
```
double arr[3][4][5];

D0: { offset = 0, count = 1, stride = 8 }                                // offset, count, dimension size are always 0, 1, 1 for this extra dimension; stride is the unit size
D1: { offset = 0, count = 2, stride = 8 * 1 * 2 = 16 }                   // stride = unit size * (product of dimension sizes of D0) * D1.stride = 8 * 1 * 2 = 16
D2: { offset = 2, count = 2, stride = 8 * (1 * 5) * 1 = 40 }             // stride = unit size * (product of dimension sizes of D0, D1) * D2.stride = 8 * 5 * 1 = 40
D3: { offset = 0, count = 2, stride = 8 * (1 * 5 * 4) * 2 = 320 }        // stride = unit size * (product of dimension sizes of D0, D1, D2) * D3.stride = 8 * 20 * 2 = 320

// X here means we need to offload this data, therefore, runtime will transfer
// data from offset 80, 96, 120, 136, 400, 416, 440, 456
// Runtime patch: https://reviews.llvm.org/D82245
// OOOOO OOOOO OOOOO
// OOOOO OOOOO OOOOO
// XOXOO OOOOO XOXOO
// XOXOO OOOOO XOXOO
```

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84192
2020-11-06 21:04:37 -06:00
Kevin P. Neal 2069403cdf [FPEnv] Use strictfp metadata in casting nodes
The strictfp metadata was added to the casting AST nodes in D85960, but
we aren't using that metadata yet. This patch adds that support.

In order to avoid lots of ad-hoc passing around of the strictfp bits I
updated the IRBuilder when moving from a function that has the Expr* to a
function that lacks it. I believe we should switch to this pattern to keep
the strictfp support from being overly invasive.

For the purpose of testing that we're picking up the right metadata, I
also made my tests use a pragma to make the AST's strictfp metadata not
match the global strictfp metadata. This exposes issues that we need to
deal with in subsequent patches, and I believe this is the right method
for most all of our clang strictfp tests.

Differential Revision: https://reviews.llvm.org/D88913
2020-11-06 11:56:12 -05:00
Jan Ole Hüser d2e7dca5ca [CodeGen] Fix Bug 47499: __unaligned extension inconsistent behaviour with C and C++
For the language C++ the keyword __unaligned (a Microsoft extension) had no effect on pointers.

The reason why there was a difference between C and C++ for the keyword __unaligned:
For C, the method getAsCXXRecordDecl() returns nullptr. That guarantees that hasUnaligned() is called.
If the language is C++, it is not guaranteed that hasUnaligned() is called and evaluated.
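
A minimal sketch, assuming -fms-extensions (function name is illustrative):

```
// With the fix, the load below is emitted with alignment 1 in C++ as well,
// matching the existing C behaviour.
int load_unaligned(__unaligned int *p) { return *p; }
```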

Here are some links:

The Bug: https://bugs.llvm.org/show_bug.cgi?id=47499
Thread on the cfe-dev mailing list: http://lists.llvm.org/pipermail/cfe-dev/2020-September/066783.html
Diff, that introduced the check hasUnaligned() in getNaturalTypeAlignment(): https://reviews.llvm.org/D30166

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D90630
2020-11-05 12:57:17 -08:00
Arthur Eubanks ae38540042 [NewPM] Provide method to run all pipeline callbacks, used for -O0
Some targets may add required passes via
    TargetMachine::registerPassBuilderCallbacks(). We need to run those even
    under -O0. As an example, BPFTargetMachine adds
    BPFAbstractMemberAccessPass, a required pass.

    This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
    usage of the NPM) by allowing us to share added passes like coroutines
    and sanitizers between -O0 and other optimization levels.

    Tests are a continuation of those added in
    https://reviews.llvm.org/D89083.

    In order to prevent TargetMachines from adding unnecessary optimization
    passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
    changed to take an OptimizationLevel, but that will be done separately.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D89158
2020-11-04 22:27:16 -08:00
Atmn Patel ac73b73c16 [clang] Add mustprogress and llvm.loop.mustprogress attribute deduction
Since C++11, the C++ standard has a forward progress guarantee
[intro.progress], so all such functions must have the `mustprogress`
requirement. In addition, from C11 onwards, loops whose controlling
expression is neither a non-zero constant nor omitted are also required to
make progress (C11 6.8.5p6). This patch implements these attribute deductions
so they can be used by the optimization passes.
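
A minimal sketch of a function the deduction applies to (name and body are illustrative):

```
// In C++11 and later this function can be marked 'mustprogress' and its loop
// tagged with llvm.loop.mustprogress, since it falls under the standard's
// forward progress guarantee.
unsigned collatz_steps(unsigned n) {
  unsigned steps = 0;
  while (n > 1) {                       // non-constant controlling expression
    n = (n % 2) ? 3 * n + 1 : n / 2;
    ++steps;
  }
  return steps;
}
```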

Differential Revision: https://reviews.llvm.org/D86841
2020-11-04 22:03:14 -05:00
Arthur Eubanks ab0ddbc38a Reland [NewPM] Add OptimizationLevel param to registerPipelineStartEPCallback
This allows targets to skip optional optimization passes at -O0.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D90777
2020-11-04 13:11:40 -08:00
cchen d0d43b58b1 [OpenMP] target nested `use_device_ptr() if()` and is_device_ptr trigger asserts
Clang now asserts for the below case:
```
void clang::CodeGen::CGOpenMPRuntime::createOffloadEntriesAndInfoMetadata(): Assertion `std::get<0>(E) && "All ordered entries must exist!"' failed.
```

The reason why Clang hit the assert is because in
`emitTargetDataCalls`, both `BeginThenGen` and `BeginElseGen` call
`registerTargetRegionEntryInfo` and try to register the Entry in
OffloadEntriesTargetRegion with the same key. If the expression in the
if clause is changed to any constant expression, the assert disappears. (https://godbolt.org/z/TW7haj)

The assert itself is there to prevent
accessing out-of-bounds elements of `OrderedEntries` in
`createOffloadEntriesAndInfoMetadata`.

In this patch, I add a check in `registerTargetRegionEntryInfo` to avoid
registering the target region more than once.

A test case that triggers assert: https://godbolt.org/z/4cnGW8

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D90704
2020-11-04 12:36:57 -06:00
Qiu Chaofan 7faf62a80b [Clang] Add more fp128 math library function builtins
Since glibc has supported math library functions for the IEEE 128-bit
floating point type on some platforms (like ppc64le), we can add the clang
math builtins that were missing for this type.

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D90593
2020-11-04 17:58:42 +08:00
Baptiste Saleil daa127d77e [PowerPC] Add MMA builtin decoding and definitions
Add MMA builtin decoding. These builtins use the new PowerPC-specific types __vector_pair and __vector_quad.
So to avoid pervasive changes, we use custom type descriptors and custom decoding for these builtins.
We also use custom code generation to expand builtin calls with pointers to simpler intrinsic calls with non-pointer types.

Differential Revision: https://reviews.llvm.org/D81748
2020-11-03 15:08:46 -06:00
Ben Dunbobbin 7ad6010f58 Fix - [Clang] Add the ability to map DLL storage class to visibility
415f7ee883 had a silly typo introduced when I inlined some
code into a loop from its own function.

Original commit message:

For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.

To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.

Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:

https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)

This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.

Differential Revision: https://reviews.llvm.org/D89970
2020-11-03 19:13:54 +00:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Yaxun (Sam) Liu abd8cd9199 [CUDA][HIP] Fix linkage for -fgpu-rdc
Currently for explicit template function instantiation in CUDA/HIP device
compilation clang emits instantiated kernel with external linkage
and instantiated device function with internal linkage.

This is fine for -fno-gpu-rdc since there is only one TU.

However this causes duplicate symbols for kernels with -fgpu-rdc if
the same instantiation happens in multiple TUs, or missing symbols
if a device function calls an explicitly instantiated template function
in a different TU.

To make explicit template function instantiation work for
-fgpu-rdc we need to follow the C++ linkage paradigm, i.e.
use weak_odr linkage.

Differential Revision: https://reviews.llvm.org/D90311
2020-11-03 08:07:19 -05:00
Alex Lorenz 701456b523 [darwin] add support for __isPlatformVersionAtLeast check for if (@available)
The __isPlatformVersionAtLeast routine is an implementation of `if (@available)` check
that uses the _availability_version_check API on Darwin that's supported on
macOS 10.15, iOS 13, tvOS 13 and watchOS 6.

Differential Revision: https://reviews.llvm.org/D90367
2020-11-02 16:28:09 -08:00
Ben Dunbobbin ae9231ca2a Reland - [Clang] Add the ability to map DLL storage class to visibility
415f7ee883 had LIT test failures on any build where the clang executable
was not called "clang". I have adjusted the LIT CHECKs to remove the
binary name to fix this.

Original commit message:

For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.

To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.

Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:

https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)

This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.

Differential Revision: https://reviews.llvm.org/D89970
2020-11-02 23:24:49 +00:00
Ben Dunbobbin 5024d3aa18 Revert "[Clang] Add the ability to map DLL storage class to visibility"
This reverts commit 415f7ee883.

The added tests were failing on the build bots!
2020-11-02 17:33:54 +00:00
Ben Dunbobbin 415f7ee883 [Clang] Add the ability to map DLL storage class to visibility
For PlayStation we offer source code compatibility with
Microsoft's dllimport/export annotations; however, our file
format is based on ELF.

To support this we translate from DLL storage class to ELF
visibility at the end of codegen in Clang.

Other toolchains have used similar strategies (e.g. see the
documentation for this ARM toolchain:

https://developer.arm.com/documentation/dui0530/i/migrating-from-rvct-v3-1-to-rvct-v4-0/changes-to-symbol-visibility-between-rvct-v3-1-and-rvct-v4-0)

This patch adds the ability to perform this translation. Options
are provided to support customizing the mapping behaviour.

Differential Revision: https://reviews.llvm.org/D89970
2020-11-02 17:08:23 +00:00
Teresa Johnson 0949f96dc6 [MemProf] Pass down memory profile name with optional path from clang
Similar to -fprofile-generate=, add -fmemory-profile= which takes a
directory path. This is passed down to LLVM via a new module flag
metadata. LLVM in turn provides this name to the runtime via the new
__memprof_profile_filename variable.

Additionally, always pass a default filename (in $cwd if a directory
name is not specified via the = form of the option). This is also
consistent with the behavior of the PGO instrumentation. Since the
memory profiles will generally be fairly large, it doesn't make sense to
dump them to stderr. Also, importantly, the memory profiles will
eventually be dumped in a compact binary format, which is another reason
why it does not make sense to send these to stderr by default.

Change the existing memprof tests to specify log_path=stderr when that
was being relied on.

Depends on D89086.

Differential Revision: https://reviews.llvm.org/D89087
2020-11-01 17:38:23 -08:00
Mark de Wever b46fddf75f [CodeGen] Implement [[likely]] and [[unlikely]] for while and for loop.
The attribute has no effect on a do statement since the path of execution
will always include its substatement.

It adds a diagnostic when the attribute is used on an infinite while loop
since the codegen omits the branch here. Since the likelihood attributes
have no effect on a do statement, no diagnostic will be issued for
do [[unlikely]] {...} while(0);
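
A minimal sketch of the new placement on a while loop (names are illustrative):

```
// The attribute on the loop's substatement now contributes branch weights for
// 'while' and 'for'; on a 'do' statement it has no effect, since the body
// always runs at least once.
void drain(int *queue, int &len) {
  while (len > 0) [[likely]] {          // hint: the body is usually entered
    --len;
    queue[len] = 0;
  }
}
```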

Differential Revision: https://reviews.llvm.org/D89899
2020-10-31 17:51:29 +01:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Thomas Lively a787e09779 [WebAssembly] Prototype i64x2.bitmask
As proposed in https://github.com/WebAssembly/simd/pull/368.

Differential Revision: https://reviews.llvm.org/D90514
2020-10-30 17:23:30 -07:00
Thomas Lively 0a512a555a [WebAssembly] Prototype i64x2.eq
As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still
in the prototyping phase, it is only accessible via a target builtin function
and a target intrinsic.

Depends on D90504.

Differential Revision: https://reviews.llvm.org/D90508
2020-10-30 16:38:15 -07:00
Thomas Lively 1cb0b56607 [WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u}
As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these
instructions are available only via builtin functions and intrinsics while they
are in the prototyping stage.

Differential Revision: https://reviews.llvm.org/D90504
2020-10-30 15:44:04 -07:00
Arthur Eubanks 2e31727a88 [NFC] Clean up PassBuilder
Make DebugLogging a member variable so that users of PassBuilder don't
need to pass it around so much.

Move call to TargetMachine::registerPassBuilderCallbacks() within
PassBuilder so users don't need to remember to call it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D90437
2020-10-30 10:03:59 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
David Sherwood cea69fa4dc [SVE] Add fatal error for unnamed SVE variadic arguments
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.

Differential Revision: https://reviews.llvm.org/D90230
2020-10-30 13:35:47 +00:00
Liu, Chen3 00090a2b82 Support complex target features combinations
This patch is mainly doing two things:

1. Adding support for parentheses, making the combination of target features
   more diverse;
2. Making the priority of ',' higher than that of '|' by default. This required
   some changes to the PTX builtin functions.

Differential Revision: https://reviews.llvm.org/D89184
2020-10-30 10:32:53 +08:00
Thomas Lively be6f50798e [WebAssembly] Implement SIMD signselect instructions
As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes
adopted by V8 in
https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h.
Uses new builtin functions and a new target intrinsic exclusively to ensure that
the new instructions are only emitted when a user explicitly opts in to using
them since they are still in the prototyping and evaluation phase.

Differential Revision: https://reviews.llvm.org/D90357
2020-10-29 11:06:20 -07:00
Jon Chesterfield dee7704829 [AMDGPU] Add __builtin_amdgcn_grid_size

Similar to D76772, loads the data from the dispatch pointer. Marked invariant.

Patch also updates the openmp devicertl to use this builtin.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D90251
2020-10-29 16:25:13 +00:00
Amy Huang 7669f3c0f6 Recommit "[CodeView] Emit static data members as S_CONSTANTs."
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.

This changes CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47580

Differential Revision: https://reviews.llvm.org/D89072

This reverts commit 504615353f.
2020-10-28 16:35:59 -07:00
Shilei Tian 0661328d7e [Clang][OpenMP] Added the support for target data nowait
Previously we added support for target nowait, but target data nowait
has not been supported yet. In this patch, target data nowait will also be
wrapped into a task.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90099
2020-10-28 15:53:30 -04:00
Mircea Trofin 6fa35541a0 [NFC][ThinLTO] Change command line passing to EmbedBitcodeInModule
Changing to pass by ref - less null checks to worry about.

Differential Revision: https://reviews.llvm.org/D90330
2020-10-28 12:33:39 -07:00
Baptiste Saleil 40dd4d5233 [Clang][PowerPC] Add __vector_pair and __vector_quad types
Define the __vector_pair and __vector_quad types that are used to manipulate
the new accumulator registers introduced by MMA on PowerPC. Because these two
types are specific to PowerPC, they are defined in a separate new file so it
will be easier to add other PowerPC specific types if we need to in the future.

Differential Revision: https://reviews.llvm.org/D81508
2020-10-28 13:19:20 -05:00
Thomas Lively 5b464f2aa5 [WebAssembly] Fix incorrectly named target builtin
Rename __builtin_wasm_q15mulr_saturate_s_i8x16 to
__builtin_wasm_q15mulr_saturate_s_i16x8, fixing the implied lane interpretation
of the result.
2020-10-28 10:22:43 -07:00
Heejin Ahn 98941279b9 [WebAssembly] Clang-format builtins generation (NFC)
Differential Revision: https://reviews.llvm.org/D90294
2020-10-28 10:01:21 -07:00
Thomas Lively 31e944556f [WebAssembly] Prototype extending multiplication SIMD instructions
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.

Differential Revision: https://reviews.llvm.org/D90253
2020-10-28 09:38:59 -07:00
JonChesterfield 5d02ca49a2 [libomptarget][nvptx] Undef, weak shared variables

Shared variables on nvptx, and LDS on amdgcn, are uninitialized at
the start of kernel execution. Therefore create the variables with
undef instead of zeros, motivated in part by the amdgcn back end
rejecting LDS+initializer.

Common is zero initialized, which seems incompatible with shared. Thus
change them to weak, following the direction of
https://reviews.llvm.org/rG7b3eabdcd215

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90248
2020-10-28 14:25:36 +00:00
Benjamin Kramer 90a9f97cbd [openmp] Use front() instead of *begin() to not hide bugs when CurTypes is empty. 2020-10-28 13:58:23 +01:00
Benjamin Kramer 207cf71fa9 Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API"
This reverts commit d981c7b758 and
a87d7b3d44. Test fails under msan.
2020-10-28 13:58:14 +01:00
Joseph Huber a87d7b3d44 [OpenMP] Add Passing in Original Declaration Names To Mapper API
Summary:
This patch adds support for passing in the original declaration name in the
source file to the libomptarget runtime. This will allow the runtime to provide
more intelligent debugging messages. This patch takes the original expression
parsed from the OpenMP map / update clause and provides a textual
representation if it was explicitly mapped, otherwise it takes the name of the
variable declaration as a fallback. The information is passed to the runtime in
a global array of strings that matches the existing ident_t source location
strings using ";name;filename;column;row;;". See
clang/test/OpenMP/target_map_names.cpp for an example of the generated output
for a given map clause.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D89802
2020-10-27 16:09:19 -04:00
Amy Huang 504615353f Revert "[CodeView] Emit static data members as S_CONSTANTs."
Seems like there's an assert in here that we shouldn't be running into.

This reverts commit 515973222e.
2020-10-27 11:29:58 -07:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Shilei Tian d38788b357 [Clang][OpenMP] Avoid unnecessary privatization of mapper array when there is no user defined mapper
In the current implementation, if an outer task is required, the mapper array will be privatized regardless of whether there is a mapper. In fact, when there is no mapper, the mapper array only contains a number of nullptrs. In libomptarget, the use of the mapper array is `if (mappers_array && mappers_array[i])`, which means we can directly set the mapper array to nullptr if there is no mapper. This can avoid an unnecessary data copy.

In this patch, the data privatization will not be emitted if the mapper array is nullptr. When emitting the task body, the nullptr is used directly.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90101
2020-10-27 00:02:32 -04:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Shilei Tian e20d64c3d9 [Clang][OpenMP] Fixed a segmentation fault issue when using target nowait
The implementation of target nowait just wraps the target region into a task. The four essential parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will still be copied to the corresponding place. Therefore, a memcpy will be generated whose source pointer is nullptr, causing a segmentation fault. The root cause is that when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field indicating whether it requires a task, and it only takes the depend clause into account. In this patch, the nowait clause is also included.

There are two things that will be done in other patches:
1. target data nowait has not been supported yet. D90099 added the support.
2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89844
2020-10-26 22:33:22 -04:00
Amy Huang 515973222e [CodeView] Emit static data members as S_CONSTANTs.
We used to only emit static const data members in CodeView as
S_CONSTANTS when they were used; this patch makes it so they are always emitted.

I changed CodeViewDebug.cpp to find the static const members from the
class debug info instead of creating DIGlobalVariables in the IR
whenever a static const data member is used.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47580

Differential Revision: https://reviews.llvm.org/D89072
2020-10-26 15:30:35 -07:00
Duncan P. N. Exon Smith d4c667c9af Avoid unnecessary uses of `MDNode::getTemporary`, NFC
This is a long-delayed follow-up to
5e5b85098d.

`TempMDNode` includes a bunch of machinery for RAUW, and should only be
used when necessary. RAUW wasn't being used in any of these cases... it
was just a placeholder for a self-reference.

Where the real node was using `MDNode::getDistinct`, just replace the
temporary argument with `nullptr`.

Where the real node was using `MDNode::get`, the `replaceOperandWith`
call was "promoting" the node to a distinct one implicitly due to
self-reference detection in `MDNode::handleChangedOperand`. The
`TempMDNode` was serving a purpose by delaying uniquing, but it's way
simpler to just call `MDNode::getDistinct` in the first place.

Note that using a self-reference at all in these places is a hold-over
from before `distinct` metadata existed. It was an old trick to create
distinct nodes. It would be intrusive to change, including bitcode
upgrades, etc., and it's harmless so I'm not sure there's much value in
removing it from existing schemas. After this commit it still has a tiny
memory cost (in the extra metadata operand) but no more overhead in
construction.

Differential Revision: https://reviews.llvm.org/D90079
2020-10-26 17:03:25 -04:00
Zequan Wu e56e7bd469 Revert "Revert "Ensure that checkInitIsICE is called exactly once for every variable""
This reverts commit a2ac64dd90.
2020-10-26 12:08:57 -07:00
Zequan Wu a2ac64dd90 Revert "Ensure that checkInitIsICE is called exactly once for every variable"
This causing `Assertion Result && "Could not evaluate expression"' failed` at https://bugs.chromium.org/p/chromium/issues/detail?id=1142009

This reverts commit 76c0092665.
2020-10-26 11:59:55 -07:00
Nick Desaulniers c8f84bd094 [Clang][CodeGen] fix failed assertion
Ensure we can emit symbol aliases via function attribute
even when function signatures contain incomplete types.

Via bugreport:
https://reviews.llvm.org/D66492#2350947

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D90073
2020-10-26 11:37:55 -07:00
Tyker d3205bbca3 [Annotation] Allows annotation to carry some additional constant arguments.
This allows using annotations in many more contexts than before,
especially annotations on templates or in constexpr code.
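
A minimal sketch of an annotation with extra constant arguments (tag and values are illustrative):

```
// The annotate attribute can now carry additional constant arguments after
// the annotation string.
[[clang::annotate("my_tool_tag", 42, "extra-info")]]
int annotated_global = 0;
```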

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D88645
2020-10-26 10:50:05 +01:00
Melanie Blower 2e204e2391 [clang] Enable support for #pragma STDC FENV_ACCESS
Reviewers: rjmccall, rsmith, sepavloff

Differential Revision: https://reviews.llvm.org/D87528
2020-10-25 06:46:25 -07:00
Akira Hatanaka 71e1a56de1 [CodeGen] Emit destructor calls to destruct non-trivial C struct
temporaries created by conditional and assignment operators

rdar://problem/64989559

Differential Revision: https://reviews.llvm.org/D83448
2020-10-23 14:46:17 -07:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via the function attribute
no_stack_protector) or doesn't care for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to want
to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.
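
A minimal sketch of the source-level attribute this models (function name is illustrative):

```
// This function must never get a stack protector; with this change, callers
// and callees whose ssp/nossp attributes conflict are no longer inlined into
// one another.
__attribute__((no_stack_protector)) void early_boot_setup(void) {
  // e.g. runs before the stack canary value has been initialized
}
```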

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Venkataramanan Kumar 57cdc52c4d Initial support for vectorization using Libmvec (GLIBC vector math library)
Differential Revision: https://reviews.llvm.org/D88154
2020-10-22 16:01:39 -04:00
Xiang1 Zhang 7c3fea7721 [X86] Support customizing stack protector guard
Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D88631
2020-10-22 10:08:14 +08:00
Richard Smith ba4768c966 [c++20] For P0732R2 / P1907R1: Basic frontend support for class types as
non-type template parameters.

Create a unique TemplateParamObjectDecl instance for each such value,
representing the globally unique template parameter object to which the
template parameter refers.

No IR generation support yet; that will follow in a separate patch.
2020-10-21 13:21:41 -07:00
Jonas Paulsson 42a82862b6 Reapply "[clang] Improve handling of physical registers in inline
assembly operands."

Earlyclobbers are now excepted from this change (original commit: c78da03).

Review: Ulrich Weigand, Nick Desaulniers

Differential Revision: https://reviews.llvm.org/D87279
2020-10-21 10:53:40 +02:00
Mikhail Maltsev 7819411837 [clang] Use SourceLocation as key in hash maps, NFCI
The patch adjusts the existing `llvm::DenseMap<unsigned, T>` and
`llvm::DenseSet<unsigned>` objects that store source locations, so
that they use `SourceLocation` directly instead of `unsigned`.

This patch relies on the `DenseMapInfo` trait added in D89719.

It also replaces the construction of `SourceLocation` objects from
the constants -1 and -2 with calls to the trait's methods `getEmptyKey`
and `getTombstoneKey` where appropriate.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D69840
2020-10-20 16:24:09 +01:00
Richard Smith 08c8d5bc51 Properly track whether a variable is constant-initialized.
This fixes miscomputation of __builtin_is_constant_evaluated in the
initializer of a variable that's not usable in constant expressions, but
is readable when constant-folding.

If evaluation of a constant initializer fails, we throw away the
evaluated result instead of keeping it as a non-constant-initializer
value for the variable, because it might not be a correct value.
To avoid regressions for initializers that are foldable but not formally
constant initializers, we now try constant-evaluating some globals in
C++ twice: once to check for a constant initializer (in an mode where
is_constannt_evaluated returns true) and again to determine the runtime
value if the initializer is not a constant initializer.
2020-10-19 23:59:11 -07:00
Fangrui Song 0ab222e7d7 [gcov] Delete CC1 option -test-coverage
The name is unfortunate because it is similar to the driver option -ftest-coverage.
It turns out aside from one occurrence in a test, this option is not used.
2020-10-19 21:48:51 -07:00
Richard Smith 3692d20d2b Refactor tracking of constant initializers for variables.
Instead of framing the interface around whether the variable is an ICE
(which is only interesting in C++98), primarily track whether the
initializer is a constant initializer (which is interesting in all C++
language modes).

No functionality change intended.
2020-10-19 21:31:19 -07:00
Richard Smith 76c0092665 Ensure that checkInitIsICE is called exactly once for every variable
for which it matters.

This is a step towards separating checking for a constant initializer
(in which std::is_constant_evaluated returns true) and any other
evaluation of a variable initializer (in which it returns false).
2020-10-19 19:04:04 -07:00
Douglas Yung 774ab60125 Add option to use older clang ABI behavior when passing certain union types as function arguments
Recently, commit D78699 (commit 26cfb6e562) fixed clang's behavior with respect
to passing a union type through a register to correctly follow the ABI. However,
this is an ABI breaking change with earlier versions of the clang compiler, so we
should add an -fclang-abi-compat option to address this. Additionally, the PS4 ABI
requires the older behavior, so that is added as well.

This change adds a Ver11 value to the ClangABI enum that when it is set (or the
target is the PS4 triple), we skip the ABI fix introduced in D78699.

Differential Revision: https://reviews.llvm.org/D89747
2020-10-19 18:17:34 -07:00
Simon Pilgrim 7fe7d9b130 Fix MSVC "not all control paths return a value" warning. NFCI. 2020-10-19 11:48:31 +01:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Mark de Wever 389c8d5b20 [NFC] Make non-modifying members const.
Implementing the likelihood attributes for the iteration statements adds
a new helper function. This function can't be const qualified since
these non-modifying members aren't const qualified.
2020-10-18 18:50:21 +02:00
Mark de Wever 2bcda6bb28 [Sema, CodeGen] Implement [[likely]] and [[unlikely]] in SwitchStmt
This implements the likelihood attribute for the switch statement. Based on the
discussion in D85091 and D86559 it only handles the attribute when placed on
the case labels or the default labels.

It also marks the likelihood attribute as feature complete. There are more QoI
patches in the pipeline.
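
A minimal sketch of the label placement this handles (names are illustrative):

```
// The attribute on a case or default label is lowered to branch weights for
// the switch.
int classify(int c) {
  switch (c) {
  [[likely]] case 0:
    return 1;
  [[unlikely]] default:
    return 0;
  }
}
```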

Differential Revision: https://reviews.llvm.org/D89210
2020-10-18 13:48:42 +02:00
Richard Smith d4aac67859 Make the check for whether we should memset(0) an aggregate
initialization a little smarter.

Look through casts that preserve zero-ness when determining if an
initializer is zero, so that we can handle cases like an {0} initializer
whose corresponding field is a type other than 'int'.
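
A minimal sketch of such an initializer (the struct layout is illustrative):

```
// The first field is not 'int', but the {0} initializer is still recognized
// as all-zero (through zero-preserving conversions), so the whole aggregate
// can be lowered to a memset of 0.
struct Mixed { float f; double d; void *p; long arr[8]; };
Mixed m = {0};
```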
2020-10-16 16:48:22 -07:00
Richard Smith 48c70c1664 Extend memset-to-zero optimization to C++11 aggregate functional casts
Aggr{...}.

We previously missed these cases due to not stepping over the additional
AST nodes representing their syntactic form.
2020-10-16 13:21:08 -07:00
Matt Arsenault 0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Caroline Concatto e8d9ee9c7c [SVE][CodeGen]Use getFixedSize() function for TypeSize comparison in clang
This patch makes sure that the instance of TypeSize comparison operator
is done with a fixed type size.

Differential Revision: https://reviews.llvm.org/D89312
2020-10-16 10:56:39 +01:00
Vedant Kumar 273c299d5d [PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting
This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
the splitting pass to be toggled on/off. The current method of passing
`-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
correctly (say, with `-O0` or `-Oz`).

To implement the -fsplit-cold-code option, an attribute is applied to
functions to indicate that they may be considered for splitting. This
removes some complexity from the old/new PM pipeline builders, and
behaves as expected when LTO is enabled.

Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
Differential Revision: https://reviews.llvm.org/D57265
Reviewed By: Aditya Kumar, Vedant Kumar
Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar
2020-10-15 23:13:33 +00:00
Fangrui Song 5a338599fb [CGBuiltin] Respect asm labels and redefine_extname for builtins with specialized emitting
rL131311 added `asm()` support for builtin functions, but `asm()` for builtins with
specialized emitting (e.g. memcpy, various math functions) still does not work.

This patch makes these functions work for `asm()` and `#pragma redefine_extname`.
glibc uses `asm()` to redirect internal libc function calls to hidden aliases.
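A glibc-style sketch of the redirection that now also works for builtins with specialized emission (declaration and function name are illustrative):
```
#include <stddef.h>

extern "C" void *memcpy(void *dst, const void *src, size_t n) asm("__GI_memcpy");

void copy16(char *d, const char *s) {
  memcpy(d, s, 16);  // a lowered call (when not expanded inline) now targets __GI_memcpy
}
```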

Limitation: such a function is a builtin in clang, but will not be recognized as
a libcall in optimization passes because Clang does not annotate the renamed
function as a libcall.  In GCC -O1 or above, `abs` can be optimized out but we can't.
Additionally, we cannot redirect `__builtin_sin` to `real_sin` in the following example:

  double sin(double x) asm("real_sin");
  double f(double d) { return __builtin_sin(d); }

---

According to @rsmith, the following three statements cannot be simultaneously true:

(1) The frontend function foo has known, builtin semantics X.
(2) The symbol foo has known, builtin semantics X.
(3) It's not correct to lower a call to the frontend function foo to the symbol foo.

People do want (1) (if it is profitable to expand a memcpy, do it).
This also means that people do not want to add -fno-builtin-memcpy.
People do want (3): that is why they use asm("__GI_memcpy") in the first place.

So unfortunately we make a compromise by not refuting (2) (see the limitation above).
For most libcalls, there is a small loss because compilers don't synthesize them.
For the few that glibc cares about, it uses `asm("memcpy = __GI_memcpy");` to make
the assembly level redirection.
(Changing function names (e.g. `__memcpy`) is a hit to ergonomics which is not acceptable).

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D88712
2020-10-15 15:14:38 -07:00
Reid Kleckner 5fbab4025e [MS] Apply `inreg` to AArch64 sret parms on instance methods
The documentation rules indicate that instance methods should return
large, trivially copyable aggregates via X1/X0 and not X8 as is normally
done when returning such structs from free functions:
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#return-values
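A sketch of the shapes affected (types are hypothetical):
```
struct Big { int a[16]; };  // large, trivially copyable: returned via an sret pointer
struct Widget {
  Big get();                // instance method: sret pointer now marked inreg (X1/X0)
};
Big make();                 // free function: sret pointer stays in X8
```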

Fixes PR47836, a bug in the initial implementation of these rules.

I tried to simplify the logic a bit as well while I'm here.

Differential Revision: https://reviews.llvm.org/D89362
2020-10-15 14:54:42 -07:00
Yaxun (Sam) Liu e384e94fbe Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
This reverts commit 187658b8a6 due to
AMDGPU backend issues.
2020-10-15 17:25:55 -04:00
Leonard Chan 79829a4704 Revert "[clang] Add -fc++-abi= flag for specifying which C++ ABI to use"
This reverts commits 683b308c07 and
8487bfd4e9.

We will go for a more restricted approach that does not give freedom to
everyone to change ABIs on whichever platform.

See the discussion on https://reviews.llvm.org/D85802.
2020-10-15 14:24:38 -07:00
Thomas Lively 1992e30c2d [WebAssembly] Prototype i8x16.popcnt
As proposed at https://github.com/WebAssembly/simd/pull/379. Use a target
builtin and intrinsic rather than normal codegen patterns to make the
instruction opt-in until it is merged to the proposal and stabilized in engines.

Differential Revision: https://reviews.llvm.org/D89446
2020-10-15 21:18:22 +00:00
Stanislav Mekhanoshin d1beb95d12 [AMDGPU] gfx1032 target
Differential Revision: https://reviews.llvm.org/D89487
2020-10-15 12:41:18 -07:00
Thomas Lively 3f738d1f5e Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c8385a352 with a typing fix
to an instruction selection pattern.
2020-10-15 19:32:34 +00:00
Thomas Lively 7c8385a352 Revert "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c6bfd90ab.
2020-10-15 15:49:36 +00:00
Thomas Lively 7c6bfd90ab [WebAssembly] v128.load{8,16,32,64}_lane instructions
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.

Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.

Differential Revision: https://reviews.llvm.org/D89366
2020-10-15 15:33:10 +00:00
Caroline Concatto 145e44bb18 [SVE]Fix implicit TypeSize casts in EmitCheckValue
Using TypeSize::getFixedSize() instead of relying upon the implicit
TypeSize -> uint64_t cast, as the type is always fixed width.

Differential Revision: https://reviews.llvm.org/D89313
2020-10-15 13:25:46 +01:00
Simon Pilgrim d7fa9030d4 [CodeGen][X86] Emit fshl/fshr ir intrinsics for shiftleft128/shiftright128 ms intrinsics
Now that funnel shift handling is pretty good, we can use the intrinsics directly and avoid a lot of zext/trunc issues.
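For example, with the MS intrinsic (assuming an MSVC-compatible x64 target; wrapper name is illustrative):
```
#include <intrin.h>

unsigned __int64 shl128_hi(unsigned __int64 lo, unsigned __int64 hi, unsigned char amt) {
  // Now lowered through the fshl/fshr IR intrinsics instead of zext/shift/trunc on i128.
  return __shiftleft128(lo, hi, amt);
}
```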

https://godbolt.org/z/YqhnnM

Differential Revision: https://reviews.llvm.org/D89405
2020-10-15 10:22:41 +01:00
Duncan P. N. Exon Smith dde4e0318c clang/CodeGen: Stop using SourceManager::getBuffer, NFC
Update `clang/lib/CodeGen` to use a `MemoryBufferRef` from
`getBufferOrNone` instead of `MemoryBuffer*` from `getBuffer`. No
functionality change here.

Differential Revision: https://reviews.llvm.org/D89411
2020-10-14 23:32:43 -04:00
Leonard Chan 683b308c07 [clang] Add -fc++-abi= flag for specifying which C++ ABI to use
This implements the flag proposed in RFC http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.

The goal is to add a way to override the default target C++ ABI through
a compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.

In this patch:
- Store `-fc++-abi=` in a LangOpt. This isn't stored in a
  CodeGenOpt because there are instances outside of codegen where Clang
  needs to know what the ABI is (particularly through
  ASTContext::createCXXABI), and we should be able to override the
  target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
  through this flag.
  - Create a .def file for these ABIs to make it easier to check flag
    values.
  - Add an error for diagnosing bad ABI flag values.

Differential Revision: https://reviews.llvm.org/D85802
2020-10-14 12:31:21 -07:00
Jonas Paulsson 625fa47617 Revert "[clang] Improve handling of physical registers in inline assembly operands."
This reverts commit c78da03778.

Temporarily reverted due to https://bugs.llvm.org/show_bug.cgi?id=47837.
2020-10-14 08:42:51 +02:00
Jonas Paulsson c78da03778 [clang] Improve handling of physical registers in inline assembly operands.
Change EmitAsmStmt() to

- Not tie physregs with the "+r" constraint, but instead add the hard
  register as an input constraint. This makes "+r" and "=r":"r" look the same
  in the output.

  Background: Macro intensive user code may contain inline assembly
  statements with multiple operands constrained to the same physreg. Such a
  case (with the operand constraints "+r" : "r") currently triggers the
  TwoAddressInstructionPass assertion against any extra use of a tied
  register. Furthermore, TwoAddress will insert a COPY to that physreg even
  though isel has already done so (for the non-tied use), which may lead to a
  second redundant instruction currently. A simple fix for this is to not
  emit tied physreg uses in the first place for the "+r" constraint, which is
  what this patch does.

- Give an error on multiple outputs to the same physical register.

  This should be reported and this is also what GCC does.

Review: Ulrich Weigand, Aaron Ballman, Jennifer Yu, Craig Topper

Differential Revision: https://reviews.llvm.org/D87279
2020-10-13 15:09:52 +02:00
Bevin Hansson 101309fe04 [AST] Change return type of getTypeInfoInChars to a proper struct instead of std::pair.
Followup to D85191.

This changes getTypeInfoInChars to return a TypeInfoChars
struct instead of a std::pair of CharUnits. This lets the
interface match getTypeInfo more closely.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86447
2020-10-13 13:26:56 +02:00
Bevin Hansson 9fa7f48459 [Fixed Point] Add fixed-point to floating point cast types and consteval.
Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D86631
2020-10-13 13:26:56 +02:00
Ties Stuij 208987844f [ARM] Follow AACPS standard for volatile bit-fields access width
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declarative type. In such a case:
```
struct S1 {
  short a : 1;
}
```
should be accessed using loads and stores of that width
(sizeof(short)), whereas currently the compiler only loads
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
  short a;
  int  b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.

The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```

I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
 - it won't overlap with any other non-bit-field member,
 - we only access memory inside the bounds of the record, and
 - it won't overlap with any zero-length bit-field.

Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D72932
2020-10-13 10:31:48 +01:00
Simon Pilgrim 6c23cbc560 [X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.
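For example (assuming AVX-512 is enabled; wrapper name is illustrative):
```
#include <immintrin.h>

int reduce_add(__m512i v) {
  // Now emits the generic integer add-reduction intrinsic instead of a shuffle+add tree.
  return _mm512_reduce_add_epi32(v);
}
```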

The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.

Differential Revision: https://reviews.llvm.org/D87604
2020-10-13 09:28:39 +01:00
Richard Smith 913f600566 Canonicalize declaration pointers when forming APValues.
References to different declarations of the same entity aren't different
values, so shouldn't have different representations.
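A hypothetical illustration of the invariant:
```
extern int g;                  // first declaration
constexpr const int *p = &g;
extern int g;                  // redeclaration of the same entity
constexpr const int *q = &g;
static_assert(p == q, "one entity, one canonical value");
```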

Recommit of e6393ee813, most recently
reverted in 9a33f027ac due to a bug caused
by ObjCInterfaceDecls not propagating availability attributes along
their redeclaration chains; that bug was fixed in
e2d4174e9c.
2020-10-12 19:32:57 -07:00
Arthur Eubanks 9a33f027ac Revert "Canonicalize declaration pointers when forming APValues."
This reverts commit 9dcd96f728.

See https://crbug.com/1134762.
2020-10-12 12:37:24 -07:00
Tim Renouf 666ef0db20 [AMDGPU] Add gfx602, gfx705, gfx805 targets
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:

* The "Oland" and "Hainan" variants of gfx601 are now split out into
  gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
  that to avoid using the shaderZExport workaround on gfx602.

* One variant of gfx703 is now split out into gfx705. LLPC and other
  front-ends could use that to avoid using the
  shaderSpiCsRegAllocFragmentation workaround on gfx705.

* The "TongaPro" variant of gfx802 is now split out into gfx805.
  TongaPro has a faster 64-bit shift than its former friends in gfx802,
  and a subtarget feature could be set up for that to take advantage of
  it. This commit does not make that change; it just adds the target.

V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
    so fix the GPUKind order.

Differential Revision: https://reviews.llvm.org/D88916

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-10-10 17:22:22 +01:00
Thomas Lively d8f58bf53a [WebAssembly] Prototype i16x8.q15mulr_sat_s
This saturating, rounding, Q-format multiplication instruction is proposed in
https://github.com/WebAssembly/simd/pull/365.

Differential Revision: https://reviews.llvm.org/D88968
2020-10-09 21:17:53 +00:00
Liu, Chen3 26cfb6e562 [X86] Passing union type through register
For example:

  union M256 {
    double d;
    __m256 m;
  };
  extern void foo1(union M256 A);
  union M256 m1;
  void test() {
    foo1(m1);
  }

clang will pass m1 through the stack, which does not follow the ABI.

Differential Revision: https://reviews.llvm.org/D78699
2020-10-09 11:24:29 +08:00
Alexandre Ganea 66face6aa0 Re-land [DebugInfo] Add debug location to stubs generated by CGDeclCXX and mark them as artificial
Previously, when clang was compiled with -DLLVM_ENABLE_ASSERTIONS=ON, the added tests were displaying:

inlinable function call in a function with debug info must have a !dbg location
  call void @"??1?$c@UB@@@@QEAA@XZ"(%struct.c* @"?f@?1??d@@YAPEAU?$c@UB@@@@XZ@4U2@A")
fatal error: error in backend: Broken module found, compilation aborted!
Stack dump:
0.      Program arguments: <f:\svn\buildninja\bin\clang -cc1 -emit-llvm debug-info-no-location.cpp> -gcodeview -debug-info-kind=limited
1.      <eof> parser at end of file
2.      Per-function optimization

Fixes PR43012

Differential Revision: https://reviews.llvm.org/D66328
2020-10-08 20:49:17 -04:00
Arthur Eubanks afff74e5c2 [HWAsan][NewPM] Handle hwasan like other sanitizers
Move it as an EP callback (-O[123]) or in addSanitizersAtO0.

This makes it not run in ThinLTO pre-link (like the other sanitizers),
so don't check LTO runs in hwasan-new-pm.c. Changing its position also
seems to change the generated IR. I think we just need to make sure the
pass runs.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D88936
2020-10-08 14:43:21 -07:00
Joseph Huber 3cc1f1fc1d [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #LLVM #clang

Differential Revision: https://reviews.llvm.org/D88430
2020-10-08 14:00:22 -04:00
diggerlin 92bca12843 [AIX] add new option -mignore-xcoff-visibility
SUMMARY:

In the IBM compiler xlclang, there is an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.

We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default on AIX). With this option LLVM does not emit any visibility attribute to the assembly or XCOFF object file.

The option only works on AIX; on other operating systems, using the option reports an unsupported option error.

In AIX OS:

1.1 The option -mignore-xcoff-visibility is enabled by default if neither -fvisibility=* nor -mignore-xcoff-visibility is given explicitly on the clang command line.

1.2 If -fvisibility=* is given explicitly but -mignore-xcoff-visibility is not, visibility attributes are generated.

1.3 If both -fvisibility=* and -mignore-xcoff-visibility are given explicitly, "-mignore-xcoff-visibility" wins and no visibility attributes are emitted.

The option -mignore-xcoff-visibility has no effect on visibility attributes when compiling with the -emit-llvm option to generate LLVM IR.

Reviewer: daltenty,Jason Liu

Differential Revision: https://reviews.llvm.org/D87451
2020-10-08 09:34:58 -04:00
Pushpinder Singh 3a12ff0dac [OpenMP][RTL] Remove dead code
RequiresDataSharing was always 0, resulting in dead code in the device runtime library.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D88829
2020-10-06 05:43:47 -04:00
Fangrui Song a2cc883368 [CUDA] Don't call __cudaRegisterVariable on C++17 inline variables
D17779: host-side shadow variables of external declarations of device-side
global variables have internal linkage and are referenced by
`__cuda_register_globals`.

nvcc from CUDA 11 does not allow `__device__ inline` or `__device__ constexpr`
(C++17 inline variables) but clang has incorrectly supported them for a while:

```
error: A __device__ variable cannot be marked constexpr
error: An inline __device__/__constant__/__managed__ variable must have internal linkage when the program is compiled in whole program mode (-rdc=false)
```

If such a variable (which has a comdat group) is discarded (a copy from another
translation unit is prevailing and selected), accessing the variable from
outside the section group (`__cuda_register_globals`) is a violation of the ELF
specification and will be rejected by linkers:

> A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed.

As a workaround, don't register such inline variables for now.
(If we register the variables in all TUs, we will keep multiple instances of the shadow and break the C++ semantics for inline variables).
We should reject such variables in Sema but our internal users need some time to migrate.
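A sketch of the kind of variable that is no longer registered (names are hypothetical):
```
// C++17 inline __device__ variable: it has a comdat and may be discarded at link
// time, so it is skipped in __cuda_register_globals for now.
__device__ inline int counter = 0;
```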

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D88786
2020-10-05 12:53:59 -07:00
Craig Topper a02b449bb1 [X86] Sync AESENC/DEC Key Locker builtins with gcc.
For the wide builtins, pass a single input and output pointer to
the builtins. Emit the GEPs and input loads from CGBuiltin.
2020-10-04 12:09:41 -07:00
Craig Topper 230c57b0bd [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.

The cast from void* to __m128i* caused the IR generation to assume
the pointer was aligned.

Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
2020-10-04 12:09:35 -07:00
Mark de Wever 1113fbf44c [CodeGen] Improve likelihood branch weights
Bruno De Fraine discovered some issues with D85091. The branch weights
generated for `logical not` and `ternary conditional` were wrong. The
`logical and` and `logical or` differed from the code generated of
`__builtin_predict`.

Adjusted the generated code for the likelihood to match
`__builtin_predict`. The patch is based on Bruno's suggestions.

Differential Revision: https://reviews.llvm.org/D88363
2020-10-04 14:24:27 +02:00
Yaxun (Sam) Liu cbd420c5ed [CUDA][HIP] Fix bound arch for offload action for fat binary
Currently CUDA/HIP toolchain uses "unknown" as bound arch
for offload action for fat binary. This causes -mcpu or -march
with "unknown" added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.

This causes issue for https://reviews.llvm.org/D88377 since
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.

The bound arch of offload action for fat binary is not really
used, therefore set it to CudaArch::UNUSED.

Differential Revision: https://reviews.llvm.org/D88524
2020-10-02 19:05:51 -04:00
Richard Smith 8fb2a235b0 Don't reject calls to MinGW's unusual _setjmp declaration.
We now recognize this function as a builtin despite it having an
unexpected number of parameters; make sure we don't enforce that it has
only 1 argument for its 2 parameters.
2020-10-02 15:12:15 -07:00
Yaxun (Sam) Liu dc6a0b0ec7 [HIP] Align device binary
To facilitate faster loading of device binaries and share them among processes,
HIP runtime favors their alignment being 4096 bytes. HIP runtime can load
unaligned device binaries; however, aligning them at 4096 bytes results in
faster loading and less shared memory usage.

This patch adds an option -bundle-align to clang-offload-bundler which allows
bundles to be aligned at specified alignment. By default it is 1, which is NFC
compared to existing format.

This patch then aligns embedded fat binary and device binary inside fat binary
at 4096 bytes.

It has been verified this change does not cause significant overall file size increase
for typical HIP applications (less than 1%).

Differential Revision: https://reviews.llvm.org/D88734
2020-10-02 18:10:44 -04:00
Nathan Lanza 14f6bfcb52 [clang] Implement objc_non_runtime_protocol to remove protocol metadata
Summary:
Motivated by the new objc_direct attribute, this change adds a new
attribute that removes metadata from Protocols that the programmer knows
isn't going to be used at runtime. We simply have the frontend skip
generating any protocol metadata entries (e.g. OBJC_CLASS_NAME,
_OBJC_$_PROTOCOL_INSTANCE_METHDOS, _OBJC_PROTOCOL, etc) for a protocol
marked with `__attribute__((objc_non_runtime_protocol))`.

There are a few APIs used to retrieve a protocol at runtime.
`@protocol(SomeProtocol)` will now error out if the requested protocol
is marked with the attribute. `objc_getProtocol` will return `NULL`, which
is consistent with the behavior of a non-existing protocol.

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75574
2020-10-02 17:35:50 -04:00
Michael Liao 8c36eaf037 [clang][opencl][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr.
- `-cl-fp32-correctly-rounded-divide-sqrt` is already handled in a
  per-instruction manner by annotating the accuracy required. There's no
  need to add that fn-attr. So far, there's no in-tree backend handling
  that attr and that OpenCL specific option.
- In case that out-of-tree backends are broken, this change could be
  reverted if those backends could not be fixed.

Differential Revision: https://reviews.llvm.org/D88424
2020-10-01 11:07:39 -04:00
Arthur Eubanks ce5379f0f0 [NPM] Add target specific hook to add passes for New Pass Manager
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D88138
2020-09-30 13:29:43 -07:00
Joseph Huber 1b60f63e4f Revert "[OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def"
Failing tests on Arm due to the tests automatically populating
incompatible pointer width architectures. Reverting until the tests are
updated. Failing tests:

OpenMP/distribute_parallel_for_num_threads_codegen.cpp
OpenMP/distribute_parallel_for_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_if_codegen.cpp
OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp
OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_if_codegen.cpp
OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp

This reverts commit 90eaedda9b.
2020-09-30 15:12:21 -04:00
Joseph Huber 90eaedda9b [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to consolidate
more OpenMP code generation into the OMPIRBuilder. This patch also
invalidates specifying target architectures with conflicting pointer
sizes.

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #Clang #LLVM

Differential Revision: https://reviews.llvm.org/D88430
2020-09-30 14:00:01 -04:00
Xiangling Liao 3a7487f903 [FE] Use preferred alignment instead of ABI alignment for complete object when applicable
On some targets, preferred alignment is larger than ABI alignment in some cases. For example,
on AIX we have special power alignment rules which would cause that. Previously, to support
those cases, we added a “PreferredAlignment” field to the `RecordLayout` to store the AIX
special alignment values, as the community suggested.

However, that patch alone is not enough. There are places in Clang where `PreferredAlignment`
should have been used instead of ABI-specified alignment. This patch is aimed at fixing those
spots.

Differential Revision: https://reviews.llvm.org/D86790
2020-09-30 10:48:28 -04:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Amy Huang 5c4fc581d5 [DebugInfo] Add types from constructor homing to the retained types list.
Add class types to the retained types list to make sure they
don't get dropped if the constructor is optimized out later.

Differential Revision: https://reviews.llvm.org/D88522
2020-09-29 17:00:45 -07:00
John McCall 984744a131 Fix a variety of minor issues with ObjC method mangling:
- Fix a memory leak accidentally introduced yesterday by using CodeGen's
  existing mangling context instead of creating a new context afresh.

- Move GNU-runtime ObjC method mangling into the AST mangler; this will
  eventually be necessary to support direct methods there, but is also
  just the right architecture.

- Make the Apple-runtime method mangling work properly when given an
  interface declaration, fixing a bug (which had solidified into a test)
  where mangling a category method from the interface could cause it to
  be mangled as if the category name was a class name.  (Category names
  are namespaced within their class and have no global meaning.)

- Fix a code cross-reference in dsymutil.

Based on a patch by Ellis Hoag.
2020-09-29 19:51:53 -04:00
Fangrui Song 3681be876f Add -fprofile-update={atomic,prefer-atomic,single}
GCC 7 introduced -fprofile-update={atomic,prefer-atomic} (prefer-atomic is for
best efforts (some targets do not support atomics)) to increment counters
atomically, which is exactly what we have done with -fprofile-instr-generate
(D50867) and -fprofile-arcs (b5ef137c11).
This patch adds the option to clang to surface the internal options at driver level.

GCC 7 also turned on -fprofile-update=prefer-atomic when -pthread is specified,
but it has performance regression
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307). So we don't follow suit.

Differential Revision: https://reviews.llvm.org/D87737
2020-09-29 10:43:23 -07:00
Tres Popp eb9f7c28e5 Revert "OpaquePtr: Add type to sret attribute"
This reverts commit 55c4ff91bd.

Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
2020-09-29 10:31:04 +02:00
Ellis Hoag 98ef7e29b0 This reduces code duplication between CGObjCMac.cpp and Mangle.cpp
for generating the mangled name of an Objective-C method.

This has no intended functionality change.

https://reviews.llvm.org/D88329
2020-09-29 02:26:51 -04:00
Yaxun (Sam) Liu 187658b8a6 Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Recommit 04abbb3a78
2020-09-28 22:43:17 -04:00
Craig Topper 288c5776c9 [X86] Use inlineasm flag output for the _bittest* intrinsics.
Instead of explicitly emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.

If the flag can't be used directly, the backend will emit a setcc.

Differential Revision: https://reviews.llvm.org/D87888
2020-09-28 13:33:22 -07:00
Vedant Kumar 06bc685fa2 [ubsan] nullability-arg: Fix crash on C++ member pointers
Extend -fsanitize=nullability-arg to handle call sites which accept C++
member pointers.

rdar://62476022

Differential Revision: https://reviews.llvm.org/D88336
2020-09-28 09:41:18 -07:00
Michael Liao 5dbf80cad9 [clang][codegen] Annotate `correctly-rounded-divide-sqrt-fp-math` fn-attr for OpenCL only.
- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option
  and `correctly-rounded-divide-sqrt-fp-math` should be added for OpenCL
  at most.

Differential revision: https://reviews.llvm.org/D88303
2020-09-28 11:40:32 -04:00
David Sherwood bafdd11326 [SVE] Replace / operator in TypeSize/ElementCount with divideCoefficientBy
After some recent upstream discussion we decided that it was best
to avoid having the / operator for both ElementCount and TypeSize,
since this could give the impression that these classes can be used
in the same way as basic integer types. However, division
for scalable types is a bit odd because we are only dividing the
minimum quantity by a value, as opposed to something like:

  (MinSize * Vscale) / SomeValue

This is why when performing division it's important the caller
first establishes whether the operation makes sense, perhaps by
calling isKnownMultipleOf() prior to division. The caller must now
explictly call divideCoefficientBy() on the class to perform the
operation.

Differential Revision: https://reviews.llvm.org/D87700
2020-09-28 08:03:00 +01:00
Richard Smith 9dcd96f728 Canonicalize declaration pointers when forming APValues.
References to different declarations of the same entity aren't different
values, so shouldn't have different representations.

Recommit of e6393ee813 with fixed handling
for weak declarations. We now look for attributes on the most recent
declaration when determining whether a declaration is weak. (Second
recommit with further fixes for mishandling of weak declarations. Our
behavior here is fundamentally unsound -- see PR47663 -- but this
approach attempts to not make things worse.)
2020-09-27 19:05:26 -07:00
Shilei Tian ebb1092a28 [Clang][OpenMP] Added support for nowait target in CodeGen via regular task
Previously, for a nowait target, CG emitted a function call to `__tgt_target_nowait`, etc. However, in the OpenMP RTL, these functions just directly call the non-nowait version, which means nowait is not working as expected.

The OpenMP specification says a target is actually a target task, which is an untied and detachable task. It is natural to go in the direction of generating a task for a nowait target. However, an OpenMP task has the problem that it must be within a parallel region; otherwise the task is executed immediately. As a result, if we directly wrap it in a regular task, a `target nowait` outside of a parallel region is still synchronous.

In D77609, I added support for unshackled tasks in the OpenMP RTL. Basically, an unshackled task is a task that is not bound to any parallel region, so all nowait targets will be transformed into unshackled tasks. In order to distinguish them from regular tasks, a new flag bit is set for unshackled tasks. This flag will be used by the RTL for later processing.

All target tasks are allocated via `__kmpc_omp_target_task_alloc`, and in the current `libomptarget`, `__kmpc_omp_target_task_alloc` just calls `__kmpc_omp_task_alloc`. Therefore, we can modify the flag in `__kmpc_omp_target_task_alloc` so that we don't need to modify the FE too much. If users choose to opt out of the feature, they just need to use an RTL without support for unshackled threads.

As a result, in this patch, the `target nowait` region is simply wrapped into a regular task. Later once we have RTL support for unshackled tasks, the wrapped tasks can be executed by unshackled threads w/o changes in the FE.
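A sketch of a source pattern affected (hypothetical code):
```
void add_one(int *a, int n) {
  #pragma omp target map(tofrom: a[0:n]) nowait
  for (int i = 0; i < n; ++i)
    a[i] += 1;                 // now wrapped in a (specially flagged) regular task
  #pragma omp taskwait         // synchronizes with the generated task
}
```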

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78075
2020-09-25 22:10:36 -04:00
Matt Arsenault 55c4ff91bd OpaquePtr: Add type to sret attribute
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
2020-09-25 14:07:30 -04:00
Chris Bowler f330d9f163 [PPC] [AIX] Implement calling convention IR for C99 complex types on AIX
Add AIX calling convention logic to Clang for C99 complex types on AIX

Differential Revision: https://reviews.llvm.org/D88130
2020-09-25 07:43:31 -04:00
Momchil Velikov a88c722e68 [AArch64] PAC/BTI code generation for LLVM generated functions
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions, generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.

This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:

* "sign-return-address", with non-zero value means generate code to
  sign return addresses (PAC-RET), zero value means disable PAC-RET.

* "sign-return-address-all", with non-zero value means enable PAC-RET
  for all functions, zero value means enable PAC-RET only for
  functions, which spill LR.

* "sign-return-address-with-bkey", with non-zero value means use B-key
  for signing, zero value means use A-key.

This set of attributes are always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value 0) in order to be able to check for conflicts when combining
module attributes during LTO.

Module-level attributes are overridden by function level attributes.
All the decision making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo, there shouldn't be
any places left, other than AArch64FunctionInfo, which directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will-be handled by a separate patch.

Differential Revision: https://reviews.llvm.org/D85649
2020-09-25 11:47:14 +01:00
Ian Levesque 6f7fbdd285 [xray] Function coverage groups
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.

Differential Revision: https://reviews.llvm.org/D87953
2020-09-24 22:09:53 -04:00
Reid Kleckner ecfc9b9712 [MS] For unknown ISAs, pass non-trivially copyable arguments indirectly
Passing them directly is likely to be non-conforming, since it usually
involves copying the bytes of the record. For unknown architectures, we
don't know what MSVC does or will do, but we should at least try to
conform as well as we can.
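For example, a sketch of a record affected (hypothetical type):
```
struct NonTrivial {
  NonTrivial(const NonTrivial &);  // user-provided copy constructor
  int x;
};
void callee(NonTrivial nt);        // on unknown ISAs, now passed indirectly (by address)
```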
2020-09-24 16:29:48 -07:00
Reid Kleckner b8a50e9207 [MS] Simplify rules for passing C++ records
Regardless of the target architecture, we should always use the C rules
(RAA_Default) for records that "canBePassedInRegisters". Those are
trivially copyable things, and things marked with [[trivial_abi]].

This should be NFC, although it changes where the final decision about
x86_32 overaligned records is made. The current x86_32 C rules say that
overaligned things are passed indirectly, so there is no functional
difference.
2020-09-24 16:29:47 -07:00
Amy Huang c8df781e54 [DebugInfo] Fix bug in constructor homing with classes with trivial
constructors.

This changes the code to avoid using constructor homing for aggregate
classes and classes with trivial default constructors, instead of trying
to loop through the constructors.

Differential Revision: https://reviews.llvm.org/D87808
2020-09-24 14:43:48 -07:00
Erich Keane f8a92adfa2 Remove dead branch identified by @rsmith on post-commit for D88236 2020-09-24 13:05:15 -07:00
Erich Keane 606a734755 [PR47636] Fix tryEmitPrivate to handle non-constantarraytypes
As mentioned in the bug report, tryEmitPrivate chokes on the
MaterializeTemporaryExpr in the reproducers, since it assumes that if
there are elements, than it must be a ConstantArrayType. However, the
MaterializeTemporaryExpr (which matches exactly the AST when it is NOT a
global/static) has an incomplete array type.

This changes the section where the number-of-elements is non-zero to
properly handle non-CAT types by just extracting it as an array type
(since all we needed was the element type out of it).
2020-09-24 12:09:22 -07:00
Amy Kwan 6b136b19cb [Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins.
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.

These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.

Differential Revision: https://reviews.llvm.org/D83500
2020-09-23 22:55:25 -05:00
Craig Topper d9717d8ee7 [X86] Add a memory clobber to the bittest intrinsic inline asm. Get default clobbers from the target
I believe the inline asm emitted here should have a memory clobber since it writes to memory.

It was also missing the dirflag clobber that we use by default along with flags and fpsr. To avoid missing defaults in the future, get the default list from the target
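For reference, a call of the kind affected (assuming an MSVC-compatible target; wrapper name is illustrative):
```
#include <intrin.h>

unsigned char set_bit(long *bits, long idx) {
  // The emitted inline asm now lists "memory" plus the target's default clobbers
  // (dirflag/fpsr/flags on x86), since the instruction writes to memory.
  return _bittestandset(bits, idx);
}
```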

Differential Revision: https://reviews.llvm.org/D88121
2020-09-23 14:54:39 -07:00
Amy Kwan 2e7117f847 [PowerPC] Implement the 128-bit vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins in Clang/LLVM
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.

Differential Revision: https://reviews.llvm.org/D87910
2020-09-23 16:49:40 -04:00
Stanislav Mekhanoshin 59691dc874 [AMDGPU] Make ds fp atomics overloadable
Differential Revision: https://reviews.llvm.org/D87947
2020-09-23 11:39:50 -07:00
Sriraman Tallam 7d0bbe4090 Re-apply https://reviews.llvm.org/D87921, was reverted to triage a PPC bot failure.
D87921 was reverted in commit b89059a313
as it was causing an unknown llvm PPC bot failure.  Reapplying the patch
after confirming that this is not responsible. Build bot failure:
https://reviews.llvm.org/D87921#2286644  which caused the revert.

The wrong placement of add pass with optimizations led to
-funique-internal-linkage-names being disabled.

Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make it
work correctly with -O2 and new pass manager. Updated the tests to explicitly
check O0 and O1.

Differential Revision: https://reviews.llvm.org/D87921
2020-09-23 10:28:40 -07:00
Yaxun (Sam) Liu 301e23305d [CUDA][HIP] Fix static device var used by host code only
A static device variable may be accessed in host code through
cudaMemcpyFromSymbol etc. Currently clang does not
emit the static device variable if it is only referenced by
host code, which causes host code to fail at run time.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D88115
2020-09-23 08:18:19 -04:00
Mircea Trofin cf112382dd [ThinLTO] Option to bypass function importing.
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.

Differential Revision: https://reviews.llvm.org/D87949
2020-09-22 13:12:11 -07:00
Sriraman Tallam b89059a313 Revert "The wrong placement of add pass with optimizations led to -funique-internal-linkage-names being disabled."
This reverts commit 6950db36d3.
2020-09-22 12:32:43 -07:00
Zequan Wu 9caa3fbe03 [Coverage] Add empty line regions to SkippedRegions
Differential Revision: https://reviews.llvm.org/D84988
2020-09-21 12:42:53 -07:00
Reid Kleckner 3b3a165485 [MS] On x86_32, pass overaligned, non-copyable arguments indirectly
This updates the C++ ABI argument classification code to use the logic
from D72114, fixing an ABI incompatibility with MSVC.

Part of PR44395.

Differential Revision: https://reviews.llvm.org/D87923
2020-09-21 11:49:17 -07:00
Sriraman Tallam 6950db36d3 The wrong placement of add pass with optimizations led to -funique-internal-linkage-names being disabled.
Fixed the placement of the MPM.addpass for UniqueInternalLinkageNames to make
it work correctly with -O2 and new pass manager. Updated the tests to
explicitly check O0 and O2.

Previously, the addPass was placed before BackendUtil.cpp#L1373, which is wrong
because MPM gets assigned at that point, and any additions to the pass vector before
then are wrong. This change just moves it after MPM is assigned and places it at
a point where O0 and O0+ can share it.

Differential Revision: https://reviews.llvm.org/D87921
2020-09-21 10:00:12 -07:00
Alexey Bataev d5ce8233bf [OpenMP 5.0] Fix user-defined mapper privatization in tasks
This patch fixes the problem that the user-defined mapper array is not correctly privatized inside a task. This problem causes openmp/libomptarget/test/offloading/target_depend_nowait.cpp to fail.

Differential Revision: https://reviews.llvm.org/D84470
2020-09-17 11:21:10 -04:00
Michael Liao 4d4f092283 [clang][codegen] Skip adding default function attributes on intrinsics.
- After loading builtin bitcode for linking, skip adding default
  function attributes on LLVM intrinsics as their attributes are
  well-defined and retrieved directly from internal definitions. Adding
  extra attributes on intrinsics results in inconsistent results when
  `-save-temps` is present. Also, it makes a few optimizations
  conservative.

Differential Revision: https://reviews.llvm.org/D87761
2020-09-16 14:10:05 -04:00
Sam McCall f5c7102dbc Update dead links to Itanium and ARM ABIs. NFC 2020-09-16 13:42:01 +02:00
Simon Pilgrim 4abb5cd839 CGBlocks.cpp - assert non-null CGF pointer. NFCI.
Fixes static analyzer warning.
2020-09-16 12:30:24 +01:00
Mircea Trofin 61fc10d6a5 [ThinLTO] add post-thinlto-merge option to -lto-embed-bitcode
This will embed bitcode after (Thin)LTO merge, but before optimizations.
In the case the thinlto backend is called from clang, the .llvmcmd
section is also produced. Doing so in the case where the caller is the
linker doesn't yet have a motivation, and would require plumbing through
command line args.

Differential Revision: https://reviews.llvm.org/D87636
2020-09-15 15:56:11 -07:00
Alexey Bataev 9e3842d603 [OPENMP]Fix codegen for is_device_ptr component, captured by reference.
Need to map the component as TO instead of the literal, because we need to
pass a reference to the component if the pointer is overaligned.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84887
2020-09-15 17:21:38 -04:00
Snehasish Kumar f1a3ab9044 [clang] Add a command line flag for the Machine Function Splitter.
This patch adds a command line flag for the machine function splitter
(added in rG94faadaca4e1).

-fsplit-machine-functions
Split machine functions using profile information (x86 ELF). On
other targets an error is emitted. If profile information is not
provided a warning is emitted notifying the user that profile
information is required.

Differential Revision: https://reviews.llvm.org/D87047
2020-09-15 12:41:58 -07:00
Zequan Wu f975ae4867 [CodeGen][typeid] Emit typeinfo directly if type is known at compile-time
Differential Revision: https://reviews.llvm.org/D87425
2020-09-15 12:15:47 -07:00
Alexey Bataev 738bab743b [OPENMP]Add support for allocate vars in untied tasks.
Local vars marked with the allocate pragma must be allocated by a call to
the runtime function and cannot be allocated like other local variables.
Instead, we allocate space for a pointer in the private record and store
the address returned by the kmpc_alloc call in this pointer.
So, for untied tasks

```
 #pragma omp task untied
 {
   S s;
    #pragma omp allocate(s) allocator(allocator)
   s = x;
 }
```
compiler generates something like this:
```
struct task_with_privates {
  S *s;
};

void entry(task_with_privates *p) {
  S *s = p->s;
  switch(partid) {
  case 1:
    p->s = (S*)kmpc_alloc();
    kmpc_omp_task();
    br exit;
  case 2:
    *s = x;
    kmpc_omp_task();
    br exit;
  case 3:
    ~S(s);
    kmpc_free((void*)s);
    br exit;
  }
exit:
}
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86558
2020-09-15 13:39:14 -04:00
Simon Pilgrim 9eab73fa17 [X86] Update SSE/AVX integer MINMAX intrinsics to emit llvm.smax.* etc. (PR46851)
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics.

This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern.
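For example (SSE4.1; wrapper name is illustrative):
```
#include <immintrin.h>

__m128i max32(__m128i a, __m128i b) {
  // Now emits the generic llvm.smax intrinsic instead of an icmp+select pattern.
  return _mm_max_epi32(a, b);
}
```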

Differential Revision: https://reviews.llvm.org/D87603
2020-09-15 11:19:08 +01:00
Teresa Johnson 226d80ebe2 [MemProf] Rename HeapProfiler to MemProfiler for consistency
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.

Differential Revision: https://reviews.llvm.org/D87622
2020-09-14 13:14:57 -07:00
Simon Pilgrim 3b7708e2de Assert we've found the size of each (non-overlapping) structure. NFCI.
Fixes clang static analyzer warning.
2020-09-14 16:10:52 +01:00
Serge Pavlov f1cd6593da [AST][FPEnv] Keep FP options in trailing storage of CastExpr
This is recommit of 6c8041aa0f, reverted in de044f7562 because of some
fails. Original commit message is below.

This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.

Differential Revision: https://reviews.llvm.org/D85960
2020-09-14 12:15:21 +07:00
Florian Hahn a874d63344 [Clang] Add option to allow marking pass-by-value args as noalias.
After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using using noalias is problematic for
current C++, but should be allowed for C-only code.

This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.

In targets that allocate indirect arguments in the called function, this
enables more agressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.

[1] : http://lists.llvm.org/pipermail/cfe-dev/2020-July/066353.html

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D85473
2020-09-12 14:56:13 +01:00
Tyker 78de7297ab Reland [AssumeBundles] Use operand bundles to encode alignment assumptions
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
2020-09-12 15:36:06 +02:00
Serge Pavlov de044f7562 Revert "[AST][FPEnv] Keep FP options in trailing storage of CastExpr"
This reverts commit 6c8041aa0f.
It caused some fails on buildbots.
2020-09-12 17:06:42 +07:00
Serge Pavlov 6c8041aa0f [AST][FPEnv] Keep FP options in trailing storage of CastExpr
This change allow a CastExpr to have optional FPOptionsOverride object,
stored in trailing storage. Of all cast nodes only ImplicitCastExpr,
CStyleCastExpr, CXXFunctionalCastExpr and CXXStaticCastExpr are allowed
to have FPOptions.

Differential Revision: https://reviews.llvm.org/D85960
2020-09-12 14:30:44 +07:00
Cullen Rhodes 002f5ab3b1 [clang][aarch64] Fix ILP32 ABI for arm_sve_vector_bits
The element types of scalable vectors are defined in terms of stdint
types in the ACLE. This patch fixes the mapping to builtin types for the
ILP32 ABI when creating VLS types with the arm_sve_vector_bits, where
the mapping is as follows:

  int32_t -> LongTy
  int64_t -> LongLongTy
  uint32_t -> UnsignedLongTy
  uint64_t -> UnsignedLongLongTy

This is implemented by leveraging getBuiltinVectorTypeInfo which is
target agnostic since it calls ASTContext::getIntTypeForBitwidth for
integer types. The element type for svfloat16_t is changed from
Float16Ty to HalfTy when creating VLS types since this is what is used
elsewhere.
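For reference, a VLS type is declared roughly like this (sketch; -msve-vector-bits=512 assumed):
```
#include <arm_sve.h>

// Fixed-length (VLS) counterpart of the scalable svint32_t, 512 bits wide.
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));
```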

For more information, see:

https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#types-varying-by-data-model
https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-support-for-scalable-vectors

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87358
2020-09-11 09:46:35 +00:00
Michael Liao b22d450496 Remove dependency on clangASTMatchers.
- It seems to be no longer required for shared library builds.
2020-09-10 22:17:48 -04:00
Mark de Wever 08196e0b2e Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.
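For example (hypothetical code):
```
int check(bool error) {
  if (error) [[unlikely]]
    return -1;
  return 0;
}
```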

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Qiu Chaofan 88ff4d2ca1 [PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.

One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87220
2020-09-09 22:40:58 +08:00
Ties Stuij d6f3f61231 Revert "[ARM] Follow AACPS standard for volatile bit-fields access width"
This reverts commit 514df1b2bb.

Some of the buildbots got llvm-lit errors on CodeGen/volatile.c
2020-09-08 18:46:27 +01:00
Ties Stuij 514df1b2bb [ARM] Follow AACPS standard for volatile bit-fields access width
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declarative type. In such a case:
```
struct S1 {
  short a : 1;
}
```
should be accessed using loads and stores of that width
(sizeof(short)), whereas currently the compiler only loads
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
  short a;
  int  b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.

The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```

Patch D16586 was updated to follow such behavior by verifying that we
only change volatile bit-field access when:
 - it won't overlap with any other non-bit-field member,
 - we only access memory inside the bounds of the record, and
 - it won't overlap with any zero-length bit-field.

Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.

Differential Revision: https://reviews.llvm.org/D72932

The following people contributed to this patch:
- Diogo Sampaio
- Ties Stuij
2020-09-08 17:49:49 +01:00
Simon Pilgrim 58970eb7d1 [OpenMP] Fix typo in CodeGenFunction::EmitOMPWorksharingLoop (PR46412)
Fixes issue noticed by static analysis where we have a copy+paste typo, testing ScheduleKind.M1 twice instead of ScheduleKind.M2.

Differential Revision: https://reviews.llvm.org/D87250
2020-09-08 11:59:38 +01:00
Simon Pilgrim 2853ae3c1b [X86] Update SSE/AVX ABS intrinsics to emit llvm.abs.* (PR46851)
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm.abs.* intrinsics.

This patch updates the SSE/AVX ABS vector intrinsics to emit the generic equivalents instead of the icmp+sub+select code pattern.
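For example (SSSE3; wrapper name is illustrative):
```
#include <immintrin.h>

__m128i abs32(__m128i v) {
  // Now emits the generic llvm.abs intrinsic instead of an icmp+sub+select pattern.
  return _mm_abs_epi32(v);
}
```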

Differential Revision: https://reviews.llvm.org/D87101
2020-09-07 13:54:12 +01:00
Simon Pilgrim a8a91533dd [X86] Replace EmitX86AddSubSatExpr with EmitX86BinaryIntrinsic generic helper. NFCI.
Feed the Intrinsic::ID value directly instead of via the IsSigned/IsAddition bool flags.
2020-09-07 13:33:48 +01:00
Eduardo Caldas 1a7a2cd747 [Ignore Expressions][NFC] Refactor to better use `IgnoreExpr.h` and nits
This change groups
* Rename: `ignoreParenBaseCasts` -> `IgnoreParenBaseCasts` for uniformity
* Rename: `IgnoreConversionOperator` -> `IgnoreConversionOperatorSingleStep` for uniformity
* Inline `IgnoreNoopCastsSingleStep` into a lambda inside `IgnoreNoopCasts`
* Refactor `IgnoreUnlessSpelledInSource` to make adequate use of `IgnoreExprNodes`

Differential Revision: https://reviews.llvm.org/D86880
2020-09-07 09:32:30 +00:00
Amy Huang aaf1a96408 [DebugInfo] Add size to class declarations in debug info.
This adds the size to forward declared class DITypes, if the size is known.

Fixes an issue where we determine whether to emit fragments based on the
type size, so fragments would sometimes be incorrectly emitted if there
was no size.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47338

Differential Revision: https://reviews.llvm.org/D87062
2020-09-03 15:42:27 -07:00
Yaxun (Sam) Liu 62dbb7e54c Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024"
Temporarily revert commit 04abbb3a78
due to regressions in some HIP apps due to backend issues revealed by
this change.

Will re-commit it when backend issues are fixed.
2020-09-02 16:12:28 -04:00
Erik Pilkington 2d11ae0a40 Fix a -Wparenthesis warning in 8ff44e644b, NFC 2020-09-02 15:01:54 -04:00
Erik Pilkington 8ff44e644b [IRGen] Fix an assert when __attribute__((used)) is used on an ObjC method
This assert doesn't really make sense for functions in general, since they
start life as declarations, and there isn't really any reason to require them
to be defined before attributes are applied to them.

rdar://67895846
2020-09-02 12:19:11 -04:00
Erik Pilkington a9a6e62ddf [CodeGen] Make sure the EH cleanup for block captures is conditional when the block literal is in a conditional context
Previously, clang was crashing on the attached test because the EH cleanup for
the block capture was incorrectly emitted under the assumption that the
expression wasn't conditionally evaluated. This was because before 9a52de00260,
pushLifetimeExtendedDestroy was mainly used with C++ automatic lifetime
extension, where a conditionally evaluated expression wasn't possible. Now that
we're using this path for block captures, we need to handle this case.

rdar://66250047

Differential revision: https://reviews.llvm.org/D86854
2020-08-31 10:12:17 -04:00
Arnold Schwaighofer 41634497d4 Teach the swift calling convention about _Atomic types
rdar://67351073

Differential Revision: https://reviews.llvm.org/D86218
2020-08-31 07:07:25 -07:00
Fangrui Song b5ef137c11 [gcov] Increment counters with atomicrmw if -fsanitize=thread
Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously
because non-atomic counter increments can be detected as data races.
2020-08-28 16:32:35 -07:00
Cullen Rhodes 2ddf795e8c Reland "[CodeGen][AArch64] Support arm_sve_vector_bits attribute"
This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. Test was failing due
to IR differences with the new pass manager which broke the Fuchsia
builder [1]. Reverted in 2e7041f.

[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375

Original summary:

This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:

  * Implicit casting between VLA <-> VLS types.
  * Coercion of VLS types in function args/return.
  * Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
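
A minimal usage sketch of the implicit VLA <-> VLS conversions described above (illustrative only; assumes compilation with -msve-vector-bits=512 so that __ARM_FEATURE_SVE_BITS is defined):

```c
#include <arm_sve.h>

#if __ARM_FEATURE_SVE_BITS == 512
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

/* Implicit VLS -> VLA and VLA -> VLS conversions; per the text above these
 * are lowered through memory rather than as direct bitcasts. */
svint32_t to_vla(fixed_int32_t v) { return v; }
fixed_int32_t to_vls(svint32_t v) { return v; }
#endif
```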

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:

  __SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:

  #if __ARM_FEATURE_SVE_BITS==512
  // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
  typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
  // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
  typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
  #endif

The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision.  The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85743
2020-08-28 15:57:09 +00:00
David Sherwood f4257c5832 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
JF Bastien 82d29b397b Add an unsigned shift base sanitizer
It's not undefined behavior for an unsigned left shift to overflow (i.e. to
shift bits out), but it has been the source of bugs and exploits in certain
codebases in the past. As we do in other parts of UBSan, this patch adds a
dynamic checker which acts beyond UBSan and checks other sources of errors. The
option is enabled as part of -fsanitize=integer.

The flag is named: -fsanitize=unsigned-shift-base
This matches shift-base and shift-exponent flags.
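
A minimal sketch of what the new check flags (illustrative; build with e.g. clang -fsanitize=unsigned-shift-base, or -fsanitize=integer):

```c
/* Well-defined in C, but the new check reports it at runtime whenever
 * bits of 'x' are shifted out of the result. */
unsigned scale(unsigned x, unsigned n) { return x << n; }
```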

<rdar://problem/46129047>

Differential Revision: https://reviews.llvm.org/D86000
2020-08-27 19:50:10 -07:00
Cullen Rhodes 2e7041fdc2 Revert "[CodeGen][AArch64] Support arm_sve_vector_bits attribute"
Test CodeGen/attr-arm-sve-vector-bits-call.c is failing on some builders
[1][2]. Reverting whilst I investigate.

[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375
[2] https://luci-milo.appspot.com/p/fuchsia/builders/ci/clang-linux-x64/b8870800848452818112

This reverts commit 42587345a3.
2020-08-27 21:31:05 +00:00
Craig Topper 17ceda99d3 [CodeGen] Use an AttrBuilder to bulk remove 'target-cpu', 'target-features', and 'tune-cpu' before re-adding in CodeGenModule::setNonAliasAttributes.
I think the removeAttributes interface should be faster than
calling removeAttribute 3 times.
2020-08-27 12:54:20 -07:00
Mikhail Maltsev ae1396c7d4 [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from an
implementation that lacked the bf16 IR type).

The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.

This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86146
2020-08-27 18:43:16 +01:00
Teresa Johnson 7ed8124d46 [HeapProf] Clang and LLVM support for heap profiling instrumentation
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.

Differential Revision: https://reviews.llvm.org/D85948
2020-08-27 08:50:35 -07:00
Cullen Rhodes 42587345a3 [CodeGen][AArch64] Support arm_sve_vector_bits attribute
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.

VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:

  * Implicit casting between VLA <-> VLS types.
  * Coercion of VLS types in function args/return.
  * Mangling of VLS types.

Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.

Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.

The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:

  __SVE_VLS<typename, unsigned>

where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:

  #if __ARM_FEATURE_SVE_BITS==512
  // Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
  typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
  // Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
  typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
  #endif

The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision.  The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.

[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85743
2020-08-27 15:11:58 +00:00
Sander de Smalen 4e9b66de3f [AArch64][SVE] Add missing debug info for ACLE types.
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64bit granules in an SVE vector.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86101
2020-08-27 10:56:42 +01:00
Christopher Tetreault 19e883fc59 [SVE] Remove calls to VectorType::getNumElements from clang
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D82582
2020-08-26 11:12:26 -07:00
Zequan Wu 9500a72091 Revert "[Coverage] Enable emitting gap area between macros"
This reverts commit a31c89c1b7.
2020-08-25 15:28:42 -07:00
Amy Huang b1009ee84f Reland "[DebugInfo] Move constructor homing case in shouldOmitDefinition."
For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.

This reverts commit 05777ab941.
2020-08-25 12:36:11 -07:00
Jeremy Morse 121a49d839 [LiveDebugValues] Add switches for using instr-ref variable locations
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and same LLVM CodeGen option,
to pick which variable location tracking solution to use.

Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:

  http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html

also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".

Differential Revision: https://reviews.llvm.org/D83048
2020-08-25 14:58:48 +01:00
Eric Christopher 05777ab941 Temporarily Revert "[DebugInfo] Move constructor homing case in shouldOmitDefinition."
as it's causing test failures.

This reverts commit 589ce5f705.
2020-08-24 21:51:31 -07:00
Amy Huang 589ce5f705 [DebugInfo] Move constructor homing case in shouldOmitDefinition.
For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.

Also added a run line to the template specialization test. I guess
all the -debug-info-kind=limited tests should still pass with =constructor,
but it's probably unnecessary to test for all of those.

Differential Revision: https://reviews.llvm.org/D86491
2020-08-24 20:17:59 -07:00
Raphael Isemann 105151ca56 Reland "Correctly emit dwoIDs after ASTFileSignature refactoring (D81347)"
The original patch, with the previously missing 'REQUIRES: asserts' added, as there is a debug-only
flag used in the test.

Original summary:

D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5
uint32_t. However, it didn't update the code in ObjectFilePCHContainerOperations
that creates the dwoID in the module from the ASTFileSignature
(`Buffer->Signature` being the array subclass that is now `std::array<uint8_t,
20>` instead of `std::array<uint32_t, 5>`).

```
  uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```

This code works with the old ASTFileSignature (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the
ASTFileSignature and only partly filled the Signature uint64_t.

This caused the dwoID in the module ref and the dwoID in the actual module
to no longer match (which in turn causes LLDB to keep warning about the dwoIDs
not matching when debugging -gmodules-compiled binaries).

This patch just unifies the logic for turning the ASTFileSignature into a
uint64_t, which makes the dwoID match again (and should prevent issues like that
in the future).

Reviewed By: aprantl, dang

Differential Revision: https://reviews.llvm.org/D84013
2020-08-24 14:52:53 +02:00
Bevin Hansson 577f8b157a [Fixed Point] Add codegen for fixed-point shifts.
This patch adds codegen to Clang for fixed-point shift
operations.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D83294
2020-08-24 14:37:16 +02:00
Bevin Hansson 808ac54645 [Fixed Point] Use FixedPointBuilder to codegen fixed-point IR.
This changes the methods in CGExprScalar to use
FixedPointBuilder to generate IR for fixed-point
conversions and operations.

Since FixedPointBuilder emits padded operations slightly
differently than the original code, some tests change.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D86282
2020-08-24 14:37:07 +02:00
Raphael Isemann 2b3074c0d1 Revert "Reland "Correctly emit dwoIDs after ASTFileSignature refactoring (D81347)""
This reverts commit ada2e8ea67. Still breaking
on Fuchsia (and also Fedora) with exit code 1, so back to investigating.
2020-08-24 12:54:25 +02:00
Raphael Isemann ada2e8ea67 Reland "Correctly emit dwoIDs after ASTFileSignature refactoring (D81347)"
This relands D84013 but with a test that relies on less shell features to
hopefully make the test pass on Fuchsia (where the test from the previous patch
version strangely failed with a plain "Exit code 1").

Original summary:

D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5 uint32_t.
However, it didn't update the code in ObjectFilePCHContainerOperations that creates
the dwoID in the module from the ASTFileSignature (`Buffer->Signature` being the
array subclass that is now `std::array<uint8_t, 20>` instead of `std::array<uint32_t, 5>`).

```
  uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```

This code works with the old ASTFileSignature  (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the ASTFileSignature
and only partly filled the Signature uint64_t.

This caused the dwoID in the module ref and the dwoID in the actual module to no
longer match (which in turn causes LLDB to keep warning about the dwoIDs not
matching when debugging -gmodules-compiled binaries).

This patch just unifies the logic for turning the ASTFileSignature into a uint64_t, which
makes the dwoID match again (and should prevent issues like that in the future).

Reviewed By: aprantl, dang

Differential Revision: https://reviews.llvm.org/D84013
2020-08-24 11:51:32 +02:00
Raphael Isemann c1dd5df425 Revert "Correctly emit dwoIDs after ASTFileSignature refactoring (D81347)"
This reverts commit a4c3ed42ba.

The test is curiously failing with a plain exit code 1 on Fuchsia.
2020-08-21 16:08:37 +02:00
Raphael Isemann a4c3ed42ba Correctly emit dwoIDs after ASTFileSignature refactoring (D81347)
D81347 changes the ASTFileSignature to be an array of 20 uint8_t instead of 5
uint32_t. However, it didn't update the code in ObjectFilePCHContainerOperations
that creates the dwoID in the module from the ASTFileSignature
(`Buffer->Signature` being the array subclass that is now `std::array<uint8_t,
20>` instead of `std::array<uint32_t, 5>`).

```
  uint64_t Signature = [..] (uint64_t)Buffer->Signature[1] << 32 | Buffer->Signature[0]
```

This code works with the old ASTFileSignature (where two uint32_t are enough to
fill the uint64_t), but after the patch this only took two bytes from the
ASTFileSignature and only partly filled the Signature uint64_t.

This caused the dwoID in the module ref and the dwoID in the actual module
to no longer match (which in turn causes LLDB to keep warning about the dwoIDs
not matching when debugging -gmodules-compiled binaries).

This patch just unifies the logic for turning the ASTFileSignature into a
uint64_t, which makes the dwoID match again (and should prevent issues like that
in the future).

Reviewed By: aprantl, dang

Differential Revision: https://reviews.llvm.org/D84013
2020-08-21 15:05:02 +02:00
Bevin Hansson 1a995a0af3 [ADT] Move FixedPoint.h from Clang to LLVM.
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.

This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html

Reviewed By: leonardchan, rjmccall

Differential Revision: https://reviews.llvm.org/D85312
2020-08-20 10:29:45 +02:00
Craig Topper 724f570ad2 [X86] Add support 'tune' in target attribute
This adds parsing and codegen support for tune in target attribute.

I've implemented this so that arch in the target attribute implicitly disables tune from the command line. I'm not sure what gcc does here, but since -march implies -mtune, I assume 'arch' in the target attribute implies 'tune' in the target attribute.
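
An illustrative sketch of the attribute form (the CPU names are arbitrary examples, not taken from the patch):

```c
/* 'tune=' in the target attribute selects the tuning model for this function;
 * per the note above, an 'arch=' here also overrides any command-line tuning. */
__attribute__((target("arch=skylake,tune=icelake-server")))
void hot_loop(void) {
  /* ... */
}
```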

Differential Revision: https://reviews.llvm.org/D86187
2020-08-19 15:58:19 -07:00
Aaron Puchert 916b750a8d [CodeGen] Use existing EmitLambdaVLACapture (NFC) 2020-08-19 15:20:05 +02:00
Sander de Smalen 0353848cc9 [Clang][SVE] NFC: Move info about ACLE types into separate function.
This function returns a struct `BuiltinVectorTypeInfo` that contains
the builtin vector's element type, element count and number of vectors
(used for vector tuples).

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D86100
2020-08-19 11:04:20 +01:00
Craig Topper 4cbceb74bb [X86] Add basic support for -mtune command line option in clang
Building on the backend support from D85165. This parses the command line option in the driver, passes it on to CC1 and adds a function attribute.

- Still need to support tune on the target attribute.
- Need to use "generic" as the tuning by default. But need to change generic in the backend first.
- Need to set tune if march is specified and mtune isn't.
- May need to disable getHostCPUName's ability to guess CPU name from features when it doesn't have a family/model match for mtune=native. That's what gcc appears to do.

Differential Revision: https://reviews.llvm.org/D85384
2020-08-18 15:13:19 -07:00
Zequan Wu 84fffa6728 [Coverage] Adjust skipped regions only if {Prev,Next}TokLoc is in the same file as regions' {start, end}Loc
Fix a bug where {Prev, Next}TokLoc is in a different file from the skipped regions' {start, end}Loc

Differential Revision: https://reviews.llvm.org/D86116
2020-08-18 13:26:19 -07:00
Eli Friedman 673dbe1b5e [clang codegen] Use IR "align" attribute for static array arguments.
Without the "align" attribute, marking the argument dereferenceable is
basically useless.  See also D80166.
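
For illustration (not from the patch), a C 'static' array parameter of the kind affected; with this change the argument carries an align attribute in addition to dereferenceable (the sizes below assume a 4-byte int):

```c
/* The 'static 10' contract lets clang mark the argument dereferenceable(40),
 * and now also 'align 4', which makes the dereferenceable info usable. */
int sum10(const int a[static 10]) {
  int s = 0;
  for (int i = 0; i < 10; ++i)
    s += a[i];
  return s;
}
```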

Fixes https://bugs.llvm.org/show_bug.cgi?id=46876 .

Differential Revision: https://reviews.llvm.org/D84992
2020-08-18 12:51:16 -07:00
Johannes Doerfert 95a25e4c32 [OpenMP][FIX] Do not use TBAA in type punning reduction GPU code PR46156
When we implement OpenMP GPU reductions we use type punning a lot during
the shuffle and reduce operations. This is not always compatible with
language rules on aliasing. So far we generated TBAA which later allowed
to remove some of the reduce code as accesses and initialization were
"known to not alias". With this patch we avoid TBAA in this step,
hopefully for all accesses that we need to.

Verified on the reproducer of PR46156 and QMCPack.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D86037
2020-08-16 14:38:31 -05:00
Gui Andrade 909a851dbf [CGAtomic] Mark atomic libcall functions `nounwind`
These functions won't ever unwind. This is useful for MemorySanitizer
as it simplifies handling __atomic_load in particular.

Differential Revision: https://reviews.llvm.org/D85573
2020-08-14 07:46:43 +00:00
Zequan Wu a31c89c1b7 [Coverage] Enable emitting gap area between macros
Differential Revision: https://reviews.llvm.org/D85176
2020-08-12 16:25:27 -07:00
Craig Topper 5c1fe4e20f [Target] Cache the command line derived feature map in TargetOptions.
We can use this to remove some calls to initFeatureMap from Sema
and CodeGen when a function doesn't have a target attribute.

This reduces compile time of the linux kernel where this map
is needed to diagnose some inline assembly constraints based
on whether sse, avx, or avx512 is enabled.

Differential Revision: https://reviews.llvm.org/D85807
2020-08-12 12:37:23 -07:00
Alexey Bataev fbd6d2c54e [OPENMP] Fix PR47063: crash when trying to get captured statetment.
Need to call getRawStmt() function instead, when trying to get inner
associated statement for the executable directive. Not all directives
use captured statements.
2020-08-12 12:05:58 -04:00
Alexey Bataev f4f3f678f1 [OPENMP]Fix PR37671: Privatize local(private) variables in untied tasks.
In untied tasks, we need to allocate the space for local variables declared
in the task region when the memory for the task data is allocated. The function
can be interrupted and we can exit from the function at an untied task
switch, so we need to keep the state of the local variables in this case.
Also, the compiler should not call cleanups when exiting at an untied task
switch until the real exit out of the declaration scope is reached during
execution.
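
An illustrative sketch (not from the patch) of why the storage has to live in the task data:

```c
void use(int);

void work(int n) {
#pragma omp task untied
  {
    int local = n * 2;  /* must live in the task's allocated data ...       */
#pragma omp taskyield   /* possible untied task switch point                */
    use(local);         /* ... so that 'local' is still valid on resumption */
  }
}
```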

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84457
2020-08-12 11:28:19 -04:00
Alexey Bataev ddbd21d288 [OPENMP]Do not add TGT_OMP_TARGET_PARAM flag to non-captured mapped arguments.
If the arguments are mapped, but are actually not used in the target
region, the compiler still adds the attribute TGT_OMP_TARGET_PARAM for such
arguments. This makes libomptarget add such parameters to the list of
arguments passed to the kernel at runtime, and may lead to
incorrect results/crashes during execution.

Differential Revision: https://reviews.llvm.org/D85755
2020-08-12 10:06:52 -04:00
Alexey Bataev 3651658bdd Revert "[OPENMP]Fix PR37671: Privatize local(private) variables in untied tasks."
This reverts commit ec9563c54e to
investigate a compiler crash revealed by the buildbots.
2020-08-12 09:50:32 -04:00
Alexey Bataev ec9563c54e [OPENMP]Fix PR37671: Privatize local(private) variables in untied tasks.
Summary:
In untied tasks, we need to allocate the space for local variables declared
in the task region when the memory for the task data is allocated. The function
can be interrupted and we can exit from the function at an untied task
switch, so we need to keep the state of the local variables in this case.
Also, the compiler should not call cleanups when exiting at an untied task
switch until the real exit out of the declaration scope is reached during
execution.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, cfe-commits, sstefan1, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D84457
2020-08-12 09:37:24 -04:00
Kai Nacke b3aece0531 [SystemZ/ZOS] Add binary format goff and operating system zos to the triple
Adds the binary format goff and the operating system zos to the triple
class. goff is selected as the default binary format if zos is chosen as
the operating system. No further functionality is added.

Reviewers: efriedma, tahonermann, hubert.reinterpertcast, MaskRay

Reviewed By: efriedma, tahonermann, hubert.reinterpertcast

Differential Revision: https://reviews.llvm.org/D82081
2020-08-11 05:26:26 -04:00
Wang, Pengfei 9512525947 [X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics.
When we use mask compare intrinsics under the strict FP option, the masked
elements shouldn't raise any exception. So, we can't replace the
intrinsic with a full compare + "and" operation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85385
2020-08-11 10:28:41 +08:00
Johannes Doerfert fa5d22a045 [OpenMP][NFC] Reuse OMPIRBuilder `struct ident_t` handling in Clang
Replace the `ident_t` handling in Clang with the methods offered by the
OMPIRBuilder. This cuts down on the clang code as well as the
differences between the two, making further transitions easier. Tests
have changed but there should not be a real functional change. The most
interesting difference is probably that we stop generating local ident_t
allocations for now and just use globals. Given that this happens only
with debug info, the location part of the `ident_t` is probably bigger
than the test anyway. As the location part is already a global, we can
avoid the allocation, memcpy, and store in favor of a constant global
that is slightly bigger. This can be revisited if there are
complications.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D80735
2020-08-10 17:13:26 -05:00
Nick Desaulniers 4f2ad15db5 [Clang] implement -fno-eliminate-unused-debug-types
Fixes pr/11710.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Resubmit after breaking Windows and OSX builds.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D80242
2020-08-10 15:08:48 -07:00
Michael Liao c7b683c126 [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.
- Skip generating profile data on `__global__` functions in the host
  compilation. They are host-side stub functions only and don't have
  profile instrumentation generated on the real function body. The extra
  profile data results in malformed instrumentation profile data.
- Skip generating region mapping on functions in the wrong-side, i.e.,
  + For the device compilation, skip host-only functions; and,
  + For the host compilation, skip device-only functions (including
    `__global__` functions.)
- As the device-side profiling is not ready yet, only host-side profile
  code generation is checked.

Differential Revision: https://reviews.llvm.org/D85276
2020-08-10 11:01:46 -04:00
Xiangling Liao 6ef801aa6b [AIX] Static init frontend recovery and backend support
On the frontend side, this patch recovers AIX static init implementation to
use the linkage type and function names Clang chooses for sinit related function.

On the backend side, this patch sets correct linkage and function names on aliases
created for sinit/sterm functions.

Differential Revision: https://reviews.llvm.org/D84534
2020-08-10 10:10:49 -04:00
Nick Desaulniers abb9bf4bcf Revert "[Clang] implement -fno-eliminate-unused-debug-types"
This reverts commit e486921fd6.

Breaks windows builds and osx builds.
2020-08-07 16:11:41 -07:00
Nick Desaulniers e486921fd6 [Clang] implement -fno-eliminate-unused-debug-types
Fixes pr/11710.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D80242
2020-08-07 14:13:48 -07:00
Alexey Bataev 4a7aedb843 [OPENMP]Simplify representation for atomic, critical, master and section
constructs.

Several constructs may be represented without relying on CapturedStmt.
This saves memory and improves compilation speed.
2020-08-07 09:58:23 -04:00
Matt Arsenault 30eeb742f1 clang: Use byref for aggregate kernel arguments
Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed a stack-passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy will be introduced up-front which is better understood
and helps eliminate the alloca in more situations.

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.
2020-08-06 15:52:26 -04:00
Alexey Bataev 0af7835eae [OPENMP]Redesign of OMPExecutableDirective/OMPDeclarativeDirective representation.
Summary:
Introduced OMPChildren class to handle all associated clauses, statement
and child expressions/statements. It allows to represent some directives
more correctly (like flush, depobj etc. with pseudo clauses, ordered
depend directives, which are standalone, and target data directives).
Also, it will make easier to avoid using of CapturedStmt in directives,
if required (atomic, tile etc. directives).
Also, it simplifies serialization/deserialization of the
executable/declarative directives.
Reduces number of allocation operations for mapper declarations.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, jfb, cfe-commits, sstefan1, aaron.ballman, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D83261
2020-08-06 12:25:19 -04:00
Anatoly Trosinenko 5a07490d76 [ABI][NFC] Fix the confusion of ByVal and ByRef argument names
The second argument of getNaturalAlignIndirect() was `bool ByRef`, but
the implementation was just delegating to getIndirect() with `ByRef`
passed unchanged to `bool ByVal` parameter of getIndirect().

Fix a couple of /*ByRef=*/ comments as well.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D85113
2020-08-06 15:20:18 +03:00
Stanislav Mekhanoshin 105608a4c2 [AMDGPU] Added missing gfx1031 cases to CGOpenMPRuntimeGPU.cpp 2020-08-05 12:39:03 -07:00
Erich Keane 2143a90b34 Fix _ExtInt(1) to be a i1 in memory.
The _ExtInt(1) in getTypeForMem was hitting the bool logic for expanding
to an 8 bit value.  The result was an assert, or store i1 %0, i8* %2, align 1
since the parameter IS an i1.  This patch changes the 'forMem' test to
exclude ext-int from the bool test.
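
A minimal sketch of the kind of code affected (illustrative only):

```c
/* With this fix, _ExtInt(1) keeps its i1 in-memory representation instead of
 * being widened to i8 like bool, which previously tripped an assert or
 * produced a store of an i1 through an i8 pointer. */
_ExtInt(1) g;

_ExtInt(1) pass_through(_ExtInt(1) b) { return b; }
```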
2020-08-05 10:54:51 -07:00
Joel E. Denny 002d61db2b [OpenMP] Fix `present` for exit from `omp target data`
Without this patch, the following example fails but shouldn't
according to OpenMP TR8:

```
 #pragma omp target enter data map(alloc:i)
 #pragma omp target data map(present, alloc: i)
 {
   #pragma omp target exit data map(delete:i)
 } // fails presence check here
```

OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states:

> If the map clause appears on a target, target data, target enter
> data or target exit data construct with a present map-type-modifier
> then on entry to the region if the corresponding list item does not
> appear in the device data environment an error occurs and the
> program terminates.

There is no corresponding statement about the exit from a region.
Thus, the `present` modifier should:

1. Check for presence upon entry into any region, including a `target
   exit data` region.  This behavior is already implemented correctly.

2. Should not check for presence upon exit from any region, including
   a `target` or `target data` region.  Without this patch, this
   behavior is not implemented correctly, breaking the above example.

In the case of `target data`, this patch fixes the latter behavior by
removing the `present` modifier from the map types Clang generates for
the runtime call at the end of the region.

In the case of `target`, we have not found a valid OpenMP program for
which such a fix would matter.  It appears that, if a program can
guarantee that data is present at the beginning of a `target` region
so that there's no error there, that data is also guaranteed to be
present at the end.  This patch adds a comment to the runtime to
document this case.

Reviewed By: grokos, RaviNarayanaswamy, ABataev

Differential Revision: https://reviews.llvm.org/D84422
2020-08-05 10:03:31 -04:00
Yonghong Song 00602ee7ef BPF: simplify IR generation for __builtin_btf_type_id()
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously IR builtin
looks like
   if (obj is a lvalue)
     llvm.bpf.btf.type.id(obj.ptr, 1, flag)  !type
   else
     llvm.bpf.btf.type.id(obj, 0, flag)  !type
The purpose of the 2nd argument is to differentiate
   __builtin_btf_type_id(obj, flag) where obj is a lvalue
vs.
   __builtin_btf_type_id(obj.ptr, flag)

Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
  - obj is the same, e.g., nullptr
  - flag is the same
  - metadata type is different, e.g., typedef of struct "s"
    and struct "s".
In the above, we don't want CSE since their metadata is different.

This patch change IR builtin to
   llvm.bpf.btf.type.id(seq_num, flag)  !type
and seq_num is always increasing. This will prevent potential
llvm CSE.

Also report an error if the type name is empty for
remote relocation since remote relocation needs a non-empty
type name to do relocation against vmlinux.

Differential Revision: https://reviews.llvm.org/D85174
2020-08-04 16:29:42 -07:00
Thorsten Schuett e18c6ef6b4 [clang] improve diagnostics for misaligned and large atomics
"Listing the alignment and access size (== expected alignment) in the warning
seems like a good idea."

solves PR 46947

  struct Foo {
    struct Bar {
      void * a;
      void * b;
    };
    Bar bar;
  };

  struct ThirtyTwo {
    struct Large {
      void * a;
      void * b;
      void * c;
      void * d;
    };
    Large bar;
  };

  void braz(Foo *foo, ThirtyTwo *braz) {
    Foo::Bar bar;
    __atomic_load(&foo->bar, &bar, __ATOMIC_RELAXED);

    ThirtyTwo::Large foobar;
    __atomic_load(&braz->bar, &foobar, __ATOMIC_RELAXED);
  }

repro.cpp:21:3: warning: misaligned atomic operation may incur significant performance penalty; the expected (16 bytes) exceeds the actual alignment (8 bytes) [-Watomic-alignment]
  __atomic_load(&foo->bar, &bar, __ATOMIC_RELAXED);
  ^
repro.cpp:24:3: warning: misaligned atomic operation may incur significant performance penalty; the expected (32 bytes) exceeds the actual alignment (8 bytes) [-Watomic-alignment]
  __atomic_load(&braz->bar, &foobar, __ATOMIC_RELAXED);
  ^
repro.cpp:24:3: warning: large atomic operation may incur significant performance penalty; the access size (32 bytes) exceeds the max lock-free size (16  bytes) [-Watomic-alignment]
3 warnings generated.

Differential Revision: https://reviews.llvm.org/D85102
2020-08-04 11:10:29 -07:00
Yonghong Song 6d67506964 [clang][BPF] support type exist/size and enum exist/value relocations
This patch added the following additional compile-once
run-everywhere (CO-RE) relocations:
  - existence/size of typedef, struct/union or enum type
  - enum value and enum value existence

These additional relocations will make CO-RE bpf programs more
adaptive for potential kernel internal data structure changes.

For existence/size relocations, the following two code patterns
are supported:
  1. uint32_t __builtin_preserve_type_info(*(<type> *)0, flag);
  2. <type> var;
     uint32_t __builtin_preserve_field_info(var, flag);
flag = 0 for existence relocation and flag = 1 for size relocation.

For enum value existence and enum value relocations, the following code
pattern is supported:
  uint64_t __builtin_preserve_enum_value(*(<enum_type> *)<enum_value>,
                                         flag);
flag = 0 means existence relocation and flag = 1 means enum value
relocation. In the above, <enum_type> can be an enum type or
a typedef to enum type. The <enum_value> needs to be an enumerator
value from the same enum type. The return type is uint64_t to
permit potential 64bit enumerator values.

Differential Revision: https://reviews.llvm.org/D83242
2020-08-04 08:39:53 -07:00
Kazushi (Jam) Marukawa 045e79e77c [VE] Extend integer arguments and return values smaller than 64 bits
In order to follow the NEC Aurora SX VE ABI correctly, change clang to sign/zero
extend integer arguments and return values smaller than 64 bits.
Also update the regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D85071
2020-08-04 08:07:05 +09:00
Thomas Lively cb32792210 [WebAssembly] Implement prototype v128.load{32,64}_zero instructions
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.

This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.

Differential Revision: https://reviews.llvm.org/D84820
2020-08-03 13:54:00 -07:00
Akira Hatanaka 41b1e97b12 [CodeGen][ObjC] Mark calls to objc_unsafeClaimAutoreleasedReturnValue as
notail on x86-64

This is needed because the epilogue code inserted before tail calls on
x86-64 breaks the handshake between the caller and callee.

Calls to objc_retainAutoreleasedReturnValue used to have the same
problem, which was fixed in https://reviews.llvm.org/D59656.

rdar://problem/66029552

Differential Revision: https://reviews.llvm.org/D84540
2020-08-03 13:25:25 -07:00
Saiyedul Islam 160ff83765 [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 3
Provides AMDGCN and NVPTX specific specialization of getGPUWarpSize,
getGPUThreadID, and getGPUNumThreads methods. Adds tests for AMDGCN
codegen for these methods in generic and simd modes. Also changes the
precondition in InitTempAlloca to be slightly more permissive. Useful for
AMDGCN OpenMP codegen where allocas are created with a cast to an
address space.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84260
2020-08-03 05:38:39 +00:00
Eli Friedman 8dfb5d767e [clang codegen][AArch64] Use llvm.aarch64.neon.fcvtzs/u where it's necessary
fptosi/fptoui have similar, but not identical, semantics.  In
particular, the behavior on overflow is different.
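
An illustrative sketch (assuming an AArch64 target) of a conversion intrinsic where the distinction matters:

```c
#include <arm_neon.h>

/* vcvtq_s32_f32 saturates out-of-range inputs (fcvtzs semantics), so it must
 * not be lowered to a plain fptosi, whose behavior on overflow differs. */
int32x4_t to_i32(float32x4_t v) { return vcvtq_s32_f32(v); }
```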

Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit.  (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)

Differential Revision: https://reviews.llvm.org/D84703
2020-07-30 15:41:54 -07:00
Richard Smith 1e7f026c3b PR46908: Emit undef destroying_delete_t as an aggregate RValue.
We previously used a non-aggregate RValue to represent the passed value,
which violated the assumptions of call arg lowering in some cases, in
particular on 32-bit Windows, where we'd end up producing an FCA store
with TBAA metadata, that the IR verifier would reject.
2020-07-30 14:50:01 -07:00
Johannes Doerfert ebad64dfe1 [OpenMP][FIX] Consistently use OpenMPIRBuilder if requested
When we use the OpenMPIRBuilder for the parallel region we need to also
use it to get the thread ID (among other things) in the body. This is
because CGOpenMPRuntime::getThreadID() and
CGOpenMPRuntime::emitUpdateLocation implicitly assume that if they are
called from within a parallel region there is a certain structure to the
code and certain members of the OMPRegionInfo are initialized. It might
make sense to initialize them even if we use the OpenMPIRBuilder but we
would preferably get rid of such state instead.

Bug reported by Anchu Rajendran Sudhakumari.

Depends on D82470.

Reviewed By: anchu-rajendran

Differential Revision: https://reviews.llvm.org/D82822
2020-07-30 10:19:40 -05:00
Johannes Doerfert 19756ef53a [OpenMP][IRBuilder] Support allocas in nested parallel regions
We need to keep track of the alloca insertion point (which we already
communicate via the callback to the user) as we place allocas as well.

Reviewed By: fghanim, SouraVX

Differential Revision: https://reviews.llvm.org/D82470
2020-07-30 10:19:39 -05:00
Alexey Bataev 622e46156d [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 11:18:33 -04:00
Alexey Bataev b69357c2f4 Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region."
This reverts commit 142d0d3ed8 to
investigate undefined behavior revealed by buildbots.
2020-07-30 10:57:56 -04:00
Alexey Bataev 142d0d3ed8 [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 09:40:05 -04:00
Amy Huang f71deb43ab [DebugInfo] Fix to ctor homing to ignore classes with trivial ctors.
Previously ctor homing was omitting debug info for classes if they
have both trivial and nontrivial constructors, but we should only omit debug
info if the class doesn't have any trivial constructors.

retained types list.

bug: https://bugs.llvm.org/show_bug.cgi?id=46537

Differential Revision: https://reviews.llvm.org/D84870
2020-07-29 19:55:20 -07:00
Arthur Eubanks 71d0a2b8a3 [DFSan][NewPM] Port DataFlowSanitizer to NewPM
Reviewed By: ychen, morehouse

Differential Revision: https://reviews.llvm.org/D84707
2020-07-29 10:19:15 -07:00
Joel E. Denny 9f2f3b9de6 [OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
next patch in this series implements OpenMP runtime support.
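
A minimal usage sketch (illustrative; assumes an OpenMP version that accepts the TR8 modifier):

```c
void refresh(int *p, int n) {
  /* Fails at runtime if p[0:n] is not already present in the device data
   * environment; otherwise behaves like a plain 'to' motion. */
#pragma omp target update to(present: p[0:n])
}
```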

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84711
2020-07-29 12:18:45 -04:00
Alexey Bader 8d27be8dba [OpenCL] Add global_device and global_host address spaces
This patch introduces 2 new address spaces in OpenCL: global_device and global_host
which are subsets of the global address space, so the address space scheme looks
like this:

```
generic->global->host
               ->device
       ->private
       ->local
constant
```

Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)

Before this patch a USM pointer could only be in the opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.

Patch by Dmitry Sidorov.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D82174
2020-07-29 17:24:53 +03:00
Thomas Lively 11bb7eef41 [WebAssembly] Remove intrinsics for SIMD widening ops
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.

Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.

Differential Revision: https://reviews.llvm.org/D84556
2020-07-28 18:25:55 -07:00
Joel E. Denny 69fc33f0cd Revert "[OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)"
This reverts commit 3c3faae497.

It breaks a number of bots.
2020-07-28 20:30:05 -04:00
Joel E. Denny 3c3faae497 [OpenMP] Implement TR8 `present` motion modifier in Clang (1/2)
This patch implements Clang front end support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
next patch in this series implements OpenMP runtime support.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D84711
2020-07-28 19:15:18 -04:00
Zahira Ammarguellat 80bd6ae13e On Windows builds, make the /bigobj flag global, instead of passing it per file.
To avoid having this flag passed on a per-file basis, we instead
pass it globally.

This fixes this bug: https://bugs.llvm.org/show_bug.cgi?id=46733

Reviewed-by: aaron.ballman, beanz, meinersbur

Differential Revision: https://reviews.llvm.org/D84038
2020-07-28 18:04:36 -05:00
Richard Smith 740a164dec PR46377: Fix dependence calculation for function types and typedef
types.

We previously did not treat a function type as dependent if it had a
parameter pack with a non-dependent type -- such a function type depends
on the arity of the pack so is dependent even though none of the
parameter types is dependent. In order to properly handle this, we now
treat pack expansion types as always being dependent types (depending on
at least the pack arity), and always canonically being pack expansion
types, even in the unusual case when the pattern is not a dependent
type. This does mean that we can have canonical types that are pack
expansions that contain no unexpanded packs, which is unfortunate but
not inaccurate.

We also previously did not treat a typedef type as
instantiation-dependent if its canonical type was not
instantiation-dependent. That's wrong because instantiation-dependence
is a property of the type sugar, not of the type; an
instantiation-dependent type can have a non-instantiation-dependent
canonical type.
2020-07-28 13:23:13 -07:00
Zequan Wu b46176bbb0 Reland [Coverage] Add comment to skipped regions
Bug filled here: https://bugs.llvm.org/show_bug.cgi?id=45757.
Add comment to skipped regions so we don't track execution count for lines containing only comments.

Differential Revision: https://reviews.llvm.org/D83592
2020-07-28 13:20:57 -07:00
Richard Smith 6c18f7db73 For PR46800, implement the GCC __builtin_complex builtin.
glibc's implementation of the CMPLX macro uses it (with -fgnuc-version
set to 4.7 or later).
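
An illustrative sketch of the builtin (not from the patch):

```c
/* __builtin_complex builds a complex value directly from its real and
 * imaginary parts; this is what glibc's CMPLX macro expands to when the
 * builtin is available. */
double _Complex make_complex(double re, double im) {
  return __builtin_complex(re, im);
}
```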
2020-07-22 13:43:10 -07:00
Hans Wennborg 238bbd48c5 Revert abd45154b "[Coverage] Add comment to skipped regions"
This casued assertions during Chromium builds. See comment on the code review

> Bug filled here: https://bugs.llvm.org/show_bug.cgi?id=45757.
> Add comment to skipped regions so we don't track execution count for lines containing only comments.
>
> Differential Revision: https://reviews.llvm.org/D84208

This reverts commit abd45154bd and the
follow-up 87d7254733.
2020-07-22 17:09:20 +02:00
Joel E. Denny aa82c40f0a [OpenMP] Implement TR8 `present` map type modifier in Clang (1/2)
This patch implements Clang front end support for the OpenMP TR8
`present` map type modifier.  The next patch in this series implements
OpenMP runtime support.
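
A minimal usage sketch (illustrative; assumes an OpenMP version that accepts the TR8 modifier):

```c
void bump(int *p, int n) {
  /* Fails at runtime on entry if p[0:n] is not already mapped; otherwise
   * behaves like a plain 'tofrom' map. */
#pragma omp target map(present, tofrom: p[0:n])
  for (int i = 0; i < n; ++i)
    p[i] += 1;
}
```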

This patch does not attempt to implement TR8 sec. 2.22.7.1 "map
Clause", p. 319, L14-16:

> If a map clause with a present map-type-modifier is present in a map
> clause, then the effect of the clause is ordered before all other
> map clauses that do not have the present modifier.

Compare to L10-11, which Clang does not appear to implement yet:

> For a given construct, the effect of a map clause with the to, from,
> or tofrom map-type is ordered before the effect of a map clause with
> the alloc, release, or delete map-type.

This patch also does not implement the `present` implicit-behavior for
`defaultmap` or the `present` motion-modifier for `target update`.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D83061
2020-07-22 10:15:32 -04:00