Commit Graph

15545 Commits

Author SHA1 Message Date
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Simon Pilgrim e4074432d5 [X86] Remove avx512f integer and/or/xor/min/max reduction intrinsics and use generic equivalents
None of these have any reordering issues, and they still emit the same reduction intrinsics without any change in the existing test coverage:

llvm-project\clang\test\CodeGen\X86\avx512-reduceIntrin.c
llvm-project\clang\test\CodeGen\X86\avx512-reduceMinMaxIntrin.c

Differential Revision: https://reviews.llvm.org/D117881
2022-01-24 11:57:53 +00:00
Simon Pilgrim 3e50593b18 [X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798
2022-01-24 11:40:29 +00:00
Simon Pilgrim e5147f82e1 [X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791
2022-01-24 11:25:21 +00:00
Wei Wang 55d887b833 [time-trace] Add optimizer and codegen regions to NPM
Optimizer and codegen regions were only added to legacy PM. Add
them to NPM as well.

Differential Revision: https://reviews.llvm.org/D117605
2022-01-21 19:17:57 -08:00
Simon Pilgrim 0abaf64580 Revert rG4727d29d908f9dd608dd97a58c0af1ad579fd3ca "[X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs"
Some build bots are referencing the `__builtin_ia32_pabs` intrinsics via alternative headers
2022-01-21 12:35:36 +00:00
Simon Pilgrim 3ef88b3184 Revert rG8ee135dcf8ff060656ad481c3e980fe8763576f5 "[X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`"
Some build bots are referencing the `__builtin_ia32_pmax/min` intrinsics via alternative headers
2022-01-21 12:34:19 +00:00
Simon Pilgrim 8ee135dcf8 [X86] Remove `__builtin_ia32_pmax/min` intrinsics and use generic `__builtin_elementwise_max/min`
D111985 added the generic `__builtin_elementwise_max` and `__builtin_elementwise_min` intrinsics with the same integer behaviour as the SSE/AVX instructions

This patch removes the `__builtin_ia32_pmax/min` intrinsics and just uses `__builtin_elementwise_max/min` - the existing tests see no changes:
```
__m256i test_mm256_max_epu32(__m256i a, __m256i b) {
  // CHECK-LABEL: test_mm256_max_epu32
  // CHECK: call <8 x i32> @llvm.umax.v8i32(<8 x i32> %{{.*}}, <8 x i32> %{{.*}})
  return _mm256_max_epu32(a, b);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Sibling patch to D117791

Differential Revision: https://reviews.llvm.org/D117798
2022-01-21 12:24:58 +00:00
Simon Pilgrim 4727d29d90 [X86] Remove __builtin_ia32_pabs intrinsics and use generic __builtin_elementwise_abs
D111986 added the generic `__builtin_elementwise_abs()` intrinsic with the same integer absolute behaviour as the SSE/AVX instructions (abs(INT_MIN) == INT_MIN)

This patch removes the `__builtin_ia32_pabs*` intrinsics and just uses `__builtin_elementwise_abs` - the existing tests see no changes:
```
__m256i test_mm256_abs_epi8(__m256i a) {
  // CHECK-LABEL: test_mm256_abs_epi8
  // CHECK: [[ABS:%.*]] = call <32 x i8> @llvm.abs.v32i8(<32 x i8> %{{.*}}, i1 false)
  return _mm256_abs_epi8(a);
}
```
This requires us to add a `__v64qs` explicitly signed char vector type (we already have `__v16qs` and `__v32qs`).

Differential Revision: https://reviews.llvm.org/D117791
2022-01-21 11:59:08 +00:00
Joao Moreira 82af95029e [X86] Enable ibt-seal optimization when LTO is used in Kernel
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instruction on function's prologues. Because this is a security feature, it is desirable that only actual indirect-branch-targeted functions are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end being reachable through PLT entries, that will use an indirect branch for such. Because this cannot be determined during compilation-time, the compiler currently emits ENDBRs to every non-local-linkage function.

Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].

This diff brings the enablement of the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement in when the code-model is set to "kernel". In this scenario, the compiler will only emit ENDBRs to address taken functions, ignoring non-address taken functions that are don't have local linkage.

A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.

The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.

[1] - https://lkml.org/lkml/2021/11/22/591

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D116070
2022-01-21 10:55:34 +08:00
Alexandre Ganea 5af2433e17 [clang-cl] Support the /HOTPATCH flag
This patch adds support for the MSVC /HOTPATCH flag: https://docs.microsoft.com/sv-se/cpp/build/reference/hotpatch-create-hotpatchable-image?view=msvc-170&viewFallbackFrom=vs-2019

The flag is translated to a new -fms-hotpatch flag, which in turn adds a 'patchable-function' attribute for each function in the TU. This is then picked up by the PatchableFunction pass which would generate a TargetOpcode::PATCHABLE_OP of minsize = 2 (which means the target instruction must resolve to at least two bytes). TargetOpcode::PATCHABLE_OP is only implemented for x86/x64. When targetting ARM/ARM64, /HOTPATCH isn't required (instructions are always 2/4 bytes and suitable for hotpatching).

Additionally, when using /Z7, we generate a 'hot patchable' flag in the CodeView debug stream, in the S_COMPILE3 record. This flag is then picked up by LLD (or link.exe) and is used in conjunction with the linker /FUNCTIONPADMIN flag to generate extra space before each function, to accommodate for live patching long jumps. Please see: d703b92296/lld/COFF/Writer.cpp (L1298)

The outcome is that we can finally use Live++ or Recode along with clang-cl.

NOTE: It seems that MSVC cl.exe always enables /HOTPATCH on x64 by default, although if we did the same I thought we might generate sub-optimal code (if this flag was active by default). Additionally, MSVC always generates a .debug$S section and a S_COMPILE3 record, which Clang doesn't do without /Z7. Therefore, the following MSVC command-line "cl /c file.cpp" would have to be written with Clang such as "clang-cl /c file.cpp /HOTPATCH /Z7" in order to obtain the same result.

Depends on D43002, D80833 and D81301 for the full feature.

Differential Revision: https://reviews.llvm.org/D116511
2022-01-20 12:57:19 -05:00
Florian Hahn 67aa314bce
[IRGen] Do not overwrite existing attributes in CGCall.
When adding new attributes, existing attributes are dropped. While
this appears to be a longstanding issue, this was highlighted by D105169
which dropped a lot of attributes due to adding the new noundef
attribute.

Ahmed Bougacha (@ab) tracked down the issue and provided the fix in
CGCall.cpp. I bundled it up and updated the tests.
2022-01-20 13:45:19 +00:00
Chenbing.Zheng 0be3da1fab [RISCV] Add intrinsic for Zbt extension
RV32: fsl, fsr, fsri
RV64: fsl, fsr, fsri, fslw, fsrw, fsriw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117468
2022-01-20 08:27:05 +00:00
Yaxun (Sam) Liu 85c2bd2a0e Prevent adding module flag amdgpu_hostcall multiple times
HIP program with printf call fails to compile with -fsanitize=address
option, because of appending module flag - amdgpu_hostcall twice, one
for printf and one for sanitize option. This patch fixes that issue.

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Roman Lebedev

Differential Revision: https://reviews.llvm.org/D116216
2022-01-19 12:52:33 -05:00
Arnamoy Bhattacharyya 9fbd33ad62 [OMPIRBuilder] Add support for simd (loop) directive.
This patch adds OMPIRBuilder support for the simd directive (without any clause).  This will be a first step towards lowering simd directive in LLVM_Flang.  The patch uses existing CanonicalLoop infrastructure of IRBuilder to add the support.  Also adds necessary code to add llvm.access.group and llvm.loop metadata wherever needed.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D114379
2022-01-19 11:32:17 -05:00
Ben Shi a2f488c6a5 [clang][AVR] Implement '__flashN' for variables on different flash banks
Reviewed By: aykevl

Differential Revision: https://reviews.llvm.org/D115982
2022-01-19 11:24:01 +00:00
hyeongyu kim 1b1c8d83d3 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2022-01-16 18:54:17 +09:00
Nikita Popov c63a3175c2 [AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
2022-01-15 22:39:31 +01:00
James Y Knight 0d3f2fd269 Revert "Skip exception cleanups when the innermost scope is EHTerminateScope."
Breaks tests on some platforms. Reverting while investigating.

This reverts commit a4e255f9c6.
2022-01-14 18:59:24 -05:00
Matt Arsenault 33315ef321 clang/AMDGPU: Don't set implicit arg attribute to default size
Since 2959e082e1, we conservatively
assume all inputs are enabled by default. This isn't the best
interface for controlling these anyway, since it's not granular and
only allows trimming the last fields.
2022-01-14 18:43:30 -05:00
James Y Knight a4e255f9c6 Skip exception cleanups when the innermost scope is EHTerminateScope.
EHTerminateScope is used to implement C++ noexcept semantics. Per C++
[except.terminate], it is implemented-defined whether no, some, or all
cleanups are run prior to terminatation.

Therefore, the code to run cleanups on the way towards termination is
unnecessary, and may be omitted.

After this change, we will still run some cleanups: any cleanups in a
function called from the noexcept function will continue to run, while
those in the noexcept function itself will not.

Differential Revision: https://reviews.llvm.org/D113620
2022-01-14 18:01:29 -05:00
Erich Keane 2bcba21c8b [CPU-Dispatch] Make sure Dispatch names get updated if previously mangled
Cases where there is a mangling of a cpu-dispatch/cpu-specific function
before the function becomes 'multiversion' (such as a member function)
causes the wrong name to be emitted for one of the variants/resolver,
since the name is cached.  Make sure we invalidate the cache in
cpu-dispatch/cpu-specific modes, like we previously did for just target
multiversioning.
2022-01-14 10:45:55 -08:00
Benjamin Kramer 765dd8b8a4 [CGBuiltin] Simplify code. NFCI. 2022-01-14 16:02:02 +01:00
Jun Zhang 8de0c1feca
[Clang] Add __builtin_reduce_or and __builtin_reduce_and
This patch implements two builtins specified in D111529.
The last __builtin_reduce_add will be seperated into another one.

Differential Revision: https://reviews.llvm.org/D116736
2022-01-14 22:05:26 +08:00
Kevin Athey a0458b531c Add -fsanitize-address-param-retval to clang.
With the introduction of this flag, it is no longer necessary to enable noundef analysis with 4 separate flags.
(-Xclang -enable-noundef-analysis -mllvm -msan-eager-checks=1).
This change only covers the introduction into the compiler.

This is a follow up to: https://reviews.llvm.org/D116855

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D116633
2022-01-14 00:41:28 -08:00
Maurice Heumann 072e2a7c67 [MS] Implement on-demand TLS initialization for Microsoft CXX ABI
TLS initializers, for example constructors of thread-local variables, don't necessarily get called. If a thread was created before a module is loaded, the module's TLS initializers are not executed for this particular thread.

This is why Microsoft added support for dynamic TLS initialization. Before every use of thread-local variables, a check is added that runs the module's TLS initializers on-demand.

To do this, the method `__dyn_tls_on_demand_init` gets called. Internally, it simply calls `__dyn_tls_init`.

No additional TLS initializer that sets the guard needs to be emitted, as the guard always gets set by `__dyn_tls_init`.
The guard is also checked again within `__dyn_tls_init`. This makes our check redundant, however, as Microsoft's compiler also emits this check, the behaviour is adopted here.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D115456
2022-01-13 21:23:23 -08:00
Elizabeth Andrews 4eaf5846d0 [clang] Fix function pointer address space
Functions pointers should be created with program address space. This
patch introduces program address space in TargetInfo. Targets with
non-default (default is 0) address space for functions should explicitly
set this value. This patch fixes a crash on lvalue reference to function
pointer (in device code) when using oneAPI DPC++ compiler.

Differential Revision: https://reviews.llvm.org/D111566
2022-01-13 08:06:19 -08:00
Erich Keane b699e8b11a Add another assert to cpu-dispatch emission to help track down a tough
to repro error.

As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
2022-01-13 06:54:08 -08:00
Lian Wang 16877c5d2c [RISCV] Add bfp and bfpw intrinsic in zbf extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D116994
2022-01-13 02:53:00 +00:00
Kevin Athey a141e47138 [NFC] Minimize noundef analysis when disabled
Minor adjustment in order of noundef analysis to be a bit more optimal (when disabled).

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D117078
2022-01-12 17:21:19 -08:00
Erich Keane 6e77ad11ff Add an assert in cpudispatch emit to try to track down an error.
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
2022-01-12 10:31:28 -08:00
Simon Pilgrim 497a4b26c4 CGBuiltin - Use castAs<> instead of getAs<> to avoid dereference of nullptr
The pointer is always dereferenced immediately below, so assert the cast is correct instead of returning nullptr
2022-01-12 15:35:37 +00:00
Marco Elver 732ad8ea62 [clang][auto-init] Provide __builtin_alloca*_uninitialized variants
When `-ftrivial-auto-var-init=` is enabled, allocas unconditionally
receive auto-initialization since [1].

In certain cases, it turns out, this is causing problems. For example,
when using alloca to add a random stack offset, as the Linux kernel does
on syscall entry [2]. In this case, none of the alloca'd stack memory is
ever used, and initializing it should be controllable; furthermore, it
is not always possible to safely call memset (see [2]).

Introduce `__builtin_alloca_uninitialized()` (and
`__builtin_alloca_with_align_uninitialized`), which never performs
initialization when `-ftrivial-auto-var-init=` is enabled.

[1] https://reviews.llvm.org/D60548
[2] https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com

Reviewed By: glider

Differential Revision: https://reviews.llvm.org/D115440
2022-01-12 15:13:10 +01:00
Sven van Haastregt 4b85800bfd [OpenCL] Set external linkage for block enqueue kernels
All kernels can be called from the host as per the SPIR_KERNEL calling
convention.  As such, all kernels should have external linkage, but
block enqueue kernels were created with internal linkage.

Reported-by: Pedro Olsen Ferreira

Differential Revision: https://reviews.llvm.org/D115523
2022-01-12 13:30:09 +00:00
Adam Magier b2715660ed [clang][CodeGen][UBSan] VLA size checking for unsigned integer parameter
The code generation for the UBSan VLA size check was qualified by a con-
dition that the parameter must be a signed integer, however the C spec
does not make any distinction that only signed integer parameters can be
used to declare a VLA, only qualifying that it must be greater than zero
if it is not a constant.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D116048
2022-01-12 01:11:52 +01:00
Nick Desaulniers 5c562f62a4 [clang] number labels in asm goto strings after tied inputs
I noticed that the following case would compile in Clang but not GCC:
    void *x(void) {
      void *p = &&foo;
      asm goto ("# %0\n\t# %l1":"+r"(p):::foo);
      foo:;
      return p;
    }

Changing the output template above from %l2 would compile in GCC but not
Clang.

This demonstrates that when using tied outputs (say via the "+r" output
constraint), the hidden inputs occur or are numbered BEFORE the labels,
at least with GCC.

In fact, GCC does denote this in its documentation:
https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels

> Output operand with constraint modifier ‘+’ is counted as two operands
> because it is considered as one output and one input operand.

For the sake of compatibility, I think it's worthwhile to just make this
change.

It's better to use symbolic names for compatibility (especially now
between released version of Clang that support asm goto with outputs).
ie. %l1 from the above would be %l[foo]. The GCC docs also make this
recommendation.

Also, I cleaned up some cruft in GCCAsmStmt::getNamedOperand. AFAICT,
NumPlusOperands was no longer used, though I couldn't find which commit
didn't clean that up correctly.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103640
Link: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D115471
2022-01-11 12:09:24 -08:00
Nick Desaulniers c8463fd22b [clang][CGStmt] emit i constraint rather than X for asm goto indirect dests
As suggested in:
https://reviews.llvm.org/D114895#3177794
X will be converted to i by SelectionDAGISEL anyways.

Reviewed By: void, jyknight

Differential Revision: https://reviews.llvm.org/D115311
2022-01-11 11:48:40 -08:00
Akira Hatanaka e5df9cc098 [CodeGen] Treat ObjC `__unsafe_unretained` and class types as trivial
when generating copy/dispose helper functions

Analyze the block captures just once before generating copy/dispose
block helper functions and honor the inert `__unsafe_unretained`
qualifier. This refactor fixes a bug where captures of ObjC
`__unsafe_unretained` and class types were needlessly retained/released
by the copy/dispose helper functions.

Differential Revision: https://reviews.llvm.org/D116948
2022-01-11 11:18:24 -08:00
Nikita Popov acc39873b7 [CodeGen] Avoid deprecated Address constructor 2022-01-11 13:07:02 +01:00
Hans Wennborg 0b48d0fe12 [ADT] Add an in-place version of toHex()
and use that to simplify MD5's hex string code which was previously
using a string stream, as well as Clang's
CGDebugInfo::computeChecksum().

Differential revision: https://reviews.llvm.org/D116960
2022-01-11 11:51:04 +01:00
Nikita Popov 2d1b55ebea [CodeGen] Make element type in emitArrayDestroy() predictable
When calling emitArrayDestroy(), the pointer will usually have
ConvertTypeForMem(EltType) as the element type, as one would expect.
However, globals with initializers sometimes don't use the same
types as values normally would, e.g. here the global uses
{ double, i32 } rather than %struct.T as element type.

Add an early cast to the global destruction path to avoid this
special case. The cast would happen lateron anyway, it only gets
moved to an earlier point.

Differential Revision: https://reviews.llvm.org/D116219
2022-01-11 09:25:29 +01:00
Jennifer Yu 140a6b1e5c [clang][OpenMP5.1] Initial parsing/sema for 'indirect' clause
Differential Revision: https://reviews.llvm.org/D116764
2022-01-10 16:58:56 -08:00
Adrian Prantl eb200e584e Emit the C++ dialect in -gmodules .pcm files.
Because of commit: https://reviews.llvm.org/D104291 the -gmodules .pcm
files do not have the same DW_AT_language dialect as the .o file. This
was a simple matter of passing the DebugStrictDwarf flag to the
PCHContainerGenerator object's CodeGenOpts from the CompilerInstance
passed in to it.

Before this change if you ran dwarfdump on the gmodule cache folder
you would get DW_AT_language (DW_LANG_C_plus_plus) even when using
-std=c++14 with clang

Patch by Shubham Rastogi!

Differential Revision: https://reviews.llvm.org/D116790
2022-01-10 16:13:40 -08:00
Nikita Popov 7725331ccd [CodeGen] Avoid some pointer element type accesses
Possibly this is sufficient to fix PR53089.
2022-01-10 15:02:55 +01:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Kazu Hirata 17d4bd3d78 [clang] Fix bugprone argument comments (NFC)
Identified with bugprone-argument-comment.
2022-01-09 00:19:49 -08:00
Kazu Hirata 40446663c7 [clang] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-09 00:19:47 -08:00
Johannes Doerfert 37639b72a1 [OpenMP][FIX] Emit debug declares only if debug info is available
The `EmitDeclareOfAutoVariable` introduced in D114504 and D115510 has a
precondition that cannot be violated. It is unclear if we should call it
directly given the sparse usage in clang but for now we should at least
not crash if the debug info kind is too low.

Fixes #52938.

Differential Revision: https://reviews.llvm.org/D116865
2022-01-08 17:01:19 -06:00
Kazu Hirata d1b127b5b7 [clang] Remove unused forward declarations (NFC) 2022-01-08 11:56:40 -08:00
Simon Pilgrim 6ee589e2f5 [CGObjCMac] Use castAs<> instead of getAs<> to avoid dereference of nullptr inside BuildRCBlockVarRecordLayout
This will assert the cast is correct instead of returning nullptr (UnionType is a subtype of RecordType so this should be clean).
2022-01-08 16:18:55 +00:00
Simon Pilgrim 06e9733fec [CGExpr] Use castAs<> instead of getAs<> to avoid dereference of nullptr
This will assert the cast is correct instead of returning nullptr
2022-01-08 14:26:09 +00:00
Jun Zhang 5be131922c
[NFC] Test commit.
This is just a test commit to check whether the permission I got is
correct or not.
2022-01-08 10:36:09 +08:00
Jun Zhang b2ed9f3f44
[Clang] Implement the rest of __builtin_elementwise_* functions.
The patch implement the rest of __builtin_elementwise_* functions
specified in D111529, including:
* __builtin_elementwise_floor
* __builtin_elementwise_roundeven
* __builtin_elementwise_trunc

Signed-off-by: Jun <jun@junz.org>

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D115429
2022-01-07 15:11:36 +00:00
Nikita Popov e8b98a5216 [CodeGen] Emit elementtype attributes for indirect inline asm constraints
This implements the clang side of D116531. The elementtype
attribute is added for all indirect constraints (*) and tests are
updated accordingly.

Differential Revision: https://reviews.llvm.org/D116666
2022-01-06 09:29:22 +01:00
David Pagan 7df2371bc6 Add codegen for allocate directive's 'align' clause 2022-01-05 12:40:58 -05:00
Chuanqi Xu c75cedc237 [Coroutines] Set presplit attribute in Clang and mlir
This fixes bug49264.

Simply, coroutine shouldn't be inlined before CoroSplit. And the marker
for pre-splited coroutine is created in CoroEarly pass, which ran after
AlwaysInliner Pass in O0 pipeline. So that the AlwaysInliner couldn't
detect it shouldn't inline a coroutine. So here is the error.

This patch set the presplit attribute in clang and mlir. So the inliner
would always detect the attribute before splitting.

Reviewed By: rjmccall, ezhulenev

Differential Revision: https://reviews.llvm.org/D115790
2022-01-05 10:25:02 +08:00
serge-sans-paille 9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Jun Zhang 82020de532
Recommit "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.""
This reverts the revert commit f552ba6e84.

Recommit with fixed author name.
2022-01-04 13:46:41 +00:00
Florian Hahn f552ba6e84
Revert "[Clang] Extend emitUnaryBuiltin to avoid duplicate logic."
This reverts commit 5c57e6aa57.

Reverted due to a typo in the authors name. Will recommit soon with
fixed authorship.
2022-01-04 13:45:28 +00:00
Jun Zhan 5c57e6aa57
[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.
This patch extends `emitUnaryBuiltin` so that we can better emitting IR when
implement builtins specified in D111529.

Also contains some NFC, applying it to existing code.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D116161
2022-01-04 11:47:41 +00:00
Kazu Hirata d677a7cb05 [clang] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-02 10:20:23 -08:00
Kazu Hirata 683e6ee7d0 [CodeGen] Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2022-01-01 09:14:23 -08:00
Kazu Hirata 298367ee6e [clang] Use nullptr instead of 0 or NULL (NFC)
Identified with modernize-use-nullptr.
2021-12-29 08:34:20 -08:00
Johannes Doerfert 944aa0421c Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 73ece231ee and
reapplies 7bfcdbcbf3 with mlir changes.
Also reverts commit 423ba12971 and
includes the unit test changes of
16da214004.
2021-12-29 01:10:38 -06:00
Mehdi Amini 73ece231ee Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 7bfcdbcbf3.
Broke MLIR build
2021-12-29 06:57:36 +00:00
Johannes Doerfert 7bfcdbcbf3 [OpenMP][NFCI] Embed the source location string size in the ident_t
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.

Differential Revision: https://reviews.llvm.org/D113126
2021-12-28 23:53:29 -06:00
Joseph Huber 7cdaa5a94e [OpenMP][FIX] Change globalization alignment to 16
This patch changes the default aligntment from 8 to 16, and encodes this
information in the `__kmpc_alloc_shared` runtime call to communicate it
to the HeapToStack pass. The previous alignment of 8 was not sufficient
for the maximum size of primitive types on 64-bit systems, and needs to
be increaesd. This reduces the amount of space availible in the data
sharing stack, so this implementation will need to be improved later to
include the alignment requirements in the allocation call, and use it
properly in the data sharing stack in the runtime.

Depends on D115888

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115971
2021-12-27 16:58:25 -05:00
Nikita Popov 3e65861131 [CodeGen] Avoid one more pointer element type access
The number of elements is always a SizeTy here.
2021-12-27 12:58:22 +01:00
Nikita Popov 1f07a4a569 [CodeGen] Avoid more pointer element type accesses 2021-12-27 12:00:22 +01:00
Shao-Ce SUN ec501f15a8 [clang][CodeGen] Remove the signed version of createExpression
Fix a TODO. Remove the callers of this signed version and delete.

Reviewed By: CodaFi

Differential Revision: https://reviews.llvm.org/D116014
2021-12-27 14:16:08 +08:00
Kazu Hirata 31cfb3f4f6 [clang] Remove redundant calls to c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-12-26 13:31:40 -08:00
Kazu Hirata 0542d15211 Remove redundant string initialization (NFC)
Identified with readability-redundant-string-init.
2021-12-26 09:39:26 -08:00
Kazu Hirata 2d303e6781 Remove redundant return and continue statements (NFC)
Identified with readability-redundant-control-flow.
2021-12-24 23:17:54 -08:00
Kazu Hirata 76f0f1cc5c Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-12-24 21:43:06 -08:00
Kazu Hirata 9c0a4227a9 Use Optional::getValueOr (NFC) 2021-12-24 20:57:40 -08:00
Shilei Tian c7a589a2c4 [Clang][OpenMP] Add the support for atomic compare in parser
This patch adds the support for `atomic compare` in parser. The support
in Sema and CodeGen will come soon. For now, it simply eimits an error when it
is encountered.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D115561
2021-12-24 08:16:51 -05:00
Nikita Popov dd903173c0 [OpenMP] Avoid creating null pointer lvalue (NFC)
The reduction initialization code creates a "naturally aligned null
pointer to void lvalue", which I found somewhat odd, even though it
works out in the end because it is not actually used. It doesn't
look like this code actually needs an LValue for anything though,
and we can use an invalid Address to represent this case instead.

Differential Revision: https://reviews.llvm.org/D116214
2021-12-24 09:01:56 +01:00
Chuanqi Xu f3d4e168db [C++20] Conform coroutine's comments in clang (NFC-ish)
The comments for coroutine in clang wrote for coroutine-TS. Now
coroutine is merged into standard. Try to conform the comments.
2021-12-24 12:41:44 +08:00
Nikita Popov 7977fd7cfc [OpenMP] Remove no-op cast (NFC)
This was casting the address to its own element type, which is
a no-op.
2021-12-23 15:15:26 +01:00
Nikita Popov bf2b5551f9 [CodeGen] Use CreateConstInBoundsGEP() in one more place
This does exactly what this code manually implemented.
2021-12-23 14:58:47 +01:00
Nikita Popov 2c7dc13146 [CGBuilder] Add CreateGEP() overload that accepts an Address
Add an overload for an Address and a single non-constant offset.
This makes it easier to preserve the element type and adjust the
alignment appropriately.
2021-12-23 14:53:42 +01:00
Nikita Popov 53f0538181 [CodeGen] Use correct element type for store to sret
sret is special in that it does not use the memory type
representation. Manually construct the LValue using ConvertType
instead of ConvertTypeForMem here.

This fixes matrix-lowering-opt-levels.c on s390x.
2021-12-23 13:02:49 +01:00
Nikita Popov 09669e6c5f [CodeGen] Avoid pointer element type access when creating LValue
This required fixing two places that were passing the pointer type
rather than the expected pointee type to the method.
2021-12-23 10:53:15 +01:00
Nikita Popov 1201a0f395 [OpenMP] Fix incorrect type when casting from uintptr
MakeNaturalAlignAddrLValue() expects the pointee type, but the
pointer type was passed. As a result, the natural alignment of
the pointer (usually 8) was always used in place of the natural
alignment of the value type.

Differential Revision: https://reviews.llvm.org/D116171
2021-12-23 08:57:11 +01:00
Krzysztof Parzyszek dcb3e8083a [Hexagon] Make conversions to vector predicate types explicit for builtins
HVX does not have load/store instructions for vector predicates (i.e. bool
vectors). Because of that, vector predicates need to be converted to another
type before being stored, and the most convenient representation is an HVX
vector.
As a consequence, in C/C++, source-level builtins that either take or
produce vector predicates take or return regular vectors instead. On the
other hand, the corresponding LLVM intrinsics do have boolean types that,
and so a conversion of the operand or the return value was necessary.
This conversion would happen inside clang's codegen, but was somewhat
fragile.

This patch changes the strategy: a builtin that takes a vector predicate
now really expects a vector predicate. Since such a predicate cannot be
provided via a variable, this builtin must be composed with other builtins
that either convert vector to a predicate (V6_vandvrt) or predicate to a
vector (V6_vandqrt).

For users using builtins defined in hvx_hexagon_protos.h there is no impact:
the conversions were added to that file. Other users will need to insert
- __builtin_HEXAGON_V6_vandvrt[_128B](V, -1) to convert vector V to a
  vector predicate, or
- __builtin_HEXAGON_V6_vandqrt[_128B](Q, -1) to convert vector predicate Q
  to a vector.

Builtins __builtin_HEXAGON_V6_vmaskedstore.* are a temporary exception to
that, but they are deprecated and should not be used anyway. In the future
they will either follow the same rule, or be removed.
2021-12-22 12:52:24 -08:00
Jeremy Morse ea22fdd120 [Clang][DebugInfo] Cease turning instruction-referencing off by default
Over in D114631 I turned this debug-info feature on by default, for x86_64
only. I'd previously stripped out the clang cc1 option that controlled it
in 651122fc4a, unfortunately that turned out to not be completely
effective, and the two things deleted in this patch continued to keep it
off-by-default.  Oooff.

As a follow-up, this patch removes the last few things to do with
ValueTrackingVariableLocations from clang, which was the original purpose
of D114631. In an ideal world, if this patch causes you trouble you'd
revert 3c04507088 instead, which was where this behaviour was supposed
to start being the default, although that might not be practical any more.
2021-12-22 16:30:05 +00:00
Alok Kumar Sharma 5eb271880c [clang][OpenMP][DebugInfo] Debug support for variables in shared clause of OpenMP task construct
Currently variables appearing inside shared clause of OpenMP task construct
are not visible inside lldb debugger.

After the current patch, lldb is able to show the variable

```
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400934 a.out`.omp_task_entry. [inlined] .omp_outlined.(.global_tid.=0, .part_id.=0x000000000071f0d0, .privates.=0x000000000071f0e8, .copy_fn.=(a.out`.omp_task_privates_map. at testshared.cxx:8), .task_t.=0x000000000071f0c0, __context=0x000000000071f0f0) at testshared.cxx:10:34
   7      else {
   8    #pragma omp task shared(svar) firstprivate(n)
   9        {
-> 10         printf("Task svar = %d\n", svar);
   11         printf("Task n = %d\n", n);
   12         svar = fib(n - 1);
   13       }
(lldb) p svar
(int) $0 = 9
```

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D115510
2021-12-22 20:04:21 +05:30
Jun Zhan b55ea2fbc0
[Clang] Add __builtin_reduce_xor
This patch implements __builtin_reduce_xor as specified in D111529.

Reviewed By: fhahn, aaron.ballman

Differential Revision: https://reviews.llvm.org/D115231
2021-12-22 10:00:27 +00:00
Alexandre Ganea a282ea4898 Reland - [CodeView] Emit S_OBJNAME record
Reland integrates build fixes & further review suggestions.

Thanks to @zturner for the initial S_OBJNAME patch!

Differential Revision: https://reviews.llvm.org/D43002
2021-12-21 19:02:14 -05:00
Alexandre Ganea 5bb5142e80 Revert [CodeView] Emit S_OBJNAME record
Also revert all subsequent fixes:
- abd1cbf5e5 [Clang] Disable debug-info-objname.cpp test on Unix until I sort out the issue.
- 00ec441253 [Clang] debug-info-objname.cpp test: explictly encode a x86 target when using %clang_cl to avoid falling back to a native CPU triple.
- cd407f6e52 [Clang] Fix build by restricting debug-info-objname.cpp test to x86.
2021-12-21 19:02:14 -05:00
Nikita Popov a995cdab19 [CodeGen] Avoid more pointer element type accesses 2021-12-21 15:52:18 +01:00
Alexandre Ganea f44e3fbadd [CodeView] Emit S_OBJNAME record
Thanks to @zturner for the initial patch!

Differential Revision: https://reviews.llvm.org/D43002
2021-12-21 09:26:36 -05:00
Nikita Popov 9a05a7b00c [CodeGen] Accept Address in CreateLaunderInvariantGroup
Add an overload that accepts and returns an Address, as we
generally just want to replace the pointer with a laundered one,
while retaining remaining information.
2021-12-21 14:43:20 +01:00
Nikita Popov e751d97863 [CodeGen] Avoid some pointer element type accesses
This avoids some pointer element type accesses when compiling
C++ code.
2021-12-21 14:16:28 +01:00
Nikita Popov 55d7a12b86 [CodeGen] Avoid pointee type access during global var declaration
All callers pass in a GlobalVariable, so we can conveniently fetch
the type from there.
2021-12-21 11:48:37 +01:00
Sami Tolvanen ec2e26eaf6 [Clang] Add __builtin_function_start
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.

This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.

Link: https://github.com/ClangBuiltLinux/linux/issues/1353

Depends on D108478

Reviewed By: pcc, rjmccall

Differential Revision: https://reviews.llvm.org/D108479
2021-12-20 12:55:33 -08:00
Ellis Hoag ac719d7c9a [InstrProf] Don't profile merge by default in lightweight mode
Profile merging is not supported when using debug info profile
correlation because the data section won't be in the binary at runtime.
Change the default profile name in this mode to `default_%p.proflite` so
we don't use profile merging.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D115979
2021-12-20 09:51:49 -08:00
Sam McCall af27466c50 Reland "[AST] Add UsingType: a sugar type for types found via UsingDecl"
This reverts commit cc56c66f27.
Fixed a bad assertion, the target of a UsingShadowDecl must not have
*local* qualifiers, but it can be a typedef whose underlying type is qualified.
2021-12-20 18:03:15 +01:00
Sam McCall cc56c66f27 Revert "[AST] Add UsingType: a sugar type for types found via UsingDecl"
This reverts commit e1600db19d.

Breaks sanitizer tests, at least on windows:
https://lab.llvm.org/buildbot/#/builders/127/builds/21592/steps/4/logs/stdio
2021-12-20 17:53:56 +01:00
Sam McCall e1600db19d [AST] Add UsingType: a sugar type for types found via UsingDecl
Currently there's no way to find the UsingDecl that a typeloc found its
underlying type through. Compare to DeclRefExpr::getFoundDecl().

Design decisions:
- a sugar type, as there are many contexts this type of use may appear in
- UsingType is a leaf like TypedefType, the underlying type has no TypeLoc
- not unified with UnresolvedUsingType: a single name is appealing,
  but being sometimes-sugar is often fiddly.
- not unified with TypedefType: the UsingShadowDecl is not a TypedefNameDecl or
  even a TypeDecl, and users think of these differently.
- does not cover other rarer aliases like objc @compatibility_alias,
  in order to be have a concrete API that's easy to understand.
- implicitly desugared by the hasDeclaration ASTMatcher, to avoid
  breaking existing patterns and following the precedent of ElaboratedType.

Scope:
- This does not cover types associated with template names introduced by
  using declarations. A future patch should introduce a sugar TemplateName
  variant for this. (CTAD deduced types fall under this)
- There are enough AST matchers to fix the in-tree clang-tidy tests and
  probably any other matchers, though more may be useful later.

Caveats:
- This changes a fairly common pattern in the AST people may depend on matching.
  Previously, typeLoc(loc(recordType())) matched whether a struct was
  referred to by its original scope or introduced via using-decl.
  Now, the using-decl case is not matched, and needs a separate matcher.
  This is similar to the case of typedefs but nevertheless both adds
  complexity and breaks existing code.

Differential Revision: https://reviews.llvm.org/D114251
2021-12-20 17:15:38 +01:00
Yaxun (Sam) Liu a6786cdd57 [HIPSPV][3/4] Enable SPIR-V emission for HIP
This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.

‘--offload’ option, which is envisioned in [1], is added for specifying
offload targets. This option is used to override default device target
(amdgcn-amd-amdhsa) for HIP compilation for emitting device code as
SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().

getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select HIPSPVToolChain when HIP offload
target is ‘spirv64’.

The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. HIPSPV tool chain expects to receive HIP device code as LLVM
IR so it can run external LLVM passes over them. HIPSPV TC is also
responsible for emitting the SPIR-V binary.

A Cuda GPU architecture ‘generic’ is added. The name is picked from
the LLVM SPIR-V Backend. In the HIPSPV code path the architecture
name is inserted to the bundle entry ID as target ID. Target ID is
expected to be always present so a component in the target triple
is not mistaken as target ID.

Tests are added for checking the HIPSPV tool chain.

[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html

Patch by: Henry Linjamäki

Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader

Differential Revision: https://reviews.llvm.org/D110622
2021-12-20 10:45:09 -05:00
jacquesguan 9c11e95286 [Clang][RISCV] Fix upper bound of RISC-V V type in debug info
The UpperBound of RVV type in debug info should be elements count minus one,
as the LowerBound start from zero.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D115430
2021-12-20 14:25:06 +08:00
Esme-Yi 18f087c21c [DebugInfo][Clang] record the access flag for class/struct/union types.
Summary: This patch records the access flag for
class/struct/union types in the clang part.

The summary of binary size change and debug info size change due to the DW_AT_accessibility attribute are as the following table. They are built with flags of `clang -O0 -g` (no -gz).

| section | before | after | change | % |
| .debug_loc | 929821 | 929821 |0|0|
|.debug_abbrev | 5885289 | 5971547 |+86258|+1.466%|
|.debug_info | 497613455 | 498122074 |+508619|+0.102%|
|.debug_ranges | 45731664 | 45731664 |0|0|
|.debug_str | 233842595 | 233839388 |-3207| -0.001%|
|.debug_line | 149773166 | 149764583 |-8583|-0.006%|
|total (debug) |933775990 |934359077|+583087 |+0.062%|

|total (binary) |1394617288 | 1395200024| +582736|+0.042%|

Reviewed By: dblaikie, shchenz

Differential Revision: https://reviews.llvm.org/D115503
2021-12-20 02:40:42 +00:00
Sanjay Patel 1965cc4695 [CodeGen] remove creation of FP cast function attribute
This is the last cleanup step resulting from D115804 .
Now that clang uses intrinsics when we're in the special FP mode,
we don't need a function attribute as an indicator to the backend.
The LLVM part of the change is in D115885.

Differential Revision: https://reviews.llvm.org/D115886
2021-12-19 11:55:00 -05:00
Kazu Hirata 713ee230f8 [clang] Use llvm::reverse (NFC) 2021-12-17 16:51:42 -08:00
Nikita Popov 9e45146721 [CodeGen] Fix element type for sret argument
Fix a mistake in 9bf917394eba3ba4df77cc17690c6d04f4e9d57f: sret
arguments use ConvertType, not ConvertTypeForMem, see the handling
in CodeGenTypes::GetFunctionType().

This fixes fp-matrix-pragma.c on s390x.
2021-12-17 16:13:28 +01:00
Nikita Popov 9bf917394e [CodeGen] Avoid more pointer element type accesses 2021-12-17 12:11:50 +01:00
Nikita Popov ba31cb4d38 [CodeGen] Store element type in RValue
For aggregates, we need to store the element type to be able to
reconstruct the aggregate Address. This increases the size of this
packed structure (as the second value is already used for alignment
in this case), but I did not observe any compile-time or memory
usage regression from this change.
2021-12-17 09:05:59 +01:00
Ellis Hoag 58d9c1aec8 [Try2][InstrProf] Attach debug info to counters
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.

Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

The original diff https://reviews.llvm.org/D114565 was reverted because of the `Instrumentation/InstrProfiling/debug-info-correlate.ll` test, which is fixed in this commit.

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D115693
2021-12-16 14:20:30 -08:00
Mike Rice 2d0bf14397 [clang] Cleanup unneeded Function nullptr checks [NFC]
Add an assert and avoid unneeded checks of Fn in
CodeGenFunction::GenerateCode.

Differential Revision: https://reviews.llvm.org/D115817
2021-12-16 08:28:10 -08:00
Nikita Popov 2d89382b5a [CodeGen] Avoid more pointer element type accesses
This is enough to build sqlite3 with opaque pointers.
2021-12-16 16:34:09 +01:00
Nikita Popov 8285522014 [CodeGen] Always update map entry after adding initializer
With opaque pointers the pointer cast may be a no-op, such that
var and castedAddr are the same. However, we still need to update
the map entry as the underlying global changed. We could explicitly
check whether the global was replaced, but we may as well just
always update the entry.
2021-12-16 16:29:35 +01:00
Nikita Popov a0cf066eac [CodeGen] Store element type in ParamValue
ParamValue is basically a union between an Address and a Value*.
To be able to reconstruct the Address, we now need to store the
pointer element type.
2021-12-16 15:31:55 +01:00
Nikita Popov 58c8c53263 [CodeGen] Avoid more pointer element type accesses 2021-12-16 15:26:21 +01:00
Sanjay Patel 8c7f2a4f87 [CodeGen] use saturating FP casts when compiling with "no-strict-float-cast-overflow"
We got an unintended consequence of the optimizer getting smarter when
compiling in a non-standard mode, and there's no good way to inhibit
those optimizations at a later stage. The test is based on an example
linked from D92270.

We allow the "no-strict-float-cast-overflow" exception to normal C
cast rules to preserve legacy code that does not expect overflowing
casts from FP to int to produce UB. See D46236 for details.

Differential Revision: https://reviews.llvm.org/D115804
2021-12-16 09:10:12 -05:00
Nikita Popov 34eb715f61 [CodeGen] Avoid more pointer element type accesses 2021-12-16 12:03:11 +01:00
Nikita Popov 9fa15e0073 [CodeGen] Remove an unused MakeAddrLValue() overload (NFC)
This is unused and we should prefer the overloads accepting Address.
2021-12-16 11:49:20 +01:00
Nikita Popov 6bca9a428e [CodeGen] Store ElementType in LValue
Store the pointer element type inside LValue so that we can
preserve it when converting it back into an Address. Storing the
pointer element type might not be strictly required here in that
we could probably re-derive it from the QualType (which would
require CGF access though), but storing it seems like the simpler
solution.

The global register case is special and does not store an element
type, as the value is not a pointer type in that case and it's not
possible to create an Address from it.

This is the main remaining part from D103465.

Differential Revision: https://reviews.llvm.org/D115791
2021-12-16 09:23:33 +01:00
Nikita Popov b9492ec649 [CodeGen] Avoid some pointer element type accesses 2021-12-15 14:46:10 +01:00
Nikita Popov d930c3155c [CodeGen] Pass element type to EmitCheckedInBoundsGEP()
Same as for other GEP creation methods.
2021-12-15 14:03:33 +01:00
Nikita Popov 90bbf79c7b [CodeGen] Avoid some deprecated Address constructors
Some of these are on the critical path towards making something
minimal work with opaque pointers.
2021-12-15 12:45:23 +01:00
Nikita Popov 481de0ed80 [CodeGen] Prefer CreateElementBitCast() where possible
CreateElementBitCast() can preserve the pointer element type in
the presence of opaque pointers, so use it in place of CreateBitCast()
in some places. This also sometimes simplifies the code a bit.
2021-12-15 11:48:39 +01:00
Nikita Popov 834c8ff587 [CodeGen] Avoid some uses of deprecated Address constructor
Explicitly pass in the element type instead.
2021-12-15 11:13:10 +01:00
Nikita Popov c3b624a191 [CodeGen] Avoid deprecated ConstantAddress constructor
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.

For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
2021-12-15 10:42:41 +01:00
Nikita Popov b4f46555d7 [CodeGen] Avoid some pointer element type accesses 2021-12-15 09:29:27 +01:00
Nikita Popov abbc2e997b [CodeGen] Store ElementType in Address
Explicitly track the pointer element type in Address, rather than
deriving it from the pointer type, which will no longer be possible
with opaque pointers. This just adds the basic facility, for now
everything is still going through the deprecated constructors.

I had to adjust one place in the LValue implementation to satisfy
the new assertions: Global registers are represented as a
MetadataAsValue, which does not have a pointer type. We should
avoid using Address in this case.

This implements a part of D103465.

Differential Revision: https://reviews.llvm.org/D115725
2021-12-15 08:59:44 +01:00
Sindhu Chittireddy 4706a297fb Avoid setting tbaa on the store of return type of call to inline assembler.
In 32bit mode, attaching TBAA metadata to the store following the call
to inline assembler results in describing the wrong type by making a
fake lvalue(i.e., whatever the inline assembler happens to leave in
EAX:EDX.) Even if inline assembler somehow describes the correct type,
setting TBAA information on return type of call to inline assembler is
likely not correct, since TBAA rules need not apply to inline assembler.

Differential Revision: https://reviews.llvm.org/D115320
2021-12-14 17:40:33 -08:00
Nikita Popov b81450afb6 [CodeGen] Add std:: qualifier
Hopefully addresses the buildbot failures.
2021-12-14 12:17:55 +01:00
Nikita Popov b8d121eb1d [CodeGen] Require use of Address::invalid() for invalid address (NFC)
This no longer allows creating an invalid Address through the regular
constructor. There were only two places that did this (AggValueSlot
and EHCleanupScope) which did this by converting a potential nullptr
into an Address. I've fixed both of these by directly storing an
Address instead.

This is intended as a bit of preliminary cleanup for D103465.

Differential Revision: https://reviews.llvm.org/D115630
2021-12-14 12:06:05 +01:00
Ellis Hoag c809da7d9c Revert "[InstrProf] Attach debug info to counters"
This reverts commit 800bf8ed29.

The `Instrumentation/InstrProfiling/debug-info-correlate.ll` test was
failing because I forgot the `llc` commands are architecture specific.
I'll follow up with a fix.

Differential Revision: https://reviews.llvm.org/D115689
2021-12-13 18:15:17 -08:00
Ellis Hoag 800bf8ed29 [InstrProf] Attach debug info to counters
Add the llvm flag `-debug-info-correlate` to attach debug info to instrumentation counters so we can correlate raw profile data to their functions. Raw profiles are dumped as `.proflite` files. The next diff enables `llvm-profdata` to consume `.proflite` and debug info files to produce a normal `.profdata` profile.

Part of the "lightweight instrumentation" work: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D114565
2021-12-13 17:51:22 -08:00
Ethan Stewart d1327f8a57 [clang][amdgpu] - Choose when to promote VarDecl to address space 4.
There are instances where clang codegen creates stores to
address space 4 in ctors, which causes a crash in llc.
This store was being optimized out at opt levels > 0.

For example:

pragma omp declare target
static  const double log_smallx = log2(smallx);
pragma omp end declare target

This patch ensures that any global const that does not
have constant initialization stays in address space 1.

Note - a second patch is in the works where all global
constants are placed in address space 1 during
codegen and then the opt pass InferAdressSpaces
will promote to address space 4 where necessary.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115661
2021-12-13 16:31:24 -06:00
Matt Devereau 41def32040 [AArch64][SVE][NEON] Add NEON-SVE-Bridge intrinsics
Adds svset_neonq, svget_neonq, svdup_neonq AArch64 intrinsics.

These are described in the ACLE specification:
https://github.com/ARM-software/acle/pull/72

https://reviews.llvm.org/D114713
2021-12-13 11:31:57 +00:00
Andrew Browne 7c004c2bc9 Revert "[asan] Add support for disable_sanitizer_instrumentation attribute"
This reverts commit 2b554920f1.

This change causes tsan test timeout on x86_64-linux-autoconf.

The timeout can be reproduced by:
  git clone https://github.com/llvm/llvm-zorg.git
  BUILDBOT_CLOBBER= BUILDBOT_REVISION=eef8f3f85679c5b1ae725bade1c23ab7bb6b924f llvm-zorg/zorg/buildbot/builders/sanitizers/buildbot_standard.sh
2021-12-10 14:33:38 -08:00
Alexander Potapenko 2b554920f1 [asan] Add support for disable_sanitizer_instrumentation attribute
For ASan this will effectively serve as a synonym for
__attribute__((no_sanitize("address")))

Differential Revision: https://reviews.llvm.org/D114421
2021-12-10 12:17:26 +01:00
Joseph Huber bc9c4d7216 [OpenMP][FIX] Pass the num_threads value directly to parallel_51
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.

In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D113623
2021-12-09 16:30:29 -05:00
Chuanqi Xu 352e36e10d [Coroutines] Remove unused coroutine builtin/intrinsics llvm.coro.param (NFC-ish)
I found that the coroutine intrinsic llvm.coro.param in documentation
(https://llvm.org/docs/Coroutines.html#id101) didn't get used actually
since there isn't lowering codes in LLVM. I also checked the
implementation of libstdc++ and libc++. Both of them didn't use
llvm.coro.param. So I am pretty sure that the llvm.coro.param intrinsic
is unused. I think it would be better t to remove it to avoid possible
misleading understandings.

Note: according to [class.copy.elision]/p1.3, this optimization is
allowed by the C++ language specification. Let's make it someday.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D115222
2021-12-09 14:40:25 +08:00
Duncan P. N. Exon Smith cfd1d49dc0 OpenMP: Avoid using SmallVector::set_size()
Update `OpenMPIRBuilder::collapseLoops()` to call `resize()` instead of
`set_size()`. The latter asserts on capacity limits and cannot grow,
which seems likely to be unintentional here (if it is, I think a local
assertion would be good for clarity).

Also update `CodeGenFunction::EmitOMPCollapsedCanonicalLoopNest()` to
use `pop_back_n()` instead of `set_size()`.

Differential Revision: https://reviews.llvm.org/D115378
2021-12-08 15:22:50 -08:00
Jun Zhang 8680f951c2 Add __builtin_elementwise_ceil
This patch implements one of the missing builtin functions specified
in https://reviews.llvm.org/D111529.
2021-12-08 08:29:33 -05:00
Henry Linjamäki 9ae5810b53 [HIPSPV] Convert HIP kernels to SPIR-V kernels
This patch translates HIP kernels to SPIR-V kernels when the HIP
compilation mode is targeting SPIR-S. This involves:

* Setting Cuda calling convention to CC_OpenCLKernel (which maps to
  SPIR_KERNEL in LLVM IR later on).

* Coercing pointer arguments with default address space (AS) qualifier
  to CrossWorkGroup AS (__global in OpenCL). HIPSPV's device code is
  ultimately SPIR-V for OpenCL execution environment (as
  starter/default) where Generic or Function (OpenCL's private) is not
  supported as storage class for kernel pointer types. This leaves the
  CrossWorkGroup to be the only reasonable choice for HIP buffers.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D109818
2021-12-08 12:18:15 +03:00
Yaxun (Sam) Liu 3b172f60c6 [HIP] Fix -fgpu-rdc for Windows
This patch fixes issues for -fgpu-rdc for Windows MSVC
toolchain:

Fix COFF specific section flags and remove section types
in llvm-mc input file for Windows.

Escape fatbin path in llvm-mc input file.

Add -triple option to llvm-mc.

Put __hip_gpubin_handle in comdat when it has linkonce_odr
linkage.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D115039
2021-12-06 16:42:23 -05:00
Aaron Ballman 6c75ab5f66 Introduce _BitInt, deprecate _ExtInt
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:

* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.

I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
2021-12-06 12:52:01 -05:00
Jonas Devlieghere 4cb79294e8 Revert "[clang][DebugInfo] Allow function-local statics and types to be scoped within a lexical block"
This reverts commit e403f4fdc8 because it
breaks TestSetData.py on GreenDragon:

https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/39089/
2021-12-06 09:34:53 -08:00
Kristina Bessonova e403f4fdc8 [clang][DebugInfo] Allow function-local statics and types to be scoped within a lexical block
This is almost a reincarnation of https://reviews.llvm.org/D15977 originally
implemented by Amjad Aboud. It was discussed on llvm-dev [0], committed
with its backend counterpart [1], but finally reverted [2].

This patch makes clang to emit debug info for function-local static variables,
records (classes, structs and unions) and typdefs correctly scoped if
those function-local entites defined within a lexical (bracketed) block.

Before this patch, clang emits all those entities directly scoped in
DISubprogram no matter where they were really defined, causing
debug info loss (reported several times in [3], [4], [5]).

[0] https://lists.llvm.org/pipermail/llvm-dev/2015-November/092551.html
[1] https://reviews.llvm.org/rG30e7a8f694a19553f64b3a3a5de81ce317b9ec2f
[2] https://reviews.llvm.org/rGdc4531e552af6c880a69d226d3666756198fbdc8
[3] https://bugs.llvm.org/show_bug.cgi?id=19238
[4] https://bugs.llvm.org/show_bug.cgi?id=23164
[5] https://bugs.llvm.org/show_bug.cgi?id=44695

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D113743
2021-12-06 12:19:09 +02:00
Jay Foad 2774bad112 [AMDGPU] Change llvm.amdgcn.image.bvh.intersect.ray to take vec3 args
The ray_origin, ray_dir and ray_inv_dir arguments should all be vec3 to
match how the hardware instruction works.

Don't change the API of the corresponding OpenCL builtins.

Differential Revision: https://reviews.llvm.org/D115032
2021-12-04 10:32:11 +00:00
Peter Collingbourne 0a14674f27 CodeGen: Strip exception specifications from function types in CFI type names.
With C++17 the exception specification has been made part of the
function type, and therefore part of mangled type names.

However, it's valid to convert function pointers with an exception
specification to function pointers with the same argument and return
types but without an exception specification, which means that e.g. a
function of type "void () noexcept" can be called through a pointer
of type "void ()". We must therefore consider the two types to be
compatible for CFI purposes.

We can do this by stripping the exception specification before mangling
the type name, which is what this patch does.

Differential Revision: https://reviews.llvm.org/D115015
2021-12-03 14:50:52 -05:00
Qiu Chaofan b9adaa1782 [PowerPC] [Clang] Fix alignment adjustment of single-elemented float128
This does similar thing to 6b1341e, but fixes single element 128-bit
float type: `struct { long double x; }`.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D114937
2021-12-03 18:07:34 +08:00
Qiu Chaofan 4f94c02616 [Clang] Mutate bulitin names under IEEE128 on PPC64
Glibc 2.32 and newer uses these symbol names to support IEEE-754 128-bit
float. GCC transforms name of these builtins to align with Glibc header
behavior.

Since Clang doesn't have all GCC-compatible builtins implemented, this
patch only mutates the implemented part.

Note nexttoward is a special case (no nexttowardf128) so it's also
handled here.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D112401
2021-12-03 17:50:18 +08:00
Matt Arsenault 2f0a571418 Reapply "OpenMP: Start calling setTargetAttributes for generated kernels"
This reverts commit 25eb7fa01d.

Previous buildbot failures appear to have been a fluke from a dirty
build.
2021-12-02 14:55:56 -05:00
David Greene 53adfa8750 [clang] Do not duplicate "EnableSplitLTOUnit" module flag
If clang's output is set to bitcode and LTO is enabled, clang would
unconditionally add the flag to the module.  Unfortunately, if the input were a
bitcode or IR file and had the flag set, this would result in two copies of the
flag, which is illegal IR.  Guard the setting of the flag by checking whether it
already exists.  This follows existing practice for the related "ThinLTO" module
flag.

Differential Revision: https://reviews.llvm.org/D112177
2021-12-02 08:24:56 -08:00
skc7 16b781e6d1 [AMDGPU][clang] Fix __builtin_nontemporal_store() failure on AMDGPU
Reviewed By: yaxunl, sameerds

Differential Revision: https://reviews.llvm.org/D114849
2021-12-02 05:53:25 +00:00
Ties Stuij e3b2f0226b [clang][ARM] PACBTI-M frontend support
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.

These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):

- command-line options are "translated" to module-level LLVM IR
  attributes (metadata).

- functions have PAC/BTI specific attributes iff the
  __attribute__((target("branch-protection=...))) was used in the function
  declaration.

- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:

branch-protection ::= "-mbranch-protection=" <protection>
protection ::=  "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
                | <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]

b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.

- Handle _attribute_((target(("branch-protection=..."))) for AArch32 with the
same grammer as the commandline options.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Victor Campos
- Ties Stuij

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D112421
2021-12-01 10:37:16 +00:00
Matt Arsenault 25eb7fa01d Revert "OpenMP: Start calling setTargetAttributes for generated kernels"
This reverts commit 6c27d389c8.

This is failing on the buildbots
2021-11-29 15:47:10 -05:00
Anshil Gandhi df0560ca00 [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang
Introduce `__hip_atomic_load`, `__hip_atomic_store` and `__hip_atomic_compare_exchange_weak`
builtins in HIP.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D114553
2021-11-29 12:07:13 -07:00
Matt Arsenault 6c27d389c8 OpenMP: Start calling setTargetAttributes for generated kernels
This wasn't setting any of the attributes the target would expect to
emit for kernels.
2021-11-29 13:43:34 -05:00
Erich Keane fc53eb69c2 Reapply 'Implement target_clones multiversioning'
See discussion in D51650, this change was a little aggressive in an
error while doing a 'while we were here', so this removes that error
condition, as it is apparently useful.

This reverts commit bb4934601d.
2021-11-29 06:30:01 -08:00
Alok Kumar Sharma 36cb7477d1 [clang][OpenMP][DebugInfo] Debug support for private variables inside an OpenMP task construct
Currently variables appearing inside private/firstprivate/lastprivate
clause of openmp task construct are not visible inside lldb debugger.
This is because compiler does not generate debug info for it.

Please consider the testcase debug_private.c attached with patch.

```
   28   #pragma omp task shared(res) private(priv1, priv2) firstprivate(fpriv)
   29       {
   30         priv1 = n;
   31         priv2 = n + 2;
   32         printf("Task n=%d,priv1=%d,priv2=%d,fpriv=%d\n",n,priv1,priv2,fpriv);
   33
-> 34         res = priv1 + priv2 + fpriv + foo(n - 1);
   35       }
   36   #pragma omp taskwait
   37       return res;
(lldb) p priv1
error: <user expression 0>:1:1: use of undeclared identifier 'priv1'
priv1
^
(lldb) p priv2
error: <user expression 1>:1:1: use of undeclared identifier 'priv2'
priv2
^
(lldb) p fpriv
error: <user expression 2>:1:1: use of undeclared identifier 'fpriv'
fpriv
^
```

After the current patch, lldb is able to show the variables

```
(lldb) p priv1
(int) $0 = 10
(lldb) p priv2
(int) $1 = 12
(lldb) p fpriv
(int) $2 = 14
```

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D114504
2021-11-25 19:55:22 +05:30
Yaxun (Sam) Liu aa9b90ca44 Fix warning due to default switch label
Fix warning due to default label in switch which covers all enumeration values
2021-11-23 10:52:51 -05:00
Yaxun (Sam) Liu e13246a2ec [HIP] Add HIP scope atomic operations
Add an AtomicScopeModel for HIP and support for OpenCL builtins
that are missing in HIP.

Patch by: Michael Liao

Revised by: Anshil Ghandi

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D113925
2021-11-23 10:13:37 -05:00
Alexey Bataev 80256605f8 [OpenMP] support depend clause for taskwait directive, by Deepak
Eachempati.

This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D113540
2021-11-19 06:30:17 -08:00
Phoebe Wang de34a940ae [X86] Add -mskip-rax-setup support to align with GCC
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112413
2021-11-18 11:20:32 +08:00
Nico Weber ae98182cf7 [clang] Make -masm=intel affect inline asm style
With this,

  void f() {  __asm__("mov eax, ebx"); }

now compiles with clang with -masm=intel.

This matches gcc.

The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.

One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):

  __asm__("mov eax, ebx");
  __asm__(".att_syntax\nmovl %ebx, %eax");
  __asm__("mov eax, ebx");

This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:

    bt{l %[Offset],%[Base] | %[Base],%[Offset]}

This works in LLVM after D113932 and D113894, so use that.

(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: When LLVM interpolates in parameters like `%0`, it uses at&t or
intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet happens to late: The interpolated-in parameter is already
in intel style, and then won't parse in the switched `.att_syntax`.)

It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.

Fixes PR21401 and PR20241.

Differential Revision: https://reviews.llvm.org/D113707
2021-11-17 13:41:59 -05:00
Ahsan Saghir 4c8b8e0154 [PowerPC] Allow MMA built-ins to accept non-void pointers and arrays
Calls to MMA builtins that take pointer to void
do not accept other pointers/arrays whereas normal
functions with the same parameter do. This patch
allows MMA built-ins to accept non-void pointers
and arrays.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D113306
2021-11-16 09:14:41 -06:00
Kazu Hirata d0ac215dd5 [clang] Use isa instead of dyn_cast (NFC) 2021-11-14 09:32:40 -08:00
Josh Learn 7611e16fce [clang][objc][codegen] Skip emitting ObjC category metadata when the
category is empty

Currently, if we create a category in ObjC that is empty, we still emit
runtime metadata for that category. This is a scenario that could
commonly be run into when using __attribute__((objc_direct_members)),
which elides the need for much of the category metadata. This is
slightly wasteful and can be easily skipped by checking the category
metadata contents during CodeGen.

rdar://66177182

Differential Revision: https://reviews.llvm.org/D113455
2021-11-12 16:21:21 -08:00
Adrian Kuegel bb4934601d Revert "Implement target_clones multiversioning"
This reverts commit 9deab60ae7.
There is a possibly unintended semantic change.
2021-11-12 11:05:58 +01:00
David Blaikie 6512098877 DebugInfo/Printing: Improve name of policy for including types for template arguments
Feedback from Richard Smith that the policy should be named closer to
the context its used in.
2021-11-11 21:59:27 -08:00
Erich Keane 9deab60ae7 Implement target_clones multiversioning
As discussed here: https://lwn.net/Articles/691932/

GCC6.0 adds target_clones multiversioning. This functionality is
an odd cross between the cpu_dispatch and 'target' MV, but is compatible
with neither.

This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.

The mangling however, is... ODD to say the least. The mangling format
is:
<normal_mangling>.<option string>.<option ordinal>.

Differential Revision:https://reviews.llvm.org/D51650
2021-11-11 11:11:16 -08:00
James Y Knight fddc4e4116 Correct handling of the 'throw()' exception specifier in C++17.
Per C++17 [except.spec], 'throw()' has become equivalent to
'noexcept', and should therefore call std::terminate, not
std::unexpected.

Differential Revision: https://reviews.llvm.org/D113517
2021-11-10 17:40:16 -05:00
Yaxun (Sam) Liu 4b3881e9f3 Emit hidden hostcall argument for sanitized kernels
this patch - https://reviews.llvm.org/D110337 changes the way how hostcall
hidden argument is emitted for printf, but the sanitized kernels also use
hostcall buffer to report a error for invalid memory access, which is not
handled by the above patch and it leads to vdi runtime error:

Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D112820
2021-11-10 17:05:57 -05:00
Yaxun (Sam) Liu 80072fde61 [CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.

If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.

This patch allows putting instantiated template kernels into comdat sections.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D112492
2021-11-10 16:42:23 -05:00
Igor Kirillov 4860f6cb25 [OpenMP] Fix: opposite attributes could be set by -fno-inline
After the changes introduced by D106799 it is possible to tag
outlined function with both AlwaysInline and NoInline attributes using
-fno-inline command line options.
This issue is similiar to D107649.

Differential Revision: https://reviews.llvm.org/D112645
2021-11-10 16:48:09 +00:00
Jon Chesterfield 27177b82d4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-10 15:30:56 +00:00
Vassil Vassilev 4fb0805c65 [clang-repl] Allow Interpreter::getSymbolAddress to take a mangled name. 2021-11-10 12:52:05 +00:00
Jorge Gorbe Moya 770ddf599d Fix unused variable warning in release build 2021-11-09 19:48:42 -08:00
hsmahesha 3b9a85d10a [CFE][Codegen] Make sure to maintain the contiguity of all the static allocas
at the start of the entry block, which in turn would aid better code transformation/optimization.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D110257
2021-11-10 08:45:21 +05:30
Joseph Huber 4b5c3e591d [OpenMP] Remove doing assumption propagation in the front end.
This patch removes the assumption propagation that was added in D110655
primarily to get assumption informatino on opaque call sites for
optimizations. The analysis done in D111445 allows us to do this more
intelligently in the back-end.

Depends on D111445

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111463
2021-11-09 17:39:24 -05:00
Kostya Serebryany b7f3a4f4fa [sancov] add tracing for loads and store
add tracing for loads and stores.

The primary goal is to have more options for data-flow-guided fuzzing,
i.e. use data flow insights to perform better mutations or more agressive corpus expansion.
But the feature is general puspose, could be used for other things too.

Pipe the flag though clang and clang driver, same as for the other SanitizerCoverage flags.
While at it, change some plain arrays into std::array.

Tests: clang flags test, LLVM IR test, compiler-rt executable test.

Reviewed By: morehouse

Differential Revision: https://reviews.llvm.org/D113447
2021-11-09 14:35:13 -08:00
Itay Bookstein 9efce0baee [clang] Run LLVM Verifier in modes without CodeGen too
Previously, the Backend_Emit{Nothing,BC,LL} modes did
not run the LLVM verifier since it is usually added via
the TargetMachine::addPassesToEmitFile method according
to the DisableVerify parameter. This is called from
EmitAssemblyHelper::AddEmitPasses, which is only relevant
for BackendAction-s that require CodeGen.

Note:
* In these particular situations the verifier is added
  to the optimization pipeline rather than the codegen
  pipeline so that it runs prior to the BC/LL emission
  pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously,
  this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that
  the clang/test/CodeGen/ifunc.c test would pass:
  the emitIFuncDefinition incorrectly passed the
  GlobalDecl of the IFunc itself to the call to
  GetOrCreateLLVMFunction for creating the resolver.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D113352
2021-11-09 23:57:13 +02:00
Itay Bookstein 3b1fd19357 [CodeGen] Diagnose and reject non-function ifunc resolvers
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: MaskRay, erichkeane

Differential Revision: https://reviews.llvm.org/D112868
2021-11-09 23:51:36 +02:00
Atmn Patel 737c4a2673 [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 15:11:05 -05:00
David Pagan b0de656bdf Initial parsing/sema for 'align' clause
Added basic parsing/sema/serialization support for 'align' clause for use with
'allocate' directive.
2021-11-09 07:34:18 -05:00
Atmn Patel ef717f3852 Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ff.
2021-11-09 02:10:42 -05:00
Atmn Patel 81a7cad2ff [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 01:52:52 -05:00
Akira Hatanaka d61eb6c5d9 [ObjC][ARC] Use operand bundle "clang.arc.attachedcall" on x86-64
https://reviews.llvm.org/D92808 made clang use the operand bundle
instead of emitting retainRV/claimRV calls on arm64. This commit makes
changes to clang that are needed to use the operand bundle on x86-64.

Differential Revision: https://reviews.llvm.org/D111331
2021-11-08 18:38:40 -08:00
Jon Chesterfield 0fa45d6d80 Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
This reverts commit db81d8f6c4.
2021-11-08 20:28:57 +00:00
Jon Chesterfield db81d8f6c4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-08 18:38:00 +00:00
hyeongyu kim fd9b099906 Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit aacfbb953e.

Revert "Fix lit test failures in CodeGenCoroutines"

This reverts commit 63fff0f5bf.
2021-11-09 02:15:55 +09:00
Jon Chesterfield 2c37ae6d14 [nfc] Refactor CGGPUBuiltin to help review D112680 2021-11-08 15:00:08 +00:00
Anastasia Stulova a10a69fe9c [SPIR-V] Add SPIR-V triple and clang target info.
Add new triple and target info for ‘spirv32’ and ‘spirv64’ and,
thus, enabling clang (LLVM IR) code emission to SPIR-V target.

The target for SPIR-V is mostly reused from SPIR by derivation
from a common base class since IR output for SPIR-V is mostly
the same as SPIR. Some refactoring are made accordingly.

Added and updated tests for parts that are different between
SPIR and SPIR-V.

Patch by linjamaki (Henry Linjamäki)!

Differential Revision: https://reviews.llvm.org/D109144
2021-11-08 13:34:10 +00:00
Benjamin Kramer 8adb6d6de2 [clang] Use llvm::reverse. NFCI. 2021-11-07 14:24:33 +01:00
Yonghong Song bbab17c6c9 [Clang][Attr] fix a btf_type_attr CGDebugInfo codegen bug
Nathan Chancellor reported a crash due to commit
3466e00716 (Reland "[Attr] support btf_type_tag attribute").

The following test can reproduce the crash:
  $ cat efi.i
  typedef unsigned long efi_query_variable_info_t(int);
  typedef struct {
    struct {
      efi_query_variable_info_t __attribute__((regparm(0))) * query_variable_info;
    };
  } efi_runtime_services_t;
  efi_runtime_services_t efi_0;
  $ clang -m32 -O2 -g -c -o /dev/null efi.i

The reason is that FunctionTypeLoc.getParam(Idx) may return a
nullptr which should be checked before dereferencing the
result pointer. This patch fixed this issue.
2021-11-06 18:19:00 -07:00
hyeongyukim aacfbb953e [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

Resolve lit failures in clang after 8ca4b3e's land

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

Fix internal_clone(aarch64) inline assembly
2021-11-06 19:19:22 +09:00
Juneyoung Lee 89ad2822af Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a.
2021-11-06 15:39:19 +09:00
Juneyoung Lee 7584ef766a [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-11-06 15:36:42 +09:00
Zahira Ammarguellat 627868263c In spir functions, llvm.dbg.declare intrinsics created
for parameters and locals need to refer to the stack
allocation in the alloca address space.
2021-11-05 15:08:09 -07:00
Yonghong Song 3466e00716 Reland "[Attr] support btf_type_tag attribute"
This is to revert commit f95bd18b5f (Revert "[Attr] support
btf_type_tag attribute") plus a bug fix.

Previous change failed to handle cases like below:
    $ cat reduced.c
    void a(*);
    void a() {}
    $ clang -c reduced.c -O2 -g

In such cases, during clang IR generation, for function a(),
CGCodeGen has numParams = 1 for FunctionType. But for
FunctionTypeLoc we have FuncTypeLoc.NumParams = 0. By using
FunctionType.numParams as the bound to access FuncTypeLoc
params, a random crash is triggered. The bug fix is to
check against FuncTypeLoc.NumParams before accessing
FuncTypeLoc.getParam(Idx).

Differential Revision: https://reviews.llvm.org/D111199
2021-11-05 11:25:17 -07:00
Martin Storsjö f95bd18b5f Revert "[Attr] support btf_type_tag attribute"
This reverts commits 737e4216c5 and
ce7ac9e66a.

After those commits, the compiler can crash with a reduced
testcase like this:

$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
2021-11-05 10:36:40 +02:00
Arthur Eubanks 13317286f8 [NewPM] Use the default AA pipeline by default
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).

If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can

  FAM.registerPass([&] { return std::move(MyAA); });

before calling

  PB.registerFunctionAnalyses(FAM);

For example, LTOBackend.cpp and NewPMDriver.cpp do this.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113210
2021-11-04 15:10:34 -07:00
Mike Rice 4eac7bcf1a [OpenMP] Add parsing/sema/serialization for 'bind' clause.
Differential Revision: https://reviews.llvm.org/D113154
2021-11-04 14:40:30 -07:00
Yonghong Song 737e4216c5 [Attr] support btf_type_tag attribute
This patch added clang codegen and llvm support
for btf_type_tag support. Currently, btf_type_tag
attribute info is preserved in DebugInfo IR only for
pointer types associated with typedef, global variable
and function declaration. Eventually, such information
is emitted to dwarf.

The following is an example:
  $ cat test.c
  #define __tag __attribute__((btf_type_tag("tag")))
  int __tag *g;
  $ clang -O2 -g -c test.c
  $ llvm-dwarfdump --debug-info test.o
  ...
  0x0000001e:   DW_TAG_variable
                  DW_AT_name      ("g")
                  DW_AT_type      (0x00000033 "int *")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/home/yhs/test.c")
                  DW_AT_decl_line (2)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x00000033:   DW_TAG_pointer_type
                  DW_AT_type      (0x00000042 "int")

  0x00000038:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag")

  0x00000041:     NULL

  0x00000042:   DW_TAG_base_type
                  DW_AT_name      ("int")
                  DW_AT_encoding  (DW_ATE_signed)
                  DW_AT_byte_size (0x04)

  0x00000049:   NULL

Basically, a DW_TAG_LLVM_annotation tag will be inserted
under DW_TAG_pointer_type tag if that pointer has a btf_type_tag
associated with it.

Differential Revision: https://reviews.llvm.org/D111199
2021-11-04 14:23:31 -07:00
Noah Shutty d788c44f5c [Support] Improve Caching conformance with Support library behavior
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D113080
2021-11-04 13:00:44 -07:00
Aaron Ballman c524f1a076 No longer crash when a consteval function returns a structure
Ensure that the destination slot exists in this case. This addresses PR51484.
2021-11-04 09:41:10 -04:00
Kirill Stoimenov a55c4ec1ce [ASan] Process functions in Asan module pass
This came up as recommendation while reviewing D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112732
2021-11-03 20:27:53 +00:00
Vitaly Buka 3131714f8d [NFC][asan] Use AddressSanitizerOptions in ModuleAddressSanitizerPass
Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D113072
2021-11-03 11:32:14 -07:00
Kirill Stoimenov b3145323b5 Revert "[ASan] Process functions in Asan module pass"
This reverts commit 76ea87b94e.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D113129
2021-11-03 18:01:01 +00:00
Kirill Stoimenov 76ea87b94e [ASan] Process functions in Asan module pass
This came up as recommendation while reviewing D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112732
2021-11-03 17:51:01 +00:00
Vitaly Buka ee4634f7fe [NFC][asan] Fix confusing variable name
There is no such thing as ModuleUseAfterScope.
2021-11-02 16:49:15 -07:00
Florian Hahn 7999355106
[Clang] Add min/max reduction builtins.
This patch implements __builtin_reduce_max and __builtin_reduce_min as
specified in D111529.

The order of operations does not matter for min or max reductions and
they can be directly lowered to the corresponding
llvm.vector.reduce.{fmin,fmax,umin,umax,smin,smax} intrinsic calls.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D112001
2021-11-02 15:01:42 +01:00
serge-sans-paille 6bfc85c217 Fix inline builtin handling in case of redefinition
Basically, inline builtin definition are shadowed by externally visible
redefinition. This matches GCC behavior.

The implementation has to workaround the fact that:

1. inline builtin are renamed at callsite during codegen, but
2. they may be shadowed by a later external definition

As a consequence, during codegen, we need to walk redecls and eventually rewrite
some call sites, which is totally inelegant.

Differential Revision: https://reviews.llvm.org/D112059
2021-11-02 09:53:49 +01:00
David Blaikie 8bf1244538 DebugInfo: workaround for context-sensitive use of non-type-template-parameter integer suffixes
There's a nuanced check about when to use suffixes on these integer
non-type-template-parameters, but when rebuilding names for
-gsimple-template-names there isn't enough data in the DWARF to
determine when to use suffixes or not. So turn on suffixes always to
make it easy to match up names in llvm-dwarfdump --verify.

I /think/ if we correctly modelled auto non-type-template parameters
maybe we could put suffixes only on those. But there's also some logic
in Clang that puts the suffixes on overloaded functions - at least
that's what the parameter says (see D77598 and printTemplateArguments
"TemplOverloaded" parameter) - but I think maybe it's for anything that
/can/ be overloaded, not necessarily only the things that are overloaded
(the argument value is hardcoded at the various callsites, doesn't seem
to depend on overload resolution/searching for overloaded functions). So
maybe with "auto" modeled more accurately, and differentiating between
function templates (always using type suffixes there) and class/variable
templates (only using the suffix for "auto" types) we could correctly
use integer type suffixes only in the minimal set of cases.

But that seems all too much fuss, so let's just put integer type
suffixes everywhere always in the debug info of integer non-type
template parameters in template names.

(more context:
* https://reviews.llvm.org/D77598#inline-1057607
* https://groups.google.com/g/llvm-dev/c/ekLMllbLIZg/m/-dhJ0hO1AAAJ )

Differential Revision: https://reviews.llvm.org/D111477
2021-11-01 17:08:26 -07:00
Itay Bookstein 848812a55e [Verifier] Add verification logic for GlobalIFuncs
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112349
2021-10-31 20:00:57 -07:00
Kazu Hirata 4db2e4cebe Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-10-30 19:00:19 -07:00
Kazu Hirata 972d4133e9 Use {DenseSet,SmallPtrSet}::contains (NFC) 2021-10-29 20:26:07 -07:00
Jay Foad 1b758925ad [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791
2021-10-29 15:02:58 +01:00
Thomas Lively fb67f3d969 [WebAssembly] Add prototype relaxed float to int trunc instructions
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.

These are only exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112186
2021-10-28 14:01:53 -07:00
Mike Rice 6f9c25167d [OpenMP] Initial parsing/sema for the 'omp loop' construct
Adds basic parsing/sema/serialization support for the #pragma omp loop
directive.

Differential Revision: https://reviews.llvm.org/D112499
2021-10-28 08:26:43 -07:00
Kai Luo 6ea2431d3f [clang][compiler-rt][atomics] Add `__c11_atomic_fetch_nand` builtin and support `__atomic_fetch_nand` libcall
Add `__c11_atomic_fetch_nand` builtin to language extensions and support `__atomic_fetch_nand` libcall in compiler-rt.

Reviewed By: theraven

Differential Revision: https://reviews.llvm.org/D112400
2021-10-28 02:18:43 +00:00
Florian Hahn 01870d51b8
[Clang] Add elementwise abs builtin.
This patch implements __builtin_elementwise_abs as specified in
D111529.

Reviewed By: aaron.ballman, scanon

Differential Revision: https://reviews.llvm.org/D111986
2021-10-27 21:01:44 +01:00
Nico Weber c7aaa2efef [clang] Add range accessor for ObjCAtTryStmt catch_stmts and use it
No behavior change.

Differential Revision: https://reviews.llvm.org/D112543
2021-10-27 08:57:05 -04:00
Shraiysh Vaishay 9fb52cb3f1 [MLIR][OpenMP] Added omp.atomic.read and omp.atomic.write
This patch supports the atomic construct (read and write) following
section 2.17.7 of OpenMP 5.0 standard. Also added tests and
verifier for the same.

Reviewed By: kiranchandramohan

Differential Revision: https://reviews.llvm.org/D111992
2021-10-27 14:05:44 +05:30
Uday Bondhugula 9fb9c6b91e [Clang][NFC] Clang CUDA codegen clean-up
Update an instance of dyn_cast -> cast and other NFC clang-tidy fixes
for Clang CUDA codegen.

Differential Revision: https://reviews.llvm.org/D112284
2021-10-27 11:07:20 +05:30
Florian Hahn 1ef25d28c1
[Clang] Add elementwise min/max builtins.
This patch implements __builtin_elementwise_max and
__builtin_elementwise_min, as specified in D111529.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D111985
2021-10-26 16:53:40 +01:00
Mike Rice d8699391a4 [OPENMP51]Initial parsing/sema for append_args clause for 'declare variant'
Adds initial parsing and sema for the 'append_args' clause.

Note that an AST clause is not created as it instead adds its values
to the OMPDeclareVariantAttr.

Differential Revision: https://reviews.llvm.org/D111854
2021-10-25 09:38:50 -07:00
Kazu Hirata 16ceb44e62 [clang] Use llvm::{count,count_if,find_if,all_of,none_of} (NFC) 2021-10-25 09:14:45 -07:00
Kazu Hirata 4bd46501c3 Use llvm::any_of and llvm::none_of (NFC) 2021-10-24 17:35:33 -07:00
Duncan P. N. Exon Smith 2410fb4616 Support: Use Expected<T>::moveInto() in a few places
These are some usage examples for `Expected<T>::moveInto()`.

Differential Revision: https://reviews.llvm.org/D112280
2021-10-22 12:40:10 -07:00
Kazu Hirata dccfaddc6b [clang] Use StringRef::contains (NFC) 2021-10-21 08:58:19 -07:00
Yonghong Song f6811cec84 [DebugInfo] Support typedef with btf_decl_tag attributes
Clang patch ([1]) added support for btf_decl_tag attributes with typedef
types. This patch added llvm support including dwarf generation.
For example, for typedef
   typedef unsigned * __u __attribute__((btf_decl_tag("tag1")));
   __u u;
the following shows llvm-dwarfdump result:
   0x00000033:   DW_TAG_typedef
                   DW_AT_type      (0x00000048 "unsigned int *")
                   DW_AT_name      ("__u")
                   DW_AT_decl_file ("/home/yhs/work/tests/llvm/btf_tag/t.c")
                   DW_AT_decl_line (1)

   0x0000003e:     DW_TAG_LLVM_annotation
                     DW_AT_name    ("btf_decl_tag")
                     DW_AT_const_value     ("tag1")

   0x00000047:     NULL

  [1] https://reviews.llvm.org/D110127

Differential Revision: https://reviews.llvm.org/D110129
2021-10-21 08:42:58 -07:00
Aaron Ballman aad244dfc5 Revert "AddGlobalAnnotations for function with or without function body."
This reverts commit 121b2252de.

The following code causes a crash in some circumstances:

  struct k {
    ~k() __attribute__((annotate(""))) {}
  };
  void m() { k(); }
2021-10-21 07:08:18 -04:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Zhi An Ng e1fb13401e [WebAssembly] Add prototype relaxed float min max instructions
Add relaxed. f32x4.min, f32x4.max, f64x2.min, f64x2.max. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112146
2021-10-20 09:41:51 -07:00
Arthur Eubanks 063c2f89aa [clang] Add option to disable -clear-ast-before-backend
Some downstream users have plugins that -clear-ast-before-backend may
affect. Add an option to opt out.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D112100
2021-10-19 20:51:48 -07:00
Zhi An Ng 2542bfa43a [WebAssembly] Add prototype relaxed swizzle instructions
Add i8x16 relaxed_swizzle instructions. These are only
exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112022
2021-10-19 17:53:04 -07:00
Yuta Saito 1813fde9cc [WebAssembly] Emit clangast in custom section aligned by 4 bytes
Emit __clangast in custom section instead of named data segment
to find it while iterating sections.
This could be avoided if all data segements (the wasm sense) were
represented as their own sections (in the llvm sense).
This can be resolved by https://github.com/WebAssembly/tool-conventions/issues/138

And the on-disk hashtable in clangast needs to be aligned by 4 bytes,
so add paddings in name length field in custom section header.

The length of clangast section name can be represented in 1 byte
by leb128, and possible maximum pads are 3 bytes, so the section
name length won't be invalid in theory.

Fixes https://bugs.llvm.org/show_bug.cgi?id=35928

Differential Revision: https://reviews.llvm.org/D74531
2021-10-19 15:50:08 -07:00
Noah Shutty e678c51177 [Support][ThinLTO] Move ThinLTO caching to LLVM Support library
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D111371
2021-10-18 18:57:25 -07:00
Petr Hosek 8e46e34d24 Revert "[Support][ThinLTO] Move ThinLTO caching to LLVM Support library"
This reverts commit 92b8cc52bb since
it broke the gold plugin.
2021-10-18 12:24:05 -07:00
Noah Shutty 92b8cc52bb [Support][ThinLTO] Move ThinLTO caching to LLVM Support library
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D111371
2021-10-18 12:08:49 -07:00
Juneyoung Lee f193bcc701 Revert D105169 due to the two-stage failure in ASAN
This reverts the following commits:
37ca7a795b
9aa6c72b92
705387c507
8ca4b3ef19
80dba72a66
2021-10-18 23:52:46 +09:00
Kazu Hirata d245f2e859 [clang] Use llvm::erase_if (NFC) 2021-10-17 13:50:29 -07:00
Juneyoung Lee 80dba72a66 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-10-16 12:01:37 +09:00
Zhi An Ng da07942834 [WebAssembly] Add prototype relaxed laneselect instructions
Add i8x16, i16x8, i32x4, i64x2 laneselect instructions. These are only
exposed as builtins, and require user opt-in.
2021-10-15 17:45:09 -07:00
Kazu Hirata 6a154e606e [clang] Use llvm::is_contained (NFC) 2021-10-15 10:07:08 -07:00
Richard Smith effbf0bdd0 PR52183: Don't emit code for a void-typed constant expression.
This is unnecessary in general, and wrong when the expression invokes a
consteval function.
2021-10-14 20:55:51 -07:00
Arthur Eubanks d0a5f61c4f [clang] Support -clear-ast-before-backend without -disable-free
Previously without -disable-free, -clear-ast-before-backend would crash in ~ASTContext() due to various reasons.
This works around that by doing a lot of the cleanup ahead of the destructor so that the destructor doesn't actually do any manual cleanup if we've already cleaned up beforehand.

This actually does save a measurable amount of memory with -clear-ast-before-backend, although at an almost unnoticeable runtime cost:
https://llvm-compile-time-tracker.com/compare.php?from=5d755b32f2775b9219f6d6e2feda5e1417dc993b&to=58ef1c7ad7e2ad45f9c97597905a8cf05a26258c&stat=max-rss

Previously we weren't doing any cleanup with -disable-free, so I tried measuring the impact of always doing the cleanup and didn't measure anything noticeable on llvm-compile-time-tracker.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111767
2021-10-14 13:43:53 -07:00
Aaron Ballman 68157fe15b Fix a crash on valid consteval code.
Not all constants are emitted within the context of a function, so use
the module's ASTContext instead because 1) that's the same as the
current function ASTContext, and 2) the module can never be null.

Fixes PR50787.
2021-10-14 15:48:10 -04:00
Mike Rice fb4c451001 [OPENMP51]Initial parsing/sema for adjust_args clause for 'declare variant'
Adds initial parsing and sema for the 'adjust_args' clause.

Note that an AST clause is not created as it instead adds its expressions
to the OMPDeclareVariantAttr.

Differential Revision: https://reviews.llvm.org/D99905
2021-10-13 09:34:09 -07:00
Hsiangkai Wang 5158cfef8b [RISCV] After reverting _mt builtins, add `ta` argument for LLVM IR.
Previous patch only reverts C builtins for tail policy. In order to keep
LLVM IR intact, add the `ta` argument in vector builtins.
2021-10-13 19:41:49 +08:00
Hsiangkai Wang ff3ed78304 Revert "[RISCV] Define _m intrinsics as builtins, instead of macros."
This reverts commit 97f0c63783.

As discussed in https://reviews.llvm.org/D110684, it increased the
compile time and the binary size of clang more than 1%. I reverted
this patch first to think about a better way to do it.
2021-10-13 12:21:51 +08:00
Arthur Eubanks b6a8c69554 [NFC] Rename EmitAssemblyHelper new/legacy PM methods
To reflect the fact that the new PM is the default now.

Differential Revision: https://reviews.llvm.org/D111680
2021-10-12 15:41:44 -07:00
Arthur Eubanks 2cadef6537 [clang] Teardown new PM data structures before running codegen pipeline
Do this by refactoring the optimization and codegen pipelines into separate functions.

This saves a tiny bit of memory in non-LTO builds [1].

[1] https://llvm-compile-time-tracker.com/compare.php?from=fbddf22ef72d3c2e9b14e1501841b03380eef12b&to=cd276df52eb6f2b84a8e1efe5318460c6debf82d&stat=max-rss

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111582
2021-10-12 14:17:11 -07:00
Kazu Hirata 57b40b5f34 [AST, CodeGen, Driver] Use llvm::is_contained (NFC) 2021-10-12 09:19:49 -07:00
Nathan Sidwell dcd74716f9 [clang] p0388 conversion to incomplete array
This implements the new implicit conversion sequence to an incomplete
(unbounded) array type.  It is mostly Richard Smith's work, updated to
trunk, testcases added and a few bugs fixed found in such testing.

It is not a complete implementation of p0388.

Differential Revision: https://reviews.llvm.org/D102645
2021-10-12 07:35:20 -07:00
Yonghong Song a162b67c98 [Clang][Attr] rename btf_tag to btf_decl_tag
Current btf_tag is applied to declaration only.
Per discussion in https://reviews.llvm.org/D111199,
we plan to introduce btf_type_tag attribute for types.
So rename btf_tag to btf_decl_tag to make it easily
differentiable from btf_type_tag.

Differential Revision: https://reviews.llvm.org/D111588
2021-10-11 22:17:17 -07:00
hsmahesha db9c2d7751 [CFE][Codegen] Remove CodeGenFunction::InitTempAlloca()
Sequel patch to https://reviews.llvm.org/D111316

Finally, remove the defintion of CodeGenFunction::InitTempAlloca().

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D111324
2021-10-12 10:04:15 +05:30
hsmahesha f7de6962c8 [CFE][Codegen][In-progress] Remove CodeGenFunction::InitTempAlloca()
Sequel patch to https://reviews.llvm.org/D111293.

Remove call to CodeGenFunction::InitTempAlloca() from OpenMP related
codegen part.

Also remove the metadata `!llvm.access.group` from the updated lit
tests.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D111316
2021-10-12 10:01:46 +05:30
Hsiangkai Wang 97f0c63783 [RISCV] Define _m intrinsics as builtins, instead of macros.
In the original design, we levarage _mt intrinsics to define macros for
_m intrinsics. Such as,

```
__builtin_rvv_vadd_vv_i8m1_mt((vbool8_t)(op0), (vint8m1_t)(op1), (vint8m1_t)(op2), (vint8m1_t)(op3), (size_t)(op4), (size_t)VE_TAIL_AGNOSTIC)
```

However, we could not define generic interface for mask intrinsics any
more due to clang_builtin_alias only accepts clang builtins as its
argument.

In the example,

```
 __rvv_overloaded
 __attribute__((clang_builtin_alias(__builtin_rvv_vadd_vv_i8m1_mt)))
  vint8m1_t vadd(vbool8_t op0, vint8m1_t op1, vint8m1_t op2, vint8m1_t
  op3, size_t op4, size_t op5);
```

op5 is the tail policy argument. When users want to use vadd generic
interface for masked vector add, they need to specify tail policy in the
previous design. In this patch, we define _m intrinsics as clang
builtins to solve the problem.

Differential Revision: https://reviews.llvm.org/D110684
2021-10-12 10:47:55 +08:00
Chris Bieneman 121b2252de AddGlobalAnnotations for function with or without function body.
When AnnotateAttr is on a function, AddGlobalAnnotations is only called
in CodeGenModule::EmitGlobalFunctionDefinition which means AnnotateAttr
on function declaration without function body will be ignored.
The patch will move AddGlobalAnnotations  to
CodeGenModule::SetFunctionAttributes, so with or without function body,
the AnnotateAttr will get code gen for a function.

It'll help case when AnnotateAttr is on external function, and the
AnnotateAttr will be consumed in IR level.

For example, a pass to collect num of uses for functions with
__attribute((annotate("count_use"))) after optimizations,
As long as there's __attribute((annotate("count_use"))), function with
or without function body should be counted.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D111109

Patch by:  python3kgae (Xiang Li)
2021-10-11 14:50:34 -05:00
hsmahesha 0481682996 [CFE][Codegen][In-progress] Remove CodeGenFunction::InitTempAlloca()
CodeGenFunction::InitTempAlloca() inits the static alloca within the
entry block which may *not* necessarily be correct always.

For example, the current instruction insertion point (pointed by the
instruction builder) could be a program point which is hit multiple
times during the program execution, and it is expected that the static
alloca is initialized every time the program point is hit.

Hence remove CodeGenFunction::InitTempAlloca(), and initialize the
static alloca where the instruction insertion point is at the moment.

This patch, as a starting attempt, removes the calls to
CodeGenFunction::InitTempAlloca() which do not have any side effect on
the lit tests.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D111293
2021-10-09 09:23:14 +05:30
Richard Smith 7eae8c6e62 Don't update the vptr at the start of the destructor of a final class.
In this case, we know statically that we're destroying the most-derived
class, so the vptr must already point to the current class and never
needs to be updated.
2021-10-08 19:59:42 -07:00
Qiu Chaofan 8a714722e2 [NFC] [Clang] Use global enum for explicit float mode
Currently, there're multiple float types that can be represented by
__attribute__((mode(xx))). It's parsed, and then a corresponding type is
created if available.

This refactor moves the enum for mode into a global enum class visible
to ASTContext.

Reviewed By: aaron.ballman, erichkeane

Differential Revision: https://reviews.llvm.org/D111391
2021-10-09 10:39:10 +08:00
Joseph Huber bad44d5f39 [OpenMP] Add RTL function for getting number of threads in block.
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patchs also unified the interface for getting the thread numbers in the
frontend.

Originally authored by jdoerfert.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D111475
2021-10-08 22:21:59 -04:00
Richard Smith 222305d6ff PR51079: Treat thread_local variables with an incomplete class type as
being not trivially destructible when determining if we can skip calling
their thread wrapper function.
2021-10-08 18:46:01 -07:00
Reid Kleckner 89b57061f7 Move TargetRegistry.(h|cpp) from Support to MC
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.

This allows us to ensure that Support doesn't have includes from MC/*.

Differential Revision: https://reviews.llvm.org/D111454
2021-10-08 14:51:48 -07:00
Arthur Eubanks a6891d2104 [clang] Set max allowed alignment to 2^32
Followup to D110451 which set LLVM's max allowed alignment to 2^32.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D111250
2021-10-08 11:44:15 -07:00
Masoud Ataei b0f68791f0 [clang] Option control afn flag
Clang option to set/unset afn fast-math flag.

 Differential: https://reviews.llvm.org/D106191
 Reviewd with: aaron.ballman, erichkeane, and others
2021-10-08 14:26:14 -04:00
Keith Smiley 68e49aea9a Revert "[clang] Fix absolute file paths with -fdebug-prefix-map"
This reverts commit a23a596793.

This broke a windows test https://buildkite.com/llvm-project/premerge-checks/builds/59492#7dad207c-6cbe-40ad-95e4-c48b47fe2527

Differential Revision: https://reviews.llvm.org/D111444
2021-10-08 10:39:44 -07:00
Keith Smiley a23a596793 [clang] Fix absolute file paths with -fdebug-prefix-map
Previously if you passed an absolute path to clang, where only part of
the path to the file was remapped, it would result in the file's DIFile
being stored with a duplicate path, for example:

```
!DIFile(filename: "./ios/Sources/bar.c", directory: "./ios/Sources")
```

This change handles absolute paths, specifically in the case they are
remapped to something relative, and uses the dirname for the directory,
and basename for the filename.

This also adds a test verifying this behavior for more standard uses as
well.

Differential Revision: https://reviews.llvm.org/D111352
2021-10-08 10:35:17 -07:00
John McCall 5ab6ee7599 Fix a variety of bugs with nil-receiver checks when targeting
non-Darwin ObjC runtimes:

- Use the same logic the Darwin runtime does for inferring that a
  receiver is non-null and therefore doesn't require null checks.
  Previously we weren't skipping these for non-super dispatch.

- Emit a null check when there's a consumed parameter so that we can
  destroy the argument if the call doesn't happen.  This mostly
  involves extracting some common logic from the Darwin-runtime code.

- Generate a zero aggregate by zeroing the same memory that was used
  in the method call instead of zeroing separate memory and then
  merging them with a phi.  This uses less memory and avoids unnecessary
  copies.

- Emit zero initialization, and generate zero values in phis, using
  the proper zero-value routines instead of assuming that the zero
  value of the result type has a bitwise-zero representation.
2021-10-08 05:44:06 -04:00
Wang, Pengfei c0f9c7c015 [X86] Check if struct is blank before getting the inner types
This fixes pr52011.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D111037
2021-10-08 17:09:34 +08:00
Joseph Huber 9efdca87c7 [OpenMP] Introduce new flags to assert thread and team usage in the runtime
This patch adds two flags to be supported for the new runtime. The flags
are `-fopenmp-assume-threads-oversubscription` and
-fopenmp-assume-teams-oversubscription`. These add global values that
can be checked by the work sharing runtime functions to make better
judgements about how to distribute work between the threads.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D111348
2021-10-07 22:23:09 -04:00
Itay Bookstein 40ec1c0f16 [IR][NFC] Rename getBaseObject to getAliaseeObject
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
2021-10-06 19:33:10 -07:00
David Blaikie f6a561c4d6 DebugInfo: Use clang's preferred names for integer types
This reverts c7f16ab3e3 / r109694 - which
suggested this was done to improve consistency with the gdb test suite.
Possible that at the time GCC did not canonicalize integer types, and so
matching types was important for cross-compiler validity, or that it was
only a case of over-constrained test cases that printed out/tested the
exact names of integer types.

In any case neither issue seems to exist today based on my limited
testing - both gdb and lldb canonicalize integer types (in a way that
happens to match Clang's preferred naming, incidentally) and so never
print the original text name produced in the DWARF by GCC or Clang.

This canonicalization appears to be in `integer_types_same_name_p` for
GDB and in `TypeSystemClang::GetBasicTypeEnumeration` for lldb.

(I tested this with one translation unit defining 3 variables - `long`,
`long (*)()`, and `int (*)()`, and another translation unit that had
main, and a function that took `long (*)()` as a parameter - then
compiled them with mismatched compilers (either GCC+Clang, or
Clang+(Clang with this patch applied)) and no matter the combination,
despite the debug info for one CU naming the type "long int" and the
other naming it "long", both debuggers printed out the name as "long"
and were able to correctly perform overload resolution and pass the
`long int (*)()` variable to the `long (*)()` function parameter)

Did find one hiccup, identified by the lldb test suite - that CodeView
was relying on these names to map them to builtin types in that format.
So added some handling for that in LLVM. (these could be split out into
separate patches, but seems small enough to not warrant it - will do
that if there ends up needing any reverti/revisiting)

Differential Revision: https://reviews.llvm.org/D110455
2021-10-06 16:02:34 -07:00
Jennifer Yu a4743eba3c Fix assert of "Unable to find base lambda address" from
adjustMemberOfForLambdaCaptures.

The problem is happening when user passes lambda function with reference
type in the map clause.

The natural of the problem when processing generateInfoForCapture,
the BasePointer is generated with new load for a lambda variable with
reference type.  It is not expected in adjustMemberOfForLambdaCaptures.

One way to fix this is to skipping call to generateInfoForCapture for
map(to:lambda).  The map info will be generated later in the call to
generateDefaultMapInfo samiler as firsprivate clase.

This to fix https://bugs.llvm.org/show_bug.cgi?id=52071

Differential Revision:https://reviews.llvm.org/D111115
2021-10-06 14:14:28 -07:00
Arthur Eubanks 6522b7cc32 [clang] Add option to clear AST memory before running LLVM passes
This is to save memory for Clang compiles.
Measuring building PassBuilder.cpp under /usr/bin/time, max rss goes from 0.93GB to 0.7GB.

This does not turn it by default yet.

I've turned on the option locally and run it over a good amount of files without any issues.

For more background, see
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D111105
2021-10-06 13:42:22 -07:00
Arthur Eubanks 05392466f0 Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 13:29:23 -07:00
Arthur Eubanks 569346f274 Revert "Reland [IR] Increase max alignment to 4GB"
This reverts commit 8d64314ffe.
2021-10-06 11:38:11 -07:00
Arthur Eubanks 8d64314ffe Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 11:03:51 -07:00
Arthur Eubanks 72cf8b6044 Revert "[IR] Increase max alignment to 4GB"
This reverts commit df84c1fe78.

Breaks some bots
2021-10-06 10:21:35 -07:00
Arthur Eubanks df84c1fe78 [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 09:54:14 -07:00
Michael Kruse f37e8b0b83 [Clang][OpenMP] Infix OMPLoopTransformationDirective abstract class. NFC.
Insert OMPLoopTransformationDirective between OMPLoopBasedDirective and the loop transformations OMPTileDirective and OMPUnrollDirective. This simplifies handling of loop transformations not requiring distinguishing between OMPTileDirective and OMPUnrollDirective anymore.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111119
2021-10-06 10:49:07 -05:00
Simon Pilgrim b9b90bb542 [clang] Replace report_fatal_error(std::string) uses with report_fatal_error(Twine)
As described on D111049, we're trying to remove the <string> dependency from error handling and replace uses of report_fatal_error(const std::string&) with the Twine() variant which can be forward declared.
2021-10-06 11:43:19 +01:00
Corentin Jabot 424733c12a Implement if consteval (P1938)
Modify the IfStmt node to suppoort constant evaluated expressions.

Add a new ExpressionEvaluationContext::ImmediateFunctionContext to
keep track of immediate function contexts.

This proved easier/better/probably more efficient than walking the AST
backward as it allows diagnosing nested if consteval statements.
2021-10-05 08:04:14 -04:00
Arthur Eubanks 2568286892 [clang] Don't use the AST to display backend diagnostics
We keep a map from function name to source location so we don't have to
do it via looking up a source location from the AST. However, since
function names can be long, we actually use a hash of the function name
as the key.

Additionally, we can't rely on Clang's printing of function names via
the AST, so we just demangle the name instead.

This is necessary to implement
https://lists.llvm.org/pipermail/cfe-dev/2021-September/068930.html.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D110665
2021-10-04 14:14:32 -07:00
serge-sans-paille 0f0e31cf51 Update inline builtin handling to honor gnu inline attribute
Per the GCC info page:

    If the function is declared 'extern', then this definition of the
    function is used only for inlining.  In no case is the function
    compiled as a standalone function, not even if you take its address
    explicitly.  Such an address becomes an external reference, as if
    you had only declared the function, and had not defined it.

Respect that behavior for inline builtins: keep the original definition, and
generate a copy of the declaration suffixed by '.inline' that's only referenced
in direct call.

This fixes holes in c3717b6858.

Differential Revision: https://reviews.llvm.org/D111009
2021-10-04 22:26:25 +02:00
Alexey Bataev bfc8f9e9b0 [clang] Fix computation of number of dependencies using OpenMP iterator,
by Raul Penacoba.

The size of kmp_depend_info and the number of dependencies are computed multiplying the iterator sizes, which not right.
Now size is computed as:

itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN

where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D111045
2021-10-04 07:06:51 -07:00
Stefan Pintilie 4fc2f4979c [PowerPC] Fix __builtin_ppc_load2r to return short instead of int.
This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D110771
2021-10-04 06:17:02 -05:00
Jay Foad d933adeaca [APInt] Stop using soft-deprecated constructors and methods in clang. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.

Differential Revision: https://reviews.llvm.org/D110808
2021-10-04 09:38:11 +01:00
Dávid Bolvanský b1fcca3884 Fixed warnings in LLVM produced by -Wbitwise-instead-of-logical 2021-10-03 13:04:18 +02:00
Joseph Huber d12502a3ab [OpenMP] Apply OpenMP assumptions to applicable call sites
This patch adds OpenMP assumption attributes to call sites in applicable
regions. Currently this applies the caller's assumption attributes to
any calls contained within it. So, if a call occurs inside an OpenMP
assumes region to a function outside that region, we will assume that
call respects the assumptions. This is primarily useful for inline
assembly calls used heavily in the OpenMP GPU device runtime, which
allows us to then make judgements about what the ASM will do.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110655
2021-09-29 16:08:21 -04:00
Quinn Pham 67a3d1e275 [PowerPC] swdiv builtins for XL compatibility
This patch is in a series of patches to provide builtins for compatibility with
the XL compiler. This patch implements the software divide builtin as
wrappers for a floating point divide. XL provided these builtins because it
didn't produce software estimates by default at `-Ofast`. When compiled
with `-Ofast` these builtins will produce the software estimate for divide.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D106959
2021-09-29 11:31:07 -05:00
Sven van Haastregt 4da744a20f [OpenCL] Fix as_type3 invalid store creation
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a non-vec3 type to a vec3 type, even though a
conversion to vec3 was performed.  This resulted in creation of
invalid store instructions.

Differential Revision: https://reviews.llvm.org/D108470
2021-09-29 09:40:06 +01:00
Arthur Eubanks aa53785f23 Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Previous revisions didn't properly declare the new dependencies.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 15:31:30 -07:00
Arthur Eubanks 7833d20f1f Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2e.

Breaks bots
2021-09-28 14:49:27 -07:00
Arthur Eubanks 2943071e2e [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 14:21:10 -07:00
serge-sans-paille c3717b6858 Simplify handling of builtin with inline redefinition
(This is a recommit of 3d6f49a569 that should no longer break validation since
bd379915de).

It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.

Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.

Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.

After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite

Differential Revision: https://reviews.llvm.org/D109967
2021-09-28 21:00:47 +02:00
Kevin Athey 0d76d4833d Revert "Simplify handling of builtin with inline redefinition"
This reverts commit 3d6f49a569.

Broke bot: https://lab.llvm.org/buildbot/#/builders/5/builds/12360
2021-09-28 11:30:37 -07:00
David Blaikie 85f612efeb DebugInfo: Use sugared function type when emitting function declarations for call sites
Otherwise we're losing type information for these functions.
2021-09-28 10:44:35 -07:00
Quinn Pham 70391b3468 [PowerPC] FP compare and test XL compat builtins.
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.

Reviewed By: #powerpc, lei

Differential Revision: https://reviews.llvm.org/D109437
2021-09-28 11:01:51 -05:00
serge-sans-paille 3d6f49a569 Simplify handling of builtin with inline redefinition
It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.

Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.

Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.

After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite

Differential Revision: https://reviews.llvm.org/D109967
2021-09-28 13:24:25 +02:00
Ahsan Saghir 593b074a09 [PowerPC] MMA - Add __builtin_vsx_build_pair and __builtin_mma_build_acc builtins
This patch adds the following built-ins:

__builtin_vsx_build_pair
__builtin_mma_build_acc

Reviewed By: #powerpc, nemanjai, lei

Differential Revision: https://reviews.llvm.org/D107647
2021-09-27 19:51:28 -05:00
Joseph Huber b4a5543624 [OpenMP] Introduce a new worksharing RTL function for distribute
This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110429
2021-09-27 11:36:37 -04:00
Wang, Pengfei 7d6889964a [X86][FP16] Add more builtins to avoid multi evaluation problems & add 2 missed intrinsics
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110336
2021-09-27 09:27:04 +08:00
David Blaikie 8d9ddd4f50 DebugInfo: STN: Handle unreconstitutable types in function types 2021-09-23 21:13:16 -07:00
David Blaikie 25ac0d3c73 DebugInfo: Implement the -gsimple-template-names functionality
This excludes certain names that can't be rebuilt from the available
DWARF:

* Atomic types - no DWARF differentiating int from atomic int.
* Vector types - enough DWARF (an attribute on the array type) to do
  this, but I haven't written the extra code to add the attributes
  required for this
* Lambdas - ambiguous with any other unnamed class
* Unnamed classes/enums - would need column info for the type in
  addition to file/line number
* noexcept function types - not encoded in DWARF
2021-09-23 19:58:32 -07:00
Thomas Lively 2f519825ba [WebAssembly] Add prototype relaxed SIMD fma/fms instructions
Add experimental clang builtins, LLVM intrinsics, and backend definitions for
the new {f32x4,f64x2}.{fma,fms} instructions in the relaxed SIMD proposal:
https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md.
Do not allow these instructions to be selected without explicit user opt-in.

Differential Revision: https://reviews.llvm.org/D110295
2021-09-23 11:01:36 -07:00
Hongtao Yu d9b511d8e8 [CSSPGO] Set PseudoProbeInserter as a default pass.
Currenlty PseudoProbeInserter is a pass conditioned on a target switch. It works well with a single clang invocation. It doesn't work so well when the backend is called separately (i.e, through the linker or llc), where user has always to pass -pseudo-probe-for-profiling explictly. I'm making the pass a default pass that requires no command line arg to trigger, but will be actually run depending on whether the CU comes with `llvm.pseudo_probe_desc` metadata.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D110209
2021-09-22 09:09:48 -07:00
Shilei Tian ca999f7191 [OpenMP][Offloading] Use bitset to indicate execution mode instead of value
The execution mode of a kernel is stored in a global variable, whose value means:
- 0 - SPMD mode
- 1 - indicates generic mode
- 2 - SPMD mode execution with generic mode semantics

We are going to add support for SIMD execution mode. It will be come with another
execution mode, such as SIMD-generic mode. As a result, this value-based indicator
is not flexible.

This patch changes to bitset based solution to encode execution mode. Each
position is:
[0] - generic mode
[1] - SPMD mode
[2] - SIMD mode (will be added later)

In this way, `0x1` is generic mode, `0x2` is SPMD mode, and `0x3` is SPMD mode
execution with generic mode semantics. In the future after we add the support for
SIMD mode, `0b1xx` will be in SIMD mode.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D110029
2021-09-22 11:40:52 -04:00
Florian Hahn ea21d688dc
[Matrix] Emit assumption that matrix indices are valid.
The matrix extension requires the indices for matrix subscript
expression to be valid and it is UB otherwise.

extract/insertelement produce poison if the index is invalid, which
limits the optimizer to not be bale to scalarize load/extract pairs for
example, which causes very suboptimal code to be generated when using
matrix subscript expressions with variable indices for large matrixes.

This patch updates IRGen to emit assumes to for index expression to
convey the information that the index must be valid.

This also adjusts the order in which operations are emitted slightly, so
indices & assumes are added before the load of the matrix value.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D102478
2021-09-22 12:27:37 +01:00
David Blaikie 2ff049b12e DebugInfo: Don't use preferred template names in debug info
Using the preferred name creates a mismatch between the textual name of
a type and the DWARF tags describing the parameters as well as possible
inconsistency between DWARF producers (like Clang and GCC, or
older/newer Clang versions, etc).
2021-09-21 20:08:16 -07:00
David Blaikie db6f1e8a88 DebugInfo: Don't suppress inline namespaces when printing template template parameter names 2021-09-21 19:30:13 -07:00
David Blaikie d31dfc3011 DebugInfo: Unify some printing policy adjustments 2021-09-21 19:30:12 -07:00
Giorgis Georgakoudis ac90dfc43a Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 1d66649adf.

Revert to fix AMG GPU issue.
2021-09-21 13:20:39 -07:00
Matheus Izvekov d9308aa39b [clang] don't mark as Elidable CXXConstruct expressions used in NRVO
See PR51862.

The consumers of the Elidable flag in CXXConstructExpr assume that
an elidable construction just goes through a single copy/move construction,
so that the source object is immediately passed as an argument and is the same
type as the parameter itself.

With the implementation of P2266 and after some adjustments to the
implementation of P1825, we started (correctly, as per standard)
allowing more cases where the copy initialization goes through
user defined conversions.

With this patch we stop using this flag in NRVO contexts, to preserve code
that relies on that assumption.
This causes no known functional changes, we just stop firing some asserts
in a cople of included test cases.

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D109800
2021-09-21 21:41:20 +02:00
Giorgis Georgakoudis 1d66649adf [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D102107
2021-09-21 10:50:04 -07:00
Wang, Pengfei 227673398c [X86] Always check the size of SourceTy before getting the next type
D109607 results in a regression in llvm-test-suite.
The reason is we didn't check the size of SourceTy, so that we will
return wrong SSE type when SourceTy is overlapped.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D110037
2021-09-20 23:34:19 +08:00
alokmishra.besu 000875c127 OpenMP 5.0 metadirective
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.

A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h

Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91944
2021-09-18 13:40:44 -05:00
Nico Weber 31cca21565 Revert "OpenMP 5.0 metadirective"
This reverts commit c7d7b98e52.
Breaks tests on macOS, see comment on https://reviews.llvm.org/D91944
2021-09-18 09:10:37 -04:00
Adrian Prantl 843390c58a Apply proper source location to fallthrough switch cases.
This fixes a bug in clang where, when clang sees a switch with a
fallthrough to a default like this:

static void funcA(void) {}
static void funcB(void) {}

int main(int argc, char **argv) {

switch (argc) {
    case 0:
        funcA();
        break;
    case 10:
    default:
        funcB();
        break;
}
}

It does not add a proper debug location for that switch case, such as
case 10: above.

Patch by Shubham Rastogi!

Differential Revision: https://reviews.llvm.org/D109940
2021-09-17 14:45:04 -07:00
cchen 9ff848c5cd Revert "[OpenMP] Use irbuilder as default for masked and master construct"
This reverts commit 2908fc0d3f.
2021-09-17 16:44:09 -05:00
alokmishra.besu 347f3c186d OpenMP 5.0 metadirective
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.

A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h

Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91944
2021-09-17 16:30:06 -05:00
cchen 7efb825382 Revert "OpenMP 5.0 metadirective"
This reverts commit c7d7b98e52.
2021-09-17 16:14:16 -05:00
cchen c7d7b98e52 OpenMP 5.0 metadirective
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.

A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h

Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91944
2021-09-17 16:03:13 -05:00
cchen 2908fc0d3f [OpenMP] Use irbuilder as default for masked and master construct
Use irbuilder as default and remove redundant Clang codegen for masked construct and master construct.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D100874
2021-09-17 15:54:11 -05:00
Erich Keane e3b10525b4 Make multiversioning work with internal linkage
We previously made all multiversioning resolvers/ifuncs have weak
ODR linkage in IR, since we NEED to emit the whole resolver every time
we see a call, but it is not necessarily the place where all the
definitions live.

HOWEVER, when doing so, we neglected the case where the versions have
internal linkage.  This patch ensures we do this, so you don't get weird
behavior with static functions.
2021-09-17 05:56:38 -07:00
Wang, Pengfei e9e1d4751b [X86] Refactor GetSSETypeAtOffset to fix pr51813
D105263 adds support for _Float16 type. It introduced a bug (pr51813) that generates a <4 x half> type instead the default double when passing blank structure by SSE registers.

Although I doubt it may expose a bug somewhere other than D105263, it's good to avoid return half type when no half type in arguments.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D109607
2021-09-17 10:51:59 +08:00
Aaron Ballman aefb81a33a Removing some spurious whitespace; NFC 2021-09-16 13:14:08 -04:00
Arnold Schwaighofer f670c5aeee Add a new frontend flag `-fswift-async-fp={auto|always|never}`
Summary:
Introduce a new frontend flag `-fswift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:

* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.

Differential Revision: https://reviews.llvm.org/D109451
2021-09-16 08:48:51 -07:00
Yaxun (Sam) Liu abe8b354e3 Fix vtbl field addr space
Storing the vtable field of an object should use the same address space as
the this pointer. Currently it is assumed to be addr space 0 but this may not
be true.

This assumption (added in 054cc3b1b4) caused
issues for the out-of-tree CHERI targets.

Reviewed by: John McCall, Alexander Richardson

Differential Revision: https://reviews.llvm.org/D109841
2021-09-16 10:57:31 -04:00
Zarko Todorovski 1b0a71c5fc [PowerPC][AIX] Add support for varargs for complex types on AIX
Remove the previous error and add support for special handling of small
complex types as in PPC64 ELF ABI. As in, generate code to load from
varargs location and pack it in a temp variable, then return a pointer to
the struct.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D106393
2021-09-16 09:38:03 -04:00
Bjorn Pettersson 8f8616655c [NewPM] Use a separate struct for ModuleThreadSanitizerPass
Split ThreadSanitizerPass into ThreadSanitizerPass (as a function
pass) and ModuleThreadSanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
Bjorn Pettersson ab41eef9ac [NewPM] Use a separate struct for ModuleMemorySanitizerPass
Split MemorySanitizerPass into MemorySanitizerPass (as a function
pass) and ModuleMemorySanitizerPass (as a module pass).
Main reason is to make sure that we have a unique mapping from
ClassName to PassName in the new passmanager framework, making it
possible to correctly identify the passes when dealing with options
such as -print-after and -print-pipeline-passes.

This is a follow-up to D105006 and D105007.
2021-09-16 14:58:42 +02:00
David Blaikie 8264846c0e Senticify some comments - post-commit review for e4b9f5e851
Based on feedback from Paul Robinson.
2021-09-15 13:59:11 -07:00
Walter Lee 66c6bbe7ff Put code that avoids heapifying local blocks behind a flag
This change puts the functionality in commit
c5792aa90f behind a flag that is off by
default.  The original commit is not in Apple's Clang fork (and blocks
are an Apple extension in the first place), and there is one known
issue that needs to be addressed before it can be enabled safely.

Differential Revision: https://reviews.llvm.org/D108243
2021-09-14 14:06:05 -04:00
David Blaikie 13e34f9fc1 Fixup some formatting from a recent commit 2021-09-14 00:41:19 -07:00
David Blaikie e4b9f5e851 DebugInfo: Add support for template parameters with reference qualifiers
Followon from the previous commit supporting cvr qualifiers.
2021-09-14 00:39:47 -07:00
David Blaikie db4ff98bf9 DebugInfo: Add support for template parameters with qualifiers
eg: t1<void () const> - DWARF doesn't have a particularly nice way to
encode this, for real member function types (like `void (t1::*)()
const`) the const-ness is encoded in the type of the artificial first
parameter. But `void () const` has no parameters, so encode it like a
normal const-qualified type, using DW_TAG_const_type. (similarly for
restrict and volatile)

Reference qualifiers (& and &&) coming in a separate commit shortly.
2021-09-14 00:04:40 -07:00
Xiang1 Zhang c81d6ab875 [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109488
2021-09-13 18:03:27 +08:00
Xiang1 Zhang bdce8d40c6 Revert "[X86] Adjust Keylocker handle mem size"
This reverts commit 3731de6b7f.
2021-09-13 18:00:46 +08:00
Xiang1 Zhang 3731de6b7f [X86] Adjust Keylocker handle mem size
Reviewed By: Topper Craig

Differential Revision: https://reviews.llvm.org/D109354
2021-09-13 17:59:33 +08:00
Joseph Huber 29b44ca896 [OpenMP] Add flag for setting debug in the offloading device
This patch introduces the flags `-fopenmp-target-debug` and
`-fopenmp-target-debug=` to set the value of a global in the device.
This will be used to enable or disable debugging features statically in
the device runtime library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D109544
2021-09-10 18:19:19 -04:00
Roman Lebedev 85ba583eba
[NFCI][clang] Move allocation alignment manifestation for malloc-like into Sema from Codegen
... so that it happens right next to `AddKnownFunctionAttributesForReplaceableGlobalAllocationFunction()`,
which is good for consistency.
2021-09-10 20:49:28 +03:00
Chris Lattner 735f46715d [APInt] Normalize naming on keep constructors / predicate methods.
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`.  This achieves two things:

1) This starts standardizing predicates across the LLVM codebase,
   following (in this case) ConstantInt.  The word "Value" doesn't
   convey anything of merit, and is missing in some of the other things.

2) Calling an integer "null" doesn't make any sense.  The original sin
   here is mine and I've regretted it for years.  This moves us to calling
   it "zero" instead, which is correct!

APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go.  As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.

Included in this patch are changes to a bunch of the codebase, but there are
more.  We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.

Differential Revision: https://reviews.llvm.org/D109483
2021-09-09 09:50:24 -07:00
Akira Hatanaka 59cc39ae14 [ObjC][ARC] Use the addresses of the ARC runtime functions instead of
integer 0/1 for the operand of bundle "clang.arc.attachedcall"

This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.

Differential Revision: https://reviews.llvm.org/D102996
2021-09-08 11:56:22 -07:00
Wang, Pengfei e6e8d25920 [X86][mingw] Modify the alignment of __m128/__m256/__m512 vector type for mingw
This is a follow up patch after D78564 and D108887.

Martin helped to confirm the alignment in GCC mingw is the same as the
size of vector. https://reviews.llvm.org/D108887#inline-1040893

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D109265
2021-09-06 20:28:09 +08:00
Qiu Chaofan fae0dfa642 [Clang] Add __ibm128 type to represent ppc_fp128
Currently, we have no front-end type for ppc_fp128 type in IR. PowerPC
target generates ppc_fp128 type from long double now, but there's option
(-mabi=(ieee|ibm)longdouble) to control it and we're going to do
transition from IBM extended double-double ppc_fp128 to IEEE fp128 in
the future.

This patch adds type __ibm128 which always represents ppc_fp128 in IR,
as what GCC did for that type. Without this type in Clang, compilation
will fail if compiling against future version of libstdcxx (which uses
__ibm128 in headers).

Although all operations in backend for __ibm128 is done by software,
only PowerPC enables support for it.

There's something not implemented in this commit, which can be done in
future ones:

- Literal suffix for __ibm128 type. w/W is suitable as GCC documented.
- __attribute__((mode(IF))) should be for __ibm128.
- Complex __ibm128 type.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D93377
2021-09-06 18:00:58 +08:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
Kazuaki Ishizaki 8f77dc459e [clang] NFC: Fix trivial typo in comments and document
`the the` -> `the`

Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D77470
2021-09-04 12:59:42 +05:30
PeixinQiao a42380ce83 [OMPIRBuilder] Add ordered directive to OMPBuilder
Add support for ordered directive in the OpenMPIRBuilder.

This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.

Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").

Reviewed By: Meinersbur, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107430
2021-09-03 09:37:58 +08:00
David Blaikie 5fb3f43778 Fully qualify template template parameters when printing
I discovered this quirk when working on some DWARF - AST printing prints
type template parameters fully qualified, but printed template template
parameters the way they were written syntactically, or wholely
unqualified - instead, we should print them consistently with the way we
print type template parameters: fully qualified.

The one place this got weird was for partial specializations like in
ast-print-temp-class.cpp - hence the need for checking for
TemplateNameDependenceScope::DependentInstantiation template template
parameters. (not 100% sure that's the right solution to that, though -
open to ideas)

Differential Revision: https://reviews.llvm.org/D108794
2021-09-02 15:04:34 -07:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore i take it upon myself
to revert the tree back to last known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
Roman Lebedev 50634deaa5
Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling."
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "LLVMFrontendOpenMP" of type SHARED_LIBRARY
    depends on "LLVMPasses" (weak)
  "LLVMipo" of type SHARED_LIBRARY
    depends on "LLVMFrontendOpenMP" (weak)
  "LLVMCoroutines" of type SHARED_LIBRARY
    depends on "LLVMipo" (weak)
  "LLVMPasses" of type SHARED_LIBRARY
    depends on "LLVMCoroutines" (weak)
    depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

This reverts commit 707ce34b06.
2021-09-02 12:42:23 +03:00
Michael Kruse 707ce34b06 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-02 02:37:25 -05:00
Erich Keane 42ae7eb581 Ensure field-annotations on pointers properly match the AS of the field.
Discovered in SYCL, the field annotations were always cast to an i8*,
which is an invalid bitcast for a pointer type with an address space.
This patch makes sure that we create an intrinsic that takes a pointer
to the correct address-space and properly do our casts.

Differential Revision: https://reviews.llvm.org/D109003
2021-09-01 06:12:24 -07:00
Joel E. Denny 83ddfa0d22 [OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2)
This patch implements Clang support for an original OpenMP extension
we have developed to support OpenACC: the `ompx_hold` map type
modifier.  The next patch in this series, D106510, implements OpenMP
runtime support.

Consider the following example:

```
 #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x
 {
   foo(); // might have map(delete: x)
   #pragma omp target map(present, alloc: x) // x is guaranteed to be present
   printf("%d\n", x);
 }
```

The `ompx_hold` map type modifier above specifies that the `target
data` directive holds onto the mapping for `x` throughout the
associated region regardless of any `target exit data` directives
executed during the call to `foo`.  Thus, the presence assertion for
`x` at the enclosed `target` construct cannot fail.  (As usual, the
standard OpenMP reference count for `x` must also reach zero before
the data is unmapped.)

Justification for inclusion in Clang and LLVM's OpenMP runtime:

* The `ompx_hold` modifier supports OpenACC functionality (structured
  reference count) that cannot be achieved in standard OpenMP, as of
  5.1.
* The runtime implementation for `ompx_hold` (next patch) will thus be
  used by Flang's OpenACC support.
* The Clang implementation for `ompx_hold` (this patch) as well as the
  runtime implementation are required for the Clang OpenACC support
  being developed as part of the ECP Clacc project, which translates
  OpenACC to OpenMP at the directive AST level.  These patches are the
  first step in upstreaming OpenACC functionality from Clacc.
* The Clang implementation for `ompx_hold` is also used by the tests
  in the runtime implementation.  That syntactic support makes the
  tests more readable than low-level runtime calls can.  Moreover,
  upstream Flang and Clang do not yet support OpenACC syntax
  sufficiently for writing the tests.
* More generally, the Clang implementation enables a clean separation
  of concerns between OpenACC and OpenMP development in LLVM.  That
  is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's
  extended OpenMP implementation and test suite without directly
  considering OpenACC's language and execution model, which can be
  handled by LLVM's OpenACC developers.
* OpenMP users might find the `ompx_hold` modifier useful, as in the
  above example.

See new documentation introduced by this patch in `openmp/docs` for
more detail on the functionality of this extension and its
relationship with OpenACC.  For example, it explains how the runtime
must support two reference counts, as specified by OpenACC.

Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new
command-line option introduced by this patch, is specified.

Reviewed By: ABataev, jdoerfert, protze.joachim, grokos

Differential Revision: https://reviews.llvm.org/D106509
2021-08-31 16:13:49 -04:00
Kazu Hirata b8debabb77 [clang] Remove redundant calls to c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-08-31 08:53:51 -07:00
Justas Janickas f9bc1b3bee [OpenCL] Defines helper function for kernel language compatible OpenCL version
This change defines a helper function getOpenCLCompatibleVersion()
inside LangOptions class. The function contains mapping between
C++ for OpenCL versions and their corresponding compatible OpenCL
versions. This mapping function should be updated each time a new
C++ for OpenCL language version is introduced. The helper function
is expected to simplify conditions on OpenCL C and C++ for OpenCL
versions inside compiler code.

Code refactoring performed.

Differential Revision: https://reviews.llvm.org/D108693
2021-08-31 10:08:38 +01:00
David Blaikie 4f3a92ca0a DebugInfo: Refactor/deduplicate various template argument list emission
Streamline template arguments across types, variables, and functions -
for convenient reuse in experiments related to template argument list
reconstitution (not including template argument lists in the "name" of
those entities, and leaving it to debug info consumers to rebuild the
full template name from the semantic descriptions of the argument lists)

But the change seems like a good refactoring/cleanup anyway.

I'd certainly be open to suggestions about how this might be more
streamlined - like is there no generic way to query template argument
lists across the 3 kinds of entities, rather than needing special case
code?
2021-08-30 22:39:46 -07:00
Andrei Elovikov 1724a16437 [NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp
...that is located under llvm/lib/Support/.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D108423
2021-08-30 09:48:48 -07:00
Steven Wan 73733ae526 TypeInfo records more information about align requirement
Extend the information preserved in `TypeInfo` by replacing the `AlignIsRequired` bool flag with a three-valued enum, the enum also indicates where the alignment attribute come from, which could be helpful in determining whether the attribute should overrule.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D108858
2021-08-28 19:47:48 -04:00
Yonghong Song 82d9cb34a2 [DebugInfo] convert btf_tag attrs to DI annotations for func parameters
Generate btf_tag annotations for DILocalVariable. The annotations
are represented as an DINodeArray in DebugInfo.

Differential Revision: https://reviews.llvm.org/D106620
2021-08-26 14:27:58 -07:00
Yonghong Song d2d7a90ced [DebugInfo] convert btf_tag attrs to DI annotations for DIGlobalVariable
Generate btf_tag annotations for DIGlobalVariable. The annotations
are represented as an DINodeArray in DebugInfo.

Differential Revision: https://reviews.llvm.org/D106619
2021-08-26 10:36:33 -07:00
Yonghong Song 2de051ba12 [DebugInfo] convert btf_tag attrs to DI annotations for DISubprograms
Generate btf_tag annotations for DISubprograms. The annotations
are represented as an DINodeArray in DebugInfo.

Differential Revision: https://reviews.llvm.org/D106618
2021-08-26 08:54:11 -07:00
Sindhu Chittireddy de15979bc3 Assert pointer cannot be null; NFC
Klocwork static code analysis exposed this concern:
Pointer 'SubExpr' returned from call to getSubExpr() function which may
return NULL from 'cast_or_null<Expr>(Operand)', which will be
dereferenced in the statement following it

Add an assert on SubExpr to make it clear this pointer cannot be null.
2021-08-26 06:58:56 -04:00
Alex Richardson 7cab90a7b1 Fix __attribute__((annotate("")) with non-zero globals AS
The existing code attempting to bitcast from a value in the default globals AS
to i8 addrspace(0)* was triggering an assertion failure in our downstream fork.
I found this while compiling poppler for CHERI-RISC-V (we use AS200 for all
globals). The test case uses AMDGPU since that is one of the in-tree targets
with a non-zero default globals address space.
The new test previously triggered a "Invalid constantexpr bitcast!" assertion
and now correctly generates code with addrspace(1) pointers.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D105972
2021-08-26 10:09:40 +01:00
Nick Desaulniers 846e562dcc [Clang] add support for error+warning fn attrs
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.

They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.

While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.

These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.

To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr).  Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.

The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.

The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.

Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106030
2021-08-25 10:34:18 -07:00
Jonas Hahnfeld ea08c4cd1c [CUDA] Fix static device variables with -fgpu-rdc
NVPTX does not allow dots in the identifier, so ptxas errors out with
   fatal   : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.

Differential Revision: https://reviews.llvm.org/D108456
2021-08-25 09:31:22 +02:00
Yi Kong 5fc4828aa6 [clang] Don't generate warn-stack-size when the warning is ignored
8ace121305 introduced a regression for code that explicitly ignores the
-Wframe-larger-than= warning. Make sure we don't generate the
warn-stack-size attribute for that case.

Differential Revision: https://reviews.llvm.org/D108686
2021-08-25 14:58:45 +08:00
Richard Smith cd4d6d718b PR48030: Fix COMDAT-related linking problem with C++ thread_local static data members.
Previously when emitting a C++ guarded initializer, we tried to work out what
the enclosing function would be used for and added it to the COMDAT containing
the variable if we thought that doing so would be correct. But this was done
from a context in which we didn't -- and realistically couldn't -- correctly
infer how the enclosing function would be used.

Instead, add the initialization function to a COMDAT from the code that
creates it, in the case where it makes sense to do so: when we know that
the one and only reference to the initialization function is in
@llvm.global.ctors and that reference is in the same COMDAT.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D108680
2021-08-24 19:53:44 -07:00
Bob Haarman 1c829ce1e3 [clang][codegen] Set CurLinkModule in CodeGenAction::ExecuteAction
CodeGenAction::ExecuteAction creates a BackendConsumer for the
purpose of handling diagnostics. The BackendConsumer's
DiagnosticHandlerImpl method expects CurLinkModule to be set,
but this did not happen on the code path that goes through
ExecuteAction. This change makes it so that the BackendConsumer
constructor used by ExecuteAction requires the Module to be
specified and passes the appropriate module in ExecuteAction.

The change also adds a test that fails without this change
and passes with it. To make the test work, the FIXME in the
handling of DK_Linker diagnostics was addressed so that warnings
and notes are no longer silently discarded. Since this introduces
a new warning diagnostic, a flag to control it (-Wlinker-warnings)
has also been added.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D108603
2021-08-24 21:25:49 +00:00
Wang, Pengfei c728bd5bba [X86] AVX512FP16 instructions enabling 5/6
Enable FP16 FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105268
2021-08-24 09:07:19 +08:00
Andrei Elovikov f5c2889488 [NFC][clang] Use X86 Features declaration from X86TargetParser
...instead of redeclaring them in clang's own X86Target.def. They were already
required to be in sync (IIUC), so no reason to maintain two identical lists.

Reviewed By: erichkeane, craig.topper

Differential Revision: https://reviews.llvm.org/D108151
2021-08-23 12:30:28 -07:00
Jon Chesterfield c2574e63ff [openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function

Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.

This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108380
2021-08-23 16:19:11 +01:00
Andy Wingo 4fb0c08342 [clang][CodeGen] GetDefaultAlignTempAlloca uses preferred alignment
This function was defaulting to use the ABI alignment for the LLVM
type.  Here we change to use the preferred alignment.  This will allow
unification with GetTempAlloca, which if alignment isn't specified, uses
the preferred alignment.

Differential Revision: https://reviews.llvm.org/D108450
2021-08-23 14:55:58 +02:00
Andy Wingo 8da70fed70 [clang][NFC] Tighten up code for GetGlobalVarAddressSpace
The LangAS local is only used in the OpenCL case; move its decl
inwards.

Differential Revision: https://reviews.llvm.org/D108449
2021-08-23 14:55:58 +02:00
Andy Wingo d3d4d98576 [clang][NFC] GetOrCreateLLVMGlobal takes LangAS
Pass a LangAS instead of a target address space to
GetOrCreateLLVMGlobal, to remove a place where the frontend assumes that
target address space 0 is special.

Differential Revision: https://reviews.llvm.org/D108445
2021-08-23 14:55:58 +02:00
Shilei Tian 2c6ffb4eb2 [NFC] clang-format -i clang/lib/CodeGen/CGStmtOpenMP.cpp 2021-08-22 22:57:05 -04:00
Simon Pilgrim 7f48bd3bed CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.
Don't pass the struct by value.
2021-08-22 12:13:17 +01:00
Wang, Pengfei b088536ce9 [X86] AVX512FP16 instructions enabling 4/6
Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267
2021-08-22 08:59:35 +08:00
Joseph Huber ec66ed79f4 [OpenMP] Correctly add member expressions to OpenMP info
Mapping expressions that have `this` as their base expression aren't
considered a valid base variable and the rest of the runtime expects
this. However, if we have an expression with no value declaration we can
try to extract it manually to provide more helpful debuggin information.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108483
2021-08-20 20:45:14 -04:00
Arthur Eubanks 644f88a25b [NFC] addAttribute(FunctionIndex) => addFnAttribute() 2021-08-20 14:18:59 -07:00
Yonghong Song 5ca7131eb3 [DebugInfo] convert btf_tag attrs to DI annotations for record fields
Generate btf_tag annotations for record fields. The annotations
are represented as an DINodeArray in DebugInfo.

Differential Revision: https://reviews.llvm.org/D106616
2021-08-20 12:52:51 -07:00
Jon Chesterfield b1efeface7 Revert "[openmp][nfc] Refactor GridValues"
Failed a nvptx codegen test
This reverts commit 2a47a84b40.
2021-08-20 18:17:27 +01:00
Thomas Lively 88962cea46 [WebAssembly] Restore builtins and intrinsics for pmin/pmax
Partially reverts 85157c0079, which had removed these builtins and intrinsics
in favor of normal codegen patterns. It turns out that it is possible for the
patterns to be split over multiple basic blocks, however, which means that DAG
ISel is not able to select them to the pmin/pmax instructions. To make sure the
SIMD intrinsics generate the correct instructions in these cases, reintroduce
the clang builtins and corresponding LLVM intrinsics, but also keep the normal
pattern matching as well.

Differential Revision: https://reviews.llvm.org/D108387
2021-08-20 09:21:31 -07:00
Jon Chesterfield 2a47a84b40 [openmp][nfc] Refactor GridValues
Remove redundant fields and replace pointer with virtual function

Of fourteen fields, three are dead and four can be computed from the
remainder. This leaves a couple of currently dead fields in place as
they are expected to be used from the deviceRTL shortly. Two of the
fields that can be computed are only used from codegen and require a
log2() implementation so are inlined into codegen instead.

This change leaves the new methods in the same location in the struct
as the previous fields for convenience at review.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108380
2021-08-20 16:41:26 +01:00
Alexander Potapenko b0391dfc73 [clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.

The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.

Differential Revision: https://reviews.llvm.org/D108029
2021-08-20 14:01:06 +02:00
Yonghong Song cab12fc28c [DebugInfo] convert btf_tag attrs to annotations for DIComposite types
Clang patch D106614 added attribute btf_tag support. This patch
generates btf_tag annotations for DIComposite types.
Each btf_tag annotation is represented as a 2D array of
meta strings. Each record may have more than one
btf_tag annotations.

Differential Revision: https://reviews.llvm.org/D106615
2021-08-19 18:01:29 -07:00
Jon Chesterfield 77579b99e9 [openmp][nfc] Replace OMPGridValues array with struct
[nfc] Replaces enum indices into an array with a struct. Named the
fields to match the enum, leaves memory layout and initialization unchanged.

Motivation is to later safely remove dead fields and replace redundant ones
with (compile time) computation. It should also be possible to factor some
common fields into a base and introduce a gfx10 amdgpu instance with less
duplication than the arrays of integers require.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D108339
2021-08-19 13:25:42 +01:00
Sven van Haastregt 7bda1a0711 [OpenCL] Fix as_type(vec3) invalid store creation
With -fpreserve-vec3-type enabled, a cast was not created when
converting from a vec3 type to a non-vec3 type, even though a
conversion to vec4 was performed.  This resulted in creation of
invalid store instructions.

Differential Revision: https://reviews.llvm.org/D107963
2021-08-19 11:57:09 +01:00
Bjorn Pettersson 36d5138619 [NewPM] Make some sanitizer passes parameterized in the PassRegistry
Refactored implementation of AddressSanitizerPass and
HWAddressSanitizerPass to use pass options similar to passes like
MemorySanitizerPass. This makes sure that there is a single mapping
from class name to pass name (needed by D108298), and options like
-debug-only and -print-after makes a bit more sense when (despite
that it is the unparameterized pass name that should be used in those
options).

A result of the above is that some pass names are removed in favor
of the parameterized versions:
- "khwasan" is now "hwasan<kernel;recover>"
- "kasan" is now "asan<kernel>"
- "kmsan" is now "msan<kernel>"

Differential Revision: https://reviews.llvm.org/D105007
2021-08-19 12:43:37 +02:00
Martin Storsjö cc3affd8b0 [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64
The code is based on the same __mulh and __umulh intrinsics for
x86.

This should fix PR51128.

Differential Revision: https://reviews.llvm.org/D106721
2021-08-19 11:29:55 +03:00
Rong Xu 5fdaaf7fd8 [SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loader
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile
loader. We have two profile loaders for FS profile,
one before RegAlloc and one before BlockPlacement.

To enable it, when -fprofile-sample-use=<profile> is specified,
add "-enable-fs-discriminator=true \
     -disable-ra-fsprofile-loader=false \
     -disable-layout-fsprofile-loader=false"
to turn on the FS profile loaders.

Differential Revision: https://reviews.llvm.org/D107878
2021-08-18 18:37:35 -07:00
Arthur Eubanks 3f4d00bc3b [NFC] More get/removeAttribute() cleanup 2021-08-17 21:05:41 -07:00
Arthur Eubanks de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Arthur Eubanks ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
Wang, Pengfei 5aeca3b0a5 [CFE][X86] Enable complex _Float16 support
Support complex _Float16 on X86 in C/C++ following the latest X86 psABI. (https://gitlab.com/x86-psABIs)

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105331
2021-08-18 11:16:14 +08:00
Wang, Pengfei 2379949aad [X86] AVX512FP16 instructions enabling 3/6
Enable FP16 conversion instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105265
2021-08-18 09:03:41 +08:00
Dylan Fleming ef198cd99e [SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute
Removed AArch64 usage of the getMaxVScale interface, replacing it with
the vscale_range(min, max) IR Attribute.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D106277
2021-08-17 14:42:47 +01:00
Wang, Pengfei f1de9d6dae [X86] AVX512FP16 instructions enabling 2/6
Enable FP16 binary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105264
2021-08-15 08:56:33 +08:00
Arthur Eubanks 8e9ffa1dc6 [NFC] Cleanup callers of AttributeList::hasAttributes()
AttributeList::hasAttributes() is confusing, use clearer methods like
hasFnAttrs().
2021-08-13 12:16:52 -07:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Arthur Eubanks 92ce6db9ee [NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()
This is more consistent with similar methods.
2021-08-13 11:09:18 -07:00
Michael Kruse b1de32d6dd [OMPIRBuilder] Clarify CanonicalLoopInfo. NFC.
Add in-source documentation on how CanonicalLoopInfo is intended to be used. In particular, clarify what parts of a CanonicalLoopInfo is considered part of the loop, that those parts must be side-effect free, and that InsertPoints to instructions outside those parts can be expected to be preserved after method calls implementing loop-associated directives.

CanonicalLoopInfo are now invalidated after it does not describe canonical loop anymore and asserts when trying to use it afterwards.

In addition, rename `createXYZWorkshareLoop` to `applyXYZWorkshareLoop` and remove the update location to avoid that the impression that they insert something from scratch at that location where in reality its InsertPoint is ignored. createStaticWorkshareLoop does not return a CanonicalLoopInfo anymore. First, it was not a canonical loop in the clarified sense (containing side-effects in form of calls to the OpenMP runtime). Second, it is ambiguous which of the two possible canonical loops it should actually return. It will not be needed before a feature expected to be introduced in OpenMP 6.0

Also see discussion in D105706.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D107540
2021-08-12 21:02:19 -05:00
Arnold Schwaighofer 9eb99d2e73 CodeGen: No need to check for isExternC if HasStrictReturn is already false
NFC intended.

Differential Revision: https://reviews.llvm.org/D107841
2021-08-11 07:42:48 -07:00
Wang, Pengfei 6f7f5b54c8 [X86] AVX512FP16 instructions enabling 1/6
1. Enable FP16 type support and basic declarations used by following patches.
2. Enable new instructions VMOVW and VMOVSH.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105263
2021-08-10 12:46:01 +08:00
Michael Liao 6ec36d18ec [cuda] Mark builtin texture/surface reference variable as 'externally_initialized'.
- They need to be preserved even if there's no reference within the
  device code as the host code may need to initialize them based on the
  application logic.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107718
2021-08-09 13:27:40 -04:00
Roger Ferrer Ibanez bfb77364d0 [OpenMP] Fix accidental reuse of VLA size
We were using an OpaqueValueExpr allocated on the stack to store
the size of a VLA. Because the VLASizeMap in CodegenFunction
uses the address of the expression to avoid recomputing VLAs,
we were accidentally reusing an earlier llvm::Value. This led to
invalid LLVM IR.

This is a temporary solution until VLASizeMap can be pushed and popped
based on the context.

Differential Revision: https://reviews.llvm.org/D107666
2021-08-07 05:55:27 +00:00
Joseph Huber 41a6b50c25 [OpenMP]Fix PR51349: Remove AlwaysInline for if regions.
After D94315 we add the `NoInline` attribute to the outlined function to handle
data environments in the OpenMP if clause. This conflicted with the `AlwaysInline`
attribute added to the outlined function. for better performance in D106799.
The data environments should ideally not require NoInline, but for now this
fixes PR51349.

Reviewed By: mikerice

Differential Revision: https://reviews.llvm.org/D107649
2021-08-06 17:53:04 -04:00
Serge Pavlov 4c4093e6e3 Introduce intrinsic llvm.isnan
This is recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in call of getFastMathFlags (base type should be FPMathOperator
but not Instruction). The original commit message is duplicated below:

    Clang has builtin function '__builtin_isnan', which implements C
    library function 'isnan'. This function now is implemented entirely in
    clang codegen, which expands the function into set of IR operations.
    There are three mechanisms by which the expansion can be made.

    * The most common mechanism is using an unordered comparison made by
      instruction 'fcmp uno'. This simple solution is target-independent
      and works well in most cases. It however is not suitable if floating
      point exceptions are tracked. Corresponding IEEE 754 operation and C
      function must never raise FP exception, even if the argument is a
      signaling NaN. Compare instructions usually does not have such
      property, they raise 'invalid' exception in such case. So this
      mechanism is unsuitable when exception behavior is strict. In
      particular it could result in unexpected trapping if argument is SNaN.

    * Another solution was implemented in https://reviews.llvm.org/D95948.
      It is used in the cases when raising FP exceptions by 'isnan' is not
      allowed. This solution implements 'isnan' using integer operations.
      It solves the problem of exceptions, but offers one solution for all
      targets, however some can do the check in more efficient way.

    * Solution implemented by https://reviews.llvm.org/D96568 introduced a
      hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
      specific code into IR. Now only SystemZ implements this hook and it
      generates a call to target specific intrinsic function.

    Although these mechanisms allow to implement 'isnan' with enough
    efficiency, expanding 'isnan' in clang has drawbacks:

    * The operation 'isnan' is hidden behind generic integer operations or
      target-specific intrinsics. It complicates analysis and can prevent
      some optimizations.

    * IR can be created by tools other than clang, in this case treatment
      of 'isnan' has to be duplicated in that tool.

    Another issue with the current implementation of 'isnan' comes from the
    use of options '-ffast-math' or '-fno-honor-nans'. If such option is
    specified, 'fcmp uno' may be optimized to 'false'. It is valid
    optimization in general, but it results in 'isnan' always returning
    'false'. For example, in some libc++ implementations the following code
    returns 'false':

        std::isnan(std::numeric_limits<float>::quiet_NaN())

    The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
    operands are never NaNs. This assumption however should not be applied
    to the functions that check FP number properties, including 'isnan'. If
    such function returns expected result instead of actually making
    checks, it becomes useless in many cases. The option '-ffast-math' is
    often used for performance critical code, as it can speed up execution
    by the expense of manual treatment of corner cases. If 'isnan' returns
    assumed result, a user cannot use it in the manual treatment of NaNs
    and has to invent replacements, like making the check using integer
    operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
    which also expresses the opinion, that limitations imposed by
    '-ffast-math' should be applied only to 'math' functions but not to
    'tests'.

    To overcome these drawbacks, this change introduces a new IR intrinsic
    function 'llvm.isnan', which realizes the check as specified by IEEE-754
    and C standards in target-agnostic way. During IR transformations it
    does not undergo undesirable optimizations. It reaches instruction
    selection, where is lowered in target-dependent way. The lowering can
    vary depending on options like '-ffast-math' or '-ffp-model' so the
    resulting code satisfies requested semantics.

    Differential Revision: https://reviews.llvm.org/D104854
2021-08-06 14:32:27 +07:00
Fangrui Song c38efb4899 [clang] Implement -falign-loops=N (N is a power of 2) for non-LTO
GCC supports multiple forms of -falign-loops=.
-falign-loops= is currently ignored in Clang.

This patch implements the simplest but the most useful form where N is a
power of 2.

The underlying implementation uses a `llvm::TargetOptions` option for now.
Bitcode generation ignores this option.

Differential Revision: https://reviews.llvm.org/D106701
2021-08-05 12:17:50 -07:00
Anshil Gandhi 39dac1f7f6 [clang] Add clang builtins support for gfx90a
Implement target builtins for gfx90a including fadd64, fadd32, add2h,
max and min on various global, flat and ds address spaces for which
intrinsics are implemented.

Differential Revision: https://reviews.llvm.org/D106909
2021-08-05 02:08:06 -06:00
Pavel Asyutchenko 7df405e079 Apply -fmacro-prefix-map to __builtin_FILE()
This matches the behavior of GCC.
Patch does not change remapping logic itself, so adding one simple smoke test should be enough.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107393
2021-08-04 16:42:14 -07:00
Bradley Smith e57e1e4e00 [clang][AArch64][SVE] Avoid going through memory for fixed/scalable predicate casts
For fixed SVE types, predicates are represented using vectors of i8,
where as for scalable types they are represented using vectors of i1. We
can avoid going through memory for casts between these by bitcasting the
i1 scalable vectors to/from a scalable i8 vector of matching size, which
can then use the existing vector insert/extract logic.

Differential Revision: https://reviews.llvm.org/D106860
2021-08-04 16:10:37 +00:00
Serge Pavlov 0c28a7c990 Revert "Introduce intrinsic llvm.isnan"
This reverts commit 16ff91ebcc.
Several errors were reported mainly test-suite execution time. Reverted
for investigation.
2021-08-04 17:18:15 +07:00
Serge Pavlov 16ff91ebcc Introduce intrinsic llvm.isnan
Clang has builtin function '__builtin_isnan', which implements C
library function 'isnan'. This function now is implemented entirely in
clang codegen, which expands the function into set of IR operations.
There are three mechanisms by which the expansion can be made.

* The most common mechanism is using an unordered comparison made by
  instruction 'fcmp uno'. This simple solution is target-independent
  and works well in most cases. It however is not suitable if floating
  point exceptions are tracked. Corresponding IEEE 754 operation and C
  function must never raise FP exception, even if the argument is a
  signaling NaN. Compare instructions usually does not have such
  property, they raise 'invalid' exception in such case. So this
  mechanism is unsuitable when exception behavior is strict. In
  particular it could result in unexpected trapping if argument is SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948.
  It is used in the cases when raising FP exceptions by 'isnan' is not
  allowed. This solution implements 'isnan' using integer operations.
  It solves the problem of exceptions, but offers one solution for all
  targets, however some can do the check in more efficient way.

* Solution implemented by https://reviews.llvm.org/D96568 introduced a
  hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
  specific code into IR. Now only SystemZ implements this hook and it
  generates a call to target specific intrinsic function.

Although these mechanisms allow to implement 'isnan' with enough
efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or
  target-specific intrinsics. It complicates analysis and can prevent
  some optimizations.

* IR can be created by tools other than clang, in this case treatment
  of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such option is
specified, 'fcmp uno' may be optimized to 'false'. It is valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption however should not be applied
to the functions that check FP number properties, including 'isnan'. If
such function returns expected result instead of actually making
checks, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
by the expense of manual treatment of corner cases. If 'isnan' returns
assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion, that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by IEEE-754
and C standards in target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where is lowered in target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
2021-08-04 15:27:49 +07:00
Alexandros Lamprineas 29b263a34f [Clang][AArch64] Inline assembly support for the ACLE type 'data512_t'
In LLVM IR terms the ACLE type 'data512_t' is essentially an aggregate
type { [8 x i64] }. When emitting code for inline assembly operands,
clang tries to scalarize aggregate types to an integer of the equivalent
length, otherwise it passes them by-reference. This patch adds a target
hook to tell whether a given inline assembly operand is scalarizable
so that clang can emit code to pass/return it by-value.

Differential Revision: https://reviews.llvm.org/D94098
2021-07-31 09:51:28 +01:00
Tarindu Jayatilaka 7a797b2902 Take OptimizationLevel class out of Pass Builder
Pulled out the OptimizationLevel class from PassBuilder in order to be able to access it from within the PassManager and avoid include conflicts.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D107025
2021-07-29 21:57:23 -07:00
Melanie Blower bc5b5ea037 [clang][patch][FPEnv] Make initialization of C++ globals strictfp aware
@kpn pointed out that the global variable initialization functions didn't
have the "strictfp" metadata set correctly, and @rjmccall said that there
was buggy code in SetFPModel and StartFunction, this patch is to solve
those problems. When Sema creates a FunctionDecl, it sets the
FunctionDeclBits.UsesFPIntrin to "true" if the lexical FP settings
(i.e. a combination of command line options and #pragma float_control
settings) correspond to ConstrainedFP mode. That bit is used when CodeGen
starts codegen for a llvm function, and it translates into the
"strictfp" function attribute. See bugs.llvm.org/show_bug.cgi?id=44571

Reviewed By: Aaron Ballman

Differential Revision: https://reviews.llvm.org/D102343
2021-07-29 12:02:37 -04:00
Kai Luo e4902e69e9 [PowerPC] Fix return type of XL compat CAS
`__compare_and_swap*` should return `i32` rather than `i1`.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D107077
2021-07-29 14:49:26 +00:00
Fangrui Song 828767f325 COFF/ELF: Place llvm.global_ctors elements in llvm.used if comdat is used
On ELF, an SHT_INIT_ARRAY outside a section group is a GC root. The current
codegen abuses SHT_INIT_ARRAY in a section group to mean a GC root.

On PE/COFF, the dynamic initialization for `__declspec(selectany)` in a comdat
can be garbage collected by `-opt:ref`.

Call `addUsedGlobal` for the two cases to fix the abuse/bug.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106925
2021-07-28 11:44:19 -07:00
Matheus Izvekov 4819b751bd [clang] NFC: change uses of `Expr->getValueKind` into `is?Value`
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D100733
2021-07-28 03:09:31 +02:00
Jose M Monsalve Diaz 0276db1416 [OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions
The device runtime contains several calls to __kmpc_get_hardware_num_threads_in_block
and __kmpc_get_hardware_num_blocks. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In commit D106033 we have the optimization phase. This commit adds the attributes to
the outlined function for the grid size. the two attributes are `omp_target_num_teams` and
`omp_target_thread_limit`. These values are added as long as they are constant.

Two functions are created `getNumThreadsExprForTargetDirective` and
`getNumTeamsExprForTargetDirective`. The original functions `emitNumTeamsForTargetDirective`
 and `emitNumThreadsForTargetDirective` identify the expresion and emit the code.
However, for the Device version of the outlined function, we cannot emit anything.
Therefore, this is a first attempt to separate emision of code from deduction of the
values.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106298
2021-07-27 17:21:04 -04:00
Thomas Lively 33786576fd [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Reid Kleckner f9f56488e0 [DebugInfo] Use per-enumerator signedness for DIEnumerator
Allegedly the DWARF backend ignores this field of DIEnumerator, but we
set it nonetheless in case we decide to use it in the future.
Alternatively, we could remove it, but it is simpler to pass down the
signed bit as it is in the AST for now.

Implemented to address comments on D106585
2021-07-26 16:14:28 -07:00
Joseph Huber af000197c4 [OpenMP] Always inline the OpenMP outlined function
This patch adds the always inline attribute to the outlined functions generated
by OpenMP regions. Because there is only a single instance of this function and
it always has internal linkage it is safe to inline in every instance it is
created. This could potentially lead to performance degredation due to
inflated register counts in the parallel region.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106799
2021-07-26 17:27:59 -04:00
Reid Kleckner 3230493299 Fix clang debug info irgen of i128 enums
DIEnumerator stores an APInt as of April 2020, so now we don't need to
truncate the enumerator value to 64 bits. Fixes assertions during IRGen.

Split from D105320, thanks to Matheus Izvekov for the test case and
report.

Differential Revision: https://reviews.llvm.org/D106585
2021-07-26 12:25:29 -07:00
Nemanja Ivanovic 1c50a5da36 [PowerPC] Implement partial vector ld/st builtins for XL compatibility
XL provides functions __vec_ldrmb/__vec_strmb for loading/storing a
sequence of 1 to 16 bytes in big endian order, right justified in the
vector register (regardless of target endianness).
This is equivalent to vec_xl_len_r/vec_xst_len_r which are only
available on Power9.

This patch simply uses the Power9 functions when compiled for Power9,
but provides a more general implementation for Power8.

Differential revision: https://reviews.llvm.org/D106757
2021-07-26 13:19:52 -05:00
Shilei Tian 3274cdc83e [Clang][OpenMP] Remove the mandatory flush for capture for OpenMP 5.1
In OpenMP 5.1:
> If the `write` or `update` clause is specifieded, the atomic operation is not an atomic conditional update for which the comparison fails, and the effective memory ordering is `release`, `acq_rel`, or `seq_cst`, the strong flush on entry to the atomic operation is also a release flush. If the `read` or `update` clause is specified and the effective memory ordering is `acquire`, `acq_rel`, or `seq_cst` then the strong flush on exit from the atomic operation is also an acquire flush.

In OpenMP 5.0:
> If the `write`, `update`, or **`capture`** clause is specified and the `release`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on entry to the atomic operation is also a release flush. If the `read` or `capture` clause is specified and the `acquire`, `acq_rel`, or `seq_cst` clause is specified then the strong flush on exit from the atomic operation is also an acquire flush.

From my understanding, in OpenMP 5.1, `capture` is removed from the requirement for flush, therefore we don't have to enforce it.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D100768
2021-07-26 11:00:44 -04:00
Thomas Lively 85157c0079 [WebAssembly] Codegen for pmin and pmax
Replace the clang builtins and LLVM intrinsics for {f32x4,f64x2}.{pmin,pmax}
with standard codegen patterns. Since wasm_simd128.h uses an integer vector as
the standard single vector type, the IR for the pmin and pmax intrinsic
functions contains bitcasts that would not be there otherwise. Add extra codegen
patterns that can still select the pmin and pmax instructions in the presence of
these bitcasts.

Differential Revision: https://reviews.llvm.org/D106612
2021-07-23 14:49:21 -07:00
Fangrui Song 7290ddd6b1 Revert "[clang] -falign-loops="
This reverts commit 42896eeed9.

Unfinished. Accidentally pushed when reverting a clangd commit.
2021-07-23 09:58:35 -07:00
Fangrui Song 42896eeed9 [clang] -falign-loops= 2021-07-23 09:50:43 -07:00
Yaxun (Sam) Liu 44dbbe6106 [HIP] Preserve ASAN bitcode library functions
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.

To solve this issue, clang emits a reference to a bicode library
function which calls all ASAN functions which need to be
preserved. This basically force all ASAN functions to be
linked in.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D106315
2021-07-23 10:35:52 -04:00
Kai Luo e4ed93cb25 [PowerPC] Implement XL compatible behavior of __compare_and_swap
According to https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp
XL's `__compare_and_swap` has a weird behavior that

> In either case, the contents of the memory location specified by addr are copied into the memory location specified by old_val_addr.

(unlike c11 `atomic_compare_exchange` specified in http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf)

This patch let clang's implementation follow this behavior.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106344
2021-07-23 01:16:02 +00:00
Florian Mayer 96c63492cb [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 16:20:27 -07:00
Alexey Bataev b88a68c45e [OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments.
Added missed arguments in
__tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime
functions calls.

Differential Revision: https://reviews.llvm.org/D106542
2021-07-22 08:44:37 -07:00
Alexey Bataev f828f0a90f Revert "[OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments."
This reverts commit b455f7f225 to fix
buildbots.
2021-07-22 08:06:29 -07:00
Alexey Bataev b455f7f225 [OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments.
Added missed arguments in
__tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime
functions calls.

Differential Revision: https://reviews.llvm.org/D106542
2021-07-22 07:53:37 -07:00
Florian Mayer 789a4a2e5c Revert "[hwasan] Use stack safety analysis."
This reverts commit bde9415fef.
2021-07-22 12:16:16 +01:00
Florian Mayer bde9415fef [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-22 12:04:54 +01:00
Simon Tatham bd41136746 [clang] Use i64 for the !srcloc metadata on asm IR nodes.
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.

Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.

The compiler will tolerate both of i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)

Original version of the patch by Mikhail Maltsev.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D105491
2021-07-22 10:24:52 +01:00
Joseph Huber 754eb1c210 [OpenMP] Change `__kmpc_free_shared` to include the paired allocation size
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106496
2021-07-21 20:56:21 -04:00
Thomas Lively 8af333cf1a [WebAssembly] Replace @llvm.wasm.popcnt with @llvm.ctpop.v16i8
Use the standard target-independent intrinsic to take advantage of standard
optimizations.

Differential Revision: https://reviews.llvm.org/D106506
2021-07-21 16:45:54 -07:00
Thomas Lively db7efcab7d [WebAssembly] Remove clang builtins for extract_lane and replace_lane
These builtins were added to capture the fact that the underlying Wasm
instructions return i32s and implicitly sign or zero extend the extracted lanes
in the case of the i8x16 and i16x8 variants. But we do sufficient optimizations
during code gen that these low-level details do not need to be exposed to users.

This commit replaces the use of the builtins in wasm_simd128.h with normal
target-independent vector code. As a result, we can switch the relevant
intrinsics to use functions rather than macros and can use more user-friendly
return types rather than trying to precisely expose the underlying Wasm types.
Note, however, that the generated LLVM IR is no different after this change.

Differential Revision: https://reviews.llvm.org/D106500
2021-07-21 16:11:00 -07:00
Thomas Lively 1a57ee1276 [WebAssembly] Codegen for v128.load{32,64}_zero
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal instruction selection patterns. The wasm_simd128.h
intrinsics header was already using portable code for the corresponding
intrinsics, so now it produces the correct instructions.

Differential Revision: https://reviews.llvm.org/D106400
2021-07-21 09:02:12 -07:00
Quinn Pham e002d251dd [PowerPC] Floating Point Builtins for XL Compat.
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations

Reviewed By: #powerpc, nemanjai, amyk, NeHuang

Differential Revision: https://reviews.llvm.org/D103986
2021-07-21 08:33:39 -05:00
Simon Tatham 21401a7262 [clang] Introduce SourceLocation::[U]IntTy typedefs.
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

NFC: this patch introduces typedefs for the integer type used by
SourceLocation and makes all the boring changes to use the typedefs
everywhere, but for the moment, they are unconditionally defined to
uint32_t.

Patch originally by Mikhail Maltsev.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D105492
2021-07-21 10:45:46 +01:00
Albion Fung 2fd1520247 [PowerPC] Implemented mtmsr, mfspr, mtspr Builtins
Implemented builtins for mtmsr, mfspr, mtspr on PowerPC;
the patch is intended for XL Compatibility.

Differential revision: https://reviews.llvm.org/D106130
2021-07-20 17:51:00 -05:00
Albion Fung 3434ac9e39 [PowerPC] Store, load, move from and to registers related builtins
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.

Differential revision: https://reviews.llvm.org/D105946
2021-07-20 15:46:14 -05:00
Victor Huang 1a762f93f8 [PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D105194
2021-07-20 13:06:22 -05:00
Melanie Blower ea864c9933 [clang][patch][NFC] Refactor calculation of FunctionDecl to avoid duplicate code 2021-07-20 11:01:22 -04:00
Quinn Pham fd855c24c7 [PowerPC] Restore FastMathFlags of Builder for Vector FDiv Builtins
This patch fixes `__builtin_ppc_recipdivf`, `__builtin_ppc_recipdivd`,
`__builtin_ppc_rsqrtf`, and `__builtin_ppc_rsqrtd`. FastMathFlags are
set to fast immediately before emitting these builtins. Now the flags
are restored to their previous values after the builtins are emitted.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D105984
2021-07-20 09:41:00 -05:00
Stefan Pintilie 02cd937945 [PowerPC][Builtins] Added a number of builtins for compatibility with XL.
Added a number of different builtins that exist in the XL compiler. Most of
these builtins already exist in clang under a different name.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D104386
2021-07-20 08:57:55 -05:00
Florian Mayer 5f08219322 Revert "[hwasan] Use stack safety analysis."
This reverts commit e9c63ed10b.
2021-07-20 10:36:46 +01:00
Florian Mayer e9c63ed10b [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-20 10:06:35 +01:00
Quinn Pham 0268e123be [PowerPC] swdiv_nochk Builtins for XL Compat
This patch is in a series of patches to provide builtins for
compatibility with the XL compiler. This patch adds software divide
builtins with no checking. These builtins are each emitted as a fast
fdiv.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D106150
2021-07-19 16:51:10 -05:00
Giorgis Georgakoudis fb0cf01795 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit e9c7291cb2.

Fix failing tests
2021-07-19 07:54:26 -07:00
Jamie Schmeiser 73840f9f81 thread_local support for AIX
Summary:
The AIX linker will produce errors on unresolved weak symbols.  Change the
generated code to not check for the initialization function but just call
it and ensure that it always exists.  Also, the AIX atexit routine has a
different name (and signature) so call it correctly.  Update the lit tests
to test on AIX appropriately.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: hubert.reinterpretcast (Hubert Tong)
Differential Revision: https://reviews.llvm.org/D104420
2021-07-19 10:03:22 -04:00
Florian Mayer 807d50100c Revert "[hwasan] Use stack safety analysis."
This reverts commit 12268fe14a.
2021-07-19 12:08:32 +01:00
Florian Mayer 12268fe14a [hwasan] Use stack safety analysis.
This avoids unnecessary instrumentation.

Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D105703
2021-07-19 11:54:44 +01:00
David Blaikie dac582ad3a DebugInfo: Name class templates with default arguments consistently (both direct naming, and as a template argument for a function template)
It's noteworthy that GCC has the same bug here, which is a bit
surprising. Both Clang and GCC's bug is only for function template
arguments that are themselves templates with default template arguments
(f1<t1<int[, missing_default_here]>>). Probably because function name
matching isn't generally necessary - whereas type matching is necessary
for DWARF consumers to associate declarations and definitions across
translation units, so the bug's been addressed there already - but
continued to exist for function templates since it's fairly benign
there.

I came across this while working on a change that could reconstitute
these pretty printed names based on the rest of the DWARF, reducing the
size of the DWARF by not having to encode all the template parameters in
the name string. That reconstitution code can't tell the difference
between a defaulted argument or not, so couldn't create the current
buggy-ish output.

Making the names more consistent between direct and indirect references,
and between function and class templates seems all to the good.

(I fixed the function template version of this a few years back in
9fdd09a4cc - clearly I should've looked
more closely and generalized the code better so it only had to be fixed
once - well, doing that here now)
2021-07-17 23:58:15 -07:00
Nikita Popov 2c68ecccc9 [OpaquePtr] Remove uses of CreateGEP() without element type
Remove uses of to-be-deprecated API. In cases where the correct
element type was not immediately obvious to me, fall back to
explicit getPointerElementType().
2021-07-17 22:56:27 +02:00
Nikita Popov 6225d0cc6e [OpaquePtr] Remove uses of CreateInBoundsGEP() without element type
Remove uses of to-be-deprecated API.

Unfortunately this one mostly just makes the use of
getPointerElementType() explicit, as the correct type to use
wasn't immediately available (deriving it from QualType is left
as an excercise to the reader).
2021-07-17 21:27:16 +02:00
Nikita Popov 4ace6008f2 [OpaquePtr] Remove uses of CreateStructGEP() without element type
Remove uses of to-be-deprecated API.
2021-07-17 18:48:21 +02:00
Nikita Popov 6d3e7c783b [OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.
2021-07-17 18:32:36 +02:00
Nikita Popov 5071360eb1 [OpaquePtr] Remove uses of CGF.Builder.CreateConstInBoundsGEP1_64() without type
Remove uses of to-be-deprecated API.
2021-07-17 17:07:46 +02:00
Nikita Popov 357756ecf6 [OpaquePtr] Remove uses of CreateConstGEP1_64() without element type
Remove uses of to-be-deprecated API.
2021-07-17 16:43:20 +02:00
Nikita Popov 4737eebc0d [OpaquePtr] Remove uses of CreateConstInBoundsGEP2_64() without type
Remove uses of to-be-deprecated API.
2021-07-17 16:42:10 +02:00
Giorgis Georgakoudis e9c7291cb2 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102107
2021-07-16 23:27:44 -07:00
Nemanja Ivanovic 35a18a981f [PowerPC] Implement intrinsics for mtfsf[i]
This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`).

The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands.

Reviewed By: #powerpc, qiucf

Differential Revision: https://reviews.llvm.org/D105957
2021-07-16 16:26:11 -05:00
Victor Huang 4eb107ccba [PowerPC] Add PowerPC population count, reversed load and store related builtins and instrinsics for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and instrisics for population
count, reversed load and store related operations.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D106021
2021-07-15 17:23:56 -05:00
Artem Belevich d774b4aa5e [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
That should allow clang to compile mma.h from CUDA-11.3.

Differential Revision: https://reviews.llvm.org/D105384
2021-07-15 12:02:09 -07:00
Quinn Pham de3956605a [PowerPC] Fix popcntb XL Compat Builtin for 32bit
This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins.

Reviewed By: #powerpc, nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D105360
2021-07-15 13:19:47 -05:00
Victor Huang d40e8091bd [PowerPC] Add PowerPC rotate related builtins and emit target independent code for XL compatibility
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and emit target independent
code for rotate related operations.

Reviewed By: nemanjai, #powerpc

Differential revision: https://reviews.llvm.org/D104744
2021-07-15 10:23:54 -05:00
Chuanqi Xu 8a1727ba51 [Coroutines] Run coroutine passes by default
This patch make coroutine passes run by default in LLVM pipeline. Now
the clang and opt could handle IR inputs containing coroutine intrinsics
without special options.
It should be fine. On the one hand, the coroutine passes seems to be stable
since there are already many projects using coroutine feature.
On the other hand, the coroutine passes should do nothing for IR who doesn't
contain coroutine intrinsic.

Test Plan: check-llvm

Reviewed by: lxfind, aeubanks

Differential Revision: https://reviews.llvm.org/D105877
2021-07-15 14:33:40 +08:00
Thomas Lively 4a4229f70f [WebAssembly] Codegen for v128.storeX_lane instructions
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.

Differential Revision: https://reviews.llvm.org/D106019
2021-07-14 16:15:25 -07:00
Thomas Lively 970e090010 [WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.

Differential Revision: https://reviews.llvm.org/D105950
2021-07-14 11:31:53 -07:00
Albion Fung f1aca5ac96 [PowerPC] Fix L[D|W]ARX Implementation
LDARX and LWARX sometimes gets optimized out by the compiler
when it is critical to the correctness of the code. This inline asm generation
ensures that it preserved.

Differential Revision: https://reviews.llvm.org/D105754
2021-07-13 11:02:07 -05:00
Dave MacLachlan 45ffe6341d [clang/objc] Optimize getters for non-atomic, copied properties
Properties that were declared `@property(copy, nonatomic) id foo` make an
unnecessary call to objc_get_property().  This call can be replaced with a
direct access to the backing variable identical to how a `@property(nonatomic)
id foo` would do it.

This reduces codegen by 4 bytes (x86_64/arm64) and removes a cross linkage unit
function call per property declared as copy/nonatomic.

Differential Revision: https://reviews.llvm.org/D105311
2021-07-13 09:22:13 -04:00
Thomas Lively cbabfc63b1 [WebAssembly] Custom combines for f32x4.demote_zero_f64x2
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.

Differential Revision: https://reviews.llvm.org/D105755
2021-07-12 10:32:18 -07:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Thomas Lively e5220104d0 [WebAssembly] Custom combines for f64x2.promote_low_f32x4
Replace the clang builtin function and LLVM intrinsic previously used to select
the f64x2.promote_low_f32x4 instruction with custom combines from standard
SelectionDAG nodes. Implement the new combines to share code with the similar
combines for f64x2.convert_low_i32x4_{s,u}. Resolves PR50232.

Differential Revision: https://reviews.llvm.org/D105675
2021-07-09 18:59:29 -07:00
Alexey Bataev ab8989ab87 [OPENMP]Fix overlapped mapping for dereferenced pointer members.
If the base is used in a map clause and later we have a memberexpr with
this base, and the member is a pointer, and this pointer is dereferenced
anyhow (subscript, array section, dereference, etc.), such components
should be considered as overlapped, otherwise it may lead to incorrect
size computations, since we try to map a pointee as a part of the whole
struct, which is not true for the pointer members.

Differential Revision: https://reviews.llvm.org/D105562
2021-07-09 12:51:26 -07:00
David Blaikie 768e3af634 PR51034: Debug Info: Remove 'prototyped' from K&R function declarations
Regression caused by 6c9559b67b.
2021-07-09 12:07:36 -07:00
Varun Gandhi 92dcb1d2db [Clang] Introduce Swift async calling convention.
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D95561
2021-07-09 11:50:10 -07:00
David Blaikie 1def2579e1 PR51018: Remove explicit conversions from SmallString to StringRef to future-proof against C++23
C++23 will make these conversions ambiguous - so fix them to make the
codebase forward-compatible with C++23 (& a follow-up change I've made
will make this ambiguous/invalid even in <C++23 so we don't regress
this & it generally improves the code anyway)
2021-07-08 13:37:57 -07:00
Nikita Popov a0ea367562 [CodeGen] Avoid nullptr arg to CreateStructGEP (NFC)
For now just make the getPointerElementType() explicit.
2021-07-08 21:21:43 +02:00
Alexey Bataev f57d396dca [OPENMP]Do no privatize const firstprivates in target regions.
No need to emit private copyfor firstprivate constants in target
regions, we can use the original copy instead.

Differential Revision: https://reviews.llvm.org/D105647
2021-07-08 11:55:37 -07:00
Nikita Popov 693251fb2f [CodeGen] Avoid CreateGEP with nullptr type (NFC)
In preparation for dropping support for it. I've replaced it with
a proper type where the correct type was obvious and left an
explicit getPointerElementType() where it wasn't.
2021-07-08 20:38:54 +02:00
Alexey Bataev b3c80dd894 [OPENMP]Remove const firstprivate allocation as a variable in a constant space.
Current implementation is not compatible with asynchronous target
regions, need to remove it.

Differential Revision: https://reviews.llvm.org/D105375
2021-07-07 05:56:48 -07:00
Hsiangkai Wang 593bf9b4de [Clang][RISCV] Implement vlseg and vlsegff.
Differential Revision: https://reviews.llvm.org/D103527
2021-07-07 13:44:40 +08:00
David Blaikie 6c9559b67b DebugInfo: Mangle K&R declarations for debug info linkage names
This fixes a gap in the `overloadable` attribute support (K&R declared
functions would get mangled symbol names, but that name wouldn't be
represented in the debug info linkage name field for the function) and
in -funique-internal-linkage-names (this came up in review discussion on
D98799) where K&R static declarations would not get the uniqued linkage
names.
2021-07-06 16:28:02 -07:00
Xiang1 Zhang a39bb960fc [X86] Refine code of generating BB labels in Keylocker
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D105336
2021-07-05 09:29:51 +08:00
Nikita Popov fabc17192e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Jun Ma 3afbf89804 [clang][AArch64][SVE] Handle PRValue under VLAT <-> VLST cast
This change fixes the crash that PRValue cannot be handled by
EmitLValue.

Differential Revision: https://reviews.llvm.org/D105097
2021-07-01 10:09:47 +08:00
Yaxun (Sam) Liu 434bd5bf54 [AMDGPU] Add builtin functions image_bvh_intersect_ray
Reviewed by: Stanislav Mekhanoshin, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D104946
2021-06-30 13:10:47 -04:00
Melanie Blower e773216f46 [clang][patch] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.

Reviewed By: aaron.ballman, kpn

Differential Revision: https://reviews.llvm.org/D100118
2021-06-30 09:58:06 -04:00
Akira Hatanaka 6cda73e3c4 [CodeGen] Add ParmVarDecls to FunctionDecls that are created to generate
ObjC property getter/setter functions

This is needed to prevent clang from crashing when we make the changes
proposed in https://reviews.llvm.org/D98799.

Differential Revision: https://reviews.llvm.org/D104883
2021-06-29 16:27:24 -07:00
Steffen Larsen 3644726a78 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.

PTX ISA description of

  - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
  - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
  - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
  - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma

Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D104847
2021-06-29 15:44:07 -07:00
Akira Hatanaka 8d21d54725 [CodeGen] Stop creating fake FunctionDecls when generating IR for
functions implicitly generated by the compiler

These fake functions would cause clang to crash if the changes proposed
in https://reviews.llvm.org/D98799 were made.
2021-06-29 14:22:33 -07:00
Akira Hatanaka 952944c12c [ObjC][ARC] Don't add operand bundle clang.arc.attachedcall to a call if
the call already has the operand bundle

This bug was causing the call to `replaceAllUsesWith` to crash because
the old call instruction and the new call instruction were the same.

rdar://74957948

Differential Revision: https://reviews.llvm.org/D97824
2021-06-29 10:23:01 -07:00
Bruno De Fraine 4d8871a898 PR50767: clear non-distinct debuginfo for function with nodebug definition after undecorated declaration
Fix suggested by Yuanfang Chen:

Non-distinct debuginfo is attached to the function due to the undecorated declaration. Later, when seeing the function definition and `nodebug` attribute, the non-distinct debuginfo should be cleared.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D104777
2021-06-29 10:26:45 +02:00
Xiang1 Zhang 6d234a6908 [X86] Zero some outputs of Kelocker intrinsics in error case
Reviewed By: WangPengfei

Differential Revision: https://reviews.llvm.org/D104766
2021-06-29 13:35:40 +08:00
Ben Shi c94c8d8b5d [AVR][clang] Fix wrong calling convention in functions return struct type
According to AVR ABI (https://gcc.gnu.org/wiki/avr-gcc), returned struct value
within size 1-8 bytes should be returned directly (via register r18-r25), while
larger ones should be returned via an implicit struct pointer argument.

Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D99237
2021-06-29 11:32:39 +08:00
Michael Liao 948308ef34 Fix `-Wunused-variable` warning. NFC. 2021-06-28 22:50:36 -04:00
Hongtao Yu 633ca3ff2f [UniqueLinkageName] Use exsiting GlobalDecl object instead of reconstructing one.
C++ constructors/destructors need to go through a different constructor to construct a GlobalDecl object in order to retrieve their linkage type. This causes an assert failure in the default constructor of GlobalDecl. I'm chaning it to using the exsiting GlobalDecl object.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102356
2021-06-28 14:50:41 -07:00
Melanie Blower c27e5a2a8e Revert "[clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens"
This reverts commit 4f1238e44d.
Buildbot fails on predecessor patch
2021-06-28 12:42:59 -04:00
Melanie Blower 4f1238e44d [clang][patch][fpenv] Add builtin __arithmetic_fence and option fprotect-parens
This patch adds a new clang builtin, __arithmetic_fence. The purpose of the
builtin is to provide the user fine control, at the expression level, over
floating point optimization when -ffast-math (-ffp-model=fast) is enabled.
The builtin prevents the optimizer from rearranging floating point expression
evaluation. The new option fprotect-parens has the same effect on
parenthesized expressions, forcing the optimizer to respect the parentheses.

Reviewed By: aaron.ballman, kpn

Differential Revision: https://reviews.llvm.org/D100118
2021-06-28 12:26:53 -04:00
Jinsong Ji eb237ffca8 [PowerPC] Add XL Compat fetch builtins
Prototype
```
unsigned int __fetch_and_add (volatile unsigned int* addr, unsigned int
val);
unsigned long __fetch_and_addlp (volatile unsigned long* addr, unsigned
long val);
```
Ref:
https://www.ibm.com/docs/en/xl-c-and-cpp-linux/16.1.1?topic=functions-fetch

Reviewed By: #powerpc, w2yehia, lkail

Differential Revision: https://reviews.llvm.org/D104991
2021-06-28 02:52:32 +00:00
Craig Topper 7a112356e4 [X86] Correct the conversion of VALIGND/Q intrinsics to shufflevector.
We need to mask the immediate to the width of a single vector
rather than 2 vectors. If we use the width of 2 vectors then
any shift larger than the length of 1 vector is going to overflow
the shuffle indices.

Fixes PR50895.
2021-06-26 19:06:00 -07:00
Joseph Huber 9ce02ea8c9 [OpenMP] Add Module metadata for OpenMP compilation
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102361
2021-06-25 16:34:19 -04:00
Stuart Brady e47027d091 [OpenCL] Use DW_LANG_OpenCL language tag for OpenCL C
Note regarding C++ for OpenCL:

   When compiling C++ for OpenCL, DW_LANG_C_plus_plus* is emitted.

   There is no DWARF language code defined for C++ for OpenCL as of yet,
   but DWARF issue 210514.1 has been raised to request one.

   In the mean time, continuing to emit DW_LANG_C_plus_plus* for C++ for
   OpenCL allows the potential to distinguish between C++ for OpenCL and
   OpenCL C in !DICompileUnit nodes, whereas using DW_LANG_OpenCL for
   C++ for OpenCL would prevent this.

   This change therefore leaves C++ for OpenCL as-is.

Reviewed By: shchenz, Anastasia

Differential Revision: https://reviews.llvm.org/D104118
2021-06-25 11:48:42 +01:00
Jinsong Ji f3ef4f5bff [PowerPC] Add XL compat __compare_and_swap builtins
Prototype
int __compare_and_swap (volatile int* addr, int* old_val_addr, int
new_val);

int __compare_and_swaplp (volatile long* addr, long* old_val_addr, long
new_val);

Refer to
https://www.ibm.com/docs/en/xl-c-and-cpp-aix/16.1?topic=functions-compare-swap-compare-swaplp

Reviewed By: w2yehia

Differential Revision: https://reviews.llvm.org/D104837
2021-06-25 01:08:48 +00:00
Martin Storsjö e5c7c171e5 [clang] Rename StringRef _lower() method calls to _insensitive()
This is mostly a mechanical change, but a testcase that contains
parts of the StringRef class (clang/test/Analysis/llvm-conventions.cpp)
isn't touched.
2021-06-25 00:22:01 +03:00
Akira Hatanaka 8db0dbbe2c [CodeGen] Don't create fake FunctionDecls when generating block/byref
copy/dispose helper functions

We found out that these fake functions would cause clang to crash if the
changes proposed in https://reviews.llvm.org/D98799 were made.

The original patch was reverted in f681fd927e
because debug locations were missing in the body of the block byref
helper functions. This patch fixes the bug by calling CreateArtificial
after the calls to StartFunction.

Differential Revision: https://reviews.llvm.org/D104082
2021-06-24 11:45:52 -07:00
Aakanksha Patil 3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Zequan Wu f681fd927e Revert "[CodeGen] Don't create fake FunctionDecls when generating block/byref"
That commit causes crash with error "!dbg attachment points at wrong subprogram for function" on iOS platforms.

This reverts commit f4c06bcb67.
2021-06-22 21:48:00 -07:00
Akira Hatanaka f4c06bcb67 [CodeGen] Don't create fake FunctionDecls when generating block/byref
copy/dispose helper functions

We found out that these fake functions would cause clang to crash if the
changes proposed in https://reviews.llvm.org/D98799 were made.

Differential Revision: https://reviews.llvm.org/D104082
2021-06-22 11:42:53 -07:00
Fangrui Song 948016228f Improve clang -Wframe-larger-than= diagnostic
Match the style in D104667.

This commit is for non-LTO diagnostics, while D104667 is for LTO and llc diagnostics.
2021-06-22 11:20:49 -07:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Graham Hunter a83ce95b09 [clang] Remove unused capture in closure
c6a91ee6aa removed uses of IsMonotonic from OpenMP SIMD codegen,
but that left a capture of the variable unused which upset buildbots
using -Werror.
2021-06-22 15:09:39 +01:00
Graham Hunter c6a91ee6aa [Clang][OpenMP] Monotonic does not apply to SIMD
The codegen for simd constructs was affected by the presence (or
absence) of the 'monotonic' schedule modifier for worksharing
loops. The modifier is only intended to apply to the scheduling of
chunks for a thread, not iterations of a loop inside a chunk.

In addition, the monotonic modifier was applied to worksharing loops
by default if no schedule clause was present; the referenced part of
the OpenMP 4.5 spec in the code (section 2.7.1) only applies if the
user specified a schedule clause with a static kind but no modifier.
Without a user-specified schedule clause we should default to
nonmonotonic scheduling.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103793
2021-06-22 10:24:11 +01:00
Nick Desaulniers 8ace121305 [IR] convert warn-stack-size from module flag to fn attr
Otherwise, this causes issues when building with LTO for object files
that use different values.

Link: https://github.com/ClangBuiltLinux/linux/issues/1395

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D104342
2021-06-21 15:09:25 -07:00
Nick Desaulniers 193e41c987 [Clang][Codegen] Add GNU function attribute 'no_profile' and lower it to noprofile
noprofile IR attribute already exists to prevent profiling with PGO;
emit that when a function uses the newly added no_profile function
attribute.

The Linux kernel would like to avoid compiler generated code in
functions annotated with such attribute. We already respect this for
libcalls to fentry() and mcount().

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223
Link: https://lore.kernel.org/lkml/CAKwvOdmPTi93n2L0_yQkrzLdmpxzrOR7zggSzonyaw2PGshApw@mail.gmail.com/

Reviewed By: MaskRay, void, phosek, aaron.ballman

Differential Revision: https://reviews.llvm.org/D104475
2021-06-18 13:42:32 -07:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Chen Zheng 4590b406c0 [Debug-Info] guard DW_LANG_C_plus_plus_14 under strict dwarf
Reviewed By: stuart

Differential Revision: https://reviews.llvm.org/D104291
2021-06-16 03:17:56 +00:00
Kevin Athey e0b469ffa1 [clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang.
Also:
  - add driver test (fsanitize-use-after-return.c)
  - add basic IR test (asan-use-after-return.cpp)
  - (NFC) cleaned up logic for generating table of __asan_stack_malloc
    depending on flag.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104076
2021-06-11 12:07:35 -07:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Nick Desaulniers fc018ebb60 [IR] make -warn-frame-size into a module attr
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
2021-06-10 16:15:27 -07:00
Michael Kruse a22236120f [OpenMP] Implement '#pragma omp unroll'.
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99459
2021-06-10 14:30:17 -05:00
Joseph Huber 0c32ffceed [OpenMP] Add type to firstprivate symbol for const firstprivate values
Clang will create a global value put in constant memory if an aggregate value
is declared firstprivate in the target device. The symbol name only uses the
name of the firstprivate variable, so symbol name conflicts will occur if the
variable is allowed to have different types through templates. An example of
this behvaiour is shown in https://godbolt.org/z/EsMjYh47n. This patch adds the
mangled type name to the symbol to avoid such naming conflicts. This fixes
https://bugs.llvm.org/show_bug.cgi?id=50642.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103995
2021-06-10 09:02:20 -04:00
Hongtao Yu 64b2fb7967 [CSSPGO] Emit mangled dwarf names for line tables debug option under -fpseudo-probe-for-profiling
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D103909
2021-06-09 10:46:03 -07:00
AndreyChurbanov 9ce2e5e700 Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type"
This reverts commit a1f550e052.

Revert in order to fix backwards compatibility breakage
caused by type size change for task dependence flag.
2021-06-09 17:38:38 +03:00
Matheus Izvekov aef5d8fdc7 [clang] NFC: Rename rvalue to prvalue
This renames the expression value categories from rvalue to prvalue,
keeping nomenclature consistent with C++11 onwards.

C++ has the most complicated taxonomy here, and every other language
only uses a subset of it, so it's less confusing to use the C++ names
consistently, and mentally remap to the C names when working on that
context (prvalue -> rvalue, no xvalues, etc).

Renames:
* VK_RValue -> VK_PRValue
* Expr::isRValue -> Expr::isPRValue
* SK_QualificationConversionRValue -> SK_QualificationConversionPRValue
* JSON AST Dumper Expression nodes value category: "rvalue" -> "prvalue"

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D103720
2021-06-09 12:27:10 +02:00
Brendon Cahoon 294efbbd3e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Brendon Cahoon 211e584fa2 Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Nathan Sidwell b2d0c16e91 [clang] p1099 using enum part 2
This implements the 'using enum maybe-qualified-enum-tag ;' part of
1099. It introduces a new 'UsingEnumDecl', subclassed from
'BaseUsingDecl'. Much of the diff is the boilerplate needed to get the
new class set up.

There is one case where we accept ill-formed, but I believe this is
merely an extended case of an existing bug, so consider it
orthogonal. AFAICT in class-scope the c++20 rule is that no 2 using
decls can bring in the same target decl ([namespace.udecl]/8). But we
already accept:

struct A { enum { a }; };
struct B : A { using A::a; };
struct C : B { using A::a;
using B::a; }; // same enumerator

this patch permits mixtures of 'using enum Bob;' and 'using Bob::member;' in the same way.

Differential Revision: https://reviews.llvm.org/D102241
2021-06-08 11:11:46 -07:00
Nick Desaulniers 3787ee4571 reland [IR] make -stack-alignment= into a module attr
Relands commit 433c8d950c with fixes for
MIPS.

Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 10:59:46 -07:00
Brendon Cahoon ea10a86984 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Nick Desaulniers a596b54d47 Revert "[IR] make -stack-alignment= into a module attr"
This reverts commit 433c8d950c.

Breaks the MIPS build.
2021-06-08 08:55:50 -07:00
Nick Desaulniers 433c8d950c [IR] make -stack-alignment= into a module attr
Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 08:31:04 -07:00
Yaxun (Sam) Liu 054cc3b1b4 [CUDA][HIP] Fix store of vtbl in ctor
vtbl itself is in default global address space. When clang emits
ctor, it gets a pointer to the vtbl field based on the this pointer,
then stores vtbl to the pointer.

Since this pointer can point to any address space (e.g. an object
created in stack), this pointer points to default address space, therefore
the pointer to vtbl field in this object should also be in default
address space.

Currently, clang incorrectly casts the pointer to vtbl field in this object
to global address space. This caused assertions in backend.

This patch fixes that by removing the incorrect addr space cast.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103835
2021-06-08 10:24:44 -04:00
Martin Storsjö b34da6ff9c [clang] Apply MS ABI details on __builtin_ms_va_list on non-windows platforms on x86_64
This fixes inconsistencies in the ms_abi.c testcase.

Also add a couple cases of missing double pointers in the windows part
of the testcase; the outcome of building that testcase on windows hasn't
changed, but the previous form of the test was imprecise (checking
for "%[[STRUCT_FOO]]*" when clang actually generates "%[[STRUCT_FOO]]**"),
which still used to match.

Ideally this would share code with the native Windows case, but
X86_64ABIInfo and WinX86_64ABIInfo aren't superclasses/subclasses of
each other so it's impractical, and the code to share currently only
consists of a couple lines.

Differential Revision: https://reviews.llvm.org/D103837
2021-06-08 12:14:12 +03:00
Martin Storsjö 6de45b9e6a [clang] Fix reading long doubles with va_arg on x86_64 mingw
On x86_64 mingw, long doubles are always passed indirectly as
arguments (see an existing case in WinX86_64ABIInfo::classify);
generalize the existing code for reading varargs - any non-aggregate
type that is larger than 64 bits (which would be both long double
in mingw, and __int128) are passed indirectly too.

This makes reading varargs consistent with how they're passed,
fixing interop with both gcc and clang callers, for long double
and __int128.

Differential Revision: https://reviews.llvm.org/D103452
2021-06-07 22:34:10 +03:00
AndreyChurbanov a1f550e052 [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
Size of type of the dependence flag changed from 1 to 4 bytes in clang.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.

Differential Revision: https://reviews.llvm.org/D97085
2021-06-07 21:42:51 +03:00
Hsiangkai Wang 2b13ff6979 [Clang][CodeGen] Set the size of llvm.lifetime to unknown for scalable types.
If the memory object is scalable type, we do not know the exact size of
it at compile time. Set the size of lifetime marker to unknown if the
object is scalable one.

Differential Revision: https://reviews.llvm.org/D102822
2021-06-07 23:30:13 +08:00
Nathan Sidwell ddda05add5 [clang][NFC] Break out BaseUsingDecl from UsingDecl
This is a pre-patch for adding using-enum support.  It breaks out
the shadow decl handling of UsingDecl to a new intermediate base
class, BaseUsingDecl, altering the decl hierarchy to

def BaseUsing : DeclNode<Named, "", 1>;
  def Using : DeclNode<BaseUsing>;
def UsingPack : DeclNode<Named>;
def UsingShadow : DeclNode<Named>;
  def ConstructorUsingShadow : DeclNode<UsingShadow>;

Differential Revision: https://reviews.llvm.org/D101777
2021-06-07 06:29:28 -07:00
Bradley Smith 60c9b5f35c [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.

Differential Revision: https://reviews.llvm.org/D103082
2021-06-07 12:21:38 +01:00
Andrew Savonichev b31f41e78b [Clang] Support a user-defined __dso_handle
This fixes PR49198: Wrong usage of __dso_handle in user code leads to
a compiler crash.

When Init is an address of the global itself, we need to track it
across RAUW. Otherwise the initializer can be destroyed if the global
is replaced.

Differential Revision: https://reviews.llvm.org/D101156
2021-06-07 12:54:08 +03:00
Ten Tzen 33ba8bd2c9 [Windows SEH]: Fix -O2 crash for Windows -EHa
This patch fixes a Windows -EHa crash induced by previous commit 797ad70152.
The crash was caused by "LifetimeMarker" scope (with option -O2) that should not be considered as SEH Scope.

This change also turns off -fasync-exceptions by default under -EHa option for now.

Differential Revision: https://reviews.llvm.org/D103664#2799944
2021-06-04 14:07:44 -07:00
Alexey Bataev c84a5448b5 [OPENMP]Fix PR50129: omp cancel parallel not working as expected.
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.

Differential Revision: https://reviews.llvm.org/D103646
2021-06-04 08:24:55 -07:00
Michael Kruse 07a6beb402 [Clang][OpenMP] Emit dependent PreInits before directive.
The PreInits of a loop transformation (atm moment only tile) include the computation of the trip count. The trip count is needed by any loop-associated directives that consumes the transformation-generated loop. Hence, we must ensure that the PreInits of consumed loop transformations are emitted with the consuming directive.

This is done by addinging the inner loop transformation's PreInits to the outer loop-directive's PreInits. The outer loop-directive will consume the de-sugared AST such that the inner PreInits are not emitted twice. The PreInits of a loop transformation are still emitted directly if its generated loop(s) are not associated with another loop-associated directive.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D102180
2021-06-02 16:59:35 -05:00
Erik Pilkington 369c648399 [clang] Implement the using_if_exists attribute
This attribute applies to a using declaration, and permits importing a
declaration without knowing if that declaration exists. This is useful
for libc++ C wrapper headers that re-export declarations in std::, in
cases where the base C library doesn't provide all declarations.

This attribute was proposed in http://lists.llvm.org/pipermail/cfe-dev/2020-June/066038.html.

rdar://69313357

Differential Revision: https://reviews.llvm.org/D90188
2021-06-02 10:30:24 -04:00
Eli Friedman 577fea4e1a [CGAtomic] Delete outdated code comparing success/failure ordering for cmpxchg.
This doesn't actually have any effect: we only call this code with
SequentiallyConsistent orderings.  But delete it anyway for consistency
with other recent changes.
2021-05-28 15:36:01 -07:00
Tim Northover e94fada045 SwiftAsync: add Clang attribute to apply the LLVM `swiftasync` one.
Expected to be used by Swift runtime developers.
2021-05-28 12:31:12 +01:00
Martin Storsjö 0e4cf807ae [clang] [MinGW] Don't mark emutls variables as DSO local
These actually can be automatically imported from another DLL. (This
works properly as long as the actual implementation of emutls is
linked dynamically from e.g. libgcc; if the implementation comes from
compiler-rt or a statically linked libgcc, it doesn't work as intended.)

This fixes PR50146 and https://github.com/msys2/MINGW-packages/issues/8706
(fixing calling std::call_once in a dynamically linked libstdc++);
since f731839584 the dso_local attribute
on the TLS variable affected the actual generated code for accessing
the emutls variable.

The dso_local attribute on the emutls variable made those accesses to
use 32 bit relative addressing in code, which requires runtime pseudo
relocations in the text section, and breaks entirely if the actual
other variable ends up loaded too far away in the virtual address
space.

Differential Revision: https://reviews.llvm.org/D102970
2021-05-27 23:51:22 +03:00
Erich Keane eba69b59d1 Reimplement __builtin_unique_stable_name-
The original version of this was reverted, and @rjmcall provided some
advice to architect a new solution.  This is that solution.

This implements a builtin to provide a unique name that is stable across
compilations of this TU for the purposes of implementing the library
component of the unnamed kernel feature of SYCL.  It does this by
running the Itanium mangler with a few modifications.

Because it is somewhat common to wrap non-kernel-related lambdas in
macros that aren't present on the device (such as for logging), this
uniquely generates an ID for all lambdas involved in the naming of a
kernel. It uses the lambda-mangling number to do this, except replaces
this with its own number (starting at 10000 for readabililty reasons)
for lambdas used to name a kernel.

Additionally, this implements itself as constexpr with a slight catch:
if a name would be invalidated by the use of this lambda in a later
kernel invocation, it is diagnosed as an error (see the Sema tests).

Differential Revision: https://reviews.llvm.org/D103112
2021-05-27 07:12:20 -07:00
Marco Elver 280333021e [SanitizeCoverage] Add support for NoSanitizeCoverage function attribute
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.

Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.

Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D102772
2021-05-25 12:57:14 +02:00
Marco Elver ca6df73406 [NFC][CodeGenOptions] Refactor checking SanitizeCoverage options
Refactor checking SanitizeCoverage options into
CodeGenOptions::hasSanitizeCoverage().

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102927
2021-05-25 12:57:14 +02:00
Chen Zheng 99d45ed22f [Debug-Info] handle DW_TAG_rvalue_reference_type at strict DWARF.
When -gstrict-dwarf is specified, generate DW_TAG_rvalue_reference_type
at DWARF 4 or above

Reviewed By: dblaikie, aprantl

Differential Revision: https://reviews.llvm.org/D100630
2021-05-23 21:24:13 -04:00
Nick Desaulniers 033138ea45 [IR] make stack-protector-guard-* flags into module attrs
D88631 added initial support for:

- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=

flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.

Link: https://github.com/ClangBuiltLinux/linux/issues/1378

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102742
2021-05-21 15:53:30 -07:00
Yaxun (Sam) Liu 4cb42564ec [CUDA][HIP] Fix device variables used by host
variables emitted on both host and device side with different addresses
when ODR-used by host function should not cause device side counter-part
to be force emitted.

This fixes the regression caused by https://reviews.llvm.org/D102237

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102801
2021-05-20 17:04:29 -04:00
Heejin Ahn 3eb12b0ae1 [WebAssembly] Warn on exception spec for Emscripten EH
It turns out we have not correctly supported exception spec all along in
Emscripten EH. Emscripten EH supports `throw()` but not `throw` with
types. See https://bugs.llvm.org/show_bug.cgi?id=50396.

Wasm EH also only supports `throw()` but not `throw` with types, and we
have been printing a warning message for the latter. This prints the
same warning message for `throw` with types when Emscripten EH is used,
or more precisely, when Wasm EH is not used. (So this will print the
warning messsage even when `-fno-exceptions` is used but I think that
should be fine. It's cumbersome to do a complilcated option checking in
CGException.cpp and options checkings are mostly done in elsewhere.)

Reviewed By: dschuff, kripken

Differential Revision: https://reviews.llvm.org/D102791
2021-05-20 13:00:20 -07:00
Reid Kleckner 8f20ac9595 [PGO] Don't reference functions unless value profiling is enabled
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.

This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.

PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.

Recording the address of every function with code coverage drastically
increases code size. Consider this program:

  void foo();
  void bar();
  inline void inlineMe(int x) {
    if (x > 0)
      foo();
    else
      bar();
  }
  int getVal();
  int main() { inlineMe(getVal()); }

With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.

One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.

Differential Revision: https://reviews.llvm.org/D102818
2021-05-20 11:09:24 -07:00
Fangrui Song 37561ba89b -fno-semantic-interposition: Don't set dso_local on GlobalVariable
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.

GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).

This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project tiself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.

Example:
```
// a.c
int var;
int *addr() { return var; }

// old: cannot be interposed
movslq  .Lvar$local(%rip), %rax
// new: can be interposed
movq    var@GOTPCREL(%rip), %rax
movslq  (%rax), %rax
```

The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102583
2021-05-19 16:08:28 -07:00
Arthur Eubanks 0c509dbc7e [NewPM] Add options to PrintPassInstrumentation
To bring D99599's implementation in line with the existing
PrintPassInstrumentation, and to fix a FIXME, add more customizability
to PrintPassInstrumentation.

Introduce three new options. The first takes over the existing
"-debug-pass-manager-verbose" cl::opt.

The second and third option are specific to -fdebug-pass-structure. They
allow indentation, and also don't print analysis queries.

To avoid more golden file tests than necessary, prune down the
-fdebug-pass-structure tests.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D102196
2021-05-18 20:59:35 -07:00
Benjamin Kramer 3f3642a763 [CodeGen] Avoid unused variable warning in Release builds. NFCI. 2021-05-18 11:09:12 +02:00
Ten Tzen 797ad70152 [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.

Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
  faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
  must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
  outside of _try region must be updated in memory (not just in register)
  before the subsequent exception occurs.

The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.

One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.

Implementation:
Part-1: Clang implementation described below.

Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.

Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.

Part-2 (will be in Part-2 patch): LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D80344/new/
2021-05-17 22:42:17 -07:00
Arthur Eubanks 9f7d552cff [NFC] Pass GV value type instead of pointer type to GetOrCreateLLVMGlobal
For opaque pointers, to avoid PointerType::getElementType().

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102638
2021-05-17 18:41:17 -07:00
Eli Friedman 698568b74c [clang CodeGen] Don't crash on large atomic function parameter.
I wouldn't recommend writing code like the testcase; a function
parameter isn't atomic, so using an atomic type doesn't really make
sense.  But it's valid, so clang shouldn't crash on it.

The code was assuming hasAggregateEvaluationKind(Ty) implies Ty is a
RecordType, which isn't true.  Just use isRecordType() instead.

Differential Revision: https://reviews.llvm.org/D102015
2021-05-17 13:18:23 -07:00
Nick Desaulniers 0f41778919 [AArch64] Support customizing stack protector guard
Follow up to D88631 but for aarch64; the Linux kernel uses the command
line flags:

1. -mstack-protector-guard=sysreg
2. -mstack-protector-guard-reg=sp_el0
3. -mstack-protector-guard-offset=0

to use the system register sp_el0 for the stack canary, enabling the
kernel to have a unique stack canary per task (like a thread, but not
limited to userspace as the kernel can preempt itself).

Address pr/47341 for aarch64.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/289
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: xiangzhangllvm, DavidSpickett, dmgreen

Differential Revision: https://reviews.llvm.org/D100919
2021-05-17 11:49:22 -07:00
Irina Dobrescu 50511df32e [AArch64] Lower bitreverse in ISel
Adding lowering support for bitreverse.

Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102397
2021-05-17 13:35:27 +01:00
Raphael Isemann 888ce70af2 [DebugInfo] Fix DWARF expressions for __block vars that are not on the heap
`__block` variables used to be always stored on the head instead of stack.
D51564 allowed `__block` variables to the stored on the stack like normal
variablesif they not captured by any escaping block, but the debug-info
generation code wasn't made aware of it so we still unconditionally emit DWARF
expressions pointing to the heap.

This patch makes CGDebugInfo use the `EscapingByref` introduced in D51564 that
tracks whether the `__block` variable is actually on the heap. If it's stored on
the stack instead we just use the debug info we would generate for normal
variables instead.

Reviewed By: ahatanak, aprantl

Differential Revision: https://reviews.llvm.org/D99946
2021-05-17 14:32:07 +02:00
Florian Hahn 803c52d0db
Recommit "[Clang,Driver] Add -fveclib=Darwin_libsystem_m support."
Recommit D102489, with the test case requiring the AArch64 backend.

This reverts the revert 59b419adc6.
2021-05-16 18:49:53 +01:00
Pengxuan Zheng c9b36a041f Support GCC's -fstack-usage flag
This patch adds support for GCC's -fstack-usage flag. With this flag, a stack
usage file (i.e., .su file) is generated for each input source file. The format
of the stack usage file is also similar to what is used by GCC. For each
function defined in the source file, a line with the following information is
produced in the .su file.

<source_file>:<line_number>:<function_name> <size_in_byte> <static/dynamic>

"Static" means that the function's frame size is static and the size info is an
accurate reflection of the frame size. While "dynamic" means the function's
frame size can only be determined at run-time because the function manipulates
the stack dynamically (e.g., due to variable size objects). The size info only
reflects the size of the fixed size frame objects in this case and therefore is
not a reliable measure of the total frame size.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D100509
2021-05-15 10:22:49 -07:00
Douglas Yung 59b419adc6 Revert "[Clang,Driver] Add -fveclib=Darwin_libsystem_m support."
This reverts commit 187a14e1f3.

The test added in this commit is failing on several build bots:

https://lab.llvm.org/buildbot/#/builders/139/builds/4059
https://lab.llvm.org/buildbot/#/builders/132/builds/5605
2021-05-14 22:39:12 -07:00
Arthur Eubanks e8448a5985 [NFC] Directly get GV type 2021-05-14 14:27:07 -07:00
Florian Hahn 187a14e1f3
[Clang,Driver] Add -fveclib=Darwin_libsystem_m support.
Support for Darwin's libsystem_m's vector functions has been added to
LLVM in 93a9a8a8d9.

This patch adds support for -fveclib=Darwin_libsystem_m to Clang.

Reviewed By: arphaman

Differential Revision: https://reviews.llvm.org/D102489
2021-05-14 21:00:13 +01:00
Michael Kruse 83ff0ff463 [Clang][OpenMP] Allow unified_shared_memory for Pascal-generation GPUs.
The Pascal architecture supports the page migration engine required for
unified_shared_memory, as indicated by NVIDIA:
 * https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
 * https://developer.nvidia.com/blog/beyond-gpu-memory-limits-unified-memory-pascal/
 * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements

The limitation was introduced in D54493 which justified the cut-off by
the requirement for unified addressing. However, Unified Virtual
Addressing (UVA) is already available with sm20 (Fermi, Kepler,
Maxwell):
 * https://docs.nvidia.com/cuda/gpudirect-rdma/index.html#basics-of-uva-cuda-memory-management

Unified shared memory might even be possible with these, but with
migration of entire allocations on kernel startup.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D101595
2021-05-13 17:15:34 -05:00
Aakanksha Patil 464e4dc50f [AMDGPU] Add gfx1034 target
Differential Revision: https://reviews.llvm.org/D102306
2021-05-13 14:25:18 -04:00
cynecx 8ec9fd4839 Support unwinding from inline assembly
I've taken the following steps to add unwinding support from inline assembly:

1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:

```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
    to label %exit unwind label %uexit
```

2.) Add Bitcode writing/reading support + LLVM-IR parsing.

3.) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.

4.) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.

5.) Add clang support by introducing a new clobber: "unwind", which lower to the `canThrow` being enabled.

6.) Don't allow unwinding callbr.

Reviewed By: Amanieu

Differential Revision: https://reviews.llvm.org/D95745
2021-05-13 19:13:03 +01:00
Roman Lebedev 16d0381841
Return "[CGCall] Annotate `this` argument with alignment"
The original change was reverted because it was discovered
that clang mishandles thunks, and they receive wrong
attributes for their this/return types - the ones for the function
they will call, not the ones they have.

While i have tried to fix this in https://reviews.llvm.org/D100388
that patch has been up and stuck for a month now,
with little signs of progress.

So while it will be good to solve this for real,
for now we can simply avoid introducing the bug,
by not annotating this/return for thunks.

This reverts commit 6270b3a1ea,
relanding 0aa0458f14.
2021-05-13 20:33:14 +03:00
Roman Lebedev a624cec56d
[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.

While it would be good to fix this properly,
and keep annotating them on thunks,
i've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch is stuck for a month now.

So for now, as a stopgap measure, subj.
2021-05-13 20:33:08 +03:00
Vassil Vassilev 92f9852fc9 [clang-repl] Recommit "Land initial infrastructure for incremental parsing"
Original commit message:

  In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have
  mentioned our plans to make some of the incremental compilation facilities
  available in llvm mainline.

  This patch proposes a minimal version of a repl, clang-repl, which enables
  interpreter-like interaction for C++. For instance:

  ./bin/clang-repl
  clang-repl> int i = 42;
  clang-repl> extern "C" int printf(const char*,...);
  clang-repl> auto r1 = printf("i=%d\n", i);
  i=42
  clang-repl> quit

  The patch allows very limited functionality, for example, it crashes on invalid
  C++. The design of the proposed patch follows closely the design of cling. The
  idea is to gather feedback and gradually evolve both clang-repl and cling to
  what the community agrees upon.

  The IncrementalParser class is responsible for driving the clang parser and
  codegen and allows the compiler infrastructure to process more than one input.
  Every input adds to the “ever-growing” translation unit. That model is enabled
  by an IncrementalAction which prevents teardown when HandleTranslationUnit.

  The IncrementalExecutor class hides some of the underlying implementation
  details of the concrete JIT infrastructure. It exposes the minimal set of
  functionality required by our incremental compiler/interpreter.

  The Transaction class keeps track of the AST and the LLVM IR for each
  incremental input. That tracking information will be later used to implement
  error recovery.

  The Interpreter class orchestrates the IncrementalParser and the
  IncrementalExecutor to model interpreter-like behavior. It provides the public
  API which can be used (in future) when using the interpreter library.

  Differential revision: https://reviews.llvm.org/D96033
2021-05-13 06:30:29 +00:00
Vassil Vassilev f6907152db Revert "[clang-repl] Land initial infrastructure for incremental parsing"
This reverts commit 44a4000181.

We are seeing build failures due to missing dependency to libSupport and
CMake Error at tools/clang/tools/clang-repl/cmake_install.cmake
file INSTALL cannot find
2021-05-13 04:44:19 +00:00
Vassil Vassilev 44a4000181 [clang-repl] Land initial infrastructure for incremental parsing
In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have
mentioned our plans to make some of the incremental compilation facilities
available in llvm mainline.

This patch proposes a minimal version of a repl, clang-repl, which enables
interpreter-like interaction for C++. For instance:

./bin/clang-repl
clang-repl> int i = 42;
clang-repl> extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=42
clang-repl> quit

The patch allows very limited functionality, for example, it crashes on invalid
C++. The design of the proposed patch follows closely the design of cling. The
idea is to gather feedback and gradually evolve both clang-repl and cling to
what the community agrees upon.

The IncrementalParser class is responsible for driving the clang parser and
codegen and allows the compiler infrastructure to process more than one input.
Every input adds to the “ever-growing” translation unit. That model is enabled
by an IncrementalAction which prevents teardown when HandleTranslationUnit.

The IncrementalExecutor class hides some of the underlying implementation
details of the concrete JIT infrastructure. It exposes the minimal set of
functionality required by our incremental compiler/interpreter.

The Transaction class keeps track of the AST and the LLVM IR for each
incremental input. That tracking information will be later used to implement
error recovery.

The Interpreter class orchestrates the IncrementalParser and the
IncrementalExecutor to model interpreter-like behavior. It provides the public
API which can be used (in future) when using the interpreter library.

Differential revision: https://reviews.llvm.org/D96033
2021-05-13 04:23:24 +00:00
Chen Zheng a0ca4c46ca [Debug-Info] add -gstrict-dwarf support in backend
Reviewed By: dblaikie, probinson

Differential Revision: https://reviews.llvm.org/D100826
2021-05-12 23:00:52 -04:00
Roman Lebedev 2d84195d60
[NFCI][clang][Codegen] CodeGenVTables::addVTableComponent(): use getGlobalDecl
It does the same thing.
Split off from https://reviews.llvm.org/D100388
2021-05-12 20:39:54 +03:00
Yaxun (Sam) Liu 98575708da [CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.

It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237
2021-05-12 11:13:29 -04:00
Momchil Velikov 5c7b43aa82 [clang][AArch32] Correctly align HA arguments when passed on the stack
Analogously to https://reviews.llvm.org/D98794 this patch uses the
`alignstack` attribute to fix incorrect passing of homogeneous
aggregate (HA) arguments on AArch32. The EABI/AAPCS was recently
updated to clarify how VFP co-processor candidates are aligned:
4488e34998

Differential Revision: https://reviews.llvm.org/D100853
2021-05-10 16:28:46 +01:00
Alexey Bataev 230953d577 [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.
Follow the more general patch for now, do not try to SPMDize the kernel
if the variable is used and local.

Differential Revision: https://reviews.llvm.org/D101911
2021-05-10 06:34:11 -07:00
Arthur Eubanks 34a8a437bf [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
Arthur Eubanks 6f7131002b [NewPM] Move analysis invalidation/clearing logging to instrumentation
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102093
2021-05-07 15:25:31 -07:00
Olivier Goffart c4adc49a1c [SEH] Fix regression with SEH in noexpect functions
Commit 5baea05601 set the CurCodeDecl
because it was needed to pass the assert in CodeGenFunction::EmitLValueForLambdaField,
But this was not right to do as CodeGenFunction::FinishFunction passes it to EmitEndEHSpec
and cause corruption of the EHStack.

Revert the part of the commit that changes the CurCodeDecl, and instead
adjust the assert to check for a null CurCodeDecl.

Differential Revision: https://reviews.llvm.org/D102027
2021-05-07 13:27:59 -07:00
Ahsan Saghir 25bbff632d [PowerPC] Provide MMA builtins for compatibility
Vector pair intrinsics and builtins were renamed in
https://reviews.llvm.org/D91974 to replace the _mma_ prefix by _vsx_.
However, some projects used the _mma_ version, so this patch adds
these intrinsics to provide compatibility.

Fixes Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=50159

Reviewed By: nemanjai, amyk

Differential Revision: https://reviews.llvm.org/D100482
2021-05-07 09:10:16 -05:00
Bruno Cardoso Lopes 819e0d105e [CGAtomic] Lift strong requirement for remaining compare_exchange combinations
Follow up on 431e3138a and complete the other possible combinations.

Besides enforcing the new behavior, it also mitigates TSAN false positives when
combining orders that used to be stronger.
2021-05-06 21:05:20 -07:00
Johannes Doerfert df729e2b82 [OpenMP] Overhaul `declare target` handling
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.

This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.

I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.

Highlights:
  - new diagnosis for restrictions specificed in the standard,
  - delayed emission of globals not mentioned in an explicit
    list of a declare target,
  - omission of `nohost` globals on the host and `host` globals on the
    device,
  - no explicit parsing of declarations in-between `omp [begin] declare
    variant` and the corresponding end anymore, regular parsing instead,
  - precedence for explicit mentions in `declare target` lists over
    implicit mentions in the declaration-definition-seq, and
  - `omp allocate` declarations will now replace an earlier emitted
    global, if necessary.

---

Notes:

The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.

After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.

---

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D101030
2021-05-06 02:10:41 -05:00
Giorgis Georgakoudis f97b843d88 [OpenMP] Fix non-determinism in clang copyin codegen
Codegen for OpeMP copyin has non-deterministic IR output due to the unspecified evaluation order in a codegen conditional branch, which makes automatic test generation unreliable. This patch refactors codegen code to avoid this non-determinism.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101952
2021-05-05 19:24:03 -07:00
Giorgis Georgakoudis 313ee609e1 [OpenMP] Fix non-determinism in clang task codegen (lastprivates)
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101800
2021-05-04 11:56:31 -07:00
Leonard Chan 84c4754372 [clang] Add -fc++-abi= flag for specifying which C++ ABI to use
This implements the flag proposed in RFC
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.

The goal is to add a way to override the default target C++ ABI through a
compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.

In this patch:

- Store -fc++-abi= in a LangOpt. This isn't stored in a CodeGenOpt because
  there are instances outside of codegen where Clang needs to know what the
  ABI is (particularly through ASTContext::createCXXABI), and we should be
  able to override the target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
  through this flag.
  - Create a .def file for these ABIs to make it easier to check flag values.
  - Add an error for diagnosing bad ABI flag values.

Differential Revision: https://reviews.llvm.org/D85802
2021-05-04 10:52:13 -07:00
Andrew Savonichev b451ecd86e [Clang][AArch64] Disable rounding of return values for AArch64
If a return value is explicitly rounded to 64 bits, an additional zext
instruction is emitted, and in some cases it prevents tail call
optimization.

As discussed in D100225, this rounding is not necessary and can be
disabled.

Differential Revision: https://reviews.llvm.org/D100591
2021-05-04 20:29:01 +03:00
Nico Weber d7ec48d71b [clang] accept -fsanitize-ignorelist= in addition to -fsanitize-blacklist=
Use that for internal names (including the default ignorelists of the
sanitizers).

Differential Revision: https://reviews.llvm.org/D101832
2021-05-04 10:24:00 -04:00
serge-sans-paille b83b23275b Introduce -Wreserved-identifier
Warn when a declaration uses an identifier that doesn't obey the reserved
identifier rule from C and/or C++.

Differential Revision: https://reviews.llvm.org/D93095
2021-05-04 11:19:01 +02:00
Alex Lorenz 2669abaecf [clang][CodeGen] Use llvm::stable_sort for multi version resolver options
The use of llvm::sort causes periodic failures on the bot with EXPENSIVE_CHECKS enabled,
as the regular sort pre-shuffles the array in the expensive checks mode, leading to a
non-deterministic test result which causes the CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp
testcase to fail on the bot (http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/).
2021-05-03 20:07:00 -07:00
Valentin Clement 63f8226f25 [OpenMPIRBuilder] Add createOffloadMaptypes and createOffloadMapnames functions
Add function to create the offload_maptypes and the offload_mapnames globals. These two functions
are used in clang. They will be used in the Flang/MLIR lowering as well.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101503
2021-05-03 15:42:32 -04:00
Giorgis Georgakoudis a27ca15dd0 [OpenMP] Fix non-determinism in clang task codegen
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D101739
2021-05-03 10:34:38 -07:00
Saurabh Jha 696becbd13 [Matrix] Remove bitcast when casting between matrices of the same size
In matrix type casts, we were doing bitcast when the matrices had the same size. This was incorrect and this patch fixes that.
Also added some new CodeGen tests for signed <-> usigned conversions

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D101754
2021-05-03 15:31:43 +01:00
Yaxun (Sam) Liu c58a6a6fb4 [HIP] Fix device lib selection
Choose optimized device lib bitcode by fp options
for performance.

Reviewed by: Artem Belevich, Fangrui Song

Differential Revision: https://reviews.llvm.org/D101654
2021-05-01 20:31:11 -04:00
Yaxun (Sam) Liu 0175999805 [AMDGPU] Add options -mamdgpu-ieee -mno-amdgpu-ieee
AMDGPU backend need to know whether floating point opcodes that support exception
flag gathering quiet and propagate signaling NaN inputs per IEEE754-2008, which is
conveyed by a function attribute "amdgpu-ieee". "amdgpu-ieee"="false" turns this off.
Without this function attribute backend assumes it is on for compute functions.

-mamdgpu-ieee and -mno-amdgpu-ieee are added to Clang to control this function attribute.
By default it is on. -mno-amdgpu-ieee requires -fno-honor-nans or equivalent.

Reviewed by: Matt Arsenault

Differential Revision: https://reviews.llvm.org/D77013
2021-05-01 09:02:55 -04:00
Nemanja Ivanovic c3da07d216 [PowerPC] Provide fastmath sqrt and div functions in altivec.h
This adds the long overdue implementations of these functions
that have been part of the ABI document and are now part of
the "Power Vector Intrinsic Programming Reference" (PVIPR).

The approach is to add new builtins and to emit code with
the fast flag regardless of whether fastmath was specified
on the command line.

Differential revision: https://reviews.llvm.org/D101209
2021-04-30 19:17:48 -05:00
Joel E. Denny 82e99f5035 [OpenMP] Fix second debug name from map clause
This patch fixes a bug from D89802.  For example, without it, Clang
generates x as the debug map name for both x and y in the following
example:

```
 #pragma omp target map(to: x, y)
 x = y = 1;
```

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D101564
2021-04-30 16:26:59 -04:00
Florian Hahn 6c31295493
[clang] Refactor mustprogress handling, add it to all loops in c++11+.
Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).

This allows us to simplify the code dealing with adding mustprogress a
bit.

Reviewed By: aaron.ballman, lebedev.ri

Differential Revision: https://reviews.llvm.org/D96418
2021-04-30 14:13:47 +01:00
Akira Hatanaka 2e1d9ebd46 [ObjC][ARC] Don't enter the cleanup scope if the initializer expression
isn't an ExprWithCleanups

This patch fixes a bug where a temporary ObjC pointer is released before
the end of the full expression.

This fixes PR50043.

rdar://77030453

Differential Revision: https://reviews.llvm.org/D101502
2021-04-29 16:04:30 -07:00
Chirag Khandelwal c204106188 [Clang][OpenMP] Frontend work for sections - D89671
This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.

Co-author: @anchu-rajendran

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91054
2021-04-29 19:52:27 +05:30
Evgeny Leviant 6a0283d0d2 [NewPM] Add an option to dump pass structure
Patch adds -debug-pass-structure option to dump pass structure when
new pass manager is used.

Differential revision: https://reviews.llvm.org/D99599
2021-04-29 10:29:42 +03:00
Dan Liew 1bbbcff99d [NFC] Rename SanitizeAddressDtorKind codegen opt to not have `Kind` suffix.
This is post commit follow up based on discussions in
https://reviews.llvm.org/D101122.

Differential Revision: https://reviews.llvm.org/D101490

(cherry picked from commit f4c7e82d1b21e637c4e0c53125b126c407d8bdbf)
2021-04-28 18:37:16 -07:00
Ryan Santhirarajan 0395f9e70b [ARM] Neon Polynomial vadd Intrinsic fix
The Neon vadd intrinsics were added to the ARMSIMD intrinsic map,
however due to being defined under an AArch64 guard in arm_neon.td,
were not previously useable on ARM. This change rectifies that.

It is important to note that poly128 is not valid on ARM, thus it was
extracted out of the original arm_neon.td definition and separated
for the sake of AArch64.

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D100772
2021-04-28 11:59:40 -07:00
Nico Weber 0f1137ba79 [clang/Basic] Make TargetInfo.h not use DataLayout again
Reverts parts of https://reviews.llvm.org/D17183, but keeps the
resetDataLayout() API and adds an assert that checks that datalayout string and
user label prefix are in sync.

Approach 1 in https://reviews.llvm.org/D17183#2653279
Reduces number of TUs build for 'clang-format' from 689 to 575.

I also implemented approach 2 in D100764. If someone feels motivated
to make us use DataLayout more, it's easy to revert this change here
and go with D100764 instead. I don't plan on doing more work in this
area though, so I prefer going with the smaller, more self-consistent change.

Differential Revision: https://reviews.llvm.org/D100776
2021-04-27 22:26:10 -04:00
Yonghong Song a2a3ca8d97 BPF: emit debuginfo for Function of DeclRefExpr if requested
Commit e3d8ee35e4 ("reland "[DebugInfo] Support to emit debugInfo
for extern variables"") added support to emit debugInfo for
extern variables if requested by the target. Currently, only
BPF target enables this feature by default.

As BPF ecosystem grows, callback function started to get
support, e.g., recently bpf_for_each_map_elem() is introduced
(https://lwn.net/Articles/846504/) with a callback function as an
argument. In the future we may have something like below as
a demonstration of use case :
    extern int do_work(int);
    long bpf_helper(void *callback_fn, void *callback_ctx, ...);
    long prog_main() {
        struct { ... } ctx = { ... };
        return bpf_helper(&do_work, &ctx, ...);
    }
Basically bpf helper may have a callback function and the
callback function is defined in another file or in the kernel.
In this case, we would like to know the debuginfo types for
do_work(), so the verifier can proper verify the safety of
bpf_helper() call.

For the following example,
    extern int do_work(int);
    long bpf_helper(void *callback_fn);
    long prog() {
        return bpf_helper(&do_work);
    }

Currently, there is no debuginfo generated for extern function do_work().
In the IR, we have,
    ...
    define dso_local i64 @prog() local_unnamed_addr #0 !dbg !7 {
    entry:
      %call = tail call i64 @bpf_helper(i8* bitcast (i32 (i32)* @do_work to i8*)) #2, !dbg !11
      ret i64 %call, !dbg !12
    }
    ...
    declare dso_local i32 @do_work(i32) #1
    ...

This patch added support for the above callback function use case, and
the generated IR looks like below:
    ...
    declare !dbg !17 dso_local i32 @do_work(i32) #1
    ...
    !17 = !DISubprogram(name: "do_work", scope: !1, file: !1, line: 1, type: !18, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !2)
    !18 = !DISubroutineType(types: !19)
    !19 = !{!20, !20}
    !20 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

The TargetInfo.allowDebugInfoForExternalVar is renamed to
TargetInfo.allowDebugInfoForExternalRef as now it guards
both extern variable and extern function debuginfo generation.

Differential Revision: https://reviews.llvm.org/D100567
2021-04-26 16:53:25 -07:00
Alexey Bader 7818906ca1 [SYCL] Implement SYCL address space attributes handling
Default address space (applies when no explicit address space was
specified) maps to generic (4) address space.

Added SYCL named address spaces `sycl_global`, `sycl_local` and
`sycl_private` defined as sub-sets of the default address space.

Static variables without address space now reside in global address
space when compile for SPIR target, unless they have an explicit address
space qualifier in source code.

Differential Revision: https://reviews.llvm.org/D89909
2021-04-26 13:44:10 +03:00
Levy Hsu 8cf54c7ff5 [RISCV] [1/2] Add IR intrinsic for Zbe extension
RV32/64:
bcompress
bdecompress

RV64 ONLY:
bcompressw
bdecompressw

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101143
2021-04-25 19:14:34 -07:00
Thomas Lively 502f54049d [WebAssembly] Finalize wasm_simd128.h intrinsics
Adds new intrinsics for instructions that are in the final SIMD spec but did not
previously have intrinsics. Also updates the names of existing intrinsics to
reflect the final names of the underlying instructions in the spec. Keeps the
old names as deprecated functions to ease the transition to the new names.

Differential Revision: https://reviews.llvm.org/D101112
2021-04-23 13:37:27 -07:00
Fangrui Song a92dbadffe [OpenMP] Fix -Wdeprecated-copy 2021-04-23 10:49:19 -07:00
Dávid Bolvanský 2cae7025c1 Reland "[Clang] Propagate guaranteed alignment for malloc and others"
This relands commit 6914a0ed2b. Crash in InstCombine was fixed.
2021-04-23 14:05:57 +02:00
Dávid Bolvanský 6914a0ed2b Revert "[Clang] Propagate guaranteed alignment for malloc and others"
This reverts commit c2297544c0. Some buildbots are broken.
2021-04-23 11:33:33 +02:00
Dávid Bolvanský c2297544c0 [Clang] Propagate guaranteed alignment for malloc and others
LLVM should be smarter about *known* malloc's alignment and this knowledge may enable other optimizations.

Originally started as LLVM patch - https://reviews.llvm.org/D100862 but this logic should be really in Clang.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D100879
2021-04-23 11:07:14 +02:00
Fangrui Song 2786e673c7 [IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.

Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)

Also document the function attribute "frame-pointer" which is long overdue.

Differential Revision: https://reviews.llvm.org/D101016
2021-04-22 18:07:30 -07:00
Levy Hsu b49337bbb9 [RISCV] [1/2] Add IR intrinsic for Zbp extension
RV32/64:
    grev
    grevi
    gorc
    gorci
    shfl
    shfli
    unshfl
    unshfli

RV64 ONLY:
    grevw
    greviw
    gorcw
    gorciw
    shflw
    shfli     (For non-existing shfliw)
    unshfli   (For non-existing unshfliw)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100830
2021-04-22 16:34:51 -07:00
Fangrui Song ef5e7f90ea Temporarily revert the code part of D100981 "Delete le32/le64 targets"
This partially reverts commit 77ac823fd2.

Halide uses le32/le64 (https://github.com/halide/Halide/pull/5934).
Temporarily brings back the code part to give them some time for migration.
2021-04-22 10:18:44 -07:00
Hamza Mahfooz be2277fbf2
[Matrix] Support #pragma clang fp
From https://bugs.llvm.org/show_bug.cgi?id=49739:

Currently, `#pragma clang fp` are ignored for matrix types.

For the code below, the `contract` fast-math flag should be added to the generated call to `llvm.matrix.multiply` and `fadd`

```
typedef float fx2x2_t __attribute__((matrix_type(2, 2)));

void foo(fx2x2_t &A, fx2x2_t &C, fx2x2_t &B) {
  #pragma clang fp contract(fast)
  C = A*B + C;
}
```

Reviewed By: fhahn, mibintc

Differential Revision: https://reviews.llvm.org/D100834
2021-04-22 11:45:34 +01:00
Giorgis Georgakoudis a2dbfb6b72 [OpenMP] Simplify offloading parallel call codegen
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.

Reviewed By: jdoerfert, Meinersbur

Differential Revision: https://reviews.llvm.org/D95976
2021-04-21 18:46:07 -07:00
Fangrui Song 77ac823fd2 Delete le32/le64 targets
They are unused now.

Note: NaCl is still used and is currently expected to be needed until 2022-06
(https://blog.chromium.org/2020/08/changes-to-chrome-app-support-timeline.html).

Differential Revision: https://reviews.llvm.org/D100981
2021-04-21 18:44:12 -07:00
Fangrui Song 775a9483e5 [IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.

Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)

Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.

This patch

* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.

The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.

Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955
https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.

Differential Revision: https://reviews.llvm.org/D100251
2021-04-21 15:58:20 -07:00
Alexey Bataev 079884225a [OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault.
The implicitly generated mappings for allocation/deallocation in mappers
runtime should be mapped as implicit, also no need to clear member_of
flag to avoid ref counter increment. Also, the ref counter should not be
incremented for the very first element that comes from the mapper
function.

Differential Revision: https://reviews.llvm.org/D100673
2021-04-21 10:38:31 -07:00
Erich Keane 0ed613612c Ensure target-multiversioning emits deferred declarations
As reported in PR50025, sometimes we would end up not emitting functions
needed by inline multiversioned variants. This is because we typically
use the 'deferred decl' mechanism to emit these.  However, the variants
are emitted after that typically happens.  This fixes that by ensuring
we re-run deferred decls after this happens. Also, the multiversion
emission is done recursively to ensure that MV functions that require
other MV functions to be emitted get emitted.
2021-04-20 08:10:26 -07:00
Jan Svoboda 6a72ed239c [clang] NFC: Fix range-based for loop warnings related to decl lookup 2021-04-19 18:31:31 +02:00
Xun Li 5faba87938 Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass"
This reverts commit fa6b54c44a.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
2021-04-18 17:22:28 -07:00
Xun Li fa6b54c44a [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.

Reviewed By: rjmccall, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 15:41:09 -07:00
Xun Li c0211e8d7d Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner"
This reverts commit 2b50f5a434.
Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.
2021-04-18 15:38:19 -07:00
Xun Li 2b50f5a434 [Coroutines] Move CoroEarly pass to before AlwaysInliner
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 14:54:04 -07:00
Yaxun (Sam) Liu d5c0f00e21 [CUDA][HIP] Mark device var used by host only
Add device variables to llvm.compiler.used if they are
ODR-used by either host or device functions.

This is necessary to prevent them from being
eliminated by whole-program optimization
where the compiler has no way to know a device
variable is used by some host code.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D98814
2021-04-17 11:25:25 -04:00
Serge Guelton d6de1e1a71 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Thomas Lively 5c729750a6 [WebAssembly] Remove saturating fp-to-int target intrinsics
Use the target-independent @llvm.fptosi and @llvm.fptoui intrinsics instead.
This includes removing the instrinsics for i32x4.trunc_sat_zero_f64x2_{s,u},
which are now represented in IR as a saturating truncation to a v2i32 followed by
a concatenation with a zero vector.

Differential Revision: https://reviews.llvm.org/D100596
2021-04-16 12:11:20 -07:00
Alexey Bataev 10c7b9f64f [OPENMP]Fix PR49115: Incorrect results for scan directive.
For combined worksharing directives need to emit the temp arrays outside
of the parallel region and update them in the master thread only.

Differential Revision: https://reviews.llvm.org/D100187
2021-04-16 06:25:35 -07:00
Joshua Haberman 8344675908 Implemented [[clang::musttail]] attribute for guaranteed tail calls.
This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.

The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.

Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.

Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D99517
2021-04-15 17:12:21 -07:00
Momchil Velikov f9d932e673 [clang][AArch64] Correctly align HFA arguments when passed on the stack
When we pass a AArch64 Homogeneous Floating-Point
Aggregate (HFA) argument with increased alignment
requirements, for example

    struct S {
      __attribute__ ((__aligned__(16))) double v[4];
    };

Clang uses `[4 x double]` for the parameter, which is passed
on the stack at alignment 8, whereas it should be at
alignment 16, following Rule C.4 in
AAPCS (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules)

Currently we don't have a way to express in LLVM IR the
alignment requirements of the function arguments. The align
attribute is applicable to pointers only, and only for some
special ways of passing arguments (e..g byval). When
implementing AAPCS32/AAPCS64, clang resorts to dubious hacks
of coercing to types, which naturally have the needed
alignment. We don't have enough types to cover all the
cases, though.

This patch introduces a new use of the stackalign attribute
to control stack slot alignment, when and if an argument is
passed in memory.

The attribute align is left as an optimizer hint - it still
applies to pointer types only and pertains to the content of
the pointer, whereas the alignment of the pointer itself is
determined by the stackalign attribute.

For byval arguments, the stackalign attribute assumes the
role, previously perfomed by align, falling back to align if
stackalign` is absent.

On the clang side, when passing arguments using the "direct"
style (cf. `ABIArgInfo::Kind`), now we can optionally
specify an alignment, which is emitted as the new
`stackalign` attribute.

Patch by Momchil Velikov and Lucas Prates.

Differential Revision: https://reviews.llvm.org/D98794
2021-04-15 22:58:14 +01:00
Martin Storsjö 8e0f2e89ff [clang] [AArch64] Fix handling of HFAs passed to Windows variadic functions
The documentation says that for variadic functions, all composites
are treated similarly, no special handling of HFAs/HVAs, not even
for the fixed arguments of a variadic function.

Differential Revision: https://reviews.llvm.org/D100467
2021-04-15 22:21:27 +03:00
cchen e0c2125d1d [OpenMP] Added codegen for masked directive
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D100514
2021-04-15 12:55:07 -05:00
Thomas Lively 6a18cc23ef [WebAssembly] Codegen for i64x2.extend_{low,high}_i32x4_{s,u}
Removes the builtins and intrinsics used to opt in to using these instructions
and replaces them with normal ISel patterns now that they are no longer
prototypes.

Differential Revision: https://reviews.llvm.org/D100402
2021-04-14 13:43:09 -07:00
Thomas Lively af7925b4dd [WebAssembly] Codegen for f64x2.convert_low_i32x4_{s,u}
Add a custom DAG combine and ISD opcode for detecting patterns like

  (uint_to_fp (extract_subvector ...))

before the extract_subvector is expanded to ensure that they will ultimately
lower to f64x2.convert_low_i32x4_{s,u} instructions. Since these instructions
are no longer prototypes and can now be produced via standard IR, this commit
also removes the target intrinsics and builtins that had been used to prototype
the instructions.

Differential Revision: https://reviews.llvm.org/D100425
2021-04-14 10:42:45 -07:00
Thomas Lively af7ab81ce3 [WebAssembly] Use standard intrinsics for f32x4 and f64x2 ops
Now that these instructions are no longer prototypes, we do not need to be
careful about keeping them opt-in and can use the standard LLVM infrastructure
for them. This commit removes the bespoke intrinsics we were using to represent
these operations in favor of the corresponding target-independent intrinsics.
The clang builtins are preserved because there is no standard way to easily
represent these operations in C/C++.

For consistency with the scalar codegen in the Wasm backend, the intrinsic used
to represent {f32x4,f64x2}.nearest is @llvm.nearbyint even though
@llvm.roundeven better captures the semantics of the underlying Wasm
instruction. Replacing our use of @llvm.nearbyint with use of @llvm.roundeven is
left to a potential future patch.

Differential Revision: https://reviews.llvm.org/D100411
2021-04-14 09:19:27 -07:00
Martin Storsjö 3637c5c8ec [clang] [AArch64] Fix Windows va_arg handling for larger structs
Aggregate types over 16 bytes are passed by reference.

Contrary to the x86_64 ABI, smaller structs with an odd (non power
of two) are padded and passed in registers.

Differential Revision: https://reviews.llvm.org/D100374
2021-04-14 14:51:53 +03:00
Liu, Chen3 1c4108ab66 [i386] Modify the alignment of __m128/__m256/__m512 vector type according i386 abi.
According to i386 System V ABI:

1. when __m256 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 32 byte boundary at the time of the call.
2. when __m512 are required to be passed on the stack, the stack pointer must be aligned on a 0 mod 64 byte boundary at the time of the call.

The current method of clang passing __m512 parameter are as follow:

1. when target supports avx512, passing it with 64 byte alignment;
2. when target supports avx, passing it with 32 byte alignment;
3. Otherwise, passing it with 16 byte alignment.

Passing __m256 parameter are as follow:

1. when target supports avx or avx512, passing it with 32 byte alignment;
2. Otherwise, passing it with 16 byte alignment.

This pach will passing __m128/__m256/__m512 following i386 System V ABI and
apply it to Linux only since other System V OS (e.g Darwin, PS4 and FreeBSD) don't
want to spend any effort dealing with the ramifications of ABI breaks at present.

Differential Revision: https://reviews.llvm.org/D78564
2021-04-14 16:44:54 +08:00
Ben Dunbobbin eae2d4b852 [Windows Itanium][PS4] handle dllimport/export w.r.t vtables/rtti
The existing Windows Itanium patches for dllimport/export
behaviour w.r.t vtables/rtti can't be adopted for PS4 due to
backwards compatibility reasons (see comments on
https://reviews.llvm.org/D90299).

This commit adds our PS4 scheme for this to Clang.

Differential Revision: https://reviews.llvm.org/D93203
2021-04-13 11:41:10 +01:00
Esme-Yi dff922f39b Reland [DebugInfo] Fix the mismatching between C++ language tags and Dwarf versions.""
This reverts commit c965e14a12.
2021-04-12 11:05:55 +00:00
Esme-Yi c965e14a12 Revert "[DebugInfo] Fix the mismatching between C++ language tags and Dwarf versions."
This reverts commit 62fa9b9388.
2021-04-12 10:36:46 +00:00
Esme-Yi 62fa9b9388 [DebugInfo] Fix the mismatching between C++ language tags and Dwarf versions.
Summary: The tags DW_LANG_C_plus_plus_14 and DW_LANG_C_plus_plus_11, introduced in Dwarf-5, are unexpected in previous versions. Fixing the mismathing doesn't have any drawbacks for any other debuggers, but helps dbx.

Reviewed By: aprantl, shchenz

Differential Revision: https://reviews.llvm.org/D99250
2021-04-12 07:42:54 +00:00
yifeng.dongyifeng 3a6a80b641 [Clang][Coroutine][DebugInfo] In c++ coroutine, clang will emit different debug info variables for parameters and move-parameters.
The first one is the real parameters of the coroutine function, the
other one just for copying parameters to the coroutine frame.

Considering the following c++ code:
```
struct coro {
  ...
};

coro foo(struct test & t) {
  ...
  co_await suspend_always();
    ...
    co_await suspend_always();
    ...
    co_await suspend_always();
}

int main(int argc, char *argv[]) {
  auto c = foo(...);
    c.handle.resume();
      ...
  }
```

Function foo is the standard coroutine function, and it has only
one parameter named t (ignoring this at first),
when we use the llvm code to compile this function, we can get the
following ir:

```
!2921 = distinct !DISubprogram(name: "foo", linkageName:
"_ZN6Object3fooE4test", scope: !2211, file: !45, li\
ne: 48, type: !2329, scopeLine: 48, flags: DIFlagPrototyped |
DIFlagAllCallsDescribed, spFlags: DISPFlagDefi\
nition | DISPFlagOptimized, unit: !44, declaration: !2328,
retainedNodes: !2922)
!2924 = !DILocalVariable(name: "t", arg: 2, scope: !2921, file: !45,
line: 48, type: !838)
...
!2926 = !DILocalVariable(name: "t", scope: !2921, type: !838, flags:
DIFlagArtificial)
```
We can find there are two `the same` DIVariable named t in the same
dwarf scope for foo.resume.
And when we try to use llvm-dwarfdump to dump the dwarf info of this
elf, we get the following output:

```
0x00006684:   DW_TAG_subprogram
                DW_AT_low_pc    (0x00000000004013a0)
                DW_AT_high_pc   (0x00000000004013a8)
                DW_AT_frame_base        (DW_OP_reg7 RSP)
                DW_AT_object_pointer    (0x0000669c)
                DW_AT_GNU_all_call_sites        (true)
                DW_AT_specification     (0x00005b5c "_ZN6Object3fooE4test")

0x000066a5:     DW_TAG_formal_parameter
                DW_AT_name    ("t")
                DW_AT_decl_file       ("/disk1/yifeng.dongyifeng/my_code/llvm/build/bin/coro-debug-1.cpp")
                DW_AT_decl_line       (48)
                DW_AT_type    (0x00004146 "test")

0x000066ba:     DW_TAG_variable
                  DW_AT_name    ("t")
                  DW_AT_type    (0x00004146 "test")
                  DW_AT_artificial      (true)
```
The elf also has two 't' in the same scope.
But unluckily, it might let the debugger
confused. And failed to print parameters for O0 or above.
This patch will make coroutine parameters and move
parameters use the same DIVar and try to fix the problems
that I mentioned before.

Test Plan: check-clang

Reviewed By: aprantl, jmorse

Differential Revision: https://reviews.llvm.org/D97533
2021-04-12 11:10:47 +08:00
Saurabh Jha 71ab6c98a0
[Matrix] Implement C-style explicit type conversions for matrix types.
This implements C-style type conversions for matrix types, as specified
in clang/docs/MatrixTypes.rst.

Fixes PR47141.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D99037
2021-04-10 11:48:41 +01:00
Roman Lebedev 6270b3a1ea
Temporairly revert "[CGCall] Annotate `this` argument with alignment"
As per @jyknight, "It seems like there's a bug with vtable thunks getting the wrong information."
See https://reviews.llvm.org/D99790#2680857, https://godbolt.org/z/MxhYMe1q7

This reverts commit 0aa0458f14.
2021-04-10 10:43:16 +03:00
Ben Shi 4f173c0c42 [clang][AVR] Support variable decorator '__flash'
Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D96853
2021-04-10 11:23:55 +08:00
cchen 1a43fd2769 [OpenMP51] Initial support for masked directive and filter clause
Adds basic parsing/sema/serialization support for the #pragma omp masked
directive.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99995
2021-04-09 14:00:36 -05:00
Soumi Manna 5d7cb79416 RISCVABIInfo::classifyArgumentType: Fix static analyzer warnings with uninitialized variables warnings - NFCI
Differential Revision: https://reviews.llvm.org/D100172
2021-04-09 15:23:32 +01:00
Yaxun (Sam) Liu 25942d7c49 [AMDGPU] Allow relaxed/consume memory order for atomic inc/dec
Reviewed by: Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D100144
2021-04-09 09:23:41 -04:00
David Blaikie eb8a28e2cf DebugInfo: Include inline namespaces in template specialization parameter names
This ensures these types have distinct names if they are distinct types
(eg: if one is an instantiation with a type in one inline namespace, and
another from a type with the same simple name, but in a different inline
namespace).
2021-04-08 17:37:55 -07:00
Xiangling Liao d508561798 [AIX] Support init priority attribute
Differential Revision: https://reviews.llvm.org/D99291
2021-04-08 15:40:09 -04:00
Dávid Bolvanský 2cb8c10342 Revert "Reduce the number of attributes attached to each function"
This reverts commit 053dc95839. It causes perf regressions - see discussion in D97116.
2021-04-08 17:28:57 +02:00
Saurabh Jha 7d8513b7f2 [clang] Move int <-> float scalar conversion to a separate function
As prelude to this patch https://reviews.llvm.org/D99037, we want to
move the int-float conversion
into a separate function that can be reused by matrix cast

Differential Revision: https://reviews.llvm.org/D100051
2021-04-07 12:34:16 -07:00
Roman Lebedev 0aa0458f14
[CGCall] Annotate `this` argument with alignment
As it is being noted in D99249, lack of alignment information on `this`
has been preventing LICM from happening.

For some time now, lack of alignment attribute does *not* imply
natural alignment, but an alignment of `1`.
Also, we used to treat dereferenceable as implying alignment,
but we no longer do, so it's a bugfix.

Differential Revision: https://reviews.llvm.org/D99790
2021-04-07 11:02:01 +03:00
Yaxun (Sam) Liu 61d065e21f Let clang atomic builtins fetch add/sub support floating point types
Recently atomicrmw started to support fadd/fsub:

https://reviews.llvm.org/D53965

However clang atomic builtins fetch add/sub still does not support
emitting atomicrmw fadd/fsub.

This patch adds that.

Reviewed by: John McCall, Artem Belevich, Matt Arsenault, JF Bastien,
James Y Knight, Louis Dionne, Olivier Giroux

Differential Revision: https://reviews.llvm.org/D71726
2021-04-06 15:44:00 -04:00
Simon Pilgrim 2901dc7575 Don't directly dereference getAs<> casts to avoid potential null dereferences. NFCI.
Replace with castAs<> which asserts the cast is valid.

Fixes a number of static analyzer warnings.
2021-04-06 12:24:19 +01:00
Yevgeny Rouban 39e3e3aa51 [NewPM] Redesign of PreserveCFG Checker
The reason for the NewPM redesign is described in the commit
  cba3e783389a: [NewPM] Disable PreservedCFGChecker ...

The checker introduces an internal custom CFG analysis that tracks
current up-to date CFG snapshot. The analysis is invalidated along
any other CFG related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a functional pass exit then the checker
asserts that the CFG snapshot taken from this analysis is equals to
a snapshot of the current CFG.

Along the way:
- the function CFG::printDiff() is simplified by removing function
  name calculation. The name is printed by the caller;
- fixed CFG invalidated condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
  optional parameter of type FunctionAnalysisManager*, which is
  needed by the checker to get the custom CFG analysis;
- several PM related tests updated to explicitly set
  -verify-cfg-preserved=1 as they need.

This patch is safe to land as the CFGChecker is left switched off
(the options -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.

Reviewed By: skatkov, kuhar

Differential Revision: https://reviews.llvm.org/D91327
2021-04-06 12:35:49 +07:00
Jennifer Yu 7078ef4722 [OPENMP51]Initial support for nocontext clause.
Added basic parsing/sema/serialization support for the 'nocontext' clause.

Differential Revision: https://reviews.llvm.org/D99848
2021-04-05 11:45:49 -07:00
Craig Topper b4f2e80600 [RISCV] Refactor conversion of B extensions to IR intrinsics a little to reduce clang binary size.
These all pass 1 type to getIntrinsic. So rather than assigning
IntrinsicTypes for each builtin which invokes the SmallVector
constructor, just select the intrinsic ID with a switch and
share a single assignment of IntrinsicTypes.
2021-04-02 23:49:44 -07:00
Jennifer Yu 0fe8af9468 Fix build bot problem with missing OMPC_novariants in switch. 2021-04-02 13:58:39 -07:00
Levy Hsu f78d932cf2 [RISCV] Add IR intrinsics for Zbc extension
Head files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
clmul
clmulh
clmulr

Differential Revision: https://reviews.llvm.org/D99711
2021-04-02 12:09:13 -07:00
Levy Hsu 944adbf285 Recommit "[RISCV] Add IR intrinsic for Zbb extension"
Forgot to amend the Author.

Original commit message:

Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b

Differential Revision: https://reviews.llvm.org/D99320
2021-04-02 11:50:19 -07:00
Craig Topper 1f0b309f24 Revert "[RISCV] Add IR intrinsic for Zbb extension"
This reverts commit 1808194590.

I forgot to change the author.
2021-04-02 11:47:02 -07:00
Craig Topper 1808194590 [RISCV] Add IR intrinsic for Zbb extension
Header files are included in a separate patch in case the name needs to be changed.

RV32 / 64:
orc.b
2021-04-02 11:23:57 -07:00
Levy Hsu b001d574d7 [RISCV] Add IR intrinsic for Zbr extension
Implementation for RISC-V Zbr extension intrinsic.

Header files are included in separate patch in case the name needs to be changed

RV32 / 64:
        crc32b
        crc32h
        crc32w
        crc32cb
        crc32ch
        crc32cw

RV64 Only:
        crc32d
        crc32cd

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D99009
2021-04-02 10:58:45 -07:00
Joseph Huber 69ca50bd7d [OpenMP] Pass mapping names to add components in a user defined mapper
Summary:
Currently the mapping names are not passed to the mapper components that set up
the array region. This means array mappings will not have their names availible
in the runtime. This patch fixes this by passing the argument name to the region
correctly. This means that the mapped variable's name will be the declared
mapper that placed it on the device.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D99681
2021-04-01 15:51:03 -04:00
Raphael Isemann 60854c328d Avoid calling ParseCommandLineOptions in BackendUtil if possible
Calling `ParseCommandLineOptions` should only be called from `main` as the
CommandLine setup code isn't thread-safe. As BackendUtil is part of the
generic Clang FrontendAction logic, a process which has several threads executing
Clang FrontendActions will randomly crash in the unsafe setup code.

This patch avoids calling the function unless either the debug-pass option or
limit-float-precision option is set. Without these two options set the
`ParseCommandLineOptions` call doesn't do anything beside parsing
the command line `clang` which doesn't set any options.

See also D99652 where LLDB received a workaround for this crash.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D99740
2021-04-01 19:41:16 +02:00
Alexey Bataev a28e835e94 [OPENMP]Fix PR48885: Crash in passing firstprivate args to tasks on Apple M1.
Need to bitcast the function pointer passed as a parameter to the real
type to avoid possible problem with calling conventions.

Differential Revision: https://reviews.llvm.org/D99521
2021-03-31 13:00:58 -07:00
Alexey Bataev 66da4f6fc9 [OPENMP]Fix PR48658: [OpenMP 5.0] Compiler crash when OpenMP atomic sync hints used.
No need to consider hint clause kind as the main atomic clause kind at the
codegen.

Differential Revision: https://reviews.llvm.org/D99611
2021-03-31 12:58:24 -07:00
Thomas Lively 45783d0e8a [WebAssembly] Implement i64x2 comparisons
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.

Differential Revision: https://reviews.llvm.org/D99623
2021-03-31 10:46:17 -07:00
Wei Mi d535a05ca1 [ThinLTO] During module importing, close one source module before open
another one for distributed mode.

Currently during module importing, ThinLTO opens all the source modules,
collect functions to be imported and append them to the destination module,
then leave all the modules open through out the lto backend pipeline. This
patch refactors it in the way that one source module will be closed before
another source module is opened. All the source modules will be closed after
importing phase is done. It will save some amount of memory when there are
many source modules to be imported.

Note that this patch only changes the distributed thinlto mode. For in
process thinlto mode, one source module is shared acorss different thinlto
backend threads so it is not changed in this patch.

Differential Revision: https://reviews.llvm.org/D99554
2021-03-30 14:37:29 -07:00
Mike Rice b7899ba0e8 [OPENMP51]Initial support for the dispatch directive.
Added basic parsing/sema/serialization support for dispatch directive.

Differential Revision: https://reviews.llvm.org/D99537
2021-03-30 14:12:53 -07:00
Alexey Bataev e2c7bf08cc [OPENMP]Fix PR48607: Crash during clang openmp codegen for firstprivate() of `float _Complex`.
Need to cast the argument for the debug wrapper function call to the
corresponding parameter type to avoid crash.

Differential Revision: https://reviews.llvm.org/D99617
2021-03-30 13:39:45 -07:00
Alexey Bataev 1696b8ae96 [OPENMP]Fix PR48740: OpenMP declare reduction in C does not require an initializer
If no initializer-clause is specified, the private variables will be
initialized following the rules for initialization of objects with static
storage duration.

Need to adjust the implementation to the current version of the
standard.

Differential Revision: https://reviews.llvm.org/D99539
2021-03-30 05:38:20 -07:00
Raphael Isemann 1cbba533ec [ObjC][CodeGen] Fix missing debug info in situations where an instance and class property have the same identifier
Since the introduction of class properties in Objective-C it is possible to declare a class and an instance
property with the same identifier in an interface/protocol.

Right now Clang just generates debug information for whatever property comes first in the source file.
The second property is ignored as it's filtered out by the set of already emitted properties (which is just
using the identifier of the property to check for equivalence).  I don't think generating debug info in this case
was never supported as the identifier filter is in place since 7123bca7fb
(which precedes the introduction of class properties).

This patch expands the filter to take in account identifier + whether the property is class/instance. This
ensures that both properties are emitted in this special situation.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99512
2021-03-30 11:07:16 +02:00
Johannes Doerfert 03cc8a1ba0 [OpenMP][NFC] Move the `noinline` to the parallel entry point
The `noinline` for non-SPMD parallel functions is probably not necessary
but as long as we use it we should put it on the outermost parallel
function, which is the wrapper, not the actual outlined function.

Resolves PR49752

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99506
2021-03-30 01:12:45 -05:00
Alexey Bataev 0411b23319 [OPENMP]Map data field with l-value reference types.
Added initial support dfor the mapping of the data members with l-value
reference types.

Differential Revision: https://reviews.llvm.org/D98812
2021-03-29 07:07:09 -07:00
Alexey Bataev f6f21dcd6c [OPENMP]Fix PR49636: Assertion `(!Entry.getAddress() || Entry.getAddress() == Addr) && "Resetting with the new address."' failed.
The original issue is caused by the fact that the variable is allocated
with incorrect type i1 instead of i8. This causes the bitcasting of the
declaration to i8 type and the bitcast expression does not match the
original variable.
To fix the problem, the UndefValue initializer and the original
variable should be emitted with type i8, not i1.

Differential Revision: https://reviews.llvm.org/D99297
2021-03-29 06:55:57 -07:00
Alexey Bataev dcf96178cb [OPENMP]Fix PR49052: Clang crashed when compiling target code with assert(0).
Need to insert a basic block during generation of the target region to
avoid crash for the GPU to be able always calling a cleanup action.
This cleanup action is required for the correct emission of the target
region for the GPU.

Differential Revision: https://reviews.llvm.org/D99445
2021-03-29 06:36:06 -07:00
Matt Arsenault 9a0c9402fa Reapply "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 07e46367ba.
2021-03-29 08:55:30 -04:00
Oliver Stannard 07e46367ba Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute""
Reverting because test 'Bindings/Go/go.test' is failing on most
buildbots.

This reverts commit fc9df30991.
2021-03-29 11:32:22 +01:00
Matt Arsenault fc9df30991 Reapply "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 20d5c42e0e.
2021-03-28 13:35:21 -04:00
Nico Weber 20d5c42e0e Revert "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 4fefed6563.
Broke check-clang everywhere.
2021-03-28 13:02:52 -04:00