Commit Graph

1575 Commits

Author SHA1 Message Date
Jon Chesterfield 27177b82d4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-10 15:30:56 +00:00
Jorge Gorbe Moya 770ddf599d Fix unused variable warning in release build 2021-11-09 19:48:42 -08:00
hsmahesha 3b9a85d10a [CFE][Codegen] Make sure to maintain the contiguity of all the static allocas
at the start of the entry block, which in turn would aid better code transformation/optimization.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D110257
2021-11-10 08:45:21 +05:30
Jon Chesterfield 0fa45d6d80 Revert "[OpenMP] Lower printf to __llvm_omp_vprintf"
This reverts commit db81d8f6c4.
2021-11-08 20:28:57 +00:00
Jon Chesterfield db81d8f6c4 [OpenMP] Lower printf to __llvm_omp_vprintf
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.

This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.

The exact set of changes to check-openmp probably needs revision before commit

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D112680
2021-11-08 18:38:00 +00:00
Mike Rice 6f9c25167d [OpenMP] Initial parsing/sema for the 'omp loop' construct
Adds basic parsing/sema/serialization support for the #pragma omp loop
directive.

Differential Revision: https://reviews.llvm.org/D112499
2021-10-28 08:26:43 -07:00
hsmahesha db9c2d7751 [CFE][Codegen] Remove CodeGenFunction::InitTempAlloca()
Sequel patch to https://reviews.llvm.org/D111316

Finally, remove the defintion of CodeGenFunction::InitTempAlloca().

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D111324
2021-10-12 10:04:15 +05:30
Giorgis Georgakoudis ac90dfc43a Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit 1d66649adf.

Revert to fix AMG GPU issue.
2021-09-21 13:20:39 -07:00
Giorgis Georgakoudis 1d66649adf [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert, jhuber6

Differential Revision: https://reviews.llvm.org/D102107
2021-09-21 10:50:04 -07:00
alokmishra.besu 000875c127 OpenMP 5.0 metadirective
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.

A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h

Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91944
2021-09-18 13:40:44 -05:00
Nico Weber 31cca21565 Revert "OpenMP 5.0 metadirective"
This reverts commit c7d7b98e52.
Breaks tests on macOS, see comment on https://reviews.llvm.org/D91944
2021-09-18 09:10:37 -04:00
alokmishra.besu 347f3c186d OpenMP 5.0 metadirective
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.

A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h

Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91944
2021-09-17 16:30:06 -05:00
cchen 7efb825382 Revert "OpenMP 5.0 metadirective"
This reverts commit c7d7b98e52.
2021-09-17 16:14:16 -05:00
cchen c7d7b98e52 OpenMP 5.0 metadirective
This patch supports OpenMP 5.0 metadirective features.
It is implemented keeping the OpenMP 5.1 features like dynamic user condition in mind.

A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h

Currently this function return the index of the when clause with the highest score from the ones applicable in the Context.
But this function is declared with an array which can be used in OpenMP 5.1 implementation to select all the valid when clauses which can be resolved in runtime. Currently this array is set to null by default and its implementation is left for future.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91944
2021-09-17 16:03:13 -05:00
Michael Kruse 650bbc5620 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Recommit of 707ce34b06. Don't introduce a
dependency to the LLVMPasses component, instead register the required
passes individually.

Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-04 19:18:58 -05:00
PeixinQiao a42380ce83 [OMPIRBuilder] Add ordered directive to OMPBuilder
Add support for ordered directive in the OpenMPIRBuilder.

This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.

Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").

Reviewed By: Meinersbur, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107430
2021-09-03 09:37:58 +08:00
Roman Lebedev 50634deaa5
Revert "[OpenMP][OpenMPIRBuilder] Implement loop unrolling."
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
  "LLVMFrontendOpenMP" of type SHARED_LIBRARY
    depends on "LLVMPasses" (weak)
  "LLVMipo" of type SHARED_LIBRARY
    depends on "LLVMFrontendOpenMP" (weak)
  "LLVMCoroutines" of type SHARED_LIBRARY
    depends on "LLVMipo" (weak)
  "LLVMPasses" of type SHARED_LIBRARY
    depends on "LLVMCoroutines" (weak)
    depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY.  Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed.  Build files cannot be regenerated correctly.
```

This reverts commit 707ce34b06.
2021-09-02 12:42:23 +03:00
Michael Kruse 707ce34b06 [OpenMP][OpenMPIRBuilder] Implement loop unrolling.
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:

 * `unrollLoopFull`
 * `unrollLoopPartial`
 * `unrollLoopHeuristic`

`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.

With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.

Reviewed By: jdoerfert, kiranchandramohan

Differential Revision: https://reviews.llvm.org/D107764
2021-09-02 02:37:25 -05:00
Andrei Elovikov 1724a16437 [NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp
...that is located under llvm/lib/Support/.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D108423
2021-08-30 09:48:48 -07:00
Simon Pilgrim 7f48bd3bed CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.
Don't pass the struct by value.
2021-08-22 12:13:17 +01:00
Alexander Potapenko b0391dfc73 [clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.

The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.

Differential Revision: https://reviews.llvm.org/D108029
2021-08-20 14:01:06 +02:00
Melanie Blower bc5b5ea037 [clang][patch][FPEnv] Make initialization of C++ globals strictfp aware
@kpn pointed out that the global variable initialization functions didn't
have the "strictfp" metadata set correctly, and @rjmccall said that there
was buggy code in SetFPModel and StartFunction, this patch is to solve
those problems. When Sema creates a FunctionDecl, it sets the
FunctionDeclBits.UsesFPIntrin to "true" if the lexical FP settings
(i.e. a combination of command line options and #pragma float_control
settings) correspond to ConstrainedFP mode. That bit is used when CodeGen
starts codegen for a llvm function, and it translates into the
"strictfp" function attribute. See bugs.llvm.org/show_bug.cgi?id=44571

Reviewed By: Aaron Ballman

Differential Revision: https://reviews.llvm.org/D102343
2021-07-29 12:02:37 -04:00
Giorgis Georgakoudis fb0cf01795 Revert "[OpenMP] Codegen aggregate for outlined function captures"
This reverts commit e9c7291cb2.

Fix failing tests
2021-07-19 07:54:26 -07:00
Jamie Schmeiser 73840f9f81 thread_local support for AIX
Summary:
The AIX linker will produce errors on unresolved weak symbols.  Change the
generated code to not check for the initialization function but just call
it and ensure that it always exists.  Also, the AIX atexit routine has a
different name (and signature) so call it correctly.  Update the lit tests
to test on AIX appropriately.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: hubert.reinterpretcast (Hubert Tong)
Differential Revision: https://reviews.llvm.org/D104420
2021-07-19 10:03:22 -04:00
Giorgis Georgakoudis e9c7291cb2 [OpenMP] Codegen aggregate for outlined function captures
Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3)  forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102107
2021-07-16 23:27:44 -07:00
Graham Hunter c6a91ee6aa [Clang][OpenMP] Monotonic does not apply to SIMD
The codegen for simd constructs was affected by the presence (or
absence) of the 'monotonic' schedule modifier for worksharing
loops. The modifier is only intended to apply to the scheduling of
chunks for a thread, not iterations of a loop inside a chunk.

In addition, the monotonic modifier was applied to worksharing loops
by default if no schedule clause was present; the referenced part of
the OpenMP 4.5 spec in the code (section 2.7.1) only applies if the
user specified a schedule clause with a static kind but no modifier.
Without a user-specified schedule clause we should default to
nonmonotonic scheduling.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D103793
2021-06-22 10:24:11 +01:00
Michael Kruse a22236120f [OpenMP] Implement '#pragma omp unroll'.
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99459
2021-06-10 14:30:17 -05:00
Hsiangkai Wang 2b13ff6979 [Clang][CodeGen] Set the size of llvm.lifetime to unknown for scalable types.
If the memory object is scalable type, we do not know the exact size of
it at compile time. Set the size of lifetime marker to unknown if the
object is scalable one.

Differential Revision: https://reviews.llvm.org/D102822
2021-06-07 23:30:13 +08:00
Ten Tzen 797ad70152 [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.

Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
  faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
  must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
  outside of _try region must be updated in memory (not just in register)
  before the subsequent exception occurs.

The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.

One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.

Implementation:
Part-1: Clang implementation described below.

Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.

Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.

Part-2 (will be in Part-2 patch): LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D80344/new/
2021-05-17 22:42:17 -07:00
Florian Hahn 6c31295493
[clang] Refactor mustprogress handling, add it to all loops in c++11+.
Currently Clang does not add mustprogress to inifinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).

This allows us to simplify the code dealing with adding mustprogress a
bit.

Reviewed By: aaron.ballman, lebedev.ri

Differential Revision: https://reviews.llvm.org/D96418
2021-04-30 14:13:47 +01:00
Joshua Haberman 8344675908 Implemented [[clang::musttail]] attribute for guaranteed tail calls.
This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.

The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.

Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.

Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D99517
2021-04-15 17:12:21 -07:00
cchen e0c2125d1d [OpenMP] Added codegen for masked directive
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D100514
2021-04-15 12:55:07 -05:00
yifeng.dongyifeng 3a6a80b641 [Clang][Coroutine][DebugInfo] In c++ coroutine, clang will emit different debug info variables for parameters and move-parameters.
The first one is the real parameters of the coroutine function, the
other one just for copying parameters to the coroutine frame.

Considering the following c++ code:
```
struct coro {
  ...
};

coro foo(struct test & t) {
  ...
  co_await suspend_always();
    ...
    co_await suspend_always();
    ...
    co_await suspend_always();
}

int main(int argc, char *argv[]) {
  auto c = foo(...);
    c.handle.resume();
      ...
  }
```

Function foo is the standard coroutine function, and it has only
one parameter named t (ignoring this at first),
when we use the llvm code to compile this function, we can get the
following ir:

```
!2921 = distinct !DISubprogram(name: "foo", linkageName:
"_ZN6Object3fooE4test", scope: !2211, file: !45, li\
ne: 48, type: !2329, scopeLine: 48, flags: DIFlagPrototyped |
DIFlagAllCallsDescribed, spFlags: DISPFlagDefi\
nition | DISPFlagOptimized, unit: !44, declaration: !2328,
retainedNodes: !2922)
!2924 = !DILocalVariable(name: "t", arg: 2, scope: !2921, file: !45,
line: 48, type: !838)
...
!2926 = !DILocalVariable(name: "t", scope: !2921, type: !838, flags:
DIFlagArtificial)
```
We can find there are two `the same` DIVariable named t in the same
dwarf scope for foo.resume.
And when we try to use llvm-dwarfdump to dump the dwarf info of this
elf, we get the following output:

```
0x00006684:   DW_TAG_subprogram
                DW_AT_low_pc    (0x00000000004013a0)
                DW_AT_high_pc   (0x00000000004013a8)
                DW_AT_frame_base        (DW_OP_reg7 RSP)
                DW_AT_object_pointer    (0x0000669c)
                DW_AT_GNU_all_call_sites        (true)
                DW_AT_specification     (0x00005b5c "_ZN6Object3fooE4test")

0x000066a5:     DW_TAG_formal_parameter
                DW_AT_name    ("t")
                DW_AT_decl_file       ("/disk1/yifeng.dongyifeng/my_code/llvm/build/bin/coro-debug-1.cpp")
                DW_AT_decl_line       (48)
                DW_AT_type    (0x00004146 "test")

0x000066ba:     DW_TAG_variable
                  DW_AT_name    ("t")
                  DW_AT_type    (0x00004146 "test")
                  DW_AT_artificial      (true)
```
The elf also has two 't' in the same scope.
But unluckily, it might let the debugger
confused. And failed to print parameters for O0 or above.
This patch will make coroutine parameters and move
parameters use the same DIVar and try to fix the problems
that I mentioned before.

Test Plan: check-clang

Reviewed By: aprantl, jmorse

Differential Revision: https://reviews.llvm.org/D97533
2021-04-12 11:10:47 +08:00
Xiangling Liao d508561798 [AIX] Support init priority attribute
Differential Revision: https://reviews.llvm.org/D99291
2021-04-08 15:40:09 -04:00
Xun Li c7a39c833a [Coroutine][Clang] Force emit lifetime intrinsics for Coroutines
tl;dr Correct implementation of Corouintes requires having lifetime intrinsics available.

Coroutine functions are functions that can be suspended and resumed latter. To do so, data that need to stay alive after suspension must be put on the heap (i.e. the coroutine frame).
The optimizer is responsible for analyzing each AllocaInst and figure out whether it should be put on the stack or the frame.
In most cases, for data that we are unable to accurately analyze lifetime, we can just conservatively put them on the heap.
Unfortunately, there exists a few cases where certain data MUST be put on the stack, not on the heap. Without lifetime intrinsics, we are unable to correctly analyze those data's lifetime.

To dig into more details, there exists cases where at certain code points, the current coroutine frame may have already been destroyed. Hence no frame access would be allowed beyond that point.
The following is a common code pattern called "Symmetric Transfer" in coroutine:
```
auto tmp = await_suspend();
__builtin_coro_resume(tmp.address());
return;
```
In the above code example, `await_suspend()` returns a new coroutine handle, which we will obtain the address and then resume that coroutine. This essentially "transfered" from the current coroutine to a different coroutine.
During the call to `await_suspend()`, the current coroutine may be destroyed, which should be fine because we are not accessing any data afterwards.
However when LLVM is emitting IR for the above code, it needs to emit an AllocaInst for `tmp`. It will then call the `address` function on tmp. `address` function is a member function of coroutine, and there is no way for the LLVM optimizer to know that it does not capture the `tmp` pointer. So when the optimizer looks at it, it has to conservatively assume that `tmp` may escape and hence put it on the heap. Furthermore, in some cases `address` call would be inlined, which will generate a bunch of store/load instructions that move the `tmp` pointer around. Those stores will also make the compiler to think that `tmp` might escape.
To summarize, it's really difficult for the mid-end to figure out that the `tmp` data is short-lived.
I made some attempt in D98638, but it appears to be way too complex and is basically doing the same thing as inserting lifetime intrinsics in coroutines.

Also, for reference, we already force emitting lifetime intrinsics in O0 for AlwaysInliner: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Passes/PassBuilder.cpp#L1893

Differential Revision: https://reviews.llvm.org/D99227
2021-03-25 13:46:20 -07:00
Roman Lebedev e3a4701627
[clang][CodeGen] Lower Likelihood attributes to @llvm.expect intrin instead of branch weights
08196e0b2e exposed LowerExpectIntrinsic's
internal implementation detail in the form of
LikelyBranchWeight/UnlikelyBranchWeight options to the outside.

While this isn't incorrect from the results viewpoint,
this is suboptimal from the layering viewpoint,
and causes confusion - should transforms also use those weights,
or should they use something else, D98898?

So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight
internal again, and fixing all the code that used it directly,
which currently is only clang codegen, thankfully,
to emit proper @llvm.expect intrinsics instead.
2021-03-21 22:50:21 +03:00
Vassil Vassilev 0cb7e7ca0c Make iteration over the DeclContext::lookup_result safe.
The idiom:
```
DeclContext::lookup_result R = DeclContext::lookup(Name);
for (auto *D : R) {...}
```

is not safe when in the loop body we trigger deserialization from an AST file.
The deserialization can insert new declarations in the StoredDeclsList whose
underlying type is a vector. When the vector decides to reallocate its storage
the pointer we hold becomes invalid.

This patch replaces a SmallVector with an singly-linked list. The current
approach stores a SmallVector<NamedDecl*, 4> which is around 8 pointers.
The linked list is 3, 5, or 7. We do better in terms of memory usage for small
cases (and worse in terms of locality -- the linked list entries won't be near
each other, but will be near their corresponding declarations, and we were going
to fetch those memory pages anyway). For larger cases: the vector uses a
doubling strategy for reallocation, so will generally be between half-full and
full. Let's say it's 75% full on average, so there's N * 4/3 + 4 pointers' worth
of space allocated currently and will be 2N pointers with the linked list. So we
break even when there are N=6 entries and slightly lose in terms of memory usage
after that. We suspect that's still a win on average.

Thanks to @rsmith!

Differential revision: https://reviews.llvm.org/D91524
2021-03-17 08:59:04 +00:00
Nikita Popov 46354bac76 [OpaquePtrs] Remove some uses of type-less CreateLoad APIs (NFC)
Explicitly pass loaded type when creating loads, in preparation
for the deprecation of these APIs.

There are still a couple of uses left.
2021-03-11 14:40:57 +01:00
Michael Kruse b119120673 [clang][OpenMP] Use OpenMPIRBuilder for workshare loops.
Initial support for using the OpenMPIRBuilder by clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to:
 * Only the worksharing-loop directive.
 * Recognizes only the nowait clause.
 * No loop nests with more than one loop.
 * Untested with templates, exceptions.
 * Semantic checking left to the existing infrastructure.

This patch introduces a new AST node, OMPCanonicalLoop, which becomes parent of any loop that has to adheres to the restrictions as specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base language semantics:
 * The distance function: How many loop iterations there will be before entering the loop nest.
 * The loop variable function: Conversion from a logical iteration number to the loop variable.

These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logical should be done by the OpenMPIRBuilder such that it can be reused MLIR OpenMP dialect and thus by flang.

The distance and loop variable function are implemented using lambdas (or more exactly: CapturedStmt because lambda implementation is more interviewed with the parser). It is up to the OpenMPIRBuilder how they are called which depends on what is done with the loop. By default, these are emitted as outlined functions but we might think about emitting them inline as the OpenMPRuntime does.

For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoop's are not currently generated when the OpenMPIRBuilder is not enabled, these can just be skipped when not using the OpenMPIRBuilder in case we don't want to make the AST dependent on the EnableOMPBuilder setting.

Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.

Reviewed By: jdenny

Differential Revision: https://reviews.llvm.org/D94973
2021-03-04 22:52:59 -06:00
Reid Kleckner 1c2e7d200d [MS] Fix crash involving gnu stmt exprs and inalloca
Use a WeakTrackingVH to cope with the stmt emission logic that cleans up
unreachable blocks. This invalidates the reference to the deferred
replacement placeholder. Cope with it.

Fixes PR25102 (from 2015!)
2021-03-04 13:57:46 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Hsiangkai Wang 1a35a1b074 [RISCV] Add vadd with mask and without mask builtin.
Demonstrate how to add RISC-V V builtins and lower them to IR intrinsics for V extension.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>

Differential Revision: https://reviews.llvm.org/D93446
2021-02-24 07:57:31 +08:00
Michael Kruse 6c05005238 [OpenMP] Implement '#pragma omp tile', by Michael Kruse (@Meinersbur).
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.

This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture context would be difficult.

A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: Getting the of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loop are not defined as syntatic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead all at once.

I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).

Differential Revision: https://reviews.llvm.org/D76342
2021-02-16 09:45:07 -08:00
Florian Hahn 6280bb4cd8 [clang] Remove redundant condition (NFC). 2021-02-12 20:14:24 +00:00
Florian Hahn 51bf4c0e6d [clang] Add -ffinite-loops & -fno-finite-loops options.
This patch adds 2 new options to control when Clang adds `mustprogress`:

  1. -ffinite-loops: assume all loops are finite; mustprogress is added
     to all loops, regardless of the selected language standard.
  2. -fno-finite-loops: assume no loop is finite; mustprogress is not
     added to any loop or function. We could add mustprogress to
     functions without loops, but we would have to detect that in Clang,
     which is probably not worth it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D96419
2021-02-12 19:25:49 +00:00
Akira Hatanaka ed4718eccb [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-12 09:51:57 -08:00
Nico Weber de1966e542 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 4a64d8fe39.
Makes clang crash when buildling trivial iOS programs, see comment
after https://reviews.llvm.org/D92808#2551401
2021-02-09 11:06:32 -05:00
Akira Hatanaka 4a64d8fe39 [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

This reapplies 3fe3946d9a without the
changes made to lib/IR/AutoUpgrade.cpp, which was violating layering.

Original commit message:

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 06:09:42 -08:00
Akira Hatanaka 2fbbb18c1d Revert "[ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly"
This reverts commit 3fe3946d9a.

The commit violates layering by including a header from Analysis in
lib/IR/AutoUpgrade.cpp.
2021-02-05 06:00:05 -08:00
Akira Hatanaka 3fe3946d9a [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly
emitting retainRV or claimRV calls in the IR

Background:

This patch makes changes to the front-end and middle-end that are
needed to fix a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.rv" to calls, which
  indicates the call is implicitly followed by a marker instruction and
  an implicit retainRV/claimRV call that consumes the call result. In
  addition, it emits a call to @llvm.objc.clang.arc.noop.use, which
  consumes the call result, to prevent the middle-end passes from changing
  the return type of the called function. This is currently done only when
  the target is arm64 and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  the call is annotated with claimRV since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if the implicit call is a call to
  retainRV and does nothing if it's a call to claimRV.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls annotated with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-02-05 05:55:18 -08:00
Petr Hosek bb9eb19829 Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820
2021-01-26 17:13:34 -08:00
Petr Hosek 1e634f3952 Revert "Support for instrumenting only selected files or functions"
This reverts commit 4edf35f11a because
the test fails on Windows bots.
2021-01-26 12:25:28 -08:00
Petr Hosek 4edf35f11a Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820
2021-01-26 11:11:39 -08:00
Alan Phipps 9f2967bcfe [Coverage] Add support for Branch Coverage in LLVM Source-Based Code Coverage
This is an enhancement to LLVM Source-Based Code Coverage in clang to track how
many times individual branch-generating conditions are taken (evaluate to TRUE)
and not taken (evaluate to FALSE).  Individual conditions may comprise larger
boolean expressions using boolean logical operators.  This functionality is
very similar to what is supported by GCOV except that it is very closely
anchored to the ASTs.

Differential Revision: https://reviews.llvm.org/D84467
2021-01-05 09:51:51 -06:00
Reid Kleckner d7098ff29c De-templatify EmitCallArgs argument type checking, NFCI
This template exists to abstract over FunctionPrototype and
ObjCMethodDecl, which have similar APIs for storing parameter types. In
place of a template, use a PointerUnion with two cases to handle this.
Hopefully this improves readability, since the type of the prototype is
easier to discover. This allows me to sink this code, which is mostly
assertions, out of the header file and into the cpp file. I can also
simplify the overloaded methods for computing isGenericMethod, and get
rid of the second EmitCallArgs overload.

Differential Revision: https://reviews.llvm.org/D92883
2020-12-09 11:08:00 -08:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
Reid Kleckner 3bd0672726 [MS] Fix double evaluation of MSVC builtin arguments
This code got quite twisted because we consider some MSVC builtins to be
target agnostic, and some to be target specific. Target specific
intrinsics have a pattern of doing up-front argument evaluation, while
general intrinsics do not evaluate their arguments up front. As we tried
to share codepaths between the target-specific and target-agnostic
handling, we ended up doing double evaluation.

Instead, have each target handle MSVC intrinsics consistently before up
front argument evaluation. This requires passing less data around and is
more consistent with target independent intrinsic handling.

See D50979 for past examples of this bug. I noticed this while looking
into adding some more intrinsics.

Differential Revision: https://reviews.llvm.org/D92061
2020-11-25 11:55:01 -08:00
Xiangling Liao 17497ec514 [AIX][FE] Support constructor/destructor attribute
Support attribute((constructor)) and attribute((destructor)) on AIX

Differential Revision: https://reviews.llvm.org/D90892
2020-11-19 09:24:01 -05:00
Kevin P. Neal 2069403cdf [FPEnv] Use strictfp metadata in casting nodes
The strictfp metadata was added to the casting AST nodes in D85960, but
we aren't using that metadata yet. This patch adds that support.

In order to avoid lots of ad-hoc passing around of the strictfp bits I
updated the IRBuilder when moving from a function that has the Expr* to a
function that lacks it. I believe we should switch to this pattern to keep
the strictfp support from being overly invasive.

For the purpose of testing that we're picking up the right metadata, I
also made my tests use a pragma to make the AST's strictfp metadata not
match the global strictfp metadata. This exposes issues that we need to
deal with in subsequent patches, and I believe this is the right method
for most all of our clang strictfp tests.

Differential Revision: https://reviews.llvm.org/D88913
2020-11-06 11:56:12 -05:00
Atmn Patel ac73b73c16 [clang] Add mustprogress and llvm.loop.mustprogress attribute deduction
Since C++11, the C++ standard has a forward progress guarantee
[intro.progress], so all such functions must have the `mustprogress`
requirement. In addition, from C11 and onwards, loops without a non-zero
constant conditional or no conditional are also required to make
progress (C11 6.8.5p6). This patch implements these attribute deductions
so they can be used by the optimization passes.

Differential Revision: https://reviews.llvm.org/D86841
2020-11-04 22:03:14 -05:00
Alex Lorenz 701456b523 [darwin] add support for __isPlatformVersionAtLeast check for if (@available)
The __isPlatformVersionAtLeast routine is an implementation of `if (@available)` check
that uses the _availability_version_check API on Darwin that's supported on
macOS 10.15, iOS 13, tvOS 13 and watchOS 6.

Differential Revision: https://reviews.llvm.org/D90367
2020-11-02 16:28:09 -08:00
Mark de Wever b46fddf75f [CodeGen] Implement [[likely]] and [[unlikely]] for while and for loop.
The attribute has no effect on a do statement since the path of execution
will always include its substatement.

It adds a diagnostic when the attribute is used on an infinite while loop
since the codegen omits the branch here. Since the likelihood attributes
have no effect on a do statement no diagnostic will be issued for
do [[unlikely]] {...} while(0);

Differential Revision: https://reviews.llvm.org/D89899
2020-10-31 17:51:29 +01:00
Liu, Chen3 00090a2b82 Support complex target features combinations
This patch is mainly doing two things:

1. Adding support for parentheses, making the combination of target features
   more diverse;
2. Making the priority of ’,‘ is higher than that of '|' by default. So I need
   to make some change with PTX Builtin function.

Differential Revision: https://reviews.llvm.org/D89184
2020-10-30 10:32:53 +08:00
Tyker d3205bbca3 [Annotation] Allows annotation to carry some additional constant arguments.
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D88645
2020-10-26 10:50:05 +01:00
Mark de Wever 389c8d5b20 [NFC] Make non-modifying members const.
Implementing the likelihood attributes for the iteration statements adds
a new helper function. This function can't be const qualified since
these non-modifying members aren't const qualified.
2020-10-18 18:50:21 +02:00
Mark de Wever 2bcda6bb28 [Sema, CodeGen] Implement [[likely]] and [[unlikely]] in SwitchStmt
This implements the likelihood attribute for the switch statement. Based on the
discussion in D85091 and D86559 it only handles the attribute when placed on
the case labels or the default labels.

It also marks the likelihood attribute as feature complete. There are more QoI
patches in the pipeline.

Differential Revision: https://reviews.llvm.org/D89210
2020-10-18 13:48:42 +02:00
Mark de Wever 1113fbf44c [CodeGen] Improve likelihood branch weights
Bruno De Fraine discovered some issues with D85091. The branch weights
generated for `logical not` and `ternary conditional` were wrong. The
`logical and` and `logical or` differed from the code generated of
`__builtin_predict`.

Adjusted the generated code for the likelihood to match
`__builtin_predict`. The patch is based on Bruno's suggestions.

Differential Revision: https://reviews.llvm.org/D88363
2020-10-04 14:24:27 +02:00
Vedant Kumar 06bc685fa2 [ubsan] nullability-arg: Fix crash on C++ member pointers
Extend -fsanitize=nullability-arg to handle call sites which accept C++
member pointers.

rdar://62476022

Differential Revision: https://reviews.llvm.org/D88336
2020-09-28 09:41:18 -07:00
Mark de Wever 08196e0b2e Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Erik Pilkington a9a6e62ddf [CodeGen] Make sure the EH cleanup for block captures is conditional when the block literal is in a conditional context
Previously, clang was crashing on the attached test because the EH cleanup for
the block capture was incorrectly emitted under the assumption that the
expression wasn't conditionally evaluated. This was because before 9a52de00260,
pushLifetimeExtendedDestroy was mainly used with C++ automatic lifetime
extension, where a conditionally evaluated expression wasn't possible. Now that
we're using this path for block captures, we need to handle this case.

rdar://66250047

Differential revision: https://reviews.llvm.org/D86854
2020-08-31 10:12:17 -04:00
George Rokos 537b16e9b8 [OpenMP 5.0] Codegen support to pass user-defined mapper functions to runtime
This patch implements the code generation to use OpenMP 5.0 declare mapper (a.k.a. user-defined mapper) constructs.
Patch written by Lingda Li.

Differential Revision: https://reviews.llvm.org/D67833
2020-07-15 18:11:43 -07:00
Vedant Kumar 8c4a65b9b2 [ubsan] Check implicit casts in ObjC for-in statements
Check that the implicit cast from `id` used to construct the element
variable in an ObjC for-in statement is valid.

This check is included as part of a new `objc-cast` sanitizer, outside
of the main 'undefined' group, as (IIUC) the behavior it's checking for
is not technically UB.

The check can be extended to cover other kinds of invalid casts in ObjC.

Partially addresses: rdar://12903059, rdar://9542496

Differential Revision: https://reviews.llvm.org/D71491
2020-07-13 15:11:18 -07:00
Ten Tzen 66f1dcd872 [Windows SEH] Fix the frame-ptr of a nested-filter within a _finally
This change fixed a SEH bug (exposed by test58 & test61 in MSVC test xcpt4u.c);
when an Except-filter is located inside a finally, the frame-pointer generated today
via intrinsic @llvm.eh.recoverfp is the frame-pointer of the immediate
parent _finally, not the frame-ptr of outermost host function.

The fix is to retrieve the Establisher's frame-pointer that was previously saved in
parent's frame.
The prolog of a filter inside a _finally should be like code below:

%0 = call i8* @llvm.eh.recoverfp(i8* bitcast (@"?fin$0@0@main@@"), i8*%frame_pointer)
%1 = call i8* @llvm.localrecover(i8* bitcast (@"?fin$0@0@main@@"), i8*%0, i32 0)
%2 = bitcast i8* %1 to i8**
%3 = load i8*, i8** %2, align 8

Differential Revision: https://reviews.llvm.org/D77982
2020-07-12 01:37:56 -07:00
Chuanqi Xu 8849831d55 [Coroutines] Warning if return type of coroutine_handle::address is not void*
User can own a version of coroutine_handle::address() whose return type is not
void* by using template specialization for coroutine_handle<> for some
promise_type.

In this case, the codes may violate the capability with existing async C APIs
that accepted a void* data parameter which was then passed back to the
user-provided callback.

Patch by ChuanqiXu

Differential Revision: https://reviews.llvm.org/D82442
2020-07-06 13:46:01 +08:00
Fady Ghanim 80e15b4574 [Clang][OpenMP][OMPBuilder] Moving OMP allocation and cache creation code to OMPBuilderCBHelpers
Summary:
Modified the OMPBuilderCBHelpers in the following ways:
- Moved location of class definition and deleted all constructors
- Moved OpenMP-specific address allocation of local variables
- Moved threadprivate variable creation for the current thread

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79676
2020-06-28 19:04:20 -04:00
Xiangling Liao 22337bfe7d [AIX][Frontend] Static init implementation for AIX considering no priority
1. Provides no piroirity supoort && disables three priority related
   attributes: init_priority, ctor attr, dtor attr;
2. '-qunique' in XL compiler equivalent behavior of emitting sinit
    and sterm functions name using getUniqueModuleId() util function
    in LLVM (currently no support for InternalLinkage and WeakODRLinkage
    symbols);
3. Add testcases to emit IR sample with __sinit80000000, __dtor, and
    __sterm80000000;
4. Temporarily side-steps the need to implement the functionality of
   llvm.global_ctors and llvm.global_dtors arrays. The uses of that
   functionality in this patch (with respect to the name of the functions
   involved) are not representative of how the functionality will be used
   once implemented.

Differential Revision: https://reviews.llvm.org/D74166
2020-06-19 08:27:07 -04:00
Sander de Smalen ad828e3f4d [SveEmitter] Add builtins for struct loads/stores (ld2/ld3/etc)
The struct store intrinsics in LLVM IR take the individual parts
as arguments, so this patch uses the intrinsics used for `svget`
to break the tuples into individual parts.

Reviewers: c-rhodes, efriedma, ctetreau, david-arm

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81466
2020-06-19 10:35:42 +01:00
Lucas Prates ada4c9dc4a [ARM][Clang] Removing lowering of half-precision FP arguments and returns from Clang's CodeGen
Summary:
On the process of moving the argument lowering handling for
half-precision floating point arguments and returns to the backend, this
patch removes the code that was responsible for handling the coercion of
those arguments in Clang's Codegen.

Reviewers: rjmccall, chill, ostannard, dnsampaio

Reviewed By: ostannard

Subscribers: stuij, kristof.beyls, dmgreen, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81451
2020-06-18 13:17:07 +01:00
Sander de Smalen 1d7b4a7e5e [SveEmitter] Add builtins for tuple creation (svcreate2/svcreate3/etc)
The svcreate builtins allow constructing a tuple from individual vectors, e.g.

  svint32x2_t svcreate2(svint32_t v2, svint32_t v2)`

Reviewers: c-rhodes, david-arm, efriedma

Reviewed By: c-rhodes, efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81463
2020-06-18 10:07:09 +01:00
Tyker 51e4aa87e0 attempt to fix failing buildbots after 3bab88b7ba
Prevent IR-gen from emitting consteval declarations

Summary: with this patch instead of emitting calls to consteval function. the IR-gen will emit a store of the already computed result.
2020-06-15 12:58:37 +02:00
Kirill Bobyrev 550c4562d1 Revert "Prevent IR-gen from emitting consteval declarations"
This reverts commit 3bab88b7ba.

This patch causes test failures:
http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/17260
2020-06-15 12:14:15 +02:00
Tyker 3bab88b7ba Prevent IR-gen from emitting consteval declarations
Summary: with this patch instead of emitting calls to consteval function. the IR-gen will emit a store of the already computed result.

Reviewers: rsmith

Reviewed By: rsmith

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76420
2020-06-15 10:47:14 +02:00
Akira Hatanaka c9a52de002 [CodeGen] Simplify the way lifetime of block captures is extended
Rather than pushing inactive cleanups for the block captures at the
entry of a full expression and activating them during the creation of
the block literal, just call pushLifetimeExtendedDestroy to ensure the
cleanups are popped at the end of the scope enclosing the block
expression.

rdar://problem/63996471

Differential Revision: https://reviews.llvm.org/D81624
2020-06-11 16:06:22 -07:00
John McCall 7fac1acc61 Set the LLVM FP optimization flags conservatively.
Functions can have local pragmas that override the global settings.
We set the flags eagerly based on global settings, but if we emit
an expression under the influence of a pragma, we clear the
appropriate flags from the function.

In order to avoid doing a ton of redundant work whenever we emit
an FP expression, configure the IRBuilder to default to global
settings, and only reconfigure it when we see an FP expression
that's not using the global settings.

Patch by Michele Scandale!

https://reviews.llvm.org/D80462
2020-06-11 18:16:41 -04:00
Alexey Bataev 90b54fa045 [OPENMP50]Codegen for use_device_addr clauses.
Summary:
Added codegen for use_device_addr clause. The components of the list
items are mapped as a kind of RETURN components and then the returned
base address is used instead of the real address of the base declaration
used in the use_device_addr expressions.

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, cfe-commits, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80730
2020-06-11 09:54:51 -04:00
Saiyedul Islam 675cefbf60 [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)

First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.

Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert

Reviewed By: arsenm, sameerds

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80804
2020-06-09 17:02:58 +00:00
Alexey Bataev cb9191c042 [OPENMP]Improve code readability, NFC.
Reuse existing function instead of code duplication and use better type.
2020-06-09 08:50:36 -04:00
Alexey Bataev 4e3d4622b1 Fix undefined behaviour when trying to deref nullptr. 2020-06-04 17:52:06 -04:00
Alexey Bataev bd1c03d7b7 [OPENMP50]Codegen for inscan reductions in worksharing directives.
Summary:
Implemented codegen for reduction clauses with inscan modifiers in
worksharing constructs.

Emits the code for the directive with inscan reductions.
The code is the following:
```
size num_iters = <num_iters>;
<type> buffer[num_iters];
for (i: 0..<num_iters>) {
  <input phase>;
  buffer[i] = red;
}
for (int k = 0; k != ceil(log2(num_iters)); ++k)
for (size cnt = last_iter; cnt >= pow(2, k); --k)
  buffer[i] op= buffer[i-pow(2,k)];
for (0..<num_iters>) {
  red = InclusiveScan ? buffer[i] : buffer[i-1];
  <scan phase>;
}
```

Reviewers: jdoerfert

Subscribers: yaxunl, guansong, arphaman, cfe-commits, caomhin

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79948
2020-06-04 16:29:33 -04:00
John McCall 8a8d703be0 Fix how cc1 command line options are mapped into FP options.
Canonicalize on storing FP options in LangOptions instead of
redundantly in CodeGenOptions.  Incorporate -ffast-math directly
into the values of those LangOptions rather than considering it
separately when building FPOptions.  Build IR attributes from
those options rather than a mix of sources.

We should really simplify the driver/cc1 interaction here and have
the driver pass down options that cc1 directly honors.  That can
happen in a follow-up, though.

Patch by Michele Scandale!
https://reviews.llvm.org/D80315
2020-06-01 22:00:30 -04:00
Florian Hahn 8f3f88d2f5 [Matrix] Implement matrix index expressions ([][]).
This patch implements matrix index expressions
(matrix[RowIdx][ColumnIdx]).

It does so by introducing a new MatrixSubscriptExpr(Base, RowIdx, ColumnIdx).
MatrixSubscriptExprs are built in 2 steps in ActOnMatrixSubscriptExpr. First,
if the base of a subscript is of matrix type, we create a incomplete
MatrixSubscriptExpr(base, idx, nullptr). Second, if the base is an incomplete
MatrixSubscriptExpr, we create a complete
MatrixSubscriptExpr(base->getBase(), base->getRowIdx(), idx)

Similar to vector elements, it is not possible to take the address of
a MatrixSubscriptExpr.
For CodeGen, a new MatrixElt type is added to LValue, which is very
similar to VectorElt. The only difference is that we may need to cast
the type of the base from an array to a vector type when accessing it.

Reviewers: rjmccall, anemet, Bigcheese, rsmith, martong

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D76791
2020-06-01 20:08:49 +01:00
Zequan Wu e36076ee3a [clang] Add nomerge function attribute to clang
Differential Revision: https://reviews.llvm.org/D79121
2020-05-21 17:07:39 -07:00
Zequan Wu b0a0f01bc1 Revert "Add nomerge function attribute to clang"
This reverts commit 307e853954.
2020-05-21 16:13:18 -07:00
Zequan Wu 307e853954 Add nomerge function attribute to clang 2020-05-21 15:28:27 -07:00
Eli Friedman 62f3ef2b53 [CGCall] Annotate references with "align" attribute.
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.

See also D80072.

Differential Revision: https://reviews.llvm.org/D80166
2020-05-19 20:21:30 -07:00
Sander de Smalen d6936be2ef [SveEmitter] Add builtins for svdup and svindex
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79357
2020-05-12 11:02:32 +01:00
Sander de Smalen 4cad97595f [SveEmitter] Add builtins for svmovlb and svmovlt
These builtins are expanded in CGBuiltin to use intrinsics
for (signed/unsigned) shift left long top/bottom.

Reviewers: efriedma, SjoerdMeijer

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79579
2020-05-11 09:41:58 +01:00
Sander de Smalen 3cb8b4c193 [SveEmitter] Add builtins for SVE2 Polynomial arithmetic
This patch adds builtins for:
- sveorbt
- sveortb
- svpmul
- svpmullb, svpmullb_pair
- svpmullt, svpmullt_pair

The svpmullb and svpmullt builtins are expressed using the svpmullb_pair
and svpmullt_pair LLVM IR intrinsics, respectively.

Reviewers: SjoerdMeijer, efriedma, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79480
2020-05-07 11:53:04 +01:00
Michael Liao 276c8dde0b [clang][codegen] Refactor argument loading in function prolog. NFC.
Summary:
- Skip copying function arguments and unnecessary casting by using them
  directly.

Reviewers: rjmccall, kerbowa, yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79394
2020-05-05 15:31:51 -04:00