Commit Graph

1826 Commits

Author SHA1 Message Date
Nikita Popov c63a3175c2 [AttrBuilder] Remove ctor accepting AttributeList and Index
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should exact the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
2022-01-15 22:39:31 +01:00
Erich Keane 2bcba21c8b [CPU-Dispatch] Make sure Dispatch names get updated if previously mangled
Cases where there is a mangling of a cpu-dispatch/cpu-specific function
before the function becomes 'multiversion' (such as a member function)
causes the wrong name to be emitted for one of the variants/resolver,
since the name is cached.  Make sure we invalidate the cache in
cpu-dispatch/cpu-specific modes, like we previously did for just target
multiversioning.
2022-01-14 10:45:55 -08:00
Erich Keane b699e8b11a Add another assert to cpu-dispatch emission to help track down a tough
to repro error.

As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
2022-01-13 06:54:08 -08:00
Erich Keane 6e77ad11ff Add an assert in cpudispatch emit to try to track down an error.
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
2022-01-12 10:31:28 -08:00
Serge Guelton d2cc6c2d0c Use a sorted array instead of a map to store AttrBuilder string attributes
Using and std::map<SmallString, SmallString> for target dependent attributes is
inefficient: it makes its constructor slightly heavier, and involves extra
allocation for each new string attribute. Storing the attribute key/value as
strings implies extra allocation/copy step.

Use a sorted vector instead. Given the low number of attributes generally
involved, this is cheaper, as showcased by

https://llvm-compile-time-tracker.com/compare.php?from=5de322295f4ade692dc4f1823ae4450ad3c48af2&to=05bc480bf641a9e3b466619af43a2d123ee3f71d&stat=instructions

Differential Revision: https://reviews.llvm.org/D116599
2022-01-10 14:49:53 +01:00
Kazu Hirata 40446663c7 [clang] Use true/false instead of 1/0 (NFC)
Identified with modernize-use-bool-literals.
2022-01-09 00:19:47 -08:00
serge-sans-paille 9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Sami Tolvanen ec2e26eaf6 [Clang] Add __builtin_function_start
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.

This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.

Link: https://github.com/ClangBuiltLinux/linux/issues/1353

Depends on D108478

Reviewed By: pcc, rjmccall

Differential Revision: https://reviews.llvm.org/D108479
2021-12-20 12:55:33 -08:00
Nikita Popov c3b624a191 [CodeGen] Avoid deprecated ConstantAddress constructor
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.

For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
2021-12-15 10:42:41 +01:00
Peter Collingbourne 0a14674f27 CodeGen: Strip exception specifications from function types in CFI type names.
With C++17 the exception specification has been made part of the
function type, and therefore part of mangled type names.

However, it's valid to convert function pointers with an exception
specification to function pointers with the same argument and return
types but without an exception specification, which means that e.g. a
function of type "void () noexcept" can be called through a pointer
of type "void ()". We must therefore consider the two types to be
compatible for CFI purposes.

We can do this by stripping the exception specification before mangling
the type name, which is what this patch does.

Differential Revision: https://reviews.llvm.org/D115015
2021-12-03 14:50:52 -05:00
Ties Stuij e3b2f0226b [clang][ARM] PACBTI-M frontend support
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.

These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):

- command-line options are "translated" to module-level LLVM IR
  attributes (metadata).

- functions have PAC/BTI specific attributes iff the
  __attribute__((target("branch-protection=...))) was used in the function
  declaration.

- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:

branch-protection ::= "-mbranch-protection=" <protection>
protection ::=  "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
                | <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]

b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.

- Handle _attribute_((target(("branch-protection=..."))) for AArch32 with the
same grammer as the commandline options.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Victor Campos
- Ties Stuij

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D112421
2021-12-01 10:37:16 +00:00
Erich Keane fc53eb69c2 Reapply 'Implement target_clones multiversioning'
See discussion in D51650, this change was a little aggressive in an
error while doing a 'while we were here', so this removes that error
condition, as it is apparently useful.

This reverts commit bb4934601d.
2021-11-29 06:30:01 -08:00
Phoebe Wang de34a940ae [X86] Add -mskip-rax-setup support to align with GCC
AMD64 ABI mandates caller to specify the number of used SSE registers
when passing variable arguments.
GCC also provides option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce the code size, see pr23258.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112413
2021-11-18 11:20:32 +08:00
Kazu Hirata d0ac215dd5 [clang] Use isa instead of dyn_cast (NFC) 2021-11-14 09:32:40 -08:00
Adrian Kuegel bb4934601d Revert "Implement target_clones multiversioning"
This reverts commit 9deab60ae7.
There is a possibly unintended semantic change.
2021-11-12 11:05:58 +01:00
Erich Keane 9deab60ae7 Implement target_clones multiversioning
As discussed here: https://lwn.net/Articles/691932/

GCC6.0 adds target_clones multiversioning. This functionality is
an odd cross between the cpu_dispatch and 'target' MV, but is compatible
with neither.

This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.

The mangling however, is... ODD to say the least. The mangling format
is:
<normal_mangling>.<option string>.<option ordinal>.

Differential Revision:https://reviews.llvm.org/D51650
2021-11-11 11:11:16 -08:00
Yaxun (Sam) Liu 4b3881e9f3 Emit hidden hostcall argument for sanitized kernels
this patch - https://reviews.llvm.org/D110337 changes the way how hostcall
hidden argument is emitted for printf, but the sanitized kernels also use
hostcall buffer to report a error for invalid memory access, which is not
handled by the above patch and it leads to vdi runtime error:

Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D112820
2021-11-10 17:05:57 -05:00
Yaxun (Sam) Liu 80072fde61 [CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.

If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.

This patch allows putting instantiated template kernels into comdat sections.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D112492
2021-11-10 16:42:23 -05:00
Itay Bookstein 9efce0baee [clang] Run LLVM Verifier in modes without CodeGen too
Previously, the Backend_Emit{Nothing,BC,LL} modes did
not run the LLVM verifier since it is usually added via
the TargetMachine::addPassesToEmitFile method according
to the DisableVerify parameter. This is called from
EmitAssemblyHelper::AddEmitPasses, which is only relevant
for BackendAction-s that require CodeGen.

Note:
* In these particular situations the verifier is added
  to the optimization pipeline rather than the codegen
  pipeline so that it runs prior to the BC/LL emission
  pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously,
  this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that
  the clang/test/CodeGen/ifunc.c test would pass:
  the emitIFuncDefinition incorrectly passed the
  GlobalDecl of the IFunc itself to the call to
  GetOrCreateLLVMFunction for creating the resolver.

Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D113352
2021-11-09 23:57:13 +02:00
Itay Bookstein 3b1fd19357 [CodeGen] Diagnose and reject non-function ifunc resolvers
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>

Reviewed By: MaskRay, erichkeane

Differential Revision: https://reviews.llvm.org/D112868
2021-11-09 23:51:36 +02:00
Atmn Patel 737c4a2673 [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 15:11:05 -05:00
Atmn Patel ef717f3852 Revert "[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files"
This reverts commit 81a7cad2ff.
2021-11-09 02:10:42 -05:00
Atmn Patel 81a7cad2ff [clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113421
2021-11-09 01:52:52 -05:00
Benjamin Kramer 8adb6d6de2 [clang] Use llvm::reverse. NFCI. 2021-11-07 14:24:33 +01:00
Itay Bookstein 848812a55e [Verifier] Add verification logic for GlobalIFuncs
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112349
2021-10-31 20:00:57 -07:00
Aaron Ballman aad244dfc5 Revert "AddGlobalAnnotations for function with or without function body."
This reverts commit 121b2252de.

The following code causes a crash in some circumstances:

  struct k {
    ~k() __attribute__((annotate(""))) {}
  };
  void m() { k(); }
2021-10-21 07:08:18 -04:00
Itay Bookstein 08ed216000 [IR] Refactor GlobalIFunc to inherit from GlobalObject, Remove GlobalIndirectSymbol
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html

The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.

This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.

I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108872
2021-10-20 10:29:47 -07:00
Kazu Hirata d245f2e859 [clang] Use llvm::erase_if (NFC) 2021-10-17 13:50:29 -07:00
Chris Bieneman 121b2252de AddGlobalAnnotations for function with or without function body.
When AnnotateAttr is on a function, AddGlobalAnnotations is only called
in CodeGenModule::EmitGlobalFunctionDefinition which means AnnotateAttr
on function declaration without function body will be ignored.
The patch will move AddGlobalAnnotations  to
CodeGenModule::SetFunctionAttributes, so with or without function body,
the AnnotateAttr will get code gen for a function.

It'll help case when AnnotateAttr is on external function, and the
AnnotateAttr will be consumed in IR level.

For example, a pass to collect num of uses for functions with
__attribute((annotate("count_use"))) after optimizations,
As long as there's __attribute((annotate("count_use"))), function with
or without function body should be counted.

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D111109

Patch by:  python3kgae (Xiang Li)
2021-10-11 14:50:34 -05:00
Arthur Eubanks 05392466f0 Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 13:29:23 -07:00
Arthur Eubanks 569346f274 Revert "Reland [IR] Increase max alignment to 4GB"
This reverts commit 8d64314ffe.
2021-10-06 11:38:11 -07:00
Arthur Eubanks 8d64314ffe Reland [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 11:03:51 -07:00
Arthur Eubanks 72cf8b6044 Revert "[IR] Increase max alignment to 4GB"
This reverts commit df84c1fe78.

Breaks some bots
2021-10-06 10:21:35 -07:00
Arthur Eubanks df84c1fe78 [IR] Increase max alignment to 4GB
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.

This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.

The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.

Updating clang's max allowed alignment will come in a future patch.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D110451
2021-10-06 09:54:14 -07:00
Arthur Eubanks aa53785f23 Reland [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Previous revisions didn't properly declare the new dependencies.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 15:31:30 -07:00
Arthur Eubanks 7833d20f1f Revert "[clang] Rework dontcall attributes"
This reverts commit 2943071e2e.

Breaks bots
2021-09-28 14:49:27 -07:00
Arthur Eubanks 2943071e2e [clang] Rework dontcall attributes
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).

One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D110364
2021-09-28 14:21:10 -07:00
serge-sans-paille c3717b6858 Simplify handling of builtin with inline redefinition
(This is a recommit of 3d6f49a569 that should no longer break validation since
bd379915de).

It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.

Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.

Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.

After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite

Differential Revision: https://reviews.llvm.org/D109967
2021-09-28 21:00:47 +02:00
Kevin Athey 0d76d4833d Revert "Simplify handling of builtin with inline redefinition"
This reverts commit 3d6f49a569.

Broke bot: https://lab.llvm.org/buildbot/#/builders/5/builds/12360
2021-09-28 11:30:37 -07:00
serge-sans-paille 3d6f49a569 Simplify handling of builtin with inline redefinition
It is a common practice in glibc header to provide an inline redefinition of an
existing function. It is especially the case for fortified function.

Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and noinline attribute.

Simplify the logic by suffixing these functions by `.inline` during codegen, so
that they are not recognized as builtin by llvm.

After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite

Differential Revision: https://reviews.llvm.org/D109967
2021-09-28 13:24:25 +02:00
Erich Keane e3b10525b4 Make multiversioning work with internal linkage
We previously made all multiversioning resolvers/ifuncs have weak
ODR linkage in IR, since we NEED to emit the whole resolver every time
we see a call, but it is not necessarily the place where all the
definitions live.

HOWEVER, when doing so, we neglected the case where the versions have
internal linkage.  This patch ensures we do this, so you don't get weird
behavior with static functions.
2021-09-17 05:56:38 -07:00
Justas Janickas f9bc1b3bee [OpenCL] Defines helper function for kernel language compatible OpenCL version
This change defines a helper function getOpenCLCompatibleVersion()
inside LangOptions class. The function contains mapping between
C++ for OpenCL versions and their corresponding compatible OpenCL
versions. This mapping function should be updated each time a new
C++ for OpenCL language version is introduced. The helper function
is expected to simplify conditions on OpenCL C and C++ for OpenCL
versions inside compiler code.

Code refactoring performed.

Differential Revision: https://reviews.llvm.org/D108693
2021-08-31 10:08:38 +01:00
Andrei Elovikov 1724a16437 [NFC][clang] Move IR-independent parts of target MV support to X86TargetParser.cpp
...that is located under llvm/lib/Support/.

Reviewed By: erichkeane

Differential Revision: https://reviews.llvm.org/D108423
2021-08-30 09:48:48 -07:00
Alex Richardson 7cab90a7b1 Fix __attribute__((annotate("")) with non-zero globals AS
The existing code attempting to bitcast from a value in the default globals AS
to i8 addrspace(0)* was triggering an assertion failure in our downstream fork.
I found this while compiling poppler for CHERI-RISC-V (we use AS200 for all
globals). The test case uses AMDGPU since that is one of the in-tree targets
with a non-zero default globals address space.
The new test previously triggered a "Invalid constantexpr bitcast!" assertion
and now correctly generates code with addrspace(1) pointers.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D105972
2021-08-26 10:09:40 +01:00
Nick Desaulniers 846e562dcc [Clang] add support for error+warning fn attrs
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.

They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.

While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.

These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.

To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr).  Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.

The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.

The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.

Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173

Reviewed By: aaron.ballman

Differential Revision: https://reviews.llvm.org/D106030
2021-08-25 10:34:18 -07:00
Jonas Hahnfeld ea08c4cd1c [CUDA] Fix static device variables with -fgpu-rdc
NVPTX does not allow dots in the identifier, so ptxas errors out with
   fatal   : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.

Differential Revision: https://reviews.llvm.org/D108456
2021-08-25 09:31:22 +02:00
Andy Wingo 8da70fed70 [clang][NFC] Tighten up code for GetGlobalVarAddressSpace
The LangAS local is only used in the OpenCL case; move its decl
inwards.

Differential Revision: https://reviews.llvm.org/D108449
2021-08-23 14:55:58 +02:00
Andy Wingo d3d4d98576 [clang][NFC] GetOrCreateLLVMGlobal takes LangAS
Pass a LangAS instead of a target address space to
GetOrCreateLLVMGlobal, to remove a place where the frontend assumes that
target address space 0 is special.

Differential Revision: https://reviews.llvm.org/D108445
2021-08-23 14:55:58 +02:00
Arthur Eubanks de0ae9e89e [NFC] Cleanup more AttributeList::addAttribute() 2021-08-17 21:05:41 -07:00
Arthur Eubanks ad727ab7d9 [NFC] Migrate some callers away from Function/AttributeLists methods that take an index
These methods can be confusing.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 46cf82532c [NFC] Replace Function handling of attributes with less confusing calls
To avoid magic constants and confusing indexes.
2021-08-17 21:05:40 -07:00
Arthur Eubanks 8e9ffa1dc6 [NFC] Cleanup callers of AttributeList::hasAttributes()
AttributeList::hasAttributes() is confusing, use clearer methods like
hasFnAttrs().
2021-08-13 12:16:52 -07:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Michael Liao 6ec36d18ec [cuda] Mark builtin texture/surface reference variable as 'externally_initialized'.
- They need to be preserved even if there's no reference within the
  device code as the host code may need to initialize them based on the
  application logic.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107718
2021-08-09 13:27:40 -04:00
Pavel Asyutchenko 7df405e079 Apply -fmacro-prefix-map to __builtin_FILE()
This matches the behavior of GCC.
Patch does not change remapping logic itself, so adding one simple smoke test should be enough.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107393
2021-08-04 16:42:14 -07:00
Yaxun (Sam) Liu 44dbbe6106 [HIP] Preserve ASAN bitcode library functions
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.

To solve this issue, clang emits a reference to a bicode library
function which calls all ASAN functions which need to be
preserved. This basically force all ASAN functions to be
linked in.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D106315
2021-07-23 10:35:52 -04:00
Joseph Huber 9ce02ea8c9 [OpenMP] Add Module metadata for OpenMP compilation
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102361
2021-06-25 16:34:19 -04:00
Nick Desaulniers 8ace121305 [IR] convert warn-stack-size from module flag to fn attr
Otherwise, this causes issues when building with LTO for object files
that use different values.

Link: https://github.com/ClangBuiltLinux/linux/issues/1395

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D104342
2021-06-21 15:09:25 -07:00
Simon Pilgrim 61cdaf66fe [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Nick Desaulniers fc018ebb60 [IR] make -warn-frame-size into a module attr
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
2021-06-10 16:15:27 -07:00
Nathan Sidwell b2d0c16e91 [clang] p1099 using enum part 2
This implements the 'using enum maybe-qualified-enum-tag ;' part of
1099. It introduces a new 'UsingEnumDecl', subclassed from
'BaseUsingDecl'. Much of the diff is the boilerplate needed to get the
new class set up.

There is one case where we accept ill-formed, but I believe this is
merely an extended case of an existing bug, so consider it
orthogonal. AFAICT in class-scope the c++20 rule is that no 2 using
decls can bring in the same target decl ([namespace.udecl]/8). But we
already accept:

struct A { enum { a }; };
struct B : A { using A::a; };
struct C : B { using A::a;
using B::a; }; // same enumerator

this patch permits mixtures of 'using enum Bob;' and 'using Bob::member;' in the same way.

Differential Revision: https://reviews.llvm.org/D102241
2021-06-08 11:11:46 -07:00
Nick Desaulniers 3787ee4571 reland [IR] make -stack-alignment= into a module attr
Relands commit 433c8d950c with fixes for
MIPS.

Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 10:59:46 -07:00
Nick Desaulniers a596b54d47 Revert "[IR] make -stack-alignment= into a module attr"
This reverts commit 433c8d950c.

Breaks the MIPS build.
2021-06-08 08:55:50 -07:00
Nick Desaulniers 433c8d950c [IR] make -stack-alignment= into a module attr
Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 08:31:04 -07:00
Andrew Savonichev b31f41e78b [Clang] Support a user-defined __dso_handle
This fixes PR49198: Wrong usage of __dso_handle in user code leads to
a compiler crash.

When Init is an address of the global itself, we need to track it
across RAUW. Otherwise the initializer can be destroyed if the global
is replaced.

Differential Revision: https://reviews.llvm.org/D101156
2021-06-07 12:54:08 +03:00
Martin Storsjö 0e4cf807ae [clang] [MinGW] Don't mark emutls variables as DSO local
These actually can be automatically imported from another DLL. (This
works properly as long as the actual implementation of emutls is
linked dynamically from e.g. libgcc; if the implementation comes from
compiler-rt or a statically linked libgcc, it doesn't work as intended.)

This fixes PR50146 and https://github.com/msys2/MINGW-packages/issues/8706
(fixing calling std::call_once in a dynamically linked libstdc++);
since f731839584 the dso_local attribute
on the TLS variable affected the actual generated code for accessing
the emutls variable.

The dso_local attribute on the emutls variable made those accesses to
use 32 bit relative addressing in code, which requires runtime pseudo
relocations in the text section, and breaks entirely if the actual
other variable ends up loaded too far away in the virtual address
space.

Differential Revision: https://reviews.llvm.org/D102970
2021-05-27 23:51:22 +03:00
Nick Desaulniers 033138ea45 [IR] make stack-protector-guard-* flags into module attrs
D88631 added initial support for:

- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=

flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.

Link: https://github.com/ClangBuiltLinux/linux/issues/1378

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D102742
2021-05-21 15:53:30 -07:00
Yaxun (Sam) Liu 4cb42564ec [CUDA][HIP] Fix device variables used by host
variables emitted on both host and device side with different addresses
when ODR-used by host function should not cause device side counter-part
to be force emitted.

This fixes the regression caused by https://reviews.llvm.org/D102237

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102801
2021-05-20 17:04:29 -04:00
Reid Kleckner 8f20ac9595 [PGO] Don't reference functions unless value profiling is enabled
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.

This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.

PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.

Recording the address of every function with code coverage drastically
increases code size. Consider this program:

  void foo();
  void bar();
  inline void inlineMe(int x) {
    if (x > 0)
      foo();
    else
      bar();
  }
  int getVal();
  int main() { inlineMe(getVal()); }

With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.

One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.

Differential Revision: https://reviews.llvm.org/D102818
2021-05-20 11:09:24 -07:00
Fangrui Song 37561ba89b -fno-semantic-interposition: Don't set dso_local on GlobalVariable
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.

GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).

This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project tiself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.

Example:
```
// a.c
int var;
int *addr() { return var; }

// old: cannot be interposed
movslq  .Lvar$local(%rip), %rax
// new: can be interposed
movq    var@GOTPCREL(%rip), %rax
movslq  (%rax), %rax
```

The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102583
2021-05-19 16:08:28 -07:00
Ten Tzen 797ad70152 [Windows SEH]: HARDWARE EXCEPTION HANDLING (MSVC -EHa) - Part 1
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.

This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.

Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.

NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.

The rules for C code:
For C-code, one way (MSVC approach) to achieve SEH -EHa semantic is to follow
three rules:
* First, no exception can move in or out of _try region., i.e., no "potential
  faulty instruction can be moved across _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
  must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
  outside of _try region must be updated in memory (not just in register)
  before the subsequent exception occurs.

The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on C++
side. When a C++ function (in the same compilation unit with option -EHa ) is
called by a SEH C function, a hardware exception occurs in C++ code can also
be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as C++ Synchronous Exception during unwinding
process.

Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
added on memory/computation instruction (previous iload/istore idea) so that
exception path is modeled in Flow graph preciously. However, tracking every
single memory instruction and potential faulty instruction can create many
Invokes, complicate flow graph and possibly result in negative performance
impact for downstream optimization and code generation. Making all
optimizations be aware of the new semantic is also substantial.

This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.

One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:

A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block succeeds various states from its predecessors, the lowest
State triumphs others.  If some exits flow to unreachable, propagation on those
paths terminate, not affecting remaining blocks.
For CPP code, object lifetime region is usually a SEME as SEH _try.
However there is one rare exception: jumping into a lifetime that has Dtor but
has no Ctor is warned, but allowed:

Warning: jump bypasses variable with a non-trivial destructor

In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject a eha_scope_begin() invoke in the side entry block to
ensure a correct State.

Implementation:
Part-1: Clang implementation described below.

Two intrinsic are created to track CPP object scopes; eha_scope_begin() and eha_scope_end().
_scope_begin() is immediately added after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed an
EH-cleanup-pad is created regardless whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.

Two intrinsic, seh_try_begin() and seh_try_end(), are added for C-code to mark
_try boundary and to prevent from exceptions being moved across _try boundary.
All memory instructions inside a _try are considered as 'volatile' to assure
2nd and 3rd rules for C-code above. This is a little sub-optimized. But it's
acceptable as the amount of code directly under _try is very small.

Part-2 (will be in Part-2 patch): LLVM implementation described below.

For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
rely on the existing State tracking code (UnwindMap and InvokeStateMap).

For both C++ & C-code, the state of each block with potential trap instruction
is marked and reported in DAG Instruction Selection pass, the same place where
the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate LLVM Windows EH
implementation, in which the address in IPToState table is offset by +1.
(note the purpose of that is to ensure the return address of a call is in the
same scope as the call address.

The handler for catch(...) for -EHa must handle HW exception. So it is
'adjective' flag is reset (it cannot be IsStdDotDot (0x40) that only catches
C++ exceptions).
Suppress push/popTerminate() scope (from noexcept/noTHrow) so that HW
exceptions can be passed through.

Original llvm-dev [RFC] discussions can be found in these two threads below:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html

Differential Revision: https://reviews.llvm.org/D80344/new/
2021-05-17 22:42:17 -07:00
Arthur Eubanks 9f7d552cff [NFC] Pass GV value type instead of pointer type to GetOrCreateLLVMGlobal
For opaque pointers, to avoid PointerType::getElementType().

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102638
2021-05-17 18:41:17 -07:00
Roman Lebedev a624cec56d
[Clang][Codegen] Do not annotate thunk's this/return types with align/deref/nonnull attrs
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.

While it would be good to fix this properly,
and keep annotating them on thunks,
i've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch is stuck for a month now.

So for now, as a stopgap measure, subj.
2021-05-13 20:33:08 +03:00
Yaxun (Sam) Liu 98575708da [CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.

It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237
2021-05-12 11:13:29 -04:00
Johannes Doerfert df729e2b82 [OpenMP] Overhaul `declare target` handling
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.

This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.

I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.

Highlights:
  - new diagnosis for restrictions specificed in the standard,
  - delayed emission of globals not mentioned in an explicit
    list of a declare target,
  - omission of `nohost` globals on the host and `host` globals on the
    device,
  - no explicit parsing of declarations in-between `omp [begin] declare
    variant` and the corresponding end anymore, regular parsing instead,
  - precedence for explicit mentions in `declare target` lists over
    implicit mentions in the declaration-definition-seq, and
  - `omp allocate` declarations will now replace an earlier emitted
    global, if necessary.

---

Notes:

The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.

After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.

---

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D101030
2021-05-06 02:10:41 -05:00
Leonard Chan 84c4754372 [clang] Add -fc++-abi= flag for specifying which C++ ABI to use
This implements the flag proposed in RFC
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.

The goal is to add a way to override the default target C++ ABI through a
compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.

In this patch:

- Store -fc++-abi= in a LangOpt. This isn't stored in a CodeGenOpt because
  there are instances outside of codegen where Clang needs to know what the
  ABI is (particularly through ASTContext::createCXXABI), and we should be
  able to override the target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
  through this flag.
  - Create a .def file for these ABIs to make it easier to check flag values.
  - Add an error for diagnosing bad ABI flag values.

Differential Revision: https://reviews.llvm.org/D85802
2021-05-04 10:52:13 -07:00
Alex Lorenz 2669abaecf [clang][CodeGen] Use llvm::stable_sort for multi version resolver options
The use of llvm::sort causes periodic failures on the bot with EXPENSIVE_CHECKS enabled,
as the regular sort pre-shuffles the array in the expensive checks mode, leading to a
non-deterministic test result which causes the CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp
testcase to fail on the bot (http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/).
2021-05-03 20:07:00 -07:00
Alexey Bader 7818906ca1 [SYCL] Implement SYCL address space attributes handling
Default address space (applies when no explicit address space was
specified) maps to generic (4) address space.

Added SYCL named address spaces `sycl_global`, `sycl_local` and
`sycl_private` defined as sub-sets of the default address space.

Static variables without address space now reside in global address
space when compile for SPIR target, unless they have an explicit address
space qualifier in source code.

Differential Revision: https://reviews.llvm.org/D89909
2021-04-26 13:44:10 +03:00
Fangrui Song 2786e673c7 [IR][sanitizer] Add module flag "frame-pointer" and set it for cc1 -mframe-pointer={non-leaf,all}
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.

Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)

Also document the function attribute "frame-pointer" which is long overdue.

Differential Revision: https://reviews.llvm.org/D101016
2021-04-22 18:07:30 -07:00
Fangrui Song 775a9483e5 [IR][sanitizer] Set nounwind on module ctor/dtor, additionally set uwtable if -fasynchronous-unwind-tables
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.

Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)

Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.

This patch

* sets the nounwind attribute on sanitizer module ctor/dtor.
* let Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.

The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.

Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955
https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.

Differential Revision: https://reviews.llvm.org/D100251
2021-04-21 15:58:20 -07:00
Erich Keane 0ed613612c Ensure target-multiversioning emits deferred declarations
As reported in PR50025, sometimes we would end up not emitting functions
needed by inline multiversioned variants. This is because we typically
use the 'deferred decl' mechanism to emit these.  However, the variants
are emitted after that typically happens.  This fixes that by ensuring
we re-run deferred decls after this happens. Also, the multiversion
emission is done recursively to ensure that MV functions that require
other MV functions to be emitted get emitted.
2021-04-20 08:10:26 -07:00
Jan Svoboda 6a72ed239c [clang] NFC: Fix range-based for loop warnings related to decl lookup 2021-04-19 18:31:31 +02:00
Alexey Bataev f6f21dcd6c [OPENMP]Fix PR49636: Assertion `(!Entry.getAddress() || Entry.getAddress() == Addr) && "Resetting with the new address."' failed.
The original issue is caused by the fact that the variable is allocated
with incorrect type i1 instead of i8. This causes the bitcasting of the
declaration to i8 type and the bitcast expression does not match the
original variable.
To fix the problem, the UndefValue initializer and the original
variable should be emitted with type i8, not i1.

Differential Revision: https://reviews.llvm.org/D99297
2021-03-29 06:55:57 -07:00
Alex Lorenz d672d5219a Revert "[CodeGenModule] Set dso_local for Mach-O GlobalValue"
This reverts commit 809a1e0ffd.

Mach-O doesn't support dso_local and this change broke XNU because of the use of dso_local.

Differential Revision: https://reviews.llvm.org/D98458
2021-03-17 17:27:41 -07:00
Luke Drummond 440f6bdf34 [OpenCL][NFCI] Prefer CodeGenFunction::EmitRuntimeCall
`CodeGenFunction::EmitRuntimeCall` automatically sets the right calling
convention for the callee so we can avoid setting it ourselves.

As requested in https://reviews.llvm.org/D98411

Reviewed by: anastasia
Differential Revision: https://reviews.llvm.org/D98705
2021-03-16 16:22:19 +00:00
Luke Drummond fcfd3fda71 [OpenCL] Respect calling convention for builtin
`__translate_sampler_initializer` has a calling convention of
`spir_func`, but clang generated calls to it using the default CC.

Instruction Combining was lowering these mismatching calling conventions
to `store i1* undef` which itself was subsequently lowered to a trap
instruction by simplifyCFG resulting in runtime `SIGILL`

There are arguably two bugs here: but whether there's any wisdom in
converting an obviously invalid call into a runtime crash over aborting
with a sensible error message will require further discussion. So for
now it's enough to set the right calling convention on the runtime
helper.

Reviewed By: svenh, bader

Differential Revision: https://reviews.llvm.org/D98411
2021-03-15 17:26:51 +00:00
Sriraman Tallam cdb42a4cc4 Disable unique linkage suffixes ifor global vars until demanglers can be fixed.
D96109 added support for unique internal linkage names for both internal
linkage functions and global variables. There was a lot of discussion on how to
get the demangling right for functions but I completely missed the point that
demanglers do not support suffixes for global vars. For example:

$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123

The demangling for functions works as expected.

I am not sure of the impact of this. I don't understand how debuggers and other
tools depend on the correctness of global variable demangling so I am
pre-emptively disabling it until we can get the demangling support added.

Importantly, uniquefying global variables is not needed right now as we do not
do profile attribution to global vars based on sampling. It was added for
completeness and so this feature is not exactly missed.

Differential Revision: https://reviews.llvm.org/D98392
2021-03-11 20:59:30 -08:00
Sriraman Tallam 78d0e91865 Refactor -funique-internal-linakge-names implementation.
The option -funique-internal-linkage-names was added in D73307 and D78243 as a
LLVM early pass to insert a unique suffix to internal linkage functions and
vars. The unique suffix was the hash of the module path. However, we found
that this can be done more cleanly in clang early and the fixes that need to
be done later can be completely avoided. The fixes in particular are trying
to modify the DW_AT_linkage_name and finding the right place to insert the
pass.

This patch ressurects the original implementation proposed in D73307 which
was reviewed and then ditched in favor of the pass based approach.

Differential Revision: https://reviews.llvm.org/D96109
2021-03-05 13:32:17 -08:00
Akira Hatanaka 1900503595 [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of
explicitly emitting retainRV or claimRV calls in the IR

This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.

Original commit message:

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
  which indicates the call is implicitly followed by a marker
  instruction and an implicit retainRV/claimRV call that consumes the
  call result. In addition, it emits a call to
  @llvm.objc.clang.arc.noop.use, which consumes the call result, to
  prevent the middle-end passes from changing the return type of the
  called function. This is currently done only when the target is arm64
  and the optimization level is higher than -O0.

- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
  with the operand bundle in the IR and removes the inserted calls after
  processing the function.

- ARC contract pass emits retainRV/claimRV calls after the call with the
  operand bundle. It doesn't remove the operand bundle on the call since
  the backend needs it to emit the marker instruction. The retainRV and
  claimRV calls are emitted late in the pipeline to prevent optimization
  passes from transforming the IR in a way that makes it harder for the
  ARC middle-end passes to figure out the def-use relationship between
  the call and the retainRV/claimRV calls (which is the cause of
  PR31925).

- The function inliner removes an autoreleaseRV call in the callee if
  nothing in the callee prevents it from being paired up with the
  retainRV/claimRV call in the caller. It then inserts a release call if
  claimRV is attached to the call since autoreleaseRV+claimRV is
  equivalent to a release. If it cannot find an autoreleaseRV call, it
  tries to transfer the operand bundle to a function call in the callee.
  This is important since the ARC optimizer can remove the autoreleaseRV
  returning the callee result, which makes it impossible to pair it up
  with the retainRV/claimRV call in the caller. If that fails, it simply
  emits a retain call in the IR if retainRV is attached to the call and
  does nothing if claimRV is attached to it.

- SCCP refrains from replacing the return value of a call with a
  constant value if the call has the operand bundle. This ensures the
  call always has at least one user (the call to
  @llvm.objc.clang.arc.noop.use).

- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
  multiple operand bundles of the same kind were being added to a call.

Future work:

- Use the operand bundle on x86-64.

- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
  calls with the operand bundles.

rdar://71443534

Differential Revision: https://reviews.llvm.org/D92808
2021-03-04 11:22:30 -08:00
Wang, Pengfei e7e67c930a Add Windows ehcont section support (/guard:ehcont).
Add option /guard:ehcont

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D96709
2021-03-04 11:47:29 +08:00
Hans Wennborg 0a5dd06718 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb.
2021-03-03 15:51:40 +01:00
Richard Smith 9e2579dbf4 Fix infinite recursion during IR emission if a constant-initialized lifetime-extended temporary object's initializer refers back to the same object.
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of
a global temporary before updating the global temporary map. Emitting the
initializer could recurse back into `GetAddrOfGlobalTemporary` for the same
temporary, resulting in an infinite recursion.

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D97733
2021-03-01 22:19:21 -08:00
Yaxun (Sam) Liu 5cf2a37f12 [HIP] Emit kernel symbol
Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.

This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D86376
2021-03-01 16:31:40 -05:00
Fangrui Song 8afdacba9d Add GNU attribute 'retain'
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
`__attribute__((retain))` function/variable to prevent linker garbage
collection. (See AttrDocs.td for the linker support).

This patch adds `retain` functions/variables to the `llvm.used` list, which has
the desired linker GC semantics. Note: `retain` does not imply `used`,
so an unused function/variable can be dropped by Sema.

Before 'retain' was introduced, previous ELF solutions require inline asm or
linker tricks, e.g.  `asm volatile(".reloc 0, R_X86_64_NONE, target");`
(architecture dependent) or define a non-local symbol in the section and use
`ld -u`. There was no elegant source-level solution.

With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.

Differential Revision: https://reviews.llvm.org/D97447
2021-02-26 16:37:50 -08:00
Fangrui Song 28cb620321 Change some addUsedGlobal to addUsedOrCompilerUsedGlobal
An global value in the `llvm.used` list does not have GC root semantics on ELF targets.
This will be changed in a subsequent backend patch.

Change some `llvm.used` in the ELF code path to use `llvm.compiler.used` to
prevent undesired GC root semantics.

Change one extern "C" alias (due to `__attribute__((used))` in extern "C") to use `llvm.compiler.used` on all targets.

GNU ld has a rule "`__start_/__stop_` references from a live input section retain the associated C identifier name sections",
which LLD may drop entirely (currently refined to exclude SHF_LINK_ORDER/SHF_GROUP) in a future release (the rule makes it clumsy to GC metadata sections; D96914 added a way to try the potential future behavior).
For `llvm.used` global values defined in a C identifier name section, keep using `llvm.used` so that
the future LLD change will not affect them.

rnk kindly categorized the changes:
```
ObjC/blocks: this wants GC root semantics, since ObjC mainly runs on Mac.
MS C++ ABI stuff: wants GC root semantics, no change
OpenMP: unsure, but GC root semantics probably don't hurt
CodeGenModule: affected in this patch to *not* use GC root semantics so that __attribute__((used)) behavior remains the same on ELF, plus two other minor use cases that don't want GC semantics
Coverage: Probably want GC root semantics
CGExpr.cpp: refers to LTO, wants GC root
CGDeclCXX.cpp: one is MS ABI specific, so yes GC root, one is some other C++ init functionality, which should form GC roots (C++ initializers can have side effects and must run)
CGDecl.cpp: Changed in this patch for __attribute__((used))
```

Differential Revision: https://reviews.llvm.org/D97446
2021-02-26 10:42:07 -08:00
Yaxun (Sam) Liu 47acdec1dd [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc
For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.

Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.

Reviewed by: Artem Belevich, and Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D85223
2021-02-24 18:23:45 -05:00
Yaxun (Sam) Liu a3ce7f5cd2 [HIP] Fix managed variable linkage
Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.

This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D96195
2021-02-23 22:34:45 -05:00
Melanie Blower e64fcdf8d5 [clang][patch] Inclusive language, modify filename SanitizerBlacklist.h to NoSanitizeList.h
This patch responds to a comment from @vitalybuka in D96203: suggestion to
do the change incrementally, and start by modifying this file name. I modified
the file name and made the other changes that follow from that rename.

Reviewers: vitalybuka, echristo, MaskRay, jansvoboda11, aaron.ballman

Differential Revision: https://reviews.llvm.org/D96974
2021-02-22 15:11:37 -05:00
Igor Kudrin aa84289629 [DebugInfo] Keep the DWARF64 flag in the module metadata
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.

Differential Revision: https://reviews.llvm.org/D96597
2021-02-17 17:03:34 +07:00
Artur Gainullin ff50b121e3 [SYCL] Ignore file-scope asm during device-side SYCL compilation.
Reviewed By: bader, eandrews

Differential Revision: https://reviews.llvm.org/D96538
2021-02-12 17:00:45 -08:00