Previously, GetOrCreateMultiVersionResolver() required the caller to provide
a GlobalDecl along with an llvm::Type and a FunctionDecl. The latter two can be
cheaply obtained from the first, and the llvm::Type parameter is not always
used, so requiring the caller to provide them was unnecessary and created the
possibility that callers would pass an inconsistent set. This change simplifies
the interface to only require the GlobalDecl value.
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D122956
We expect `extern "C"` static functions to be usable in things like
inline assembly, as well as in ifuncs:
See the bug report here: https://github.com/llvm/llvm-project/issues/54549
However, we were diagnosing this as 'not defined', because the
ifunc's attempt to look up its resolver would generate a declared IR
function.
Additionally, as background: the way we allow these static extern "C"
functions to work in inline assembly is, in MOST situations, by emitting an
alias with the C mangling that points to the version we emit with
internal linkage/mangling.
The problem here was twofold: First, we generated the alias after the
ifunc was checked, so the function by that name didn't exist yet.
Second, the ifunc's generation caused a symbol to already exist under the
name of the alias (the declared function above), which suppressed the
alias generation.
This patch fixes all of this by deferring the checking of ifuncs/CFE aliases
until AFTER we have generated the extern-C alias. Then, it does a
'fixup' around the GlobalIFunc to make sure we correct the reference.
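For illustration, a hedged sketch of the kind of source this is about
(hypothetical names, not the exact reproducer from the bug report):
```
extern "C" {
// a static function with internal linkage, but referenced by its C name below
static int impl(void) { return 42; }
// the resolver picks the implementation at load time
static int (*resolve_foo(void))(void) { return impl; }
// the ifunc looks up its resolver by the C name of the static function above
int foo(void) __attribute__((ifunc("resolve_foo")));
}
```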
Differential Revision: https://reviews.llvm.org/D122608
This builtin returns the address of a global instance of the
`std::source_location::__impl` type, which must be defined (with an
appropriate shape) before calling the builtin.
It will be used to implement std::source_location in libc++ in a
future change. The builtin is compatible with GCC's implementation,
and libstdc++'s usage. An intentional divergence is that GCC declares
the builtin's return type to be `const void*` (for
ease-of-implementation reasons), while Clang uses the actual type,
`const std::source_location::__impl*`.
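As a hedged sketch, a definition the builtin can consume might look like the
following (the field names follow libstdc++'s layout and are an assumption
here, not something this commit message specifies):
```
namespace std {
struct source_location {
  struct __impl {
    const char *_M_file_name;
    const char *_M_function_name;
    unsigned _M_line;
    unsigned _M_column;
  };
};
} // namespace std

// the builtin returns the address of a unique global __impl for the call site
const std::source_location::__impl *here = __builtin_source_location();
```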
In order to support this new functionality, I've also added a new
'UnnamedGlobalConstantDecl'. This artificial Decl is modeled after
MSGuidDecl, and is used to represent a generic concept of an lvalue
constant with global scope, deduplicated by its value. It's possible
that MSGuidDecl itself, or some of the other similar sorts of things
in Clang might be able to be refactored onto this more-generic
concept, but there's enough special-case weirdness in MSGuidDecl that
I gave up attempting to share code there, at least for now.
Finally, for compatibility with libstdc++'s <source_location> header,
I've added a second exception to the "cannot cast from void* to T* in
constant evaluation" rule. This seems a bit distasteful, but feels
like the best available option.
Reviewers: aaron.ballman, erichkeane
Differential Revision: https://reviews.llvm.org/D120159
Currently we use the `-fembed-offload-object` option to embed a binary
file into the host as a named section. This is currently only used as a
codegen action, meaning we only handle this option correctly when the
input is a bitcode file. This patch adds the same handling to embed an
offloading object after we complete code generation. This allows us to
embed the object correctly if the input file is source or bitcode.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120270
Due to various implementation constraints, despite the programmer
choosing a 'processor', cpu_dispatch/cpu_specific needs to use the
'feature' list of a processor to identify it. This results in the
processor identified in the source code not being propagated to the
optimizer and thus not being available for tuning.
This patch changes to use the actual cpu as written for tune-cpu so that
opt can make decisions based on the cpu-as-spelled, which should better
match the behavior expected by the programmer.
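A hedged sketch of the source-level feature involved (processor names are
illustrative):
```
// each cpu_specific definition is a separately optimized variant; with this
// change the processor as spelled (e.g. ivybridge) is also used for tune-cpu
__attribute__((cpu_specific(ivybridge))) void work(void) { /* ivybridge variant */ }
__attribute__((cpu_specific(atom)))      void work(void) { /* atom variant */ }
// the dispatcher declaration causes the resolver to be emitted
__attribute__((cpu_dispatch(ivybridge, atom))) void work(void);
```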
Note that the 'valid' list of processors for x86 is in
llvm/include/llvm/Support/X86TargetParser.def. At the moment, this list
contains only Intel processors, but other vendors may wish to add their
own entries as 'alias'es (or with different feature lists!).
If this is not done, there are two potential performance issues with the
patch, but I believe them to be worth it in light of the improvements to
behavior and performance.
1- In the event that the user spelled "ProcessorB", but we only have the
features available to test for "ProcessorA" (where A is B minus features),
AND there is an optimization opportunity for "B" that negatively affects
"A", the optimizer will likely choose to do so.
2- In the event that the user spelled VendorI's processor, and the feature
list allows it to run on VendorA's processor of similar features, AND there
is an optimization opportunity for VendorI's that negatively affects "A"'s,
the optimizer will likely choose to do so. This can be fixed by adding an
alias to X86TargetParser.def.
Differential Revision: https://reviews.llvm.org/D121410
The purpose of this change is to fix the following codegen bug:
```
// main.c
__attribute__((cpu_specific(generic)))
int *foo(void) { static int z; return &z;}
int main() { return *foo() = 5; }
// other.c
__attribute__((cpu_dispatch(generic))) int *foo(void);
// run:
clang main.c other.c -o main; ./main
```
This will segfault prior to the change, and return the correct
exit code 5 after the change.
The underlying cause is that when a translation unit contains
a cpu_specific function without the corresponding cpu_dispatch
the generated code binds the reference to foo() against a
GlobalIFunc whose resolver is undefined. This is invalid: the
resolver must be defined in the same translation unit as the
ifunc, but historically the LLVM bitcode verifier did not check
that. The generated code then binds against the resolver rather
than the ifunc, so it ends up calling the resolver rather than
the resolvee. In the example above it treats its return value as
an int *, therefore trying to write to program text.
The root issue at the representation level is that GlobalIFunc,
like GlobalAlias, does not support a "declaration" state. The
object which provides the correct semantics in these cases
is a Function declaration, but unlike Functions, changing a
declaration to a definition in the GlobalIFunc case constitutes
a change of the object type, as opposed to simply emitting code
into a Function.
I think this limitation is unlikely to change, so I implemented
the fix by returning a function declaration rather than an ifunc
when encountering cpu_specific, and upgrading it to an ifunc
when emitting cpu_dispatch.
This uses `takeName` + `replaceAllUsesWith` in similar vein to
other places where the correct IR object type cannot be known
locally/up-front, like in `CodeGenModule::EmitAliasDefinition`.
Previous discussion in: https://reviews.llvm.org/D112349
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D120266
To make uses of the deprecated constructor easier to spot, and to
ensure that no new uses are introduced, rename it to
Address::deprecated().
While doing the rename, I've filled in element types in cases
where it was relatively obvious, but we're still left with 135
calls to the deprecated constructor.
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e. we lose the information about whether we want just
some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is AArch64, where,
when untagging the stack (MTE), the compiler can in some cases modify
`SP` in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
The module flag to indicate use of hostcall is insufficient to catch
all cases where hostcall might be in use by a kernel. This is now
replaced by a function attribute that gets propagated to top-level
kernel functions via their respective call-graph.
If the attribute "amdgpu-no-hostcall-ptr" is absent on a kernel, the
default behaviour is to emit kernel metadata indicating that the
kernel uses the hostcall buffer pointer passed as an implicit
argument.
The attribute may be placed explicitly by the user, or inferred by the
AMDGPU attributor by examining the call-graph. The attribute is
inferred only if the function is not being sanitized, and the
implicitarg_ptr does not result in a load of any byte in the hostcall
pointer argument.
Reviewed By: jdoerfert, arsenm, kpyzhov
Differential Revision: https://reviews.llvm.org/D119216
The code object version determines the ABI and therefore should not be mixed.
This patch emits amdgpu_code_object_version module flag in LLVM IR
based on code object version (default 4).
The amdgpu_code_object_version value is code object version times 100.
LLVM IR with different amdgpu_code_object_version module flag cannot
be linked.
The -cc1 option -mcode-object-version=none is for ROCm device library use
only, since the device library supports multiple ABIs.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D119026
This issue is an oversight in D108621.
Literals in HIP are emitted as global constant variables with default
address space which maps to Generic address space for HIPSPV. In
SPIR-V such variables translate to OpVariable instructions with
Generic storage class which are not legal. Fix by mapping literals
to CrossWorkGroup address space.
The literals are not mapped to UniformConstant because the "flat"
pointers in HIP may reference them and "flat" pointers are modeled
as Generic pointers in SPIR-V. In SPIR-V/OpenCL, UniformConstant
pointers may not be cast to Generic.
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu
Differential Revision: https://reviews.llvm.org/D118876
Even if the reference itself is dllexport, the temporary should not be.
In fact, we're already giving it internal linkage, so dllexporting it
is not just wasteful, but will fail to link, as in the example below:
$ cat /tmp/a.cc
void _DllMainCRTStartup() {}
const int __declspec(dllexport) &foo = 42;
$ clang-cl -fuse-ld=lld /tmp/a.cc /Zl /link /dll /out:a.dll
lld-link: error: <root>: undefined symbol: int const &foo::$RT1
Differential revision: https://reviews.llvm.org/D118980
Branch protection in M-class is supported by
- Armv8.1-M.Main
- Armv8-M.Main
- Armv7-M
Attempting to enable this for other architectures, either on the
command line (e.g. -mbranch-protection=bti) or by target attribute
in source code (e.g. __attribute__((target("branch-protection=..."))))
will generate a warning.
In both cases function attributes related to branch protection will not
be emitted. Regardless of the warning, module level attributes related to
branch protection will be emitted when it is enabled via the command-line.
The following people also contributed to this patch:
- Victor Campos
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D115501
Intel's CET/IBT requires every indirect branch target to be an ENDBR instruction. Because of that, the compiler needs to correctly emit these instructions in functions' prologues. Because this is a security feature, it is desirable that only functions that are actually targeted by indirect branches are emitted with ENDBRs. While it is possible to identify address-taken functions through LTO, minimizing these ENDBR instructions remains a hard task for user-space binaries because exported functions may end up being reachable through PLT entries, which use an indirect branch. Because this cannot be determined at compile time, the compiler currently emits ENDBRs in every non-local-linkage function.
Despite the challenge presented for user-space, the kernel landscape is different as no PLTs are used. With the intent of providing the most fit ENDBR emission for the kernel, kernel developers proposed an optimization named "ibt-seal" which replaces the ENDBRs for NOPs directly in the binary. The discussion of this feature can be seen in [1].
This diff enables the flag -mibt-seal, which in combination with LTO enforces a different policy for ENDBR placement when the code model is set to "kernel". In this scenario, the compiler will only emit ENDBRs for address-taken functions, skipping non-address-taken functions even when they don't have local linkage.
A comparison between an LTO-compiled kernel binaries without and with the -mibt-seal feature enabled shows that when -mibt-seal was used, the number of ENDBRs in the vmlinux.o binary patched by objtool decreased from 44383 to 33192, and that the number of superfluous ENDBR instructions nopped-out decreased from 11730 to 540.
The 540 missed superfluous ENDBRs need to be investigated further, but hypotheses are: assembly code not being taken care of by the compiler, kernel exported symbols mechanisms creating bogus address taken situations or even these being removed due to other binary optimizations like kernel's static_calls. For now, I assume that the large drop in the number of ENDBR instructions already justifies the feature being merged.
[1] - https://lkml.org/lkml/2021/11/22/591
Reviewed By: xiangzhangllvm
Differential Revision: https://reviews.llvm.org/D116070
A HIP program with a printf call fails to compile with the -fsanitize=address
option, because the module flag amdgpu_hostcall is appended twice: once
for printf and once for the sanitize option. This patch fixes that issue.
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Roman Lebedev
Differential Revision: https://reviews.llvm.org/D116216
Use the AttributeSet constructor instead. There's no good reason
why AttrBuilder itself should extract the AttributeSet from the
AttributeList. Moving this out of the AttrBuilder generally results
in cleaner code.
Cases where a cpu-dispatch/cpu-specific function is mangled before the
function becomes 'multiversion' (such as a member function) cause the
wrong name to be emitted for one of the variants/resolver,
since the name is cached. Make sure we invalidate the cache in
cpu-dispatch/cpu-specific modes, like we previously did for just target
multiversioning.
to repro error.
As mentioned yesterday, I've got a problem that I can only reproduce on
Godbolt (none of the build configs on my local machine!), so this is at
least somewhat usable until I figure out a cause.
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situations like creating an
Attribute with a dummy value because the only relevant part was the attribute
kind.
Differential Revision: https://reviews.llvm.org/D116110
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.
This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.
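A hedged sketch of the intended usage (the function and variable names are
illustrative):
```
void target_fn(void);

const void *get_body(void) {
  // with CFI, &target_fn would point at a CFI jump table entry; the builtin
  // yields the address of the actual function body instead
  return __builtin_function_start(target_fn);
}
```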
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Depends on D108478
Reviewed By: pcc, rjmccall
Differential Revision: https://reviews.llvm.org/D108479
Change all uses of the deprecated constructor to pass the
element type explicitly and drop it.
For cases where the correct element type was not immediately
obvious to me or would require a slightly larger change I'm
falling back to explicitly calling getPointerElementType() for now.
With C++17 the exception specification has been made part of the
function type, and therefore part of mangled type names.
However, it's valid to convert function pointers with an exception
specification to function pointers with the same argument and return
types but without an exception specification, which means that e.g. a
function of type "void () noexcept" can be called through a pointer
of type "void ()". We must therefore consider the two types to be
compatible for CFI purposes.
We can do this by stripping the exception specification before mangling
the type name, which is what this patch does.
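A hedged sketch of the compatibility case being addressed:
```
// "void () noexcept" converts to "void ()" (the spec is dropped), so for CFI
// the call below must check against the same type identifier as the callee;
// stripping the exception specification before mangling makes that hold
void callee() noexcept;
void (*fp)() = callee;
void call_through() { fp(); }
```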
Differential Revision: https://reviews.llvm.org/D115015
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.
These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):
- command-line options are "translated" to module-level LLVM IR
attributes (metadata).
- functions have PAC/BTI specific attributes iff the
__attribute__((target("branch-protection=..."))) was used in the function
declaration.
- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:
branch-protection ::= "-mbranch-protection=" <protection>
protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
| <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.
- Handle __attribute__((target("branch-protection=..."))) for AArch32 with the
same grammar as the command-line options.
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension
The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:
https://developer.arm.com/documentation/ddi0553/latest
The following people contributed to this patch:
- Momchil Velikov
- Victor Campos
- Ties Stuij
Reviewed By: vhscampos
Differential Revision: https://reviews.llvm.org/D112421
See the discussion in D51650: that change was a little aggressive, adding an
error as a 'while we were here' cleanup, so this removes that error
condition, as the construct it rejected is apparently useful.
This reverts commit bb4934601d.
The AMD64 ABI mandates that the caller specify the number of SSE registers
used when passing variable arguments.
GCC also provides the option -mskip-rax-setup to skip the setup of rax when
SSE is disabled. This helps to reduce code size; see pr23258.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D112413
As discussed here: https://lwn.net/Articles/691932/
GCC 6.0 added target_clones multiversioning. This functionality is
an odd cross between cpu_dispatch and 'target' multiversioning, but is
compatible with neither.
This attribute allows you to list all options, then emits a separately
optimized version of each function per-option (similar to the
cpu_specific attribute). It automatically generates a resolver, just
like the other two.
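A hedged example of the attribute (the option spellings follow GCC's
documentation; the function is illustrative):
```
// three clones of compute() are emitted, one per option, plus an automatically
// generated resolver; each clone gets the mangling described below
__attribute__((target_clones("avx2", "arch=atom", "default")))
int compute(int x) { return x * x; }
```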
The mangling however, is... ODD to say the least. The mangling format
is:
<normal_mangling>.<option string>.<option ordinal>.
Differential Revision: https://reviews.llvm.org/D51650
The patch https://reviews.llvm.org/D110337 changes how the hostcall
hidden argument is emitted for printf, but sanitized kernels also use the
hostcall buffer to report an error for invalid memory accesses, which is not
handled by the above patch and leads to a VDI runtime error:
Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b
Patch by: Praveen Velliengiri
Reviewed by: Yaxun Liu, Matt Arsenault
Differential Revision: https://reviews.llvm.org/D112820
Two identical instantiations of a template function can be emitted by two TUs
with linkonce_odr linkage without causing duplicate symbols in the linker. MSVC
also requires these symbols to be in comdat sections. Linux does not require
the symbols to be in comdat sections for the linker to merge them, but by
default clang puts them in comdat sections.
If a template kernel is instantiated identically in two TUs, MSVC requires
them to be in comdat sections, otherwise the MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes a link error with MSVC.
This patch allows putting instantiated template kernels into comdat sections.
Reviewed by: Artem Belevich, Reid Kleckner
Differential Revision: https://reviews.llvm.org/D112492
Previously, the Backend_Emit{Nothing,BC,LL} modes did
not run the LLVM verifier since it is usually added via
the TargetMachine::addPassesToEmitFile method according
to the DisableVerify parameter. This is called from
EmitAssemblyHelper::AddEmitPasses, which is only relevant
for BackendAction-s that require CodeGen.
Note:
* In these particular situations the verifier is added
to the optimization pipeline rather than the codegen
pipeline so that it runs prior to the BC/LL emission
pass.
* This change applies to both the old and the new PMs.
* Because the clang tests use -emit-llvm ubiquitously,
this change will enable the verifier for them.
* A small bug is fixed in emitIFuncDefinition so that
the clang/test/CodeGen/ifunc.c test would pass:
the emitIFuncDefinition incorrectly passed the
GlobalDecl of the IFunc itself to the call to
GetOrCreateLLVMFunction for creating the resolver.
Signed-off-by: Itay Bookstein <ibookstein@gmail.com>
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D113352
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D112349
This reverts commit 121b2252de.
The following code causes a crash in some circumstances:
struct k {
~k() __attribute__((annotate(""))) {}
};
void m() { k(); }
As discussed in:
* https://reviews.llvm.org/D94166
* https://lists.llvm.org/pipermail/llvm-dev/2020-September/145031.html
The GlobalIndirectSymbol class lost most of its meaning in
https://reviews.llvm.org/D109792, which disambiguated getBaseObject
(now getAliaseeObject) between GlobalIFunc and everything else.
In addition, as long as GlobalIFunc is not a GlobalObject and
getAliaseeObject returns GlobalObjects, a GlobalAlias whose aliasee
is a GlobalIFunc cannot currently be modeled properly. Creating
aliases for GlobalIFuncs does happen in the wild (e.g. glibc). In addition,
calling getAliaseeObject on a GlobalIFunc will currently return nullptr,
which is undesirable because it should return the object itself for
non-aliases.
This patch refactors the GlobalIFunc class to inherit directly from
GlobalObject, and removes GlobalIndirectSymbol (while inlining the
relevant parts into GlobalAlias and GlobalIFunc). This allows for
calling getAliaseeObject() on a GlobalIFunc to return the GlobalIFunc
itself, making getAliaseeObject() more consistent and enabling
alias-to-ifunc to be properly modeled in the IR.
I exercised some judgement in the API clients of GlobalIndirectSymbol:
some were 'monomorphized' for GlobalAlias and GlobalIFunc, and
some remained shared (with the type adapted to become GlobalValue).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D108872
When AnnotateAttr is on a function, AddGlobalAnnotations is only called
in CodeGenModule::EmitGlobalFunctionDefinition, which means AnnotateAttr
on a function declaration without a body will be ignored.
This patch moves AddGlobalAnnotations to
CodeGenModule::SetFunctionAttributes, so the AnnotateAttr is code-generated
for a function with or without a body.
This helps the case where AnnotateAttr is on an external function and the
annotation is consumed at the IR level.
For example, consider a pass that collects the number of uses of functions
with __attribute__((annotate("count_use"))) after optimizations: as long as
the attribute is present, the function should be counted whether or not it
has a body.
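A hedged sketch of that case (names are illustrative):
```
// a declaration without a body; with this patch the "count_use" annotation is
// still emitted (e.g. into llvm.global.annotations) so an IR pass can see it
__attribute__((annotate("count_use"))) void external_fn(void);

void caller(void) { external_fn(); }
```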
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D111109
Patch by: python3kgae (Xiang Li)
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Previous revisions didn't properly declare the new dependencies.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
(This is a recommit of 3d6f49a569 that should no longer break validation since
bd379915de).
It is common practice in glibc headers to provide an inline redefinition of an
existing function. This is especially the case for fortified functions.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and the noinline attribute.
Simplify the logic by suffixing these functions with `.inline` during codegen, so
that they are not recognized as builtins by llvm.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
It is common practice in glibc headers to provide an inline redefinition of an
existing function. This is especially the case for fortified functions.
Clang currently has an imperfect approach to the problem, using a combination of
trivially recursive function detection and the noinline attribute.
Simplify the logic by suffixing these functions with `.inline` during codegen, so
that they are not recognized as builtins by llvm.
After that patch, clang passes all tests from https://github.com/serge-sans-paille/fortify-test-suite
Differential Revision: https://reviews.llvm.org/D109967
We previously made all multiversioning resolvers/ifuncs have weak
ODR linkage in IR, since we NEED to emit the whole resolver every time
we see a call, but it is not necessarily the place where all the
definitions live.
HOWEVER, when doing so, we neglected the case where the versions have
internal linkage. This patch handles that case as well, so you don't get
weird behavior with static functions.
This change defines a helper function getOpenCLCompatibleVersion()
inside LangOptions class. The function contains mapping between
C++ for OpenCL versions and their corresponding compatible OpenCL
versions. This mapping function should be updated each time a new
C++ for OpenCL language version is introduced. The helper function
is expected to simplify conditions on OpenCL C and C++ for OpenCL
versions inside compiler code.
Code refactoring performed.
Differential Revision: https://reviews.llvm.org/D108693
The existing code attempting to bitcast from a value in the default globals AS
to i8 addrspace(0)* was triggering an assertion failure in our downstream fork.
I found this while compiling poppler for CHERI-RISC-V (we use AS200 for all
globals). The test case uses AMDGPU since that is one of the in-tree targets
with a non-zero default globals address space.
The new test previously triggered an "Invalid constantexpr bitcast!" assertion
and now correctly generates code with addrspace(1) pointers.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D105972
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions that should not be called.
They are frequently used to provide compile-time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (i.e. relying on
compiler optimizations). This is also similar to the diagnose_if function
attribute, but can diagnose after optimizations have been run.
While users may instead simply call undefined functions in such cases to
get a failure from the linker, these provide a much more ergonomic and
actionable diagnostic to users, and do so at compile time rather than at
link time. Users may alternatively be able to use inline asm .err
directives.
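A hedged sketch of the intended usage pattern (the names and the size-check
macro are illustrative):
```
// the call is diagnosed only if it survives optimization, so the macro can
// check conditions that are not integer constant expressions
__attribute__((error("size check failed"))) void size_check_failed(void);

#define CHECK_SIZE_EQ(x, n)        \
  do {                             \
    if (sizeof(x) != (n))          \
      size_check_failed();         \
  } while (0)

struct header { unsigned a, b; };
void validate(void) { CHECK_SIZE_EQ(struct header, 8); }
```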
These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so, which I will send once
this feature has landed.
To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr). Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.
The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.
The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.
Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D106030
NVPTX does not allow dots in identifiers, so ptxas errors out with
fatal : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.
Differential Revision: https://reviews.llvm.org/D108456
Pass a LangAS instead of a target address space to
GetOrCreateLLVMGlobal, to remove a place where the frontend assumes that
target address space 0 is special.
Differential Revision: https://reviews.llvm.org/D108445
- They need to be preserved even if there's no reference within the
device code as the host code may need to initialize them based on the
application logic.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D107718
This matches the behavior of GCC.
Patch does not change remapping logic itself, so adding one simple smoke test should be enough.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107393
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.
To solve this issue, clang emits a reference to a bitcode library
function which calls all ASAN functions that need to be
preserved. This basically forces all ASAN functions to be
linked in.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D106315
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102361
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
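A hedged sketch of what call sites look like after the change (assuming the
helper is the one added to StringExtras.h):
```
#include <string>
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/StringExtras.h"

// before: std::string S = Val.toString(16, /*Signed=*/false);
// after: the free function lives in StringExtras.h, keeping <string> out of APInt.h
std::string hexString(const llvm::APInt &Val) {
  return llvm::toString(Val, /*Radix=*/16, /*Signed=*/false);
}
```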
Differential Revision: https://reviews.llvm.org/D103888
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI), very late in the compilation
pipeline.
-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.
Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.
Reviewed By: rsmith, qcolombet
Differential Revision: https://reviews.llvm.org/D103928
This implements the 'using enum maybe-qualified-enum-tag ;' part of
1099. It introduces a new 'UsingEnumDecl', subclassed from
'BaseUsingDecl'. Much of the diff is the boilerplate needed to get the
new class set up.
There is one case where we accept ill-formed code, but I believe this is
merely an extended case of an existing bug, so consider it
orthogonal. AFAICT at class scope the C++20 rule is that no two using
decls can bring in the same target decl ([namespace.udecl]/8). But we
already accept:
struct A { enum { a }; };
struct B : A { using A::a; };
struct C : B { using A::a;
               using B::a; }; // same enumerator
this patch permits mixtures of 'using enum Bob;' and 'using Bob::member;' in the same way.
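A hedged sketch of the new form itself:
```
enum class Fruit { Apple, Orange };
struct Basket {
  using enum Fruit;            // C++20: Apple and Orange become members of Basket
};
Fruit picked = Basket::Apple;  // enumerators are now reachable through Basket
```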
Differential Revision: https://reviews.llvm.org/D102241
This fixes PR49198: Wrong usage of __dso_handle in user code leads to
a compiler crash.
When Init is an address of the global itself, we need to track it
across RAUW. Otherwise the initializer can be destroyed if the global
is replaced.
Differential Revision: https://reviews.llvm.org/D101156
These actually can be automatically imported from another DLL. (This
works properly as long as the actual implementation of emutls is
linked dynamically from e.g. libgcc; if the implementation comes from
compiler-rt or a statically linked libgcc, it doesn't work as intended.)
This fixes PR50146 and https://github.com/msys2/MINGW-packages/issues/8706
(fixing calling std::call_once in a dynamically linked libstdc++);
since f731839584 the dso_local attribute
on the TLS variable affected the actual generated code for accessing
the emutls variable.
The dso_local attribute on the emutls variable made those accesses to
use 32 bit relative addressing in code, which requires runtime pseudo
relocations in the text section, and breaks entirely if the actual
other variable ends up loaded too far away in the virtual address
space.
Differential Revision: https://reviews.llvm.org/D102970
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
Variables emitted on both the host and device side with different addresses
should not, when ODR-used by a host function, cause the device-side
counterpart to be force-emitted.
This fixes the regression caused by https://reviews.llvm.org/D102237
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102801
This reduces the size of chrome.dll.pdb built with optimizations,
coverage, and line table info from 4,690,210,816 to 2,181,128,192, which
makes it possible to fit under the 4GB limit.
This change can greatly reduce binary size in coverage builds, which do
not need value profiling. IR PGO builds are unaffected. There is a minor
behavior change for frontend PGO.
PGO and coverage both use InstrProfiling to create profile data with
counters. PGO records the address of each function in the __profd_
global. It is used later to map runtime function pointer values back to
source-level function names. Coverage does not appear to use this
information.
Recording the address of every function with code coverage drastically
increases code size. Consider this program:
void foo();
void bar();
inline void inlineMe(int x) {
  if (x > 0)
    foo();
  else
    bar();
}
int getVal();
int main() { inlineMe(getVal()); }
With code coverage, the InstrProfiling pass runs before inlining, and it
captures the address of inlineMe in the __profd_ global. This greatly
increases code size, because now the compiler can no longer delete
trivial code.
One downside to this approach is that users of frontend PGO must apply
the -mllvm -enable-value-profiling flag globally in TUs that enable PGO.
Otherwise, some inline virtual method addresses may not be recorded and
will not be able to be promoted. My assumption is that this mllvm flag
is not popular, and most frontend PGO users don't enable it.
Differential Revision: https://reviews.llvm.org/D102818
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.
GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).
This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project itself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.
Example:
```
// a.c
int var;
int *addr() { return var; }
// old: cannot be interposed
movslq .Lvar$local(%rip), %rax
// new: can be interposed
movq var@GOTPCREL(%rip), %rax
movslq (%rax), %rax
```
The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102583
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C code, one way (the MSVC approach) to achieve the SEH -EHa semantics is
to follow three rules (a small sketch follows this list):
* First, no exception can move in or out of a _try region, i.e., no
potentially faulty instruction can be moved across a _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of _try region must be updated in memory (not just in register)
before the subsequent exception occurs.
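A hedged sketch of the kind of C code these rules are written for (illustrative,
not taken from the patch):
```
/* compiled as C with clang-cl -EHa (or clang -fasync-exceptions) */
int read_guarded(int *p) {
  int v = 0;
  __try {
    v = *p;          /* a potentially faulting load; it must stay inside __try */
  } __except (1) {   /* 1 == EXCEPTION_EXECUTE_HANDLER */
    v = -1;          /* the hardware exception (access violation) lands here */
  }
  return v;
}
```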
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on the
C++ side. When a C++ function (in the same compilation unit, with option -EHa)
is called by a SEH C function, a hardware exception that occurs in the C++ code
can also be handled properly by an upstream SEH _try-handler or a C++ catch(...).
As such, when that happens in the middle of an object's life scope, the dtor
must be invoked the same way as for a C++ synchronous exception during the
unwinding process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
to be added on memory/computation instructions (the previous iload/istore idea)
so that the exception path is modeled precisely in the flow graph. However,
tracking every single memory instruction and potentially faulty instruction can
create many Invokes, complicate the flow graph, and possibly result in a
negative performance impact for downstream optimization and code generation.
Making all optimizations aware of the new semantics is also a substantial effort.
This design does not intend to model exception path at instruction level.
Instead, the proposed design tracks and reports EH state at BLOCK-level to
reduce the complexity of flow graph and minimize the performance-impact on CPP
code under -EHa option.
One key element of this design is the ability to compute State number at
block-level. Our algorithm is based on the following rationales:
A _try scope is always a SEME (Single Entry Multiple Exits) region as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct State number that is the initial state of the SEME.
Through control-flow, state number is propagated into all blocks. Side exits
marked by seh_try_end() will unwind to parent state based on existing
SEHUnwindMap[].
Note side exits can ONLY jump into parent scopes (lower state number).
Thus, when a block receives various states from its predecessors, the lowest
State wins over the others. If some exits flow to unreachable, propagation on
those paths terminates, not affecting the remaining blocks.
For CPP code, an object lifetime region is usually a SEME, like a SEH _try.
However there is one rare exception: jumping into a lifetime that has a Dtor
but has no Ctor is warned about, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (multiple entry multiple exits).
Our solution is to inject an eha_scope_begin() invoke in the side entry block to
ensure a correct State.
Implementation:
Part-1: Clang implementation described below.
Two intrinsics are created to track CPP object scopes: eha_scope_begin() and eha_scope_end().
_scope_begin() is added immediately after ctor() is called and EHStack is pushed.
So it must be an invoke, not a call. With that it's also guaranteed that an
EH cleanup pad is created regardless of whether there exists a call in this scope.
_scope_end is added before dtor(). These two intrinsics make the computation of
Block-State possible in downstream code gen pass, even in the presence of
ctor/dtor inlining.
Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to mark
the _try boundary and to prevent exceptions from being moved across a _try boundary.
All memory instructions inside a _try are considered 'volatile' to assure the
2nd and 3rd rules for C code above. This is a little sub-optimal, but it's
acceptable as the amount of code directly under _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ & C-code, the state of each block is computed at the same place in
BE (WinEHPreparing pass) where all other EH tables/maps are calculated.
In addition to _scope_begin & _scope_end, the computation of block state also
relies on the existing State tracking code (UnwindMap and InvokeStateMap).
For both C++ & C code, the state of each block with a potential trap instruction
is marked and reported in the DAG Instruction Selection pass, the same place
where the state for -EHsc (synchronous exceptions) is done.
If the first instruction in a reported block scope can trap, a Nop is injected
before this instruction. This nop is needed to accommodate the LLVM Windows EH
implementation, in which the address in the IPToState table is offset by +1
(note: the purpose of that is to ensure the return address of a call is in the
same scope as the call address).
The handler for catch(...) under -EHa must handle HW exceptions. So its
'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches
C++ exceptions).
The push/popTerminate() scope (from noexcept/noThrow) is suppressed so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344
As it was discovered in post-commit feedback
for 0aa0458f14,
we handle thunks incorrectly, and end up annotating
their this/return with attributes that are valid
for their callees, not for thunks themselves.
While it would be good to fix this properly,
and keep annotating them on thunks,
i've tried doing that in https://reviews.llvm.org/D100388
with little success, and the patch is stuck for a month now.
So for now, as a stopgap measure, subj.
Currently clang does not emit device template variables
instantiated only in host functions; however, nvcc is
able to do that:
https://godbolt.org/z/fneEfferY
This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and forces
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.
It also fixes non-ODR-uses of a static device variable by host code
causing the static device variable to be emitted and registered, which
should not happen.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D102237
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.
I tried to clean up things while I was at it, e.g., we should not "parse
declarations and definitions" as part of OpenMP parsing; this will always
break at some point. Instead, we keep track of what region we are in and
act on definitions and declarations instead; this is what we do for
declare variant and other begin/end directives already.
Highlights:
- new diagnostics for restrictions specified in the standard,
- delayed emission of globals not mentioned in an explicit
list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the
device,
- no explicit parsing of declarations in-between `omp [begin] declare
variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over
implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted
global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.
After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
This implements the flag proposed in RFC
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066437.html.
The goal is to add a way to override the default target C++ ABI through a
compiler flag. This makes it easier to test and transition between different
C++ ABIs through compile flags rather than build flags.
In this patch:
- Store -fc++-abi= in a LangOpt. This isn't stored in a CodeGenOpt because
there are instances outside of codegen where Clang needs to know what the
ABI is (particularly through ASTContext::createCXXABI), and we should be
able to override the target default if the flag is provided at that point.
- Expose the existing ABIs in TargetCXXABI as values that can be passed
through this flag.
- Create a .def file for these ABIs to make it easier to check flag values.
- Add an error for diagnosing bad ABI flag values.
Differential Revision: https://reviews.llvm.org/D85802
The use of llvm::sort causes periodic failures on the bot with EXPENSIVE_CHECKS enabled,
as the regular sort pre-shuffles the array in the expensive checks mode, leading to a
non-deterministic test result which causes the CodeGenCXX/attr-cpuspecific-outoflinedefs.cpp
testcase to fail on the bot (http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/).
Default address space (applies when no explicit address space was
specified) maps to generic (4) address space.
Added SYCL named address spaces `sycl_global`, `sycl_local` and
`sycl_private` defined as sub-sets of the default address space.
Static variables without address space now reside in global address
space when compile for SPIR target, unless they have an explicit address
space qualifier in source code.
Differential Revision: https://reviews.llvm.org/D89909
The Linux kernel objtool diagnostic `call without frame pointer save/setup`
arise in multiple instrumentation passes (asan/tsan/gcov). With the mechanism
introduced in D100251, it's trivial to respect the command line
-m[no-]omit-leaf-frame-pointer/-f[no-]omit-frame-pointer, so let's do it.
Fix: https://github.com/ClangBuiltLinux/linux/issues/1236 (tsan)
Fix: https://github.com/ClangBuiltLinux/linux/issues/1238 (asan)
Also document the function attribute "frame-pointer" which is long overdue.
Differential Revision: https://reviews.llvm.org/D101016
On ELF targets, if a function has uwtable or personality, or does not have
nounwind (`needsUnwindTableEntry`), it marks that `.eh_frame` is needed in the module.
Then, a function gets `.eh_frame` if `needsUnwindTableEntry` or `-g[123]` is specified.
(i.e. If -g[123], every function gets `.eh_frame`.
This behavior is strange but that is the status quo on GCC and Clang.)
Let's take asan as an example. Other sanitizers are similar.
`asan.module_[cd]tor` has no attribute. `needsUnwindTableEntry` returns true,
so every function gets `.eh_frame` if `-g[123]` is specified.
This is the root cause that
`-fno-exceptions -fno-asynchronous-unwind-tables -g` produces .debug_frame
while
`-fno-exceptions -fno-asynchronous-unwind-tables -g -fsanitize=address` produces .eh_frame.
This patch
* sets the nounwind attribute on sanitizer module ctor/dtor.
* lets Clang emit a module flag metadata "uwtable" for -fasynchronous-unwind-tables. If "uwtable" is set, sanitizer module ctor/dtor additionally get the uwtable attribute.
The "uwtable" mechanism is generic: synthesized functions not cloned/specialized
from existing ones should consider `Function::createWithDefaultAttr` instead of
`Function::create` if they want to get some default attributes which
have more of module semantics.
Other candidates: "frame-pointer" (https://github.com/ClangBuiltLinux/linux/issues/955https://github.com/ClangBuiltLinux/linux/issues/1238), dso_local, etc.
Differential Revision: https://reviews.llvm.org/D100251
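A minimal sketch of the guidance above, using the LLVM C++ API; the module-ctor name and
helper function are illustrative, not taken from the patch:
```
// Sketch only: create a synthesized function that picks up module-level
// default attributes instead of using plain Function::create.
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

using namespace llvm;

Function *createModuleCtor(Module &M) {
  FunctionType *FTy =
      FunctionType::get(Type::getVoidTy(M.getContext()), /*isVarArg=*/false);
  // createWithDefaultAttr consults module flags such as "uwtable" and
  // "frame-pointer", so the synthesized ctor is treated like user code.
  Function *Ctor = Function::createWithDefaultAttr(
      FTy, GlobalValue::InternalLinkage, /*AddrSpace=*/0,
      "sanitizer.module_ctor", &M);
  Ctor->addFnAttr(Attribute::NoUnwind); // the patch also marks ctor/dtor nounwind
  return Ctor;
}
```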
As reported in PR50025, sometimes we would end up not emitting functions
needed by inline multiversioned variants. This is because we typically
use the 'deferred decl' mechanism to emit these. However, the variants
are emitted after that typically happens. This fixes that by re-running the
deferred-decl emission after the variants are emitted. The multiversion
emission is also done recursively, to ensure that MV functions that require
other MV functions get emitted as well.
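A hedged sketch of the kind of code involved, assuming x86 target-attribute
multiversioning; the names are made up and this is not the PR50025 reproducer:
```
// Inline multiversioned variants referenced only from another function:
// previously the non-default variants could be missed because they are
// emitted after the usual deferred-decl processing.
__attribute__((target("default"))) inline int compute() { return 0; }
__attribute__((target("avx2"))) inline int compute() { return 2; }

int caller() { return compute(); }
```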
The original issue is caused by the variable being allocated with the
incorrect type i1 instead of i8. This causes the declaration to be bitcast
to the i8 type, and the bitcast expression then does not match the
original variable.
To fix the problem, the UndefValue initializer and the original
variable should be emitted with type i8, not i1.
Differential Revision: https://reviews.llvm.org/D99297
This reverts commit 809a1e0ffd.
Mach-O doesn't support dso_local, and this change broke XNU because of its use of dso_local.
Differential Revision: https://reviews.llvm.org/D98458
`CodeGenFunction::EmitRuntimeCall` automatically sets the right calling
convention for the callee so we can avoid setting it ourselves.
As requested in https://reviews.llvm.org/D98411
Reviewed by: anastasia
Differential Revision: https://reviews.llvm.org/D98705
`__translate_sampler_initializer` has a calling convention of
`spir_func`, but clang generated calls to it using the default CC.
Instruction Combining was lowering these mismatched calling conventions
to `store i1* undef`, which was itself subsequently lowered to a trap
instruction by SimplifyCFG, resulting in a runtime `SIGILL`.
There are arguably two bugs here, but whether there's any wisdom in
converting an obviously invalid call into a runtime crash rather than
aborting with a sensible error message will require further discussion. So for
now it's enough to set the right calling convention on the runtime
helper.
Reviewed By: svenh, bader
Differential Revision: https://reviews.llvm.org/D98411
D96109 added support for unique internal linkage names for both internal
linkage functions and global variables. There was a lot of discussion on how to
get the demangling right for functions but I completely missed the point that
demanglers do not support suffixes for global vars. For example:
$ c++filt _ZL3foo
foo
$ c++filt _ZL3foo.uniq.123
_ZL3foo.uniq.123
The demangling for functions works as expected.
I am not sure of the impact of this. I don't understand how debuggers and other
tools depend on the correctness of global variable demangling, so I am
pre-emptively disabling it until we can get the demangling support added.
Importantly, uniquifying global variables is not needed right now, as we do not
do profile attribution to global vars based on sampling. It was added for
completeness, so this feature will not be greatly missed.
Differential Revision: https://reviews.llvm.org/D98392
The option -funique-internal-linkage-names was added in D73307 and D78243 as an
early LLVM pass that inserts a unique suffix on internal linkage functions and
vars. The unique suffix was the hash of the module path. However, we found
that this can be done more cleanly early in clang, completely avoiding the
fixes that would otherwise be needed later. Those fixes in particular involve
modifying the DW_AT_linkage_name and finding the right place to insert the
pass.
This patch resurrects the original implementation proposed in D73307, which
was reviewed and then ditched in favor of the pass-based approach.
Differential Revision: https://reviews.llvm.org/D96109
Use the "clang.arc.attachedcall" operand bundle instead of explicitly emitting retainRV or claimRV calls in the IR.
This reapplies ed4718eccb, which was reverted
because it was causing a miscompile. The bug that was causing the miscompile
has been fixed in 75805dce5f.
Original commit message:
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
This caused miscompiles of Chromium tests for iOS due to clobbering of live
registers. See discussion on the code review for details.
This reverts commit ed4718eccb.
`GetAddrOfGlobalTemporary` previously tried to emit the initializer of
a global temporary before updating the global temporary map. Emitting the
initializer could recurse back into `GetAddrOfGlobalTemporary` for the same
temporary, resulting in an infinite recursion.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D97733
Currently clang uses a stub function to launch kernels. This is inconvenient
for interoperating with C++ programs, since the stub function has a different
name than the kernel, whereas the ROCm debugger requires a symbol with the
kernel's own name.
This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows a C++ program to
launch a kernel by using the original kernel name.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D86376
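A hypothetical HIP example of what this enables; the kernel name and launch
parameters are illustrative:
```
#include <hip/hip_runtime.h>

__global__ void fill(int *out, int value) {
  out[threadIdx.x] = value;
}

void launch(int *device_buf, hipStream_t stream) {
  // With this change, the host-side symbol named "fill" is a variable that is
  // used to register and launch the kernel, so C++ code and the ROCm debugger
  // can refer to the kernel by its original name rather than a stub.
  hipLaunchKernelGGL(fill, dim3(1), dim3(64), 0, stream, device_buf, 42);
}
```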
For ELF targets, GCC 11 will set SHF_GNU_RETAIN on the section of a
`__attribute__((retain))` function/variable to prevent linker garbage
collection. (See AttrDocs.td for the linker support).
This patch adds `retain` functions/variables to the `llvm.used` list, which has
the desired linker GC semantics. Note: `retain` does not imply `used`,
so an unused function/variable can be dropped by Sema.
Before 'retain' was introduced, ELF solutions required inline asm or
linker tricks, e.g. `asm volatile(".reloc 0, R_X86_64_NONE, target");`
(architecture dependent) or defining a non-local symbol in the section and using
`ld -u`. There was no elegant source-level solution.
With D97448, `__attribute__((retain))` will set `SHF_GNU_RETAIN` on ELF targets.
Differential Revision: https://reviews.llvm.org/D97447
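A hedged source-level illustration (the identifiers and section name are made up).
Combining `retain` with `used` covers both front end and linker GC, since `retain`
alone does not imply `used`:
```
// 'used' keeps the otherwise-unreferenced definitions from being dropped by
// the front end; 'retain' (on ELF, via llvm.used here and SHF_GNU_RETAIN in
// D97448) keeps the section from being discarded by linker garbage collection.
__attribute__((retain, used, section("meta")))
static const char build_tag[] = "build-1234";

__attribute__((retain, used))
static void init_hook() {}
```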
A global value in the `llvm.used` list does not have GC root semantics on ELF targets.
This will be changed in a subsequent backend patch.
Change some `llvm.used` in the ELF code path to use `llvm.compiler.used` to
prevent undesired GC root semantics.
Change one extern "C" alias (due to `__attribute__((used))` in extern "C") to use `llvm.compiler.used` on all targets.
GNU ld has a rule that `__start_`/`__stop_` references from a live input section retain the associated C identifier name sections.
LLD may drop this rule entirely (currently it is refined to exclude SHF_LINK_ORDER/SHF_GROUP sections) in a future release, because the rule makes it clumsy to GC metadata sections (D96914 added a way to try the potential future behavior).
For `llvm.used` global values defined in a C identifier name section, keep using `llvm.used` so that
the future LLD change will not affect them.
rnk kindly categorized the changes:
```
ObjC/blocks: this wants GC root semantics, since ObjC mainly runs on Mac.
MS C++ ABI stuff: wants GC root semantics, no change
OpenMP: unsure, but GC root semantics probably don't hurt
CodeGenModule: affected in this patch to *not* use GC root semantics so that __attribute__((used)) behavior remains the same on ELF, plus two other minor use cases that don't want GC semantics
Coverage: Probably want GC root semantics
CGExpr.cpp: refers to LTO, wants GC root
CGDeclCXX.cpp: one is MS ABI specific, so yes GC root, one is some other C++ init functionality, which should form GC roots (C++ initializers can have side effects and must run)
CGDecl.cpp: Changed in this patch for __attribute__((used))
```
Differential Revision: https://reviews.llvm.org/D97446
For -fgpu-rdc mode, static device vars in different TUs may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables now have distinct names across compilation units, we let
them have external linkage so that they can be looked up by the runtime.
Reviewed by: Artem Belevich, and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
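A hypothetical CUDA/HIP translation unit showing the kind of variable affected
(compiled with -fgpu-rdc; the names are illustrative):
```
// A file-scope static device variable: under -fgpu-rdc it is renamed with a
// CUID-derived suffix and given external linkage so the host runtime can look
// it up even if another TU defines a static variable with the same name.
static __device__ int counter = 0;

__global__ void bump() { ++counter; }
```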
Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.
This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D96195
This patch responds to a comment from @vitalybuka in D96203: a suggestion to
make the change incrementally, starting by modifying this file name. I renamed
the file and made the other changes that follow from that rename.
Reviewers: vitalybuka, echristo, MaskRay, jansvoboda11, aaron.ballman
Differential Revision: https://reviews.llvm.org/D96974
This allows the option to affect the LTO output. Module::Max helps to
generate debug info for all modules in the same format.
Differential Revision: https://reviews.llvm.org/D96597
Use the "clang.arc.attachedcall" operand bundle instead of explicitly emitting retainRV or claimRV calls in the IR.
Background:
This fixes a longstanding problem where llvm breaks ARC's autorelease
optimization (see the link below) by separating calls from the marker
instructions or retainRV/claimRV calls. The backend changes are in
https://reviews.llvm.org/D92569.
https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
What this patch does to fix the problem:
- The front-end adds operand bundle "clang.arc.attachedcall" to calls,
which indicates the call is implicitly followed by a marker
instruction and an implicit retainRV/claimRV call that consumes the
call result. In addition, it emits a call to
@llvm.objc.clang.arc.noop.use, which consumes the call result, to
prevent the middle-end passes from changing the return type of the
called function. This is currently done only when the target is arm64
and the optimization level is higher than -O0.
- ARC optimizer temporarily emits retainRV/claimRV calls after the calls
with the operand bundle in the IR and removes the inserted calls after
processing the function.
- ARC contract pass emits retainRV/claimRV calls after the call with the
operand bundle. It doesn't remove the operand bundle on the call since
the backend needs it to emit the marker instruction. The retainRV and
claimRV calls are emitted late in the pipeline to prevent optimization
passes from transforming the IR in a way that makes it harder for the
ARC middle-end passes to figure out the def-use relationship between
the call and the retainRV/claimRV calls (which is the cause of
PR31925).
- The function inliner removes an autoreleaseRV call in the callee if
nothing in the callee prevents it from being paired up with the
retainRV/claimRV call in the caller. It then inserts a release call if
claimRV is attached to the call since autoreleaseRV+claimRV is
equivalent to a release. If it cannot find an autoreleaseRV call, it
tries to transfer the operand bundle to a function call in the callee.
This is important since the ARC optimizer can remove the autoreleaseRV
returning the callee result, which makes it impossible to pair it up
with the retainRV/claimRV call in the caller. If that fails, it simply
emits a retain call in the IR if retainRV is attached to the call and
does nothing if claimRV is attached to it.
- SCCP refrains from replacing the return value of a call with a
constant value if the call has the operand bundle. This ensures the
call always has at least one user (the call to
@llvm.objc.clang.arc.noop.use).
- This patch also fixes a bug in replaceUsesOfNonProtoConstant where
multiple operand bundles of the same kind were being added to a call.
Future work:
- Use the operand bundle on x86-64.
- Fix the auto upgrader to convert call+retainRV/claimRV pairs into
calls with the operand bundles.
rdar://71443534
Differential Revision: https://reviews.llvm.org/D92808
The signed prefix is removed and the single-word spelling is
printed for the scalar types.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96161
Pipe element type spelling for arg info metadata
should follow the same behavior as normal type spelling.
We should only use the canonical type spelling in the
base type field.
This patch also removes duplication in type handling.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D96151
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variables of external device variables are always
emitted as undefined symbols, which need to resolve to global
symbols.
Managed variables need to be emitted as undefined symbols
in device compilations.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95901
Extract the registering of a device variable into a CUDA runtime codegen function, since it
will be called in multiple places.
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95558
Normally, Clang will not make dllimport functions available for inlining
if they reference non-imported symbols, as this can lead to confusing
link errors. But if the function is marked always_inline, the user
presumably knows what they're doing and the attribute should be honored.
Differential revision: https://reviews.llvm.org/D95673
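A hedged sketch of the case described above, with illustrative names and
MSVC-style dllimport:
```
static int internal_helper() { return 42; }  // not an imported symbol

// Referencing a non-imported symbol would normally make this dllimport
// function unavailable for inlining; with always_inline the user's intent is
// honored and the definition is inlined anyway.
__declspec(dllimport) __attribute__((always_inline)) inline int imported_getter() {
  return internal_helper();
}

int use() { return imported_getter(); }
```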
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Most of CGExprConstant.cpp is using the CharUnits abstraction
and is using getCharWidth() (directly or indirectly) when converting
between size of a char and size in bits. This patch is making that
abstraction more consistent by adding CharTy to the CodeGenTypeCache
(honoring getCharWidth() when mapping from char to LLVM IR types,
instead of using Int8Ty directly).
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D94979
This patch implements codegen for __managed__ variable attribute for HIP.
Diagnostics will be added later.
Differential Revision: https://reviews.llvm.org/D94814
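A hypothetical HIP snippet using the attribute; the names are illustrative:
```
// A managed variable is accessible from both host and device code; the HIP
// runtime migrates the underlying memory between them on demand.
__managed__ int shared_counter = 0;

__global__ void bump() { ++shared_counter; }

int read_on_host() { return shared_counter; }
```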
Move the nomerge attribute from function declarations/definitions to call sites, to
allow virtual function calls to carry the attribute.
Differential Revision: https://reviews.llvm.org/D94537
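A hedged C++ sketch of why call-site attachment matters, with illustrative types:
with the attribute on the call sites, even virtual calls keep distinct call sites
instead of being merged.
```
struct Logger {
  virtual void fail(int code);
};

void check(Logger *log, bool a, bool b) {
  // [[clang::nomerge]] on the call statements keeps the two virtual calls from
  // being merged into one, preserving distinct source locations in backtraces.
  if (a)
    [[clang::nomerge]] log->fail(1);
  if (b)
    [[clang::nomerge]] log->fail(2);
}
```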
ELF -fno-pic sets dso_local on a function declaration to allow direct accesses
when taking its address (similar to a data symbol). The emitted code follows the
traditional GCC/Clang -fno-pic behavior: an absolute relocation is produced.
If the function is not defined in the executable, a canonical PLT entry will be
needed at link time. This is similar to a copy relocation and is incompatible
with shared objects linked with -Bsymbolic or --dynamic-list, and with protected
symbols in a shared object.
This patch gives -fno-pic code a way to avoid such a canonical PLT entry.
The FIXME was about a generalization for -fpie -mpie-copy-relocations (now -fpie
-fdirect-access-external-data). While we could set dso_local to avoid GOT when
taking the address of a function declaration (there is an ignorable difference
about R_386_PC32 vs R_386_PLT32 on i386), it likely does not provide any benefit
and can just cause trouble, so we don't make the generalization.
D92633 added -f[no-]direct-access-external-data to supersede -m[no-]pie-copy-relocations.
(The option works for -fpie but is a no-op for -fno-pic and -fpic.)
This patch makes -fno-pic -fno-direct-access-external-data drop dso_local from
global variable declarations. This usually causes the backend to emit a GOT
indirection for external data access. With a GOT relocation, the subsequent
-no-pie link will not have copy relocation even if the data symbol turns out to
be defined by a shared object.
Differential Revision: https://reviews.llvm.org/D92714
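A hedged C++ illustration of the declarations affected (names are made up): under
-fno-pic, -fno-direct-access-external-data keeps the data access GOT-indirect so a
later -no-pie link needs no copy relocation even if the data symbol turns out to be
defined by a shared object.
```
extern int shared_counter;   // external data declaration: dso_local dropped
extern void callback(void);  // function address: still a direct/absolute reference

int read_counter(void) { return shared_counter; }
void *get_callback(void) { return (void *)&callback; }
```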
GCC r218397 "x86-64: Optimize access to globals in PIE with copy reloc" made
-fpie code emit R_X86_64_PC32 to reference external data symbols by default.
Clang adopted -mpie-copy-relocations D19996 as a flexible alternative.
The name -mpie-copy-relocations can be improved [1] and does not capture the
idea that this option can apply to -fno-pic and -fpic [2], so this patch
introduces -f[no-]direct-access-external-data and makes -mpie-copy-relocations
their aliases for compatibility.
[1]
For
```
extern int var;
int get() { return var; }
```
if var is defined in another translation unit in the link unit, there is no copy
relocation.
[2]
-fno-pic -fno-direct-access-external-data is useful to avoid copy relocations.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65888
If a shared object is linked with -Bsymbolic or --dynamic-list and exports a
data symbol, normally the data symbol cannot be accessed by -fno-pic code
(because by default an absolute relocation is produced which will lead to a copy
relocation). -fno-direct-access-external-data can prevent copy relocations.
-fpic -fdirect-access-external-data can avoid GOT indirection. This is like the
undefined counterpart of -fno-semantic-interposition. However, the user should
define var in another translation unit and link with -Bsymbolic or
--dynamic-list, otherwise the linker will error in a -shared link. Generally
the user has better tools for their goal but I want to mention that this
combination is valid.
On COFF, the behavior is like always -fdirect-access-external-data.
`__declspec(dllimport)` is needed to enable indirect access.
There is currently no plan to affect non-ELF behaviors or -fpic behaviors.
-fno-pic -fno-direct-access-external-data will be implemented in the subsequent patch.
GCC feature request https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D92633
The idea is that the CC1 default for ELF should set dso_local on default
visibility external linkage definitions in the default -mrelocation-model pic
mode (-fpic/-fPIC) to match COFF/Mach-O and make output IR similar.
The refactoring is made available by 2820a2ca3a.
Currently only x86 supports local aliases. We move the decision to the driver.
There are three CC1 states:
* -fsemantic-interposition: make some linkages interposable and make default visibility external linkage definitions dso_preemptable.
* (default): selected if the target supports .Lfoo$local: make default visibility external linkage definitions dso_local
* -fhalf-no-semantic-interposition: if neither option is set or the target does not support .Lfoo$local: like -fno-semantic-interposition but local aliases are not used. So references can be interposed if not optimized out.
Add -fhalf-no-semantic-interposition to a few tests using the half-based semantic interposition behavior.
* static relocation model: always
* other relocation models: if isStrongDefinitionForLinker
This will make LLVM IR emitted for COFF/Mach-O and executable ELF similar.
This simplifies TargetMachine::shouldAssumeDSOLocal and gives the frontend the
decision to use dso_local. For LLVM synthesized functions/globals, they may lose
inferred dso_local but such optimizations are probably not very useful.
Note: the hasComdat() condition in canBenefitFromLocalAlias (D77429) may be dead now.
(llvm/CodeGen/X86/semantic-interposition-comdat.ll)
(Investigate whether we need test coverage when Fuchsia C++ ABI is clearer)
Clang FE currently has hot and cold function attributes, but we only have
the cold function attribute in LLVM IR.
This patch adds support of hot function attribute to LLVM IR. This
attribute will be used in setting function section prefix/suffix.
Currently .hot and .unlikely suffix only are added in PGO (Sample PGO)
compilation (through isFunctionHotInCallGraph and
isFunctionColdInCallGraph).
This patch changes the behavior. The new behavior is:
(1) If the user annotates a function as hot or isFunctionHotInCallGraph
is true, this function will be marked as hot. Otherwise,
(2) If the user annotates a function as cold or
isFunctionColdInCallGraph is true, this function will be marked as
cold.
The changes are:
(1) the user-annotated function attribute will be used in setting the function
section prefix/suffix.
(2) the hot attribute overrides profile-count-based hotness.
(3) profile-count-based hotness overrides the user-annotated cold attribute.
The intention for these changes is to provide the user a way to mark
certain function as hot in cases where training input is hard to cover
all the hot functions.
Differential Revision: https://reviews.llvm.org/D92493
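A hedged example of the source annotations involved (illustrative names); under the
new rules a user hot annotation wins over profile-derived hotness, while
profile-derived hotness wins over a user cold annotation:
```
// Functions annotated hot/cold influence the function section prefix/suffix
// (.hot/.unlikely) even when the training profile does not cover them.
__attribute__((hot)) void fast_path() { /* frequently executed work */ }
__attribute__((cold)) void error_path() { /* rare error handling */ }

void dispatch(bool ok) {
  if (ok)
    fast_path();
  else
    error_path();
}
```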
This patch adds the functionality to compare BFI counts with real profile
counts right after reading the profile. It will print remarks under
-Rpass-analysis=pgo, or the internal option -pass-remarks-analysis=pgo.
Differential Revision: https://reviews.llvm.org/D91813
PPCMCInstLower does not actually call shouldAssumeDSOLocal for ppc32 so this is dead.
Actually Clang ppc32 does produce a pair of absolute relocations which match GCC.
This also fixes a comment (R_PPC_COPY and R_PPC64_COPY do exist).
shouldRTTIBeUnique() returns false for iOS64CXXABI, which causes
RTTI objects to be emitted hidden. Update two tests that didn't
expect this to happen for the default triple.
Also rename iOS64CXXABI to AppleARM64CXXABI, since it's used for
arm64-apple-macos triples too.
Part of PR46644.
Differential Revision: https://reviews.llvm.org/D91904
Ensure that the DSO Locality of the globals in the IR is derived from
their final visibility when using -fvisibility-from-dllstorageclass.
To accomplish this we reset the DSO locality of globals (before
setting their visibility from their dllstorageclass) at the end of
IRGen in Clang. This removes any effects that visibility options or
annotations may have had on the DSO locality.
The resulting DSO locality of the globals will be pessimistic
w.r.t. the normal compiler IRGen.
Differential Revision: https://reviews.llvm.org/D91779