Commit Graph

271 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu 8428c75da1 [CUDA][HIP] Do not treat host var address as constant in device compilation
Currently clang treats host var address as constant in device compilation,
which causes const vars initialized with host var address promoted to
device variables incorrectly and results in undefined symbols.

This patch fixes that.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D118153

Fixes: SWDEV-309881

Change-Id: I0a69357063c6f8539ef259c96c250d04615f4473
2022-01-28 16:04:52 -05:00
Florian Hahn 67aa314bce
[IRGen] Do not overwrite existing attributes in CGCall.
When adding new attributes, existing attributes are dropped. While
this appears to be a longstanding issue, this was highlighted by D105169
which dropped a lot of attributes due to adding the new noundef
attribute.

Ahmed Bougacha (@ab) tracked down the issue and provided the fix in
CGCall.cpp. I bundled it up and updated the tests.
2022-01-20 13:45:19 +00:00
Yaxun (Sam) Liu 85c2bd2a0e Prevent adding module flag amdgpu_hostcall multiple times
HIP program with printf call fails to compile with -fsanitize=address
option, because of appending module flag - amdgpu_hostcall twice, one
for printf and one for sanitize option. This patch fixes that issue.

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Roman Lebedev

Differential Revision: https://reviews.llvm.org/D116216
2022-01-19 12:52:33 -05:00
hyeongyu kim 1b1c8d83d3 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2022-01-16 18:54:17 +09:00
Matt Arsenault 33315ef321 clang/AMDGPU: Don't set implicit arg attribute to default size
Since 2959e082e1, we conservatively
assume all inputs are enabled by default. This isn't the best
interface for controlling these anyway, since it's not granular and
only allows trimming the last fields.
2022-01-14 18:43:30 -05:00
Yaxun (Sam) Liu 3b172f60c6 [HIP] Fix -fgpu-rdc for Windows
This patch fixes issues for -fgpu-rdc for Windows MSVC
toolchain:

Fix COFF specific section flags and remove section types
in llvm-mc input file for Windows.

Escape fatbin path in llvm-mc input file.

Add -triple option to llvm-mc.

Put __hip_gpubin_handle in comdat when it has linkonce_odr
linkage.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D115039
2021-12-06 16:42:23 -05:00
Anshil Gandhi df0560ca00 [HIP] Add atomic load, atomic store and atomic cmpxchng_weak builtin support in HIP-clang
Introduce `__hip_atomic_load`, `__hip_atomic_store` and `__hip_atomic_compare_exchange_weak`
builtins in HIP.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D114553
2021-11-29 12:07:13 -07:00
Yaxun (Sam) Liu 38211bbab1 [HIP] Fix device stub name for Windows
This is a follow up of https://reviews.llvm.org/D68578
where device stub name is changed for Itanium
mangling but not Microsoft mangling.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D113491
2021-11-23 12:03:49 -05:00
Yaxun (Sam) Liu e13246a2ec [HIP] Add HIP scope atomic operations
Add an AtomicScopeModel for HIP and support for OpenCL builtins
that are missing in HIP.

Patch by: Michael Liao

Revised by: Anshil Ghandi

Reviewed by: Yaxun Liu

Differential Revision: https://reviews.llvm.org/D113925
2021-11-23 10:13:37 -05:00
Yaxun (Sam) Liu 4b3881e9f3 Emit hidden hostcall argument for sanitized kernels
this patch - https://reviews.llvm.org/D110337 changes the way how hostcall
hidden argument is emitted for printf, but the sanitized kernels also use
hostcall buffer to report a error for invalid memory access, which is not
handled by the above patch and it leads to vdi runtime error:

Device::callbackQueue aborting with error : HSA_STATUS_ERROR_MEMORY_FAULT:
Agent attempted to access an inaccessible address. code: 0x2b

Patch by: Praveen Velliengiri

Reviewed by: Yaxun Liu, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D112820
2021-11-10 17:05:57 -05:00
Yaxun (Sam) Liu 80072fde61 [CUDA][HIP] Allow comdat for kernels
Two identical instantiations of a template function can be emitted by two TU's
with linkonce_odr linkage without causing duplicate symbols in linker. MSVC
also requires these symbols be in comdat sections. Linux does not require
the symbols in comdat sections to be merged by linker but by default
clang puts them in comdat sections.

If a template kernel is instantiated identically in two TU's. MSVC requires
that them to be in comdat sections, otherwise MSVC linker will diagnose them as
duplicate symbols. However, currently clang does not put instantiated template
kernels in comdat sections, which causes link error for MSVC.

This patch allows putting instantiated template kernels into comdat sections.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D112492
2021-11-10 16:42:23 -05:00
hsmahesha 3b9a85d10a [CFE][Codegen] Make sure to maintain the contiguity of all the static allocas
at the start of the entry block, which in turn would aid better code transformation/optimization.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D110257
2021-11-10 08:45:21 +05:30
hyeongyu kim fd9b099906 Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit aacfbb953e.

Revert "Fix lit test failures in CodeGenCoroutines"

This reverts commit 63fff0f5bf.
2021-11-09 02:15:55 +09:00
hyeongyukim aacfbb953e [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169

[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)

This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453

Resolve lit failures in clang after 8ca4b3e's land

Fix lit test failures in clang-ppc* and clang-x64-windows-msvc

Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc

Fix internal_clone(aarch64) inline assembly
2021-11-06 19:19:22 +09:00
Juneyoung Lee 89ad2822af Revert "[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default"
This reverts commit 7584ef766a.
2021-11-06 15:39:19 +09:00
Juneyoung Lee 7584ef766a [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.

Test updates are made as a separate patch: D108453

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D105169
2021-11-06 15:36:42 +09:00
Anshil Gandhi 0567f03331 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-18 16:53:15 -06:00
Juneyoung Lee f193bcc701 Revert D105169 due to the two-stage failure in ASAN
This reverts the following commits:
37ca7a795b
9aa6c72b92
705387c507
8ca4b3ef19
80dba72a66
2021-10-18 23:52:46 +09:00
Juneyoung Lee 8ca4b3ef19 [Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:

(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.

(2) The remaining tests are updated manually.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108453
2021-10-16 12:01:41 +09:00
Anshil Gandhi 1830ec94ac Revert "[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols"
This reverts commit 03375a3fb3.
2021-10-15 16:16:18 -06:00
Anshil Gandhi f92db6d3ff [HIP] Relax conditions for address space cast in builtin args
Allow (implicit) address space casting between LLVM-equivalent
target address spaces.

Reviewed By: yaxunl, tra

Differential Revision: https://reviews.llvm.org/D111734
2021-10-15 15:35:52 -06:00
Anshil Gandhi 53fc5100e0 Revert "[HIP] Relax conditions for address space cast in builtin args"
This reverts commit 3b48e1170d.
2021-10-15 14:42:28 -06:00
Anshil Gandhi 3b48e1170d [HIP] Relax conditions for address space cast in builtin args
Allow (implicit) address space casting between LLVM-equivalent
target address spaces.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D111734
2021-10-15 14:06:47 -06:00
Anshil Gandhi 03375a3fb3 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-15 11:39:15 -06:00
hsmahesha 393581d8a5 [CFE][Codegen] Update auto-generated check lines for few GPU lit tests
which is essentially required as a pre-commit for https://reviews.llvm.org/D110257.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D110676
2021-10-07 09:05:39 +05:30
Yaxun (Sam) Liu c4afb5f81b [HIP] Fix linking of asanrt.bc
HIP currently uses -mlink-builtin-bitcode to link all bitcode libraries, which
changes the linkage of functions to be internal once they are linked in. This
works for common bitcode libraries since these functions are not intended
to be exposed for external callers.

However, the functions in the sanitizer bitcode library is intended to be
called by instructions generated by the sanitizer pass. If their linkage is
changed to internal, their parameters may be altered by optimizations before
the sanitizer pass, which renders them unusable by the sanitizer pass.

To fix this issue, HIP toolchain links the sanitizer bitcode library with
-mlink-bitcode-file, which does not change the linkage.

A struct BitCodeLibraryInfo is introduced in ToolChain as a generic
approach to pass the bitcode library information between ToolChain and Tool.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D110304
2021-09-27 13:25:46 -04:00
Jonas Hahnfeld ea08c4cd1c [CUDA] Fix static device variables with -fgpu-rdc
NVPTX does not allow dots in the identifier, so ptxas errors out with
   fatal   : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.

Differential Revision: https://reviews.llvm.org/D108456
2021-08-25 09:31:22 +02:00
Anshil Gandhi 7063ac1afa [HIP] Allow target addr space in target builtins
This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr
space cast for non-generic pointer to generic pointer in general, and inserts implicit addr
space cast for generic to non-generic for target builtin arguments only.

It is NFC for non-HIP languages.

Differential Revision: https://reviews.llvm.org/D102405
2021-08-19 23:51:58 -06:00
Anshil Gandhi f5d5f17d3a Revert "[HIP] Allow target addr space in target builtins"
This reverts commit a35008955f.
2021-08-18 21:38:42 -06:00
Anshil Gandhi f22ba51873 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating a
compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-16 14:56:01 -06:00
Dávid Bolvanský 49de6070a2 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit 435785214f. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.
2021-08-15 11:44:13 +02:00
Anshil Gandhi 435785214f [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating
a compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-14 23:37:23 -06:00
Anshil Gandhi 29e11a1aa3 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit c4e5425aa5.
2021-08-13 23:58:04 -06:00
Anshil Gandhi c4e5425aa5 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpandPass to report atomics generating a compare
and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-13 22:44:08 -06:00
Anshil Gandhi a35008955f [HIP] Allow target addr space in target builtins
This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr
space cast for non-generic pointer to generic pointer in general, and inserts implicit addr
space cast for generic to non-generic for target builtin arguments only.

It is NFC for non-HIP languages.

Differential Revision: https://reviews.llvm.org/D102405
2021-08-09 16:38:04 -06:00
Michael Liao 6ec36d18ec [cuda] Mark builtin texture/surface reference variable as 'externally_initialized'.
- They need to be preserved even if there's no reference within the
  device code as the host code may need to initialize them based on the
  application logic.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107718
2021-08-09 13:27:40 -04:00
Yaxun (Sam) Liu 44dbbe6106 [HIP] Preserve ASAN bitcode library functions
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.

To solve this issue, clang emits a reference to a bicode library
function which calls all ASAN functions which need to be
preserved. This basically force all ASAN functions to be
linked in.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D106315
2021-07-23 10:35:52 -04:00
Michael Liao 4e5d9c8803 [Internalize] Preserve variables externally initialized.
- ``externally_initialized`` variables would be initialized or modified
  elsewhere. Particularly, CUDA or HIP may have host code to initialize
  or modify ``externally_initialized`` device variables, which may not
  be explicitly referenced on the device side but may still be used
  through the host side interfaces. Not preserving them triggers the
  elimination of them in the GlobalDCE and breaks the user code.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D105135
2021-07-08 10:48:19 -04:00
Sameer Sahasrabuddhe 280593bd3f [Clang] [NFC] fix CHECK lines for convergent attribute tests 2021-06-29 00:21:07 +05:30
Jay Foad 157473a58f [IR] Simplify createReplacementInstr
NFCI, although the test change shows that ConstantExpr::getAsInstruction
is better than the old implementation of createReplacementInstr because
it propagates things like the sdiv "exact" flag.

Differential Revision: https://reviews.llvm.org/D104124
2021-06-23 10:47:43 +01:00
Yaxun (Sam) Liu 054cc3b1b4 [CUDA][HIP] Fix store of vtbl in ctor
vtbl itself is in default global address space. When clang emits
ctor, it gets a pointer to the vtbl field based on the this pointer,
then stores vtbl to the pointer.

Since this pointer can point to any address space (e.g. an object
created in stack), this pointer points to default address space, therefore
the pointer to vtbl field in this object should also be in default
address space.

Currently, clang incorrectly casts the pointer to vtbl field in this object
to global address space. This caused assertions in backend.

This patch fixes that by removing the incorrect addr space cast.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103835
2021-06-08 10:24:44 -04:00
Konstantin Zhuravlyov 4d9f8527db CUDA/HIP: Change device-use-host-var.cu's NOT "external" check to include variable name
Otherwise it is causing one of our build jobs to fail,
it is using "external" as directory, and NOT is
failing because "external" is found in ModuleID.

Differential Revision: https://reviews.llvm.org/D103658
2021-06-04 13:10:00 -04:00
Yaxun (Sam) Liu e42def62d8 [HIP] Fix amdgcn builtin for long type
Currently some amdgcn builtins are defined with long int type,
which causes invalid IR on Windows since long int is 32 bit
on Windows whereas these builtins have 64 bit arguments.

long long int type cannot be used since it is 128 bit in OpenCL.

This patch uses 64 bit int type instead of long int to define 64 bit int
arguments or return for amdgcn builtins.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103563
2021-06-03 19:05:56 -04:00
Yaxun (Sam) Liu 04caa7c3e0 [CUDA][HIP] Promote const variables to constant
Recently we added diagnosing ODR-use of host variables
in device functions, which includes ODR-use of const
host variables since they are not really emitted on
device side. This caused regressions since we used
to allow ODR-use of const host variables in device
functions.

This patch allows ODR-use of const variables in device
functions if the const variables can be statically initialized
and have an empty dtor. Such variables are marked with
implicit constant attrs and emitted on device side. This is
in line with what clang does for constexpr variables.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103108
2021-06-01 21:28:41 -04:00
Yaxun (Sam) Liu 4cb42564ec [CUDA][HIP] Fix device variables used by host
variables emitted on both host and device side with different addresses
when ODR-used by host function should not cause device side counter-part
to be force emitted.

This fixes the regression caused by https://reviews.llvm.org/D102237

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102801
2021-05-20 17:04:29 -04:00
Fangrui Song 37561ba89b -fno-semantic-interposition: Don't set dso_local on GlobalVariable
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.

GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).

This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project tiself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.

Example:
```
// a.c
int var;
int *addr() { return var; }

// old: cannot be interposed
movslq  .Lvar$local(%rip), %rax
// new: can be interposed
movq    var@GOTPCREL(%rip), %rax
movslq  (%rax), %rax
```

The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102583
2021-05-19 16:08:28 -07:00
Steffen Larsen f226e28a88 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions
for `sm_80` architecture or newer.

PTX ISA description of `redux.sync`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100124
2021-05-17 09:46:59 -07:00
Yaxun (Sam) Liu 98575708da [CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.

It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237
2021-05-12 11:13:29 -04:00
Yaxun (Sam) Liu c58a6a6fb4 [HIP] Fix device lib selection
Choose optimized device lib bitcode by fp options
for performance.

Reviewed by: Artem Belevich, Fangrui Song

Differential Revision: https://reviews.llvm.org/D101654
2021-05-01 20:31:11 -04:00
Yaxun (Sam) Liu d8805574c1 [CUDA][HIP] Allow non-ODR use of host var in device
Reviewed by: Artem Belevich, Richard Smith

Differential Revision: https://reviews.llvm.org/D98193
2021-04-19 14:45:24 -04:00