Commit Graph

373 Commits

Author SHA1 Message Date
Steffen Larsen 1b4c85fc02 [NVPTX] Add NVPTX intrinsics for CUDA PTX 6.5 ldmatrix instructions
Adds NVPTX intrinsics for the CUDA PTX `ldmatrix.sync.aligned` instructions added in PTX 6.5.

PTX ISA description of `ldmatrix.sync.aligned`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-ldmatrix

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107046
2021-08-06 16:13:35 -07:00
Eli Friedman bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Simon Pilgrim fcb710a7ad [NVPTX] Add select(cc,binop(),binop()) fast-math tests
As discussed on D106058 - we're not propagating the common flags to the merged binop
2021-07-18 15:30:24 +01:00
Artem Belevich d774b4aa5e [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
That should allow clang to compile mma.h from CUDA-11.3.

Differential Revision: https://reviews.llvm.org/D105384
2021-07-15 12:02:09 -07:00
Simon Pilgrim 3cc38703d5 [NVPTX] Tweak fast-math tests to avoid select(binop(x,y),binop(x,z)) fold
As suggested on D106058, tweak the tests to keep the combineRepeatedFPDivisors test coverage.
2021-07-15 15:42:25 +01:00
Simon Pilgrim e21663d32b [NVPTX] Add selp.f32 checks to select(cond,fpbinop(),fpbinop()) tests
Will help show codegen diffs in an upcoming patch
2021-07-15 12:42:29 +01:00
Tom Stellard 7f1c077c30 tests/CodeGen: Use %python lit substitution when invoking python
This will use the python that LLVM was configured to use rather than
python from PATH.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D105224
2021-07-06 18:46:36 -07:00
Steffen Larsen 3644726a78 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.

PTX ISA description of

  - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
  - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
  - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
  - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma

Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D104847
2021-06-29 15:44:07 -07:00
Bjorn Pettersson 4c7f820b2b Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
thomasraoux 505933a489 [NVPTX] Fix lowering of frem for negative values
to match fmod frem result must have the dividend sign. Previous implementation
had the wrong sign when passing negative numbers. For ex: frem(-16, 7) was
returning 5 instead of -2. We should just a ftrunc instead of floor when
lowering to get the right behavior.

Differential Revision: https://reviews.llvm.org/D102528
2021-05-24 07:45:03 -07:00
Steffen Larsen f226e28a88 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions
for `sm_80` architecture or newer.

PTX ISA description of `redux.sync`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100124
2021-05-17 09:46:59 -07:00
Stuart Adams 02c2468864 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `cp.async` instructions for
`sm_80` architecture or newer.

PTX ISA description of `cp.async`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive

Authored-by: Stuart Adams <stuart.adams@codeplay.com>
Co-Authored-by: Alexander Johnston <alexander@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100394
2021-05-17 09:46:59 -07:00
William S. Moses 7aa3cad46a [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 20:12:12 -04:00
William S. Moses 8ede96493c Revert "[NVPTX] Enable lowering of atomics on local memory"
This reverts commit fede99d386.
2021-04-26 19:33:01 -04:00
William S. Moses fede99d386 [NVPTX] Enable lowering of atomics on local memory
LLVM does not have valid assembly backends for atomicrmw on local memory. However, as this memory is thread local, we should be able to lower this to the relevant load/store.

Differential Revision: https://reviews.llvm.org/D98650
2021-04-26 19:27:27 -04:00
Artem Belevich d0615a93bb [NVPTX] Handle bitcast and ASC(101) when trying to avoid argument copy.
This allows us to skip the copy in few more cases.

Differential Revision: https://reviews.llvm.org/D99979
2021-04-06 13:06:00 -07:00
Johannes Doerfert f40a2c3bef [NVPTX] CUDA does provide malloc/free since compute capability 2.X
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#dynamic-global-memory-allocation-and-operations

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D98606
2021-03-15 22:45:56 -05:00
Artem Belevich 50c7504a93 [NVPTX] Avoid temp copy of byval kernel parameters.
Avoid making a temporary copy of byval argument if all accesses are loads and
therefore the pointer to the parameter can not escape.

This avoids excessive global memory accesses when each kernel makes its own
copy.

Differential revision: https://reviews.llvm.org/D98469
2021-03-15 14:27:22 -07:00
Arthur Eubanks e84a4650eb [NVPTX][NewPM] Re-enable NVVMReflectPass
Disabled alongside NVVMIntrRangePass in https://reviews.llvm.org/D96166,
but turns out NVVMIntrRangePass was the issue.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D96291
2021-02-08 13:58:17 -08:00
Arthur Eubanks 526c0955c0 [NVPTX][NewPM] Temporarily disable NVPTX passes in new PM pipeline
These passes are causing numerical discrepancies after being added to
the pipeline. Disable while investigating.

Reviewed By: rupprecht

Differential Revision: https://reviews.llvm.org/D96166
2021-02-05 11:31:07 -08:00
Mircea Trofin 05e90cefeb [NFC] Disallow unused prefixes under llvm/test/CodeGen
This patch finishes addressing unused prefixes under CodeGen: 2
remaining tests fixed, and then undo-ing the lit.local.cfg changes under
various subdirs and moving the policy under CodeGen.

Differential Revision: https://reviews.llvm.org/D94430
2021-01-11 12:32:18 -08:00
QingShan Zhang 7539c75bb4 [DAGCombine] Remove the check for unsafe-fp-math when we are checking the AFN
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D93891
2021-01-11 02:25:53 +00:00
Arthur Eubanks 9ccf13c36d [NewPM][NVPTX] Port NVPTX opt passes
There are only two used in the IR optimization pipeline.
Port these and add them to the default pipeline.

Similar to https://reviews.llvm.org/D93863.

I added -mtriple to some tests since under the new PM, the passes are
only available when the TargetMachine is specified.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93930
2021-01-07 15:12:35 -08:00
Matt Arsenault 06c192d454 OpaquePtr: Bulk update tests to use typed byval
Upgrade of the IR text tests should be the only thing blocking making
typed byval mandatory. Partially done through regex and partially
manual.
2020-11-20 14:00:46 -05:00
Sven van Haastregt 5d03080092 [TargetLowering] Add i1 condition for bit comparison fold
For i1 types, boolean false is represented identically regardless of
the boolean content, so we can allow optimizations that otherwise
would not be correct for booleans with false represented as a negative
one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D90145
2020-10-27 12:22:20 +00:00
Sven van Haastregt bfc961aeb2 [TargetLowering] Check boolean content when folding bit compare
Updates an optimization that relies on boolean contents being either 0
or 1 to properly check for this before triggering.

The following:
  (X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented
by negative one.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:55 +01:00
Sven van Haastregt 1af51f077b [TargetLowering] Add test for bit comparison fold
This adds a test covering an issue in bit comparison folding.  The
issue will be addressed in the subsequent commit.

Patch by Erik Hogeman.

Differential Revision: https://reviews.llvm.org/D89390
2020-10-21 11:46:45 +01:00
Justin Lebar e9ac1869a8
Preserve param alignment in NVPTXLowerArgs pass.
NVPTXLowerArgs works as follows.

  * Create a regular alloca with alignment identical to arg.
  * Copy arg from param space (and ASC'ing it from generic AS first) to
    the alloca (it's still in generic AS).
  * Replace loads of arg with loads of alloca.

The bug here is that we did not preserve the arg's alignment when
loading from the alloca.

The impact of this bug is that sometimes param loads would be lowered as
a series of u8 loads, because we're incorrectly assuming everything has
alignment 1.

Differential Revision: https://reviews.llvm.org/D89404
2020-10-14 11:15:30 -07:00
Ellis Hoag 33490acf24 [NVPTX] Fix typo in lit test
LBAEL => LABEL

I encountered this typo elsewhere and I decided to run a global search.
It probably was unnoticed because I think CHECK-LBAEL: is ignored by
lit.

    Differential Revision: https://reviews.llvm.org/D85569
2020-08-17 16:02:11 -04:00
tatz.j@northeastern.edu af5e61bf4f [NVPTX] Fix for NVPTX module asm regression
Currently module asm ends up emitted twice and at the wrong place in the PTX.
This patch moves module asm generation into emitStartOfAsmFile() which puts at
the correct location in the generated PTX.

Differential Revision: https://reviews.llvm.org/D82280
2020-06-24 11:17:09 -07:00
Jonathan Roelofs 7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Qiu Chaofan 95bcab8272 [DAGCombiner] Require ninf for sqrt recip estimation
Currently, DAG combiner uses (fmul (rsqrt x) x) to estimate square
root of x. However, this method would return NaN if x is +Inf, which
is incorrect.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76853
2020-04-01 16:23:43 +08:00
Matt Arsenault a8cc9047de CodeGen: Add -denormal-fp-math-f32 flag
Make the set of FP related attributes and command flags closer.
2020-03-27 14:00:39 -07:00
Matt Arsenault 0fd8030be3 Fix line endings in test 2020-03-27 16:26:06 -04:00
Matt Arsenault c4de8935a5 ARM: Fixup some tests using denormal-fp-math attribute
Don't use the deprecated, single mode form in tests. Also make sure to
parse the attribute, in case of the deprecated form.
2020-03-10 14:02:06 -04:00
Frederic Bastien 019ab61e25 [NVPTX, LSV] Move the LSV optimization pass to later when the graph is cleaner
This allow it to recognize more loads as being consecutive when the load's address are complex at the start.

Differential Revision: https://reviews.llvm.org/D74444
2020-02-13 12:15:38 -08:00
Yuanfang Chen 4ad7685258 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
2020-02-13 10:16:06 -08:00
Yuanfang Chen 17122ec10a Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""""
This reverts commit bb51d24330.
2020-02-13 10:08:05 -08:00
Yuanfang Chen bb51d24330 Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""
This reverts commit 80a34ae311 with fixes.

On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
2020-02-13 10:02:53 -08:00
Yuanfang Chen 80a34ae311 Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.

There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
2020-02-11 20:41:53 -08:00
Yuanfang Chen 8cedf0e299 Reland "[Support] make report_fatal_error `abort` instead of `exit`"
Summary:
Reland D67847 after D73742 is committed. Replace `sys::Process::Exit(1)`
with `abort` in `report_fatal_error`.

After this patch, for tools turning on `CrashRecoveryContext`,
crash handler installed by `CrashRecoveryContext` is called unless
they installed a non-returning handler using `llvm::install_fatal_error_handler`
like `cc1_main` currently does.

Reviewers: rnk, MaskRay, aganea, hans, espindola, jhenderson

Subscribers: jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, rupprecht, jocewei, jsji, Jim, dmgreen, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74456
2020-02-11 18:20:40 -08:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Yuanfang Chen 6e24c6037f Revert "[Support] make report_fatal_error `abort` instead of `exit`"
This reverts commit 647c3f4e47.

Got bots failure from sanitizer-windows and maybe others.
2020-01-15 17:52:25 -08:00
Yuanfang Chen 647c3f4e47 [Support] make report_fatal_error `abort` instead of `exit`
Summary:
This patch could be treated as a rebase of D33960. It also fixes PR35547.
A fix for `llvm/test/Other/close-stderr.ll` is proposed in D68164. Seems
the consensus is that the test is passing by chance and I'm not
sure how important it is for us. So it is removed like in D33960 for now.
The rest of the test fixes are just adding `--crash` flag to `not` tool.

** The reason it fixes PR35547 is

`exit` does cleanup including calling class destructor whereas `abort`
does not do any cleanup. In multithreading environment such as ThinLTO or JIT,
threads may share states which mostly are ManagedStatic<>. If faulting thread
tearing down a class when another thread is using it, there are chances of
memory corruption. This is bad 1. It will stop error reporting like pretty
stack printer; 2. The memory corruption is distracting and nondeterministic in
terms of error message, and corruption type (depending one the timing, it
could be double free, heap free after use, etc.).

Reviewers: rnk, chandlerc, zturner, sepavloff, MaskRay, espindola

Reviewed By: rnk, MaskRay

Subscribers: wuzish, jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, arichardson, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, lenary, s.egerton, pzheng, cfe-commits, MaskRay, filcab, davide, MatzeB, mehdi_amini, hiraditya, steven_wu, dexonsmith, rupprecht, seiya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D67847
2020-01-15 17:05:13 -08:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Artem Belevich d9972f8482 [NVPTX] Added llvm.nvvm.mma.m8n8k4.* intrinsics
Differential Revision: https://reviews.llvm.org/D69324
2019-10-28 13:55:30 -07:00
Artem Belevich 5c6ab2a0b1 [NVPTX] Restructure shfl instrinsics and add variants that return a predicate.
Also, amend constraints for non-sync variants that are no longer
available on sm_70+ with PTX6.4+.

Differential Revision: https://reviews.llvm.org/D68892

llvm-svn: 374790
2019-10-14 16:53:34 +00:00