Commit Graph

265 Commits

Author SHA1 Message Date
Michael Liao f3983cc14a [NVPTX] Fix PR41651
Summary:
- Use the passed `DL` directly as retrieving data layout from CS by
  checking the called function is not reliable. Under indirect function
  call, there is no called function.

Subscribers: jholewinski, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65468

llvm-svn: 367349
2019-07-30 19:52:01 +00:00
Benjamin Kramer fa1a4e4de5 [NVPTX] Use atomicrmw fadd instead of intrinsics
AutoUpgrade the old intrinsics to atomicrmw fadd.

llvm-svn: 365796
2019-07-11 17:11:25 +00:00
Simon Pilgrim 266f43964e [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.

llvm-svn: 363048
2019-06-11 11:00:23 +00:00
Artem Belevich 16737538f4 PTX 6.3 extends `wmma` instruction to support s8/u8/s4/u4/b1 -> s32.
All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so all irregularities have to be implemented in one place only.

The test generation script wmma.py has been refactored in a similar way.

Differential Revision: https://reviews.llvm.org/D60015

llvm-svn: 359247
2019-04-25 22:27:57 +00:00
Bixia Zheng 6c21ccd245 [NVPTX] Fix the codegen for llvm.round.
Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.

Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.

Reviewers: tra

Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59947

llvm-svn: 357407
2019-04-01 16:10:26 +00:00
Jordan Rupprecht 6387fa2715 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Justin Lebar 49fac56ea3 [NVPTX] Allow libcalls that are defined in the current module.
The patch adds a possibility to make library calls on NVPTX.

An important thing about library functions - they must be defined within
the current module. This basically should guarantee that we produce a
valid PTX assembly (without calls to not defined functions). The one who
wants to use the libcalls is probably will have to link against
compiler-rt or any other implementation.

Currently, it's completely impossible to make library calls because of
error LLVM ERROR: Cannot select: i32 = ExternalSymbol '...'. But we can
lower ExternalSymbol to TargetExternalSymbol and verify if the function
definition is available.

Also, there was an issue with a DAG during legalisation. When we expand
instruction into libcall, the inner call-chain isn't being "integrated"
into outer chain. Since the last "data-flow" (call retval load) node is
located in call-chain earlier than CALLSEQ_END node, the latter becomes
a leaf and therefore a dead node (and is being removed quite fast).
Proposed here solution relies on another data-flow pseudo nodes
(ProxyReg) which purpose is only to keep CALLSEQ_END at legalisation and
instruction selection phases - we remove the pseudo instructions before
register scheduling phase.

Patch by Denys Zariaiev!

Differential Revision: https://reviews.llvm.org/D34708

llvm-svn: 350069
2018-12-26 19:12:31 +00:00
Artem Belevich 6d74bd638a [NVPTX] Lower instructions that expand into libcalls.
The change is an effort to split and refactor abandoned
D34708 into smaller parts.

Here the behaviour of unsupported instructions is changed
to match the behaviour of explicit intrinsics calls.
Currently LLVM crashes with:
> Assertion getInstruction() && "Not a call or invoke instruction!" failed.

With this patch LLVM produces a more sensible error message:
> Cannot select: ... i32 = ExternalSymbol'__foobar'

Author: Denys Zariaiev <denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55145

llvm-svn: 349213
2018-12-14 23:53:06 +00:00
Artem Belevich e5664b1559 [NVPTX] Add lowering of i128 numbers as struct fields
Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.

Author: Denys Zariaiev<denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55144

llvm-svn: 348057
2018-12-01 00:21:52 +00:00
Craig Topper 0b5f8169b0 [TargetLowering] Change TargetLoweringBase::getPreferredVectorAction to take an MVT instead of an EVT. NFC
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.

llvm-svn: 346180
2018-11-05 23:26:13 +00:00
Thomas Lively 30f1d69115 [NFC] Rename minnan and maxnan to minimum and maximum
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.

Reviewers: arsenm, aheejin, dschuff, javed.absar

Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D53112

llvm-svn: 345218
2018-10-24 22:49:55 +00:00
Benjamin Kramer 2f0dd1405e [NVPTX] Expand v2f16 INSERT_VECTOR_ELT
Vectorization can create them.

llvm-svn: 336227
2018-07-03 20:40:04 +00:00
Amaury Sechet 8467411dad Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Eric Christopher 68f2218e1e Revert "Temporarily revert "[DEBUG] Initial adaptation of NVPTX target for debug info emission.""
This reapplies commits: r330271, r330592, r330779.

    [DEBUG] Initial adaptation of NVPTX target for debug info emission.

    Summary:
    Patch adds initial emission of the debug info for NVPTX target.
    Currently, only .file and .loc directives are emitted, everything else is
    commented out to not break the compilation of Cuda.

llvm-svn: 332689
2018-05-18 03:13:08 +00:00
Artem Belevich 2f348ea1c7 [NVPTX] Added a feature to use short pointers for const/local/shared AS.
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.

The feature is disabled by default.

Differential Revision: https://reviews.llvm.org/D46147

llvm-svn: 331941
2018-05-09 23:46:19 +00:00
Eric Christopher 8024425801 Temporarily revert "[DEBUG] Initial adaptation of NVPTX target for debug info emission."
This appears to have some issues associated with the file directive output
causing multiple global symbols with the name "file" to be emitted into a
startup section. I'm investigating more specific causes and working with the
original author.

This reverts commit r330271.

Also Revert "[DEBUGINFO, NVPTX] Add the test for the debug info of the local"

This reverts commit r330592 and the follow up of 330779 as the testcase is dependent upon r330271.

llvm-svn: 331237
2018-05-01 00:10:13 +00:00
Benjamin Kramer 7dd437710e [NVPTX] Make the legalizer expand shufflevector of <2 x half>
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.

Differential Revision: https://reviews.llvm.org/D46116

llvm-svn: 330948
2018-04-26 15:26:29 +00:00
Benjamin Kramer bd89647229 [NVPTX] Deduplicate code. No functionality change.
llvm-svn: 330933
2018-04-26 12:30:16 +00:00
Artem Belevich 0ae8590354 [NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions.
The new instructions were added added for sm_70+ GPUs in CUDA-9.1.

Differential Revision: https://reviews.llvm.org/D45068

llvm-svn: 330296
2018-04-18 21:51:48 +00:00
Alexey Bataev 242706b8d1 [DEBUG] Initial adaptation of NVPTX target for debug info emission.
Summary:
Patch adds initial emission of the debug info for NVPTX target.
Currently, only .file and .loc directives are emitted, everything else is
commented out to not break the compilation of Cuda.

Reviewers: echristo, jlebar, tra, jholewinski

Subscribers: mgorny, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D41827

llvm-svn: 330271
2018-04-18 16:13:41 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Artem Belevich 30512869ff [NVPTX] Make tensor shape part of WMMA intrinsic's name.
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.

Differential Revision: https://reviews.llvm.org/D44719

llvm-svn: 328158
2018-03-21 21:55:02 +00:00
Artem Belevich 914d4babec [NVPTX] Make tensor load/store intrinsics overloaded.
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.

Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.

Updated tests to use non-default address spaces.

Differential Revision: https://reviews.llvm.org/D43268

llvm-svn: 328006
2018-03-20 17:18:59 +00:00
Artem Belevich 7b14e7f041 [NVPTX] TblGen-ized lowering of WMMA intrinsics.
NFC.

Differential Revision: https://reviews.llvm.org/D43151

llvm-svn: 327672
2018-03-15 21:40:56 +00:00
Artem Belevich 18a7c51520 [NVPTX] Removed always-true predicates in NVPTX.
NVPTX stopped supporting GPUs older than sm_20 (Fermi) quite a while back.
Removal of support of pre-Fermi GPUs made a lot of predicates in the NVPTX
backend pointless as they can't ever be false any more.
It's time to retire them. NFC intended.

Differential Revision: https://reviews.llvm.org/D43843

llvm-svn: 326349
2018-02-28 18:51:22 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Matt Arsenault 7d7adf4f2e TLI: Allow using PSV for intrinsic mem operands
llvm-svn: 320756
2017-12-14 22:34:10 +00:00
Matt Arsenault 1117133687 DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.

On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.

llvm-svn: 320746
2017-12-14 21:39:51 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Artem Belevich 55dcf5e586 Mark intrinsics operating on the whole warp as IntrInaccessibleMemOnly
It's needed to model the fact that they do access data from other threads in a
warp and thus can't be CSE'd.

llvm-svn: 318173
2017-11-14 19:14:00 +00:00
Justin Lebar da9e0bd3a2 [NVPTX] Implement __nvvm_atom_add_gen_d builtin.
Summary:
This just seems to have been an oversight.  We already supported the f64
atomic add with an explicit scope (e.g. "cta"), but not the scopeless
version.

Reviewers: tra

Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D39638

llvm-svn: 317623
2017-11-07 22:10:54 +00:00
Artem Belevich 3bafc2f0d9 [NVPTX] Implemented wmma intrinsics and instructions.
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.

Differential Revision: https://reviews.llvm.org/D38645

llvm-svn: 315601
2017-10-12 18:27:55 +00:00
Peter Collingbourne 081ffe2ff2 Change CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This was a use-after-free waiting to happen.

llvm-svn: 309159
2017-07-26 19:15:29 +00:00
Jonas Paulsson 024e319489 [SystemZ, LoopStrengthReduce]
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.

In order to achieve this, the following common code changes were made:

 * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
 LSR should do instruction-based addressing evaluations by calling
 isLegalAddressingMode() with the Instruction pointers.
 * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
 as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
 not just loads or stores.

SystemZ changes:

 * isLSRCostLess() implemented with Insns first, and without ImmCost.
 * New function supportedAddressingMode() that is a helper for TTI methods
 looking at Instructions passed via pointers.

Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262
https://reviews.llvm.org/D35049

llvm-svn: 308729
2017-07-21 11:59:37 +00:00
Artem Belevich d7a73824e4 [NVPTX] Add lowering of i128 params.
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].

Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
  ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
  [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types

Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)

llvm-svn: 308675
2017-07-20 21:16:03 +00:00
Artem Belevich fef0804e35 Changed EOL back to LF. NFC.
llvm-svn: 308671
2017-07-20 20:57:51 +00:00
Sam Clegg fd5ab25ae1 Remove unneeded use of #undef DEBUG_TYPE. NFC
Where is is needed (at the end of headers that define it), be
consistent about its use.

Also fix a few header guards that I found in the process.

Differential Revision: https://reviews.llvm.org/D34916

llvm-svn: 307840
2017-07-12 20:49:21 +00:00
Michael Kuperstein 20d8e4ef76 Reverting r307326 because it breaks clang tests.
llvm-svn: 307334
2017-07-06 23:24:39 +00:00
Michael Kuperstein b9fc48da83 [NVPTX] Add lowering of i128 params.
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].

Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types

Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)

llvm-svn: 307326
2017-07-06 22:18:54 +00:00
Hiroshi Inoue 79f8933f23 fix trivial typos in comments; NFC
llvm-svn: 307094
2017-07-04 16:35:26 +00:00
Chandler Carruth 6bda14b313 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Simon Pilgrim 55ff57861a [NVPTX] Don't flag StoreParam/LoadParam memory chain operands as ReadMem/WriteMem (PR32146)
Follow up to D33147

NVPTXTargetLowering::LowerCall was trusting the default argument values.

Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33189

llvm-svn: 303082
2017-05-15 17:17:44 +00:00
Simon Pilgrim f8389656e3 [NVPTX] Don't rely on default arguments to SelectionDAG::getMemIntrinsicNode. NFC.
NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues.

llvm-svn: 303047
2017-05-15 10:47:48 +00:00
Simon Pilgrim a1978aaefd [NVPTX] Don't flag StoreRetVal memory chain operands as ReadMem (PR32146)
This fixes 47 of the 75 NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.

Differential Revision: https://reviews.llvm.org/D33147

llvm-svn: 302942
2017-05-12 19:56:43 +00:00
Serge Pavlov d526b13e61 Add extra operand to CALLSEQ_START to keep frame part set up previously
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to  CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.

This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.

The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
  affects all targets that use frame pseudo instructions and touched many
  files although the changes are uniform.
- Access to frame properties are implemented using special instructions
  rather than calls getOperand(N).getImm(). For X86 and ARM such
  replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
  instruction. These involve proper instruction initialization and
  methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
  frame parts initialized inside frame instruction pair and outside it.

The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.

Differential Revision: https://reviews.llvm.org/D32394

llvm-svn: 302527
2017-05-09 13:35:13 +00:00
Simon Pilgrim 98f1d02677 [NVPTX] Add support for ISD::ABS lowering
Use the ISD::ABS opcode directly

Differential Revision: https://reviews.llvm.org/D32944

llvm-svn: 302356
2017-05-06 17:42:09 +00:00
Reid Kleckner f021fab2af [IR] Make getParamAttributes take argument numbers, not ArgNo+1
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.

The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.

NFC

llvm-svn: 300272
2017-04-13 23:12:13 +00:00
Simon Pilgrim 68168d17b9 Spelling mistakes in comments. NFCI.
Based on corrections mentioned in patch for clang for PR27635

llvm-svn: 299072
2017-03-30 12:59:53 +00:00
Reid Kleckner b518054b87 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Artem Belevich 2524a22562 [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.
Differential Revision: https://reviews.llvm.org/D30672

llvm-svn: 297198
2017-03-07 20:33:38 +00:00
Artem Belevich 620db1f3dd [NVPTX] Added support for .f16x2 instructions.
This patch enables support for .f16x2 operations.

Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.

Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310

llvm-svn: 296032
2017-02-23 22:38:24 +00:00
Simon Pilgrim 3b97067ae8 Fix -Wunused-but-set-variable warning by removing unused 'aggregateIsPacked' checking
llvm-svn: 295830
2017-02-22 13:37:31 +00:00
Artem Belevich 29bbdc1c32 [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
  which returns an array of flags marking boundaries of vectorized
  load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
2017-02-21 22:56:05 +00:00
Justin Lebar 06fcea4cd9 [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.

Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.

llvm-svn: 293713
2017-01-31 23:08:57 +00:00
Justin Lebar 1c9692a46f [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.
Summary:

This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.

* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
  when fast-math is enabled.  Previously, we only would emit it for
  calls to @llvm.nvvm.sqrt.f.  (With this patch we no longer emit
  sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
  simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)

* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
  Previously, we only emitted rsqrt.approx when ftz was disabled.

Reviewers: hfinkel

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D28508

llvm-svn: 293605
2017-01-31 05:58:22 +00:00
Justin Lebar 077f8fb168 [NVPTX] Move getDivF32Level, usePrecSqrtF32, and useF32FTZ into out of DAGToDAG and into TargetLowering.
Summary:
DADToDAG has access to TargetLowering, but not vice versa, so this is
the more general location for these functions.

NFC

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28795

llvm-svn: 292693
2017-01-21 01:00:14 +00:00
Artem Belevich 3d3f6190ab [NVPTX] Fix lowering of fp16 ISD::FNEG.
There's no neg.f16 instruction, so negation has to
be done via subtraction from zero.

Differential Revision: https://reviews.llvm.org/D28876

llvm-svn: 292452
2017-01-19 00:14:45 +00:00
Justin Lebar cc938fc197 [NVPTX] Implement min/max in tablegen, rather than with custom DAGComine logic.
Summary:
This change also lets us use max.{s,u}16.  There's a vague warning in a
test about this maybe being less efficient, but I could not come up with
a case where the resulting SASS (sm_35 or sm_60) was different with or
without max.{s,u}16.  It's true that nvcc seems to emit only
max.{s,u}32, but even ptxas 7.0 seems to have no problem generating
efficient SASS from max.{s,u}16 (the casts up to i32 and back down to
i16 seem to be implicit and nops, happening via register aliasing).

In the absence of evidence, better to have fewer special cases, emit
more straightforward code, etc.  In particular, if a new GPU has 16-bit
min/max instructions, we want to be able to use them.

Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D28732

llvm-svn: 292304
2017-01-18 00:09:01 +00:00
Justin Lebar c7d20128bd [NVPTX] Add lowering for llvm.bitreverse.
Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D28720

llvm-svn: 292301
2017-01-18 00:08:10 +00:00
Artem Belevich 64dc9be7b4 [NVPTX] Added support for half-precision floating point.
Only scalar half-precision operations are supported at the moment.

- Adds general support for 'half' type in NVPTX.
- fp16 math operations are supported on sm_53+ GPUs only
  (can be disabled with --nvptx-no-f16-math).
- Type conversions to/from fp16 are supported on all GPU variants.
- On GPU variants that do not have full fp16 support (or if it's disabled),
  fp16 operations are promoted to fp32 and results are converted back
  to fp16 for storage.

Differential Revision: https://reviews.llvm.org/D28540

llvm-svn: 291956
2017-01-13 20:56:17 +00:00
Artem Belevich d109f46573 [NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed.
Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32
instruction even when unsafe FP math was not allowed.

Clang-generated IR is not affected by this as it uses precise sin/cos
from CUDA's libdevice when unsafe math is disabled.

Differential Revision: https://reviews.llvm.org/D28619

llvm-svn: 291936
2017-01-13 18:48:13 +00:00
Eugene Zelenko c9f1f6b8ec [NVPTX] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 291490
2017-01-09 22:16:51 +00:00
Justin Lebar f0a80ba385 [NVPTX] Compute 'rem' using the result of 'div', if possible.
Summary:
In isel, transform

  Num % Den

into

  Num - (Num / Den) * Den

if the result of Num / Den is already available.

Reviewers: tra

Subscribers: hfinkel, llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D26090

llvm-svn: 285461
2016-10-28 21:44:00 +00:00
Peter Collingbourne 6733564e5a Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject.
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.

Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.

Differential Revision: https://reviews.llvm.org/D25917

llvm-svn: 285006
2016-10-24 19:23:39 +00:00
Artem Belevich 3e1211581c [NVPTX] Added intrinsics for atom.gen.{sys|cta}.* instructions.
These are only available on sm_60+ GPUs.

Differential Revision: https://reviews.llvm.org/D24943

llvm-svn: 282607
2016-09-28 17:25:38 +00:00
Jacques Pienaar 98345fc0a1 [NVPTX] Check if callsite is defined when computing argument allignment
Summary: In getArgumentAlignment check if the ImmutableCallSite pointer CS is non-null before dereferencing. If CS is 0x0 fall back to the ABI type alignment else compute the alignment as before.

Reviewers: eliben, jpienaar

Subscribers: jlebar, vchuravy, cfe-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D9168

llvm-svn: 282045
2016-09-21 01:57:57 +00:00
Eric Christopher 4367c7fb9a Move the Mangler from the AsmPrinter down to TLOF and clean up the
TLOF API accordingly.

llvm-svn: 281708
2016-09-16 07:33:15 +00:00
Sanjay Patel b1f0a0f4a8 getValueType().getSizeInBits() -> getValueSizeInBits() ; NFCI
llvm-svn: 281493
2016-09-14 16:05:51 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Justin Lebar b5e884976b [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.
Summary:
Previously these only worked via NVPTX-specific intrinsics.

This change will allow us to convert these target-specific intrinsics
into the general LLVM versions, allowing existing LLVM passes to reason
about their behavior.

It also gets us some minor codegen improvements as-is, from situations
where we canonicalize code into one of these llvm intrinsics.

Reviewers: majnemer

Subscribers: llvm-commits, jholewinski, tra

Differential Revision: https://reviews.llvm.org/D24300

llvm-svn: 281092
2016-09-09 21:07:26 +00:00
Michael Kuperstein 2bc3d4d46c [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129
2016-08-18 20:08:15 +00:00
Artem Belevich b2e76a5e7a [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of byval arguments of device functions to
local space on SASS level and subsequent pointer conversion to generic
address space that follows. Instead, make a local copy in IR, provide
a way to access arguments directly, and let LLVM optimize the copy away
when possible.

Differential Review: https://reviews.llvm.org/D21421

llvm-svn: 276153
2016-07-20 18:39:47 +00:00
Artem Belevich 9f97dcb018 [NVPTX] Make sure we adjust alignment at all call sites
.. including calls from kernel functions that were
ignored by mistake before.

llvm-svn: 275920
2016-07-18 21:58:48 +00:00
Artem Belevich 052b1ed2fd [NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.

The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.

Differential Revision: https://reviews.llvm.org/D22428

llvm-svn: 275893
2016-07-18 19:54:56 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Artem Belevich 4d5d7be8cc Revert r273313 "[NVPTX] Improve lowering of byval args of device functions."
The change causes llvm crash in some unoptimized builds.

llvm-svn: 274163
2016-06-29 20:51:15 +00:00
Justin Holewinski cb29fb4a98 Only emit extension for zeroext/signext arguments if type is < 32 bits
Reviewers: jingyue, jlebar

Subscribers: jholewinski

Differential Revision: http://reviews.llvm.org/D21756

llvm-svn: 273922
2016-06-27 20:22:22 +00:00
Artem Belevich d7ebcfb291 [NVPTX] Improve lowering of byval args of device functions.
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.

Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.

This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).

This reduces call overhead, uses less registers and reduces overall
SASS size by 2-4%.

Differential Review: http://reviews.llvm.org/D21421

llvm-svn: 273313
2016-06-21 20:30:26 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Benjamin Kramer 46e38f3678 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
Craig Topper 33772c5375 [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior.
llvm-svn: 267853
2016-04-28 03:34:31 +00:00
Ahmed Bougacha 128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
Craig Topper 6f8b8e4c45 [NVPTX] Set ctlz_zero_undef to Expand so LegalizeDAG will convert it to ctlz. Remove the now unneccessary isel patterns. NFC
llvm-svn: 267265
2016-04-23 02:49:29 +00:00
Justin Lebar 96418481bc [NVPTX] Add a truncate DAG node to some calls.
Summary:
Previously, we were running afoul of the assertion

  EVT(CLI.Ins[i].VT) == InVals[i].getValueType() && "LowerCall emitted a value with the wrong type!"

in SelectionDAGBuilder.cpp when running the NVPTX/i8-param.ll test.
This is because our backend (for some reason) treats small return values
as i32, but it wasn't ever truncating the i32 back down to the expected
width in the DAG.

Unclear to me whether this fixes any actual bugs -- in this test, at
least, the generated code is unchanged.

Reviewers: jingyue

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: http://reviews.llvm.org/D17872

llvm-svn: 265091
2016-04-01 01:09:10 +00:00
Benjamin Kramer 9415e06da7 [NVPTX] Avoid temporary std::string and make single-use function local to the cpp file.
No functionality change intended.

llvm-svn: 264861
2016-03-30 12:31:51 +00:00
Justin Lebar b5ca00a58d [NVPTX] Use different, convergent MIs for convergent calls.
Summary:
Calls sometimes need to be convergent.  This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.

Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs.  But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.

Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.

Reviewers: jholewinski

Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17423

llvm-svn: 262373
2016-03-01 19:24:03 +00:00
Ahmed Bougacha f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Jingyue Wu 585ec8671d [NVPTX] expand mul_lohi to mul_lo and mul_hi
Summary: Fixes PR26186.

Reviewers: grosser, jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D16479

llvm-svn: 258536
2016-01-22 19:47:26 +00:00
Amjad Aboud d7cfb48485 Added support for macro emission in dwarf (supporting DWARF version 4).
Differential Revision: http://reviews.llvm.org/D15495

llvm-svn: 257060
2016-01-07 14:28:20 +00:00
Duncan P. N. Exon Smith 61149b86c3 NVPTX: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250779
2015-10-20 00:54:09 +00:00
Craig Topper ec15ea12e7 Use std::find instead of manual loop.
llvm-svn: 250624
2015-10-17 21:32:28 +00:00
Benjamin Kramer c5275bdec1 [NVPTX] Remove dead code.
I left helpers that look useful for debugging alone. NFC.

llvm-svn: 250410
2015-10-15 14:45:41 +00:00
Rafael Espindola 284093033f git-clang-format r249548.
Sorry for missing this the first time.

llvm-svn: 249610
2015-10-07 20:32:24 +00:00
Rafael Espindola 30d77777e7 Use non virtual destructors for sections.
llvm-svn: 249548
2015-10-07 13:46:06 +00:00
Bjarke Hammersholt Roune 6c64738e87 [NVPTX] Let NVPTX backend detect integer min and max patterns.
Summary:
Let NVPTX backend detect integer min and max patterns during isel and emit intrinsics that enable hardware support.


Reviewers: jholewinski, meheff, jingyue

Subscribers: arsenm, llvm-commits, meheff, jingyue, eliben, jholewinski

Differential Revision: http://reviews.llvm.org/D12377

llvm-svn: 246107
2015-08-26 23:22:02 +00:00
Mark Heffernan 438ffe5eac Use 32-bit divides instead of 64-bit divides where possible.
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.

llvm-svn: 244684
2015-08-11 22:16:34 +00:00
Craig Topper e3dcce9700 De-constify pointers to Type since they can't be modified. NFC
This was already done in most places a while ago. This just fixes the ones that crept in over time.

llvm-svn: 243842
2015-08-01 22:20:21 +00:00