Commit Graph

5549 Commits

Author SHA1 Message Date
Jay Foad 892ef2e3c0 [AMDGPU] More codegen patterns for v2i16/v2f16 build_vector
It's simpler to do this at codegen time than to do ad-hoc constant
folding of machine instructions in SIFoldOperands.

Differential Revision: https://reviews.llvm.org/D88028
2020-09-22 10:41:38 +01:00
Matt Arsenault 6daddc213f AMDGPU: Don't add frame register to frame pseudos
We no longer treat the frame register like a function argument, so the
problem this avoided is no longer relevant.
2020-09-21 16:18:47 -04:00
Alexander Belyaev 17dc729bd4 Revert "[NFC][ScheduleDAG] Remove unused EntrySU SUnit"
This reverts commit 0345d88de6.

Google's internal backend uses EntrySU; we are looking into removing the
dependency on it.

Differential Revision: https://reviews.llvm.org/D88018
2020-09-21 13:33:05 +02:00
Matt Arsenault 3105d0f84b CodeGen: Move split block utility to MachineBasicBlock
AMDGPU needs this in several places, so consolidate them here.
2020-09-18 14:05:18 -04:00
Matt Arsenault 0576f436e5 AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.
2020-09-18 13:43:01 -04:00
Francis Visoiu Mistrih 0345d88de6 [NFC][ScheduleDAG] Remove unused EntrySU SUnit
EntrySU doesn't seem to be used at all when building the ScheduleDAG.

Differential Revision: https://reviews.llvm.org/D87867
2020-09-18 09:50:47 -07:00
Matt Arsenault 27df165270 Revert "[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel."
This reverts commit c3492a1aa1.

I think this is the wrong strategy and wrong place to do this
transform anyway. Also reverts follow up commit
7d593d0d69.
2020-09-18 09:48:33 -04:00
Mirko Brkusanin ae36c02ad0 [AMDGPU] Set DS alignment requirements to be more strict
Alignment requirements for ds_read/write_b96/b128 for gfx9 and onward are
now the same as for other GCN subtargets. This way we can avoid any
unintentional use of these instructions on systems that do not support dword
alignment and instead require natural alignment.
This also makes 'SH_MEM_CONFIG.alignment_mode == STRICT' the default.

Differential Revision: https://reviews.llvm.org/D87821
2020-09-18 15:26:24 +02:00
Bogdan Graur 7d593d0d69 [amdgpu] Compilation fix for Release
Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D87838
2020-09-17 18:04:53 +02:00
Michael Liao c3492a1aa1 [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.
- Need to lower a COPY from SGPR to VGPR into a real instruction. The
  standard COPY is used where the source and destination are from the
  same register bank, so that we can potentially coalesce them together
  and save one COPY. Considering that, backend optimizations, such as
  CSE, won't handle them. However, a copy from SGPR to VGPR always needs
  materializing into a native instruction, so it should be lowered into a
  real one before other backend optimizations.

Differential Revision: https://reviews.llvm.org/D87556
2020-09-17 11:04:17 -04:00
alex-t 0efbb70b71 [AMDGPU] should expand ROTL i16 to shifts.
The instruction combining pass turns a library rotl implementation into llvm.fshl.i16.
In the selection DAG the intrinsic is turned into an ISD::ROTL node that cannot be selected.
It needs to be expanded to shifts again.
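For illustration, the computation the expansion performs can be written as a small C sketch (assuming a rotate amount reduced modulo 16; this is not the legalizer code itself):

```
#include <stdint.h>

// rotl on i16 expressed with shifts. A rotate by 0 is handled separately to
// avoid the out-of-range 16-bit right shift.
static uint16_t rotl16(uint16_t x, unsigned n) {
  n &= 15;
  if (n == 0)
    return x;
  return (uint16_t)((x << n) | (x >> (16 - n)));
}
```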

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D87618
2020-09-17 17:34:33 +03:00
Simon Pilgrim f026812110 InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.
Move the include down to cpp files with an implicit dependency.
2020-09-17 14:28:42 +01:00
Simon Pilgrim 8adf92e2d1 [AMDGPU] Remove orphan SITargetLowering::LowerINT_TO_FP declaration. NFCI.
Method implementation no longer exists.
2020-09-17 10:45:53 +01:00
Stanislav Mekhanoshin 91f503c3af [AMDGPU] gfx1030 RT support
Differential Revision: https://reviews.llvm.org/D87782
2020-09-16 11:40:58 -07:00
Matt Arsenault 367248956e AMDGPU: Clear offset register when using local stack area
eliminateFrameIndex won't fix up the offset register when the direct
frame index reference is moved to a separate move instruction. Switch
the offset to a base 0 (which it probably should be to begin with).
2020-09-16 12:56:40 -04:00
Jay Foad cb64455faa [AMDGPU] Remove obsolete comment
Obsoleted by e4464bf3d4 "AMDGPU/GlobalISel: Select scalar v2s16 G_BUILD_VECTOR"
2020-09-16 17:03:55 +01:00
Dmitry Preobrazhensky 06d058afec [AMDGPU] Corrected directive to use for ELF weak refs
WeakRefDirective should specify a directive to declare "a global as being a weak undefined symbol".
The directive used by AMDGPU was incorrect - ".weakref" was intended for other purposes.
The correct directive is ".weak", which is already the default for ELF,
so the redefinition was removed.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87762
2020-09-16 18:51:26 +03:00
Mircea Trofin 6e85c3d5c7 [NFC][Regalloc] accessors for 'reg' and 'weight'
Also renamed the fields to follow style guidelines.

Accessors help with readability - weight mutation, in particular,
is easier to follow this way.
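As a rough illustration of the pattern (a sketch with hypothetical names, not the actual LiveInterval change):

```
// Illustrative only: names are hypothetical, not the real LiveInterval code.
class IntervalLike {
  unsigned Reg = 0;    // field renamed per style guide, was accessed as 'reg'
  float Weight = 0.0f; // was accessed directly as 'weight'

public:
  unsigned reg() const { return Reg; }
  float weight() const { return Weight; }
  void setWeight(float W) { Weight = W; }
  void incrementWeight(float Inc) { Weight += Inc; } // mutation is explicit
};
```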

Differential Revision: https://reviews.llvm.org/D87725
2020-09-16 08:28:57 -07:00
Matt Arsenault 71131db689 AMDGPU: Improve <2 x i24> arguments and return value handling
This was asserting for GlobalISel. For SelectionDAG, this was being
passed on the stack. Instead, scalarize this as if it were a
32-bit vector.
2020-09-16 11:21:56 -04:00
Sebastian Neubauer 833b3b0d3a [AMDGPU] Add v3f16/v3i16 support to SDag
Fix lowering and instruction selection for v3x16 types
and enable InstCombine to emit them.

This patch only implements it for the selection dag.
GlobalISel tests in GlobalISel/llvm.amdgcn.image.load.1d.d16.ll and
GlobalISel/llvm.amdgcn.image.store.2d.d16.ll still don't work.

Differential Revision: https://reviews.llvm.org/D84420
2020-09-16 17:20:27 +02:00
Jay Foad 90777e2924 [AMDGPU] Enable scheduling around FP MODE-setting instructions
Pre-gfx10 all MODE-setting instructions were S_SETREG_B32 which is
marked as having unmodeled side effects, which makes the machine
scheduler treat it as a barrier. Now that we have proper implicit $mode
operands we can use a no-side-effects S_SETREG_B32_mode pseudo instead
for setregs that only touch the FP MODE bits, to give the scheduler more
freedom.

Differential Revision: https://reviews.llvm.org/D87446
2020-09-16 16:10:47 +01:00
Stanislav Mekhanoshin 277de43d88 [AMDGPU] Unify intrinsic ret/nortn interface
We have a single noret intrinsic and a lot of special handling
around it. Declare it just like any other, but do not define rtn
instructions for it.

Differential Revision: https://reviews.llvm.org/D87719
2020-09-15 15:26:42 -07:00
Simon Pilgrim 97a23ab28a AMDGPUPrintfRuntimeBinding.cpp - drop unnecessary casts/dyn_casts. NFCI.
GetElementPtrInst::Create returns a GetElementPtrInst* so we don't need to cast. Similarly IntegerType inherits from the Type base class.

Also, I've used auto* in a few places to cleanup the code.

Helps fix some clang-tidy warnings which saw the dyn_casts and warned that these can return null.
2020-09-15 14:49:04 +01:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Austin Kerbow f859c30ecb [AMDGPU] Add XDL resource to scheduling model
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87621
2020-09-14 13:48:54 -07:00
Eric Astor 20201dc76a [ms] [llvm-ml] Add support for size queries in MASM
Add support for size inference, sizeof, typeof, and lengthof.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86947
2020-09-14 14:27:06 -04:00
Jay Foad c799f873cb [AMDGPU] Don't cluster stores
Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.

The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.

Differential Revision: https://reviews.llvm.org/D85530
2020-09-14 13:40:17 +01:00
Petar Avramovic 6e2a86ed5a AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN
Check for NoNaNsFPMath function attribute in isKnownNeverSNaN.
Function attributes are held in 'TargetMachine.Options'.
Among other things, this allows selection of some patterns imported
in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN
returns true in lowerFMinNumMaxNum.

However, we notice some incorrect results since function attributes are
not correctly updated in TargetMachine.Options when the next function is
processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0:
it has "no-nans-fp-math"="false", but TargetMachine.Options still has it
set to true because the first function in the test file had this attribute
set to true. This will be fixed in D87511.

Differential Revision: https://reviews.llvm.org/D87456
2020-09-14 12:11:00 +02:00
Petar Avramovic 09b8871f8d AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands
Predicates with 'let PredicateCodeUsesOperands = 1' want to examine
matched operands. When we encounter predicate code that uses operands,
analyze its named operand arguments and create a map between argument
index and name. Later, when a leaf node with a name is encountered, emit
GIM_RecordNamedOperand, which will store that operand at its argument
index in the operand list. This operand list will be passed as an argument
to the C++ code of the predicate.

Differential Revision: https://reviews.llvm.org/D87285
2020-09-14 10:39:56 +02:00
Matt Arsenault e15215e041 AMDGPU: Hoist check for VGPRs 2020-09-09 19:45:40 -04:00
Matt Arsenault 85490874b2 AMDGPU: Skip all meta instructions in hazard recognizer
This was not adding a necessary nop because it thought the kill instruction counted.
2020-09-09 19:45:40 -04:00
Matt Arsenault 82cbc9330a AMDGPU: Fix inserting waitcnts before kill uses 2020-09-09 19:45:40 -04:00
dfukalov c259d3a061 [AMDGPU] Fix for folding v2.16 literals.
It was found that some packed immediate operands (e.g. `<half 1.0, half 2.0>`) were
incorrectly processed, so one of the two packed values was lost.

Introduced a new function to check whether an immediate 32-bit operand can be folded.
Converted the condition on the current op_sel flags value to a fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

Like SelectionDAG, this is enabled only when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Amara Emerson cc76da7ada [GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less.
This combine previously tried to take sequences like:
  %cond = G_ICMP pred, a, b
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.

I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.

Differential Revision: https://reviews.llvm.org/D86664
2020-09-09 13:08:16 -07:00
Jay Foad 649bde488c [AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter
NFC.
2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky 95b7040e43 [AMDGPU][MC] Improved diagnostic messages for invalid registers
Corrected parser to issue meaningful error messages for invalid and malformed registers.

See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87234
2020-09-09 16:44:03 +03:00
Ronak Chauhan f078577f31 Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit 487a805310.

Tests fail on big endian machines.
2020-09-09 18:01:28 +05:30
Mirko Brkusanin 43af2a6faa [AMDGPU] Workaround for LDS Misalignment bug on GFX10
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
2020-09-09 11:46:09 +02:00
Ronak Chauhan 487a805310 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder, jhenderson, kzhuravl

Differential Revision: https://reviews.llvm.org/D80713
2020-09-08 21:26:11 +05:30
alex-t 2480a31e5d [AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block
optimizeEndCF removes the EXEC-restoring instruction in the case where it is the only instruction in the block besides the branch to the single successor, and that successor contains an EXEC mask restoring instruction that was lowered from an END_CF belonging to an IF_ELSE.
As a result of such an optimization we get a basic block whose only instruction is a branch to the single successor.
In case control flow can reach such an empty block from S_CBRANCH_EXECZ/EXECNZ, it might happen that spill/reload instructions inserted later by the register allocator are placed under an exec == 0 condition and never execute.
Removing the empty block solves the problem.

This change requires further work to re-implement LIS updates. Currently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D86634
2020-09-07 19:37:27 +03:00
vnalamot aff94ec0f4 [AMDGPU] Remove the dead spill slots while spilling FP/BP to memory
During the PEI pass, the dead TargetStackID::SGPRSpill spill slots
are not being removed while spilling the FP/BP to memory.

Fixes: SWDEV-250393

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87032
2020-09-06 07:04:25 +05:30
Jonas Paulsson 714ceefad9 [SelectionDAG] Always intersect SDNode flags during getNode() node memoization.
Previously SDNodeFlags::intersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.

This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.

This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.

There is more that needs to be done in this area as discussed here:

Differential Revision: https://reviews.llvm.org/D86871

Review: Ulrich Weigand, Sanjay Patel
2020-09-05 10:30:38 +02:00
Matt Arsenault 3c2a7bd286 AMDGPU: Remove code to handle tied si_else operands
This has not used tied operands for a long time.
2020-09-03 19:46:05 -04:00
Michael Liao bf41c4d29e [codegen] Ensure target flags are cleared/set properly. NFC.
- When an operand is changed into an immediate value or the like, ensure its
  target flags are cleared or set properly.

Differential Revision: https://reviews.llvm.org/D87109
2020-09-03 18:37:39 -04:00
Simon Pilgrim 1673a08044 SelectionDAG.h - remove unnecessary FunctionLoweringInfo.h include. NFCI.
Use forward declarations and move the include down to dependent files that actually use it.

This also exposes a number of implicit dependencies on KnownBits.h
2020-09-03 18:33:25 +01:00
Dmitry Preobrazhensky ecde200209 [AMDGPU][MC] Corrected parser to avoid generation of excessive error messages
Summary of changes:
- Changed parser to eliminate generation of excessive error messages;
- Corrected lit tests to match all expected error messages;
- Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:");
- Added missing checks and fixed some typos in tests.

See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D86940
2020-09-02 19:42:18 +03:00
Jay Foad 4bdab2e86a [AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.

Differential Revision: https://reviews.llvm.org/D86938
2020-09-02 10:55:55 +01:00
Michael Liao 1f4e7463b5 [amdgpu] Run SROA after loop unrolling.
Summary: - There are promotable `alloca`s after loop unrolling.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84252
2020-09-01 16:09:56 -04:00
Matt Arsenault 9145d75226 AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples
The implicit def of the super register would appear to kill any live
uses of components before the spill, and would be deleted by
MachineCopyPropagation. We need to add implicit uses of the super
register, similarly to what copyPhysReg does. VGPR tuples appear to be
correctly handled already. I need to double check the SGPR->memory
path.
2020-08-28 17:50:37 -04:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None, it's sufficient to just check that pImpl is null, which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
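A minimal sketch of the idea (the class and method names are illustrative, not the exact helper added here):

```
// Sketch only: "has kind None" reduces to a null check on the impl pointer,
// which is cheap enough to be inlined into callers.
class AttributeSketch {
  void *pImpl = nullptr; // stands in for AttributeImpl *

public:
  bool isValid() const { return pImpl != nullptr; } // false <=> AttrKind None
};
```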

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Matt Arsenault af1c1e20f4 AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize 2020-08-27 19:39:44 -04:00
Matt Arsenault 9d3dc276a6 AMDGPU: Fix broken switch braces 2020-08-27 19:39:39 -04:00
Matt Arsenault a1bc37c9e5 AMDGPU: Use caller subtarget, not intrinsic declaration
Intrinsic declarations use the default subtarget, but this should be
using the subtarget for the calling function. I haven't been able to
come up with a case where it matters though.
2020-08-27 16:42:09 -04:00
Matt Arsenault 6c770a09be AMDGPU: Hoist subtarget lookup 2020-08-27 10:27:56 -04:00
Jay Foad 45eeb8c2a9 [AMDGPU] Remove unused variable introduced in r251860 2020-08-27 13:28:32 +01:00
Piotr Sobczak 4e9d207117 [AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK
There is no justification for changing vcc_lo to vcc
when shrinking V_CNDMASK, and such a change could
later confuse live variable analysis.

Make sure the original register is preserved.

Differential Revision: https://reviews.llvm.org/D86541
2020-08-27 10:22:50 +02:00
Matt Arsenault f78687df9b AMDGPU: Don't assert on misaligned DS read2/write2 offsets
This would assert with unaligned DS access enabled. The offset may not
be aligned. Theoretically the pattern predicate should check the
memory alignment, although it is possible to have the memory be
aligned but not the immediate offset.

In this case I would expect it to use ds_{read|write}_b64 with
unaligned access, but am not clear if there's a reason it doesn't.
2020-08-26 14:08:05 -04:00
Jay Foad a75e67b3b4 [AMDGPU] Make more use of Subtarget reference in SIInstrInfo 2020-08-26 15:04:00 +01:00
Matt Arsenault ff34116cf0 AMDGPU: Use Subtarget reference in SIInstrInfo 2020-08-26 09:18:41 -04:00
Matt Arsenault 21ccedc24f AMDGPU/GlobalISel: Tolerate negated control flow intrinsic outputs
If the condition output is negated, swap the branch targets. This is
similar to what SelectionDAG does for when SelectionDAGBuilder
decides to invert the condition and swap the branches.

This is leaving behind a dead constant def for some reason.
2020-08-26 08:58:54 -04:00
Jay Foad 831457c6d5 [AMDGPU][GlobalISel] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.

This is the same optimization that was implemented for SelectionDAG in
D31731.

Differential Revision: https://reviews.llvm.org/D86609
2020-08-26 13:47:51 +01:00
Stanislav Mekhanoshin b7760c3e5d [AMDGPU] Remove unsound dependency on ISA version in waitcnt
Differential Revision: https://reviews.llvm.org/D86566
2020-08-25 14:01:42 -07:00
Stanislav Mekhanoshin 817c831f02 [AMDGPU] Switch to named simm16 in vscnt insertion
Differential Revision: https://reviews.llvm.org/D86568
2020-08-25 13:05:27 -07:00
Matt Arsenault 0d2fe90063 AMDGPU/GlobalISel: Use more accurate legality rules for merge/unmerge
Most notably, we were incorrectly reporting <3 x s16> as a legal type
for these. Make sure these aren't legal to help make progress on
fixing the artifact combiner and vector legalizer
rules. Unfortunately, this means spreading the -global-isel-abort=0
hack, although this doesn't change the legalizer result in any
situation.
2020-08-25 09:40:20 -04:00
Matt Arsenault ef8f3b5a78 AMDGPU/GlobalISel: Apply bitcast load/store hack to pointer vectors
The selection patterns will currently fail on these.
2020-08-25 09:37:41 -04:00
Matt Arsenault 77e5a195f8 AMDGPU/GlobalISel: Handle AGPRs used for SGPR operands.
We would still need to waterfall if the value were somehow an AGPR,
and also need to explicitly copy to a VGPR.
2020-08-24 17:54:34 -04:00
Matt Arsenault 75e6f0b3d4 AMDGPU: Add flag to disable promotion of uniform i16 ops
This interferes with GlobalISel's much better handling of the
situation.

This should really be disabled for GlobalISel. However, the fallback
only re-runs the selection passes, and doesn't go back and rerun any
codegen IR passes. I haven't come up with a good solution to this
problem.
2020-08-24 14:39:27 -04:00
Matt Arsenault 62d1fb828f AMDGPU/GlobalISel: Use unmerge instead of extract in addrspace queries
This is a bit more consistent with regular operation legalization.
2020-08-24 11:07:51 -04:00
Matt Arsenault 70cd9f5b77 AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr
Handle workitem intrinsics. There isn't really a way to adequately test
this right now, since none of the known bits users are fine grained
enough to test the edge conditions. This triggers a number of
instances of the new 64-bit to 32-bit shift combine in the existing
tests.
2020-08-24 09:53:27 -04:00
Matt Arsenault e1644a3779 GlobalISel: Reduce G_SHL width if source is extension
shl ([sza]ext x, y) => zext (shl x, y).

Turns expensive 64 bit shifts into 32 bit if it does not overflow the
source type:
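A C-level sketch of why the narrowing is sound under the no-overflow condition (illustrative only, not the combine implementation):

```
#include <stdint.h>

// When x and (x << y) both fit in 32 bits, the 64-bit shift of the extended
// value equals the zero-extension of the 32-bit shift.
uint64_t shl_wide(uint32_t x, unsigned y) {
  return (uint64_t)x << y; // expensive 64-bit shift
}

uint64_t shl_narrow(uint32_t x, unsigned y) {
  return (uint64_t)(x << y); // 32-bit shift, then zero-extend
}
```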

This is a port of an AMDGPU DAG combine added in
5fa289f0d8. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.

TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.

Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to try to repeatedly
query the legalizer info and guess at what type to use for the shift.
2020-08-24 09:42:40 -04:00
Stanislav Mekhanoshin 9a9a092e61 [AMDGPU] Avoid sorting stalls in regbank-reassign
This is the slowest operation in the already slow pass.
Instead of sorting just put a stall list into an ordered
map.

Differential Revision: https://reviews.llvm.org/D86253
2020-08-21 11:49:41 -07:00
Mirko Brkusanin 0654ff703d [AMDGPU] Use ds_read/write_b96/b128 when possible for SDag
Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.

Differential Revision: https://reviews.llvm.org/D84403
2020-08-21 12:26:31 +02:00
Mirko Brkusanin d17ea67b92 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Mirko Brkusanin f5cd7ec9f3 [AMDGPU] Reorganize GCN subtarget features for unaligned access
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can
now be used to quickly check both.

Differential Revision: https://reviews.llvm.org/D84522
2020-08-21 12:26:31 +02:00
Mirko Brkusanin 5bd1febe21 [AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
an alignment of 4 is required.

Differential Revision: https://reviews.llvm.org/D82788
2020-08-21 12:26:31 +02:00
Jay Foad 98de0d22f5 [AMDGPU] Apply llvm-prefer-register-over-unsigned from clang-tidy 2020-08-21 10:14:35 +01:00
Michael Liao 5257a60ee0 [amdgpu] Add codegen support for HIP dynamic shared memory.
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
  the dynamic shared memory, whose size is not known at
  compile time.
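For context, a hypothetical HIP kernel using this pattern might look like the following (illustrative only):

```
#include <hip/hip_runtime.h>

// Dynamic LDS: the array is declared unsized here; its actual size is the
// third kernel-launch parameter, so it is unknown when the kernel is compiled.
extern __shared__ float lds[];

__global__ void scaleViaLds(float *out, float f) {
  unsigned i = threadIdx.x;
  lds[i] = f * i;
  __syncthreads();
  out[i] = lds[i];
}
```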

Reviewers: arsenm, yaxunl, kpyzhov, b-sumner

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82496
2020-08-20 21:29:18 -04:00
Matt Arsenault 79ce9bb380 CodeGen: Don't drop AA metadata when splitting MachineMemOperands
Assuming this is used to split a memory access into smaller pieces,
the new access should still have the same aliasing properties as the
original memory access. As far as I can tell, this wasn't
intentionally dropped. It may be necessary to drop this if you are
moving the operand outside of the bounds of the original object in
such a way that it may alias another IR object, but I don't think any
of the existing users are doing this. Some of the uses widen into
unused alignment padding, which I think is OK.
2020-08-20 16:17:30 -04:00
Matt Arsenault 18b218007d AMDGPU/GlobalISel: Legalize odd sized loads with widening
Custom lower and widen odd sized loads up to the alignment. The
default set of legalization actions doesn't have a way to represent
this. This fixes naturally aligned <3 x s8> and <3 x s16> loads.

This also starts moving towards eliminating the buggy and
overcomplicated legalization rules for narrowing. All the memory size
changes should be done in the lower or custom action, not NarrowScalar
/ FewerElements. These currently have redundant and ambiguous code
with the lower action.
2020-08-20 16:15:53 -04:00
vnalamot 54d8ded4b1 allSGPRSpillsAreDead() should use actual FP/BP frame indices
The SGPR spills happen in SILowerSGPRSpills(), and allSGPRSpillsAreDead()
makes sure there are no SGPR spills pending during PEI. But the FP/BP
spills happen during PEI and are exceptions.

Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to
accommodate the exceptions.

Differential Revision: https://reviews.llvm.org/D86291
2020-08-20 16:15:53 -04:00
Jay Foad 3497860203 [AMDGPU] Remove uses of Register::isPhysicalRegister/isVirtualRegister
... in favour of the isPhysical/isVirtual methods.
2020-08-20 17:59:11 +01:00
Sebastian Neubauer b8d1994778 [AMDGPU] Add A16/G16 to InstCombine
When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This only happens if the target hardware supports it.

An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.

Differential Revision: https://reviews.llvm.org/D85887
2020-08-20 10:51:49 +02:00
dfukalov 33e2f69a24 [AMDGPU][LoopUnroll] Increase BB size to analyze for complete unroll.
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for a simple loop like
```
for(int i=0; i<32; ++i){
  D = Arr2[i*8 + C1];
  Arr1[i*64 + C2] += C3 * D;
  Arr1[i*64 + C2 + 2048] += C4 * D;
}

```
the current default parameter value is not enough to run a deeper cost analysis, so the
loop is not completely unrolled.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D86248
2020-08-20 10:41:47 +03:00
Matt Arsenault 31adc28d24 GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources
This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.
2020-08-19 18:53:24 -04:00
Matt Arsenault 9e8d59a9b8 AMDGPU/GlobalISel: Remove hack for combines forming illegal extloads
Previously we weren't adding the LegalizerInfo to the post-legalizer
combiner. Since that's fixed, we don't need to try to filter out the
one case that was breaking.
2020-08-19 14:15:38 -04:00
Ronak Chauhan fdf71d486c Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit cacfb02d28.

Reverting due to buildbot failures.
2020-08-19 13:12:29 +05:30
Ronak Chauhan cacfb02d28 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D80713
2020-08-19 08:49:07 +05:30
Changpeng Fang e7081d117a AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc
Summary:
  When the resource descriptor is a VGPR, we need a waterfall loop
to read it into an SGPR. In this patch we generalized the implementation
to work for any register class size, and extended the work to MIMG
instructions.

Fixes: SWDEV-223405

Reviewers:
  arsenm, nhaehnle

Differential Revision:
  https://reviews.llvm.org/D82603
2020-08-18 16:27:36 -07:00
Matt Arsenault 5a15f6628e GlobalISel: Implement fewerElementsVector for G_INSERT_VECTOR_ELT
Add unit tests since AMDGPU will only trigger this for gigantic
vectors, and won't use the annoying odd sized breakdown case.
2020-08-18 13:51:19 -04:00
Matt Arsenault 2f5f5febf3 AMDGPU/GlobalISel: Select llvm.amdgcn.groupstaticsize
Previously, it would successfully select and assert if not HSA or PAL
when expanding the pseudoinstruction. We don't need the
pseudoinstruction anymore since we know the total size after
legalization.
2020-08-18 09:28:01 -04:00
Matt Arsenault 3ba7777b94 AMDGPU/GlobalISel: Fix selection of s1/s16 G_[F]CONSTANT
The code to determine the value size was overcomplicated and only
correct in the case where the result register already had a register
class assigned. We can always take the size directly from the
register's type.
2020-08-18 09:28:01 -04:00
Matt Arsenault a9ee0589a8 AMDGPU/GlobalISel: Match global saddr addressing mode 2020-08-17 15:48:06 -04:00
Matt Arsenault e1a2f4713c AMDGPU: Match global saddr addressing mode
The previous implementation was incorrect, and based off incorrect
instruction definitions. Unfortunately we can't match natural
addressing in a lot of cases due to the shift/scale applied in
getelementptrs. This relies on reducing the 64-bit shift to 32-bits.
2020-08-17 15:28:14 -04:00
Stanislav Mekhanoshin 24182f14b6 [AMDGPU] Define spill opcodes for all AGPR sizes
Since we have defined all these sizes I believe we shall be
able to spill these as well.

Differential Revision: https://reviews.llvm.org/D86098
2020-08-17 12:17:23 -07:00
Matt Arsenault c8a9872259 AMDGPU/GlobalISel: Look through copies in getPtrBaseWithConstantOffset
We may have an SGPR->VGPR copy if a totally uniform pointer
calculation is used for a VGPR pointer operand.

Also hack around a bug in MUBUF matching which would incorrectly use
MUBUF for global when flat was requested. This should really be a
predicate on the parent pattern, but the DAG always checked this
manually inside the complex pattern.
2020-08-17 12:31:38 -04:00
Steven Perron eed6476a87 Reset PAL metadata when AMDGPU target stream finishes
If the same stream object is used for multiple compiles, the PAL metadata from earlier compilations will leak into later ones. See https://github.com/GPUOpen-Drivers/llpc/issues/882 for how this is happening in LLPC.

No tests were added because multiple compiles will have to happen using the same pass manager, and I do not see a setup for that on the LLVM side.  Let me know if there is a good way to test this.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D85667
2020-08-17 10:56:11 -04:00
Matt Arsenault c7b9cd31bf AMDGPU/GlobalISel: Fix missing 256-bit AGPR mapping 2020-08-17 09:53:26 -04:00
Matt Arsenault af162ac785 AMDGPU/GlobalISel: Fix using readfirstlane with ballot intrinsics
This should use the default mapping and insert a copy to the vcc bank,
and not try to insert a readfirstlane.
2020-08-17 09:53:25 -04:00
Matt Arsenault da3f357de6 AMDGPU: Don't look at dbg users for foldable operands
These would have always failed to fold, so checking them or adding
them to the fold candidates is useless.
2020-08-17 09:53:25 -04:00
Matt Arsenault 66ffa0e91f AMDGPU/GlobalISel: Fix using post-legal combiner without LegalizerInfo 2020-08-17 09:19:22 -04:00
Matt Arsenault e0375dbcb3 AMDGPU: Fix using wrong offsets for global atomic fadd intrinsics
Global instructions have the signed offsets.
2020-08-17 09:19:15 -04:00
Matt Arsenault f0af434b79 AMDGPU: Remove register class params from flat memory patterns 2020-08-15 12:12:33 -04:00
Matt Arsenault a7455652c0 AMDGPU: Fix global atomic saddr operand class 2020-08-15 12:12:28 -04:00
Matt Arsenault 625db2fe5b AMDGPU: Remove slc from flat offset complex patterns
This was always set to 0. Use a default value of 0 in this context to
satisfy the instruction definition patterns. We can't unconditionally
use SLC with a default value of 0 due to limitations in TableGen's
handling of defaulted operands when followed by non-default operands.
2020-08-15 12:12:24 -04:00
Matt Arsenault e5077b5c2a AMDGPU: Fix matching wrong offsets for global atomic loads
These used signed offsets with a different size.
2020-08-15 12:12:17 -04:00
Matt Arsenault 8cb022982a AMDGPU: Remove redundant FLAT complex patterns
These were identical to the non-atomic cases. I'm not sure why these
were ever separated.
2020-08-15 12:12:01 -04:00
Matt Arsenault 47af1ac69a AMDGPU: Correct definitions for global saddr instructions
The VGPR component is a 32-bit offset, not 64-bits.

I'm not sure what the correct syntax is for this. This maintains the
vaddr position and leaves saddr in the end "off" position. This is
particularly terrible for stores, since the operand order is now <vgpr
offset>, <data>, <sgpr base>, splitting the pointer operands. I
suppose this is a logical consequence from the mistake of not putting
the data operand first. I'm not sure what sp3 does.
2020-08-15 12:11:57 -04:00
Matt Arsenault 79298a5067 AMDGPU: Remove SIFixupVectorISel pass
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or relying on DAG divergence information for the base
address.
2020-08-15 12:11:51 -04:00
Stanislav Mekhanoshin 43a38dc251 [AMDGPU] Fix MAI ld/st hazard handling
It did not process the hazard for ds_permute because it does not
load or store, even though it is a DS instruction.

Differential Revision: https://reviews.llvm.org/D86003
2020-08-14 17:07:37 -07:00
Craig Topper c7a0b2684f [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support for a tune CPU. Each CPU now has two sets of features stored in its GenSubtargetInfo.inc tables. These feature lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86 targets. This annoyingly increases the size of static tables on all targets as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Matt Arsenault 40a142fa57 AMDGPU/GlobalISel: Match andn2/orn2 for more types
Unfortunately this ends up not working as expected on targets with
16-bit operations due to AMDGPUCodeGenPrepare's promotion of uniform
16-bit ops to i32.

The vector case annoyingly requires switching the checked opcode,
since constants for vectors aren't directly handled.

I also need to think more carefully about whether this is valid for i1.
2020-08-14 13:18:03 -04:00
Sebastian Neubauer 9aa0ff77bd [AMDGPU] Enable .rodata for amdpal os
PAL recently got support for multiple ELF sections and relocations,
therefore we can now use .rodata sections instead of forcing constants
into .text.

Differential Revision: https://reviews.llvm.org/D85895
2020-08-14 09:05:48 +02:00
Austin Kerbow 7d1cb187fb [AMDGPU] Fix FP/BP spills when MUBUF constant offset exceeded
If we need a scratch register for the spill don't use the same scratch
register that is being used for the MUBUF offset.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85772
2020-08-13 14:12:00 -07:00
Stanislav Mekhanoshin 0462aef5f3 [AMDGPU] Inhibit SDWA if target instruction has FI
Differential Revision: https://reviews.llvm.org/D85918
2020-08-13 11:34:28 -07:00
Stanislav Mekhanoshin d25cb5a8a2 [AMDGPU] Fix misleading SDWA verifier error. NFC.
The old error from GFX9 shall be updated to GFX9+.
2020-08-13 11:32:17 -07:00
Carl Ritson d538c5837a [AMDGPU] Fix missed SI_RETURN_TO_EPILOG in pre-emit peephole
SIPreEmitPeephole does not process all terminators, which means
it can fail to handle SI_RETURN_TO_EPILOG if immediately preceded
by a branch to the early exit block.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85872
2020-08-13 21:52:41 +09:00
Ruiling Song 18b1e67523 [AMDGPU] Fix crash when dag-combining bitcast
From the code after the 'break', it is processing 64-bit scalar and
vector bitcasts, so I think the break condition should be (cond1 || cond2).
This means we only execute the following code if (64-bit and dest-is-vector).

Also remove a previous fix (introduced in 1349a04ef5) which is not needed
with this new fix.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85804
2020-08-13 10:23:13 +08:00
Matt Arsenault e14474a39a AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.fadd
Remove the intermediate transform in the DAG path. I believe this is
the last non-deprecated intrinsic that needs handling.
2020-08-12 10:04:53 -04:00
Matt Arsenault 701228c411 AMDGPU: Handle intrinsics in performMemSDNodeCombine
This avoids a possible regression in a future patch
2020-08-12 10:04:53 -04:00
Matt Arsenault 0dc4c36d3a AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane
Fixup the special case constant bus handling pre-gfx10.
2020-08-11 11:56:16 -04:00
Matt Arsenault 076305568c AMDGPU/GlobalISel: Prepare for more custom load lowerings
Slight restructuring of the code to avoid formatting changes when more
cases are handled here.
2020-08-11 11:09:05 -04:00
Matt Arsenault e2f1b48f86 GlobalISel: Implement bitcast action for G_INSERT_VECTOR_ELT
This mirrors the support for the equivalent extracts. This also
creates a huge mess that would be greatly improved if we had any bit
operation combines.
2020-08-11 10:39:14 -04:00
Matt Arsenault 53f21e0fb7 TableGen/GlobalISel: Hack the operand order for atomic_store
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.

Some work started in 20aafa3156 to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is pretty much the last
operation that matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.

Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
2020-08-11 10:22:44 -04:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
Matt Arsenault 6fe6b29c29 AMDGPU: Fix assertion in performSHLPtrCombine for 64-bit pointers 2020-08-10 13:46:52 -04:00
Matt Arsenault 68fab44acf AMDGPU: Fix visiting physreg dest users when folding immediate copies
This can fold the immediate into the physical destination, but this
should not look for further users of the register. Fixes regression
introduced by 766cb615a3.
2020-08-10 13:46:51 -04:00
Matt Arsenault 40188f807d AMDGPU/GlobalISel: Don't try to handle undef source operand
This is now illegal MIR
2020-08-10 08:49:43 -04:00
Matt Arsenault a0ec81f70d AMDGPU/GlobalISel: Merge load/store select cases 2020-08-10 08:46:26 -04:00
Matt Arsenault c8b17874e5 AMDGPU/GlobalISel: Fix typo 2020-08-10 08:41:17 -04:00
Matt Arsenault 9533f0ea68 AMDGPU/GlobalISel: Use nicer form of buildInstr 2020-08-10 08:41:07 -04:00
Petar Avramovic 0d58d9e8fb AMDGPU/GlobalISel: Lower G_FREM
Add custom lower for G_FREM.

Differential Revision: https://reviews.llvm.org/D84324
2020-08-10 10:10:46 +02:00
Piotr Sobczak 62d8b8a225 Fix 64-bit copy to SCC
Fix 64-bit copy to SCC by restricting the pattern resulting
in such a copy to subtargets supporting 64-bit scalar compare,
and mapping the copy to S_CMP_LG_U64.

Before introducing the S_CSELECT pattern with explicit SCC
(0045786f14), there was no need
for handling 64-bit copy to SCC ($scc = COPY sreg_64).

The proposed handling to read only the low bits was however
based on a false premise that it is only one bit that matters,
while in fact the copy source might be a vector of booleans and
all bits need to be considered.

The practical problem of mapping the 64-bit copy to SCC is that
the natural instruction to use (S_CMP_LG_U64) is not available
on old hardware. Fix it by restricting the problematic pattern
to subtargets supporting the instruction (hasScalarCompareEq64).

Differential Revision: https://reviews.llvm.org/D85207
2020-08-09 20:50:30 +02:00
Matt Arsenault 3c0597a9e4 AMDGPU: Avoid explicitly listing all the memory nodes 2020-08-07 19:22:46 -04:00
Vang Thao 04bd5b5286 [AMDGPU] Fix not rescheduling without clustering
Regions which should be rescheduled without memory op clustering are
sometimes skipped. RegionIdx is not incremented when iterating over regions that
are flagged to be skipped, causing the index to be incorrect.

Thanks to Vang Thao for discovering this bug!

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D85498
2020-08-07 11:15:58 -07:00
Bevin Hansson 5de6c56f7e [Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
Summary:
This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat,
which perform signed and unsigned saturating left shift,
respectively.
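A scalar reference sketch of the signed variant's semantics, assuming a 32-bit element and a shift amount below the bit width (not LLVM's implementation):

```
#include <limits.h>
#include <stdint.h>

// Saturating signed left shift: compute the exact result in a wider type,
// then clamp to INT32_MIN/INT32_MAX if it does not fit. Assumes s < 32.
int32_t sshl_sat_i32(int32_t x, unsigned s) {
  int64_t wide = (int64_t)x * ((int64_t)1 << s);
  if (wide > INT32_MAX)
    return INT32_MAX;
  if (wide < INT32_MIN)
    return INT32_MIN;
  return (int32_t)wide;
}
```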

These are useful for implementing the Embedded-C fixed point
support in Clang, originally discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
and
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

Reviewers: leonardchan, craig.topper, bjope, jdoerfert

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83216
2020-08-07 15:09:24 +02:00
Matt Arsenault 87b2af8140 AMDGPU/GlobalISel: Enable s_{and|or}n2_{b32|b64} patterns 2020-08-06 18:00:38 -04:00
dfukalov 4ccc38813e [AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation.
Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model.
Also added operations with contract attribute.

Fixed line endings in test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84995
2020-08-06 21:43:27 +03:00
Matt Arsenault e00201539f GlobalISel: Implement fewerElementsVector for G_EXTRACT_VECTOR_ELT
Use the same basic strategy as LegalizeVectorTypes. Try to index into
smaller pieces if there's a constant index, and otherwise fall back to
a stack temporary.
2020-08-06 14:33:16 -04:00
Matt Arsenault 1a0c0944c6 AMDGPU: Define raw/struct variants of buffer atomic fadd
Somehow the new FP atomic buffer intrinsics ended up using the legacy
style for buffer intrinsics.
2020-08-06 13:36:19 -04:00
Matt Arsenault 90eb7d5283 AMDGPU: Fix spilling of 96-bit AGPRs 2020-08-06 12:42:07 -04:00
Matt Arsenault 56270d1d42 AMDGPU/GlobalISel: Start trying to handle AGPR bank
Try to use AGPR banks for the various merge/unmerge type
operations. Previously these would introduce copies to VGPR.
2020-08-06 12:39:50 -04:00
Matt Arsenault 34040a4f61 GlobalISel: Define InvalidRegBankID enum value 2020-08-06 12:39:49 -04:00
Matt Arsenault 63cdc9a49f AMDGPU/GlobalISel: Handle llvm.amdgcn.ds.{fadd|fmin|fmax}
These intrinsics are missing mangling for both the pointer and data
type.
2020-08-06 11:09:08 -04:00
Matt Arsenault 63c4be53cf AMDGPU/GlobalISel: Try to promote to use packed saturating add/sub
This produces worse results right now for i8 vectors, but that should
be addressed when we actually try to optimize packed vectors.
2020-08-06 11:08:45 -04:00
Matt Arsenault dcf3ffb0a8 AMDGPU/GlobalISel: Move frame index selection to patterns
Doesn't really save any code until global value is handled too.
2020-08-06 10:42:15 -04:00
Matt Arsenault d188a608bd AMDGPU: Fix code duplication between the selectors
Not sure this is the right place for this helper.
2020-08-06 10:42:15 -04:00
Matt Arsenault 5a503521e7 AMDGPU/GlobalISel: Implement expansion for rsq.clamp
Not sure why we handle this removed instruction on newer subtargets
for this one and no others, but maintain compatibility with the DAG.
2020-08-06 10:23:25 -04:00
Matt Arsenault c015cbc68b AMDGPU/GlobalISel: Fix trying to widen <3 x s1> boolean ops 2020-08-06 10:07:22 -04:00
Matt Arsenault 28124a0a63 AMDGPU/GlobalISel: Stop using G_EXTRACT in argument lowering
We really need to put this undef padding stuff into a helper
somewhere, but leave that for when this is moved to generic code.
2020-08-06 09:55:35 -04:00
Matt Arsenault 6c7f640bf7 AMDGPU/GlobalISel: Implement LLT version of allowsMisalignedMemoryAccesses 2020-08-06 09:50:36 -04:00
Matt Arsenault 37894ba661 AMDGPU/GlobalISel: Make s16 phi legal
If we were to have an operation with an s16 def that needs to be
executed in a waterfall loop, not having s16 legal would place an
avoidable burden on RegBankSelect to widen it.
2020-08-06 09:41:14 -04:00
Matt Arsenault 5316256709 AMDGPU/GlobalISel: Fix assert on copy to vcc
This was trying to constrain a physical register. By the verifier's
understanding, it's impossible to have a 1-bit copy to vcc/vcc_lo so
don't try to handle physregs.
2020-08-06 09:41:14 -04:00
Matt Arsenault 0ee1eba581 AMDGPU: Remove ATOMIC_PK_FADD
The f32 and v2f16 cases should be handled the same way.
2020-08-05 22:00:52 -04:00
Ruiling Song 5ddc8b49ba [AMDGPU] add buffer_atomic_swap for float
The functionality is used when calling imageAtomicExchange() on a float
type imageBuffer in graphics shaders.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D85187
2020-08-06 09:45:48 +08:00
Stanislav Mekhanoshin 0bcda1a261 [AMDGPU] Scavenge temp reg for AGPR spill
Differential Revision: https://reviews.llvm.org/D85234
2020-08-05 13:29:19 -07:00
Matt Arsenault ec8c172d01 AMDGPU: Correct prolog SP initialization logic
Having callees that will read SP is not the only reason we need to
reference the stack pointer.
2020-08-05 15:47:53 -04:00
Stanislav Mekhanoshin ea7d0e2996 [AMDGPU] gfx1031 target
Differential Revision: https://reviews.llvm.org/D85337
2020-08-05 12:36:26 -07:00
Matt Arsenault 83eaf5d55d AMDGPU: Eliminate BUFFER_ATOMIC_PK_ADD_F16 node
This is redundant with the other no return buffer atomic node, and we
don't really need a separate type profile for it.
2020-08-05 15:16:51 -04:00
Matt Arsenault 43c0c9252a AMDGPU: Refactor buffer atomic intrinsic lowering
Move raw/struct buffer atomic lowering to separate functions. This
avoids a long nested switch, and simplifies a future patch.
2020-08-05 14:44:55 -04:00
Matt Arsenault 3e52667433 AMDGPU: Fix verifier error with undef source producing s_bitset*
This needs to preserve the undef flag.
2020-08-05 14:42:20 -04:00
Jay Foad 8cbf4a17ac [AMDGPU] Propagate fast math flags in frem lowering
Differential Revision: https://reviews.llvm.org/D84518
2020-08-05 09:09:38 +01:00
Jay Foad 04cf4a5a65 [AMDGPU] Lower frem f16
Without this it would fail to select on subtargets that have 16-bit
instructions.

Differential Revision: https://reviews.llvm.org/D84517
2020-08-05 09:08:40 +01:00
Matt Arsenault 486e84dfa4 AMDGPU/GlobalISel: Use live in helper function for returnaddress 2020-08-04 17:36:01 -04:00
Matt Arsenault 89011fc3c9 AMDGPU/GlobalISel: Select llvm.returnaddress 2020-08-04 17:14:38 -04:00
Matt Arsenault f8fb7835d6 GlobalISel: Add utilty for getting function argument live ins
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.

This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.

I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
2020-08-04 16:55:55 -04:00
Matt Arsenault 0de547ed4a AMDGPU/GlobalISel: Ensure subreg is valid when selecting G_UNMERGE_VALUES
Fixes verifier error with SGPR unmerges with 96-bit result types.
2020-08-04 12:27:34 -04:00
Jay Foad 8ec8ad868d [AMDGPU] Use fma for lowering frem
This gives shorter f64 code and perhaps better accuracy.
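For reference, the conventional remainder expansion being improved can be sketched as follows (illustrative; the real lowering differs in how the division is computed):

```
#include <math.h>

// The remainder frem computes is conventionally x - trunc(x / y) * y.
// Folding the multiply and subtract into one fma keeps a single rounding
// step, which is where the shorter code and better accuracy come from.
double frem_via_fma(double x, double y) {
  double q = trunc(x / y);
  return fma(-q, y, x);
}
```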

Differential Revision: https://reviews.llvm.org/D84516
2020-08-04 16:18:23 +01:00
Carl Ritson 57899934ea [AMDGPU] Make GCNRegBankReassign assign based on subreg banks
When scavenging consider the sub-register of the source operand
to determine the bank of a candidate register (not just sub0).
Without this it is possible to introduce an infinite loop,
e.g. $sgpr15_sgpr16_sgpr17 can be assigned for a conflict between
$sgpr0 and SGPR_96:sub1.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D84910
2020-08-04 12:54:44 +09:00
Christopher Tetreault 3b92db4c84 [SVE] Remove bad call to VectorType::getNumElements() from AMDGPU
Differential Revision: https://reviews.llvm.org/D85151
2020-08-03 15:56:10 -07:00
Cameron McInally 31c7a2fd5c [FPEnv] Don't transform FSUB(-0,X)->FNEG(X) in SelectionDAGBuilder.
This patch stops unconditionally transforming FSUB(-0,X) into an FNEG(X) while building the DAG. There is also one small change to handle the new FSUB(-0,X) similarly to FNEG(X) in the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D84056
2020-08-03 10:22:25 -05:00
Matt Arsenault 2414bab5d7 AMDGPU/GlobalISel: Remove old hacks for boolean selection
There were various hacks used to try to avoid making s1 SGPR vs. s1
VCC ambiguous after constraining the register before we had a strategy
to deal with this. This also attempted to handle undef operands, which
are now illegal gMIR.
2020-08-03 09:04:14 -04:00
Matt Arsenault fd63e46941 AMDGPU/GlobalISel: Apply load bitcast to s.buffer.load intrinsic
Should also apply this to the non-scalar buffer loads.
2020-08-03 08:54:29 -04:00
Matt Arsenault d8ef1d1251 AMDGPU/GlobalISel: Fix selecting broken copies for s32->s64 anyext
These should probably not be legal in the first place, but that might
also be a pain.
2020-08-03 08:36:41 -04:00
Matt Arsenault 212570abcf GlobalISel: Implement bitcast action for G_EXTRACT_VECTOR_ELEMENT
For AMDGPU, vectors with elements < 32 bits should be indexed in
32-bit elements and the desired bits extracted from there. For
elements > 64 bits, these should be reduced to 64-/32-bit elements to enable
the normal dynamic indexing paths.
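A C-level sketch of the small-element case described above (assuming little-endian packing of two 16-bit elements per 32-bit word; illustrative only):

```
#include <stdint.h>

// Index the 32-bit word holding 16-bit element i, then shift and truncate.
uint16_t extract_v_i16(const uint32_t *words, unsigned i) {
  uint32_t w = words[i / 2];
  return (uint16_t)(w >> ((i % 2) * 16));
}
```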

In the dynamic index cases, this produces shorter code most of the
time. This does immediately regress the constant index cases, but this
should be fixed once we have the most basic of shift combines.

The element size > 64 case is pretty much ported from the existing
DAG implementation for extract element promote. The increasing element
size case is new.
2020-08-02 10:42:07 -04:00
Benjamin Kramer c6f08b14d4 Hide some internal symbols. NFC. 2020-07-31 17:28:02 +02:00
Matt Arsenault 57bd64ff84 Support addrspacecast initializers with isNoopAddrSpaceCast
Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs
with the DataLayout.
2020-07-31 10:42:43 -04:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject parameter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
Matt Arsenault e56e9022bc AMDGPU: Fix liveness errors when copying AGPR tuples
Avoid recursively calling copyPhysReg for AGPR handling. This was
dropping the necessary super register implicit defs to avoid liveness
verifier errors.
2020-07-30 18:13:04 -04:00
Changpeng Fang 243376cdc7 AMDGPU: Put inexpensive ops first in AMDGPUAnnotateUniformValues::visitLoadInst
Summary:
This is in response to the review of https://reviews.llvm.org/D84873:
the expensive check should be reordered last.

Reviewers: arsenm

Differential Revision: https://reviews.llvm.org/D84890
2020-07-30 14:37:06 -07:00
Stanislav Mekhanoshin 5b32518f96 [AMDGPU] Do not use undef on indirect source
We are using undef on the indirect move source subreg and then
using the implicit super-reg. This creates a problem in RA when
Greedy decides to split the register. It reassigns the implicit
super-reg but does not bother to change the undef source because
it really does not matter. The fix is to stop lying to RA and
drop the undef flag.

This also hit a problem in SIFoldOperands, as it can now fold an
immediate into an indirect move since there is no undef flag
anymore. That resulted in multiple test failures, so a check for
this case was added.

Differential Revision: https://reviews.llvm.org/D84899
2020-07-30 10:41:59 -07:00
hsmahesha 33fd4a18e7 [AMDGPU/MemOpsCluster] Clean-up fixme's around mem ops clustering logic
Get rid of all fixmes and base the heuristic on `num-clustered-dwords`. The main intuition behind this is as
follows. The existing heuristic is roughly summarized below:

* Assume all the mem ops participating in the clustering process load/store the same number of bytes
* If each mem op loads 4 bytes, then cluster at most 5 mem ops, that is at most 20 bytes
* If each mem op loads 8 bytes, then cluster at most 3 mem ops, that is at most 24 bytes
* If each mem op loads 16 bytes, then cluster at most 2 mem ops, that is at most 32 bytes

So we need to make sure that the new heuristic does not completely deviate from the above one, and that it
properly handles both the sub-word loads and the wide loads.

Reviewed By: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D84354
2020-07-30 21:41:13 +05:30
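
A small sketch of the byte budget summarized above (the function name and table are illustrative; the in-tree heuristic is now expressed in terms of `num-clustered-dwords`):

    // Given the number of bytes each mem op accesses, return the maximum
    // number of ops to cluster: roughly a 20/24/32-byte budget by width.
    unsigned maxClusterSize(unsigned BytesPerMemOp) {
      if (BytesPerMemOp <= 4)
        return 5;  // up to 5 x 4  = 20 bytes
      if (BytesPerMemOp <= 8)
        return 3;  // up to 3 x 8  = 24 bytes
      return 2;    // up to 2 x 16 = 32 bytes
    }
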
Matt Arsenault 0da582d9b6 GlobalISel: Handle llvm.roundeven
I still think it's highly questionable that we have two intrinsics
with identical behavior that vary only in the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
2020-07-29 20:01:12 -04:00
Stanislav Mekhanoshin decfdb8ce3 [AMDGPU] Fixed formatting in GCNHazardRecognizer.cpp. NFC. 2020-07-29 12:21:28 -07:00
Stanislav Mekhanoshin 13b63be472 [AMDGPU] prefer non-mfma in post-RA schedule
MFMA instructions shall not be scheduled back to back,
to avoid an MAI SIMD stall. Tell the post-RA scheduler we would
prefer some other instruction instead.

Differential Revision: https://reviews.llvm.org/D84883
2020-07-29 12:17:50 -07:00
Matt Arsenault 59fac51ff2 AMDGPU/GlobalISel: Handle llvm.amdgcn.reloc.constant 2020-07-29 14:24:21 -04:00
Matt Arsenault 0b7de7966f GlobalISel: Implement lower for G_EXTRACT_VECTOR_ELT
Use the basic store to stack and reload.
2020-07-29 14:16:28 -04:00
Matt Arsenault 766cb615a3 AMDGPU: Relax restriction on folding immediates into physregs
I never completed the work on the patches referenced by
f8bf7d7f42, but this was intended to
avoid folding immediate writes into m0 which the coalescer doesn't
understand very well. Relax this to allow simple SGPR immediates to
fold directly into VGPR copies. This pattern shows up routinely in
current GlobalISel code since nothing is smart enough to emit VGPR
constants yet.
2020-07-29 14:01:53 -04:00
Matt Arsenault d42c7b2211 AMDGPU: Account for the size of LDS globals used through constant
expressions.

Also "fix" the longstanding bug where the computed size depends on the
order of the visitation. We could try to predict the allocation order
used by legalization, but it would never be 100% perfect. Until we
start fixing the addresses somehow (or have a more reliable allocation
scheme later), just try to compute the size based on the worst case
padding.
2020-07-29 11:40:42 -04:00
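
A hedged sketch of one way to compute an order-independent upper bound using worst-case padding, in the spirit of the commit above (the in-tree formula may differ; the types and names are made up):

    #include <cstdint>
    #include <vector>

    struct LDSGlobal {
      uint64_t Size;  // size in bytes
      uint64_t Align; // required alignment (power of two)
    };

    // Since the final allocation order is unknown, assume each global may be
    // preceded by worst-case padding for its alignment. The result does not
    // depend on visitation order.
    uint64_t worstCaseLDSSize(const std::vector<LDSGlobal> &Globals) {
      uint64_t Total = 0;
      for (const LDSGlobal &G : Globals)
        Total += G.Size + (G.Align - 1);
      return Total;
    }
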
Matt Arsenault 200bb5191a AMDGPU/GlobalISel: Refactor special argument management 2020-07-29 08:27:31 -04:00
Matt Arsenault c230965ccf AMDGPU: Make saturating add/sub legal for DAG path 2020-07-29 08:27:31 -04:00
Matt Arsenault cdd45d5f9c AMDGPU/GlobalISel: Select llvm.amdgcn.global.atomic.csub
Remove the custom node boilerplate. Not sure why this tried to handle
the LDS atomic stuff.
2020-07-29 08:27:31 -04:00
Matt Arsenault 44211f20a8 AMDGPU: Optimize copies to exec with other insts after exec def
It's possible to have terminator instructions after a write to exec,
so skip over them to find it.
2020-07-28 21:34:50 -04:00
Matt Arsenault b6ebc77326 AMDGPU/GlobalISel: Fix selecting llvm.amdgcn.s.getreg
This introduces the same bug llvm.amdgcn.s.setreg has where if the
user specified an immediate outside of the valid 16-bit range, it will
select into a verifier error.
2020-07-28 21:34:50 -04:00
Matt Arsenault 6a7b6dd54b AMDGPU: Don't assert in canInsertSelect
Currently GlobalISel doesn't force all VGPR phi operands to VGPRs, so
this hit a case where it was queried with a VGPR and SGPR. This could
arguably be a verifier error, but it's currently not.
2020-07-28 21:01:06 -04:00
Matt Arsenault 068808d102 AMDGPU: Don't assume call targets are registers
GlobalISel let through a call to null, which would then fold into the
source operand like any other inline immediate. The SelectionDAG
lowering deletes calls to null and undef as a workaround from before
calls were supported. We should probably drop the special handling
case in the DAG lowering now, since the middle end optimizers delete
null calls anyway.
2020-07-28 20:46:06 -04:00
Matt Arsenault 8860daf0ed AMDGPU: Handle a few missing cases in getAddrModeArguments 2020-07-28 20:22:38 -04:00
Matt Arsenault b3e63aa8a4 AMDGPU: Don't assume there is only one terminator copy
This would stop on the first in reverse order, failing the verifier if
there were more earlier in the block.
2020-07-28 20:22:38 -04:00
Matt Arsenault 592f2e8d1c AMDGPU: Fix verifier error on spilling partially defined SGPRs
This needs an implicit def of the super-register in case one of the
lanes isn't defined, similar to copyPhysReg (or the not-VGPR spill
case below). This showed up in GlobalISel testing since it currently
doesn't fold out many undef instructions.
2020-07-28 20:01:57 -04:00
Matt Arsenault 66d60e06cb AMDGPU: Serialize MFI spill fields
These should probably be inferred from the function on parse, but the
target specific infrastructure currently does not give you a way to do
this. Without this serialization, SILowerSGPRSpills exits early without
reporting spills, which makes it difficult to write a MIR test for it.
2020-07-28 20:01:57 -04:00
Matt Arsenault e87356b498 GlobalISel: Don't assert on operations with no type indices
Fix not marking G_FENCE as legal on AMDGPU. This was apparently
defaulting to legal using the "legacy" rules, whatever those are.
2020-07-28 16:49:55 -04:00
Matt Arsenault 5174e7b443 GlobalISel: Add typeIsNot LegalityPredicate
This allows sorting the legal/custom rules first, as is recommended.
2020-07-28 16:49:55 -04:00
Matt Arsenault 9731ef3ec5 AMDGPU/GlobalISel: Add SReg_96 to SGPRRegBank 2020-07-28 16:49:55 -04:00
Matt Arsenault e9b236f411 AMDGPU: Check for other defs when folding conditions into s_andn2_b64
We can't fold the masked compare value through the select if the
select condition is re-defed after the and instruction. Fixes a
verifier error and trying to use the outgoing value defined in the
block.

I'm not sure why this pass is bothering to handle physregs. It's
making this more complex and forces extra liveness computation.
2020-07-28 16:36:23 -04:00
Austin Kerbow adeeac9d5a [AMDGPU] Spill CSR VGPR which is reserved for SGPR spills
Update the logic for reserving a VGPR for SGPR spills. A CSR VGPR being reserved for
SGPR spills could be clobbered if there were no free lower VGPRs available.
Create a stack object so that it will be spilled in the prologue. Also
adds more tests.

Differential Revision: https://reviews.llvm.org/D83730
2020-07-28 11:53:02 -07:00
Matt Arsenault 16bcd54570 AMDGPU/GlobalISel: Mark GlobalISel classes as final 2020-07-28 11:42:17 -04:00
Matt Arsenault bb23b5cfe0 AMDGPU/GlobalISel: Merge identical select cases 2020-07-28 11:42:17 -04:00
Matt Arsenault a4edc04693 AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat
We also have never handled this for SelectionDAG, which needs
additional work.
2020-07-28 11:18:05 -04:00
Matt Arsenault ce944af33c AMDGPU/GlobalISel: Mark G_ATOMICRMW_{NAND|FSUB} as lower
These aren't implemented and we're still relying on the AtomicExpand
pass, but mark these as lower to eliminate a few of the remaining
cases with no rules defined.
2020-07-27 18:47:40 -04:00
Matt Arsenault 8b81d0633f AMDGPU: global_atomic_csub is not always dereferenceable 2020-07-27 18:47:39 -04:00
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Piotr Sobczak 590dd73c6e [AMDGPU] Make generating cache invalidating instructions optional
Summary:
D78800 skipped generating cache invalidating instructions altogether
on AMDPAL. However, this is sometimes too restrictive - we want a
more flexible option to be able to toggle this behaviour on and off
while we work towards developing a correct implementation of the
alternative memory model.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, dexonsmith, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84448
2020-07-27 09:24:11 +02:00
Matt Arsenault e97aa5609f AMDGPU/GlobalISel: Don't assert in LegalizerInfo constructor
We don't really need these asserts. The LegalizerInfo is also
overly aggressively constructed, even when not in use. It needs to not
assert on dummy targets that have manually specified, unrelated
features.
2020-07-26 23:01:28 -04:00
Matt Arsenault d35e2c101d AMDGPU/GlobalISel: Fix not constraining ds_append/consume operands 2020-07-26 10:17:36 -04:00
Matt Arsenault f6176f8a5f GlobalISel: Handle G_PTR_ADD in narrowScalar 2020-07-26 10:08:17 -04:00
Matt Arsenault 7c09c173a2 AMDGPU/GlobalISel: Reorder G_CONSTANT legality rules
The legal cases should be the first rules.
2020-07-26 10:05:05 -04:00
Matt Arsenault bcf5184a68 AMDGPU/GlobalISel: Make sure <2 x s1> phis are scalarized 2020-07-26 10:04:47 -04:00
Matt Arsenault 6f961a1e7e AMDGPU/GlobalISel: Legalize GDS atomics
I noticed these don't use the _gfx9, non-m0-reading variants, but I'm
not sure whether that's a bug. It's the same in the DAG.
2020-07-26 10:03:34 -04:00
Matt Arsenault 5819159995 AMDGPU/GlobalISel: Pack constant G_BUILD_VECTOR_TRUNCs when selecting 2020-07-26 09:55:34 -04:00
Matt Arsenault 4033aa1467 AMDGPU/GlobalISel: Sign extend integer constants
This matches the DAG behavior and fixes immediate folding
2020-07-26 09:30:14 -04:00
Matt Arsenault 392b969c32 AMDGPU/GlobalISel: Don't assert on G_INSERT > 128-bits
Just fall back for now. Really, tablegen needs to generate all of the
subregister index handling we need.
2020-07-25 10:05:44 -04:00
Matt Arsenault 2bd72abef0 AMDGPU: Skip other terminators before inserting s_cbranch_exec[n]z
PHIElimination/createPHISourceCopy inserts non-branch terminators
after the control flow pseudo if a successor phi reads a register
defined by the control flow pseudo. If this happens, we need to split
the expansion of the control flow pseudo to ensure all the branches
are after all of the other mask management instructions.

GlobalISel hit this in test cases that happened to be tail
duplicated. The original testcase still does not work, since the same
problem appears to be present in a later pass.
2020-07-24 16:51:59 -04:00
madhur13490 4a577c3a22 [AMDGPU] Fix incorrect arch assert while setting up FlatScratchInit
Reviewers: arsenm, foad, rampitec, scott.linder

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84391
2020-07-24 18:19:04 +00:00
Dmitry Preobrazhensky 6b8948922c [AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier
Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3.

    op     dst, addr, rsrc, FORMAT, soffset

This change adds support for SP3 syntax:

    op     dst, addr, rsrc, soffset SP3FORMAT

In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.

format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>,<numeric-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]

The last syntax variant is supported for GFX10 only.

See llvm bug 37738

Reviewers: arsenm, rampitec, vpykhtin

Differential Revision: https://reviews.llvm.org/D84026
2020-07-24 16:41:03 +03:00
Petar Avramovic 47bd41d099 AMDGPU/GlobalISel: Select set.inactive intrinsic
Differential Revision: https://reviews.llvm.org/D84407
2020-07-24 10:14:14 +02:00
Matt Arsenault 891759db73 GlobalISel: Add scalarSameSizeAs LegalizeRule
Widen or narrow a type to a type with the same scalar size as
another. This can be used to force G_PTR_ADD/G_PTRMASK's scalar
operand to match the bitwidth of the pointer type. Use this to
disallow narrower types for G_PTRMASK.
2020-07-23 21:17:31 -04:00
Matt Arsenault b9c644ec61 AMDGPU: Fix failures from overflowing uint8_t number of operands
If the operand index exceeded the limit of unsigned char, it wrapped
and would point to the wrong operand. Increase the size of the operand
index field to avoid this, and also don't bother trying to fold into
implicit operands.
2020-07-23 15:39:33 -04:00
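
A tiny standalone demonstration of the wrap-around being fixed (the values are illustrative): an operand index above 255 stored in an 8-bit field silently wraps and then names the wrong operand, while a wider field keeps the intended value.

    #include <cstdint>
    #include <iostream>

    int main() {
      unsigned IntendedIdx = 300;
      uint8_t NarrowIdx = static_cast<uint8_t>(IntendedIdx);  // wraps to 44
      uint16_t WideIdx = static_cast<uint16_t>(IntendedIdx);  // stays 300
      std::cout << "narrow: " << +NarrowIdx << ", wide: " << WideIdx << "\n";
    }
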
Matt Arsenault d2b8fcff34 AMDGPU/GlobalISel: Handle call return values
The only case that I know doesn't work is the implicit sret case when
the return type doesn't fit in the return registers.
2020-07-23 14:29:35 -04:00
Sebastian Neubauer 896679733d [AMDGPU] Fix typo. NFC 2020-07-23 17:01:12 +02:00
Jay Foad b35833b84e [GlobalISel][AMDGPU] Legalize saturating add/subtract
Add support in LegalizerHelper for lowering G_SADDSAT etc. either
using add/subtract-with-overflow or using max/min instructions.

Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skip using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.
2020-07-23 09:06:42 -04:00
Matt Arsenault 0c92bfa4b8 GlobalISel: Don't use virtual for distinguishing arg handlers
There's no reason to involve the hassle of a virtual method targets
have to override for a simple boolean.

Not sure exactly what's going on with Mips, but it seems to define its
own totally separate handler classes.
2020-07-22 14:14:43 -04:00
Matt Arsenault 6f437117af AMDGPU: Don't assert on f16 inv2pi immediates pre-gfx8
v_cvt_f32_f16 can still accept this value as a literal constant. This
showed up in GlobalISel since it doesn't have constant folding for
G_FPEXT.
2020-07-22 13:59:03 -04:00
Matt Arsenault b98f902f18 GlobalISel: Restructure argument lowering loop in handleAssignments
This was structured in a way that implied every split argument is in
memory, or in registers. It is possible to pass an original argument
partially in registers, and partially in memory. Transpose the logic
here to only consider a single piece at a time. Every individual
CCValAssign should be treated independently, and any merge to original
value needs to be handled later.

This is in preparation for merging some preprocessing hacks in the
AMDGPU calling convention lowering into the generic code.

I'm also not sure what the correct behavior is for memlocs where the
promoted size is larger than the original value. I've opted to clamp
the memory access size to not exceed the value register to avoid the
explicit trunc/extend/vector widen/vector extract instruction. This
happens for AMDGPU for i8 arguments that end up stack passed, which
are promoted to i16 (I think this is a preexisting DAG bug though, and
they should not really be promoted when in memory).
2020-07-22 13:31:11 -04:00
Matt Arsenault 1fd1beea18 AMDGPU/GlobalISel: Fix translation of indirect calls 2020-07-22 13:13:21 -04:00
Dmitry Preobrazhensky 0b8fd77ad9 [AMDGPU][MC] Corrected decoding of 16-bit literals
16-bit literals are encoded as 32-bit values. If the high 16 bits of the value are 0xFFFF, the decoded instruction cannot be reassembled.

For example, the following code

0xff,0x04,0x04,0x52,0xcd,0xab,0xff,0xff

was decoded as

v_mul_lo_u16_e32 v2, 0xffffabcd, v2

However, this literal is actually a 64-bit constant, 0x00000000ffffabcd, which violates requirements described in the documentation: the truncation is not safe.

This change corrects decoding to make reassembly possible.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D84098
2020-07-22 17:20:43 +03:00
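
A hedged sketch of the round-trip problem (the helper is made up, not the disassembler API): a 16-bit literal travels in a 32-bit encoding field, and if the high half is 0xFFFF, printing the raw 32-bit value (e.g. 0xffffabcd) yields a constant that no longer fits a 16-bit operand, so the disassembly cannot be fed back to the assembler and such encodings need special handling when decoding.

    #include <cstdint>

    bool roundTripsAs16BitLiteral(uint32_t Encoded) {
      return (Encoded >> 16) == 0; // only a clear high half reprints verbatim
    }
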
Sebastian Neubauer 2a6c871596 [InstCombine] Move target-specific inst combining
The InstCombine pass has long handled target-specific intrinsics.
Having target-specific code in general passes has been noted as an
area for improvement for a long time.

D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).

This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic

A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.

This allows moving about 3000 lines out of InstCombine and into the targets.

Differential Revision: https://reviews.llvm.org/D81728
2020-07-22 15:59:49 +02:00
Petar Avramovic 44967fc604 AMDGPU: Simplify f16 to i64 custom lowering
The range that f16 can represent fits into i32.
Lower as f16->i32->i64 instead of f16->f32->i64,
since f32->i64 has a long expansion.

Differential Revision: https://reviews.llvm.org/D84166
2020-07-22 10:32:14 +02:00
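
An illustration of the shortened path (not the legalizer code), relying on the fact that every finite half value lies in [-65504, 65504] and therefore fits in i32; 'HalfAsFloat' stands in for the f16 source value.

    #include <cstdint>

    int64_t halfToSigned64(float HalfAsFloat) {
      int32_t Narrow = static_cast<int32_t>(HalfAsFloat); // f16 range fits in i32
      return static_cast<int64_t>(Narrow);                // sext i32 -> i64
    }
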
Matt Arsenault b258920095 AMDGPU/GlobalISel: Fix not erasing inst when lowering G_FRINT 2020-07-21 18:29:41 -04:00
Matt Arsenault 7cd8a0256d GlobalISel: Legalize G_FPOWI 2020-07-21 18:13:04 -04:00
Matt Arsenault 1168119c2f AMDGPU: Start interpreting byref on kernel arguments
These are treated identically to value aggregates placed in the kernel
argument list. A %struct.foo or %struct.foo addrspace(4)*
byref(sizeof(%struct.foo)) align(alignof(%struct.foo)) argument should
produce the same offsets and argument metadata.

This handles all 3 kernel ABI implementations, and the two HSA
metadata emission paths.
2020-07-21 18:11:22 -04:00
Matt Arsenault 2fe0ea8261 DAG: Handle expanding strict_fsub into fneg and strict_fadd
The AMDGPU handling of f16 vectors is still terrible, since it gets
scalarized even when the vector operation is legal.

The code is essentially duplicated between the non-strict and
strict case. Apparently no other expansions are currently trying to do
this. This is mostly because I found the behavior of
getStrictFPOperationAction to be confusing. In the ARM case, it would
expand strict_fsub even though it shouldn't due to the later check. At
that point, the logic required to check for legality was more complex
than just duplicating the 2 instruction expansion.
2020-07-21 16:17:10 -04:00
Matt Arsenault 107c954c13 AMDGPU/GlobalISel: Remove unnecessary parameter 2020-07-20 20:53:01 -04:00
Matt Arsenault ce76d15a70 AMDGPU: Use MCRegister for preloaded arguments
Attempt to fix build error with ancient GCC
2020-07-20 13:34:28 -04:00
Matt Arsenault 21ef01b7e3 AMDGPU: Remove outdated fixme 2020-07-20 11:41:41 -04:00
Matt Arsenault 84704d989b AMDGPU: Fix not accounting for constantexpr uses of LDS globals
This was failing to add the size of LDS globals that weren't directly
used by an instruction. They could be used by constant expressions
which are transitively used by the function. This requires a better
search, but just abort on this for now for correctness.
2020-07-20 11:41:41 -04:00
Matt Arsenault 61f1f2a204 AMDGPU/GlobalISel: Initial Implementation of calls
Return values, and tail calls are not yet handled.
2020-07-20 11:13:22 -04:00
Matt Arsenault ad8e900cb3 Verifier: Disallow byval and similar for AMDGPU calling conventions
These imply stack-like semantics, which doesn't make any sense for
entry points.
2020-07-20 10:58:57 -04:00
Petar Avramovic 6a1030aa0e AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT
Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.

Differential Revision: https://reviews.llvm.org/D84030
2020-07-20 16:12:19 +02:00
Matt Arsenault 100564bdf8 AMDGPU/GlobalISel: Remove outdated comment 2020-07-20 10:06:18 -04:00
Matt Arsenault 93311a9812 AMDGPU/GlobalISel: Fix custom lowering of llvm.trunc.f64 for SI
This was missing an operand from BFE and not erasing the original
instruction.
2020-07-20 10:06:18 -04:00
Petar Avramovic ba938f6388 AMDGPU/GlobalISel: Legalize s16->s64 G_FPTOSI/G_FPTOUI
Add narrowScalarFor action.
Add narrow scalar for typeIndex == 0 for G_FPTOSI/G_FPTOUI.
Legalize using narrowScalarFor as s16->s32 G_FPTOSI/G_FPTOUI
followed by s32->s64 G_SEXT/G_ZEXT.

Differential Revision: https://reviews.llvm.org/D84010
2020-07-20 11:06:11 +02:00
Dmitry Preobrazhensky 2e87acac9b [AMDGPU] Removed s_mov_regrd and mov_fed opcodes
These opcodes are not intended for public use.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D81659
2020-07-17 19:52:54 +03:00
Matt Arsenault 994fb86bc2 AMDGPU: Fix promoting f16 fpowi with legal f16 2020-07-17 11:29:05 -04:00
Jay Foad 760af7a074 [AMDGPU] Avoid splitting FLAT offsets in unsafe ways
As explained in the comment:

// For a FLAT instruction the hardware decides whether to access
// global/scratch/shared memory based on the high bits of vaddr,
// ignoring the offset field, so we have to ensure that when we add
// remainder to vaddr it still points into the same underlying object.
// The easiest way to do that is to make sure that we split the offset
// into two pieces that are both >= 0 or both <= 0.

In particular FLAT (as opposed to SCRATCH and GLOBAL) instructions have
an unsigned immediate offset field, so we can't use it to help split a
negative offset.

Differential Revision: https://reviews.llvm.org/D83394
2020-07-17 11:44:10 +01:00
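
A sketch of the splitting rule from the quoted comment (helper and parameter names are illustrative): return an {immediate, remainder} pair where both pieces have the same sign, so immediate + vaddr still points into the same underlying object; because the FLAT immediate field is unsigned, a negative offset cannot be split at all.

    #include <cstdint>
    #include <utility>

    std::pair<int64_t, int64_t> splitFlatOffset(int64_t Offset, unsigned ImmBits) {
      if (Offset < 0)
        return {0, Offset};                       // unsigned field cannot help
      int64_t MaxImm = (int64_t(1) << ImmBits) - 1;
      int64_t Imm = Offset < MaxImm ? Offset : MaxImm;
      return {Imm, Offset - Imm};                 // both pieces are >= 0
    }
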
hsmahesha 4905536086 Revert "[AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size"
This reverts commit cc9d693856.
2020-07-17 12:20:37 +05:30
Carl Ritson 3a18665748 [AMDGPU] Translate s_and/s_andn2 to s_mov in vcc optimisation
When SCC is dead but VCC is required, replace s_and / s_andn2
with an s_mov into VCC when the mask value is 0 or -1.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D83850
2020-07-17 11:48:57 +09:00
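
A small sketch of why the rewrite is sound when SCC is dead (plain C++ models of the scalar ops, not the pass itself): with an all-ones mask s_and returns its other operand unchanged, s_andn2 does the same for a zero mask, and with the opposite constants the result is simply zero, so either way the instruction collapses to a plain move into VCC.

    #include <cstdint>

    uint64_t s_and_b64(uint64_t A, uint64_t B) { return A & B; }
    uint64_t s_andn2_b64(uint64_t A, uint64_t B) { return A & ~B; }

    bool checkIdentities(uint64_t Exec) {
      return s_and_b64(Exec, ~0ull) == Exec &&   // mask -1: copy of exec
             s_andn2_b64(Exec, 0ull) == Exec &&  // mask  0: copy of exec
             s_and_b64(Exec, 0ull) == 0 &&       // mask  0: constant zero
             s_andn2_b64(Exec, ~0ull) == 0;      // mask -1: constant zero
    }
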
Matt Arsenault 1912ace968 AMDGPU: Move handling of AGPR copies to a separate function
This is in preparation for fixing multiple problems with the way AGPR
copies are handled, but this change is NFC itself. First, it's relying
on recursively calling copyPhysReg, which is losing information
necessary to get correct super register handling.

Second, it's constructing a new RegScavenger and doing an O(N^2) walk
on every single sub-spill for every AGPR tuple copy. Third, it's using
the forward form of the scavenger, and not using the preferred
backwards scan.
2020-07-16 14:32:24 -04:00
Matt Arsenault 219a9fea14 AMDGPU: Rename gfx9 version of v_add_i32/v_sub_i32
The carry-out opcode is renamed, so eliminate the deceptive _gfx9,
which looked like the encoded instruction. The real encoded version
was named _gfx9_gfx9.

Move it into the VI encoding namespace. The gfx9 namespace is just to
deal with the renamed instructions that reinterpret the opcode. When
codegened, it would fail to find the real instruction since it wasn't
in the right namespace.
2020-07-16 13:32:05 -04:00
Matt Arsenault 79f67cae91 AMDGPU: Rename add/sub with carry out instructions
The hardware has created a real mess in the naming for add/sub, which
have been renamed basically every generation. Switch the carry out
pseudos to have the gfx9/gfx10 names. We were using the original SI/CI
v_add_i32/v_sub_i32 names. Later targets reintroduced these names as
carryless instructions with a saturating clamp bit, which we do not
define. Do this rename so we can unambiguously add these missing
instructions.

The carry-in versions should also be renamed, but at least those had a
consistent _u32 name to begin with. The 16-bit instructions were also
renamed, but aren't ambiguous.

This does regress assembler error message quality in some cases. In
mismatched wave32/wave64 situations, this will switch from
"unsupported instruction" to "invalid operand", with the error
pointing at the wrong position. I couldn't quite follow how the
assembler selects these, but the previous behavior seemed accidental
to me. It looked like there was a partial attempt to handle this which
was never completed (i.e. there is an AMDGPUOperand::isBoolReg but it
isn't used for anything).
2020-07-16 13:16:30 -04:00
Petar Avramovic 6850033ca6 AMDGPU/GlobalISel: Legalize s64->s16 G_SITOFP/G_UITOFP
Add widenScalar for TypeIdx == 0 for G_SITOFP/G_UITOFP.
Legalize, using widenScalar, as s64->s32 G_SITOFP/G_UITOFP
followed by s32->s16 G_FPTRUNC.

Differential Revision: https://reviews.llvm.org/D83880
2020-07-16 16:31:57 +02:00
Roman Lebedev 4028409d77
Reland "[NFC] SimplifyCFGOptions: drop multi-parameter ctor, use default member-init"
This reverts commit 5831e86190,
which reverted commit 90c1b0442a
in preparation for reverting
commit b2018198c3 in
commit 1067d3e176 due to the introduction
of a dependency cycle.

Now that the other revert is reverted with a fix, this can be relanded.
2020-07-16 13:40:01 +03:00
Petar Avramovic 5658002b80 AMDGPU/GlobalISel: Select G_FREEZE
Select G_FREEZE in the same way that COPY is selected.

Differential Revision: https://reviews.llvm.org/D83031
2020-07-16 11:10:48 +02:00
Adrian Kuegel 5831e86190 Revert "[NFC] SimplifyCFGOptions: drop multi-parameter ctor, use default member-init"
This reverts commit 90c1b0442a.
This is based on another commit which also needs to be reverted.
The other commit introduced a Dependency Cycle between Transforms/Scalar
and TransformUtils. Scalar already depends (in many ways) on
TransformUtils, so making TransformUtils depend on Scalar should be
avoided.
2020-07-16 10:32:50 +02:00
Carl Ritson 5bf2a9dd40 [AMDGPU] Update VMEM scalar write hazard mitigation sequence
Using s_waitcnt_depctr 0xffe3 is potentially faster than v_nop.

Reviewed By: rampitec, foad

Differential Revision: https://reviews.llvm.org/D83872
2020-07-16 11:37:45 +09:00
dfukalov 76a0c0ee6f [AMDGPU][CostModel] Improve cost estimation for fused {fadd|fsub}(a,fmul(b,c))
Summary:
If the result of fmul(b,c) has one use, in almost all cases (except when denormals are
IEEE) the pair of operations will be fused into one fma/mad/mac/etc.

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83919
2020-07-16 03:06:38 +03:00
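
A standalone sketch of the single-use check behind this heuristic (the names are made up; this is not the TTI interface): if the fmul feeding an fadd/fsub has exactly one user, assume the pair fuses into a single fma/mad and charge the add/sub nothing extra.

    struct Expr {
      bool IsFMul = false;
      unsigned NumUses = 0;
    };

    unsigned fAddOrFSubCost(const Expr *LHS, const Expr *RHS, unsigned FullCost) {
      auto FusesAway = [](const Expr *E) {
        return E && E->IsFMul && E->NumUses == 1;
      };
      // Fused case: fadd(a, fmul(b, c)) with a single-use fmul -> one fma.
      return (FusesAway(LHS) || FusesAway(RHS)) ? 0 : FullCost;
    }
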
Roman Lebedev 90c1b0442a
[NFC] SimplifyCFGOptions: drop multi-parameter ctor, use default member-init
Likewise, just use the builder pattern.
Taking multiple params is unmaintainable.
2020-07-16 01:48:34 +03:00
Dmitry Preobrazhensky e122eba185 [AMDGPU][MC] Corrected MTBUF parsing and decoding
MTBUF implementation has many issues and this change addresses most of these:
- refactored duplicated code;
- hardcoded constants moved out of high-level code;
- fixed a decoding error when nfmt or dfmt are zero (bug 36932);
- corrected parsing of operand separators (bug 46403);
- corrected handling of missing operands (bug 46404);
- corrected handling of out-of-range modifiers (bug 46421);
- corrected default value (bug 46467).

Reviewers: arsenm, rampitec, vpykhtin, artem.tamazov, kzhuravl

Differential Revision: https://reviews.llvm.org/D83760
2020-07-15 19:46:00 +03:00
Carl Ritson 674226126d [AMDGPU] Apply pre-emit s_cbranch_vcc optimisation to more patterns
Add handling of s_andn2 and mask of 0.
This eliminates redundant instructions from uniform control flow.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D83641
2020-07-15 11:02:35 +09:00
Logan Smith a19461d9e1 [NFC] Add 'override' keyword where missing in include/ and lib/.
This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual.

Differential Revision: https://reviews.llvm.org/D83709
2020-07-14 09:47:29 -07:00
Jay Foad 8a24208977 [AMDGPU] Simplify AMDGPUSubtarget::getWavesPerEU. NFC. 2020-07-14 14:20:02 +01:00
Jay Foad 5ab2e14d31 [AMDGPU] Fix typos in performCtlz_CttzCombine()
Fix two obvious errors in the code and also update the test check.
Also add one test to catch the failure.

Patch by Ruiling Song!

Differential Revision: https://reviews.llvm.org/D83280
2020-07-14 10:18:18 +01:00
Jay Foad 1658b8d7dd [AMDGPU] Avoid using s_cmpk when src0 is not register
The hardware spec requires src0 of s_cmpk to be a register. So we
should not optimize s_cmp to s_cmpk if src0 is not a register.

Patch by Ruiling Song!
2020-07-14 09:05:53 +01:00
Carl Ritson 74c14202d9 [AMDGPU] Propagate dead flag during pre-RA exec mask optimizations
Preserve SCC dead flags in SIOptimizeExecMaskingPreRA.
This helps with removing redundant s_andn2 instructions later.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D83637
2020-07-14 12:53:43 +09:00
Matt Arsenault 6a8c11a11f GlobalISel: Implement widenScalar for saturating add/sub
Add a placeholder legality rule for AMDGPU until the rest of the
actions are handled.
2020-07-13 14:46:40 -04:00
Mirko Brkusanin 38998cfa9c [AMDGPU][GlobalISel] Fix subregister index for EXEC register in selectBallot.
Temporarily remove subregister for EXEC in selectBallot added in
https://reviews.llvm.org/D83214 to fix failures on expensive checks buildbot.
2020-07-13 13:35:34 +02:00
Mirko Brkusanin ce23e54162 [AMDGPU][GlobalISel] Select llvm.amdgcn.ballot
Select ballot intrinsic for GlobalISel.

Differential Revision: https://reviews.llvm.org/D83214
2020-07-13 12:14:43 +02:00
Christudasan Devadasan d7a05698ef [AMDGPU] Move LowerSwitch pass to CodeGenPrepare.
It is possible that the LowerSwitch pass leaves certain blocks
unreachable from the entry. If not removed, these dead blocks
can cause undefined behavior in the subsequent passes.
This caused a crash in the AMDGPU backend after instruction
selection when a PHI node had its incoming values coming from
these unreachable blocks.

In the AMDGPU pass flow, the last invocation of UnreachableBlockElim
precedes where LowerSwitch is currently placed, so it
missed out on the opportunity to get these blocks eliminated.
This patch ensures that the LowerSwitch pass gets inserted earlier
to make use of the existing unreachable block elimination pass.

Reviewed By: sameerds, arsenm

Differential Revision: https://reviews.llvm.org/D83584
2020-07-11 16:33:38 +05:30
Matt Arsenault 31f4e43f3f AMDGPU: Remove .value_type from kernel metadata
This doesn't appear to be used for anything, and is emitted incorrectly
based on the description. This also depends on the IR type, and
pointee element type.
2020-07-10 18:16:31 -04:00
Sidharth Baveja e541e1b757 [NFC] Separate Peeling Properties into its own struct (re-land after minor fix)
Summary:
This patch separates the peeling specific parameters from the UnrollingPreferences,
and creates a new struct called PeelingPreferences. Functions which used the
UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-10 18:39:30 +00:00
dstuttar 69a89b54c6 [NFC] Change isFPPredicate comparison to ignore lower bound
Summary:
Since the Predicate was changed to an unsigned enum, isFPPredicate
no longer needs to check the lower bound, since
that check will always evaluate to true.

Also fixed a similar issue in SIISelLowering.cpp by removing the need for
comparing to FIRST and LAST predicates

Added an assert to the isFPPredicate comparison to flag if the
FIRST_FCMP_PREDICATE is ever changed to anything other than 0, in which case the
logic will break.

Without this change warnings are generated in VS.

Change-Id: I358f0daf28c0628c7bda8ad4cab4e1757b761bab

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83540
2020-07-10 11:57:20 +01:00
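
A sketch of the tautological lower-bound compare and the assert that documents the assumption (the enum values here are illustrative, not necessarily the exact CmpInst numbering): once the predicate is an unsigned enum whose FP range starts at 0, 'P >= FIRST_FCMP_PREDICATE' is always true and MSVC warns about it.

    enum Predicate : unsigned {
      FIRST_FCMP_PREDICATE = 0,
      // ... FCMP predicates ...
      LAST_FCMP_PREDICATE = 15,
      FIRST_ICMP_PREDICATE = 32
      // ... ICMP predicates ...
    };

    bool isFPPredicate(Predicate P) {
      // Drop the always-true lower-bound compare; assert the assumption so
      // the logic breaks loudly if the numbering ever changes.
      static_assert(FIRST_FCMP_PREDICATE == 0, "update isFPPredicate");
      return P <= LAST_FCMP_PREDICATE;
    }
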
Mirko Brkusanin cf40db21af [AMDGPU][GlobalISel] Fix G_AMDGPU_TBUFFER_STORE_FORMAT mapping
Add missing mappings and tablegen definitions for TBUFFER_STORE_FORMAT.

Differential Revision: https://reviews.llvm.org/D83240
2020-07-10 11:32:32 +02:00
Stanislav Mekhanoshin 77f8f813a9 [AMDGPU] Return restricted number of regs from TTI
This is practically NFC at the moment because nothing really
asks for the real number or does anything useful with it.

Differential Revision: https://reviews.llvm.org/D82202
2020-07-09 14:31:28 -07:00
Nikita Popov 0b39d2d752 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit 0369dc98f9.

Many failing tests.
2020-07-08 21:43:32 +02:00
Sidharth Baveja 0369dc98f9 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:59:59 +00:00
Anh Tuyen Tran 6965af43e6 Revert "[NFC] Separate Peeling Properties into its own struct"
This reverts commit fead250b43.
2020-07-08 18:58:05 +00:00
Anh Tuyen Tran fead250b43 [NFC] Separate Peeling Properties into its own struct
Summary:
This patch makes the peeling properties of the loop accessible by other loop transformations.

Author: sidbav (Sidharth Baveja)

Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel)

Reviewed By: Meinersbur (Michael Kruse)

Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D80580
2020-07-08 18:56:03 +00:00
Jay Foad 47788b97a9 SILoadStoreOptimizer: add support for GFX10 image instructions
GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.

Differential Revision: https://reviews.llvm.org/D81675
2020-07-08 19:15:46 +01:00
Jay Foad a8816ebee0 [AMDGPU] Fix and simplify AMDGPULegalizerInfo::legalizeUDIV_UREM32Impl
Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32.

Differential Revision: https://reviews.llvm.org/D83383
2020-07-08 19:14:49 +01:00
Jay Foad ecac951be9 [AMDGPU] Fix and simplify AMDGPUTargetLowering::LowerUDIVREM
Use the algorithm from AMDGPUCodeGenPrepare::expandDivRem32.

Differential Revision: https://reviews.llvm.org/D83382
2020-07-08 19:14:49 +01:00
Jay Foad f4bd01c191 [AMDGPU] Fix and simplify AMDGPUCodeGenPrepare::expandDivRem32
Fix the division/remainder algorithm by adding a second quotient
refinement step, which is required in some cases like
0xFFFFFFFFu / 0x11111111u (https://bugs.llvm.org/show_bug.cgi?id=46212).

Also document, rewrite and simplify it by ensuring that we always have a
lower bound on inv(y), which simplifies the UNR step and the quotient
refinement steps.

Differential Revision: https://reviews.llvm.org/D83381
2020-07-08 19:14:48 +01:00
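
A minimal sketch of the quotient refinement this change adds (not the full expansion), assuming the initial estimate can undershoot the true quotient by at most two, which is what the preceding reciprocal/UNR steps are meant to guarantee:

    #include <cstdint>

    uint32_t refineUDiv32(uint32_t X, uint32_t Y, uint32_t Q) {
      uint32_t R = X - Q * Y;        // remainder for the current estimate
      if (R >= Y) { ++Q; R -= Y; }   // first refinement step
      if (R >= Y) { ++Q; R -= Y; }   // second refinement step (the fix)
      return Q;                      // covers cases like 0xFFFFFFFFu / 0x11111111u
    }
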
Matt Arsenault 23157f3bdb GlobalISel: Handle EVT argument lowering correctly
handleAssignments was assuming every argument type is an MVT, and
assignArg would always fail. This fixes one of the hacks in the
current AMDGPU calling convention code that pre-processes the
arguments.
2020-07-07 16:36:14 -04:00
Matt Arsenault 42bb481442 AMDGPU/GlobalISel: Fix skipping unused kernel arguments
The tests in a5b9ad7e9a actually failed
the verifier, which for some reason is not the default. Also add tests
for 0-sized function arguments, which do not add entries to the
expected register lists.
2020-07-07 16:36:13 -04:00
Carl Ritson 560292fa99 [AMDGPU] Update isFMAFasterThanFMulAndFAdd assumptions
MAD/MAC is no longer always available.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D83207
2020-07-07 15:40:44 +09:00
Stanislav Mekhanoshin f7a7efbf88 [AMDGPU] Tweak getTypeLegalizationCost()
Even though wide vectors are legal they still cost more as we
will have to eventually split them. Not all operations can
be uniformly done on vector types.

Conservatively add the cost of splitting at least to 8 dwords,
which is our widest possible load.

We are more or less lying to the cost model with this change, but
it can prevent the vectorizer from creating wide vectors, which
cause RA problems for us.

Differential Revision: https://reviews.llvm.org/D83078
2020-07-06 14:07:48 -07:00
Matt Arsenault f25d020c2e AMDGPU/GlobalISel: Add types to special inputs
When passing special ABI inputs, we have no existing context for the
type to use.
2020-07-06 17:00:55 -04:00
Nicolai Hähnle dfcc68c528 DomTree: Remove getRoots() accessor
Summary:
Avoid exposing details about how roots are stored. This enables subsequent
type-erasure changes.

v5:
- cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE

Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d

Reviewers: arsenm, RKSimon, mehdi_amini, courbet

Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83085
2020-07-06 21:58:11 +02:00
Matt Arsenault c19c153e74 AMDGPU: Don't ignore carry out user when expanding add_co_pseudo
This was resulting in a missing vreg def in the use select
instruction.

The output of the pseudo doesn't make sense, since it really shouldn't
have the vreg output in the first place, and instead an implicit scc
def to match the real scalar behavior.

We could have easier-to-understand tests if we selected scalar
versions of the [us]{add|sub}.with.overflow intrinsics.

This does still end up producing vector code in the end, since it gets
moved later.
2020-07-06 14:28:01 -04:00
Matt Arsenault a5b9ad7e9a AMDGPU/GlobalISel: Don't emit code for unused kernel arguments 2020-07-06 09:04:06 -04:00
Matt Arsenault 7b76a5c8a2 AMDGPU: Fix fixed ABI SGPR arguments
The default constructor wasn't setting isSet on the ArgDescriptor, so
while these had the value set, they were treated as missing. This only
ended up mattering in the indirect call case (and for regular calls in
GlobalISel, which currently doesn't have a way to support the variable
ABI).
2020-07-06 09:01:18 -04:00