Commit Graph

2871 Commits

Author SHA1 Message Date
Tim Corringham 6821a3ccd6 [AMDGPU] Add attribute for target loop unroll threshold default
Summary:
Add a function attribute to allow the target specific default loop unroll threshold
to be specified on a per-function basis. This allows a front-end to give guidance
where it has insight that is not available to the back-end, while still allowing the
target specific heuristics to also have an effect.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68873
2019-11-21 09:47:28 +00:00
Piotr Sobczak 4a801170f3 [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores
Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69794
2019-11-20 22:59:30 +01:00
Stanislav Mekhanoshin 899cdf95d9 [AMDGPU] Fixed mfma test check. NFC. 2019-11-20 12:33:12 -08:00
Michael Liao 4a308d302c [AMDGPU] Keep consistent check of legal addressing mode.
Summary:
- Add test cases for GFX10, which has narrower offset range compared to
  GFX9.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70473
2019-11-20 15:08:17 -05:00
Sameer Sahasrabuddhe 52c5014da0 [AMDGPU] add support for hostcall buffer pointer as hidden kernel argument
Hostcall is a service that allows a kernel to submit requests to the
host using shared buffers, and block until a response is
received. This will eventually replace the shared buffer currently
used for printf, and repurposes the same hidden kernel argument. This
change introduces a new ValueKind in the HSA metadata to represent the
hostcall buffer.

Differential Revision: https://reviews.llvm.org/D70038
2019-11-20 15:53:55 +05:30
Austin Kerbow f3225f2abe AMDGPU/GlobalISel: Legalize FDIV64
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70403
2019-11-19 21:02:27 -08:00
Matt Arsenault db0ed3e429 AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.

This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
2019-11-19 19:55:43 +05:30
Matt Arsenault ea23b6428b AMDGPU: Be explicit about denormal mode in MIR tests
Start checking the machine function in GlobalISel instead of the
target directly.

This temporarily breaks fcanonicalize selection in GlobalISel.
2019-11-19 19:55:43 +05:30
dfukalov 6fd11b14f6 [AMDGPU] Tune inlining parameters for AMDGPU target (part 2)
Summary:
Most of IR instructions got better code size estimations after commit 47a5c36b.
So default parameters values should be updated to improve inlining and
unrolling for the target.

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70391
2019-11-19 16:33:16 +03:00
Piotr Sobczak 02419ab5c7 [AMDGPU] Lower llvm.amdgcn.s.buffer.load.v3[i|f]32
Summary: Add lowering support for 32-bit vec3 variant of s.buffer.load intrinsic.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70118
2019-11-15 15:01:15 +01:00
Matt Arsenault 31479d868e AMDGPU: Change boolean content type to 0 or 1
The usage of target boolean checks is overly inflexible, since sext
and zext of a compare are equally cheap. The choice is arbitrary, but
using 0/1 to some degree is the choice of lower resistance since
that's what most targets use. This enables a few combines that don't
bother to support ZeroOrNegativeOneBooleanContent.
2019-11-15 13:43:47 +05:30
Matt Arsenault 69fcfb7d35 AMDGPU: Try to commute sub of boolean ext
Avoids another regression in a future patch.
2019-11-15 13:43:42 +05:30
Matt Arsenault bc276c6379 GlobalISel: Lower s1 source G_SITOFP/G_UITOFP 2019-11-15 13:37:20 +05:30
Stanislav Mekhanoshin 4fa44f989e [AMDGPU] Fixed dpp test. NFC. 2019-11-13 16:38:54 -08:00
Stanislav Mekhanoshin af7d4022c7 [AMDGPU] Fixed mfma-loop test. NFC. 2019-11-13 16:03:54 -08:00
Matt Arsenault 9d7bccab66 AMDGPU: Extend add x, (ext setcc) combine to sub
This is the same as the add case, but inverts the operation type.

This avoids regressions in a future patch.
2019-11-13 07:13:58 +05:30
Matt Arsenault 4b47213951 AMDGPU: Switch backend default max workgroup size to 1024
Previously this would default to 256, not the maximum supported size
of 1024. Using a maximum lower than the hardware maximum requires
language runtimes to enforce this limit for correctness, which no
language has correctly done. Switch the default to the conservatively
correct maximum, and force frontends to opt-in to the more optimal 256
default maximum.

I don't really understand why the changes in occupancy-levels.ll
increased the computed occupancy, which I expected to decrease. I'm
not sure if these tests should be forcing the old maximum.
2019-11-13 07:11:02 +05:30
Matt Arsenault 25c5da5a42 AMDGPU Reduce reported maximum group size to 1024
While some targets allow encoding 2048, this was never tested or
supported.
2019-11-13 06:34:28 +05:30
Tim Renouf 07ebd74154 MCP: Fixed bug with dest overlapping copy source
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.

Differential Revision: https://reviews.llvm.org/D69953

Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
2019-11-12 08:18:11 +00:00
Matt Arsenault e16a71382d AMDGPU: Select global atomicrmw fadd
This only works if there is no use of the return value.
2019-11-06 16:06:38 -08:00
Stanislav Mekhanoshin d17bcf2bb9 [AMDGPU] Add handling of 160 bit registers in analyzeResourceUsage
This was omitted. Also SReg_96Reg missed IsSGPR assignment.

Differential Revision: https://reviews.llvm.org/D69919
2019-11-06 15:47:32 -08:00
Quentin Colombet 52af7aedfe [GISel][ArtifactCombiner] Relax the constraint to combine unmerge with concat_vectors
The combine G_UNMERGE_VALUES with G_CONCAT_VECTORS used to only be performed
when the result type of the G_UNMERGE_VALUES was a vector type.
In other words, we were expecting that the G_UNMERGE_VALUES was effectively
the exact opposite of the G_CONCAT_VECTORS.

Lift that constraint by allowing any G_UNMERGE_VALUES to be combined
with any G_CONCAT_VECTORS (as long as the size of the different pieces
that we merge/unmerge match).

Differential Revision: https://reviews.llvm.org/D69288
2019-11-06 11:27:50 -08:00
Daniel Sanders e74c5b9661 [globalisel] Rename G_GEP to G_PTR_ADD
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK so let's follow that
convention and go with G_PTR_ADD

Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69734
2019-11-05 10:31:17 -08:00
Michael Liao 4531aee2ac [amdgpu] Fix known bits compuation on `MUL_I24`/`MUL_U24`.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69735
2019-11-01 17:06:17 -04:00
Matt Arsenault d9e0a2942a AMDGPU: Disallow spill folding with m0 copies
readlane and writelane instructions are not allowed to use m0 as the
data operand, so spilling them is tricky and would require an
intermediate SGPR to spill it. Constrain the virtual register class in
this caes to disallow the inline spiller from folding the m0 operand
directly into the spill instruction.

I copied this hack from AArch64 which has the same problem for $sp.
2019-10-30 14:56:33 -07:00
Matt Arsenault edca9ac0de AMDGPU: Don't fold S_NOPs with implicit operands 2019-10-30 14:40:56 -07:00
Jay Foad e5972f2a04 [AMDGPU] Simplify VCCZ bug handling
Summary:
VCCZBugHandledSet was used to make sure we don't apply the same
workaround more than once to a single cbranch instruction, but it's not
necessary because the workaround involves inserting an s_waitcnt
instruction, which is enough for subsequent iterations to detect that no
further workaround is necessary.

Also beef up the test case to check that the workaround was only applied
once. I have also manually verified that the test still passes even if I
hack the big do-while loop in runOnMachineFunction to run a minimum of
five iterations.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69621
2019-10-30 17:09:07 +00:00
Jay Foad 86549c7528 [SelectionDAG] Add support for FP_ROUND in WidenVectorOperand.
Summary:
This is used on AMDGPU for rounding from v3f64 (which is illegal) to
v3f32 (which is legal).

Subscribers: jvesely, nhaehnle, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69339
2019-10-30 15:18:21 +00:00
Austin Kerbow 2b88b344f2 AMDGPU/GlobalISel: Legalize FDIV32
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69581
2019-10-29 17:18:06 -07:00
Matt Arsenault 21bc8e5a13 AMDGPU: Make VReg_1 only include 1 artificial register
When TableGen is inferring register classes from contexts, it uses a
sorting function based on the number of registers in the class. Since
this was being treated as an alias of VGPR_32, they had exactly the
same size. The sort used wasn't a stable sort, and even if it were, I
believe the tie breaker would effectively end up being the
alphabetical ordering of the class name. There appear to be issues
trying to use an empty set of registers, so add only one so this will
always sort to the end.

Also add a comment explaining how VReg_1 is a dirty hack for
SelectionDAG.

This does end up changing the behavior of i1 with inline asm and VGPR
constraints, but the existing behavior was was already nonsensical and
inconsistent. It should probably be disallowed anyway.

Fixes bug 43699
2019-10-28 20:51:51 -07:00
Austin Kerbow d11b93ec6a AMDGPU: Avoid overwriting saved PC
Summary:
An outstanding load with same destination sgpr as call could cause PC to be
updated with junk value on return.

Reviewers: arsenm, rampitec

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69474
2019-10-28 10:02:22 -07:00
cdevadas e921ede540 [AMDGPU] Fix Vreg_1 PHI lowering in SILowerI1Copies.
There is a minor flaw in the implementation of function lowerPhis.
This function replaces values of regclass Vreg_1 (boolean values)
involved in PHIs into an SGPR. Currently it iterates over the MBBs
and performs an inplace lowering of PHIs and fails to lower any
incoming value that itself is another PHI of Vreg_1 regclass.
The failure occurs only when the MBB where the incoming PHI value
belongs is not visited/lowered yet.

To fix this problem, collect all Vreg_1 PHIs upfront and then
perform the lowering.

Differential Revision: https://reviews.llvm.org/D69182
2019-10-26 14:37:45 +05:30
Sanjay Patel 4c47617627 [SDAG] fold extract_vector_elt with undef index
This makes the DAG behavior consistent with IR's extractelement after:
rGb32e4664a715

https://bugs.llvm.org/show_bug.cgi?id=42689

I've tried to maintain test intent for WebAssembly.
The AMDGPU test is trying to test for crashing or other bad behavior,
but I'm not sure if that's possible after this change.
2019-10-25 19:27:26 -04:00
Stanislav Mekhanoshin 4c0251da14 [AMDGPU] Enable SGPR copy folding
That used to fail in the last testcase function because after
%0:sreg_64.sub0 was folded into %3:sreg_32_xm0_xexec COPY, it
was further folded into S_STORE_DWORD_IMM. Its legal effective
subreg class is SReg_32 while instruction expects more restricted
SReg_32_XM0_EXEC. However, SIInstrInfo::isLegalRegOperand()
passed the legality check and it was caught in the verifier.

Borrowed code from the verifier to check for RC legality.

Differential Revision: https://reviews.llvm.org/D69445
2019-10-25 15:08:30 -07:00
Matt Arsenault 1a276d1e8c GlobalISel: Implement widenScalar for G_INSERT_VECTOR_ELT 2019-10-25 13:55:07 -07:00
Matt Arsenault 171cf5302f AMDGPU/GlobalISel: Handle flat/global G_ATOMIC_CMPXCHG
Custom lower this to a target instruction with the merge operands. I
think it might be better to directly select this and emit a
REG_SEQUENCE, but this would be more work since it would require
splitting the tablegen patterns for these cases from the other
atomics.
2019-10-25 13:11:09 -07:00
Changpeng Fang 1ce552f3ef AMDGPU: Fix the broken dominator tree when creating waterfall loop for resource descriptor
Summary:
  In loadSRsrcFromVGPR, if MBB is the same as Succ, Remiander is not the immediate dominator of Succ.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D69358
2019-10-25 13:08:04 -07:00
Stanislav Mekhanoshin d4303b3861 [AMDGPU] Fold AGPR reg_sequence initializers
Differential Revision: https://reviews.llvm.org/D69413
2019-10-25 11:39:02 -07:00
vpykhtin c9c18e5a31 [AMDGPU] Disallow dpp combining for dpp instructions without Src2 operand (when Src2 is required)
Differential revision: https://reviews.llvm.org/D69430
2019-10-25 21:30:37 +03:00
Austin Kerbow c35b358b74 AMDGPU/GlobalISel: Legalize FDIV16
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69347
2019-10-25 11:07:17 -07:00
Scott Linder 7ad3636c30 [AMDGPU] Remove update_llc_test_checks for a test
The test split-arg-dbg-value.ll has a host-specific path in the
full output captured by update_llc_test_checks.

Fix for test failures introduced in https://reviews.llvm.org/D69402

Tags: #llvm
2019-10-25 11:47:33 -04:00
Scott Linder 2c37833931 [AMDGPU] Clean up update_llc_test_checks CodeGen tests
Summary:
Some tests have been hand edited without removing the
update_llc_test_checks header, some have slightly outdated CHECK lines
which still pass, and some have additional comments which
update_llc_test_checks pushes towards the function body.

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69402
2019-10-24 17:35:33 -04:00
Craig Topper a5376f6322 [GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to combine trunc(g_constant).
This allows X86 to properly form shift by immediate instructions
since we require an 8-bit constant to match the imported
SelectionDAG patterns.
2019-10-24 12:59:26 -07:00
Stanislav Mekhanoshin 3c8e055187 [AMDGPU] Fix mfma scheduling crash
An SUnit can be neither intruction not SDNode. It is all
null if represents a nop. Fixed a crash on using SU->getInstr().

Differential Revision: https://reviews.llvm.org/D69395
2019-10-24 11:01:52 -07:00
Michael Liao b2a65f0d70 [AMDGPU] Skip additional folding on the same operand.
Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69355
2019-10-24 11:30:22 -04:00
Stanislav Mekhanoshin 61e7a61bdc [AMDGPU] Allow folding of sgpr to vgpr copy
Potentially sgpr to sgpr copy should also be possible.
That is however trickier because we may end up with a
wrong register class at use because of xm0/xexec permutations.

Differential Revision: https://reviews.llvm.org/D69280
2019-10-23 18:42:48 -07:00
Stanislav Mekhanoshin f9b1dc5553 [AMDGPU] Updated fold-vgpr-copy.mir test. NFC. 2019-10-22 12:43:23 -07:00
Stanislav Mekhanoshin 48f57138be [AMDGPU] Allow tied operand subreg folding
Turns out it makes sense, contrarily to what comment said.

Differential Revision: https://reviews.llvm.org/D69287
2019-10-22 11:27:36 -07:00
Austin Kerbow 97263fa2dd AMDGPU/GlobalISel: Legalize fast unsafe FDIV
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69231

llvm-svn: 375460
2019-10-21 22:18:26 +00:00
Matt Arsenault 38038f116f AMDGPU: Use CopyToReg for interp intrinsic lowering
This doesn't use the default value, so doesn't benefit from the hack
to help optimize it.

llvm-svn: 375450
2019-10-21 19:53:49 +00:00