Commit Graph

3424 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin 992fbce4e9 [AMDGPU] copyPhysReg() for 16 bit SGPR subregs
Differential Revision: https://reviews.llvm.org/D78255
2020-04-17 11:59:39 -07:00
Stanislav Mekhanoshin fde2aefa22 [AMDGPU] Use SDWA for 16 bit subreg copy
This simplifies the logic and allows to use it on GFX8.

Differential Revision: https://reviews.llvm.org/D78150
2020-04-17 11:45:44 -07:00
Dominik Montada 55e3a7c6b2 [GlobalISel][AMDGPU] add legalization for G_FREEZE
Summary:
Copy the legalization rules from SelectionDAG:
-widenScalar using anyext
-narrowScalar using intermediate merges
-scalarize/fewerElements using unmerge
-moreElements using G_IMPLICIT_DEF and insert

Add G_FREEZE legalization actions to AMDGPULegalizerInfo.
Use the same legalization actions as G_IMPLICIT_DEF.

Depends on D77795.

Reviewers: dsanders, arsenm, aqjune, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson

Reviewed By: arsenm

Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, volkan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78092
2020-04-17 16:44:46 +02:00
Stanislav Mekhanoshin 2e94a64b57 [AMDGPU] Define 16 bit SGPR subregs
These are needed as a counterpart for VGPR subregs even though
there are no scalar instructions which can operate 16 bit values.
When we are materializing a constant that is done into an SGPR
and that SGPR may/will be copied into a 16 bit VGPR subreg. Such
copy is illegal. There are also similar problems if a source
operand of a 16 bit VALU instruction is an SGPR. In addition
we need to get a register with a lo16 subregister of an SGPR
RC during selection and this fails as well.

All of that makes me believe we need these subregisters as a
syntactic glue.

Differential Revision: https://reviews.llvm.org/D78250
2020-04-16 10:31:39 -07:00
Konstantin Schwarz 1a3e89aa2b [MIR] Add comments to INLINEASM immediate flag MachineOperands
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, however uses the new MIROperandComment feature.

Reviewers: SjoerdMeijer, arsenm, efriedma

Reviewed By: arsenm

Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78088
2020-04-16 13:46:14 +02:00
Dominik Montada e5d666d768 Revert "Revert "[GlobalISel] Fix invalid combine of unmerge(merge) with intermediate cast""
This reverts commit 1265899c5f.
2020-04-16 09:30:34 +02:00
Stanislav Mekhanoshin 8a9d48b46d [AMDGPU] Fixed lane mask in test. NFC. 2020-04-15 15:26:53 -07:00
Amara Emerson c22cb5bd31 [GlobalISel] Enable artifact combiner to combine starting from a G_MERGE_VALUES.
We generally only combine starting from users to defs in the artifact combiner,
but this doesn't catch cases where at the point of combining a G_UNMERGE we don't
yet have the opposite G_MERGE on input yet since we haven't legalized that far.

This change adds the users of a G_MERGE to the artifact combiner worklist if one
of the uses is a G_UNMERGE or G_TRUNC.

Differential Revision: https://reviews.llvm.org/D77931
2020-04-15 10:34:13 -07:00
Dominik Montada 1265899c5f Revert "[GlobalISel] Fix invalid combine of unmerge(merge) with intermediate cast"
This reverts commit bddac41b9f.
2020-04-15 18:47:39 +02:00
Dominik Montada bddac41b9f [GlobalISel] Fix invalid combine of unmerge(merge) with intermediate cast
Summary:
The combine for unmerge(cast(merge)) is only valid for vectors, but was
missing a corresponding check. Add a check that the operands are vectors
to avoid an invalid combine.

Without this check, the combiner would emit incorrect code for scalars
and pointers because the artifact cast (trunc/ext) only affects bits at
the end of the type, while this combine assumes that the casted bits
appear between meaningful bits.

This also uncovered a segmentation fault in the AMDGPU
InstructionSelector. The tests triggering this bug have been moved to
their own file and a check for the segmentation fault has been added.

Reviewers: arsenm, dsanders, aemerson, paquette, aditya_nandakumar

Reviewed By: arsenm

Subscribers: tpr, jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78191
2020-04-15 17:19:14 +02:00
Matt Arsenault ef2cb8db34 AMDGPU/GlobalISel: Add some artifact combiner tests 2020-04-15 09:03:07 -04:00
Matt Arsenault cc149172da AMDGPU/GlobalISel: Fix selection of scalar f64 G_FABS
This wasn't covered by existing tablegen patterns, but also suffers
the same issues as G_FNEG. Workaround them by manually selecting, like
G_FNEG.
2020-04-14 22:05:22 -04:00
Matt Arsenault cb5dc3765b TableGen/GlobalISel: Fix constraining REG_SEQUENCE operands
This was hitting the default instruction constraint code which uses
the register classes in the instruction def, which REG_SEQUENCE does
not have.

Fixes not constraining the register class for AMDGPU fneg/fabs
patterns, which would fail when the use was another generic,
unconstrained instruction.

Another oddity I noticed is that the temporary registers are created
with an unnecessary, but incorrect 16-bit LLT but this shouldn't
matter.

I'm also still unclear why root and sub-instructions have to be
handled differently.
2020-04-14 22:05:22 -04:00
Eli Friedman 2876b3eef3 [SelectionDAG] Always preserve offset in MachinePointerInfo
Previously, getWithOffset() would drop the offset if the base was null.
Because of this, MachineMemOperand would return the wrong result from
getAlign() in these cases.  MachineMemOperand stores the alignment of
the pointer without the offset.

A bunch of MIR tests changed because we print the offset now.

Split off from D77687.

Differential Revision: https://reviews.llvm.org/D78049
2020-04-14 15:29:41 -07:00
Matt Arsenault f48fe2c36e GlobalISel: Fix casted unmerge of G_CONCAT_VECTORS
This was assuming a scalarizing unmerge, and would fail assert if the
unmerge was to smaller vector types.
2020-04-13 22:03:05 -04:00
Matt Arsenault 0ba40d4ccf AMDGPU/GlobalISel: Combines for V_CVT_F32_UBYTE[0-3]
Ports the existing DAG combines, minus the simplify demanded bits
which seems to have no equivalent now. Without these, this isn't
particularly helpful in most of the IR sample cases.
2020-04-13 19:18:19 -04:00
Austin Kerbow a69b3e010c [AMDGPU][GlobalISel] Fix div_scale in FDIV lowering
Differential Revision: https://reviews.llvm.org/D78004
2020-04-13 15:54:49 -07:00
Eli Friedman 89e0662dee Make IRBuilder automatically set alignment on load/store/alloca.
This is equivalent in terms of LLVM IR semantics, but we want to
transition away from using MaybeAlign to represent the alignment of
these instructions.

Differential Revision: https://reviews.llvm.org/D77984
2020-04-13 13:43:14 -07:00
Matt Arsenault e6605a209c DAG: Fix wrong legality check for ISD::FMAD
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the the plain isOperationLegal.

This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.

The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
2020-04-13 10:25:39 -07:00
Jon Roelofs 0b0bb1969f [llvm] Fix yet more missing FileCheck colons 2020-04-13 10:49:19 -06:00
Jonathan Roelofs 17bc995388 [llvm] Fix more missing FileCheck directive colons 2020-04-13 10:16:29 -06:00
Austin Kerbow eab9a4f119 [AMDGPU] Don't assert on partial exec copy
After Machine CSE and coalescing we can end up with copies of exec to
subregister SGPRs.

Differential Revision: https://reviews.llvm.org/D77992
2020-04-12 21:14:36 -07:00
Matt Arsenault 96819011ca AMDGPU/GlobalISel: Fix RegBankSelect for v2s16 shifts
These need to be promoted and scalarized for the SALU.
2020-04-11 20:55:33 -04:00
Matt Arsenault ac8d51a3c6 AMDGPU/GlobalISel: Legalize 16-bit shift amounts to s16
The current selector depends on 16-bit shifts using 16-bit shift
amount types, but really it should accept either for all types.
2020-04-11 18:12:26 -04:00
Matt Arsenault c5497e5399 AMDGPU/GlobalISel: Fix legalizing <3 x s16> vselects 2020-04-11 15:59:51 -04:00
Matt Arsenault cf29333f40 AMDGPU/GlobalISel: Work around forming illegal zextload after legalize
Selection would fail after the post legalize combiner put an illegal
zextload back together.

The base combiner has parameter to only allow legal operations, but
they appear to not be used. I also don't see a nice way to remove a
single entry from all_combines, so just hack around this.
2020-04-11 10:52:58 -04:00
Matt Arsenault 49ae0fc2f0 GlobalISel: Fix incorrect lowering G_FCOPYSIGN
In the basic case, this was reading the sign from the wrong operand.
2020-04-10 21:00:25 -04:00
Stanislav Mekhanoshin 44920e8566 [AMDGPU] Disable sub-dword scralar loads IR widening
These will be widened in the DAG. In the meanwhile early
widening prevents otherwise possible vectorization of
such loads.

Differential Revision: https://reviews.llvm.org/D77835
2020-04-10 08:20:49 -07:00
Simon Pilgrim 54502476e7 [AMDGPU] Refresh fmin_legacy.ll checks to fix issue reported on D77354
Some of the condition codes had inverted since this was generated
2020-04-08 17:05:29 +01:00
Simon Pilgrim 7e62684251 [AMDGPU] Regenerate vector-extract-insert test checks to fix issue reported on D77354 2020-04-08 13:18:32 +01:00
Simon Pilgrim 98181a1f98 [AMDGPU] Regenerate si-annotate-cfg-loop-assert test checks to fix issue reported on D77354 2020-04-08 13:09:08 +01:00
Dominik Montada c8393240ab [GlobalISel] combine trunc(trunc) pattern
Summary:
Legalization can introduce the trunc(trunc) pattern. This can cause
problems if one of these intermediate truncs is not legal.
Combine truncs of this pattern, if the resulting trunc is legal.

Reviewers: arsenm, aemerson, dsanders

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76601
2020-04-08 11:58:28 +02:00
Dominik Montada 432720f1c4 [GlobalISel] Combine sext([sz]ext) -> [sz]ext, zext(zext) -> zext
Summary:
Combine sext(zext x) to (zext x) since the sign-bit is 0
after the zero-extension.

Combine sext(sext x) to (sext x) and ext(zext x) to (zext x)
since the intermediate step is not needed.

Reviewers: arsenm, volkan, aemerson, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77210
2020-04-08 11:24:29 +02:00
Dominik Montada 35950fea8d [GlobalISel] support narrow G_IMPLICIT_DEF for DstSize % NarrowSize != 0
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.

To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
consequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.

Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.

Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76598
2020-04-08 11:00:07 +02:00
Stanislav Mekhanoshin f96810ff34 [AMDGPU] Expand vector trunc stores from i16 to i8
Differential Revision: https://reviews.llvm.org/D77693
2020-04-07 21:47:45 -07:00
Stanislav Mekhanoshin 96e51ed005 [AMDGPU] Implement copyPhysReg for 16 bit subregs
Differential Revision: https://reviews.llvm.org/D74937
2020-04-07 14:22:46 -07:00
Graham Sellers a19a56f6a1 [AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
2020-04-07 14:37:16 -04:00
Matt Arsenault f524194ffd AMDGPU: Cleanup test MIR 2020-04-07 14:09:25 -04:00
Stanislav Mekhanoshin 12a324393d [AMDGPU] Limit endcf-collapase to simple if
We can only collapse adjacent SI_END_CF if outer statement
belongs to a simple SI_IF, otherwise correct mask is not in the
register we expect, but is an argument of an S_XOR instruction.

Even if SI_IF is simple it might be lowered using S_XOR because
lowering is dependent on a basic block layout. It is not
considered simple if instruction consuming its output is
not an SI_END_CF. Since that SI_END_CF might have already been
lowered to an S_OR isSimpleIf() check may return false.

This situation is an opportunity for a further optimization
of SI_IF lowering, but that is a separate optimization. In the
meanwhile move SI_END_CF post the lowering when we already know
how the rest of the CFG was lowered since a non-simple SI_IF
case still needs to be handled.

Differential Revision: https://reviews.llvm.org/D77610
2020-04-07 10:27:23 -07:00
Stanislav Mekhanoshin 9f09550c50 [AMDGPU] Remove clutter from endcf test. NFC. 2020-04-06 16:32:21 -07:00
Konstantin Pyzhov 72e8754916 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 09:05:58 -04:00
Konstantin Pyzhov 51dc028314 Revert e1730cfeb3 2020-04-06 05:56:11 -04:00
Konstantin Pyzhov e1730cfeb3 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 05:10:37 -04:00
Jonathan Roelofs 7c5d2bec76 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Matt Arsenault 8a5f0dafd4 AMDGPU/GlobalISel: Select llvm.amdgcn.div.scale 2020-04-06 11:50:19 -04:00
Matt Arsenault e87ec66762 AMDGPU/GlobalISel: Fix llvm.amdgcn.div.fmas.ll 2020-04-06 11:50:16 -04:00
Matt Arsenault 08772f1742 AMDGPU/GlobalISel: Add unmerge of concat tests 2020-04-06 11:03:55 -04:00
Matt Arsenault 70726cec5b DAG: Combine extract_vector_elt of concat_vectors
Fixes extra canonicalize regressions when legalizing
vector fminnum/fmaxnum.
2020-04-06 09:26:29 -04:00
Matt Arsenault 9620fe02df AMDGPU/GlobalISel: Add some G_INSERT/G_EXTRACT select tests 2020-04-05 10:54:24 -04:00
Matt Arsenault 6bfe28e92f AMDGPU: Fix annotate kernel features through casted calls
I thought I was testing this before, but the workitem id x
case isn't great since it's mandatory in the parent kernel.
2020-04-04 20:44:44 -04:00