Commit Graph

2231 Commits

Author SHA1 Message Date
Matt Arsenault b6c599afd3 Reapply r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
This reverts commit r359912.

This should pass now, since the clang test was made less fragile in
r359918.

llvm-svn: 359919
2019-05-03 19:06:57 +00:00
Nico Weber bb852a9672 Revert r359906, "RegAllocFast: Add heuristic to detect values not live-out of a block"
Makes clang/test/Misc/backend-stack-frame-diagnostics-fallback.cpp fail.

llvm-svn: 359912
2019-05-03 18:08:03 +00:00
Matt Arsenault daf2d653fa RegAllocFast: Add heuristic to detect values not live-out of a block
Add an improved/new heuristic to catch more cases when values are not
live out of a basic block.

Patch by Matthias Braun

llvm-svn: 359906
2019-05-03 17:03:24 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault cfd0ca38b0 AMDGPU: Support shrinking add with FI in SIFoldOperands
Avoids test regression in a future patch

llvm-svn: 359898
2019-05-03 15:21:53 +00:00
Matt Arsenault ca7a582bf3 AMDGPU: Add baseline test for future patch
llvm-svn: 359893
2019-05-03 14:54:38 +00:00
Matt Arsenault 0446fbe45e AMDGPU: Replace shrunk instruction with dummy implicit_def
This was broken if the original operand was killed. The kill flag
would appear on both instructions, and fail the verifier. Keep the
kill flag, but remove the operands from the old instruction. This has
an added benefit of really reducing the use count for future folds.

Ideally the pass would be structured more like what PeepholeOptimizer
does to avoid this hack to avoid breaking instruction iterators.

llvm-svn: 359891
2019-05-03 14:40:10 +00:00
Matt Arsenault 6d0c59605c AMDGPU: Forgot to commit test file for r358890
llvm-svn: 359885
2019-05-03 13:55:40 +00:00
Matt Arsenault 2c8936fd26 AMDGPU: Fix incorrect commute with sub when folding immediates
When a fold of an immediate into a sub/subrev required shrinking the
instruction, the wrong VOP2 opcode was used. This was using the VOP2
equivalent of the original instruction, not the commuted instruction
with the inverted opcode.

llvm-svn: 359883
2019-05-03 13:42:56 +00:00
Matt Arsenault 2636460f0e AMDGPU: Fix test verification
This should run the verifier, and needs to enable trackRegLiveness.

llvm-svn: 359882
2019-05-03 13:42:55 +00:00
Stanislav Mekhanoshin 64399da8b8 [AMDGPU] gfx1010 lost VOP2 forms of some add/sub
Add legalization of V_ADD_I32, V_SUB_I32, V_SUBREV_I32.

Differential Revision:

llvm-svn: 359757
2019-05-02 04:26:35 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin a224f68a10 [AMDGPU] gfx1010 DS implementation
Differential Revision: https://reviews.llvm.org/D61332

llvm-svn: 359696
2019-05-01 16:11:11 +00:00
Fangrui Song e29e30b139 [llvm-readobj] Change -long-option to --long-option in tests. NFC
We use both -long-option and --long-option in tests. Switch to --long-option for consistency.

In the "llvm-readelf" mode, -long-option is discouraged as it conflicts with grouped short options and it is not accepted by GNU readelf.

While updating the tests, change llvm-readobj -s to llvm-readobj -S to reduce confusion ("s" is --section-headers in llvm-readobj but --symbols in llvm-readelf).

llvm-svn: 359649
2019-05-01 05:27:20 +00:00
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Mark Searles 76c5b62988 Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

llvm-svn: 359363
2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin 61beff020e [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

llvm-svn: 359328
2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin 8f3da70eed [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

llvm-svn: 359316
2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin cee607e414 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

llvm-svn: 359113
2019-04-24 17:03:15 +00:00
Stanislav Mekhanoshin c464dddccb [AMDGPU] Fixed addReg() in SIOptimizeExecMaskingPreRA.cpp
The second argument is flags, not subreg.

Differential Revision: https://reviews.llvm.org/D61031

llvm-svn: 359017
2019-04-23 17:59:26 +00:00
Scott Linder 3eed961973 [AMDGPU] Fix hidden argument metadata duplication for V3
Essentially complete a proper rebase of the V3 metadata change over
https://reviews.llvm.org/D49096.

Minimize the diff between the V2 and V3 variants of the relevant lit
tests, and clean up some trailing whitespace.

llvm-svn: 358992
2019-04-23 14:31:17 +00:00
Nicolai Haehnle 7edae4c403 AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies
Summary:
When an LCSSA phi survives through instruction selection, the pass
ends up removing that phi entirely because it is dominated by the
logic that does the lanemask merging.

This then used to trigger an assertion when processing a dependent
phi instruction.

Change-Id: Id4949719f8298062fe476a25718acccc109113b6

Reviewers: llvm-commits

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, tpr, dstuttard, rtaylor, arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60999

llvm-svn: 358983
2019-04-23 13:12:52 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Michael Liao 389d5a3474 [AMDGPU] Fix an issue in `op_sel_hi` skipping.
Summary:
- Only apply packed literal `op_sel_hi` skipping on operands requiring
  packed literals. Even an instruction is `packed`, it may have operand
  requiring non-packed literal, such as `v_dot2_f32_f16`.

Reviewers: rampitec, arsenm, kzhuravl

Subscribers: jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60978

llvm-svn: 358922
2019-04-22 22:05:49 +00:00
Matt Arsenault f84ce75cd1 AMDGPU: Skip debug instructions in assert
These are inserted after branch relaxation, and for some reason it's
decided to put them in the long branch expansion block. It's probably
not great to rely on the source block address, so this should probably
be switched to being PC relative instead of relying on the block
address

llvm-svn: 358909
2019-04-22 19:14:26 +00:00
Matt Arsenault 2b6f76f05f AMDGPU/GlobalISel: Fix non-power-of-2 G_EXTRACT sources
llvm-svn: 358894
2019-04-22 15:22:46 +00:00
Matt Arsenault 8f624abc1d GlobalISel: Legalize scalar G_EXTRACT sources
llvm-svn: 358892
2019-04-22 15:10:42 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Simon Pilgrim ffd67233d4 [AMDGPU] Regenerate uitofp i8 to float conversion tests.
Prep work for D60462

llvm-svn: 358879
2019-04-22 10:19:09 +00:00
Simon Pilgrim 4c09b7d921 [AMDGPU] Regenerate extractelt->truncate test.
Prep work for D60462

llvm-svn: 358746
2019-04-19 09:49:04 +00:00
Tim Renouf 7c55c8d8c3 [AMDGPU] Avoid DAG combining assert with fneg(fadd(A,0))
fneg combining attempts to turn it into fadd(fneg(A), fneg(0)), but
creating the new fadd folds to just fneg(A). When A has multiple uses,
this confuses it and you get an assert. Fixed.

Differential Revision: https://reviews.llvm.org/D60633

Change-Id: I0ddc9b7286abe78edc0cd8d734fdeb05ff09821c
llvm-svn: 358640
2019-04-18 05:27:01 +00:00
Rhys Perry c2814e12e7 AMDGPU: Force skip over SMRD, VMEM and s_waitcnt instructions
Summary: This fixes a large Dawn of War 3 performance regression with RADV from Mesa 19.0 to master which was caused by creating less code in some branches.

Reviewers: arsen, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60824

llvm-svn: 358592
2019-04-17 16:31:52 +00:00
Matt Arsenault 101abd219b AMDGPU: Fix unreachable when counting register usage of SGPR96
llvm-svn: 358447
2019-04-15 20:51:12 +00:00
Matt Arsenault fbdd2a1887 AMDGPU: Fix printed format of SReg_96
These are artificial, so I think this should only come up with inline
asm comments.

llvm-svn: 358446
2019-04-15 20:42:18 +00:00
Amara Emerson 946b1246d6 [GlobalISel] Enable CSE in the IRTranslator & legalizer for -O0 with constants only.
Other opcodes shouldn't be CSE'd until we can be sure debug info quality won't
be degraded.

This change also improves the IRTranslator so that in most places, but not all,
it creates constants using the MIRBuilder directly instead of first creating a
new destination vreg and then creating a constant. By doing this, the
buildConstant() method can just return the vreg of an existing G_CONSTANT
instead of having to create a COPY from it.

I measured a 0.2% improvement in compile time and a 0.9% improvement in code
size at -O0 ARM64.

Compile time:
Program                                        base   cse    diff
test-suite...ark/tramp3d-v4/tramp3d-v4.test     9.04   9.12  0.8%
test-suite...Mark/mafft/pairlocalalign.test     2.68   2.66 -0.7%
test-suite...-typeset/consumer-typeset.test     5.53   5.51 -0.4%
test-suite :: CTMark/lencod/lencod.test         5.30   5.28 -0.3%
test-suite :: CTMark/Bullet/bullet.test        25.82  25.76 -0.2%
test-suite...:: CTMark/ClamAV/clamscan.test     6.92   6.90 -0.2%
test-suite...TMark/7zip/7zip-benchmark.test    34.24  34.17 -0.2%
test-suite :: CTMark/SPASS/SPASS.test           6.25   6.24 -0.1%
test-suite...:: CTMark/sqlite3/sqlite3.test     1.66   1.66 -0.1%
test-suite :: CTMark/kimwitu++/kc.test         13.61  13.60 -0.0%
Geomean difference                                          -0.2%

Code size:
Program                                        base     cse      diff
test-suite...-typeset/consumer-typeset.test    1315632  1266480 -3.7%
test-suite...:: CTMark/ClamAV/clamscan.test    1313892  1297508 -1.2%
test-suite :: CTMark/lencod/lencod.test        1439504  1423112 -1.1%
test-suite...TMark/7zip/7zip-benchmark.test    2936980  2904172 -1.1%
test-suite :: CTMark/Bullet/bullet.test        3478276  3445460 -0.9%
test-suite...ark/tramp3d-v4/tramp3d-v4.test    8082868  8033492 -0.6%
test-suite :: CTMark/kimwitu++/kc.test         3870380  3853972 -0.4%
test-suite :: CTMark/SPASS/SPASS.test          1434904  1434896 -0.0%
test-suite...Mark/mafft/pairlocalalign.test    764528   764528   0.0%
test-suite...:: CTMark/sqlite3/sqlite3.test    782092   782092   0.0%
Geomean difference                                              -0.9%

Differential Revision: https://reviews.llvm.org/D60580

llvm-svn: 358369
2019-04-15 05:04:20 +00:00
David Green 0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Matt Arsenault 7187272b2b GlobalISel: Support legalizing G_CONSTANT with irregular breakdown
llvm-svn: 358109
2019-04-10 17:27:53 +00:00
Matt Arsenault 9e0eeba569 GlobalISel: Handle odd breakdowns for bit ops
llvm-svn: 358105
2019-04-10 17:07:56 +00:00
Stanislav Mekhanoshin 913ba8eeb4 Revert LIS handling in MachineDCE
One of out of tree targets has regressed with this patch. Reverting
it for now and let liveness to be fully reconstructed in case pass
was used after the LIS is created to resolve the regression.

Differential Revision: https://reviews.llvm.org/D60466

llvm-svn: 358015
2019-04-09 16:13:53 +00:00
Tom Stellard 206b9927f8 AMDGPU/GlobalISel: Implement call lowering for shaders returning values
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, llvm-commits

Differential Revision: https://reviews.llvm.org/D57166

llvm-svn: 357964
2019-04-09 02:26:03 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Matt Arsenault 4ed6ccab9b AMDGPU/GlobalISel: Fix non-power-of-2 select
llvm-svn: 357762
2019-04-05 14:03:04 +00:00
Piotr Sobczak 0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Matt Arsenault f426ddbfc7 AMDGPU: Assume ECC is enabled by default if supported
The test should really be checking for the property directly in the
code object headers, but there are problems with this. I don't see
this directly represented in the text form, and for the binary
emission this is depending on a function level subtarget feature to
emit a global flag.

llvm-svn: 357558
2019-04-03 01:58:57 +00:00