Commit Graph

361 Commits

Author SHA1 Message Date
Nicolai Haehnle 7cfd99ab15 AMDGPU/GFX10: fix scratch resource descriptor
Summary:
The stride should depend on the wave size, not the hardware generation.

Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
relevant.

Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63808

llvm-svn: 364788
2019-07-01 15:43:00 +00:00
Nicolai Haehnle 2710171a15 AMDGPU: Write LDS objects out as global symbols in code generation
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
  to a constant at compile times, which means some tests can no longer
  be applied.

  The current "solution" is a terrible hack, but the intrinsic isn't
  used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
  means that we can no longer generate a relevant error message at
  compile time. It would be possible to add a check for the size of
  individual variables, but ultimately the linker will have to perform
  the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

llvm-svn: 364297
2019-06-25 11:52:30 +00:00
Matt Arsenault c67c484f36 AMDGPU: Don't clobber VCC in MUBUF addr64 emulation
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.

llvm-svn: 363904
2019-06-20 00:51:28 +00:00
Matt Arsenault e4c2e9b016 AMDGPU: Consolidate some getGeneration checks
This is incomplete, and ideally these would all be removed, but it's
better to localize them to the subtarget first with comments about
what they're for.

llvm-svn: 363902
2019-06-19 23:54:58 +00:00
Matt Arsenault 4d000d2488 AMDGPU: Fix folding immediate into readfirstlane through reg_sequence
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.

For some reason copies aren't making it through the reg_sequence,
although they should.

llvm-svn: 363876
2019-06-19 20:44:15 +00:00
Matt Arsenault 4d55d024be Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.

llvm-svn: 363870
2019-06-19 19:55:27 +00:00
Simon Pilgrim 128ce93c60 Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.
........
Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/

llvm-svn: 363797
2019-06-19 13:00:54 +00:00
Matt Arsenault 8d35dcd703 AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.

llvm-svn: 363678
2019-06-18 13:19:57 +00:00
Matt Arsenault f39f3bd056 AMDGPU: Change API for checking for exec modification
Invert the name and return value to better reflect the imprecise
nature.

Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.

Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.

Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.

llvm-svn: 363675
2019-06-18 12:48:36 +00:00
Sander de Smalen 5d6ee76c16 Describe stack-id as an enum
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137

llvm-svn: 363533
2019-06-17 09:13:29 +00:00
Nicolai Haehnle 41abf2766e AMDGPU: Prepare for explicit absolute relocations in code generation
Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

llvm-svn: 363517
2019-06-16 17:43:37 +00:00
Nicolai Haehnle 6d71be4e67 AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

llvm-svn: 363516
2019-06-16 17:32:01 +00:00
Stanislav Mekhanoshin 5250021672 [AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

llvm-svn: 363513
2019-06-16 17:13:09 +00:00
Matt Arsenault 642f39c93e AMDGPU: Fix missing const
llvm-svn: 363383
2019-06-14 13:26:23 +00:00
Stanislav Mekhanoshin 245b5ba344 [AMDGPU] gfx1010 dpp16 and dpp8
Differential Revision: https://reviews.llvm.org/D63203

llvm-svn: 363186
2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin 5f581c9f08 [AMDGPU] gfx1010 premlane instructions
Differential Revision: https://reviews.llvm.org/D63202

llvm-svn: 363185
2019-06-12 17:52:51 +00:00
Matt Arsenault ddd2c9ac86 AMDGPU: Force skips around traps
llvm-svn: 362852
2019-06-07 23:02:52 +00:00
Matt Arsenault b6cfa129cc AMDGPU: Insert skip branches over return blocks
SIInsertSkips really doesn't understand the control flow, and makes
very stupid assumptions about the block layout. This was able to get
away with not skipping return blocks, since usually after
structurization there is only one placed at the end of the
function. Tail duplication can break this assumption.

llvm-svn: 362754
2019-06-06 22:51:51 +00:00
Alexander Timofeev 37bd9bd137 [AMDGPU] Partial revert for the ba447bae74
"Divergence driven ISel. Assign register class for cross block values
       according to the divergence."
       that discovered the design flaw leading to several issues that
       required to be solved before.

       This change reverts AMDGPU specific changes and keeps common part
       unaffected.

llvm-svn: 362749
2019-06-06 21:13:02 +00:00
Matt Arsenault b812b7a45e AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

llvm-svn: 362661
2019-06-05 22:20:47 +00:00
Matt Arsenault 0f8a764e8f AMDGPU: Fix using 2 different enums for same operand flags
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

llvm-svn: 362640
2019-06-05 20:32:25 +00:00
Dmitry Preobrazhensky 9111f35f02 [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

llvm-svn: 362400
2019-06-03 13:51:24 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
             the correct register classes to the cross block values beforehand. For the divergent targets
             same value type requires different register classes dependent on the value divergence.

    Reviewers: rampitec, nhaehnle

    Differential Revision: https://reviews.llvm.org/D59990

    This commit was reverted because of the build failure.
    The reason was mlformed patch.
    Build failure fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
         the correct register classes to the cross block values beforehand. For the divergent targets
         same value type requires different register classes dependent on the value divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Matt Arsenault ca64ef2043 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in cases this is ever needed in some global
context. Currently it's only used for getting an upper bound inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20-bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

llvm-svn: 361405
2019-05-22 16:28:41 +00:00
Matt Arsenault 2cba91b8db AMDGPU: Assume calls read exec
llvm-svn: 361333
2019-05-21 23:23:16 +00:00
Matt Arsenault 6dd08e335f AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

llvm-svn: 361202
2019-05-20 22:04:42 +00:00
Stanislav Mekhanoshin 05791d90c9 [AMDGPU] Fixed handling of imemdiate i1 literals
This bug was exposed by the rL360395.

Differential Revision: https://reviews.llvm.org/D61812

llvm-svn: 360689
2019-05-14 16:18:00 +00:00
Nicolai Haehnle 79ea85c6af AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.

Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61490

llvm-svn: 360123
2019-05-07 09:19:09 +00:00
Stanislav Mekhanoshin 491746a584 [AMDGPU] gfx1010 verifier changes
Differential Revision: https://reviews.llvm.org/D61521

llvm-svn: 360095
2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin 971cb8b633 [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32
GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for
all targets.

Differential Revision: https://reviews.llvm.org/D61525

llvm-svn: 360094
2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin 28a1936f6d [AMDGPU] gfx1010: use fmac instructions
Differential Revision: https://reviews.llvm.org/D61527

llvm-svn: 359959
2019-05-04 04:20:37 +00:00
Stanislav Mekhanoshin 5cf8167735 [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

llvm-svn: 359756
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin f2baae0abb [AMDGPU] gfx1010 constant bus limit
Constant bus limit has increased to 2 with GFX10.

Differential Revision: https://reviews.llvm.org/D61404

llvm-svn: 359754
2019-05-02 03:47:23 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin a6322941ff [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

llvm-svn: 359621
2019-04-30 22:08:23 +00:00
Mark Searles 76c5b62988 Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

llvm-svn: 359363
2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin 61beff020e [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

llvm-svn: 359328
2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin 8f3da70eed [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

llvm-svn: 359316
2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin cee607e414 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

llvm-svn: 359113
2019-04-24 17:03:15 +00:00
Bjorn Pettersson 238c9d6308 [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetTransformInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

llvm-svn: 358744
2019-04-19 09:08:38 +00:00
Matt Arsenault 396653f8a1 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Neil Henning 0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
Matt Arsenault a353fd572a AMDGPU: Make exec mask optimzations more resistant to block splits
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.

llvm-svn: 357170
2019-03-28 14:01:39 +00:00
Matt Arsenault 28f97f1dbc AMDGPU: Don't hardcode num defs for MUBUF instructions
This shouldn't change anything since the no-ret atomics are selected
later.

llvm-svn: 357084
2019-03-27 16:12:29 +00:00
Matt Arsenault bbc59d8d0d AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics
The offset operand index is different for atomics.

llvm-svn: 357073
2019-03-27 15:41:00 +00:00
Tim Renouf 033f99a2e5 [AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.

Differential Revision: https://reviews.llvm.org/D58903

Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271
llvm-svn: 356735
2019-03-22 10:11:21 +00:00
Tim Renouf 361b5b2193 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Michael Liao efb4f9e568 [AMDGPU] Enable code selection using `s_mul_hi_u32`/`s_mul_hi_i32`.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59501

llvm-svn: 356405
2019-03-18 20:40:09 +00:00