Commit Graph

176 Commits

Author SHA1 Message Date
Matt Arsenault 26faed3960 AMDGPU: Consolidate inline immediate predicate functions
llvm-svn: 288718
2016-12-05 22:26:17 +00:00
Matt Arsenault 97279a8ca3 AMDGPU: Rename flat operands to match mubuf
Use vaddr/vdst for the same purposes.

This also fixes a beg in SIInsertWaits for the
operand check. The stored value operand is currently called
data0 in the single offset case, not data.

llvm-svn: 288188
2016-11-29 19:30:44 +00:00
Matt Arsenault 437fd71f5b AMDGPU: Use else if
llvm-svn: 288187
2016-11-29 19:30:41 +00:00
Marek Olsak 79c05871a2 AMDGPU/SI: Add back reverted SGPR spilling code, but disable it
suggested as a better solution by Matt

llvm-svn: 287942
2016-11-25 17:37:09 +00:00
Marek Olsak e3895bfb47 Revert "AMDGPU: Implement SGPR spilling with scalar stores"
This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e.

llvm-svn: 287936
2016-11-25 16:03:34 +00:00
Marek Olsak a45dae458d Revert "AMDGPU: Make m0 unallocatable"
This reverts commit 124ad83dae04514f943902446520c859adee0e96.

llvm-svn: 287932
2016-11-25 16:03:15 +00:00
Matt Arsenault 9e5c7b1031 AMDGPU: Make m0 unallocatable
m0 may need to be written for spill code, so
we don't want general code uses relying on the
value stored in it.

This introduces a few code quality regressions where copies
from m0 are not coalesced into copies of a copy of m0.

llvm-svn: 287841
2016-11-24 00:26:40 +00:00
Nicolai Haehnle ce2b589df5 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

llvm-svn: 287339
2016-11-18 11:55:52 +00:00
Tom Stellard 0d162b1c4f AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

llvm-svn: 287131
2016-11-16 18:42:17 +00:00
Matt Arsenault 3666629837 AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

llvm-svn: 287018
2016-11-15 20:14:27 +00:00
Stanislav Mekhanoshin ea91cca593 [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

llvm-svn: 287007
2016-11-15 19:00:15 +00:00
Matt Arsenault dc45274d54 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

llvm-svn: 286766
2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov f86e4b7266 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975

llvm-svn: 286753
2016-11-13 07:01:11 +00:00
Matt Arsenault 52f14ec596 AMDGPU: Preserve vcc undef flags when inverting branch
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.

Fixes verifier failures in a future commit.

Also fix verifier error when inserting copy for vccz
corruption bug.

llvm-svn: 286133
2016-11-07 19:09:27 +00:00
Matt Arsenault 314cbf7a3b AMDGPU: Refactor copyPhysReg
Separate the subregister splitting logic to re-use later.

llvm-svn: 286118
2016-11-07 16:39:22 +00:00
Nicolai Haehnle 368972c3b3 AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.

Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25656

llvm-svn: 285835
2016-11-02 17:03:11 +00:00
Matt Arsenault 3d463193a9 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

llvm-svn: 285762
2016-11-01 22:55:07 +00:00
Matt Arsenault 2d8c289b4b AMDGPU: Workaround for instruction size with literals
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.

llvm-svn: 285743
2016-11-01 20:42:24 +00:00
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Tom Stellard 6695ba0440 AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

llvm-svn: 285479
2016-10-28 23:53:48 +00:00
Matt Arsenault 7b6475568d AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

llvm-svn: 285463
2016-10-28 21:55:15 +00:00
Matt Arsenault 08906a3c62 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00
Matt Arsenault 1110f14b42 AMDGPU: Fix counting si_mask_branch as 4 bytes
llvm-svn: 285202
2016-10-26 14:53:54 +00:00
Nicolai Haehnle a785209bc2 AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.

This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.

Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.

Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.

This fixes several GL45-CTS.shaders.indexing.* tests.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25633

llvm-svn: 284980
2016-10-24 14:56:02 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Matt Arsenault d486d3f8d1 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Matt Arsenault 7b1dc2c983 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Matt Arsenault 7ccf6cd104 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Matt Arsenault 1b9fc8ed65 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Matt Arsenault a2b036e88b AArch64: Use TTI branch functions in branch relaxation
The main change is to return the code size from
InsertBranch/RemoveBranch.

Patch mostly by Tim Northover

llvm-svn: 281505
2016-09-14 17:23:48 +00:00
Matt Arsenault 25dba30017 AMDGPU: Support commuting a FrameIndex operand
llvm-svn: 281369
2016-09-13 19:03:12 +00:00
Nicolai Haehnle e58e0e3fe3 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

llvm-svn: 281230
2016-09-12 16:25:20 +00:00
Matt Arsenault 3354f42ae7 AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex
llvm-svn: 281128
2016-09-10 01:20:33 +00:00
Matt Arsenault 124384f08d AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

llvm-svn: 281117
2016-09-09 23:32:53 +00:00
Sam Kolton 1eeb11bfd4 AMDGPU] Assembler: better support for immediate literals in assembler.
Summary:
Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:
    - We parsed fp literal:
        - Instruction expects 64-bit operand:
            - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
                - then we do nothing this literal
            - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
                - report error
            - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
                - If instruction expect fp operand type (f64)
                    - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
                        - If so then do nothing
                    - Else (e.g. v_fract_f64 v[0:1], 3.1415)
                        - report warning that low 32 bits will be set to zeroes and precision will be lost
                        - set low 32 bits of literal to zeroes
                - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
                    - report error as it is unclear how to encode this literal
        - Instruction expects 32-bit operand:
            - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
            - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
                - do nothing
                - Else report error
            - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
    - Parsed binary literal:
        - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
            - do nothing
        - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
            - report error
        - Else, literal is not-inlinable and we are not required to inline it
            - Are high 32 bit of literal zeroes or same as sign bit (32 bit)
                - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
            - Else
                - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
}
'''

This is not working yet:
    - Several tests are failing
    - Problems with predicate methods for inline immediates
    - LLVM generated assembler parts try to select e64 encoding before e32.
More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, artem.tamazov

Differential Revision: https://reviews.llvm.org/D22922

llvm-svn: 281050
2016-09-09 14:44:04 +00:00
Matt Arsenault d745c28945 AMDGPU: Sign extend constants when splitting them
This will confuse later passes which try to look at the
immediate value and don't truncate first.

llvm-svn: 280974
2016-09-08 17:44:36 +00:00
Matt Arsenault bbb47da8a1 AMDGPU: Support commuting with immediate in src0
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Tom Stellard 2add8a1140 AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies
Summary:
I put this code here, because I want to re-use it in a few other places.
This supersedes some of the immediate folding code we have in SIFoldOperands.
I think the peephole optimizers is probably a better place for folding
immediates into copies, since it does some register coalescing in the same time.

This will also make it easier to transition SIFoldOperands into a smarter pass,
where it looks at all uses of instruction at once to determine the optimal way to
fold operands.  Right now, the pass just considers one operand at a time.

Reviewers: arsenm

Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23402

llvm-svn: 280744
2016-09-06 20:00:26 +00:00
Matt Arsenault ac42ba8633 AMDGPU: Set sizes of spill pseudos
llvm-svn: 280595
2016-09-03 17:25:44 +00:00
Matt Arsenault 2510a31677 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Tom Stellard 662f330852 AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint()
Summary:
The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
move it before scheduling.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23813

llvm-svn: 279963
2016-08-29 12:05:32 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Matt Arsenault c1ebd82ebe AMDGPU: Fix not estimating MBB operand sizes correctly
llvm-svn: 278590
2016-08-13 01:43:54 +00:00
Matt Arsenault 11587d97be AMDGPU: Remove unnecessary cast
llvm-svn: 278274
2016-08-10 19:11:45 +00:00