SI_IF and SI_ELSE are terminators which also produce a value. For
these instructions ISel always inserts a COPY to move their value
to another basic block. This COPY ends up between SI_(IF|ELSE)
and the S_BRANCH* instruction at the end of the block.
This breaks MachineBasicBlock::getFirstTerminator() and also the
machine verifier which assumes that terminators are grouped together at
the end of blocks.
To solve this we coalesce the copy away right after ISel to make sure
there are no instructions in between terminators at the end of blocks.
llvm-svn: 207591
When legalizing ops, with UDIV/UREM set to expand, they automatically
expand to UDIVREM (if legal or custom).
We need to do this manually for legalize types.
v2:
SI should be set to Expand because the type is legal, and it is
automatically lowered to UDIVREM if UDIVREM is Legal/Custom
R600 should set to UDIV/UREM to Custom because it needs to lower them
during type legalization
Patch by: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207587
Having i128 as a legal type complicates the legalization phase. v4i32
is already a legal type, so we will use that instead.
This fixes several piglit tests.
llvm-svn: 206500
Setting vector types to expand will result in scalarization on pre SI hw,
as those gpus don't have vector shifts either.
Expand also i32 vectors, this helps llvm make the correct decision
about scalarizing the vector ops.
v2: move setOperation() calls to R600ISelLowering.cpp.
cleanup the SI code to make it obvious that this patch does is nop for SI
Patch by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 206348
Through some oddity where truncate (sextload x) isn't folded into
an anyextload for vectors, the sextload remains if the
vector isn't immediately scalarized. This keeps the expected
zextload instructions in the kernel-args test when small type
vectors aren't scalarized.
llvm-svn: 206070
Moving these patterns from TableGen files to PerformDAGCombine()
should allow us to generate better code by eliminating unnecessary
shifts and extensions earlier.
This also fixes a bug where the MAD pattern was calling
SimplifyDemandedBits with a 24-bit mask on the first operand
even when the full pattern wasn't being matched. This occasionally
resulted in some instructions being incorrectly deleted from the
program.
v2:
- Fix bug with 64-bit mul
llvm-svn: 205731
The SReg_(32|64) register classes contain special registers in addition
to the numbered SGPRs. This can lead to machine verifier errors when
these register classes are used as sub-registers for SReg_128, since
SReg_128 only uses the numbered SGPRs.
Replacing SReg_(32|64) with SGPR_(32|64) fixes this problem, since
the SGPR_(32|64) register classes contain only numbered SGPRs.
Tests cases for this are comming in a later commit.
llvm-svn: 204474
These are sometimes created by the shrink to boolean optimization in the
globalopt pass.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 203280
Different sized address spaces should theoretically work
most of the time now, and since 64-bit add is currently
disabled, using more 32-bit pointers fixes some cases.
llvm-svn: 197659
We were ignoring the ordered/onordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195514
The SelectionDAGBuilder was promoting vector kernel arguments to legal
types, but this won't work for R600 and SI since kernel arguments are
stored in memory and can't be promoted. In order to handle vector
arguments correctly we need to look at the original types from the LLVM IR
function.
llvm-svn: 193215
During instruction selection, we rewrite the destination register
class for MIMG instructions based on their writemasks. This creates
machine verifier errors since the new register class does not match
the register class in the MIMG instruction definition.
We can avoid this by defining different MIMG instructions for each
possible destination type and then switching to the correct instruction
when we change the register class.
llvm-svn: 192365
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575
This adds minimal support to the SelectionDAG for handling address spaces
with different pointer sizes. The SelectionDAG should now correctly
lower pointer function arguments to the correct size as well as generate
the correct code when lowering getelementptr.
This patch also updates the R600 DataLayout to use 32-bit pointers for
the local address space.
v2:
- Add more helper functions to TargetLoweringBase
- Use CHECK-LABEL for tests
llvm-svn: 189221
Now that compute support is better on SI, we can't continue using v16i8
for descriptors since this is also a legal type in OpenCL.
This patch fixes numerous hangs with the piglit OpenCL test and since
we now use a target specific DAG node for LOAD_CONSTANT with the
correct MemOperandFlags, this should also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=66805
llvm-svn: 188429
The previous code declared the operand as unknown:$vaddr, which made
it possible for scalar registers to be used instead of vector registers.
llvm-svn: 188425
Since the VSrc_* register classes contain both VGPRs and SGPRs, copies
that used be emitted by isel like this:
SGPR = COPY VGPR
Will now be emitted like this:
VSrC = COPY VGPR
This patch also adds a pass that tries to identify and fix situations where
a VGPR to SGPR copy may occur. Hopefully, these changes will make it
impossible for the compiler to generate illegal VGPR to SGPR copies.
llvm-svn: 187831
By default, we expand these operations for both EG and SI. Move the
duplicated code into a common space for now. If the targets ever actually
implement these operations as instructions, we can override that in the relevant
target.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184848
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184844
Also add lit test for both cases on SI, and v2i32 for evergreen.
Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184843
In reality, some unaligned memory accesses are legal for 32-bit types and
smaller too, but it all depends on the address space. Allowing
unaligned loads/stores for > 32-bit types is mainly to prevent the
legalizer from splitting one load into multiple loads of smaller types.
https://bugs.freedesktop.org/show_bug.cgi?id=65873
llvm-svn: 184822
Also add a v2i32 test to the existing v4i32 test.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184482
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.
Patch by: Aaron Watry
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184481
This is a candidate for the stable branch.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=64694
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 182084
Depending on the number of bits set in the writemask.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 179166
SITargetLowering::analyzeImmediate() was converting the 64-bit values
to 32-bit and then checking if they were an inline immediate. Some
of these conversions caused this check to succeed and produced
S_MOV instructions with 64-bit immediates, which are illegal.
v2:
- Clean up logic
Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 178927
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 178127
This is certainly not the last word on scheduling for this target, but
right now this allows a few apps to run / finish with radeonsi, most
notably UT2004 / Lightsmark. They fail to compile some shaders with the
default scheduler because it ends up trying to spill registers, which
we don't support yet (and which is probably a bad idea in general for
performance if it can be avoided).
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176687
v2: update CMakeLists.txt as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176626
v2: fix R600 regressions
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176624
Just encode the type as target specific attribute.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176622
Include immediate folding and SGPR limit handling for VOP3 instructions.
v2: remove leftover hasExtraSrcRegAllocReq
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 176101
It actually fixes quite a bunch of piglit tests.
This is a candidate for the mesa-stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175756
Instead of using custom inserters, it's simpler and
should make DAG folding easier.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175755
Fixing asm operation names.
v2: use ZERO constant, also add asm operands
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 175749
It's completely unnecessary and can be replace with proper
SReg_64 handling instead.
This actually fixes a piglit test on SI.
v2: use correct register class in addRegisterClass,
set special classes as not allocatable
v3: revert setting special classes as not allocateable
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175355
Seems to be allot simpler, and also paves the
way for further improvements.
v2: rebased on master, use 0 in BUFFER_LOAD_FORMAT_XYZW,
use VGPR0 in dummy EXP, avoid compiler warning, break
after encoding the first literal.
v3: correctly use V_ADD_F32_e64
This is a candidate for the stable branch.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175354
The important fix is that the constant interpolation value is stored in the
parameter slot P0, which is encoded as 2.
In addition, drop the SI_INTERP_CONST pseudo instruction, pass the parameter
slot as an operand to V_INTERP_MOV_F32 instead of hardcoding it there, and
add a special operand class for the parameter slots for type checking and
pretty printing.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 175193
The modifiers don't seem to have any effect with V_MOV_B32, supposedly it's
meant to just move bits untouched.
Fixes 46 piglit tests with radeonsi, though unfortunately 11 of those had
just regressed because they started using the clamp modifier.
NOTE: This is a candidate for the Mesa stable branch.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 174890
v1i32, v2i32, v8i32 and v16i32.
Only add VGPR register classes for integer vector types, to avoid attempts
copying from VGPR to SGPR registers, which is not possible.
Patch By: Michel Dänzer
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 174632
Some instructions like memory reads/writes are executed
asynchronously, so we need to insert S_WAITCNT instructions
to block before accessing their results. Previously we have
just inserted S_WAITCNT instructions after each async
instruction, this patch fixes this and adds a prober
insertion pass.
Patch by: Christian König
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 172846
We shouldn't insert KILL optimization if we don't have a
kill instruction at all.
Patch by: Christian König
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
llvm-svn: 172845
This patch replaces the control flow handling with a new
pass which structurize the graph before transforming it to
machine instruction. This has a couple of different advantages
and currently fixes 20 piglit tests without a single regression.
It is now a general purpose transformation that could be not
only be used for SI/R6xx, but also for other hardware
implementations that use a form of structurized control flow.
v2: further cleanup, fixes and documentation
Patch by: Christian König
Signed-off-by: Christian König <deathsimple@vodafone.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 170591