Commit Graph

1093 Commits

Author SHA1 Message Date
Nicolai Haehnle 5b50497617 AMDGPU: add +xnack feature
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.

The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15869

llvm-svn: 256794
2016-01-04 23:35:53 +00:00
Tom Stellard 3da5672755 AMDGPU/SI: Move VI SMEM pattern back into VIInstructions.td
Summary: This was accidently moved to CIInstructions.td in r256282

Reviewers: cfang, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15763

llvm-svn: 256775
2016-01-04 20:23:10 +00:00
Nicolai Haehnle e705aadd67 AMDGPU: Avoid assertions after SGPR spilling failed
Summary:
The comment explains it: emitError does not necessarily exit the compilation
process, and then using NoRegister leads to assertions later on.
This generates incorrect code, of course, but the user should know to not use
the result when an error has been emitted.

It would be nice to have a test-case for this inside the LLVM repository,
but llc exits on error. shader-db tests trigger the underlying issue at least
on Tonga.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15826

llvm-svn: 256757
2016-01-04 15:50:01 +00:00
Craig Topper daf2e3ff7a Remove extra forward declarations and scrub includes for all in tree InstPrinters. NFC
llvm-svn: 256427
2015-12-25 22:10:01 +00:00
Matt Arsenault 4339b3ff35 AMDGPU: Fix getRegisterBitWidth for vectors
llvm-svn: 256362
2015-12-24 05:14:55 +00:00
Tom Stellard 5ebdfbe562 AMDGPU/SI: Fix encoding of flat instructions on VI
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15735

llvm-svn: 256360
2015-12-24 03:18:18 +00:00
Tom Stellard 668f793049 AMDGPU/SI: Remove non-existent flat instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15734

llvm-svn: 256357
2015-12-24 02:41:55 +00:00
Changpeng Fang b41574a961 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

  NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256282
2015-12-22 20:55:23 +00:00
Rafael Espindola 4b0d24c00a Revert "AMDGPU/SI: Use flat for global load/store when targeting HSA"
This reverts commit r256273.

It broke CodeGen/AMDGPU/llvm.dbg.value.ll

llvm-svn: 256275
2015-12-22 19:46:44 +00:00
Changpeng Fang 9b8a9be058 AMDGPU/SI: Use flat for global load/store when targeting HSA
Summary:
  For some reason doing executing an MUBUF instruction with the addr64
  bit set and a zero base pointer in the resource descriptor causes
  the memory operation to be dropped when the shader is executed using
  the HSA runtime.

  This kind of MUBUF instruction is commonly used when the pointer is
  stored in VGPRs.  The base pointer field in the resource descriptor
  is set to zero and and the pointer is stored in the vaddr field.

  This patch resolves the issue by only using flat instructions for
  global memory operations when targeting HSA. This is an overly
  conservative fix as all other configurations of MUBUF instructions
  appear to work.

Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15543

llvm-svn: 256273
2015-12-22 19:32:28 +00:00
Tom Stellard 2b65ed306d AMDGPU/SI: Fix encoding for FLAT_SCRATCH registers on VI
Summary:
These register has different encodings on CI and VI, so we add pseudo
FLAT_SCRACTH registers to be used before MC, and subtarget specific
registers to be used by the MC layer.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15661

llvm-svn: 256178
2015-12-21 18:44:27 +00:00
Tom Stellard 9da8620cdb AMDGPU/SI: Change assembly name for flat scratch registers to flat_scratch
This matches what the assembler accepts.

llvm-svn: 256177
2015-12-21 18:44:21 +00:00
Tom Stellard ffc1a5aef7 AMDGPU/SI: Fix implemenation of isSourceOfDivergence() for graphics shaders
Summary:
The analysis of shader inputs was completely wrong.  We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15608

llvm-svn: 256082
2015-12-19 02:54:15 +00:00
Matt Arsenault 2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Nicolai Haehnle 6bcf8b2890 AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg
Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15629

llvm-svn: 256073
2015-12-19 01:36:26 +00:00
Nicolai Haehnle dd58705af6 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

llvm-svn: 256072
2015-12-19 01:16:06 +00:00
Changpeng Fang c9963936e7 AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

    Reviewers: none

    Subscribers: none

    Differential Revision: none

llvm-svn: 256022
2015-12-18 20:04:28 +00:00
Changpeng Fang ef735b74c1 Revert "AMDGPU/SI: Test commit"
This reverts commit a493cb636e0152ad28210934a47c6c44b1437193.

llvm-svn: 256021
2015-12-18 20:04:26 +00:00
Changpeng Fang 7fdf674c2e AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

Reviewers: none

Subscribers: none

Differential Revision: none

llvm-svn: 256020
2015-12-18 19:57:41 +00:00
Tom Stellard caaa3aa07c AMDGPU/SI: Reserve appropriate number of sgprs for flat scratch init.
Reviewers: tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15583

Patch by: Changpeng Fang

llvm-svn: 255908
2015-12-17 17:05:09 +00:00
Nicolai Haehnle 87323da6eb AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.

I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15542

llvm-svn: 255906
2015-12-17 16:46:42 +00:00
Matt Arsenault e05ff15186 AMDGPU: Override getCFInstrCost
The default cost was 0 with the assumption that it is predictable.

llvm-svn: 255796
2015-12-16 18:37:19 +00:00
Tom Stellard 7750f4ed9e AMDGPU/SI: Set the code object work group segment size when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15493

llvm-svn: 255702
2015-12-15 23:15:25 +00:00
Tom Stellard a495307e5e AMDGPU/SI: Set the code objects private segment size when targeting HSA.
Summary: I'm not sure how things worked before without this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15492

llvm-svn: 255692
2015-12-15 22:55:30 +00:00
Tom Stellard 29dd05e92f AMDGPU/SI: Emit constant variables in the .hsatext section when targeting HSA
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15426

llvm-svn: 255689
2015-12-15 22:39:36 +00:00
Tom Stellard a6f24c6565 AMDGPU/SI: Select constant loads with non-uniform addresses to MUBUF instructions
Summary:
We were previously selecting all constant loads to SMRD instructions and legalizing
the SMRDs with non-uniform addresses during the SIFixSGPRCopesPass.

This new solution is more simple and also generates much better code, because
the instruction selector is able to take advantage of all the MUBUF addressing
modes that are legalization pass wasn't able to.

We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15425

llvm-svn: 255672
2015-12-15 20:55:55 +00:00
Tom Stellard dbe374b2c5 AMDGPU/SI: Implement AMDGPUTargetTransformInfo::isSourceOfDivergence()
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15476

llvm-svn: 255661
2015-12-15 18:04:38 +00:00
Tom Stellard 8f307217c3 AMDGPU/SI: Fix bitcast between v2f32 and f64
The radeonsi fp64 support can hit these now that some redundant bitcasts
are folded.

Patch by: Michel Dänzer

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 255657
2015-12-15 17:11:17 +00:00
Tom Stellard 43f52df0b5 AMDGPU/SI: Add llvm.amdgcn.mbcnt.* intrinsics
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15475

llvm-svn: 255652
2015-12-15 17:02:52 +00:00
Tom Stellard ad7d03daa6 AMDGPU/SI: Add llvm.amdgcn.v.interp.p[12] intrinsics
Summary:
These are meant to be used instead of the llvm.SI.fs.interp intrinsic which
will be deprecated at some point.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15474

llvm-svn: 255651
2015-12-15 17:02:49 +00:00
Tom Stellard ac00eb5470 AMDGPU/SI: Add getShaderType() function to Utils/
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15424

llvm-svn: 255650
2015-12-15 16:26:16 +00:00
Krzysztof Parzyszek dac7102874 [Packetizer] Add AliasAnalysis as a parameter to the packetizer
This will make the depedence graph more accurate if an alias analysis
is provided. If nullptr is specified in its place, the behavior will
remain as it is currently.

llvm-svn: 255540
2015-12-14 20:35:13 +00:00
Krzysztof Parzyszek d44a1fd506 Add "const" to function arguments in DFAPacketizer
llvm-svn: 255526
2015-12-14 18:54:44 +00:00
Matt Arsenault d079285e05 AMDGPU: Use generic bitreverse intrinsic
Also fix bug in vector legalization for bitreverse.

llvm-svn: 255512
2015-12-14 17:25:38 +00:00
Matt Arsenault 52a52a564b AMDGPU: Fix splitting vector loads with existing offsets
If the original MMO had an offset, it was dropped.
Also use the correct alignment after adding the new offset.

llvm-svn: 255508
2015-12-14 16:59:40 +00:00
Cong Hou c106989fd5 Normalize MBB's successors' probabilities in several locations.
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places are found by checking if the
sum of successors' probabilities is approximate one in MachineBlockPlacement
pass with some instrumented code (not in this patch).


Differential revision: http://reviews.llvm.org/D15259

llvm-svn: 255455
2015-12-13 09:26:17 +00:00
Matt Arsenault fbd9bbfda3 Start replacing vector_extract/vector_insert with extractelt/insertelt
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.

Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.

Example from AArch64:

def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;

Which is trying to do sext_inreg i8, i8.

llvm-svn: 255359
2015-12-11 19:20:16 +00:00
Tom Stellard c2d654322b AMDGPU/SI: Fix warning introduced by r255204
llvm-svn: 255205
2015-12-10 03:10:46 +00:00
Tom Stellard c93fc11f36 AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

llvm-svn: 255204
2015-12-10 02:13:01 +00:00
Tom Stellard b3c3bda512 AMDGPU/SI: Add support for sgpr and vgpr inline assembly constraints
Summary: The 's' constraint represents sgprs and the 'v' constraint represents vgprs.

Reviewers: arsenm, echristo

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15342

llvm-svn: 255203
2015-12-10 02:12:53 +00:00
Tom Stellard 9760f03757 AMDGPU/SI: Emit constant arrays in the .hsrodata_readonly_agent section
Summary: This is done only when targeting HSA.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13807

llvm-svn: 254587
2015-12-03 03:34:32 +00:00
Tom Stellard 00f2f91af4 AMDGPU/SI: Correctly emit agent global segment variables when targeting HSA
Differential Revision: http://reviews.llvm.org/D14508

llvm-svn: 254540
2015-12-02 19:47:57 +00:00
Tom Stellard e928533dae AMDGPU: Fix msan test failure
llvm-svn: 254527
2015-12-02 18:35:23 +00:00
Tom Stellard e3b5aeaf83 AMDGPU/SI: Don't emit group segment global variables
Summary: Only global or readonly segment variables should appear in object files.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15111

llvm-svn: 254519
2015-12-02 17:00:42 +00:00
Matt Arsenault 592d068198 AMDGPU: Error on addrspacecasts that aren't actually implemented
llvm-svn: 254469
2015-12-01 23:04:05 +00:00
Matt Arsenault f9bfeafd00 AMDGPU: Implement isNoopAddrSpaceCast
llvm-svn: 254468
2015-12-01 23:04:00 +00:00
Matt Arsenault 3b15967008 AMDGPU: Disallow flat_scr in SI assembler
llvm-svn: 254459
2015-12-01 20:31:08 +00:00
Matt Arsenault 856d1928a8 AMDGPU: Optimize VOP2 operand legalization
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.

This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.

llvm-svn: 254452
2015-12-01 19:57:17 +00:00
Matt Arsenault e830f5427b AMDGPU: Report extractelement as free in cost model
The cost for scalarized operations is computed as N * (scalar operation
cost + 1 extractelement + 1 insertelement). This partially fixes
inflating the cost of scalarized operations since every operation is
scalarized and free. I don't think we want any cost asociated with
scalarization, but for now insertelement is still counted. I'm not sure
if we should pretend that insertelement is also free, or add a way
to compute a custom scalarization cost.

llvm-svn: 254438
2015-12-01 19:08:39 +00:00
Tom Stellard 38b7cbe3e0 AMDGPU/SI: Remove REGISTER_STORE/REGISTER_LOAD code which is now dead
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15050

llvm-svn: 254427
2015-12-01 17:45:22 +00:00
Tom Stellard ff63c25753 AMDGPU: Use the default strings for data emission directives
Summary:
This makes the assembly output look nicer and there is no reason to
have custom strings for these.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D14671

llvm-svn: 254426
2015-12-01 17:45:17 +00:00
Cong Hou d97c100dc4 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
(This is the second attempt to submit this patch. The first caused two assertion
 failures and was reverted. See https://llvm.org/bugs/show_bug.cgi?id=25687)

The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254377
2015-12-01 05:29:22 +00:00
Hans Wennborg 1dbaf67537 Revert r254348: "Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces."
and the follow-up r254356: "Fix a bug in MachineBlockPlacement that may cause assertion failure during BranchProbability construction."

Asserts were firing in Chromium builds. See PR25687.

llvm-svn: 254366
2015-12-01 03:49:42 +00:00
Matt Arsenault 456fdfcdc2 Squelch unused variable warning in SIRegisterInfo.cpp.
Patch by Justin Lebar

llvm-svn: 254362
2015-12-01 02:14:33 +00:00
Cong Hou fa1917c673 Replace all weight-based interfaces in MBB with probability-based interfaces, and update all uses of old interfaces.
The patch in http://reviews.llvm.org/D13745 is broken into four parts:

1. New interfaces without functional changes (http://reviews.llvm.org/D13908).
2. Use new interfaces in SelectionDAG, while in other passes treat probabilities
as weights (http://reviews.llvm.org/D14361).
3. Use new interfaces in all other passes.
4. Remove old interfaces.

This patch is 3+4 above. In this patch, MBB won't provide weight-based
interfaces any more, which are totally replaced by probability-based ones.
The interface addSuccessor() is redesigned so that the default probability is
unknown. We allow unknown probabilities but don't allow using it together
with known probabilities in successor list. That is to say, we either have a
list of successors with all known probabilities, or all unknown
probabilities. In the latter case, we assume each successor has 1/N
probability where N is the number of successors. An assertion checks if the
user is attempting to add a successor with the disallowed mixed use as stated
above. This can help us catch many misuses.

All uses of weight-based interfaces are now updated to use probability-based
ones.


Differential revision: http://reviews.llvm.org/D14973

llvm-svn: 254348
2015-12-01 00:02:51 +00:00
Matt Arsenault ada6cf1b22 AMDGPU: Fix unused function
llvm-svn: 254333
2015-11-30 21:32:10 +00:00
Matt Arsenault 41003af292 AMDGPU: Error if too many user SGPRs used
llvm-svn: 254332
2015-11-30 21:16:07 +00:00
Matt Arsenault 26f8f3db39 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the progloue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

llvm-svn: 254331
2015-11-30 21:16:03 +00:00
Matt Arsenault ac234b604d AMDGPU: Rename enums to be consistent with HSA code object terminology
llvm-svn: 254330
2015-11-30 21:15:57 +00:00
Matt Arsenault 0e3d38937e AMDGPU: Remove SIPrepareScratchRegs
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

llvm-svn: 254329
2015-11-30 21:15:53 +00:00
Matt Arsenault ff6da2fe89 AMDGPU: Use assert zext for workgroup sizes
llvm-svn: 254328
2015-11-30 21:15:45 +00:00
Matt Arsenault ea03cf2fa1 AMDGPU: Don't reserve SCRATCH_PTR input register
This hasn't been doing anything since using relocations was added.

llvm-svn: 254304
2015-11-30 15:46:47 +00:00
Tom Stellard 48f29f21ee AMDGPU: Add llvm.amdgcn.dispatch.ptr intrinsic
Summary:
This returns a pointer to the dispatch packet, which can be used to load
information about the kernel dispach.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D14898

llvm-svn: 254116
2015-11-26 00:43:29 +00:00
Marek Olsak 7ed6b2f414 AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function

It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.

llvm-svn: 254095
2015-11-25 21:22:45 +00:00
Matt Arsenault 49affb8462 AMDGPU: Check feature attributes in SIMachineFunctionInfo
llvm-svn: 254091
2015-11-25 20:55:12 +00:00
Matt Arsenault 61001bbc03 AMDGPU: Make v2i64/v2f64 legal types.
They can be loaded and stored, so count them as legal. This is
mostly to fix a number of common cases for load/store merging.

llvm-svn: 254086
2015-11-25 19:58:34 +00:00
Artyom Skrobov 314ee04268 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Matt Arsenault ff05da806c AMDGPU: Split LDS vector loads
If properly aligned this could allow using ds_read_b64.

llvm-svn: 253975
2015-11-24 12:18:54 +00:00
Matt Arsenault 4d801cd357 AMDGPU: Split x8 and x16 vector loads instead of scalarize
The one regression in the builtin tests is in the read2 test which now
(again) has many extra copies, but this should be solved once the pass
is replaced with a DAG combine.

llvm-svn: 253974
2015-11-24 12:05:03 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code for which the code needs to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Akira Hatanaka b11ef0897c Reduce the size of MCRelaxableFragment.
MCRelaxableFragment previously kept a copy of MCSubtargetInfo and
MCInst to enable re-encoding the MCInst later during relaxation. A copy
of MCSubtargetInfo (instead of a reference or pointer) was needed
because the feature bits could be modified by the parser.

This commit replaces the MCSubtargetInfo copy in MCRelaxableFragment
with a constant reference to MCSubtargetInfo. The copies of
MCSubtargetInfo are kept in MCContext, and the target parsers are now
responsible for asking MCContext to provide a copy whenever the feature
bits of MCSubtargetInfo have to be toggled.
 
With this patch, I saw a 4% reduction in peak memory usage when I
compiled verify-uselistorder.lto.bc using llc.

rdar://problem/21736951

Differential Revision: http://reviews.llvm.org/D14346

llvm-svn: 253127
2015-11-14 06:35:56 +00:00
Akira Hatanaka bd9fc28444 [MCTargetAsmParser] Move the member varialbes that reference
MCSubtargetInfo in the subclasses into MCTargetAsmParser and define a
member function getSTI.

This is done in preparation for making changes to shrink the size of
MCRelaxableFragment. (see http://reviews.llvm.org/D14346).

llvm-svn: 253124
2015-11-14 05:20:05 +00:00
Tom Stellard afd6e2f3c3 AMDGPU: Add stony support
Patch by: Alex Deucher

llvm-svn: 253053
2015-11-13 17:06:32 +00:00
Tom Stellard 0967c91e0c Revert "Remove unnecessary call to getAllocatableRegClass"
This reverts commit r252565.

This also includes the revert of the commit mentioned below in order to
avoid breaking tests in AMDGPU:

Revert "AMDGPU: Set isAllocatable = 0 on VS_32/VS_64"

This reverts commit r252674.

llvm-svn: 252956
2015-11-12 21:43:25 +00:00
Matt Arsenault 8246d4aead AMDGPU: Print more fields in comments
llvm-svn: 252677
2015-11-11 00:27:46 +00:00
Matt Arsenault 61cb6fa848 AMDGPU: Remove dead code
llvm-svn: 252675
2015-11-11 00:01:36 +00:00
Matt Arsenault 6690d7de39 AMDGPU: Set isAllocatable = 0 on VS_32/VS_64
llvm-svn: 252674
2015-11-11 00:01:32 +00:00
Tom Stellard 41b7e63040 AMDGPU/SI: Refactor VOP[12C] tablegen definitions
Summary:
Pass the VOPProfile object all the through to *_m multiclasses.  This will
allow us to do more simplifications in the future.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13437

llvm-svn: 252339
2015-11-06 20:56:18 +00:00
Matt Arsenault f59e538937 AMDGPU: Cleanup includes
llvm-svn: 252328
2015-11-06 18:23:00 +00:00
Matt Arsenault 0c90e9501e AMDGPU: Create emergency stack slots during frame lowering
Test has a bogus verifier error which will be fixed by later commits.

llvm-svn: 252327
2015-11-06 18:17:45 +00:00
Matt Arsenault 08f14de244 AMDGPU: Remove unused scratch resource operands
The SGPR spill pseudos don't actually use them.

llvm-svn: 252324
2015-11-06 18:07:53 +00:00
Matt Arsenault 3931948bb6 AMDGPU: Add pass to detect used kernel features
Mark kernels that use certain features that require user
SGPRs to support with kernel attributes. We need to know
before instruction selection begins because it impacts
the kernel calling convention lowering.

For now this only detects the workitem intrinsics.

llvm-svn: 252323
2015-11-06 18:01:57 +00:00
Matt Arsenault 4dc7a5a5c6 AMDGPU: Fix hardcoded alignment of spill.
Instead of forcing 4 alignment when spilled, set register class
alignments.

llvm-svn: 252322
2015-11-06 17:54:47 +00:00
Matt Arsenault 623e6fd466 AMDGPU: Hack for VS_32 register pressure
For some reason VS_32 ends up factoring into the pressure heuristics
even though we should never see a virtual register with this class.

When SGPRs are reserved for register spilling, this for some reason
triggers reg-crit scheduling.

Setting isAllocatable = 0 may help with this since that seems to remove
it from the default implementation's generated table.

llvm-svn: 252321
2015-11-06 17:54:43 +00:00
Tom Stellard 1e1b05db24 AMDGPU/SI: Emit HSA kernels with symbol type STT_AMDGPU_HSA_KERNEL
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13804

llvm-svn: 252291
2015-11-06 11:45:14 +00:00
Matt Arsenault 5b22dfa65d AMDGPU: Also track whether SGPRs were spilled
llvm-svn: 252145
2015-11-05 05:27:10 +00:00
Matt Arsenault d41c0dbff0 AMDGPU: Print number user SGPRs
This doesn't quite match how SC prints it, which doesn't put it in a
comment.

llvm-svn: 252144
2015-11-05 05:27:07 +00:00
Matt Arsenault 68802d3177 AMDGPU: Disallow s[102:103] on VI in assembler
llvm-svn: 252142
2015-11-05 03:11:27 +00:00
Matt Arsenault a40450cba2 AMDGPU: Fix assert when legalizing atomic operands
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.

This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.

llvm-svn: 252140
2015-11-05 02:46:56 +00:00
Matt Arsenault bed42a7320 AMDGPU: Make addr64 atomic operand order consistent
vaddr comes before srsrc in every other MUBUF instruction,
and is the order it is printed.

llvm-svn: 252139
2015-11-05 02:46:53 +00:00
Matt Arsenault 6c2e200d38 AMDGPU: Fix typo
llvm-svn: 252116
2015-11-05 01:03:08 +00:00
Matt Arsenault aac9b49325 AMDGPU: Make flat_scratch name consistent
The printed name and the parsed assembler names weren't the same.
I'm not sure which name SC prints these as, but I think it's this one.

llvm-svn: 252010
2015-11-03 22:50:34 +00:00
Matt Arsenault 967c2f5dee AMDGPU: Fix asserts on invalid register ranges
If the requested SGPR was not actually aligned, it was
accepted and rounded down instead of rejected.

Also fix an assert if the range is an invalid size.

llvm-svn: 252009
2015-11-03 22:50:32 +00:00
Matt Arsenault 3473c72aab AMDGPU: Fix off by one error in register parsing
If trying to use one past the end, this would assert.

llvm-svn: 252008
2015-11-03 22:50:27 +00:00
Matt Arsenault e8ed13d946 AMDGPU: s[102:103] is unavailable on VI
llvm-svn: 252000
2015-11-03 22:39:52 +00:00
Matt Arsenault 192b282bf3 AMDGPU: Define correct number of SGPRs
There are actually 104 so 2 were missing.

More assembler tests with high register number tuples
will be included in later patches.

llvm-svn: 251999
2015-11-03 22:39:50 +00:00
Matt Arsenault 6c0674112a AMDGPU: Make findUsedSGPR more readable
Add more comments etc.

llvm-svn: 251996
2015-11-03 22:30:15 +00:00
Matt Arsenault 782c03bb7e AMDGPU: Initialize SIFixSGPRCopies so -print-after works
llvm-svn: 251995
2015-11-03 22:30:13 +00:00
Matt Arsenault d9d659aa23 AMDGPU: Alphabetize includes
llvm-svn: 251994
2015-11-03 22:30:08 +00:00
Matthias Braun 93563e7032 ScheduleDAGInstrs: Remove IsPostRA flag; NFC
ScheduleDAGInstrs doesn't behave differently before or after register
allocation. It was only used in a method of MachineSchedulerBase which
behaved differently in MachineScheduler/PostMachineScheduler. Change
this to let MachineScheduler/PostMachineScheduler just pass in a
parameter to that function.

The order of the LiveIntervals* and bool RemoveKillFlags paramters have
been switched to make out-of-tree code fail instead of unintentionally
passing a value intended for the IsPostRA flag to the (previously
following and default initialized) RemoveKillFlags.

Differential Revision: http://reviews.llvm.org/D14245

llvm-svn: 251883
2015-11-03 01:53:29 +00:00
Matt Arsenault f1aebbf33a AMDGPU: Stop assuming vreg for build_vector
This was causing a variety of test failures when v2i64
is added as a legal type.

SIFixSGPRCopies should correctly handle the case of vector inputs
to a scalar reg_sequence, so this isn't necessary anymore. This
was hiding some deficiencies in how reg_sequence is handled later,
but this shouldn't be a problem anymore since the register class
copy of a reg_sequence is now done before the reg_sequence.

llvm-svn: 251860
2015-11-02 23:30:48 +00:00
Matt Arsenault d48da14269 AMDGPU: Error on graphics shaders with HSA
I've found myself pointlessly debugging problems from running
graphics tests with an HSA triple a few times, so stop this from
happening again.

llvm-svn: 251858
2015-11-02 23:23:02 +00:00
Matt Arsenault 0de924b76d AMDGPU: Distribute SGPR->VGPR copies of REG_SEQUENCE
Make the REG_SEQUENCE be a VGPR, and do the register class
copy first.

llvm-svn: 251855
2015-11-02 23:15:42 +00:00
Marek Olsak 6f6d318e16 AMDGPU/SI: handle undef for llvm.SI.packf16
llvm-svn: 251632
2015-10-29 15:29:09 +00:00
Marek Olsak 74d084f466 AMDGPU/SI: use S_OR for fneg (fabs f32)
llvm-svn: 251631
2015-10-29 15:29:05 +00:00
Marek Olsak f924dd6f3c AMDGPU/SI: use S_AND for i1 trunc
llvm-svn: 251630
2015-10-29 15:05:03 +00:00
Matt Arsenault 2ea0a23f18 AMDGPU: Print modifiers when dumping AMDGPUOperand
llvm-svn: 251160
2015-10-24 00:12:56 +00:00
Matt Arsenault 382557ec72 AMDGPU: Fix parsing of 32-bit literals with sign bit set
llvm-svn: 251132
2015-10-23 18:07:58 +00:00
Matt Arsenault 391be09ef3 AMDGPU: Fix adding redundant m0 uses
BuildMI already adds these since they are defined correctly now.

llvm-svn: 250961
2015-10-21 22:37:51 +00:00
Matt Arsenault e8c0891e42 AMDGPU: Fix verifier error in SIFoldOperands
There may be other use operands that also need their kill flags cleared.

This happens in a few tests when SIFoldOperands is moved after
PeepholeOptimizer.

PeepholeOptimizer rewrites cases that look like:
%vreg0 = ...
%vreg1 = COPY %vreg0
use %vreg1<kill>
%vreg2 = COPY %vreg0
use %vreg2<kill>

to use the earlier source to
%vreg0 = ...
use %vreg0
use %vreg0

Currently SIFoldOperands sees the copied registers, so there is
only one use. So far I haven't managed to come up with a test
that currently has multiple uses of a foldable VGPR -> VGPR copy.

llvm-svn: 250960
2015-10-21 22:37:50 +00:00
Matt Arsenault b6fd98c7d9 AMDGPU: Split DiagnosticInfoUnsupported into its own file
llvm-svn: 250959
2015-10-21 22:37:46 +00:00
Matt Arsenault 6005fcbe12 AMDGPU: Simplify VOP3 operand legalization.
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.

We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers copys are inserted
to the correct operand register class.

For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).

The only possibly tricky case to maybe worry about is if when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand.  However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.

llvm-svn: 250951
2015-10-21 21:51:02 +00:00
Matt Arsenault e223cebd10 AMDGPU: Fix not checking implicit operands in verifyInstruction
When verifying constant bus restrictions, this wasn't catching
uses in implicit operands.

llvm-svn: 250948
2015-10-21 21:15:01 +00:00
Matt Arsenault 3add6439d0 AMDGPU: Add MachineInstr overloads for instruction format tests
llvm-svn: 250797
2015-10-20 04:35:43 +00:00
Matt Arsenault 8f18917a90 AMDGPU: Stop reserving v[254:255]
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.

This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.

llvm-svn: 250794
2015-10-20 03:59:58 +00:00
Craig Topper 2626094fa1 Make a bunch of static arrays const.
llvm-svn: 250642
2015-10-18 05:15:34 +00:00
Artyom Skrobov 63471330d2 Don't pretend AMDGPU backend knows how to custom-lower UDIVREM for vector types; it can't
Reviewers: arsenm, jvesely, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13734

llvm-svn: 250384
2015-10-15 09:18:47 +00:00
Duncan P. N. Exon Smith a73371a9b7 AMDGPU: Remove implicit ilist iterator conversions, NFC
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one.  Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the some
place anyway.

I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators).  Otherwise I would have just
added the conversion.

Even with that, no there should be functionality change here.

llvm-svn: 250218
2015-10-13 20:07:10 +00:00
Matt Arsenault f0d9e47da2 AMDGPU: Refactor isVGPRToSGPRCopy
It should now correctly handle physical registers and make
it easier to identify the other direction.

llvm-svn: 250132
2015-10-13 00:07:54 +00:00
Matt Arsenault 61dc235f20 DAGCombiner: Combine extract_vector_elt from build_vector
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.

InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.

llvm-svn: 250129
2015-10-12 23:59:50 +00:00
Matt Arsenault 8c0ef8b36d AMDGPU: Register some more passes so -print-before works
llvm-svn: 250071
2015-10-12 17:43:59 +00:00
Justin Bogner 468c998031 CodeGen: print and verify after TargetPassConfig::insertPass by default
In r224059, we started verifying after addPass, but missed doing so on
insertPass. There isn't a good reason for the discrepancy, and
skipping the verifier in these cases causes bugs.

This also exposes a verifier error that was introduced in r249087, but
the verifier doesn't run until after the register coalescer, when the
issue happens to have been resolved. I've skipped the verifier after
SIFixSGPRLiveRangesID to avoid the failures for now and will follow up
with Matt for a proper fix.

llvm-svn: 249643
2015-10-08 00:36:22 +00:00
Matt Arsenault fc0ad42516 AMDGPU: Fix missing implicit m0 uses on movrel instructions
llvm-svn: 249577
2015-10-07 17:46:32 +00:00
Matt Arsenault 10e6a61892 AMDGPU: Add comment for VOP2b operand class
Because of the constant bus requirement, it is never legal to
use a literal constant for these instructions despite the encoding
allowing it. This was already doing the right thing, but note why.

llvm-svn: 249500
2015-10-07 01:36:00 +00:00
Matt Arsenault 187276fa94 AMDGPU: Properly register passes
llvm-svn: 249495
2015-10-07 00:42:53 +00:00
Matt Arsenault 284192730a AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.

llvm-svn: 249494
2015-10-07 00:42:51 +00:00
Matt Arsenault 922b7bf808 AMDGPU: Remove inferRegClassFromUses / inferRegClassFromDefs
I'm not sure why this would be necessary, and no tests fail with
them removed. Looking at the uses is suspect as well because
the use reg classes will likely change when the users are moved
as a result of moving this instruction.

llvm-svn: 249493
2015-10-07 00:42:31 +00:00
Tom Stellard 0fbf899c0f AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13367

llvm-svn: 249468
2015-10-06 21:16:34 +00:00
Tom Stellard 88e0b25181 AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcp
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13436

llvm-svn: 249424
2015-10-06 15:57:53 +00:00
Tom Stellard d585cd85a3 AMDGPU/SI: Add a helper for creating aliases for the _e32 instructions
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.

These aliases allow for the automatic matching of instructions
with forced 32-bit encoding.  Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13396

llvm-svn: 249330
2015-10-05 17:57:39 +00:00
Tom Stellard dc9088a10e AMDGPU/SI: Remove unused tablegen multiclass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13395

llvm-svn: 249221
2015-10-03 00:29:50 +00:00
Matt Arsenault d092a068ba AMDGPU/SI: Add verifier check for exec reads
Make sure we aren't accidentally not setting
these in the instruction definitions.

llvm-svn: 249170
2015-10-02 18:58:37 +00:00
Matt Arsenault b733f00510 AMDGPU: Fix unused variable warning in release build
llvm-svn: 249091
2015-10-01 22:40:35 +00:00
Matt Arsenault b87fc22915 AMDGPU: Move SIFixSGPRLiveRanges to be a regalloc pass
Replace LiveInterval usage with LiveVariables. LiveIntervals
computes far more information than is needed for this pass
which just needs to find if an SGPR is live out of the
defining block.

LiveIntervals are not usually available that early, requiring
computing them twice which is very expensive. The extra run of
LiveIntervals/LiveVariables/SlotIndexes was costing in total
about 5% of compile time.

Continuing to use LiveIntervals is problematic. It seems
there is an option (early-live-intervals) to run the analysis
about where it should go to avoid recomputing LiveVariables,
but it seems to be completely broken with subreg liveness
enabled. There are also problems from trying to recompute
LiveIntervals since this seems to undo LiveVariables
and clearing kill flags, causing TwoAddressInstructions
to make bad decisions.

Insert the pass right after live variables and preserve it.
The tricky case to worry about might be phis since
LiveVariables doesn't count a register as live out if
in the successor block it is only used in a phi,
but I don't think this is a concern right now
because SIFixSGPRCopies replaces SGPR phis.

llvm-svn: 249087
2015-10-01 22:10:03 +00:00
Matt Arsenault d2c7589f93 AMDGPU: Merge if and switch
llvm-svn: 249082
2015-10-01 21:51:59 +00:00
Matt Arsenault db7f0ef367 AMDGPU: Remove dead code
There's no point in checking VReg_1 because all uses
of it should already have been removed by SILowerI1Copies.

llvm-svn: 249081
2015-10-01 21:51:57 +00:00
Matt Arsenault d1d499aa56 AMDGPU: Make SIInsertWaits about a factor of 4 faster
This was the slowest target custom pass and was spending 80%
of the time in getMinimalPhysRegClass which was called
for every register operand.

Try to use the statically known register class when possible from
the instruction's MCOperandInfo. There are a few pseudo instructions
which are not well behaved with unknown register classes which still
require the expensive physical register class search.

There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc which can be implicitly used.

llvm-svn: 249079
2015-10-01 21:43:15 +00:00
Tom Stellard e9f8b24985 AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function.  This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

llvm-svn: 249073
2015-10-01 21:16:05 +00:00
Tom Stellard e0e582c9aa AMDGPU: Add MEM_RAT STORE_TYPED.
v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

llvm-svn: 249042
2015-10-01 17:51:34 +00:00
Tom Stellard c0f0fba2c4 AMDGPU: Factor out EOP query.
v2: Fix brace placement and capitalization (Matt).

Patch by: Zoltan Gilian

llvm-svn: 249041
2015-10-01 17:51:29 +00:00
Tom Stellard 1f0e7bbc5b AMDGPU/SI: Re-order PreloadedValue enum and number entries based on init order
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12451

llvm-svn: 248978
2015-10-01 02:02:46 +00:00
Marek Olsak d1a69a2839 AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 248858
2015-09-29 23:37:32 +00:00
Matt Arsenault ba6aae785a AMDGPU: Factor switch into separate function
llvm-svn: 248742
2015-09-28 20:54:57 +00:00
Matt Arsenault 73aa8f687a AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

llvm-svn: 248741
2015-09-28 20:54:52 +00:00
Matt Arsenault e5d042cd56 AMDGPU: Fix moving SMRD loads with literal offsets on CI
llvm-svn: 248740
2015-09-28 20:54:46 +00:00
Matt Arsenault dd49c5fc1b AMDGPU: Fix splitting SMRD with large offset
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting an an instruction missing an operand
which would assert later.

Test will be included in a following commit
which fixes a related issue.

llvm-svn: 248739
2015-09-28 20:54:42 +00:00
Andrew Kaylor 16c4da03d5 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370

llvm-svn: 248735
2015-09-28 20:33:22 +00:00
Matt Arsenault 1d36b717a5 AMDGPU: Remove hasPostISelHook from most instructions
Since this is only needed for VOP3 and a few other special
case instructions, stop setting it on everything.

llvm-svn: 248657
2015-09-26 05:06:48 +00:00
Matt Arsenault f32481372c AMDGPU: Switch over reg class size instead of checking all super classes
This gets isSGPRClass out of my profile of SIFixSGPRCopies.

llvm-svn: 248656
2015-09-26 04:59:04 +00:00
Matt Arsenault 6e28010215 AMDGPU: Don't handle invalid reg classes in helper functions
No tests hit these and it would be better to have checks like
this explicit where they are used.

llvm-svn: 248655
2015-09-26 04:53:30 +00:00
Saleem Abdulrasool 9174623b2d AMDGPU: address -Winconsistent-missing-override
Add missing override.  NFC.

llvm-svn: 248652
2015-09-26 04:34:52 +00:00
Matt Arsenault 8e1ddf84fe AMDGPU: Set CopyCost of register classes
These require multiple mov instructions to copy,
but the default value is that 1 instruction is needed.
I'm not sure if this actually changes anything.

llvm-svn: 248651
2015-09-26 04:09:34 +00:00
Matt Arsenault e98a074c42 AMDGPU: VOP3b definition cleanups
llvm-svn: 248647
2015-09-26 02:25:48 +00:00
Matt Arsenault 86095b8dec AMDGPU: Fix sched model for VOP2b instructions
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

llvm-svn: 248646
2015-09-26 02:25:45 +00:00
Matt Arsenault e229c0c45e AMDGPU: Construct new buffer instruction when moving SMRD
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

llvm-svn: 248627
2015-09-25 22:21:19 +00:00
Tom Stellard e135ffd554 AMDGPU/SI: Use .hsatext section instead of .text for HSA
Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

llvm-svn: 248619
2015-09-25 21:41:28 +00:00
Matt Arsenault f743b838cb AMDGPU: Make getNamedOperandIdx declaration readonly
This matches how it is defined in the generated implementation.

llvm-svn: 248598
2015-09-25 18:09:15 +00:00
Matt Arsenault 0a10900070 AMDGPU: Disable some passes that are not meaningful
Don't run passes related to stack maps, garbage collection,
exceptions since these aren't useful for GPUs.

There might be a few more to turn off that I'm less sure about
(e.g. ShrinkWrapping) or I'm not sure how to disable
(SafeStack and StackProtector)

llvm-svn: 248591
2015-09-25 17:41:20 +00:00
Matt Arsenault 4bf43d4e68 AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG
This fixes a select error when the i64 source was also
bitcasted to v2i32 in the original source.

Instead of awkwardly trying to select the modified source value and
the store, replace before isel begins.

Uses a worklist to avoid possible problems from mutating the DAG,
although it seems to work OK without it.

llvm-svn: 248589
2015-09-25 17:27:08 +00:00
Matt Arsenault 0cb8517dc6 AMDGPU: Fix recomputing dominator tree unnecessarily
SIFixSGPRCopies does not modify the CFG, but this was
being recomputed before running SIFoldOperands.

llvm-svn: 248587
2015-09-25 17:21:28 +00:00
Matt Arsenault 2d6fdb8495 AMDGPU: Re-justify workaround and fix worked around problem
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
actually is actually emitting such a REG_SEQUENCE now.

If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.

llvm-svn: 248585
2015-09-25 17:08:42 +00:00
Matt Arsenault 3ad55ec946 AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources
This avoids needting to re-legalize the new REG_SEQUENCE.

llvm-svn: 248584
2015-09-25 17:08:40 +00:00
Matt Arsenault 6525aa3529 AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos
This was only set on the final _si/_vi version, but not
on the pseudos most of codegen sees.

No test since these instructions aren't used yet.

llvm-svn: 248583
2015-09-25 16:58:27 +00:00
Matt Arsenault 5f70436c49 AMDGPU: Improve accuracy of instruction rates for VOPC
These were all using the default 32-bit VALU write class,
but the i64/f64 compares are half rate.

I'm not sure this is really correct, because they are still using
the write to VALU write class, even though they really write
to the SALU.

llvm-svn: 248582
2015-09-25 16:58:25 +00:00
Matt Arsenault 8aa9973696 AMDGPU: Remove unused includes
llvm-svn: 248553
2015-09-25 00:28:43 +00:00
Matt Arsenault e66621b306 AMDGPU: Add s_dcache_* instructions
llvm-svn: 248533
2015-09-24 19:52:27 +00:00
Matt Arsenault d6adfb401c AMDGPU: Add cache invalidation instructions.
These are necessary for implementing mem_fence for
OpenCL 2.0.

The VI assembler tests are disabled since it seems to be
using the wrong encoding or opcode.

llvm-svn: 248532
2015-09-24 19:52:21 +00:00
Matt Arsenault 68d938649e Introduce target hook for optimizing register copies
Allow a target to do something other than search for copies
that will avoid cross register bank copies.

Implement for SI by only rewriting the most basic copies,
so it should look through anything like a subregister extract.

I'm not entirely satisified with this because it seems like
eliminating a reg_sequence that isn't fully used should work
generically for all targets without them having to override
something. However, it seems to be tricky to have a simple
implementation of this without rewriting to invalid  kinds
of subregister copies on some targets.

I'm not sure if there is currently a generic way to easily check
if a subregister index would be valid for the current use.
The current set of TargetRegisterInfo::get*Class functions don't
quite behave like I would expect (e.g. getSubClassWithSubReg
returns the maximal register class rather than the minimal), so
I'm not sure how to make the generic test keep searching if
SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making
the default implementation to check for simple copies breaks
a variety of ARM and x86 tests by producing illegal subregister uses.

The ARM tests are not actually changed since it should still be using
the same sharesSameRegisterFile implementation, this just relaxes
them to not check for specific registers.

llvm-svn: 248478
2015-09-24 08:36:14 +00:00
Matt Arsenault e068f9a263 AMDGPU: Return after instruction is processed.
llvm-svn: 248476
2015-09-24 07:51:28 +00:00
Matt Arsenault 708586faa2 AMDGPU: Remove another unnecessary check from commuteInstruction
llvm-svn: 248475
2015-09-24 07:51:25 +00:00
Matt Arsenault fa242960fc AMDGPU: Add readonly to InstrMapping functions
llvm-svn: 248474
2015-09-24 07:51:23 +00:00
Matt Arsenault cab64f1c75 AMDGPU: Fix printing trailing whitespace for mubuf atomics
llvm-svn: 248472
2015-09-24 07:51:17 +00:00
Matt Arsenault c8e2ce4046 AMDGPU: Reduce number of copies emitted
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

llvm-svn: 248467
2015-09-24 07:16:37 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 84965031a7 Reformat comment lines.
llvm-svn: 248262
2015-09-22 11:14:12 +00:00
Matt Arsenault f11e7489e1 AMDGPU: Remove unnecessary check
If the instruction doesn't have enough operands, it
either shouldn't be marked as isCommutable or is malformed.

llvm-svn: 248242
2015-09-22 04:17:45 +00:00
Matt Arsenault 85441dd724 AMDGPU: Move copy handling under switch like other instructions
llvm-svn: 248172
2015-09-21 16:27:22 +00:00
Craig Topper 0013be16ff Use makeArrayRef or None to avoid unnecessarily mentioning the ArrayRef type extra times. NFC
llvm-svn: 248140
2015-09-21 05:32:41 +00:00
Craig Topper 4e9b03d6f9 Don't pass StringRefs around by const reference. Pass by value instead per coding standards. NFC
llvm-svn: 248136
2015-09-21 00:18:00 +00:00
Matt Arsenault 1fafdc82d6 AMDGPU: Remove dead code
getCFGStructurizerRegClass is not used for SI, so
move it into R600 specific stuff.

llvm-svn: 248087
2015-09-19 06:41:10 +00:00
Eric Christopher a4e5d3cf8e constify the Function parameter to the TTI creation callback and
propagate to all callers/users/etc.

llvm-svn: 247864
2015-09-16 23:38:13 +00:00
Sanjay Patel a260701bbb propagate fast-math-flags on DAG nodes
After D10403, we had FMF in the DAG but disabled by default. Nick reported no crashing errors after some stress testing, 
so I enabled them at r243687. However, Escha soon notified us of a bug not covered by any in-tree regression tests: 
if we don't propagate the flags, we may fail to CSE DAG nodes because differing FMF causes them to not match. There is
one test case in this patch to prove that point.

This patch hopes to fix or leave a 'TODO' for all of the in-tree places where we create nodes that are FMF-capable. I 
did this by putting an assert in SelectionDAG.getNode() to find any FMF-capable node that was being created without FMF
( D11807 ). I then ran all regression tests and test-suite and confirmed that everything passes.

This patch exposes remaining work to get DAG FMF to be fully functional: (1) add the flags to non-binary nodes such as
FCMP, FMA and FNEG; (2) add the flags to intrinsics; (3) use the flags as conditions for transforms rather than the
current global settings.

Differential Revision: http://reviews.llvm.org/D12095

llvm-svn: 247815
2015-09-16 16:31:21 +00:00
Daniel Sanders 50f17235dd Revert r247692: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Eric has replied and has demanded the patch be reverted.

llvm-svn: 247702
2015-09-15 16:17:27 +00:00
Daniel Sanders 153010c52d Re-commit r247683: Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change. Thanks go to Pavel Labath for fixing LLDB for me.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247692
2015-09-15 14:08:28 +00:00
Daniel Sanders c40de48041 Revert r247684 - Replace Triple with a new TargetTuple ...
LLDB needs to be updated in the same commit.

llvm-svn: 247686
2015-09-15 13:46:21 +00:00
Daniel Sanders 18d4b0dab7 Replace Triple with a new TargetTuple in MCTargetDesc/* and related. NFC.
Summary:
This is the first patch in the series to migrate Triple's (which are ambiguous)
to TargetTuple's (which aren't).

For the moment, TargetTuple simply passes all requests to the Triple object it
holds. Once it has replaced Triple, it will start to implement the interface in
a more suitable way.

This change makes some changes to the public C++ API. In particular,
InitMCSubtargetInfo(), createMCRelocationInfo(), and createMCSymbolizer()
now take TargetTuples instead of Triples. The other public C++ API's have
been left as-is for the moment to reduce patch size.

This commit also contains a trivial patch to clang to account for the C++ API
change.

Reviewers: rengolin

Subscribers: jyknight, dschuff, arsenm, rampitec, danalbert, srhines, javed.absar, dsanders, echristo, emaste, jholewinski, tberghammer, ted, jfb, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D10969

llvm-svn: 247683
2015-09-15 13:17:40 +00:00
Bruce Mitchener e9ffb45b60 Fix typos.
Summary: This fixes a variety of typos in docs, code and headers.

Subscribers: jholewinski, sanjoy, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12626

llvm-svn: 247495
2015-09-12 01:17:08 +00:00
Cong Hou c536bd9e73 Pass BranchProbability/BlockMass by value instead of const& as they are small. NFC.
llvm-svn: 247357
2015-09-10 23:10:42 +00:00
Matt Arsenault e0b44040aa AMDGPU: Simplify debug printing
llvm-svn: 247345
2015-09-10 21:51:19 +00:00
Matt Arsenault 57116cce19 AMDGPU: Use StringRef value
llvm-svn: 247344
2015-09-10 21:51:15 +00:00
Matt Arsenault 80f766a032 AMDGPU/SI: Fix more cases of losing exec operands
llvm-svn: 247230
2015-09-10 01:23:28 +00:00
Matt Arsenault ad46e0c1ab AMDGPU/SI: Fix creating v_mov_b32s without exec uses
This will be caught by existing tests with a
verifier check to be added in a future commit.

llvm-svn: 247229
2015-09-10 01:06:06 +00:00
Matt Arsenault ef67d76869 AMDGPU: Extract full 64-bit subregister and use subregs
Instead of extracting both 32-bit components from the 128-bit
register. This produces fewer copies and is easier for
the copy peephole optimizer to understand and see the actual uses
as extracts from a reg_sequence.

This avoids needing to handle subregister composing in the
PeepholeOptimizer's ValueTracker for this case.

llvm-svn: 247162
2015-09-09 17:03:29 +00:00
Matt Arsenault b5541fb098 AMDGPU: Remove unused multiclass argument
llvm-svn: 247161
2015-09-09 17:03:18 +00:00
Tom Stellard 9a197676b1 AMDGPU/SI: Fold operands through REG_SEQUENCE instructions
Summary:
This helps mostly when we use add instructions for address calculations
that contain immediates.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12256

llvm-svn: 247157
2015-09-09 15:43:26 +00:00
Matt Arsenault d768737454 AMDGPU: Fix not encoding src2 of VOP3b instructions
Broken by r247074. Should include an assembler test,
but the assembler is currently broken for VOP3b apparently.

llvm-svn: 247123
2015-09-09 08:39:49 +00:00
Matt Arsenault acd68b58ae SelectionDAG: Support Expand of f16 extloads
Currently this hits an assert that extload should
always be supported, which assumes integer extloads.

This moves a hack out of SI's argument lowering and
is covered by existing tests.

llvm-svn: 247113
2015-09-09 01:12:27 +00:00
Matt Arsenault 86d336e91b AMDGPU/SI: Fix input vcc operand for VOP2b instructions
Adds vcc to output string input for e32. Allows option
of using e64 encoding with assembler.

Also fixes these instructions not implicitly reading exec.

llvm-svn: 247074
2015-09-08 21:15:00 +00:00
Matt Arsenault 8ac35cd031 AMDGPU: Mark s_barrier as a high latency instruction
These were marked as WriteSALU, which is low latency.
I'm guessing at the value to use, but it should probably
be considered the highest latency instruction.

I'm not sure this has any actual effect since hasSideEffects
probably is preventing any moving of these.

llvm-svn: 247060
2015-09-08 19:54:32 +00:00
Matt Arsenault 8fb810a1d2 AMDGPU: Fix s_barrier flags
This should be convergent. This is not a
barrier in the isBarrier sense, nor
hasCtrlDep.

llvm-svn: 247059
2015-09-08 19:54:25 +00:00
Matt Arsenault 966a94f861 AMDGPU: Handle sub of constant for DS offset folding
sub C, x - > add (sub 0, x), C for DS offsets.

This is mostly to fix regressions that show up when
SeparateConstOffsetFromGEP is enabled.

llvm-svn: 247054
2015-09-08 19:34:22 +00:00
Sanjay Patel ce74db9d8d check for fastness before merging in DAGCombiner::MergeConsecutiveStores()
Use and check the 'IsFast' optional parameter to TLI.allowsMemoryAccess() any time
we have a merged access candidate. Without this patch, we were generating unaligned 
16-byte (SSE) memops for x86 targets where those accesses are slow.

This change was mentioned in:
http://reviews.llvm.org/D10662 and
http://reviews.llvm.org/D10905

and will help solve PR21711.

Differential Revision: http://reviews.llvm.org/D12573

llvm-svn: 246771
2015-09-03 15:03:19 +00:00
Matt Arsenault 51d2d0f668 AMDGPU: Fix adding redundant implicit operands
These are already added during the MachineInstr construction,
so this was adding the implicit registers twice.

llvm-svn: 246525
2015-09-01 02:02:21 +00:00
Matt Arsenault e4d0c142e8 AMDGPU: Add sdst operand to VOP2b instructions
The VOP3 encoding of these allows any SGPR pair for the i1
output, but this was forced before to always use vcc.
This doesn't yet try to use this, but does add the operand
to the definitions so the main change is adding vcc to the
output of the VOP2 encoding.

llvm-svn: 246358
2015-08-29 07:16:50 +00:00
Matt Arsenault 9a32cd3d3b AMDGPU: Set mem operands for spill instructions
llvm-svn: 246357
2015-08-29 06:48:57 +00:00
Matt Arsenault 5c004a7c61 AMDGPU: Fix dropping mem operands when moving to VALU
Without a memory operand, mayLoad or mayStore instructions
are treated as hasUnorderedMemRef, which results in much worse
scheduling.

We really should have a verifier check that any
non-side effecting mayLoad or mayStore has a memory operand.
There are a few instructions (interp and images) which I'm
not sure what / where to add these.

llvm-svn: 246356
2015-08-29 06:48:46 +00:00
Tom Stellard eea72ccbf2 AMDGPU/SI: Fix some invaild assumptions when folding 64-bit immediates
Summary:
We were assuming tha if the use operand had a sub-register that
the immediate was 64-bits, but this was breaking the case of
folding a 64-bit immediate into another 64-bit instruction.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12255

llvm-svn: 246354
2015-08-29 01:58:21 +00:00
Tom Stellard b8ce14c4c3 AMDGPU/SI: Factor operand folding code into its own function
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12254

llvm-svn: 246353
2015-08-28 23:45:19 +00:00
Matt Arsenault 8a067121f8 AMDGPU: Delete dead code
There is no context where s_mov_b64 is emitted
and could potentially be moved to the VALU.
It is currently only emitted for materializing
immediates, which can't be dependent on vector sources.

The immediate splitting is already done when selecting
constants. I'm not sure what contexts if any the register
splitting would have been used before.

Also clean up using s_mov_b64 in place of v_mov_b64_pseudo,
although this isn't required and just skips the extra step
of eliminating the copy from the SReg_64.

llvm-svn: 246080
2015-08-26 20:48:08 +00:00
Matt Arsenault 5e7f95e567 AMDGPU: Don't reprocess instructions when splitting i64 bcnt
llvm-svn: 246079
2015-08-26 20:48:04 +00:00
Matt Arsenault 445833cc91 AMDGPU: Fix not moving users of s_bfe_i64 to VALU
This wouldn't propagate to users of the original BFE
and would hit a verifier error.

llvm-svn: 246078
2015-08-26 20:47:58 +00:00
Matt Arsenault f003c38e1e AMDGPU: Don't create intermediate SALU instructions
When splitting 64-bit operations, create the correct
VALU instructions immediately.

This was splitting things like s_or_b64 into the two
s_or_b32s and then pushing the new instructions
onto the worklist. There's no reason we need
to do this intermediate step.

llvm-svn: 246077
2015-08-26 20:47:50 +00:00
Matt Arsenault 602a16d3db AMDGPU/SI: Report SIFixSGPRLiveRanges changed function
llvm-svn: 246056
2015-08-26 19:12:03 +00:00
Matt Arsenault bd66061db7 AMDGPU: Make sure to reserve super registers
I think this could potentially have broken if
one of the super registers were allocated
that contain v254/v255.

llvm-svn: 246051
2015-08-26 18:54:50 +00:00
Matt Arsenault 19c5488015 AMDGPU: Produce error on dynamic_stackalloc
llvm-svn: 246048
2015-08-26 18:37:13 +00:00
Matt Arsenault 0a3ac1be43 AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM
Although the basic s_load_* instructions happen to use the same
opcode, some of the special case SMRD instructions have
different opcodes.

llvm-svn: 245775
2015-08-22 00:54:31 +00:00
Matt Arsenault e8df879948 AMDGPU: Improve accuracy of instruction rates for some FP instructions
llvm-svn: 245774
2015-08-22 00:50:41 +00:00
Matt Arsenault 33010103b7 AMDGPU: Use DFS to avoid second loop over function
llvm-svn: 245772
2015-08-22 00:43:38 +00:00
Matt Arsenault c8d8e4ed76 AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges
llvm-svn: 245769
2015-08-22 00:19:34 +00:00
Matt Arsenault aba29d6ab1 AMDGPU: Improve debug printing in SIFixSGPRLiveRanges
llvm-svn: 245768
2015-08-22 00:19:25 +00:00
Matt Arsenault 6adf07a92e AMDGPU: Move CI instructions into CIInstructions.td
There are still a couple of CI patterns left in SIInstructions.

llvm-svn: 245767
2015-08-22 00:16:34 +00:00
Matt Arsenault f56872dc30 AMDGPU: Minor cleanups to help with f16 support
The main change is inverting the condition for the
operand class classes so that VT.Size == 16 uses VGPR_32
instead of 64.

llvm-svn: 245764
2015-08-21 23:49:51 +00:00
Tom Stellard bd8a0856e2 AMDGPU/SI: Better handle s_wait insertion
We can wait on either VM, EXP or LGKM.
The waits are independent.

Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.

Here's an example of subtle perf reduction this patch solves:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.

Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.

Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.

Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.

Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
Michael Kuperstein dcdab4cd3a [TLI] Refactor "is integer division cheap" queries.
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree target set it to false), but the interface allows querying
based on the type optimization level.

NFC.

Differential Revision: http://reviews.llvm.org/D12082

llvm-svn: 245430
2015-08-19 11:17:59 +00:00
Matthias Braun d55bcf2646 MachineRegisterInfo: Introduce isPhysRegUsed()
This method checks whether a physical regiser or any of its aliases are
used in the function.

Using this function in SIRegisterInfo::findUnusedReg() should also fix
this reported failure:

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html
http://reviews.llvm.org/rL242173#inline-533

The report doesn't come with a testcase and I don't know enough about
AMDGPU to create one myself.

llvm-svn: 245329
2015-08-18 18:54:27 +00:00
Yaron Keren 178c465223 Add missing include guard.
llvm-svn: 245173
2015-08-16 07:55:08 +00:00
Matt Arsenault 588732bd6e AMDGPU/SI: Only look at live out SGPR defs
When trying to fix SGPR live ranges, skip defs that are
killed in the same block as the def. I don't think
we need to worry about these cases as long as the
live ranges of the SGPRs in dominating blocks are
correct.

This reduces the number of elements the second
loop over the function needs to look at, and makes
it generally easier to understand. The second loop
also only considers if the live range is live
in to a block, which logically means it
must have been live out from another.

llvm-svn: 245150
2015-08-15 02:58:49 +00:00
James Y Knight 5567bafe93 Remove redundant TargetFrameLowering::getFrameIndexOffset virtual
function.

This was the same as getFrameIndexReference, but without the FrameReg
output.

Differential Revision: http://reviews.llvm.org/D12042

llvm-svn: 245148
2015-08-15 02:32:35 +00:00
Matt Arsenault 297ae311ce AMDGPU/SI: Fix printing useless info with amdhsa
The comments at the bottom would all report 0 if
amdhsa was used.

llvm-svn: 245135
2015-08-15 00:12:39 +00:00
Matt Arsenault 0259a7aa41 AMDGPU/SI: Update LiveVariables
This is simple but won't work if/when this pass
is moved to be post-SSA.

llvm-svn: 245134
2015-08-15 00:12:37 +00:00
Matt Arsenault 670ba46efe AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges
Does not mark SlotIndexes as reserved, although I think
that might be OK.

LiveVariables still need to be handled.

llvm-svn: 245133
2015-08-15 00:12:35 +00:00
Matt Arsenault b75233235c AMDGPU: Remove unnecessary assert
These shouldn't ever be null. The number of successors
was already asserted to be 2.

llvm-svn: 245132
2015-08-15 00:12:32 +00:00
Matt Arsenault 4275c29a02 AMDGPU/SI: Make comments more precise.
True branch instructions do behave as expected with liveness.

Avoid the phrasing "branch decision is based on a value in an SGPR"
because this could be misleading. A VALU compare instruction's
result is still based on an SGPR, even though that condition
may be divergent.

llvm-svn: 245131
2015-08-15 00:12:30 +00:00
Tom Stellard bef1094ee7 AMDGPU/SI: Add missing spill class
The compiler was failing to spill for some shaders.

Patch By: Axel Davy

llvm-svn: 245087
2015-08-14 19:46:05 +00:00
Simon Pilgrim 7218251861 [AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation
D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect.

Differential Revision: http://reviews.llvm.org/D12007

llvm-svn: 244960
2015-08-13 21:40:02 +00:00
Yaron Keren 556b21aa10 Remove and forbid raw_svector_ostream::flush() calls.
After r244870 flush() will only compare two null pointers and return,
doing nothing but wasting run time. The call is not required any more
as the stream and its SmallString are always in sync.

Thanks to David Blaikie for reviewing.

llvm-svn: 244928
2015-08-13 18:12:56 +00:00
Matt Arsenault c574686529 AMDGPU: Fix assert on dbg_value instructions
llvm-svn: 244728
2015-08-12 09:04:44 +00:00
Alex Lorenz e40c8a2b26 PseudoSourceValue: Replace global manager with a manager in a machine function.
This commit removes the global manager variable which is responsible for
storing and allocating pseudo source values and instead it introduces a new
manager class named 'PseudoSourceValueManager'. Machine functions now own an
instance of the pseudo source value manager class.

This commit also modifies the 'get...' methods in the 'MachinePointerInfo'
class to construct pseudo source values using the instance of the pseudo
source value manager object from the machine function.

This commit updates calls to the 'get...' methods from the 'MachinePointerInfo'
class in a lot of different files because those calls now need to pass in a
reference to a machine function to those methods.

This change will make it easier to serialize pseudo source values as it will
enable me to transform the mips specific MipsCallEntry PseudoSourceValue
subclass into two target independent subclasses.

Reviewers: Akira Hatanaka
llvm-svn: 244693
2015-08-11 23:09:45 +00:00
Benjamin Kramer df005cbe19 Fix some comment typos.
llvm-svn: 244402
2015-08-08 18:27:36 +00:00
Tom Stellard 30cf77457d AMDGPU/SI: Another attempt to fix Windows bots broken by r244372
llvm-svn: 244383
2015-08-08 01:11:07 +00:00
Matt Arsenault cbd753761a AMDGPU: Implement AMDGPUOperand::print()
llvm-svn: 244381
2015-08-08 00:41:51 +00:00
Matt Arsenault 4635915504 AMDGPU/SI: Remove VCCReg
llvm-svn: 244380
2015-08-08 00:41:48 +00:00
Matt Arsenault 6942d1a034 AMDGPU/SI: Remove source uses of VCCReg
llvm-svn: 244379
2015-08-08 00:41:45 +00:00
Tom Stellard fc70950bf2 AMDGPU/SI: Attempt to fix Windows bots broken by r244372
llvm-svn: 244376
2015-08-08 00:17:59 +00:00
Tom Stellard fd25395c72 AMDGPU: Add pass to lower OpenCL image and sampler arguments.
The pass adds new kernel arguments for image attributes, and
resolves calls to dummy attribute and resource id getter functions.

Patch by: Zoltan Gilian

llvm-svn: 244372
2015-08-07 23:19:30 +00:00
Tom Stellard 8ebad11ee9 AMDGPU/SI: Use InstAlias instead of MnemonicAlias for VOPC instructions
Summary:
With InstAlias, we don't need to print the _e32 portion of the mnemonic
when we print the $dst operand.  This change makes it possible to
include vcc in the asm string when we switch VOPC over to having
implicit vcc defs.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11813

llvm-svn: 244362
2015-08-07 22:00:56 +00:00
Matt Arsenault 711b390a7c AMDGPU: Assume SMRD access for constant address space
Since r243294 these are selected to SMRD and
moved later if required.

llvm-svn: 244354
2015-08-07 20:18:34 +00:00
Tom Stellard c8733e805e AMDGPU/SI: Use correct encoding of vopc for VI in the assembler
Summary: We were using the SI encoding for VI.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11812

llvm-svn: 244332
2015-08-07 16:45:33 +00:00
Tom Stellard 85656cabfb AMDGPU/SI: v_mac_legacy_f32 does not exist on VI
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11810

llvm-svn: 244322
2015-08-07 15:34:30 +00:00
Tom Stellard 11f19f78f0 AMDGPU/SI: Remove unused outs parameter from VOPC TableGen classes
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11809

llvm-svn: 244321
2015-08-07 15:34:27 +00:00
Tom Stellard d488605ed3 AMDGPU/SI: Add Fiji support
Patch by: Alex Deucher

llvm-svn: 244255
2015-08-06 19:43:02 +00:00
Tom Stellard 217361c33f AMDGPU/SI: Add support for 32-bit immediate SMRD offsets on CI
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11604

llvm-svn: 244254
2015-08-06 19:28:38 +00:00
Tom Stellard dee26a2876 AMDGPU/SI: Use ComplexPatterns for SMRD addressing modes
Summary: This allows us to consolidate several of the TableGen patterns.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11602

llvm-svn: 244253
2015-08-06 19:28:30 +00:00
Matt Arsenault 95f0606e62 AMDGPU/SI: Remove EXECReg
For the same reasons as the other physical registers.

llvm-svn: 244062
2015-08-05 16:42:57 +00:00
Matt Arsenault 4c0487bff6 AMDGPU: Remove SCCReg.
These should be handled as a physical register rather
than a virtual register class with one member.

llvm-svn: 244061
2015-08-05 16:42:54 +00:00
Craig Topper e3dcce9700 De-constify pointers to Type since they can't be modified. NFC
This was already done in most places a while ago. This just fixes the ones that crept in over time.

llvm-svn: 243842
2015-08-01 22:20:21 +00:00
Alex Lorenz b4d0d6a345 AMDGPU/SI: Add implicit register operands in the correct order.
This commit fixes a bug in the class 'SIInstrInfo' where the implicit register
machine operands were added to a machine instruction in an incorrect order -
the implicit uses were added before the implicit defs.

I found this bug while working on moving the implicit register operand
verification code from the MIR parser to the machine verifier.

This commit also makes the method 'addImplicitDefUseOperands' in the machine
instruction class public so that it can be reused in the 'SIInstrInfo' class.

Reviewers: Matt Arsenault

Differential Revision: http://reviews.llvm.org/D11689

llvm-svn: 243799
2015-07-31 23:30:09 +00:00
Matt Arsenault e1ce344b5a AMDGPU: Fix v16i32 to v16i8 truncstore
llvm-svn: 243731
2015-07-31 04:12:04 +00:00
Matt Arsenault ba01337942 AMDGPU/SI: Set DwarfRegNum
This requires a fix in tablegen for the cast<int> from bits<16>
to work in the list initializer.

llvm-svn: 243723
2015-07-31 01:12:10 +00:00
Tom Stellard 82325598c3 AMDGPU/SI: Remove unused pattern for f32 constant loads
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11603

llvm-svn: 243719
2015-07-31 01:02:32 +00:00
Matt Arsenault 7a0c3a92c0 AMDGPU: Set SubRegIndex size and offset
I'm not sure what reasons the comment here could have
had for not setting these. Without these set, there is
an assertion hit during DWARF emission.

llvm-svn: 243661
2015-07-30 17:03:11 +00:00
Matt Arsenault b39e858356 AMDGPU: Fix unreachable when emitting binary debug info
Copy implementation of applyFixup from AArch64 with AArch64 bits
ripped out.

Tests will be included with a later commit. Several other
problems must be fixed before binary debug info emission
will work.

llvm-svn: 243660
2015-07-30 17:03:08 +00:00
Tom Stellard 4229aa942d AMDGPU/SI: Simplify moveSMRDToVALU()
Summary:
Replace the switch on instruction opcode with a switch on register size.
This way we don't need to update the switch statement when we add new
SMRD variants.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11601

llvm-svn: 243652
2015-07-30 16:20:42 +00:00
Tom Stellard 9d74076065 AMDGPU/SI: Remove isTriviallyReMaterializable() function from SIInstrInfo
Summary:
This function is never called.  isReallyTriviallyReMaterializable() is
the function that should be implemented instead.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11620

llvm-svn: 243651
2015-07-30 16:20:40 +00:00
Nick Lewycky c3890d2969 Fix typo "fuction" noticed in comments in AssumptionCache.h, and also all the other files that have the same typo. All comments, no functionality change! (Merely a "fuctionality" change.)
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.

llvm-svn: 243585
2015-07-29 22:32:47 +00:00
Alex Lorenz d8a1e542ab Fix broken ArrayRef conversion from r243497.
llvm-svn: 243501
2015-07-28 23:34:27 +00:00
Alex Lorenz ef5c196fb0 MIR Serialization: Serialize the target index machine operands.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 243497
2015-07-28 23:02:45 +00:00
Matt Arsenault 7227cc1a48 AMDGPU: Don't try to use LDS/vector for private if pointer value stored
If the pointer is the store's value operand, this would produce
a broken module. Make sure the use is actually for the pointer operand.

llvm-svn: 243462
2015-07-28 18:47:00 +00:00
Matt Arsenault fdcd39a8ad AMDGPU: Fix crash if called function is a bitcast
getCalledFunction() is null, so this would crash. Replace
crash with an error on unsupported call.

llvm-svn: 243461
2015-07-28 18:29:14 +00:00
Matt Arsenault 916cea5682 AMDGPU: Fix return type of getImplicitParameterOffset.
Patch by Zoltan Gilian <zoltan.gilian@gmail.com>

llvm-svn: 243459
2015-07-28 18:09:55 +00:00
Colin LeMahieu fe2c8b8015 [llvm-mc] Pushing plumbing through for --fatal-warnings flag.
llvm-svn: 243334
2015-07-27 21:56:53 +00:00
Marek Olsak 93df060871 AMDGPU: don't match vgpr loads for constant loads
Author: Dave Airlie <airlied@redhat.com>

In order to implement indirect sampler loads, we don't
want to match on a VGPR load but an SGPR one for constants,
as we cannot feed VGPRs to the sampler only SGPRs.

this should be applicable for llvm 3.7 as well.

llvm-svn: 243294
2015-07-27 18:16:08 +00:00
Marek Olsak 1354b87695 AMDGPU/SI: Fix the V_FRACT_F64 SI bug workaround
This is a candidate for 3.7.

llvm-svn: 243263
2015-07-27 11:37:42 +00:00
Chandler Carruth 96ada25bf3 [PM/AA] Remove all of the dead AliasAnalysis pointers being threaded
through APIs that are no longer necessary now that the update API has
been removed.

This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.

llvm-svn: 242882
2015-07-22 09:52:54 +00:00
Matt Arsenault f849bb49cc AMDGPU: Set isMoveImm on s_movk_i32
llvm-svn: 242747
2015-07-21 00:40:08 +00:00
Tom Stellard 70580f83cc AMDGPU/SI: Add VI patterns to select FLAT instructions for global memory ops
Summary:
The MUBUF addr64 bit has been removed on VI, so we must use FLAT
instructions when the pointer is stored in VGPRs.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11067

llvm-svn: 242673
2015-07-20 14:28:41 +00:00
Simon Pilgrim ba51d116c4 Remove TargetInstrInfo::canFoldMemoryOperand
canFoldMemoryOperand is not actually used anywhere in the codebase - all existing users instead call foldMemoryOperand directly when they wish to fold and can correctly deduce what they need from the return value. 

This patch removes the canFoldMemoryOperand base function and the target implementations; only x86 had a real (bit-rotted) implementation, although AMDGPU had a preparatory stub that had never needed to be completed.

Differential Revision: http://reviews.llvm.org/D11331

llvm-svn: 242638
2015-07-19 10:50:53 +00:00
Tom Stellard 78655fcfdc AMDPGU/SI: Negative offsets aren't allowed in MUBUF's vaddr operand
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11226

llvm-svn: 242434
2015-07-16 19:40:09 +00:00
Tom Stellard c98ee20328 AMDPGU/SI: Use AssertZext node to mask high bit for scratch offsets
Summary:
We can safely assume that the high bit of scratch offsets will never
be set, because this would require at least 128 GB of GPU memory.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11225

llvm-svn: 242433
2015-07-16 19:40:07 +00:00
Tom Stellard ff5efb8c03 AMDGPU/R600: Remove unused variable
This fixes a warning introduced by r242410.

llvm-svn: 242412
2015-07-16 16:13:34 +00:00
Tom Stellard 1d46fb2d2f AMDPGU/R600: Replace llvm_unreachable() call with LLVMContext::emitError()
Summary:
This fixes an issue on MIPS where the infinite-loop-evergreen.ll test
was failing to terminate.

Fixes PR24147.

Reviewers: arsenm, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11260

llvm-svn: 242410
2015-07-16 15:38:29 +00:00
Mehdi Amini e029eae634 Add missing break in switch case in R600ISelLowering
Summary: Catched by coverity.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11120

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 242388
2015-07-16 06:23:12 +00:00
Pete Cooper 65c69407c8 Add allnodes() iterator range to SelectionDAG. NFC.
SelectionDAG already had begin/end methods for iterating over all
the nodes, but didn't define an iterator_range for us in foreach
loops.

This adds such a method and uses it in some of the eligible places
throughout the backends.

llvm-svn: 242212
2015-07-14 22:10:54 +00:00
Matt Arsenault 24692118ba AMDGPU: Avoid using 64-bit shift for i64 (shl x, 32)
This can be done only with moves which theoretically
will optimize better later.

Although this transform increases the instruction count,
it should be code size / cycle count neutral in the worst
VALU case. It also seems to slightly improve a couple
of testcases due to other DAG combines this exposes.

This is probably slightly worse for the SALU case, so
it might be better to handle this during moveToVALU,
although then you lose some simplifications like
the load width reducing in the simple testcase.

llvm-svn: 242177
2015-07-14 18:20:33 +00:00
Matt Arsenault 84db5d97b0 AMDGPU/SI: Fix read2 merging into a super register.
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.

Also remove the assert that offset1 > offset0.

There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.

llvm-svn: 242174
2015-07-14 17:57:36 +00:00
Matthias Braun 9912bb817c MachineRegisterInfo: Remove UsedPhysReg infrastructure
We have a detailed def/use lists for every physical register in
MachineRegisterInfo anyway, so there is little use in maintaining an
additional bitset of which ones are used.

Removing it frees us from extra book keeping. This simplifies
VirtRegMap.

Differential Revision: http://reviews.llvm.org/D10911

llvm-svn: 242173
2015-07-14 17:52:07 +00:00
Tom Stellard e48fe2a27a AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11061

llvm-svn: 242146
2015-07-14 14:15:03 +00:00
Matt Arsenault ca95d44110 AMDGPU: Minor cleanups to always inline pass
llvm-svn: 242053
2015-07-13 19:08:36 +00:00
Tom Stellard db5a11f698 AMDGPU/SI: Select mad patterns to v_mac_f32
The two-address instruction pass will convert these back to v_mad_f32
if necessary.

Differential Revision: http://reviews.llvm.org/D11060

llvm-svn: 242038
2015-07-13 15:47:57 +00:00
Matt Arsenault cf13d18730 AMDGPU: Fix chains for memory ops dependent on argument loads
Most loads and stores are derived from pointers derived from
a kernel argument load inserted during argument lowering.
This was just using the EntryToken chain for the argument loads,
and any users of these loads were also on the EntryToken chain.

Return the chain of the lowered argument load so that dependent loads
end up on the correct chain.

No test since I'm not aware of any case where this actually
broke.

llvm-svn: 241960
2015-07-10 22:51:36 +00:00
Duncan P. N. Exon Smith 754e21f244 MC: Remove MCSubtargetInfo() default constructor
Force all creators of `MCSubtargetInfo` to immediately initialize it,
merging the default constructor and the initializer into an initializing
constructor.  Besides cleaning up the code a little, this makes it clear
that the initializer is never called again later.

Out-of-tree backends need a trivial change: instead of calling:

    auto *X = new MCSubtargetInfo();
    InitXYZMCSubtargetInfo(X, ...);
    return X;

they should call:

    return createXYZMCSubtargetInfoImpl(...);

There's no real functionality change here.

llvm-svn: 241957
2015-07-10 22:43:42 +00:00
Matt Arsenault 0d5197380c AMDGPU: Use requested chain when lowering arguments
No test since I'm not aware of any case where this will
end up being a different chain.

llvm-svn: 241954
2015-07-10 22:28:41 +00:00
Tom Stellard dcb9f0907f AMDGPU: Add helper function for implicit parameter offsets.
Patch by: Zoltan Gilian

llvm-svn: 241861
2015-07-09 21:20:37 +00:00
Matt Arsenault 8b03e6c164 AMDGPU/R600: Return correct chain when lowering loads
The other LowerLOAD should be returning the correct chain.

llvm-svn: 241839
2015-07-09 18:47:03 +00:00
Tom Stellard ab6e9c0f94 AMDGPU/SI: The SIShrinkInstructions pass should only fold immediates with one use
This is convered by existing testcases and will be exposed by a future
commit.

llvm-svn: 241817
2015-07-09 16:30:36 +00:00
Tom Stellard 9ebf7ca2f0 AMDGPU/SI: Fix crash on physical registers in SIInstrInfo::isOperandLegal()
No test case for this.  I ran into it while working on some improvements
to SIShrinkInstructions.cpp.

llvm-svn: 241816
2015-07-09 16:30:27 +00:00
Mehdi Amini eaabc51e78 Re-instate the EVT parameter to getScalarShiftAmountTy() for OOT user
A documentation for this function would be nice by the way.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241807
2015-07-09 15:12:23 +00:00
Mehdi Amini a749f2ad47 Remove getDataLayout() from TargetLowering
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: yaron.keren, rafael, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D11042

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241779
2015-07-09 02:09:52 +00:00
Mehdi Amini 0cdec1e2ab Make isLegalAddressingMode() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11040

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241778
2015-07-09 02:09:40 +00:00
Mehdi Amini 9639d650bb Make TargetLowering::getShiftAmountTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11037

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241776
2015-07-09 02:09:20 +00:00
Mehdi Amini 44ede33a69 Make TargetLowering::getPointerTy() taking DataLayout as an argument
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, ted, yaron.keren, rafael, llvm-commits

Differential Revision: http://reviews.llvm.org/D11028

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241775
2015-07-09 02:09:04 +00:00
Mehdi Amini 5010ebf181 Make TargetTransformInfo keeping a reference to the Module DataLayout
DataLayout is no longer optional. It was initialized with or without
a DataLayout, and the DataLayout when supplied could have been the
one from the TargetMachine.

Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.

Reviewers: echristo

Subscribers: jholewinski, llvm-commits, rafael, yaron.keren

Differential Revision: http://reviews.llvm.org/D11021

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241774
2015-07-09 02:08:42 +00:00
Matt Arsenault db7781c6e9 AMDGPU: Run SIInsertWaits as pre-emit pass
Running this after the scheduler enables scheduling
waits later so other ALU instructions can run while
this would be waiting.

When combined with enabling the post-RA scheduler, this
gives about a ~20% improvement on sgemm.

llvm-svn: 241473
2015-07-06 17:02:20 +00:00
Daniel Sanders f423f5627c Change the last few internal StringRef triples into Triple objects.
Summary:
This concludes the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.

At this point, the StringRef-form of GNU Triples should only be used in the
public API (including IR serialization) and a couple objects that directly
interact with the API (most notably the Module class). The next step is to
replace these Triple objects with the TargetTuple object that will represent
our authoratative/unambiguous internal equivalent to GNU Triples.

Reviewers: rengolin

Subscribers: llvm-commits, jholewinski, ted, rengolin

Differential Revision: http://reviews.llvm.org/D10962

llvm-svn: 241472
2015-07-06 16:56:07 +00:00
Matt Arsenault 706f930b72 AMDGPU/SI: Add debugging subtarget feature for DS offsets
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.

llvm-svn: 241462
2015-07-06 16:01:58 +00:00
Benjamin Kramer 9bfb627a0e [TargetLowering] StringRefize asm constraint getters.
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.

llvm-svn: 241411
2015-07-05 19:29:18 +00:00
Matt Arsenault 24e33d10a0 AMDGPU: Fix indentation of switch
llvm-svn: 241380
2015-07-03 23:33:38 +00:00
Ranjeet Singh 86ecbb7b54 Reverting r241058 because it's causing buildbot failures.
llvm-svn: 241061
2015-06-30 12:32:53 +00:00
Ranjeet Singh 5b119091a1 There are a few places where subtarget features are still
represented by uint64_t, this patch replaces these
usages with the FeatureBitset (std::bitset) type.

Differential Revision: http://reviews.llvm.org/D10542

llvm-svn: 241058
2015-06-30 11:30:42 +00:00
Matt Arsenault 8ebce8f12b AMDGPU/SI: Fix extra space when printing v_div_fmas_*
llvm-svn: 240911
2015-06-28 18:16:14 +00:00
Tom Stellard 4694ed0a14 AMDPGU/SI: Use correct resource descriptors for VI on HSA
Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D10777

llvm-svn: 240841
2015-06-26 21:58:42 +00:00
Tom Stellard ff7416ba06 AMDGPU/SI: Update amd_kernel_code_t definition and add assembler support
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10772

llvm-svn: 240839
2015-06-26 21:58:31 +00:00
Tom Stellard 833ae4fadd AMDGPU/SI: Remove unused variable
This should fix some bots that were broken by r240831.

llvm-svn: 240838
2015-06-26 21:58:26 +00:00
Tom Stellard 91efe9cebe AMDGPU/SI: Set ELF OS/ABI to ELFOSABI_AMDGPU_HSA
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10708

llvm-svn: 240832
2015-06-26 21:15:11 +00:00
Tom Stellard 347ac79b15 AMDGPU/SI: Add hsa code object directives
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10757

llvm-svn: 240831
2015-06-26 21:15:07 +00:00
Tom Stellard b5798b09d3 AMDGPU/SI: There are no implicit kernel args in the amdhsa ABI
Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10706

llvm-svn: 240830
2015-06-26 21:15:03 +00:00
Tom Stellard f151a45ccd AMDGPU/SI: Emit amd_kernel_code_t in EmitFunctionBodyStart()
Summary:
This way the function symbol points to the start of amd_kernel_code_t
rather than the start of the function.

Reviewers: arsenm

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10705

llvm-svn: 240829
2015-06-26 21:14:58 +00:00
Marek Olsak cfbdba2d0b AMDGPU: really don't commute REV opcodes if the target variant doesn't exist
If pseudoToMCOpcode failed, we would return the original opcode, so operands
would be swapped, but the instruction would remain the same.
It resulted in LSHLREV a, b ---> LSHLREV b, a.

This fixes Glamor text rendering and
piglit/arb_sample_shading-builtin-gl-sample-mask on VI.

This is a candidate for stable branches.

v2: the test was simplified by Tom Stellard
llvm-svn: 240824
2015-06-26 20:29:10 +00:00
Benjamin Kramer e61cbd1f3a Replace copy-pasted debug value skipping with MBB::getLastNonDebugInstr
No functional change intended.

llvm-svn: 240639
2015-06-25 13:28:24 +00:00
Alexander Kornienko f00654e31b Revert r240137 (Fixed/added namespace ending comments using clang-tidy. NFC)
Apparently, the style needs to be agreed upon first.

llvm-svn: 240390
2015-06-23 09:49:53 +00:00
Matt Arsenault 0b554ed364 AMDGPU: Use getAsInteger instead of atoi
llvm-svn: 240365
2015-06-23 02:05:55 +00:00
Tom Stellard f0296cee9b R600/SI: Use ELF64 format instead of ELF32
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10392

llvm-svn: 240331
2015-06-22 21:03:54 +00:00
Tom Stellard 3aed34e947 R600: Use EM_AMDGPU for the ELF Machine type
Reviewers: arsenm, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10390

llvm-svn: 240330
2015-06-22 21:03:52 +00:00
Alexander Kornienko 70bc5f1398 Fixed/added namespace ending comments using clang-tidy. NFC
The patch is generated using this command:

tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
  -checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
  llvm/lib/


Thanks to Eugene Kosov for the original patch!

llvm-svn: 240137
2015-06-19 15:57:42 +00:00
Eric Christopher 572e03a396 Fix "the the" in comments.
llvm-svn: 240112
2015-06-19 01:53:21 +00:00
Matt Arsenault 417c93e3c1 AMDGPU: Change unreachable into reported error
llvm-svn: 239943
2015-06-17 20:55:25 +00:00
Sanjoy Das b666ea369c [TargetInstrInfo] Rename getLdStBaseRegImmOfs and implement for x86.
Summary:

TargetInstrInfo::getLdStBaseRegImmOfs to
TargetInstrInfo::getMemOpBaseRegImmOfs and implement for x86.  The
implementation only handles a few easy cases now and will be made more
sophisticated in the future.

This is NFCI: the only user of `getLdStBaseRegImmOfs` (now
`getmemOpBaseRegImmOfs`) is `LoadClusterMotion` and `LoadClusterMotion`
is disabled for x86.

Reviewers: reames, ab, MatzeB, atrick

Reviewed By: MatzeB, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D10199

llvm-svn: 239741
2015-06-15 18:44:14 +00:00
Tom Stellard 104ad064df AMDGPU: s/R600/AMDGPU/ in the Makefiles
Now the library names in the Makefiles match the library names in
LLVMBuild.txt.

This should hopefully fix the remaining bot failures.

llvm-svn: 239661
2015-06-13 05:11:14 +00:00
Tom Stellard 45bb48ea19 R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00
Tom Stellard 1be1aa84ec Revert "AMDGPU: Add core backend files for R600/SI codegen v6"
This reverts commit 4ea70107c5e51230e9e60f0bf58a0f74aa4885ea.

llvm-svn: 160303
2012-07-16 18:19:53 +00:00
Tom Stellard 151dc338e4 Revert "Target/AMDGPU/R600KernelParameters.cpp: Fix two includes, <llvm/IRBuilder.h> and <llvm/TypeBuilder.h>"
This reverts commit 0258a6bdd30802f5cc0e8e57c8e768fde2aef590.

llvm-svn: 160299
2012-07-16 18:19:41 +00:00
Tom Stellard 1bd3012505 Revert "Target/AMDGPU: [CMake] Fix dependencies. 1) Add intrinsics_gen. Add AMDGPUCommonTableGen."
This reverts commit ebc934ba32ee71abbb8f0f2eb6a0fbaa613ba0d2.

llvm-svn: 160298
2012-07-16 18:19:40 +00:00
Tom Stellard 781853e11f Revert "Target/AMDGPU/R600KernelParameters.cpp: Don't use "and", "or" as conditional operator..."
This reverts commit 29f28bc14ad5a907f5dc849f004fafeec0aab33a.

llvm-svn: 160297
2012-07-16 18:19:38 +00:00
Tom Stellard 2e007de42d Revert "Target/AMDGPU/AMDILIntrinsicInfo.cpp: Use llvm_unreachable() in nonreturn function, instead of assert(0)."
This reverts commit 4ba4acc1bc2561b944a571edbb6a2dc78e357dfe.

llvm-svn: 160296
2012-07-16 18:19:37 +00:00
Tom Stellard f65e78b2fa Revert "Target/AMDGPU: Fix includes, or msvc build failed."
This reverts commit fef4aa1b16fcf7a472559abbbcf4c1adc9eb5ca6.

llvm-svn: 160295
2012-07-16 18:19:32 +00:00
NAKAMURA Takumi 96cc5e5bf9 Target/AMDGPU: Fix includes, or msvc build failed.
llvm-svn: 160280
2012-07-16 15:43:50 +00:00
NAKAMURA Takumi dc4261794f Target/AMDGPU/AMDILIntrinsicInfo.cpp: Use llvm_unreachable() in nonreturn function, instead of assert(0).
llvm-svn: 160279
2012-07-16 15:43:09 +00:00
NAKAMURA Takumi 5f5fd8e545 Target/AMDGPU/R600KernelParameters.cpp: Don't use "and", "or" as conditional operator...
llvm-svn: 160278
2012-07-16 15:42:35 +00:00
NAKAMURA Takumi bb42a5e2cf Target/AMDGPU: [CMake] Fix dependencies. 1) Add intrinsics_gen. Add AMDGPUCommonTableGen.
llvm-svn: 160276
2012-07-16 15:09:11 +00:00
NAKAMURA Takumi 3128d26124 Target/AMDGPU/R600KernelParameters.cpp: Fix two includes, <llvm/IRBuilder.h> and <llvm/TypeBuilder.h>
llvm-svn: 160275
2012-07-16 15:08:47 +00:00
Tom Stellard bcce80fa95 AMDGPU: Add core backend files for R600/SI codegen v6
llvm-svn: 160270
2012-07-16 14:17:08 +00:00