Commit Graph

1912 Commits

Author SHA1 Message Date
Tom Stellard ef33c4b3f2 AMDGPU/SI: Emit fixups for long branches
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25366

llvm-svn: 283570
2016-10-07 16:01:18 +00:00
Artem Tamazov 73f1ab28cd [AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3.
Partially fixes Bug 28232.
Lit tests added.

Differential Revision: https://reviews.llvm.org/D25367

llvm-svn: 283567
2016-10-07 15:53:16 +00:00
Sam Kolton a3ec5c10e2 [AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h
Reviewers: artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25084

llvm-svn: 283560
2016-10-07 14:46:06 +00:00
Konstantin Zhuravlyov c09e2d7e46 [AMDGPU] AMDGPUCodeGenPrepare: remove extra ';'
llvm-svn: 283558
2016-10-07 14:39:53 +00:00
Konstantin Zhuravlyov f74fc60a7d [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302

llvm-svn: 283555
2016-10-07 14:22:58 +00:00
Nicolai Haehnle 87bc4c218b AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Peter Collingbourne 2261d78cd2 Target: Remove unused patterns and transforms. NFC.
llvm-svn: 283515
2016-10-07 00:30:49 +00:00
Matt Arsenault 5e63a04e46 AMDGPU: Don't fold undef uses or copies with implicit uses
llvm-svn: 283476
2016-10-06 18:12:13 +00:00
Matt Arsenault c59a92387e AMDGPU: Remove scheduling info from si_mask_branch
llvm-svn: 283475
2016-10-06 18:12:07 +00:00
Matt Arsenault c2ee42cd16 AMDGPU: Remove leftover implicit operands when folding immediates
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.

llvm-svn: 283471
2016-10-06 17:54:30 +00:00
Matt Arsenault 11f7402075 Reapply "AMDGPU: Support using tablegened MC pseudo expansions"
Fix bad merge

llvm-svn: 283470
2016-10-06 17:19:11 +00:00
Matt Arsenault cbc879ee2f Revert "AMDGPU: Support using tablegened MC pseudo expansions"
llvm-svn: 283469
2016-10-06 17:08:01 +00:00
Matt Arsenault d20a2dd7ac AMDGPU: Support using tablegened MC pseudo expansions
Make the necessary refactorings to make use of PseudoInstExpansion

llvm-svn: 283467
2016-10-06 16:56:41 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Sam Kolton 3381d7a216 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Initialize MCObjectFileInfo with some default values.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 283450
2016-10-06 13:46:08 +00:00
Matt Arsenault 10c17ca6c6 AMDGPU: Partially fix reported code size for some instructions
These ones need to have the size on the pseudo instruction set for
getInstSizeInBytes to work correctly. These also have a statically
known size.

llvm-svn: 283437
2016-10-06 10:13:23 +00:00
Konstantin Zhuravlyov b4eb5d5049 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121

llvm-svn: 283415
2016-10-06 02:20:46 +00:00
Matthias Braun 0a6916f303 AMDGPU: Do not re-use tmpreg in spill/restore lowering
The register scavenging code does not support multiple definitions of
the same vreg.

Differential Revision: https://reviews.llvm.org/D25220

llvm-svn: 283369
2016-10-05 20:02:51 +00:00
Matt Arsenault dcf0cfca4c AMDGPU: Refactor indirect vector lowering
Allow inserting multiple instructions in the
expanded loop.

llvm-svn: 283177
2016-10-04 01:41:05 +00:00
Matt Arsenault 283fbc24f6 AMDGPU: Factor SGPR spilling into separate functions
llvm-svn: 283175
2016-10-04 01:14:56 +00:00
Konstantin Zhuravlyov 60a8373778 [AMDGPU] Pass optimization level to SelectionDAGISel
llvm-svn: 283133
2016-10-03 18:47:26 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
Matt Arsenault 4a048a44e5 AMDGPU: Fix typo
llvm-svn: 283108
2016-10-03 13:06:58 +00:00
Volkan Keles 1c38681ae6 Add new target hooks for LoadStoreVectorizer
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D24727

llvm-svn: 283099
2016-10-03 10:31:34 +00:00
Konstantin Zhuravlyov c9786f0d8a [AMDGPU] Remove unused variables from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D25110

llvm-svn: 283087
2016-10-03 04:43:22 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Konstantin Zhuravlyov 836cbff695 [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa version
Differential Revision: https://reviews.llvm.org/D24973

llvm-svn: 282877
2016-09-30 17:01:40 +00:00
Konstantin Zhuravlyov d7bdf24f32 [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction
Differential Revision: https://reviews.llvm.org/D24985

llvm-svn: 282875
2016-09-30 16:50:36 +00:00
Konstantin Zhuravlyov 4658e5f7b0 [AMDGPU] Do not run scalar optimization passes at "-O0"
Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873
2016-09-30 16:39:24 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Konstantin Zhuravlyov da4687c531 [AMDGPU] Enable changing instprinter's behavior based on the per-function
subtarget

This is a prerequisite for coming waitcnt changes

Differential Revision: https://reviews.llvm.org/D24939

llvm-svn: 282489
2016-09-27 14:42:48 +00:00
Tom Stellard 1b9748c6a2 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Sam Kolton 984461062f Revert "[AMDGPU] Disassembler: print label names in branch instructions"
This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550.

llvm-svn: 282396
2016-09-26 11:29:03 +00:00
Sam Kolton 1559f76257 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 282394
2016-09-26 10:05:50 +00:00
Valery Pykhtin fbf2d93f73 [AMDGPU] Fix for bz30427: wrong MTBUF encoding on VI
Differential revision: https://reviews.llvm.org/D24875

llvm-svn: 282296
2016-09-23 21:21:21 +00:00
Valery Pykhtin 355103f6c0 [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions
Differential revision: https://reviews.llvm.org/D24738

llvm-svn: 282234
2016-09-23 09:08:07 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Artem Tamazov 2146a0a90e [AMDGPU][mc] Add support for absolute expressions in DPP modifiers.
Also added range checking for DPP attributes.
Assembler tests added as well.

Differential Revision: https://reviews.llvm.org/D24755

llvm-svn: 282145
2016-09-22 11:47:21 +00:00
Artem Tamazov 2e217b87cb [AMDGPU][mc] Add support for ds_add_[rtn_]f32.
Lit tests added.
Resolves https://github.com/RadeonOpenCompute/hcc/issues/122.

Differential Revision: https://reviews.llvm.org/D24765

llvm-svn: 282086
2016-09-21 16:35:44 +00:00
Tim Northover 862758ec14 GlobalISel: pass Function to lowerFormalArguments directly (NFC).
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).

llvm-svn: 282068
2016-09-21 12:57:35 +00:00
Sam Kolton 12b633beda [AMDGPU] Assembler: remove unused AMDGPUMCObjectWriter.
Summary: It is replaced by AMDGPUELFObjectWriter

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24654

llvm-svn: 282065
2016-09-21 10:33:32 +00:00
Valery Pykhtin e330cfa294 [AMDGPU] Refactor VOP3 instruction TD definitions
Differential revision: https://reviews.llvm.org/D24664

llvm-svn: 281965
2016-09-20 10:41:16 +00:00
Valery Pykhtin 2828b9be1e [AMDGPU] Refactor VOPC instruction TD definitions
Differential Revision: https://reviews.llvm.org/D24546

llvm-svn: 281903
2016-09-19 14:39:49 +00:00
Sam Kolton be7ffb90bf [AMDGPU] Fix s_branch with -1 offset
Summary:
In case s_branch instruction target is itself backend should emit offset -1 but instead it emit 0.
'''
label:
    s_branch label  // should emit [0xff,0xff,0x82,0xbf]
'''

Tom, Matt: why are we adjusting fixup values in applyFixup() method instead of processFixup()? processFixup() is calling adjustFixupValue() but does nothing with its result.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24671

llvm-svn: 281896
2016-09-19 10:20:55 +00:00
Matt Arsenault ac0fc849cf AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault bcfd94c298 AMDGPU: Rename spill operands to match real instruction
llvm-svn: 281823
2016-09-17 15:52:37 +00:00
Matt Arsenault d99ef1144b AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

llvm-svn: 281822
2016-09-17 15:44:16 +00:00
Matt Arsenault 7b1dc2c983 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Tom Stellard 7998db634c AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

llvm-svn: 281789
2016-09-16 22:20:24 +00:00
Matt Arsenault 6408c9135c AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Tom Stellard bbeb45aff6 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault 7ccf6cd104 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Tom Stellard 0b76fc4c77 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Eric Christopher 4367c7fb9a Move the Mangler from the AsmPrinter down to TLOF and clean up the
TLOF API accordingly.

llvm-svn: 281708
2016-09-16 07:33:15 +00:00
Matt Arsenault 1b9fc8ed65 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Matt Arsenault f40b70fa75 Revert "AMDGPU: Use SOPK compare instructions"
Accidentally committed

llvm-svn: 281514
2016-09-14 18:04:42 +00:00
Matt Arsenault f757c87959 AMDGPU: Use SOPK compare instructions
llvm-svn: 281513
2016-09-14 18:03:53 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Matt Arsenault a2b036e88b AArch64: Use TTI branch functions in branch relaxation
The main change is to return the code size from
InsertBranch/RemoveBranch.

Patch mostly by Tim Northover

llvm-svn: 281505
2016-09-14 17:23:48 +00:00
Matt Arsenault 2bc198a333 AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.

llvm-svn: 281491
2016-09-14 15:51:33 +00:00
Matt Arsenault fa5f767a38 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
Matt Arsenault a992f71bef AMDGPU: Remove code I think is dead
As far as I can tell, resolveFrameIndex is supposed to be
called with a legal offset, so inserting an add shouldn't be
necessary.

llvm-svn: 281372
2016-09-13 19:15:25 +00:00
Matt Arsenault 25dba30017 AMDGPU: Support commuting a FrameIndex operand
llvm-svn: 281369
2016-09-13 19:03:12 +00:00
Nicolai Haehnle e58e0e3fe3 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

llvm-svn: 281230
2016-09-12 16:25:20 +00:00
Sam Kolton fb0d9d9c13 [AMDGPU] Assembler: Move disabled SDWA and DPP instruction into Disable asm variant
Summary: This removes disabled instructions from match tables so we will not match them at all.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: wdng, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D24452

llvm-svn: 281216
2016-09-12 14:42:43 +00:00
Duncan P. N. Exon Smith 1872096f1e CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI
Now that MachineBasicBlock::reverse_instr_iterator knows when it's at
the end (since r281168 and r281170), implement
MachineBasicBlock::reverse_iterator directly on top of an
ilist::reverse_iterator by adding an IsReverse template parameter to
MachineInstrBundleIterator.  This replaces another hard-to-reason-about
use of std::reverse_iterator on list iterators, matching the changes for
ilist::reverse_iterator from r280032 (see the "out of scope" section at
the end of that commit message).  MachineBasicBlock::reverse_iterator
now has a handle to the current node and has obvious invalidation
semantics.

r280032 has a more detailed explanation of how list-style reverse
iterators (invalidated when the pointed-at node is deleted) are
different from vector-style reverse iterators like std::reverse_iterator
(invalidated on every operation).  A great motivating example is this
commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.

Note: If your out-of-tree backend deletes instructions while iterating
on a MachineBasicBlock::reverse_iterator or converts between
MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator,
you'll need to update your code in similar ways to r280032.  The
following table might help:

                  [Old]              ==>             [New]
        delete &*RI, RE = end()                   delete &*RI++
        RI->erase(), RE = end()                   RI++->erase()
      reverse_iterator(I)                 std::prev(I).getReverse()
      reverse_iterator(I)                          ++I.getReverse()
    --reverse_iterator(I)                            I.getReverse()
      reverse_iterator(std::next(I))                 I.getReverse()
                RI.base()                std::prev(RI).getReverse()
                RI.base()                         ++RI.getReverse()
              --RI.base()                           RI.getReverse()
     std::next(RI).base()                           RI.getReverse()

(For more details, have a look at r280032.)

llvm-svn: 281172
2016-09-11 18:51:28 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Valery Pykhtin b66e5eb612 [AMDGPU] Refactor MUBUF/MTBUF instructions
Differential revision: https://reviews.llvm.org/D24295

llvm-svn: 281137
2016-09-10 13:09:16 +00:00
Matt Arsenault 3354f42ae7 AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex
llvm-svn: 281128
2016-09-10 01:20:33 +00:00
Matt Arsenault 7348a7eadd AMDGPU: Fix scheduling info for spill pseudos
These defaulted to Write32Bit. I don't think this actually matters
since these don't exist during scheduling.

llvm-svn: 281127
2016-09-10 01:20:28 +00:00
Matt Arsenault 124384f08d AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

llvm-svn: 281117
2016-09-09 23:32:53 +00:00
Matt Arsenault 0efdd06b22 AMDGPU: Run LoadStoreVectorizer pass by default
llvm-svn: 281112
2016-09-09 22:29:28 +00:00
Wei Ding 06f8d39424 AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type.
Differential Revision: http://reviews.llvm.org/D23700

llvm-svn: 281081
2016-09-09 19:31:51 +00:00
Tom Stellard b2869eb6e9 AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA
Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24405

llvm-svn: 281080
2016-09-09 19:28:00 +00:00
Sam Kolton 1eeb11bfd4 AMDGPU] Assembler: better support for immediate literals in assembler.
Summary:
Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:
    - We parsed fp literal:
        - Instruction expects 64-bit operand:
            - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
                - then we do nothing this literal
            - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
                - report error
            - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
                - If instruction expect fp operand type (f64)
                    - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
                        - If so then do nothing
                    - Else (e.g. v_fract_f64 v[0:1], 3.1415)
                        - report warning that low 32 bits will be set to zeroes and precision will be lost
                        - set low 32 bits of literal to zeroes
                - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
                    - report error as it is unclear how to encode this literal
        - Instruction expects 32-bit operand:
            - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
            - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
                - do nothing
                - Else report error
            - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
    - Parsed binary literal:
        - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
            - do nothing
        - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
            - report error
        - Else, literal is not-inlinable and we are not required to inline it
            - Are high 32 bit of literal zeroes or same as sign bit (32 bit)
                - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
            - Else
                - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
}
'''

This is not working yet:
    - Several tests are failing
    - Problems with predicate methods for inline immediates
    - LLVM generated assembler parts try to select e64 encoding before e32.
More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, artem.tamazov

Differential Revision: https://reviews.llvm.org/D22922

llvm-svn: 281050
2016-09-09 14:44:04 +00:00
Sam Kolton a2e5c88baf [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec
Summary:
Also removed duplicate code from AMDGPUTargetAsmStreamer.
This change only change how amd_kernel_code_t is parsed and printed. No variable names are changed.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D24296

llvm-svn: 281028
2016-09-09 10:08:02 +00:00
Sam Kolton d63d8a7c05 [AMDGPU] Assembler: match e32 VOP instructions before e64.
Summary:
Split assembler match table in 4 tables with assembler variants:

Default - all instructions except VOP3, SDWA and DPP
  - VOP3
  - SDWA
  - DPP
First match Default table then VOP3, SDWA and DPP.

Reviewers:  tstellarAMD, artem.tamazov, vpykhtin

Subscribers: arsenm, wdng, nhaehnle, AMDGPU

Differential Revision: https://reviews.llvm.org/D24252

llvm-svn: 281023
2016-09-09 09:37:51 +00:00
Matt Arsenault d745c28945 AMDGPU: Sign extend constants when splitting them
This will confuse later passes which try to look at the
immediate value and don't truncate first.

llvm-svn: 280974
2016-09-08 17:44:36 +00:00
Matt Arsenault be90f70d3a AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32
llvm-svn: 280972
2016-09-08 17:35:41 +00:00
Matt Arsenault bbb47da8a1 AMDGPU: Support commuting with immediate in src0
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Yaxun Liu 90658fff1b AMDGPU: Remove a useless variable which caused build failure for lld.
llvm-svn: 280841
2016-09-07 18:31:11 +00:00
Yaxun Liu 638914009a AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

llvm-svn: 280829
2016-09-07 17:44:00 +00:00
Matt Arsenault 479ba3aac0 AMDGPU: Make some scalar instructions commutable
llvm-svn: 280784
2016-09-07 06:25:55 +00:00
Matt Arsenault 6cda10c950 Remove unnecessary call to getAllocatableRegClass
This reapplies r252565 and r252674, effectively reverting r252956.

This allows VS_32/VS_64 to be unallocatable like they should be.

llvm-svn: 280783
2016-09-07 06:16:45 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Tom Stellard 2add8a1140 AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies
Summary:
I put this code here, because I want to re-use it in a few other places.
This supersedes some of the immediate folding code we have in SIFoldOperands.
I think the peephole optimizers is probably a better place for folding
immediates into copies, since it does some register coalescing in the same time.

This will also make it easier to transition SIFoldOperands into a smarter pass,
where it looks at all uses of instruction at once to determine the optimal way to
fold operands.  Right now, the pass just considers one operand at a time.

Reviewers: arsenm

Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23402

llvm-svn: 280744
2016-09-06 20:00:26 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Valery Pykhtin 8bc659637c [AMDGPU] Refactor FLAT TD instructions
Differential revision: https://reviews.llvm.org/D24072

llvm-svn: 280655
2016-09-05 11:22:51 +00:00
Matt Arsenault ac42ba8633 AMDGPU: Set sizes of spill pseudos
llvm-svn: 280595
2016-09-03 17:25:44 +00:00
Matt Arsenault 5ffe3e1d93 AMDGPU: Fix adding duplicate implicit exec uses
I'm not sure if this should be considered a bug in
copyImplicitOps or not, but implicit operands that are part
of the static instruction definition should not be copied.

llvm-svn: 280594
2016-09-03 17:25:39 +00:00
Nicolai Haehnle 3bba6a8438 AMDGPU: Reduce the duration of whole-quad-mode
Summary:
This contains two changes that reduce the time spent in WQM, with the
intention of reducing bandwidth required by VMEM loads:

1. Sampling instructions by themselves don't need to run in WQM, only their
   coordinate inputs need it (unless of course there is a dependent sampling
   instruction). The initial scanInstructions step is modified accordingly.

2. When switching back from WQM to Exact, switch back as soon as possible.
   This affects the logic in processBlock.

This should always be a win or at best neutral.

There are also some cleanups (e.g. remove unused ExecExports) and some new
debugging output.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22092

llvm-svn: 280590
2016-09-03 12:26:38 +00:00
Nicolai Haehnle a246dccc26 AMDGPU: Fix an interaction between WQM and polygon stippling
Summary:
This fixes a rare bug in polygon stippling with non-monolithic pixel shaders.

The underlying problem is as follows: the prolog part contains the polygon
stippling sequence, i.e. a kill. The main part then enables WQM based on the
_reduced_ exec mask, effectively undoing most of the polygon stippling.

Since we cannot know whether polygon stippling will be used, the main part
of a non-monolithic shader must always return to exact mode to fix this
problem.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23131

llvm-svn: 280589
2016-09-03 12:26:32 +00:00
Matt Arsenault 2510a31677 AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

llvm-svn: 280584
2016-09-03 06:57:55 +00:00
Jan Vesely ea45746d5a AMDGPU/R600: EXTRACT_VECT_ELT should only bypass BUILD_VECTOR if the vectors have the same number of elements.
Fixes R600 piglit regressions since r280298

Differential Revision: https://reviews.llvm.org/D24174

llvm-svn: 280535
2016-09-02 20:13:19 +00:00
Jan Vesely 00864886f4 AMDGPU/R600: Expand unaligned writes to local and global AS
LOCAL and GLOBAL AS only
PRIVATE needs special treatment

Differential Revision: https://reviews.llvm.org/D23971

llvm-svn: 280526
2016-09-02 19:07:06 +00:00
Yaxun Liu add05a8d95 AMDGPU: Add runtime metadata for pointee alignment of argument.
Add runtime metdata for pointee alignment of pointer type kernel argument. The key is KeyArgPointeeAlign and the value is a 32 bit unsigned integer.

Differential Revision: https://reviews.llvm.org/D24145

llvm-svn: 280399
2016-09-01 18:46:49 +00:00
Changpeng Fang b28fe0307f AMDGPU/SI: MIMG TD Refactoring.
Summary:
 Created a new td file MIMGInstructions.td which contains all definitions
of MIMG related instructions.

Reviewed by:
  kzhuravl, vpykhtin

Differential Revision:
  http://reviews.llvm.org/D24106

llvm-svn: 280385
2016-09-01 17:54:54 +00:00
Valery Pykhtin 1b13886b5f [AMDGPU] Scalar Memory instructions TD refactoring
Differential revision: https://reviews.llvm.org/D23996

llvm-svn: 280349
2016-09-01 09:56:47 +00:00
Matt Arsenault b50eb8dc2b AMDGPU: Fix introducing stack access on unaligned v16i8
llvm-svn: 280298
2016-08-31 21:52:27 +00:00
Matt Arsenault 1d2151781b AMDGPU: Use copy instead of mov during frame lowering
This occurs before RA pseudos are expanded. It's less
code to emit the copy.

llvm-svn: 280297
2016-08-31 21:52:25 +00:00
Matt Arsenault 57bc4324f8 AMDGPU: Refactor frame lowering
This will make future changes easier.

llvm-svn: 280296
2016-08-31 21:52:21 +00:00
Tom Stellard ba5730884b AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is at least 4-byte aligned
Summary: This fixes some OpenCV tests that were broken by libclc commit r276443.

Reviewers: arsenm, jvesely

Subscribers: arsenm, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D24051

llvm-svn: 280274
2016-08-31 18:46:07 +00:00
Nikolay Haustov eba808957e AMDGPU/SI: Handle aliases in AMDGPUAlwaysInlinePass
Summary:
Simply replace usage of aliases to functions with aliasee.
This came up when bitcode linking to builtin library and
calls to aliases not being resolved.

Also made minor improvements to existing test.

Reviewers: tstellarAMD, alex-t, vpykhtin

Subscribers: arsenm, wdng, rampitec

Differential Revision: https://reviews.llvm.org/D24023

llvm-svn: 280221
2016-08-31 11:18:33 +00:00
Matt Arsenault a609e2d5ce AMDGPU: Relax SGPR asm constraint register class
s should be SReg_32 to be as general as possible. This can avoid a copy
from m0.

llvm-svn: 280154
2016-08-30 20:50:08 +00:00
Valery Pykhtin a34fb49f8f [AMDGPU] Refactor SOP instructions TD files.
Differential revision: https://reviews.llvm.org/D23617

llvm-svn: 280101
2016-08-30 15:20:31 +00:00
NAKAMURA Takumi 9720f57a17 SILoadStoreOptimizer.cpp: Fix a warning in r279991. [-Wunused-variable]
llvm-svn: 280075
2016-08-30 11:50:21 +00:00
Jan Vesely 89876673cd AMDGPU/R600: Cleanup DAGCombine
Move SDLoc initialization to comon place.
fall back to AMDGPU version in one place

Differential Revision: https://reviews.llvm.org/D23900

llvm-svn: 280030
2016-08-29 23:21:46 +00:00
Jan Vesely 77ed6af416 AMDGPU/R600: Remove MergeVectorStores from legalization
This is handled by DAGCombiner in a more generic way

Differential Revision: https://reviews.llvm.org/D23970

llvm-svn: 280019
2016-08-29 22:05:06 +00:00
Saleem Abdulrasool 43e5fe3fac AMDGPU: fix mismatch tags, NFC
llvm-svn: 280006
2016-08-29 20:42:07 +00:00
Tom Stellard 0d23ebe888 AMDGPU/SI: Implement a custom MachineSchedStrategy
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

llvm-svn: 279995
2016-08-29 19:42:52 +00:00
Tom Stellard c2ff0eb697 AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
Summary:
The SILoadStoreOptimizer can now look ahead more then one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate.  It also gives more consistent results, since it is no longer
affected by minor scheduling changes.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23814

llvm-svn: 279991
2016-08-29 19:15:22 +00:00
Matt Arsenault b90fc9b3b4 AMDGPU/R600: Fix fixups used for constant arrays
Fixes bug 29289

llvm-svn: 279986
2016-08-29 19:01:48 +00:00
Tom Stellard 5d3f71f721 AMDGPU/SI: Improve register allocation hints for sopk instructions
Summary:
For shrinking SOPK instructions, we were creating a hint to tell the
register allocator to use the register allocated for src0 for the dst
operand as well.  However, this seems to not work sometimes depending
on the order virtual registers are assigned physical registers.

To fix this, I've added a second allocation hint which does the reverse,
asks that the register allocated for dst is used for src0.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23862

llvm-svn: 279968
2016-08-29 13:06:10 +00:00
Tom Stellard 662f330852 AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint()
Summary:
The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
move it before scheduling.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23813

llvm-svn: 279963
2016-08-29 12:05:32 +00:00
Jan Vesely 38814fa2fd AMDGPU/R600: Enable Load combine
Fix and improve tests

Differential Revision: https://reviews.llvm.org/D23899

llvm-svn: 279925
2016-08-27 19:09:43 +00:00
Matt Arsenault a15ea4e217 AMDGPU: Mark sched model complete
Fixes bug 26800

llvm-svn: 279910
2016-08-27 03:39:27 +00:00
Matt Arsenault 71ed8a67e8 AMDGPU: Remove unneeded implicit exec uses/defs
SI_BREAK, SI_IF_BREAK, and SI_ELSE_BREAK do not def exec.
SI_IF_BREAK and SI_ELSE_BREAK do not read it either.

llvm-svn: 279909
2016-08-27 03:00:51 +00:00
Matt Arsenault 2712d4a3d8 AMDGPU: Select mulhi 24-bit instructions
llvm-svn: 279902
2016-08-27 01:32:27 +00:00
Matt Arsenault 22e417956d AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

llvm-svn: 279901
2016-08-27 01:00:37 +00:00
Matt Arsenault e949744474 AMDGPU: Fix sched type for branches
llvm-svn: 279900
2016-08-27 00:51:02 +00:00
Matt Arsenault f98a596954 AMDGPU: Remove register operand from si_mask_branch
It isn't used for anything, and is also misleading since
it could be spilled at the end of the block, so it can't be relied
on. There ends up being a verifier error about using an undefined
register since the spill kills the register.

llvm-svn: 279899
2016-08-27 00:42:21 +00:00
Matt Arsenault 00e102baf4 AMDGPU: Improve error reporting for maximum branch distance
Unfortunately this seems to only help the assembler diagnostic.

llvm-svn: 279895
2016-08-27 00:21:22 +00:00
Tom Stellard e175d8aba5 AMDGPU/SI: Canonicalize offset order for merged DS instructions
Summary:
If the scheduler clusters the loads, then the offsets will be sorted,
but it is possible for the scheduler to scheduler loads together
without out explicitly clustering them, which would give us non-sorted
offsets.

Also, we will want to do this if we move the load/store optimizer before
the scheduler.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23776

llvm-svn: 279870
2016-08-26 21:36:47 +00:00
Tom Stellard 4b5cd87ed3 XXX
llvm-svn: 279868
2016-08-26 21:16:40 +00:00
Tom Stellard 7c463c9168 AMDGPU/SI: Use a better method for determining the largest pressure sets
Summary:
There are a few different sgpr pressure sets, but we only care about
the one which covers all of the sgprs.  We were using hard-coded
register pressure set names to determine the reg set id for the
biggest sgpr set.  However, we were using the wrong name, and this
method is pretty fragile, since the reg pressure set names may
change.

The new method just looks for the pressure set that contains the most
reg units and sets that set as our SGPR pressure set.  We've also
adopted the same technique for determining our VGPR pressure set.

Reviewers: arsenm

Subscribers: MatzeB, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23687

llvm-svn: 279867
2016-08-26 21:16:37 +00:00
Reid Kleckner a5b1eef846 [MC] Move .cv_loc management logic out of MCContext
MCContext already has many tasks, and separating CodeView out from it is
probably a good idea. The .cv_loc tracking was modelled on the DWARF
tracking which lived directly in MCContext.

Removes the inclusion of MCCodeView.h from MCContext.h, so now there are
only 10 build actions while I hack on CodeView support instead of 265.

llvm-svn: 279847
2016-08-26 17:58:37 +00:00
Changpeng Fang 75f0968b39 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements readlane/readfirstlane intrinsics.
TODO: need to define a new register class to consider the case
that the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
Wei Ding 1041a646a9 AMDGPU : Add V_SAD_U32 instruction pattern.
Differential Revision: http://reviews.llvm.org/D23069

llvm-svn: 279629
2016-08-24 14:59:47 +00:00
Matthias Braun 733fe3676c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this patch, hopefully I will get away without any warnings
in the constructor now.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279602
2016-08-24 01:52:46 +00:00
Richard Smith 8c3fbdc6c4 Revert r279564. It introduces undefined behavior (binding a reference to a
dereferenced null pointer) in MachineModuleInfo::MachineModuleInfo that causes
-Werror builds (including several buildbots) to fail.

llvm-svn: 279580
2016-08-23 22:08:27 +00:00
Matthias Braun 90799ce8b2 MachineFunction: Introduce NoPHIs property
I want to compute the SSA property of .mir files automatically in
upcoming patches. The problem with this is that some inputs will be
reported as static single assignment with some passes claiming not to
support SSA form.  In reality though those passes do not support PHI
instructions => Track the presence of PHI instructions separate from the
SSA property.

Differential Revision: https://reviews.llvm.org/D22719

llvm-svn: 279573
2016-08-23 21:19:49 +00:00
Matthias Braun 4c1f1f120c CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
Re-apply this commit with the deletion of a MachineFunction delegated to
a separate pass to avoid use after free when doing this directly in
AsmPrinter.

This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279564
2016-08-23 20:58:29 +00:00
Matthias Braun 7f66202d38 Revert "(HEAD -> master, origin/master, origin/HEAD) CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses"
Reverting while tracking down a use after free.

This reverts commit r279502.

llvm-svn: 279503
2016-08-23 05:17:11 +00:00
Matthias Braun fd936841eb CodeGen: Remove MachineFunctionAnalysis => Enable (Machine)ModulePasses
This patch removes the MachineFunctionAnalysis. Instead we keep a
map from IR Function to MachineFunction in the MachineModuleInfo.

This allows the insertion of ModulePasses into the codegen pipeline
without breaking it because the MachineFunctionAnalysis gets dropped
before a module pass.

Peak memory should stay unchanged without a ModulePass in the codegen
pipeline: Previously the MachineFunction was freed at the end of a codegen
function pipeline because the MachineFunctionAnalysis was dropped; With
this patch the MachineFunction is freed after the AsmPrinter has
finished.

Differential Revision: http://reviews.llvm.org/D23736

llvm-svn: 279502
2016-08-23 03:20:09 +00:00
Matt Arsenault 78fc9daf8d AMDGPU: Split SILowerControlFlow into two pieces
Do most of the lowering in a pre-RA pass. Keep the skip jump
insertion late, plus a few other things that require more
work to move out.

One concern I have is now there may be COPY instructions
which do not have the necessary implicit exec uses
if they will be lowered to v_mov_b32.

This has a positive effect on SGPR usage in shader-db.

llvm-svn: 279464
2016-08-22 19:33:16 +00:00
NAKAMURA Takumi 59a20649c6 Untabify.
llvm-svn: 279408
2016-08-22 00:58:04 +00:00
Tim Shen b5e0f5ac95 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which is NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Michael Kuperstein 2bc3d4d46c [SelectionDAG] Rename fextend -> fpextend, fround -> fpround, frnd -> fround
The names of the tablegen defs now match the names of the ISD nodes.
This makes the world a slightly saner place, as previously "fround" matched
ISD::FP_ROUND and not ISD::FROUND.

Differential Revision: https://reviews.llvm.org/D23597

llvm-svn: 279129
2016-08-18 20:08:15 +00:00
Wei Ding 52bb661dec AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Valery Pykhtin 609c2f8137 [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Justin Bogner cd1d5aaf2e Replace a few more "fall through" comments with LLVM_FALLTHROUGH
Follow up to r278902. I had missed "fall through", with a space.

llvm-svn: 278970
2016-08-17 20:30:52 +00:00
Matt Arsenault d42d58cf21 AMDGPU: Remove dead option
llvm-svn: 278965
2016-08-17 20:07:16 +00:00
Justin Bogner b03fd12cef Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

llvm-svn: 278902
2016-08-17 05:10:15 +00:00
Chandler Carruth 67fc52f067 [PM] Port the always inliner to the new pass manager in a much more
minimal and boring form than the old pass manager's version.

This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:

- Array alloca merging
  - To support the above, bottom-up inlining with careful history
    tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
  Instead, it focuses on inlining functions with that attribute.

The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.

The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.

The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.

One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.

Anyways, hopefully a reasonable starting point for this pass.

Differential Revision: https://reviews.llvm.org/D23299

llvm-svn: 278896
2016-08-17 02:56:20 +00:00
Duncan P. N. Exon Smith db53d99d02 AMDGPU: Avoid looking for the DebugLoc in end()
The end() iterator isn't a safe thing to dereference.  Pass the DebugLoc
into EmitFetchClause and EmitALUClause to avoid it.

llvm-svn: 278873
2016-08-17 00:06:43 +00:00
Konstantin Zhuravlyov e0b87181cf [AMDGPU] Remove duplicate initialization of SIDebuggerInsertNops pass
Differential Revision: https://reviews.llvm.org/D23556

llvm-svn: 278863
2016-08-16 22:30:11 +00:00
Matt Arsenault 7f19298bfa AMDGPU: Remove excessive padding from ImmOp and RegOp.
The structs ImmOp and RegOp are in AArch64AsmParser.cpp (inside
anonymous namespace).
This diff changes the order of fields and removes the excessive padding
(8 bytes).

Patch by Alexander Shaposhnikov

llvm-svn: 278844
2016-08-16 20:28:06 +00:00
Reid Kleckner 229d32abfc [AMDGPU] Give enum an explicit 64-bit type to fix MSVC 2013 failures
Recall that MSVC always gives enums the type 'int', nothing else.  MSVC
2015 does not appear to have this problem anymore.

Clang-cl -Wmicrosoft-enum-value flags this, FWIW, so now I have a true
positive for my warning. :)

llvm-svn: 278762
2016-08-15 23:54:44 +00:00
Jan Vesely 0486f739a4 AMDGPU/R600: Convert buffer id to VTX_READ input
Use patterns instead of multiple instructions
Add buffer id to asm string

https://reviews.llvm.org/D22650

llvm-svn: 278749
2016-08-15 21:38:30 +00:00
Yaxun Liu c7cbd72921 AMDGPU: Update AMDGPURuntimeMetadata.h for enums of address space qualifiers
llvm-svn: 278682
2016-08-15 16:54:25 +00:00
Matt Arsenault 3661e90e71 AMDGPU: Don't fold subregister extracts into tied operands
llvm-svn: 278676
2016-08-15 16:18:36 +00:00
Valery Pykhtin c761675ef4 [AMDGPU] fix failure on printing of non-existing instruction operands.
Differential revision: https://reviews.llvm.org/D23323

llvm-svn: 278665
2016-08-15 10:56:48 +00:00
Matt Arsenault c1ebd82ebe AMDGPU: Fix not estimating MBB operand sizes correctly
llvm-svn: 278590
2016-08-13 01:43:54 +00:00
Matt Arsenault 3cc1e0066d AMDGPU: Fix missing test for addressing mode with odd offsets
Add test if the constant offset looks unaligned.

llvm-svn: 278589
2016-08-13 01:43:51 +00:00
Matt Arsenault 44f6d694b3 AMDGPU/R600: Remove macros
llvm-svn: 278588
2016-08-13 01:43:46 +00:00
Hans Wennborg 0dd9ed1d45 Fix more dereferenced end() iterators after r278532
llvm-svn: 278587
2016-08-13 01:12:49 +00:00
Duncan P. N. Exon Smith f197b1f78f ADT: Remove all ilist_iterator => pointer casts, NFC
Remove all ilist_iterator to pointer casts.  There were two reasons for
casts:

  - Checking for an uninitialized (i.e., null) iterator.  I added
    MachineInstrBundleIterator::isValid() to check for that case.

  - Comparing an iterator against the underlying pointer value while
    avoiding converting the pointer value to an iterator.  This is
    occasionally necessary in MachineInstrBundleIterator, since there is
    an assertion in the constructors that the underlying MachineInstr is
    not bundled (but we don't care about that if we're just checking for
    pointer equality).

To support the latter case, I rewrote the == and != operators for
ilist_iterator and MachineInstrBundleIterator.

  - The implicit constructors now use enable_if to exclude
    const-iterator => non-const-iterator conversions from overload
    resolution (previously it was a compiler error on instantiation, now
    it's SFINAE).

  - The == and != operators are now global (friends), and are not
    templated.

  - MachineInstrBundleIterator has overloads to compare against both
    const_pointer and const_reference.  This avoids the implicit
    conversions to MachineInstrBundleIterator that assert, instead just
    checking the address (and I added unit tests to confirm this).

Notably, the only remaining uses of ilist_iterator::getNodePtrUnchecked
are in ilist.h, and no code outside of ilist*.h directly relies on this
UB end-iterator-to-pointer conversion anymore.  It's still needed for
ilist_*sentinel_traits, but I'll clean that up soon.

llvm-svn: 278478
2016-08-12 05:05:36 +00:00
David Majnemer 562e82945e Use the range variant of find_if instead of unpacking begin/end
No functionality change is intended.

llvm-svn: 278443
2016-08-12 00:18:03 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
Matt Arsenault 18da70dd2d AMDGPU: Remove unused tablegen utilities
llvm-svn: 278414
2016-08-11 21:08:43 +00:00
Wei Ding 70cda07526 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Matt Arsenault 2ffe8fd2ce AMDGPU: Prune includes
llvm-svn: 278391
2016-08-11 19:18:50 +00:00
Matt Arsenault 56684d4538 AMDGPU: Fix crashes on memory functions
llvm-svn: 278369
2016-08-11 17:31:42 +00:00
Matt Arsenault 4b5fc093d0 AMDGPU: Remove custom getSubReg
This was kind of confusing, the subregister
class shouldn't really be necessary.

llvm-svn: 278362
2016-08-11 17:15:32 +00:00
Matt Arsenault 69fd2c1179 AMDGPU: Remove unused tracking of flat instructions
llvm-svn: 278361
2016-08-11 17:15:28 +00:00
Wei Ding 34e1753585 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Valery Pykhtin 82c73bee2b Revert "[AMDGPU] fix failure on printing of non-existing instruction operands."
This reverts revision 278333, newly added test failed.

llvm-svn: 278336
2016-08-11 14:22:05 +00:00
Valery Pykhtin 3048ff6ec3 [AMDGPU] fix failure on printing of non-existing instruction operands.
Differential revision: https://reviews.llvm.org/D23323

llvm-svn: 278333
2016-08-11 13:49:46 +00:00
Tim Northover 406024a108 GlobalISel: implement simple function calls on AArch64.
We're still limited in the arguments we support, but this at least handles the
basic cases.

llvm-svn: 278293
2016-08-10 21:44:01 +00:00
Changpeng Fang fb9c3818dd AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch define and implement amdgcn image intrinsics with sampler.

    1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty,
       and rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsics name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied in the three types, for example,
       if you use v8i32 as resource type, then r128 is 0!

    3. don't expose TFE flag, and other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Matt Arsenault 61f8ba8b79 AMDGPU: s_setpc_b64 should be an indirect branch
llvm-svn: 278278
2016-08-10 19:20:02 +00:00
Matt Arsenault c6b1350039 AMDGPU: Set sizes on control flow pseudos
llvm-svn: 278276
2016-08-10 19:11:51 +00:00
Matt Arsenault f4af802381 AMDGPU: Remove empty file comment
llvm-svn: 278275
2016-08-10 19:11:48 +00:00
Matt Arsenault 11587d97be AMDGPU: Remove unnecessary cast
llvm-svn: 278274
2016-08-10 19:11:45 +00:00
Matt Arsenault 57431c9680 AMDGPU: Change insertion point of si_mask_branch
Insert before the skip branch if one is created.
This is a somewhat more natural placement relative
to the skip branches, and makes it possible to implement
analyzeBranch for skip blocks.

The test changes are mostly due to a quirk where
the block label is not emitted if there is a terminator
that is not also a branch.

llvm-svn: 278273
2016-08-10 19:11:42 +00:00
Matt Arsenault b920e9987d AMDGPU: Use CreateStackObject instead of CreateSpillStackObject
I'm not sure what the difference is, but no other target
uses this for emergency spill slots.

llvm-svn: 278272
2016-08-10 19:11:36 +00:00
Marek Olsak 355a8642b4 AMDGPU/SI: Increase SGPR limit to 96 on Tonga/Iceland
Summary:
This is the setting of the Vulkan closed source driver.

It decreases the max wave count from 10 to 8.

26010 shaders in 14650 tests
Totals:
VGPRS: 829593 -> 808440 (-2.55 %)
Spilled SGPRs: 81878 -> 42226 (-48.43 %)
Spilled VGPRs: 367 -> 358 (-2.45 %)
Scratch VGPRs: 1764 -> 1748 (-0.91 %) dwords per thread
Code Size: 36677864 -> 35923932 (-2.06 %) bytes

There is a massive decrease in SGPR spilling in general and -7.4% spilled
VGPRs for DiRT Showdown (= SGPRs spilled to scratch?)

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23034

llvm-svn: 277867
2016-08-05 21:23:29 +00:00
Yaxun Liu 86c052238a [OpenCL] Add missing tests for getOCLTypeName
Adding missing tests for OCL type names for half, float, double, char, short, long, and unknown.

Patch by Aaron En Ye Shi.

Differential Revision: https://reviews.llvm.org/D22964

llvm-svn: 277759
2016-08-04 19:45:00 +00:00
Alina Sbirlea 6f937b1144 LoadStoreVectorizer: Remove TargetBaseAlign. Keep alignment for stack adjustments.
Summary:
TargetBaseAlign is no longer required since LSV checks if target allows misaligned accesses.
A constant defining a base alignment is still needed for stack accesses where alignment can be adjusted.

Previous patch (D22936) was reverted because tests were failing. This patch also fixes the cause of those failures:
- x86 failing tests either did not have the right target, or the right alignment.
- NVPTX failing tests did not have the right alignment.
- AMDGPU failing test (merge-stores) should allow vectorization with the given alignment but the target info
  considers <3xi32> a non-standard type and gives up early. This patch removes the condition and only checks
  for a maximum size allowed and relies on the next condition checking for %4 for correctness.
  This should be revisited to include 3xi32 as a MVT type (on arsenm's non-immediate todo list).

Note that checking the sizeInBits for a MVT is undefined (leads to an assertion failure),
so we need to create an EVT, hence the interface change in allowsMisaligned to include the Context.

Reviewers: arsenm, jlebar, tstellarAMD

Subscribers: jholewinski, arsenm, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23068

llvm-svn: 277735
2016-08-04 16:38:44 +00:00
Nikolai Bozhenov f679530ba1 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
Matt Arsenault 979902b3ff AMDGPU: fdiv -1, x -> rcp -x
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Nicolai Haehnle 8a482b33fe AMDGPU: Stay in WQM for non-intrinsic stores
Summary:
Two types of stores are possible in pixel shaders: stores to memory that are
explicitly requested at the API level, and stores that are an implementation
detail of register spilling or lowering of arrays.

For the first kind of store, we must ensure that helper pixels have no effect
and hence WQM must be disabled. The second kind of store must always be
executed, because the written value may be loaded again in a way that is
relevant for helper pixels as well -- and there are no externally visible
effects anyway.

This is a candidate for the 3.9 release branch.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22675

llvm-svn: 277504
2016-08-02 19:31:14 +00:00
Nicolai Haehnle bef0e90cf1 AMDGPU: Track physical registers in SIWholeQuadMode
Summary:
There are cases where uniform branch conditions are computed in VGPRs, and
we didn't correctly mark those as WQM.

The stray change in basic-branch.ll is because invoking the LiveIntervals
analysis leads to the detection of a dead register that would otherwise not
be seen at -O0.

This is a candidate for the 3.9 branch, as it fixes a possible hang.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22673

llvm-svn: 277500
2016-08-02 19:17:37 +00:00
Valery Pykhtin 902db3101b [AMDGPU] refactor DS instruction definitions. NFC.
Differential revision: https://reviews.llvm.org/D22522

llvm-svn: 277344
2016-08-01 14:21:30 +00:00
Benjamin Kramer 22ff865a83 [AMDGPU] Fix lifetime of SmallVector temporaries.
Found by asan -fsanitize-address-use-after-scope.

llvm-svn: 277265
2016-07-30 11:31:16 +00:00
Matt Arsenault 749035b7b1 AMDGPU: Fix shouldConvertConstantLoadToIntImm behavior
This should really be true for any immediate, not just
inline ones.

llvm-svn: 277260
2016-07-30 01:40:36 +00:00
Matt Arsenault d2141b6030 AMDGPU: Set s_setpc_b64 as a terminator
llvm-svn: 277259
2016-07-30 01:40:34 +00:00
Matt Arsenault dc744412ad AMDGPU: Remove unused pattern
llvm-svn: 277258
2016-07-30 01:40:30 +00:00
Sjoerd Meijer 0eb96ed0de TargetInstrInfo: add virtual function getInstSizeInBytes
This adds a target hook getInstSizeInBytes to TargetInstrInfo that a lot of
subclasses already implement.

Differential Revision: https://reviews.llvm.org/D22885

llvm-svn: 277126
2016-07-29 08:16:16 +00:00
Changpeng Fang 26fb9d268b AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB.
Differential Revision: http://reviews.llvm.org/D22021

Reviewed by: arsenm

llvm-svn: 277073
2016-07-28 23:01:45 +00:00
Matthias Braun 941a705b7b MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

llvm-svn: 277017
2016-07-28 18:40:00 +00:00
Wei Ding 07e03712d3 AMDGPU : Add intrinsics for compare with the full wavefront result
Differential Revision: http://reviews.llvm.org/D22482

llvm-svn: 276998
2016-07-28 16:42:13 +00:00
Tom Stellard 19f4301099 AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling
Summary:
We were using reserved VGPRs for SGPR spilling and this was causing
some programs with a workgroup size of 1024 to use more than 64
registers, which is illegal.

Reviewers: arsenm, mareko, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22032

llvm-svn: 276980
2016-07-28 14:30:43 +00:00
Nicolai Haehnle 3b572002a2 AMDGPU: add execfix flag to SI_ELSE
Summary:
SI_ELSE is lowered into two parts:

s_or_saveexec_b64 dst, src (at the start of the basic block)

s_xor_b64 exec, exec, dst (at the end of the basic block)

The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction

s_and_b64 exec, exec, s[...]

which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:

s_or_savexec_b64 dst, src

s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow

s_xor_b64 exec, exec, dst

Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.

Finally: It also occured to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit

s_or_saveexec_b64 dst, src

...

s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst

and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions

s_and_b64 vcc, exec, vcc

before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.

In any case, this current patch could help in the short term with the whole
ExecModified business.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22846

llvm-svn: 276972
2016-07-28 11:39:24 +00:00
Matt Arsenault edc7dcb2aa AMDGPU: Turn dead checks into asserts
llvm-svn: 276946
2016-07-28 00:32:05 +00:00
Matt Arsenault fe26775992 AMDGPU: Remove analyzeImmediate
This no longer uses the more complicated classification
of constants.

llvm-svn: 276945
2016-07-28 00:32:02 +00:00
Reid Kleckner 46cb48c74a Remove MCAsmInfo.h include from TargetOptions.h
TargetOptions wants the ExceptionHandling enum. Move that to
MCTargetOptions.h to avoid transitively including Dwarf.h everywhere in
clang. Now you can add a DWARF tag without a full rebuild of clang
semantic analysis.

llvm-svn: 276883
2016-07-27 16:03:57 +00:00
Ahmed Bougacha 6756a2c953 [GlobalISel] Introduce an instruction selector.
And implement it for AArch64, supporting x/w ADD/OR.

Differential Revision: https://reviews.llvm.org/D22373

llvm-svn: 276875
2016-07-27 14:31:55 +00:00
Matt Arsenault e3862cdc93 AMDGPU: Use rcp for fdiv 1, x with fpmath metadata
Using rcp should be OK for safe math usually, so this
should not be replacing the original fdiv.

llvm-svn: 276823
2016-07-26 23:25:44 +00:00
Matt Arsenault c6b69a9114 AMDGPU: Use implicit_def for selecting anyext
llvm-svn: 276819
2016-07-26 23:06:33 +00:00
Matt Arsenault 60a750fa69 AMDGPU/R600: Remove dead custom inserters
The intrinsics for these were removed, so this is dead.

llvm-svn: 276805
2016-07-26 21:03:38 +00:00
Matt Arsenault b06db8f666 AMDGPU: Minor AsmPrinter cleanups
llvm-svn: 276804
2016-07-26 21:03:36 +00:00
Matt Arsenault 52ef4019fd AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

llvm-svn: 276766
2016-07-26 16:45:58 +00:00
Matt Arsenault 32fc527c65 AMDGPU: Add fp legacy instruction intrinsics
This could use some additional optimization work
to use mad/mac legacy.

llvm-svn: 276764
2016-07-26 16:45:45 +00:00
Jan Vesely b64c8925e9 AMDGPU: Remove read_workdim intrinsic
Differential revision: https://reviews.llvm.org/D22732

llvm-svn: 276682
2016-07-25 20:17:02 +00:00
Matt Arsenault 2fa171c43a AMDGPU: Make skip threshold an option
llvm-svn: 276680
2016-07-25 19:48:29 +00:00
Matt Arsenault cdae95bef2 AMDGPU: Delete dead code
llvm-svn: 276675
2016-07-25 19:06:25 +00:00
Joel Jones 373d7d30dd MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC
Some targets, notably AArch64 for ILP32, have different relocation encodings
based upon the ABI. This is an enabling change, so a future patch can use the
ABIName from MCTargetOptions to chose which relocations to use. Tested using
check-llvm.

The corresponding change to clang is in: http://reviews.llvm.org/D16538

Patch by: Joel Jones

Differential Revision: https://reviews.llvm.org/D16213

llvm-svn: 276654
2016-07-25 17:18:28 +00:00
Matt Arsenault b40d8600ca AMDGPU: Delete dead code
This has been dead since r269479

llvm-svn: 276518
2016-07-23 07:07:14 +00:00
Tom Stellard b8253c88b6 Revert "[AMDGPU] Emit read-only data to .rodata for hsa"
This reverts commit r276298.

Data stored in .rodata can have a negative offset from .text, but we
don't support negative values in relocations yet.

This caused a regression in one of the amp conformance tests:
5_Data_Cont/5_2_a_v/5_2_3_m/Assignment/Test.02.01

llvm-svn: 276498
2016-07-22 23:46:40 +00:00
Tim Northover 33b07d6725 GlobalISel: implement legalization pass, with just one transformation.
This adds the actual MachineLegalizeHelper to do the work and a trivial pass
wrapper that legalizes all instructions in a MachineFunction. Currently the
only transformation supported is splitting up a vector G_ADD into one acting on
smaller vectors.

llvm-svn: 276461
2016-07-22 20:03:43 +00:00
Matt Arsenault 3c07c813c0 AMDGPU: Fix groupstaticsize for large LDS
The size can exceed s_movk_i32's limit, and we don't
want to use it this early since it inhibits optimizations.

This should probably be merged to the release branch.

llvm-svn: 276438
2016-07-22 17:01:33 +00:00
Matt Arsenault 8d718dcfda AMDGPU: Add HSA dispatch id intrinsic
llvm-svn: 276437
2016-07-22 17:01:30 +00:00
Matt Arsenault f9245b75c0 AMDGPU: Delete more dead code
Remove dead code from r600 intrinsic removal.
Remove unset members, rename StackSize to be less ambiguous.

llvm-svn: 276436
2016-07-22 17:01:25 +00:00
Matt Arsenault 7fb961f3e6 AMDGPU: Fix i1 fp_to_int
R600's i1 fp_to_uint selected but was incorrect according to
what instcombine constant folds to.

llvm-svn: 276435
2016-07-22 17:01:21 +00:00
Matt Arsenault d40ded6681 AMDGPU: Don't reinvent transferSuccessorsAndUpdatePHIs
llvm-svn: 276434
2016-07-22 17:01:15 +00:00
Konstantin Zhuravlyov 3c0d8d22fe [AMDGPU] Emit read-only data to .rodata for hsa
Differential Revision: https://reviews.llvm.org/D22538

llvm-svn: 276298
2016-07-21 15:59:23 +00:00
Konstantin Zhuravlyov 155626238b AMDGPU/SI: Add support for R_AMDGPU_ABS32
Differential Revision: https://reviews.llvm.org/D21646

llvm-svn: 276294
2016-07-21 15:29:19 +00:00
Sam Kolton 6c4efcadfe [AMDGPU] Some code cleaning in SIRegisterInfo.td
Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: https://reviews.llvm.org/D22620

llvm-svn: 276274
2016-07-21 13:29:57 +00:00
Matt Arsenault f0ba86a4d5 AMDGPU: Fix phis from blocks split due to register indexing
llvm-svn: 276257
2016-07-21 09:40:57 +00:00
Yaxun Liu 4b1d9f7f18 AMDGPU: Fix bug causing crash due to invalid opencl version metadata.
Differential Revision: https://reviews.llvm.org/D22526

llvm-svn: 276119
2016-07-20 14:38:06 +00:00
Matt Arsenault a1fe17c9ad AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

llvm-svn: 276051
2016-07-19 23:16:53 +00:00
Davide Italiano 63e5968033 [AMDGPU] Remove spurious line (should've been removed in r276029).
llvm-svn: 276030
2016-07-19 21:16:30 +00:00
Davide Italiano 1576e38598 [AMDGPU] Remove dead code.
LGTM'd by Matt Arsenault.

llvm-svn: 276029
2016-07-19 21:10:49 +00:00
Matt Arsenault 03006fd3c4 AMDGPU: Only use legal inline immediates with kill pseudo
Only if the value is negative or positive is what matters,
so use a constant that doesn't require an instruction to
materialize.

These should really just emit the write exec directly,
but for stick with the kill pseudo-terminator.

llvm-svn: 275988
2016-07-19 16:27:56 +00:00
Matt Arsenault fe358066ea AMDGPU/SI: Fix SI scheduler refcount issue
Without this fix, releaseSuccessors when InOrOutBlock is
false could release SUs outside the schedule BasicBlock.

Patch by Axel Davy

llvm-svn: 275935
2016-07-19 00:35:22 +00:00
Matt Arsenault cb540bc03c AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

llvm-svn: 275934
2016-07-19 00:35:03 +00:00
Matt Arsenault 210b7cf3e2 AMDGPU: Remove pointless dyn_cast_or_null
This is already casted above so non-null

llvm-svn: 275881
2016-07-18 19:00:07 +00:00
Matt Arsenault b51dcb97bb AMDGPU: Fix missing switch case warning
llvm-svn: 275873
2016-07-18 18:40:51 +00:00
Matt Arsenault c96e1deffa AMDGPU: Add intrinsic for s_flbit_i32/v_ffbh_i32
llvm-svn: 275871
2016-07-18 18:35:05 +00:00
Matt Arsenault 4c519d3518 AMDGPU/R600: Replace barrier intrinsics
llvm-svn: 275870
2016-07-18 18:34:59 +00:00
Matt Arsenault efb24540b1 AMDGPU: Remove dead check in AMDGPUPromoteAlloca
This is currently only called with GEP users. A direct
alloca would only happen with current typed pointers
for arrays which are a perverse case.

Also fix crashes on 0 x and 1 x arrays.

llvm-svn: 275869
2016-07-18 18:34:53 +00:00
Matt Arsenault 2e08e181a7 AMDGPU: Remove dead code and redundant check
Non intrinsic calls aren't really handled, and this
IntrinsicInst dyn_cast checks for the function for us.

llvm-svn: 275868
2016-07-18 18:34:48 +00:00
Nicolai Haehnle bef1ceb815 AMDGPU: Disable AMDGPUPromoteAlloca pass for shader calling conventions.
Summary:
The work item intrinsics are not available for the shader
calling conventions. And even if we did hook them up most
shader stages haves some extra restrictions on the amount
of available LDS.

Reviewers: tstellarAMD, arsenm

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D20728

llvm-svn: 275779
2016-07-18 09:02:47 +00:00
Yaxun Liu a711cc7951 Re-commit [AMDGPU] Add metadata for runtime
Attempting to fix lit test failure on ppc.

llvm-svn: 275676
2016-07-16 05:09:21 +00:00
Matt Arsenault 73d2f8954a AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

llvm-svn: 275635
2016-07-15 22:32:02 +00:00
Matt Arsenault a65e6b8335 AMDGPU: Remove brev intrinsic
llvm-svn: 275620
2016-07-15 21:27:13 +00:00
Matt Arsenault 82e5e1e564 AMDGPU: Fix TargetPrefix for remaining r600 intrinsics
llvm-svn: 275619
2016-07-15 21:27:08 +00:00
Matt Arsenault 11d3e21f2b AMDGPU: Remove AMDGPU.ldexp
llvm-svn: 275618
2016-07-15 21:26:56 +00:00
Matt Arsenault 09b2c4aee8 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

llvm-svn: 275617
2016-07-15 21:26:52 +00:00
Matt Arsenault d7f4414543 AMDGPU/R600: Delete dead code.
Dead or the same as the base implementation.

llvm-svn: 275616
2016-07-15 21:26:46 +00:00
Vitaly Buka 7f64844481 Revert "[AMDGPU] Add metadata for runtime"
This reverts commit r275566.

llvm-svn: 275599
2016-07-15 19:14:57 +00:00
Justin Lebar 9c375817ac [SelectionDAG] Get rid of bool parameters in SelectionDAG::getLoad, getStore, and friends.
Summary:
Instead, we take a single flags arg (a bitset).

Also add a default 0 alignment, and change the order of arguments so the
alignment comes before the flags.

This greatly simplifies many callsites, and fixes a bug in
AMDGPUISelLowering, wherein the order of the args to getLoad was
inverted.  It also greatly simplifies the process of adding another flag
to getLoad.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, jyknight, dsanders, nemanjai, llvm-commits

Differential Revision: http://reviews.llvm.org/D22249

llvm-svn: 275592
2016-07-15 18:27:10 +00:00
Yaxun Liu b3d17690eb [AMDGPU] Add metadata for runtime
Added emitting metadata to elf for runtime.

Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.

Differential Revision: https://reviews.llvm.org/D21849

llvm-svn: 275566
2016-07-15 14:58:21 +00:00
Jacques Pienaar 71c30a14b7 Rename AnalyzeBranch* to analyzeBranch*.
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.

Reviewers: tstellarAMD, mcrosier

Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai

Differential Revision: https://reviews.llvm.org/D22409

llvm-svn: 275564
2016-07-15 14:41:04 +00:00
Matt Arsenault b91805ea2b AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

llvm-svn: 275510
2016-07-15 00:58:15 +00:00
Matt Arsenault fa5a86a403 AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550

llvm-svn: 275509
2016-07-15 00:58:13 +00:00
Matt Arsenault 83ab049af2 AMDGPU: Fix splitting kill blocks with defs before kill
llvm-svn: 275508
2016-07-15 00:58:09 +00:00
Sam Kolton 7a2a323feb [AMDGPU] Assembler: fix row_bcast parsing
Summary: This change fix bug 28538

Reviewers: tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: https://reviews.llvm.org/D22355

llvm-svn: 275422
2016-07-14 14:50:35 +00:00
Matt Arsenault ca7f5701f8 AMDGPU/R600: Delete/rename intrinsics no longer used by mesa
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
2016-07-14 05:47:17 +00:00
Matt Arsenault 648e422bd9 AMDGPU/R600: Remove intrinsics with no tests and no users
Mesa removed this path, so nothing is using these anymore.

llvm-svn: 275372
2016-07-14 05:23:23 +00:00
Matt Arsenault 897eee4187 AMDGPU: Remove unused intrinsics
llvm-svn: 275371
2016-07-14 05:23:19 +00:00
Matt Arsenault 0bf9984bc8 AMDGPU: Remove dead code
llvm-svn: 275369
2016-07-14 05:23:08 +00:00
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Marek Olsak 0532c190f7 AMDGPU/SI: Emit the number of SGPR and VGPR spills
Summary:
v2: don't count SGPRs spilled to scratch twice

I think this is sufficient. It doesn't count private memory usage, which
happens often and uses scratch but isn't technically a spill. The private
memory usage can be computed by:
  [scratch_per_thread - vgpr_spills - a random multiple of SGPR spills].

The fact SGPR spills add very high numbers to the scratch size make that
computation a guessing game, but I don't have a solution to that.

Reviewers: tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D22197

llvm-svn: 275288
2016-07-13 17:35:15 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
Matt Arsenault 8dff86d878 AMDGPU: WQM cleanups
- Add new TTI instruction checks
- Don't use const for blocks that are mutated.
- Checking isBranch and isTerminator should be redundant

llvm-svn: 275252
2016-07-13 05:55:15 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Matt Arsenault 10531d1020 AMDGPU: Set isConvergent on v_cmpx* instructions
No test since these aren't used now, except for one place
in a pre-emit pass.

llvm-svn: 275200
2016-07-12 18:41:03 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Matt Arsenault fc7e6a0a0e AMDGPU: Cleanup pseudoinstructions
llvm-svn: 275133
2016-07-12 00:23:17 +00:00
Matt Arsenault 840593e19d AMDGPU: Fix missing scc def on control flow pseudos
These are all expanded to instructions that include an scc def.

llvm-svn: 275132
2016-07-12 00:08:14 +00:00
Matt Arsenault e3742466b9 AMDGPU: Enable trackLivenessAfterRegAlloc
This has caught a number of bugs.

llvm-svn: 275131
2016-07-11 23:56:30 +00:00
Nicolai Haehnle c06bfa1daa AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

llvm-svn: 275113
2016-07-11 21:59:43 +00:00
Nicolai Haehnle f52c3cf272 AMDGPU: fix local stack slot allocation bugs
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21551

llvm-svn: 275108
2016-07-11 21:44:40 +00:00
Nirav Dave 8603062ee4 Fix branch relaxation in 16-bit mode.
Thread through MCSubtargetInfo to relaxInstruction function allowing relaxation
to generate jumps with 16-bit sized immediates in 16-bit mode.

This fixes PR22097.

Reviewers: dwmw2, tstellarAMD, craig.topper, jyknight

Subscribers: jfb, arsenm, jyknight, llvm-commits, dsanders

Differential Revision: http://reviews.llvm.org/D20830

llvm-svn: 275068
2016-07-11 14:23:53 +00:00
Artem Tamazov 53c9de08d2 [AMDGPU][llvm-mc] Quickfix for r272748 to enable labels in branch instructions.
Fixes issue mentioned at:
  https://github.com/RadeonOpenCompute/LLVM-AMDGPU-Assembler-Extra/issues/13.
Lit tests added.

Differential Revision: http://reviews.llvm.org/D22133

llvm-svn: 275054
2016-07-11 12:07:18 +00:00
Jan Vesely 2fa28c330c AMDGPU/R600: Add implicitarg.ptr intrinsic
Differential Revision: http://reviews.llvm.org/D21622

llvm-svn: 275024
2016-07-10 21:20:29 +00:00
Matt Arsenault 52a4d9b429 AMDGPU: Move R600 only pieces into R600 classes
llvm-svn: 274979
2016-07-09 18:11:15 +00:00
Matt Arsenault 48d70cb486 Revert "AMDGPU: Remove unused control flow intrinsic"
llvm-svn: 274978
2016-07-09 17:18:39 +00:00
NAKAMURA Takumi f35424b73f AMDGPU: Prune AMDGPUAsmParser in libdeps.
llvm-svn: 274970
2016-07-09 07:54:27 +00:00
Matt Arsenault dfec5ce032 AMDGPU: Fix fdiv lowering when f32 denormals supported
Also fix test not actually using function labels.

llvm-svn: 274969
2016-07-09 07:48:11 +00:00
Matt Arsenault 1322b6f8bb AMDGPU: Improve offset folding for register indexing
llvm-svn: 274954
2016-07-09 01:13:56 +00:00
Matt Arsenault 95c7897555 AMDGPU: Simplify isSchedulingBoundary
llvm-svn: 274953
2016-07-09 01:13:51 +00:00
Matt Arsenault 8f0a92f0ba AMDGPU: Remove unused control flow intrinsic
llvm-svn: 274939
2016-07-08 21:39:44 +00:00
Duncan P. N. Exon Smith 4d29511894 AMDGPU: Remove implicit iterator conversions, NFC
Remove remaining implicit conversions from MachineInstrBundleIterator to
MachineInstr* from the AMDGPU backend.  In most cases, I made them less
attractive by preferring MachineInstr& or using a ranged-based for loop.

Once all the backends are fixed I'll make the operator explicit so that
this doesn't bitrot back.

llvm-svn: 274906
2016-07-08 19:16:05 +00:00
Duncan P. N. Exon Smith 221847ef63 AMDGPU: Make infinite loop clear, NFC
Change a while loop that was checking for nullptr on an
iterator-to-pointer conversion to an infinite for loop.  Now it's clear
that the condition doesn't terminate.

The only change in behaviour is if an invalid iterator (holding nullptr)
was passed into AMDGPUCFGStructurizer::reversePredicateSetter.  There
are only two callers, and they both dereference the iterator before
sending it in, so rather than adding an early return to avoid the loop
I've just asserted (using a static_cast, to avoid an implicit conversion
to pointer).

llvm-svn: 274902
2016-07-08 19:00:17 +00:00
Matt Arsenault b63f18c9c3 AMDGPU: Minor adjustment to r274817
The commit message is inaccurate, modifiesRegister
will check for partial defs of exec.

We currently don't ever emit partial defs of exec,
so it doesn't really matter.

llvm-svn: 274886
2016-07-08 17:06:48 +00:00
Valery Pykhtin 68853ab2c5 [AMDGPU] fix ds_swizzle_b32 opcode for VI (bz 28371)
Differential Revision: http://reviews.llvm.org/D22049

llvm-svn: 274852
2016-07-08 15:12:46 +00:00
Matt Arsenault a74374a86b AMDGPU: Move si_mask_branch register operand to be a use
llvm-svn: 274818
2016-07-08 00:55:44 +00:00
Matt Arsenault d4a84b1ed2 AMDGPU: Cleanup. Use definesRegister instead of manual loop
Also this will be more precise since it will check
exec_lo/exec_hi writes.

llvm-svn: 274817
2016-07-08 00:55:39 +00:00
Valery Pykhtin af8b1bddbd [AMDGPU] fix ds_write_src2 encoding (bz26027)
Differential revision: http://reviews.llvm.org/D22041

llvm-svn: 274756
2016-07-07 14:23:38 +00:00
Michael Kuperstein aa71bdd3af [TTI] The cost model should not assume vector casts get completely scalarized
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast  gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.

Differential Revision: http://reviews.llvm.org/D21251

llvm-svn: 274642
2016-07-06 17:30:56 +00:00
Nicolai Haehnle e40530ea7b AMDGPU: Fix return of non-void-returning shaders
Summary:
Since "AMDGPU: Fix verifier errors in SILowerControlFlow", the logic that
ensures that a non-void-returning shader falls off the end of the last
basic block was effectively disabled, since SI_RETURN is now used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96731

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21975

llvm-svn: 274612
2016-07-06 08:35:17 +00:00
Matt Arsenault e8dbf791b1 AMDGPU: Remove unnecessary string usage in AsmPrinter
Registers are printed a lot, so don't create temporary
std::strings. Using char instead of a string to an ostream
saves a function call.

llvm-svn: 274581
2016-07-05 22:06:56 +00:00
Matt Arsenault ffc8275f2b AMDGPU: Fix folding SGPRs into madak/madmk src0
Because of the special immediate operand, the constant
bus is already used so SGPRs are never useful.

r263212 changed the name of the immediate operand, which
broke the verifier check for the restriction.

llvm-svn: 274564
2016-07-05 17:09:01 +00:00
Tom Stellard a4b746d808 AMDGPU/SI: Remove address space query functions from AMDGPUDAGToDAGISel
Summary:
These have been replaced with TableGen code (except for isConstantLoad,
which is still used for R600).  The queries were broken for cases
where MemOperand was a PseudoSourceValue.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21684

llvm-svn: 274561
2016-07-05 16:10:44 +00:00
Valery Pykhtin e65b39ec09 [AMDGPU] rename DS_1A1D_Off8_NORET to DS_1A2D_Off8_NORET as ds_write2xx use 2 source registers. NFC.
llvm-svn: 274556
2016-07-05 15:15:28 +00:00
Sam Kolton a9cd6aa895 [AMDGPU] Assembler: Fix parsing error with floating-point literals passed to integer instructions
Differential Revision: http://reviews.llvm.org/D21972

llvm-svn: 274551
2016-07-05 14:01:11 +00:00
Tom Stellard 4a105d73a9 AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loads
This moves of the r600 logic out of isGlobalLoad() and into the
TableGen files.

Differential Revision: http://reviews.llvm.org/D21710

llvm-svn: 274527
2016-07-05 00:12:51 +00:00
Tom Stellard 17a0ec5400 AMDGPU/SI: Remove hack for selecting < 32-bit loads to MUBUF instructions
Summary:
The isGlobalLoad() query was returning true for constant address space loads
with memory types less than 32-bits, which is wrong.  This logic has been
replaced with PatFrag in the TableGen files, to provide the same functionality.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21696

llvm-svn: 274521
2016-07-04 20:41:48 +00:00
Jan Vesely 991dfd7b07 AMDGPU/R600: Add indentation to VTX and TEX fetch asm strings
These are printed as part of Fetch clauses.

Differential Revision: http://reviews.llvm.org/D21730

llvm-svn: 274517
2016-07-04 19:45:00 +00:00
Matt Arsenault 7f681ac7a9 AMDGPU: Add feature for unaligned access
llvm-svn: 274398
2016-07-01 23:03:44 +00:00
Matt Arsenault 8af47a09e5 AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.

llvm-svn: 274397
2016-07-01 22:55:55 +00:00
Matt Arsenault 327bb5ad82 AMDGPU: Improve load/store of illegal types.
There was a combine before to handle the simple copy case.
Split this into handling loads and stores separately.

We might want to change how this handles some of the vector
extloads, since this can result in large code size increases.

llvm-svn: 274394
2016-07-01 22:47:50 +00:00
Matt Arsenault 105c2a204c AMDGPU/SI: Enable testing several variants for si scheduler
Enable testing different scheduling variants if sgpr usage
is very high. It was previously disabled because of a bug
in handleMove, but it has been fixed since.

Patch by Axel Davy

llvm-svn: 274372
2016-07-01 18:03:46 +00:00
Nikolay Haustov beb24f5b20 Resubmit r268719 - AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
This was reverted in r268740 because of problems with corresponding Clang change.
Clang change was updated and resubmitted in r274220.

Check calling convention in AMDGPUMachineFunction::isKernel

This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

Also, in the future unused non-kernels may be optimized.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 274341
2016-07-01 10:00:58 +00:00
Sam Kolton 5196b88f07 [AMDGPU] Assembler: support SDWA for VOPC instructions
Summary: dst_sel and dst_unused disabled for VOPC as they have no effect on result

Reviewers: artem.tamazov, tstellarAMD, vpykhtin

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21376

llvm-svn: 274340
2016-07-01 09:59:21 +00:00
NAKAMURA Takumi 566597330a Update libdeps; AMDGPUCodeGen requires LLVMVectorize.
llvm-svn: 274339
2016-07-01 09:55:23 +00:00
Matt Arsenault 908b9e26a6 AMDGPU: Add option to run the load/store vectorizer
llvm-svn: 274329
2016-07-01 03:33:52 +00:00
Matt Arsenault 0994bd57fb AMDGPU: Implement getLoadStoreVecRegBitWidth
llvm-svn: 274312
2016-07-01 00:56:27 +00:00
Duncan P. N. Exon Smith 632987296f Target: Remove unused arguments from overrideSchedPolicy, NFC
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator.  One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.

llvm-svn: 274304
2016-07-01 00:23:27 +00:00
Duncan P. N. Exon Smith e4f5e4f4d1 CodeGen: Use MachineInstr& in TargetLowering, NFC
This is a mechanical change to make TargetLowering API take MachineInstr&
(instead of MachineInstr*), since the argument is expected to be a valid
MachineInstr.  In one case, changed a parameter from MachineInstr* to
MachineBasicBlock::iterator, since it was used as an insertion point.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

llvm-svn: 274287
2016-06-30 22:52:52 +00:00
Matt Arsenault c1142725bd AMDGPU: Add m0 vgpr load loop block as successor
This shows up as a verifier error when I move this
earlier, not sure why it didn't before.

llvm-svn: 274275
2016-06-30 20:49:28 +00:00
Rafael Espindola d86e8bb0ed Delete MCCodeGenInfo.
MC doesn't really care about CodeGen stuff, so this was just
complicating target initialization.

llvm-svn: 274258
2016-06-30 18:25:11 +00:00
Duncan P. N. Exon Smith 9cfc75c214 CodeGen: Use MachineInstr& in TargetInstrInfo, NFC
This is mostly a mechanical change to make TargetInstrInfo API take
MachineInstr& (instead of MachineInstr* or MachineBasicBlock::iterator)
when the argument is expected to be a valid MachineInstr.  This is a
general API improvement.

Although it would be possible to do this one function at a time, that
would demand a quadratic amount of churn since many of these functions
call each other.  Instead I've done everything as a block and just
updated what was necessary.

This is mostly mechanical fixes: adding and removing `*` and `&`
operators.  The only non-mechanical change is to split
ARMBaseInstrInfo::getOperandLatencyImpl out from
ARMBaseInstrInfo::getOperandLatency.  Previously, the latter took a
`MachineInstr*` which it updated to the instruction bundle leader; now,
the latter calls the former either with the same `MachineInstr&` or the
bundle leader.

As a side effect, this removes a bunch of MachineInstr* to
MachineBasicBlock::iterator implicit conversions, a necessary step
toward fixing PR26753.

Note: I updated WebAssembly, Lanai, and AVR (despite being
off-by-default) since it turned out to be easy.  I couldn't run tests
for AVR since llc doesn't link with it turned on.

llvm-svn: 274189
2016-06-30 00:01:54 +00:00
Matt Arsenault eb9025d698 AMDGPU: Fix global isel crashes
llvm-svn: 274039
2016-06-28 17:42:09 +00:00
Matt Arsenault 254a6450dd AMDGPU: Fix typo
llvm-svn: 274034
2016-06-28 16:59:53 +00:00
Matt Arsenault 3d846501fb AMDGPU: Remove unused function
llvm-svn: 274033
2016-06-28 16:59:49 +00:00
Matt Arsenault b4d9503171 AMDGPU: Fix out of bounds indirect indexing errors
This was producing acceses to registers beyond the super
register's limits, resulting in verifier failures.

llvm-svn: 273977
2016-06-28 01:09:00 +00:00
Matt Arsenault 55dff27122 AMDGPU: Fix global isel build
llvm-svn: 273964
2016-06-28 00:11:26 +00:00
Matt Arsenault ed1fd7e056 AMDGPU: Set MinInstAlignment
Not sure this actually changes anything

llvm-svn: 273947
2016-06-27 21:42:49 +00:00
Matt Arsenault 59c0ffa22a AMDGPU: Implement per-function subtargets
llvm-svn: 273940
2016-06-27 20:48:03 +00:00
Matt Arsenault 03d8584590 AMDGPU: Move subtarget feature checks into passes
llvm-svn: 273937
2016-06-27 20:32:13 +00:00
Matt Arsenault 21a4625a16 AMDGPU: Fix verifier errors with undef vector indices
Also fix pointlessly adding exec to liveins.

llvm-svn: 273916
2016-06-27 19:57:44 +00:00
Simon Pilgrim 634dde36b4 Fix "not all control paths return a value" warning on MSVC
llvm-svn: 273872
2016-06-27 12:58:10 +00:00
NAKAMURA Takumi 5cbd41e0a8 SIMachineFunctionInfo.cpp: Appease msc18 to use std::array.
llvm-svn: 273860
2016-06-27 10:26:43 +00:00
NAKAMURA Takumi f619b50fab Reformat.
llvm-svn: 273859
2016-06-27 10:26:36 +00:00
NAKAMURA Takumi d377ad806e Reformat blank lines.
llvm-svn: 273858
2016-06-27 10:26:25 +00:00
Jan Vesely 3bc1af2be4 AMDGPU/R600: Fix GlobalValue regressions.
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705

Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166

Differential Revision: http://reviews.llvm.org/D21633

llvm-svn: 273785
2016-06-25 18:24:16 +00:00
Konstantin Zhuravlyov f2f3d14774 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335

llvm-svn: 273769
2016-06-25 03:11:28 +00:00
Tom Stellard b164a9843b AMDGPU/SI: Make sure not to fold offsets into local address space globals
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.

Reviewers: arsenm, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21647

llvm-svn: 273765
2016-06-25 01:59:16 +00:00
Matthias Braun 1e374a7aa6 AMDGPU: Define a schedule class for COPY.
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.

Differential Revision: http://reviews.llvm.org/D21540

llvm-svn: 273751
2016-06-24 23:52:11 +00:00
Matt Arsenault 86de486d31 AMDGPU: Add stub custom CodeGenPrepare pass
This will do various things including ones
CodeGenPrepare does, but with knowledge of uniform
values.

llvm-svn: 273657
2016-06-24 07:07:55 +00:00
Matt Arsenault c581611e11 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

llvm-svn: 273653
2016-06-24 06:30:22 +00:00
Matt Arsenault 43e92fe306 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

llvm-svn: 273652
2016-06-24 06:30:11 +00:00
Tom Stellard 14416ae6cd Support/ELF: Add R_AMDGPU_GOTPCREL relocation
Summary:
We will start generating this in a future patch.

Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21482

llvm-svn: 273628
2016-06-23 23:11:29 +00:00
Matt Arsenault 8d4b0eddd6 AMDGPU: Add option to disable spilling SGPRs to VGPRs.
This can help debug spilling problems.

llvm-svn: 273605
2016-06-23 20:00:34 +00:00
Valery Pykhtin a852d695b8 [AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields.
Differential Revision: http://reviews.llvm.org/D21380

llvm-svn: 273561
2016-06-23 14:13:06 +00:00
Diana Picus e440f99913 [AMDGPU] Remove exit-on-error in test (PR27761)
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.

We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.

Fixes PR27761.

Differential Revision: http://reviews.llvm.org/D20852

llvm-svn: 273550
2016-06-23 09:19:16 +00:00
Matt Arsenault 529cf25e60 AMDGPU: readlane/writelane do not read exec
llvm-svn: 273525
2016-06-23 01:26:16 +00:00
Matt Arsenault 3cb4ddeb4e AMDGPU: Fix liveness when expanding m0 loop
llvm-svn: 273514
2016-06-22 23:40:57 +00:00
Changpeng Fang 47efe1f6db AMDGPU/SI: Define an intrinsic to expose ds_swizzle_b32
Reviewers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D21533

llvm-svn: 273496
2016-06-22 21:33:49 +00:00
Matt Arsenault 4a07bf6372 AMDGPU: Run verifier after 2nd run of SIShrinkInstructions
llvm-svn: 273469
2016-06-22 20:26:24 +00:00
Matt Arsenault 9babdf4265 AMDGPU: Fix verifier errors in SILowerControlFlow
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.

Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.

llvm-svn: 273467
2016-06-22 20:15:28 +00:00
Matt Arsenault 0e5befe315 AMDGPU: Make FrameLowering stack alignment 16
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.

llvm-svn: 273448
2016-06-22 17:47:39 +00:00
Matt Arsenault 180e0d5cef AMDGPU: Fix gcc warnings
Mostly removing dead code. Apparently gcc's warning
for unused functions is better

llvm-svn: 273363
2016-06-22 01:53:49 +00:00
Rafael Espindola 7b4ef068c6 Delete more dead code.
Found by gcc 6.

llvm-svn: 273322
2016-06-21 21:51:41 +00:00
Jan Vesely fea814d531 AMDGPU: Add implicitarg.ptr intrinsic.
Points to the start of implicit arguments (appended after explicit arguments)

Differential Revision: http://reviews.llvm.org/D20297

llvm-svn: 273317
2016-06-21 20:46:20 +00:00
Rafael Espindola 48975881ab Delete some dead code.
Found by gcc 6.

llvm-svn: 273303
2016-06-21 19:48:12 +00:00
Matt Arsenault 2209625387 AMDGPU: Preserve undef flag on vcc when shrinking v_cndmask_b32
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.

llvm-svn: 273182
2016-06-20 18:34:00 +00:00
Matt Arsenault b6d8c37e1a AMDGPU: Fold more custom nodes to undef
This will help sneak undefs past GVN into the DAG for
some tests.

Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.

llvm-svn: 273181
2016-06-20 18:33:56 +00:00
Matt Arsenault ff98241f37 Generalize DiagnosticInfoStackSize to support other limits
Backends may want to report errors on resources other than
stack size.

llvm-svn: 273177
2016-06-20 18:13:04 +00:00
Matt Arsenault a9720c67f1 AMDGPU: Use correct method for determining instruction size
llvm-svn: 273172
2016-06-20 17:51:32 +00:00
Tom Stellard 5350894265 AMDGPU: Add support for R_AMDGPU_REL32 relocations
Reviewers: arsenm, kzhuravl, rafael

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21401

llvm-svn: 273168
2016-06-20 17:33:43 +00:00
Tom Stellard 1c89eb7db0 AMDGPU: Emit R_AMDGPU_ABS32_{HI,LO} for scratch buffer relocations
Reviewers: arsenm, rafael, kzhuravl

Subscribers: rafael, arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21400

llvm-svn: 273166
2016-06-20 16:59:44 +00:00
NAKAMURA Takumi fd92154b20 Reformat blank lines.
llvm-svn: 273131
2016-06-20 01:05:15 +00:00
NAKAMURA Takumi fe1202c4cb Untabify.
llvm-svn: 273129
2016-06-20 00:37:41 +00:00
Matt Arsenault e935f05a94 AMDGPU: Fix kernel argument alignment impacting stack size
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.

llvm-svn: 273080
2016-06-18 05:15:53 +00:00
Matt Arsenault 0bb294b224 AMDGPU: Temporarily select trap to s_endpgm
This should select to s_trap, but that requires
additonal work to setup and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.

Emit a warning that this will only terminate the wave and
not really trap.

llvm-svn: 273062
2016-06-17 22:27:03 +00:00
Tom Stellard 0114bb5aa0 AMDGPU/SI: Simplify code in SITargetLowering::LowerGlobalAddress()
This change were suggested in http://reviews.llvm.org/D21154.

llvm-svn: 273059
2016-06-17 22:22:09 +00:00
Matt Arsenault 8885910f8e AMDGPU: Remove llvm.SI.tid intrinsic
Mesa doesn't emit this for llvm >= 3.8 anymore.

llvm-svn: 273050
2016-06-17 21:18:41 +00:00
Changpeng Fang 3e06e1edac AMDGPU/SI: Propagate the Kill flag in storeRegToStackSlot and eliminateFrameIndex
Reviewers: arsenm, tstellarAMD

Differential Revision:  http://reviews.llvm.org/21438

llvm-svn: 272958
2016-06-16 21:20:47 +00:00
Matt Arsenault 01e062f5c6 AMDGPU: Fix maximum instruction size for amdgcn
This was causing the conservative estimate of inline asm
size to be twice as big as expected.

llvm-svn: 272956
2016-06-16 21:14:05 +00:00
Wei Ding ab3d91b8f1 AMDGPU: Add v_mad 16-bit instructions definition.
Differential Revision: http://reviews.llvm.org/D21362

llvm-svn: 272919
2016-06-16 16:50:04 +00:00
Valery Pykhtin 02e2086e41 [AMDGPU] Fix few coding style issues. NFC.
llvm-svn: 272785
2016-06-15 13:55:09 +00:00
Nicolai Haehnle a609259832 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

llvm-svn: 272761
2016-06-15 07:13:05 +00:00
Tom Stellard 82785e9fe7 AMDGPU/SI: Correctly encode constant expressions
Summary:
We we have an MCConstantExpr, we can encode it directly into the instruction
instead of emitting fixups.

Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21236

Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948
llvm-svn: 272750
2016-06-15 03:09:39 +00:00
Tom Stellard 89049702ce AMDGPU/AsmParser: Add support for parsing symbol operands
Summary:
We can now reference symbols directly in operands, like this:
s_mov_b32 s0, global

Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21038

llvm-svn: 272748
2016-06-15 02:54:14 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Tom Stellard bf3e6e5bb4 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Re-commit this after fixing a bug where we were trying to use a
reference to a Triple object that had already been destroyed.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272705
2016-06-14 20:29:59 +00:00
Tom Stellard b1a523fa68 Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables"
This reverts commit r272675.

llvm-svn: 272677
2016-06-14 15:16:35 +00:00
Tom Stellard 5e6298b0f2 AMDGPU/SI: Refactor fixup handling for constant addrspace variables
Summary:
We now use a standard fixup type applying the pc-relative address of
constant address space variables, and we have the GlobalAddress lowering
code add the required 4 byte offset to the global address rather than
doing it as part of the fixup.

This refactoring will make it easier to use the same code for global
address space variables and also simplifies the code.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21154

llvm-svn: 272675
2016-06-14 15:11:01 +00:00
Artem Tamazov 17091364d1 [AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly source (.option.machine_version...)
The feature allows for conditional assembly etc.
TODO: make those symbols read-only.
Test added.

Differential Revision: http://reviews.llvm.org/D21238

llvm-svn: 272673
2016-06-14 15:03:59 +00:00
Marek Olsak e93f6d6923 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Matt Arsenault 80bc355048 AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc
The condition reg of the cndmask_b64 expansion can't be killed by
the first one, and the implicit super register implicit def is needed.

llvm-svn: 272554
2016-06-13 15:53:52 +00:00
Benjamin Kramer d3f4c05aea Move instances of std::function.
Or replace with llvm::function_ref if it's never stored. NFC intended.

llvm-svn: 272513
2016-06-12 16:13:55 +00:00
Benjamin Kramer bdc4956bac Pass DebugLoc and SDLoc by const ref.
This used to be free, copying and moving DebugLocs became expensive
after the metadata rewrite. Passing by reference eliminates a ton of
track/untrack operations. No functionality change intended.

llvm-svn: 272512
2016-06-12 15:39:02 +00:00
Tom Stellard f3af841462 AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

llvm-svn: 272417
2016-06-10 19:26:38 +00:00
Sam Kolton 945231ada5 [AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand.
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...).

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20968

llvm-svn: 272384
2016-06-10 09:57:59 +00:00
Matt Arsenault 37fefd68cc AMDGPU: Fix trailing whitespace
llvm-svn: 272364
2016-06-10 02:18:02 +00:00
Matt Arsenault 58ddad5bd6 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

llvm-svn: 272351
2016-06-10 00:18:41 +00:00
Tom Stellard 26a2ab7477 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

llvm-svn: 272349
2016-06-10 00:01:04 +00:00
Tom Stellard 1d3940e8cc AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMI
Reviewers: arsenm, axeldavy

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19823

llvm-svn: 272346
2016-06-09 23:48:02 +00:00
Matt Arsenault 7757c59e48 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

llvm-svn: 272345
2016-06-09 23:42:54 +00:00
Matt Arsenault 887018179a AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
Matt Arsenault e2bd9a32bc AMDGPU: Run verifer after insert waits pass
llvm-svn: 272338
2016-06-09 23:19:14 +00:00
Matt Arsenault cfb61e780a AMDGPU: Remove incorrect assertion
I'm still not sure under what circumstances the offset here is non-0,
but private memory is not limited to 27-bits.

llvm-svn: 272337
2016-06-09 23:19:08 +00:00
Matt Arsenault c3a01ec9db AMDGPU: Properly initialize SIShrinkInstructions
llvm-svn: 272336
2016-06-09 23:18:47 +00:00
Wei Ding ed0f97fad2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division, implementation of
IEEE754 fdiv is added.

http://reviews.llvm.org/D20557

llvm-svn: 272292
2016-06-09 19:17:15 +00:00
Sam Kolton c9bdcb75c4 [AMDGPU] Disassembler: Support for sdwa instructions
Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21129

llvm-svn: 272255
2016-06-09 11:04:45 +00:00
Nicolai Haehnle c00e03b8f5 AMDGPU: Add amdgpu-ps-wqm-outputs function attributes
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.

The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.

Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20839

llvm-svn: 272063
2016-06-07 21:37:17 +00:00
Eric Christopher 538d09d0dd Revert "Differential Revision: http://reviews.llvm.org/D20557"
Author: Wei Ding <wei.ding2@amd.com>
Date:   Tue Jun 7 19:04:44 2016 +0000

    Differential Revision: http://reviews.llvm.org/D20557

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
    91177308-0d34-0410-b5e6-96231b3b80d8

as it was breaking the bots.

This reverts commit r272044.

llvm-svn: 272056
2016-06-07 20:27:12 +00:00
Wei Ding a70216f1b3 Differential Revision: http://reviews.llvm.org/D20557
llvm-svn: 272044
2016-06-07 19:04:44 +00:00
Matt Arsenault 02458c2d27 AMDGPU: Add function for getting instruction size
llvm-svn: 271936
2016-06-06 20:10:33 +00:00
Matt Arsenault 3b2e2a59e8 AMDGPU: Fix constantexpr addrspacecasts
If we had a constant group address space cast the queue pointer
wasn't enabled for the function, resulting in a crash on noreg
later.

llvm-svn: 271935
2016-06-06 20:03:31 +00:00
Artem Tamazov 135487767b [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
Another step for unification llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.

Differential Revision: http://reviews.llvm.org/D20796

llvm-svn: 271900
2016-06-06 15:23:43 +00:00
Artem Tamazov f88397c84c [test/AMDGPU] Square-braced-syntax for registers: add macro test/example.
Test added as per discussion in http://reviews.llvm.org/D20588.
The macro is just a demonstration, useless in practice.
Coding style fixes.

Differential Revision: http://reviews.llvm.org/D20797

llvm-svn: 271675
2016-06-03 14:41:17 +00:00
Sam Kolton a4a99ad1bc [AMDGPU] Assembler: More tests for SDWA instructions. Fix for SDWA float modifiers.
Summary: Depends on D20625

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20674

llvm-svn: 271662
2016-06-03 11:43:09 +00:00
Sam Kolton 05ef1c940f [AMDGPU] Assembler: Custom converters for SDWA instructions. Support for _dpp and _sdwa suffixes in mnemonics.
Summary:
Added custom converters for SDWA instruction to support optional operands and modifiers.
Support for _dpp and _sdwa suffixes that allows to force DPP or SDWA encoding for instructions.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20625

llvm-svn: 271655
2016-06-03 10:27:37 +00:00
Matt Arsenault 43578ec2a8 AMDGPU: Handle flat in getMemOpBaseRegImmOfs
It can still report the base register, and the uses give up when it
fails.

llvm-svn: 271575
2016-06-02 20:05:20 +00:00
Matt Arsenault d1097a38e2 AMDGPU: Cleanup load tests
There are a lot of different kinds of loads to test for,
and these were scattered around inconsistently with
some redundancy. Try to comprehensively test all loads
in a consistent way.

llvm-svn: 271571
2016-06-02 19:54:26 +00:00
Matt Arsenault 52dec8d36a AMDGPU: Temporary fix for broken store combine
llvm-svn: 271567
2016-06-02 19:00:55 +00:00
Matt Arsenault 8e00194be8 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.

llvm-svn: 271561
2016-06-02 18:37:16 +00:00
Matt Arsenault 598f55387a AMDGPU: Fix incorrectly setting kill flag when copying register tuples
This fixes some verifier errors when trackLivenessAfterRegAlloc is
enabled.

llvm-svn: 271446
2016-06-02 00:04:30 +00:00
Matt Arsenault d3e4c646ea AMDGPU: SIDebuggerInsertNops preserves CFG
This saves an additional run of the DominatorTree and
MachineLoopInfo

llvm-svn: 271444
2016-06-02 00:04:22 +00:00
Matt Arsenault ec30eb507e AMDGPU: Remove unused address space
Also return a single StringRef instead of building a string.

llvm-svn: 271296
2016-05-31 16:57:45 +00:00
Matt Arsenault d8d304d1d6 AMDGPU: Fix trailing whitespace
llvm-svn: 271081
2016-05-28 00:50:51 +00:00
Matt Arsenault 7401516985 AMDGPU: Add fract intrinsic
Remove broken patterns matching it. This was matching the
unsafe math pattern and expanding the fix for the buggy instruction
from the pattern. The problems are also on CI. Remove the workarounds
and only use fract with unsafe math or from the intrinsic.

llvm-svn: 271078
2016-05-28 00:19:52 +00:00
Artem Tamazov 7da9b82e02 [AMDGPU][llvm-mc] Square-braced-syntax for registers - make ":expr2" optional.
Register numbers may be specified as assembly-time expressions.
This feature can be useful in macros and alike. However, expressions
are supported within sqare braces only.

Sqare braces were initially intended to support specifying of multiple
(pairs/quads...) registers. Syntax like v[8:8] which specifies single register
is also supported. That allows expressions but looks a bit unnatural.

This change supports syntax REG[EXPR].
Tests added.

Differential Revision: http://reviews.llvm.org/D20588

llvm-svn: 270990
2016-05-27 12:50:13 +00:00
Benjamin Kramer 4fed928f53 Avoid some copies by using const references.
clang-tidy's performance-unnecessary-copy-initialization with some manual
fixes. No functional changes intended.

llvm-svn: 270988
2016-05-27 12:30:51 +00:00
Benjamin Kramer 3e9a5d3468 Apply clang-tidy's misc-static-assert where it makes sense.
Also fold conditions into assert(0) where it makes sense. No functional
change intended.

llvm-svn: 270982
2016-05-27 11:36:04 +00:00
Changpeng Fang 71369b3a39 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

llvm-svn: 270894
2016-05-26 19:35:29 +00:00
Artem Tamazov 6edc135d0f [AMDGPU][llvm-mc] s_getreg/setreg* - hwreg - factor out strings/literals etc.
Hwreg(...) syntax implementation unified with sendmsg(...).
Common strings moved to Utils
MathExtras.h functionality utilized.
Added missing build dependency in Disassembler.

Differential Revision: http://reviews.llvm.org/D20381

llvm-svn: 270871
2016-05-26 17:00:33 +00:00
Artem Tamazov b49c3361e5 Fix build warning introduced in r270552 "[AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers."
llvm-svn: 270859
2016-05-26 15:52:16 +00:00
Diana Picus 81bc3170e8 [AMDGPU] Remove exit-on-error flag from test (PR27762)
Similar to r269948, but for argument lowering.

Fixes PR27762

Differential Revision: http://reviews.llvm.org/D20430

llvm-svn: 270856
2016-05-26 15:24:55 +00:00
Matt Arsenault e57206d81b AMDGPU: Fix v2i64/v2f64 bitcasts
These operations tend to get promoted away to v4i32 so
this doesn't happen often.

llvm-svn: 270740
2016-05-25 18:07:36 +00:00
Matt Arsenault 1cc4991412 AMDGPU: Fix inconsistent lowering of select of vectors
f32 vectors would use a sequence of BFI instructions instead
of unrolled cmp + select. This was better in the case of a VALU
select with SGPR inputs, but we don't have a way of dealing with that
in the DAG.

llvm-svn: 270731
2016-05-25 17:34:58 +00:00
Nirav Dave e003bb7ead Soften assertion in AMDGPU emitPrologue.
[AMDGPU] emitPrologue looks for an unused unallocated SGPR that is not
the scratch descriptor. Continue search if unused register found fails
other requirements.

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D20526

llvm-svn: 270646
2016-05-25 01:45:42 +00:00
Konstantin Zhuravlyov 29ddd2b2f2 [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081

llvm-svn: 270594
2016-05-24 18:37:18 +00:00
Sam Kolton 11de370cca [AMDGPU] Assembler: rework parsing of optional operands.
Summary:
Change process of parsing of optional operands. All optional operands use same parsing method - parseOptionalOperand().
No default values are added to OperandsVector.
Get rid of WORKAROUND_USE_DUMMY_OPERANDS_INSTEAD_MUTIPLE_DEFAULT_OPERANDS.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov, nhaustov

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D20527

llvm-svn: 270556
2016-05-24 12:38:33 +00:00
Artem Tamazov 212a251c8d [AMDGPU][llvm-mc] Disassembler: support for TTMP/TBA/TMA registers.
Differential Revision: http://reviews.llvm.org/D20476

llvm-svn: 270552
2016-05-24 12:05:16 +00:00
Sam Kolton 1bdcef7697 [AMDGPU] Assembler: refactor parsing of modifiers and immediates. Allow modifiers for imms.
Reviewers: nhaustov, tstellarAMD

Subscribers: kzhuravl, arsenm

Differential Revision: http://reviews.llvm.org/D20166

llvm-svn: 270415
2016-05-23 09:59:02 +00:00
Matt Arsenault 7f9eabd2c2 AMDGPU: Define priorities for register classes
Allocating larger register classes first should give better allocation
results (and more importantly for myself, make the lit tests more stable
with respect to scheduler changes).

Patch by Matthias Braun

llvm-svn: 270312
2016-05-21 03:55:07 +00:00
Matt Arsenault 71e6676169 AMDGPU: Cleanup lowering actions
These are kind of a mess and hard to follow, particularly
for loads and stores. Fix various redundant, unnecessary
and dead settings.

llvm-svn: 270307
2016-05-21 02:27:49 +00:00
Matt Arsenault 81a709503d AMDGPU: Fix high bits after division optimization
This is essentially doing a 24-bit signed division with FP.
We need to truncate to the N bit result.

llvm-svn: 270305
2016-05-21 01:53:33 +00:00
Matt Arsenault b6e1cc2a92 AMDGPU: Fix verifier error when spilling SGPRs
The current SGPR spilling test does not stress this
because it is using s_buffer_load instructions to
increase SGPR pressure and spill, but their output
operands have the same SReg_32_XM0 constraint. This fixes
an error when the SReg_32 output from most instructions
is spilled.

llvm-svn: 270301
2016-05-21 00:53:42 +00:00
Matt Arsenault 8f5e008534 AMDGPU: Fix relationship between SReg_32 and SReg_32_XM0
llvm-svn: 270300
2016-05-21 00:53:28 +00:00
Matt Arsenault 4945905f5f AMDGPU: Handle cbranch vccz/vccnz
llvm-svn: 270297
2016-05-21 00:29:40 +00:00
Matt Arsenault 72fcd5f597 AMDGPU: Implement ReverseBranchCondition
llvm-svn: 270296
2016-05-21 00:29:34 +00:00
Matt Arsenault 6d09380532 AMDGPU: Implement AnalyzeBranch
Original patch by Tom Stellard

llvm-svn: 270295
2016-05-21 00:29:27 +00:00
Matt Arsenault 4e3d383c46 AMDGPU: Remove pointless conversions
llvm-svn: 270139
2016-05-19 21:09:58 +00:00
Matt Arsenault 4318ea354a AMDGPU: Also look for s_cbranch_vccz
llvm-svn: 270091
2016-05-19 18:20:25 +00:00
Artem Tamazov 8ce1f7177b [AMDGPU][llvm-mc] Fixes to support buffer atomics.
Fixes for MUBUF_Atomic instructions to make operand list valid:
 - For RTN insns, make a copy of $vdata_in operand as $vdata.
 - Do not add operand for GLC, it is hardcoded and comes as a token.
Workaround to avoid adding multiple default optional operands.
Tests added.

Differential Revision: http://reviews.llvm.org/D20257

llvm-svn: 270049
2016-05-19 12:22:39 +00:00
Matt Arsenault c5bebac934 AMDGPU: Fix verifier error when spilling undef subreg
llvm-svn: 270002
2016-05-18 23:35:53 +00:00
Matt Arsenault c438ef574d AMDGPU: Fix promote alloca for pointer loads
If the load has a pointer type, we don't want to change
its type.

llvm-svn: 270000
2016-05-18 23:20:24 +00:00
Rafael Espindola 8c34dd8257 Delete Reloc::Default.
Having an enum member named Default is quite confusing: Is it distinct
from the others?

This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.

llvm-svn: 269988
2016-05-18 22:04:49 +00:00
Jan Vesely ae265c03f7 AMDGPU: Fix incorrect simm check
Use signed division otherwise all back jumps fail the check
Fixes regression introduced in r269951

Differential Revision: http://reviews.llvm.org/D20380

llvm-svn: 269972
2016-05-18 19:07:58 +00:00
Matt Arsenault a519cf593f AMDGPU: Error if branch distance exceeds limit
llvm-svn: 269951
2016-05-18 16:10:24 +00:00
Matt Arsenault 1735da460b AMDGPU: Other sizes of popcnt are fast
We can chain bcnt instructions together, so
any width popcnt is pretty fast.

llvm-svn: 269950
2016-05-18 16:10:19 +00:00
Matt Arsenault 9430b9113a AMDGPU: Fix assert when erroring on a call
For some reason an assert is now hit when a valid chain
is not returned, so return the entry chain.

llvm-svn: 269948
2016-05-18 16:10:11 +00:00
Matt Arsenault 891fccc0c1 AMDGPU: Handle alloca promoting with null operands
If the second pointer in a multi-pointer instruction is
a constant, we can replace the type.

llvm-svn: 269945
2016-05-18 15:57:21 +00:00
Matt Arsenault bde80346c1 AMDGPU: Don't run passes that aren't useful
llvm-svn: 269943
2016-05-18 15:41:07 +00:00
Matt Arsenault ab3429c2b4 AMDGPU: Fix assert on ttmp registers
Use register class that does not include them when looking
for unallocated registers.

This is hit by the udiv v8i64 test in the opencl integer
conformance test, and takes a few seconds to compile in
a debug build so no test included.

llvm-svn: 269938
2016-05-18 15:19:50 +00:00
Jan Vesely 687ca8df18 AMDGPU/R600: Use correct number of vector elements when lowering private loads
Reviewer: tstellardAMD, arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D20032

llvm-svn: 269725
2016-05-16 23:56:32 +00:00
Matt Arsenault 8a028bf4d7 AMDGPU: Fix promote alloca pass creating huge arrays
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.

By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.

Based on the exist LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.

llvm-svn: 269708
2016-05-16 21:19:59 +00:00
Jan Vesely 91aacad9c3 AMDGPU: Unify LowerGlobalAddress
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19794

llvm-svn: 269481
2016-05-13 20:39:34 +00:00
Jan Vesely 1680039a7a AMDGPU/R600: Fold global address operand
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19793

llvm-svn: 269480
2016-05-13 20:39:31 +00:00
Jan Vesely f97de00745 AMDGPU/R600: Implement memory loads from constant AS
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19792

llvm-svn: 269479
2016-05-13 20:39:29 +00:00
Jan Vesely a1f9fdfcbc AMDGPU/R600: Add support for emitting MCExpr
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19791

llvm-svn: 269478
2016-05-13 20:39:26 +00:00
Jan Vesely 7971464da8 AMDGPU: Add support for MCExpr to instruction printer
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19790

llvm-svn: 269477
2016-05-13 20:39:24 +00:00
Jan Vesely 4368c1cf7e AMDGPU/R600: Use machine operands instead of ints to track literals
This will be used for global addresses

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19789

llvm-svn: 269476
2016-05-13 20:39:22 +00:00
Jan Vesely fac8d7ecb1 AMDGPU/R600: There are other uses for ALU_LITERAL besides Imm
This will be used for GV

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19788

llvm-svn: 269475
2016-05-13 20:39:20 +00:00
Jan Vesely fbcb754e66 AMDGPU: Make CONST_DATA_PTR available to R600
Rename to AMDGPUconstdata_ptr

Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19786

llvm-svn: 269474
2016-05-13 20:39:18 +00:00
Jan Vesely 81f1b30035 AMDGPU/EG,CM: Add instruction to read from constant AS (VTX2)
Reviewers: tstellard

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19785

llvm-svn: 269473
2016-05-13 20:39:16 +00:00
Konstantin Zhuravlyov e3d322af57 [AMDGPU] Update nop insertion for debugger usage
- Insert one nop for each high level statement instead of two
- Do not insert nop before prologue

Differential Revision: http://reviews.llvm.org/D20215

llvm-svn: 269452
2016-05-13 18:21:28 +00:00
Matt Arsenault 999f7dd84c AMDGPU: Remove verifier check for scc live ins
We only really need this to be true for SIFixSGPRCopies.
I'm not sure there's any way this could happen before that point.

Fixes a case where MachineCSE could introduce a cross block
scc use.

llvm-svn: 269391
2016-05-13 04:15:48 +00:00
Justin Bogner 95927c0fd0 SDAG: Implement Select instead of SelectImpl in AMDGPUDAGToDAGISel
- Where we were returning a node before, call ReplaceNode instead.
- Where we would return null to fall back to another selector, rename
  the method to try* and return a bool for success.
- Where we were calling SelectNodeTo, just return afterwards.

Part of llvm.org/pr26808.

llvm-svn: 269349
2016-05-12 21:03:32 +00:00
Matt Arsenault 8300272823 AMDGPU: Fix getIntegerAttribute type and error message
llvm-svn: 269268
2016-05-12 02:45:18 +00:00
Matt Arsenault a61cb48dd2 AMDGPU: Fix breaking IR on instructions with multiple pointer operands
The promote alloca pass would attempt to promote an alloca with
a select, icmp, or phi user, even though the other operand was
from a non-promotable source, producing a select on two different
pointer types.

Only do this if we know that both operands derive from the same
alloca. In the future we should be able to relax this to an alloca
which will also be promoted.

llvm-svn: 269265
2016-05-12 01:58:58 +00:00
Matt Arsenault 4234542503 AMDGPU: Make some instructions convergent
llvm-svn: 269147
2016-05-11 00:32:31 +00:00
Matt Arsenault e8ed8e59e5 AMDGPU: Change private_element_size to 4
llvm-svn: 269145
2016-05-11 00:28:54 +00:00
Peter Collingbourne dba995601b Cloning: Clean up the interface to the CloneFunction function.
Remove the ModuleLevelChanges argument, and the ability to create new
subprograms for cloned functions. The latter was added without review in
r203662, but it has no in-tree clients (all non-test callers pass false
for ModuleLevelChanges [1], so it isn't reachable outside of tests). It
also isn't clear that adding a duplicate subprogram to the compile unit is
always the right thing to do when cloning a function within a module. If
this functionality comes back it should be accompanied with a more concrete
use case.

Furthermore, all in-tree clients add the returned function to the module.
Since that's pretty much the only sensible thing you can do with the function,
just do that in CloneFunction.

[1] http://llvm-cs.pcc.me.uk/lib/Transforms/Utils/CloneFunction.cpp/rCloneFunction

Differential Revision: http://reviews.llvm.org/D18628

llvm-svn: 269110
2016-05-10 20:23:24 +00:00
Konstantin Zhuravlyov a791932145 [AMDGPU][NFC] Rename SIInsertNops -> SIDebuggerInsertNops
Differential Revision: http://reviews.llvm.org/D20117

llvm-svn: 269098
2016-05-10 18:33:41 +00:00
Matthias Braun 31d19d43c7 CodeGen: Move TargetPassConfig from Passes.h to an own header; NFC
Many files include Passes.h but only a fraction needs to know about the
TargetPassConfig class. Move it into an own header. Also rename
Passes.cpp to TargetPassConfig.cpp while we are at it.

llvm-svn: 269011
2016-05-10 03:21:59 +00:00
Simon Pilgrim 0a81921cdb Fixed unused but set variable warning
llvm-svn: 268931
2016-05-09 16:42:23 +00:00
Matt Arsenault a949dc619c AMDGPU: Fold shift into cvt_f32_ubyteN
llvm-svn: 268930
2016-05-09 16:29:50 +00:00
Artem Tamazov f0b6b40fa4 [AMDGPU][llvm-mc] Some refactoring of .td files
Some custom Operands and AsmOperandClasses moved to proper place.
No functional changes.

Differential Revision: http://reviews.llvm.org/D20012

llvm-svn: 268780
2016-05-06 19:32:38 +00:00
Artem Tamazov ebe71ce36a [AMDGPU][llvm-mc] Add support for sendmsg(...) syntax.
Added support for sendmsg(MSG[, OP[, STREAM_ID]]) syntax
in s_sendmsg and s_sendmsghalt instructions.
The syntax matches the SP3 assembler/disassembler rules.
That is why implicit inputs (like M0 and EXEC) are not printed
to disassembly output anymore.

sendmsg(...) allows only known message types and attributes,
even if literals are used instead of symbolic names.
However, raw literal (without "sendmsg") still can be used,
and that allows for any 16-bit value.

Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19596

llvm-svn: 268762
2016-05-06 17:48:48 +00:00
Nikolay Haustov 6eb050ea4e Revert "AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2."
This reverts commit 47486d52454d60cdf6becc0b2efe533c73794380.

It broke calling OpenCL kernel from another kernel.

llvm-svn: 268739
2016-05-06 14:59:04 +00:00
Sam Kolton 5f10a137d0 [TableGen] AsmMatcher: support for default values for optional operands
Summary:
This change allows to specify "DefaultMethod" for optional operand (IsOptional = 1) in AsmOperandClass that return default value for operand. This is used in convertToMCInst to set default values in MCInst.
Previously if you wanted to set default value for operand you had to create custom converter method. With this change it is possible to use standard converters even when optional operands presented.

Reviewers: tstellarAMD, ab, craig.topper

Subscribers: jyknight, dsanders, arsenm, nhaustov, llvm-commits

Differential Revision: http://reviews.llvm.org/D18242

llvm-svn: 268726
2016-05-06 11:31:17 +00:00
Nikolay Haustov dc1bb79b92 AMDGPU/SI: Add amdgpu_kernel calling convention. Part 2.
Summary:
    Check calling convention in AMDGPUMachineFunction::isKernel

    This will be used for AMDGPU_HSA_KERNEL symbol type in output ELF.

    Also, in the future unused non-kernels may be optimized.

    Reviewers: tstellarAMD, arsenm

    Subscribers: arsenm, joker.eph, llvm-commits

    Differential Revision: http://reviews.llvm.org/D19917

llvm-svn: 268719
2016-05-06 09:23:13 +00:00
Justin Bogner b012699741 SDAG: Rename Select->SelectImpl and repurpose Select as returning void
This is a step towards removing the rampant undefined behaviour in
SelectionDAG, which is a part of llvm.org/PR26808.

We rename SelectionDAGISel::Select to SelectImpl and update targets to
match, and then change Select to return void and consolidate the
sketchy behaviour we're trying to get away from there.

Next, we'll update backends to implement `void Select(...)` instead of
SelectImpl and eventually drop the base Select implementation.

llvm-svn: 268693
2016-05-05 23:19:08 +00:00
Matt Arsenault 539ca882c6 AMDGPU: Simplify control flow / conditions
llvm-svn: 268676
2016-05-05 20:27:02 +00:00
Nicolai Haehnle ffbd56a1c9 AMDGPU: Uniform branch conditions can originate with intrinsics
Summary:
Discovered by Dave Airlie, fixes an assertion in Khronos OpenGL CTS
GL43-CTS.shader_storage_buffer_object.advanced-matrix.

In this particular case, the buffer load intrinsic fed into a uniform
conditional branch, and led the brcond lowering down the wrong path.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19931

llvm-svn: 268650
2016-05-05 17:36:36 +00:00
Tom Stellard fcfaea4cff AMDGPU/SI: Add support for AMD code object version 2.
Summary:
Version 2 is now the default.  If you want to emit version 1, use
the amdgcn--amdhsa-amdcov1 triple.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19283

llvm-svn: 268647
2016-05-05 17:03:33 +00:00
Jan Vesely bbc2231983 AMDGPU/R600: Minor cleanup in InstrInfo
Use std::make_pair instead of constructor
Use C++11 loop
Reuse helper var

Reviewers: tstellardAMD

Subsribers: arsenm

Differential Revision: http://reviews.llvm.org/D19787

llvm-svn: 268503
2016-05-04 14:55:45 +00:00
Tom Stellard 4a304b3886 AMDGPU/SI: Use range loops to simplify some code in the SI Scheduler
Reviewers: arsenm, axeldavy

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19822

llvm-svn: 268396
2016-05-03 16:30:56 +00:00
Aaron Ballman 3bd56b3b43 Silence unused variable warning; NFC.
llvm-svn: 268392
2016-05-03 15:17:25 +00:00
Matt Arsenault bcdfee7030 AMDGPU: Custom lower v2i32 loads and stores
This will allow us to split up 64-bit private accesses when
necessary.

llvm-svn: 268296
2016-05-02 20:13:51 +00:00
Tom Stellard 154c9cdd24 AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch
We were using v_readlane_b32 with the lane set to zero, but this won't
work if thread 0 is not active.

Differential Revision: http://reviews.llvm.org/D19745

llvm-svn: 268295
2016-05-02 20:11:44 +00:00
Matt Arsenault 2b957b5a6f AMDGPU: Make i64 loads/stores promote to v2i32
Now that unaligned access expansion should not attempt
to produce i64 accesses, we can remove the hack in
PreprocessISelDAG where this is done.

This allows splitting i64 private accesses while
allowing the new add nodes indexing the vector components
can be folded with the base pointer arithmetic.

llvm-svn: 268293
2016-05-02 20:07:26 +00:00
Reid Kleckner 0549ab6033 Fix instance of -Winconsistent-missing-override in AMDGPU code
llvm-svn: 268289
2016-05-02 19:45:10 +00:00
Tom Stellard ce5e994887 AMDGPU/SI: Set the kill flag on temp VGPRs used to restore SGPRs from scratch
Summary:
When we restore an SGPR value from scratch, we first load it into a
temporary VGPR and then use v_readlane_b32 to copy the value from the
VGPR back into an SGPR.

We weren't setting the kill flag on the VGPR in the v_readlane_b32
instruction, so the register scavenger wasn't able to re-use this
temp value later.

I wasn't able to create a lit test for this.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19744

llvm-svn: 268287
2016-05-02 19:37:56 +00:00
Tom Stellard 27233b727f AMDGPU: Move R600 specific code out of AMDGPUISelLowering.cpp
Reviewers: arsenm

Subscribers: jvesely, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19736

llvm-svn: 268267
2016-05-02 18:05:17 +00:00
Tom Stellard 341e293d67 AMDGPU/SI: Fix bug in SIInstrInfo::insertWaitStates() uncovered by r268260
We can't use MI->getDebugLoc() when MI is an iterator that could be
MBB.end().

llvm-svn: 268265
2016-05-02 18:02:24 +00:00
Tom Stellard 1f520e5c98 AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses
Summary:
Add support for detecting hazards in SMEM soft clauses, so that we only
break the clauses when necessary, either by adding s_nop or re-ordering
other alu instructions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18870

llvm-svn: 268260
2016-05-02 17:39:06 +00:00
Nicolai Haehnle 119d3d80cb AMDGPU: llvm.SI.fs.constant is a source of divergence
Summary:
This intrinsic is used to get flat-shaded fragment shader inputs. Those are
uniform across a primitive, but a fragment shader wave may process pixels from
multiple primitives (as indicated by the prim_mask), and so that's where
divergence can arise.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19747

llvm-svn: 268259
2016-05-02 17:37:01 +00:00
Tom Stellard a27007eb4f AMDGPU/SI: Use hazard recognizer to detect DPP hazards
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18603

llvm-svn: 268247
2016-05-02 16:23:09 +00:00
Aaron Ballman 5c190d056d Silence unused variable warnings; NFC.
llvm-svn: 268234
2016-05-02 14:48:03 +00:00
Rafael Espindola 92dd7b82be Add missing override.
llvm-svn: 268163
2016-04-30 15:18:21 +00:00
Tom Stellard c51e4468b7 AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits
This was supposed to be part of r268143.

llvm-svn: 268154
2016-04-30 04:04:48 +00:00
Tom Stellard cb6ba62d6f AMDGPU/SI: Enable the post-ra scheduler
Summary:
This includes a hazard recognizer implementation to replace some of
the hazard handling we had during frame index elimination.

Reviewers: arsenm

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18602

llvm-svn: 268143
2016-04-30 00:23:06 +00:00
Matt Arsenault 701c21ea10 AMDGPU: Fix crash with unreachable terminators.
If a block has no successors because it ends in unreachable,
this was accessing an invalid iterator.

Also stop counting instructions that don't emit any
real instructions.

llvm-svn: 268119
2016-04-29 21:52:13 +00:00
Matt Arsenault dc4ebad6d4 AMDGPU: Add kernarg.segment.ptr intrinsic
llvm-svn: 268105
2016-04-29 21:16:52 +00:00
Matt Arsenault cf2744f1c8 AMDGPU/SI: Move post regalloc run of SIShrinkInstructions
Move to addPreEmitPass. This is so it runs after post-RA
scheduling so we can merge s_nops emitted by the scheduler
and hazard recognizer.

llvm-svn: 268095
2016-04-29 20:23:42 +00:00
Artem Tamazov 38e496b175 Fixed/Recommitted r267733 "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
Previously reverted by r267752.

r267733 review:
Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 268066
2016-04-29 17:04:50 +00:00
Tom Stellard 92b24f324b AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions
Summary:
These instructions can add an immediate offset to the address, like other
ds instructions.

Reviewers: arsenm

Subscribers: arsenm, scchan

Differential Revision: http://reviews.llvm.org/D19233

llvm-svn: 268043
2016-04-29 14:34:26 +00:00
Nikolay Haustov 4f672a34ed AMDGPU/SI: Assembler: Unify parsing/printing of operands.
Summary:
The goal is for each operand type to have its own parse function and
at the same time share common code for tracking state as different
instruction types share operand types (e.g. glc/glc_flat, etc).

Introduce parseAMDGPUOperand which can parse any optional operand.
DPP and Clamp/OMod have custom handling for now. Sam also suggested
to have class hierarchy for operand types instead of table. This
can be done in separate change.

Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps,
parseMubufOptionalOps, parseDPPOptionalOps.
Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class.
Rename AsmMatcher/InstPrinter methods accordingly.
Print immediate type when printing parsed immediate operand.
Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3).
Update tests.

Reviewers: tstellarAMD, SamWot, artem.tamazov

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19584

llvm-svn: 268015
2016-04-29 09:02:30 +00:00
Matt Arsenault 7d1b6c81af AMDGPU: Stop reporting an addressing mode for unknown addrspace
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.

llvm-svn: 268002
2016-04-29 06:25:10 +00:00
Matt Arsenault 1c4d0efe56 AMDGPU: Emit error if too much LDS is used
llvm-svn: 267922
2016-04-28 19:37:35 +00:00
Matt Arsenault c5fce69031 AMDGPU: Fix mishandling array allocations when promoting alloca
The canonical form for allocas is a single allocation of the array type.
In case we see a non-canonical array alloca, make sure we aren't
replacing this with an array N times smaller.

llvm-svn: 267916
2016-04-28 18:38:48 +00:00
Craig Topper 33772c5375 [CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior.
llvm-svn: 267853
2016-04-28 03:34:31 +00:00
Matt Arsenault 0547b016b1 AMDGPU: Account for globals in AMDGPUPromoteAlloca pass
Patch by Bas Nieuwenhuizen

llvm-svn: 267791
2016-04-27 21:05:08 +00:00
Chad Rosier 03e1647d19 Revert "[AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD."
This reverts commit r267733 due to a -Werror,-Wunused-function error.

llvm-svn: 267752
2016-04-27 18:29:11 +00:00
Reid Kleckner 7f0ae15e9d Silence a -Wdangling-else
llvm-svn: 267737
2016-04-27 16:46:33 +00:00
Artem Tamazov 3896f8f83d [AMDGPU][llvm-mc] Add support of TTMP quads. Rework M0 exclusion for SMRD.
Added support of TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
Tests added.

Differential Revision: http://reviews.llvm.org/D19342

llvm-svn: 267733
2016-04-27 16:20:23 +00:00
Nicolai Haehnle f66bdb5ea8 AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.

(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)

Reviewers: arsenm, mareko, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19203

llvm-svn: 267729
2016-04-27 15:46:01 +00:00
Artem Tamazov 5cd55b1784 [AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers.
Possibility to specify code of hardware register kept.
Disassemble to symbolic name, if name is known.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19335

llvm-svn: 267724
2016-04-27 15:17:03 +00:00
Ahmed Bougacha 128f8732a5 [CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI.
Differential Revision: http://reviews.llvm.org/D17176

llvm-svn: 267606
2016-04-26 21:15:30 +00:00
Konstantin Zhuravlyov 71515e57f9 [AMDGPU] Move reserved vgpr count for trap handler usage to SIMachineFunctionInfo + minor commenting changes
Differential Revision: http://reviews.llvm.org/D19537

llvm-svn: 267573
2016-04-26 17:24:40 +00:00
Konstantin Zhuravlyov 1d99c4d03c [AMDGPU] Reserve VGPRs for trap handler usage if instructed
Differential Revision: http://reviews.llvm.org/D19235

llvm-svn: 267563
2016-04-26 15:43:14 +00:00
Sam Kolton 3025e7f25f [AMDGPU] Assembler: basic support for SDWA instructions
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
  - converters for support optional operands and modifiers
  - VOPC
  - sext() modifier
  - intrinsics
  - VOP2b (see vop_dpp.s)
  - V_MAC_F32 (see vop_dpp.s)

Differential Revision: http://reviews.llvm.org/D19360

llvm-svn: 267553
2016-04-26 13:33:56 +00:00
Andrew Kaylor 7de74af929 Add optimization bisect opt-in calls for AMDGPU passes
Differential Revision: http://reviews.llvm.org/D19450

llvm-svn: 267485
2016-04-25 22:23:44 +00:00
Matt Arsenault 074ea2851c AMDGPU/SI: Optimize adjacent s_nop instructions
Use the operand for how long to wait. This is somewhat
distasteful, since it would be better to just emit s_nop
with the right argument in the first place. This would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is the post-RA scheduler and hazard recognizer
represent nops as a single null node, and would require inventing
another way of representing N nops.

llvm-svn: 267456
2016-04-25 19:53:22 +00:00
Matt Arsenault 99c14524ec AMDGPU: Implement addrspacecast
llvm-svn: 267452
2016-04-25 19:27:24 +00:00
Matt Arsenault 48ab526f12 AMDGPU: Add queue ptr intrinsic
llvm-svn: 267451
2016-04-25 19:27:18 +00:00
Matt Arsenault dfaf4261ab AMDGPU: Add DAG to debug dump
Also reorder case to match enum order

llvm-svn: 267449
2016-04-25 19:27:09 +00:00
Etienne Bergeron 06c14ec31e Fix incorrect redundant expression in target AMDGPU.
Summary:
The expression is detected as a redundant expression.
Turn out, this is probably a bug.

```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
  if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```

Reviewers: rnk, tstellarAMD

Subscribers: arsenm, cfe-commits

Differential Revision: http://reviews.llvm.org/D19460

llvm-svn: 267415
2016-04-25 15:06:33 +00:00
Artem Tamazov d6468666b5 [AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax.
Added hwreg(reg[,offset,width]) syntax.
Default offset = 0, default width = 32.
Possibility to specify 16-bit immediate kept.
Added out-of-range checks.
Disassembling is always to hwreg(...) format.
Tests updated/added.

Differential Revision: http://reviews.llvm.org/D19329

llvm-svn: 267410
2016-04-25 14:13:51 +00:00
Craig Topper 855d182656 Fix a couple assertions that can never fire because they just contained the text string which always evaluates to true. Add a ! so they'll evaluate to false.
llvm-svn: 267312
2016-04-24 02:01:25 +00:00
Matt Arsenault 7e8de01f84 AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size
llvm-svn: 267244
2016-04-22 22:59:16 +00:00
Matt Arsenault efa3fe14d1 AMDGPU: Re-visit nodes in performAndCombine
This fixes test regressions when i64 loads/stores are made promote.

llvm-svn: 267240
2016-04-22 22:48:38 +00:00
Konstantin Zhuravlyov a40d8358e7 [AMDGPU] Insert nop pass: take care of outstanding feedback
- Switch few loops to range-based for loops
- Fix nop insertion at the end of BB
- Fix formatting
- Check for endpgm

Differential Revision: http://reviews.llvm.org/D19380

llvm-svn: 267167
2016-04-22 17:04:51 +00:00
Nicolai Haehnle b0c9748709 AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.

Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.

This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19191

llvm-svn: 267102
2016-04-22 04:04:08 +00:00
Matt Arsenault 98f8394e7c AMDGPU: Fix debug name of pass to better match
I get this wrong every time I try to debug this.

llvm-svn: 267030
2016-04-21 18:21:54 +00:00
Nicolai Haehnle 97788020c5 Split IntrReadArgMem into IntrReadMem and IntrArgMemOnly
Summary:
IntrReadWriteArgMem simply becomes IntrArgMemOnly.

So there are fewer intrinsic properties that express their orthogonality
better, and correspond more closely to the corresponding IR attributes.

Suggested by: Philip Reames

Reviewers: joker.eph, reames, tstellarAMD

Subscribers: jholewinski, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19291

llvm-svn: 267021
2016-04-21 17:48:02 +00:00
Sam Kolton 201398e8a3 [AMDGPU] Assembler: prevent parseDPPCtrlOps from eating invalid tokens
Reviewers: nhaustov, tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D19317

llvm-svn: 266984
2016-04-21 13:14:24 +00:00
Nikolay Haustov fb5c307ccd AMDGPU/SI: Assembler: improvements to support trap handlers.
Add ParseAMDGPURegister which can be invoked recursively for parsing lists.
Rename getRegForName to getSpecialRegForName.
Support legacy SP3 register list syntax: [s2,s3,s4,s5] or [flat_scratch_lo,flat_scratch_hi].
Add 64-bit registers TBA, TMA where missing.
Add some tests.

Differential Revision: http://reviews.llvm.org/D19163

llvm-svn: 266865
2016-04-20 09:34:48 +00:00
Nicolai Haehnle b48275f134 Add IntrWrite[Arg]Mem intrinsic property
Summary:
This property is used to mark an intrinsic that only writes to memory, but
neither reads from memory nor has other side effects.

An example where this is useful is the llvm.amdgcn.buffer.store.format.*
intrinsic, which corresponds to a store instruction that goes through a special
buffer descriptor rather than through a plain pointer.

With this property, the intrinsic should still be handled as having side
effects at the LLVM IR level, but machine scheduling can make smarter
decisions.

Reviewers: tstellarAMD, arsenm, joker.eph, reames

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18291

llvm-svn: 266826
2016-04-19 21:58:33 +00:00
Nicolai Haehnle e2dda4f750 AMDGPU: Guard VOPC instructions against incorrect commute
Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19208

llvm-svn: 266825
2016-04-19 21:58:22 +00:00
Nicolai Haehnle 7483937bf0 AMDGPU/SI: SGPR accounting in getSIProgramInfo must ignore exec_lo/hi
Summary:
A shader stored the live mask (initial exec mask) in an SGPR which was then
spilled during register allocation. The allocator quite reasonably
optimized turned the spill into

  v_writelane_b32 %vgpr, exec_lo, N
  v_writelane_b32 %vgpr, exec_hi, N+1

at the beginning of the shader, confusing the SGPR accounting.

No test case, because si-sgpr-spill.ll together with an upcoming patch for
WQM handling exhibits the problem.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19199

llvm-svn: 266824
2016-04-19 21:58:17 +00:00
Konstantin Zhuravlyov 8c273ad719 [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test

Differential Revision: http://reviews.llvm.org/D19079

llvm-svn: 266626
2016-04-18 16:28:23 +00:00
Artem Tamazov e2762423c2 [AMDGPU][llvm-mc] s_setreg* - Fix order of operands
Order should match the sp3 syntax, where destination (simm16 denoting the hwreg) is coming first.

Differential Revision: http://reviews.llvm.org/D19161

llvm-svn: 266617
2016-04-18 14:54:26 +00:00
Aaron Ballman 2eeefe8ed8 Silence some "initialized but unused" warnings from MSVC -- the function being called is a static function, so there's no need for an instance variable. NFC.
llvm-svn: 266616
2016-04-18 14:47:19 +00:00
Mehdi Amini b550cb1750 [NFC] Header cleanup
Removed some unused headers, replaced some headers with forward class declarations.

Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

Patch by Eugene Kosov <claprix@yandex.ru>

Differential Revision: http://reviews.llvm.org/D19219

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
2016-04-18 09:17:29 +00:00
Matt Arsenault c10783c42d AMDGPU: Enable LocalStackSlotAllocation pass
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.

This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.

llvm-svn: 266508
2016-04-16 02:13:37 +00:00
Matt Arsenault b6be202779 AMDGPU: Use s_addk_i32 / s_mulk_i32
llvm-svn: 266506
2016-04-16 01:46:49 +00:00
Jun Bum Lim 4c5bd58ebe [MachineScheduler]Add support for store clustering
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.

This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().

llvm-svn: 266437
2016-04-15 14:58:38 +00:00
Nicolai Haehnle 750082d1fe AMDGPU/SI: Fix regression with no-return atomics
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19043

llvm-svn: 266433
2016-04-15 14:42:36 +00:00
Matt Arsenault 9c499c3a74 AMDGPU: Remove custom load/store scalarization
llvm-svn: 266385
2016-04-14 23:31:26 +00:00
Matt Arsenault fd8ab09c0e AMDGPU: Include LDS size in printed comment
llvm-svn: 266382
2016-04-14 22:11:51 +00:00
Matt Arsenault 3d1c1deb04 AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault 4ac341c8b3 AMDGPU: Directly emit m0 initialization with s_mov_b32
Currently what comes out of instruction selection is a
register initialized to -1, and then copied to m0.
MachineCSE doesn't consider copies, but we want these
to be CSEed. This isn't much of a problem currently,
because SIFoldOperands is run immediately after.

This avoids regressions when SIFoldOperands is run later
from leaving all copies to m0.

llvm-svn: 266377
2016-04-14 21:58:15 +00:00
Matt Arsenault 7900334dd5 AMDGPU: Fold bitcasts of scalar constants to vectors
This cleans up some messes since the individual scalar components
can be CSEed.

llvm-svn: 266376
2016-04-14 21:58:07 +00:00
Tom Stellard 000c5af3e6 AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

llvm-svn: 266356
2016-04-14 19:09:28 +00:00
Nicolai Haehnle 05b127da06 [StructurizeCFG] Annotate branches that were treated as uniform
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).

No tests included here, because tests in D19013 already cover this.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19018

llvm-svn: 266346
2016-04-14 17:42:35 +00:00
Nicolai Haehnle 723b73b4eb AMDGPU: Remove SIFixSGPRLiveRanges pass
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like

  def %vreg0:SGPR_32
  ...
if-block:
  ..
  def %vreg1:SGPR_32
  ...
else-block:
  ...
  use %vreg0:SGPR_32
  ...

and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.

However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.

In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.

Reviewers: arsenm, tstellarAMD

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19041

llvm-svn: 266345
2016-04-14 17:42:29 +00:00
Nicolai Haehnle 19f0f5177d AMDGPU: change a redundant if () to an assert(). NFC
Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19042

llvm-svn: 266344
2016-04-14 17:42:18 +00:00
Tom Stellard 79a1fd718c AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard f110f8f9f7 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,

The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Matt Arsenault 9cd90712f0 AMDGPU: Implement canonicalize
Also add generic DAG node for it.

llvm-svn: 266272
2016-04-14 01:42:16 +00:00
Tom Stellard b951f10701 AMDGPU/SI: Add support for spilling VGPRs without having to scavenge registers
Summary:
When we are spilling SGPRs to scratch memory, we usually don't have
free SGPRs to do the address calculation, so we need to re-use the
ScratchOffset register for the calculation.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18917

llvm-svn: 266244
2016-04-13 20:44:16 +00:00
Artem Tamazov eb4d5a9b0b [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.

TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Differential Revision: http://reviews.llvm.org/D17825

llvm-svn: 266205
2016-04-13 16:18:41 +00:00
Tom Stellard 703b2ec43f AMDGPU/SI: Fix spilling of 96-bit registers
Summary:
It seems like this was broken in r252327.  I thought we had test cases
for this, but it's really hard to tirgger spills of this exact register
size since they aren't used very much.

Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19021

llvm-svn: 266152
2016-04-12 23:57:30 +00:00
Nicolai Haehnle df77c9ada4 AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

llvm-svn: 266126
2016-04-12 21:18:10 +00:00
Tom Stellard ab1d3a9d50 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

llvm-svn: 266105
2016-04-12 18:40:43 +00:00
Matt Arsenault 3b08238f78 AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.

I think this could be a generic combine but I'm not sure.

llvm-svn: 266104
2016-04-12 18:24:38 +00:00
Nicolai Haehnle 279970c0dc AMDGPU/SI: Fix a mis-compilation of multi-level breaks
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.

This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18967

llvm-svn: 266088
2016-04-12 16:10:38 +00:00
Matt Arsenault 64fa2f4513 AMDGPU: Implement i64 global atomics
llvm-svn: 266075
2016-04-12 14:05:11 +00:00
Matt Arsenault a9dbdcae04 AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.

llvm-svn: 266074
2016-04-12 14:05:04 +00:00
Matt Arsenault 21ecfe43ba AMDGPU: Remove trailing whitespace
llvm-svn: 266073
2016-04-12 14:04:54 +00:00
Tom Stellard 0ffdf65eaa Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute"
This reverts commit r263720.

Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute.

llvm-svn: 265992
2016-04-11 20:38:40 +00:00
Jan Vesely 43b7b5b846 AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellardAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

llvm-svn: 265709
2016-04-07 19:23:11 +00:00
Tom Stellard 9112758077 AMDGPU/SI: Add latency for export instructions
Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18599

llvm-svn: 265708
2016-04-07 18:30:05 +00:00
Tom Stellard d37630e461 AMDGPU/SI: Add MachineBasicBlock parameter to SIInstrInfo::insertWaitStates
Summary: This makes it possible to insert nops at the end of blocks.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18549

llvm-svn: 265678
2016-04-07 14:47:07 +00:00
Valery Pykhtin e23b6deb01 [AMDGPU] fix readlane/readfirstlane src vgpr operand type.
For VGPR_32 operand disassembler expects a VGPR register encoded as 0..255 (enum8 src operand).
readfirstlane/readline actually has enum9 operand and this change fixes VGPR_32 to VS_32 (enum9 encoding).

Differential Revision: http://reviews.llvm.org/D18696

llvm-svn: 265670
2016-04-07 13:41:51 +00:00
Benjamin Kramer 4fb78518f1 Make helper functions static. NFC.
llvm-svn: 265653
2016-04-07 10:10:09 +00:00
Nicolai Haehnle df3a20cd80 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Sam Kolton ff90c60a78 [AMDGPU] AsmParser: disable DPP for unsupported instructions. New dpp tests. Fix v_nop_dpp.
Summary:
1. Disable DPP encoding for instructions that do not support it:
    - VOP1:
        - v_readfirstlane_b32
        - v_clrexcp
        - v_movreld_b32
        - v_movrels_b32
        - v_movrelsd_b32
    - VOP2:
        - v_madmk_f16/32
        - v_madak_f16/32
    - VOPC, VINTRP, VOP3
2. Fix DPP for v_nop
3. New DPP tests for VOP1 and VOP2 instructions

Reviewers: nhaustov, tstellarAMD, vpykhtin

Subscribers: tstellarAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18552

llvm-svn: 265538
2016-04-06 13:29:59 +00:00
Matthias Braun 7dc03f060e RegisterScavenger: Take a reference as enterBasicBlock() argument.
Make it obvious that the argument cannot be nullptr.
Remove an unnecessary nullptr check in initRegState.

llvm-svn: 265511
2016-04-06 02:47:09 +00:00
Konstantin Zhuravlyov e63e02cb0c [AMDGPU] Emit linkonce and linkonce_odr symbols
Differential Revision: http://reviews.llvm.org/D18726

llvm-svn: 265408
2016-04-05 16:00:58 +00:00
Tom Stellard 354a43c7bc AMDGPU: Implement {BUFFER,FLAT}_ATOMIC_CMPSWAP{,_X2}
Summary:
Implement BUFFER_ATOMIC_CMPSWAP{,_X2} instructions on all GCN targets, and FLAT_ATOMIC_CMPSWAP{,_X2} on CI+.

32-bit instruction variants tested manually on Kabini and Bonaire. Tests and parts of code provided by Jan Veselý.

Patch by: Vedran Miletić

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: jvesely, scchan, kanarayan, arsenm

Differential Revision: http://reviews.llvm.org/D17280

llvm-svn: 265170
2016-04-01 18:27:37 +00:00
Valery Pykhtin 5b3559c1ec [AMDGPU] fix MADAK/MADMK instructions operand namings to match encoding fields.
$vsrc1 -> $src1, $k -> $imm

Differential Revision: http://reviews.llvm.org/D18659

llvm-svn: 265141
2016-04-01 13:13:12 +00:00
Sam Kolton 1048fb1818 [AMDGPU] Disassembler: support for DPP
Review: http://reviews.llvm.org/D18642
llvm-svn: 265015
2016-03-31 14:15:04 +00:00
Matt Arsenault 2fe4fbc184 AMDGPU: Add frexp_exp intrinsic
llvm-svn: 264944
2016-03-30 22:28:52 +00:00
Aaron Ballman ef0fe1eed8 Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC.
llvm-svn: 264929
2016-03-30 21:30:00 +00:00
Tom Stellard 1d5e6d4bdc AMDGPU/SI: Improve MachineSchedModel definition
This patch contains a few improvements to the model, including:

- Using a single resource with a defined buffers size for each memory unit.
- Setting the IssueWidth correctly.
- Fixing latency values for memory instructions.

shader-db stats:

16429 shaders in 3231 tests
Totals:
SGPRS: 318232 -> 312328 (-1.86 %)
VGPRS: 208996 -> 209346 (0.17 %)
Code Size: 7147044 -> 7166440 (0.27 %) bytes
LDS: 83 -> 83 (0.00 %) blocks
Scratch: 1862656 -> 1459200 (-21.66 %) bytes per wave
Max Waves: 49182 -> 49243 (0.12 %)
Wait states: 0 -> 0 (0.00 %)A

Differential Revision: http://reviews.llvm.org/D18453

llvm-svn: 264877
2016-03-30 16:35:13 +00:00
Tom Stellard 0bc954e3bc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increase register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

llvm-svn: 264876
2016-03-30 16:35:09 +00:00
Konstantin Zhuravlyov ecc7cbf611 Test commit access
llvm-svn: 264736
2016-03-29 15:15:44 +00:00
Tom Stellard a76bcc2ea1 AMDGPU/SI: Limit load clustering to 16 bytes instead of 4 instructions
Summary:
This helps prevent load clustering from drastically increasing register
pressure by trying to cluster 4 SMRDx8 loads together.  The limit of 16
bytes was chosen, because it seems like that was the original intent
of setting the limit to 4 instructions, but more analysis could show
that a different limit is better.

This fixes yields small decreases in register usage with shader-db, but
also helps avoid a large increase in register usage when lane mask
tracking is enabled in the machine scheduler, because lane mask tracking
enables more opportunities for load clustering.

shader-db stats:

2379 shaders in 477 tests
Totals:
SGPRS: 49744 -> 48600 (-2.30 %)
VGPRS: 34120 -> 34076 (-0.13 %)
Code Size: 1282888 -> 1283184 (0.02 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 495616 -> 492544 (-0.62 %) bytes per wave
Max Waves: 6843 -> 6853 (0.15 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18451

llvm-svn: 264589
2016-03-28 16:10:13 +00:00
Justin Bogner f2a0d349a6 AMDGPU: Fix a use-after free and a missing break
We're erasing MI here, but then immediately using it again inside the
`if`. This moves the erase after we're done using it.

Doing that reveals a second problem though - this case is missing a
break, so we fall through to the default and dereference MI again.
This is obviously a bug, though I don't know how to write a test that
triggers it - all we do in the error case is print some extra debug
output.

Both of these issue crash on lots of tests under ASAN with the
recycling allocator changes from PR26808 applied.

llvm-svn: 264442
2016-03-25 18:33:16 +00:00
Matt Arsenault 8c8fcb2585 AMDGPU: Cost model for basic integer operations
This resolves bug 21148 by preventing promotion to
i64 induction variables.

llvm-svn: 264376
2016-03-25 01:16:40 +00:00
Matt Arsenault 9651813ee0 AMDGPU: Partially implement getArithmeticInstrCost for FP ops
llvm-svn: 264374
2016-03-25 01:00:32 +00:00
Matt Arsenault 59767cea79 AMDGPU: TTI: Make insertelement free.
We don't want to have a cost to scalarizing operations.

llvm-svn: 264364
2016-03-25 00:14:11 +00:00
Tom Stellard 9babad25e5 AMDGPU/SI: Add Polaris support
Patch By: Sonny Jiang

llvm-svn: 264295
2016-03-24 15:31:05 +00:00
Vasileios Kalintiris b8a37205d2 Fix sequence point warning. NFC.
llvm-svn: 264255
2016-03-24 10:53:28 +00:00
Matt Arsenault 30d37a74da AMDGPU: Remove atomic inc/dec patterns
There is no benefit to these since materializing the constant 1
requires the same number of instructions as materializing uint_max

llvm-svn: 264215
2016-03-23 23:23:38 +00:00
Matt Arsenault 0a30e456b4 AMDGPU: Promote alloca should skip volatiles
llvm-svn: 264214
2016-03-23 23:17:29 +00:00
Matt Arsenault f43c2a0b49 AMDGPU: Insert moves of frame index to value operands
Strengthen tests of storing frame indices.

Right now this just creates irrelevant scheduling changes.

We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.

There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.

This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.

llvm-svn: 264200
2016-03-23 21:49:25 +00:00
Valery Pykhtin c0a77c5064 [AMDGPU] Fix missing assembler predicates.
Differential Revision: http://reviews.llvm.org/D18351

llvm-svn: 264137
2016-03-23 04:27:26 +00:00
Tom Stellard 52ecd2d69b AMDGPU: Cache information about register pressure sets
We can statically decide whether or not a register pressure set is for
SGPRs or VGPRs, so we don't need to re-compute this information in
SIRegisterInfo::getRegPressureSetLimit().

Differential Revision: http://reviews.llvm.org/D14805

llvm-svn: 264126
2016-03-23 01:53:22 +00:00
Nicolai Haehnle 0a33abdfd2 AMDGPU: Fix dangling references introduced by r263982
Fixes Valgrind errors on the test cases that were reported as failing
by buildbots.

llvm-svn: 264000
2016-03-21 22:54:02 +00:00
Nicolai Haehnle a56e6b6a53 AMDGPU: Coding style fixes
I meant to add these before committing r263982 as per the review,
but I forgot to squash.

llvm-svn: 263983
2016-03-21 20:39:24 +00:00
Nicolai Haehnle 213e87f2ee AMDGPU: Add SIWholeQuadMode pass
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).

This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.

This pass is run before register coalescing so that we can use
machine SSA for analysis.

The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.

This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: MatzeB, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18162

llvm-svn: 263982
2016-03-21 20:28:33 +00:00
Tom Stellard 92339e888f AMDGPU/SI: Fix threshold calculation for branching when exec is zero
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.

The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18282

llvm-svn: 263969
2016-03-21 18:56:58 +00:00
Matt Arsenault cb38a6bd35 AMDGPU: Remove SignBitIsZero for mubuf scratch offsets
These instructions do not have the same negative base
address problem that DS instructions do on SI.

llvm-svn: 263964
2016-03-21 18:02:18 +00:00
Matt Arsenault b96b57347a AMDGPU: Add frexp_mant intrinsic
llvm-svn: 263948
2016-03-21 16:11:05 +00:00
Nicolai Haehnle fa771811b3 AMDGPU: add missing braces around multi-line if block
This fixes an issue with rL263658 pointed out by Tom Stellard.

llvm-svn: 263823
2016-03-18 20:32:04 +00:00
Nicolai Haehnle 95e8ffd398 AMDGPU: Overload return type of llvm.amdgcn.buffer.load.format
Summary:
Allow the selection of BUFFER_LOAD_FORMAT_x and _XY. Do this now before
the frontend patches land in Mesa. Eventually, we may want to automatically
reduce the size of loads at the LLVM IR level, which requires such overloads,
and in some cases Mesa can generate them directly.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18255

llvm-svn: 263792
2016-03-18 16:24:40 +00:00
Nicolai Haehnle ad63638f6d AMDGPU/SI: Add llvm.amdgcn.buffer.atomic.* intrinsics
Summary:
These intrinsics expose the BUFFER_ATOMIC_* instructions and will be used
by Mesa to implement atomics with buffer semantics. The intrinsic interface
matches that of buffer.load.format and buffer.store.format, except that the
GLC bit is not exposed (it is automatically deduced based on whether the
return value is used).

The change of hasSideEffects is required for TableGen to accept the pattern
that matches the intrinsic.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, rivanvx, llvm-commits

Differential Revision: http://reviews.llvm.org/D18151

llvm-svn: 263791
2016-03-18 16:24:31 +00:00
Nicolai Haehnle 3003ba00a3 AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
Summary:
We cannot easily deduce that an offset is in an SGPR, but the Mesa frontend
cannot easily make use of an explicit soffset parameter either. Furthermore,
it is likely that in the future, LLVM will be in a better position than the
frontend to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18218

llvm-svn: 263790
2016-03-18 16:24:20 +00:00
Sam Kolton a74cd526e9 [AMDGPU] Assembler: Change dpp_ctrl syntax to match sp3
Review: http://reviews.llvm.org/D18267
llvm-svn: 263789
2016-03-18 15:35:51 +00:00
Changpeng Fang 234fcb81d3 AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute
Symmary:
  ds_permute/ds_bpermute do not read memory so s_waitcnt is not needed.

Reviewers
  arsenm, tstellarAMD

Subscribers
  llvm-commits, arsenm

Differential Revision:
  http://reviews.llvm.org/D18197

llvm-svn: 263720
2016-03-17 16:43:50 +00:00
Nicolai Haehnle 79cad857a0 AMDGPU: mark atomic instructions as sources of divergence
Summary:
As explained by the comment, threads will typically see different values
returned by atomic instructions even if the arguments are equal.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18156

llvm-svn: 263719
2016-03-17 16:21:59 +00:00
Nicolai Haehnle ef160de3e5 AMDGPU: Prevent uniform loops from becoming infinite
Summary:
Uniform loops where the branch leaving the loop is predicated on VCCNZ
must be skipped if EXEC = 0, otherwise they will be infinite.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18137

llvm-svn: 263658
2016-03-16 20:14:33 +00:00
Michel Danzer 302f83ac4e AMDGPU: Verify instructions in non-debug builds as well
And emit an error if it fails.

This prevents illegal instructions from getting sent to the GPU, which
would potentially result in a hang.

This is a candidate for the stable branch(es).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
llvm-svn: 263627
2016-03-16 09:10:42 +00:00
Michel Danzer beb79ceb19 AMDGPU/SI: Clean up indentation in SIInstrInfo::getDefaultRsrcDataFormat
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 263626
2016-03-16 09:10:35 +00:00
Changpeng Fang 01f6062227 AMDGPU/SI: Implement GroupStaticSize Intrinsic for Dynamic LDS
Summary:
  Static LDS size is saved in MachineFunctionInfo::LDSSize,
We define a pseudo instruction with usesCustomInserter bit set. Then, in EmitInstrWithCustomInserter,
we replace this pseudo instruction with a mov of MachineFunctionInfo::LDSSize.

Reviewers:
    arsenm
    tstellarAMD

Subscribers
    llvm-commits, arsenm

Differential Revision:
   http://reviews.llvm.org/D18064

llvm-svn: 263563
2016-03-15 17:28:44 +00:00
Sanjay Patel 5719584129 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Tom Stellard 331f981cc9 AMDGPU/SI: Handle wait states required for DPP instructions
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17543

llvm-svn: 263447
2016-03-14 17:05:56 +00:00
Marek Olsak ed2213e6ef AMDGPU/SI: Incomplete shader binaries need to finish execution at the end
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D18058

llvm-svn: 263441
2016-03-14 15:57:14 +00:00
Nicolai Haehnle 74127fe8d7 AMDGPU: mark llvm.amdgcn.image.atomic.* as a source of divergence
Summary:
When multiple threads perform an atomic op with the same arguments, they
will usually see different return values.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18101

llvm-svn: 263440
2016-03-14 15:37:18 +00:00
Nikolay Haustov 79af6b33e0 [AMDGPU] Assembler: SOP* instruction fixes
s_bitset0_b64, s_bitset1_b64 has 32-bit src0, not 64-bit.
s_rfe_b64 has just one destination operand and no source.
Uncomment S_BITCMP* and S_SETVSKIP, adjust SOPC_* classes for that.
Add s_memrealtime test and change comments in smem.s to follow common style.
Change test for s_memtime to use non-zero register to make it really test encoding.
Add tests for s_buffer_load*.
Add tests for SOPC instructions (same for SI and VI)

Differential Revision: http://reviews.llvm.org/D18040

llvm-svn: 263420
2016-03-14 11:17:19 +00:00
Valery Pykhtin 0f97f17152 [AMDGPU] AsmParser: Factor out parseRegister. NFC.
llvm-svn: 263411
2016-03-14 07:43:42 +00:00
Valery Pykhtin 9e33c7f5d3 [AMDGPU] AsmParser: refactor post push_back vector access. NFC.
llvm-svn: 263409
2016-03-14 05:25:44 +00:00
Valery Pykhtin f91911c3ae [AMDGPU] AsmParser: remove redundant isReg checks. NFC.
llvm-svn: 263407
2016-03-14 05:01:45 +00:00
Valery Pykhtin a7f480b4e9 [AMDGPU] Fix VOPC instruction operand namings
Differential Revision: http://reviews.llvm.org/D17966

llvm-svn: 263242
2016-03-11 14:53:28 +00:00
Nikolay Haustov 6560781c4f [AMDGPU] Assembler: change v_madmk operands to have same order as mad.
The constant is now at source operand 1 (previously at 2).
This is also how it is in legacy AMD sp3 assembler.
Update tests.

Differential Revision: http://reviews.llvm.org/D17984

llvm-svn: 263212
2016-03-11 09:27:25 +00:00
Matt Arsenault bafc9dc591 AMDGPU: Don't use InstVisitor for AMDGPUPromoteAlloca
Frontend authors are strongly encouraged to keep allocas
in the entry block, so don't bother visiting every instruction
in the other blocks of the function.

llvm-svn: 263206
2016-03-11 08:20:50 +00:00
Matt Arsenault 6b6a2c37bc AMDGPU: R600 code splitting cleanup
Move a few functions only used by R600 to R600 specific code,
fix header macros to stop using R600, mark classes as final.

llvm-svn: 263204
2016-03-11 08:00:27 +00:00
Matt Arsenault 9a19c240c0 AMDGPU: Materialize sign bits with bfrev
If a constant is the same as the reverse of an inline immediate,
this is 4 bytes smaller than having to embed a 32-bit literal.

llvm-svn: 263201
2016-03-11 07:42:49 +00:00
Nicolai Haehnle b142770bfe AMDGPU/SI: add llvm.amdgcn.buffer.load/store.format intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_FORMAT_XYZW and will be used by Mesa
to implement the GL_ARB_shader_image_load_store extension.

The intention is that for llvm.amdgcn.buffer.load.format, LLVM will decide
whether one of the _X/_XY/_XYZ opcodes can be used (similar to image sampling
and loads). However, this is not currently implemented.

For llvm.amdgcn.buffer.store, LLVM cannot decide to use one of the "smaller"
opcodes and therefore the intrinsic is overloaded. Currently, only the v4f32
is actually implemented since GLSL also only has a vec4 variant of the store
instructions, although it's conceivable that Mesa will want to be smarter
about this in the future.

BUFFER_LOAD_FORMAT_XYZW is already exposed via llvm.SI.vs.load.input, which
has a legacy name, pretends not to access memory, and does not capture the
full flexibility of the instruction.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17277

llvm-svn: 263140
2016-03-10 18:43:50 +00:00
Changpeng Fang 278a5b31a5 AMDGPU/SI: Define S_GETREG Intrinsic
Summary:
 Define s_getreg intrinsic to generate s_getreg instruction to read
hardware registers.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17892

llvm-svn: 263124
2016-03-10 16:47:15 +00:00
Valery Pykhtin a4db224d54 [AMDGPU] Fix SMEM instructions encoding/operand namings
Differential Revision: http://reviews.llvm.org/D17651

llvm-svn: 263108
2016-03-10 13:06:08 +00:00
Chad Rosier c27a18f39f [TII] Allow getMemOpBaseRegImmOfs() to accept negative offsets. NFC.
http://reviews.llvm.org/D17967

llvm-svn: 263021
2016-03-09 16:00:35 +00:00
Teresa Johnson e50b23c67f Fix build error due to unsigned compare >= 0 in r263008 (NFC)
Fixes error from building with clang:

/usr/local/google/home/tejohnson/llvm/llvm_15/lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp:407:12:
error: comparison of unsigned expression >= 0 is always true
[-Werror,-Wtautological-compare]
  if ((Imm >= 0x000) && (Imm <= 0x0ff)) {
         ~~~ ^  ~~~~~

llvm-svn: 263014
2016-03-09 14:58:23 +00:00
Sam Kolton dfa29f7c5b [AMDGPU] Assembler: Support DPP instructions.
Supprot DPP syntax as used in SP3 (except several operands syntax).
Added dpp-specific operands in td-files.
Added DPP flag to TSFlags to determine if instruction is dpp in InstPrinter.
Support for VOP2 DPP instructions in td-files.
Some tests for DPP instructions.

ToDo:
  - VOP2bInst:
    - vcc is considered as operand
    - AsmMatcher doesn't apply mnemonic aliases when parsing operands
  - v_mac_f32
  - v_nop
  - disable instructions with 64-bit operands
  - change dpp_ctrl assembler representation to conform sp3

Review: http://reviews.llvm.org/D17804
llvm-svn: 263008
2016-03-09 12:29:31 +00:00
Nikolay Haustov 9b7577ed22 [AMDGPU] Assembler: Support abs() syntax.
Support legacy SP3 abs(v1) syntax. InstPrinter still uses |v1|.
Add tests.

Differential Revision: http://reviews.llvm.org/D17887

llvm-svn: 263006
2016-03-09 11:03:21 +00:00
Nikolay Haustov 8e3f099497 [AMDGPU] Assembler: Fix s_setpc_b64
s_setpc_b64 has just one 64-bit source which is the address of instruction to jump to.

Differential Revision: http://reviews.llvm.org/D17888

llvm-svn: 263005
2016-03-09 10:56:19 +00:00
Matt Arsenault c89f2919a4 AMDGPU: Match more med3 integer patterns
llvm-svn: 262864
2016-03-07 21:54:48 +00:00
Matt Arsenault 56356c8a9c AMDGPU: Remove a fixme for ptrrtoint handling
llvm-svn: 262854
2016-03-07 21:12:46 +00:00
Matt Arsenault 81d06015c6 AMDGPU: Move function only used by R600
llvm-svn: 262853
2016-03-07 21:10:13 +00:00
Valery Pykhtin dc11054f20 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
Engages code from r262804.

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262808
2016-03-06 20:25:36 +00:00
Valery Pykhtin 50cd3c4ec7 fix sanitizer-ppc64be-linux failure for r262804
error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move]

http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/930

llvm-svn: 262805
2016-03-06 15:13:54 +00:00
Valery Pykhtin 499a5c6323 [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262804
2016-03-06 13:27:13 +00:00
Valery Pykhtin 0c6293da68 [AMDGPU] SOPxx instructions operand naming fixed in td files.
dst -> sdst
ssrcN -> srcN

Differential Revision: http://reviews.llvm.org/D17646

llvm-svn: 262801
2016-03-06 10:31:44 +00:00
Tom Stellard 649b5db557 AMDGPU/SI: Add support for spiling SGPRs to scratch buffer
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

llvm-svn: 262732
2016-03-04 18:31:18 +00:00
Tom Stellard ebef6f9771 AMDGPU/SI: Enable frame index scavenging during PrologEpilogueInserter
Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().

Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.

This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17591

llvm-svn: 262728
2016-03-04 18:02:01 +00:00
Sam Kolton f51f4b8370 Test commit access
llvm-svn: 262714
2016-03-04 12:29:14 +00:00
Valery Pykhtin 824e804bf6 test commit
llvm-svn: 262709
2016-03-04 10:59:50 +00:00
Nikolay Haustov 5bf46ac150 AMDGPU/SI: add llvm.amdgcn.image.atomic.* intrinsics
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.

Initial change by Nicolai H.hnle

Differential Revision: http://reviews.llvm.org/D17401

llvm-svn: 262701
2016-03-04 10:39:50 +00:00
Tom Stellard cc7067a668 AMDGPU: Insert two S_NOP instructions for every high level source statement.
Patch by: Konstantin Zhuravlyov

Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.

Reviewers: tstellarAMD, arsenm

Subscribers: echristo, dblaikie, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17454

llvm-svn: 262579
2016-03-03 03:53:29 +00:00
Tom Stellard 600ca6fd39 AMDGPU/SI: Don't try to move scratch wave offset when there are no free SGPRs
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17590

llvm-svn: 262577
2016-03-03 03:45:09 +00:00
Matt Arsenault 8226fc4829 AMDGPU: Simplify boolean conditional return statements
Patch by Richard Thomson

llvm-svn: 262536
2016-03-02 23:00:21 +00:00
Nikolay Haustov f2fbabe9c1 Revert "[AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields"
Build failure with clang.

llvm-svn: 262477
2016-03-02 11:16:56 +00:00
Nikolay Haustov f0f24628cb Revert "[AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler."
Build failure with clang.

llvm-svn: 262475
2016-03-02 10:54:21 +00:00
Nikolay Haustov 73447a9714 [AMDGPU] Using table-driven amd_kernel_code_t field parser in assembler.
complementary patch to table-driven amd_kernel_code_t field parser/printer utility. lit tests passed.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17151

llvm-svn: 262474
2016-03-02 10:36:30 +00:00
Nikolay Haustov 6c8c74969a [AMDGPU] table-driven parser/printer for amd_kernel_code_t structure fields
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.

Benefits:

unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17150

llvm-svn: 262473
2016-03-02 10:36:25 +00:00
Matt Arsenault f2dcb4737b AMDGPU: Fix bug 26659.
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.

llvm-svn: 262457
2016-03-02 04:12:39 +00:00
Matt Arsenault a266bd8760 AMDGPU: Cleanup suggested in bug 23960
llvm-svn: 262456
2016-03-02 04:05:14 +00:00
Matt Arsenault 5de68cbc4c Bug 20810: Use report_fatal_error instead of unreachable
llvm-svn: 262455
2016-03-02 03:33:55 +00:00
Matthias Braun 17cb57995e TableGen: Check scheduling models for completeness
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:

- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model

Typical steps necessary to complete a model:

- Ensure all pseudo instructions that are expanded before machine
  scheduling (usually everything handled with EmitYYY() functions in
  XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
  resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.

Differential Revision: http://reviews.llvm.org/D17747

llvm-svn: 262384
2016-03-01 20:03:21 +00:00
Changpeng Fang 24f035af32 AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics
Summary:
  This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17614

llvm-svn: 262356
2016-03-01 17:51:23 +00:00
Nikolay Haustov e309e1415d [AMDGPU] Remove unused disassembler code.
llvm-svn: 262346
2016-03-01 16:02:40 +00:00
Nikolay Haustov 47a115cd41 [AMDGPU] Fix build warnings.
llvm-svn: 262338
2016-03-01 14:50:59 +00:00
Nikolay Haustov ac106add0f [AMDGPU] Disassembler code refactored + error messages.
Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out.

64bit instructions decoding added.

Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code.

The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel.

Previous TODOs were saved whenever possible.

Patch by: Valery Pykhtin

Differential Revision: http://reviews.llvm.org/D17720

llvm-svn: 262332
2016-03-01 13:57:29 +00:00
Nikolay Haustov ea8febde04 [TableGen] AsmMatcher: Skip optional operands in the midle of instruction if it is not present
Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented.
For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:

string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod).

Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal.

Patch by: Sam Kolton

Review: http://reviews.llvm.org/D17568

[AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
With this change you should place optional operands in order specified by asm string:

clamp -> omod
offset -> glc -> slc -> tfe
Fixes for several tests.
Depends on D17568

Patch by: Sam Kolton

Review: http://reviews.llvm.org/D17644
llvm-svn: 262314
2016-03-01 08:34:43 +00:00
Matt Arsenault d275fcabcb AMDGPU: Don't emit build_pair during udivrem legalization
Technically you aren't supposed to emit these after type legalization
for some reason, and we use vector extracts of bitcasted integers
as the canonical way to do this.

llvm-svn: 262298
2016-03-01 05:06:05 +00:00
Matt Arsenault f4dfc1a027 AMDGPU: Don't use estimated stack size when we know the real stack size
llvm-svn: 262297
2016-03-01 04:58:20 +00:00
Matt Arsenault 59b8b77405 AMDGPU: Set HasExtractBitInsn
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.

But in most situations we do want the sinking to occur.

llvm-svn: 262296
2016-03-01 04:58:17 +00:00
Matt Arsenault 3a61985b2f AMDGPU: More bits of frame index are known to be zero
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.

This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.

llvm-svn: 262153
2016-02-27 20:26:57 +00:00
Duncan P. N. Exon Smith be8f8c4478 CodeGen: Update LiveIntervalAnalysis API to use MachineInstr&, NFC
These parameters aren't expected to be null, so take them by reference.

llvm-svn: 262151
2016-02-27 20:14:29 +00:00
Duncan P. N. Exon Smith 5702287809 CodeGen: Update DFAPacketizer API to take MachineInstr&, NFC
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*.  In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator.  Besides cleaning up the API, this is in
search of PR26753.

llvm-svn: 262142
2016-02-27 19:09:00 +00:00
Matt Arsenault 9d82ee7526 AMDGPU: Split vi-insts subtarget feature
This will be more useful for marking builtins acceptable for which
subtargets.

llvm-svn: 262121
2016-02-27 08:53:55 +00:00
Matt Arsenault 274d34e725 AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
Matt Arsenault 61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Duncan P. N. Exon Smith 3ac9cc6156 CodeGen: Take MachineInstr& in SlotIndexes and LiveIntervals, NFC
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals.  The MachineInstrs here are
never null, so this cleans up the API a bit.  It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).

At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.

llvm-svn: 262115
2016-02-27 06:40:41 +00:00
Nikolay Haustov 2f684f1347 [AMDGPU] Assembler: Basic support for MIMG
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.

Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
2016-02-26 09:51:05 +00:00
Nikolay Haustov 161a158e5c [AMDGPU] Disassembler: Support for all VOP1 instructions.
Support all instructions with VOP1 encoding with 32 or 64-bit operands for VI subtarget:

VGPR_32 and VReg_64 operand register classes
VS_32 and VS_64 operand register classes with inline and literal constants
Tests for VOP1 instructions.

Patch by: skolton

Reviewers: arsenm, tstellarAMD

Review: http://reviews.llvm.org/D17194
llvm-svn: 261878
2016-02-25 16:09:14 +00:00
Nikolay Haustov 2e4c72977c [AMDGPU] Assembler: Simplify handling of optional operands
Resubmit with index problem fixed. Verified with valgrind.

Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands.
However this means that when parsing instruction which has no mnemonic prefix,
we cannot add both default values for VOP3 and for DPP optional operands
to OperandVector - neither instructions would match. So add default values
for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445
llvm-svn: 261856
2016-02-25 10:58:54 +00:00
NAKAMURA Takumi 3d3d0f4151 Revert r261742, "[AMDGPU] Assembler: Simplify handling of optional operands"
It brought undefined behavior.

llvm-svn: 261839
2016-02-25 08:35:27 +00:00
Nikolay Haustov 4f073ca7fa [AMDGPU] Assembler: Simplify handling of optional operands
Prepare to support DPP encodings.

For DPP encodings, we want row_mask/bank_mask/bound_ctrl to be optional operands. However this means that when parsing instruction which has no mnemonic prefix, we cannot add both default values for VOP3 and for DPP optional operands to OperandVector - neither instructions would match. So add default values for optional operands to MCInst during conversion instead.

Mark more operands as IsOptional = 1 in .td files.
Do not add default values for optional operands to OperandVector in AMDGPUAsmParser.
Add default values for optional operands during conversion using new helper addOptionalImmOperand.
Change to cvtVOP3_2_mod to check instruction flag instead of presence of modifiers. In the future, cvtVOP3* functions can be combined into one.
Separate cvtFlat and cvtFlatAtomic.
Fix CNDMASK_B32 definition to have no modifiers.

Review: http://reviews.llvm.org/D17445

Reviewers: tstellarAMD
llvm-svn: 261742
2016-02-24 14:22:47 +00:00
Nikolay Haustov 3c728e43aa [AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields)
lit tests passed before and after because it doesn't test the binary representation of amd_kernel_code_t.

Patch by: Valery Pykhtin (Valery.Pykhtin@amd.com)

Reviewers: arsenm
llvm-svn: 261732
2016-02-24 10:54:25 +00:00
Matt Arsenault cd09961fb3 AMDGPU: Check cheaper condition before SignBitIsZero
Don't do an expensive computeKnownBits call when we
can do the cheap check for legal offsets first.

llvm-svn: 261720
2016-02-24 04:55:29 +00:00
Nikolay Haustov 2a62b3c244 [AMDGPU] Fix operands of S_BFE_U64 and S_BFM_B64
src1 of s_bfe_u64 is 32-bit (same as s_bfe_i64).
src0 and src1 of s_bfm_b64 are 32-bit.
Update tests.

Review: http://reviews.llvm.org/D17480

Reviewers: arsenm
llvm-svn: 261621
2016-02-23 09:19:14 +00:00
Duncan P. N. Exon Smith 6307eb5518 CodeGen: TII: Take MachineInstr& in predicate API, NFC
Change TargetInstrInfo API to take `MachineInstr&` instead of
`MachineInstr*` in the functions related to predicated instructions
(I'll try to come back later and get some of the rest).  All of these
functions require non-null parameters already, so references are more
clear.  As a bonus, this happens to factor away a host of implicit
iterator => pointer conversions.

No functionality change intended.

llvm-svn: 261605
2016-02-23 02:46:52 +00:00
Duncan P. N. Exon Smith d84f600653 CodeGen: Bring back MachineBasicBlock::iterator::getInstrIterator()...
This is a little embarrassing.

When I reverted r261504 (getIterator() => getInstrIterator()) in
r261567, I did a `git grep` to see if there were new calls to
`getInstrIterator()` that I needed to migrate.  There were 10-20 hits,
and I blindly did a `sed ...` before calling `ninja check`.

However, these were `MachineInstrBundleIterator::getInstrIterator()`,
which predated r261567.  Perhaps coincidentally, these had an identical
name and return type.

This commit undoes my careless sed and restores
`MachineBasicBlock::iterator::getInstrIterator()`.

llvm-svn: 261577
2016-02-22 21:30:15 +00:00
Matt Arsenault fa67bdbde0 AMDGPU/R600: Implement allowsMisalignedMemoryAccess
This avoids some test regressions in a future commit
when unaligned operations are expanded when they
have custom lowering.

llvm-svn: 261570
2016-02-22 21:04:16 +00:00
Duncan P. N. Exon Smith c5b668deb8 Revert "CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC"
This reverts commit r261504, since it's not obvious the new name is
better:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160222/334298.html

I'll recommit if we get consensus that it's the right direction.

llvm-svn: 261567
2016-02-22 20:49:58 +00:00
Tom Stellard d93a34f714 [AMDGPU][llvm-mc] Support for 32-bit inline literals
Patch by: Artem Tamazov

Summary:
Note: Support for 64-bit inline literals TBD
Added: Support of abs/neg modifiers for literals (incomplete; parsing TBD).
Added: Some TODO comments.
Reworked/clarity: rename isInlineImm() to isInlinableImm()
Reworked/robustness: disallow BitsToFloat() with undefined value in isInlinableImm()
Reworked/reuse: isSSrc32/64(), isVSrc32/64()
Tests added.

Reviewers: tstellarAMD, arsenm

Subscribers: vpykhtin, nhaustov, SamWot, arsenm

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D17204

llvm-svn: 261559
2016-02-22 19:17:56 +00:00
Tom Stellard 82fc153baa [AMDGPU] [llvm-mc] [VI] Fix encoding of LDS/GDS instructions.
Patch by: Artem Tamazov

Summary: Tests added.

Reviewers: tstellarAMD, arsenm

Subscribers: vpykhtin, SamWot, #llvm-amdgpu-spb

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D17271

llvm-svn: 261558
2016-02-22 19:17:53 +00:00
Duncan P. N. Exon Smith dc0848c029 CodeGen: MachineInstr::getIterator() => getInstrIterator(), NFC
Delete MachineInstr::getIterator(), since the term "iterator" is
overloaded when talking about MachineInstr.

- Downcast to ilist_node in iplist::getNextNode() and getPrevNode() so
  that ilist_node::getIterator() is still available.
- Add it back as MachineInstr::getInstrIterator().  This matches the
  naming in MachineBasicBlock.
- Add MachineInstr::getBundleIterator().  This is explicitly called
  "bundle" (not matching MachineBasicBlock) to disintinguish it clearly
  from ilist_node::getIterator().
- Update all calls.  Some of these I switched to `auto` to remove
  boiler-plate, since the new name is clear about the type.

There was one call I updated that looked fishy, but it wasn't clear what
the right answer was.  This was in X86FrameLowering::inlineStackProbe(),
added in r252578 in lib/Target/X86/X86FrameLowering.cpp.  I opted to
leave the behaviour unchanged, but I'll reply to the original commit on
the list in a moment.

llvm-svn: 261504
2016-02-21 22:58:35 +00:00
Tom Stellard 467b5b9024 AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.

This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.

This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.

Reviewers: arsenm, cfang, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17305

llvm-svn: 261385
2016-02-20 00:37:25 +00:00
Tom Stellard 2d26fe7aa6 AMDGPU/SI: Fix s_waitcnt insertion for flat instructions
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions.  The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17366

llvm-svn: 261330
2016-02-19 15:33:13 +00:00
Nicolai Haehnle f2c64db55a AMDGPU/SI: add llvm.amdgcn.image.load/store[.mip] intrinsics
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.

IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.

Differential Revision: http://reviews.llvm.org/D17276

llvm-svn: 261224
2016-02-18 16:44:18 +00:00
Nikolay Haustov 10813a4efa Test commit access.
llvm-svn: 261199
2016-02-18 10:02:12 +00:00
Tom Stellard e1818af8c5 [AMDGPU] Disassembler: Added basic disassembler for AMDGPU target
Changes:

- Added disassembler project
- Fixed all decoding conflicts in .td files
- Added DecoderMethod=“NONE” option to Target.td that allows to
  disable decoder generation for an instruction.
- Created decoding functions for VS_32 and VReg_32 register classes.
- Added stubs for decoding all register classes.
- Added several tests for disassembler

Disassembler only supports:

- VI subtarget
- VOP1 instruction encoding
- 32-bit register operands and inline constants

[Valery]

One of the point that requires to pay attention to is how decoder
conflicts were resolved:

- Groups of target instructions were separated by using different
  DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate
  approach.

- There were conflicts in IMAGE_<> instructions caused by two
  different reasons:

1. dmask wasn’t specified for the output (fixed)
2. There are image instructions that differ only by the number of
   the address components but have the same encoding by the HW spec. The
   actual number of address components is determined by the HW at runtime
   using image resource descriptor starting from the VGPR encoded in an
   IMAGE instruction. This means that we should choose only one instruction
   from conflicting group to be the rule for decoder. I didn’t find the way
   to disable decoder generation for an arbitrary instruction and therefore
   made a onelinear fix to tablegen generator that would suppress decoder
   generation when DecoderMethod is set to “NONE”. This is a change that
   should be reviewed and submitted first. Otherwise I would need to
   specify different DecoderNamespace for every instruction in the
   conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not
   used in other targets.
3. IMAGE_GATHER decoder generation is for now disabled and to be
   done later.

[/Valery]

Patch By: Sam Kolton

Differential Revision: http://reviews.llvm.org/D16723

llvm-svn: 261185
2016-02-18 03:42:32 +00:00
Tom Stellard cc4c8718ed [AMDGPU] Rename $dst operand to $vdst for VOP instructions.
Summary: This change renames output operand for VOP instructions from dst to vdst. This is needed to enable decoding named operands for disassembler.

Reviewers: vpykhtin, tstellarAMD, arsenm

Subscribers: arsenm, llvm-commits, nhaustov

Projects: #llvm-amdgpu-spb

Differential Revision: http://reviews.llvm.org/D16920

llvm-svn: 260986
2016-02-16 18:14:56 +00:00
Matt Arsenault f2ddbf00ed AMDGPU: Prepare for reducing private element size.
Tests for the new scalarize all private access options will be
included with a future commit.

The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.

llvm-svn: 260804
2016-02-13 04:18:53 +00:00
Tom Stellard 4409051d00 AMDGPU/SI: Add llvm.amdgcn.mov.dpp intrinsic
This intrinsic will be used to expose dpp functionality to higher-level
languages. It will map to the dpp version of v_mov_b32.

llvm-svn: 260792
2016-02-13 02:09:49 +00:00
Matt Arsenault d2759212b8 AMDGPU: Cleanup includes and random macros
llvm-svn: 260784
2016-02-13 01:24:08 +00:00
Matt Arsenault ce56a0ef54 AMDGPU: Add intrinsics for sin/cos
These provide direct access to the hardware instruction without
the unit version required like llvm.sin/llvm.cos lowering requires.

llvm-svn: 260782
2016-02-13 01:19:56 +00:00
Matt Arsenault 79963e80b8 AMDGPU: Rename intrinsic to better match instruction name
Also fixes missing f32 test.

llvm-svn: 260780
2016-02-13 01:03:00 +00:00
Tom Stellard 607051640c AMDGPU/SI: Add instruction defs for VOP1 DPP instructions
Reviewers: nhaustov, cfang, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17159

llvm-svn: 260774
2016-02-13 00:51:31 +00:00
Matt Arsenault 16f48d7c55 AMDGPU: Fix broken condition causing warning
llvm-svn: 260773
2016-02-13 00:36:10 +00:00
Tom Stellard bc4497b13c AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Reviewers: arsenm

Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16603

llvm-svn: 260765
2016-02-12 23:45:29 +00:00
Tom Stellard 46937ca4e7 [AMDGPU] Assembler: Swap operands of flat_store instructions to match AMD assembler
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match.  Also update MC and CodeGen tests.

Differential Revision: http://reviews.llvm.org/D16927

Patch by: Nikolay Haustov

llvm-svn: 260694
2016-02-12 17:57:54 +00:00
Changpeng Fang e07f1aa8fa AMDGPU/SI: Annotate Loops with Constant Condition in SIAnnotateControlFlow pass.
Summary:
  It is possible that the loop condition can be a boolean constant (infinite loop,
for example). So we sould handle constant condition in annotating a loop. This
patch adds this functionality to support annotating constant condition.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D15093

llvm-svn: 260692
2016-02-12 17:11:04 +00:00
Benjamin Kramer ac5e36f52e Fix uninitialized memory read.
Found by msan.

llvm-svn: 260676
2016-02-12 12:37:21 +00:00
Matt Arsenault 296b849163 AMDGPU: Set flat_scratch from flat_scratch_init reg
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.

Also stops always initializing flat_scratch even when unused.

In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.

llvm-svn: 260658
2016-02-12 06:31:30 +00:00
Matt Arsenault 24ee0785dd AMDGPU: Set element_size in private resource descriptor
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.

llvm-svn: 260651
2016-02-12 02:40:47 +00:00
Matt Arsenault 16f7bcb661 AMDGPU: Fix mishandling alignment when scalarizing vector loads/stores
I don't think this was causing any real problems, so I'm not sure
how to test for this.

llvm-svn: 260646
2016-02-12 02:22:21 +00:00
Matt Arsenault 55d49cfe2d AMDGPU: Initialize SILowerControlFlow
llvm-svn: 260645
2016-02-12 02:16:10 +00:00
Matt Arsenault 806dd0a532 AMDGPU: Remove trailing whitespace
llvm-svn: 260644
2016-02-12 02:16:07 +00:00
Tom Stellard 1397d49ef5 AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations.  When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17102

llvm-svn: 260599
2016-02-11 21:45:07 +00:00
Tom Stellard fa8f204c5b AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist
Summary:
When we split SMRD instructions into two MUBUFs we were adding the users
of the newly created MUBUFs to the VALU worklist.  However, the only
users these instructions had was the REG_SEQUENCE that was inserted
by splitSMRD when the original SMRD instruction was split.

We need to make sure to add the users of the original SMRD to the VALU
worklist before it is split.

I have a test case, but it requires one other bug fix, so it will be
added in a later commt.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17101

llvm-svn: 260588
2016-02-11 21:14:34 +00:00
Tom Stellard e993451f5c [AMDGPU] Fix for "v_div_scale_f64 reg, vcc, ..." parsing
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, SamWot, nhaustov, vpykhtin

Differential Revision: http://reviews.llvm.org/D16995

Patch By: Artem Tamazov

llvm-svn: 260560
2016-02-11 18:25:26 +00:00
Matt Arsenault fcb345f172 AMDGPU: Fix constant bus use check with subregisters
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.

This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.

llvm-svn: 260495
2016-02-11 06:15:39 +00:00
Matt Arsenault 427c548925 AMDGPU: Fix passes depending on dominator tree for no reason
llvm-svn: 260494
2016-02-11 06:15:34 +00:00
Matt Arsenault fe26def35c AMDGPU: Fix not handling new workitem intrinsics in DivergenceAnalysis
llvm-svn: 260491
2016-02-11 05:32:51 +00:00
Matt Arsenault 9524566314 AMDGPU: Split R600 and SI store lowering
These were only sharing some somewhat incorrect
logic for when to scalarize or split vectors.

llvm-svn: 260490
2016-02-11 05:32:46 +00:00
Tom Stellard a90b9526df [AMDGPU] Assembler: Fix VOP3 only instructions
Separate methods to convert parsed instructions to MCInst:

  - VOP3 only instructions (always create modifiers as operands in MCInst)
  - VOP2 instrunctions with modifiers (create modifiers as operands
    in MCInst when e64 encoding is forced or modifiers are parsed)
  - VOP2 instructions without modifiers (do not create modifiers
    as operands in MCInst)
  - Add VOP3Only flag. Pass HasMods flag to VOP3Common.
  - Simplify code that deals with modifiers (-1 is now same as
    0). This is no longer needed.
  - Add few tests (more will be added separately).
    Update error message now correct.

Patch By: Nikolay Haustov

Differential Revision: http://reviews.llvm.org/D16778

llvm-svn: 260483
2016-02-11 03:28:15 +00:00
Nicolai Haehnle d791bd07c7 AMDGPU: Release the scavenged offset register during VGPR spill
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).

This is a candidate for the release branch.

Reviewers: arsenm, tstellarAMD

Subscribers: qcolombet, arsenm

Differential Revision: http://reviews.llvm.org/D16558

llvm-svn: 260427
2016-02-10 20:13:58 +00:00
Matt Arsenault a14364115e AMDGPU: Fix indentation and variable names
llvm-svn: 260399
2016-02-10 18:21:45 +00:00
Matt Arsenault 6dfda9625d AMDGPU: Split R600 and SI load lowering
These weren't actually sharing anything in the common
LowerLOAD.

llvm-svn: 260398
2016-02-10 18:21:39 +00:00
Ahmed Bougacha f8dfb47c02 [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Tom Stellard 309617645d AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit
Summary:
We will hit this once we have enabled uniform branches.  The
smrd-vccz-bug.ll test will be added with the uniform branch commit.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16725

llvm-svn: 260137
2016-02-08 19:49:20 +00:00
Matt Arsenault 92edab2df9 AMDGPU: Remove bfi and bfm intrinsics
Nothing is using them.

llvm-svn: 260123
2016-02-08 19:06:01 +00:00
Matt Arsenault 7f83397d72 AMDGPU: Account for LDS alignment
The current situation isn't great, because the amount of padding
requires is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.

Another problem is the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.

llvm-svn: 259912
2016-02-05 19:47:29 +00:00
Matt Arsenault cf84e26fb6 AMDGPU: Preserve alignments on new created globals
Also switch to internal linkage, and include the name of the function in
the name.

llvm-svn: 259911
2016-02-05 19:47:23 +00:00
Tom Stellard 1242ce9695 AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16862

llvm-svn: 259900
2016-02-05 18:44:57 +00:00
Tom Stellard 5dde1d2eb3 AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16863

llvm-svn: 259897
2016-02-05 18:29:17 +00:00
Tom Stellard 6e1967ef66 AMDGPU/SI: Correctly initialize SIInsertWaits pass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

llvm-svn: 259894
2016-02-05 17:42:38 +00:00
Matt Arsenault de4208122b AMDGPU: Do not promote allocas with non-inbounds GEPs
If we can't assume the pointer value isn't within the bounds
of the object, it seems risky to try to replace the pointer
calculations.

llvm-svn: 259573
2016-02-02 21:16:12 +00:00
Matt Arsenault 7e747f1a38 AMDGPU: Handle promoting memmove
Also add missing tests for the others.

llvm-svn: 259558
2016-02-02 20:28:10 +00:00
Matt Arsenault 8b175672cb AMDGPU: Skip promote alloca with no optimizations
llvm-svn: 259551
2016-02-02 19:32:42 +00:00
Matt Arsenault fb8cdbae0c AMDGPU: Minor cleanups for AMDGPUPromoteAlloca
Mostly convert to use range loops.

llvm-svn: 259550
2016-02-02 19:32:35 +00:00
Matt Arsenault e5737f7cac AMDGPU: Report AMDGPUPromoteAlloca changed the function
llvm-svn: 259547
2016-02-02 19:18:57 +00:00
Matt Arsenault ad1348459f AMDGPU: Whitelist handled intrinsics
We shouldn't crash on unhandled intrinsics.
Also simplify failure handling in loop.

llvm-svn: 259546
2016-02-02 19:18:53 +00:00
Matt Arsenault 853a1fc6d9 AMDGPU: Use inbounds when calculating workitem offset
When promoting allocas to LDS, we know we are indexing
into a specific area just created, and the calculation
will also never overflow.

Also emit some of the muls as nsw nuw, because instcombine
infers this already from the range metadata. I think
putting this on the other adds and muls might be OK too,
but I'm not 100% sure.

llvm-svn: 259545
2016-02-02 19:18:48 +00:00
Oliver Stannard 7e7d983a87 Refactor backend diagnostics for unsupported features
Re-commit of r258951 after fixing layering violation.

The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

llvm-svn: 259498
2016-02-02 13:52:43 +00:00
Matt Arsenault e013246462 AMDGPU: Fix emitting invalid workitem intrinsics for HSA
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.

Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.

Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.

Start emitting errors for the intrinsics not handled with HSA.

llvm-svn: 259297
2016-01-30 05:19:45 +00:00
Matt Arsenault d0799df707 AMDGPU: Stop checking intrinsics not used by HSA for dispatch-ptr
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.

llvm-svn: 259296
2016-01-30 05:10:59 +00:00
Matt Arsenault 43976df0da AMDGPU: Add new amdgcn workitem intrinsics
These use the correct prefix and follow the HSA naming convention
rather than the config register option names.

llvm-svn: 259293
2016-01-30 04:25:19 +00:00
Matt Arsenault 295875efda AMDGPU: Remove 24-bit intrinsics
The known bit matching code seems to work reasonably well,
so these shouldn't really be needed.

llvm-svn: 259180
2016-01-29 10:05:16 +00:00
Matt Arsenault 5b39b34ca5 AMDGPU: Match fmed3 patterns with legacy fmin/fmax
llvm-svn: 259090
2016-01-28 20:53:48 +00:00
Matt Arsenault f639c32739 AMDGPU: Match some med3 patterns
llvm-svn: 259089
2016-01-28 20:53:42 +00:00
Matt Arsenault 7293f9895e AMDGPU: Set DX10Clamp bit
llvm-svn: 259088
2016-01-28 20:53:35 +00:00
Tom Stellard 3d2c852958 AMDGPU: waitcnt operand fixes
Summary:
Allow lgkmcnt up to 0xF (hardware allows that).
Fix mask for ExpCnt in AMDGPUInstPrinter.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16314

Patch by: Nikolay Haustov

llvm-svn: 259059
2016-01-28 17:13:44 +00:00
Tom Stellard 2ff726272a AMDGPU: Move subtarget specific code out of AMDGPUInstrInfo.cpp
Summary:
Also delete all the stub functions that are identical to the
implementations in TargetInstrInfo.cpp.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16609

llvm-svn: 259054
2016-01-28 16:04:37 +00:00
Oliver Stannard 02fa1c80c4 Revert r259035, it introduces a cyclic library dependency
llvm-svn: 259045
2016-01-28 13:19:47 +00:00
Oliver Stannard b4b092ea1b Add backend dignostic printer for unsupported features
Re-commit of r258951 after fixing layering violation.

The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.

In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 259035
2016-01-28 10:07:27 +00:00
NAKAMURA Takumi 628a7a0aef Revert r258951 (and r258950), "Refactor backend diagnostics for unsupported features"
It broke layering violation in LLVMIR.

clang r258950 "Add backend dignostic printer for unsupported features"
llvm  r258951 "Refactor backend diagnostics for unsupported features"

llvm-svn: 259016
2016-01-28 04:41:32 +00:00
Oliver Stannard 1e67a9f196 Refactor backend diagnostics for unsupported features
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 258951
2016-01-27 17:30:33 +00:00
Tom Stellard 6e3b14de62 AMDGPU/SI: Fix commuting of 32-bit VOPC instructions
Summary:
We didn't have entries in the commuting table for the 32-bit
instructions.  I don't think we hit this problem now, but we
will once uniform branching is enabled.  Tests will come in
a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16600

llvm-svn: 258936
2016-01-27 15:53:52 +00:00
Marek Olsak e86f252209 AMDGPU/SI: Stoney has only 16 LDS banks
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.

Reviewers: tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16485

llvm-svn: 258922
2016-01-27 11:19:45 +00:00
Benjamin Kramer b3e8a6d2b8 Move MCTargetAsmParser.h to llvm/MC/MCParser where it belongs.
llvm-svn: 258917
2016-01-27 10:01:28 +00:00
Matt Arsenault b22828f2fb AMDGPU: Fix default device handling
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.

Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.

Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.

Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.

llvm-svn: 258901
2016-01-27 02:17:49 +00:00
Reid Kleckner 5b4637141e [llvm-tblgen] Avoid StringMatcher for GCC and MS builtin names
This brings the compile time of Function.cpp from ~40s down to ~4s for
me locally. It also shaves off about 400KB of object file size in a
release+asserts build.

I also realized that the AMDGPU backend does not have any GCC builtin
names to match, so the extra lookup was a no-op. I removed it to silence
a zero-length string table array warning. There should be no functional
change here.

This change really ends the story of PR11951.

llvm-svn: 258897
2016-01-27 01:43:12 +00:00
Reid Kleckner 1c93b4cd7b [llvm-tblgen] Stop emitting the intrinsic name matching code
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
funciton, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.

llvm-svn: 258875
2016-01-26 23:01:21 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
Matt Arsenault bee7575e1a AMDGPU: Move AMDGPU intrinsics only used by R600
llvm-svn: 258790
2016-01-26 04:49:24 +00:00
Matt Arsenault 382d945d16 AMDGPU: Tidy minor td file issues
Make comments and indentation more consistent.

Rearrange a few things to be in a more consistent order,
such as organizing subtarget features from those describing
an actual device property, and those used as options.

llvm-svn: 258789
2016-01-26 04:49:22 +00:00
Matt Arsenault c5f6152911 AMDGPU: Make v32i8/v64i8 illegal types
Old intrinsics were forcing these, but they have now all
been removed. This fixes large i8 vector operations generally
being broken.

llvm-svn: 258788
2016-01-26 04:43:48 +00:00
Matt Arsenault 018179fc46 AMDGPU: Remove old sample intrinsics
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.

I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.

llvm-svn: 258787
2016-01-26 04:38:08 +00:00
Matt Arsenault 051d6f9fde AMDGPU: Add new amdgcn intrinsics for cube instructions
More cleanup to try to get all intrinsics using the correct
amdgcn prefix that are as close to the instruction as possible.

llvm-svn: 258786
2016-01-26 04:29:56 +00:00
Matt Arsenault 9a10cea7fb AMDGPU: Implement read_register and write_register intrinsics
Some of the special intrinsics now that now correspond to a instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as use s_sendmsg. Using these explicit register intrinsics
may be a better option.

Reading the exec mask and others may be useful for debugging. For this
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficently conservatively.

llvm-svn: 258785
2016-01-26 04:29:24 +00:00
Matt Arsenault 0c3e2338fe AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now
Also move into backend intrinsics to discourage use of the old name.

llvm-svn: 258783
2016-01-26 04:14:16 +00:00
Matt Arsenault 7713162c32 AMDGPU: Remove more unused intrinsics
Replace tests with lrp with basic IR expansion

llvm-svn: 258612
2016-01-23 05:42:38 +00:00
Matt Arsenault f75257aaa6 AMDGPU: Move amdgcn intrinsic handling into SITargetLowering
llvm-svn: 258608
2016-01-23 05:32:20 +00:00
Matt Arsenault f1341406bf AMDGPU: Remove IntrNoMem from llvm.SI.sendmsg
This has side effects.

llvm-svn: 258607
2016-01-23 05:32:18 +00:00
Matt Arsenault 2a93bb6365 AMDGPU: Remove Feature64BitPtr
This is a leftover from AMDIL that doesn't do anything
and doesn't belong here.

llvm-svn: 258606
2016-01-23 05:32:14 +00:00
Matt Arsenault 10ca39ca8b AMDGPU: Add new name for barrier intrinsic
llvm-svn: 258558
2016-01-22 21:30:43 +00:00
Matt Arsenault bef34e21c7 AMDGPU: Rename intrinsics to use amdgcn prefix
The intrinsic target prefix should match the target name
as it appears in the triple.

This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.

llvm-svn: 258557
2016-01-22 21:30:34 +00:00
Matt Arsenault 0b783ef076 AMDGPU: Fix crash with invariant markers
The promote alloca pass didn't handle these intrinsics and crashed.
These intrinsics should accept any address space, but for now just
erase them to avoid breaking.

llvm-svn: 258537
2016-01-22 19:47:54 +00:00
Matt Arsenault 59bd3014f2 AMDGPU: Rename some r600 intrinsics to use correct TargetPrefix
These ones aren't directly emitted by mesa and inserted by a pass.

llvm-svn: 258523
2016-01-22 19:00:09 +00:00
Matt Arsenault bb4ff5f5b6 AMDGPU: Remove unused R600 intrinsics
llvm-svn: 258522
2016-01-22 18:52:14 +00:00
Matt Arsenault 7898b90ee1 AMDGPU: Change control flow intrinsics to use amdgcn prefix
These aren't supposed to be used outside of the backend,
so there aren't any users to worry about.

llvm-svn: 258516
2016-01-22 18:42:55 +00:00
Matt Arsenault 8d903029e8 AMDGPU: Don't use separate mulhu/mulhs Pats
llvm-svn: 258515
2016-01-22 18:42:49 +00:00
Matt Arsenault ee0930821a AMDGPU: Remove random TGSI intrinsic
I don't think this was ever used.

llvm-svn: 258514
2016-01-22 18:42:44 +00:00
Matt Arsenault 0cbaa1762b AMDGPU: Remove AMDGPU.fract intrinsic
Mesa doesn't use this, and this is pattern matched already
from fsub x, (ffloor x)

llvm-svn: 258513
2016-01-22 18:42:38 +00:00
Tom Stellard de008d338c AMDGPU/SI: Pass whether to use the SI scheduler via Target Attribute
Summary:
Currently the SI scheduler can be selected via command line option,
but it turned out it would be better if it was selectable via a Target Attribute.

This patch adds "si-scheduler" attribute to the backend.

Reviewers: tstellarAMD, echristo

Subscribers: echristo, arsenm

Differential Revision: http://reviews.llvm.org/D16192

llvm-svn: 258386
2016-01-21 04:28:34 +00:00
Tom Stellard d1efda8e9e AMDGPU/SI: Promote i1 SETCC operations
Summary:
While working on uniform branching, I've hit a few cases where we emit
i1 SETCC operations.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16233

llvm-svn: 258352
2016-01-20 21:48:24 +00:00
Matt Arsenault 7836f895fe AMDGPU: Fix old comments that mention AMDIL
llvm-svn: 258350
2016-01-20 21:22:21 +00:00
Matt Arsenault 7ba334a7d9 AMDGPU: Remove AMDGPU.trunc intrinsic
llvm-svn: 258348
2016-01-20 21:05:53 +00:00
Matt Arsenault 15fbe49daf AMDGPU: Remove AMDIL.fraction intrinsic
llvm-svn: 258347
2016-01-20 21:05:49 +00:00
Matt Arsenault 7cccd2672e AMDGPU: Remove AMDIL.round.nearest intrinsic
llvm-svn: 258346
2016-01-20 21:05:40 +00:00
Matt Arsenault 1c9e4ef0df AMDGPU: Remove abs intrinsic
llvm-svn: 258343
2016-01-20 20:58:29 +00:00
Matt Arsenault f7e6e89718 AMDGPU: Remove min/max intrinsics
This removes support for mesa 11.0.x

llvm-svn: 258342
2016-01-20 20:50:19 +00:00
Tom Stellard 77a177722f Correctly initialize SIAnnotateControlFlow
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16304

llvm-svn: 258319
2016-01-20 15:48:27 +00:00
Matthias Braun 5d458617aa RegisterPressure: Make liveness tracking subregister aware
Differential Revision: http://reviews.llvm.org/D14968

llvm-svn: 258258
2016-01-20 00:23:26 +00:00
Tom Stellard 2e045bbc5f AMDGPU/SI: Prevent the DAGCombiner from creating setcc with i1 inputs
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15035

llvm-svn: 258256
2016-01-20 00:13:22 +00:00
Matt Arsenault 33e3ecee0c AMDGPU: Reduce 64-bit SRAs
llvm-svn: 258096
2016-01-18 22:09:04 +00:00
Matt Arsenault 6e3a45193a AMDGPU: Split 64-bit and of constant up
This breaks the tests that were meant for testing
64-bit inline immediates, so move those to shl where
they won't be broken up.

This should be repeated for the other related bit ops.

llvm-svn: 258095
2016-01-18 22:01:13 +00:00
Matt Arsenault 3cbbc10488 AMDGPU: Generalize shl combine
Reduce 64-bit shl with constant > 32. We already special cased
this for the == 32 case, but this also works for any >= 32 constant.

llvm-svn: 258092
2016-01-18 21:55:14 +00:00
Matt Arsenault 80edab99ff AMDGPU: Reduce 64-bit lshr by constant to 32-bit
64-bit shifts are very slow on some subtargets.

llvm-svn: 258090
2016-01-18 21:43:36 +00:00
Matt Arsenault e83690c1cc AMDGPU: Add subtarget feature for instruction rates
llvm-svn: 258085
2016-01-18 21:13:50 +00:00
Manuel Jacob 5f6eaac611 GlobalValue: use getValueType() instead of getType()->getPointerElementType().
Reviewers: mjacob

Subscribers: jholewinski, arsenm, dsanders, dblaikie

Patch by Eduard Burtescu.

Differential Revision: http://reviews.llvm.org/D16260

llvm-svn: 257999
2016-01-16 20:30:46 +00:00
Rui Ueyama da00f2fdf4 Update to use new name alignTo().
llvm-svn: 257804
2016-01-14 21:06:47 +00:00
Rafael Espindola 8340f94df1 Convert a few assert failures into proper errors.
Fixes PR25944.

llvm-svn: 257697
2016-01-13 22:56:57 +00:00
Changpeng Fang c16be00313 AMDGPU/SI: Update ISA version for FIJI
llvm-svn: 257666
2016-01-13 20:39:25 +00:00
Hans Wennborg 81efb6b418 Fix struct/class mismatch for MachineSchedContext
llvm-svn: 257648
2016-01-13 18:59:45 +00:00
Marek Olsak 46dadbfab2 AMDGPU/SI: Fix a GPU hang with POS_W_FLOAT enabled
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16037

llvm-svn: 257625
2016-01-13 17:23:20 +00:00
Marek Olsak 3c0ebc71f1 AMDGPU/SI: Remove ending s_endpgm from non-void functions
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16035

llvm-svn: 257623
2016-01-13 17:23:12 +00:00
Marek Olsak 8e9cc63bfb AMDGPU/SI: Add s_waitcnt at the end of non-void functions
Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
    look more productive.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16034

llvm-svn: 257622
2016-01-13 17:23:09 +00:00
Marek Olsak 8a0f335ad6 AMDGPU/SI: Add support for non-void functions
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).

This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.

v2: don't do this for compute shaders

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16033

llvm-svn: 257621
2016-01-13 17:23:04 +00:00
Nicolai Haehnle 02c3291566 AMDGPU/SI: Add SI Machine Scheduler
Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

llvm-svn: 257609
2016-01-13 16:10:10 +00:00
Marek Olsak 4e99b6ec01 AMDGPU/SI: Allow more shader inputs
Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16032

llvm-svn: 257593
2016-01-13 11:46:48 +00:00
Marek Olsak b6c8c3d165 AMDGPU/SI: Allow any number of PS inputs
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16031

llvm-svn: 257592
2016-01-13 11:46:10 +00:00