Commit Graph

1318 Commits

Author SHA1 Message Date
Matt Arsenault c88ba36eab AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

llvm-svn: 285490
2016-10-29 04:05:06 +00:00
Tom Stellard 6695ba0440 AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

llvm-svn: 285479
2016-10-28 23:53:48 +00:00
Matt Arsenault 4e9c1e3a79 AMDGPU: Fix instruction flags for s_endpgm
Set isReturn, remove hasSideEffects. Also remove
hasCtrlDep, I'm not really sure what that does.

llvm-svn: 285476
2016-10-28 23:00:38 +00:00
Matt Arsenault 7b6475568d AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

llvm-svn: 285463
2016-10-28 21:55:15 +00:00
Matt Arsenault 4b6a6cc8e9 AMDGPU: Rename glc operand type
While trying to add the glc bit to SMEM instructions on VI
with the new refactoring I ran into some kind of shadowing
problem for the glc operand when using the pseudoinstruction
as a multiclass parameter.

Everywhere that currently uses it defines the operand to have the same
name as its type, i.e. glc:$glc which works. For some reason now it
conflicts, and its up evaluating to the wrong thing. For the
real encoding classes,

let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated
and still visible in the Inst initializer in the expanded td file.
In other cases I got a a different error about an illegal operand
where this was using { 0 } initializer from the bits<1> glc initializer
instead of evaluating it as false in the if.

For consistency all of the operand types should probably
be captialized to avoid conflicting with the variable names
unless somebody has a better idea of how to fix this.

llvm-svn: 285462
2016-10-28 21:55:08 +00:00
Matt Arsenault 4eae301995 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

llvm-svn: 285447
2016-10-28 20:31:47 +00:00
Matt Arsenault 08906a3c62 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

llvm-svn: 285435
2016-10-28 19:43:31 +00:00
Tom Stellard aea899e2a0 AMDGPU/SI: Handle hazard with s_rfe_b64
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25638

llvm-svn: 285368
2016-10-27 23:50:21 +00:00
Tom Stellard 04051b5fad AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25637

llvm-svn: 285367
2016-10-27 23:42:29 +00:00
Tom Stellard 6b9c1be4ea AMDGPU/SI: Fix unused variable warning on non-debug builds
llvm-svn: 285363
2016-10-27 23:28:03 +00:00
Tom Stellard b133fbb9a4 AMDGPU/SI: Handle hazard with > 8 byte VMEM stores
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25577

llvm-svn: 285359
2016-10-27 23:05:31 +00:00
Tom Stellard 30d30824b4 AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25528

llvm-svn: 285338
2016-10-27 20:39:09 +00:00
Nicolai Haehnle 7b0e25b7ad AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.

Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].

The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25829

llvm-svn: 285273
2016-10-27 08:15:07 +00:00
Yaxun Liu 94add85adb AMDGPU: Refactor processor definition to use ISA version features
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.

Refactor processor definition to use ISA version features.

Fixed ISA version for stoney.

Based on Laurent Morichetti's patch.

Differential Revision: https://reviews.llvm.org/D25919

llvm-svn: 285210
2016-10-26 16:37:56 +00:00
Matt Arsenault 39787bdcbb Reapply "AMDGPU: Don't use offen if it is 0"
This reverts r283003

llvm-svn: 285203
2016-10-26 15:08:16 +00:00
Matt Arsenault 1110f14b42 AMDGPU: Fix counting si_mask_branch as 4 bytes
llvm-svn: 285202
2016-10-26 14:53:54 +00:00
Tom Stellard f8e6eaff6e AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25788

llvm-svn: 285198
2016-10-26 14:38:47 +00:00
Peter Collingbourne 6733564e5a Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject.
These functions are about classifying a global which will actually be
emitted, so it does not make sense for them to take a GlobalValue which may
for example be an alias.

Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to
look through aliases before using TargetLoweringObjectFile interfaces. These
are functional changes but all appear to be bug fixes.

Differential Revision: https://reviews.llvm.org/D25917

llvm-svn: 285006
2016-10-24 19:23:39 +00:00
Nicolai Haehnle a785209bc2 AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.

This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.

Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.

Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.

This fixes several GL45-CTS.shaders.indexing.* tests.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25633

llvm-svn: 284980
2016-10-24 14:56:02 +00:00
Konstantin Zhuravlyov fda33eaf0c [AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP
This will prevent following regression when enabling i16 support (D18049):
  test/CodeGen/AMDGPU/cvt_f32_ubyte.ll

Differential Revision: https://reviews.llvm.org/D25805

llvm-svn: 284891
2016-10-21 22:10:03 +00:00
Tom Stellard 6c7dd980e4 AMDGPU/SI: Fix crash caused by r284267
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25782

llvm-svn: 284875
2016-10-21 20:25:11 +00:00
Artem Tamazov 751985a757 [AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed.
Fixes Bug 28215. Lit tests updated.

Differential Revision: https://reviews.llvm.org/D25837

llvm-svn: 284825
2016-10-21 14:49:22 +00:00
Konstantin Zhuravlyov 521e5ef4ce [AMDGPU] Make note record name a static const member of target streamer
Differential Revision: https://reviews.llvm.org/D25746

llvm-svn: 284760
2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov 08326b6256 [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693

llvm-svn: 284759
2016-10-20 18:12:38 +00:00
Sanjay Patel 0051efcf97 [Target] remove TargetRecip class; 2nd try
This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs
caused by faulty usage of StringRef.

This version also renames a pair of functions:
getRecipEstimateDivEnabled()
getRecipEstimateSqrtEnabled()
as suggested by Eric Christopher.

original commit msg:

[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering

This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284746
2016-10-20 16:55:45 +00:00
Valery Pykhtin e55fd41f73 [AMDGPU] add fcopysign(f64, f32) pattern
Differential revision: https://reviews.llvm.org/D25827

llvm-svn: 284743
2016-10-20 16:17:54 +00:00
Benjamin Kramer 2a8bef8769 Do a sweep over move ctors and remove those that are identical to the default.
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.

No functionality change intended.

llvm-svn: 284721
2016-10-20 12:20:28 +00:00
Wei Ding 3cb2a1e8d1 AMDGPU : Add a function to enable and disable IEEEBit for SC and shader
respectively.

Differential Revision: http://reviews.llvm.org/D25789

llvm-svn: 284655
2016-10-19 22:34:49 +00:00
Krzysztof Parzyszek c87155037b [AMDGPU] Stop using MCRegisterClass::getSize()
Differential Review: https://reviews.llvm.org/D24675

llvm-svn: 284619
2016-10-19 17:40:36 +00:00
Sanjay Patel 19601fa587 revert r284495: [Target] remove TargetRecip class
There's something wrong with the StringRef usage while parsing the attribute string.

llvm-svn: 284513
2016-10-18 18:36:49 +00:00
Sanjay Patel 08fff9ca81 [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.

This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.

If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.

As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.

Differential Revision: https://reviews.llvm.org/D25440

llvm-svn: 284495
2016-10-18 17:05:05 +00:00
Konstantin Zhuravlyov 98a3ac7106 [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it
Differential Revision: https://reviews.llvm.org/D25694

llvm-svn: 284435
2016-10-17 22:40:15 +00:00
Tom Stellard 083f1626f5 AMDGPU/SI: LowerParameter() should be computing align based on memory type
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25203

llvm-svn: 284398
2016-10-17 16:56:19 +00:00
Tom Stellard bc6c523cce AMDGPU/SI: Fix LowerParameter() for i16 arguments
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25198

llvm-svn: 284397
2016-10-17 16:21:45 +00:00
Tom Stellard 961811c906 AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25526

llvm-svn: 284298
2016-10-15 00:58:14 +00:00
Tom Stellard 09c2bd6bd4 AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)

Reviewers: arsenm, nhaehnle

Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24672

llvm-svn: 284267
2016-10-14 19:14:29 +00:00
Tom Stellard 64a9d0876c AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

llvm-svn: 284257
2016-10-14 18:10:39 +00:00
Nicolai Haehnle 67624af0cc AMDGPU: Select 64-bit {ADD,SUB}{C,E} nodes
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25289

llvm-svn: 284224
2016-10-14 10:30:00 +00:00
Nicolai Haehnle bd15c3267f AMDGPU: Fix use-after-frees
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25312

llvm-svn: 284215
2016-10-14 09:03:04 +00:00
Konstantin Zhuravlyov c96b5d7073 [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562

llvm-svn: 284196
2016-10-14 04:37:34 +00:00
Konstantin Zhuravlyov 2a2ac37c2c [AMDGPU] Add 32-bit lo/hi got and pc relative variant kinds and emit appropriate relocations
Differential Revision: https://reviews.llvm.org/D25548

llvm-svn: 284195
2016-10-14 04:21:32 +00:00
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Matt Arsenault 253640e18d AMDGPU: Assume spilling will occur at -O0
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.

llvm-svn: 284119
2016-10-13 13:10:00 +00:00
Matt Arsenault dac31db12f AMDGPU: Fix truncate to bool warnings
llvm-svn: 284116
2016-10-13 12:45:16 +00:00
Matt Arsenault d486d3f8d1 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

llvm-svn: 284031
2016-10-12 18:49:05 +00:00
Matt Arsenault cc88ce36ed AMDGPU: Add instruction definitions for VGPR indexing
VI added a second method of indexing into VGPRs
besides using v_movrel*

llvm-svn: 284027
2016-10-12 18:00:51 +00:00
Tom Stellard fac248cb5f AMDGPU/SI: Change mimg intrinsic signatures
This makes more fields overridable and removes redundant bits.

Patch by: Changpeng Fang

llvm-svn: 284024
2016-10-12 16:35:29 +00:00
Konstantin Zhuravlyov cdd4547607 [AMDGPU] Refactor waitcnt encoding
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)

Differential Revision: https://reviews.llvm.org/D25298

llvm-svn: 283919
2016-10-11 18:58:22 +00:00
Changpeng Fang 98317d20f4 AMDGPU/SI: Update ISA version numbers for Tonga and Polaris10/11.
Differential Revision:
  http://reviews.llvm.org/D25454

Reviewers:
  tstellarAMD

llvm-svn: 283893
2016-10-11 16:00:47 +00:00
Peter Collingbourne 0da86301ad Revert r283690, "MC: Remove unused entities."
llvm-svn: 283814
2016-10-10 22:49:37 +00:00
Mehdi Amini f42454b94b Move the global variables representing each Target behind accessor function
This avoids "static initialization order fiasco"

Differential Revision: https://reviews.llvm.org/D25412

llvm-svn: 283702
2016-10-09 23:00:34 +00:00
Peter Collingbourne cc723cccab MC: Remove unused entities.
llvm-svn: 283691
2016-10-09 04:39:13 +00:00
Peter Collingbourne 5c924d7117 Target: Remove unused entities.
llvm-svn: 283690
2016-10-09 04:38:57 +00:00
Tom Stellard 5ab6154dc3 AMDGPU/SI: Handle div_fmas hazard in GCNHazardRecognizer
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25250

llvm-svn: 283622
2016-10-07 23:42:48 +00:00
Tom Stellard 6982bb8f25 AMDGPU/SI: Add support for 8-byte relocations
Reviewers: arsenm, kzhuravl

Subscribers: wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25375

llvm-svn: 283593
2016-10-07 20:36:58 +00:00
Tom Stellard ef33c4b3f2 AMDGPU/SI: Emit fixups for long branches
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25366

llvm-svn: 283570
2016-10-07 16:01:18 +00:00
Artem Tamazov 73f1ab28cd [AMDGPU][mc] Add support for buffer_load_dwordx3, buffer_store_dwordx3.
Partially fixes Bug 28232.
Lit tests added.

Differential Revision: https://reviews.llvm.org/D25367

llvm-svn: 283567
2016-10-07 15:53:16 +00:00
Sam Kolton a3ec5c10e2 [AMDGPU] Assembler: support v_mac_f32 DPP and SDWA. Move getNamedOperandIdx to AMDGPUBaseInfo.h
Reviewers: artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25084

llvm-svn: 283560
2016-10-07 14:46:06 +00:00
Konstantin Zhuravlyov c09e2d7e46 [AMDGPU] AMDGPUCodeGenPrepare: remove extra ';'
llvm-svn: 283558
2016-10-07 14:39:53 +00:00
Konstantin Zhuravlyov f74fc60a7d [AMDGPU] Promote uniform (i1, i16] operations to i32
Differential Revision: https://reviews.llvm.org/D25302

llvm-svn: 283555
2016-10-07 14:22:58 +00:00
Nicolai Haehnle 87bc4c218b AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Peter Collingbourne 2261d78cd2 Target: Remove unused patterns and transforms. NFC.
llvm-svn: 283515
2016-10-07 00:30:49 +00:00
Matt Arsenault 5e63a04e46 AMDGPU: Don't fold undef uses or copies with implicit uses
llvm-svn: 283476
2016-10-06 18:12:13 +00:00
Matt Arsenault c59a92387e AMDGPU: Remove scheduling info from si_mask_branch
llvm-svn: 283475
2016-10-06 18:12:07 +00:00
Matt Arsenault c2ee42cd16 AMDGPU: Remove leftover implicit operands when folding immediates
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.

llvm-svn: 283471
2016-10-06 17:54:30 +00:00
Matt Arsenault 11f7402075 Reapply "AMDGPU: Support using tablegened MC pseudo expansions"
Fix bad merge

llvm-svn: 283470
2016-10-06 17:19:11 +00:00
Matt Arsenault cbc879ee2f Revert "AMDGPU: Support using tablegened MC pseudo expansions"
llvm-svn: 283469
2016-10-06 17:08:01 +00:00
Matt Arsenault d20a2dd7ac AMDGPU: Support using tablegened MC pseudo expansions
Make the necessary refactorings to make use of PseudoInstExpansion

llvm-svn: 283467
2016-10-06 16:56:41 +00:00
Matt Arsenault 6bc43d8627 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

llvm-svn: 283464
2016-10-06 16:20:41 +00:00
Sam Kolton 3381d7a216 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.
Initialize MCObjectFileInfo with some default values.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 283450
2016-10-06 13:46:08 +00:00
Matt Arsenault 10c17ca6c6 AMDGPU: Partially fix reported code size for some instructions
These ones need to have the size on the pseudo instruction set for
getInstSizeInBytes to work correctly. These also have a statically
known size.

llvm-svn: 283437
2016-10-06 10:13:23 +00:00
Konstantin Zhuravlyov b4eb5d5049 [AMDGPU] Promote uniform i16 bitreverse intrinsic to i32
Differential Revision: https://reviews.llvm.org/D25121

llvm-svn: 283415
2016-10-06 02:20:46 +00:00
Matthias Braun 0a6916f303 AMDGPU: Do not re-use tmpreg in spill/restore lowering
The register scavenging code does not support multiple definitions of
the same vreg.

Differential Revision: https://reviews.llvm.org/D25220

llvm-svn: 283369
2016-10-05 20:02:51 +00:00
Matt Arsenault dcf0cfca4c AMDGPU: Refactor indirect vector lowering
Allow inserting multiple instructions in the
expanded loop.

llvm-svn: 283177
2016-10-04 01:41:05 +00:00
Matt Arsenault 283fbc24f6 AMDGPU: Factor SGPR spilling into separate functions
llvm-svn: 283175
2016-10-04 01:14:56 +00:00
Konstantin Zhuravlyov 60a8373778 [AMDGPU] Pass optimization level to SelectionDAGISel
llvm-svn: 283133
2016-10-03 18:47:26 +00:00
Konstantin Zhuravlyov 691e2e020b [AMDGPU] Sign extend AShr when promoting (instead of zero extending)
llvm-svn: 283130
2016-10-03 18:29:01 +00:00
Matt Arsenault 4a048a44e5 AMDGPU: Fix typo
llvm-svn: 283108
2016-10-03 13:06:58 +00:00
Volkan Keles 1c38681ae6 Add new target hooks for LoadStoreVectorizer
Summary: Added 6 new target hooks for the vectorizer in order to filter types, handle size constraints and decide how to split chains.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, mzolotukhin, wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D24727

llvm-svn: 283099
2016-10-03 10:31:34 +00:00
Konstantin Zhuravlyov c9786f0d8a [AMDGPU] Remove unused variables from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D25110

llvm-svn: 283087
2016-10-03 04:43:22 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Mehdi Amini 86eeda8e20 Revert "AMDGPU: Don't use offen if it is 0"
This reverts commit r282999.
Tests are not passing: http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/20038

llvm-svn: 283003
2016-10-01 02:35:24 +00:00
Matt Arsenault 3070fdf798 AMDGPU: Don't use offen if it is 0
This removes many re-initializations of a base register to 0.

llvm-svn: 282999
2016-10-01 01:37:15 +00:00
Konstantin Zhuravlyov 836cbff695 [AMDGPU] Choose VMCNT, EXPCNT, LGKMCNT masks and shifts based on the isa version
Differential Revision: https://reviews.llvm.org/D24973

llvm-svn: 282877
2016-09-30 17:01:40 +00:00
Konstantin Zhuravlyov d7bdf24f32 [AMDGPU] Ask subtarget if waitcnt instruction is needed before barrier instruction
Differential Revision: https://reviews.llvm.org/D24985

llvm-svn: 282875
2016-09-30 16:50:36 +00:00
Konstantin Zhuravlyov 4658e5f7b0 [AMDGPU] Do not run scalar optimization passes at "-O0"
Differential Revision: https://reviews.llvm.org/D25055

llvm-svn: 282873
2016-09-30 16:39:24 +00:00
Matt Arsenault 5d8eb25e78 AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

llvm-svn: 282832
2016-09-30 01:50:20 +00:00
Matt Arsenault e6740754f0 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00
Konstantin Zhuravlyov e14df4b236 [AMDGPU] Promote uniform i16 ops to i32 ops for targets that have 16 bit instructions
Differential Revision: https://reviews.llvm.org/D24125

llvm-svn: 282624
2016-09-28 20:05:39 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Konstantin Zhuravlyov da4687c531 [AMDGPU] Enable changing instprinter's behavior based on the per-function
subtarget

This is a prerequisite for coming waitcnt changes

Differential Revision: https://reviews.llvm.org/D24939

llvm-svn: 282489
2016-09-27 14:42:48 +00:00
Tom Stellard 1b9748c6a2 AMDGPU/SI: Don't crash on anonymous GlobalValues
Summary:
We need to call AsmPrinter::getNameWithPrefix() in order to handle
anonymous GlobalValues (e.g. @0, @1).

Reviewers: arsenm, b-sumner

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D24865

llvm-svn: 282420
2016-09-26 17:29:25 +00:00
Sam Kolton 984461062f Revert "[AMDGPU] Disassembler: print label names in branch instructions"
This reverts commit 6c6dbe625263ec9fcf8de0df27263cf147cde550.

llvm-svn: 282396
2016-09-26 11:29:03 +00:00
Sam Kolton 1559f76257 [AMDGPU] Disassembler: print label names in branch instructions
Summary: Add AMDGPUSymbolizer for finding names for labels from ELF symbol table.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D24802

llvm-svn: 282394
2016-09-26 10:05:50 +00:00
Valery Pykhtin fbf2d93f73 [AMDGPU] Fix for bz30427: wrong MTBUF encoding on VI
Differential revision: https://reviews.llvm.org/D24875

llvm-svn: 282296
2016-09-23 21:21:21 +00:00
Valery Pykhtin 355103f6c0 [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions
Differential revision: https://reviews.llvm.org/D24738

llvm-svn: 282234
2016-09-23 09:08:07 +00:00
Tom Stellard e88bbc34c6 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

llvm-svn: 282223
2016-09-23 01:33:26 +00:00
Artem Tamazov 2146a0a90e [AMDGPU][mc] Add support for absolute expressions in DPP modifiers.
Also added range checking for DPP attributes.
Assembler tests added as well.

Differential Revision: https://reviews.llvm.org/D24755

llvm-svn: 282145
2016-09-22 11:47:21 +00:00
Artem Tamazov 2e217b87cb [AMDGPU][mc] Add support for ds_add_[rtn_]f32.
Lit tests added.
Resolves https://github.com/RadeonOpenCompute/hcc/issues/122.

Differential Revision: https://reviews.llvm.org/D24765

llvm-svn: 282086
2016-09-21 16:35:44 +00:00
Tim Northover 862758ec14 GlobalISel: pass Function to lowerFormalArguments directly (NFC).
The only implementation that exists immediately looks it up anyway, and the
information is needed to handle various parameter attributes (stored on the
function itself).

llvm-svn: 282068
2016-09-21 12:57:35 +00:00
Sam Kolton 12b633beda [AMDGPU] Assembler: remove unused AMDGPUMCObjectWriter.
Summary: It is replaced by AMDGPUELFObjectWriter

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24654

llvm-svn: 282065
2016-09-21 10:33:32 +00:00
Valery Pykhtin e330cfa294 [AMDGPU] Refactor VOP3 instruction TD definitions
Differential revision: https://reviews.llvm.org/D24664

llvm-svn: 281965
2016-09-20 10:41:16 +00:00
Valery Pykhtin 2828b9be1e [AMDGPU] Refactor VOPC instruction TD definitions
Differential Revision: https://reviews.llvm.org/D24546

llvm-svn: 281903
2016-09-19 14:39:49 +00:00
Sam Kolton be7ffb90bf [AMDGPU] Fix s_branch with -1 offset
Summary:
In case s_branch instruction target is itself backend should emit offset -1 but instead it emit 0.
'''
label:
    s_branch label  // should emit [0xff,0xff,0x82,0xbf]
'''

Tom, Matt: why are we adjusting fixup values in applyFixup() method instead of processFixup()? processFixup() is calling adjustFixupValue() but does nothing with its result.

Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl

Differential Revision: https://reviews.llvm.org/D24671

llvm-svn: 281896
2016-09-19 10:20:55 +00:00
Matt Arsenault ac0fc849cf AMDGPU: Fix broken FrameIndex handling
We were trying to avoid using a FrameIndex operand in non-pointer
operands in a convoluted way, and would break because of
using TargetFrameIndex. The TargetFrameIndex should only be used
in the case where it makes sense to fold it as part of the addressing
mode, otherwise it requires materialization like a normal constant.
This wasn't working reliably and failed in the added testcase, hitting
the assert when processing the frame index.

The TargetFrameIndex was coming from trying to produce an AssertZext
limiting the maximum stack size. I'm not sure this was correct to begin
with, because it is apparently possible to have a single workitem
dispatch that requires all 4G of private memory.

llvm-svn: 281824
2016-09-17 16:09:55 +00:00
Matt Arsenault bcfd94c298 AMDGPU: Rename spill operands to match real instruction
llvm-svn: 281823
2016-09-17 15:52:37 +00:00
Matt Arsenault d99ef1144b AMDGPU: Push bitcasts through build_vector
This reduces the number of copies and reg_sequences
when using fp constant vectors. This significantly
reduces the code size in local-stack-alloc-bug.ll

llvm-svn: 281822
2016-09-17 15:44:16 +00:00
Matt Arsenault 7b1dc2c983 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

llvm-svn: 281800
2016-09-17 02:02:19 +00:00
Tom Stellard 7998db634c AMDGPU/SI: Fix kernel argument ABI for HSA
Summary: i8, i16, and f16 values are not extended to 32-bit in the HSA kernel ABI.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24621

llvm-svn: 281789
2016-09-16 22:20:24 +00:00
Matt Arsenault 6408c9135c AMDGPU: Allow some control flow intrinsics to be CSEd
These clean up some unnecessary or instructions in
cases with complex loops.

In the original testcase I noticed this, the same
or with exec was repeated 5 or 6 times in a row. With
this only one is emitted or sometimes a copy.

llvm-svn: 281786
2016-09-16 22:11:18 +00:00
Tom Stellard bbeb45aff6 AMDGPU: Refactor kernel argument lowering
Summary:
The main challenge in lowering kernel arguments for AMDGPU is determing the
memory type of the argument.  The generic calling convention code assumes
that only legal register types can be stored in memory, but this is not the
case for AMDGPU.

This consolidates all the logic AMDGPU uses for deducing memory types into a single
function.  This will make it much easier to support different ABIs in the future.

Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D24614

llvm-svn: 281781
2016-09-16 21:53:00 +00:00
Matt Arsenault 7ccf6cd104 AMDGPU: Use SOPK compare instructions
llvm-svn: 281780
2016-09-16 21:41:16 +00:00
Tom Stellard 0b76fc4c77 AMDGPU/SI: Add support for triples with the mesa3d operating system
Summary:
mesa3d will use the same kernel calling convention as amdhsa, but it will
handle everything else like the default 'unknown' OS type.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22783

llvm-svn: 281779
2016-09-16 21:34:26 +00:00
Eric Christopher 4367c7fb9a Move the Mangler from the AsmPrinter down to TLOF and clean up the
TLOF API accordingly.

llvm-svn: 281708
2016-09-16 07:33:15 +00:00
Matt Arsenault 1b9fc8ed65 Finish renaming remaining analyzeBranch functions
llvm-svn: 281535
2016-09-14 20:43:16 +00:00
Matt Arsenault f40b70fa75 Revert "AMDGPU: Use SOPK compare instructions"
Accidentally committed

llvm-svn: 281514
2016-09-14 18:04:42 +00:00
Matt Arsenault f757c87959 AMDGPU: Use SOPK compare instructions
llvm-svn: 281513
2016-09-14 18:03:53 +00:00
Matt Arsenault e8e0f5cac6 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

llvm-svn: 281506
2016-09-14 17:24:15 +00:00
Matt Arsenault a2b036e88b AArch64: Use TTI branch functions in branch relaxation
The main change is to return the code size from
InsertBranch/RemoveBranch.

Patch mostly by Tim Northover

llvm-svn: 281505
2016-09-14 17:23:48 +00:00
Matt Arsenault 2bc198a333 AMDGPU: Support folding FrameIndex operands
This avoids test regressions in a future commit.

llvm-svn: 281491
2016-09-14 15:51:33 +00:00
Matt Arsenault fa5f767a38 AMDGPU: Improve splitting 64-bit bit ops by constants
This addresses a TODO to handle operations besides and. This
also starts eliminating no-op operations with a constant that
can emerge later.

llvm-svn: 281488
2016-09-14 15:19:03 +00:00
Matt Arsenault a992f71bef AMDGPU: Remove code I think is dead
As far as I can tell, resolveFrameIndex is supposed to be
called with a legal offset, so inserting an add shouldn't be
necessary.

llvm-svn: 281372
2016-09-13 19:15:25 +00:00
Matt Arsenault 25dba30017 AMDGPU: Support commuting a FrameIndex operand
llvm-svn: 281369
2016-09-13 19:03:12 +00:00
Nicolai Haehnle e58e0e3fe3 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

llvm-svn: 281230
2016-09-12 16:25:20 +00:00
Sam Kolton fb0d9d9c13 [AMDGPU] Assembler: Move disabled SDWA and DPP instruction into Disable asm variant
Summary: This removes disabled instructions from match tables so we will not match them at all.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov

Subscribers: wdng, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D24452

llvm-svn: 281216
2016-09-12 14:42:43 +00:00
Duncan P. N. Exon Smith 1872096f1e CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI
Now that MachineBasicBlock::reverse_instr_iterator knows when it's at
the end (since r281168 and r281170), implement
MachineBasicBlock::reverse_iterator directly on top of an
ilist::reverse_iterator by adding an IsReverse template parameter to
MachineInstrBundleIterator.  This replaces another hard-to-reason-about
use of std::reverse_iterator on list iterators, matching the changes for
ilist::reverse_iterator from r280032 (see the "out of scope" section at
the end of that commit message).  MachineBasicBlock::reverse_iterator
now has a handle to the current node and has obvious invalidation
semantics.

r280032 has a more detailed explanation of how list-style reverse
iterators (invalidated when the pointed-at node is deleted) are
different from vector-style reverse iterators like std::reverse_iterator
(invalidated on every operation).  A great motivating example is this
commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.

Note: If your out-of-tree backend deletes instructions while iterating
on a MachineBasicBlock::reverse_iterator or converts between
MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator,
you'll need to update your code in similar ways to r280032.  The
following table might help:

                  [Old]              ==>             [New]
        delete &*RI, RE = end()                   delete &*RI++
        RI->erase(), RE = end()                   RI++->erase()
      reverse_iterator(I)                 std::prev(I).getReverse()
      reverse_iterator(I)                          ++I.getReverse()
    --reverse_iterator(I)                            I.getReverse()
      reverse_iterator(std::next(I))                 I.getReverse()
                RI.base()                std::prev(RI).getReverse()
                RI.base()                         ++RI.getReverse()
              --RI.base()                           RI.getReverse()
     std::next(RI).base()                           RI.getReverse()

(For more details, have a look at r280032.)

llvm-svn: 281172
2016-09-11 18:51:28 +00:00
Justin Lebar adbf09e8cf [CodeGen] Split out the notions of MI invariance and MI dereferenceability.
Summary:
An IR load can be invariant, dereferenceable, neither, or both.  But
currently, MI's notion of invariance is IR-invariant &&
IR-dereferenceable.

This patch splits up the notions of invariance and dereferenceability at
the MI level.  It's NFC, so adds some probably-unnecessary
"is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD

Subscribers: jholewinski, arsenm, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D23371

llvm-svn: 281151
2016-09-11 01:38:58 +00:00
Valery Pykhtin b66e5eb612 [AMDGPU] Refactor MUBUF/MTBUF instructions
Differential revision: https://reviews.llvm.org/D24295

llvm-svn: 281137
2016-09-10 13:09:16 +00:00
Matt Arsenault 3354f42ae7 AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex
llvm-svn: 281128
2016-09-10 01:20:33 +00:00
Matt Arsenault 7348a7eadd AMDGPU: Fix scheduling info for spill pseudos
These defaulted to Write32Bit. I don't think this actually matters
since these don't exist during scheduling.

llvm-svn: 281127
2016-09-10 01:20:28 +00:00
Matt Arsenault 124384f08d AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

llvm-svn: 281117
2016-09-09 23:32:53 +00:00
Matt Arsenault 0efdd06b22 AMDGPU: Run LoadStoreVectorizer pass by default
llvm-svn: 281112
2016-09-09 22:29:28 +00:00
Wei Ding 06f8d39424 AMDGPU : Fix mqsad_u32_u8 instruction incorrect data type.
Differential Revision: http://reviews.llvm.org/D23700

llvm-svn: 281081
2016-09-09 19:31:51 +00:00
Tom Stellard b2869eb6e9 AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA
Reviewers: arsenm

Subscribers: arsenm, wdng, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D24405

llvm-svn: 281080
2016-09-09 19:28:00 +00:00
Sam Kolton 1eeb11bfd4 AMDGPU] Assembler: better support for immediate literals in assembler.
Summary:
Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:
    - We parsed fp literal:
        - Instruction expects 64-bit operand:
            - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
                - then we do nothing this literal
            - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
                - report error
            - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
                - If instruction expect fp operand type (f64)
                    - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
                        - If so then do nothing
                    - Else (e.g. v_fract_f64 v[0:1], 3.1415)
                        - report warning that low 32 bits will be set to zeroes and precision will be lost
                        - set low 32 bits of literal to zeroes
                - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
                    - report error as it is unclear how to encode this literal
        - Instruction expects 32-bit operand:
            - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
            - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
                - do nothing
                - Else report error
            - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
    - Parsed binary literal:
        - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
            - do nothing
        - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
            - report error
        - Else, literal is not-inlinable and we are not required to inline it
            - Are high 32 bit of literal zeroes or same as sign bit (32 bit)
                - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
            - Else
                - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
}
'''

This is not working yet:
    - Several tests are failing
    - Problems with predicate methods for inline immediates
    - LLVM generated assembler parts try to select e64 encoding before e32.
More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, artem.tamazov

Differential Revision: https://reviews.llvm.org/D22922

llvm-svn: 281050
2016-09-09 14:44:04 +00:00
Sam Kolton a2e5c88baf [AMDGPU] Assembler: rename amd_kernel_code_t asm names according to spec
Summary:
Also removed duplicate code from AMDGPUTargetAsmStreamer.
This change only change how amd_kernel_code_t is parsed and printed. No variable names are changed.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, wdng, nhaehnle

Differential Revision: https://reviews.llvm.org/D24296

llvm-svn: 281028
2016-09-09 10:08:02 +00:00
Sam Kolton d63d8a7c05 [AMDGPU] Assembler: match e32 VOP instructions before e64.
Summary:
Split assembler match table in 4 tables with assembler variants:

Default - all instructions except VOP3, SDWA and DPP
  - VOP3
  - SDWA
  - DPP
First match Default table then VOP3, SDWA and DPP.

Reviewers:  tstellarAMD, artem.tamazov, vpykhtin

Subscribers: arsenm, wdng, nhaehnle, AMDGPU

Differential Revision: https://reviews.llvm.org/D24252

llvm-svn: 281023
2016-09-09 09:37:51 +00:00
Matt Arsenault d745c28945 AMDGPU: Sign extend constants when splitting them
This will confuse later passes which try to look at the
immediate value and don't truncate first.

llvm-svn: 280974
2016-09-08 17:44:36 +00:00
Matt Arsenault be90f70d3a AMDGPU: Try to commute when selecting s_addk_i32/s_mulk_i32
llvm-svn: 280972
2016-09-08 17:35:41 +00:00
Matt Arsenault bbb47da8a1 AMDGPU: Support commuting with immediate in src0
llvm-svn: 280970
2016-09-08 17:19:29 +00:00
Yaxun Liu 90658fff1b AMDGPU: Remove a useless variable which caused build failure for lld.
llvm-svn: 280841
2016-09-07 18:31:11 +00:00
Yaxun Liu 638914009a AMDGPU: Add hidden kernel arguments to runtime metadata
OpenCL kernels have hidden kernel arguments for global offset and printf buffer. For consistency, these hidden argument should be included in the runtime metadata. Also updated kernel argument kind metadata.

Differential Revision: https://reviews.llvm.org/D23424

llvm-svn: 280829
2016-09-07 17:44:00 +00:00
Matt Arsenault 479ba3aac0 AMDGPU: Make some scalar instructions commutable
llvm-svn: 280784
2016-09-07 06:25:55 +00:00
Matt Arsenault 6cda10c950 Remove unnecessary call to getAllocatableRegClass
This reapplies r252565 and r252674, effectively reverting r252956.

This allows VS_32/VS_64 to be unallocatable like they should be.

llvm-svn: 280783
2016-09-07 06:16:45 +00:00
Konstantin Zhuravlyov 1d65026ca6 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562

llvm-svn: 280747
2016-09-06 20:22:28 +00:00
Tom Stellard 2add8a1140 AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies
Summary:
I put this code here, because I want to re-use it in a few other places.
This supersedes some of the immediate folding code we have in SIFoldOperands.
I think the peephole optimizers is probably a better place for folding
immediates into copies, since it does some register coalescing in the same time.

This will also make it easier to transition SIFoldOperands into a smarter pass,
where it looks at all uses of instruction at once to determine the optimal way to
fold operands.  Right now, the pass just considers one operand at a time.

Reviewers: arsenm

Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23402

llvm-svn: 280744
2016-09-06 20:00:26 +00:00
Wei Ding 5e832e866e AMDGPU : Add XNACK feature to GPUs that support it.
Differential Revision: http://reviews.llvm.org/D24276

llvm-svn: 280742
2016-09-06 19:55:17 +00:00
Valery Pykhtin 8bc659637c [AMDGPU] Refactor FLAT TD instructions
Differential revision: https://reviews.llvm.org/D24072

llvm-svn: 280655
2016-09-05 11:22:51 +00:00