Commit Graph

3865 Commits

Author SHA1 Message Date
Matt Arsenault 7bedceb5b2 GlobalISel: moreElementsVector for G_LOAD/G_STORE
AMDGPU change and test is a placeholder until a future patch with
complete handling.

llvm-svn: 367503
2019-08-01 01:44:22 +00:00
Matt Arsenault d48324ff6f Reapply "AMDGPU: Split block for si_end_cf"
This reverts commit r359363, reapplying r357634

llvm-svn: 367500
2019-08-01 01:25:27 +00:00
Matt Arsenault 3594011de0 AMDGPU/GlobalISel: Select local loads
llvm-svn: 367498
2019-08-01 00:53:38 +00:00
Stanislav Mekhanoshin 2594fa8593 [AMDGPU] Fix high occupancy calculation and print it
We had couple places which still return 10 as a maximum
occupancy. Fixed.

Also print comment about occupancy as compiler see it.

Differential Revision: https://reviews.llvm.org/D65423

llvm-svn: 367381
2019-07-31 01:07:10 +00:00
Matt Arsenault 52c262484f TableGen: Add MinAlignment predicate
AMDGPU uses some custom code predicates for testing alignments.

I'm still having trouble comprehending the behavior of predicate bits
in the PatFrag hierarchy. Any attempt to abstract these properties
unexpectdly fails to apply them.

llvm-svn: 367373
2019-07-31 00:14:43 +00:00
Stanislav Mekhanoshin 9aff33bb95 [AMDGPU] Print register pressure for agpr and vgpr separately
Differential Revision: https://reviews.llvm.org/D65476

llvm-svn: 367355
2019-07-30 20:45:15 +00:00
Stanislav Mekhanoshin 450afcea39 [AMDGPU] Reserve all AGPRs on targets which do not have them
Differential Revision: https://reviews.llvm.org/D65471

llvm-svn: 367347
2019-07-30 19:29:33 +00:00
Austin Kerbow c99f62e313 [AMDGPU/GlobalISel] Add llvm.amdgcn.fdiv.fast legalization.
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64966

llvm-svn: 367344
2019-07-30 18:49:16 +00:00
Matt Arsenault 57ef94fb06 AMDGPU: Avoid emitting "true" predicates
Empty condition strings are considerde always true. This removes a lot
of clutter from the generated matcher tables.

This shrinks the source size of AMDGPUGenDAGISel.inc from 7.3M to
6.1M.

llvm-svn: 367326
2019-07-30 15:56:43 +00:00
Tom Stellard cc0bc941d4 AMDGPU/LoadStoreOptimizer: combine MMOs when merging instructions
Summary:
The LoadStoreOptimizer was creating instructions with 2
MachineMemOperands, which meant they were assumed to alias with all other instructions,
because MachineInstr:mayAlias() returns true when an instruction has multiple
MachineMemOperands.

This was preventing these instructions from being merged again, and was
giving the scheduler less freedom to reorder them.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65036

llvm-svn: 367237
2019-07-29 16:40:58 +00:00
Jay Foad 3bdcedbf3d [AMDGPU] Fix typo in error message
llvm-svn: 367235
2019-07-29 16:17:13 +00:00
Jay Foad dcb7532479 [DivergenceAnalysis] Add methods for querying divergence at use
Summary:
The existing isDivergent(Value) methods query whether a value is
divergent at its definition. However even if a value is uniform at its
definition, a use of it in another basic block can be divergent because
of divergent control flow between the def and the use.

This patch adds new isDivergent(Use) methods to DivergenceAnalysis,
LegacyDivergenceAnalysis and GPUDivergenceAnalysis.

This might allow D63953 or other similar workarounds to be removed.

Reviewers: alex-t, nhaehnle, arsenm, rtaylor, rampitec, simoll, jingyue

Reviewed By: nhaehnle

Subscribers: jfb, jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65141

llvm-svn: 367218
2019-07-29 10:22:09 +00:00
David Stuttard 20235ef3e7 [AMDGPU] Enable v4f16 and above for v_pk_fma instructions
Summary:
If isel is presented with <2 x half> vectors then it will correctly select
v_pk_fma style instructions.
If isel is presented with e.g. <4 x half> vectors it will scalarize, unlike for
other instruction types (such as fadd, fmul etc.)

Added extra support to enable this. Updated one of the tests to include a test
for this (as well as extending the test to GFX9)

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65325

Change-Id: I50a4577a3f8223fb53992af3b7d26121f65b71ee
llvm-svn: 367206
2019-07-29 08:15:10 +00:00
Michael Liao 711556e6a8 [AMDGPU] Fix typo.
llvm-svn: 367131
2019-07-26 17:13:59 +00:00
Carl Ritson 0b28357053 [AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65328

llvm-svn: 367105
2019-07-26 13:11:44 +00:00
Carl Ritson 00e89b428b [AMDGPU] Add llvm.amdgcn.softwqm intrinsic
Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm
only if there is other WQM computation in the shader.

Reviewers: nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64935

llvm-svn: 367097
2019-07-26 09:54:12 +00:00
Matt Arsenault a9ea8a9aae AMDGPU/GlobalISel: Handle most function return types
handleAssignments gives up pretty easily on structs, and i8 values for
some reason. The other case that doesn't work is when an implicit sret
needs to be inserted if the return size exceeds the number of return
registers.

llvm-svn: 367082
2019-07-26 02:36:05 +00:00
Michael Liao 53f967f2bd [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs.
Summary:
- As LCSSA is turned on just before isel, it may create PHI of the flow,
  which is consumed by pseudo structurized CFG instructions. When that
  PHIs are eliminated in O0, COPY may be placed wrongly as the these
  pseudo structurized CFG instructions are considering prologue of MBB.
- Run extra `unreachable-mbb-elimination` at the end of isel to clean up
  PHIs.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64353

llvm-svn: 367023
2019-07-25 14:50:18 +00:00
Matt Arsenault a85af76c72 AMDGPU: Don't assert on v4f16 arguments to shader calling conventions
llvm-svn: 367018
2019-07-25 13:55:07 +00:00
Stanislav Mekhanoshin c43784ff26 [AMDGPU] Increase kernel padding
To support prefetch mode 3 we need to pad current
cacheline and fill 3 cachelines after. Current padding
is only sufficient for mode 2.

Differential Revision: https://reviews.llvm.org/D65236

llvm-svn: 366938
2019-07-24 19:40:13 +00:00
Dmitry Preobrazhensky 5e1dd02c90 [AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D65216

llvm-svn: 366921
2019-07-24 16:50:17 +00:00
Stanislav Mekhanoshin 5cdacea297 [AMDGPU] Add all vgpr classes to asm parser
Differential Revision: https://reviews.llvm.org/D65158

llvm-svn: 366917
2019-07-24 16:21:18 +00:00
Matt Arsenault 0e7d8698b5 AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts
The G_ANYEXT handling can end up reaching selectCOPY, which mutates
the instruction in place.

llvm-svn: 366915
2019-07-24 16:05:53 +00:00
Simon Pilgrim c60c12fb10 Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI.
llvm-svn: 366808
2019-07-23 14:04:54 +00:00
Matt Arsenault 827427f65b AMDGPU: Don't use SDNodeXForm for DS offset output
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.

This allows global isel to import the simple cases once the complex
pattern is implemented.

llvm-svn: 366743
2019-07-22 21:38:11 +00:00
Matt Arsenault 937d0ee5d8 AMDGPU/GlobalISel: Remove unnecessary code
The minnum/maxnum case are dead, and the cvt is handled by the
default.

llvm-svn: 366685
2019-07-22 13:05:25 +00:00
Jay Foad 298500ae33 [AMDGPU] Save some work when an atomic op has no uses
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.

Reviewers: arsenm, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64981

llvm-svn: 366667
2019-07-22 07:19:44 +00:00
Matt Arsenault f3bfb85bce AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces
llvm-svn: 366621
2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin 05d9e6a2a3 [AMDGPU] Autogenerate register sequences in tuples
Differential Revision: https://reviews.llvm.org/D65007

llvm-svn: 366619
2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin 7b5a54e369 [AMDGPU] Fixed occupancy calculation for gfx10
Differential Revision: https://reviews.llvm.org/D65010

llvm-svn: 366616
2019-07-19 21:29:51 +00:00
Matt Arsenault 5e23f42820 AMDGPU: Avoid custom predicates for stores with glue
llvm-svn: 366613
2019-07-19 21:01:30 +00:00
Matt Arsenault e3401a9b86 AMDGPU: Redefine setcc condition PatLeafs
Avoid using custom code predicates.

llvm-svn: 366609
2019-07-19 20:24:40 +00:00
Matt Arsenault 48c0df5d46 AMDGPU: Don't rely on m0 being -1 for GWS offsets
This only works if the high bits of m0 are also 0, so m0 would have to
be set to 0xffff.

llvm-svn: 366608
2019-07-19 20:01:24 +00:00
Matt Arsenault 85f3890126 AMDGPU: Force s_waitcnt after GWS instructions
This is apparently required to be the immediately following
instruction, so force it into a bundle with a waitcnt.

llvm-svn: 366607
2019-07-19 19:47:30 +00:00
Stanislav Mekhanoshin 01fcf9238f [AMDGPU] Allow register tuples to set asm names
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.

Differential Revision: https://reviews.llvm.org/D64967

llvm-svn: 366598
2019-07-19 18:05:01 +00:00
Matt Arsenault 7df225dfc2 AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads
The DAG lowering sets dereferencable and invariant, not nontemporal.

llvm-svn: 366597
2019-07-19 17:52:56 +00:00
Matt Arsenault 08494f6231 AMDGPU/GlobalISel: Selection for fminnum/fmaxnum
v2f16 case doesn't work yet because the VOP3P complex patterns haven't
been ported yet.

llvm-svn: 366585
2019-07-19 14:42:40 +00:00
Matt Arsenault b60a2ae40e AMDGPU/GlobalISel: Support arguments with multiple registers
Handles structs used directly in argument lists.

llvm-svn: 366584
2019-07-19 14:29:30 +00:00
Matt Arsenault fecf43eba3 AMDGPU/GlobalISel: Rewrite lowerFormalArguments
This should now handle everything except structs passed as multiple
registers.

I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.

This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.

llvm-svn: 366582
2019-07-19 14:15:18 +00:00
Matt Arsenault 1022c0dfde AMDGPU: Decompose all values to 32-bit pieces for calling conventions
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.

This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.

llvm-svn: 366578
2019-07-19 13:57:44 +00:00
Dmitry Preobrazhensky 4ccb7f8c45 [AMDGPU][MC] Corrected parsing of branch offsets
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64629

llvm-svn: 366571
2019-07-19 13:12:47 +00:00
Jay Foad 7d06ffff46 [AMDGPU] Simplify the exclusive scan used for optimized atomics
Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:

1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
   form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
   combine into an ADD3 instruction.

Because of #2 and #3 the end result is improved from this:

  v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v1, v4, v5, v1
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

To this:

  v_add_u32_dpp v1, v1, v1  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf

I.e. two fewer computational instructions, one extra nop where we could
schedule something else.

Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64411

llvm-svn: 366543
2019-07-19 08:40:37 +00:00
Stanislav Mekhanoshin a9c71e01e7 [AMDGPU] Drop Reg32 and use regular AsmName
This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb.

Differential Revision: https://reviews.llvm.org/D64952

llvm-svn: 366505
2019-07-18 22:18:33 +00:00
Stanislav Mekhanoshin 7872d76a16 [AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand()
Differential Revision: https://reviews.llvm.org/D64892

llvm-svn: 366385
2019-07-17 22:58:43 +00:00
Stanislav Mekhanoshin 9c7f4264d3 [AMDGPU] Stop special casing flat_scratch for register name
Differential Revision: https://reviews.llvm.org/D64885

llvm-svn: 366376
2019-07-17 21:35:11 +00:00
Daniil Fukalov d912a9ba9b [AMDGPU] Tune inlining parameters for AMDGPU target
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.

amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64642

llvm-svn: 366348
2019-07-17 16:51:29 +00:00
Matt Arsenault 06eed42213 AMDGPU: Use getTargetConstant
Avoids creating an extra intermediate mov.

llvm-svn: 366340
2019-07-17 15:35:36 +00:00
Jay Foad 70235c642e [AMDGPU] Optimize atomic AND/OR/XOR
Summary: Extend the atomic optimizer to handle AND, OR and XOR.

Reviewers: arsenm, sheredom

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64809

llvm-svn: 366323
2019-07-17 13:40:03 +00:00
Nicolai Haehnle 8b7041a5c6 AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC
Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630

Reviewers: rampitec, mareko

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64807

Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
llvm-svn: 366314
2019-07-17 11:22:57 +00:00
Nicolai Haehnle a256b8b7d7 AMDGPU: Improve alias analysis for GDS
Summary: GDS cannot alias anything else.

Original patch by: Marek Olšák

Reviewers: arsenm, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64114

Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0
llvm-svn: 366313
2019-07-17 11:22:19 +00:00
Stanislav Mekhanoshin e5012ab308 [AMDGPU] Autogenerate register asm names
Differential Revision: https://reviews.llvm.org/D64839

llvm-svn: 366283
2019-07-16 23:44:21 +00:00
Matt Arsenault f8c8284455 AMDGPU/GlobalISel: Select G_ASHR
llvm-svn: 366257
2019-07-16 20:31:25 +00:00
Matt Arsenault e5b28b98e9 AMDGPU/GlobalISel: Select G_LSHR
llvm-svn: 366256
2019-07-16 20:25:43 +00:00
Matt Arsenault 1b69fd275d AMDGPU/GlobalISel: Select G_SHL
I think this manages to not break the DAG handling with the divergent
predicates because the stadalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.

The 16-bit versions don't work yet.

llvm-svn: 366254
2019-07-16 20:15:30 +00:00
Stanislav Mekhanoshin 6e0fa292c2 [AMDGPU] Change register type for v32 vectors
When it is AReg_1024 this results in unnecessary copying into
AGPRs of a 32 element vectors even though they are not intended
for an mfma instruction.

Differential Revision: https://reviews.llvm.org/D64815

llvm-svn: 366252
2019-07-16 20:06:00 +00:00
Matt Arsenault 2d10407719 AMDGPU/GlobalISel: Fix selection of private stores
llvm-svn: 366249
2019-07-16 19:27:44 +00:00
Matt Arsenault 7161fb0be5 AMDGPU/GlobalISel: Select private loads
llvm-svn: 366248
2019-07-16 19:22:21 +00:00
Matt Arsenault dad1f89210 AMDGPU/GlobalISel: Select flat stores
llvm-svn: 366246
2019-07-16 18:42:53 +00:00
Matt Arsenault 7eb1902cd5 AMDGPU: Add register classes to flat store patterns
For some reason GlobalISelEmitter needs register classes to import
these, although it works for the load patterns.

llvm-svn: 366242
2019-07-16 18:26:42 +00:00
Matt Arsenault 8f8d07e93b AMDGPU: Replace store PatFrags
Convert the easy cases to formats understood for GlobalISel.

llvm-svn: 366240
2019-07-16 18:21:25 +00:00
Matt Arsenault 35c96598b1 AMDGPU/GlobalISel: Select flat loads
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.

llvm-svn: 366237
2019-07-16 18:05:29 +00:00
Jay Foad 17060f0a54 [AMDGPU] Optimize atomic max/min
Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.

Reviewers: arsenm, sheredom, critson, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64328

llvm-svn: 366235
2019-07-16 17:44:54 +00:00
Matt Arsenault c6fd5abecc AMDGPU: Redefine load PatFrags
Rewrite PatFrags using the new PatFrag address space matching in
tablegen. These will now work with both SelectionDAG and GlobalISel.

llvm-svn: 366234
2019-07-16 17:38:50 +00:00
Michael Liao b3f967d411 [AMDGPU] Add the adjusted FP as a livein register.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64145

llvm-svn: 366223
2019-07-16 15:57:12 +00:00
Matt Arsenault 22c4a147a9 AMDGPU/GlobalISel: Fix test failures in release build
Apparently the check for legal instructions during instruction
select does not happen without an asserts build, so these would
successfully select in release, and fail in debug.

Make s16 and/or/xor legal. These can just be selected directly
to the 32-bit operation, as is already done in SelectionDAG, so just
make them legal.

llvm-svn: 366210
2019-07-16 14:28:30 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Matt Arsenault 1739b700b1 AMDGPU: Avoid code predicates for extload PatFrags
Use the MemoryVT field. This will be necessary for tablegen to
automatically handle patterns for GlobalISel.

Doesn't handle the d16 lo/hi patterns. Those are a special case since
it involvess the custom node type.

llvm-svn: 366168
2019-07-16 02:46:05 +00:00
Austin Kerbow 423b4a18a4 [AMDGPU] Enable merging m0 initializations.
Summary:
Enable hoisting and merging m0 defs that are initialized with the same
immediate value. Fixes bug where removed instructions are not considered
to interfere with other inits, and make sure to not hoist inits before block
prologues.

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64766

llvm-svn: 366135
2019-07-15 22:07:05 +00:00
Matt Arsenault b082f1055b AMDGPU: Use standalone MUBUF load patterns
We already do this for the flat and DS instructions, although it is
certainly uglier and more verbose.

This will allow using separate pattern definitions for extload and
zextload. Currently we get away with using a single PatFrag with
custom predicate code to check if the extension type is a zextload or
anyextload. The generic mechanism the global isel emitter understands
treats these as mutually exclusive. I was considering making the
pattern emitter accept zextload or sextload extensions for anyextload
patterns, but in global isel, the different extending loads have
distinct opcodes, and there is currently no mechanism for an opcode
matcher to try multiple (and there probably is very little need for
one beyond this case).

llvm-svn: 366132
2019-07-15 21:41:44 +00:00
Matt Arsenault 66ee934440 AMDGPU/GlobalISel: Allow scalar s1 and/or/xor
If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to
whether the result is 0. If the inputs are SCC, these can be copied to
a 32-bit SGPR to produce an SCC result.

llvm-svn: 366125
2019-07-15 20:20:18 +00:00
Matt Arsenault c8291c94f8 AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR
llvm-svn: 366121
2019-07-15 19:50:07 +00:00
Matt Arsenault ad19b50c00 AMDGPU/GlobalISel: Don't constrain source register of VCC copies
This is a hack until I come up with a better way of dealing with the
pseudo-register banks used for boolean values. If the use instruction
constrains the register, the selector for the def instruction won't
see that the bank was VCC. A 1-bit SReg_32 is could ambiguously have
been SCCRegBank or VCCRegBank in wave32.

This is necessary to successfully select branches with and and/or/xor
condition.

llvm-svn: 366120
2019-07-15 19:48:36 +00:00
Matt Arsenault e1b52f4180 AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies
The extra test change is correct, although how it arrives there is a
bug that needs work. With wave32, the test for isVCC ambiguously
reports true for an SCC or VCC source. A new allocatable pseudo
register class for SCC may be necesssary.

llvm-svn: 366119
2019-07-15 19:46:48 +00:00
Matt Arsenault 3bfdb54d88 AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC
llvm-svn: 366118
2019-07-15 19:45:49 +00:00
Matt Arsenault 18b7133843 AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC
This was emitting a copy from a 32-bit register to a 64-bit.

llvm-svn: 366117
2019-07-15 19:44:07 +00:00
Matt Arsenault 6ed315f89b AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT
llvm-svn: 366116
2019-07-15 19:43:04 +00:00
Matt Arsenault b0e04c018c AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT
Turn the constant cases into G_EXTRACTs.

llvm-svn: 366115
2019-07-15 19:40:59 +00:00
Matt Arsenault 5dfd466032 AMDGPU/GlobalISel: Fix G_ICMP for wave32
llvm-svn: 366114
2019-07-15 19:39:31 +00:00
Matt Arsenault 90bdfb3daf AMDGPU/GlobalISel: Widen vector extracts
llvm-svn: 366103
2019-07-15 18:31:10 +00:00
Matt Arsenault 53fa759ff5 AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break
llvm-svn: 366102
2019-07-15 18:25:24 +00:00
Matt Arsenault b390121efb AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf
llvm-svn: 366099
2019-07-15 18:18:46 +00:00
Matt Arsenault 49169a963e AMDGPU: Add 24-bit mul intrinsics
Insert these during codegenprepare.

This works around a DAG issue where generic combines eliminate the and
asserting the high bits are zero, which then exposes an unknown read
source to the mul combine. It doesn't worth the hassle of trying to
insert an AssertZext or something to try to deal with it.

llvm-svn: 366094
2019-07-15 17:50:31 +00:00
Stanislav Mekhanoshin 7938424eb9 [AMDGPU] Copy missing predicate from pseudo to real
NFC at the momemnt, needed for future commit.

Differential Revision: https://reviews.llvm.org/D64761

llvm-svn: 366092
2019-07-15 17:49:25 +00:00
Matt Arsenault a65913e752 AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR
llvm-svn: 366087
2019-07-15 17:26:43 +00:00
Matt Arsenault cc02b17082 AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS
llvm-svn: 366086
2019-07-15 17:20:40 +00:00
Stanislav Mekhanoshin fd08dcb9db [AMDGPU] fixed scheduler crash in gfx908
For some reason scheduler can send down an SUnit without an
instruction.

Differential Revision: https://reviews.llvm.org/D64709

llvm-svn: 366074
2019-07-15 15:34:05 +00:00
Dmitry Preobrazhensky 5153b1723a [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message
Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64729

llvm-svn: 366071
2019-07-15 15:12:16 +00:00
Dmitry Preobrazhensky 8d879c8d95 [AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions
See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64716

llvm-svn: 366067
2019-07-15 14:37:57 +00:00
Bill Wendling 796ed134cc Remove set but unused variable.
llvm-svn: 366041
2019-07-15 06:35:28 +00:00
Stanislav Mekhanoshin 1dfae6fe50 [AMDGPU] use v32f32 for 3 mfma intrinsics
These should really use v32f32, but were defined as v32i32
due to the lack of the v32f32 type.

Differential Revision: https://reviews.llvm.org/D64667

llvm-svn: 365972
2019-07-12 22:42:01 +00:00
Matt Arsenault 51a05d72ae AMDGPU: Drop remnants of byval support for shaders
Before 2018, mesa used to use byval interchangably with inreg, which
didn't really make sense. Fix tests still using it to avoid breaking
in a future commit.

llvm-svn: 365953
2019-07-12 20:12:17 +00:00
David Tenty ae79a2c390 Fix missing use of defined() in include guard
Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64657

llvm-svn: 365952
2019-07-12 20:12:15 +00:00
Stanislav Mekhanoshin 495b0f5cc3 [AMDGPU] Extend MIMG opcode to 8 bits
This is NFC, but required for future commit.

Differential Revision: https://reviews.llvm.org/D64649

llvm-svn: 365940
2019-07-12 18:38:06 +00:00
Jay Foad 27ec195f39 [AMDGPU] Fix DPP combiner check for exec modification
Summary:
r363675 changed the exec modification helper function, now called
execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks
all instructions in the basic block, even beyond the last use. That
meant that the DPP combiner no longer worked in any basic block that
ended with a control flow instruction, and in particular it didn't work
on code sequences generated by the atomic optimizer.

Fix it by reinstating the old behaviour but in a new helper function
execMayBeModifiedBeforeAnyUse, and limiting the number of instructions
scanned.

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64393

llvm-svn: 365910
2019-07-12 15:59:40 +00:00
Jay Foad 7816ad918f [AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32
Summary:
D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't
make any sense to apply them to f16 operands; they would interpret the
bits of the value as an f32, giving nonsensical results. This patch
restricts them to f32 operands.

Reviewers: arsenm, hakzsam

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64636

llvm-svn: 365904
2019-07-12 15:02:59 +00:00
Fangrui Song b251cc0d91 Delete dead stores
llvm-svn: 365903
2019-07-12 14:58:15 +00:00
Michael Liao 16d3c1ac03 [AMDGPU] Skip calculating callee saved registers for entry function.
Reviewers: arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64596

llvm-svn: 365846
2019-07-11 23:53:30 +00:00
Matt Arsenault e5fb434d92 AMDGPU: s_waitcnt field should be treated as unsigned
Also make it an ImmLeaf, so it should work with global isel as well,
which was part of the point of moving it in the first place.

llvm-svn: 365842
2019-07-11 23:42:57 +00:00
Stanislav Mekhanoshin 28550c8680 [AMDGPU] Fixed asan error with agpr spilling
Instruction was used after it was erased.

llvm-svn: 365837
2019-07-11 22:30:11 +00:00
Stanislav Mekhanoshin 937ff6e701 [AMDGPU] gfx908 agpr spilling
Differential Revision: https://reviews.llvm.org/D64594

llvm-svn: 365833
2019-07-11 21:54:13 +00:00
Stanislav Mekhanoshin 7d2019bb96 [AMDGPU] gfx908 hazard recognizer
Differential Revision: https://reviews.llvm.org/D64593

llvm-svn: 365829
2019-07-11 21:30:34 +00:00
Stanislav Mekhanoshin b83e283e65 [AMDGPU] gfx908 scheduling
Differential Revision: https://reviews.llvm.org/D64590

llvm-svn: 365826
2019-07-11 21:25:00 +00:00
Stanislav Mekhanoshin e67cc380a8 [AMDGPU] gfx908 mfma support
Differential Revision: https://reviews.llvm.org/D64584

llvm-svn: 365824
2019-07-11 21:19:33 +00:00
Matt Arsenault b725d27350 AMDGPU/GlobalISel: Move kernel argument handling to separate function
llvm-svn: 365782
2019-07-11 14:18:25 +00:00
Jay Foad c1b7db9eda Remove some redundant code from r290372 and improve a comment.
llvm-svn: 365741
2019-07-11 08:49:52 +00:00
Stanislav Mekhanoshin e93279fd1b [AMDGPU] gfx908 atomic fadd and atomic pk_fadd
Differential Revision: https://reviews.llvm.org/D64435

llvm-svn: 365717
2019-07-11 00:10:17 +00:00
Stanislav Mekhanoshin c0ae1be066 [AMDGPU] gfx908 dot instruction support
Differential Revision: https://reviews.llvm.org/D64431

llvm-svn: 365715
2019-07-11 00:00:27 +00:00
Matt Arsenault 6ce1b4fec5 GlobalISel: Legalization for G_FMINNUM/G_FMAXNUM
llvm-svn: 365658
2019-07-10 16:31:19 +00:00
Matt Arsenault 58426a3707 AMDGPU: Serialize mode from MachineFunctionInfo
llvm-svn: 365653
2019-07-10 16:09:26 +00:00
Jay Foad bba37e89a5 [AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32
Summary:
D59191 added support for these modifiers in the assembler and
disassembler. This patch just teaches instruction selection that it can
use them.

Reviewers: arsenm, tstellar

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64497

llvm-svn: 365640
2019-07-10 14:53:47 +00:00
Tom Stellard d0ba79fe7b AMDGPU/GlobalISel: Add support for wide loads >= 256-bits
Summary:
This adds support for the most commonly used wide load types:
<8xi32>, <16xi32>, <4xi64>, and <8xi64>

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57399

llvm-svn: 365586
2019-07-10 00:22:41 +00:00
Matt Arsenault b1843e130a GlobalISel: Implement lower for G_FCOPYSIGN
In SelectionDAG AMDGPU treated these as legal, but this was mostly
because the bitcasts required for FP types were painful. Theoretically
the bitpattern should eventually match to bfi, so don't bother trying
to get the patterns to import.

llvm-svn: 365583
2019-07-09 23:34:29 +00:00
Matt Arsenault 3f1a34546c AMDGPU/GlobalISel: Fix legality for G_BUILD_VECTOR
llvm-svn: 365575
2019-07-09 22:48:04 +00:00
Stanislav Mekhanoshin 1e9eae95af [AMDGPU] gfx908 v_pk_fmac_f16 support
Differential Revision: https://reviews.llvm.org/D64433

llvm-svn: 365573
2019-07-09 22:42:24 +00:00
Stanislav Mekhanoshin 50d7f46460 [AMDGPU] gfx908 mAI instructions, MC part
Differential Revision: https://reviews.llvm.org/D64446

llvm-svn: 365563
2019-07-09 21:43:09 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Stanislav Mekhanoshin 9e77d0c6df [AMDGPU] gfx908 register file changes
Differential Revision: https://reviews.llvm.org/D64438

llvm-svn: 365546
2019-07-09 19:41:51 +00:00
Stanislav Mekhanoshin 22b2c3d651 [AMDGPU] gfx908 target
Differential Revision: https://reviews.llvm.org/D64429

llvm-svn: 365525
2019-07-09 18:10:06 +00:00
Christudasan Devadasan b2d24bd540 [AMDGPU] Created a sub-register class for the return address operand in the return instruction.
Function return instruction lowering, currently uses the fixed register pair s[30:31] for holding
the return address. It can be any SGPR pair other than the CSRs. Created an SGPR pair sub-register class
exclusive of the CSRs, and used this regclass while lowering the return instruction.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D63924

llvm-svn: 365512
2019-07-09 16:48:42 +00:00
Matt Arsenault 4dd5755d01 AMDGPU/GlobalISel: Legalize more concat_vectors
llvm-svn: 365488
2019-07-09 14:17:31 +00:00
Matt Arsenault 6bdb92d833 AMDGPU/GlobalISel: Improve regbankselect for icmp s16
Account for 64-bit scalar eq/ne when available.

llvm-svn: 365487
2019-07-09 14:13:09 +00:00
Matt Arsenault 8b8eee5904 AMDGPU/GlobalISel: Make s16 G_ICMP legal
llvm-svn: 365486
2019-07-09 14:10:43 +00:00
Matt Arsenault e6d10f97dd AMDGPU/GlobalISel: Select G_SUB
llvm-svn: 365484
2019-07-09 14:05:11 +00:00
Matt Arsenault 872f38be7e AMDGPU/GlobalISel: Select G_UNMERGE_VALUES
llvm-svn: 365483
2019-07-09 14:02:26 +00:00
Matt Arsenault 9b7ffc4e55 AMDGPU/GlobalISel: Select G_MERGE_VALUES
llvm-svn: 365482
2019-07-09 14:02:20 +00:00
Stanislav Mekhanoshin c776dc0b60 [AMDGPU] Added td definitions for HW regs
Infrastructure work for future commit. NFC.

Differential Revision: https://reviews.llvm.org/D64370

llvm-svn: 365432
2019-07-09 03:20:33 +00:00
Stanislav Mekhanoshin 818d748a45 [AMDGPU] Always use s_memtime for readcyclecounter
Differential Revision: https://reviews.llvm.org/D64369

llvm-svn: 365431
2019-07-09 03:10:18 +00:00
Matt Arsenault 9e7cbc0e7d AMDGPU: Split extload/zextload local load patterns
This will help removing the custom load predicates, allowing the
global isel emitter to handle them.

llvm-svn: 365398
2019-07-08 22:08:23 +00:00
Bill Wendling c8933c4070 Add parentheses to silence warning.
llvm-svn: 365394
2019-07-08 22:00:33 +00:00
Matt Arsenault 8561844321 AMDGPU: Fix unused variable in release build
llvm-svn: 365378
2019-07-08 19:47:42 +00:00
Matt Arsenault acc9e1e4c2 AMDGPU: Fix stray typing
llvm-svn: 365373
2019-07-08 19:05:19 +00:00
Matt Arsenault 71dfb7ec5c AMDGPU: Make s34 the FP register
Make the FP register callee saved.

This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypassess the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.

If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require intrtoducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fallback to introducing a
new VGPR spill as a last resort.

This also doesn't attempt to handle SGPR spilling with scalar stores.

llvm-svn: 365372
2019-07-08 19:03:38 +00:00
Matt Arsenault 5e643036cb AMDGPU: Move DEBUG_TYPE definition below includes
llvm-svn: 365369
2019-07-08 18:48:39 +00:00
Matt Arsenault 224d8cd987 AMDGPU: Remove mubuf specific PatFrags
These are identical to the *_global PatFrag, and will only create more
work to get the GlobalISel importer to handle them.

llvm-svn: 365350
2019-07-08 16:53:53 +00:00
Matt Arsenault 430b0497e7 AMDGPU: Move waitcnt intrinsic to instruction definition pattern
llvm-svn: 365349
2019-07-08 16:53:48 +00:00
Dmitry Preobrazhensky 2eff0318c6 [AMDGPU][MC] Corrected parsing of FLAT offset modifier
Summary of changes:

- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved errors handling.

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D64244

llvm-svn: 365321
2019-07-08 14:27:37 +00:00
Jay Foad 38902350ef [AMDGPU] Use a named predicate instead of a magic number.
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64201

llvm-svn: 365294
2019-07-08 07:04:58 +00:00
Matt Arsenault 5e9610a3f5 AMDGPU: Fix assert in clang test
llvm-svn: 365245
2019-07-05 21:09:53 +00:00
Matt Arsenault e7e23e3e91 AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass
Add a string attribute instead of directly setting
MachineFunctionInfo. This avoids trying to get the analysis in the
MachineFunctionInfo in a way that doesn't work with the new pass
manager.

This will also avoid re-visiting the call graph for every single
function.

llvm-svn: 365241
2019-07-05 20:26:13 +00:00
Michael Liao 8d6ea2d48c [CodeGen] Enhance `MachineInstrSpan` to allow the end of MBB to be used.
Summary:
- Explicitly specify the parent MBB to allow the end iterator to be
  used.

Reviewers: aprantl, MatzeB, craig.topper, qcolombet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64261

llvm-svn: 365240
2019-07-05 20:23:59 +00:00
Christudasan Devadasan 652ad423bb [NFC] A test commit to check the access permission. Removed a blank line.
llvm-svn: 365223
2019-07-05 17:07:42 +00:00
Yaxun Liu a62413526d [AMDGPU] Added a new metadata for multi grid sync implicit argument
Patch by Christudasan Devadasan.

Differential Revision: https://reviews.llvm.org/D63886

llvm-svn: 365217
2019-07-05 16:05:17 +00:00
Jay Foad 7e0c10b55f [AMDGPU] DPP combiner: recognize identities for more opcodes
Summary:
This allows the DPP combiner to kick in more often. For example the
exclusive scan generated by the atomic optimizer for a divergent atomic
add used to look like this:

        v_mov_b32_e32 v3, v1
        v_mov_b32_e32 v5, v1
        v_mov_b32_e32 v6, v1
        v_mov_b32_dpp v3, v2  wave_shr:1 row_mask:0xf bank_mask:0xf
        s_nop 1
        v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
        v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
        v_mov_b32_dpp v6, v3  row_shr:3 row_mask:0xf bank_mask:0xf
        v_add3_u32 v3, v4, v5, v6
        v_mov_b32_e32 v4, v1
        s_nop 1
        v_mov_b32_dpp v4, v3  row_shr:4 row_mask:0xf bank_mask:0xe
        v_add_u32_e32 v3, v3, v4
        v_mov_b32_e32 v4, v1
        s_nop 1
        v_mov_b32_dpp v4, v3  row_shr:8 row_mask:0xf bank_mask:0xc
        v_add_u32_e32 v3, v3, v4
        v_mov_b32_e32 v4, v1
        s_nop 1
        v_mov_b32_dpp v4, v3  row_bcast:15 row_mask:0xa bank_mask:0xf
        v_add_u32_e32 v3, v3, v4
        s_nop 1
        v_mov_b32_dpp v1, v3  row_bcast:31 row_mask:0xc bank_mask:0xf
        v_add_u32_e32 v1, v3, v1
        v_add_u32_e32 v1, v2, v1
        v_readlane_b32 s0, v1, 63

But now most of the dpp movs are combined into adds:

        v_mov_b32_e32 v3, v1
        v_mov_b32_e32 v5, v1
        s_nop 0
        v_mov_b32_dpp v3, v2  wave_shr:1 row_mask:0xf bank_mask:0xf
        s_nop 1
        v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
        v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
        v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
        v_add3_u32 v1, v4, v5, v1
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
        s_nop 1
        v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf
        v_add_u32_e32 v1, v2, v1
        v_readlane_b32 s0, v1, 63

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64207

llvm-svn: 365211
2019-07-05 14:52:48 +00:00
Tim Renouf 5816889c74 [AMDGPU] Custom lower INSERT_SUBVECTOR v3, v4, v5, v8
Summary:
Since the changes to introduce vec3 and vec5, INSERT_VECTOR for these
sizes has been marked "expand", which made LegalizeDAG lower it to loads
and stores via a stack slot. The code got optimized a bit later, but the
now-unused stack slot was never deleted.

This commit avoids that problem by custom lowering INSERT_SUBVECTOR into
an EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT for each element in the
subvector to insert.

V2: Addressed review comments re test.

Differential Revision: https://reviews.llvm.org/D63160

Change-Id: I9e3c13e36f68cfa3431bb9814851cc1f673274e1
llvm-svn: 365148
2019-07-04 17:38:24 +00:00
Jay Foad 0cd50b2a95 Fix typos in comments and debug output.
llvm-svn: 365146
2019-07-04 15:04:29 +00:00
Michael Liao 7a9ad430fe [AMDGPU] Correct the setting of `FlatScratchInit`.
Summary: - That flag setting should skip spilling stack slot.

Reviewers: arsenm, rampitec

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64143

llvm-svn: 365137
2019-07-04 13:29:45 +00:00
Matt Arsenault 5b0922fe1f AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a
separate SGPR and VGPR phase, and has some parts that aren't yet used
(like maintaining LiveIntervals).

This simplifies making the frame pointer register callee saved. As it
is now, the code to determine callee saves needs to predict all the
possible SGPR spills and how many callee saved VGPRs are needed. By
handling this before PrologEpilogInserter, it's possible to just check
the spill objects that already exist.

Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04
llvm-svn: 365095
2019-07-03 23:32:29 +00:00
Matt Arsenault c96c174557 Revert "[AMDGPU] Kernel arg metadata: added support for "__hip_texture" type."
This reverts commit r365073.

This is crashing, and is improperly relying on IR type names.

llvm-svn: 365087
2019-07-03 21:34:34 +00:00
Konstantin Pyzhov 6f419a3370 [AMDGPU] Kernel arg metadata: added support for "__hip_texture" type.
Summary:
Hip texture type is equivalent to OpenCL image. So, we need to set the Image type for kernel arguments with __hip_texture type.

Differential revision: https://reviews.llvm.org/D63850

llvm-svn: 365073
2019-07-03 19:11:35 +00:00
Michael Liao 80177ca5a9 [AMDGPU] Enable serializing of argument info.
Summary:
- Support serialization of all arguments in machine function info. This
  enables fabricating MIR tests depending on argument info.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64096

llvm-svn: 364995
2019-07-03 02:00:21 +00:00
Matt Arsenault c04aab9c06 AMDGPU: Look through bundles for existing waitcnts
These aren't produced now, but will be in a future patch.

llvm-svn: 364983
2019-07-03 00:30:44 +00:00
Matt Arsenault 5fe851b6cd AMDGPU: Custom lower vector_shuffle for v4i16/v4f16
Ordinarily it is lowered as a build_vector of each extract_vector_elt,
which in turn get lowered to bitcasts and bit shifts. Very little
understand the lowered extract pattern, resulting in much worse
code. We treat concat_vectors of v2i16 as legal, so prefer that.

llvm-svn: 364959
2019-07-02 19:15:45 +00:00
Alexander Timofeev 66ac6b409d [AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit
llvm-svn: 364952
2019-07-02 18:16:42 +00:00
Alexander Timofeev 2ce560f029 [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside
Differential Revision: https://reviews.llvm.org/D63953

Reviewers: rampitec, nhaehnle, arsenm
llvm-svn: 364950
2019-07-02 17:59:44 +00:00
Matt Arsenault 50be3481d4 AMDGPU/GlobalISel: Try generated matcher with intrinsics
llvm-svn: 364933
2019-07-02 14:52:16 +00:00
Matt Arsenault a8bff4b963 AMDGPU/GlobalISel: Select mul
llvm-svn: 364932
2019-07-02 14:52:14 +00:00
Matt Arsenault 70a4d3f67c AMDGPU/GlobalISel: Fix G_GEP with mixed SGPR/VGPR operands
The register bank for the destination of the sample argument copy was
wrong. We shouldn't be constraining each source to the result register
bank. Allow constraining the original register to the right size.

llvm-svn: 364928
2019-07-02 14:40:22 +00:00
Matt Arsenault ed63399244 AMDGPU/GlobalISel: Select G_FENCE
Manually select to workaround tablegen emitter emitting checks for
G_CONSTANT.

llvm-svn: 364927
2019-07-02 14:17:38 +00:00
Matt Arsenault 40c08052a5 AMDGPU: Correct properties for adjcallstack* pseudos
These should be SALU writes, and these are lowered to instructions
that def SCC.

llvm-svn: 364859
2019-07-01 22:01:05 +00:00
Matt Arsenault bae3636f96 AMDGPU/GlobalISel: Handle more input argument intrinsics
llvm-svn: 364836
2019-07-01 18:50:50 +00:00
Matt Arsenault 9e8e8c60fa AMDGPU/GlobalISel: Lower kernarg segment ptr intrinsics
llvm-svn: 364835
2019-07-01 18:49:01 +00:00
Matt Arsenault 756d81905f AMDGPU/GlobalISel: Legalize workgroup ID intrinsics
llvm-svn: 364834
2019-07-01 18:47:22 +00:00
Matt Arsenault e2c86cce3a AMDGPU/GlobalISel: Legalize workitem ID intrinsics
Tests don't cover the masked input path since non-kernel arguments
aren't lowered yet.

Test is copied directly from the existing test, with 2 additions.

llvm-svn: 364833
2019-07-01 18:45:36 +00:00
Matt Arsenault e15770aec4 AMDGPU/GlobalISel: Custom lower control flow intrinsics
Replace the brcond for the 2 cases that act as branches. For now
follow how the current system works, although I think we can
eventually get rid of the pseudos.

llvm-svn: 364832
2019-07-01 18:40:23 +00:00
Matt Arsenault 4073b33786 AMDGPU/GlobalISel: Handle 16-bit SALU min/max
This needs to be extended to s32, and expanded into cmp+select.  This
is relying on the fact that widenScalar happens to leave the
instruction in place, but this isn't a guaranteed property of
LegalizerHelper.

llvm-svn: 364831
2019-07-01 18:33:37 +00:00
Matt Arsenault 5a7d5111e5 AMDGPU/GlobalISel: Lower SALU min/max to cmp+select
Use a change observer to apply a register bank to the newly created
intermediate result register.

llvm-svn: 364830
2019-07-01 18:30:45 +00:00
Matt Arsenault ef59cb6982 AMDGPU/GlobalISel: Legalize s16 add/sub/mul
If this is scalar, promote to s32. Use a new observer class to assign
the register bank of newly created registers.

llvm-svn: 364827
2019-07-01 18:18:55 +00:00
Matt Arsenault 9470bb262b AMDGPU/GlobalISel: Fix allowing non-boolean conditions for G_SELECT
The condition register bank must be scc or vcc so that a copy will be
inserted, which will be lowered to a compare.

Currently greedy unnecessarily forces using a VCC select.

llvm-svn: 364825
2019-07-01 18:13:12 +00:00
Matt Arsenault b2ea20eedd AMDGPU/GlobalISel: RegBankSelect for sendmsg/sendmsghalt
llvm-svn: 364819
2019-07-01 17:40:18 +00:00
Matt Arsenault 40d1faf38f AMDGPU/GlobalISel: Legalize s16 fcmp
llvm-svn: 364817
2019-07-01 17:35:53 +00:00
Nicolai Haehnle 10c911db63 AMDGPU/GFX10: implement ds_ordered_count changes
Summary:
ds_ordered_count can now simultaneously operate on up to 4 dwords
in a single instruction, which are taken from (and returned to)
lanes 0..3 of a single VGPR.

Change-Id: I19b6e7b0732b617c10a779a7f9c0303eec7dd276

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63716

llvm-svn: 364815
2019-07-01 17:17:52 +00:00
Nicolai Haehnle 4dc3b2bf95 AMDGPU: Support GDS atomics
Summary:
Original patch by Marek Olšák

Change-Id: Ia97d5d685a63a377d86e82942436d1fe6e429bab

Reviewers: mareko, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63452

llvm-svn: 364814
2019-07-01 17:17:45 +00:00
Matt Arsenault 1094e6a814 AMDGPU/GlobalISel: RegBankSelect for DS ordered add/swap
llvm-svn: 364811
2019-07-01 17:04:57 +00:00
Matt Arsenault 265059eaf6 AMDGPU/GlobalISel: RegBankSelect for amdgcn.writelane
llvm-svn: 364808
2019-07-01 16:41:36 +00:00
Matt Arsenault a310727830 AMDGPU/GlobalISel: Fail instead of assert when selecting loads
llvm-svn: 364807
2019-07-01 16:36:39 +00:00
Matt Arsenault 0a52e9d026 AMDGPU/GlobalISel: Complete implementation of G_GEP
Also works around tablegen defect in selecting add with unused carry,
but if we have to manually select GEP, might as well handle add
manually.

llvm-svn: 364806
2019-07-01 16:34:48 +00:00
Matt Arsenault e1006259d8 AMDGPU/GlobalISel: Select G_PHI
llvm-svn: 364805
2019-07-01 16:32:47 +00:00
Matt Arsenault d810ff2588 AMDGPU/GlobalISel: Try to select VOP3 form of add
There are several things broken, but at least emit the right thing for
gfx9.

The import of the pattern with the unused carry out seems to not
work. Needs a special class for clamp, because OperandWithDefaultOps
doesn't really work.

llvm-svn: 364804
2019-07-01 16:27:32 +00:00
Matt Arsenault 62d64b0c30 AMDGPU/GlobalISel: RegBankSelect for readlane/readfirstlane
llvm-svn: 364801
2019-07-01 16:19:39 +00:00
Tom Stellard 9e9dd30de3 AMDGPU/GlobalISel: Implement select for 32-bit G_ADD
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: hiraditya, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58804

llvm-svn: 364797
2019-07-01 16:09:33 +00:00
Matt Arsenault 2ab25f9ceb AMDGPU/GlobalISel: Select G_BRCOND for vcc
llvm-svn: 364795
2019-07-01 16:06:02 +00:00
Matt Arsenault cda82f0bb6 AMDGPU/GlobalISel: Select G_FRAME_INDEX
llvm-svn: 364789
2019-07-01 15:48:18 +00:00
Nicolai Haehnle 7cfd99ab15 AMDGPU/GFX10: fix scratch resource descriptor
Summary:
The stride should depend on the wave size, not the hardware generation.

Also, the 32_FLOAT format is 0x16, not 16; though that shouldn't be
relevant.

Change-Id: I088f93bf6708974d085d1c50967f119061da6dc6

Reviewers: arsenm, rampitec, mareko

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63808

llvm-svn: 364788
2019-07-01 15:43:00 +00:00
Matt Arsenault fdf36729c7 AMDGPU/GlobalISel: Make s16 select legal
This is easy to handle and avoids legalization artifacts which are
likely to obscure combines.

llvm-svn: 364787
2019-07-01 15:42:47 +00:00
Matt Arsenault 6464280eb0 AMDGPU/GlobalISel: Select G_BRCOND for scc conditions
llvm-svn: 364786
2019-07-01 15:39:27 +00:00
Matt Arsenault 1daad91af6 AMDGPU/GlobalISel: Tolerate copies with no type set
isVCC has the same bug, but isn't used in a context where it can cause
a problem.

llvm-svn: 364784
2019-07-01 15:23:04 +00:00
Matt Arsenault 4f64ade04c AMDGPU/GlobalISel: Select src modifiers
llvm-svn: 364782
2019-07-01 15:18:56 +00:00
Matt Arsenault 1b317685e9 AMDGPU: Convert some places to Register
llvm-svn: 364769
2019-07-01 13:44:46 +00:00
Matt Arsenault 5bf850d52e AMDGPU/GlobalISel: Fix RegBankSelect for G_FCANONICALIZE
llvm-svn: 364768
2019-07-01 13:40:18 +00:00
Matt Arsenault b5fc94f3e7 AMDGPU/GlobalISel: Fix RegBankSelect for G_BUILD_VECTOR
llvm-svn: 364767
2019-07-01 13:40:17 +00:00
Matt Arsenault 89fc8bcdd6 AMDGPU/GlobalISel: Fail on store to 32-bit address space
llvm-svn: 364766
2019-07-01 13:37:39 +00:00
Matt Arsenault 3b7668ae4b AMDGPU/GlobalISel: Improve icmp selection coverage.
Select s64 eq/ne scalar icmp.

llvm-svn: 364765
2019-07-01 13:34:26 +00:00
Matt Arsenault c23149f612 AMDGPU/GlobalISel: RegBankSelect for WWM/WQM
llvm-svn: 364763
2019-07-01 13:30:12 +00:00
Matt Arsenault facf69e844 AMDGPU/GlobalISel: Use vcc reg bank for amdgcn.wqm.vote
llvm-svn: 364762
2019-07-01 13:30:09 +00:00
Matt Arsenault 9f992c238a AMDGPU/GlobalISel: Fix scc->vcc copy handling
This was checking the size of the register with the value of the size,
which happens to be exec. Also fix assuming VCC is 64-bit to fix
wave32.

Also remove some untested handling for physical registers which is
skipped. This doesn't insert the V_CNDMASK_B32 if SCC is the physical
copy source. I'm not sure if this should be trying to handle this
special case instead of dealing with this in copyPhysReg.

llvm-svn: 364761
2019-07-01 13:22:07 +00:00
Matt Arsenault 5dafcb9b11 AMDGPU/GlobalISel: Use and instead of BFE with inline immediate
Zext from s1 is the only case where this should do anything with the
current legal extensions.

llvm-svn: 364760
2019-07-01 13:22:06 +00:00
Florian Hahn 33c8c0ea27 [AMDGPU] Call isLoopExiting for blocks in the loop.
isLoopExiting should only be called for blocks in the loop. A follow
up patch makes this requirement an assertion.

I've updated the usage here, to only match for actual exit blocks. Previously,
it would also match blocks not in the loop.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D63980

llvm-svn: 364750
2019-07-01 12:36:44 +00:00
Matt Arsenault 0d45209757 AMDGPU/GlobalISel: RegBankSelect for update.dpp
llvm-svn: 364701
2019-06-29 00:44:36 +00:00
Matt Arsenault fd82cf4f4d AMDGPU/GlobalISel: RegBankSelect for atomic.inc/atomic.dec
llvm-svn: 364699
2019-06-29 00:39:20 +00:00
Matt Arsenault adb1f21e52 AMDGPU/GlobalISel: RegBankSelect for some DS intrinsics
llvm-svn: 364698
2019-06-29 00:33:13 +00:00