102 Commits

Author SHA1 Message Date
alex-t 5df1ac7846 [AMDGPU] fixed divergence driven shift operations selection
Differential Revision: https://reviews.llvm.org/D73483

Reviewers: rampitec
2020-01-31 20:49:56 +03:00
Matt Arsenault 7d67742160 AMDGPU/GlobalISel: Fix import of zext of s16 op patterns 2020-01-09 10:29:32 -05:00
Matt Arsenault 4844bf0fe2 AMDGPU: Apply i16 add->sub pattern with zext to i32
This was only applying the deeper nested zext pattern, and missing the
special case code size fold.
2020-01-07 16:36:31 -05:00
Matt Arsenault de46ab698b AMDGPU: Fix misleading, misplaced end block comments 2020-01-07 15:10:08 -05:00
Matt Arsenault 7fa0bfe7d5 AMDGPU/GlobalISel: Select mul24 intrinsics 2019-12-30 14:24:25 -05:00
Stanislav Mekhanoshin 4312c4afd4 [AMDGPU] deduplicate tablegen predicates
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.

We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.

This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886

Differential Revision: https://reviews.llvm.org/D69815
2019-11-04 12:19:17 -08:00
Dmitry Preobrazhensky 6c7d7eebda [AMDGPU][MC][GFX10] Added sdwa/dpp versions of v_cndmask_b32
See https://bugs.llvm.org/show_bug.cgi?id=43608

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69096

llvm-svn: 375241
2019-10-18 14:49:53 +00:00
Dmitry Preobrazhensky 7d325fe57b [AMDGPU][MC][GFX9] Corrected parsing of v_cndmask_b32_sdwa
See https://bugs.llvm.org/show_bug.cgi?id=43607

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D69095

llvm-svn: 375231
2019-10-18 13:31:53 +00:00
Stanislav Mekhanoshin d4ab74ee0b [AMDGPU] Suppress unused sdwa insts generation
Do not generate non-existing sdwa instructions. It reduces the
number of generated instructions by 185.

Differential Revision: https://reviews.llvm.org/D69010

llvm-svn: 375016
2019-10-16 16:58:06 +00:00
Stanislav Mekhanoshin e2d104f64c [AMDGPU] link dpp pseudos and real instructions on gfx10
This defaults to zero fi operand, but we do not expose it
anyway. Should we expose it later it needs to be added to
the pseudo.

This enables dpp combining on gfx10.

Differential Revision: https://reviews.llvm.org/D68888

llvm-svn: 374604
2019-10-11 22:03:36 +00:00
Matt Arsenault 190a17bbd1 AMDGPU: Fix i16 arithmetic pattern redundancy
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.

Second, the zext folding patterns don't apply to the subtargets that
do not zero the high bits. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.

llvm-svn: 374092
2019-10-08 17:36:38 +00:00
Stanislav Mekhanoshin 8f002193bf [AMDGPU] Disable unused gfx10 dpp instructions
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.

Differential Revision: https://reviews.llvm.org/D68607

llvm-svn: 374083
2019-10-08 16:56:01 +00:00
Matt Arsenault 1237aa2996 AMDGPU/GlobalISel: Fix selection of 16-bit shifts
llvm-svn: 373945
2019-10-07 19:10:44 +00:00
Matt Arsenault 317d991fa5 AMDGPU/GlobalISel: Fix select for v2s16 and/or/xor
llvm-svn: 373180
2019-09-30 06:31:30 +00:00
Tim Renouf 1786117111 [AMDGPU] Allow FP inline constant in v_madak_f16 and v_fmaak_f16
Differential Revision: https://reviews.llvm.org/D67680

Change-Id: Ic38f47cb2079c2c1070a441b5943854844d80a7c
llvm-svn: 372208
2019-09-18 09:32:06 +00:00
Stanislav Mekhanoshin 1fb584f7a2 [AMDGPU] Added MI bit IsDOT
NFC, needed for future commit.

Differential Revision: https://reviews.llvm.org/D67669

llvm-svn: 372151
2019-09-17 17:56:13 +00:00
Matt Arsenault 638f802381 AMDGPU/GlobalISel: Select 16-bit VALU bit ops
llvm-svn: 371807
2019-09-13 03:55:43 +00:00
Matt Arsenault 4a73c6eada AMDGPU/GlobalISel: Select G_CTPOP
llvm-svn: 371798
2019-09-13 00:11:20 +00:00
Matt Arsenault d2a9516a6d AMDGPU: Move MnemonicAlias out of instruction def hierarchy
Unfortunately MnemonicAlias defines a "Predicates" field just like an
instruction or pattern, with a somewhat different interpretation.

This ends up overriding the intended Predicates set by
PredicateControl on the pseudoinstruction definitions with an empty
list. This allowed incorrectly selecting instructions that should have
been rejected due to the SubtargetPredicate from patterns on the
instruction definition.

This does remove the divergent predicate from the 64-bit shift
patterns, which were already not used for the 32-bit shift, so I'm not
sure what the point was. This also removes a second, redundant copy of
the 64-bit divergent patterns.

llvm-svn: 371427
2019-09-09 17:25:35 +00:00
Matt Arsenault f8c8284455 AMDGPU/GlobalISel: Select G_ASHR
llvm-svn: 366257
2019-07-16 20:31:25 +00:00
Matt Arsenault e5b28b98e9 AMDGPU/GlobalISel: Select G_LSHR
llvm-svn: 366256
2019-07-16 20:25:43 +00:00
Matt Arsenault 1b69fd275d AMDGPU/GlobalISel: Select G_SHL
I think this manages to not break the DAG handling with the divergent
predicates because the standalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.

The 16-bit versions don't work yet.

llvm-svn: 366254
2019-07-16 20:15:30 +00:00
Stanislav Mekhanoshin c0ae1be066 [AMDGPU] gfx908 dot instruction support
Differential Revision: https://reviews.llvm.org/D64431

llvm-svn: 365715
2019-07-11 00:00:27 +00:00
Stanislav Mekhanoshin 1e9eae95af [AMDGPU] gfx908 v_pk_fmac_f16 support
Differential Revision: https://reviews.llvm.org/D64433

llvm-svn: 365573
2019-07-09 22:42:24 +00:00
Stanislav Mekhanoshin 0846c125f9 [AMDGPU] gfx1010 core wave32 changes
Differential Revision: https://reviews.llvm.org/D63204

llvm-svn: 363934
2019-06-20 15:08:34 +00:00
Stanislav Mekhanoshin 121956108f [AMDGPU] Use custom inserter for gfx10 VOP2b
This is part of the approved D63204 pending parent revision.
This small change is in fact a part of the VOP2b legalization which
does not technically belong to wave32 support, so extracted
separately.

llvm-svn: 363625
2019-06-17 22:37:37 +00:00
Stanislav Mekhanoshin c43e67bfff [AMDGPU] gfx1011/gfx1012 targets
Differential Revision: https://reviews.llvm.org/D63307

llvm-svn: 363344
2019-06-14 00:33:31 +00:00
Stanislav Mekhanoshin 8bcc9bb595 [AMDGPU] gfx1010 base changes for wave32
Differential Revision: https://reviews.llvm.org/D63293

llvm-svn: 363299
2019-06-13 19:18:29 +00:00
Stanislav Mekhanoshin 245b5ba344 [AMDGPU] gfx1010 dpp16 and dpp8
Differential Revision: https://reviews.llvm.org/D63203

llvm-svn: 363186
2019-06-12 18:02:41 +00:00
Dmitry Preobrazhensky ee51d851ea [AMDGPU][GFX8][GFX9] Corrected predicate of v_*_co_u32 aliases
Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D61905

llvm-svn: 360702
2019-05-14 19:16:24 +00:00
Matt Arsenault 01434f9377 AMDGPU: Select VOP3 form of add
The VOP3 form should always be the preferred selection, to be shrunk
later. This should only be an optimization issue, but this partially
works around a problem from clobbering VCC when SIFixSGPRCopies
rewrites an SCC defining operation directly to VCC.

3 of the testcases are regressions from failing to fold the immediate
in cases it should. These can be avoided by improving the VCC liveness
handling in SIFoldOperands. Simply increasing the threshold to
computeRegisterLiveness works, although this is common enough that VCC
liveness should probably be tracked throughout the pass. The hack of
leaving behind an implicit_def instruction to avoid breaking iterator
wastes instruction count, which inhibits finding the VCC def in long
chains of adds. Doing this however exposes different, worse looking
regressions from poor scheduling behavior. This could probably be
worked around by forcing the shrink of the addc here, but the
scheduler should probably be fixed.

The r600 add test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 360293
2019-05-08 22:09:57 +00:00
Changpeng Fang 73b7272e7a AMDGPU: Fix a mis-placed bracket
Differential Revision:
  https://reviews.llvm.org/D61430

llvm-svn: 360283
2019-05-08 19:46:04 +00:00
Matt Arsenault 657ef48a88 AMDGPU: Select VOP3 form of sub
The VOP3 form should always be the preferred selection form to be
shrunk later.

The r600 sub test needs to be split out because it asserts on the
arguments in the new test during the calling convention lowering.

llvm-svn: 359899
2019-05-03 15:37:07 +00:00
Matt Arsenault 344d68d3c9 AMDGPU: Remove redundant patterns for shifts
llvm-svn: 359895
2019-05-03 15:08:36 +00:00
Matt Arsenault ada33314a2 AMDGPU: Remove redundant patterns for sub
There were 2 patterns for sub, one selecting to sub and one to
subrev. Only one of these will succeed, so remove the reversed one.

llvm-svn: 359894
2019-05-03 15:08:35 +00:00
Stanislav Mekhanoshin 4f331cb1f3 [AMDGPU] gfx1010 VOPC implementation
Differential Revision: https://reviews.llvm.org/D61208

llvm-svn: 359358
2019-04-26 23:16:16 +00:00
Stanislav Mekhanoshin 8f3da70eed [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

llvm-svn: 359316
2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin 2c97ff07bf [AMDGPU] gfx1010 VOP1 instructions
Differential Revision: https://reviews.llvm.org/D61099

llvm-svn: 359225
2019-04-25 19:01:51 +00:00
Stanislav Mekhanoshin 5182302a37 [AMDGPU] Sort out and rename multiple CI/VI predicates
Differential Revision: https://reviews.llvm.org/D60346

llvm-svn: 357835
2019-04-06 09:20:48 +00:00
Stanislav Mekhanoshin 7895c03232 [AMDGPU] predicate and feature refactoring
We have done some predicate and feature refactoring lately but
did not upstream it. This is to sync.

Differential revision: https://reviews.llvm.org/D60292

llvm-svn: 357791
2019-04-05 18:24:34 +00:00
Tim Renouf cfdfba996b [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.

This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.

Differential Revision: https://reviews.llvm.org/D59267

Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
2019-03-18 19:35:44 +00:00
Tim Renouf 2e94f6e584 [AMDGPU] Asm/disasm v_cndmask_b32_e64 with abs/neg source modifiers
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.

This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.

To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.

Differential Revision: https://reviews.llvm.org/D59191

Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
2019-03-18 19:25:39 +00:00
Dmitry Preobrazhensky 6023d5990d [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32
See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58713

llvm-svn: 355312
2019-03-04 12:48:32 +00:00
Konstantin Zhuravlyov 9a278bf6b5 Revert "AMDGPU/NFC: Cleanup subtarget predicates"
It breaks one of our downstream merges, so revert it
temporarily while investigating failures downstream

llvm-svn: 354700
2019-02-22 23:21:06 +00:00
Konstantin Zhuravlyov c2650178a1 AMDGPU/NFC: Cleanup subtarget predicates
Differential Revision: https://reviews.llvm.org/D58522

llvm-svn: 354620
2019-02-21 20:43:43 +00:00
Matt Arsenault d7047276ec AMDGPU: Remove GCN features and predicates
These are no longer necessary since the R600 tablegen files are split
out now.

llvm-svn: 353548
2019-02-08 19:18:01 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Tim Corringham 4c4d2fe280 [AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.

Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.

llvm-svn: 348754
2018-12-10 12:06:10 +00:00
Valery Pykhtin 3d9afa273f [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

llvm-svn: 347993
2018-11-30 14:21:56 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00