Commit Graph

45 Commits

Author SHA1 Message Date
Matt Arsenault a4edc04693 AMDGPU/GlobalISel: Use clamp modifier for [us]addsat/[us]subsat
We also have never handled this for SelectionDAG, which needs
additional work.
2020-07-28 11:18:05 -04:00
Matt Arsenault d2e74fad20 AMDGPU: Set more mov flags on V_ACCVGPR_{READ|WRITE}_B32
This fixes extra copies when materializing constants in AGPRs. This
made it a lot harder to trigger the spilling in spill-agpr.ll
2020-07-01 18:58:59 -04:00
Matt Arsenault 651c36b508 AMDGPU: Select strict_fmul 2020-06-04 17:49:00 -04:00
Matt Arsenault 483d4daa5e AMDGPU: Select strict_fma
Like with strict_fadd, the legalization is scalarizing the v4f16 when
it should split.
2020-06-04 17:49:00 -04:00
Matt Arsenault ae26c064ce AMDGPU: Select strict_fadd 2020-06-04 17:49:00 -04:00
Matt Arsenault 4b4496312e AMDGPU: Start adding MODE register uses to instructions
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.

Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.

Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
2020-05-27 14:47:00 -04:00
Fangrui Song 2cb48d620f [TableGen] Drop deprecated leading # operation (NOP) and replace ## with # 2020-04-25 16:26:45 -07:00
Kazuaki Ishizaki 0312b9f550 [llvm] NFC: Fix trivial typo in rst and td files
Differential Revision: https://reviews.llvm.org/D77469
2020-04-23 14:26:32 +09:00
Matt Arsenault db06870dbd AMDGPU: Move dot intrinsic patterns to instruction def
I tried to use some of the new tablegen features to avoid creating
different operand list permutations, but I still don't see a way to
programmatically build a source pattern dag.

Also add GlobalISel tests, which now all import successfully.

Some of the fneg fold tests are incorrect, which need to be fixed in a
future commit
2020-02-21 13:35:40 -05:00
Matt Arsenault 4c1c9422a3 AMDGPU/GlobalISel: Select llvm.amdgcn.fdot2
I'm slighly worried about the generated checks, since they won't catch
incorrect modifiers being added at the end of the line.
2020-02-21 13:35:40 -05:00
Matt Arsenault 60023e3471 AMDGPU: Use default operand for VOP3P clamp
We don't use this, and matching from the def doesn't make much sense.

There are multiple tablegen bugs with default operand
handling. undef_tied_input should work to handle the vdst_in
correctly, but this breaks the operand register class constraint which
it should be able to infer.
2020-02-21 12:14:18 -05:00
Matt Arsenault 7f3280ecdd AMDGPU/GlobalISel: Select permlane16/permlanex16 2020-01-29 17:55:31 -05:00
Stanislav Mekhanoshin 4312c4afd4 [AMDGPU] deduplicate tablegen predicates
We are duplicating predicates if several parts of the combined
predicate list contain the same condition. Added code to deduplicate
the list.

We have AssemblerPredicates and AssemblerPredicate in the
PredicateControl, but we never use AssemblerPredicates with an
actual list, so this one is dropped.

This addresses the first part of the llvm bug 43886:
https://bugs.llvm.org/show_bug.cgi?id=43886

Differential Revision: https://reviews.llvm.org/D69815
2019-11-04 12:19:17 -08:00
Stanislav Mekhanoshin 1fb584f7a2 [AMDGPU] Added MI bit IsDOT
NFC, needed for future commit.

Differential Revision: https://reviews.llvm.org/D67669

llvm-svn: 372151
2019-09-17 17:56:13 +00:00
Stanislav Mekhanoshin 50d7f46460 [AMDGPU] gfx908 mAI instructions, MC part
Differential Revision: https://reviews.llvm.org/D64446

llvm-svn: 365563
2019-07-09 21:43:09 +00:00
Matt Arsenault e24b34e9c9 AMDGPU: Undo sub x, c canonicalization for v2i16
Should avoid regression from D62341

llvm-svn: 363899
2019-06-19 23:37:43 +00:00
Stanislav Mekhanoshin c43e67bfff [AMDGPU] gfx1011/gfx1012 targets
Differential Revision: https://reviews.llvm.org/D63307

llvm-svn: 363344
2019-06-14 00:33:31 +00:00
Stanislav Mekhanoshin 61beff020e [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

llvm-svn: 359328
2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin 5182302a37 [AMDGPU] Sort out and rename multiple CI/VI predicates
Differential Revision: https://reviews.llvm.org/D60346

llvm-svn: 357835
2019-04-06 09:20:48 +00:00
Stanislav Mekhanoshin 0e858b028d [AMDGPU] Split dot-insts feature
Differential Revision: https://reviews.llvm.org/D57971

llvm-svn: 353587
2019-02-09 00:34:21 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Stanislav Mekhanoshin d3757d3f3a [AMDGPU] Separate feature dot-insts
Differential Revision: https://reviews.llvm.org/D56524

llvm-svn: 350793
2019-01-10 03:25:20 +00:00
Tim Corringham 4c4d2fe280 [AMDGPU] Add new Mode Register pass
A new pass to manage the Mode register.

Currently this just manages the floating point double precision
rounding requirements, but is intended to be easily extended to
encompass all Mode register settings.

The immediate motivation comes from the requirement to use the
round-to-zero rounding mode for the 16 bit interpolation
instructions, where the rounding mode setting is shared between
16 and 64 bit operations.

llvm-svn: 348754
2018-12-10 12:06:10 +00:00
Farhana Aleen 5853762e5a [AMDGPU] Handle the idot8 pattern generated by FE.
Summary: Different variants of idot8 codegen dag patterns are not generated by llvm-tablegen due to a huge
         increase in the compile time. Support the pattern that clang FE generates after reordering the
         additions in integer-dot8 source language pattern.

Author: FarhanaAleen

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D53937

llvm-svn: 345902
2018-11-01 22:48:19 +00:00
Matt Arsenault 687ec75d10 DAG: Change behavior of fminnum/fmaxnum nodes
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.

There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.

llvm-svn: 344914
2018-10-22 16:27:27 +00:00
Farhana Aleen 4bc597bff5 [AMDGPU] Match signed dot4/8 pattern.
Summary: This patch matches signed dot4 and dot8 pattern.

Author: FarhanaAleen

Reviewed By: msearles

Differential Revision: https://reviews.llvm.org/D52520

llvm-svn: 343798
2018-10-04 16:57:37 +00:00
Farhana Aleen f5a2848376 [AMDGPU] Match udot8 pattern
Summary: D.u32 = S0.u4[0] * S1.u4[0] +

         S0.u4[1] * S1.u4[1] +
         S0.u4[2] * S1.u4[2] +
         S0.u4[3] * S1.u4[3] +
         S0.u4[4] * S1.u4[4] +
         S0.u4[5] * S1.u4[5] +
         S0.u4[6] * S1.u4[6] +
         S0.u4[7] * S1.u4[7] +
         S2.u32

Author: FarhanaAleen

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D51947

llvm-svn: 342497
2018-09-18 16:59:48 +00:00
Farhana Aleen 9250c92d0e [AMDGPU] Match udot4 pattern.
Summary: D.u32 = S0.u8[0] * S1.u8[0] +
                 S0.u8[1] * S1.u8[1] +
                 S0.u8[2] * S1.u8[2] +
                 S0.u8[3] * S1.u8[3] + S2.u32

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D50921

llvm-svn: 340936
2018-08-29 16:31:18 +00:00
Farhana Aleen 3528c80378 [AMDGPU] Support idot2 pattern.
Summary: Transform add (mul ((i32)S0.x, (i32)S1.x),

         add( mul ((i32)S0.y, (i32)S1.y), (i32)S3) => i/udot2((v2i16)S0, (v2i16)S1, (i32)S3)

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D50024

llvm-svn: 340295
2018-08-21 16:21:15 +00:00
Konstantin Zhuravlyov bb30ef7af4 AMDGPU: Add clamp bit to dot intrinsics
Differential Revision: https://reviews.llvm.org/D49874

llvm-svn: 338470
2018-08-01 01:31:30 +00:00
Farhana Aleen c370d7b33d [AMDGPU] [AMDGPU] Support a fdot2 pattern.
Summary: Optimize fma((float)S0.x, (float)S1.x fma((float)S0.y, (float)S1.y, z))
                   -> fdot2((v2f16)S0, (v2f16)S1, (float)z)

Author: FarhanaAleen

Reviewed By: rampitec, b-sumner

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D49146

llvm-svn: 337198
2018-07-16 18:19:59 +00:00
Konstantin Zhuravlyov f13c9969fc AMDGPU: Fix v_dot{4, 8}* instruction encoding
Differential Revision: https://reviews.llvm.org/D46848

llvm-svn: 332387
2018-05-15 19:32:47 +00:00
Matt Arsenault 0084adc516 AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331215
2018-04-30 19:08:16 +00:00
Nicolai Haehnle 4f850eabb6 AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes
Differential revision: https://reviews.llvm.org/D44820

Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
llvm-svn: 328496
2018-03-26 13:56:53 +00:00
Matt Arsenault 28f52e51f1 AMDGPU: Add max-mix-insts subtarget feature
llvm-svn: 316553
2017-10-25 07:00:51 +00:00
Matt Arsenault 90c7593a75 AMDGPU: Remove global isGCN predicates
These are problematic because they apply to everything,
and can easily clobber whatever more specific predicate
you are trying to add to a function.

Currently instructions use SubtargetPredicate/PredicateControl
to apply this to patterns applied to an instruction definition,
but not to free standing Pats. Add a wrapper around Pat
so the special PredicateControls requirements can be appended
to the final predicate list like how Mips does it.

llvm-svn: 314742
2017-10-03 00:06:41 +00:00
Matt Arsenault 8cbb4884a5 AMDGPU: Start selecting v_mad_mixhi_f16
llvm-svn: 313814
2017-09-20 21:01:24 +00:00
Matt Arsenault e135c4c6a6 AMDGPU: Add tied operands to v_mad_mix{lo|hi}_f16
These write to the low and high half of the destination
register and leave the other 16-bits unchanged. This is true
for most 16-bit instructions on gfx9, but we don't use that
now.

llvm-svn: 313812
2017-09-20 20:53:49 +00:00
Matt Arsenault 76935122cc AMDGPU: Start selecting v_mad_mixlo_f16
Also add some tests that should be able to use v_mad_mixhi_f16,
but do not yet. This is trickier because we don't really model
the partial update of the register done by 16-bit instructions.

llvm-svn: 313806
2017-09-20 20:28:39 +00:00
Matt Arsenault 644883ff07 AMDGPU: Fix encoding of op_sel for mad_mix* opcodes
llvm-svn: 313797
2017-09-20 19:09:28 +00:00
Matt Arsenault c8f8cda0cd AMDGPU: Correct operand types for v_mad_mix*
These aren't really packed instructions, so the default
op_sel_hi should be 0 since this indicates a conversion.
The operand types are scalar values that behave similar
to an f16 scalar that may be converted to f32.

Doesn't change the default printing for op_sel_hi, just
the parsing.

llvm-svn: 312179
2017-08-30 22:18:40 +00:00
Dmitry Preobrazhensky 095ec3da81 [AMDGPU][MC] Added missing VOP3P opcodes
Added support of the following opcodes:
  v_pk_sub_u16
  v_pk_mad_i16
  v_pk_mad_u16

See Bug 33593: https://bugs.llvm.org//show_bug.cgi?id=33593

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D34890

llvm-svn: 308281
2017-07-18 09:24:10 +00:00
Dmitry Preobrazhensky b2d24e23ce [AMDGPU][mc][gfx9] Added support of op_sel/op_sel_hi for V_MAD_MIX*
See https://bugs.llvm.org//show_bug.cgi?id=33595

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D35021

llvm-svn: 307402
2017-07-07 14:29:06 +00:00
Matt Arsenault eb522e68bc AMDGPU: Support v2i16/v2f16 packed operations
llvm-svn: 296396
2017-02-27 22:15:25 +00:00
Matt Arsenault 9be7b0d485 AMDGPU: Add VOP3P instruction format
Add a few non-VOP3P but instructions related to packed.

Includes hack with dummy operands for the benefit of the assembler

llvm-svn: 296368
2017-02-27 18:49:11 +00:00