2537 Commits

Author SHA1 Message Date
Konstantin Zhuravlyov c233ae8004 AMDGPU/Metadata: Always report a fixed number of hidden arguments
Currently it is 6. If the "feature" was not used, report a dummy
hidden argument. Otherwise the count does not match the kernarg size
reported in the kernel header.

Differential Revision: https://reviews.llvm.org/D45129

llvm-svn: 329341
2018-04-05 20:46:04 +00:00
Simon Pilgrim 1d793b8ac5 [SchedModel] Complete models shouldn't match against itineraries when they don't use them (PR35639)
For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp

This patch causes problems for a number of models that had been incorrectly flagged as complete.

Differential Revision: https://reviews.llvm.org/D43235

llvm-svn: 329280
2018-04-05 13:11:36 +00:00
Dmitry Preobrazhensky 523872ea59 [AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI
See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958

Differential Revision: https://reviews.llvm.org/D45099

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329197
2018-04-04 13:54:55 +00:00
Dmitry Preobrazhensky a0b8cd038c [AMDGPU][MC] Added support of 3-element addresses for MIMG instructions
See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999

Differential Revision: https://reviews.llvm.org/D45084

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329187
2018-04-04 13:01:17 +00:00
Nico Weber 1cbd096914 Sort targetgen calls in lib/Target/*/CMakeLists.
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.

Also remove some stale-looking comments from the Nios2 target cmakefile.

No intended behavior change.

llvm-svn: 329181
2018-04-04 12:37:44 +00:00
Nicolai Haehnle 2f5a73820c AMDGPU: Dimension-aware image intrinsics
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.

This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.

Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.

v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI

Change-Id: I099f309e0a394082a5901ea196c3967afb867f04

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44939

llvm-svn: 329166
2018-04-04 10:58:54 +00:00
Nicolai Haehnle 3ffd383a15 AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.

There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".

Fixes a bug encountered in Nier: Automata.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40547

llvm-svn: 329164
2018-04-04 10:57:58 +00:00
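The problem statement in r329164 above (an i1 defined in a loop whose lanes exit at different iterations) can be illustrated with a toy model. This is only a sketch under stated assumptions: `runLoop`, `LoopResult`, `ExitIter`, and `Cond` are hypothetical names, the wave is modeled as at most 64 lanes, and a compare is assumed to write bits only for active lanes. It does not reproduce the patch's lane-dominator analysis, only the reason the last iteration's mask is insufficient.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of one wave executing a loop with a non-uniform exit.
struct LoopResult {
  uint64_t LastIterMask; // lane mask produced by the final iteration only
  uint64_t MergedMask;   // each lane's bit captured when that lane exits
};

// ExitIter[L] is the (0-based) iteration in which lane L leaves the loop.
// Cond(L, Iter) is the i1 value lane L computes in iteration Iter.
// Assumes ExitIter.size() <= 64.
template <typename CondFn>
LoopResult runLoop(const std::vector<unsigned> &ExitIter, CondFn Cond) {
  const unsigned NumLanes = static_cast<unsigned>(ExitIter.size());
  uint64_t Exec = NumLanes >= 64 ? ~0ull : ((1ull << NumLanes) - 1);
  uint64_t Last = 0, Merged = 0;
  for (unsigned Iter = 0; Exec != 0; ++Iter) {
    uint64_t Cur = 0;     // compare result; inactive lanes contribute 0
    uint64_t Exiting = 0; // lanes leaving the loop in this iteration
    for (unsigned L = 0; L < NumLanes; ++L) {
      if (((Exec >> L) & 1) == 0)
        continue;
      if (Cond(L, Iter))
        Cur |= 1ull << L;
      if (ExitIter[L] == Iter)
        Exiting |= 1ull << L;
    }
    Last = Cur;
    // Keep the value each lane had at the moment it left the loop.
    Merged = (Merged & ~Exiting) | (Cur & Exiting);
    Exec &= ~Exiting;
  }
  return {Last, Merged};
}
```

With four lanes exiting at iterations 0, 2, 1, 2 and a condition that is true only for even lanes, the last iteration's mask is empty because both even lanes have already left, while the merged mask still records them.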
Farhana Aleen e80aeac0f2 [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D45219

llvm-svn: 329131
2018-04-03 23:00:30 +00:00
Farhana Aleen 936947349a Revert "MSG"
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.

This was committed by mistake.

llvm-svn: 329119
2018-04-03 21:51:45 +00:00
Farhana Aleen 3ab409dc86 MSG
llvm-svn: 329114
2018-04-03 21:20:39 +00:00
Dmitry Preobrazhensky b181c7312e [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847

Differential Revision: https://reviews.llvm.org/D45097

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328988
2018-04-02 17:09:20 +00:00
Dmitry Preobrazhensky 6bad04ecf5 [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
Fixed a bug which caused a TableGen crash.

See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
2018-04-02 16:10:25 +00:00
Nico Weber f492f58182 Revert r328975, it makes TableGen assert on the bots.
llvm-svn: 328978
2018-04-02 14:20:23 +00:00
Dmitry Preobrazhensky 32c450ae6a [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
2018-04-02 13:52:23 +00:00
Nicolai Haehnle 4254d45a79 AMDGPU: Make isIntrinsicSourceOfDivergence table-driven
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44938

llvm-svn: 328939
2018-04-01 17:09:14 +00:00
Nicolai Haehnle 5d0d30304c AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics
Summary:
Avoids having to list all intrinsics manually.

This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44937

llvm-svn: 328938
2018-04-01 17:09:07 +00:00
Stanislav Mekhanoshin 74e2974ac6 [AMDGPU] Fixed some instructions latencies
Differential Revision: https://reviews.llvm.org/D45073

llvm-svn: 328874
2018-03-30 16:19:13 +00:00
Michael Bedy 59e5ef793c [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Reviewers: arsenm, #amdgpu

Reviewed By: arsenm, #amdgpu

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44364

llvm-svn: 328856
2018-03-30 05:03:36 +00:00
Matt Arsenault efd1b30436 AMDGPU: Fix build warning in release
llvm-svn: 328832
2018-03-29 21:44:44 +00:00
Matt Arsenault 03ae399d50 AMDGPU: Support realigning stack
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.

I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.

llvm-svn: 328831
2018-03-29 21:30:06 +00:00
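The add->or dependency mentioned in r328831 above comes down to a bit-level identity: `Base + Offset` equals `Base | Offset` only when the two values share no set bits, which a compiler may conclude from known alignment. A minimal sketch, assuming hypothetical helper names (`isAligned`, `addViaOr`) that are not part of the AMDGPU backend:

```cpp
#include <cassert>
#include <cstdint>

// True when Addr has its low log2(Align) bits clear. Align must be a power
// of two.
constexpr bool isAligned(uint32_t Addr, uint32_t Align) {
  return (Addr & (Align - 1)) == 0;
}

// The rewritten form of Base + Offset. It is only equal to the sum when
// Base and Offset have no set bits in common, for example when Base is
// aligned to Align and Offset < Align.
constexpr uint32_t addViaOr(uint32_t Base, uint32_t Offset) {
  return Base | Offset;
}

// A frame base that really is 8-byte aligned: the rewrite is safe.
static_assert(isAligned(0x40, 8), "0x40 is 8-byte aligned");
static_assert(addViaOr(0x40, 4) == 0x40 + 4, "or matches add");

// A frame base that is only 4-byte aligned, while the combine assumed 8:
// the rewrite silently produces the wrong address.
static_assert(!isAligned(0x44, 8), "0x44 is not 8-byte aligned");
static_assert(addViaOr(0x44, 4) != 0x44 + 4, "or differs from add");
```

The second pair of checks shows why the stack frame has to be realigned when an object claims more alignment than the frame provides.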
Matt Arsenault ffb132e74b AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

llvm-svn: 328821
2018-03-29 20:22:04 +00:00
Matt Arsenault 6c041a3cab AMDGPU: Fix selection error on constant loads with < 4 byte alignment
llvm-svn: 328818
2018-03-29 19:59:28 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there are only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie a373d18eb7 Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h
Fixes layering - Transforms/Utils shouldn't depend on including a Scalar
or IPO header, because Scalar and IPO depend on Utils.

llvm-svn: 328717
2018-03-28 17:44:36 +00:00
Dmitry Preobrazhensky 622bde8bc7 [AMDGPU][MC] Added ds_add_src2_f32
See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833

Differential Revision: https://reviews.llvm.org/D44779

Reviewers: arsenm, artem.tamazov, timcorringham
llvm-svn: 328713
2018-03-28 16:21:56 +00:00
Dmitry Preobrazhensky 2456ac696a [AMDGPU][MC] Added PCK variants of image load/store instructions
See bug 36834: https://bugs.llvm.org/show_bug.cgi?id=36834

Differential Revision: https://reviews.llvm.org/D44795

Reviewers: artem.tamazov, arsenm, timcorringham, nhaehnle
llvm-svn: 328710
2018-03-28 15:44:16 +00:00
Dmitry Preobrazhensky a917e88585 [AMDGPU][MC][GFX9] Added buffer_*_format_d16_hi_x
See bug 36835: https://bugs.llvm.org/show_bug.cgi?id=36835

Differential Revision: https://reviews.llvm.org/D44825

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328707
2018-03-28 14:53:13 +00:00
Dmitry Preobrazhensky dd2b929ffb [AMDGPU][MC][GFX9] Added s_scratch* instructions
See bug 36836: https://bugs.llvm.org/show_bug.cgi?id=36836

Differential Revision: https://reviews.llvm.org/D44832

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328704
2018-03-28 14:08:03 +00:00
Tim Renouf cdac172e2a Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.

It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.

Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
2018-03-28 11:21:07 +00:00
Matt Arsenault bd49eccca1 AMDGPU: Really implement getFrameRegister
Currently this seems to only really be used for debug
info.

llvm-svn: 328677
2018-03-27 23:26:59 +00:00
Tim Renouf e4208bfa5b [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
2018-03-27 21:35:00 +00:00
Matt Arsenault 17f3338015 AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.

llvm-svn: 328659
2018-03-27 19:42:55 +00:00
Matt Arsenault 95329f8c53 AMDGPU: Set natural stack alignment in DataLayout
Only 4 byte alignment is ever useful, so increasing anything
beyond this may require realigning the stack.

llvm-svn: 328656
2018-03-27 19:26:40 +00:00
Matt Arsenault 0a0c871f60 AMDGPU: Fix crash when MachinePointerInfo invalid
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.

llvm-svn: 328652
2018-03-27 18:39:45 +00:00
Matt Arsenault e9f3679031 AMDGPU: Fix FP restore from being reordered with stack ops
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.

Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.

Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to prevent this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.

llvm-svn: 328650
2018-03-27 18:38:51 +00:00
Tim Corringham 7116e8963d [AMDGPU] Improve disassembler error handling
Summary:
llvm-objdump now disassembles unrecognised opcodes as data, using
the .long directive. We treat unrecognised opcodes as being 32 bit
values, so move along 4 bytes rather than the single byte which
previously resulted in a cascade of bogus disassembly following an
unrecognised opcode.

While no solution can always disassemble code that contains
embedded data correctly this provides a significant improvement.

The disassembler will now cope with an arbitrary length section
as it no longer truncates it to a multiple of 4 bytes, and will
use the .byte directive for trailing bytes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44685

llvm-svn: 328553
2018-03-26 17:06:33 +00:00
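The recovery strategy described in r328553 above (emit `.long` and skip 4 bytes on an unrecognised opcode, emit `.byte` for a trailing remainder) can be sketched as follows. This is an illustration only: `isKnownOpcode` is a made-up predicate, every instruction is assumed to be exactly 4 bytes, words are read little-endian, and the output strings are not llvm-objdump's real format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical decoder stand-in: a dword is "recognised" when its top byte
// is 0x7E. A real decoder would consult the instruction tables.
static bool isKnownOpcode(uint32_t Word) { return (Word >> 24) == 0x7E; }

std::vector<std::string> disassemble(const std::vector<uint8_t> &Bytes) {
  std::vector<std::string> Out;
  char Buf[32];
  std::size_t I = 0;
  // Walk whole dwords. An unrecognised dword becomes data and we still
  // advance by 4, so the following dword is decoded at the right boundary.
  for (; I + 4 <= Bytes.size(); I += 4) {
    uint32_t Word = uint32_t(Bytes[I]) | (uint32_t(Bytes[I + 1]) << 8) |
                    (uint32_t(Bytes[I + 2]) << 16) |
                    (uint32_t(Bytes[I + 3]) << 24);
    if (isKnownOpcode(Word))
      std::snprintf(Buf, sizeof(Buf), "inst 0x%08x", unsigned(Word));
    else
      std::snprintf(Buf, sizeof(Buf), ".long 0x%08x", unsigned(Word));
    Out.emplace_back(Buf);
  }
  // The section is not truncated to a multiple of 4: leftover bytes are
  // emitted one at a time.
  for (; I < Bytes.size(); ++I) {
    std::snprintf(Buf, sizeof(Buf), ".byte 0x%02x", unsigned(Bytes[I]));
    Out.emplace_back(Buf);
  }
  return Out;
}
```

Advancing by a single byte after a failure would instead re-decode three bytes of the same dword together with one byte of the next, which is the cascade of bogus output the commit message refers to.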
Nicolai Haehnle 4f850eabb6 AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes
Differential revision: https://reviews.llvm.org/D44820

Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
llvm-svn: 328496
2018-03-26 13:56:53 +00:00
Mandeep Singh Grang 860adef9e6 [AMDGPU] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Reviewers: tstellar, RKSimon, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44856

llvm-svn: 328429
2018-03-24 17:15:04 +00:00
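The idea behind the wrapper mentioned in r328429 above, shuffling before sorting so that code relying on the relative order of equal keys breaks visibly, can be sketched with the standard library alone. `shuffledSort`, `keyOnlyLess`, and `totalLess` are hypothetical names for this illustration; this is not the `llvm::sort` implementation, and the explicit seed exists only to make the sketch reproducible.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <string>
#include <utility>
#include <vector>

using Item = std::pair<int, std::string>; // (key, payload)

// Shuffle the container, then sort it with the caller's comparator. If the
// comparator leaves ties, the final order of tied items depends on the seed.
template <typename Compare>
void shuffledSort(std::vector<Item> &V, Compare Less, unsigned Seed) {
  std::mt19937 Rng(Seed);
  std::shuffle(V.begin(), V.end(), Rng);
  std::sort(V.begin(), V.end(), Less);
}

// Compares keys only: items with equal keys have no defined relative order.
inline bool keyOnlyLess(const Item &A, const Item &B) {
  return A.first < B.first;
}

// Compares key, then payload: a total order, so the result is unique
// regardless of how the input was shuffled.
inline bool totalLess(const Item &A, const Item &B) { return A < B; }
```

Sorting the same data with `totalLess` under two different seeds gives identical results; with `keyOnlyLess` only the key order is guaranteed, so any downstream code that depends on payload order among equal keys is exposed.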
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
David Blaikie 6054e650ff Move TargetLoweringObjectFile from CodeGen to Target to fix layering
It's implemented in Target & included from other Target headers, so the
header should be in Target.

llvm-svn: 328392
2018-03-23 23:58:19 +00:00
Tony Tye 7a893d4e34 [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.

Differential Revision: https://reviews.llvm.org/D43736

llvm-svn: 328349
2018-03-23 18:45:18 +00:00
David Blaikie 2be3922807 Fix a couple of layering violations in Transforms
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.

Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.

Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.

llvm-svn: 328165
2018-03-21 22:34:23 +00:00
Nirav Dave 3264c1bdf6 [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying node id
invariant traversal and correcting typo.

llvm-svn: 327898
2018-03-19 20:19:46 +00:00
Nicolai Haehnle 4186cc7c08 TableGen: Check the dynamic type of !cast<Rec>(string)
Summary:
The docs already claim that this happens, but so far it hasn't. As a
consequence, existing TableGen files get this wrong a lot, but luckily
the fixes are all reasonably straightforward.

To make this work with all the existing forms of self-references (since
the true type of a record is only built up over time), the lookup of
self-references in !cast is delayed until the final resolving step.

Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164

Reviewers: arsenm, craig.topper, tra, MartinO

Subscribers: wdng, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D44475

llvm-svn: 327849
2018-03-19 14:14:20 +00:00
Matt Arsenault fed0a45036 AMDGPU/GlobalISel: RegBankSelect for basic int ops
llvm-svn: 327843
2018-03-19 14:07:23 +00:00
Matt Arsenault 69932e4d69 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

llvm-svn: 327842
2018-03-19 14:07:15 +00:00
Nirav Dave 5f0ab71b62 Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""
as it times out building test-suite on PPC.

llvm-svn: 327778
2018-03-17 19:24:54 +00:00
Nirav Dave 982d3a56ea [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying and reducing
node id invariant traversal.

llvm-svn: 327777
2018-03-17 17:42:10 +00:00
Matt Arsenault abdc4f2dc7 AMDGPU/GlobalISel: Cleanup constant legality
llvm-svn: 327774
2018-03-17 15:17:48 +00:00
Matt Arsenault 685d1e8157 AMDGPU/GlobalISel: Basic G_GEP legality
llvm-svn: 327773
2018-03-17 15:17:45 +00:00
Matt Arsenault 85803366d6 AMDGPU/GlobalISel: Basic legality for load/store
llvm-svn: 327772
2018-03-17 15:17:41 +00:00
Farhana Aleen c6c9dc8773 [AMDGPU] Supported ds_write_b128 generation.
Summary: This is a follow-on patch of https://reviews.llvm.org/D44210

Author: FarhanaAleen

Reviewed By: msearles

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44319

llvm-svn: 327726
2018-03-16 18:12:00 +00:00
Dmitry Preobrazhensky 4c8f4234b6 [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751

Differential Revision: https://reviews.llvm.org/D44529

Reviewers: artem.tamazov, arsenm
llvm-svn: 327723
2018-03-16 16:38:04 +00:00
Dmitry Preobrazhensky 9c1a6e7e24 [AMDGPU][MC] Corrected default values for unused SDWA operands
See bug 36355: https://bugs.llvm.org/show_bug.cgi?id=36355

Differential Revision: https://reviews.llvm.org/D44481

Reviewers: artem.tamazov, arsenm
llvm-svn: 327720
2018-03-16 15:40:27 +00:00
Mark Searles c3c02bde73 [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so its score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434

llvm-svn: 327583
2018-03-14 22:04:32 +00:00
Dmitry Preobrazhensky d98c97b4f9 [AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction
See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558

Differential Revision: https://reviews.llvm.org/D43950

Reviewers: artem.tamazov, arsenm
llvm-svn: 327299
2018-03-12 17:29:24 +00:00
Yaxun Liu a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
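The naming scheme stated in r327291 above (first the bare name, then a `.n` suffix with n starting at 1) is simple enough to sketch. `EnqueuedKernelNamer` is a hypothetical helper for this illustration; the commit message does not show how the pass actually generates or uniquifies the names.

```cpp
#include <cassert>
#include <string>

// Hands out names for unnamed enqueued kernels: the first one gets the bare
// base name, later ones get ".1", ".2", and so on.
class EnqueuedKernelNamer {
  unsigned Count = 0;

public:
  std::string next() {
    std::string Name = "__amdgpu_enqueued_kernel";
    if (Count != 0)
      Name += "." + std::to_string(Count);
    ++Count;
    return Name;
  }
};
```

Calling `next()` three times on one namer yields `__amdgpu_enqueued_kernel`, `__amdgpu_enqueued_kernel.1`, and `__amdgpu_enqueued_kernel.2`.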
Dmitry Preobrazhensky da4a7c01bf [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
2018-03-12 15:03:34 +00:00
Matt Arsenault 7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault 503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Michael Bedy 80cf9ff564 Test commit - change comment slightly.
llvm-svn: 327234
2018-03-11 03:27:50 +00:00
Matt Arsenault cbda7ff4ae AMDGPU: Fix crash when constant folding with physreg operand
llvm-svn: 327209
2018-03-10 16:05:35 +00:00
Nirav Dave 042678bd55 Revert: r327172 "Correct load-op-store cycle detection analysis"
r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

Reverting patch as NodeId invariant change is causing pathological
increases in compile time on PPC

llvm-svn: 327197
2018-03-10 02:16:15 +00:00
Nirav Dave 071699bf82 [DAG] Enforce stricter NodeId invariant during Instruction selection
Instruction Selection makes use of the topological ordering of nodes
by node id (a node's operands have smaller node id than it) when doing
cycle detection.  During selection we may violate this property as a
selection of multiple nodes may induce a use dependence (and thus a
node id restriction) between two unrelated nodes. If a selected node
has an unselected successor this may allow us to miss a cycle when
detecting an invalid selection.

This patch fixes this by giving all unselected successors of a
selected node a negated node id.  We avoid pruning on such negative
ids but can still reconstruct the original id for pruning.

In-tree targets have been updated to replace DAG-level replacements
with ISel-level ones which enforce this property.

This preemptively fixes PR36312 before triggering commit r324359 relands

Reviewers: craig.topper, bogner, jyknight

Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43198

llvm-svn: 327170
2018-03-09 20:57:15 +00:00
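The "negated node id that can still be reconstructed" idea from r327170 above can be sketched as a reversible encoding. The functions below are hypothetical, and the particular mapping `-(Id + 2)` is an assumption chosen for this illustration so that every non-negative id round-trips while -1 stays free as an "unassigned" marker; the commit message does not state the exact encoding the patch uses.

```cpp
#include <cassert>

// Mark a node id as invalidated by mapping it into the range below -1.
// Valid ids are assumed to be >= 0; -1 is reserved for "no id assigned".
constexpr int invalidateNodeId(int Id) { return -(Id + 2); }

// An id is in the invalidated range exactly when it is below -1.
constexpr bool isInvalidated(int Id) { return Id < -1; }

// Recover the original id, leaving ordinary ids untouched.
constexpr int originalNodeId(int Id) {
  return isInvalidated(Id) ? -(Id + 2) : Id;
}

static_assert(invalidateNodeId(0) == -2, "id 0 maps below the -1 marker");
static_assert(originalNodeId(invalidateNodeId(0)) == 0, "round trip for 0");
static_assert(originalNodeId(invalidateNodeId(41)) == 41, "round trip");
static_assert(!isInvalidated(-1) && !isInvalidated(7), "plain ids unchanged");
```

A pruning check would then skip the ordering shortcut whenever `isInvalidated` is true, and use `originalNodeId` where the topological position is still needed.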
Farhana Aleen a7cb31123c [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from GCN 2nd generation, the ISA supports ds_read_b128 on top of ds_read_b64.
         This patch supports the ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widens the vector length so that the vectorizer generates
         128-bit loads for the local address space, which get translated to ds_read_b128.
         Since the performance benefit is not clear, the compiler generates ds_read_b128 only under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

llvm-svn: 327153
2018-03-09 17:41:39 +00:00
Stanislav Mekhanoshin c8127fc674 [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D44279

llvm-svn: 327106
2018-03-09 07:21:43 +00:00
Matt Arsenault c3fe46bbcf AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo
These are the parameters x86 already uses.

llvm-svn: 327020
2018-03-08 16:24:16 +00:00
Farhana Aleen 89196642f7 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

llvm-svn: 326910
2018-03-07 17:09:18 +00:00
Farhana Aleen 347d12b4ce Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

llvm-svn: 326907
2018-03-07 16:55:27 +00:00
Farhana Aleen 0d03d0588d [AMDGPU] Widened vector length for global/constant address space.
llvm-svn: 326904
2018-03-07 16:29:05 +00:00
Craig Topper 80d3bb3b4b [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC
The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have.

There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run.

A function called isAfterLegalizeVectorOps should imply it returns true in either of the DAG combines that run after the legalize vector ops stage, but that's not what this function does.

llvm-svn: 326832
2018-03-06 19:44:52 +00:00
Stanislav Mekhanoshin 0f72225433 [AMDGPU] Add default ISA version targets
If -mattr is used to modify feature set bits in an llvm-mc call,
getIsaVersion can fail to identify the specific ISA due to a test mismatch.
Add default fallback tests which will always correctly report at
least the major version.

Differential Revision: https://reviews.llvm.org/D44163

llvm-svn: 326825
2018-03-06 18:33:55 +00:00
Yaxun Liu 46439e8d4a [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785

llvm-svn: 326806
2018-03-06 16:04:39 +00:00
Matt Arsenault e31ab94e97 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
llvm-svn: 326715
2018-03-05 16:25:18 +00:00
Matt Arsenault 71272e6d4e AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

llvm-svn: 326714
2018-03-05 16:25:15 +00:00
Matt Arsenault 4cc0b85276 AMDGPU: Fix build warning about override
llvm-svn: 326713
2018-03-05 16:25:10 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Matt Arsenault b9699c009d AMDGPU/GlobalISel: InstrMapping for G_ZEXT
llvm-svn: 326589
2018-03-02 16:55:37 +00:00
Matt Arsenault 1c1aab99ae AMDGPU/GlobalISel: InstrMapping for G_TRUNC
llvm-svn: 326588
2018-03-02 16:55:33 +00:00
Matt Arsenault ef8db767d7 AMDGPU/GlobalISel: Define InstrMappings for G_FCMP
Patch by Tom Stellard

llvm-svn: 326587
2018-03-02 16:53:15 +00:00
Matt Arsenault 2607dc60de AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum
Patch by Tom Stellard

llvm-svn: 326586
2018-03-02 16:40:17 +00:00
Matt Arsenault b46c191c49 AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum
Patch by Tom Stellard

llvm-svn: 326567
2018-03-02 12:23:00 +00:00
Jan Vesely b283ea0f0f AMDGPU/GCN: Promote i16 ctpop
i16 capable ASICs do not support i16 operands for this instruction.
Add tablegen pattern to merge chained i16 additions.

Differential Revision: https://reviews.llvm.org/D43985

llvm-svn: 326535
2018-03-02 02:50:22 +00:00
Matt Arsenault 41d2e3d98e AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI
Patch by Tom Stellard

llvm-svn: 326534
2018-03-02 02:19:16 +00:00
Matt Arsenault b23041ad4d AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI
Patch by Tom Stellard

llvm-svn: 326533
2018-03-02 02:19:11 +00:00
Matt Arsenault 327d5fb2e5 AMDGPU/GlobalISel: Define instruction mapping for G_FMUL
llvm-svn: 326532
2018-03-02 02:17:01 +00:00
Matt Arsenault 5a9e834eac AMDGPU/GlobalISel: Define instruction mapping for G_FADD
Patch by Tom Stellard

llvm-svn: 326526
2018-03-02 01:22:13 +00:00
Matt Arsenault d99317f1b3 AMDGPU/GlobalISel: Define instruction mapping for G_SHL
Patch by Tom Stellard

llvm-svn: 326525
2018-03-02 01:22:10 +00:00
Matt Arsenault 3c7a123ccc AMDGPU/GlobalISel: Define instruction mapping for G_XOR
llvm-svn: 326524
2018-03-02 01:22:06 +00:00
Matt Arsenault c0f34c9e36 AMDGPU/GlobalISel: Define instruction mapping for G_AND
Patch by Tom Stellard

llvm-svn: 326523
2018-03-02 01:22:01 +00:00
Matt Arsenault 364f12e8f9 AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz
Patch by Tom Stellard

llvm-svn: 326490
2018-03-01 21:25:30 +00:00
Matt Arsenault 5320ee4a05 AMDGPU/GlobalISel: Define instruction mapping for G_OR
Patch by Tom Stellard

llvm-svn: 326489
2018-03-01 21:25:25 +00:00
Matt Arsenault e65404f5c5 AMDGPU/GlobalISel: Remove default register mapping
This crashes for some opcodes, which prevents the SelectionDAG
fallback from working.

Patch by Tom Stellard

llvm-svn: 326487
2018-03-01 21:20:44 +00:00
Matt Arsenault 1422a19a88 AMDGPU/GlobalISel: Use a more correct getValueMapping
This was finding the wrong size registers for anything with
more than 2 components.

Patch by Tom Stellard

llvm-svn: 326483
2018-03-01 21:08:51 +00:00
Matt Arsenault 62669ede94 AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST
Patch by Tom Stellard

llvm-svn: 326482
2018-03-01 20:59:44 +00:00
Matt Arsenault 0529a8e2de AMDGPU/GlobalISel: Mark i32->i64 zext as legal
llvm-svn: 326481
2018-03-01 20:56:21 +00:00
Matt Arsenault 36b99e1937 AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr
Patch by Tom Stellard

llvm-svn: 326479
2018-03-01 20:40:55 +00:00
Matt Arsenault 8931bbf8df AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp
Patch by Tom Stellard

llvm-svn: 326477
2018-03-01 20:24:37 +00:00