Commit Graph

166 Commits

Author SHA1 Message Date
Jay Foad 4f87d30a06 [AMDGPU] Introduce and use isGFX10Plus. NFC.
It's more future-proof to use isGFX10Plus from the start, on the
assumption that future architectures will be based on current
architectures.

Also make use of the existing isGFX9Plus in a few places.

Differential Revision: https://reviews.llvm.org/D92092
2020-11-26 09:02:36 +00:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Sebastian Neubauer 76313288cd [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
Konstantin Zhuravlyov 3fdf3b1539 AMDGPU: Update AMDHSA code object version handling
Differential Revision: https://reviews.llvm.org/D89076
2020-10-14 13:04:27 -04:00
Stanislav Mekhanoshin acce6b6082 [AMDGPU] Create isGFX9Plus utility function
Introduce a utility function to make it more
convenient to write code that is the same on
the GFX9 and GFX10 subtargets.

Use isGFX9Plus in the AsmParser for AMDGPU.

Authored By: Joe_Nash

Differential Revision: https://reviews.llvm.org/D88908
2020-10-06 10:18:43 -07:00
dfukalov c259d3a061 [AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Stanislav Mekhanoshin b7760c3e5d [AMDGPU] Remove unsound dependency on ISA version in waitcnt
Differential Revision: https://reviews.llvm.org/D86566
2020-08-25 14:01:42 -07:00
Dmitry Preobrazhensky 6b8948922c [AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier
Currently supported LLVM MTBUF syntax is shown below. It is not compatible with SP3.

    op     dst, addr, rsrc, FORMAT, soffset

This change adds support for SP3 syntax:

    op     dst, addr, rsrc, soffset SP3FORMAT

In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.

format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>,<numeric-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]

The last syntax variant is supported for GFX10 only.

See llvm bug 37738

Reviewers: arsenm, rampitec, vpykhtin

Differential Revision: https://reviews.llvm.org/D84026
2020-07-24 16:41:03 +03:00
Dmitry Preobrazhensky e122eba185 [AMDGPU][MC] Corrected MTBUF parsing and decoding
MTBUF implementation has many issues and this change addresses most of these:
- refactored duplicated code;
- hardcoded constants moved out of high-level code;
- fixed a decoding error when nfmt or dfmt are zero (bug 36932);
- corrected parsing of operand separators (bug 46403);
- corrected handling of missing operands (bug 46404);
- corrected handling of out-of-range modifiers (bug 46421);
- corrected default value (bug 46467).

Reviewers: arsenm, rampitec, vpykhtin, artem.tamazov, kzhuravl

Differential Revision: https://reviews.llvm.org/D83760
2020-07-15 19:46:00 +03:00
Guillaume Chatelet 52911428ef [Alignment][NFC] Migrate AMDGPU backend to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82743
2020-06-29 11:56:06 +00:00
Matt Arsenault 5f5f566b26 AMDGPU: Don't use 16-bit FP inline constants in integer operands
It seems to be a hardware defect that the half inline constants do not
work as expected for the 16-bit integer operations (the inverse does
work correctly). Experimentation seems to show these are really
reading the 32-bit inline constants, which can be observed by writing
inline asm using op_sel to see what's in the high half of the
constant. Theoretically we could fold the high halves of the 32-bit
constants using op_sel.

The *_asm_all.s MC tests are broken, and I don't know where the script
to autogenerate these are. I started manually fixing it, but there's
just too many cases to fix. This also does break the
assembler/disassembler support for these values, and I'm not sure what
to do about it. These are still valid encodings, so it seems like you
should be able to use them in some way. If you wrote assembly using
them, you could have really meant it (perhaps to read the high bits
with op_sel?). The disassembler will print the invalid literal
constant which will fail to re-assemble. The behavior is also
different depending on the use context. Consider this example, which
was previously accepted and encoded using the inline constant:

  v_mad_i16 v5, v1, -4.0, v3
  ; encoding: [0x05,0x00,0xec,0xd1,0x01,0xef,0x0d,0x04]

In contexts where an inline immediate is required (such as on gfx8/9),
this will now be rejected. For gfx10, this will produce the literal
encoding and change the printed format:
  v_mad_i16 v5, v1, 0xc400, v3
  ; encoding: [0x05,0x00,0x5e,0xd7,0x01,0xff,0x0d,0x04,0x00,0xc4,0x00,0x00]

This is just another variation of the issue that we don't perfectly
handle round trip assembly/disassembly due to not tracking how
immediates were encoded. This doesn't matter much in practice, since
compilers don't emit the suboptimal encoding. I doubt any users are
relying on this behavior (although I did make use of the old behavior
to figure out what was wrong).

Fixes bug 46302.
2020-06-17 19:14:10 -04:00
Stanislav Mekhanoshin 9ee272f13d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Sebastian Neubauer 29a6ad94fd [AMDGPU] Add G16 support to image instructions
Add G16 feature for GFX10 and support A16 and G16 in GlobalISel.

Differential Revision: https://reviews.llvm.org/D76836
2020-06-12 11:26:31 +02:00
Dmitry Preobrazhensky b087b91c91 [AMDGPU][CODEGEN] Added 'A' constraint for inline assembler
Summary: 'A' constraint requires an immediate int or fp constant that can be inlined in an instruction encoding.

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D78494
2020-05-25 14:23:34 +03:00
Dmitry Preobrazhensky 5998baccb9 [AMDGPU][MC][GFX9+] Enabled 21-bit signed offsets for SMEM instructions
Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D79288
2020-05-06 14:13:10 +03:00
Simon Pilgrim 43d6f9a876 AMDGPU/Utils - cleanup include and forward declarations. NFC.
Remove unused includes + forward declarations.
Reduce unnecessary StringRef.h includes to StringRef forward declaration.
2020-04-26 12:12:21 +01:00
Matt Arsenault 156afb2253 AMDGPU: Fix inlining logic for denormals
This was backwards from intended and missing a test. We perhaps should
just ignored the FP mode here, since it shouldn't be legal to mix code
with different default modes in the absence of strictfp.
2020-04-23 15:30:48 -04:00
Austin Kerbow 30f18ed387 [AMDGPU] Handle SMRD signed offset immediate
Summary:
This fixes a few issues related to SMRD offsets. On gfx9 and gfx10 we have a
signed byte offset immediate, however we can overflow into a negative since we
treat it as unsigned.

Also, the SMRD SOFFSET sgpr is an unsigned offset on all subtargets. We
sometimes tried to use negative values here.

Third, S_BUFFER instructions should never use a signed offset immediate.

Differential Revision: https://reviews.llvm.org/D77082
2020-04-02 17:41:52 -07:00
Matt Arsenault 5660bb6bc9 AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Matt Arsenault 75cf30918f AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
2020-04-02 17:17:12 -04:00
Jay Foad 0fe096c4e9 [AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup
Summary: I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76860
2020-03-26 20:21:04 +00:00
Jay Foad bb9c4fd7ea [AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup.
Summary:
These methods were identical. I chose to remove getMaxWavesPerCU because
I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76859
2020-03-26 20:21:04 +00:00
Jay Foad 596bed3fd3 [AMDGPU] Remove unused methods. NFC. 2020-03-26 16:11:35 +00:00
Sebastian Neubauer 8756869170 [AMDGPU] Add a16 feature to gfx10
Based on D72931

This adds a new feature called A16 which is enabled for gfx10.
gfx9 keeps the R128A16 feature so it can share all the instruction encodings
with gfx7/8.

Differential Revision: https://reviews.llvm.org/D73956
2020-02-10 09:04:23 +01:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Matt Arsenault 0426c2d07d Reapply "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 6a4acb9d80.
2020-01-31 06:01:28 -08:00
Matt Arsenault 6a4acb9d80 Revert "AMDGPU: Cleanup and fix SMRD offset handling"
This reverts commit 17dbc6611d.

A test is failing on some bots
2020-01-30 15:39:51 -08:00
Matt Arsenault 17dbc6611d AMDGPU: Cleanup and fix SMRD offset handling
I believe this also fixes bugs with CI 32-bit handling, which was
incorrectly skipping offsets that look like signed 32-bit values. Also
validate the offsets are dword aligned before folding.
2020-01-30 15:04:21 -08:00
Piotr Sobczak 4a801170f3 [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/stores
Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69794
2019-11-20 22:59:30 +01:00
Matt Arsenault db0ed3e429 AMDGPU: Refactor treatment of denormal mode
Start moving towards treating this as a property of the calling
convention, and not the subtarget. The default denormal mode should
not be part of the subtarget, and be moved into a separate function
attribute.

This patch is still NFC. The denormal mode remains as a subtarget
feature for now, but make the necessary changes to switch to using an
attribute.
2019-11-19 19:55:43 +05:30
Matt Arsenault 19e7f8a21d AMDGPU: Add default denormal mode to MachineFunctionInfo
The default FP mode should really be a property of a specific
function, and not a subtarget. Introduce the necessary fields to the
SIMachineFunctionInfo to help move towards this goal.
2019-11-01 00:03:39 -07:00
Piotr Sobczak 265e94e657 [AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

llvm-svn: 373491
2019-10-02 17:22:36 +00:00
Matt Arsenault cfdc2b9bd9 AMDGPU: Disambiguate v3f16 format in load/store tables
Currently the searchable tables report the number of dwords. These
round to the same number for 3 and 4 component d16
instructions. Change this to report the number of elements so this
isn't ambiguous.

llvm-svn: 369202
2019-08-18 00:20:43 +00:00
Stanislav Mekhanoshin 7b5a54e369 [AMDGPU] Fixed occupancy calculation for gfx10
Differential Revision: https://reviews.llvm.org/D65010

llvm-svn: 366616
2019-07-19 21:29:51 +00:00
Stanislav Mekhanoshin 50d7f46460 [AMDGPU] gfx908 mAI instructions, MC part
Differential Revision: https://reviews.llvm.org/D64446

llvm-svn: 365563
2019-07-09 21:43:09 +00:00
Dmitry Preobrazhensky 2eff0318c6 [AMDGPU][MC] Corrected parsing of FLAT offset modifier
Summary of changes:

- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved errors handling.

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D64244

llvm-svn: 365321
2019-07-08 14:27:37 +00:00
Dmitry Preobrazhensky e1eb25ff3e [AMDGPU][MC] Fix 2 for sanitizer failure in 364645
llvm-svn: 364656
2019-06-28 16:28:46 +00:00
Dmitry Preobrazhensky 1d572ce395 [AMDGPU][MC] Enabled constant expressions as operands of sendmsg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62735

llvm-svn: 364645
2019-06-28 14:14:02 +00:00
Ryan Taylor 9ab812d475 [AMDGPU] Fix for branch offset hardware workaround
Summary:
This fixes a hardware bug that makes a branch offset of 0x3f unsafe.
This replaces the 32 bit branch with offset 0x3f to a 64 bit
instruction that includes the same 32 bit branch and the encoding
for a s_nop 0 to follow. The relaxer than modifies the offsets
accordingly.

Change-Id: I10b7aed99d651f8159401b01bb421f105fa6288e

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63494

llvm-svn: 364451
2019-06-26 17:34:57 +00:00
Stanislav Mekhanoshin 8bcc9bb595 [AMDGPU] gfx1010 base changes for wave32
Differential Revision: https://reviews.llvm.org/D63293

llvm-svn: 363299
2019-06-13 19:18:29 +00:00
Dmitry Preobrazhensky 1fca3b1972 [AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61125

llvm-svn: 363255
2019-06-13 12:46:37 +00:00
Piotr Sobczak 9b11e93d90 [AMDGPU] Optimize image_[load|store]_mip
Summary:
Replace image_load_mip/image_store_mip
with image_load/image_store if lod is 0.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63073

llvm-svn: 362957
2019-06-10 15:58:51 +00:00
Stanislav Mekhanoshin 692560dc98 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

llvm-svn: 359698
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin 956b0be72e [AMDGPU] gfx1010 utility functions
Differential Revision: https://reviews.llvm.org/D61094

llvm-svn: 359224
2019-04-25 18:53:41 +00:00
Stanislav Mekhanoshin cee607e414 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

llvm-svn: 359113
2019-04-24 17:03:15 +00:00
Matt Arsenault 055e4dce45 AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

llvm-svn: 357302
2019-03-29 19:14:54 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work-around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Work around for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Neil Henning 76504a4c5e [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.

Differential Revision: https://reviews.llvm.org/D54042

llvm-svn: 348937
2018-12-12 16:15:21 +00:00
Nicolai Haehnle a7b00058e0 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050
2018-11-30 22:55:38 +00:00
David Stuttard c6603861d8 Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

llvm-svn: 347911
2018-11-29 20:14:17 +00:00
David Stuttard de02e4b1cc Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda
llvm-svn: 347871
2018-11-29 15:21:13 +00:00
Nicolai Haehnle 1a94cbb3f5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

llvm-svn: 347848
2018-11-29 11:06:06 +00:00
Nicolai Haehnle bc233f5523 Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"
This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=108611.

llvm-svn: 346364
2018-11-07 21:53:43 +00:00
Konstantin Zhuravlyov 108927b944 AMDGPU: Add sram-ecc feature
Differential Revision: https://reviews.llvm.org/D53222

llvm-svn: 346177
2018-11-05 22:44:19 +00:00
Nicolai Haehnle c4a2ff0950 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 344696
2018-10-17 15:37:30 +00:00
Konstantin Zhuravlyov 71e43ee47d AMDGPU: Re-apply r341982 after fixing the layering issue
Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

llvm-svn: 342069
2018-09-12 18:50:47 +00:00
Ilya Biryukov 95066496d0 Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."
This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

llvm-svn: 342023
2018-09-12 07:05:30 +00:00
Konstantin Zhuravlyov 941615e4c8 AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

Differential Revision: https://reviews.llvm.org/D51890

llvm-svn: 341982
2018-09-11 18:56:51 +00:00
Tim Renouf 4f703f5e11 [AMDGPU] New buffer intrinsics
Summary:
This commit adds new intrinsics
  llvm.amdgcn.raw.buffer.load
  llvm.amdgcn.raw.buffer.load.format
  llvm.amdgcn.raw.buffer.load.format.d16
  llvm.amdgcn.struct.buffer.load
  llvm.amdgcn.struct.buffer.load.format
  llvm.amdgcn.struct.buffer.load.format.d16
  llvm.amdgcn.raw.buffer.store
  llvm.amdgcn.raw.buffer.store.format
  llvm.amdgcn.raw.buffer.store.format.d16
  llvm.amdgcn.struct.buffer.store
  llvm.amdgcn.struct.buffer.store.format
  llvm.amdgcn.struct.buffer.store.format.d16
  llvm.amdgcn.raw.buffer.atomic.*
  llvm.amdgcn.struct.buffer.atomic.*

with the following changes from the llvm.amdgcn.buffer.*
intrinsics:

* there are separate raw and struct versions: raw does not have an
  index arg and sets idxen=0 in the instruction, and struct always sets
  idxen=1 in the instruction even if the index is 0, to allow for the
  fact that gfx9 does bounds checking differently depending on whether
  idxen is set;

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
  included in bounds checking and swizzling, to be split between the
  instruction's voffset and immoffset fields, and one for the offset
  that is excluded from bounds checking and swizzling, to go into the
  instruction's soffset field.

The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.

The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50306

Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
2018-08-21 11:07:10 +00:00
Ryan Taylor 894c8fd0e2 [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

llvm-svn: 338523
2018-08-01 12:12:01 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Scott Linder 1e8c2c705d [AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
  values).
* Provide path for backwards compatibility, even with underlying descriptor
  changes.

Differential Revision: https://reviews.llvm.org/D47736

llvm-svn: 335281
2018-06-21 19:38:56 +00:00
Nicolai Haehnle 7a9c03f484 AMDGPU: Select MIMG instructions manually in SITargetLowering
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6

Reviewers: arsenm, rampitec, rtaylor, tstellar

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48017

llvm-svn: 335228
2018-06-21 13:36:57 +00:00
Nicolai Haehnle 0ab200b6c9 AMDGPU: Refactor MIMG instruction TableGen using generic tables
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.

This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.

Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48016

llvm-svn: 335227
2018-06-21 13:36:44 +00:00
Nicolai Haehnle f267431901 AMDGPU: Turn D16 for MIMG instructions into a regular operand
Summary:
This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.

We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instructions loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.

We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.

The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which choose the correct variant for
other MIMG instruction is extended to cover gather4 as well.

As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).

While we're at it, delete a whole bunch of dead legacy TableGen code.

Change-Id: I89b02c2841c06f95e662541433e597f5d4553978

Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47434

llvm-svn: 335222
2018-06-21 13:36:01 +00:00
Konstantin Zhuravlyov 00f2cb1116 AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments

llvm-svn: 334519
2018-06-12 18:02:46 +00:00
Konstantin Zhuravlyov c72ece6c2c AMDGPU : Recalculate SGPRs when trap handler is supported
Differential Revision: https://reviews.llvm.org/D29911

llvm-svn: 332523
2018-05-16 20:47:48 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Matt Arsenault bcf7bec4b8 AMDGPU: Fix layering issue
Move utility function that depends on codegen.
Fixes build with r324487 reapplied.

llvm-svn: 324746
2018-02-09 16:57:48 +00:00
Dmitry Preobrazhensky 0a1ff464e1 [AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifier
See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154

Differential Revision: https://reviews.llvm.org/D42847

Reviewers: cfang, artem.tamazov, arsenm
llvm-svn: 324237
2018-02-05 14:18:53 +00:00
Dmitry Preobrazhensky e3271aee44 [AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodes
See bugs 36094, 36095:
  https://bugs.llvm.org/show_bug.cgi?id=36094
  https://bugs.llvm.org/show_bug.cgi?id=36095

Differential Revision: https://reviews.llvm.org/D42692

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 324231
2018-02-05 12:45:43 +00:00
Dmitry Preobrazhensky 0b4eb1ead1 [AMDGPU][MC] Added support of 64-bit image atomics
See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998

Differential Revision: https://reviews.llvm.org/D42469

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323534
2018-01-26 15:43:29 +00:00
Dmitry Preobrazhensky 3afbd825a3 [AMDGPU][MC][GFX8][GFX9] Added XNACK_MASK support
See bug 35764: https://bugs.llvm.org/show_bug.cgi?id=35764

Differential Revision: https://reviews.llvm.org/D41614

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322189
2018-01-10 14:22:19 +00:00
Matt Arsenault cad7fa857c AMDGPU: Partially fix disassembly of MIMG instructions
Stores failed to decode at all since they didn't have a
DecoderNamespace set. Loads worked, but did not change
the register width displayed to match the numbmer of
enabled channels.

The number of printed registers for vaddr is still wrong,
but I don't think that's encoded in the instruction so
there's not much we can do about that.

Image atomics are still broken. MIMG is the same
encoding for SI/VI, but the image atomic classes
are split up into encoding specific versions unlike
every other MIMG instruction. They have isAsmParserOnly
set on them for some reason. dmask is also special for
these, so we probably should not have it as an explicit
operand as it is now.

llvm-svn: 320614
2017-12-13 21:07:51 +00:00
Konstantin Zhuravlyov 435151ad75 AMDGPU: Fix set but not used warnings related to AMDGPUAS
Differential Revision: https://reviews.llvm.org/D39499

llvm-svn: 317114
2017-11-01 19:12:38 +00:00
Konstantin Zhuravlyov eda425edd4 AMDGPU: Do not emit deprecated notes for code object v3
Differential Revision: https://reviews.llvm.org/D38749

llvm-svn: 315810
2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov 9c05b2bc3b AMDGPU: Add support for isa version note
- Emit NT_AMD_AMDGPU_ISA
  - Add assembler parsing for isa version directive
    - If isa version directive does not match command line arguments, then return error

Differential Revision: https://reviews.llvm.org/D38748

llvm-svn: 315808
2017-10-14 15:40:33 +00:00
Matt Arsenault 894e53d6ac AMDGPU: Fix using SMRD instructions for argument loads in functions
These are not actually uniform values except in kernels.

llvm-svn: 309172
2017-07-26 20:39:42 +00:00
Dmitry Preobrazhensky dc4ac823ec [AMDGPU][MC] Corrected V_*QSAD* instructions to check that dest register is different than any of the src
See Bug 33279: https://bugs.llvm.org//show_bug.cgi?id=33279

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D34003

llvm-svn: 305915
2017-06-21 14:41:34 +00:00
Konstantin Zhuravlyov 5b0bf2ff0d AMDGPU: Remove deprecated and unused elf definitions
Differential Revision: https://reviews.llvm.org/D33689

llvm-svn: 304737
2017-06-05 21:33:40 +00:00
Sam Kolton f7659d71eb [AMDGPU] SDWA: Add assembler support for GFX9
Summary:
Added separate pseudo and real instruction for GFX9 SDWA instructions.
Currently supports only in assembler.
Depends D32493

Reviewers: vpykhtin, artem.tamazov

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D33132

llvm-svn: 303620
2017-05-23 10:08:55 +00:00
Matt Arsenault 2b1f9aa577 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

llvm-svn: 303308
2017-05-17 21:56:25 +00:00
Matt Arsenault efa9f4b210 AMDGPU: Refactor SIMachineFunctionInfo slightly
Prepare for handling non-entry functions.

llvm-svn: 299999
2017-04-11 22:29:28 +00:00
Matt Arsenault e622dc3803 AMDGPU: Refactor argument lowering
Split into smaller functions and prepare for handling
non-entry functions.

llvm-svn: 299998
2017-04-11 22:29:24 +00:00
Matt Arsenault 678e111e11 AMDGPU: Fix crash when disassembling VOP3 mac
The unused dummy src2_modifiers is missing, so it crashes
when trying to print it.

I tried to fully remove src2_modifiers, but there are some
irritations in the places where it is converted to mad since
it starts to require modifying use lists while iterating over
them.

llvm-svn: 299861
2017-04-10 17:58:06 +00:00
Yaxun Liu 1a14bfa022 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846
2017-03-27 14:04:01 +00:00
Dmitry Preobrazhensky 03880f8d24 [AMDGPU][MC] Fix for Bug 30829 + LIT tests
Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.

llvm-svn: 296873
2017-03-03 14:31:06 +00:00
Matt Arsenault 9be7b0d485 AMDGPU: Add VOP3P instruction format
Add a few non-VOP3P but instructions related to packed.

Includes hack with dummy operands for the benefit of the assembler

llvm-svn: 296368
2017-02-27 18:49:11 +00:00
Matt Arsenault e823d92f7f AMDGPU: Merge initial gfx9 support
llvm-svn: 295554
2017-02-18 18:29:53 +00:00
Eugene Zelenko d96089b248 [MC] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
Same changes in files affected by reduced MC headers dependencies.

llvm-svn: 295009
2017-02-14 00:33:36 +00:00
Konstantin Zhuravlyov 9f89ede107 [AMDGPU] Add target information that is required by tools to metadata
Differential Revision: https://reviews.llvm.org/D28760#fb670e28

llvm-svn: 294449
2017-02-08 14:05:23 +00:00
Tom Stellard 08efb7ebf6 AMDGPU/SI: Move some ISel helpers into utils so they can be shared with GISel
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D29068

llvm-svn: 293321
2017-01-27 18:41:14 +00:00
Matt Arsenault 4bd7236193 AMDGPU: Fix handling of 16-bit immediates
Since 32-bit instructions with 32-bit input immediate behavior
are used to materialize 16-bit constants in 32-bit registers
for 16-bit instructions, determining the legality based
on the size is incorrect. Change operands to have the size
specified in the type.

Also adds a workaround for a disassembler bug that
produces an immediate MCOperand for an operand that
is supposed to be OPERAND_REGISTER.

The assembler appears to accept out of bounds immediates and
truncates them, but this seems to be an issue for 32-bit
already.

llvm-svn: 289306
2016-12-10 00:39:12 +00:00
Matt Arsenault 26faed3960 AMDGPU: Consolidate inline immediate predicate functions
llvm-svn: 288718
2016-12-05 22:26:17 +00:00
Tom Stellard b133fbb9a4 AMDGPU/SI: Handle hazard with > 8 byte VMEM stores
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25577

llvm-svn: 285359
2016-10-27 23:05:31 +00:00
Konstantin Zhuravlyov 08326b6256 [AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only)
Differential Revision: https://reviews.llvm.org/D25693

llvm-svn: 284759
2016-10-20 18:12:38 +00:00
Krzysztof Parzyszek c87155037b [AMDGPU] Stop using MCRegisterClass::getSize()
Differential Review: https://reviews.llvm.org/D24675

llvm-svn: 284619
2016-10-19 17:40:36 +00:00
Konstantin Zhuravlyov cdd4547607 [AMDGPU] Refactor waitcnt encoding
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)

Differential Revision: https://reviews.llvm.org/D25298

llvm-svn: 283919
2016-10-11 18:58:22 +00:00