Commit Graph

269 Commits

Author SHA1 Message Date
Nicolai Haehnle 283b995097 AMDGPU: Fix getInstSizeInBytes
Summary:
Add some optional code to validate getInstSizeInBytes for emitted
instructions. This flushed out some issues which are fixed by this
patch:

- Streamline getInstSizeInBytes
- Properly define the VI readlane/writelane instruction as VOP3
- Fix the inline constant determination. Specifically, this change
  fixes an issue where a 32-bit value of 0xffffffff was recorded
  as unsigned. This is equal to -1 when restricting to a 32-bit
  comparison, and an inline constant can be used.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D50629

Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
llvm-svn: 340903
2018-08-29 07:46:09 +00:00
Matt Arsenault de6c421cc8 AMDGPU: Shrink insts to fold immediates
This needs to be done in the SSA fold operands
pass to be effective, so there is a bit of overlap
with SIShrinkInstructions but I don't think this
is practically avoidable.

llvm-svn: 340859
2018-08-28 18:34:24 +00:00
Matt Arsenault 35b1902bce AMDGPU: Move canShrink into TII
llvm-svn: 340855
2018-08-28 18:22:34 +00:00
Tim Renouf 904343f879 [AMDGPU] Add support for multi-dword s.buffer.load intrinsic
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.

This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.

Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.

This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.

V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
    tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51098

Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
2018-08-25 14:53:17 +00:00
Marcello Maggioni 5ca4128b45 [PSV] Update API to be able to use TargetCustom without UB.
getTargetCustom() requires values for "Kind" in the constructor
that are not in the PSVKind enum. Passing a value that is not inside
an enum as an argument to a constructor of the type of the enum is
UB. Changing to the underlying type of the enum would solve the UB

Differential Revision: https://reviews.llvm.org/D50909

llvm-svn: 340200
2018-08-20 19:23:45 +00:00
Chandler Carruth c73c0307fe [MI] Change the array of `MachineMemOperand` pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

llvm-svn: 339940
2018-08-16 21:30:05 +00:00
Nicolai Haehnle 7f0d05d532 AMDGPU: Force skip over s_sendmsg and exp instructions
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.

Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48431

Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44
llvm-svn: 338235
2018-07-30 09:23:59 +00:00
Matt Arsenault 0183c56c11 AMDGPU: Fix code size for return_to_epilog pseudo
llvm-svn: 338113
2018-07-27 09:15:03 +00:00
Tom Stellard 5bfbae5cb1 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851
2018-07-11 20:59:01 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Stanislav Mekhanoshin 739174c4be [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.

This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.

Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

llvm-svn: 333691
2018-05-31 20:13:51 +00:00
Tom Stellard 44b30b4537 AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

llvm-svn: 332930
2018-05-22 02:03:23 +00:00
Stanislav Mekhanoshin 432936161e [AMDGPU] Added checks for dpp_ctrl value
- Report error for invalid dpp_ctrl values.
- Changed the way it is reported, now the error will be emitted into
  asm and will work with release build as well.
- Added dpp_ctrl value verifier for codegen.
- Added symbolic constants for dpp_ctrl.

Differential Revision: https://reviews.llvm.org/D46565

llvm-svn: 331775
2018-05-08 16:53:02 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Matt Arsenault 0084adc516 AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331215
2018-04-30 19:08:16 +00:00
Stanislav Mekhanoshin a4bfb3c446 [AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bit. After the value shifted right by 16 to use it in a low part
with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline
constant any longer.

Fixed the error and added verification code. Without the fix and with
the verification bug is causing pk_max_f16_literal.ll to fail.

Differential Revision: https://reviews.llvm.org/D45987

llvm-svn: 330752
2018-04-24 18:17:55 +00:00
Matt Arsenault b21f9592be AMDGPU: Move a flawed assert when spilling SGPRs
It's possible to validly spill the frame offset register
in a call sequence to a VGPR. There are definitely issues
with SGPR spilling to memory, so move the assert later.

llvm-svn: 330612
2018-04-23 16:13:30 +00:00
Matt Arsenault adc59d7076 AMDGPU: Assign enum name to stack ID
Also assert that it is correct for SGPRs. There is currently a bug
where stack slot coloring replaces SGPR spill FIs with one with
the default ID, which results in a more confusing assert later
about a dead object.

llvm-svn: 330607
2018-04-23 15:51:26 +00:00
Nicolai Haehnle 7a87977fb2 AMDGPU: Legalize the operand of SI_INIT_M0
Summary:
This fixes a case where the argument to a sendmsg intrinsic
ends up in a VGPR, for whatever reason.

The underlying performance issue is that a multiplication that
can be an s_mul_i32 is instead needlessly generated as
v_mul_u32_u24, but this is not addressed by this patch.

Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45826

llvm-svn: 330393
2018-04-20 07:14:25 +00:00
Stanislav Mekhanoshin 160f85794d [AMDGPU] Use packed literals with zero either lower or hi part
Differential Revision: https://reviews.llvm.org/D45790

llvm-svn: 330365
2018-04-19 21:16:50 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
Matt Arsenault 69932e4d69 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

llvm-svn: 327842
2018-03-19 14:07:15 +00:00
Tim Renouf 2a99fa2c08 [AMDGPU] added writelane intrinsic
Summary:
For use by LLPC SPV_AMD_shader_ballot extension.

The v_writelane instruction was already implemented for use by SGPR
spilling, but I had to add an extra dummy operand tied to the
destination, to represent that all lanes except the selected one keep
the old value of the destination register.

.ll test changes were due to schedule changes caused by that new
operand.

Differential Revision: https://reviews.llvm.org/D42838

llvm-svn: 326353
2018-02-28 19:10:32 +00:00
Marek Olsak 7d92b7e23a AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Author: Bas Nieuwenhuizen

https://reviews.llvm.org/D42881

llvm-svn: 324353
2018-02-06 15:17:55 +00:00
Marek Olsak d4bb329d0e AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

llvm-svn: 323909
2018-01-31 20:18:11 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Sam Kolton 5f7f32c382 [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.
Summary:

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D37817

llvm-svn: 319662
2017-12-04 16:22:32 +00:00
Matt Arsenault 686d5c728f AMDGPU: Use carry-less adds in FI elimination
llvm-svn: 319501
2017-11-30 23:42:30 +00:00
Matt Arsenault 84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Nicolai Haehnle 39980dac0b AMDGPU: Consistently check for immediates in SIInstrInfo::FoldImmediate
Summary:
The PeepholeOptimizer pass calls this function solely based on checking
DefMI->isMoveImmediate(), which only checks the MoveImm bit of the
instruction description. So it's up to FoldImmediate itself to properly
check that DefMI *actually* moves from an immediate.

I don't have a separate test case for this, but the next patch introduces
a test case which happens to crash without this change.

This error is caught by the assertion in MachineOperand::getImm().

Change-Id: I88e7cdbcf54d75e1a296822e6fe5f9a5f095bbf8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40342

llvm-svn: 319155
2017-11-28 08:41:50 +00:00
David Blaikie b3bde2ea50 Fix a bunch more layering of CodeGen headers that are in Target
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).

llvm-svn: 318490
2017-11-17 01:07:10 +00:00
Matt Arsenault 301162c4fe AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Marek Olsak ffadcb744b AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM
Summary:
-5.3% code size in affected shaders.

Changed stats only:

48486 shaders in 30489 tests
Totals:
SGPRS: 2086406 -> 2072430 (-0.67 %)
VGPRS: 1626872 -> 1627960 (0.07 %)
Spilled SGPRs: 7865 -> 7912 (0.60 %)
Code Size: 60978060 -> 60188764 (-1.29 %) bytes
Max Waves: 374530 -> 374342 (-0.05 %)

Totals from affected shaders:
SGPRS: 299664 -> 285688 (-4.66 %)
VGPRS: 233844 -> 234932 (0.47 %)
Spilled SGPRs: 3959 -> 4006 (1.19 %)
Code Size: 14905272 -> 14115976 (-5.30 %) bytes
Max Waves: 46202 -> 46014 (-0.41 %)

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38915

llvm-svn: 317750
2017-11-09 01:52:17 +00:00
Marek Olsak 5914ece6aa AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset
Summary:
Apps that benefit:
- alien isolation
- bioshock infinite
- civilization: beyond earth
- company of heroes 2
- dirt showdown
- dota 2
- F1 2015
- grid autosport
- hitman
- legend of grimrock
- serious sam 3: bfe
- shadow warrior
- talos principle
- total war: warhammer
- UE4 demos: effects cave, elemental, sun temple

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D38914

llvm-svn: 317038
2017-10-31 21:06:42 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Matt Arsenault 9ab1fa6803 AMDGPU: Fix not accounting for instruction size in bundles
These were counted as 0. Fixes branch limit exceeded errors
in some large programs.

llvm-svn: 314944
2017-10-04 22:59:12 +00:00
Nicolai Haehnle ce4ddd06da AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC
The hardware will only forward EXEC_LO; the high 32 bits will be zero.

Additionally, inline constants do not work. At least,

   v_addc_u32_e64 v0, vcc, v0, v1, -1

which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.

The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine

   s_mov_b64 s[0:1], exec
   v_cndmask_b32_e64 v0, v1, v2, s[0:1]

into

   v_mov_b32 v0, v3

but it's not particularly high priority.

Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*

llvm-svn: 314522
2017-09-29 15:37:31 +00:00
Matt Arsenault fdcdd88d57 AMDGPU: Fix crash on immediate operand
We can have a v_mac with an immediate src0.
We can still fold if it's an inline immediate,
otherwise it already uses the constant bus.

llvm-svn: 313852
2017-09-21 00:45:59 +00:00
Konstantin Zhuravlyov ca8946a376 AMDGPU: Start selecting s_xnor_{b32, b64}
Differential Revision: https://reviews.llvm.org/D37981

llvm-svn: 313565
2017-09-18 21:22:45 +00:00
Jan Sjodin 1f2f57a7ea Fix warnings in r313297.
llvm-svn: 313302
2017-09-14 21:49:52 +00:00
Matt Arsenault c317287fde AMDGPU: Fix violating constant bus restriction
You can't use madmk/madmk if it already uses an SGPR input.

llvm-svn: 313298
2017-09-14 20:54:29 +00:00
Jan Sjodin 312ccf761c Add AddresSpace to PseudoSourceValue.
Differential Revision: https://reviews.llvm.org/D35089

llvm-svn: 313297
2017-09-14 20:53:51 +00:00
Matt Arsenault ecb43ef1bc AMDGPU: Don't spill SP reg like a normal CSR
llvm-svn: 313217
2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin 7fe9a5d9b4 Allow target to decide when to cluster loads/stores in misched
MachineScheduler when clustering loads or stores checks if base
pointers point to the same memory. This check is done through
comparison of base registers of two memory instructions. This
works fine when instructions have separate offset operand. If
they require a full calculated pointer such instructions can
never be clustered according to such logic.

Changed shouldClusterMemOps to accept base registers as well and
let it decide what to do about it.

Differential Revision: https://reviews.llvm.org/D37698

llvm-svn: 313208
2017-09-13 22:20:47 +00:00
Stanislav Mekhanoshin 710da42b86 [AMDGPU] Produce madak and madmk from the two-address pass
These two instructions are normally selected, but when the
two address pass converts mac into mad we end up with the
mad where we could have one of these.

Differential Revision: https://reviews.llvm.org/D37389

llvm-svn: 312928
2017-09-11 17:13:57 +00:00
Stanislav Mekhanoshin 949fac9e40 [AMDGPU] Fix shouldClusterMemOps to process flat loads
Flat loads do not have vdata operand but have vdst instead.

Differential Revision: https://reviews.llvm.org/D37502

llvm-svn: 312640
2017-09-06 15:31:30 +00:00
Eugene Zelenko 59e128266c [AMDGPU] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 310328
2017-08-08 00:47:13 +00:00
Connor Abbott 66b9bd6e50 [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34719

llvm-svn: 310088
2017-08-04 18:36:54 +00:00
Connor Abbott 92638ab625 [AMDGPU] Add support for Whole Wavefront Mode
Summary:
Whole Wavefront Wode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level.  This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.

Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.

Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35524

llvm-svn: 310087
2017-08-04 18:36:52 +00:00
Connor Abbott 8c217d0a29 [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.

Reviewers: arsenm, tpr, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35167

llvm-svn: 310085
2017-08-04 18:36:49 +00:00