Commit Graph

295 Commits

Author SHA1 Message Date
Matt Arsenault 3ead7d7389 AMDGPU: Fix packing undef parts of build_vector
llvm-svn: 339511
2018-08-12 08:42:46 +00:00
Matt Arsenault a7dfd48310 AMDGPU: Use SPseudoInst helper
llvm-svn: 338631
2018-08-01 20:49:00 +00:00
Matt Arsenault de496c32a4 AMDGPU: Reduce code size with fcanonicalize (fneg x)
When fcanonicalize is lowered to a mul, we can
use -1.0 for free and avoid the cost of the bigger
encoding for source modifers.

llvm-svn: 338244
2018-07-30 12:16:58 +00:00
Matt Arsenault 0183c56c11 AMDGPU: Fix code size for return_to_epilog pseudo
llvm-svn: 338113
2018-07-27 09:15:03 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Stanislav Mekhanoshin 298a61590a [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16
Differential Revision: https://reviews.llvm.org/D48677

llvm-svn: 335866
2018-06-28 15:24:46 +00:00
Nicolai Haehnle b3a9b68513 AMDGPU: Add implicit def of SCC to kill and indirect pseudos
Summary:
Kill instructions sometimes do use SCC in unusual circumstances, when
v_cmpx cannot be used due to the operands that are involved.

Additionally, even if SCC was never defined by the expansion, kill pseudos
could previously occur between an s_cmp and an s_cbranch_scc, which breaks
the SCC liveness tracking when the pseudo is expanded to split the basic
block. While it would be possible to explicitly mark the SCC as live-in for
the successor basic block, it's simpler to just mark the pseudo as using SCC,
so that such a sequence is never emitted by instruction selection in the
first place.

A similar issue affects indirect source/dest pseudos in principle, although
I haven't been able to come up with a test case where it actually matters
(this affects instruction selection, so a MIR test can't be used).

Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47761

llvm-svn: 335223
2018-06-21 13:36:08 +00:00
Matt Arsenault 5a4ec8127f AMDGPU: Fix scalar_to_vector for v4i16/v4f16
llvm-svn: 335161
2018-06-20 19:45:48 +00:00
Matt Arsenault 02dc7e19e2 AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

llvm-svn: 334835
2018-06-15 15:15:46 +00:00
Matt Arsenault 697300bd4f AMDGPU: Use scalar operations for f16 fabs/fneg patterns
Fixes unnecessary differences between subtargets.

llvm-svn: 334184
2018-06-07 10:15:20 +00:00
Matt Arsenault e9524f1fb3 AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.

Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.

llvm-svn: 334132
2018-06-06 21:28:11 +00:00
Matt Arsenault 606bc315d6 AMDGPU: Fix v2f16 fneg/fabs pattern
The integer operation convertion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.

llvm-svn: 333019
2018-05-22 20:13:34 +00:00
Matt Arsenault 1349a04ef5 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

llvm-svn: 332953
2018-05-22 06:32:10 +00:00
Matt Arsenault 0084adc516 AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331215
2018-04-30 19:08:16 +00:00
Matt Arsenault 8474803c7c AMDGPU: Consolidate SubtargetPredicate definitions
llvm-svn: 330979
2018-04-26 19:21:26 +00:00
Tom Stellard a2be8f4c35 AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsic
Summary: This is no longer used by mesa since its 18.0.0 release.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D45988

llvm-svn: 330775
2018-04-24 21:37:57 +00:00
Dmitry Preobrazhensky 4c8f4234b6 [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751

Differential Revision: https://reviews.llvm.org/D44529

Reviewers: artem.tamazov, arsenm
llvm-svn: 327723
2018-03-16 16:38:04 +00:00
Jan Vesely b283ea0f0f AMDGPU/GCN: Promote i16 ctpop
i16 capable ASICs do not support i16 operands for this instruction.
Add tablegen pattern to merge chained i16 additions.

Differential Revision: https://reviews.llvm.org/D43985

llvm-svn: 326535
2018-03-02 02:50:22 +00:00
Matt Arsenault 0063ce7201 AMDGPU: Remove tied operand from si_else
llvm-svn: 324751
2018-02-09 17:18:38 +00:00
Matt Arsenault a18b3bcf51 AMDGPU: Select BFI patterns with 64-bit ints
llvm-svn: 324431
2018-02-07 00:21:34 +00:00
Matt Arsenault af88f0eb44 AMDGPU: Fix missing SCC def from s_xor_b64_term
llvm-svn: 323927
2018-01-31 22:54:27 +00:00
Matt Arsenault 84445dd13c AMDGPU: Use gfx9 carry-less add/sub instructions
llvm-svn: 319491
2017-11-30 22:51:26 +00:00
Dmitry Preobrazhensky a0342dc9eb [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

llvm-svn: 318675
2017-11-20 18:24:21 +00:00
Matt Arsenault 301162c4fe AMDGPU: Replace i64 add/sub lowering
Use VOP3 add/addc like usual.

This has some tradeoffs. Inline immediates fold
a little better, but other constants are worse off.
SIShrinkInstructions could be made smarter to handle
these cases.

This allows us to avoid selecting scalar adds where we
need to track the carry in scc and replace its users.
This makes it easier to use the carryless VALU adds.

llvm-svn: 318340
2017-11-15 21:51:43 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Matt Arsenault cc85223f87 AMDGPU: Fix incorrect selection of pseudo-branches
These should only be used if the machine structurizer is enabled.

llvm-svn: 315357
2017-10-10 20:22:07 +00:00
Matt Arsenault 90c7593a75 AMDGPU: Remove global isGCN predicates
These are problematic because they apply to everything,
and can easily clobber whatever more specific predicate
you are trying to add to a function.

Currently instructions use SubtargetPredicate/PredicateControl
to apply this to patterns applied to an instruction definition,
but not to free standing Pats. Add a wrapper around Pat
so the special PredicateControls requirements can be appended
to the final predicate list like how Mips does it.

llvm-svn: 314742
2017-10-03 00:06:41 +00:00
Stanislav Mekhanoshin 442e28dd42 [AMDGPU] Use v_pk_max_f16 for fcanonicalize
Differential Revision: https://reviews.llvm.org/D37325

llvm-svn: 312676
2017-09-06 22:27:29 +00:00
Stanislav Mekhanoshin ea134bcb13 [AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize
Differential Revision: https://reviews.llvm.org/D37522

llvm-svn: 312660
2017-09-06 18:29:51 +00:00
Matt Arsenault fe003f347a AMDGPU: Turn int pack pattern into build_vector
build_vector is a more useful canonical form when
pattern matching packed operations, so turn shift
into high element into a build_vector.

Should show no change for now.

llvm-svn: 312282
2017-08-31 21:17:22 +00:00
Stanislav Mekhanoshin 06cab79e50 [AMDGPU] Use v_max_f* for fcanonicalize
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.

Differential Revision: https://reviews.llvm.org/D36856

llvm-svn: 312095
2017-08-30 03:03:38 +00:00
Matt Arsenault 6b114d2c50 AMDGPU: Select clamp pattern with v2f16
llvm-svn: 312087
2017-08-30 01:20:17 +00:00
Dmitry Preobrazhensky ff64aa514b [AMDGPU][MC][GFX9] Added integer clamping support for VOP3 opcodes
See Bug 34152: https://bugs.llvm.org//show_bug.cgi?id=34152

Reviewers: SamWot, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D36674

llvm-svn: 311006
2017-08-16 13:51:56 +00:00
Matt Arsenault 71bcbd451f AMDGPU: Start adding tail call support
Handle the sibling call cases.

llvm-svn: 310753
2017-08-11 20:42:08 +00:00
Connor Abbott 66b9bd6e50 [AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic
Summary:
This intrinsic lets us set inactive lanes to an identity value when
implementing wavefront reductions. In combination with Whole Wavefront
Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan.
Lowering the intrinsic needs to happen post-RA so that RA knows that the
destination isn't completely overwritten due to the EXEC shenanigans, so
we need another pseudo-instruction to represent the un-lowered
intrinsic.

Reviewers: tstellar, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D34719

llvm-svn: 310088
2017-08-04 18:36:54 +00:00
Connor Abbott 92638ab625 [AMDGPU] Add support for Whole Wavefront Mode
Summary:
Whole Wavefront Wode (WWM) is similar to WQM, except that all of the
lanes are always enabled, regardless of control flow. This is required
for implementing wavefront reductions in non-uniform control flow, where
we need to use the inactive lanes to propagate intermediate results, so
they need to be enabled. We need to propagate WWM to uses (unless
they're explicitly marked as exact) so that they also propagate
intermediate results correctly. We do the analysis and exec mask munging
during the WQM pass, since there are interactions with WQM for things
that require both WQM and WWM. For simplicity, WWM is entirely
block-local -- blocks are never WWM on entry or exit of a block, and WWM
is not propagated to the block level.  This means that computations
involving WWM cannot involve control flow, but we only ever plan to use
WWM for a few limited purposes (none of which involve control flow)
anyways.

Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There
isn't yet a way to turn WWM off -- that will be added in a future
change.

Finally, it turns out that turning on inactive lanes causes a number of
problems with register allocation. While the best long-term solution
seems like teaching LLVM's register allocator about predication, for now
we need to add some hacks to prevent ourselves from getting into trouble
due to constraints that aren't currently expressed in LLVM. For the gory
details, see the comments at the top of SIFixWWMLiveness.cpp.

Reviewers: arsenm, nhaehnle, tpr

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D35524

llvm-svn: 310087
2017-08-04 18:36:52 +00:00
Connor Abbott 8c217d0a29 [AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM
Summary:
Previously, we assumed that certain types of instructions needed WQM in
pixel shaders, particularly DS instructions and image sampling
instructions. This was ok because with OpenGL, the assumption was
correct. But we want to start using DPP instructions for derivatives as
well as other things, so the assumption that we can infer whether to use
WQM based on the instruction won't continue to hold. This intrinsic lets
frontends like Mesa indicate what things need WQM based on their
knowledge of the API, rather than second-guessing them in the backend.
We need to keep around the old method of enabling WQM, but eventually we
should remove it once Mesa catches up. For now, this will let us use DPP
instructions for computing derivatives correctly.

Reviewers: arsenm, tpr, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35167

llvm-svn: 310085
2017-08-04 18:36:49 +00:00
Matt Arsenault 6ed7b9bfc0 AMDGPU: Analyze callee resource usage in AsmPrinter
llvm-svn: 309781
2017-08-02 01:31:28 +00:00
Matt Arsenault b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00
Konstantin Zhuravlyov 070d88e335 AMDGPU: Introduce maybeAtomic instruction flag
Testing is in the follow up change

llvm-svn: 308779
2017-07-21 21:05:45 +00:00
Dmitry Preobrazhensky abf2839478 [AMDGPU][MC][GFX9] Added support of VOP3 'op_sel' modifier
See bug 33591: https://bugs.llvm.org//show_bug.cgi?id=33591

Reviewers: vpykhtin, artem.tamazov, SamWot, arsenm

Differential Revision: https://reviews.llvm.org/D35424

llvm-svn: 308740
2017-07-21 13:54:11 +00:00
Matt Arsenault c37fe66ec5 AMDGPU: Add encoding for carryless add/sub instructions
llvm-svn: 308639
2017-07-20 17:42:47 +00:00
Sam Kolton 4685b70a77 [AMDGPU] resubmit r308179: CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
llvm-svn: 308310
2017-07-18 14:23:26 +00:00
Chandler Carruth 9a7442d088 Revert r308179 which causes tablegen to spam stderr on every build.
Original commit log:
[AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions

llvm-svn: 308270
2017-07-18 07:40:47 +00:00
Sam Kolton a2b9e2f755 [AMDGPU] CodeGen: check dst operand type to determine if omod is supported for VOP3 instructions
Summary:
Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod.
Changed .td files to check if dst operand instead of src operand.

Reviewers: arsenm, vpykhtin

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D35350

llvm-svn: 308179
2017-07-17 14:23:38 +00:00
Stanislav Mekhanoshin d445455643 [AMDGPU] Add pattern for v_alignbit_b32 with immediate
If immediate in shift is less than 32 we can use alignbit too.

Differential Revision: https://reviews.llvm.org/D34729

llvm-svn: 306500
2017-06-28 02:52:39 +00:00
Stanislav Mekhanoshin e8bf6c9629 [AMDGPU] Add 2 new alignbit patterns
Differential Revision: https://reviews.llvm.org/D34655

llvm-svn: 306449
2017-06-27 19:10:47 +00:00
Jan Sjodin a06bfe054e Re-submit AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303111
2017-05-15 20:18:37 +00:00
Jan Sjodin 0e289822fa Revert 303091.
llvm-svn: 303098
2017-05-15 18:39:47 +00:00
Jan Sjodin e9d2ddc9dd Add AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303091
2017-05-15 18:13:56 +00:00