Commit Graph

2814 Commits

Author SHA1 Message Date
Tom Stellard 5bfbae5cb1 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851
2018-07-11 20:59:01 +00:00
Konstantin Zhuravlyov bde59e9989 AMDGPU/NFC: Use already available explicit kernarg
size instead of calculating it again when filling
out the metadata.

llvm-svn: 336825
2018-07-11 17:27:17 +00:00
Richard Trieu d5e57ed9c2 Fix -Wmismatched-tags warning
class -> struct in forward declaration.

llvm-svn: 336733
2018-07-10 22:09:33 +00:00
Scott Linder 01ce144ddf [AMDGPU] Fix layering issue with AMDGPUHSAMetadataStreamer (NFC)
llvm-svn: 336722
2018-07-10 20:07:22 +00:00
Scott Linder 2ad2c18b82 [AMDGPU] Refactor HSAMetadataStream::emitKernel (NFC)
Move all metadata construction into AMDGPUHSAMetadataStreamer.

Differential Revision: https://reviews.llvm.org/D48176

llvm-svn: 336707
2018-07-10 17:31:32 +00:00
Konstantin Zhuravlyov f0badd5ac1 AMDGPU: Make hidden argument metadata consistent with
amdgpu-implicitarg-num-bytes attribute

Differential Revision: https://reviews.llvm.org/D49096

llvm-svn: 336697
2018-07-10 16:12:51 +00:00
Matt Arsenault a680199a96 Reapply "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336623

llvm-svn: 336675
2018-07-10 14:03:41 +00:00
Vlad Tsyrklevich 688e752207 Revert "AMDGPU: Force inlining if LDS global address is used"
This reverts commit r336587, it was causing test failures on the
sanitizer bots.

llvm-svn: 336623
2018-07-10 00:46:07 +00:00
Mark Searles 5bfd8d8991 [AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error
Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com>

Fixes the following building error:

external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
                      BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.

Differential Revision: https://reviews.llvm.org/D49089

llvm-svn: 336588
2018-07-09 19:28:14 +00:00
Matt Arsenault 40cb6cab56 AMDGPU: Force inlining if LDS global address is used
These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

llvm-svn: 336587
2018-07-09 19:22:22 +00:00
Tom Stellard ec4feae1b6 AMDGPU: Fix UBSan error caused by r335942
Summary: Fixes PR38071.

Reviewers: arsenm, dstenb

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48979

llvm-svn: 336448
2018-07-06 17:16:17 +00:00
Matt Arsenault 29f303799b AMDGPU/GlobalISel: Implement custom kernel arg lowering
Avoid using allocateKernArg / AssignFn. We do not want any
of the type splitting properties of normal calling convention
lowering.

For now at least this exists alongside the IR argument lowering
pass. This is necessary to handle struct padding correctly while
some arguments are still skipped by the IR argument lowering
pass.

llvm-svn: 336373
2018-07-05 17:01:20 +00:00
Ryan Taylor 5f04458a61 [AMDGPU] Add VALU to V_INTERP Instructions
Wait states are not properly being inserted after buffer_store for v_interp instructions.

Add VALU to V_INTERP instructions so that the GCNHazardRecognizer can
check and insert the appropriate wait states when needed.

Differential Revision: https://reviews.llvm.org/D48772

Change-Id: Id540c9b074fc69b5c1de6b182276aa089c74aa64
llvm-svn: 336339
2018-07-05 12:02:07 +00:00
Piotr Padlewski 5b3db45e8f Implement strip.invariant.group
Summary:
This patch introduce new intrinsic -
strip.invariant.group that was described in the
RFC: Devirtualization v2

Reviewers: rsmith, hfinkel, nlopes, sanjoy, amharc, kuhar

Subscribers: arsenm, nhaehnle, JDevlieghere, hiraditya, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47103

Co-authored-by: Krzysztof Pszeniczny <krzysztof.pszeniczny@gmail.com>
llvm-svn: 336073
2018-07-02 04:49:30 +00:00
Tom Stellard eebbfc2809 AMDGPU/GlobalISel: Make IMPLICIT_DEF of all sizes < 512 legal.
Summary:
We could split sizes that are not power of two into smaller sized
G_IMPLICIT_DEF instructions, but this ends up generating
G_MERGE_VALUES instructions which we then have to handle in the instruction
selector.  Since G_IMPLICIT_DEF is really a no-op it's easier just to
keep everything that can fit into a register legal.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48777

llvm-svn: 336041
2018-06-30 04:09:44 +00:00
Matt Arsenault f5be3ad7f8 AMDGPU: Don't use struct type for argument layout
This was introducing unnecessary padding after the explicit
arguments, depending on the alignment of the total struct type.
Also has the side effect of avoiding creating an extra GEP for
the offset from the base kernel argument to the explicit kernel
argument offset.

llvm-svn: 335999
2018-06-29 17:31:42 +00:00
Stanislav Mekhanoshin 20d4795d93 [AMDGPU] Enable LICM in the BE pipeline
This allows to hoist code portion to compute reciprocal of loop
invariant denominator in integer division after codegen prepare
expansion.

Differential Revision: https://reviews.llvm.org/D48604

llvm-svn: 335988
2018-06-29 16:26:53 +00:00
Tom Stellard c5a154db48 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

llvm-svn: 335942
2018-06-28 23:47:12 +00:00
Stanislav Mekhanoshin 67aa18f165 [AMDGPU] Early expansion of 32 bit udiv/urem
This allows hoisting of a common code, for instance if denominator
is loop invariant. Current change is expansion only, adding licm to
the target pass list going to be a separate patch. Given this patch
changes to codegen are minor as the expansion is similar to that on
DAG. DAG expansion still must remain for R600.

Differential Revision: https://reviews.llvm.org/D48586

llvm-svn: 335868
2018-06-28 15:59:18 +00:00
Stanislav Mekhanoshin 298a61590a [AMDGPU] Overload llvm.amdgcn.fmad.ftz to support f16
Differential Revision: https://reviews.llvm.org/D48677

llvm-svn: 335866
2018-06-28 15:24:46 +00:00
Matt Arsenault 75e7192ba3 AMDGPU: Remove MFI::ABIArgOffset
We have too many mechanisms for tracking the various offsets
used for kernel arguments, so remove one. There's still a lot of
confusion with these because there are two different "implicit"
argument areas located at the beginning and end of the kernarg
segment.

Additionally, the offset was determined based on the memory
size of the split element types. This would break in a future
commit where v3i32 is decomposed into separate i32 pieces.

llvm-svn: 335830
2018-06-28 10:18:55 +00:00
Matt Arsenault 1fb9013368 AMDGPU: Error on calls from graphics shaders
In principle nothing should stop these from working, but
work is necessary to create an ABI for dealing with the stack
related registers.

llvm-svn: 335829
2018-06-28 10:18:36 +00:00
Matt Arsenault 12269dda5c AMDGPU: Fix AMDGPUCodeGenPrepare using uninitialized AMDGPUAS struct
Not sure how this wasn't noticed before.

llvm-svn: 335828
2018-06-28 10:18:23 +00:00
Matt Arsenault 513e0c0ea4 AMDGPU: Fix assert on aggregate type kernel arguments
Just fix the crash for now by not doing the optimization since
figuring out how to properly convert the bits for an arbitrary
struct is a pain.

Also fix a crash when there is only an empty struct argument.

llvm-svn: 335827
2018-06-28 10:18:11 +00:00
Stanislav Mekhanoshin 1a1687f1bb [AMDGPU] Convert rcp to rcp_iflag
If a source of rcp instruction is a result of any conversion from
an integer convert it into rcp_iflag instruction. No FP exception
can ever happen except division by zero if a single precision rcp
argument is a representation of an integral number.

Differential Revision: https://reviews.llvm.org/D48569

llvm-svn: 335742
2018-06-27 15:33:33 +00:00
Konstantin Zhuravlyov 30f03b3bc0 AMDGPU/NFC: Fix typo in comment
llvm-svn: 335707
2018-06-27 05:36:03 +00:00
Konstantin Zhuravlyov 777477705a AMDGPU: Silence unused warnings in waitcnt insertion pass in release build
Differential Revision: https://reviews.llvm.org/D48607

llvm-svn: 335669
2018-06-26 21:33:38 +00:00
Stanislav Mekhanoshin dacda79ee6 [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsic
This intrinsic selects v_mad_f32 regardless of fp32 denorm support.

Differential Revision: https://reviews.llvm.org/D48573

llvm-svn: 335654
2018-06-26 20:04:19 +00:00
Matt Arsenault 8c4a35237a AMDGPU: Add pass to lower kernel arguments to loads
This replaces most argument uses with loads, but for
now not all.

The code in SelectionDAG for calling convention lowering
is actively harmful for amdgpu_kernel. It attempts to
split the argument types into register legal types, which
results in low quality code for arbitary types. Since
all kernel arguments are passed in memory, we just want the
raw types.

I've tried a couple of methods of mitigating this in SelectionDAG,
but it's easier to just bypass this problem alltogether. It's
possible to hack around the problem in the initial lowering,
but the real problem is the DAG then expects to be able to use
CopyToReg/CopyFromReg for uses of the arguments outside the block.

Exposing the argument loads in the IR also has the advantage
that the LoadStoreVectorizer can merge them.

I'm not sure the best approach to dealing with the IR
argument list is. The patch as-is just leaves the IR arguments
in place, so all the existing code will still compute the same
kernarg size and pointlessly lowers the arguments.

Arguably the frontend should emit kernels with an empty argument
list in the first place. Alternatively a dummy array could be
inserted as a single argument just to reserve space.

This does have some disadvantages. Local pointer kernel arguments can
no longer have AssertZext placed  on them as the equivalent !range
metadata is not valid on pointer  typed loads. This is mostly bad
for SI which needs to know about the known bits in order to use the
DS instruction offset, so in this case this is not done.

More importantly, this skips noalias arguments since this pass
does not yet convert this to the equivalent !alias.scope and !noalias
metadata. Producing this metadata correctly seems to be tricky,
although this logically is the same as inlining into a function which
doesn't exist. Additionally, exposing these loads to the vectorizer
may result in degraded aliasing information if a pointer load is
merged with another argument load.

I'm also not entirely sure this is preserving the current clover
ABI, although I would greatly prefer if it would stop widening
arguments and match the HSA ABI. As-is I think it is extending
< 4-byte arguments to 4-bytes but doesn't align them to 4-bytes.

llvm-svn: 335650
2018-06-26 19:10:00 +00:00
Matt Arsenault b1cc4f52ff AMDGPU/GlobalISel: Add support for llvm.amdgcn.kernarg.segment.ptr
Note a normal select test is not currently possible because this
relies on input registers tracked in SIMachineFunctionInfo which
are not currently serializable in MIR, but this does work end-to-end
from the IR.

llvm-svn: 335490
2018-06-25 16:17:48 +00:00
Matt Arsenault 2811a20f77 AMDGPU: Remove commented out code
llvm-svn: 335486
2018-06-25 15:42:20 +00:00
Matt Arsenault b3feccd7fa AMDGPU/GlobalISel: Fix G_IMPLICIT_DEF for pointers
llvm-svn: 335485
2018-06-25 15:42:12 +00:00
Matt Arsenault 73eeb42e50 AMDGPU: Respect align argument parameter
This should avoid relying on the pointee type
to get the alignment, particularly since pointee
types are supposed to be removed at some point.

Also fixes not getting the alignment for unsized types.

llvm-svn: 335478
2018-06-25 14:29:04 +00:00
Reid Kleckner fd7c9ab971 [AMDGPU] Update includes for intrinsic changes :(
llvm-svn: 335409
2018-06-23 03:05:39 +00:00
Reid Kleckner f5890e4e43 [IR] Split Intrinsics.inc into enums and implementations
Implements PR34259

Intrinsics.h is a very popular header. Most LLVM TUs care about things
like dbg_value, but they don't care how they are implemented. After I
split these out, IntrinsicImpl.inc is 1.7 MB, so this saves each LLVM TU
from scanning 1.7 MB of source that gets pre-processed away.

It also means we can modify intrinsic properties without triggering a
full rebuild, but that's probably less of a win.

I think the next best thing to do would be to split out the target
intrinsics into their own header. Very, very few TUs care about
target-specific intrinsics. It's very hard to split up the target
independent intrinsics like llvm.expect, assume, and dbg.value, though.

llvm-svn: 335407
2018-06-23 02:02:38 +00:00
Matt Arsenault 3f8e7a3dbc AMDGPU: Add patterns for i32/i64 local atomic load/store
Not sure why the 32/64 split is needed in the atomic_load
store hierarchies. The regular PatFrags do this, but we don't
do it for the existing handling for global.

llvm-svn: 335325
2018-06-22 08:39:52 +00:00
Tom Stellard 6af7307650 AMDGPU/GlobalISel: Default to using TableGen'd instruction selector
Summary:
We can select all instructions that are marked as legal in a full piglit run,
so now is a good time to make the TableGen'd instruction selector default
for all opcodes.  This is NFC for a full piglit run, which is why there are
no tests.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48198

llvm-svn: 335319
2018-06-22 03:04:35 +00:00
Tom Stellard 26fac0f8e1 AMDGPU/GlobalISel: legalize and select 32-bit G_ASHR
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48196

llvm-svn: 335318
2018-06-22 02:54:57 +00:00
Tom Stellard 9a6535718e AMDGPU/GlobalISel: legalize and select 32-bit G_SITOFP
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48195

llvm-svn: 335316
2018-06-22 02:34:29 +00:00
Tom Stellard 7712ee8891 AMDGPU/GlobalISel: Implement select() for COPY
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46151

llvm-svn: 335315
2018-06-22 00:44:29 +00:00
Tom Stellard 3f1c6fe156 AMDGPU/GlobalISel: Implement select() for G_IMPLICIT_DEF
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46150

llvm-svn: 335307
2018-06-21 23:38:20 +00:00
Konstantin Zhuravlyov e004b3d97b AMDGPU: Remove ability to reserve VGPRs for debugger
Differential Revision: https://reviews.llvm.org/D48234

llvm-svn: 335288
2018-06-21 20:28:19 +00:00
Scott Linder 1e8c2c705d [AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
  values).
* Provide path for backwards compatibility, even with underlying descriptor
  changes.

Differential Revision: https://reviews.llvm.org/D47736

llvm-svn: 335281
2018-06-21 19:38:56 +00:00
Scott Linder 5792dd0f39 [AMDGPU] Fix bug with tracking processed blocks in SIInsertWaitcnts
BlockWaitcntProcessedSet was not being cleared between calls, so it was
producing incorrect counts in cases where MBB addresses happened to coincide
across multiple calls.

Differential Revision: https://reviews.llvm.org/D48391

llvm-svn: 335268
2018-06-21 18:48:48 +00:00
Konstantin Zhuravlyov 766c77efd7 AMDGPU/AMDHSA: Remove GridWorkGroupCountX/Y/Z
and everything that comes with it from implementation
and v3 header files.

Leave definition in v2 header files for backwards
compatibility.

Differential Revision: https://reviews.llvm.org/D48191

llvm-svn: 335267
2018-06-21 18:36:04 +00:00
Nicolai Haehnle 15745ba5c1 AMDGPU: Remove redundant MIMG instruction variants
Summary:
For sample and gather ops, we can accurately determine the set of
vaddr-size instruction variants that are required. This reduces
the size of instruction tables by ~5%.

The number of machine instruction opcodes is reduced from 10002
to 9476.

Change-Id: Ie7fc65d3657b762c7816017fe70b2e9bec644a8a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D48168

llvm-svn: 335232
2018-06-21 13:37:55 +00:00
Nicolai Haehnle db6911a6f9 AMDGPU: Remove old-style image intrinsics
Summary:
This also removes the need for atomic pseudo instructions, since
we select the correct encoding directly in SITargetLowering::lowerImage
for dimension-aware image intrinsics.

Mesa uses dimension-aware image intrinsics since
commit a9a7993441.

Change-Id: I7473d20009476a4ed6d919cae4e6dca9ff42e77a

Reviewers: arsenm, rampitec, mareko, tpr, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48167

llvm-svn: 335231
2018-06-21 13:37:45 +00:00
Nicolai Haehnle 7a9c03f484 AMDGPU: Select MIMG instructions manually in SITargetLowering
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6

Reviewers: arsenm, rampitec, rtaylor, tstellar

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48017

llvm-svn: 335228
2018-06-21 13:36:57 +00:00
Nicolai Haehnle 0ab200b6c9 AMDGPU: Refactor MIMG instruction TableGen using generic tables
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.

This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.

Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48016

llvm-svn: 335227
2018-06-21 13:36:44 +00:00
Nicolai Haehnle e741d7e0fd AMDGPU: Use generic tables instead of SearchableTable
Summary:

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48014

Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641
llvm-svn: 335226
2018-06-21 13:36:33 +00:00
Nicolai Haehnle 2367f03565 AMDGPU: Pass AMDGPUSampleVariant to MIMG_{Sampler,Gather}(_WQM)
Summary:
This will allows us to provide rich metadata about the instructions
in tables that are accessible by custom C++ code.

Change-Id: Id9305a26304ab6a6cceb6c65c8cd49141cc0101d

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48011

llvm-svn: 335224
2018-06-21 13:36:13 +00:00
Nicolai Haehnle b3a9b68513 AMDGPU: Add implicit def of SCC to kill and indirect pseudos
Summary:
Kill instructions sometimes do use SCC in unusual circumstances, when
v_cmpx cannot be used due to the operands that are involved.

Additionally, even if SCC was never defined by the expansion, kill pseudos
could previously occur between an s_cmp and an s_cbranch_scc, which breaks
the SCC liveness tracking when the pseudo is expanded to split the basic
block. While it would be possible to explicitly mark the SCC as live-in for
the successor basic block, it's simpler to just mark the pseudo as using SCC,
so that such a sequence is never emitted by instruction selection in the
first place.

A similar issue affects indirect source/dest pseudos in principle, although
I haven't been able to come up with a test case where it actually matters
(this affects instruction selection, so a MIR test can't be used).

Fixes: dEQP-GLES3.functional.shaders.discard.dynamic_loop_always
Change-Id: Ica8d82ecff1a763b892a1112cf1b06c948863a4f

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47761

llvm-svn: 335223
2018-06-21 13:36:08 +00:00
Nicolai Haehnle f267431901 AMDGPU: Turn D16 for MIMG instructions into a regular operand
Summary:
This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.

We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instructions loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.

We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.

The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which choose the correct variant for
other MIMG instruction is extended to cover gather4 as well.

As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).

While we're at it, delete a whole bunch of dead legacy TableGen code.

Change-Id: I89b02c2841c06f95e662541433e597f5d4553978

Reviewers: arsenm, rampitec, kzhuravl, artem.tamazov, dp, rtaylor

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47434

llvm-svn: 335222
2018-06-21 13:36:01 +00:00
Matt Arsenault 5a4ec8127f AMDGPU: Fix scalar_to_vector for v4i16/v4f16
llvm-svn: 335161
2018-06-20 19:45:48 +00:00
Matt Arsenault 3d06668ad4 AMDGPU: Fix missing C++ mode comment
llvm-svn: 335160
2018-06-20 19:45:40 +00:00
Stanislav Mekhanoshin 3b11794dbf [AMDGPU] setcc (select cc, CT, CF), CF, eq | ne -> xor cc, -1 | cc
This is the common case in the BE when we serialize condition and then
rematerialize it. Use either original or inverted condition.

Differential Revision: https://reviews.llvm.org/D48246

llvm-svn: 334882
2018-06-16 03:46:59 +00:00
Matt Arsenault 63bc0e3cb9 AMDGPU: Add combine for short vector extract_vector_elts
Try to access pieces 4 bytes at a time. This helps
various hasOneUse extract_vector_elt combines, such
as load width reductions.

Avoids test regressions in a future commit.

llvm-svn: 334836
2018-06-15 15:31:36 +00:00
Matt Arsenault 02dc7e19e2 AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

llvm-svn: 334835
2018-06-15 15:15:46 +00:00
Roman Lebedev dec562c849 [AMDGPU] Recognize x & ~(-1 << y) pattern.
Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48012

llvm-svn: 334817
2018-06-15 09:56:45 +00:00
Roman Lebedev 9c17dad8f2 [AMDGPU] Recognize x & ((1 << y) - 1) pattern.
Summary:
As a followup for D48007.

Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern,
which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`),
i think also handling a pattern that is ub for `y == bitwidth` should be fine.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48010

llvm-svn: 334816
2018-06-15 09:56:39 +00:00
Roman Lebedev aa8587d1fc [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.
Summary:
D47980 will canonicalize the `x << (32 - y) >> (32 - y)`,
which is the pattern the AMDGPU expects to `x &  (-1 >> (32 - y))`,
which is not recognized by AMDGPU.

Thus, it needs to be recognized, too.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48007

llvm-svn: 334815
2018-06-15 09:56:31 +00:00
Tom Stellard a92847359a AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45907

llvm-svn: 334757
2018-06-14 19:26:37 +00:00
Tom Stellard 46bbbc33c0 AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46171

llvm-svn: 334665
2018-06-13 22:30:47 +00:00
Stanislav Mekhanoshin 7bec57300c [AMDGPU] Corrected computeKnownBits for V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48133

llvm-svn: 334640
2018-06-13 18:52:54 +00:00
Yaxun Liu fb17bf60dd [AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer which holds 8 bytes.
We need a larger type which hold 16 bytes, therefore change it
to [i64 x 2].

Differential Revision: https://reviews.llvm.org/D48094

llvm-svn: 334625
2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky 32c6b5cb70 [AMDGPU][MC] Enabled parsing of relocations on VALU instructions
See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566

Reviewers: artem.tamazov, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D47884

llvm-svn: 334622
2018-06-13 17:02:03 +00:00
Dmitry Preobrazhensky ffbee7acdc [AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4
See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D47885

llvm-svn: 334609
2018-06-13 15:32:46 +00:00
Tom Stellard 264c171f36 AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering
Summary:
The code that handles ISD:Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600.  We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.

Reviewers: alex-t, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46298

llvm-svn: 334607
2018-06-13 15:06:37 +00:00
Stanislav Mekhanoshin 8fd3c4e431 [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

llvm-svn: 334559
2018-06-12 23:50:37 +00:00
Konstantin Zhuravlyov ce25bc3e82 AMDHSA/NFC: Code object v3 updates (additional):
- Move section selection and alignment to AMDGPUAsmPrinter

llvm-svn: 334521
2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov 00f2cb1116 AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments

llvm-svn: 334519
2018-06-12 18:02:46 +00:00
Mark Searles 987f292c56 [AMDGPU] prevent hitting Assertion `isReg() && "Wrong MachineOperand accessor"'
The use iterator, used within findMaskOperands(), can return anything which is
not a def. isUse() requires a register, so check isReg() before calling isUse().

Differential Revision: https://reviews.llvm.org/D48047

llvm-svn: 334459
2018-06-12 00:41:26 +00:00
George Burgess IV c72204d5b5 Simplify; NFC
Not shown in the diff: AQ is a `vector<SUnit *>`, and SU is a `SUnit *`

llvm-svn: 334451
2018-06-11 22:58:32 +00:00
Konstantin Zhuravlyov 3e5d66ac66 AMDGPU: Add 64-bit relative variant kind
Differential Revision: https://reviews.llvm.org/D47601

llvm-svn: 334443
2018-06-11 21:37:57 +00:00
Stanislav Mekhanoshin 7ba3fc730c [AMDGPU] Do not consider indirect acces through phi for wave limiter
Rational: if there is indirect access that is usually an issue
because load is not ready by the use. However, if use is inside a
loop and load is outside that is potentially an issue for a first
iteration only.

Differential Revision: https://reviews.llvm.org/D47740

llvm-svn: 334420
2018-06-11 16:50:49 +00:00
Daniil Fukalov c9a098b314 [AMDGPU] Inline asm - added i16, half and i128 types support
AMDGPU inline assembler support i16, half and i128 typed variables in constraints, but they were reported as error.
Needed to fix https://github.com/RadeonOpenCompute/ROCm/issues/341,
e.g. to be able to load with global_load_dwordx4 to a 128bit integer variable

Differential Revision: https://reviews.llvm.org/D44920

llvm-svn: 334301
2018-06-08 16:29:04 +00:00
Matt Arsenault 6fc3759811 AMDGPU: Error on LDS global address in functions
These won't work as expected now, so error on them to avoid
wasting time debugging this in the future.

llvm-svn: 334269
2018-06-08 08:05:54 +00:00
Tony Tye 6db1f5da4f [AMDGPU] Simplify memory legalizer (add missing virtual descructor)
Differential Revision: https://reviews.llvm.org/D47504

llvm-svn: 334257
2018-06-08 01:00:11 +00:00
Tony Tye a5a7c331e7 [AMDGPU] Simplify memory legalizer
- Make code easier to maintain.
- Avoid generating waitcnts for VMEM if the address sppace does not involve VMEM.
- Add support to generate waitcnts for LDS and GDS memory.

Differential Revision: https://reviews.llvm.org/D47504

llvm-svn: 334241
2018-06-07 22:28:32 +00:00
Matt Arsenault f1c868ef08 AMDGPU: Fix not including v2f64 in SReg_128
Fixes assertion with calls returning v2f64.

llvm-svn: 334189
2018-06-07 12:16:31 +00:00
Matt Arsenault 697300bd4f AMDGPU: Use scalar operations for f16 fabs/fneg patterns
Fixes unnecessary differences between subtargets.

llvm-svn: 334184
2018-06-07 10:15:20 +00:00
Matt Arsenault 90083d3088 AMDGPU: Try a lot harder to emit scalar loads
This has two main components. First, widen
widen short constant loads in DAG when they have
the correct alignment. This is already done a bit in
AMDGPUCodeGenPrepare, since that has access to
DivergenceAnalysis. This can't help kernarg loads
created in the DAG. Start to use DAG divergence analysis
to help this case.

The second part is to avoid kernel argument lowering
breaking the alignment of short vector elements because
calling convention lowering wants to split everything
into legal register types.

When loading a split type, load the nearest 4-byte aligned
segment and shift to get the desired bits. This extra
load of the earlier argument piece ends up merging,
and the bit extract hopefully folds out.

There are a number of improvements and regressions with
this, but I think as-is this is a better compromise between
several of the worst parts of SelectionDAG.

Particularly when i16 is legal, this produces worse code
for i8 and i16 element vector kernel arguments. This is
partially due to the very weak load merging the DAG does.
It only looks for fairly specific combines between pairs
of loads which no longer appear. In particular this
causes v4i16 loads to be split into 2 components when
previously the two halves were merged.

Worse, because of the newly introduced shifts, there
is a lot more unnecessary vector packing and unpacking code
emitted. At least some of this is due to reporting
false for isTypeDesirableForOp for i16 as a workaround for
the lack of divergence information in the DAG. The cases
where this happens it doesn't actually matter, but the
relevant code in SimplifyDemandedBits doens't have the context
to know to ignore this.

The use of the  scalar cache is probably more important
than the mess of mostly scalar instructions doing this packing
and unpacking. Future work can fix this, possibly by making better
use of the new DAG divergence information for controlling promotion
decisions, or adding another version of shift + trunc + shift
combines that doesn't only know about the used types.

llvm-svn: 334180
2018-06-07 09:54:49 +00:00
Stanislav Mekhanoshin df61be70b2 [AMDGPU] Improve reciprocal handling
When denormals are supported we are producing a full division for
1.0f / x. That still can be replaced by the faster version:

    bool c = fabs(x) > 0x1.0p+96f;
    float s = c ? 0x1.0p-32f : 1.0f;
    x *= s;
    return s * v_rcp_f32(x)

in case if requested accuracy is 2.5ulp or less. The same version
is used if denormals are not supported for non 1.0 numerators, where
just v_rcp_f32 is then used for 1.0 numerator.

The optimization of 1/x is extended to the case -1/x, which is the
same except for the resulting sign bit.

OpenCL conformance passed with both enabled and disabled denorms.

Differential Revision: https://reviews.llvm.org/D47805

llvm-svn: 334142
2018-06-06 22:22:32 +00:00
Matt Arsenault e9524f1fb3 AMDGPU: Custom lower v2f16 fneg/fabs with illegal f16
Fixes terrible code on targets without f16 support. The
legalization creates a mess that is difficult to recover
from. Also should avoid randomly breaking these tests
multiple times in sequence in future commits.

Some regressions in cases where it happens to be better
to pull the source modifier after the conversion.

llvm-svn: 334132
2018-06-06 21:28:11 +00:00
Peter Smith 57f661bd7d [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
On targets like Arm some relaxations may only be performed when certain
architectural features are available. As functions can be compiled with
differing levels of architectural support we must make a judgement on
whether we can relax based on the MCSubtargetInfo for the function. This
change passes through the MCSubtargetInfo for the function to
fixupNeedsRelaxation so that the decision on whether to relax can be made
per function. In this patch, only the ARM backend makes use of this
information. We must also pass the MCSubtargetInfo to applyFixup because
some fixups skip error checking on the assumption that relaxation has
occurred, to prevent code-generation errors applyFixup must see the same
MCSubtargetInfo as fixupNeedsRelaxation.

Differential Revision: https://reviews.llvm.org/D44928

llvm-svn: 334078
2018-06-06 09:40:06 +00:00
Matt Arsenault 57e541e87e AMDGPU: Preserve metadata when widening loads
Preserves the low bound of the !range. I don't think
it's legal to do anything with the top half since it's
theoretically reading garbage.

llvm-svn: 334045
2018-06-05 19:52:56 +00:00
Matt Arsenault 9224c00d2b AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.

llvm-svn: 334044
2018-06-05 19:52:46 +00:00
David Blaikie 31b98d2e99 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Stanislav Mekhanoshin 838c07c531 [AMDGPU] Small refactoring in the scheduler
After last changes some code can be simplified.

Differential Revision: https://reviews.llvm.org/D47661

llvm-svn: 333934
2018-06-04 17:57:40 +00:00
Stanislav Mekhanoshin 28624f94d5 [AMDGPU] Factored out common part of GCNRPTracker::reset()
Differential Revision: https://reviews.llvm.org/D47664

llvm-svn: 333931
2018-06-04 17:21:54 +00:00
Mark Searles f0b93f1e9e [AMDGPU][Waitcnt] Fix handling of flat instrs
On GFX9 and earlier, flat memory ops may decrement VMCNT out-of-order as well as LGKMCNT out-of-order.

Differential Revision: https://reviews.llvm.org/D46616

llvm-svn: 333926
2018-06-04 16:51:59 +00:00
Nicolai Haehnle 59198ed040 AMDGPU: Make various NamedOperands upper case
Summary:
Avoid name clashes with the corresponding bit fields in the instruction
encoding.

Change-Id: Id1644e703e976e78f7af93788d9f44cb48c3251f

Reviewers: arsenm, rampitec, kzhuravl

Subscribers: wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47433

llvm-svn: 333905
2018-06-04 14:45:20 +00:00
Nicolai Haehnle 01d261f18d TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Amaury Sechet 8467411dad Set ADDE/ADDC/SUBE/SUBC to expand by default
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.

Target that uses these opcodes are changed in order to ensure their behavior doesn't change.

Reviewers: efriedma, craig.topper, dblaikie, bkramer

Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits

Differential Revision: https://reviews.llvm.org/D47422

llvm-svn: 333748
2018-06-01 13:21:33 +00:00
Tom Stellard e43778895c AMDGPU/R600: Move intrinsics to IntrinsicsAMDGPU.td
Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47487

llvm-svn: 333720
2018-06-01 02:19:46 +00:00
Stanislav Mekhanoshin 739174c4be [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.

This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.

Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

llvm-svn: 333691
2018-05-31 20:13:51 +00:00
Roman Tereshin 76c29c68dc [GlobalISel][AMDGPU] LegalizerInfo verifier: Adding LegalizerInfo::verify(...) call for AMDGPU
Reviewers: aemerson, qcolombet

Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D46339

llvm-svn: 333664
2018-05-31 16:16:48 +00:00
Stanislav Mekhanoshin d4b500cb08 [AMDGPU] Track occupancy in MFI
Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missed code to
query and maintain occupancy info. Record it in the MFI and
query in a single call. Interfaces:

- getOccupancy() - returns current recorded achieved occupancy.
- getMinAllowedOccupancy() - returns lesser of the achieved occupancy
and the lowest occupancy we are ready to tolerate. For example if
a kernel is memory bound we are ready to tolerate 4 waves.
- limitOccupancy() - record occupancy level if we have to lower it.
- increaseOccupancy() - record occupancy if scheduler managed to
increase the occupancy.

MFI takes care of integrating different checks affecting occupancy,
including LDS use and waves-per-eu attribute. Note that scheduler
starts with not yet known register pressure, so has to record either
limit or increase in occupancy after it is done. Later passes can
just query a resulting value.

New interface is used in the active scheduler and NFC wrt its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before the change waves-per-eu was
ignored by experimental schedulers and tolerance window for memory
bound kernels was not used.

Differential Revision: https://reviews.llvm.org/D47509

llvm-svn: 333629
2018-05-31 05:36:04 +00:00
Jan Vesely f5016b79a6 AMDGPU/R600: Make sure functions are cacheline aligned
v2: use "ensureAlignment"
    make functions cache line aligned
Fixes GPU hangs since r333219:
"AMDGPU: Split R600 AsmPrinter code into its own class"

Differential Revision: https://reviews.llvm.org/D47516

llvm-svn: 333622
2018-05-31 04:08:08 +00:00
Tom Stellard c7624317d7 AMDGPU: Split AMDGPUTTI into GCNTTI and R600TTI
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47359

llvm-svn: 333605
2018-05-30 22:55:35 +00:00
Mark Searles ed54ff1d51 [AMDGPU][Waitcnt] Fix build error: unused variable 'SWaitInst'
https://reviews.llvm.org/rL333556 caused a buildbot failure.

See http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/21876/steps/build_Lld/logs/stdio

/Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:2007:10: error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst = BuildMI(EntryBB, EntryBB.getFirstNonPHI(),

The unused variable was for debugging purposes; removing that piece of code
to fix the build.

llvm-svn: 333559
2018-05-30 16:27:57 +00:00
Matt Arsenault 7b4826e6ce AMDGPU: Use better alignment for kernarg lowering
This was just emitting loads with the ABI alignment
for the raw type. The true alignment is often better,
especially when an illegal vector type was scalarized.
The better alignment allows using a scalar load
more often.

llvm-svn: 333558
2018-05-30 16:17:51 +00:00
Mark Searles 1054541490 [AMDGPU][Waitcnt] Fix handling of loops with many bottom blocks
In terms of waitcnt insertion/if necessary, the waitcnt pass forces convergence
for a loop. Previously, that kicked if greater than 2 passes over a loop, which
doesn't account for loop with many bottom blocks. So, increase the threshold to
(n+1), where n is the number of bottom blocks. This gives the pass an
opportunity to consider the contribution of each bottom block, to the overall
loop, before the forced convergence potentially kicks in.

Differential Revision: https://reviews.llvm.org/D47488

llvm-svn: 333556
2018-05-30 15:47:45 +00:00
Matt Arsenault 2e4d338d16 AMDGPU: Fix typo in option description
llvm-svn: 333457
2018-05-29 19:35:46 +00:00
Matt Arsenault 1ea0402e82 AMDGPU: Round up kernel argument allocation size
AFAIK the driver's allocation will actually have to round this
up anyway. It is useful to track the rounded up size, so that
the end of the kernel segment is known to be dereferencable so
a wider s_load_dword can be used for a short argument at the end
of the segment.

llvm-svn: 333456
2018-05-29 19:35:00 +00:00
Konstantin Zhuravlyov 2ca6b1f2ba AMDGPU: Always set COMPUTE_PGM_RSRC2.ENABLE_TRAP_HANDLER to zero for AMDHSA as
it is set by CP

Differential Revision: https://reviews.llvm.org/D47392

llvm-svn: 333451
2018-05-29 19:09:13 +00:00
Matt Arsenault ceafc55e5a AMDGPU: Pass function directly instead of MachineFunction
These functions just query the underlying IR function,
so pass it directly.

llvm-svn: 333442
2018-05-29 17:42:50 +00:00
Matt Arsenault 2fb9ccf770 AMDGPU: Add nuw to add off of kernarg ptr
llvm-svn: 333441
2018-05-29 17:42:38 +00:00
Tom Stellard 57b9342c80 AMDGPU: Split R600 MCInst lowering into its own class
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47307

llvm-svn: 333439
2018-05-29 17:41:59 +00:00
Tim Renouf fa213f797b [AMDGPU] Fixed build warning
Summary:
V2: Use cast instead of extra if.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47426

Change-Id: I6ac31da0306f79706960284a7ebd7b9c6237a83a
llvm-svn: 333397
2018-05-29 08:15:37 +00:00
Farhana Aleen eacb1020aa [AMDGPU] Re-enabled 128bit wide-vector generation for local addr space by default.
Summary: Bug reported here https://bugs.freedesktop.org/show_bug.cgi?id=105464 found
         to be resolved by some other fixes.

Author: FarhanaAleen
llvm-svn: 333380
2018-05-28 18:15:11 +00:00
Tim Renouf 364edcd2e5 [AMDGPU] Fixed WWM bug in block otherwise entirely in WQM
Summary:
For a block with WQM on entry and exit and containing no exact mode
code, but containing some WWM code, the WQM pass forgot to process the
block at all and so did not insert code to enter and leave WWM.

This commit fixes that.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47027

Change-Id: I044792eead1293bed4203fb26ce75f47878afeb6
llvm-svn: 333362
2018-05-27 17:26:11 +00:00
Mark Searles 32efedcff3 [AMDGPU][Waitcnt] Remove obsolete waitcnt option
With the removal of the old waitcnt pass, the '-enable-si-insert-waitcnts' option is obsolete. Remove it.

Differential Revision: https://reviews.llvm.org/D47378

llvm-svn: 333303
2018-05-25 20:24:08 +00:00
Stanislav Mekhanoshin 7fc1cee051 [AMDGPU] Fixed test failure with AMDGPUPerfHint
We shall not keep iterator to a map while map is modified,
this leads to a broken map.

llvm-svn: 333298
2018-05-25 18:46:58 +00:00
Reid Kleckner cb48efd585 Fix -Winconsistent-missing-overrides in AMDGPU code
llvm-svn: 333291
2018-05-25 17:46:24 +00:00
Stanislav Mekhanoshin 1c538423dc [AMDGPU] Add perf hints to functions
This is adoption of HSAIL perfhint pass. Two types of hints are produced:

1. Function is memory bound.
2. Kernel can use wave limiter.

Currently these hints are used in the scheduler. If a function is suspected
to be memory bound we allow occupancy to decrease to 4 waves in the course
of scheduling.

Differential Revision: https://reviews.llvm.org/D46992

llvm-svn: 333289
2018-05-25 17:25:12 +00:00
Tim Renouf ad8b7c1190 [AMDGPU] Fixed incorrect break from loop
Summary:
Lower control flow did not correctly handle the case that a loop break
in if/else was on a condition that was not guaranteed to be masked by
exec. The first test kernel shows an example of this going wrong; after
exiting the loop, exec is all ones, even if it was not before the loop.

The fix is for lowering of if-break and else-break to insert an
S_AND_B64 to mask the break condition with exec. This commit also
includes the optimization of not inserting that S_AND_B64 if it is
obviously not needed because the break condition is the result of a
V_CMP in the same basic block.

V2: Addressed some review comments.
V3: Test fixes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44046

Change-Id: I0fc56a01209a9e99d1d5c9b0ffd16f111caf200c
llvm-svn: 333258
2018-05-25 07:55:04 +00:00
Tom Stellard 79fffe3515 AMDGPU: Remove AMDGPUMCInstLower.h
Summary:
The AMDGPUMCInstLower class is not used outside AMDGPUMCInstLower.cpp,
so we don't need a header file.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47264

llvm-svn: 333254
2018-05-25 04:57:02 +00:00
Tom Stellard c501501055 AMDGPU: Split R600 AsmPrinter code into its own class
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D47245

llvm-svn: 333219
2018-05-24 20:02:01 +00:00
Tom Stellard 1b95fed6f7 AMDGPU/R600: Remove code for handling AMDGPUISD::CLAMP
Summary:
We don't generate AMDGPUISD::CLAMP for R600 now that llvm.AMDGPU.clamp
is gone.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47181

llvm-svn: 333153
2018-05-24 05:28:34 +00:00
Matt Arsenault 606bc315d6 AMDGPU: Fix v2f16 fneg/fabs pattern
The integer operation convertion for some reason only happens
if the source is a bitcast from an integer, which happens to
always be the situation when the result is loaded. Add
an additional pattern for when the source operation is really
an FP operation.

llvm-svn: 333019
2018-05-22 20:13:34 +00:00
Tom Stellard b12f4dec08 AMDGPU: Move AMDGPUTargetLowering::isFPExtFoldable() into SITargetLowering
Summary: This is always false for R600.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D47180

llvm-svn: 333016
2018-05-22 19:37:55 +00:00
Matt Arsenault 1349a04ef5 AMDGPU: Make v2i16/v2f16 legal on VI
This usually results in better code. Fixes using
inline asm with short2, and also fixes having a different
ABI for function parameters between VI and gfx9.

Partially cleans up the mess used for lowering of the d16
operations. Making v4f16 legal will help clean this up more,
but this requires additional work.

llvm-svn: 332953
2018-05-22 06:32:10 +00:00
Tom Stellard 44b30b4537 AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

llvm-svn: 332930
2018-05-22 02:03:23 +00:00
Peter Collingbourne dcd7d6c331 MC: Separate creating a generic object writer from creating a target object writer. NFCI.
With this we gain a little flexibility in how the generic object
writer is created.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47045

llvm-svn: 332868
2018-05-21 19:20:29 +00:00
Stanislav Mekhanoshin 9badad2051 [AMDGPU] Add divergence analysis as a dependency for ISel
AMDGPUDAGToDAGISel adds DivergenceAnalysis in getAnalysisUsage
but does not list it in pass dependencies which may lead to
crash.

Differential Revision: https://reviews.llvm.org/D47151

llvm-svn: 332862
2018-05-21 18:18:52 +00:00
Peter Collingbourne 571a3301ae MC: Change MCAsmBackend::writeNopData() to take a raw_ostream instead of an MCObjectWriter. NFCI.
To make this work I needed to add an endianness field to MCAsmBackend
so that writeNopData() implementations know which endianness to use.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47035

llvm-svn: 332857
2018-05-21 17:57:19 +00:00
Tom Stellard a91ce17b5f AMDGPU/GlobalISel: Address post-commit review comments for r332379
MCRegisterInfo::getPhysRegSize() will be deprecated.

llvm-svn: 332856
2018-05-21 17:49:31 +00:00
Simon Pilgrim ede0e4073e Fix MSVC unused variable warning. NFCI.
AMDGPURegisterInfo::getSubRegFromChannel is a static method - we don't need to get the AMDGPURegisterInfo instance.

llvm-svn: 332807
2018-05-19 12:46:02 +00:00
Matt Arsenault 372d796ab1 AMDGPU: Add pass to optimize reqd_work_group_size
Eliminate loads from the dispatch packet when they will have
a known value.

Also pattern match the code used by the library to handle partial
workgroup dispatches, which isn't necessary if reqd_work_group_size
is used.

llvm-svn: 332771
2018-05-18 21:35:00 +00:00
Peter Collingbourne e3f652973e Support: Simplify endian stream interface. NFCI.
Provide some free functions to reduce verbosity of endian-writing
a single value, and replace the endianness template parameter with
a field.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47032

llvm-svn: 332757
2018-05-18 19:46:24 +00:00
Konstantin Zhuravlyov caa8251971 AMDGPU/NFC: Set symbol's type that is coming from an argument in
EmitAMDGPUSymbolType, instead of hard-coding it to STT_AMDGPU_HSA_KERNEL.

llvm-svn: 332753
2018-05-18 18:41:37 +00:00
Peter Collingbourne f7b81db715 MC: Change the streamer ctors to take an object writer instead of a stream. NFCI.
The idea is that a client that wants split dwarf would create a
specific kind of object writer that creates two files, and use it to
create the streamer.

Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47050

llvm-svn: 332749
2018-05-18 18:26:45 +00:00
Changpeng Fang 860d460063 AMDGPU/SI: Don't promote alloca to vector for atomic load/store
Summary:
  Don't promote alloca to vector for atomic load/store

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D46085

llvm-svn: 332673
2018-05-17 21:49:44 +00:00
Changpeng Fang 391bcf8893 AMDGPU/SI: Handle infinite loop for the structurizer to work with CFG with infinite loops.
Summary:
  The current StructurizeCFG pass only works for CFG with one exit. AMDGPUUnifyDivergentExitNodes combines multiple "return" blocks and/or "unreachable" blocks
to one exit block for the Structurizer to work. However, infinite loop is another kind of special "exit", and if we don't handle it, the case of multiple exits will prevent the structurizer from working.

In this work, for each infinite loop, we add a dummy edge to the "return" block, and thus the AMDGPUUnifyDivergentExitNodes pass will work with infinite loops.
This will make CFG with infinite loops be structurized.

Reviewer:
  nhaehnle

Differential Revision:
  https://reviews.llvm.org/D46340

llvm-svn: 332625
2018-05-17 16:45:01 +00:00
Konstantin Zhuravlyov c72ece6c2c AMDGPU : Recalculate SGPRs when trap handler is supported
Differential Revision: https://reviews.llvm.org/D29911

llvm-svn: 332523
2018-05-16 20:47:48 +00:00
Tony Tye 43259df44a [AMDGPU] Change llvm.debugtrap to be a debug breakpoint that can resume execution.
No longer require the queue pointer to be passed in in fixed SGPRs.

Differential Revision: https://reviews.llvm.org/D46769

llvm-svn: 332485
2018-05-16 16:19:34 +00:00
Matt Arsenault 67a9815a5c AMDGPU: Custom lower v4i16/v4f16 vector operations
Avoids stack access.

Also handle extract hi elt pattern from truncate + shift
to avoid a couple test regressions.

llvm-svn: 332453
2018-05-16 11:47:30 +00:00
Stanislav Mekhanoshin 57d341c27a [AMDGPU] Fix handling of void types in isLegalAddressingMode
It is legal for the type passed to isLegalAddressingMode to be
unsized or, more specifically, VoidTy. In this case, we must
check the legality of load / stores for all legal types. Directly
trying to call getTypeStoreSize is incorrect, and leads to breakage
in e.g. Loop Strength Reduction. This change guards against that
behaviour.

Differential Revision: https://reviews.llvm.org/D40405

llvm-svn: 332409
2018-05-15 22:07:51 +00:00
Konstantin Zhuravlyov f13c9969fc AMDGPU: Fix v_dot{4, 8}* instruction encoding
Differential Revision: https://reviews.llvm.org/D46848

llvm-svn: 332387
2018-05-15 19:32:47 +00:00
Tom Stellard e182b28ae4 AMDGPU/GlobalISel: Implement select() for G_FCONSTANT
Summary: Also clean up G_CONSTANT selection.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46170

llvm-svn: 332379
2018-05-15 17:57:09 +00:00
Konstantin Zhuravlyov 603a43fcd5 AMDGPU: Add disasm tests for deep learning instructions + fix v_fmac_f32 disasm
Differential Revision: https://reviews.llvm.org/D46853

llvm-svn: 332377
2018-05-15 17:39:13 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Matt Arsenault 432aaea63f AMDGPU: Rename OpenCL lowering pass to be R600 specific.
This pass is
  a) broken.
  b) r600 specific.

Fixing (a) is a bit more non-trivial, but fixing (b)
is easy. Move this pass to being R600 only for now.

This pass does pass all the unit tests, however clang
no longer generates code that looks like the unit test
input, so fixing the pass requires fixing the tests and
the pass as one, and checking it works with clang still.

Patch by Dave Airlie

llvm-svn: 332196
2018-05-13 10:04:48 +00:00
Matt Arsenault dfb88dfe30 AMDGPU: Make undef legal for v2i16/v2f16
This is apparently necessary to stop undef from being
turned into a build_vector of 0s.

llvm-svn: 332195
2018-05-13 10:04:38 +00:00
Stanislav Mekhanoshin 7012c246c1 [AMDGPU] Fix amdgpu-waves-per-eu accounting in scheduler
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.

Differential Revision: https://reviews.llvm.org/D46781

llvm-svn: 332166
2018-05-12 01:41:56 +00:00
Tom Stellard 655fdd3f82 AMDGPU/GlobalISel: Implement select() for >32-bit G_STORE
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46153

llvm-svn: 332154
2018-05-11 23:12:49 +00:00
Changpeng Fang f094885a9e AMDGPU/SI: Don't promote alloca to vector for AddrSpaceCast instruction.
Summary:
  We have no logic to promote alloca to vector for an AddrSpaceCast instruction.

Reviewer:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D45993

llvm-svn: 332147
2018-05-11 22:17:57 +00:00
Yaxun Liu deba150c27 [AMDGPU] Fix compilation failure when IR contains comdat
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.

The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.

It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.

Since HIP uses comdat, this bug caused failures for HIP.

Differential Revision: https://reviews.llvm.org/D46770

llvm-svn: 332137
2018-05-11 20:40:14 +00:00
Tom Stellard dcc95e9385 AMDGPU/GlobalISel: Implement select() for 32-bit G_FPTOUI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45883

llvm-svn: 332082
2018-05-11 05:44:16 +00:00