Commit Graph

2814 Commits

Author SHA1 Message Date
Tom Stellard 1e0edad4bb AMDGPU/GlobalISel: Implement select() for G_BITCAST s32 <--> <2 x s16>
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45881

llvm-svn: 332042
2018-05-10 21:20:10 +00:00
Tom Stellard 1dc90204bf AMDGPU/GlobalISel: Enable TableGen'd instruction selector
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, mgorny, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45994

llvm-svn: 332039
2018-05-10 20:53:06 +00:00
Farhana Aleen e24f3ff8de [AMDGPU] Support horizontal vectorization of min/max.
Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: AMDGPU

Differential Revision: https://reviews.llvm.org/D46604

llvm-svn: 331920
2018-05-09 21:18:34 +00:00
Matt Arsenault eac81b2448 AMDGPU: Ignore any_extend in mul24 combine
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.

llvm-svn: 331919
2018-05-09 21:11:35 +00:00
Matt Arsenault 74fd7600d2 AMDGPU: Handle partial shift reduction for variable shifts
If the variable shift amount has known bits, we can still reduce
the shift.

llvm-svn: 331917
2018-05-09 20:52:54 +00:00
Matt Arsenault b143d9a5ea AMDGPU: Partially shrink 64-bit shifts if reduced to 16-bit
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.

llvm-svn: 331916
2018-05-09 20:52:43 +00:00
Matt Arsenault 762d498808 AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector,
we can use the original source value.

This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.

llvm-svn: 331909
2018-05-09 18:37:39 +00:00
Matt Arsenault 378f86998c AMDGPU: Stop special casing constant indexes of extract_vector_elt
The same result folds out of the dynamic expansion logic if the
index is constant.

llvm-svn: 331906
2018-05-09 18:29:26 +00:00
Shiva Chen 801bf7ebbe [DebugInfo] Examine all uses of isDebugValue() for debug instructions.
Because we create a new kind of debug instruction, DBG_LABEL, we need to
check all passes which use isDebugValue() to check MachineInstr is debug
instruction or not. When expelling debug instructions, we should expel
both DBG_VALUE and DBG_LABEL. So, I create a new function,
isDebugInstr(), in MachineInstr to check whether the MachineInstr is
debug instruction or not.

This patch has no new test case. I have run regression test and there is
no difference in regression test.

Differential Revision: https://reviews.llvm.org/D45342

Patch by Hsiangkai Wang.

llvm-svn: 331844
2018-05-09 02:42:00 +00:00
Tim Renouf 64afc2d7f0 [AMDGPU] Provide machine -> name mapping
Summary:
AMDGPU stores a numerical code for the particular GPU variant in EFlags
in the ELF file. This commit provides a mapping from that number into
the machine name for use by objdump-type tools.

Change-Id: Id37fc0bebad443bd89c0080985ce298c4e7e9319

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46587

llvm-svn: 331798
2018-05-08 18:53:04 +00:00
Matt Arsenault 869cbedc81 AMDGPU: Fix broken dynamic vector indexing for packed types
The intention of this was to multiply by 16, not shift by 16.

llvm-svn: 331793
2018-05-08 18:43:25 +00:00
Changpeng Fang d049da3740 AMDGPU: Use eraseFromParent to delete am instruction when it is no longer needed.
Reviewer: Nicolai

Differential Revision:
  https://reviews.llvm.org/D46438

llvm-svn: 331788
2018-05-08 18:32:35 +00:00
Stanislav Mekhanoshin 432936161e [AMDGPU] Added checks for dpp_ctrl value
- Report error for invalid dpp_ctrl values.
- Changed the way it is reported, now the error will be emitted into
  asm and will work with release build as well.
- Added dpp_ctrl value verifier for codegen.
- Added symbolic constants for dpp_ctrl.

Differential Revision: https://reviews.llvm.org/D46565

llvm-svn: 331775
2018-05-08 16:53:02 +00:00
Tom Stellard 37444285f1 AMDGPU/GlobalISel: Don't try to lower hull shaders
Summary: The AMDGPU_HS calling convention is not supported yet.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46149

llvm-svn: 331691
2018-05-07 22:17:54 +00:00
Mark Searles 4a0f2c5047 [AMDGPU][Waitcnt] Remove the old waitcnt pass
Remove the old waitcnt pass ( si-insert-waits ), which is no longer maintained
and getting crufty

Differential Revision: https://reviews.llvm.org/D46448

llvm-svn: 331641
2018-05-07 14:43:28 +00:00
Tim Renouf 18a1e9d03a [AMDGPU] Don't force WQM for DS op
Summary:
Previously, all DS ops forced WQM in a pixel shader. That was a hack to
allow for graphics frontends using ds_swizzle to implement explicit
derivatives, on SI/CI at least where DPP is not available. But it forced
WQM for _any_ DS op.

With this commit, DS ops no longer force WQM. Both graphics frontends
(Mesa and LLPC) need to change to issue an explicit llvm.amdgcn.wqm
intrinsic call when calculating explicit derivatives.

The required Mesa change is: "amd/common: use llvm.amdgcn.wqm for
explicit derivatives".

Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46051

Change-Id: I9b745b626fa91bbd66456e6cf41ee07eeea42f81
llvm-svn: 331633
2018-05-07 13:21:26 +00:00
Konstantin Zhuravlyov 91a74f53db AMDGPU/NFC: Update D16PreservesUnusedBits description based Tony Tye's comments
llvm-svn: 331564
2018-05-04 22:53:55 +00:00
Konstantin Zhuravlyov 3fc4067ac4 AMDGPU/NFC: Fix formatting for 900, 902 ISA Version features
llvm-svn: 331553
2018-05-04 20:21:31 +00:00
Konstantin Zhuravlyov c2c2eb7d01 AMDGPU: Add D16 instructions preserve unused bits feature
- Predicate D16 patterns on this new feature
- Added this new feature to gfx900/2/4

Differential Revision: https://reviews.llvm.org/D46366

llvm-svn: 331551
2018-05-04 20:06:57 +00:00
Michael Berg 7acc81b744 Fast Math Flag mapping into SDNode
Summary: Adding support for Fast flags in the SDNode to leverage fast math sub flag usage.

Reviewers: spatel, arsenm, jbhateja, hfinkel, escha, qcolombet, echristo, wristow, javed.absar

Reviewed By: spatel

Subscribers: llvm-commits, rampitec, nhaehnle, tstellar, FarhanaAleen, nemanjai, javed.absar, jbhateja, hfinkel, wdng

Differential Revision: https://reviews.llvm.org/D45710

llvm-svn: 331547
2018-05-04 18:48:20 +00:00
Tom Stellard b03c98d1a3 AMDGPU: Make getSubRegFromChannel a static member of AMDGPURegisterInfo
Summary:
This makes is possible to have R600RegisterInfo and SIRegisterInfo
not inherit from AMDGPURegisterInfo.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D46280

llvm-svn: 331490
2018-05-03 22:38:06 +00:00
Piotr Padlewski 5dde809404 Rename invariant.group.barrier to launder.invariant.group
Summary:
This is one of the initial commit of "RFC: Devirtualization v2" proposal:
https://docs.google.com/document/d/16GVtCpzK8sIHNc2qZz6RN8amICNBtvjWUod2SujZVEo/edit?usp=sharing

Reviewers: rsmith, amharc, kuhar, sanjoy

Subscribers: arsenm, nhaehnle, javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D45111

llvm-svn: 331448
2018-05-03 11:03:01 +00:00
Farhana Aleen 07e612340f [AMDGPU] A trivial fix for a buildbot failure caused by "commit 224a839fcbbead221f872cd32a1dd0c308d37299".
Author: FarhanaAleen
llvm-svn: 331383
2018-05-02 18:16:39 +00:00
Farhana Aleen 150cb6d91a Revert "[AMDGPU] performAddCombine should run after DAG is legalized."
This reverts commit 6b97d2995566b4dddd6bf0d75579ff44501d4494.

llvm-svn: 331371
2018-05-02 16:48:52 +00:00
Farhana Aleen 2f4100f56e [AMDGPU] performAddCombine should run after DAG is legalized.
Summary: performAddCombine should run after DAG is legalized; Otherwise generic optimization
         in the DAGCombiner can optimize an addcarry+trunc into an addcarry instruction with
         illegal types.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46337

llvm-svn: 331368
2018-05-02 16:24:10 +00:00
Farhana Aleen e2dfe8a853 [AMDGPU] Support horizontal vectorization.
Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D46213

llvm-svn: 331313
2018-05-01 21:41:12 +00:00
Konstantin Zhuravlyov 1501af4846 AMDGPU: Remove remnants of gfx901 (it was deprecated some time ago)
llvm-svn: 331298
2018-05-01 18:47:48 +00:00
Adrian Prantl 4dfcc4a788 Remove @brief commands from doxygen comments, too.
This is a follow-up to r331272.

We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by
  for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done

https://reviews.llvm.org/D46290

llvm-svn: 331275
2018-05-01 16:10:38 +00:00
Adrian Prantl 5f8f34e459 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

llvm-svn: 331272
2018-05-01 15:54:18 +00:00
Matt Arsenault 0084adc516 AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

llvm-svn: 331215
2018-04-30 19:08:16 +00:00
Tom Stellard add59c052d AMDGPU: Remove some dead code
llvm-svn: 331196
2018-04-30 16:28:02 +00:00
Tom Stellard 6c81418a63 AMDGPU/GlobalISel: Don't try to lower geometry shaders
Summary: The AMDGPU_GS calling convention is not supported yet.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46041

llvm-svn: 331186
2018-04-30 15:15:23 +00:00
Nico Weber 432a38838d IWYU for llvm-config.h in llvm, additions.
See r331124 for how I made a list of files missing the include.
I then ran this Python script:

    for f in open('filelist.txt'):
        f = f.strip()
        fl = open(f).readlines()

        found = False
        for i in xrange(len(fl)):
            p = '#include "llvm/'
            if not fl[i].startswith(p):
                continue
            if fl[i][len(p):] > 'Config':
                fl.insert(i, '#include "llvm/Config/llvm-config.h"\n')
                found = True
                break
        if not found:
            print 'not found', f
        else:
            open(f, 'w').write(''.join(fl))

and then looked through everything with `svn diff | diffstat -l | xargs -n 1000 gvim -p`
and tried to fix include ordering and whatnot.

No intended behavior change.

llvm-svn: 331184
2018-04-30 14:59:11 +00:00
Matt Arsenault 540512c297 DAG: Fix not legalizing vector fcanonicalizes
If an fcanoncialize was done on a vector type that was legal,

llvm-svn: 330981
2018-04-26 19:21:37 +00:00
Matt Arsenault fcc5ba46b7 AMDGPU: Extend extract_vector_elt fneg combine to fabs
Fixes a regression in a future commit.

llvm-svn: 330980
2018-04-26 19:21:32 +00:00
Matt Arsenault 8474803c7c AMDGPU: Consolidate SubtargetPredicate definitions
llvm-svn: 330979
2018-04-26 19:21:26 +00:00
Mark Searles 2a19af6e17 [AMDGPU][Waitcnt] As of gfx7, VMEM operations do not increment the export counter and the input registers are available in the next instruction; update the waitcnt pass to take this into account.
Differential Revision: https://reviews.llvm.org/D46067

llvm-svn: 330954
2018-04-26 16:11:19 +00:00
Tom Stellard dce46fa1cf AMDGPU/R600: Move int_r600_store_stream_output to the public intrinsic file
Summary:
The TableGen'd GlobalISel instruction selector assumes all intrinsics are in
the public Intrinsic:: namespace.

Reviewers: jvesely, nhaehnle

Reviewed By: jvesely, nhaehnle

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45989

llvm-svn: 330866
2018-04-25 20:02:53 +00:00
Mark Searles ec58183e1b [AMDGPU] Waitcnt pass: add debug options
- Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)

- Add debug counters to control force emit of s_waitcnt instrs; debug counters:
si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs
si-insert-waitcnts-forcevm: force emit s_waitcnt lgkmcnt(0) instrs
si-insert-waitcnts-forcelgkm: force emit s_waitcnt vmcnt(0) instrs

- Add some debug statements

Note that a variant of this patch was previously committed/reverted.

Differential Revision: https://reviews.llvm.org/D45888

llvm-svn: 330862
2018-04-25 19:21:26 +00:00
Alexander Timofeev b934728cd2 [AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556)
llvm-svn: 330818
2018-04-25 12:32:46 +00:00
Tom Stellard a2be8f4c35 AMDGPU: Remove deprecated llvm.AMDGPU.kilp intrinsic
Summary: This is no longer used by mesa since its 18.0.0 release.

Reviewers: nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D45988

llvm-svn: 330775
2018-04-24 21:37:57 +00:00
Tom Stellard 257882ff72 AMDGPU/GlobalISel: Fall-back to SelectionDAG for non-void functions
Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45843

llvm-svn: 330774
2018-04-24 21:29:36 +00:00
Tom Stellard c7709e1c29 AMDGPU/GlobalISel: Add support for amdgpu_ps calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45837

llvm-svn: 330767
2018-04-24 20:51:28 +00:00
Stanislav Mekhanoshin a4bfb3c446 [AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bit. After the value shifted right by 16 to use it in a low part
with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline
constant any longer.

Fixed the error and added verification code. Without the fix and with
the verification bug is causing pk_max_f16_literal.ll to fail.

Differential Revision: https://reviews.llvm.org/D45987

llvm-svn: 330752
2018-04-24 18:17:55 +00:00
Mark Searles 70901b9047 [AMDGPU][Waitcnt] NFC. Cleanup some code/naming consistency:
- s/SWaitcnt/Waitcnt s/WaitCnt/Waitcnt

llvm-svn: 330730
2018-04-24 15:59:59 +00:00
Matt Arsenault b21f9592be AMDGPU: Move a flawed assert when spilling SGPRs
It's possible to validly spill the frame offset register
in a call sequence to a VGPR. There are definitely issues
with SGPR spilling to memory, so move the assert later.

llvm-svn: 330612
2018-04-23 16:13:30 +00:00
Matt Arsenault adc59d7076 AMDGPU: Assign enum name to stack ID
Also assert that it is correct for SGPRs. There is currently a bug
where stack slot coloring replaces SGPR spill FIs with one with
the default ID, which results in a more confusing assert later
about a dead object.

llvm-svn: 330607
2018-04-23 15:51:26 +00:00
Nicolai Haehnle cbebba4917 AMDGPU: Fix SDWA peephole for V_AND_B32
Summary:
Found by inspection. We care about the operand that *doesn't*
contain the immediate.

I believe this is currently not hit because we fold 0xff / 0xffff
immediates only later.

Change-Id: Ic3cf8538bc7da5eff3200d96eccf9d339e6345a7

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45886

llvm-svn: 330586
2018-04-23 13:06:03 +00:00
Nicolai Haehnle 5a995664f0 AMDGPU: Fix a corner case crash in SIOptimizeExecMasking
Summary:
See the new test case; this is really unlikely to happen with real code,
but I ran into this while attempting to bugpoint-reduce a different issue.

Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45885

llvm-svn: 330585
2018-04-23 13:05:50 +00:00
Nico Weber 5d53aed419 Consistently sort add_subdirectory calls in lib/Target/*/CMakeLists.txt
llvm-svn: 330584
2018-04-23 12:49:34 +00:00
Nicolai Haehnle 7a87977fb2 AMDGPU: Legalize the operand of SI_INIT_M0
Summary:
This fixes a case where the argument to a sendmsg intrinsic
ends up in a VGPR, for whatever reason.

The underlying performance issue is that a multiplication that
can be an s_mul_i32 is instead needlessly generated as
v_mul_u32_u24, but this is not addressed by this patch.

Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45826

llvm-svn: 330393
2018-04-20 07:14:25 +00:00
Stanislav Mekhanoshin 160f85794d [AMDGPU] Use packed literals with zero either lower or hi part
Differential Revision: https://reviews.llvm.org/D45790

llvm-svn: 330365
2018-04-19 21:16:50 +00:00
Mark Searles 1bc6e71f32 [AMDGPU] Do not only rely on BB number when finding bottom loop
We should also check that the "bottom" basic block of a loopis a successor of the "header" basic block, otherwise we don't propagate the information correctly when the CFG is complex. This fixes an important rendering problem with Wolfsentein 2, because of one vector-memory wait was missing.

Differential Revision: https://reviews.llvm.org/D43831

llvm-svn: 330337
2018-04-19 15:42:30 +00:00
David Stuttard 31f482c26b [AMDGPU] Fix issues for backend divergence tracking
Summary:
A change to use divergence analysis in the AMDGPU backend was getting formal
arguments incorrect (not tagged as divergent) unless they were VGPR0, VGPR1 or
VGPR2

For graphics shaders it is possible to have more than these passed in as VGPR

Modified the checking code to check for any VGPR registers passed in as formal
arguments.

Also, some intrinsics that are sources of divergence may have been lowered
during instruction selection and are missed on subsequent calls to
isSDNodeSourceOfDivergence - added the relevant AMDGPUISD checks as well.

Finally, the FunctionLoweringInfo tracks virtual registers that are live across
basic block boundaries. This is used to check for divergence of CopyFromRegister
registers using the DivergenceAnalysis analysis. For multiple blocks the lazily
evaluated inverted map VirtReg2Value was not cleared when the ValueMap map was.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45372

Change-Id: I112f3bd6dfe0f62e63ce9b43b893982778e4bee3
llvm-svn: 330257
2018-04-18 13:53:31 +00:00
Stanislav Mekhanoshin 8b20b7dc2b [AMDGPU] Enabled v2.16 literals for VOP3P
Literal encoding needs op_sel_hi to select low 16 bit in this case.

Differential Revision: https://reviews.llvm.org/D45745

llvm-svn: 330230
2018-04-17 23:09:05 +00:00
Dmitry Preobrazhensky 4c45e6ff0e [AMDGPU][MC][VI][GFX9] Added support of SDWA/DPP for v_cndmask_b32
See bug 36356: https://bugs.llvm.org/show_bug.cgi?id=36356

Differential Revision: https://reviews.llvm.org/D45446

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 330123
2018-04-16 12:41:38 +00:00
Hiroshi Inoue ae17900997 [NFC] fix trivial typos in document and comments
"not not" -> "not" etc

llvm-svn: 330083
2018-04-14 08:59:00 +00:00
Hiroshi Inoue 372ffa15cb [NFC] fix trivial typos in comments
"the the" -> "the", "we we" -> "we", etc

llvm-svn: 330006
2018-04-13 11:37:06 +00:00
Jonas Paulsson e8f1ac7063 [MachineScheduler] NFC refactoring
This patch makes tryCandidate() virtual and some utility functions like
tryLess(), tryGreater(), ... externally available (used to be static).

This makes it possible for a target to derive a new MachineSchedStrategy from
GenericScheduler and reuse most parts.

It was necessary to wrap functions with the same names in
AMDGPU/SIMachineScheduler in a local namespace.

Review: Andy Trick, Florian Hahn
https://reviews.llvm.org/D43329

llvm-svn: 329884
2018-04-12 07:21:39 +00:00
Tim Renouf fd8d4af3bc [AMDGPU] Ensure there are enough registers for wave dispatch
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.

Re-landed after noticing that the buildbot failure from 329808 seemed to
be unrelated.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45503

Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329826
2018-04-11 17:18:36 +00:00
Yaxun Liu 9381ae9791 [AMDGPU] Fix lowering enqueue_kernel
Two issues were fixed:

runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.

handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.

Differential Revision: https://reviews.llvm.org/D45187

llvm-svn: 329815
2018-04-11 14:46:15 +00:00
Tim Renouf 8ca33bfcf3 Revert "[AMDGPU] Ensure there are enough registers for wave dispatch"
This reverts 329808. That change caused a report of a failure in
test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect
it is an expensive-check-only error.

Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0
llvm-svn: 329811
2018-04-11 14:27:41 +00:00
Tim Renouf f26b723491 [AMDGPU] Ensure there are enough registers for wave dispatch
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45503

Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329808
2018-04-11 14:02:41 +00:00
Dmitry Preobrazhensky fc715551a3 [AMDGPU][MC][GFX9] Added v_screen_partition_4se_b32
See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845

Differential Revision: https://reviews.llvm.org/D45443

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329801
2018-04-11 13:13:30 +00:00
Marek Olsak a9a58fa236 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.

Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

v2: - fix regressions in merge-stores.ll and multiple_tails.ll

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
2018-04-10 22:48:23 +00:00
Nicolai Haehnle b1c3b22b4c AMDGPU/MC: Allow disassembling without symbol info
Summary:
We would like the UMR debugging tool[0] to be able to provide
disassembly for currently live waves based on plain memory
dumps, and we want to leverage the LLVM disassembler for this.

This mostly works, except that UMR clearly can't provide real
symbol info, so it wants to set DisInfo == nullptr.

[0] https://cgit.freedesktop.org/amd/umr/

Reviewers: arsenm, rampitec, artem.tamazov, dp

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45477

Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6
llvm-svn: 329715
2018-04-10 15:46:43 +00:00
Tim Renouf 7190a4692a [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading
scratch descriptor from GIT.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 329690
2018-04-10 11:25:15 +00:00
Konstantin Zhuravlyov 6183065b97 AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header
1. Remove max_scratch_backing_memory_byte_size from kernel header
2. Make it a reserved field
3. Ignore it while parsing assembly for backwards compatibility
4. Bump up minor version of kernel header

Differential Revision: https://reviews.llvm.org/D45452

llvm-svn: 329620
2018-04-09 20:47:22 +00:00
Alex Shlyapnikov 79f2c720b5 Revert "AMDGPU: enable 128-bit for local addr space under an option"
This reverts commit r329591.

It breaks various bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/16516
http://lab.llvm.org:8011/builders/clang-ppc64be-linux/builds/17374
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/15992
http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/11251
...

llvm-svn: 329610
2018-04-09 19:47:38 +00:00
Marek Olsak 52b033b827 AMDGPU: enable 128-bit for local addr space under an option
Author: Samuel Pitoiset

ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.

Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329591
2018-04-09 16:56:32 +00:00
Tom Stellard e753c52227 AMDGPU: Initialize GlobalISel passes
Summary:
This fixes AMDGPU GlobalISel test failures when enabling the AMDGPU
target without any other targets that use GlobalISel.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D45353

llvm-svn: 329588
2018-04-09 16:09:13 +00:00
Dmitry Preobrazhensky 2f8e146ad3 [AMDGPU][MC][GFX9] Added instructions s_mul_hi_*32, s_lshl*_add_u32
See bugs
  36841: https://bugs.llvm.org/show_bug.cgi?id=36841
  36842: https://bugs.llvm.org/show_bug.cgi?id=36842

Differential Revision: https://reviews.llvm.org/D45251

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329562
2018-04-09 13:10:33 +00:00
Dmitry Preobrazhensky ae31223ba7 [AMDGPU][MC][GFX9] Added s_call_b64
See bug 36843: https://bugs.llvm.org/show_bug.cgi?id=36843

Differential Revision: https://reviews.llvm.org/D45268

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329440
2018-04-06 18:24:49 +00:00
Dmitry Preobrazhensky 306b1a0119 [AMDGPU][MC][GFX9] Added instruction s_endpgm_ordered_ps_done
See bug 36844: https://bugs.llvm.org/show_bug.cgi?id=36844

Differential Revision: https://reviews.llvm.org/D45313

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329430
2018-04-06 17:25:00 +00:00
Dmitry Preobrazhensky f20aff565d [AMDGPU][MC][GFX9] Added instructions *saveexec*, *wrexec* and *bitreplicate*
See bug 36840: https://bugs.llvm.org/show_bug.cgi?id=36840

Differential Revision: https://reviews.llvm.org/D45250

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329419
2018-04-06 16:35:11 +00:00
Dmitry Preobrazhensky 59399ae4cc [AMDGPU][MC][VI][GFX9] Added s_atc_probe* instructions
See bug 36839: https://bugs.llvm.org/show_bug.cgi?id=36839

Differential Revision: https://reviews.llvm.org/D45249

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329408
2018-04-06 15:48:39 +00:00
Dmitry Preobrazhensky 4732d876ee [AMDGPU][MC][GFX9] Added s_dcache_discard* instructions
See bug 36838: https://bugs.llvm.org/show_bug.cgi?id=36838

Differential Revision: https://reviews.llvm.org/D45247

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329397
2018-04-06 15:08:42 +00:00
Konstantin Zhuravlyov c233ae8004 AMDGPU/Metadata: Always report a fixed number of hidden arguments
Currently it is 6. If the "feature" was not used, report dummy
hidden argument. Otherwise it does not match the kernarg size
reported in the kernel header.

Differential Revision: https://reviews.llvm.org/D45129

llvm-svn: 329341
2018-04-05 20:46:04 +00:00
Simon Pilgrim 1d793b8ac5 [SchedModel] Complete models shouldn't match against itineraries when they don't use them (PR35639)
For schedule models that don't use itineraries, checkCompleteness still checks that an instruction has a matching itinerary instead of skipping and going straight to matching the InstRWs. That doesn't seem to match what happens in TargetSchedule.cpp

This patch causes problems for a number of models that had been incorrectly flagged as complete.

Differential Revision: https://reviews.llvm.org/D43235

llvm-svn: 329280
2018-04-05 13:11:36 +00:00
Dmitry Preobrazhensky 523872ea59 [AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI
See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958

Differential Revision: https://reviews.llvm.org/D45099

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329197
2018-04-04 13:54:55 +00:00
Dmitry Preobrazhensky a0b8cd038c [AMDGPU][MC] Added support of 3-element addresses for MIMG instructions
See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999

Differential Revision: https://reviews.llvm.org/D45084

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329187
2018-04-04 13:01:17 +00:00
Nico Weber 1cbd096914 Sort targetgen calls in lib/Target/*/CMakeLists.
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.

Also remove some stale-looking comments from the Nios2 target cmakefile.

No intended behavior change.

llvm-svn: 329181
2018-04-04 12:37:44 +00:00
Nicolai Haehnle 2f5a73820c AMDGPU: Dimension-aware image intrinsics
Summary:
These new image intrinsics contain the texture type as part of
their name and have each component of the address/coordinate as
individual parameters.

This is a preparatory step for implementing the A16 feature, where
coordinates are passed as half-floats or -ints, but the Z compare
value and texel offsets are still full dwords, making it difficult
or impossible to distinguish between A16 on or off in the old-style
intrinsics.

Additionally, these intrinsics pass the 'texfailpolicy' and
'cachectrl' as i32 bit fields to reduce operand clutter and allow
for future extensibility.

v2:
- gather4 supports 2darray images
- fix a bug with 1D images on SI

Change-Id: I099f309e0a394082a5901ea196c3967afb867f04

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44939

llvm-svn: 329166
2018-04-04 10:58:54 +00:00
Nicolai Haehnle 3ffd383a15 AMDGPU: Fix copying i1 value out of loop with non-uniform exit
Summary:
When an i1-value is defined inside of a loop and used outside of it, we
cannot simply use the SGPR bitmask from the loop's last iteration.

There are also useful and correct cases of an i1-value being copied between
basic blocks, e.g. when a condition is computed outside of a loop and used
inside it. The concept of dominators is not sufficient to capture what is
going on, so I propose the notion of "lane-dominators".

Fixes a bug encountered in Nier: Automata.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D40547

llvm-svn: 329164
2018-04-04 10:57:58 +00:00
Farhana Aleen e80aeac0f2 [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3.
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.

Author: FarhanaAleen

Reviewed By: arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D45219

llvm-svn: 329131
2018-04-03 23:00:30 +00:00
Farhana Aleen 936947349a Revert "MSG"
This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.

This was committed by mistake.

llvm-svn: 329119
2018-04-03 21:51:45 +00:00
Farhana Aleen 3ab409dc86 MSG
llvm-svn: 329114
2018-04-03 21:20:39 +00:00
Dmitry Preobrazhensky b181c7312e [AMDGPU][MC][GFX9] Added instructions v_cvt_norm_*16_f16, v_sat_pk_u8_i16
See bug 36847: https://bugs.llvm.org/show_bug.cgi?id=36847

Differential Revision: https://reviews.llvm.org/D45097

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328988
2018-04-02 17:09:20 +00:00
Dmitry Preobrazhensky 6bad04ecf5 [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
Fixed a bug which caused Tablegen crash.

See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328983
2018-04-02 16:10:25 +00:00
Nico Weber f492f58182 Revert r328975, it makes TableGen assert on the bots.
llvm-svn: 328978
2018-04-02 14:20:23 +00:00
Dmitry Preobrazhensky 32c450ae6a [AMDGPU][MC][GFX9] Added s_atomic_* and s_buffer_atomic_* instructions
See bug 36837: https://bugs.llvm.org/show_bug.cgi?id=36837

Differential Revision: https://reviews.llvm.org/D45085

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328975
2018-04-02 13:52:23 +00:00
Nicolai Haehnle 4254d45a79 AMDGPU: Make isIntrinsicSourceOfDivergence table-driven
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44938

llvm-svn: 328939
2018-04-01 17:09:14 +00:00
Nicolai Haehnle 5d0d30304c AMDGPU: Make getTgtMemIntrinsic table-driven for resource-based intrinsics
Summary:
Avoids having to list all intrinsics manually.

This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44937

llvm-svn: 328938
2018-04-01 17:09:07 +00:00
Stanislav Mekhanoshin 74e2974ac6 [AMDGPU] Fixed some instructions latencies
Differential Revision: https://reviews.llvm.org/D45073

llvm-svn: 328874
2018-03-30 16:19:13 +00:00
Michael Bedy 59e5ef793c [AMDGPU] Fix the SDWA Peephole phase to handle src for dst:UNUSED_PRESERVE.
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.

This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.

Reviewers: arsenm, #amdgpu

Reviewed By: arsenm, #amdgpu

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44364

llvm-svn: 328856
2018-03-30 05:03:36 +00:00
Matt Arsenault efd1b30436 AMDGPU: Fix build warning in release
llvm-svn: 328832
2018-03-29 21:44:44 +00:00
Matt Arsenault 03ae399d50 AMDGPU: Support realigning stack
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.

I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.

llvm-svn: 328831
2018-03-29 21:30:06 +00:00
Matt Arsenault ffb132e74b AMDGPU: Increase default stack alignment
8 and 16-byte values are common, so increase the default
alignment to avoid realigning the stack in most functions.

llvm-svn: 328821
2018-03-29 20:22:04 +00:00
Matt Arsenault 6c041a3cab AMDGPU: Fix selection error on constant loads with < 4 byte alignment
llvm-svn: 328818
2018-03-29 19:59:28 +00:00
Craig Topper 2fa1436206 [IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to CodeGen layer.
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.

The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.

Differential Revision: https://reviews.llvm.org/D45017

llvm-svn: 328806
2018-03-29 17:21:10 +00:00
David Blaikie a373d18eb7 Transforms: Introduce Transforms/Utils.h rather than spreading the declarations amongst Scalar.h and IPO.h
Fixes layering - Transforms/Utils shouldn't depend on including a Scalar
or IPO header, because Scalar and IPO depend on Utils.

llvm-svn: 328717
2018-03-28 17:44:36 +00:00
Dmitry Preobrazhensky 622bde8bc7 [AMDGPU][MC] Added ds_add_src2_f32
See bug 36833: https://bugs.llvm.org/show_bug.cgi?id=36833

Differential Revision: https://reviews.llvm.org/D44779

Reviewers: arsenm, artem.tamazov, timcorringham
llvm-svn: 328713
2018-03-28 16:21:56 +00:00
Dmitry Preobrazhensky 2456ac696a [AMDGPU][MC] Added PCK variants of image load/store instructions
See bug 36834: https://bugs.llvm.org/show_bug.cgi?id=36834

Differential Revision: https://reviews.llvm.org/D44795

Reviewers: artem.tamazov, arsenm, timcorringham, nhaehnle
llvm-svn: 328710
2018-03-28 15:44:16 +00:00
Dmitry Preobrazhensky a917e88585 [AMDGPU][MC][GFX9] Added buffer_*_format_d16_hi_x
See bug 36835: https://bugs.llvm.org/show_bug.cgi?id=36835

Differential Revision: https://reviews.llvm.org/D44825

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328707
2018-03-28 14:53:13 +00:00
Dmitry Preobrazhensky dd2b929ffb [AMDGPU][MC][GFX9] Added s_scratch* instructions
See bug 36836: https://bugs.llvm.org/show_bug.cgi?id=36836

Differential Revision: https://reviews.llvm.org/D44832

Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 328704
2018-03-28 14:08:03 +00:00
Tim Renouf cdac172e2a Revert "[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader"
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.

It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.

Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
2018-03-28 11:21:07 +00:00
Matt Arsenault bd49eccca1 AMDGPU: Really implement getFrameRegister
Currently this seems to only really be used for debug
info.

llvm-svn: 328677
2018-03-27 23:26:59 +00:00
Tim Renouf e4208bfa5b [AMDGPU] For OS type AMDPAL, fixed scratch on compute shader
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).

This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.

Reviewers: kzhuravl, nhaehnle, timcorringham

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm

Differential Revision: https://reviews.llvm.org/D44468

Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
2018-03-27 21:35:00 +00:00
Matt Arsenault 17f3338015 AMDGPU: Fix not preserving CSR VGPR if used for SGPR spills
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.

llvm-svn: 328659
2018-03-27 19:42:55 +00:00
Matt Arsenault 95329f8c53 AMDGPU: Set natural stack alignment in DataLayout
Only 4 byte alignment is ever useful, so increasing anything
beyond this may require realigning the stack.

llvm-svn: 328656
2018-03-27 19:26:40 +00:00
Matt Arsenault 0a0c871f60 AMDGPU: Fix crash when MachinePointerInfo invalid
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.

llvm-svn: 328652
2018-03-27 18:39:45 +00:00
Matt Arsenault e9f3679031 AMDGPU: Fix FP restore from being reordered with stack ops
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.

Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.

Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to preven this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.

llvm-svn: 328650
2018-03-27 18:38:51 +00:00
Tim Corringham 7116e8963d [AMDGPU] Improve disassembler error handling
Summary:
llvm-objdump now disassembles unrecognised opcodes as data, using
the .long directive. We treat unrecognised opcodes as being 32 bit
values, so move along 4 bytes rather than the single byte which
previously resulted in a cascade of bogus disassembly following an
unrecognised opcode.

While no solution can always disassemble code that contains
embedded data correctly this provides a significant improvement.

The disassembler will now cope with an arbitrary length section
as it no longer truncates it to a multiple of 4 bytes, and will
use the .byte directive for trailing bytes.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D44685

llvm-svn: 328553
2018-03-26 17:06:33 +00:00
Nicolai Haehnle 4f850eabb6 AMDGPU: Introduce common SOP_Pseudo and VOP_Pseudo TableGen base classes
Differential revision: https://reviews.llvm.org/D44820

Change-Id: I732979e2964006aa15d78a333d8886e6855f319a
llvm-svn: 328496
2018-03-26 13:56:53 +00:00
Mandeep Singh Grang 860adef9e6 [AMDGPU] Change std::sort to llvm::sort in response to r327219
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.

To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.

Reviewers: tstellar, RKSimon, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44856

llvm-svn: 328429
2018-03-24 17:15:04 +00:00
David Blaikie 36a0f226b1 Fix layering by moving ValueTypes.h from CodeGen to IR
ValueTypes.h is implemented in IR already.

llvm-svn: 328397
2018-03-23 23:58:31 +00:00
David Blaikie 13e77db2df Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

llvm-svn: 328395
2018-03-23 23:58:25 +00:00
David Blaikie 6054e650ff Move TargetLoweringObjectFile from CodeGen to Target to fix layering
It's implemented in Target & include from other Target headers, so the
header should be in Target.

llvm-svn: 328392
2018-03-23 23:58:19 +00:00
Tony Tye 7a893d4e34 [AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU
- Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target.
- Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS.

Differential Revision: https://reviews.llvm.org/D43736

llvm-svn: 328349
2018-03-23 18:45:18 +00:00
David Blaikie 2be3922807 Fix a couple of layering violations in Transforms
Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering.

Transforms depends on Transforms/Utils, not the other way around. So
remove the header and the "createStripGCRelocatesPass" function
declaration (& definition) that is unused and motivated this dependency.

Move Transforms/Utils/Local.h into Analysis because it's used by
Analysis/MemoryBuiltins.cpp.

llvm-svn: 328165
2018-03-21 22:34:23 +00:00
Nirav Dave 3264c1bdf6 [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying node id
invariant traversal and correcting typo.

llvm-svn: 327898
2018-03-19 20:19:46 +00:00
Nicolai Haehnle 4186cc7c08 TableGen: Check the dynamic type of !cast<Rec>(string)
Summary:
The docs already claim that this happens, but so far it hasn't. As a
consequence, existing TableGen files get this wrong a lot, but luckily
the fixes are all reasonably straightforward.

To make this work with all the existing forms of self-references (since
the true type of a record is only built up over time), the lookup of
self-references in !cast is delayed until the final resolving step.

Change-Id: If5923a72a252ba2fbc81a889d59775df0ef31164

Reviewers: arsenm, craig.topper, tra, MartinO

Subscribers: wdng, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D44475

llvm-svn: 327849
2018-03-19 14:14:20 +00:00
Matt Arsenault fed0a45036 AMDGPU/GlobalISel: RegBankSelect for basic int ops
llvm-svn: 327843
2018-03-19 14:07:23 +00:00
Matt Arsenault 69932e4d69 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

llvm-svn: 327842
2018-03-19 14:07:15 +00:00
Nirav Dave 5f0ab71b62 Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""
as it times out building test-suite on PPC.

llvm-svn: 327778
2018-03-17 19:24:54 +00:00
Nirav Dave 982d3a56ea [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying and reducing
node id invariant traversal.

llvm-svn: 327777
2018-03-17 17:42:10 +00:00
Matt Arsenault abdc4f2dc7 AMDGPU/GlobalISel: Cleanup constant legality
llvm-svn: 327774
2018-03-17 15:17:48 +00:00
Matt Arsenault 685d1e8157 AMDGPU/GlobalISel: Basic G_GEP legality
llvm-svn: 327773
2018-03-17 15:17:45 +00:00
Matt Arsenault 85803366d6 AMDGPU/GlobalISel: Basic legality for load/store
llvm-svn: 327772
2018-03-17 15:17:41 +00:00
Farhana Aleen c6c9dc8773 [AMDGPU] Supported ds_write_b128 generation.
Summary: This is a follow-on patch of https://reviews.llvm.org/D44210

Author: FarhanaAleen

Reviewed By: msearles

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44319

llvm-svn: 327726
2018-03-16 18:12:00 +00:00
Dmitry Preobrazhensky 4c8f4234b6 [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751

Differential Revision: https://reviews.llvm.org/D44529

Reviewers: artem.tamazov, arsenm
llvm-svn: 327723
2018-03-16 16:38:04 +00:00
Dmitry Preobrazhensky 9c1a6e7e24 [AMDGPU][MC] Corrected default values for unused SDWA operands
See bug 36355:  https://bugs.llvm.org/show_bug.cgi?id=36355

Differential Revision: https://reviews.llvm.org/D44481

Reviewers: artem.tamazov, arsenm
llvm-svn: 327720
2018-03-16 15:40:27 +00:00
Mark Searles c3c02bde73 [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434

llvm-svn: 327583
2018-03-14 22:04:32 +00:00
Dmitry Preobrazhensky d98c97b4f9 [AMDGPU][MC][GFX8] Added BUFFER_STORE_LDS_DWORD Instruction
See bug 36558: https://bugs.llvm.org/show_bug.cgi?id=36558

Differential Revision: https://reviews.llvm.org/D43950

Reviewers: artem.tamazov, arsenm
llvm-svn: 327299
2018-03-12 17:29:24 +00:00
Yaxun Liu a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
Dmitry Preobrazhensky da4a7c01bf [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
2018-03-12 15:03:34 +00:00
Matt Arsenault 7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault 503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Michael Bedy 80cf9ff564 Test commit - change comment slightly.
llvm-svn: 327234
2018-03-11 03:27:50 +00:00
Matt Arsenault cbda7ff4ae AMDGPU: Fix crash when constant folding with physreg operand
llvm-svn: 327209
2018-03-10 16:05:35 +00:00
Nirav Dave 042678bd55 Revert: r327172 "Correct load-op-store cycle detection analysis"
r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
        r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

Reverting patch as NodeId invariant change is causing pathological
increases in compile time on PPC

llvm-svn: 327197
2018-03-10 02:16:15 +00:00
Nirav Dave 071699bf82 [DAG] Enforce stricter NodeId invariant during Instruction selection
Instruction Selection makes use of the topological ordering of nodes
by node id (a node's operands have smaller node id than it) when doing
cycle detection.  During selection we may violate this property as a
selection of multiple nodes may induce a use dependence (and thus a
node id restriction) between two unrelated nodes. If a selected node
has an unselected successor this may allow us to miss a cycle in
detection an invalid selection.

This patch fixes this by marking all unselected successors of a
selected node have negated node id.  We avoid pruning on such negative
ids but still can reconstruct the original id for pruning.

In-tree targets have been updated to replace DAG-level replacements
with ISel-level ones which enforce this property.

This preemptively fixes PR36312 before triggering commit r324359 relands

Reviewers: craig.topper, bogner, jyknight

Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43198

llvm-svn: 327170
2018-03-09 20:57:15 +00:00
Farhana Aleen a7cb31123c [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
         This patch supports ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widen the vector length so that vectorizer generates
         128 bit loads for local address-space which gets translated to ds_read_b128.
         Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

llvm-svn: 327153
2018-03-09 17:41:39 +00:00
Stanislav Mekhanoshin c8127fc674 [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D44279

llvm-svn: 327106
2018-03-09 07:21:43 +00:00
Matt Arsenault c3fe46bbcf AMDGPU/GlobalISel: Pass subtarget + TM to LegalizerInfo
These are the parameters x86 already uses.

llvm-svn: 327020
2018-03-08 16:24:16 +00:00
Farhana Aleen 89196642f7 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

llvm-svn: 326910
2018-03-07 17:09:18 +00:00
Farhana Aleen 347d12b4ce Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

llvm-svn: 326907
2018-03-07 16:55:27 +00:00
Farhana Aleen 0d03d0588d [AMDGPU] Widened vector length for global/constant address space.
llvm-svn: 326904
2018-03-07 16:29:05 +00:00
Craig Topper 80d3bb3b4b [TargetLowering] Rename DAGCombinerInfo::isAfterLegalizeVectorOps to DAGCombiner::isAfterLegalizeDAG since that's what it checks. NFC
The code checks Level == AfterLegalizeDAG which is the fourth and last of the possible DAG combine stages that we have.

There is a Level called AfterLegalVectorOps, but that's the third DAG combine and it doesn't always run.

A function called isAfterLegalVectorOps should imply it returns true in either of the DAG combines that runs after the legalize vector ops stage, but that's not what this function does.

llvm-svn: 326832
2018-03-06 19:44:52 +00:00
Stanislav Mekhanoshin 0f72225433 [AMDGPU] Add default ISA version targets
In case if -mattr used to modify feature set bits in llvm-mc call
getIsaVersion can fail to identify specific ISA due to test mismatch.
Adding default fallback tests which will always correctly report at
least major version.

Differential Revision: https://reviews.llvm.org/D44163

llvm-svn: 326825
2018-03-06 18:33:55 +00:00
Yaxun Liu 46439e8d4a [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785

llvm-svn: 326806
2018-03-06 16:04:39 +00:00
Matt Arsenault e31ab94e97 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
llvm-svn: 326715
2018-03-05 16:25:18 +00:00
Matt Arsenault 71272e6d4e AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

llvm-svn: 326714
2018-03-05 16:25:15 +00:00
Matt Arsenault 4cc0b85276 AMDGPU: Fix build warning about override
llvm-svn: 326713
2018-03-05 16:25:10 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Matt Arsenault b9699c009d AMDGPU/GlobalISel: InstrMapping for G_ZEXT
llvm-svn: 326589
2018-03-02 16:55:37 +00:00
Matt Arsenault 1c1aab99ae AMDGPU/GlobalISel: InstrMapping for G_TRUNC
llvm-svn: 326588
2018-03-02 16:55:33 +00:00
Matt Arsenault ef8db767d7 AMDGPU/GlobalISel: Define InstrMappings for G_FCMP
Patch by Tom Stellard

llvm-svn: 326587
2018-03-02 16:53:15 +00:00
Matt Arsenault 2607dc60de AMDGPU/GlobalISel: Define instruction mapping for @llvm.minnum
Patch by Tom Stellard

llvm-svn: 326586
2018-03-02 16:40:17 +00:00
Matt Arsenault b46c191c49 AMDGPU/GlobalISel: Define instruction mapping for @llvm.maxnum
Patch by Tom Stellard

llvm-svn: 326567
2018-03-02 12:23:00 +00:00
Jan Vesely b283ea0f0f AMDGPU/GCN: Promote i16 ctpop
i16 capable ASICs do not support i16 operands for this instruction.
Add tablegen pattern to merge chained i16 additions.

Differential Revision: https://reviews.llvm.org/D43985

llvm-svn: 326535
2018-03-02 02:50:22 +00:00
Matt Arsenault 41d2e3d98e AMDGPU/GlobalISel: Define instruction mapping for G_FPTOSI
Patch by Tom Stellard

llvm-svn: 326534
2018-03-02 02:19:16 +00:00
Matt Arsenault b23041ad4d AMDGPU/GlobalISel: Define instruction mapping for G_FPTOUI
Patch by Tom Stellard

llvm-svn: 326533
2018-03-02 02:19:11 +00:00
Matt Arsenault 327d5fb2e5 AMDGPU/GlobalISel: Define instruction mapping for G_FMUL
llvm-svn: 326532
2018-03-02 02:17:01 +00:00
Matt Arsenault 5a9e834eac AMDGPU/GlobalISel: Define instruction mapping for G_FADD
Patch by Tom Stellard

llvm-svn: 326526
2018-03-02 01:22:13 +00:00
Matt Arsenault d99317f1b3 AMDGPU/GlobalISel: Define instruction mapping for G_SHL
Patch by Tom Stellard

llvm-svn: 326525
2018-03-02 01:22:10 +00:00
Matt Arsenault 3c7a123ccc AMDGPU/GlobalISel: Define instruction mapping for G_XOR
llvm-svn: 326524
2018-03-02 01:22:06 +00:00
Matt Arsenault c0f34c9e36 AMDGPU/GlobalISel: Define instruction mapping for G_AND
Patch by Tom Stellard

llvm-svn: 326523
2018-03-02 01:22:01 +00:00
Matt Arsenault 364f12e8f9 AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.cvt.pkrtz
Patch by Tom Stellard

llvm-svn: 326490
2018-03-01 21:25:30 +00:00
Matt Arsenault 5320ee4a05 AMDGPU/GlobalISel: Define instruction mapping for G_OR
Patch by Tom Stellard

llvm-svn: 326489
2018-03-01 21:25:25 +00:00
Matt Arsenault e65404f5c5 AMDGPU/GlobalISel: Remove default register mapping
This crashes for some opcodes, which prevents the SelectionDAG
fallback from working.

Patch by Tom Stellard

llvm-svn: 326487
2018-03-01 21:20:44 +00:00
Matt Arsenault 1422a19a88 AMDGPU/GlobalISel: Use a more correct getValueMapping
This was finding the wrong size registers for anything with
more than 2 components.

Patch by Tom Stellard

llvm-svn: 326483
2018-03-01 21:08:51 +00:00
Matt Arsenault 62669ede94 AMDGPU/GlobalISel: Define instruction mapping for G_BITCAST
Patch by Tom Stellard

llvm-svn: 326482
2018-03-01 20:59:44 +00:00
Matt Arsenault 0529a8e2de AMDGPU/GlobalISel: Mark i32->i64 zext as legal
llvm-svn: 326481
2018-03-01 20:56:21 +00:00
Matt Arsenault 36b99e1937 AMDGPU/GlobalISel: InstrMapping for llvm.amdgcn.exp.compr
Patch by Tom Stellard

llvm-svn: 326479
2018-03-01 20:40:55 +00:00
Matt Arsenault 8931bbf8df AMDGPU/GlobalISel: Define instruction mapping for @llvm.amdgcn.exp
Patch by Tom Stellard

llvm-svn: 326477
2018-03-01 20:24:37 +00:00
Matt Arsenault 50721ab325 AMDGPU/GlobalISel: Define InstrMappings for G_ICMP
Patch by Tom Stellard

llvm-svn: 326472
2018-03-01 19:27:10 +00:00
Matt Arsenault dc14ec05d4 AMDGPU/GlobalISel: Make i32 mul legal
llvm-svn: 326471
2018-03-01 19:22:05 +00:00
Matt Arsenault 06cbb27a79 AMDGPU/GlobalISel: Define instruction mapping for G_IMPLICIT_DEF
Patch by Tom Stellard

llvm-svn: 326470
2018-03-01 19:16:52 +00:00
Matt Arsenault e3d9ecf2b9 AMDGPU/GlobalISel: Define instruction mapping for G_FCONSTANT
Patch by Tom Stellard

llvm-svn: 326468
2018-03-01 19:13:30 +00:00
Matt Arsenault 51b0b20023 AMDGPU/GlobalISel: Add copyCost for VGPR->SGPR copies
Patch by Tom Stellard

llvm-svn: 326467
2018-03-01 19:09:25 +00:00
Matt Arsenault 3f6a204eaa AMDGPU/GlobalISel: Make i32 xor legal
llvm-svn: 326466
2018-03-01 19:09:21 +00:00
Matt Arsenault 8e80a5fbca AMDGPU/GlobalISel: Mark 32/64-bit G_FCMP as legal
Patch by Tom Stellard

llvm-svn: 326465
2018-03-01 19:09:16 +00:00
Matt Arsenault dd022ce064 AMDGPU/GlobalISel: Mark 32-bit G_FPTOSI as legal
Patch by Tom Stellard

llvm-svn: 326464
2018-03-01 19:04:25 +00:00
Alexander Timofeev 0081d23fd8 [AMDGPU] : fix for the crash in SIRegisterInfo when the regiser class not found
Differential revision: https://reviews.llvm.org./D43334

llvm-svn: 326451
2018-03-01 17:36:43 +00:00
Tim Renouf 2a99fa2c08 [AMDGPU] added writelane intrinsic
Summary:
For use by LLPC SPV_AMD_shader_ballot extension.

The v_writelane instruction was already implemented for use by SGPR
spilling, but I had to add an extra dummy operand tied to the
destination, to represent that all lanes except the selected one keep
the old value of the destination register.

.ll test changes were due to schedule changes caused by that new
operand.

Differential Revision: https://reviews.llvm.org/D42838

llvm-svn: 326353
2018-02-28 19:10:32 +00:00
Konstantin Zhuravlyov 40b09e86b9 AMDGPU: Add fast fmaf feature to gfx702
Differential Revision: https://reviews.llvm.org/D43790

llvm-svn: 326252
2018-02-27 21:46:15 +00:00
Matt Arsenault 2a26a286db AMDGPU/GlobalISel: Make f64 constants legal
llvm-svn: 326101
2018-02-26 17:20:43 +00:00
Tim Renouf 832f90fa0c [AMDGPU] Scratch setup fix on AMDPAL gfx9+ merge shader
Summary:
With OS type AMDPAL, the scratch descriptor is hardwired to be loaded
from offset 0 of the global information table, whose low pointer is
passed in s0. For a merge shader on gfx9+, it needs to be s8 instead, as
the hardware reserves s0-s7.

Reviewers: kzhuravl

Subscribers: arsenm, nhaehnle, dstuttard, llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D42203

llvm-svn: 326088
2018-02-26 14:46:43 +00:00
Stanislav Mekhanoshin fa48c496e2 [AMDGPU] Shrinking V_SUBBREV_U32
V_SUBBREV_U32 is a commute opcode for V_SUBB_U32. However, when
we try to commute V_SUBB_U32 in order to shrink it we do not then
process V_SUBBREV_U32 and it stay VOP3. This is fixed.

Differential Revision: https://reviews.llvm.org/D43699

llvm-svn: 326011
2018-02-24 01:32:32 +00:00
Geoff Berry d6ba3dbbbd Fix compiler warning introduced in r325931. NFC.
llvm-svn: 325938
2018-02-23 19:11:33 +00:00
Geoff Berry f8bf2ec0a8 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Nicolai Haehnle 6cf306deca AMDGPU: Track physreg uses in SILoadStoreOptimizer
Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).

Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42647

llvm-svn: 325882
2018-02-23 10:45:56 +00:00
Nicolai Haehnle 40b140fef1 AMDGPU: Stop using .NAME in .td files
Summary:
.NAME is a bit of an odd duck, in that we should really treat it like
a template argument, but we currently don't, and so when and where
NAME is initialized and how is pretty inconsistent. Best to just avoid
using it as a field of already instantiated records, and use cast to
string instead.

Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D43551

llvm-svn: 325794
2018-02-22 15:25:11 +00:00
Hiroshi Inoue 7f9f92f8b6 [NFC] fix trivial typos in comments
"a a" -> "a"

llvm-svn: 325752
2018-02-22 07:48:29 +00:00
Nicolai Haehnle 770397f4cd AMDGPU: Do not combine loads/store across physreg defs
Summary:
Since this pass operates on machine SSA form, this should only really
affect M0 in practice.

Fixes various piglit variable-indexing/vs-varying-array-mat4-index-*

Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8
Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4")

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40343

llvm-svn: 325677
2018-02-21 13:31:35 +00:00
Dmitry Preobrazhensky d6e1a9404d [AMDGPU][MC] Added lds support for MUBUF instructions
See bug 28234: https://bugs.llvm.org/show_bug.cgi?id=28234

Differential Revision: https://reviews.llvm.org/D43472

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 325676
2018-02-21 13:13:48 +00:00
Konstantin Zhuravlyov 5c1237a1fd Revert "[AMDGPU] Increased vector length for global/constant loads."
https://reviews.llvm.org/rL325518

It breaks following OpenCL conformance tests:
  - Basic - parameter_types
  - Basic - vload_private

llvm-svn: 325643
2018-02-20 23:30:21 +00:00
Tim Renouf 8234b4893a [AMDGPU] stop buffer_store being moved illegally
Summary:
The machine instruction scheduler was illegally moving a buffer store
past a buffer load with the same descriptor and offset. Fixed by marking
buffer ops as mayAlias and isAliased. This may be overly conservative,
and we may need to revisit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D43332

Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7
llvm-svn: 325567
2018-02-20 10:03:38 +00:00
Mark Searles 65207923f6 [AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up.
llvm-svn: 325524
2018-02-19 19:19:59 +00:00
Mark Searles 419bdab759 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D43275

llvm-svn: 325518
2018-02-19 16:42:49 +00:00
Jonas Paulsson b51a9bc358 [AMDGPU] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Stanislav Mekhanoshin, Tom Stellard.
llvm-svn: 325425
2018-02-17 10:00:28 +00:00
Konstantin Zhuravlyov ef9aafcddc AMDGPU: Remove unused private member of AMDGPUTargetELFStreamer
llvm-svn: 325408
2018-02-16 23:04:11 +00:00
Eric Christopher 8ceddb0ecd Remove an unused function.
llvm-svn: 325403
2018-02-16 22:46:47 +00:00
Konstantin Zhuravlyov 9122a63143 AMDGPU: Bring elf flags in sync with the spec
- Add MACH flags
- Add XNACK flag
- Add reserved flags
- Minor cleanups in docs

Differential Revision: https://reviews.llvm.org/D43356

llvm-svn: 325399
2018-02-16 22:33:59 +00:00
Konstantin Zhuravlyov 331f97e171 AMDGPU: Bring processors and features in sync with the spec
- Remove gfx800
- Make iceland gfx802
- Add xnack to gfx902

Differential Revision: https://reviews.llvm.org/D43355

llvm-svn: 325393
2018-02-16 21:26:25 +00:00
Changpeng Fang ba92059ca9 AMDGPU/SI: Extend promoting alloca to vector to arrays of up to 16 elements
Summary:
  This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.

Reviewers:
  arsenm

Differential Revision:
  https://reviews.llvm.org/D33559

llvm-svn: 325372
2018-02-16 19:14:17 +00:00
Changpeng Fang da38b5fd49 AMDGPU/SI: Turn off GPR Indexing Mode immediately after the interested instruction.
Summary:
  In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction
is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect
vgpr.
 In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction.

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D43297

llvm-svn: 325355
2018-02-16 16:31:30 +00:00
Stanislav Mekhanoshin ff2763a658 [AMDGPU] Combine adjacent waitcounts in a single strongest wait
Differential Revision: https://reviews.llvm.org/D43350

llvm-svn: 325299
2018-02-15 22:03:55 +00:00
Stanislav Mekhanoshin c078ca92eb [AMDGPU] Remove non-temporal flag from argument loads
Kernel arguments likely read by all workitems and should not bypass
cache. Fixes performance hit in sub-dword argument loads.

Differential Revision: https://reviews.llvm.org/D43249

llvm-svn: 325146
2018-02-14 18:05:14 +00:00
Yaxun Liu 0124b5484c [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170

llvm-svn: 325030
2018-02-13 18:00:25 +00:00
Daniel Neilson a60f4621ae [AMDGPUPromoteAlloca] Replace deprecated memory intrinsic APIs (NFCI)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AMDGPUPromoteAlloca pass to cease using:
1) The old getAlignment() API of MemoryIntrinsic in favour of getting source & dest specific
alignments through the new API.
2) The old IRBuilder createMemCpy/createMemMove single-alignment APIs in favour of the new
API that allows setting source and destination alignments independently.

Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, r323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.

Reference
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

llvm-svn: 324774
2018-02-09 21:56:15 +00:00
Matt Arsenault 0063ce7201 AMDGPU: Remove tied operand from si_else
llvm-svn: 324751
2018-02-09 17:18:38 +00:00
Matt Arsenault 923712b6b5 Reapply "AMDGPU: Add 32-bit constant address space"
This reverts r324494 and reapplies r324487.

llvm-svn: 324747
2018-02-09 16:57:57 +00:00
Matt Arsenault bcf7bec4b8 AMDGPU: Fix layering issue
Move utility function that depends on codegen.
Fixes build with r324487 reapplied.

llvm-svn: 324746
2018-02-09 16:57:48 +00:00
Stanislav Mekhanoshin 9c6cd0458b [AMDGPU] More descriptive names in the memory legalizer
NFC.

Differential Revision: https://reviews.llvm.org/D43054

llvm-svn: 324712
2018-02-09 06:05:33 +00:00
Matt Arsenault 9c2f3c4852 AMDGPU: Process SDWA block at a time
Right now this loops over the entire function every time there
is a change, which is not very efficient. There's no practical
reason to track this so globally, since the code motion optimization
passes should be sinking instructions with single uses and
the pass currently will not fold with multiple uses.

llvm-svn: 324667
2018-02-08 22:46:41 +00:00
Matt Arsenault c24d5e2819 AMDGPU: Minor cleanups
Column limit, typo, unnecessary reference

llvm-svn: 324666
2018-02-08 22:46:38 +00:00
Matt Arsenault b02cebf552 AMDGPU: Fix incorrect reordering when inline asm defines LDS address
Defs of operands outside of the instruction's explicit defs need
to be checked.

llvm-svn: 324554
2018-02-08 01:56:14 +00:00
Matt Arsenault c908e3f77a AMDGPU: Don't crash when trying to fold implicit operands
llvm-svn: 324550
2018-02-08 01:12:46 +00:00
Stanislav Mekhanoshin db39b4b0b4 [AMDGPU] Fixed wait count reuse
The code reusing existing wait counts is incorrect since it keeps
adding new operands to an old instruction instead of replacing
the immediate. It was also effectively switched off by the condition
that wait count is not an AMDGPU::S_WAITCNT.

Also switched to BuildMI instead of creating instructions directly.

Differential Revision: https://reviews.llvm.org/D42997

llvm-svn: 324547
2018-02-08 00:18:35 +00:00
Rafael Espindola f4e3f3e31c Revert "AMDGPU: Add 32-bit constant address space"
This reverts commit r324487.

It broke clang tests.

llvm-svn: 324494
2018-02-07 18:09:35 +00:00
Marek Olsak 871c30e540 AMDGPU: Add 32-bit constant address space
Note: This is a candidate for LLVM 6.0, because it was planned to be
      in that release but was delayed due to a long review period.

Merge conflict in release_60 - resolution:
    Add "-p6:32:32" into the second (non-amdgiz) string.

Only scalar loads support 32-bit pointers. An address in a VGPR will
fail to compile. That's OK because the results of loads will only be used
in places where VGPRs are forbidden.

Updated AMDGPUAliasAnalysis and used SReg_64_XEXEC.
The tests cover all uses cases we need for Mesa.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D41651

llvm-svn: 324487
2018-02-07 16:01:00 +00:00
Marek Olsak b2cc77985b AMDGPU: Remove the s_buffer workaround for GFX9 chips
Summary:
I checked the AMD closed source compiler and the workaround is only
needed when x3 is emulated as x4, which we don't do in LLVM.

SMEM x3 opcodes don't exist, and instead there is a possibility to use x4
with the last component being unused. If the last component is out of
buffer bounds and falls on the next 4K page, the hw hangs.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D42756

llvm-svn: 324486
2018-02-07 16:00:40 +00:00
Tom Stellard 33445765dd AMDGPU/GlobalISel: Mark 32-bit G_FPTOUI as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D42152

llvm-svn: 324446
2018-02-07 04:47:59 +00:00
Mark Searles 24c92eeb83 [AMDGPU] Suppress redundant waitcnt instrs.
1. Run the memory legalizer prior to the waitcnt pass; keep the policy that the waitcnt pass does not remove any waitcnts within the incoming IR.

2. The waitcnt pass doesn't (yet) track waitcnts that exist prior to the waitcnt pass (it just skips over them); because the waitcnt pass is ignorant of them, it may insert a redundant waitcnt. To avoid this, check the prev instr. If it and the to-be-inserted waitcnt are the same, suppress the insertion. We keep the existing waitcnt under the assumption that whomever, e.g., the memory legalizer, inserted it knows what they were doing.

3. Follow-on work: teach the waitcnt pass to record the pre-existing waitcnts for better waitcnt production.

Differential Revision: https://reviews.llvm.org/D42854

llvm-svn: 324440
2018-02-07 02:21:21 +00:00
Matt Arsenault a18b3bcf51 AMDGPU: Select BFI patterns with 64-bit ints
llvm-svn: 324431
2018-02-07 00:21:34 +00:00
Stanislav Mekhanoshin ce2d428a98 [AMDGPU] removed dead code handling rmw in memory legalizer
It was always using cmpxchg path and in rmw and cmpxchg instructions
are not distinguishable in the BE.

Differential Revision: https://reviews.llvm.org/D42976

llvm-svn: 324383
2018-02-06 19:11:56 +00:00
Marek Olsak 7d92b7e23a AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Author: Bas Nieuwenhuizen

https://reviews.llvm.org/D42881

llvm-svn: 324353
2018-02-06 15:17:55 +00:00
Tim Renouf 807ecc3d66 [AMDGPU] do not generate .AMDGPU.config for amdpal os type
Summary:
Now we generate PAL metadata for the amdpal os type, there is no need to
generate the .AMDGPU.config section.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37760

Change-Id: I303c5fad66656ce97293da60621afac6595b4c18
llvm-svn: 324346
2018-02-06 13:39:38 +00:00
Konstantin Zhuravlyov 8818d13ed2 AMDGPU/MemoryModel: Fix monotonic atomic loads
Those should have glc bit set for system and agent synchronization scopes

llvm-svn: 324314
2018-02-06 04:06:04 +00:00
Dmitry Preobrazhensky 0a1ff464e1 [AMDGPU][MC] Corrected dst/data size for MIMG opcodes with d16 modifier
See bug 36154: https://bugs.llvm.org/show_bug.cgi?id=36154

Differential Revision: https://reviews.llvm.org/D42847

Reviewers: cfang, artem.tamazov, arsenm
llvm-svn: 324237
2018-02-05 14:18:53 +00:00
Dmitry Preobrazhensky e3271aee44 [AMDGPU][MC] Added validation of d16 and r128 modifiers of MIMG opcodes
See bugs 36094, 36095:
  https://bugs.llvm.org/show_bug.cgi?id=36094
  https://bugs.llvm.org/show_bug.cgi?id=36095

Differential Revision: https://reviews.llvm.org/D42692

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 324231
2018-02-05 12:45:43 +00:00
Yaxun Liu 2a22c5deff [AMDGPU] Switch to the new addr space mapping by default
This requires corresponding clang change.

Differential Revision: https://reviews.llvm.org/D40955

llvm-svn: 324101
2018-02-02 16:07:16 +00:00
Changpeng Fang 29fcf883fb AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature.
Reviewers:
  Matt and Brian

Differential Revision:
  https://reviews.llvm.org/D42548

llvm-svn: 323988
2018-02-01 18:41:33 +00:00
Matt Arsenault af88f0eb44 AMDGPU: Fix missing SCC def from s_xor_b64_term
llvm-svn: 323927
2018-01-31 22:54:27 +00:00
Marek Olsak d4bb329d0e AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

llvm-svn: 323909
2018-01-31 20:18:11 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Geoff Berry 1d53101387 [AMDGPU] isRenamable fixes to support copy forwarding
Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.

These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).

llvm-svn: 323794
2018-01-30 17:37:39 +00:00
Mark Searles 94ae3b2f9b [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\
teps/build_Lld/logs/stdio :
        /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
          static int32_t InstCnt = 0;
                                              "
This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.

llvm-svn: 323791
2018-01-30 17:17:06 +00:00
Mark Searles d6d5a2571f [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 323788
2018-01-30 16:49:38 +00:00
Changpeng Fang 0905870f93 AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace.
Reviewer:
  Dmitry (dp).

Differential Revision:
  https://reviews.llvm.org/D42596

llvm-svn: 323785
2018-01-30 16:42:40 +00:00
Tom Stellard 3ae38d271e AMDGPU: Move ADDRIndirect complex pattern into R600Instructions.td
Summary: This is only used by R600.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D37114

llvm-svn: 323709
2018-01-29 23:29:26 +00:00
Marek Olsak 48057b554c AMDGPU: Allow a SGPR for the conditional KILL operand
Patch by: Bas Nieuwenhuizen

Just use the _e64 variant if needed. This should be possible as per

def : Pat <
  (int_amdgcn_kill (i1 (setcc f32:$src, InlineFPImm<f32>:$imm, cond:$cond))),
  (SI_KILL_F32_COND_IMM_PSEUDO $src, (bitcast_fpimm_to_i32 $imm), (cond_as_i32imm $cond))
> ;

I don't think we can get an immediate for the other operand for which we
need the second 32-bit word.

https://reviews.llvm.org/D42302

llvm-svn: 323706
2018-01-29 23:19:10 +00:00
Geoff Berry d37dc77b6e [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regs
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.

llvm-svn: 323675
2018-01-29 18:47:48 +00:00
Daniel Sanders 9ade5592d9 [globalisel] Make LegalizerInfo::LegalizeAction available outside of LegalizerInfo. NFC
Summary:
The improvements to the LegalizerInfo discussed in D42244 require that
LegalizerInfo::LegalizeAction be available for use in other classes. As such,
it needs to be moved out of LegalizerInfo. This has been done separately to the
next patch to minimize the noise in that patch.

llvm-svn: 323669
2018-01-29 17:37:29 +00:00
Dmitry Preobrazhensky 4f321aef74 [AMDGPU][MC] Corrected parsing of image opcode modifiers r128 and d16
See bugs 36092, 36093:
    https://bugs.llvm.org/show_bug.cgi?id=36092
    https://bugs.llvm.org/show_bug.cgi?id=36093

Differential Revision: https://reviews.llvm.org/D42583

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323651
2018-01-29 14:20:42 +00:00
Hiroshi Inoue c8e9245816 [NFC] fix trivial typos in comments and documents
"to to" -> "to"

llvm-svn: 323628
2018-01-29 05:17:03 +00:00
Dmitry Preobrazhensky 706828157f [AMDGPU][MC] Added validation of image dst/data size (must match dmask and tfe)
See bug 36000: https://bugs.llvm.org/show_bug.cgi?id=36000

Differential Revision: https://reviews.llvm.org/D42483

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323538
2018-01-26 16:42:51 +00:00
Dmitry Preobrazhensky 0b4eb1ead1 [AMDGPU][MC] Added support of 64-bit image atomics
See bug 35998: https://bugs.llvm.org/show_bug.cgi?id=35998

Differential Revision: https://reviews.llvm.org/D42469

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323534
2018-01-26 15:43:29 +00:00
Dmitry Preobrazhensky 6cb42e7622 [AMDGPU][MC] Enabled disassembler for image atomic operations
See bug 35988: https://bugs.llvm.org/show_bug.cgi?id=35988

Differential Revision: https://reviews.llvm.org/D42186

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 323527
2018-01-26 14:07:38 +00:00
Daniil Fukalov 6e1dc68117 [AMDGPU] fix LDS f32 intrinsics
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic

Reviewed by: b-sumner

Differential Revision: https://reviews.llvm.org/D42383

llvm-svn: 323516
2018-01-26 11:09:38 +00:00
Geoff Berry c4796d4745 [AMDGPU] Make sure all super regs of reserved regs are marked reserved.
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.

Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.

Reviewers: arsenm, tstellar, MatzeB, qcolombet

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D42448

llvm-svn: 323356
2018-01-24 18:09:53 +00:00
Hiroshi Inoue 501931b117 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323302
2018-01-24 05:04:35 +00:00
Mark Searles 7687d42052 [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64
- Change inserted add ( V_ADD_{I|U}32_e32 ) to _e64 version ( V_ADD_{I|U}32_e64 ) so that the add uses a vreg for the carry; this prevents inserted v_add from killing VCC; the _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instr as well to get the imm into a register.
- Change pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

Differential Revision: https://reviews.llvm.org/D42124

llvm-svn: 323153
2018-01-22 21:46:43 +00:00
Hiroshi Inoue 290adb3184 [NFC] fix trivial typos in comments
"the the" -> "the"

llvm-svn: 323074
2018-01-22 05:54:46 +00:00
Dmitry Preobrazhensky 0e074e349d [AMDGPU][MC] Corrected parsing of image modifiers and encoding of image atomics
See bugs
    35962: https://bugs.llvm.org/show_bug.cgi?id=35962
    35963: https://bugs.llvm.org/show_bug.cgi?id=35963

Differential Revision: https://reviews.llvm.org/D42184

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322942
2018-01-19 13:49:53 +00:00
Matthias Braun 4a7c8e7aa2 Split MachineLICM into EarlyMachineLICM and MachineLICM; NFC
This avoids playing games with pseudo pass IDs and avoids using an
unreliable MRI::isSSA() check to determine whether register allocation
has happened.

Note that this renames:
- MachineLICMID -> EarlyMachineLICM
- PostRAMachineLICMID -> MachineLICMID
to be consistent with the EarlyTailDuplicate/TailDuplicate naming.

llvm-svn: 322927
2018-01-19 06:46:10 +00:00
Changpeng Fang ba6240cc71 AMDGPU/SI: Fix typos in d16 support patch the buffer intrinsics.
llvm-svn: 322906
2018-01-18 22:57:57 +00:00
Changpeng Fang 4737e892de AMDGPU/SI: Add d16 support for image intrinsics.
Summary:
  This patch implements d16 support for image load, image store and image sample intrinsics.

Reviewers:
  Matt, Brian.

Differential Revision:
  https://reviews.llvm.org/D3991

llvm-svn: 322903
2018-01-18 22:08:53 +00:00
Aditya Nandakumar 18b3f9d384 [GISel] Make constrainSelectedInstRegOperands() available to the legalizer. NFC
https://reviews.llvm.org/D42149

llvm-svn: 322743
2018-01-17 19:31:33 +00:00
Matt Arsenault 1491ca8911 AMDGPU: Error in SIAnnotateControlFlow instead of assert
This assert typically happens if an unstructured CFG is passed
to the pass. This can happen if the pass is run independently
without the structurizer.

llvm-svn: 322685
2018-01-17 16:30:01 +00:00
Daniil Fukalov d5fca554e2 [AMDGPU] add LDS f32 intrinsics
added llvm.amdgcn.atomic.{add|min|max}.f32 intrinsics
to allow generate ds_{add|min|max}[_rtn]_f32 instructions
needed for OpenCL float atomics in LDS

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D37985

llvm-svn: 322656
2018-01-17 14:05:05 +00:00
Dmitry Preobrazhensky 6b65f7c380 [AMDGPU][MC][GFX9] Enable inline constants for SDWA operands
See bug 35771: https://bugs.llvm.org/show_bug.cgi?id=35771

Differential Revision: https://reviews.llvm.org/D42058

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322655
2018-01-17 14:00:48 +00:00
Stanislav Mekhanoshin 62875fcd6c [AMDGPU] Add HW_REG_SH_MEM_BASES symbolic name for s_getreg_b32
Differential Revision: https://reviews.llvm.org/D41617

llvm-svn: 322500
2018-01-15 18:49:15 +00:00
Stanislav Mekhanoshin f630047ef6 [AMDGPU] Copy impdefs from pseudo to real instructions
In some cases we do not copy implicit defs from pseudo to real
VOP instructions. It has no visible impact at the moment thus no
tests are affected or added.

Differential Revision: https://reviews.llvm.org/D41783

llvm-svn: 322496
2018-01-15 17:55:35 +00:00
Tim Renouf 75ced9d5b8 [AMDGPU] stop image_store being moved illegally
Summary:
A recent change
321556: AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
can allow the machine instruction scheduler to move an image store past
an image load using the same descriptor.

V2: Fixed by marking image ops as mayAlias and isAliased. This may be
overly conservative, and we may need to revisit.
V3: Reverted test change done on 321556.

Reviewers: arsenm, nhaehnle, dstuttard

Subscribers: llvm-commits, t-tye, yaxunl, wdng, kzhuravl

Differential Revision: https://reviews.llvm.org/D41969

llvm-svn: 322419
2018-01-12 22:57:24 +00:00
Changpeng Fang 44dfa1de3b AMDGPU/SI: Add d16 support for buffer intrinsics.
Differential Revision:
  https://reviews.llvm.org/D38906

Reviewers:
  Matt and Brian.

llvm-svn: 322402
2018-01-12 21:12:19 +00:00
Dmitry Preobrazhensky 3afbd825a3 [AMDGPU][MC][GFX8][GFX9] Added XNACK_MASK support
See bug 35764: https://bugs.llvm.org/show_bug.cgi?id=35764

Differential Revision: https://reviews.llvm.org/D41614

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 322189
2018-01-10 14:22:19 +00:00
Tim Renouf 6eaad1e539 [AMDGPU] Fixed incorrect uniform branch condition
Summary:
I had a case where multiple nested uniform ifs resulted in code that did
v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and
s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first
ensuring that bits for inactive lanes were clear.

There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear bits for inactive lanes in the case that the branch is instruction
selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have added the same code into SILowerControlFlow for
the case that the branch is instruction selected as s_cbranch_vccnz.

This de-optimizes the code in some cases where the s_and is not needed,
because vcc is the result of a v_cmp, or multiple v_cmp instructions
combined by s_and/s_or. We should add a pass to re-optimize those cases.

Reviewers: arsenm, kzhuravl

Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle

Differential Revision: https://reviews.llvm.org/D41292

llvm-svn: 322119
2018-01-09 21:34:43 +00:00
Matt Arsenault 4ff5e002ea AMDGPU: Remove dead file
llvm-svn: 321752
2018-01-03 18:45:42 +00:00
Alex Bradbury b22f751fa7 Thread MCSubtargetInfo through Target::createMCAsmBackend
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend. 
D20830 threaded an MCSubtargetInfo reference through 
MCAsmBackend::relaxInstruction, but this isn't the only function that would 
benefit from access. This patch removes the Triple and CPUString arguments 
from createMCAsmBackend and replaces them with MCSubtargetInfo.

This patch just changes the interface without making any intentional 
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)

This change initially exposed PR35686, which has since been resolved in r321026.

Differential Revision: https://reviews.llvm.org/D41349

llvm-svn: 321692
2018-01-03 08:53:05 +00:00
Matt Arsenault e19bc2ee0f AMDGPU: Use unique PSVs for buffer resources
Also fixes using the wrong memory type for some
intrinsics when custom lowering them.

llvm-svn: 321557
2017-12-29 17:18:21 +00:00
Matt Arsenault d94b63d765 AMDGPU: Remove mayLoad/hasSideEffects from MIMG stores
Atomics still have hasSideEffects set on them because
of the mess that is the memory properties.

llvm-svn: 321556
2017-12-29 17:18:18 +00:00
Matt Arsenault 905f3518ba AMDGPU: Implement getTgtMemIntrinsic for images
Currently all images are lowered to have a single
image PseudoSourceValue. Image stores happen to have
overly strict mayLoad/mayStore/hasSideEffects flags
set on them, so this happens to work. When these
are fixed to be correct, the scheduler breaks
this because the identical PSVs are assumed to
be the same address. These need to be unique
to the image resource value.

llvm-svn: 321555
2017-12-29 17:18:14 +00:00
Dmitry Preobrazhensky 414e05383f [AMDGPU][MC] Incorrect parsing of flat/global atomic modifiers
See bug 35730: https://bugs.llvm.org/show_bug.cgi?id=35730

Differential Revision: https://reviews.llvm.org/D41598

Reviewers: vpykhtin, artem.tamazov, arsenm
llvm-svn: 321552
2017-12-29 13:55:11 +00:00
Sanjoy Das 26d11ca4b0 (Re-landing) Expose a TargetMachine::getTargetTransformInfo function
Re-land r321234.  It had to be reverted because it broke the shared
library build.  The shared library build broke because there was a
missing LLVMBuild dependency from lib/Passes (which calls
TargetMachine::getTargetIRAnalysis) to lib/Target.  As far as I can
tell, this problem was always there but was somehow masked
before (perhaps because TargetMachine::getTargetIRAnalysis was a
virtual function).

Original commit message:

This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321375
2017-12-22 18:21:59 +00:00
Dmitry Preobrazhensky 471adf7fdc [AMDGPU][MC] Corrected handling of negative expressions
See bug 35716: https://bugs.llvm.org/show_bug.cgi?id=35716

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41488

llvm-svn: 321372
2017-12-22 18:03:35 +00:00
Dmitry Preobrazhensky c5b0c172f6 [AMDGPU][MC] Corrected parsing of optional operands for ds_swizzle_b32
See bug 35645: https://bugs.llvm.org/show_bug.cgi?id=35645

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41186

llvm-svn: 321367
2017-12-22 17:13:28 +00:00
Dmitry Preobrazhensky 2713495318 [AMDGPU][MC] Added support of 256- and 512-bit tuples of ttmp registers
See bug 35561: https://bugs.llvm.org/show_bug.cgi?id=35561

This patch also affects implementation of SGPR and VGPR registers though changes are cosmetic.

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41437

llvm-svn: 321359
2017-12-22 15:18:06 +00:00
Sanjoy Das 747d1114d6 Revert "Expose a TargetMachine::getTargetTransformInfo function"
This reverts commit r321234.  It breaks the -DBUILD_SHARED_LIBS=ON build.

llvm-svn: 321243
2017-12-21 02:34:39 +00:00
Sanjoy Das 0c3de350b4 Expose a TargetMachine::getTargetTransformInfo function
Summary:
This makes the TargetMachine interface a bit simpler.  We still need
the std::function in TargetIRAnalysis to avoid having to add a
dependency from Analysis to Target.

See discussion:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119749.html

I avoided adding all of the backend owners to this review since the
change is simple, but let me know if you feel differently about this.

Reviewers: echristo, MatzeB, hfinkel

Reviewed By: hfinkel

Subscribers: jholewinski, jfb, arsenm, dschuff, mcrosier, sdardis, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, llvm-commits

Differential Revision: https://reviews.llvm.org/D41464

llvm-svn: 321234
2017-12-21 01:06:58 +00:00
Matt Arsenault f7f59b5292 [AMDGPU, AsmParser] Enable the mnemonic spell corrector.
Patch by Dmitry Venikov

llvm-svn: 321202
2017-12-20 18:52:57 +00:00
Mark Searles e4f067ebe2 [AMDGPU] Turn off MergeConsecutiveStores() before Instruction Selection for AMDGPU. Commit dbbb6c5fc3642987430866dffdf710df4f616ac7 turned on MergeConsecutiveStores() before Instruction Selection for all targets. Enough AMDGPU compiles go into an infinite loop ( MergeConsecutiveStores() merges two stores; LegalizeStoreOps() un-merges; MergeConsecutiveStores() re-merges, etc. ) to warrant turning it off until the issues can be addressed.
Differential Revision: https://reviews.llvm.org/D41377

llvm-svn: 321100
2017-12-19 19:26:23 +00:00
Matthias Braun f1caa2833f MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Yaxun Liu c41e2f6e7b Recommit CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
The regression on ppc64 was not due to this commit.

llvm-svn: 320788
2017-12-15 03:56:57 +00:00
Matt Arsenault 7d7adf4f2e TLI: Allow using PSV for intrinsic mem operands
llvm-svn: 320756
2017-12-14 22:34:10 +00:00
Matt Arsenault 1117133687 DAG: Expose all MMO flags in getTgtMemIntrinsic
Rather than adding more bits to express every
MMO flag you could want, just directly use the
MMO flags. Also fixes using a bunch of bool arguments to
getMemIntrinsicNode.

On AMDGPU, buffer and image intrinsics should always
have MODereferencable set, but currently there is no
way to do that directly during the initial intrinsic
lowering.

llvm-svn: 320746
2017-12-14 21:39:51 +00:00
Yaxun Liu f902ef0a5d Revert CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
This commit might have caused regression on ppc64. Revert it to verify that.

llvm-svn: 320712
2017-12-14 16:12:04 +00:00
Yaxun Liu a5315a040d CodeGen: Fix assertion in machine inst sheduler due to llvm.dbg.value
Two issues were found about machine inst scheduler when compiling ProRender
with -g for amdgcn target:

GCNScheduleDAGMILive::schedule tries to update LiveIntervals for DBG_VALUE, which it
should not since DBG_VALUE is not mapped in LiveIntervals.

when DBG_VALUE is the last instruction of MBB, ScheduleDAGInstrs::buildSchedGraph and
ScheduleDAGMILive::scheduleMI does not move RPTracker properly, which causes assertion.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D41132

llvm-svn: 320650
2017-12-13 22:38:09 +00:00
Matt Arsenault cad7fa857c AMDGPU: Partially fix disassembly of MIMG instructions
Stores failed to decode at all since they didn't have a
DecoderNamespace set. Loads worked, but did not change
the register width displayed to match the numbmer of
enabled channels.

The number of printed registers for vaddr is still wrong,
but I don't think that's encoded in the instruction so
there's not much we can do about that.

Image atomics are still broken. MIMG is the same
encoding for SI/VI, but the image atomic classes
are split up into encoding specific versions unlike
every other MIMG instruction. They have isAsmParserOnly
set on them for some reason. dmask is also special for
these, so we probably should not have it as an explicit
operand as it is now.

llvm-svn: 320614
2017-12-13 21:07:51 +00:00
Craig Topper ac59db2efe [Targets] Don't automatically include the scheduler class enum from *GenInstrInfo.inc with GET_INSTRINFO_ENUM. Make targets request is separately.
Most of the targets don't need the scheduler class enum.

I have an X86 scheduler model change that causes some names in the enum to become about 18000 characters long. This is because using instregex in scheduler models causes the scheduler class to get named with every instruction that matches the regex concatenated together. MSVC has a limit of 4096 characters for an identifier name. Rather than trying to come up with way to reduce the name length, I'm just going to sidestep the problem by not including the enum in X86.

llvm-svn: 320552
2017-12-13 07:26:17 +00:00
Matthias Braun f842297d50 Rename LiveIntervalAnalysis.h to LiveIntervals.h
Headers/Implementation files should be named after the class they
declare/define.

Also eliminated an `#include "llvm/CodeGen/LiveIntervalAnalysis.h"` in
favor of `class LiveIntarvals;`

llvm-svn: 320546
2017-12-13 02:51:04 +00:00
Matt Arsenault 3e268cc0dd LSR: Check more intrinsic pointer operands
llvm-svn: 320424
2017-12-11 21:38:43 +00:00
Dmitry Preobrazhensky ac2b02643b [AMDGPU][MC][GFX9] Corrected encoding of ttmp registers, disabled tba/tma
See bugs 35494 and 35559:
https://bugs.llvm.org/show_bug.cgi?id=35494
https://bugs.llvm.org/show_bug.cgi?id=35559

Reviewers: vpykhtin, artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D41007

llvm-svn: 320375
2017-12-11 15:23:20 +00:00
Konstantin Zhuravlyov c40d9f2e5d AMDGPU/GCN: Bring processors in sync with AMDGPUUsage
- Add gfx704
    - Change bonaire to gfx704
  - Remove gfx804
  - Remove gfx901
  - Remove gfx903

Differential Revision: https://reviews.llvm.org/D40046

llvm-svn: 320194
2017-12-08 20:52:28 +00:00
Matt Arsenault 73ce93b08b AMDGPU: Set IntrReadMem on memtime intrinsics
llvm-svn: 320188
2017-12-08 20:01:02 +00:00
Matt Arsenault 856777d8c9 AMDGPU: image_getlod and image_getresinfo do not read memory
llvm-svn: 320187
2017-12-08 20:00:57 +00:00
Matt Arsenault ecad0d5364 AMDGPU: Preserve MMO in adjustWritemask
Follow up to r319705. Currently the MMO is
produced after this in the custom inserter,
so this doesn't change anything yet.

llvm-svn: 320186
2017-12-08 20:00:45 +00:00