Commit Graph

278 Commits

Author SHA1 Message Date
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Matt Arsenault 348735b723 AMDGPU: Stop setting attributes based on TargetOptions
Having arbitrary passes looking at the TargetOptions is pretty
messy. This was also disregarding if a function already had an
explicit attribute setting on it. opt/llc now add the attributes to
functions that don't specify the attribute. clang and lld do not call
the function to do this, which they maybe should.

This was also treating unsafe-fp-math as implying the others, and
setting the other attributes based on it. This is not done anywhere
else, and I'm not sure is correct based on the current description of
the option bit.

Effectively reverts 1d8cf2be89
2020-03-27 13:13:43 -07:00
Stanislav Mekhanoshin 6e00e3fcb0 [AMDGPU] Preserve original symbol during attribute propagation
AMDGPUPropagateAttributes can swap names while cloning a function.
Only do it if original symbol was not externally visible.

Differential Revision: https://reviews.llvm.org/D76789
2020-03-25 15:26:30 -07:00
cdevadas ce984129ea [AMDGPU] Add SIPreEmitPeephole pass.
This pass can handle all the optimization
opportunities found just before code emission.
Presently it includes the handling of vcc branch
optimization that was handled earlier in SIInsertSkips.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D76712
2020-03-25 15:35:35 +00:00
Ram Nalamothu 24698e526f Implement wave32 DWARF register mapping
Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst.

This enables generating appropriate DWARF register numbers for wave64 and
wave32 modes.
2020-03-23 10:24:16 -04:00
Scott Linder 60b1967c39 [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
Add the scratch wave offset to the scratch buffer descriptor (SRSrc) in
the entry function prologue. This allows us to removes the scratch wave
offset register from the calling convention ABI.

As part of this change, allow the use of an inline constant zero for the
SOffset of MUBUF instructions accessing the stack in entry functions
when a frame pointer is not requested/required. Entry functions with
calls still need to set up the calling convention ABI stack pointer
register, and reference it in order to address arguments of called
functions. The ABI stack pointer register remains unswizzled, but is now
wave-relative instead of queue-relative.

Non-entry functions also use an inline constant zero SOffset for
wave-relative scratch access, but continue to use the stack and frame
pointers as before. When the stack or frame pointer is converted to a
swizzled offset it is now scaled directly, as the scratch wave offset no
longer needs to be subtracted first.

Update llvm/docs/AMDGPUUsage.rst to reflect these changes to the calling
convention.

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75138
2020-03-19 15:35:16 -04:00
Scott Linder db099f994b [AMDGPU][NFC] Refactor some uses of unsigned to Register
Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76035
2020-03-19 15:35:16 -04:00
Matt Arsenault 015b640be4 AMDGPU: Add flag to used fixed function ABI
Pass all arguments to every function, rather than only passing the
minimum set of inputs needed for the call graph.
2020-03-13 13:27:05 -07:00
Matt Arsenault fee41517fe AMDGPU/GlobalISel: Introduce post-legalize combiner
The current set of custom combines are only really useful after
legalization, so move them there. There is a lot of overlap in the
boilerplate here, but I think we do want a pretty different set of
combines before and after legalize. I think we will want a lot of
overlap between the post-legalize and a post-regbankselect combiner.
2020-02-24 22:12:12 -05:00
Matt Arsenault 54137bbaaf GlobalISel: Allow running localizer earlier
This required legal and regbankselected MIR for seemingly no
reason. For AMDGPU this wouldn't see legalized G_GLOBAL_VALUEs.
2020-02-17 11:24:06 -08:00
Matt Arsenault 5fdc9851d0 AMDGPU/GlobalISel: Run the localizer pass
While looking at the output on real sized programs, there is a lot of
extra SGPR spilling compared to the DAG path. This seems to largely be
from all constants being SGPRs in the entry block.
2020-02-17 07:38:12 -08:00
Jay Foad e9900b1fbf [AMDGPU] Add one more pass to LLVMInitializeAMDGPUTarget 2020-02-12 11:19:14 +00:00
Matt Arsenault 1024b73ef5 AMDGPU: Split denormal mode tracking bits
Prepare to accurately track the future denormal-fp-math attribute
changes. The way to actually set these separately is not wired in yet.

This is just a mechanical change, and mostly still assumes the input
and output mode match. This should be refined for some cases. For
example, fcanonicalize lowering should use the flushing variant if
either input or output flushing is enabled
2020-02-04 10:44:21 -08:00
Stanislav Mekhanoshin 555d8f4ef5 [AMDGPU] Bundle loads before post-RA scheduler
We are relying on atrificial DAG edges inserted by the
MemOpClusterMutation to keep loads and stores together in the
post-RA scheduler. This does not work all the time since it
allows to schedule a completely independent instruction in the
middle of the cluster.

Removed the DAG mutation and added pass to bundle already
clustered instructions. These bundles are unpacked before the
memory legalizer because it does not work with bundles but also
because it allows to insert waitcounts in the middle of a store
cluster.

Removing artificial edges also allows a more relaxed scheduling.

Differential Revision: https://reviews.llvm.org/D72737
2020-01-24 11:33:38 -08:00
Matt Arsenault a174f0da62 AMDGPU/GlobalISel: Add pre-legalize combiner pass
Just copy the AArch64 pass as-is for now, except for removing the
memcpy handling.
2020-01-22 10:16:39 -05:00
cdevadas e53a9d96e6 Resubmit: [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092
2020-01-22 13:18:32 +09:00
Nicolai Hähnle a80291ce10 Revert "[AMDGPU] Invert the handling of skip insertion."
This reverts commit 0dc6c249bf.

The commit is reported to cause a regression in piglit/bin/glsl-vs-loop for
Mesa.
2020-01-21 09:17:25 +01:00
cdevadas 0dc6c249bf [AMDGPU] Invert the handling of skip insertion.
The current implementation of skip insertion (SIInsertSkip) makes it a
mandatory pass required for correctness. Initially, the idea was to
have an optional pass. This patch inserts the s_cbranch_execz upfront
during SILowerControlFlow to skip over the sections of code when no
lanes are active. Later, SIRemoveShortExecBranches removes the skips
for short branches, unless there is a sideeffect and the skip branch is
really necessary.

This new pass will replace the handling of skip insertion in the
existing SIInsertSkip Pass.

Differential revision: https://reviews.llvm.org/D68092
2020-01-15 15:18:16 +05:30
Tom Stellard 0dbcb36394 CMake: Make most target symbols hidden by default
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%.  This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439
2020-01-14 19:46:52 -08:00
Michael Liao 01a4b83154 [codegen,amdgpu] Enhance MIR DIE and re-arrange it for AMDGPU.
Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
  arranged after phi elimination or DeSSA. It's enhanced to handle the
  dead register definition by skipping use check on it. Once a register
  def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
  `detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.

Reviewers: rampitec, qcolombet, sunfish

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72709
2020-01-14 19:26:15 -05:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Matt Arsenault 19e7f8a21d AMDGPU: Add default denormal mode to MachineFunctionInfo
The default FP mode should really be a property of a specific
function, and not a subtarget. Introduce the necessary fields to the
SIMachineFunctionInfo to help move towards this goal.
2019-11-01 00:03:39 -07:00
Jay Foad 765055658c Revert "[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs."
Summary:
This has been superseded by "[AMDGPU]: PHI Elimination hooks added for custom COPY insertion."

This reverts the code changes from commit 53f967f2bd
but keeps the test case.

Reviewers: hliao, arsenm, tpr, dstuttard

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68769

llvm-svn: 374347
2019-10-10 13:34:31 +00:00
Matt Arsenault 12994a70cf AMDGPU: Use SGPR_128 instead of SReg_128 for vregs
SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
the additional non-allocatable TTMP registers. There's no point in
allocating SReg_128 vregs. This shrinks the size of the classes
regalloc needs to consider, which is usually good.

llvm-svn: 374284
2019-10-10 07:11:33 +00:00
Matt Arsenault 3b95986a32 AMDGPU: Run AMDGPUCodeGenPrepare after scalar opts
The mul24 matching could interfere with SLSR and the other addressing
mode related passes. This probably is not the optimal placement, but
is an intermediate step. This should probably be moved after all the
generic IR passes, particularly LSR. Moving this after LSR seems to
help in some cases, and hurts others.

As-is in this patch, in idiv-licm, it saves 1-2 instructions inside
some of the loop bodies, but increases the number in others. Moving
this later helps these loops. In the new lsr tests in
mul24-pass-ordering, the intrinsic prevents introducing more
instructions in the loop preheader, so moving this later ends up
hurting them. This shouldn't be any worse than before the intrinsics
were introduced in r366094, and LSR should probably be smarter. I
think it's because it doesn't know the and inside the loop will be
folded away.

llvm-svn: 369991
2019-08-27 00:08:31 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Stanislav Mekhanoshin 4c9c98f36b [AMDGPU] Printf runtime binding pass
This pass is a port of the according pass from the HSAIL compiler.
It parses printf calls and setup runtime printf buffer.
After that it copies printf arguments to the buffer and fills in
module metadata for runtime.

Differential Revision: https://reviews.llvm.org/D24035

llvm-svn: 368592
2019-08-12 17:12:29 +00:00
Michael Liao 53f967f2bd [AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs.
Summary:
- As LCSSA is turned on just before isel, it may create PHI of the flow,
  which is consumed by pseudo structurized CFG instructions. When that
  PHIs are eliminated in O0, COPY may be placed wrongly as the these
  pseudo structurized CFG instructions are considering prologue of MBB.
- Run extra `unreachable-mbb-elimination` at the end of isel to clean up
  PHIs.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64353

llvm-svn: 367023
2019-07-25 14:50:18 +00:00
Michael Liao b3f967d411 [AMDGPU] Add the adjusted FP as a livein register.
Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64145

llvm-svn: 366223
2019-07-16 15:57:12 +00:00
Matt Arsenault 58426a3707 AMDGPU: Serialize mode from MachineFunctionInfo
llvm-svn: 365653
2019-07-10 16:09:26 +00:00
Matt Arsenault e7e23e3e91 AMDGPU: Make AMDGPUPerfHintAnalysis an SCC pass
Add a string attribute instead of directly setting
MachineFunctionInfo. This avoids trying to get the analysis in the
MachineFunctionInfo in a way that doesn't work with the new pass
manager.

This will also avoid re-visiting the call graph for every single
function.

llvm-svn: 365241
2019-07-05 20:26:13 +00:00
Matt Arsenault 5b0922fe1f AMDGPU: Add pass to lower SGPR spills
This is split out from my patches to split register allocation into a
separate SGPR and VGPR phase, and has some parts that aren't yet used
(like maintaining LiveIntervals).

This simplifies making the frame pointer register callee saved. As it
is now, the code to determine callee saves needs to predict all the
possible SGPR spills and how many callee saved VGPRs are needed. By
handling this before PrologEpilogInserter, it's possible to just check
the spill objects that already exist.

Change-Id: I29e6df4034afcf949e06f8ef44206acb94696f04
llvm-svn: 365095
2019-07-03 23:32:29 +00:00
Michael Liao 80177ca5a9 [AMDGPU] Enable serializing of argument info.
Summary:
- Support serialization of all arguments in machine function info. This
  enables fabricating MIR tests depending on argument info.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64096

llvm-svn: 364995
2019-07-03 02:00:21 +00:00
Alexander Timofeev 2ce560f029 [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent loop and used outside
Differential Revision: https://reviews.llvm.org/D63953

Reviewers: rampitec, nhaehnle, arsenm
llvm-svn: 364950
2019-07-02 17:59:44 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Stanislav Mekhanoshin 3138278287 [AMDGPU] Propagate function attributes thru bitcasts
AMDGPUPropagateAttributes will not work on function bitcatsts,
so move AMDGPUFixFunctionBitcasts before it.

Differential Revision: https://reviews.llvm.org/D63455

llvm-svn: 363614
2019-06-17 20:42:48 +00:00
Stanislav Mekhanoshin a9191c8492 [AMDGPU] gfx1010 wavefrontsize intrinsic folding
Differential Revision: https://reviews.llvm.org/D63206

llvm-svn: 363588
2019-06-17 17:57:50 +00:00
Stanislav Mekhanoshin ad04e7ad42 [AMDGPU] Pass to propagate ABI attributes from kernels to the functions
The pass works in two modes:

Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of opt and llc pipeline, but cannot clone functions
because it must be a function pass.

Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.

Differential Revision: https://reviews.llvm.org/D63208

llvm-svn: 363586
2019-06-17 17:47:28 +00:00
Tom Stellard 4b0b26199b Revert CMake: Make most target symbols hidden by default
This reverts r362990 (git commit 374571301d)

This was causing linker warnings on Darwin:

ld: warning: direct access in function 'llvm::initializeEvexToVexInstPassPass(llvm::PassRegistry&)'
from file '../../lib/libLLVMX86CodeGen.a(X86EvexToVex.cpp.o)' to global weak symbol
'void std::__1::__call_once_proxy<std::__1::tuple<void* (&)(llvm::PassRegistry&),
std::__1::reference_wrapper<llvm::PassRegistry>&&> >(void*)' from file '../../lib/libLLVMCore.a(Verifier.cpp.o)'
means the weak symbol cannot be overridden at runtime. This was likely caused by different translation
units being compiled with different visibility settings.

llvm-svn: 363028
2019-06-11 03:21:13 +00:00
Tom Stellard 374571301d CMake: Make most target symbols hidden by default
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.

A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.

This patch reduces the number of public symbols in libLLVM.so by about
25%.  This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so

One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.

Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278

Reviewers: chandlerc, beanz, mgorny, rnk, hans

Reviewed By: rnk, hans

Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D54439

llvm-svn: 362990
2019-06-10 22:12:56 +00:00
Richard Trieu 8ce2ee9d56 [AMDGPU] Create a TargetInfo header. NFC
Move the declarations of getThe<Name>Target() functions into a new header in
TargetInfo and make users of these functions include this new header.
This fixes a layering problem.

llvm-svn: 360713
2019-05-14 21:54:37 +00:00
Stanislav Mekhanoshin 3b7925f035 [AMDGPU] gfx1010 GCNRegBankReassign pass
Reassign registers to reduce register bank conflicts.

Differential Revision: https://reviews.llvm.org/D61344

llvm-svn: 359704
2019-05-01 16:49:31 +00:00
Stanislav Mekhanoshin c29d491596 [AMDGPU] gfx1010 GCNNSAReassign pass
Convert NSA into non-NSA images.

Differential Revision: https://reviews.llvm.org/D61341

llvm-svn: 359700
2019-05-01 16:40:49 +00:00
Amara Emerson d189680baa [GlobalISel] Introduce a CSEConfigBase class to allow targets to define their own CSE configs.
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.

This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.

llvm-svn: 358368
2019-04-15 04:53:46 +00:00
Nikita Popov 3db93ac5d6 Reapply [ValueTracking] Support min/max selects in computeConstantRange()
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This fixes an infinite InstCombine loop, with the
test case taken from D59378.

Relative to the previous iteration, this contains some adjustments for
AMDGPU med3 tests: The AMDGPU target runs InstSimplify prior to codegen,
which ends up constant folding some existing med3 tests after this
change. To preserve these tests a hidden -amdgpu-scalar-ir-passes option
is added, which allows disabling scalar IR passes (that use InstSimplify)
for testing purposes.

Differential Revision: https://reviews.llvm.org/D59506

llvm-svn: 357870
2019-04-07 17:22:16 +00:00
Stanislav Mekhanoshin c8f78f8dd3 [AMDGPU] Add MachineDCE pass after RenameIndependentSubregs
Detect dead lanes can create some dead defs. Then RenameIndependentSubregs
will break a REG_SEQUENCE which may use these dead defs. At this point
a dead instruction can be removed but we do not run a DCE anymore.

MachineDCE was only running before live variable analysis. The patch
adds a mean to preserve LiveIntervals and SlotIndexes in case it works
past this.

Differential Revision: https://reviews.llvm.org/D59626

llvm-svn: 357805
2019-04-05 20:11:32 +00:00
Neil Henning 0a30f33ce2 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

llvm-svn: 357400
2019-04-01 15:19:52 +00:00
Matt Arsenault 4d47ac3b30 AMDGPU: Add additional MIR tests for exec mask optimizations
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.

Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.

llvm-svn: 357091
2019-03-27 16:58:30 +00:00
Matt Arsenault cf55a657f0 CodeGen: Refactor regallocator command line and target selection
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.

llvm-svn: 356506
2019-03-19 19:33:12 +00:00
Neil Henning 523dab0788 [AMDGPU] Add an experimental buffer fat pointer address space.
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D58957

llvm-svn: 356373
2019-03-18 14:44:28 +00:00