Commit Graph

325 Commits

Author SHA1 Message Date
Arthur Eubanks 42d682a217 [NewPM][AMDGPU] Skip adding CGSCCOptimizerLate callbacks at O0
The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0.

Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95250
2021-01-22 12:29:39 -08:00
Arthur Eubanks a11bf9a7fb [AMDGPU][Inliner] Remove amdgpu-inline and add a new TTI inline hook
Having a custom inliner doesn't really fit in with the new PM's
pipeline. It's also extra technical debt.

amdgpu-inline only does a couple of custom things compared to the normal
inliner:
1) It disables inlining if the number of BBs in a function would exceed
   some limit
2) It increases the threshold if there are pointers to private arrays(?)

These can all be handled as TTI inliner hooks.
There already exists a hook for backends to multiply the inlining
threshold.

This way we can remove the custom amdgpu-inline pass.

This caused inline-hint.ll to fail, and after some investigation, it
looks like getInliningThresholdMultiplier() was previously getting
applied twice in amdgpu-inline (https://reviews.llvm.org/D62707 fixed it
not applying at all, so some later inliner change must have fixed
something), so I had to change the threshold in the test.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D94153
2021-01-21 20:29:17 -08:00
Jay Foad c0b3c5a064 [AMDGPU][GlobalISel] Run SIAddImgInit
This pass is required to get correct codegen for image instructions with
the tfe or lwe bits set.

Differential Revision: https://reviews.llvm.org/D95132
2021-01-21 15:54:54 +00:00
Matt Arsenault 20566a2ed8 AMDGPU: Add occupancy to serialized MachineFunctionInfo
Not sure about the default value handling, but also not sure
defaulting to a theoretically subtarget dependent value.
2021-01-21 09:21:00 -05:00
dfukalov 6a87e9b08b [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Arthur Eubanks 28a326eba0 [NFC] Rename registerAliasAnalyses -> registerDefaultAliasAnalyses
To clarify that this only affects the "default" AA.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D93980
2021-01-05 11:07:58 -08:00
Arthur Eubanks 8e293fe6ad [NewPM][AMDGPU] Pass TargetMachine to AMDGPUSimplifyLibCallsPass
Missed in https://reviews.llvm.org/D93863.
2021-01-04 13:48:09 -08:00
Arthur Eubanks 191552344b [NewPM][AMDGPU] Make amdgpu-aa work with NewPM
An AMDGPUAA class already existed that was supposed to work with the new
PM, but it wasn't tested and was a bit broken.

Fix up the existing classes to have the right keys/parameters.
Wire up AMDGPUAA inside AMDGPUTargetMachine.

Add it to the list of alias analyses for the "default" AAManager since
in adjustPassManager() amdgpu-aa is added into the pipeline at the
beginning.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93914
2021-01-04 12:36:27 -08:00
Arthur Eubanks 4e838ba9ea [NewPM][AMDGPU] Port amdgpu-always-inline
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94025
2021-01-04 12:27:01 -08:00
Arthur Eubanks fd323a897c [NewPM][AMDGPU] Port amdgpu-printf-runtime-binding
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94026
2021-01-04 12:25:50 -08:00
Arthur Eubanks e1833e7493 [NewPM][AMDGPU] Port amdgpu-unify-metadata
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94023
2021-01-04 11:57:46 -08:00
Arthur Eubanks a5f863e076 [NewPM][AMDGPU] Port amdgpu-propagate-attributes-early/late
And add to AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94022
2021-01-04 11:53:37 -08:00
Arthur Eubanks b8f22f9d30 [NewPM][AMDGPU] Run InternalizePass when -amdgpu-internalize-symbols
The legacy PM doesn't run EP_ModuleOptimizerEarly on -O0, so skip
running it here when given O0.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93886
2021-01-04 11:34:40 -08:00
Arthur Eubanks 7ecbe0c7a0 [NewPM][AMDGPU] Port amdgpu-lower-kernel-attributes
And add it to the AMDGPU opt pipeline.

This is a function pass instead of a module pass (like the legacy pass)
because it's getting added to a CGSCCPassManager, and you can't put a
module pass in a CGSCCPassManager.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93885
2020-12-29 10:26:06 -08:00
Arthur Eubanks c2ef06d3dd [NewPM] Port infer-address-spaces
And add it to the AMDGPU opt pipeline.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93880
2020-12-28 19:58:12 -08:00
Arthur Eubanks 0e9abcfc19 [AMDGPU][NewPM] Port amdgpu-promote-alloca(-to-vector)
And add to AMDGPU opt pipeline.

Don't pin an opt run to the legacy PM when -enable-new-pm=1 if these
passes (or passes introduced in https://reviews.llvm.org/D93863) are in
the list of passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D93875
2020-12-28 17:52:31 -08:00
Arthur Eubanks 9abc457724 [NewPM][AMDGPU] Port amdgpu-simplifylib/amdgpu-usenative
And add them to the pipeline via
AMDGPUTargetMachine::registerPassBuilderCallbacks(), which mirrors
AMDGPUTargetMachine::adjustPassManager().

These passes can't be unconditionally added to PassRegistry.def since
they are only present when the AMDGPU backend is enabled. And there are
no target-specific headers in llvm/include, so parsing these pass names
must occur somewhere in the AMDGPU directory. I decided the best place
was inside the TargetMachine, since the PassBuilder invokes
TargetMachine::registerPassBuilderCallbacks() anyway. If we come up with
a cleaner solution for target-specific passes in the future that's fine,
but there aren't too many target-specific IR passes living in
target-specific directories so it shouldn't be too bad to change in the
future.

Reviewed By: ychen, arsenm

Differential Revision: https://reviews.llvm.org/D93863
2020-12-28 10:38:51 -08:00
Alex Richardson 51e09e1d5a [AMDGPU] Set the default globals address space to 1
This will ensure that passes that add new global variables will create them
in address space 1 once the passes have been updated to no longer default
to the implicit address space zero.
This also changes AutoUpgrade.cpp to add -G1 to the DataLayout if it wasn't
already to present to ensure bitcode backwards compatibility.

Reviewed by: arsenm

Differential Revision: https://reviews.llvm.org/D84345
2020-11-20 15:46:53 +00:00
Michael Liao f375885ab8 [InferAddrSpace] Teach to handle assumed address space.
- In certain cases, a generic pointer could be assumed as a pointer to
  the global memory space or other spaces. With a dedicated target hook
  to query that address space from a given value, infer-address-space
  pass could infer and propagate that to all its users.

Differential Revision: https://reviews.llvm.org/D91121
2020-11-16 17:06:33 -05:00
Carl Ritson 057934a6d7 [AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so pre-allocation
pass was never being run.
Insert pre-allocation after TwoAddressInstructionPass instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90236
2020-10-28 12:15:15 +09:00
Michael Liao 46c3d5cb05 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
Carl Ritson 7a880ab388 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Austin Kerbow 978fbd8268 [AMDGPU] Run hazard recognizer pass later
If instructions were removed in peephole passes after the hazard recognizer was
run it is possible that new hazards could be introduced.

Fixes: SWDEV-253090

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D89077
2020-10-16 12:15:51 -07:00
Roman Lebedev 03bd5198b6
[OldPM] Pass manager: run SROA after (simple) loop unrolling
I have stumbled into this pretty accidentally, when rewriting
some spaghetti-like code into something more structured,
which involved using some `std::array<>`s. And to my surprise,
the `alloca`s remained, causing about `+160%` perf regression.

https://llvm-compile-time-tracker.com/compare.php?from=bb6f4d32aac3eecb51909f4facc625219307ee68&to=d563e66f40f9d4d145cb2050e41cb961e2b37785&stat=instructions
suggests that this has geomean compile-time cost of `+0.08%`.

Note that D68593 / cecc0d27ad
already did this chage for NewPM, but left OldPM in a pessimized state.

This fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40011 | PR40011 ]], [[ https://bugs.llvm.org/show_bug.cgi?id=42794 | PR42794 ]] and probably some other reports.

Reviewed By: nikic, xbolva00

Differential Revision: https://reviews.llvm.org/D87972
2020-10-04 11:53:50 +03:00
Jay Foad c799f873cb [AMDGPU] Don't cluster stores
Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.

The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.

Differential Revision: https://reviews.llvm.org/D85530
2020-09-14 13:40:17 +01:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Michael Liao 1f4e7463b5 [amdgpu] Run SROA after loop unrolling.
Summary: - There are promotable `alloca`s after loop unrolling.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84252
2020-09-01 16:09:56 -04:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Matt Arsenault 79298a5067 AMDGPU: Remove SIFixupVectorISel pass
This was only used for matching the saddr addressing mode of global
instructions, but this was not implemented correctly. The instruction
definitions aren't even correct, and are defined as using a 64-bit
VGPR component. Eliminate this pass to enable correcting the
instruction definitions. A new matching implementation can work in
GlobalISel or relying on DAG divergence information for the base
address.
2020-08-15 12:11:51 -04:00
Matt Arsenault 57bd64ff84 Support addrspacecast initializers with isNoopAddrSpaceCast
Moves isNoopAddrSpaceCast to the TargetMachine. It logically belongs
with the DataLayout.
2020-07-31 10:42:43 -04:00
Christudasan Devadasan d7a05698ef [AMDGPU] Move LowerSwitch pass to CodeGenPrepare.
It is possible that LowerSwitch pass leaves certain blocks
unreachable from the entry. If not removed, these dead blocks
can cause undefined behavior in the subsequent passes.
It caused a crash in the AMDGPU backend after the instruction
selection when a PHI node has its incoming values coming from
these unreachable blocks.

In the AMDGPU pass flow, the last invocation of UnreachableBlockElim
precedes where LowerSwitch is currently placed and eventually
missed out on the opportunity to get these blocks eliminated.
This patch ensures that LowerSwitch pass get inserted earlier
to make use of the existing unreachable block elimination pass.

Reviewed By: sameerds, arsenm

Differential Revision: https://reviews.llvm.org/D83584
2020-07-11 16:33:38 +05:30
Carl Ritson a3daa3f75a [AMDGPU] Unify early PS termination blocks
Generate a single early exit block out-of-line and branch to this
if all lanes are killed. This avoids branching if lanes are active.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D82641
2020-07-03 09:58:05 +09:00
Jay Foad def2e4c47f [AMDGPU] Simplify GCNPassConfig::addOptimizedRegAlloc. NFC. 2020-06-17 15:56:15 +01:00
Sameer Sahasrabuddhe d8f651d3e8 [AMDGPU] Enable structurizer workarounds by default
Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D81211
2020-06-09 13:14:15 +05:30
Stanislav Mekhanoshin 689e616ed0 [AMDGPU] Promote alloca to vector in opt
Promote alloca to vector before SROA and loop unroll. If we manage
to eliminate allocas before unroll we may choose to unroll less.

Differential Revision: https://reviews.llvm.org/D80386
2020-05-21 13:49:51 -07:00
Jay Foad 42a5560503 [AMDGPU] New SIInsertHardClauses pass
Enable clausing of memory loads on gfx10 by adding a new pass to insert
the s_clause instructions that mark the start of each hard clause.

Differential Revision: https://reviews.llvm.org/D79792
2020-05-14 18:54:49 +01:00
Carl Ritson e3ffe7269b [AMDGPU] Cluster shader exports
Summary:
Add DAG scheduling mutation to cluster export instructions.
This avoids unnecessary waitcnts being added when computation
ends up interspersed with exports.

Reviewers: foad, arsenm, rampitec, nhaehnle

Reviewed By: foad

Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79481
2020-05-07 19:05:38 +09:00
alex-t 5b898bddff [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.
Summary: This change enables all kind of carry out ISD opcodes to be selected according to the node divergence.

Reviewers: rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78091
2020-05-04 16:42:25 +03:00
Sameer Sahasrabuddhe 8c11bc0cd0 Introduce fix-irreducible pass
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.

This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D77198

This restores commit 2ada8e2525.

Originally reverted with commit 44e09b59b8.
2020-04-15 15:05:51 +05:30
Sameer Sahasrabuddhe 44e09b59b8 Revert "Introduce fix-irreducible pass"
This reverts commit 2ada8e2525.

Buildbots produced compilation errors which I was not able to quickly
reproduce locally. Need more time to investigate.
2020-04-15 12:19:50 +05:30
Sameer Sahasrabuddhe 2ada8e2525 Introduce fix-irreducible pass
An irreducible SCC is one which has multiple "header" blocks, i.e., blocks
with control-flow edges incident from outside the SCC. This pass converts an
irreducible SCC into a natural loop by introducing a single new header
block and redirecting all the edges on the original headers to this
new block.

This is a useful workaround for a limitation in the structurizer
which, which produces incorrect control flow in the presence of
irreducible regions. The AMDGPU backend provides an option to
enable this pass before the structurizer, which may eventually be
enabled by default.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D77198
2020-04-15 11:29:19 +05:30
Jay Foad 4970a1deca [AMDGPU] Remove outdated comment 2020-04-09 10:36:00 +01:00
Matt Arsenault 0aa0d70067 MIR: Use Register 2020-04-08 22:07:26 -04:00
Konstantin Pyzhov 72e8754916 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 09:05:58 -04:00
Konstantin Pyzhov 51dc028314 Revert e1730cfeb3 2020-04-06 05:56:11 -04:00
Konstantin Pyzhov e1730cfeb3 [AMDGPU] Disable 'Skip Uniform Regions' optimization by default for AMDGPU.
Reviewers: sameerds, dstuttard

Differential Revision: https://reviews.llvm.org/D77228
2020-04-06 05:10:37 -04:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Matt Arsenault 348735b723 AMDGPU: Stop setting attributes based on TargetOptions
Having arbitrary passes looking at the TargetOptions is pretty
messy. This was also disregarding if a function already had an
explicit attribute setting on it. opt/llc now add the attributes to
functions that don't specify the attribute. clang and lld do not call
the function to do this, which they maybe should.

This was also treating unsafe-fp-math as implying the others, and
setting the other attributes based on it. This is not done anywhere
else, and I'm not sure is correct based on the current description of
the option bit.

Effectively reverts 1d8cf2be89
2020-03-27 13:13:43 -07:00
Stanislav Mekhanoshin 6e00e3fcb0 [AMDGPU] Preserve original symbol during attribute propagation
AMDGPUPropagateAttributes can swap names while cloning a function.
Only do it if original symbol was not externally visible.

Differential Revision: https://reviews.llvm.org/D76789
2020-03-25 15:26:30 -07:00