So we don't over count the number of chunks and do unnecessary work reducing more chunks than exist.
This lowers some random reduction I tested with locally from 250s to 232s.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D136127
Try some dumb strength reductions to "simpler" opcodes.
Make some opcode substitutions I typically try to get smaller
MIR out of codegen. This is a bit target specific and I have a
lot of increasingly target specific modifications I try
during manual reduction.
Verify all the requested passes exist before trying to run any.
For long reductions, it was really annoying for it to get halfway through
and then I come back later to an incomplete reduction.
Also add a new skip-delta-passes flag. Most of the time I want to opt out
of specific reductions, rather than run a select few.
Without this patch, we hit the following a lot:
"llvm.dbg.declare intrinsic requires a !dbg attachment"
"DICompileUnit not listed in llvm.dbg.cu"
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D135492
Phis have a quirk where the same predecessor block may appear multiple times
if the same block branches to it multiple ways. All the values need to match,
but this was replacing each operand independently. If an operand can be simplified,
make sure to replace every instance of the incoming block's value.
This new pass for llvm-reduce attempts to reduce DebugInfo metadata.
The process used is:
1. Scan every MD node, keeping track of nodes already visited.
2. Look for DebugInfo nodes, then record any operands that are lists.
3. Bisect though all the elements of the collected lists.
Differential Revision: https://reviews.llvm.org/D132077
This reverts commit 69549de865.
The change in D135237 can lead to verification failures like `scope must have two or three operands`.
The ongoing work in D132077 does something similar without these failures, so lets wait for that to land and revert this patch.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D135395
There can be lots of `MDTuple` debug metadata nodes. For example, `globals: !{!1, !2}` in `!DICompileUnit()`. Search through all debug info to find `MDTuple`'s and remove some of their elements.
For D135114 I was able to get a reproducer with 364 lines without manually deleting elements. After this patch I got it down to 67 lines.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D135237
Avoid crash in `reduceOperandsOneDeltaPass` function for operands with
vector of pointer type.
While on it add a `reduce-operands-ptr.ll` test in the spirit of the
existing `reduce-operands-int.ll`/`reduce-operands-fp.ll` tests.
Differential Revision: https://reviews.llvm.org/D135307
Without this patch, the assertion triggers below on the test case,
because we are using different oracles for the verification.
Assertion failed: (Targets == NoChunksCounter.count() && "number of chunks changes when reducing"), function runDeltaPass, file Delta.cpp, line 272.
This patch modifies the testcase to use error substitution so it will pass on all platforms.
Reviewed By: fanbo-meng, zibi
Differential Revision: https://reviews.llvm.org/D134034
Also skip dead defs when looking for a previous vreg with the same
class. This helps avoid some mid-reduction verifier errors when
LiveIntervals computation starts introducing dead flags everywhere.
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
Differential Revision: https://reviews.llvm.org/D124697
so that we can reduce away incidental parts of the CFG in cases where
the full simplifyCFG pass makes the test case uninteresting
Differential Revision: https://reviews.llvm.org/D131920
out of chunk ones. the non-default second argument to
removePredecessor() is necessary to avoid creating invalid IR on
examples like the one in the provided test case
Differential Revision: https://reviews.llvm.org/D131843
This was done by adding --abort-on-invalid-reduction to remove-function-bodies-used-in-globals.ll and fixing the fallout.
Aliases must have a GlobalValue or ConstantExpr aliasee and the aliasee must be a definition if it's a GlobalValue.
Don't RAUW functions with null if there's an alias pointing to it, and similarly don't delete the body of a function.
Don't delete the entire body of a function when reducing blocks, preserve at least one block.
Also make debugging these sorts of things easier by dumping the module when --abort-on-invalid-reduction triggers.
Reviewed By: regehr
Differential Revision: https://reviews.llvm.org/D131505
Implement an intrinsic for use lowering LDS variables to different
addresses from different kernels. This will allow kernels that cannot
reach an LDS variable to avoid wasting space for it.
There are a number of implicit arguments accessed by intrinsic already
so this implementation closely follows the existing handling. It is slightly
novel in that this SGPR is written by the kernel prologue.
It is necessary in the general case to put variables at different addresses
such that they can be compactly allocated and thus necessary for an
indirect function call to have some means of determining where a
given variable was allocated. Claiming an arbitrary SGPR into which
an integer can be written by the kernel, in this implementation based
on metadata associated with that kernel, which is then passed on to
indirect call sites is sufficient to determine the variable address.
The intent is to emit a __const array of LDS addresses and index into it.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D125060
Try to insert an implicit_def to replace the instruction's value,
replacing the original instruction's def with a dead register. If all
defs are delete the instruction entirely.
This is pretty similar to the instruction reduction, but leaves the
new defs in the same place as the original instruction. This could
possibly replace it. I'm not sure if we should directly delete the
instructions here, or leave dead ones behind.
This could also further work to replace physical register defs.
I have a register allocator failure that only reproduces with IPRA
enabled, and requires the specific regmask if I want to only run the
one relevant pass. The printed custom regmask is enormous and I would
like to reduce it.
This reduces each individual bit in the mask, but it would probably be
better to start at register units and clear all aliasing fields at a
time. This would require stricter verification that all aliasing bits
are set in regmasks (although I would prefer to switch regmasks to use
register units in the first place).
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:
; Before:
%res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
to label %asm.fallthrough [label %foo]
; After:
%res = callbr i8* asm "", "=r,r,!i"(i8* %x)
to label %asm.fallthrough [label %foo]
The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:
* Allow unrolling/peeling/rotation of callbr, or any other
clone-based optimizations
(https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
(https://github.com/llvm/llvm-project/issues/45248)
This is just the IR representation change though, I will follow up
with patches to remove limtations in various transformation passes
that are no longer needed.
Differential Revision: https://reviews.llvm.org/D129288
Integer vectors were previously ignored when reducing operands. When
6b8bd0f72 introduced support for reducing floating-point
scalars/vectors, the vector case was written to only handle
floating-point values. It would crash when creating an invalid
ConstantFP from the integer element type.
Instead of reinstating the old integer vector behaviour, we might as
well reduce integer vectors to all-one splats.
A couple of existing tests has also been renamed from "remove" to
"reduce" to better reflect the deltas they test.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D129629
Adds support for reading and writing LTO bitcode files.
- Emit a summary if the original bitcode file had a summary
- Use split LTO units if the original bitcode file used them.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D127168
The intention is that these should never have undef operands. It turns
out the restriction the verifier enforces is too lax. The verifier
enforces that registers without a register class cannot be undef, but
it's valid to use a register with a register class and type. The
verifier needs to change to be based on the opcode.
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.