Now that the legacy PM is no longer tested, the huge matrix of
test prefixes used by attributor tests is no longer needed and very
confusing for the casual reader. Reduce the prefixes down to just
CHECK, TUNIT and CGSCC.
This is the first patch in a series intended for removing flag
-enable-new-pm=0 from lit tests. This is part of a bigger
effort of completely removing legacy code related to legacy
pass manager in favor of currently default new pass manager.
In this patch flag has been removed only from tests where no significant
change has been required because checks has been duplicated for
both PMs.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D134150
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good.
Fixes: https://github.com/llvm/llvm-project/issues/54981
When we run the CGSCC pass we should only invest time on the SCC. We can
initialize AAs with information from the module slice but we should not
update those AAs. We make an exception for are call site of the SCC as
they are helpful providing information for the SCC.
Minor modifications to pointer privatization allow us to perform it even
in the CGSCC pass, similar to ArgumentPromotion.
When we run the CGSCC pass we should only invest time on the SCC. We can
initialize AAs with information from the module slice but we should not
update those AAs.
We look through loads in the "generic value traversal" and we
consequently don't need to look through them again in AAValueSimplify*.
The test changes stem from the fact that we allowed any simplified
value, incl. non-dynamically unique ones, as long as the underlying
memory was an alloca. This doesn't seem to make sense as allocas do not
protect against dynamically non-unique values. We need to make the
unique check better rather than excluding allocas. That in mind, we can
remove a lot of code by simply relying on the generic value traversal
load look through.
To soften the blow some minor adjustments have been made that allow more
simplification through the now used scheme and some tests have been
given a `norecurse` for now.
When simplify values we might end up with an instruction from a
different scope or just one that does not dominate the use. If the
instruction can be reproduced without side-effect (incl. UB) we can
now do that. For now this is mostly used for speculatable (intrinsic)
calls but as we learn to make things like arguments or loads available
this will become more powerful.
This will also allow us to remove dead stores more easily in a follow
up.
Most intrinsics, especially "default" ones, will not call back into the
IR module. `nocallback` encodes this nicely. As it was not used before,
this patch also makes use of `nocallback` in the Attributor which
results in many more `norecurse` deductions.
Tablegen part is mechanical, test updates by script.
Differential Revision: https://reviews.llvm.org/D118680
When we look through memory for a store we used to allow any other use
of the memory that is reachable. This is generally OK but we need to
make sure to actually let the user look at these properly. For now,
we simply require loads (via exact reloads).
There was some ad-hoc handling of liveness and manifest to avoid
breaking CGSCC guarantees. Things always slipped through though.
This cleanup will:
1) Prevent us from manifesting any "information" outside the CGSCC.
This might be too conservative but we need to opt-in to annotation
not try to avoid some problematic ones.
2) Avoid running any liveness analysis outside the CGSCC. We did have
some AAIsDeadFunction handling to this end but we need this for all
AAIsDead classes. The reason is that AAIsDead information is only
correct if we actually manifest it, since we don't (see point 1) we
cannot actually derive/use it at all. We are currently trying to
avoid running any AA updates outside the CGSCC but that seems to
impact things quite a bit.
3) Assert, don't check, that our modifications (during cleanup) modifies
only CGSCC functions.
Dropping this restriction seems to work fine (there are no assertion
failures), so it appears that either the updater got smarter or the
problematic cases are restricted elsewhere.
If doing this still causes issues, then the place to address it
would probably be 8f5bdaf481/llvm/lib/Transforms/IPO/Attributor.cpp (L1856-L1859),
which already prevents replacement outside the SCC, so I'm not
quite sure what this check is intended to avoid.
Differential Revision: https://reviews.llvm.org/D120987
Both `undef` and `nullptr` are maximally aligned. This is especially
important as we often see `undef` until a proper value has been
identified during simplification.
When we use liveness for edges during the `genericValueTraversal` we
need to make sure to use the AAIsDead of the correct function. This
patch adds the proper logic and some simple caching scheme. We also
add an assertion to the `isEdgeDead` call to make sure future misuse
is detected earlier.
Fixes https://github.com/llvm/llvm-project/issues/53872
D106720 introduced features that did not work properly as we could add
new queries after a fixpoint was reached and which could not be answered
by the information gathered up to the fixpoint alone.
As an alternative to D110078, which forced eager computation where we
want to continue to be lazy, this patch fixes the problem.
QueryAAs are AAs that allow lazy queries during their lifetime. They are
never fixed if they have no outstanding dependences and always run as
part of the updates in an iteration. To determine if we are done, all
query AAs are asked if they received new queries, if not, we only need
to consider updated AAs, as before. If new queries are present we go for
another iteration.
Differential Revision: https://reviews.llvm.org/D118669
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.
Differential Revision: https://reviews.llvm.org/D110099
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
Currently the max alignment representable is 1GB, see D108661.
Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945.
This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits.
We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now.
The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field.
Updating clang's max allowed alignment will come in a future patch.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D110451
For now we do should not treat byval arguments as local copies performed
on the call edge, though, in general we should. To make that happen we
need to teach various passes, e.g., DSE, about the copy effect of a
byval. That would also allow us to mark functions only accessing byval
arguments as readnone again, atguably their acceses have no effect
outside of the function, like accesses to allocas.
Reviewed By: kuter
Differential Revision: https://reviews.llvm.org/D108140
In LLVM IR, `AlignmentBitfieldElementT` is 5-bit wide
But that means that the maximal alignment exponent is `(1<<5)-2`,
which is `30`, not `29`. And indeed, alignment of `1073741824`
roundtrips IR serialization-deserialization.
While this doesn't seem all that important, this doubles
the maximal supported alignment from 512MiB to 1GiB,
and there's actually one noticeable use-case for that;
On X86, the huge pages can have sizes of 2MiB and 1GiB (!).
So while this doesn't add support for truly huge alignments,
which i think we can easily-ish do if wanted, i think this adds
zero-cost support for a not-trivially-dismissable case.
I don't believe we need any upgrade infrastructure,
and since we don't explicitly record the IR version,
we don't need to bump one either.
As @craig.topper speculates in D108661#2963519,
this might be an artificial limit imposed by the original implementation
of the `getAlignment()` functions.
Differential Revision: https://reviews.llvm.org/D108661
When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.
Depends on D106189
Differential Revision: https://reviews.llvm.org/D106190
D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.
In AAIsDead we now track more state as an instruction can be dead (= the
old optimisitc state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we would pretend it was dead
instead of only removablewe we would ignore it when we determine what
values a load can see, so that is not what we want.
Differential Revision: https://reviews.llvm.org/D106188
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.
There are two users for the function:
1) `Attributor::checkForAllUses` which will now ignore the value use
in a store if all "potential copies" can be identified and instead
be visited. This allows various AAs, including AAPointerInfo
itself, to look through memory.
2) `AANoCapture` which uses a custom use tracking through the
CaptureTracker interface and therefore needs to be thought
explicitly.
Differential Revision: https://reviews.llvm.org/D106185
We first simplify the operands of a compare and then reason on the
simplified versions, e.g., with AANonNull.
This does improve the simplification capabilities but also fixes a
potential problem that has not yet been observed by simplifying the
operands first.
A byval argument is a different value in the caller and callee, we
cannot propagate the information as part of AAValueSimplify. Users that
want to deal with byval arguments need to specifically perform the
argument -> call site step. We do not do this for now.
When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.
AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.
Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.
Differential Revision: https://reviews.llvm.org/D103860
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".
We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.
Differential Revision: https://reviews.llvm.org/D103858
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
When we talk to outside analyse, e.g., LVI and ScalarEvolution, we need
to be careful with the query. The particular error occurred because we
folded a PHI node before the LVI query but the context location was now
not dominated by the value anymore. This is not supported by LVI so we
have to filter these situations before we query the outside analyses.
We have to be careful when we replace values to not use a non-dominating
instruction. It makes sense that simplification offers those as
"simplified values" but we can't manifest them in the IR without PHI
nodes. In the future we should consider potentially adding those PHI
nodes.
We should use AAValueSimplify for all value simplification, however
there was some leftover logic that predates AAValueSimplify in
AAReturnedValues. This remove the AAReturnedValues part and provides a
replacement by making AAValueSimplifyReturned strong enough to handle
all previously covered cases. Further, this improve
AAValueSimplifyCallSiteReturned to handle returned arguments.
AAReturnedValues is now much easier and the collected returned
values/instructions are now from the associated function only, making it
much more sane. We also do not have the brittle logic anymore that looks
for unresolved calls. Instead, we use AAValueSimplify to handle
recursion.
Useful code has been split into helper functions, e.g., an Attributor
interface to get a simplified value.
Differential Revision: https://reviews.llvm.org/D103860
Not all attributes are able to handle the interprocedural step and
follow the uses into a call site. Let them be able to combine call site
uses instead. This might result in some unused values/arguments being
leftover but it removes problems where we misused "is dead" even though
it was actually "is simplified/replaced".
We explicitly check for dead values due to constant propagation in
`AAIsDeadValueImpl::areAllUsesAssumedDead` instead.
Differential Revision: https://reviews.llvm.org/D103858