This restores commit b756096b0c, which was
originally reverted in 00b09a7b18.
AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:
- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access
A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.
Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens for example, when:
- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
in the callgraph:
```
A (local inst)
|
B
/ \
C1 C2
\ /
D (remote inst)
```
This fixes a bug where a store is incorrectly eliminated in a lit test.
Reviewed By: jdoerfert, ye-luo
Differential Revision: https://reviews.llvm.org/D136526
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
Differential Revision: https://reviews.llvm.org/D135780
AAPointerInfo now maintains a list of all Access objects that it owns, along
with the following maps:
- OffsetBins: OffsetAndSize -> { Access }
- InstTupleMap: RemoteI x LocalI -> Access
A RemoteI is any instruction that accesses memory. RemoteI is different from
LocalI if and only if LocalI is a call; then RemoteI is some instruction in the
callgraph starting from LocalI.
Motivation: When AAPointerInfo recomputes the offset for an instruction, it sets
the value to Unknown if the new offset is not the same as the old offset. The
instruction must now be moved from its current bin to the bin corresponding to
the new offset. This happens for example, when:
- A PHINode has operands that result in different offsets.
- The same remote inst is reachable from the same local inst via different paths
in the callgraph:
```
A (local inst)
|
B
/ \
C1 C2
\ /
D (remote inst)
```
This fixes a bug where a store is incorrectly eliminated in a lit test.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D136526
A User like the PHINode may be visited multiple times for the same pointer along
different def-use edges. The uninitialized state of OffsetInfo at the first
visit needs to be distinct from the Unknown value that may be assigned after
processing the PHINode. Without that, a PHINode with all inputs Unknown is never
followed to its uses. This results in incorrect optimization because some
interfering accessess are missed.
Differential Revision: https://reviews.llvm.org/D134704
Now that the legacy PM is no longer tested, the huge matrix of
test prefixes used by attributor tests is no longer needed and very
confusing for the casual reader. Reduce the prefixes down to just
CHECK, TUNIT and CGSCC.
This is the first patch in a series intended for removing flag
-enable-new-pm=0 from lit tests. This is part of a bigger
effort of completely removing legacy code related to legacy
pass manager in favor of currently default new pass manager.
In this patch flag has been removed only from tests where no significant
change has been required because checks has been duplicated for
both PMs.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D134150
We no longer need specialized knowledge of these allocator functions in
this file since we have the correct attributes available now.
As far as I can tell the changes in the attributor tests are due to
things getting more consistent on alloc-family once we remove the static
list entries.
The two test changes in NewGVN merit extra scrutiny: NewGVN appears to
be _extremely_ sensitive to the inaccessiblememonly for reasons that
are beyond me. As a result, I had-enumerated all the attributes on
allocation functions in those two tests instead of using -inferattrs.
I assumed that the two -disable-simplify-libcalls tests there no
longer are sensible since the function declaration now includes all the
relevant attributes.
Differential Revision: https://reviews.llvm.org/D130107
If we look at a write, we should not enact the "has been written to"
logic introduced to avoid spurious write -> read dependences. Doing so
lead to elimination of stores we needed, which is obviously bad.
If a function is non-recursive we only performed intra-procedural
reasoning for reachability (via AA::isPotentiallyReachable). However,
if it is re-entrant that doesn't mean we can't reach. Instead of this
problematic logic in the reachability reasoning we utilize logic in
AAPointerInfo. If a location is for sure written by a function it can
be re-entrant or recursive we know only intra-procedural reasoning is
sufficient.
If we have a dominating must-write access we do not need to know the
initial value of some object to perform reasoning about the potential
values. The dominating must-write has overwritten the initial value.
We were quite conservative when it came to PHI node handling to avoid
recursive reasoning. Now we check more direct if we have seen a PHI
already or not. This allows non-recursive PHI chains to be handled.
This also exposed a bug as we did only model the effect of one loop
traversal. `phi_no_store_3` has been adapted to show how we would have
used `undef` instead of `1` before. With this patch we don't replace
it at all, which is expected as we do not argue about loop iterations
(or alignments).
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good even if some tests look like they regress.
Fixes: https://github.com/llvm/llvm-project/issues/54981
Note: A previous version was flawed and consequently reverted in
6555558a80.
If we are certainly not in a loop we can directly emit the heap2stack
allocas in the function entry block. This will help to get rid of them
(SROA) and avoid stacksave/restore intrinsics when the function is
inlined.
For the longest time we used `AAValueSimplify` and
`genericValueTraversal` to determine "potential values". This was
problematic for many reasons:
- We recomputed the result a lot as there was no caching for the 9
locations calling `genericValueTraversal`.
- We added the idea of "intra" vs. "inter" procedural simplification
only as an afterthought. `genericValueTraversal` did offer an option
but `AAValueSimplify` did not. Thus, we might end up with "too much"
simplification in certain situations and then gave up on it.
- Because `genericValueTraversal` was not a real `AA` we ended up with
problems like the infinite recursion bug (#54981) as well as code
duplication.
This patch introduces `AAPotentialValues` and replaces the
`AAValueSimplify` uses with it. `genericValueTraversal` is folded into
`AAPotentialValues` as are the instruction simplifications performed in
`AAValueSimplify` before. We further distinguish "intra" and "inter"
procedural simplification now.
`AAValueSimplify` was not deleted as we haven't ported the
re-materialization of instructions yet. There are other differences over
the former handling, e.g., we may not fold trivially foldable
instructions right now, e.g., `add i32 1, 1` is not folded to `i32 2`
but if an operand would be simplified to `i32 1` we would fold it still.
We are also even more aware of function/SCC boundaries in CGSCC passes,
which is good.
Fixes: https://github.com/llvm/llvm-project/issues/54981
When we run the CGSCC pass we should only invest time on the SCC. We can
initialize AAs with information from the module slice but we should not
update those AAs. We make an exception for are call site of the SCC as
they are helpful providing information for the SCC.
Minor modifications to pointer privatization allow us to perform it even
in the CGSCC pass, similar to ArgumentPromotion.
When we run the CGSCC pass we should only invest time on the SCC. We can
initialize AAs with information from the module slice but we should not
update those AAs.
The Attributor, as many other parts in LLVM, uses pointer equivalence
for `llvm::Value`s. This only works as long as `llvm::Value`s are
dynamically unique, or, to be exact, we will never end up with the same
`llvm::Value` representing two dynamic instances. We already provided a
helper to check the former, namely `AA::isDynamicallyUnique`, however we
could not check the latter. In this patch we move the logic into a
separate AA which helps with the growing complexity and use cases. We
also extend the interface to answer the second question rather than the
first. So we do not determine dynamically uniqueness but if we might end
up with the `llvm::Value` describing a different dynamic instance. Note
that the latter is very much tied to the Attributor capabilities to look
through memory, recursion, etc. so we need to update the logic as we go.
We look through loads in the "generic value traversal" and we
consequently don't need to look through them again in AAValueSimplify*.
The test changes stem from the fact that we allowed any simplified
value, incl. non-dynamically unique ones, as long as the underlying
memory was an alloca. This doesn't seem to make sense as allocas do not
protect against dynamically non-unique values. We need to make the
unique check better rather than excluding allocas. That in mind, we can
remove a lot of code by simply relying on the generic value traversal
load look through.
To soften the blow some minor adjustments have been made that allow more
simplification through the now used scheme and some tests have been
given a `norecurse` for now.
Most intrinsics, especially "default" ones, will not call back into the
IR module. `nocallback` encodes this nicely. As it was not used before,
this patch also makes use of `nocallback` in the Attributor which
results in many more `norecurse` deductions.
Tablegen part is mechanical, test updates by script.
Differential Revision: https://reviews.llvm.org/D118680
Heap-2-stack and heap-2-shared can replace an allocation call with
something else. To avoid us deriving information from the allocator
implementation we register a simplification callback now that will
force us to stop at the call site. We probably should create the
replacement memory eagerly and return that instead though.
We have the `clang -cc1` command-line option `-funwind-tables=1|2` and
the codegen option `VALUE_CODEGENOPT(UnwindTables, 2, 0) ///< Unwind
tables (1) or asynchronous unwind tables (2)`. However, this is
encoded in LLVM IR by the presence or the absence of the `uwtable`
attribute, i.e. we lose the information whether to generate want just
some unwind tables or asynchronous unwind tables.
Asynchronous unwind tables take more space in the runtime image, I'd
estimate something like 80-90% more, as the difference is adding
roughly the same number of CFI directives as for prologues, only a bit
simpler (e.g. `.cfi_offset reg, off` vs. `.cfi_restore reg`). Or even
more, if you consider tail duplication of epilogue blocks.
Asynchronous unwind tables could also restrict code generation to
having only a finite number of frame pointer adjustments (an example
of *not* having a finite number of `SP` adjustments is on AArch64 when
untagging the stack (MTE) in some cases the compiler can modify `SP`
in a loop).
Having the CFI precise up to an instruction generally also means one
cannot bundle together CFI instructions once the prologue is done,
they need to be interspersed with ordinary instructions, which means
extra `DW_CFA_advance_loc` commands, further increasing the unwind
tables size.
That is to say, async unwind tables impose a non-negligible overhead,
yet for the most common use cases (like C++ exceptions), they are not
even needed.
This patch extends the `uwtable` attribute with an optional
value:
- `uwtable` (default to `async`)
- `uwtable(sync)`, synchronous unwind tables
- `uwtable(async)`, asynchronous (instruction precise) unwind tables
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114543
D106720 introduced features that did not work properly as we could add
new queries after a fixpoint was reached and which could not be answered
by the information gathered up to the fixpoint alone.
As an alternative to D110078, which forced eager computation where we
want to continue to be lazy, this patch fixes the problem.
QueryAAs are AAs that allow lazy queries during their lifetime. They are
never fixed if they have no outstanding dependences and always run as
part of the updates in an iteration. To determine if we are done, all
query AAs are asked if they received new queries, if not, we only need
to consider updated AAs, as before. If new queries are present we go for
another iteration.
Differential Revision: https://reviews.llvm.org/D118669
We missed out on AANoRecurse in the module pass because we had no call
graph. With AAFunctionReachability we can simply ask if the function may
reach itself.
Differential Revision: https://reviews.llvm.org/D110099
genericValueTraversal can look through arguments and allow value
simplification across function boundaries. In fact, the latter already
happened unchecked. With this change we allow the user of
genericValueTraversal to opt-out of interprocedural traversal if
required. We explicitly look through arguments now which helps to do
various things, incl. the propagation of constants into OpenMP parallel
regions (on the host).
We have two attributes that can answer readnone queries. While there is
a dependence between them, it seems best to not force the users to know
what AA to ask. The helpers also allow to check for readonly nicely.
Test changes show where we now deduce readnone but haven't before,
mostly because we only asked AAMemoryBehavior and not AAMemoryLocation.
AANoAlias has not been ported to the new API yet.
Since D104432 we can look through memory by analyzing all writes that
might interfere with a load. This patch provides some logic to exclude
writes that cannot interfere with a location, due to CFG reasoning.
We make sure to avoid multi-thread write-read situations properly while
we ignore writes that cannot reach a load or writes that will be
overwritten before the load is reached.
Differential Revision: https://reviews.llvm.org/D106397
Rewrite the calloc specific handling in heap-to-stack to allow arbitrary init values. The basic problem being solved is that if an allocation is initilized to anything other than zero, this must be explicitly done for the formed alloca as well.
This covers the calloc case today, but once a couple of earlier guards are removed in this code, downstream allocators with other init values could also be handled.
Inspired by discussion on D116971
If we look at potentially interfering accesses we need to ensure the
"IsExact" flag is set appropriately. Accesses that have an "unknown"
size or offset cannot be exact matches and we missed to flag that.
Error and test reported by Serguei N. Dmitriev.
AAPointerInfo, and thereby other places, can look already through
internal global and stack memory. This patch enables them to look
through heap memory returned by functions with a `noalias` return.
In the future we can look through `noalias` arguments as well but that
will require AAIsDead to learn that such memory can be inspected by the
caller later on. We also need teach AAPointerInfo about dominance to
actually deal with memory that might not be `null` or `undef`
initialized. D106397 is a first step in that direction already.
Reviewed By: kuter
Differential Revision: https://reviews.llvm.org/D109170
Recursion can happen when we see a PHI use the second time or when we
look at a store value operand use again. We already visited the
potential copies and doing so again will just cause endless looping.
Reviewed By: kuter
Differential Revision: https://reviews.llvm.org/D108190
PHI nodes are not pass through but change their value, we have to
account for that to avoid missing stores.
Follow up for D107798 to fix PR51249 for good.
Differential Revision: https://reviews.llvm.org/D107808
AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurs endlessly. We now allow recursion for non-phi uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.
This replaces the first attempt D107579.
Differential Revision: https://reviews.llvm.org/D107798
When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.
Depends on D106189
Differential Revision: https://reviews.llvm.org/D106190