With typed pointers the pointer operand type checks the address space
and the load/store type. With opaque pointers we have to check the
load/store type separately.
Move out the induction step creation from emitTransformedIndex to the
callers. In some places (e.g. widenIntOrFpInduction) the step is already
created. Passing the step in ensures the steps are kept in sync.
This rewrites ArgPromotion to be based on offsets rather than GEP
structure. We inspect all loads at constant offsets and remember
which types are loaded at which offsets. Then we promote based on
those types.
This generalizes ArgPromotion to work with bitcasted loads, and
is compatible with opaque pointers.
This patch also fixes incorrect handling of alignment during
argument promotion. Previously, the implementation only checked
that the pointer is dereferenceable, but was happy to speculate
overaligned loads. (I would have fixed this separately in advance,
but I found this hard to do with the previous implementation
approach).
Differential Revision: https://reviews.llvm.org/D118685
This makes the function independent of shared state in ILV (ensures no
new dependencies on things like the cost model are introduced) and allows
for use directly in recipe's ::execute functions.
As long as *all* the invokes in the set are indirect,
we can merge them, but don't merge direct invokes into the set,
even though it would be legal to do.
This patch replaces the function we emit the remark on when we run into
the fix-point limit. Previously we got a function to emit a remark on
from the worklist's associated function. However, the worklist may not
always have an associated function in the case of global variables.
Replace this with the function set, and if there are no functions don't
emit the remark.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119248
PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent a A <pred> B.
This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
Alloca promotion can only deal with cases where the load/store
types match the alloca type (it explicitly does not support
bitcasted load/stores).
With opaque pointers this is no longer enforced through the pointer
type, so add an explicit check.
If the original invokes had uses, the uses must have been in PHI's,
but that immediately results in the incoming values being incompatible.
But we'll replace uses of the original invokes with the use of the
merged invoke, so as long as the incoming values become compatible
after that, we can merge.
Even if the invokes have normal destination, iff it's the same block,
we can merge them. For now, require that there are no PHI nodes,
and the returned values of invokes aren't used.
Before walking all the callers, check whether we have a
dereferenceable attribute directly on the argument.
Also make it clearer that the code currently does not treat
alignment correctly.
The helper `Attributor::checkForAllReturnedValuesAndReturnInsts`
simplifies the returned value optimistically. In `AAUndefinedBehavior`
we cannot use such optimistic values when deducing UB. As a result, we
assumed UB for the return value of a function because we initially
(=optimistically) thought the function return is `undef`. While we later
adjusted this properly, the `AAUndefinedBehavior` was under the
impression the return value is "known" (=fix) and could never change.
To correct this we use `Attributor::checkForAllInstructions` and then
manually to perform simplification of the return value, only allowing
known values to be used. This actually matches the other UB deductions.
Fixes#53647
The vectorizer will choose at times to "vectorize" loops with a scalar
factor (VF=1) with interleaving (IC > 1). This can occasionally produce
better code than the unroller (notable for reductions where it can
produce independent reduction chains that are combined after the loop).
At times this is not very beneficial though, for example when runtime
checks are needed or when the scalar code requires predication.
This addresses the second point, preventing the vectorizer from
interleaving when the scalar loop will require predication. This
prevents it from making a bit of a mess, that is worse than the original
and better left for the unroller to unroll if beneficial. It helps
reverse some of the regressions from D118090.
Differential Revision: https://reviews.llvm.org/D118566
This is a follow-up suggested in D119060.
Instead of checking each of the bottom 2 bits individually,
we can check them together and handle the possibility that
we demand both together.
https://alive2.llvm.org/ce/z/C2ihC2
Differential Revision: https://reviews.llvm.org/D119139
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
// TODO: Cost model for emulated masked load/store is completely
// broken. This hack guides the cost model to use an artificially
// high enough value to practically disable vectorization with such
// operations, except where previously deployed legality hack allowed
// using very low cost values. This is to avoid regressions coming simply
// from moving "masked load/store" check from legality to cost model.
// Masked Load/Gather emulation was previously never allowed.
// Limited number of Masked Store/Scatter emulation was allowed.
```
While i don't really understand about what specifically `is completely broken`
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true. (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.
But since this was added for X86 specifically, let's just instead completely remove this hack.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114779
Enabled loop interchange support for floating point reductions
if it is allowed to reorder floating point operations.
Previously when we encouter a floating point PHI node in the
outer loop exit block, we bailed out since we could not detect
floating point reductions in the early days. Now we remove this
limiation since we are able to detect floating point reductions.
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D117450
In scalarizeInstruction(), isUniformAfterVectorization is used to detect
cases where it is sufficient to always access the first lane. This
should map directly checking whether the operand is a uniform replicate
recipe.
Differential Revision: https://reviews.llvm.org/D116654
I'm seeing ext-tsp helps CSSPGO for our intern large benchmarks so I'm turning on it for CSSPGO. For non-CS AutoFDO, ext-tsp doesn't seem to help, probably because of lower profile counts quality.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D119048
As per LangRef's definition of `noreturn` attribute:
```
noreturn
This function attribute indicates that the function never returns
normally, hence through a return instruction.
This produces undefined behavior at runtime if the function
ever does dynamically return. nnotated functions may still
raise an exception, i.a., nounwind is not implied.
```
So if we `invoke` a `noreturn` function, and the normal destination
of an invoke is not an `unreachable`, point it at the new `unreachable`
block.
The change/fix from the original commit is that we now actually create
the new block, and don't just repurpose the original block,
because said normal destination block could have other users.
This reverts commit db1176ce66,
relanding commit 598833c987.
As per LangRef's definition of `noreturn` attribute:
```
noreturn
This function attribute indicates that the function never returns
normally, hence through a return instruction.
This produces undefined behavior at runtime if the function
ever does dynamically return. nnotated functions may still
raise an exception, i.a., nounwind is not implied.
```
Changes the remark to emit on the function call that captures the globalized
variable instead of the globalized variable itself. The user should be able to
see which variable it was in the argument list of the function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106980
We can simply compute the value of this field on demand. Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies. I vaguely thing this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed.
This also minorly reduces memory usage, but that's a secondary value at best.