This should fix these build bot errors:
Step 6 (build-check-mlir-build-only) failure: build (failure)
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(129): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1386): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1388): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
Previously we would only CSE constrained FP intrinsics in the default
floating point environment. Exception behavior of "strict" is still not
allowed since we are not allowed to remove any traps in that case.
There are no restrictions on CSE across function calls inside a function.
Differential Revision: https://reviews.llvm.org/D112256
With profile data, non-trivial LoopUnswitch will only apply on non-cold loops, as unswitching cold loops may not gain much benefit but significantly increase the code size.
Reviewed By: aeubanks, asbirlea
Differential Revision: https://reviews.llvm.org/D129599
We are seeing significant performance loss when an alloca fails to get promoted
to register. I have observed that this is due to the common type found when
attempting to rewrite partition users being unviable for promotion. While if we
would have continue looking for a type, we would have found a subtype in the
original allocated type that would have enabled promotion. Thus first check if
the initial common type found is promotion viable and if not then continue
looking instead of stopping with the initial common type found.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128073
When EarlyCSE tries to common vector masked loads/stores, it first checks that
they have same base operand and then assumes that this is enough for mask types
to be equal. This is true for typed pointers but false for opaque ones -
two loads of different vector sizes from same base pointer '%b' are the same,
`ptr %b`. (For typed pointers, `%b` was cast to vector pointer type so bases
were different).
Change assert to return from lambda `isSubmask` so this transformation properly
works with opaque pointers.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D131251
This is the 2nd patch of the two-patch series (D130188, D130189) that
fix PR56275 (https://github.com/llvm/llvm-project/issues/56275) which
is a missed opportunity for loop interchange.
As follow-up on the dependence analysis (DA) patch D130188, this patch
normalizes DA results in loop interchange, such that negative dependence
vectors queried by loop interchange are reversed to be non-negative.
Now all tests in PR56275 can get interchanged. Those tests are added
in lit test as `pr56275.ll`.
Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D130189
Mark ModRefInfo as a bitmask enum, which allows using normal
& and | operators on it. This supersedes various functions like
unionModRef() and intersectModRef(). I think this makes the code
cleaner than going through helper functions...
Differential Revision: https://reviews.llvm.org/D130870
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
I am playing with the LoopDataPrefetch pass and found out that it
bails to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.
This does not currently affect any known targets, but seems to be
generally useful for the future.
Differential Revision: https://reviews.llvm.org/D129795
Instructions between two adjacent loops will be hoisted above the first
loop, or sunk below the second to facilitate loop fusion. Hoisting will
be attempted for an instruction that dominates the first loop.
Otherwise, sinking this instructions will be attempted.
Instructions with side effects will not be considered for sinking or
hoisting. Hoisting/sinking of any instructions between loops will only
be performed if all the instructions can be moved. As well,
sinking/hoisting is considered for each instruction in isolation,
without taking into account sinking/hoisting decisions for other
instructions in the preheader.
Differential Revision: https://reviews.llvm.org/D118076
In D129523, it was noted that there is are some questionable naked casts
from Instruction to BinaryOperator, which could be addressed by doing a
dyn_cast directly to BinaryOperator, avoiding the need for the later cast.
This cleans up that casting.
Reviewed By: nikic, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D130448
In D129523, it was noted that the approach to check whether a value can
have FastMathFlags was done in different ways, and they should be made
consistent. This patch makes minor changes to fix that.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D130408
The existing code doesn't expect dummy values (undef, poison, null-derived
constants etc) as arguments of these intrinsics. However, they can be there
in unreached code. Currently we fail trying to find base for them.
Handle these cases separately. Return null as base for them to be consistent
with the handling in the main algorithm in findBaseDefiningValue.
Differential Revision: https://reviews.llvm.org/D129561
Reviewed By: apilipenko
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.
To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.
This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
One of the transforms in LoopSimplifyCFG demands that the LCSSA form is
truly maintained for all values, tokens included, otherwise it may end up creating
a use that is not dominated by def (and Phi creation for tokens is impossible).
Detect this situation and prevent transform for it early.
Differential Revision: https://reviews.llvm.org/D129984
Reviewed By: efriedma
Changes since initial commit:
* Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in d767b392.
* isLoopInvariant returns true for instructions outside the loop, but not necessarily *above* the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in 2aed3cdb.
Original commit message follows...
The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.
As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former?
Differential Revision: https://reviews.llvm.org/D129793
Compiling with '-ffast-math' tuns on all the FastMathFlags (FMF), as
expected, and that enables FP reassociation. Only the two FMF flags
'reassoc' and 'nsz' are technically required to perform reassociation,
but disabling other unrelated FMF bits is needlessly suppressing the
optimization.
This patch fixes that needless suppression, and makes appropriate
adjustments to test-cases, fixing some outstanding TODOs in the process.
Fixes: #56483
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D129523
The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already *are* loop invariant.
As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former?
Differential Revision: https://reviews.llvm.org/D129793
This is a followup to D129630, which switches LSR to the member
isSafeToExpand() variant, and removes the freestanding function.
This is done by creating the SCEVExpander early (already during the
analysis phase). Because the SCEVExpander is now available for the
whole lifetime of LSRInstance, I've also made it into a member
variable, rather than passing it around in even more places.
Differential Revision: https://reviews.llvm.org/D129769
In analyzing issue #56483, it was noticed that running `opt` with
`-reassociate` was missing some minor optimizations. For example,
there were cases where the running `opt` on IR with floating-point
instructions that have the `fast` flags applied, sometimes resulted in
less efficient code than the input IR (things like dead instructions
left behind, and missed reassociations). These were sometimes noted
in the test-files with TODOs, to investigate further. This commit
fixes some of these problems, removing some TODOs in the process.
FTR, I refer to these as "minor" missed optimizations, because when
running a full clang/llvm compilation, these inefficiencies are not
happening, as other passes clean that residue up. Regardless, having
cleaner IR produced by `opt`, makes assessing the quality of fixes done
in `opt` easier.
This reverts commit f1b05a0a2b.
Need to revert to due to issues identified with testing. The
transformation is incorrect for blocks that contain convergent
instructions.
As a followup to D129630, this switches a usage of the freestanding
function in LoopPredication to use the member variant instead. This
was the last use of the freestanding function, so drop it entirely.
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.
Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.
Fixes https://github.com/llvm/llvm-project/issues/50506.
Differential Revision: https://reviews.llvm.org/D129630
After replacing a loop phi with the preheader value, it's usually
possible to simplify some of the using instructions, so do that as
part of replaceLoopPHINodesWithPreheaderValues().
Doing this as part of IndVars is valuable, because it may make GEPs
in the loop have constant offsets and allow the following SROA run
to succeed (as demonstrated in the PhaseOrdering test).
Differential Revision: https://reviews.llvm.org/D129293