I'm not sure this is the best way to approach this,
but the situation is otherwise hard to detect unless we explicitly call it out when refusing to advise unrolling.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
ValueTracking should allow for value ranges that may satisfy
llvm.assume, instead of restricting the ranges only to values that
will always satisfy the condition.
Differential Revision: https://reviews.llvm.org/D107298
The check for size_t parameter 1 was already here for snprintf_chk,
but it wasn't applied to regular snprintf. This could lead to
a mismatch and eventually a crash, as shown in:
https://llvm.org/PR50885
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:
for.body:
%phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
%mask = and i32 %phi, 255
%gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
%load = load i8, i8* %gep
%ext = zext i8 %load to i32
%and = and i32 %mask, %ext
...
^ this will generate an and reduction intrinsic such as the following:
call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)
The same example for an add instruction would create an intrinsic of type i8:
call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)
This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.
Reviewed By: david-arm, dmgreen
Differential Revision: https://reviews.llvm.org/D105632
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.
This patch fixes the issue by checking if the instruction is in a
parent basic block.
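A minimal sketch of the added guard (the shape is assumed from the description above; names are illustrative, not copied from the patch):
```
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Only consult the surrounding function's vscale_range attributes when the
// call has already been inserted into a basic block; during function cloning
// it may not be attached to one yet.
static const Function *getEnclosingFunction(const CallInst *CI) {
  if (const BasicBlock *BB = CI->getParent())
    return BB->getParent();
  return nullptr; // not inserted yet, so there is nothing to query
}
```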
We don't allow inlining for functions with blockaddress with uses other than strictly callbr. This is because if the blockaddress escapes the function via a global variable, inlining may lead to an invalid cross-function reference.
We check against such cases during inlining; however, the check can fail for ThinLTO post-link because CFG simplification can incorrectly remove blocks based on wrong block reachability.
When we import a function with blockaddress taken in a global variable but without importing that variable, we won't go through value mapping to reflect the real address-taken-ness of the cloned blocks. For the imported clone, this leads to blocks reachable from indirect branch through global variable being incorrectly treated as unreachable and removed by SimplifyCFG.
Since inlining for such cases shouldn't be allowed in the first place, I'm marking them as ineligible for importing during pre-link, to avoid the problem of missing address-taken-ness of the imported clone as well as the resulting bad DCE and inlining.
Differential Revision: https://reviews.llvm.org/D106930
We already have an indication (error) if the desired inline advisor
cannot be enabled, but we don't have a positive indication. Added
LLVM_DEBUG messages for the latter.
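For illustration, the added messages have roughly the following shape (wording and placement are assumptions, not copied from the patch):
```
#include "llvm/Support/Debug.h"
#define DEBUG_TYPE "inline"

// Positive indication of which advisor is actually in use.
static void reportAdvisorChoice(bool UsingReleaseModeMLAdvisor) {
  LLVM_DEBUG(llvm::dbgs() << "Using "
                          << (UsingReleaseModeMLAdvisor ? "release-mode ML"
                                                        : "default")
                          << " inline advisor\n");
}
```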
The Exit instruction passed in when checking for an ordered reduction need not be
an FPAdd operation. We need to bail out at that point instead of
assuming it is an FPAdd (and hence has two operands). See added testcase.
It crashes without the patch because the Exit instruction is a phi with
exactly one operand.
This latent bug was exposed by 95346ba which added support for
multi-exit loops for vectorization.
Reviewed-By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D106843
Users, especially the Attributor, might replace multiple operands at
once. The actual implementation of simplifyWithOpReplaced is able to
handle that just fine; the interface simply did not allow replacing
more than one operand at a time. This exposes a more generic
interface without intended changes for existing code.
Differential Revision: https://reviews.llvm.org/D106189
Proposed alternative to D105338.
This is ugly, but short-term I think it's the best way forward: first,
let's formalize the hacks into a coherent model. Then we can consider
extensions of that model (we could have different flavors of volatile
with different rules).
Differential Revision: https://reviews.llvm.org/D106309
This patch removes RtCheck from RuntimeCheckingPtrGroup to make it
possible to construct RuntimeCheckingPtrGroup objects without a
RuntimePointerChecking object. This should make it easier to
re-use the code to generate runtime checks, e.g. in D102834.
RtCheck was only used to access the pointer info for a given index.
Instead, the start and end expressions can be passed directly.
For code-gen, we also need to know the address space to use. This can
also be explicitly passed at construction.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D105481
This adjusts mayHaveSideEffect() to return true for !willReturn()
instructions. Just like other side-effects, non-willreturn calls
(aka "divergence") cannot be removed and cannot be reordered relative
to other side effects. This fixes a number of bugs where
non-willreturn calls are either incorrectly dropped or moved. In
particular, it also fixes the last open problem in
https://bugs.llvm.org/show_bug.cgi?id=50511.
I performed a cursory review of all current mayHaveSideEffect()
uses, which convinced me that these are indeed the desired default
semantics. Places that do not want to consider non-willreturn as a
side effect generally do not want mayHaveSideEffect() semantics at
all. I identified two such cases, which are addressed by D106591
and D106742. Finally, there is a use in SCEV for which we don't
really have an appropriate API right now -- what it wants is
basically "would this be considered forward progress". I've just
spelled out the previous semantics there.
Differential Revision: https://reviews.llvm.org/D106749
Tests with multiple benchmarks, like Embench [1], showed that the
CallPenalty magic number has the most influence on inlining decisions
when optimizing for size.
On the other hand, there was no good default value for this parameter.
Some benchmarks profited strongly from a reduced call penalty. One
example is the picojpeg benchmark compiled for RISC-V, which got 6%
smaller with a CallPenalty of 10 instead of 12. Other benchmarks
increased in size, like matmult.
This commit makes the compromise of turning the magic number constant of
CallPenalty into a configurable value. This introduces the flag
`--inline-call-penalty`. With that flag users can fine tune the inliner
to their needs.
The CallPenalty constant was also used for loops. This commit
introduces a new LoopPenalty constant that is now used for loops
instead.
This is a slimmed down version of https://reviews.llvm.org/D30899
[1]: https://github.com/embench/embench-iot
Differential Revision: https://reviews.llvm.org/D105976
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:
1. Tree-wise. This is the typical fast-math reduction that involves
continually splitting a vector up into halves and adding each
half together until we get a scalar result. This is the default
behaviour for integers, whereas for floating point we only do this
if reassociation is allowed.
2. Ordered. This now allows us to estimate the cost of performing
a strict vector reduction by treating it as a series of scalar
operations in lane order. This is the case when FP reassociation
is not permitted. For scalable vectors this is more difficult
because at compile time we do not know how many lanes there are,
and so we use the worst case maximum vscale value.
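To make the distinction concrete, here is a plain C++ sketch of the two reduction shapes described above (illustrative only; VF is assumed to be a power of two):
```
#include <cstddef>

// Tree-wise: repeatedly split the vector into halves and add them together.
// For floating point this is only legal when reassociation is allowed.
float treewiseReduce(float *Lanes, std::size_t VF) {
  for (std::size_t Half = VF / 2; Half >= 1; Half /= 2) {
    for (std::size_t I = 0; I < Half; ++I)
      Lanes[I] += Lanes[I + Half];
    if (Half == 1)
      break;
  }
  return Lanes[0];
}

// Ordered (strict): a series of scalar operations performed in lane order,
// used when FP reassociation is not permitted.
float orderedReduce(float Init, const float *Lanes, std::size_t VF) {
  float Acc = Init;
  for (std::size_t I = 0; I < VF; ++I)
    Acc += Lanes[I];
  return Acc;
}
```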
I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.
New tests have been added here:
Analysis/CostModel/AArch64/reduce-fadd.ll
Analysis/CostModel/AArch64/sve-intrinsics.ll
Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
Differential Revision: https://reviews.llvm.org/D105432
Eli pointed out the issue when reviewing D104140. The max trip count logic makes an assumption that the value of IV changes. When the step is zero, the nowrap fact becomes trivial, and thus there's nothing preventing the loop from being nearly infinite. (The "nearly" part is because mustprogress may disallow an infinite loop while still allowing 999999999 iterations before RHS happens to allow an exit.)
This is very difficult to see in practice. You need a means to produce a loop varying RHS in a mustprogress loop which doesn't allow the loop to be infinite. In most cases, LICM or SCEV are smart enough to remove the loop varying expressions.
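As an illustration (a made-up loop, not taken from the review), this is the kind of shape where a possibly-zero step makes the nowrap reasoning useless:
```
// If Step can be 0 at runtime, "I += Step" may never change I, so a max trip
// count derived from "the IV must eventually reach or wrap past N" is unsound.
void example(unsigned N, unsigned Step, void (*Body)(unsigned)) {
  for (unsigned I = 0; I < N; I += Step)
    Body(I);
}
```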
Differential Revision: https://reviews.llvm.org/D106327
Avoid buffering just to copy the buffered data, in 'development
mode', when logging. Instead, just populate the underlying protobuf.
Differential Revision: https://reviews.llvm.org/D106592
Constfold constrained variants of operations fadd, fsub, fmul, fdiv,
frem, fma and fmuladd.
The change also sets up some means to support removal of unused
constrained intrinsics. They are declared as accessing memory to model
interaction with the floating point environment, so they were not removed,
as they had side effects. Now constrained intrinsics that have
"fpexcept.ignore" as exception behavior are removed if they have no uses.
As for intrinsics that have exception behavior other than "fpexcept.ignore",
they can be removed if it is known that they do not raise floating point
exceptions. To make that possible, when doing constant folding the
attributes of such an intrinsic are changed so that it is no longer
marked as accessing memory.
Differential Revision: https://reviews.llvm.org/D102673
This patch changes `__kmpc_free_shared` to take an additional argument
corresponding to the associated allocation's size. This makes it easier to
implement the allocator in the runtime.
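A minimal sketch of the interface change (the exact runtime signatures are an assumption based on the description, not copied from the patch):
```
#include <cstddef>

// Assumed declarations of the device runtime entry points.
extern "C" void *__kmpc_alloc_shared(std::size_t Bytes);
extern "C" void __kmpc_free_shared(void *Ptr, std::size_t Bytes); // new size argument

void example() {
  void *Buf = __kmpc_alloc_shared(64);
  // ... use Buf ...
  __kmpc_free_shared(Buf, 64); // the caller now passes the matching allocation size
}
```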
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106496
Make getLatchCmpInst non-static and use it in LoopFlatten as a more
robust way of identifying the compare.
Differential Revision: https://reviews.llvm.org/D106256
If a reduction Phi has a single user which `AND`s the Phi with a type mask,
`lookThroughAnd` will return the user of the Phi and the narrower type represented
by the mask. Currently this is only used for arithmetic reductions, whereas loops
containing logical reductions will create a reduction intrinsic using the widened
type, for example:
for.body:
%phi = phi i32 [ %and, %for.body ], [ 255, %entry ]
%mask = and i32 %phi, 255
%gep = getelementptr inbounds i8, i8* %ptr, i32 %iv
%load = load i8, i8* %gep
%ext = zext i8 %load to i32
%and = and i32 %mask, %ext
...
^ this will generate an and reduction intrinsic such as the following:
call i32 @llvm.vector.reduce.and.v8i32(<8 x i32>...)
The same example for an add instruction would create an intrinsic of type i8:
call i8 @llvm.vector.reduce.add.v8i8(<8 x i8>...)
This patch changes AddReductionVar to call lookThroughAnd for other integer
reductions, allowing loops similar to the example above with reductions such
as and, or & xor to vectorize.
Reviewed By: david-arm, dmgreen
Differential Revision: https://reviews.llvm.org/D105632
Allow arbitrary strides, and make sure we return the correct result when
the backedge-taken count is zero.
Differential Revision: https://reviews.llvm.org/D106197
Wrap semantics are subtle when combined with multiple exits. This has caused several rounds of confusion during recent reviews, so try to document the subtle distinction between when wrap flags provide <u and <=u facts.
The specific case that triggered this was when inlining a recursive
internal function into itself caused the recursion to go away, allowing
the inliner to mark the function as dead. The inliner marks the SCC as
invalidated but does not provide a new SCC to continue with.
This matches the implementations of ModuleToPostOrderCGSCCPassAdaptor
and CGSCCPassManager.
Fixes PR50363.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D106306
It turns out that during training, the time required to parse the
textual protobuf of a training log is about the same as the time it
takes to compile the module generating that log. Using binary protobufs
instead elides that cost almost completely.
Differential Revision: https://reviews.llvm.org/D106157
This fixes the lower and upper bound calculation of a
RuntimeCheckingPtrGroup when it has more than one loop
invariant pointer. Resolves PR50686.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D104148
The current implementation of computeBECount doesn't account for the
possibility that adding "Stride - 1" to Delta might overflow. For almost
all loops, it doesn't, but it's not actually proven anywhere.
To deal with this, use a variety of tricks to try to prove that the
addition doesn't overflow. If the proof is impossible, use an alternate
sequence which never overflows.
Differential Revision: https://reviews.llvm.org/D105216
D104806 broke some uses of getMinusSCEV() in DependenceAnalysis:
subtraction with different pointer bases returns a SCEVCouldNotCompute.
Make sure we avoid cases involving such subtractions.
Differential Revision: https://reviews.llvm.org/D106099
This is split from D105216, it handles only a subset of the cases in that patch.
Specifically, the issue being fixed is that the code incorrectly assumed that (Start-Stride) < End implied that the backedge was taken at least once. This is not true when e.g. Start = 4, Stride = 2, and End = 3. Note that we often do produce the right backedge taken count despite the flawed reasoning.
The fix chosen here is to use an alternate form of uceil (ceiling of unsigned divide) lowering which is safe when max(RHS,Start) > Start - Stride. (Note that signedness of both max expression and comparison depend on the signedness of the comparison being analyzed, and that overflow in the Start - Stride expression is allowed.) Note that this is weaker than proving the backedge is taken because it allows start - stride < end < start. Some cases which can't be proven safe are sent down the generic path, and we do end up generating less optimal expressions in a few cases.
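A small stand-alone illustration of the counterexample above (Start = 4, Stride = 2, End = 3); the loop shape is only for demonstration:
```
#include <cassert>

int main() {
  unsigned Start = 4, Stride = 2, End = 3;
  bool FlawedPremise = (Start - Stride) < End;   // 2 < 3: the premise holds...
  unsigned Iterations = 0;
  for (unsigned I = Start; I < End; I += Stride) // ...yet 4 < 3 is false,
    ++Iterations;                                // so the loop body never runs
  assert(FlawedPremise && Iterations == 0);      // backedge taken zero times
  return 0;
}
```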
Credit for coming up with the approach goes entirely to Eli. I just split it off, tweaked the comments a bit, and did some additional testing.
Differential Revision: https://reviews.llvm.org/D105942
This is split from D105216, but the code is hoisted much earlier into
the path where we can actually get a zero stride flowing through. Some
fairly simple proofs handle the cases which show up in practice. The
only test changes are the cases where we really do need a non-zero
divider to produce the right result.
Recommitting with isLoopInvariant() check.
Differential Revision: https://reviews.llvm.org/D105921
This is split from D105216, but the code is hoisted much earlier into the path where we can actually get a zero stride flowing through. Some fairly simple proofs handle the cases which show up in practice. The only test changes are the cases where we really do need a non-zero divider to produce the right result.
Differential Revision: https://reviews.llvm.org/D105921
Split off from D105216 to simplify review. Rewritten with a lambda to be easier to follow. Comments clarified.
Sorry for no test case, this is tricky to exercise with the current structure of the code. It's about to be hit more frequently in a follow up patch, and the change itself is simple.
This is split from D105216 to reduce patch complexity. Original code by Eli with very minor modification by me.
The primary point of this patch is to add the getUDivCeilSCEV routine. I included the two callers with constant arguments as we know those must constant fold even without any of the fancy inference logic.
This is now the same as isIntAttrKind(), so use that instead, as
it does not require manual maintenance. The naming is also more
accurate in that both int and type attributes have an argument,
but this method was only targeting int attributes.
I initially wanted to tighten the AttrBuilder assertion, but we
have some in-tree uses that would violate it.
Rules:
1. SCEVUnknown is a pointer if and only if the LLVM IR value is a
pointer.
2. SCEVPtrToInt is never a pointer.
3. If any other SCEV expression has no pointer operands, the result is
an integer.
4. If a SCEVAddExpr has exactly one pointer operand, the result is a
pointer.
5. If a SCEVAddRecExpr's first operand is a pointer, and it has no other
pointer operands, the result is a pointer.
6. If every operand of a SCEVMinMaxExpr is a pointer, the result is a
pointer.
7. Otherwise, the SCEV expression is invalid.
I'm not sure how useful rule 6 is in practice. If we exclude it, we can
guarantee that ScalarEvolution::getPointerBase always returns a
SCEVUnknown, which might be a helpful property. Anyway, I'll leave that
for a followup.
This is basically mop-up at this point; all the changes with significant
functional effects have landed. Some of the remaining changes could be
split off, but I don't see much point.
Differential Revision: https://reviews.llvm.org/D105510
In order to mirror the GetElementPtrInst::indices() API.
Wanted to use this in the IRForTarget code, and was surprised to
find that it didn't exist yet.
Currently InstructionSimplify.cpp knows how to simplify floating point
instructions that have a NaN operand. It does not know how to handle the
matching constrained FP intrinsic.
This patch teaches it how to simplify so long as the exception handling
is not "fpexcept.strict".
Differential Revision: https://reviews.llvm.org/D103169
This reverts commit 5b350183cd (and
also "[NFC][ScalarEvolution] Cleanup howManyLessThans.",
009436e9c1, to make it apply).
See https://reviews.llvm.org/D105216 for discussion on various
miscompilations caused by that commit.
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between pairwise and
non-pairwise reductions.
This also removes some code that was detecting reduction trees starting
from extract elements inside the cost model. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.
Differential Revision: https://reviews.llvm.org/D105484
There was an alias between 'simplifycfg' and 'simplify-cfg' in the
PassRegistry. That was the original reason for this patch, which
effectively removes the alias.
This patch also replaces all occurrences of 'simplify-cfg'
by 'simplifycfg'. Reason for choosing that form for the name is
that it matches the DEBUG_TYPE for the pass, and the legacy PM name
and also how it is spelled out in other passes such as
'loop-simplifycfg', and in other options such as
'simplifycfg-merge-cond-stores'.
If for some reason the name should be changed to 'simplify-cfg' in
the future, then I think such a renaming should be more widely done
and not only impacting the PassRegistry.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D105627
There are two issues with the current implementation of computeBECount:
1. It doesn't account for the possibility that adding "Stride - 1" to
Delta might overflow. For almost all loops, it doesn't, but it's not
actually proven anywhere.
2. It doesn't account for the possibility that Stride is zero. If Delta
is zero, the backedge is never taken; the value of Stride isn't
relevant. To handle this, we have to make sure that the expression
returned by computeBECount evaluates to zero.
To deal with this, add two new checks:
1. Use a variety of tricks to try to prove that the addition doesn't
overflow. If the proof is impossible, use an alternate sequence which
never overflows.
2. Use umax(Stride, 1) to handle the possibility that Stride is zero.
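A plain C++ mirror of the resulting expression, just to show how check 2 keeps the Stride == 0, Delta == 0 case well defined (illustrative only; check 1 is assumed to have ruled out overflow of the addition):
```
#include <algorithm>
#include <cassert>

// (Delta + umax(Stride, 1) - 1) / umax(Stride, 1)
unsigned backedgeCount(unsigned Delta, unsigned Stride) {
  unsigned Div = std::max(Stride, 1u); // umax(Stride, 1)
  return (Delta + Div - 1) / Div;
}

int main() {
  assert(backedgeCount(0, 0) == 0); // Stride == 0: backedge never taken
  assert(backedgeCount(7, 2) == 4); // ordinary case: ceil(7 / 2)
  return 0;
}
```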
Differential Revision: https://reviews.llvm.org/D105216
Add a function removePointerBase that returns, essentially, S -
getPointerBase(S). Use it in getMinusSCEV instead of actually
subtracting pointers.
Differential Revision: https://reviews.llvm.org/D105503
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.
Recommitting with fix to MemoryDepChecker::isDependent.
Differential Revision: https://reviews.llvm.org/D104806
If any operand of a math op is poison, that takes
precedence over general undef/NaN.
This should not be visible with binary ops because
it requires 2 constant operands to trigger (and if
both operands of a binop are constant, that should
get handled first in ConstantFolding).
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.
There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.
The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.
As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base. This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.
This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes). The resulting changes
seem okay to me, but suggestions welcome. As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.
I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.
Differential Revision: https://reviews.llvm.org/D104806
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:
int foo(__int128_t* ptr, int N) {
  #pragma clang loop vectorize_width(4, scalable)
  for (int i=0; i<N; ++i)
    ptr[i] = ptr[i] + 42;
}
This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D102253
We already have a fold for variable index with constant vector,
but if we can determine a scalar splat value, then it does not
matter whether that value is constant or not.
We overlooked this fold in D102404 and earlier patches,
but the fixed vector variant is shown in:
https://llvm.org/PR50817
Alive2 agrees on that:
https://alive2.llvm.org/ce/z/HpijPC
The same logic applies to scalable vectors.
Differential Revision: https://reviews.llvm.org/D104867
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.
Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105374
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.
Differential Revision: https://reviews.llvm.org/D104751
This is the cause of the miscompile in:
https://llvm.org/PR50944
The problem has likely existed for some time, but it was made visible with:
5af8bacc94 ( D104661 )
handleOtherCmpSelSimplifications() assumed it can convert select of
constants to bool logic ops, but that does not work with poison.
We had a very similar construct in InstCombine, so the fix here
mimics the fix there.
The bug is in instsimplify, but I'm not sure how to reproduce it outside of
instcombine. The reason this is visible in instcombine is because we have a
hack (FIXME) to bypass simplification of a select when it has an icmp user:
955f125899/llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp (L2632)
So we get to an unusual case where we are trying to simplify an instruction
that has an operand that would have already simplified if we had processed
it in normal order.
Differential Revision: https://reviews.llvm.org/D105298
Use separate variable for adjusted scale used for GCD computations. This
fixes an issue where we incorrectly determined that all indices are
non-negative and returned noalias because of that.
Follow up to 91fa3565da.
We have analogous rules in instsimplify, etc., but were missing the same in SCEV. The fold is near trivial, but came up in the context of a larger change.
(V * Scale) % X may not produce the same result for any possible value
of V, e.g. if the multiplication overflows. This means we currently
incorrectly determine NoAlias in some cases.
This patch updates LinearExpression to track whether the expression
has NSW and uses that to adjust the scale used for alias checks.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99424
There's no reason to use the weaker name-only analysis when we
have a function prototype to check (in fact, we probably should
not even have that name-only function exposed for general use,
but removing it requires auditing all of the callers).
The version of getLibFunc that takes a Function argument also
does some prototype checking to make sure the arguments/return
type match the expected signature of a real library call.
This is NFC-intended because the code in MemoryBuiltins does its
own function signature checking. For now, that means there may
be some redundancy in the checking, but that should not be above
the noise for compile-time. Ideally, we can move the checks to
a single location.
There's still a hole in the logic that allows the example in
https://llvm.org/PR50846 to cause a compiler crash.
This patch extends applyLoopGuards to detect a single-cond range check
idiom that InstCombine generates.
It extends applyLoopGuards to detect conditions of the form
(-C1 + X < C2). InstCombine will create this form when combining two
checks of the form (X u< C2 + C1) and (X >=u C1).
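For concreteness, a tiny stand-alone demonstration of that combination (constants chosen arbitrarily; this mirrors the InstCombine fold, not the SCEV code):
```
#include <cassert>

bool separateChecks(unsigned X, unsigned C1, unsigned C2) {
  return X >= C1 && X < C2 + C1; // (X >=u C1) and (X u< C2 + C1)
}

bool combinedCheck(unsigned X, unsigned C1, unsigned C2) {
  return X - C1 < C2;            // the single-condition form (-C1 + X u< C2)
}

int main() {
  for (unsigned X = 0; X < 64; ++X)
    assert(separateChecks(X, 5, 20) == combinedCheck(X, 5, 20));
  return 0;
}
```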
In practice, this enables us to correctly compute a tight trip count
bound for code such as the function below. InstCombine will fold the
minimum iteration check created by LoopRotate with the user check (< 8).
void unsigned_check(short *pred, unsigned width) {
  if (width < 8) {
    for (int x = 0; x < width; x++)
      pred[x] = pred[x] * pred[x];
  }
}
As a consequence, LLVM creates dead vector loops for the code above,
e.g. see https://godbolt.org/z/cb8eTcqET and https://alive2.llvm.org/ce/z/SHHW4d
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104741
This patch generalizes MatchBinaryAddToConst to support matching
(A + C1), (A + C2), instead of just matching (A + C1), A.
The existing cases can be handled by treating non-add expressions A as
A + 0.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D104634
Convert getValueForCondition to a worklist model instead of using
recursion.
In pathological cases getValueForCondition recurses heavily.
Stack frames are quite expensive on x86-64, and some operating
systems (e.g. Windows) have relatively low stack size limits.
Using a worklist avoids potential failures from stack overflow.
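A generic illustration of the restructuring (this shows the shape of the change only; the node and result types are stand-ins, not the actual LazyValueInfo code):
```
#include <unordered_map>
#include <vector>

struct Cond {
  std::vector<const Cond *> Operands; // sub-conditions, forming a DAG
  int LeafValue = 0;                  // stand-in for the value of a leaf
};

// Iterative post-order evaluation: deep chains of conditions no longer
// consume a native stack frame per level.
int evaluate(const Cond *Root) {
  std::unordered_map<const Cond *, int> Done;
  std::vector<const Cond *> Worklist{Root};
  while (!Worklist.empty()) {
    const Cond *C = Worklist.back();
    if (Done.count(C)) { // already computed via another path
      Worklist.pop_back();
      continue;
    }
    bool Ready = true;
    for (const Cond *Op : C->Operands)
      if (!Done.count(Op)) {
        Worklist.push_back(Op); // compute operands first, then revisit C
        Ready = false;
      }
    if (!Ready)
      continue;
    Worklist.pop_back();
    int Result = C->LeafValue;
    for (const Cond *Op : C->Operands)
      Result += Done[Op]; // stand-in for the real combination rule
    Done[C] = Result;
  }
  return Done[Root];
}
```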
Differential Revision: https://reviews.llvm.org/D104191
getPointerBase should only be looking through Add and AddRec
expressions; other expressions either aren't pointers, or can't be
looked through.
Technically, this is a functional change. For a multiply or min/max
expression, if they have exactly one pointer operand, and that operand
is the first operand, the behavior here changes. Similarly, if an AddRec
has a pointer-type step, the behavior changes. But that shouldn't be
happening in practice, and we plan to make such expressions illegal.
SCEVNAryExpr::getType() could return the wrong type for a SCEVAddExpr.
Remove it, and add getType() methods to the relevant subclasses.
NFC because nothing uses it directly, as far as I know; this is just
future-proofing.
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.
The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.
Differential Revision: https://reviews.llvm.org/D104784
This borrows as much as possible from the SDAG version of the code
(originally added with D27129 and since updated with big endian support).
In IR, we can test more easily for correctness than we did in the
original patch. I'm using the simplest cases that I could find for
InstSimplify: we computeKnownBits on variable shift amounts to see if
they are zero or in range. So shuffle constant elements into a vector,
cast it, and shift it.
The motivating x86 example from https://llvm.org/PR50123 is also here.
We computeKnownBits in the caller code, but we only check if the shift
amount is in range. That could be enhanced to catch the 2nd x86 test -
if the shift amount is known too big, the result is 0.
Alive2 understands the datalayout and agrees that the tests here are
correct - example:
https://alive2.llvm.org/ce/z/KZJFMZ
Differential Revision: https://reviews.llvm.org/D104472
This adds more poison folding optimizations to InstSimplify.
Since all binary operators propagate poison, these are fine.
Also, the precondition of `select cond, undef, x` -> `x` is relaxed to allow the case when `x` is undef.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104661
This adds handling for signed predicates, similar to how unsigned
predicates are already handled.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104732
Summary:
The changes to globalization introduced in D97680 created two new functions to
push / pop shareable memory on the GPU, __kmpc_alloc_shared and
__kmpc_free_shared. This patch adds these new runtime functions to the
library info so they can be used by the HeapToStack attributor interface. This
optimization replaces malloc / free pairs with stack memory if legal.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D102087
Don't do this while stripping pointer casts; instead fetch it at
the end. This improves compatibility with opaque pointers for the
case where the base object is not opaque.
Currently we drop wrapping flags for expressions like (A + C1)<flags> - C2.
But we can retain flags under certain conditions:
* Adding a smaller constant is NUW if the original AddExpr was NUW.
* Adding a constant with the same sign and small magnitude is NSW, if the
original AddExpr was NSW.
This can improve results after using `SimplifyICmpOperands`, which may
subtract one in order to use stricter predicates, as is the case for
`isKnownPredicate`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104319
Handle the gep p, 0-v case separately, and not as part of the loop
that ensures all indices are constant integers. Those two things
are not really related.
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.
The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.
The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "((-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.
The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring. It should eventually be fixed by future
improvements to isImpliedCond.
See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.
Differential Revision: https://reviews.llvm.org/D103656
They are not conducive to being stored in git. Instead, we autogenerate
mock model artifacts for use in tests. Production models can be
specified with the cmake flag LLVM_INLINER_MODEL_PATH.
LLVM_INLINER_MODEL_PATH has two sentinel values:
- download, which will download the most recent compatible model.
- autogenerate, which will autogenerate a "fake" model for testing the
model uptake infrastructure.
Differential Revision: https://reviews.llvm.org/D104251
D97225 moved LazyCallGraph verify() calls behind EXPENSIVE_CHECKS,
but verify() is defined for debug builds only, so this had the unintended
effect of breaking release builds with EXPENSIVE_CHECKS.
Fix by enabling verify() for both debug and EXPENSIVE_CHECKS.
Differential Revision: https://reviews.llvm.org/D104514
The old version of this code would blindly perform arithmetic without
paying attention to whether the types involved were pointers or
integers. This could lead to weird expressions like negating a pointer.
Explicitly handle simple cases involving pointers, like "x < y ? x : y".
In all other cases, coerce the operands of the comparison to integer
types. This avoids the weird cases, while handling most of the
interesting cases.
Differential Revision: https://reviews.llvm.org/D103660
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int, clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi, the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that, when vectorizing
calls, several passes did not support having several overloaded
arguments in an intrinsic. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
Currently the value is only used when calling `F->viewCFG()`, which misses out on its potential and usefulness.
So I added the check to the printer passes as well.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D102011
We already have this fold:
fadd float poison, 1.0 --> poison
...via ConstantFolding, so this makes the behavior consistent
if the other operand(s) are non-constant.
The fold for undef was added before poison existed as a
value/type in IR.
This came up in D102673 / D103169
because we're trying to sort out the more complicated handling
for constrained math ops.
We should have the handling for the regular instructions done
first, so we can build on that (or diverge as needed).
Differential Revision: https://reviews.llvm.org/D104383
As per (committed without review) @reames's rGac81cb7e6dde9b0890ee1780eae94ab96743569b change,
we are now allowed to produce `ptrtoint` for non-integral pointers.
This will unblock further unbreaking of SCEV regarding int-vs-pointer type confusion.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D104322
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Relanding with a check for self-referential values.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
The current implementation for computing relative block frequencies does
not correctly handle control-flow graphs containing irreducible loops. This
results in suboptimally generated binaries, whose perf can be up to 5%
worse than optimal.
To resolve the problem, we apply a post-processing step, which iteratively
updates block frequencies based on the frequencies of their predecessors.
This corresponds to finding the stationary point of the Markov chain by
an iterative method aka "PageRank computation". The algorithm takes at
most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster.
It is turned on by passing option `use-iterative-bfi-inference`
and applied only for functions containing profile data and irreducible loops.
Tested on SPEC06/17, where it is helping to get correct profile counts for one of
the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5%
for binaries containing functions with hot irreducible loops.
Reviewed By: hoy, wenlei, davidxl
Differential Revision: https://reviews.llvm.org/D103289
Both doInitialize and runOnModule were running the entire analysis
due to the actual work being done in the constructor. Strip it out here
and only get the similarity during runOnModule.
Author: lanza
Reviewers: AndrewLitteken, paquette, plofti
Differential Revision: https://reviews.llvm.org/D92524
Every invocation of this was copying the Mapper for no reason. Take a const
ref instead.
Author: lanza
Reviewers: AndrewLitteken, plofti, paquette,
Differential Revision: https://reviews.llvm.org/D92532
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).
Differential Revision: https://reviews.llvm.org/D104029
Essentially, the cover function simply combines the loop level check and the function level scope into one call. This simplifies several callers and is (subjectively) less error prone.
This addresses a performance regression reported against 3c6e4191. That change (correctly) limited a transform based on assumed finiteness to mustprogress loops, but the previous change (38540d7) which introduced the mustprogress check utility only handled function attributes, not the loop metadata form.
It turns out that clang uses the function attribute form for C++, and the loop metadata form for C. As a result, 3c6e4191 ended up being a large regression in practice for C code as loops weren't being considered mustprogress despite the language semantics.
Previously such folding was enabled for half, float and double values
only. With this change it is allowed for other floating point values
also.
Differential Revision: https://reviews.llvm.org/D103956
Currently, NoWrapFlags are dropped if we inline operands of SCEVAddExpr
operands. As a consequence, we always drop flags when building
expressions like `getAddExpr(A, getAddExpr(B, C, NUW), NUW)`.
We should be able to retain NUW flags common among all inlined
SCEVAddExpr and the original flags.
Reviewed By: nikic, mkazantsev
Differential Revision: https://reviews.llvm.org/D103877
Introduce a new cl::opt to hide "cold" blocks from CFG DOT graphs.
Use BFI to get block relative frequency. Hide the block if the
frequency is below the threshold set by the command line option value.
Reviewed By: davidxl, hoy
Differential Revision: https://reviews.llvm.org/D103640
This patch allows scalable vectors to also use the fold that already
exists for fixed vectors, but only when the lane index is lower than the
minimum number of elements of the vector.
Differential Revision: https://reviews.llvm.org/D102404
Noticed via code inspection. We changed the semantics of the IR when we added mustprogress, and we appear to have not updated this location.
Differential Revision: https://reviews.llvm.org/D103834
Fix a bug introduced by f6f6f6375d.
Now for empty PHIs, instead of crashing on assert(hasVal()) in
Optional's internals, we'll return NoAlias, as we did before that patch.
Differential Revision: https://reviews.llvm.org/D103831
The motivation here is simple loops with unsigned induction variables w/non-one steps. A toy example would be:
for (unsigned i = 0; i < N; i += 2) { body; }
Given C/C++ semantics, we do not get the nuw flag on the induction variable. Given that lack, we currently can't compute a bound for this loop. We can do better for many cases, depending on the contents of "body".
The basic intuition behind this patch is as follows:
* A step which evenly divides the iteration space must wrap through the same numbers repeatedly. And thus, we can ignore potential corner cases where we exit after the n-th wrap through uint32_max.
* Per C++ rules, infinite loops without side effects are UB. We already have code in SCEV which relies on this. In LLVM, this is tied to the mustprogress attribute.
Together, these let us conclude that the trip count of this loop must come before unsigned overflow unless the body would form a well defined infinite loop.
A couple notes for those reading along:
* I reused the loop properties code which is overly conservative for this case. I may follow up in another patch to generalize it for the actual UB rules.
* We could cache the n(s/u)w facts. I left that out because doing a pre-patch which cached existing inference showed a lot of diffs I had trouble fully explaining. I plan to get back to this, but I don't want it on the critical path.
Differential Revision: https://reviews.llvm.org/D103118
Use cast<> instead which will assert that the cast is correct and not just return null - the match() should have already failed if the cast isn't valid anyhow.
Fixes static analysis warning.
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to lose some optimization potential.
Restructure handling of cfg-hide-unreachable-paths and
cfg-hide-deoptimize-paths options so as to make it easier
to introduce new types of hidden blocks.
Add a getDemandedBits method for uses so we can query the demanded bits for each use. This can help to get better use information. For example, for the code below
define i32 @test_use(i32 %a) {
  %1 = and i32 %a, -256
  %2 = or i32 %1, 1
  %3 = trunc i32 %2 to i8 ; didn't optimize this to 1 for illustration purposes
  ... some use of %3
  ret i32 %2
}
If we look at the demanded bits of %2 (which are all 32 bits because of the return), we would conclude that %a is used regardless of how the return value is used. However, if we look at each use separately, we see that the trunc use of %2 only demands the lower 8 bits, which have been masked off and redefined and so do not depend on %a; therefore %a's usage depends on how the function return is used.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97074
When rewriting
powf(2.0, itofp(x)) -> ldexpf(1.0, x)
exp2(sitofp(x)) -> ldexp(1.0, sext(x))
exp2(uitofp(x)) -> ldexp(1.0, zext(x))
the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with 16 bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.
The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.
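As a rough source-level picture of the transform and why the exponent width matters (hedged: this shows the intent, not the actual pass code):
```
#include <cmath>

// Before the simplification: pow with a base of 2 and an integer-valued exponent.
float before(int X) { return powf(2.0f, (float)X); }

// After the simplification, conceptually: ldexpf(1.0, X). The second argument
// must use the target's real "int" width (16 bits on some targets), not a
// hard-coded 32-bit type.
float after(int X) { return ldexpf(1.0f, X); }
```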
Differential Revision: https://reviews.llvm.org/D99438
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103429
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense. This triggers an assertion complaining
it's not loop-invariant.
Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.
Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.
Differential Revision: https://reviews.llvm.org/D102959
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.
Fixes PR44286 (and another PR or two).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84951
1. Removed redundant includes,
2. Removed never defined and used `releaseMemory()`.
3. Fixed the first-letter case of member function names.
4. Renamed duplicate (in nested struct `NonLocalPointerInfo`) name
`NonLocalDeps` to `NonLocalDepsMap`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D102358
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.
In particular, looking at the n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small-size of 4 appears to be the sweet spot,
it results in the least allocations while minimizing memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB
heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB
heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB
heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB
heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB
heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```
While that test may be an edge worst-case scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
This is similar to the fix in c590a9880d ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).
We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.
https://llvm.org/PR50500
When you try to define a new DEBUG_TYPE in a header file, a DEBUG_TYPE
definition placed around the #includes in the files that include it
could result in redefinition warnings or even compile errors.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102594
This reverts commit 1ed7f8ede5.
This change can cause loop-distribute to crash in some cases. Revert
until I have more time to wrap up a fix.
See PR50296, PR5028 and D102266.
MSVC-style RTTI produces loads through a GEP of a local alias which
itself is a GEP. Currently we aren't able to devirtualize any virtual
calls when MSVC RTTI is enabled.
This patch attempts to simplify a load's GEP operand by calling
SymbolicallyEvaluateGEP() with an option to look through local aliases.
Differential Revision: https://reviews.llvm.org/D101100
This patch implements getSmallConstantTripMultiple(L) correctly for multiple exit loops. The previous implementation was both imprecise, and violated the specified behavior of the method. This was fine in practice, because it turns out the function was both dead in real code, and not tested for the multiple exit case.
Differential Revision: https://reviews.llvm.org/D103189
This came up in review for another patch, see https://reviews.llvm.org/D102982#2782407 for full context.
I've reviewed the callers to make sure they can handle multiple exit loops w/non-zero returns. There's two cases in target cost models where results might change (Hexagon and PowerPC), but the results looked legal and reasonable. If a target maintainer wishes to back out the effect of the costing change, they should explicitly check for multiple exit loops and handle them as desired.
Differential Revision: https://reviews.llvm.org/D103182
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.
By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.
Differential Revision: https://reviews.llvm.org/D102934
When memoized values for a SCEV expressions are dropped, we also
drop all BECounts that make use of the SCEV expression. This is done
by iterating over all the ExitNotTaken counts and (recursively)
checking whether they use the SCEV expression. If there are many
exits, this will take a lot of time.
This patch improves the situation by pre-computing a set of all
used operands, so that we can determine whether a certain BEInfo
needs to be invalidated using a simple set lookup. Will still need
to loop over all BEInfos though.
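A generic sketch of the idea (stand-in types; not the actual ScalarEvolution data structures):
```
#include <unordered_set>
#include <vector>

struct Expr {
  std::vector<const Expr *> Operands;
};

// Precompute the transitive operand set of a cached count once, so that
// "does this cached entry use expression S?" becomes a set lookup instead of
// a recursive walk over the expression tree.
struct CachedExitCount {
  const Expr *Count = nullptr;
  std::unordered_set<const Expr *> UsedExprs;

  explicit CachedExitCount(const Expr *C) : Count(C) {
    std::vector<const Expr *> Worklist{C};
    while (!Worklist.empty()) {
      const Expr *E = Worklist.back();
      Worklist.pop_back();
      if (!UsedExprs.insert(E).second)
        continue;
      Worklist.insert(Worklist.end(), E->Operands.begin(), E->Operands.end());
    }
  }

  bool uses(const Expr *S) const { return UsedExprs.count(S) != 0; }
};
```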
This makes for a mild improvement on non-degenerate cases:
https://llvm-compile-time-tracker.com/compare.php?from=b661a55a253f4a1cf5a0fbcb86e5ba7b9fb1387b&to=be1393f450e594c53f0ad7e62339a6bc831b16f6&stat=instructions
For the degenerate case from https://bugs.llvm.org/show_bug.cgi?id=50384,
for n=128 I'm seeing run time drop from 1.6s to 1.1s.
Differential Revision: https://reviews.llvm.org/D102796
The semantics of select with undefined/poison condition
are not explicitly stated in the LangRef, but this matches
comments in the code and Alive2 appears to concur:
https://alive2.llvm.org/ce/z/KXytmd
We can find this pattern after demanded elements transforms.
As noted in D101191, fuzzers are finding infinite loops because
we may not account for this pattern in other passes.
This follows from the underlying logic for binops and min/max.
Although it does not appear that we handle this for min/max
intrinsics currently.
https://alive2.llvm.org/ce/z/Kq9Xnh
These functions were marked unavailable for MSVC targets before,
within an "T.isOSWindows() && !T.isOSCygMing()" block, but these ones
are unavailable on MinGW targets too.
This avoids generating calls to stpcpy for MinGW targets, which has
been happening since 6dbf0cfcf7 (in
some cases).
This fixes https://github.com/mstorsjo/llvm-mingw/issues/201.
Differential Revision: https://reviews.llvm.org/D102946
Replace use of host floating types with operations on APFloat when it is
possible. Use of APFloat makes analysis more convenient and facilitates
constant folding in the case of non-default FP environment.
Differential Revision: https://reviews.llvm.org/D102672
The previous rule:
(insert_vector _, (extract_vector X, 0), 0) -> X
is not quite correct. The correct fold should be:
(insert_vector Y, (extract_vector X, 0), 0) -> X
where: Y is X, or Y is undef
This commit updates the pattern.
Reviewed By: peterwaller-arm, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D102699
Previously APFloat::convertToDouble may be called only for APFloats that
were built using double semantics. Other semantics like single precision
were not allowed although corresponding numbers could be converted to
double without loss of precision. A similar restriction applied to
APFloat::convertToFloat.
With this change any APFloat that can be precisely represented by double
can be handled with convertToDouble. Behavior of convertToFloat was
updated similarly. This makes the conversion operations more convenient and
adds support for formats like half and bfloat.
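A hedged usage sketch of the relaxed behavior (it assumes the post-change semantics described above and standard APFloat construction from a string):
```
#include "llvm/ADT/APFloat.h"
using namespace llvm;

double widenHalfToDouble() {
  // 1.5 is exactly representable in half precision, so converting it to
  // double is lossless and is now permitted.
  APFloat H(APFloat::IEEEhalf(), "1.5");
  return H.convertToDouble();
}
```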
Differential Revision: https://reviews.llvm.org/D102671
Currently all AA analyses marked as preserved are stateless, not taking
into account their dependent analyses. So there's no need to mark them
as preserved; they won't be invalidated unless their dependent analyses are.
SCEVAAResults was the one exception to this, it was treated like a
typical analysis result. Make it like the others and don't invalidate
unless SCEV is invalidated.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D102032
This code was re-implementing the same-BB case of
isPotentiallyReachable(). Historically, this was done because
CaptureTracking used additional caching for local dominance
queries. Now that it is no longer needed, the code is effectively
the same as isPotentiallyReachable().
The only differences are extra checks for invokes/phis. These are
misleading checks related to dominance in the value availability
sense that are not relevant for control reachability. The invoke
check was correct but redundant in that invokes are always
terminators, so `I` could never come before the invoke. The phi
check is a matter of interpretation (should an earlier phi node be
considered reachable from a later phi node in the same block?)
but ultimately doesn't matter because phis don't capture anyway.
Reapply after adjusting the synchronized.m test case, where the
TODO is now resolved. The pointer is only captured on the exception
handling path.
-----
For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.
This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.
After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
This reverts commit 6b8b43e7af.
This causes clang test to fail (CodeGenObjC/synchronized.m).
Revert until I can figure out whether that's an expected change.
For the CapturesBefore tracker, it is sufficient to check that
I can not reach BeforeHere. This does not necessarily require
that BeforeHere dominates I, it can also occur if the capture
happens on an entirely disjoint path.
This change was previously accepted in D90688, but had to be
reverted due to large compile-time impact in some cases: It
increases the number of reachability queries that are performed.
After recent changes, the compile-time impact is largely mitigated,
so I'm reapplying this patch. The remaining compile-time impact
is largely proportional to changes in code-size.
Reachability queries are very expensive, and currently performed
for each instruction we look at, even though most of them will
not lead to a capture and are thus ultimately irrelevant. It is
more efficient to walk a few unnecessary instructions than to
perform unnecessary reachability queries.
Theoretically, this may produce worse results, because the additional
instructions considered may cause us to hit the use count limit
earlier. In practice, this does not appear to be a problem, e.g.
on test-suite O3 we report only one more captured-before with this
change, with no resulting codegen differences.
This makes PointerMayBeCapturedBefore() significantly cheaper in
practice, hopefully allowing it to be used in more places.
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
Use comesBefore() instead of performing an instruction walk. In
line with the previous implementation, instructions are considered
to reach themselves.
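A minimal sketch of the check; the helper name is made up.
#include "llvm/IR/Instruction.h"
using namespace llvm;

// The same-block case relies on the block-local instruction ordering, and an
// instruction is considered to reach itself.
static bool reachableWithinBlock(const Instruction *From, const Instruction *To) {
  return From == To || From->comesBefore(To);
}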
GlobalVariables are Constants, yet should not unconditionally be
considered true for __builtin_constant_p.
Via the LangRef
https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic:
This intrinsic generates no code. If its argument is known to be a
manifest compile-time constant value, then the intrinsic will be
converted to a constant true value. Otherwise, it will be converted
to a constant false value.
In particular, note that if the argument is a constant expression
which refers to a global (the address of which _is_ a constant, but
not manifest during the compile), then the intrinsic evaluates to
false.
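A hedged source-level illustration, not taken from the patch:
// '&g' is a Constant in IR (a GlobalVariable address), but its value is only
// known at link time, so the intrinsic must lower to false here, not true.
int g;
bool addressIsManifestConstant() {
  return __builtin_constant_p(&g); // expected to fold to false
}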
Move isManifestConstant from ConstantFolding to be a method of
Constant so that we can reuse the same logic in
LowerConstantIntrinsics.
pr/41459
Reviewed By: rsmith, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D102367
Previously, we already used BatchAA for individual simple pointer
dependency queries. This extends BatchAA usage for the non-local
case, so that only one BatchAA instance is used for all blocks,
instead of one instance per block.
Use of BatchAA is safe as IR cannot be modified during a MemDep
query.
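A rough C++ sketch of the pattern; the surrounding loop and names are illustrative, not MemDep's actual code.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/AliasAnalysis.h"
using namespace llvm;

// One BatchAAResults instance is created up front and reused for every query,
// so alias() results can be cached; this is safe because the IR is not
// modified while the queries run.
static void queryAll(AAResults &AA, ArrayRef<MemoryLocation> Locs,
                     const MemoryLocation &Ptr) {
  BatchAAResults BatchAA(AA);
  for (const MemoryLocation &Loc : Locs)
    (void)BatchAA.alias(Loc, Ptr);
}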
This is not expected to have any practical compile-time effect,
as the alias() calls inside callCapturesBefore() are rare. This
should still be supported for API completeness, and might be
useful for reachability caching.
This patch updates the cost model for full unrolling to discount the cost of a loop invariant expression on all but one iteration. The reasoning here is that such an expression (as determined by SCEV) will be CSEd or DSEd once the loop is unrolled. Note that SCEVs reasoning will find things which could be invariant, not simply those outside the loop.
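A hedged illustration with made-up names:
// 'Base * Scale' is loop-invariant, so after full unrolling all copies but one
// are expected to be CSE'd away; the cost model now charges for it only once.
int sumScaled(int Base, int Scale, const int *C) {
  int Sum = 0;
  for (int i = 0; i < 4; ++i)
    Sum += Base * Scale + C[i];
  return Sum;
}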
Differential Revision: https://reviews.llvm.org/D102506
This commit removes some redundant {insert,extract}_vector intrinsic
chains by implementing the following patterns as instsimplifies:
(insert_vector _, (extract_vector X, 0), 0) -> X
(extract_vector (insert_vector _, X, 0), 0) -> X
Reviewed By: peterwaller-arm
Differential Revision: https://reviews.llvm.org/D101986
We already apply loop-guards when computing the maximum with unitary
steps. This extends the code to also do so when dealing with non-unitary
steps.
This allows us to infer a tighter maximum in some cases.
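A hedged illustration, not taken from the patch:
// The guard bounds 'n', and SCEV can now use it to tighten the maximum
// backedge-taken count even though the induction variable steps by 2.
void zeroEven(int *A, unsigned n) {
  if (n < 64)
    for (unsigned i = 0; i < n; i += 2)
      A[i] = 0;
}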
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D102267
This patch adds support for Darwin's libsystem math vector functions to
TLI. Darwin's libsystem provides a range of vector functions for libm
functions.
This initial patch only adds the 2 x double and 4 x float versions,
which are available on both X86 and ARM64. On X86, wider vector versions
are supported as well.
Reviewed By: jroelofs
Differential Revision: https://reviews.llvm.org/D101856
Patch by Artem Radzikhovskyy!
Allow delinearization of fixed sized arrays if we can prove that the GEP indices do not overflow the array dimensions. The checks applied are similar to the ones that are used for delinearization of parametric size arrays. Make sure that the GEP indices are non-negative and that they are smaller than the range of that dimension.
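A hedged illustration; the array shape and bounds are made up:
// The bounds checks keep both GEP indices non-negative and inside their
// dimension, so the access can be delinearized as A[i][j] rather than being
// treated as a flat linearized offset.
void scale(int A[][64], int n, int m) {
  for (int i = 0; i < n && i < 128; ++i)
    for (int j = 0; j < m && j < 64; ++j)
      A[i][j] *= 2;
}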
Changes Summary:
- Updated the LIT tests with more exact values, as we are able to delinearize and apply more exact tests
- profitability.ll - now able to delinearize in all cases, no need to use -da-disable-delinearization-checks flag and run the test twice
- loop-interchange-optimization-remarks.ll - in one of the cases we are able to delinearize without using -da-disable-delinearization-checks
- SimpleSIVNoValidityCheckFixedSize.ll - removed unnecessary "-da-disable-delinearization-checks" flag. Now can get the exact answer without it.
- SimpleSIVNoValidityCheckFixedSize.ll and PreliminaryNoValidityCheckFixedSize.ll - made negative tests more explicit, in order to demonstrate the need for "-da-disable-delinearization-checks" flag
Differential Revision: https://reviews.llvm.org/D101486
applyLoopGuards() already combines conditions from multiple nested
guards. However, it cannot use multiple conditions on the same guard,
combined using and/or. Add support for this by recursing into either
`and` or `or`, depending on the direction of the branch.
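A hedged illustration, not taken from the patch:
// Both parts of the guard are combined with '&&'; applyLoopGuards can now use
// each of them (for example the upper bound on 'n'), not only conditions that
// come from separate nested guards.
void clearPrefix(int *A, unsigned n) {
  if (n > 0 && n < 1024)
    for (unsigned i = 0; i < n; ++i)
      A[i] = 0;
}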
Differential Revision: https://reviews.llvm.org/D101692
Printing pass manager invocations is fairly verbose and not super
useful.
This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.
This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D101797
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of
the phi. Consider the case in exit_cond_depends_on_inner_loop.
At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).
The existing code tries to prove IncV <= -1 for all incoming values
IncV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.
Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.
I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works for irreducible control
flow).
So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure this will catch
all cases and I would appreciate a thorough second look in that regard.
Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101829
This will allow writing
propagateMetadata(Inst, collectInterestingValues(...))
without concern about empty lists. In case of an empty list,
Inst is returned without any changes.
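A hedged usage sketch; the wrapper name is made up.
#include "llvm/Analysis/VectorUtils.h"
using namespace llvm;

// Callers no longer need to guard against an empty list: if 'VL' is empty,
// 'Inst' is returned unchanged.
static Instruction *propagateFrom(Instruction *Inst, ArrayRef<Value *> VL) {
  return propagateMetadata(Inst, VL);
}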
induction variable to be perfect
This patch allow more conditional branches to be considered as loop
guard, and so more loop nests can be considered perfect.
Reviewed By: bmahjour, sidbav
Differential Revision: https://reviews.llvm.org/D94717
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.
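A hedged source-level analogue, not taken from the patch:
#include <cstdint>

// The pointer below is reconstructed from an integer (inttoptr in IR), so it
// may refer to any object whose address has escaped and cannot be treated as
// a local non-escaping object.
char *fromAddress(std::uintptr_t Addr) {
  return reinterpret_cast<char *>(Addr);
}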
Reviewed By: nikic, nlopes, aqjune
Differential Revision: https://reviews.llvm.org/D101541
getValueFromCondition() uses a Visited set to record the intermediate value.
However, it uses a postorder way to compute the value first and update the
Visited set later. Thus it can be trapped in infinite recursion if there
is IR where a use is not dominated by its def, as in this example:
%tmp3 = or i1 undef, %tmp4
%tmp4 = or i1 undef, %tmp3
To prevent this, we can insert an Overdefined placeholder into the set
before computing the actual value.
Reviewed by: nikic
Differential Revision: https://reviews.llvm.org/D101273
The CGSCC pass manager interplay with the FunctionAnalysisManagerCGSCCProxy is 'special' in the sense that the former will rerun the latter if there are changes to a SCC structure; that being said, some of the functions in the SCC may be unchanged. In that case, the function simplification pipeline will be re-run, which impacts compile time[1].
This patch allows the function simplification pipeline be skipped if it was already run and the function was not modified since.
The behavior is currently disabled by default. This is because, currently, the rerunning of the function simplification pipeline on an unchanged function may still result in changes. The patch simplifies investigating and fixing those cases where repeated function pass runs do actually positively impact code quality, while offering an easy workaround for those impacted negatively by compile time regressions, and not impacting mainline scenarios.
[1] A [[ http://llvm-compile-time-tracker.com/compare.php?from=eb37d3546cd0c6e67798496634c45e501f7806f1&to=ac722d1190dc7bbdd17e977ef7ec95e69eefc91e&stat=instructions | compile time tracker ]] run with the option enabled.
Differential Revision: https://reviews.llvm.org/D98103
LazyBlockFrequenceInfoPass, LazyBranchProbabilityInfoPass and
LoopAccessLegacyAnalysis all cache pointers to their nestled required
analysis passes. One needs to use addRequiredTransitive to describe
that the nestled passes can't be freed until those analysis passes
are no longer used themselves.
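A minimal sketch of the idea, not the actual LLVM code; ExampleLazyPass is hypothetical.
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Pass.h"
using namespace llvm;

// A legacy pass that hands out a cached pointer into a nestled analysis must
// declare that dependency as transitive, so the analysis cannot be freed
// while this pass (and the cached pointer) is still in use.
struct ExampleLazyPass : FunctionPass {
  static char ID;
  ExampleLazyPass() : FunctionPass(ID) {}

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.addRequiredTransitive<LoopInfoWrapperPass>(); // not just addRequired<>
    AU.setPreservesAll();
  }

  bool runOnFunction(Function &F) override {
    LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo(); // cached pointer
    return false;
  }

  LoopInfo *LI = nullptr;
};
char ExampleLazyPass::ID = 0;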
There is still a bit of a mess considering the getLazyBPIAnalysisUsage
and getLazyBFIAnalysisUsage functions. Those functions are used from
Transform, CodeGen and Analysis passes. I figure it is OK to
use addRequiredTransitive also when being used from Transform and
CodeGen passes. On the other hand, I figure we must do it when
used from other Analysis passes. So using addRequiredTransitive should
be more correct here. An alternative solution would be to add a
bool option in those functions to let the user tell if it is a
analysis pass or not. Since those lazy passes will be obsolete when
new PM has conquered the world I figure we can leave it like this
right now.
The intention with the patch is to fix PR49950. It at least solves the
problem for the reproducer in PR49950. However, that reproducer
needs five passes in a specific order, so there are lots of various
"solutions" that could avoid the crash without actually fixing the
root cause.
This is a reapply of commit 3655f0757f, that was reverted in
33ff3c2049 due to problems with assertions in the polly
lit tests. That problem is supposed to be solved by also adjusting
ScopPass to explicitly preserve LazyBlockFrequencyInfo and
LazyBranchProbabilityInfo (it already preserved
OptimizationRemarkEmitter which depends on those lazy passes).
Differential Revision: https://reviews.llvm.org/D100958