This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032).
The story on this is a bit involved. Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic. Why? Because the lowering of the intrinsic relies on the presence of the readonly attribute. We don't have a matcher to select the case where there's a glue node needed.
Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much struck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR!
We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td. This makes it impossible to exercise SelectionDAG from a standalone test case.
At this point, I decided to treat this an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future.
As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted. I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.
During store promotion, we check whether the pointer was captured
to exclude potential reads from other threads. However, we're only
interested in captures before or inside the loop. Check this using
PointerMayBeCapturedBefore against the loop header.
Differential Revision: https://reviews.llvm.org/D100706
Recently processMinMaxIntrinsic has been added and we started to observe a number of analysis get invalidated after CVP. The problem is CVP conservatively returns 'true' even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic which has the same problem. I think processMinMaxIntrinsic and similar should better have boolean return status to prevent similar issue reappear in future.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D100538
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from
if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
to
if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")
Introduce a getValueAsBool that normalize the check, with the following
behavior:
no attributes or attribute set to "false" => return false
attribute set to "true" => return true
Differential Revision: https://reviews.llvm.org/D99299
These were misleading, they're more of a "clear" than an "invalidate".
We shouldn't be individually clearing analysis results. Either we clear
all analyses when some IR becomes invalid, or we properly go through
invalidation.
There was only one use of this, which can be simulated with
AM.invalidate(F, PA).
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D100519
This patch changed the isLegalUse check to ensure that
LSRInstance::GenerateConstantOffsetsImpl generates an
offset that results in a legal addressing mode and
formula. The check is changed to look similar to the
assert check used for illegal formulas.
Differential Revision: https://reviews.llvm.org/D100383
Change-Id: Iffb9e32d59df96b8f072c00f6c339108159a009a
If the PHI-of-ops simplifies to an existing value, no real PHI is
created, which means the dependencies between the
PHI-of-ops and its operands is not materialized in IR. At the
moment, we fail to create a real PHI node for the PHI-of-ops,
because the PHI-of-ops root instruction is not re-visited if
one of the PHI-of-ops operands changes. We need to add the
operands as additional users in this case.
Even with this patch, there are still some dependencies
missing. I will continue tackling the outstanding
reporeted crashes in this area.
Fixes PR36501, PR42422, PR42557.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D66924
This refactors SCCP and creates a SCCPSolver interface and class so that it can
be used by other passes and transformations. We will use this in D93838, which
adds a function specialisation pass.
This is based on an early version by Vinay Madhusudan.
Differential Revision: https://reviews.llvm.org/D93762
Jump threading can replace select then unconditional branch with
conditional branch, but when doing so loses debug info.
This destructive transform is eventually leading to a failed Verifier
run during full LTO builds of the Linux kernel with CFI and KCOV
enabled, as reported in PR39531.
ModuleSanitizerCoveragePass will insert calls to
__sanitizer_cov_trace_pc, and sometimes split critical edges,
using whatever debug info may or may not exist for the branch for
the added libcall. Since we can inline calls to
__sanitizer_cov_trace_pc due to LTO, this can lead to the error
observed in PR39531 when the debug info isn't propagated to
the libcall, because of prior destructive transforms that failed to
retain debug info.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100137
Say we have
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%2,%a)
The optimization will try to reassociate the later one so that we can rewrite it to %3=min(%1, %c) and remove %2.
But if %2 has another uses outside of %3 then we can't remove %2 and end up with:
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%1, %c)
This doesn't harm by itself except it is not profitable and changes IR for no good reason.
What is bad it triggers next iteration which finds out that optimization is applicable to %2 and %3 and generates:
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%1,%c)
%4=min(%2,%a)
and so on...
The solution is to prevent optimization in the first place if intermediate result (%2) has side uses and
known to be not removed.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D100170
The default is likely wrong.
Out of all the callees, only a single one needs to pass-in false (JumpThread),
everything else either already passes true, or should pass true.
Until the default is flipped, at least make it harder to unintentionally
add new callees with UseBlockValue=false.
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98718
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98027
Previously loading the vtable used in calling a virtual method in a loop
was not hoisted out of the loop. This fixes that.
canSinkOrHoistInst() itself doesn't check that the load operands are
loop invariant, callers also check that separately.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99784
meetBDVState looks pretty difficult to read and follow.
This is purely NFC but doing several things:
1) Combine meet and meetBDVState
2) Move the function to be a member of BDVState
3) Make BDVState be a mutable object
4) Convert switch to sequence of ifs
5) Adds comments.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99064
This fixes a "Cached first special instruction is wrong!" assert.
The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call to a
call to a willreturn function which is no longer special.
Reviewed By: nikic, rnk
Differential Revision: https://reviews.llvm.org/D99977
After D99249 we use three different loop pass managers for LICM,
LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI
and LazyBPI are not preserved by LoopRotate (note that D74640
is no longer needed). Avoid this by marking them as preserved.
My understanding of D86156 is that it is okay to simply preserve
them (which LoopUnswitch already does for the same reason) and
rely on callbacks to deal with deleted blocks.
Differential Revision: https://reviews.llvm.org/D99843
After loop interchange, the (old) outer loop header should not jump to
the `LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of outer loop.
This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D98475
During LoopStrengthReduce, some of the SSA values that are used by debug values
may be lost and/or salvaged. After LSR we attempt to recover any undef debug
values, including any that were salvaged but then lost their values afterwards,
by replacing the lost values with any live equal values (plus a possible
constant offset) that have been gathered prior to running LSR. When we do this
we restore the debug value's original DIExpression, to undo any salvaging (as we
have gone back to using the original debug value).
This process can currently produce invalid debug info if the number of operands
has changed by salvaging during LSR. Replacing old values during the
applyEqualValues step does not change the number of location operands, which
means that when we restore the old DIExpression we may have a mismatch between
the number of operands used by the debug value and the number of operands
referenced by the DIExpression. This patch fixes this by restoring the full
original location metadata at the start of the applyEqualValues step, so that
there is no mismatch in operand count between the debug value and its
DIExpression.
Differential Revision: https://reviews.llvm.org/D98644
After loop interchange, the (old) outer loop header should not jump to
`LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of (new) outer loop.
This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D98475
-Make sure of the CreateShl/LShr/AShr methods that take a uint64_t
instead of creating a ConstantInt for 1 ourselves.
-Use Builder.getInt1 or ConstantInt::getBool instead of a conditional.
-Pull out repeated calls to getType.
Follow up to a6d2a8d6f5. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
Follow up to a6d2a8d6f5. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.
performScalarPREInsertion() inserts instructions into blocks that we
need to tell ImplicitControlFlowTracking about, otherwise the ICF cache
may be invalid.
Fixes PR49193.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99909
The key change (4f5e92c) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.
When we are able to SROA an alloca, we know all uses of it, meaning we
don't have to preserve the invariant group intrinsics and metadata.
It's possible that we could lose information regarding redundant
loads/stores, but that's unlikely to have any real impact since right
now the only user is Clang and vtables.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99760
When run under valgrind, or with a malloc that poisons freed memory,
this can lead to segfaults or other problems.
To avoid modifying the AdditionalUsers DenseMap while still iterating,
save the instructions to be notified in a separate SmallPtrSet, and use
this to later call OperandChangedState on each instruction.
Fixes PR49582.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D98602
The safepoints being inserted exists to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering.
I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.
Support reassociation for min/max. With that we should be able to transform min(min(a, b), c) -> min(min(a, c), b) if min(a, c) is already available.
Reviewed By: mkazantsev, lebedev.ri
Differential Revision: https://reviews.llvm.org/D88287
This is a patch to fix the bug in alignment calculation (see https://reviews.llvm.org/D90529#2619492).
Consider this code:
```
call void @llvm.assume(i1 true) ["align"(i32* %a, i32 32, i32 28)]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 -1
; aligment of %arrayidx?
```
The llvm.assume guarantees that `%a - 28` is 32-bytes aligned, meaning that `%a` is 32k + 28 for some k.
Therefore `a - 4` cannot be 32-bytes aligned but the existing code was calculating the pointer as 32-bytes aligned.
The reason why this happened is as follows.
`DiffSCEV` stores `%arrayidx - %a` which is -4.
`OffSCEV` stores the offset value of “align”, which is 28.
`DiffSCEV` + `OffSCEV` = 24 should be used for `a - 4`'s offset from 32k, but `DiffSCEV` - `OffSCEV` = 32 was being used instead.
Reviewed By: Tyker
Differential Revision: https://reviews.llvm.org/D98759
The code is assuming that having an exact exit count for the loop implies that exit counts for every exit are known. This used to be true, but when we added handling for dead exits we broke this invariant. The new invariant is that an exact loop count implies that any exits non trivially dead have exit counts.
We could have fixed this by either a) explicitly checking for a dead exit, or b) just testing for SCEVCouldNotCompute. I chose the second as it was simpler.
(Debugging this took longer than it should have since I'd mistyped the original assert and it wasn't checking what it was meant to...)
p.s. Sorry for the lack of test case. Getting things into a state to actually hit this is difficult and fragile. The original repro involves loop-deletion leaving SCEV in a slightly inprecise state which lets us bypass other transforms in IndVarSimplify on the way to this one. All of my attempts to separate it into a standalone test failed.
Removes CFGAnalyses from the preserved analyses set
returned by LoopFlattenPass::run().
Reviewed By: Dave Green, Ta-Wei Tu
Differential Revision: https://reviews.llvm.org/D99700