This patch removes the bail-out for signed predicates and non-positive
strides in howManyLessThans and updates computeMaxBECountForLT to return
SCEVCouldNotCompute for signed predicates with negative strides.
AFAICT the bail-out was only added because computeMaxBECountForLT may
not handle negative signed strides correctly. Instead of bailing out
earlier and never calling computeMaxBECountForLT, we can return
SCEVCouldNotCompute from computeMaxBECountForLT itself.
The max backedge taken count will be computed as the max value of the
symbolic backedge taken count.
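A rough sketch of the guard (simplified; the helper name and the exact
predicate are assumptions, not the upstream code):

#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

// Sketch: computeMaxBECountForLT gives up on signed predicates whose
// stride may be negative; the caller then falls back to the max value
// of the symbolic backedge taken count.
static const SCEV *maxBECountForLTSketch(ScalarEvolution &SE,
                                         const SCEV *Stride, bool IsSigned) {
  if (IsSigned && !SE.isKnownPositive(Stride))
    return SE.getCouldNotCompute();
  return Stride; // placeholder for the actual max BE count computation
}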
This improves precision in cases where we can compute symbolic backedge
taken counts and also fixes a crash.
Fixes #57818.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135667
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variable names FMRB/MRB to ME, to match the
new type name.
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.
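As a sketch, the pattern now is (moveAndInvalidate is a hypothetical
helper, not an upstream API):

#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Hypothetical helper: after moving an instruction, drop the cached
// block and loop dispositions so SCEV does not reuse stale answers.
static void moveAndInvalidate(Instruction *I, Instruction *InsertPt,
                              ScalarEvolution &SE) {
  I->moveBefore(InsertPt);
  SE.forgetBlockAndLoopDispositions();
}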
Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the loop removed the start expression S from BlockDispositions instead
of the current values. This was also exposed by the new test case.
Fixes #58439.
InstSimplify currently checks whether the instruction simplifies
back to itself, and returns undef in that case. Generally, this
should only occur in unreachable code.
However, this was also done for the simplifyInstructionWithOperands()
API. There, the instruction only serves as a template that provides
the opcode and other non-operand data, so simplifying back to the same
"instruction" may be expected. This caused PR58401 in conjunction with
D134954.
As such, move this check into simplifyInstruction() only. The only
other caller of simplifyInstructionWithOperands() also handles the
self-simplification case explicitly.
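A rough sketch of the resulting structure (simplified; not the exact
upstream code):

#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Sketch: only the simplifyInstruction() entry point maps a
// self-simplification to undef (it should only happen in unreachable
// code); simplifyInstructionWithOperands() returns the result as-is.
static Value *simplifyInstructionSketch(Instruction *I,
                                        const SimplifyQuery &Q) {
  SmallVector<Value *, 8> Ops(I->operands());
  Value *Result = simplifyInstructionWithOperands(I, Ops, Q);
  if (Result == I)
    return UndefValue::get(I->getType());
  return Result;
}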
When looking for underlying objects, if we encounter one that we
have already seen, then we should skip it (as it has already been
checked) rather than bail out. In particular, this adds support
for the case where we have a loop use of a phi recurrence.
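A sketch of the visited-set pattern (hypothetical traversal, simplified
from the actual code):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Sketch: skip objects we have already seen instead of bailing out, so
// a phi recurrence that reaches itself through the loop is handled.
static void walkUnderlyingObjects(const Value *Root) {
  SmallVector<const Value *, 8> Worklist = {Root};
  SmallPtrSet<const Value *, 8> Visited;
  while (!Worklist.empty()) {
    const Value *V = Worklist.pop_back_val();
    if (!Visited.insert(V).second)
      continue; // already checked; don't give up on the whole query
    // ... inspect V and push its underlying objects onto Worklist ...
  }
}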
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.
To address this, move the invalidation code to Loop::makeLoopInvariant.
Fixes #58314.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D135909
If we have translated across a cycle backedge, the same SSA value
for the condition might be referring to two different loop iterations.
Use the isValueEqualInPotentialCycles() helper to avoid assuming
equality in that case.
Accesses to constant memory are not observable and should be
reported as readnone, not readonly. This is consistent with what
we do for normal (non-call) instructions: For those, the TBAA
metadata will result in pointsToConstantMemory() returning true,
which will then result in a NoModRef result, not a Ref result.
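Illustrative sketch (hypothetical helper; the actual change is in the
call-site ModRef logic):

#include "llvm/Analysis/AliasAnalysis.h"
using namespace llvm;

// Sketch: a read of provably-constant memory is not observable, so it
// is NoModRef rather than Ref, matching the non-call instruction path.
static ModRefInfo classifyRead(AAResults &AA, const MemoryLocation &Loc) {
  if (AA.pointsToConstantMemory(Loc))
    return ModRefInfo::NoModRef;
  return ModRefInfo::Ref;
}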
Differential Revision: https://reviews.llvm.org/D135864
When training large models, we encounter use-after-free bugs when
writing to the input tensors for various MLGO models. This patch fixes the
issue by marking the tensors as "persistent".
Differential Revision: https://reviews.llvm.org/D135739
The algorithm in allLoopPathsLeadToBlock() does not correctly handle
the case where the loop latch is part of the predecessor set: in that
case, we may take the backedge (escaping to a different loop
iteration) and not execute other latch successors. This can happen if
the latch is part of an inner cycle.
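A hedged sketch of the missing guard (hypothetical helper name):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/BasicBlock.h"
using namespace llvm;

// Sketch: if the latch itself is a predecessor, control may leave via
// the backedge into the next iteration, so the other latch successors
// are not guaranteed to execute.
static bool mayEscapeViaBackedge(
    const SmallPtrSetImpl<const BasicBlock *> &Predecessors,
    const BasicBlock *Latch) {
  return Predecessors.count(Latch);
}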
Fixes https://github.com/llvm/llvm-project/issues/57780.
Differential Revision: https://reviews.llvm.org/D134279
This extends the existing SCEV verification to catch cache invalidation
issues as in #57837.
The validation logic is similar to the recently added loop disposition
cache validation in bb68b2402d.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134531
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.
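For reference, a sketch of how such a flag is typically declared (the
flag name is as above; the description text is an assumption):

#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Sketch: when set, LICM treats the program as single-threaded and
// drops the thread-safety requirement for store promotion.
static cl::opt<bool> ForceThreadModelSingle(
    "licm-force-thread-model-single",
    cl::desc("Force single-thread model in LICM"), cl::Hidden,
    cl::init(false));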
Fixes https://github.com/llvm/llvm-project/issues/50537.
Differential Revision: https://reviews.llvm.org/D130466
Extend forgetBlockAndLoopDispositions to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.
We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.
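A usage sketch of the extended API (hypothetical call site):

#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Sketch: after moving a single instruction, clear dispositions for
// just that value; SCEV also clears the cached values of its users.
static void afterMove(ScalarEvolution &SE, Instruction *I,
                      Instruction *InsertPt) {
  I->moveBefore(InsertPt);
  SE.forgetBlockAndLoopDispositions(I);
}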
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134614
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).
The problem I've seen is that llvm::getConstantDataArrayInfo takes
both an "ElementSize" argument (basically indicating the size of a
char/element in bits) and an "Offset" which, AFAICT, is an offset in
units of "number of elements". It also uses
stripAndAccumulateConstantOffsets to get a "StartIdx", which AFAICT
is calculated in bytes. The returned Slice.Length is based on
arithmetic that adds/subtracts variables with different units (bytes
vs. elements). Most notably, I think the "StartIdx" must be scaled by
the "ElementSize" to get correct results.
The symptom of the above problem was seen in the wcslen-1.ll test
case, which was miscompiled.
This patch is supposed to resolve the bug by converting between
bytes and elements when needed.
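A minimal sketch of the conversion (hypothetical helper; the real code
folds this into the offset arithmetic):

#include <cstdint>

// Sketch: StartIdx from stripAndAccumulateConstantOffsets is in bytes,
// while Offset and Slice.Length are in elements, so rescale before
// mixing the two units.
static uint64_t bytesToElements(uint64_t Bytes,
                                uint64_t ElementSizeInBits) {
  return Bytes / (ElementSizeInBits / 8);
}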
Differential Revision: https://reviews.llvm.org/D135263
Currently, AAResultBase (from which alias analysis providers inherit)
stores a reference back to the AAResults aggregation it is part of,
so it can perform recursive alias analysis queries via
getBestAAResults().
This patch removes the back-reference from AAResultBase to AAResults,
and instead passes the used aggregation through the AAQueryInfo.
This can be used to perform recursive AA queries using the full
aggregation.
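Conceptually (a sketch, not the upstream declarations), the change
looks like this:

#include "llvm/Analysis/AliasAnalysis.h"
using namespace llvm;

// Before: each provider inherited a stored back-reference, roughly
//   class AAResultBase { AAResults &AAR; ... };   // removed
// After: the aggregation travels with the per-query state, so every
// provider can issue recursive queries against the full aggregation.
struct QueryInfoSketch {
  AAResults &AAR; // full aggregation available during the query
};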
Differential Revision: https://reviews.llvm.org/D94363
https://alive2.llvm.org/ce/z/oShzr3
This was noted as a missing fold in D134876 (with additional
examples based on issue #58046).
I'm assuming that fmul with a zero operand is rare enough
that the use of ValueTracking will not noticeably increase
compile-time.
This adjusts a PowerPC codegen test that was added with D88388
because it would get folded away and no longer provide coverage
for the bug fix.
Simplify LoopAccessLegacyAnalysis by using LoopAccessInfoManager from
D134606. As a side-effect this also removes printing support from
LoopAccessLegacyAnalysis.
Depends on D134606.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134608
The attached test case can cause an LLVM crash in buildVPlanWithVPRecipes
because an invalid VPlan is generated.
FIRST-ORDER-RECURRENCE-PHI ir<%792> = phi ir<%501>, ir<%806>
CLONE ir<%804> = fdiv ir<1.000000e+00>, vp<%17> // use of %17
CLONE ir<%806> = load ir<%805>
EMIT vp<%17> = first-order splice ir<%792> ir<%806> // def of %17
...
There is a use-before-def error on %17.
When the vectorizer generates a VPlan, it generates a "first-order splice"
instruction for a loop-carried variable after its definition. All related PHI
users are changed to use this "first-order splice" result and are moved after
it. The move is guided by a MapVector, SinkAfter, whose contents are filled in
by RecurrenceDescriptor::isFixedOrderRecurrence.
Let's look at the first PHI and its related instructions:
%v792 = phi double [ %v806, %Loop ], [ %d1, %Entry ]
%v802 = fdiv double %v794, %v792
%v804 = fdiv double 1.000000e+00, %v792
%v806 = load double, ptr %v805, align 8
%v806 is a loop-carried variable and %v792 is the related PHI instruction. The
vectorizer will generate a new "first-order splice" instruction for %v806, and
it will be used by %v802 and %v804. So %v802 and %v804 will be moved after
%v806 and its "first-order splice" instruction, and SinkAfter contains
%v802 -> %v806
%v804 -> %v802
This means %v802 should be moved after %v806 and %v804 after %v802. Note that
the order is important.
When isFixedOrderRecurrence processes the PHI instruction %v794, the
related instructions are:
%v793 = phi double [ %v813, %Loop ], [ %d1, %Entry ]
%v794 = phi double [ %v793, %Loop ], [ %d2, %Entry ]
%v802 = fdiv double %v794, %v792
%v813 = load double, ptr %v812, align 8
This time the related loop-carried variable is %v813, and its user is %v802,
so %v802 should also be moved after %v813. But %v802 is already in SinkAfter;
because %v813 comes later than %v806, the original %v802 entry in SinkAfter is
deleted and a new %v802 entry is added. Now SinkAfter contains
%v804 -> %v802
%v802 -> %v813
With these data, %v802 can still be moved after all its operands, but %v804
can't be moved after %v806 and its "first-order splice" instruction, which
causes the use-before-def error.
So when removing and re-inserting an instruction I in SinkAfter, we should
also recursively remove the instructions targeting I and re-insert them into
SinkAfter. But for simplicity this patch just bails out in that case.
Differential Revision: https://reviews.llvm.org/D134083
The constant is already commuted for an fmul opcode,
but this code can be called more directly for fma,
so we have to swap for that caller. There are tests
in InstSimplify and InstCombine to verify that this
works as expected.
Added a helper in TargetLibraryInfo to get the size of "size_t" in
bits, given a Module reference. The new getSizeTSize helper uses the
same strategy that, for example, isValidProtoForLibFunc has been using
in the past, assuming that the size can be derived by asking
DataLayout about the size/type of a pointer to int.
FortifiedLibCallSimplifier::optimizeStrpCpyChk was changed to use
the new getSizeTSize helper instead of assuming that sizeof(size_t)
is equal to sizeof(int*) by itself (that is the assumption used in
TargetLibraryInfoImpl::getSizeTSize so the result will be the same).
Having a common helper for this ensures that we use the same strategy
when deriving the size of "size_t" in different parts of the code.
One bonus with this refactoring (basing it on Module instead of just
DataLayout) is that it makes it easier to override this for a specific
target triple, in case the assumption of using getPointerSizeInBits
wouldn't hold.
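A usage sketch of the new helper (hypothetical call site):

#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Module.h"
using namespace llvm;

// Sketch: derive the integer type matching size_t for a module via the
// helper, instead of assuming sizeof(size_t) == sizeof(int*).
static IntegerType *getSizeTType(const TargetLibraryInfo &TLI, Module &M) {
  return IntegerType::get(M.getContext(), TLI.getSizeTSize(M));
}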
Differential Revision: https://reviews.llvm.org/D110585
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
within a loop pipeline.
To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.
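A usage sketch (hypothetical pass code, simplified):

#include "llvm/Analysis/LoopAccessAnalysis.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Sketch: fetch the per-function manager once, then query per-loop
// LoopAccessInfo on demand; the manager owns and caches the LAI objects.
static void queryLAI(Function &F, FunctionAnalysisManager &FAM,
                     LoopInfo &LI) {
  LoopAccessInfoManager &LAIs = FAM.getResult<LoopAccessAnalysis>(F);
  for (Loop *L : LI) {
    const LoopAccessInfo &LAI = LAIs.getInfo(*L);
    (void)LAI; // ... inspect dependences / runtime checks here ...
  }
}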
Fixes #50940.
Fixes #51669.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D134606
Update both memprof and callsite metadata to reflect inlined functions.
For callsite metadata this is simply a concatenation of each cloned
call's call stack with that of the inlined callsite.
For memprof metadata, each profiled memory info block (MIB) is either
moved to the cloned allocation call or left on the original allocation
call depending on whether its context matches the newly refined call
stack context on the cloned call. We also reapply context trimming
optimizations based on the refined set of contexts on each of the calls
(cloned and original), via utilities in MemoryProfileInfo.
Depends on D128142.
Differential Revision: https://reviews.llvm.org/D128143
See also related RFCs:
RFC: Sanitizer-based Heap Profiler [1]
RFC: A binary serialization format for MemProf [2]
RFC: IR metadata format for MemProf [3]*
* Note that the IR metadata format has changed from the RFC during
implementation, as described in the preceding patch adding the basic
metadata and verification support.
The matching is performed during the normal PGO annotation phase, to
ensure that the inlines applied in the IR at that point are a subset
of the inlines in the profiled binary and thus reflected in the
profile's call stacks. This is important because the call frames are
associated with functions in the profile based on the inlining in the
symbolized call stacks, and this simplifies locating the subset of
profile data relevant for matching onto each function's IR.
The PGOInstrumentationUse pass is enhanced to perform matching for
whatever combination of memprof and regular PGO profile data exists in
the profile.
Using the utilities introduced in D128854:
The memprof profile data for each context is converted to "cold" or
"notcold" based on parameterized thresholds for size, access count, and
lifetime. The memprof allocation contexts are trimmed to the minimal
amount of context required to uniquely identify whether the context is
cold or not cold. For allocations where all profiled contexts have the
same allocation type, no memprof metadata is attached and instead the
allocation call is directly annotated with an attribute specifying the
allocation type. This is the same attribute that will be applied to
allocation calls once cloned for different contexts, and later used
during LibCall simplification to emit allocation hints [4].
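Purely illustrative sketch of the classification step (the names and
the exact threshold logic are assumptions; the real logic lives in the
MemoryProfileInfo utilities):

#include <cstdint>

// Illustration only: a context whose profiled behavior crosses the
// parameterized thresholds is labeled cold, otherwise notcold.
enum class AllocationType { NotCold, Cold };
static AllocationType classifyContext(uint64_t AccessCount,
                                      uint64_t Lifetime,
                                      uint64_t MinLifetimeForCold) {
  return (AccessCount == 0 || Lifetime >= MinLifetimeForCold)
             ? AllocationType::Cold
             : AllocationType::NotCold;
}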
Depends on D128141 and D128854.
[1] https://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
[2] https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
[3] https://discourse.llvm.org/t/rfc-ir-metadata-format-for-memprof/59165
[4] ab87cf382d
Differential Revision: https://reviews.llvm.org/D128142
This is an extension of the existing min/max+select fold (which already
has a very large number of variations) to allow a vector shuffle because
that's what we have in the motivating example from issue #42100.
A couple of Alive2 checks of variants (I don't know how to generalize
these in Alive):
https://alive2.llvm.org/ce/z/jUFAqT
And verify the PR42100 test:
https://alive2.llvm.org/ce/z/3EcASf
It's possible there is some generalization of the fold or a
VectorCombine/SLP answer for the motivating test, but I haven't found a
better/smaller solution yet.
We can also add even more variants here as follow-up patches. For example,
we can have shuffle followed by min/max; we also don't have this
canonicalization or the reverse:
https://alive2.llvm.org/ce/z/StHD9f
Differential Revision: https://reviews.llvm.org/D134879
breakLoopBackedge may remove blocks and loops. Also clear block and
loop dispositions to avoid the cache containing invalid blocks and loops.
The coverage for the change is provided when using an ASAN build of opt
to run the LoopDeletion unit tests; without the fix, pointers to invalid
objects would be used.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D134663
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information and then produce the log file.
Differential Revision: https://reviews.llvm.org/D133616
This patch teaches the module inliner a traversal order designed for
the instrumentation FDO (+ThinLTO) scenario.
The new traversal order prioritizes call sites in the following order:
1. Those call sites that are expected to reduce the caller size
2. Those call sites that have gone through the cost-benefit analysis
3. The remaining call sites
With this fairly simple traversal order, a large internal benchmark
yields performance comparable to the bottom-up inliner -- both in
terms of the execution performance and .text* sizes.
Big thanks to Liqiang Tao for the module inliner infrastructure.
I still have hacks outside this patch to prevent excessively long
compilation or .text* size explosion. I'm trying to come up with
acceptable solutions in the near future.
Differential Revision: https://reviews.llvm.org/D134376
A CFG with cycles may require additional passes of the "while (Changed)"
iteration to propagate data back from later blocks to earlier blocks,
when ordered according to depth_first.
The OR logic used for ::May converges to a stable state faster than the
AND logic used for ::Must.
The better solution would be to switch to some form of worklist queue,
but given that this one is good enough, I will consider doing that
later.
We can switch ::Must to OR logic if we calculate "may be dead" instead
of "must be alive" directly and then convert the values to match the
existing interface.
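A sketch of the fixed-point scheme described above (hypothetical
helper; the per-block transfer function is passed in as an assumption):

#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Sketch: a single depth-first pass cannot propagate facts back across
// backedges, so iterate the whole traversal until nothing changes.
static void runToFixedPoint(Function &F,
                            function_ref<bool(BasicBlock *)> UpdateBlock) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (BasicBlock *BB : depth_first(&F.getEntryBlock()))
      Changed |= UpdateBlock(BB);
  }
}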
Additionally, this fixes correctness in the "@cycle" test.
Reviewed By: kstoimenov, fmayer
Differential Revision: https://reviews.llvm.org/D134796