Two interesting omissions:
* When reordering in either direction, reordering two calls which both
contain infinite loops is illegal. This is possibly a change in behavior
for certain callers (e.g. it fixes a latent bug).
* When moving down, control dependence must be respected by checking the
inverse of isSafeToSpeculativelyExecute. Current callers all seem to
handle this case - though admittedly, I did not do an exhaustive audit.
Most seem to be interested only in moving upwards within a block. This
is mostly a case of future-proofing an API so that it implements what
the comments say, not just what current callers need.
Noticed via inspection. I don't have a test case.
The implementation is just a generalization of the Select handler.
We're not trying to be smart and compute any kind of fixed point.
Differential Revision: https://reviews.llvm.org/D121897
With D107249 I saw huge compile time regressions on a module (150s ->
5700s). This turned out to be due to a huge RefSCC in
the module. As we ran the function simplification pipeline on functions
in the SCCs in the RefSCC, some of those SCCs would be split out into
their own RefSCC, a child of the current RefSCC. We'd skip the remaining
SCCs in the huge RefSCC because the current RefSCC is now the RefSCC
just split out, then revisit the original huge RefSCC from the
beginning. This happened many times because many functions in the
RefSCC were optimizable to the point of becoming their own RefSCC.
This patch makes it so we don't skip SCCs not in the current RefSCC so
that we split out all the child RefSCCs on the first iteration over
the RefSCC. When we split out a RefSCC, we invalidate the original RefSCC
and add the remainder of the SCCs into a new RefSCC in
RCWorklist. This happens repeatedly until we finish visiting all
SCCs, at which point there is only one valid RefSCC in
RCWorklist from the original RefSCC containing all the SCCs that
were not split out, and we visit that.
For example, in the newly added test cgscc-refscc-mutation-order.ll,
we'd previously run instcombine in this order:
f1, f2, f1, f3, f1, f4, f1
Now it's:
f1, f2, f3, f4, f1
This can cause more passes to be run in some specific cases,
e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2;
now we run f1, f2, f2.
This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs):
https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121953
The patch adds an extra check to only set MinAbsVarIndex if
abs(V * Scale) won't wrap. In the absence of IsNSW, try to use the
bitwidths of the original V and Scale to rule out wrapping.
Attempt to model https://alive2.llvm.org/ce/z/HE8ZKj
The code in the else if below probably needs the same treatment, but I
need to come up with a test first.
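As a rough illustration of the bitwidth reasoning (productCannotWrap is a
made-up helper, not part of the patch): a value with VBits significant bits
times a scale with SBits significant bits needs at most VBits + SBits bits,
so a sufficiently wide result type cannot wrap.

static bool productCannotWrap(unsigned VBits, unsigned SBits,
                              unsigned ResultBits) {
  // abs(V * Scale) fits in VBits + SBits bits, so it cannot wrap a
  // result type at least that wide.
  return VBits + SBits <= ResultBits;
}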
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121695
Currently some optimizations are disabled because llvm::CannotBeNegativeZero()
does not know how to deal with the constrained intrinsics. This patch fixes
that by extending the existing implementation.
Differential Revision: https://reviews.llvm.org/D121483
This avoids false positive verification failures if the condition
is not literally true/false, but SCEV still makes use of the fact
that a loop is not reachable through more complex reasoning.
Fixes https://github.com/llvm/llvm-project/issues/54434.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
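As a minimal sketch of how a pass might opt in (MyPass is hypothetical;
MemorySSAAnalysis, getMSSA() and ensureOptimizedUses() are the real APIs):

#include "llvm/Analysis/MemorySSA.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

// Hypothetical pass, for illustration only.
struct MyPass : PassInfoMixin<MyPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
    // This pass performs def-use walks over the MemorySSA graph, so it
    // pays off to optimize all uses once, up front.
    MSSA.ensureOptimizedUses();
    // ... the actual transformation would go here ...
    return PreservedAnalyses::all();
  }
};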
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
Splat loads are inexpensive on X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated to splat loads.
Please note that a splat is usually three IR instructions:
- Most commonly a load and two inserts:
%ld = load double, double* %gep
%ins1 = insertelement <2 x double> poison, double %ld, i32 0
%ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
%ld = load double, double* %gep
%ins = insertelement <2 x double> poison, double %ld, i32 0
%shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer
Because of this some of the lit tests contain more IR instructions.
Differential Revision: https://reviews.llvm.org/D121354
Move structural hashing into virtual methods on Pass. This will
allow MachineFunctionPass to override the method to add hashing of
the MachineFunction.
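A toy model of the override pattern (names, members and hashing below are
illustrative stand-ins, not LLVM's actual interfaces):

#include <cstdint>
#include <functional>
#include <string>

// Toy stand-ins for the real classes.
struct Function { std::string IRText; };

struct Pass {
  virtual ~Pass() = default;
  // Default: hash only the IR-level contents.
  virtual uint64_t structuralHash(const Function &F) const {
    return std::hash<std::string>{}(F.IRText);
  }
};

struct MachineFunctionPass : Pass {
  std::string MachineText; // stand-in for MachineFunction contents
  // Override: additionally mix in machine-level state.
  uint64_t structuralHash(const Function &F) const override {
    return Pass::structuralHash(F) ^ std::hash<std::string>{}(MachineText);
  }
};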
Differential Revision: https://reviews.llvm.org/D120123
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.
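For illustration (locationFor is a made-up wrapper; MemoryLocation::get is
a real API), the query location, including its LocationSize, is now derived
from the access type of an explicit instruction:

#include "llvm/Analysis/MemoryLocation.h"
using namespace llvm;

// The LocationSize comes from the type the load actually accesses,
// not from a (no longer meaningful) pointer element type.
static MemoryLocation locationFor(const LoadInst &LI) {
  return MemoryLocation::get(&LI);
}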
This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.
To somewhat minimize the churn, printing still uses faux typed
pointer notation.
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless explicitly invalidated) because
it's self-updating via ValueHandles, but it can become imprecise during
these self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes #53131.
Differential Revision: https://reviews.llvm.org/D121167
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.
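For context, a sketch of that helper in use (getBaseAndOffset is a made-up
wrapper; stripAndAccumulateConstantOffsets is the real Value API):

#include "llvm/ADT/APInt.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Walks constant GEP offsets into Offset and returns the base pointer.
// It does not fold GEPs of scalable types or non-constant but identical
// indexes, which is why the extra code mentioned above is still needed.
static const Value *getBaseAndOffset(const Value *Ptr, const DataLayout &DL,
                                     APInt &Offset) {
  Offset = APInt(DL.getIndexTypeSizeInBits(Ptr->getType()), 0);
  return Ptr->stripAndAccumulateConstantOffsets(DL, Offset,
                                                /*AllowNonInbounds=*/true);
}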
Differential Revision: https://reviews.llvm.org/D120523
DSE assumes that this is the case when forming a calloc from a
malloc + memset pair.
For tests, either update the malloc signature or change the
data layout.
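For illustration, the kind of source pattern involved (zeroed_alloc is a
made-up example, not from the patch):

#include <stdlib.h>
#include <string.h>

// A malloc whose entire allocation is immediately zeroed; DSE may
// rewrite the pair into a single calloc(1, n) when its assumptions
// about the malloc signature and data layout hold.
void *zeroed_alloc(size_t n) {
  void *p = malloc(n);
  if (p)
    memset(p, 0, n); // covers the whole allocation
  return p;
}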
If an instruction is the first legal instruction in the module and the
only legal instruction in its basic block, it will be ignored by the
outliner due to a length check inherited from the older version of the
outliner, which was restricted to outlining within a single basic block.
This removes that check and updates any tests that broke because of it.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D120786
Musttail calls require extra handling to properly propagate the calling
convention information and tail call information. The outliner does not
currently do this, so we ignore call instructions that use the
swifttailcc or tailcc calling conventions, as well as calls marked with
the musttail attribute.
Reviewers: paquette, aschwaighofer
Differential Revision: https://reviews.llvm.org/D120733
The logic exposed by this patch via `llvm::DetermineUseCaptureKind` was
part of `llvm::PointerMayBeCaptured`. In the Attributor we want to keep
track of the work list items but still reuse the logic if a use might
capture a value. A follow up for the Attributor removes ~100 lines of
code and complexity while making future handling of simplified values
possible.
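A hedged sketch of the intended reuse (mayBeCaptured is hypothetical; the
DetermineUseCaptureKind signature is as of this patch):

#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/CaptureTracking.h"
using namespace llvm;

// Walks the uses of V with an explicit work list, mirroring what
// PointerMayBeCaptured does internally.
static bool mayBeCaptured(Value *V) {
  SmallVector<const Use *, 16> Worklist;
  for (const Use &U : V->uses())
    Worklist.push_back(&U);
  while (!Worklist.empty()) {
    const Use *U = Worklist.pop_back_val();
    switch (DetermineUseCaptureKind(*U, /*IsDereferenceableOrNull=*/nullptr)) {
    case UseCaptureKind::NO_CAPTURE:
      break; // this use cannot capture; keep scanning
    case UseCaptureKind::MAY_CAPTURE:
      return true; // conservatively treat as captured
    case UseCaptureKind::PASSTHROUGH:
      // The user forwards the value (e.g. a GEP); follow its uses.
      for (const Use &UU : U->getUser()->uses())
        Worklist.push_back(&UU);
      break;
    }
  }
  return false;
}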
Differential Revision: https://reviews.llvm.org/D121272
This patch adds a CL option for avoiding the attribute compatibility
check between caller and callee in TTI. TTI attribute compatibility
checks for target CPU and target features.
In our downstream compiler, this attribute always remains the same
between callee and caller. By avoiding the addition of this attribute to
each of our inline candidates (and then checking them here during inline
cost), we save some compile time.
The option defaults to false, so this change is NFC upstream.
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A
There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes #54311.
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used to print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
This extends SCEV verification to check not only backedge-taken
counts, but all entries in the IR -> SCEV cache. The restrictions
are the same as for the BECount case, i.e. we ignore expressions
based on undef, we only diagnose constant deltas (there are way
too many false positives otherwise) and we limit to reachable code.
Differential Revision: https://reviews.llvm.org/D121104
Introduce a new attribute "function-inline-cost-multiplier" which
scales the inline cost of a call site (or of all calls to a callee) by
the attribute's value.
When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.
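A toy model of the doubling scheme (CallSiteInfo and makeInlinedCopy are
illustrative, not the actual inliner code):

// Each re-inline through the original callee's SCC doubles the
// multiplier, so the effective cost of repeated inlining through a
// cycle grows geometrically and inlining eventually stops.
struct CallSiteInfo {
  int BaseCost;
  int CostMultiplier = 1; // "function-inline-cost-multiplier"
  int effectiveCost() const { return BaseCost * CostMultiplier; }
};

static CallSiteInfo makeInlinedCopy(const CallSiteInfo &Orig,
                                    bool CalleeInOriginalSCC) {
  CallSiteInfo New = Orig;
  if (CalleeInOriginalSCC)
    New.CostMultiplier = Orig.CostMultiplier * 2;
  return New;
}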
This is an alternative to D120584, which marks the call sites as
noinline.
Hopefully fixes PR45253.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D121084
SCEV verification should no longer affect results of subsequent
queries, and our lit tests as well as llvm-test-suite pass with
SCEV verification enabled, so I think we can enable it by default
under EXPENSIVE_CHECKS now.
Differential Revision: https://reviews.llvm.org/D120708
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly;
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
When a SCEVUnknown gets RAUWed, we currently drop it from the folding
set, but don't forget memoized values. I believe we should be
treating RAUW the same way as deletion here and invalidate all
caches and dependent expressions.
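A sketch of the intended contract (handleRAUW is hypothetical; forgetValue
is a real ScalarEvolution API):

#include "llvm/Analysis/ScalarEvolution.h"
using namespace llvm;

// Treat RAUW like deletion: drop memoized values and any dependent
// SCEV expressions for the replaced value.
static void handleRAUW(ScalarEvolution &SE, Value *Old) {
  SE.forgetValue(Old);
}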
I don't have any specific cases where this causes issues right now,
but it does address the FIXME in https://reviews.llvm.org/D119488.
Differential Revision: https://reviews.llvm.org/D120033
This ensures the right order in the sink-after map is maintained. If we
re-sink an instruction, it must be sunk after all earlier instructions
have been sunk.
Fixes https://github.com/llvm/llvm-project/issues/54223
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator new can be inlined but the body of
operator delete can't, as in this example from jyknight:
#include <stdlib.h>
#include <stdio.h>

int allocs = 0;

void *operator new(size_t n) {
  allocs++;
  void *mem = malloc(n);
  if (!mem)
    abort();
  return mem;
}

__attribute__((noinline)) void operator delete(void *mem) noexcept {
  allocs--;
  free(mem);
}

void deleteit(int *i) { delete i; }

int main() {
  int *i = new int;
  deleteit(i);
  if (allocs != 0)
    printf("MEMORY LEAK! allocs: %d\n", allocs);
}
This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.
Differential Revision: https://reviews.llvm.org/D117356
Previous and OtherPrev may not be in the same block. Use DT::dominates
instead of local comesBefore. DT::dominates is already used earlier to
check the order of Previous and SinkCandidate.
Fixes https://github.com/llvm/llvm-project/issues/54195
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120173
The similar getICmpCode and getPredForICmpCode are already there.
This moves the FP versions for consistency.
I think InstCombine is currently the only user of both.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120754