This extends haveNoCommonBitsSet() to two additional cases, allowing
the following folds:
* `A + (B & ~A)` --> `A | (B & ~A)`
(https://alive2.llvm.org/ce/z/crxxhN)
* `A + ((A & B) ^ B)` --> `A | ((A & B) ^ B)`
(https://alive2.llvm.org/ce/z/A_wsH_)
These should further fold to just `A | B`, though this currently
only works in the first case.
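For illustration, a minimal IR sketch of the first fold (function and value names are hypothetical):
```
define i32 @src(i32 %a, i32 %b) {
  %not.a = xor i32 %a, -1
  %masked = and i32 %b, %not.a
  ; %a and %masked have no common bits set, so the add is equivalent to an or,
  ; and A | (B & ~A) further simplifies to A | B.
  %r = add i32 %a, %masked
  ret i32 %r
}
```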
The reason why the second fold is necessary is that we consider
this to be the canonical form if B is a constant. (I did check
whether we can change that, but it looks like a number of folds
depend on the current canonicalization, so I ended up adding both
patterns here.)
Differential Revision: https://reviews.llvm.org/D124763
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
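A reduced sketch of the kind of loop this enables (hypothetical IR; the generated runtime checks ensure %dst does not alias %src):
```
define void @reduce_and_store(ptr %dst, ptr %src, i64 %n) {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %sum = phi i32 [ 0, %entry ], [ %sum.next, %loop ]
  %gep = getelementptr inbounds i32, ptr %src, i64 %iv
  %val = load i32, ptr %gep
  %sum.next = add i32 %sum, %val
  ; store of the running reduction value to a loop-invariant address
  store i32 %sum.next, ptr %dst
  %iv.next = add nuw i64 %iv, 1
  %ec = icmp eq i64 %iv.next, %n
  br i1 %ec, label %exit, label %loop

exit:
  ret void
}
```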
Differential Revision: https://reviews.llvm.org/D110235
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized and scalarized.
The signature of a fptosi_sat call requires two overload types
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single one. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as an overloaded (but not scalar) operand.
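For reference, a hedged sketch of a scalar call and a possible vectorized form, both needing two overload types:
```
declare i32 @llvm.fptosi.sat.i32.f32(float)
declare <2 x i32> @llvm.fptosi.sat.v2i32.v2f32(<2 x float>)

define void @example(float %x, <2 x float> %y) {
  %s = call i32 @llvm.fptosi.sat.i32.f32(float %x)
  %v = call <2 x i32> @llvm.fptosi.sat.v2i32.v2f32(<2 x float> %y)
  ret void
}
```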
Differential Revision: https://reviews.llvm.org/D124358
ConstantFolding currently converts "getelementptr i8, Ptr, (sub 0, V)"
to "inttoptr (sub (ptrtoint Ptr), V)". This transform is, taken by
itself, correct, but does came with two issues:
1. It unnecessarily broadens provenance by introducing an inttoptr.
We generally prefer not to introduce inttoptr during optimization.
2. For the case where V == ptrtoint Ptr, this folds to inttoptr 0,
which further folds to null. In that case provenance becomes
incorrect. This has been observed as a real-world miscompile with
rustc.
We should probably address that incorrect inttoptr 0 fold at some
point, but in either case we should also drop this inttoptr-introducing
fold. Instead, replace it with a fold rooted at
ptrtoint(getelementptr), which seems to cover the original
motivation for this fold (test2 in the changed file).
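A reduced sketch of the problematic case, with a hypothetical global @g and V == ptrtoint Ptr:
```
@g = global i8 0

define ptr @f() {
  ; Previously constant-folded via
  ;   inttoptr (sub (ptrtoint @g), ptrtoint @g) --> inttoptr 0 --> null,
  ; which loses the provenance of @g.
  ret ptr getelementptr (i8, ptr @g, i64 sub (i64 0, i64 ptrtoint (ptr @g to i64)))
}
```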
Differential Revision: https://reviews.llvm.org/D124677
Currently loop cache cost (LCC) cannot analyze fixed-size arrays
since it cannot delinearize them. This patch adds the capability
to delinearize fixed-size arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp and some refactoring will be done in a
follow-up patch.
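A hedged example of the kind of fixed-size access this enables (hypothetical, reduced loop nest):
```
@A = global [256 x [256 x i32]] zeroinitializer

define i32 @row_sums() {
entry:
  br label %outer

outer:
  %i = phi i64 [ 0, %entry ], [ %i.next, %outer.latch ]
  br label %inner

inner:
  %j = phi i64 [ 0, %outer ], [ %j.next, %inner ]
  %acc = phi i32 [ 0, %outer ], [ %acc.next, %inner ]
  ; fixed-size two-dimensional access that can now be delinearized
  %gep = getelementptr inbounds [256 x [256 x i32]], ptr @A, i64 0, i64 %i, i64 %j
  %v = load i32, ptr %gep
  %acc.next = add i32 %acc, %v
  %j.next = add nuw nsw i64 %j, 1
  %j.done = icmp eq i64 %j.next, 256
  br i1 %j.done, label %outer.latch, label %inner

outer.latch:
  %i.next = add nuw nsw i64 %i, 1
  %i.done = icmp eq i64 %i.next, 256
  br i1 %i.done, label %exit, label %outer

exit:
  ret i32 %acc.next
}
```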
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D122857
The result is a data bag; this makes sure it is signaled to the user that
the data cannot be mutated when, for example, doing something like:
auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F)
...
R.Uses++
Introduced masks where they were not added and improved target-dependent
cost models to avoid returning incorrect cost results after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
This relands commit 8f550368b1.
The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.
Differential Revision: https://reviews.llvm.org/D120169
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.
Reviewers: @StephenTozer, @djtodoro
Differential Revision: https://reviews.llvm.org/D120169
Before this patch `Args` was used to pass a broadcast's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This is a simple datatype with a few JSON utilities, and is independent
of the underlying executor. The main motivation is to allow taking a
dependency on it on the AOT side, and to allow us to build a correctly-sized
buffer in the cases when the requested feature isn't supported by the
model. This, in turn, allows us to grow the feature set supported by the
compiler in a backward-compatible way; and also to collect traces exposing
the new features, but starting from the older model, and continue
training from those new traces.
Differential Revision: https://reviews.llvm.org/D124417
This patch sets LastCallToStaticBonus based on a check. It shows
no noticeable size reduction on an internal workload and the Linux kernel
with Os/Oz.
Differential Revision: https://reviews.llvm.org/D124233
The motivation is twofold:
1) Allow plugging in a different training-time evaluator, e.g.
TFLite-based, etc.
2) Allow using TensorSpec for AOT, too, to support evolution: we start
by extracting a superset of the features currently supported by a
model. For the tensors the model does not support, we just return a
valid, but useless, buffer. This makes using a 'smaller' model (less
supported tensors) transparent to the compiler. The key is to
dimension the buffer appropriately, and we already have TensorSpec
modeling that info.
The only coupling was due to the reliance on a TF internal API for
getting the element size, but for the types we are interested in,
`sizeof` is sufficient.
A subsequent change will yank TensorSpec out into its own module.
Differential Revision: https://reviews.llvm.org/D124045
We can process long shuffles (working across several actual
vector registers) in the best way if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
Also, the same function can be used to improve the cost model for
shuffles in future patches.
Part of D100486
Differential Revision: https://reviews.llvm.org/D115653
When constructing canonical relationships between two regions, the first instruction of a basic block from the first region is used to find the corresponding basic block from the second region. However, debug instructions are not included in similarity matching, and therefore do not have a canonical numbering. This patch makes sure to ignore the debug instructions when finding the first instruction in a basic block.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D123903
Issue: https://github.com/llvm/llvm-project/issues/54431
PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D122207
Currently the fsub optimizations in InstSimplify don't know how to fold
-0.0 - (-X) to X when the constrained intrinsics are used. This adds partial
support. The rest of the support will come later with work on the IR
matchers.
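A hedged sketch of the pattern that now simplifies (hypothetical names; default rounding with ignored exceptions assumed):
```
declare double @llvm.experimental.constrained.fsub.f64(double, double, metadata, metadata)

define double @src(double %x) strictfp {
  %neg = fneg double %x
  ; -0.0 - (-x) simplifies to %x
  %r = call double @llvm.experimental.constrained.fsub.f64(double -0.000000e+00, double %neg, metadata !"round.tonearest", metadata !"fpexcept.ignore") strictfp
  ret double %r
}
```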
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D123396
Refactor from iteratively using BitCastInst::getOperand()
to using stripPointerCasts() instead. This is an improvement
since now we are able to analyze more cases; please refer
to the test cases added in this patch.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D123559
This reverts commit e810d55809.
The commit did not take into account the fact that a strdup'ed string could be
modified. Checking whether such a modification happens would make the function very
costly; without a test case in mind it's not worth the effort.
Retain the behavior we get without opaque pointers: A call to a
known function with a different function type is considered an
indirect call.
This fixes the crash reported in https://reviews.llvm.org/D123300#3444772.
And thread DSE's ephemeral values to EarliestEscapeInfo.
This allows more precise analysis in DSEState::isReadClobber() via BatchAA.
Followup to D123162.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123342
Rather than checking the rounded type store size, check the type
size in bits. We don't want to forward a store of i1 to a load
of i8 for example, even though they have the same type store size.
The padding bits have unspecified contents.
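A small sketch of the case that must not be forwarded (hypothetical function):
```
define i8 @no_forward(ptr %p) {
  store i1 true, ptr %p
  ; i1 and i8 have the same store size (one byte), but the padding bits of
  ; the stored i1 are unspecified, so forwarding the stored value would be wrong
  %v = load i8, ptr %p
  ret i8 %v
}
```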
This is a partial fix for the issue reported at
https://reviews.llvm.org/D115924#inline-1179482,
the problem also needs to be addressed more generally in the
constant folding code.
It actually implements support for seeing through loads, using alias analysis to
refine the result.
This is rather limited, but I didn't want to rely on more than the available
analysis at that point (to be gentle with compilation time), and it does seem to
catch a common scenario, as showcased by the included tests.
Differential Revision: https://reviews.llvm.org/D122431
Currently, the utility supports lowering of non-atomic memory transfer routines only. This patch adds support for the atomic version of memcpy. This may be useful for targets not supporting atomic memcpy.
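As a hedged sketch, this is the element-wise atomic intrinsic that can now be lowered to a loop (hypothetical arguments):
```
declare void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(ptr, ptr, i64, i32)

define void @copy(ptr %dst, ptr %src, i64 %len) {
  ; unordered-atomic memcpy with 4-byte elements; %len must be a multiple of 4
  call void @llvm.memcpy.element.unordered.atomic.p0.p0.i64(ptr align 4 %dst, ptr align 4 %src, i64 %len, i32 4)
  ret void
}
```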
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D118443
This lines up with other parts of the codebase that only use special
knowledge about allocator functions if they're builtins.
Differential Revision: https://reviews.llvm.org/D123053
This got changed to use hasAttrSomewhere() during review, and I didn't
notice until today when I was writing some tests for another part of
this system that using hasAttrSomewhere only checked the callsite for
allocalign, rather than both the callsite and the definition. This fixes
that by introducing a helper method.
Differential Revision: https://reviews.llvm.org/D121641
This has been true since dba73135c8, but
didn't matter until now because clang wasn't emitting allocalign
attributes.
Differential Revision: https://reviews.llvm.org/D121640
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.
Differential Revision: https://reviews.llvm.org/D123117
The LLVM IR verifier and analysis linter defines and uses several macros in
code that performs validation of IR expectations. Previously, these macros
were named with an 'Assert' prefix. These names were misleading since the
macro definitions are not conditioned on build kind; they are defined
identically in builds that have asserts enabled and those that do not. This
was confusing since an LLVM developer might expect these macros to be
conditionally enabled as 'assert' is. Further confusion was possible since
the LLVM IR verifier is implicitly disabled (in Clang::ConstructJob()) for
builds without asserts enabled, but only for Clang driver invocations; not
for clang -cc1 invocations. This could make it appear that the macros were
not active for builds without asserts enabled, e.g. when investigating
behavior using the Clang driver, and thus lead to surprises when running
tests that exercise the clang -cc1 interface.
This change renames this set of macros as follows:
Assert -> Check
AssertDI -> CheckDI
AssertTBAA -> CheckTBAA
Prior to this change, CallBase::hasFnAttr checked the called function to
see if it had an attribute if it wasn't set on the CallBase, but
getFnAttr didn't do the same delegation, which led to very confusing
behavior. This patch fixes the issue by making CallBase::getFnAttr also
check the function under the same circumstances.
Test changes look (to me) like they're cleaning up redundant attributes
which no longer get specified both on the callee and call. We also clean
up the one ad-hoc implementation of this getter over in InlineCost.cpp.
Differential Revision: https://reviews.llvm.org/D122821
Two interesting omissions:
* When reordering in either direction, reordering two calls which both
contain inf-loops is illegal. This one is possibly a change in behavior
for certain callers (e.g. fixes a latent bug.)
* When moving down, control dependence must be respected by checking the
inverse of isSafeToSpeculativelyExecute. Current callers all seem to
handle this case - though admittedly, I did not do an exhaustive audit.
Most seem to be only interested in moving upwards within a block. This
is mostly a case of future proofing an API so that it implements what
the comments says, not just what current callers need.
Noticed via inspection. I don't have a test case.
The implementation is just a generalization of the Select handler.
We're not trying to be smart and compute any kind of fixed point.
Differential Revision: https://reviews.llvm.org/D121897
With D107249 I saw huge compile time regressions on a module (150s ->
5700s). This turned out to be due to a huge RefSCC in
the module. As we ran the function simplification pipeline on functions
in the SCCs in the RefSCC, some of those SCCs would be split out to
their own RefSCC, a child of the current RefSCC. We'd skip the remaining
SCCs in the huge RefSCC because the current RefSCC is now the RefSCC
just split out, then revisit the original huge RefSCC from the
beginning. This happened many times because many functions in the
RefSCC were optimizable to the point of becoming their own RefSCC.
This patch makes it so we don't skip SCCs not in the current RefSCC so
that we split out all the child RefSCCs on the first iteration of
RefSCC. When we split out a RefSCC, we invalidate the original RefSCC
and add the remainder of the SCCs into a new RefSCC in
RCWorklist. This happens repeatedly until we finish visiting all
SCCs, at which point there is only one valid RefSCC in
RCWorklist from the original RefSCC containing all the SCCs that
were not split out, and we visit that.
For example, in the newly added test cgscc-refscc-mutation-order.ll,
we'd previously run instcombine in this order:
f1, f2, f1, f3, f1, f4, f1
Now it's:
f1, f2, f3, f4, f1
This can cause more passes to be run in some specific cases,
e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2;
now we run f1, f2, f2.
This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs):
https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121953
The patch adds an extra check to only set MinAbsVarIndex if
abs(V * Scale) won't wrap. In the absence of IsNSW, try to use the
bitwidths of the original V and Scale to rule out wrapping.
Attempt to model https://alive2.llvm.org/ce/z/HE8ZKj
The code in the else if below probably needs the same treatment, but I
need to come up with a test first.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121695
Currently some optimizations are disabled because llvm::CannotBeNegativeZero()
does not know how to deal with the constrained intrinsics. This patch fixes
that by extending the existing implementation.
Differential Revision: https://reviews.llvm.org/D121483
This avoids false positive verification failures if the condition
is not literally true/false, but SCEV still makes use of the fact
that a loop is not reachable through more complex reasoning.
Fixes https://github.com/llvm/llvm-project/issues/54434.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
Splat loads are inexpensive in X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.
Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
    %ld = load double, double* %gep
    %ins1 = insertelement <2 x double> poison, double %ld, i32 0
    %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
    %ld = load double, double* %gep
    %ins = insertelement <2 x double> poison, double %ld, i32 0
    %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer
Because of this some of the lit tests contain more IR instructions.
Differential Revision: https://reviews.llvm.org/D121354
Move structural hashing into virtual methods on Pass. This will
allow MachineFunctionPass to override the method to add hashing of
the MachineFunction.
Differential Revision: https://reviews.llvm.org/D120123
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.
This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.
To somewhat minimize the churn, printing still uses faux typed
pointer notation.
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes #53131.
Differential Revision: https://reviews.llvm.org/D121167
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.
Differential Revision: https://reviews.llvm.org/D120523
DSE assumes that this is the case when forming a calloc from a
malloc + memset pair.
For tests, either update the malloc signature or change the
data layout.
If an instruction is the first legal instruction in the module, and is the only legal instruction in its basic block, it will be ignored by the outliner due to a length check inherited from the older version of the outliner that was restricted to outlining within a single basic block. This removes that check, and updates any tests that broke because of it.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D120786
Musttail calls require extra handling to properly propagate the calling convention information and tail call information. The outliner does not currently do this, so we ignore call instructions that utilize the swifttailcc and tailcc calling convention as well as functions marked with the attribute musttail.
Reviewers: paquette, aschwaighofer
Differential Revision: https://reviews.llvm.org/D120733
The logic exposed by this patch via `llvm::DetermineUseCaptureKind` was
part of `llvm::PointerMayBeCaptured`. In the Attributor we want to keep
track of the work list items but still reuse the logic if a use might
capture a value. A follow up for the Attributor removes ~100 lines of
code and complexity while making future handling of simplified values
possible.
Differential Revision: https://reviews.llvm.org/D121272
This patch adds a CL option for avoiding the attribute compatibility
check between caller and callee in TTI. TTI attribute compatibility
checks for target CPU and target features.
In our downstream compiler, this attribute always remains the same
between callee and caller. By avoiding the addition of this attribute to
each of our inline candidates (and then checking them here during inline
cost), we save some compile time.
The option is kept false, so this change is an NFC upstream.
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A
There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes #54311.
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used to print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
This extends SCEV verification to check not only backedge-taken
counts, but all entries in the IR -> SCEV cache. The restrictions
are the same as for the BECount case, i.e. we ignore expressions
based on undef, we only diagnose constant deltas (there are way
too many false positives otherwise) and we limit to reachable code.
Differential Revision: https://reviews.llvm.org/D121104
Introduce a new attribute "function-inline-cost-multiplier" which
multiplies the inline cost of a call site (or all calls to a callee) by
the multiplier.
When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.
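A sketch of what the attribute looks like on a call site (hypothetical functions; the value shown assumes a single doubling):
```
define void @caller() {
  ; the inline cost of this call site is scaled by the multiplier below
  call void @callee() #0
  ret void
}

declare void @callee()

attributes #0 = { "function-inline-cost-multiplier"="2" }
```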
This is an alternative to D120584, which marks the call sites as
noinline.
Hopefully fixes PR45253.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D121084
SCEV verification should no longer affect results of subsequent
queries, and our lit tests as well as llvm-test-suite pass with
SCEV verification enabled, so I think we can enable it by default
under EXPENSIVE_CHECKS now.
Differential Revision: https://reviews.llvm.org/D120708
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
When a SCEVUnknown gets RAUWd, we currently drop it from the folding
set, but don't forget memoized values. I believe we should be
treating RAUW the same way as deletion here and invalidate all
caches and dependent expressions.
I don't have any specific cases where this causes issues right now,
but it does address the FIXME in https://reviews.llvm.org/D119488.
Differential Revision: https://reviews.llvm.org/D120033
This ensures the right order in the sink-after map is maintained. If we
re-sink an instruction, it must be sunk after all earlier instructions
have been sunk.
Fixes https://github.com/llvm/llvm-project/issues/54223
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator new can be inlined but the body of
operator delete can't, as in this example from jyknight:
  #include <stdlib.h>
  #include <stdio.h>

  int allocs = 0;

  void *operator new(size_t n) {
    allocs++;
    void *mem = malloc(n);
    if (!mem) abort();
    return mem;
  }

  __attribute__((noinline)) void operator delete(void *mem) noexcept {
    allocs--;
    free(mem);
  }

  void deleteit(int *i) { delete i; }

  int main() {
    int *i = new int;
    deleteit(i);
    if (allocs != 0)
      printf("MEMORY LEAK! allocs: %d\n", allocs);
  }
This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.
Differential Revision: https://reviews.llvm.org/D117356
Previous and OtherPrev may not be in the same block. Use DT::dominates
instead of local comesBefore. DT::dominates is already used earlier to
check the order of Previous and SinkCandidate.
Fixes https://github.com/llvm/llvm-project/issues/54195
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120173
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.
I think InstCombine is currently the only user of both.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120754
This patch extends first-order recurrence handling to support cases
where we already sunk an instruction for a different recurrence, but
LastPrev comes before Previous.
To handle those cases correctly, we need to find the earliest entry for
the sink-after chain, because this references the Previous from the
original recurrence. This is needed to ensure we use the correct
instruction as sink point.
Depends on D118558.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D118642
Instead of passing an ICmpInst * and a bool, just pass the predicate
from the caller.
I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120609
For unreachable loops, any BECount is legal, and since D98706 SCEV
can make use of this for loops that are unreachable due to constant
branches. To avoid false positives, adjust SCEV verification to only
check BECounts in reachable loops.
Fixes https://github.com/llvm/llvm-project/issues/50523.
Differential Revision: https://reviews.llvm.org/D120651
The change fixes treatment of constrained compare intrinsics if
compared values are of vector type.
Differential revision: https://reviews.llvm.org/D110322
SCEV's ExprValueMap currently tracks not only which IR Values
correspond to a given SCEV expression, but additionally stores that
it may be expanded in the form X+Offset. In theory, this allows
reusing existing IR Values in more cases.
In practice, this doesn't seem to be particularly useful (the test
changes are rather underwhelming) and adds a good bit of complexity.
Per https://github.com/llvm/llvm-project/issues/53905, we have an
invalidation issue with these offseted expressions.
Differential Revision: https://reviews.llvm.org/D120311
D118090 causes a pretty significant (19%) regression in some Eigen
benchmarks. Investigating is a bit time consuming as the compilation
unit where this occurs is large. Rather than revert, this patch adds a
flag controlling that behavior (enabled by default).
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find the bound of an array access inside
a loop.
Differential Revision: https://reviews.llvm.org/D115873
In D111530, I suggested that we add some relatively basic pattern-matching
folds for shifts and funnel shifts and avoid a more specialized solution
if possible.
We can start by implementing at least one of these in IR because it's
easier to write the code and verify with Alive2:
https://alive2.llvm.org/ce/z/qHpmNn
This will need to be adapted/extended for SDAG to handle the motivating
bug ( #49541 ) because the patterns only appear later with that example
(added some tests: bb850d422b)
This can be extended within InstSimplify to handle cases where we 'and'
with a shift too (in that case, kill the funnel shift).
We could also handle patterns where the shift and funnel shift directions
are inverted, but I think it's better to canonicalize that instead to
avoid pattern-match case explosion.
Differential Revision: https://reviews.llvm.org/D120253
This is the same special logic we apply for SPF signed clamps
when computing the number of sign bits, just for intrinsics.
This just uses the same logic as the select case, but there's
multiple directions this could be improved in: We could also use
the num sign bits from the clamped value, we could do this during
constant range calculation, and there's probably unsigned analogues
for the constant range case at least.
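A hedged sketch of the intrinsic clamp pattern now handled (hypothetical bounds):
```
declare i32 @llvm.smax.i32(i32, i32)
declare i32 @llvm.smin.i32(i32, i32)

define i32 @clamp(i32 %x) {
  ; clamp %x to [-128, 127]; the result fits in 8 bits, so at least
  ; 25 sign bits are known for the i32 result
  %lo = call i32 @llvm.smax.i32(i32 %x, i32 -128)
  %r = call i32 @llvm.smin.i32(i32 %lo, i32 127)
  ret i32 %r
}
```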
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.
Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D117580
This patch fixes a logical error in how we work with the `LoopUsers` map.
It maps a loop onto a set of AddRecs that depend on it. The AddRecs
are added to this map only once, when they are created and put into
the `UniqueSCEVs` map.
The only purpose of this map is to make sure that, whenever we forget
a loop, all (directly or indirectly) dependent SCEVs get forgotten too.
The current code erases SCEVs from the dependent set of a given loop whenever
we forget this loop. This is not correct behavior, due to the following scenario:
1. We have a loop `L` and an AddRec `AR` that depends on it;
2. We modify something in the loop, but don't destroy it. We still call forgetLoop on it;
3. `AR` is no longer dependent on `L` according to `LoopUsers`. It is erased from
`ValueExprMap` and `ExprValueMap`, but still exists in `UniqueSCEVs`;
4. We can later request the very same AddRec for the very same loop again, and get existing
SCEV `AR`.
5. Now, `AR` exists and is used again, but its notion that it depends on `L` is lost;
6. Then we decide to delete `L`. `AR` will not be forgotten because we have lost it;
7. Sooner or later we run into a dangling pointer problem, or some other kind of problem,
because an active SCEV is now referencing a non-existent loop.
The solution to this is to stop erasing values from `LoopUsers`. Yes, we may end up forgetting
something that is already unused, but that is cheap.
This fixes a functional bug and potentially may have negative compile time impact on methods with
huge or numerous loops.
Differential Revision: https://reviews.llvm.org/D120303
Reviewed By: nikic
This patch fixes an invalid TypeSize->uint64_t implicit conversion in
FoldReinterpretLoadFromConst. If the size of the constant is scalable
we bail out of the optimisation for now.
Tests added here:
Transforms/InstCombine/load-store-forward.ll
Differential Revision: https://reviews.llvm.org/D120240
The problem can be shown from the newly added test case.
There are two invocations to MemorySSAUpdater::moveToPlace, and the
internal data structure VisitedBlocks is changed in the first
invocation, and reused in the second invocation. In between the two
invocations, there is a change to the CFG, and MemorySSAUpdater is
notified about the change.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D119898
Make the reading of condition for restricting re-ordering simpler.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120005
The code was using exact sizing only, but since what we really need is just to make sure the offsets are in bounds, a minimum bound on the object size is sufficient.
To demonstrate the difference, support computing minimum sizes from objects of scalable vector type.
Remove some code which tried to handle the case of comparing two allocas where an object size could not be precisely computed. This code had zero coverage in tree, and at least one nasty bug.
The bug comes from the fact that the code uses the size of the result pointer as a proxy for whether the alloca can be of size zero. Since the result of an alloca is *always* a pointer type, and a pointer type can *never* be empty, this check was a nop. As a result, we blindly consider a zero offset from two allocas to never be equal. They can in fact be equal when one or more of the allocas is zero sized.
This is particularly ugly because instcombine contains the exact opposite rule. If instcombine reaches the allocas first, it combines them into one (making them equal). If instsimplify reaches the compare first, it would consider them not equal. This creates all kinds of fun scenarios for order of optimization reaching different and contradictory conclusions.
Our current strategy for computing ranges of SCEVUnknown Phis was to simply
compute the union of the ranges of all their inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return the full set for them. As a result,
even the simplest patterns of cycled Phis always have a range of the full set.
This patch makes this logic a bit smarter. We basically do the same, but instead
of taking the inputs of a single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.
Processing the entire SCC together has one more advantage: we can set the range for all
of them at once, because the only thing that happens to them is that the same value is
being passed between those Phis. So, although we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so the
amortized compile time spent should be approximately the same.
Differential Revision: https://reviews.llvm.org/D110620
Reviewed By: reames
An atomic store with Release semantics allows reordering of unordered loads/stores before the store.
Implement it.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
While it is not a very likely scenario, we probably should not expect
that instcombine already dropped such a redundant zext,
but should handle it directly. Moreover, perhaps there was no ZExtInst,
and SCEV somehow managed to pull said zext out of the SCEV expression.
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
Extra leading zeros do not affect the result of comparison with zero,
nor do they matter for the unsigned min/max,
so we should not be dissuaded when we find a zero-extension,
but instead we should just skip it.
In a prior review I was asked to move the helper function canIgnoreSNaN()
out to FPEnv.h. This wasn't possible at the time because that function
needs the fast math flags, and including them includes lots of other stuff
that isn't needed.
This patch moves the fast math flags out into a new FMF.h file unchanged,
and moves the helper function out to FPEnv.h also unchanged. This ticket
only moves code around.
Differential Revision: https://reviews.llvm.org/D119752
Currently the fsub optimizations in InstSimplify don't know how to fold
X - -0.0 to X when we know X is not zero and the constrained intrinsics
are used. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D119746
This one tries to fix:
https://github.com/llvm/llvm-project/issues/53357.
Simply put, this checks (x & y) and ~(x | y) in
haveNoCommonBitsSet. Since they cannot have common bits set (we can
verify this by enumerating the cases), we can convert
(x & y) + ~(x | y) to (x & y) | ~(x | y). The compiler can then handle it in
InstCombineAndOrXor.
Furthermore, since ((x & y) + (~x & ~y)) would be converted to ((x & y)
+ ~(x | y)), this patch fixes that case too.
https://alive2.llvm.org/ce/z/qsKzRS
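A minimal IR sketch of the new case (hypothetical names):
```
define i32 @src(i32 %x, i32 %y) {
  %and = and i32 %x, %y
  %or = or i32 %x, %y
  %nor = xor i32 %or, -1
  ; (x & y) and ~(x | y) have no common bits set, so this add can become an or
  %r = add i32 %and, %nor
  ret i32 %r
}
```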
Reviewed By: spatel, xbolva00, RKSimon, lebedev.ri
Differential Revision: https://reviews.llvm.org/D118094
Volatile store does not provide any special rules for reordering with
atomics. The usual must-alias analysis is enough here.
This makes the behavior similar to how volatile load is handled.
Reviewers: reames, nikic
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119818
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
Instead of doing an inbounds strip first and another non-inbounds
strip afterward for equality comparisons, directly do a single
inbounds or non-inbounds strip based on whether we have an equality
predicate or not.
This is NFC-ish in that the alloca equality codepath is the only
part that sees additional non-inbounds offsets now, and for that
codepath it doesn't matter whether or not the GEP is inbounds, as
it does a stronger check itself. InstCombine would infer inbounds
for such GEPs.
Currently the fsub optimizations in InstSimplify don't know how to fold X
- +0.0 to X when using the constrained intrinsics. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D118928
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119652
This mechanism was used for a couple of purposes, but the primary one was keeping track of which predicates in a union might apply to an expression. As these sets are small and aggressively deduped, this has little value.
Even if the search is marked as terminated after only looking at
the first operand, we'd still look at the remaining operands
before actually ending the search.
This seems pointless and wasteful, let's not do that.
Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`,
just looking at the operands of the outer-level `umin` is not sufficient,
and we need to recurse into all same-typed `umin`'s.
This lets us avoid redundant implication work in the constructor of SCEVUnionPredicate which simplifies an upcoming change. If we're actually building a predicate via PSE, that goes through addPredicate which does include the implication check.
SCCP requires that the load/store type and global type are the
same (it does not support bitcasts of tracked globals). With
typed pointers this was implicitly enforced.
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization, due to a change from checking whether predicates exist to whether the predicate is possibly false.
We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.
Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.
For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.
https://alive2.llvm.org/ce/z/ULuZxB
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/aKAr94
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/SMEaoc
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element -wise, the expression will be rather large,
so let's not do that for now.
The code was relying upon the implicit conversion of TypeSize to
uint64_t and assuming the type in question was always fixed. However,
I discovered an issue when running the canon-freeze pass with some
IR loops that contains scalable vector types. I've changed the code
to bail out if the size is unknown at compile time, since we cannot
compute whether the step is a multiple of the type size or not.
I added a test here:
Transforms/CanonicalizeFreezeInLoops/phis.ll
Differential Revision: https://reviews.llvm.org/D118696
This is the last major stepping stone before being able to allocate the node via the folding set allocator. That will in turn allow more general SCEV predicate expression trees.
For those curious, the whole reason for tracking the predicate set separately as opposed to just immediately registering the dependencies appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable whether this justifies the complexity, but since we can preserve it with local ugliness, I did so.
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.
With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.
Also some other minor getPointerElementType() removals.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119047
PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent a A <pred> B.
This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
D108992 added KnownBits handling for 'Quadratic Reciprocity' self-multiplication patterns (bit[1] == 0), which can be used for non-undef values (poison is OK).
This patch adds noundef self-multiply handling to value tracking so demanded-bits patterns can make use of it.
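A small sketch of the pattern (hypothetical function):
```
define i32 @square(i32 noundef %x) {
  ; x * x always has bit 1 clear; with a noundef operand this is now
  ; visible to known-bits and demanded-bits queries
  %r = mul i32 %x, %x
  ret i32 %r
}
```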
Differential Revision: https://reviews.llvm.org/D117995
Use existing functionality to strip constant offsets that works well
with AS casts and avoids the code duplication.
Since we strip AS casts during the computation of the offset we also
need to adjust the APInt properly to avoid mismatches in the bit width.
This code ensures the caller of `compute` sees APInts that match the
index type size of the value passed to `compute`, not the value result
of the strip pointer cast.
Fixes #53559.
Differential Revision: https://reviews.llvm.org/D118727
This header is very large (3M lines once expanded) and was included in locations
where DWARF-specific information was not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of that patch, downstream users may need to manually add some extra
includes:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, code may be relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden
dependency now needs to be made explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes.
Differential Revision: https://reviews.llvm.org/D117350
The change implements constant folding of ‘llvm.experimental.constrained.fcmp’
and ‘llvm.experimental.constrained.fcmps’ intrinsics.
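A hedged sketch of a call that can now be constant folded (hypothetical constants; exceptions ignored):
```
declare i1 @llvm.experimental.constrained.fcmp.f64(double, double, metadata, metadata)

define i1 @fold() strictfp {
  ; 1.0 < 2.0 (ordered less than), so this folds to true
  %r = call i1 @llvm.experimental.constrained.fcmp.f64(double 1.000000e+00, double 2.000000e+00, metadata !"olt", metadata !"fpexcept.ignore") strictfp
  ret i1 %r
}
```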
Differential Revision: https://reviews.llvm.org/D110322
Created to fix: https://github.com/llvm/llvm-project/issues/53537
Some intrinsic functions are considered commutative since they perform operations like addition or multiplication. Some of these have extra parameters that provide extra information which is not part of the operation itself and is not commutative. This makes sure that an instruction that is an intrinsic takes the non-commutative path to handle this case.
Reviewer: paquette
Closes Issue #53537
Differential Revision: https://reviews.llvm.org/D118807
Adds new optimization remarks when vectorization fails.
More specifically, new remarks are added for following 4 cases:
- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency
It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be the first in program order.
A regression test has been added to test the following cases:
a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D108371
When upgrading a loop of load/store to a memcpy, the existing pass does not keep existing aliasing information. This patch allows existing aliasing information to be kept.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D108221
Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs - 1.
In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.
In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D118090
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds an additional command line flag debug option to disable outlining intrinsics.
Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support to any memory transfer operation, in particular the llvm.memmove.* intrinsics.
Reviewed By: reames, nikic
Differential Revision: https://reviews.llvm.org/D117095
This is a bugfix in IVDescriptor.cpp.
The helper function `RecurrenceDescriptor::getExactFPMathInst()`
is supposed to return the 1st FP instruction that does not allow
reordering. However, when constructing the RecurrenceDescriptor,
we trace the use-def chain starting from a PHI node and, for each
instruction in the use-def chain, its descriptor overrides the
previous one. Therefore in the final RecurrenceDescriptor we
constructed, we lose previous FP instructions that do not allow
reordering.
Reviewed By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D118073
This extract a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.
The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.
Differential Revision: https://reviews.llvm.org/D117000
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow non-entry blocks within the extracted region to have predecessors, so there are no conflicts to handle with respect to predecessors no longer contained in the function.
Recommit of 515eec3553
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106997
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs, and the intrinsics names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds an additional command line flag debug option to disable outlining intrinsics.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
The outliner currently requires that function calls not be indirect calls, and that the function name and function type match, as well as other attributes such as calling conventions. This patch treats called functions as values, and just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.
There are also debugging flags added to enforce the old behaviors where indirect calls are not allowed, and to enforce the old rule that function call names must also match.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109448
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
This patch adds support for implication inference logic for the
following pattern:
```
lhs < (y >> z) <= y, y <= rhs --> lhs < rhs
```
We should be able to use the fact that a value shifted to the right is
not greater than the original value (provided it is non-negative).
Differential Revision: https://reviews.llvm.org/D116150
Reviewed-By: apilipenko
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.
The presence of operand bundles changes semantics with respect to ModRef. In particular, the spec says: "From the compiler's perspective, deoptimization operand bundles make the call sites they're attached to at least readonly. They read through all of their pointer typed operands (even if they're not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118033
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.
Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.
Differential Revision: https://reviews.llvm.org/D117912
Peculiarly, the necessary code to handle pointers (including the
check for non-integral address spaces) is already in place,
because we were already allowing vectors of pointers here, just
not plain pointers.
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.
To simplify such scenarios given we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.
This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.
Differential Revision: https://reviews.llvm.org/D117147
The global state refers to the number of the nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are, because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable (they do not get deleted while cgscc passes are
run over the module), as well as cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch does just that for LLVM.
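For illustration, a minimal sketch of the kind of replacement involved (the helper is hypothetical): llvm::BitVector offers the familiar set/test/any interface in place of std::vector<bool>.
```
#include "llvm/ADT/BitVector.h"

// Hypothetical helper showing llvm::BitVector in place of std::vector<bool>.
bool hasAnyOddIndexSet(unsigned N) {
  llvm::BitVector Bits(N);        // N bits, all initially false
  for (unsigned I = 1; I < N; I += 2)
    Bits.set(I);                  // mark the odd indices
  return Bits.any();              // true if at least one bit is set
}
```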
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D117121
Integrate intersection with assumes into getBlockValue(), to ensure
that it is consistently performed.
We were doing it in nearly all places, but for example missed it
for select inputs.
Following up on 1470f94d71 (r63981173):
The result here (probably) depends on endianness. Don't bother
trying to handle this exotic case, just bail out.
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function), which is checked elsewhere, so this check is no longer needed.
Differential Revision: https://reviews.llvm.org/D117180
Since we don't merge/expand non-sequential umin exprs into umin_seq exprs,
we may have a umin_seq(umin(umin_seq())) chain, and the innermost umin_seq
can still have duplicate operands.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder, the inputs are available as a separate pointer value and an
indices ArrayRef.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D117038
The reinterpret load code will convert undef values into zero.
Check the uniform value case before it to produce a better result
for all-undef initializers.
However, the uniform value handling will return the uniform value
even if the access is out of bounds, while the reinterpret load
code will return undef. Add an explicit check to retain the
previous result in this case.
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during its abstract interpretation, and allows us to have one copy of the code.
Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space; see the sketch after this list.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test-wise contradicts that.
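To illustrate the first point, a minimal standalone sketch (not the LLVM code): with an i64 offset argument in a 32-bit address space, only the low 32 bits of the offset participate in the address computation, so doing the intermediate math in 64 bits gives a different (wrong) answer.
```
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical GEP offset: an i64 operand in an address space whose
  // pointer index type is only 32 bits wide.
  uint64_t Offset64 = (1ULL << 32) + 10;                      // what the operand says
  uint32_t IndexTypeOffset = static_cast<uint32_t>(Offset64); // what the GEP computes
  std::printf("raw i64 offset:   %llu\n", (unsigned long long)Offset64);
  std::printf("effective offset: %u\n", IndexTypeOffset);     // prints 10
  return 0;
}
```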
Differential Revision: https://reviews.llvm.org/D117241
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This led to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).
extern void h(int* p);

inline __attribute__((always_inline)) int g() {
  return 0;
}

int __attribute__((__no_stack_protector__)) f() {
  int a[1];
  h(a);
  return g();
}
LLVM will inline g() into f(), and f() would get a stack protector, against the
user's explicit wishes, potentially breaking the program, e.g. if h() changes the
value of the stack cookie. That's a miscompile.
More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.
One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.
However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear; it
seems counter-intuitive and causes real problems, as we've seen.
This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.
Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):
extern void h(int* p);

inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
  return 0;
}

int f() {
  int a[1];
  h(a);
  return g();
}
I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.
See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722
Differential revision: https://reviews.llvm.org/D116589
We could use knownbits on both operands for even more folds (and there are
already tests in place for that), but this is enough to recover the example
from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)
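A minimal sketch (not the actual InstCombine code) of the kind of fold known bits enable here: when the dividend is known to be smaller than a power-of-two divisor, the udiv folds to 0 and the urem folds to the dividend itself.
```
#include <cstdint>

uint32_t foldUDiv(uint32_t x) {
  uint32_t low = x & 15;  // known bits: low < 16
  return low / 16;        // always 0, so the udiv can be removed
}

uint32_t foldURem(uint32_t x) {
  uint32_t low = x & 15;  // known bits: low < 16
  return low % 16;        // always equal to low, so the urem can be removed
}
```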
I am assuming no noticeable compile-time impact from this because udiv/urem
are rare opcodes.
Differential Revision: https://reviews.llvm.org/D116616
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the module-wide run of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here).
Differential Revision: https://reviews.llvm.org/D116964
This happens in e.g. regalloc, where we trace decisions per function,
but wouldn't want to spew N log files (i.e. one per function). So we
output a key-value association, where the key is an ID for the
sub-module object, and the value is the tensorflow::SequenceExample.
The current relation with protobuf is tenuous, so we're avoiding a
custom message type in favor of using the `Struct` message, but that
requires the values be wire-able strings, hence base64 encoding.
We plan on resolving the protobuf situation shortly, and improve the
encoding of such logs, but this is sufficient for now for setting up
regalloc training.
Differential Revision: https://reviews.llvm.org/D116985
Alternative to D116817.
This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.
This is then used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.
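A self-contained sketch of the calling pattern such an interface enables (hypothetical types, not the actual LLVM classes): the fold hook returns an existing value when simplification succeeds and nullptr otherwise, so the caller only materializes a new instruction on the nullptr path.
```
#include <cstdio>

struct Value { int Id; };                 // stand-in for llvm::Value

// Pretend simplifier: "x | 0" folds back to x; everything else fails.
Value *FoldOrSketch(Value *L, Value *R) {
  if (R && R->Id == 0) return L;
  if (L && L->Id == 0) return R;
  return nullptr;                         // cannot simplify
}

Value *CreateOrSketch(Value *L, Value *R) {
  static Value NewInst{99};               // stand-in for a fresh instruction
  (void)L; (void)R;
  return &NewInst;
}

Value *emitOr(Value *L, Value *R) {
  if (Value *Folded = FoldOrSketch(L, R))
    return Folded;                        // reuse the existing value/constant
  return CreateOrSketch(L, R);            // otherwise emit the Or
}

int main() {
  Value X{5}, Zero{0}, Y{7};
  std::printf("%d\n", emitOr(&X, &Zero)->Id); // 5: folded away
  std::printf("%d\n", emitOr(&X, &Y)->Id);    // 99: new instruction
  return 0;
}
```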
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116935
Extend the existing malloc-family specific optimization to all noalias calls. This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage.
Differential Revision: https://reviews.llvm.org/D116980
D92270 updated constant expression folding to fold inbounds GEP to
poison if the base is undef. Apply the same logic to SimplifyGEPInst.
The justification is that we can choose an out-of-bounds pointer as base
pointer.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D117015
We could just merge all umin into umin_seq, but that is likely
a pessimization, so don't do that, but pretend that we did
for the purpose of deduplication.
Having the same operand more than once doesn't change the outcome here,
neither reduction-wise nor poison-wise.
We must keep the first instance specifically though.
Two crashes have been reported. This change disables the new logic while leaving the new node in tree. Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.
Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.
As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.
Instead, we should have a way to express those
poison blocking properties in SCEV expressions.
The proposed semantics is:
```
Sequential/in-order min/max SCEV expressions are non-commutative variants
of the commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent; otherwise,
if the operand that represents the saturation point* of the given expression
comes before the first poison operand, then the whole expression is not poison,
but is that saturation point.
```
* saturation point - the maximal/minimal possible integer value for the given type
The lowering is straightforward:
```
compare each operand to the saturation point,
perform a sequential, in-order (poison-safe!) logical-or reduction
over those checks, and if the reduction returned true then return the
saturation point, else return the naive min/max reduction over the operands
```
https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97
That allows us to handle the patterns in question.
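As a concrete illustration, a minimal C++ sketch of that lowering for umin, where the saturation point is 0 (this is a sketch only, not the SCEV expansion code): the operands are visited strictly in order, so once one of them hits the saturation point the remaining operands are never evaluated, which mirrors the poison-safe short-circuit behaviour.
```
#include <algorithm>
#include <cstdint>
#include <initializer_list>

// Sequential umin over unsigned operands.
uint64_t uminSeq(std::initializer_list<uint64_t> Ops) {
  uint64_t Result = UINT64_MAX;
  for (uint64_t Op : Ops) {
    if (Op == 0)                  // operand equals the saturation point:
      return 0;                   // the whole expression saturates and the
                                  // remaining operands are never looked at
    Result = std::min(Result, Op);
  }
  return Result;
}
```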
Reviewed By: nikic, reames
Differential Revision: https://reviews.llvm.org/D116766
(Split from original patch to separate the non-NFC part and add coverage. I typoed when adding the new test, so this change includes the typo fix to let libfunc recognize the signature. Didn't figure it was worth another separate commit.)
Differential Revision: https://reviews.llvm.org/D116851 (part 2 of 2)
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.
This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for the purpose of having the large change be NFC. The latter will be reviewed separately.
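A self-contained sketch of the table-driven idea (hypothetical table and names, not the actual MemoryBuiltins.cpp data): each aligned-allocation function records which argument carries the alignment, and a generic lookup replaces the previously hardcoded positions.
```
#include <optional>
#include <string_view>

// Hypothetical per-function record: -1 means "no alignment argument".
struct AllocFnInfo {
  std::string_view Name;
  int AlignParamIdx;
};

constexpr AllocFnInfo AllocTable[] = {
    {"malloc", -1},
    {"aligned_alloc", 0},          // aligned_alloc(align, size)
    {"_ZnwmSt11align_val_t", 1},   // operator new(size, align_val_t)
};

// Sketch of a getAllocAlignment-style query over the table above.
std::optional<unsigned> getAllocAlignArgIndex(std::string_view Callee) {
  for (const AllocFnInfo &Info : AllocTable)
    if (Info.Name == Callee && Info.AlignParamIdx >= 0)
      return static_cast<unsigned>(Info.AlignParamIdx);
  return std::nullopt;             // not an aligned allocation call
}
```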
Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.
We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.
Differential Revision: https://reviews.llvm.org/D116800
We're testing that the RegionLoop pointer is null in the first part of the check, so we need to check that it's non-null before dereferencing it in a later part of the check.