This patch adds initial support for a pointer diff based runtime check
scheme for vectorization. This scheme requires fewer computations and
checks than the existing full overlap checking, if it is applicable.
The main idea is to only check if source and sink of a dependency are
far enough apart so the accesses won't overlap in the vector loop. To do
so, it is sufficient to compute the difference and compare it to
`VF * UF * AccessSize`: checking
`(Sink - Src) <u VF * UF * AccessSize` rules out a backwards
dependence in the vector loop with the given VF and UF. If Src >=u Sink,
there is no dependence preventing vectorization, hence the overflow
should not matter and using the ULT comparison is sufficient.
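For illustration, a minimal sketch of the emitted check (names and
values are hypothetical; with VF = 4, UF = 2 and AccessSize = 4 the
bound is 32 bytes):

  %src.int  = ptrtoint ptr %src to i64
  %sink.int = ptrtoint ptr %sink to i64
  %diff     = sub i64 %sink.int, %src.int
  ; conflict if the accesses are closer than VF * UF * AccessSize bytes
  %conflict = icmp ult i64 %diff, 32
  br i1 %conflict, label %scalar.ph, label %vector.ph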
Note that the initial version is restricted in multiple ways:
1. Pointers must only either be read or written, by a single
instruction (this allows re-constructing source/sink for
dependences with the available information)
2. Source and sink pointers must be add-recs, with matching steps
3. The step must be a constant.
4. abs(step) == AccessSize.
Most of those restrictions can be relaxed in the future.
See https://github.com/llvm/llvm-project/issues/53590.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D119078
I fixed some poison-safety violations on related patterns in InstCombine
and noticed that we missed adding nsw/nuw on them, so this adds clauses
to the underlying analysis for that.
We need the undef input restriction to make this safe according to Alive2:
https://alive2.llvm.org/ce/z/48g9K8
Differential Revision: https://reviews.llvm.org/D125500
This adds two conjugated folds:
* A | B -> B if A implies B (https://alive2.llvm.org/ce/z/R6GU4j)
* A & B -> A if A implies B (https://alive2.llvm.org/ce/z/EGMqyy)
If A and B are icmps themselves, we will usually fold this through
other logic already (though the tests show a couple additional cases
we previously missed). However, isImpliedCond() also supports A
being of the form X & Y, which allows us to handle cases like
(X & Y) | B where X implies B. This addresses the regression from
D125398.
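A minimal IR sketch of the (X & Y) | B case (values are hypothetical):

  ; %c1 (x <u 10) implies %c2 (x <u 20), so %r folds to %c2
  %c1 = icmp ult i8 %x, 10
  %c2 = icmp ult i8 %x, 20
  %a = and i1 %c1, %other
  %r = or i1 %a, %c2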
Something that notably doesn't work yet is the (X | Y) & B case.
This is due to an asymmetry in the isImpliedCondition()
implementation that will have to be addressed separately.
Differential Revision: https://reviews.llvm.org/D125530
Scaffolding support for generating runtime checks for multiple SCEV expressions
per pointer. The initial version just adds support for looking through
a single pointer select.
The more sophisticated logic for analyzing forks is in D108699
Reviewed By: huntergr
Differential Revision: https://reviews.llvm.org/D114487
D98718 caused the order of Values/MemoryLocations we pass to alias() to
be significant due to storing the offset in the PartialAlias case. But
some callers weren't audited and were still passing swapped arguments,
causing the returned PartialAlias offset to be negative in some
cases. For example, the newly added unittests would return -1
instead of 1.
Fixes #55343, a miscompile.
Reviewed By: asbirlea, nikic
Differential Revision: https://reviews.llvm.org/D125328
This issue reproduces in the context of LoopDeletion, because the
bitcast does not get simplified away there. For a plain -inst-simplify
run the bitcast would get folded away first.
Fixes https://github.com/llvm/llvm-project/issues/54615.
When the first commutative instruction in a region used the same value in both operand positions and was compared to a corresponding instruction with two different values, an early check determined that, since the values were new, they acted in the same way structurally. If this was not contradicted later in the program, the regions were marked as similar. This removes that check, so that it is clear that the same value cannot be mapped to two different values.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D124775
This allows the compiler to support more features than those supported by a
model. The only requirement (development mode only) is that the new
features must be appended at the end of the list of features requested
from the model. The support is transparent to compiler code: for
unsupported features, we provide a valid buffer to copy their values;
it's just that this buffer is disconnected from the model, so insofar
as the model is concerned (AOT or development mode), these features don't
exist. The buffers are allocated at setup - meaning, at steady state,
there is no extra allocation (maintaining the current invariant). These
buffers have two roles: first, to keep the compiler code simple; second,
to allow logging their values in development mode. The latter allows retraining
a model supporting the larger feature set starting from traces produced
with the old model.
For release mode (AOT-ed models), this decouples compiler evolution from
model evolution, which we want in scenarios where the toolchain is
frequently rebuilt and redeployed: we can first deploy the new features,
and continue working with the older model, until a new model is made
available, which can then be picked up the next time the compiler is built.
Differential Revision: https://reviews.llvm.org/D124565
Rename the legacy `DOTGraphTraits{Module,}{Viewer,Printer}` to the corresponding `DOTGraphTraits...WrapperPass`, and implement a new `DOTGraphTraitsViewer` with new pass manager.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D123677
We can try to vectorize a number of stores smaller than MinVecRegSize
/ scalar_value_size, if the target allows it. This gives an extra
opportunity for vectorization.
Fixes PR54985.
Differential Revision: https://reviews.llvm.org/D124284
Fold %x umin_seq %y to %x if %x ule %y. This also subsumes the
special handling for constant operands, as if %y is constant this
folds to umin via implied poison reasoning, and if %x is constant
then either %x is not zero and it folds to umin, or it is known
zero, in which case it is ule anything.
Fold %x umin_seq %y to %x umin %y if %x cannot be zero. They only
differ in semantics for %x==0.
More generally, %x *_seq %y folds to %x * %y if %x cannot be the
saturation value (though currently we only have umin_seq).
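A sketch in SCEV notation (the expressions are purely illustrative):

  ((%n /u 2) umin_seq %n)  -->  (%n /u 2)       ; since %n /u 2 ule %n
  (%x umin_seq %y)         -->  (%x umin %y)    ; if %x is known non-zero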
If a constrained intrinsic call was replaced by some value, it was not
removed in some cases. The dangling instruction resulted in useless
instructions executed at runtime. It happened because constrained
intrinsics usually have a side effect, which is used to model the
interaction with the floating-point environment. In some cases the side
effect is actually absent or can be ignored.
This change adds specific treatment of constrained intrinsics so that
their side effect can be removed if it is actually absent.
Differential Revision: https://reviews.llvm.org/D118426
Similar to how we convert logical and/or to bitwise and/or, we should
also convert umin_seq to umin based on implied poison reasoning. In
%x umin_seq %y, if %y being poison implies %x being poison, then we
don't need the sequential evaluation: Having %y contribute towards
the result will never make the result more poisonous. An important
corollary of this is that if %y is never poison, we also don't need
the sequential evaluation.
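A minimal sketch (hypothetical IR): with %y = add i64 %x, 1 (no
nuw/nsw), %y can only be poison if %x is poison, so:

  (%x umin_seq %y)  -->  (%x umin %y)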
This avoids some of the regressions in D124910.
Differential Revision: https://reviews.llvm.org/D124921
The assertion is to check we always get backedge taken count
(`BECount`) of zero when the exit condition is in select form
(`isa<BinaryOperator>(ExitCond)`) and the exit limit for the
first operand is zero (`EL0.ExactNotTaken->isZero()`). However,
the assertion is checking that the exit condition is NOT in
select form. Remove the whole assertion, since we now handle
select form in ScalarEvolution::getSequentialMinMaxExpr.
Reviewed By: reames, nikic
Differential Revision: https://reviews.llvm.org/D122835
Per feedback on D123086 after submit.
Also added a test for vec_malloc et al attribute inference to show it's
doing the right thing.
The new tests exposed a defect, corrected by adding vec_free to the list of
free functions in MemoryBuiltins.cpp, which had been overlooked all the
way back in D94710, over a year ago.
Differential Revision: https://reviews.llvm.org/D124859
This extends haveNoCommonBitsSet() to two additional cases, allowing
the following folds:
* `A + (B & ~A)` --> `A | (B & ~A)`
(https://alive2.llvm.org/ce/z/crxxhN)
* `A + ((A & B) ^ B)` --> `A | ((A & B) ^ B)`
(https://alive2.llvm.org/ce/z/A_wsH_)
These should further fold to just `A | B`, though this currently
only works in the first case.
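A minimal IR sketch of the first fold (names are hypothetical):

  %nota = xor i8 %a, -1
  %masked = and i8 %b, %nota     ; B & ~A shares no bits with A
  %sum = add i8 %a, %masked      ; -> or i8 %a, %masked -> a | b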
The reason why the second fold is necessary is that we consider
this to be the canonical form if B is a constant. (I did check
whether we can change that, but it looks like a number of folds
depend on the current canonicalization, so I ended up adding both
patterns here.)
Differential Revision: https://reviews.llvm.org/D124763
Adds ability to vectorize loops containing a store to a loop-invariant
address as part of a reduction that isn't converted to SSA form due to
lack of aliasing info. Runtime checks are generated to ensure the store
does not alias any other accesses in the loop.
Ordered fadd reductions are not yet supported.
Differential Revision: https://reviews.llvm.org/D110235
This adds fptosi_sat and fptoui_sat to the list of trivially
vectorizable functions, mainly so that the loop vectorizer can vectorize
the instruction. Marking them as trivially vectorizable also allows them
to be SLP vectorized, and Scalarized.
The signature of a fptosi_sat requires two type overrides
(@llvm.fptosi.sat.v2i32.v2f32), unlike other intrinsics that often only
take a single. This patch alters hasVectorInstrinsicOverloadedScalarOpd
to isVectorIntrinsicWithOverloadTypeAtArg, so that it can mark the first
operand of the intrinsic as a overloaded (but not scalar) operand.
Differential Revision: https://reviews.llvm.org/D124358
ConstantFolding currently converts "getelementptr i8, Ptr, (sub 0, V)"
to "inttoptr (sub (ptrtoint Ptr), V)". This transform is, taken by
itself, correct, but does came with two issues:
1. It unnecessarily broadens provenance by introducing an inttoptr.
We generally prefer not to introduce inttoptr during optimization.
2. For the case where V == ptrtoint Ptr, this folds to inttoptr 0,
which further folds to null. In that case provenance becomes
incorrect. This has been observed as a real-world miscompile with
rustc.
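A sketch of the incorrect case (global chosen for illustration):

  getelementptr (i8, ptr @g, i64 sub (i64 0, i64 ptrtoint (ptr @g to i64)))
  ; old fold: inttoptr (i64 0) -> null, losing the provenance of @g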
We should probably address that incorrect inttoptr 0 fold at some
point, but in either case we should also drop this inttoptr-introducing
fold. Instead, replace it with a fold rooted at
ptrtoint(getelementptr), which seems to cover the original
motivation for this fold (test2 in the changed file).
Differential Revision: https://reviews.llvm.org/D124677
Currently loop cache cost (LCC) cannot analyze fixed-size arrays
since it cannot delinearize them. This patch adds the capability
to delinearize fixed-size arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp; some refactoring will be done in a
follow-up patch.
Reviewed By: #loopoptwg, Meinersbur
Differential Revision: https://reviews.llvm.org/D122857
The result is a data bag; this makes sure it's signaled to the user that
the data can't be mutated when, for example, doing something like:
auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F)
...
R.Uses++
Introduced masks where they were not added before, and improved
target-dependent cost models to avoid returning incorrect cost results
after adding masks.
Differential Revision: https://reviews.llvm.org/D100486
This relands commit 8f550368b1.
The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.
Differential Revision: https://reviews.llvm.org/D120169
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.
Reviewers: @StephenTozer, @djtodoro
Differential Revision: https://reviews.llvm.org/D120169
Before this patch `Args` was used to pass a broadcast's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This is a simple datatype with a few JSON utilities, and is independent
of the underlying executor. The main motivation is to allow taking a
dependency on it on the AOT side, and allow us to build a correctly-sized
buffer in the cases when the requested feature isn't supported by the
model. This, in turn, allows us to grow the feature set supported by the
compiler in a backward-compatible way; and also collect traces exposing
the new features, starting from the older model, and continue
training from those new traces.
Differential Revision: https://reviews.llvm.org/D124417
This patch sets LastCallToStaticBonus based on a check; it has no
noticeable size reduction on an internal workload or the Linux kernel
with Os/Oz.
Differential Revision: https://reviews.llvm.org/D124233
The motivation is twofold:
1) Allow plugging in a different training-time evaluator, e.g.
TFLite-based, etc.
2) Allow using TensorSpec for AOT, too, to support evolution: we start
by extracting a superset of the features currently supported by a
model. For the tensors the model does not support, we just return a
valid, but useless, buffer. This makes using a 'smaller' model (less
supported tensors) transparent to the compiler. The key is to
dimension the buffer appropriately, and we already have TensorSpec
modeling that info.
The only coupling was due to the reliance of a TF internal API for
getting the element size, but for the types we are interested in,
`sizeof` is sufficient.
A subsequent change will yank out TensorSpec in its own module.
Differential Revision: https://reviews.llvm.org/D124045
We can process long shuffles (working across several actual vector
registers) in the best way if we take the actual register representation
into account. We can build a more correct representation of register
shuffles and improve the number of recognized buildvector sequences.
Also, the same function can be used to improve the cost model for
shuffles in future patches.
Part of D100486
Differential Revision: https://reviews.llvm.org/D115653
When constructing canonical relationships between two regions, the first instruction of a basic block from the first region is used to find the corresponding basic block from the second region. However, debug instructions are not included in similarity matching, and therefore do not have a canonical numbering. This patch makes sure to ignore the debug instructions when finding the first instruction in a basic block.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D123903
Issue: https://github.com/llvm/llvm-project/issues/54431
PHINodes that need to be generated to accommodate a PHINode outside the region due to different output paths need to have their own numbering to determine the number of output schemes required to properly handle all the outlined regions. This numbering was previously only determined by the order and values of the incoming values, as well as the parent block of the PHINode. This adds the incoming blocks to the calculation of a hash value for these PHINodes as well, and the supporting infrastructure to give each block in a region a corresponding canonical numbering.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D122207
Currently the fsub optimizations in InstSimplify don't know how to fold
-0.0 - (-X) to X when the constrained intrinsics are used. This adds partial
support. The rest of the support will come later with work on the IR
matchers.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D123396
Refactor to use stripPointerCasts() instead of iteratively
calling BitCastInst::getOperand(). This is an improvement
since now we are able to analyze more cases, please refer
to test cases added in this patch.
Reviewed By: Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D123559
This reverts commit e810d55809.
The commit did not take into account the fact that a strdup'ed string
could be modified. Checking whether such modification happens would make
the function very costly, and without a test case in mind it's not worth the effort.
Retain the behavior we get without opaque pointers: A call to a
known function with different function type is considered an
indirect call.
This fixes the crash reported in https://reviews.llvm.org/D123300#3444772.
And thread DSE's ephemeral values to EarliestEscapeInfo.
This allows more precise analysis in DSEState::isReadClobber() via BatchAA.
Followup to D123162.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D123342
Rather than checking the rounded type store size, check the type
size in bits. We don't want to forward a store of i1 to a load
of i8 for example, even though they have the same type store size.
The padding bits have unspecified contents.
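A minimal sketch of the case we must not forward (hypothetical IR):

  store i1 true, ptr %p
  %v = load i8, ptr %p   ; bits 1..7 of %v are unspecified padding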
This is a partial fix for the issue reported at
https://reviews.llvm.org/D115924#inline-1179482,
the problem also needs to be addressed more generally in the
constant folding code.
It actually implements support for seeing through loads, using alias analysis to
refine the result.
This is rather limited, but I didn't want to rely on more than available
analysis at that point (to be gentle with compilation time), and it does seem to
catch common scenarios, as showcased by the included tests.
Differential Revision: https://reviews.llvm.org/D122431
Currently, the utility supports lowering of non-atomic memory transfer routines only. This patch adds support for the atomic version of memcpy. This may be useful for targets not supporting atomic memcpy.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D118443
This lines up with other parts of the codebase that only use special
knowledge about allocator functions if they're builtins.
Differential Revision: https://reviews.llvm.org/D123053
This got changed to use hasAttrSomewhere() during review, and I didn't
notice until today when I was writing some tests for another part of
this system that using hasAttrSomewhere only checked the callsite for
allocalign, rather than both the callsite and the definition. This fixes
that by introducing a helper method.
Differential Revision: https://reviews.llvm.org/D121641
This has been true since dba73135c8, but
didn't matter until now because clang wasn't emitting allocalign
attributes.
Differential Revision: https://reviews.llvm.org/D121640
Add void casts to mark the variables used, next to the places where
they are used in assert or `LLVM_DEBUG()` expressions.
Differential Revision: https://reviews.llvm.org/D123117
The LLVM IR verifier and analysis linter defines and uses several macros in
code that performs validation of IR expectations. Previously, these macros
were named with an 'Assert' prefix. These names were misleading since the
macro definitions are not conditioned on build kind; they are defined
identically in builds that have asserts enabled and those that do not. This
was confusing since an LLVM developer might expect these macros to be
conditionally enabled as 'assert' is. Further confusion was possible since
the LLVM IR verifier is implicitly disabled (in Clang::ConstructJob()) for
builds without asserts enabled, but only for Clang driver invocations; not
for clang -cc1 invocations. This could make it appear that the macros were
not active for builds without asserts enabled, e.g. when investigating
behavior using the Clang driver, and thus lead to surprises when running
tests that exercise the clang -cc1 interface.
This change renames this set of macros as follows:
Assert -> Check
AssertDI -> CheckDI
AssertTBAA -> CheckTBAA
Prior to this change, CallBase::hasFnAttr checked the called function to
see if it had an attribute if it wasn't set on the CallBase, but
getFnAttr didn't do the same delegation, which led to very confusing
behavior. This patch fixes the issue by making CallBase::getFnAttr also
check the function under the same circumstances.
Test changes look (to me) like they're cleaning up redundant attributes
which no longer get specified both on the callee and call. We also clean
up the one ad-hoc implementation of this getter over in InlineCost.cpp.
Differential Revision: https://reviews.llvm.org/D122821
Two interesting omissions:
* When reordering in either direction, reordering two calls which both
contain inf-loops is illegal. This one is possibly a change in behavior
for certain callers (e.g. fixes a latent bug.)
* When moving down, control dependence must be respected by checking the
inverse of isSafeToSpeculativeExecute. Current callers all seem to
handle this case - though admitted, I did not do an exhaustive audit.
Most seem to be only interested in moving upwards within a block. This
is mostly a case of future proofing an API so that it implements what
the comments says, not just what current callers need.
Noticed via inspection. I don't have a test case.
The implementation is just a generalization of the Select handler.
We're not trying to be smart and compute any kind of fixed point.
Differential Revision: https://reviews.llvm.org/D121897
With D107249 I saw huge compile time regressions on a module (150s ->
5700s). This turned out to be due to a huge RefSCC in
the module. As we ran the function simplification pipeline on functions
in the SCCs in the RefSCC, some of those SCCs would be split out to
their RefSCC, a child of the current RefSCC. We'd skip the remaining
SCCs in the huge RefSCC because the current RefSCC is now the RefSCC
just split out, then revisit the original huge RefSCC from the
beginning. This happened many times because many functions in the
RefSCC were optimizable to the point of becoming their own RefSCC.
This patch makes it so we don't skip SCCs not in the current RefSCC so
that we split out all the child RefSCCs on the first iteration of
RefSCC. When we split out a RefSCC, we invalidate the original RefSCC
and add the remainder of the SCCs into a new RefSCC in
RCWorklist. This happens repeatedly until we finish visiting all
SCCs, at which point there is only one valid RefSCC in
RCWorklist from the original RefSCC containing all the SCCs that
were not split out, and we visit that.
For example, in the newly added test cgscc-refscc-mutation-order.ll,
we'd previously run instcombine in this order:
f1, f2, f1, f3, f1, f4, f1
Now it's:
f1, f2, f3, f4, f1
This can cause more passes to be run in some specific cases,
e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2;
now we run f1, f2, f2.
This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs):
https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121953
The patch adds an extra check to only set MinAbsVarIndex if
abs(V * Scale) won't wrap. In the absence of IsNSW, try to use the
bitwidths of the original V and Scale to rule out wrapping.
Attempt to model https://alive2.llvm.org/ce/z/HE8ZKj
The code in the else if below probably needs the same treatment, but I
need to come up with a test first.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D121695
Currently some optimizations are disabled because llvm::CannotBeNegativeZero()
does not know how to deal with the constrained intrinsics. This patch fixes
that by extending the existing implementation.
Differential Revision: https://reviews.llvm.org/D121483
This avoids false positive verification failures if the condition
is not literally true/false, but SCEV still makes use of the fact
that a loop is not reachable through more complex reasoning.
Fixes https://github.com/llvm/llvm-project/issues/54434.
This changes MemorySSA to be constructed in unoptimized form.
MemorySSA::ensureOptimizedUses() can be called to optimize all
uses (once). This should be done by passes where having optimized
uses is beneficial, either because we're going to query all uses
anyway, or because we're doing def-use walks.
This should help reduce the compile-time impact of MemorySSA for
some use cases (the reason why I started looking into this is
D117926), which can avoid optimizing all uses upfront, and instead
only optimize those that are actually queried.
Actually, we have an existing use-case for this, which is EarlyCSE.
Disabling eager use optimization there gives a significant
compile-time improvement, because EarlyCSE will generally only query
clobbers for a subset of all uses (this change is not included in
this patch).
Differential Revision: https://reviews.llvm.org/D121381
Splat loads are inexpensive on X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.
Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
   %ld = load double, double* %gep
   %ins1 = insertelement <2 x double> poison, double %ld, i32 0
   %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1
- But it can also be a load, an insert and a shuffle:
   %ld = load double, double* %gep
   %ins = insertelement <2 x double> poison, double %ld, i32 0
   %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer
Because of this some of the lit tests contain more IR instructions.
Differential Revision: https://reviews.llvm.org/D121354
Move structural hashing into virtual methods on Pass. This will
allow MachineFunctionPass to override the method to add hashing of
the MachineFunction.
Differential Revision: https://reviews.llvm.org/D120123
With opaque pointers, we cannot use the pointer element type to
determine the LocationSize for the AA query. Instead, -aa-eval
tests are now required to have an explicit load or store for any
pointer they want to compute alias results for, and the load/store
types are used to determine the location size.
This may affect ordering of results, and sorting within one result,
as the type is not considered part of the sorted string anymore.
To somewhat minimize the churn, printing still uses faux typed
pointer notation.
RequireAnalysis<GlobalsAA> doesn't actually recompute GlobalsAA.
GlobalsAA isn't invalidated (unless specifically invalidated) because
it's self-updating via ValueHandles, but can be imprecise during the
self-updates.
Rather than invalidating GlobalsAA, which would invalidate AAManager and
any analyses that use AAManager, create a new pass that recomputes
GlobalsAA.
Fixes #53131.
Differential Revision: https://reviews.llvm.org/D121167
We still need the code after stripAndAccumulateConstantOffsets() since
it doesn't handle GEPs of scalable types and non-constant but identical
indexes.
Differential Revision: https://reviews.llvm.org/D120523
DSE assumes that this is the case when forming a calloc from a
malloc + memset pair.
For tests, either update the malloc signature or change the
data layout.
If an instruction is the first legal instruction in the module and the only legal instruction in its basic block, it will be ignored by the outliner due to a length check inherited from the older version of the outliner, which was restricted to outlining within a single basic block. This removes that check, and updates any tests that broke because of it.
Reviewer: paquette
Differential Revision: https://reviews.llvm.org/D120786
Musttail calls require extra handling to properly propagate the calling convention information and tail call information. The outliner does not currently do this, so we ignore call instructions that utilize the swifttailcc and tailcc calling convention as well as functions marked with the attribute musttail.
Reviewers: paquette, aschwaighofer
Differential Revision: https://reviews.llvm.org/D120733
The logic exposed by this patch via `llvm::DetermineUseCaptureKind` was
part of `llvm::PointerMayBeCaptured`. In the Attributor we want to keep
track of the work list items but still reuse the logic if a use might
capture a value. A follow up for the Attributor removes ~100 lines of
code and complexity while making future handling of simplified values
possible.
Differential Revision: https://reviews.llvm.org/D121272
This patch adds a CL option for avoiding the attribute compatibility
check between caller and callee in TTI. TTI attribute compatibility
checks for target CPU and target features.
In our downstream compiler, this attribute always remains the same
between callee and caller. By avoiding the addition of this attribute to
each of our inline candidates (and then checking them here during inline
cost), we save some compile time.
The option is kept false, so this change is an NFC upstream.
This is a revert of cfcc42bdc. The analysis is wrong as shown by
the minimal tests for instcombine:
https://alive2.llvm.org/ce/z/y9Dp8A
There may be a way to salvage some of the other tests,
but that can be done as follow-ups. This avoids a miscompile
and fixes #54311.
This patch adds PrettyStackEntries before running passes. The entries
include the pass name and the IR unit the pass runs on.
The information is used to print additional information when a pass
crashes, including the name and a reference to the IR unit on which it
crashed. This is similar to the behavior of the legacy pass manager.
The improved stack trace now includes:
Stack dump:
0. Program arguments: bin/opt -loop-vectorize -force-vector-width=4 crash.ll
1. Running pass 'ModuleToFunctionPassAdaptor' on module 'crash.ll'
2. Running pass 'LoopVectorizePass' on function '@a'
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D120993
This extends SCEV verification to check not only backedge-taken
counts, but all entries in the IR -> SCEV cache. The restrictions
are the same as for the BECount case, i.e. we ignore expressions
based on undef, we only diagnose constant deltas (there are way
too many false positives otherwise) and we limit to reachable code.
Differential Revision: https://reviews.llvm.org/D121104
Introduce a new attribute "function-inline-cost-multiplier" which
multiplies the inline cost of a call site (or all calls to a callee) by
the multiplier.
When processing the list of calls created by inlining, check each call
to see if the new call's callee is in the same SCC as the original
callee. If so, set the "function-inline-cost-multiplier" attribute of
the new call site to double the original call site's attribute value.
This does not happen when the original call site is intra-SCC.
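For illustration, a sketch of the resulting call site (the value "2"
assumes one doubling of the original attribute value):

  call void @callee() #0
  ...
  attributes #0 = { "function-inline-cost-multiplier"="2" }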
This is an alternative to D120584, which marks the call sites as
noinline.
Hopefully fixes PR45253.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D121084
SCEV verification should no longer affect results of subsequent
queries, and our lit tests as well as llvm-test-suite pass with
SCEV verification enabled, so I think we can enable it by default
under EXPENSIVE_CHECKS now.
Differential Revision: https://reviews.llvm.org/D120708
Currently, we hardly ever actually run SCEV verification, even in
tests with -verify-scev. This is because the NewPM LPM does not
verify SCEV. The reason for this is that SCEV verification can
actually change the result of subsequent SCEV queries, which means
that you see different transformations depending on whether
verification is enabled or not.
To allow verification in the LPM, this limits verification to
BECounts that have actually been cached. It will not calculate
new BECounts.
BackedgeTakenInfo::getExact() is still not entirely readonly,
it still calls getUMinFromMismatchedTypes(). But I hope that this
is not problematic in the same way. (This could be avoided by
performing the umin in the other SCEV instance, but this would
require duplicating some of the code.)
Differential Revision: https://reviews.llvm.org/D120551
When a SCEVUnknown gets RAUWd, we currently drop it from the folding
set, but don't forget memoized values. I believe we should be
treating RAUW the same way as deletion here and invalidate all
caches and dependent expressions.
I don't have any specific cases where this causes issues right now,
but it does address the FIXME in https://reviews.llvm.org/D119488.
Differential Revision: https://reviews.llvm.org/D120033
This ensures the right order in the sink-after map is maintained. If we
re-sink an instruction, it must be sunk after all earlier instructions
have been sunk.
Fixes https://github.com/llvm/llvm-project/issues/54223
Prior to this change LLVM would happily elide a call to any allocation
function and a call to any free function operating on the same unused
pointer. This can cause problems in some obscure cases, for example if
the body of operator::new can be inlined but the body of
operator::delete can't, as in this example from jyknight:
  #include <stdlib.h>
  #include <stdio.h>
  int allocs = 0;
  void *operator new(size_t n) {
    allocs++;
    void *mem = malloc(n);
    if (!mem) abort();
    return mem;
  }
  __attribute__((noinline)) void operator delete(void *mem) noexcept {
    allocs--;
    free(mem);
  }
  void deleteit(int*i) { delete i; }
  int main() {
    int*i = new int;
    deleteit(i);
    if (allocs != 0)
      printf("MEMORY LEAK! allocs: %d\n", allocs);
  }
This patch addresses the issue by introducing the concept of an
allocator function family and uses it to make sure that alloc/free
function pairs are only removed if they're in the same family.
Differential Revision: https://reviews.llvm.org/D117356
Previous and OtherPrev may not be in the same block. Use DT::dominates
instead of local comesBefore. DT::dominates is already used earlier to
check the order of Previous and SinkCandidate.
Fixes https://github.com/llvm/llvm-project/issues/54195
Per discussion on
https://reviews.llvm.org/D59709#inline-1148734, this seems like the
right course of action. `canBeOmittedFromSymbolTable()` subsumes and
generalizes the previous logic. In addition to handling `linkonce_odr`
`unnamed_addr` globals, we now also internalize `linkonce_odr` +
`local_unnamed_addr` constants.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D120173
The similar getICmpCode and getPredForICmpCode are already there.
This moves FP for consistency.
I think InstCombine is currently the only user of both.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120754
This patch extends first-order recurrence handling to support cases
where we already sunk an instruction for a different recurrence, but
LastPrev comes before Previous.
To handle those cases correctly, we need to find the earliest entry for
the sink-after chain, because this is references the Previous from the
original recurrence. This is needed to ensure we use the correct
instruction as sink point.
Depends on D118558.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D118642
Instead of passing an ICmpInst * and a bool, just pass the predicate
from the caller.
I'm considering moving the similar FCmp functions from InstCombine
over here and this makes the interface consistent with what is used
for FCmp.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120609
For unreachable loops, any BECount is legal, and since D98706 SCEV
can make use of this for loops that are unreachable due to constant
branches. To avoid false positives, adjust SCEV verification to only
check BECounts in reachable loops.
Fixes https://github.com/llvm/llvm-project/issues/50523.
Differential Revision: https://reviews.llvm.org/D120651
The change fixes treatment of constrained compare intrinsics if
compared values are of vector type.
Differential revision: https://reviews.llvm.org/D110322
SCEVs ExprValueMap currently tracks not only which IR Values
correspond to a given SCEV expression, but additionally stores that
it may be expanded in the form X+Offset. In theory, this allows
reusing existing IR Values in more cases.
In practice, this doesn't seem to be particularly useful (the test
changes are rather underwhelming) and adds a good bit of complexity.
Per https://github.com/llvm/llvm-project/issues/53905, we have an
invalidation issue with these offset expressions.
Differential Revision: https://reviews.llvm.org/D120311
D118090 causes a pretty significant (19%) regression in some Eigen
benchmarks. Investigating is a bit time consuming as the compilation
unit where this occurs is large. Rather than revert, this patch adds a
flag controlling that behavior (enabled by default).
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find the bound of an array access inside
a loop
Differential Revision: https://reviews.llvm.org/D115873
In D111530, I suggested that we add some relatively basic pattern-matching
folds for shifts and funnel shifts and avoid a more specialized solution
if possible.
We can start by implementing at least one of these in IR because it's
easier to write the code and verify with Alive2:
https://alive2.llvm.org/ce/z/qHpmNn
This will need to be adapted/extended for SDAG to handle the motivating
bug ( #49541 ) because the patterns only appear later with that example
(added some tests: bb850d422b)
This can be extended within InstSimplify to handle cases where we 'and'
with a shift too (in that case, kill the funnel shift).
We could also handle patterns where the shift and funnel shift directions
are inverted, but I think it's better to canonicalize that instead to
avoid pattern-match case explosion.
Differential Revision: https://reviews.llvm.org/D120253
This is the same special logic we apply for SPF signed clamps
when computing the number of sign bits, just for intrinsics.
This just uses the same logic as the select case, but there are
multiple directions in which this could be improved: We could also use
the num sign bits from the clamped value, we could do this during
constant range calculation, and there's probably unsigned analogues
for the constant range case at least.
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.
Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D117580
This patch fixes a logical error in how we work with `LoopUsers` map.
It maps a loop onto a set of AddRecs that depend on it. The AddRecs
are added to this map only once when they are created and put to
the `UniqueSCEVs` map.
The only purpose of this map is to make sure that, whenever we forget
a loop, all (directly or indirectly) dependent SCEVs get forgotten too.
Current code erases SCEVs from dependent set of a given loop whenever
we forget this loop. This is not a correct behavior due to the following scenario:
1. We have a loop `L` and an AddRec `AR` that depends on it;
2. We modify something in the loop, but don't destroy it. We still call forgetLoop on it;
3. `AR` is no longer dependent on `L` according to `LoopUsers`. It is erased from
`ValueExprMap` and `ExprValueMap`, but still exists in `UniqueSCEVs`;
4. We can later request the very same AddRec for the very same loop again, and get existing
SCEV `AR`.
5. Now, `AR` exists and is used again, but its notion that it depends on `L` is lost;
6. Then we decide to delete `L`. `AR` will not be forgotten because we have lost it;
7. Sooner or later we run into a dangling pointer problem, or some other kind of problem,
because an active SCEV is now referencing a non-existent loop.
The solution to this is to stop erasing values from `LoopUsers`. Yes, we may forget
something that is already unused, but it's cheap.
This fixes a functional bug and potentially may have negative compile time impact on methods with
huge or numerous loops.
Differential Revision: https://reviews.llvm.org/D120303
Reviewed By: nikic
This patch fixes an invalid TypeSize->uint64_t implicit conversion in
FoldReinterpretLoadFromConst. If the size of the constant is scalable
we bail out of the optimisation for now.
Tests added here:
Transforms/InstCombine/load-store-forward.ll
Differential Revision: https://reviews.llvm.org/D120240
The problem can be shown from the newly added test case.
There are two invocations to MemorySSAUpdater::moveToPlace, and the
internal data structure VisitedBlocks is changed in the first
invocation, and reused in the second invocation. In between the two
invocations, there is a change to the CFG, and MemorySSAUpdater is
notified about the change.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D119898
Make the condition for restricting re-ordering simpler to read.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120005
The code was using exact sizing only, but since what we really need is just to make sure the offsets are in bounds, a minimum bound on the object size is sufficient.
To demonstrate the difference, support computing minimum sizes from objects of scalable vector type.
Remove some code which tried to handle the case of comparing two allocas where an object size could not be precisely computed. This code had zero coverage in tree, and at least one nasty bug.
The bug comes from the fact that the code uses the size of the result pointer as a proxy for whether the alloca can be of size zero. Since the result of an alloca is *always* a pointer type, and a pointer type can *never* be empty, this check was a nop. As a result, we blindly consider a zero offset from two allocas to never be equal. They can in fact be equal when one or more of the allocas is zero sized.
This is particularly ugly because instcombine contains the exact opposite rule. If instcombine reaches the allocas first, it combines them into one (making them equal). If instsimplify reaches the compare first, it would consider them not equal. This creates all kinds of fun scenarios for order of optimization reaching different and contradictory conclusions.
Our current strategy of computing ranges of SCEVUnknown Phis was to simply
compute the union of ranges of all its inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return the full set for them. As a result,
even the simplest patterns of cycled phis always have a range of the full set.
This patch makes this logic a bit smarter. We basically do the same, but instead
of taking inputs of single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.
Processing entire SCC together has one more advantage: we can set range for all
of them at once, because the only thing that happens to them is the same value is
being passed between those Phis. So, although we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so the
amortized compile time spent should be approximately the same.
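A minimal sketch of such a phi SCC (hypothetical IR): %p1 and %p2 only
pass each other's values around, and the sole input entering the SCC
from outside is %x, so both phis now get range(%x) instead of the full
set:

  header:
    %p1 = phi i32 [ %x, %entry ], [ %p2, %latch ]
    br label %latch
  latch:
    %p2 = phi i32 [ %p1, %header ]
    br label %header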
Differential Revision: https://reviews.llvm.org/D110620
Reviewed By: reames
An atomic store with Release semantics allows re-ordering of unordered load/store before the store.
Implement it.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119844
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
While it is not a very likely scenario, we probably should not expect
that instcombine already dropped such a redundant zext,
but should handle it directly. Moreover, perhaps there was no ZExtInst,
and SCEV somehow managed to pull said zext out of the SCEV expression.
zext(umin(x,y)) == umin(zext(x),zext(y))
zext(x) == 0 -> x == 0
Extra leading zeros do not affect the result of comparison with zero,
nor do they matter for the unsigned min/max,
so we should not be dissuaded when we find a zero-extension,
but instead we should just skip it.
In a prior review I was asked to move the helper function canIgnoreSNaN()
out to FPEnv.h. This wasn't possible at the time because that function
needs the fast math flags, and including them includes lots of other stuff
that isn't needed.
This patch moves the fast math flags out into a new FMF.h file unchanged,
and moves the helper function out to FPEnv.h also unchanged. This ticket
only moves code around.
Differential Revision: https://reviews.llvm.org/D119752
Currently the fsub optimizations in InstSimplify don't know how to fold
X - -0.0 to X when we know X is not zero and the constrained intrinsics
are used. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D119746
This one tries to fix:
https://github.com/llvm/llvm-project/issues/53357.
Simply put, this patch checks (x & y) and ~(x | y) in
haveNoCommonBitsSet. They cannot have common bits (which can be
verified by enumerating the cases), so we can convert
(x & y) + ~(x | y) to (x & y) | ~(x | y). Then the compiler can
handle it in InstCombineAndOrXor.
Furthermore, since ((x & y) + (~x & ~y)) would be converted to
((x & y) + ~(x | y)), this patch fixes that case too.
https://alive2.llvm.org/ce/z/qsKzRS
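A minimal IR sketch (names are hypothetical):

  %and = and i8 %x, %y
  %or = or i8 %x, %y
  %not = xor i8 %or, -1      ; ~(x | y)
  %r = add i8 %and, %not     ; no common bits -> or i8 %and, %not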
Reviewed By: spatel, xbolva00, RKSimon, lebedev.ri
Differential Revision: https://reviews.llvm.org/D118094
A volatile store does not provide any special rules for reordering with
atomics. The usual must-alias analysis is enough here.
This makes the behavior similar to how volatile loads are handled.
Reviewers: reames, nikic
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D119818
A more general enhancement needs to add tests and make sure
that intrinsics that return structs are correct. There are also
target-specific intrinsics, and I'm not sure what behavior is
expected for those.
Instead of doing an inbounds strip first and another non-inbounds
strip afterward for equality comparisons, directly do a single
inbounds or non-inbounds strip based on whether we have an equality
predicate or not.
This is NFC-ish in that the alloca equality codepath is the only
part that sees additional non-inbounds offsets now, and for that
codepath it doesn't matter whether or not the GEP is inbounds, as
it does a stronger check itself. InstCombine would infer inbounds
for such GEPs.
Currently the fsub optimizations in InstSimplify don't know how to fold X
- +0.0 to X when using the constrained intrinsics. This adds the support.
This review is split out from D107285.
Differential Revision: https://reviews.llvm.org/D118928
Fixes a MemCpyOpt miscompile with opaque pointers.
This function can be further cleaned up, but let's just fix the miscompile first.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119652
This mechanism was used for a couple of purposes, but the primary one was keeping track of which predicates in a union might apply to an expression. As these sets are small and aggressively deduped, this has little value.
Even if the search is marked as terminated after only looking at
the first operand, we'd still look at the remaining operands
before actually ending the search.
This seems pointless and wasteful, let's not do that.
Since we don't greedily flatten `umin_seq(a, umin(b, c))` into `umin_seq(a, b, c)`,
just looking at the operands of the outer-level `umin` is not sufficient,
and we need to recurse into all same-typed `umin`'s.
This lets us avoid redundant implication work in the constructor of SCEVUnionPredicate which simplifies an upcoming change. If we're actually building a predicate via PSE, that goes through addPredicate which does include the implication check.
SCCP requires that the load/store type and global type are the
same (it does not support bitcasts of tracked globals). With
typed pointers this was implicitly enforced.
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization, due to a change from checking whether predicates exist to checking whether the predicate is possibly false.
We'd catch the tautological select pattern later anyways
due to constant folding, so that leaves PHI-like select,
but it does not appear to fire there.
Currently `createNodeForSelectOrPHI()` takes an Instruction,
and only works on the Cond that is an ICmpInst,
but that can be relaxed somewhat.
For now, simply rename the existing function,
and add a thin wrapper ontop that still does
the same thing as it used to.
https://alive2.llvm.org/ce/z/ULuZxB
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `add`
bit/element-wise, but the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/aKAr94
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umin`
bit/element-wise, but the expression will be rather large,
so let's not do that for now.
https://alive2.llvm.org/ce/z/SMEaoc
We could transparently handle wider bitwidths,
by effectively casting iN to <N x i1> and performing the `umax`
bit/element-wise, but the expression will be rather large,
so let's not do that for now.
The code was relying upon the implicit conversion of TypeSize to
uint64_t and assuming the type in question was always fixed. However,
I discovered an issue when running the canon-freeze pass with some
IR loops that contains scalable vector types. I've changed the code
to bail out if the size is unknown at compile time, since we cannot
compute whether the step is a multiple of the type size or not.
I added a test here:
Transforms/CanonicalizeFreezeInLoops/phis.ll
Differential Revision: https://reviews.llvm.org/D118696
This is the last major stepping stone before being able to allocate the node via the folding set allocator. That will in turn allow more general SCEV predicate expression trees.
For those curious, the whole reason for tracking the predicate set separately, as opposed to just immediately registering the dependencies, appears to be allowing the printing code to print a result without changing the PSE state. It's slightly questionable whether this justifies the complexity, but since we can preserve it with local ugliness, I did so.
Previously we relied on the pointee type to determine what type we need
to do runtime pointer access checks.
With opaque pointers, we can access a pointer with more than one type,
so now we keep track of all the types we're accessing a pointer's
memory with.
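For example (sketch), the same pointer may now be accessed with several
types, each of which must be tracked for the runtime checks:

  %v0 = load i32, ptr %p
  %v1 = load float, ptr %p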
Also some other minor getPointerElementType() removals.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D119047
PredicatedScalarEvolution has a predicate type for representing A == B. This change generalizes it into something which can represent A <pred> B.
This generality is currently unused, but is motivated by a couple of recent cases which have come up. In particular, I'm currently playing around with using this to simplify the runtime checking code in LoopVectorizer. Regardless of the outcome of that prototyping, generalizing the compare node seemed useful.
D108992 added KnownBits handling for 'Quadratic Reciprocity' self-multiplication patterns (bit[1] == 0), which can be used for non-undef values (poison is OK).
This patch adds noundef self-multiply handling to value tracking so demanded bits patterns can make use of it.
Differential Revision: https://reviews.llvm.org/D117995
Use existing functionality to strip constant offsets that works well
with AS casts and avoids the code duplication.
Since we strip AS casts during the computation of the offset we also
need to adjust the APInt properly to avoid mismatches in the bit width.
This code ensures the caller of `compute` sees APInts that match the
index type size of the value passed to `compute`, not the value result
of the strip pointer cast.
Fixes #53559.
Differential Revision: https://reviews.llvm.org/D118727
This header is very large (3M lines once expanded) and was included in locations
where DWARF-specific information was not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of this patch, downstream users may need to manually include some
extra headers:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, code may be relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h; this hidden
dependency now needs to be made explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
This is in anticipation of my next patch, where I need to store more information about free functions than just their argument count. It felt invasive enough on this function that it seemed worthwhile to just extract this as its own commit that makes no functional changes.
Differential Revision: https://reviews.llvm.org/D117350
The change implements constant folding of ‘llvm.experimental.constrained.fcmp’
and ‘llvm.experimental.constrained.fcmps’ intrinsics.
Differential Revision: https://reviews.llvm.org/D110322
Created to fix: https://github.com/llvm/llvm-project/issues/53537
Some intrinsic functions are considered commutative since they perform operations like addition or multiplication. Some of these take extra parameters that provide information which is not part of the operation itself and is not commutative. This makes sure that an instruction that is an intrinsic takes the non-commutative path to handle this case.
Reviewer: paquette
Closes Issue #53537
Differential Revision: https://reviews.llvm.org/D118807
Adds new optimization remarks when vectorization fails.
More specifically, new remarks are added for the following four cases:
- Backward dependency
- Backward dependency that prevents Store-to-load forwarding
- Forward dependency that prevents Store-to-load forwarding
- Unknown dependency
It is important to note that only one of the sources
of failures (to vectorize) is reported by the remarks.
This source of failure may not be first in program order.
A regression test has been added to test the following cases:
a) Loop can be vectorized: No optimization remark is emitted
b) Loop can not be vectorized: In this case an optimization
remark will be emitted for one source of failure.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D108371
When upgrading a loop of loads/stores to a memcpy, the existing pass does not keep the existing aliasing information. This patch allows the existing aliasing information to be kept.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D108221
Extend scalar evolution to handle >= and <= if a loop is known to be finite and the induction variable guards the condition. Specifically, with these assumptions lhs <= rhs is equivalent to lhs < rhs + 1 and lhs >= rhs to lhs > rhs - 1.
In the case of lhs <= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmax, which would have resulted in an infinite loop.
In the case of lhs >= rhs, this is true since the only case these are not equivalent
is when rhs == unsigned/signed intmin, which would again have resulted in an infinite loop.
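A minimal sketch of the kind of loop this enables (illustrative only, not
from the patch):
```
// If this loop is known finite, n cannot be INT_MAX: i <= n would then
// never become false. Hence i <= n can be treated as i < n + 1 without
// wrap concerns, which SCEV already knows how to analyze.
void f(int n, int *a) {
  for (int i = 0; i <= n; ++i)
    a[i] = i;
}
```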
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D118090
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs and the intrinsic names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds a command line debug flag to disable outlining of intrinsics.
Recommit of: 8de76bd569
Adds extra checking of intrinsic function calls names to avoid taking the address of intrinsic calls when extracting function calls.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
Currently, basic AA has special support for llvm.memcpy.* intrinsics. This change extends this support to any memory transfer operation, in particular the llvm.memmove.* intrinsics.
Reviewed By: reames, nikic
Differential Revision: https://reviews.llvm.org/D117095
This is a bugfix in IVDescriptor.cpp.
The helper function `RecurrenceDescriptor::getExactFPMathInst()`
is supposed to return the 1st FP instruction that does not allow
reordering. However, when constructing the RecurrenceDescriptor,
we trace the use-def chain starting from a PHI node and for each
instruction in the use-def chain, its descriptor overrides the
previous one. Therefore in the final RecurrenceDescriptor we
construct, we lose previous FP instructions that do not allow
reordering.
Reviewed By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D118073
This extracts a common isNotVisibleOnUnwind() helper into
AliasAnalysis, which handles allocas, byval arguments and noalias
calls. After D116998 this could also handle sret arguments. We
have similar logic in DSE and MemCpyOpt, which will be switched
to use this helper as well.
The noalias call case is a bit different from the others, because
it also requires that the object is not captured. The caller is
responsible for doing the appropriate check.
Differential Revision: https://reviews.llvm.org/D117000
We use the same similarity scheme we used for branch instructions for phi nodes, and allow them to be outlined. There is not a lot of special handling needed for these phi nodes when outlining, as they simply act as outputs. The code extractor does not currently allow non-entry blocks within the extracted region to have predecessors, so there are no conflicts to handle with respect to predecessors no longer contained in the function.
Recommit of 515eec3553
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106997
Due to some complications with lifetime, and assume-like intrinsics, intrinsics were not included as outlinable instructions. This patch opens up most intrinsics, excluding lifetime and assume-like intrinsics, to be outlined. For similarity, it is required that the intrinsic IDs and the intrinsic names match exactly, as well as the function type. This puts intrinsics in a different class than normal call instructions (https://reviews.llvm.org/D109448), where the name will no longer have to match.
This also adds a command line debug flag to disable outlining of intrinsics.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109450
The outliner currently requires that function calls not be indirect calls, and that the function name and function type match, as well as other attributes such as calling conventions. This patch treats called functions as values, and thus just another operand, and named function calls as constants. This allows functions to be treated like any other constant, or input and output into the outlined functions.
There are also debugging flags added to enforce the old behaviors, where indirect calls are not allowed and function call names must also match.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D109448
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
This patch adds support for implication inference logic for the
following pattern:
```
lhs < (y >> z) <= y, y <= rhs --> lhs < rhs
```
We should be able to use the fact that a value shifted to the right is
not greater than the original value (provided the original is non-negative).
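A small sanity check of the implication (a sketch, assuming unsigned
values; the helper name is made up):
```
#include <cassert>

// lhs < (y >> z) and y <= rhs together imply lhs < rhs, because
// (y >> z) <= y <= rhs holds for unsigned y.
void checkImplication(unsigned lhs, unsigned y, unsigned z, unsigned rhs) {
  if (z < 32 && lhs < (y >> z) && y <= rhs)
    assert(lhs < rhs);
}
```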
Differential Revision: https://reviews.llvm.org/D116150
Reviewed-By: apilipenko
This matches the actual runtime function more closely.
I considered also renaming both RetainRV/UnsafeClaimRV to end with
"ARV", for AutoreleasedReturnValue, but there's less potential
for confusion there.
Presence of operand bundles changes semantics with respect to ModRef. In particular, the spec says: "From the compiler's perspective, deoptimization operand bundles make the call sites they're attached to at least readonly. They read through all of their pointer typed operands (even if they're not otherwise escaped) and the entire visible heap. Deoptimization operand bundles do not capture their operands except during deoptimization, in which case control will not be returned to the compiled frame". Fix handling of llvm.memcpy.* according to the spec.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D118033
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.
Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.
Differential Revision: https://reviews.llvm.org/D117912
Peculiarly, the necessary code to handle pointers (including the
check for non-integral address spaces) is already in place,
because we were already allowing vectors of pointers here, just
not plain pointers.
The tensorflow AOT compiler can cross-target, but it can't run on (for
example) arm64. We added earlier support where the AOT-ed header and object
would be built on a separate builder and then passed at build time to
a build host where the AOT compiler can't run, but clang can be otherwise
built.
To simplify such scenarios, given that we now support more than one AOT-able
case (regalloc and inliner), we make the AOT scenario centered on whether
files are generated, case by case (this includes the "passed from a
different builder" scenario).
This means we shouldn't need an 'umbrella' LLVM_HAVE_TF_AOT, in favor of
case by case control. A builder can opt out of an AOT case by passing that case's
model path as `none`. Note that the overrides still take precedence.
This patch controls conditional compilation with case-specific flags,
which can be enabled locally, for the component where those are
available. We still keep an overall flag for some tests.
The 'development/training' mode is unchanged, because there the model is
passed from the command line and interpreted.
Differential Revision: https://reviews.llvm.org/D117752
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.
This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.
Differential Revision: https://reviews.llvm.org/D117147
The global state refers to the number of nodes currently in the
module, and the number of direct calls between nodes, across the
module.
Node counts are not a problem; edge counts are because we want strictly
the kind of edges that affect inlining (direct calls), and that is not
easily obtainable without iteration over the whole module.
This patch avoids relying on analysis invalidation because it turned out
to be too aggressive in some cases. It leverages the fact that Node
objects are stable (they do not get deleted while cgscc passes are
run over the module), as well as cgscc pass manager invariants.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D115847
LLVM Programmer’s Manual strongly discourages the use of `std::vector<bool>` and suggests `llvm::BitVector` as a possible replacement.
This patch does just that for llvm.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D117121
Integrate intersection with assumes into getBlockValue(), to ensure
that it is consistently performed.
We were doing it in nearly all places, but for example missed it
for select inputs.
Following up on 1470f94d71 (r63981173):
The result here (probably) depends on endianness. Don't bother
trying to handle this exotic case, just bail out.
Allocation functions should be marked with onlyAccessesInaccessibleMemory (when that is correct for the given function), which is checked elsewhere, so this check is no longer needed.
Differential Revision: https://reviews.llvm.org/D117180
Since we don't merge/expand non-sequential umin exprs into umin_seq exprs,
we may have a umin_seq(umin(umin_seq())) chain, and the innermost umin_seq
can still have duplicate operands.
This doesn't require callers to put the pointer operand and the indices
in a container like a vector when calling the function. This is not
really an issue with the existing callers. But when using it from
IRBuilder the inputs are available as separate pointer value and indices
ArrayRef.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D117038
The reinterpret load code will convert undef values into zero.
Check the uniform value case before it to produce a better result
for all-undef initializers.
However, the uniform value handling will return the uniform value
even if the access is out of bounds, while the reinterpret load
code will return undef. Add an explicit check to retain the
previous result in this case.
The basic idea is that we can parameterize the getObjectSize implementation with a callback which lets us replace the operand before analysis if desired. This is what Attributor is doing during its abstract interpretation, and allows us to have one copy of the code.
Note this is not NFC for two reasons:
* The existing attributor code is wrong. (Well, this is under-specified to be honest, but at least inconsistent.) The intermediate math needs to be done in the index type of the pointer space. Imagine e.g. i64 arguments in a 32 bit address space.
* I did not preserve the behavior in getAPInt where we return 0 for a partially analyzed value. This looks simply wrong in the original code, and nothing test-wise contradicts that.
Differential Revision: https://reviews.llvm.org/D117241
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This led to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).
```
extern void h(int* p);

inline __attribute__((always_inline)) int g() {
  return 0;
}

int __attribute__((__no_stack_protector__)) f() {
  int a[1];
  h(a);
  return g();
}
```
LLVM will inline g() into f(), and f() would get a stack protector, against the
user's explicit wishes, potentially breaking the program, e.g. if h() changes the
value of the stack cookie. That's a miscompile.
More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.
One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.
However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.
This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.
Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):
```
extern void h(int* p);

inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
  return 0;
}

int f() {
  int a[1];
  h(a);
  return g();
}
```
I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.
See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722
Differential revision: https://reviews.llvm.org/D116589
We could use knownbits on both operands for even more folds (and there are
already tests in place for that), but this is enough to recover the example
from:
https://github.com/llvm/llvm-project/issues/51934
(the tests are derived from the code in that example)
I am assuming no noticeable compile-time impact from this because udiv/urem
are rare opcodes.
Differential Revision: https://reviews.llvm.org/D116616
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function, named 'forceScalarizeMaskedGather/Scatter',
to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
This avoids the InlineAdvisor carrying the responsibility of deleting
Function objects. We use LazyCallGraph::Node objects instead, which are
stable in memory for the duration of the module-wide run of CGSCC
passes started under the same ModuleToPostOrderCGSCCPassAdaptor (which
is the case here)
Differential Revision: https://reviews.llvm.org/D116964
This happens in e.g. regalloc, where we trace decisions per function,
but wouldn't want to spew N log files (i.e. one per function). So we
output a key-value association, where the key is an ID for the
sub-module object, and the value is the tensorflow::SequenceExample.
The current relation with protobuf is tenuous, so we're avoiding a
custom message type in favor of using the `Struct` message, but that
requires the values be wire-able strings, hence base64 encoding.
We plan on resolving the protobuf situation shortly, and improve the
encoding of such logs, but this is sufficient for now for setting up
regalloc training.
Differential Revision: https://reviews.llvm.org/D116985
Alternative to D116817.
This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.
This is then used to implement a folder that uses InstructionSimplify.
The logic to simplify `Or` instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116935
Extend the existing malloc-family specific optimization to all noalias calls. This allows us to handle allocation wrappers, and removes a dependency on a lib-func check in favor of generic attribute usage.
Differential Revision: https://reviews.llvm.org/D116980
D92270 updated constant expression folding to fold inbounds GEP to
poison if the base is undef. Apply the same logic to SimplifyGEPInst.
The justification is that we can choose an out-of-bounds pointer as base
pointer.
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D117015
We could just merge all umin into umin_seq, but that is likely
a pessimization, so don't do that, but pretend that we did
for the purpose of deduplication.
Having the same operand more than once doesn't change the outcome here,
neither reduction-wise nor poison-wise.
We must keep the first instance specifically though.
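A sketch of the deduplication rule (illustrative only; the helper is made
up): repeated operands are dropped, but the first occurrence is kept, since
operand order matters for poison propagation in the sequential variant.
```
#include <set>
#include <vector>

// Drop duplicate operands of a umin_seq-like expression, keeping the
// first occurrence of each operand.
std::vector<int> dedupKeepFirst(const std::vector<int> &Ops) {
  std::set<int> Seen;
  std::vector<int> Out;
  for (int Op : Ops)
    if (Seen.insert(Op).second)
      Out.push_back(Op);
  return Out;
}
```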
Two crashes have been reported. This change disables the new logic while leaving the new node in tree. Hopefully, that's enough to allow investigation without breakage while avoiding massive churn.
Not all allocation functions are removable if unused. An example of a non-removable allocation would be a direct call to the replaceable global allocation function in C++. An example of a removable one - at least according to historical practice - would be malloc.
As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.
Instead, we should have a way to express those
poison blocking properties in SCEV expressions.
The proposed semantics is:
```
Sequential/in-order min/max SCEV expressions are non-commutative variants
of commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent, otherwise,
if the operand that represents the saturation point* of the given expression
comes before the first poison operand, then the whole expression is not poison,
but is the said saturation point.
```
* saturation point - the maximal/minimal possible integer value for the given type
The lowering is straight-forward:
```
compare each operand to the saturation point,
perform sequential in-order logical-or (poison-safe!) ordered reduction
over those checks, and if reduction returned true then return
saturation point else return the naive min/max reduction over the operands
```
https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97
That allows us to handle the patterns in question.
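For unsigned min, where the saturation point is 0, the lowering above might
look as follows (a minimal sketch for three operands, not the actual SCEV
expansion code):
```
#include <algorithm>
#include <cstdint>

// umin_seq(A, B, C): compare all operands but the last to the saturation
// point (0 for umin) with a short-circuiting, in-order logical or; on a
// hit return the saturation point, otherwise fall back to the naive min
// reduction over the operands.
uint64_t uminSeq(uint64_t A, uint64_t B, uint64_t C) {
  if (A == 0 || B == 0)
    return 0;
  return std::min({A, B, C});
}
```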
Reviewed By: nikic, reames
Differential Revision: https://reviews.llvm.org/D116766
(Split from original patch to separate non-NFC part and add coverage. I typoed when adding the new test, so this change includes the typo fix to let libfunc recognize the signature. Didn't figure it was worth another separate commit.)
Differential Revision: https://reviews.llvm.org/D116851 (part 2 of 2)
There are a few places where the alignment argument for AlignedAllocLike functions was previously hardcoded. This patch adds a getAllocAlignment function and a change to the MemoryBuiltin table to allow alignment arguments to be found generically.
This will shortly allow alignment inference on operator new's with align_val params and an extension to Attributor's HeapToStack. The former will follow shortly - I split Bryce's patch for the purpose of having the large change be NFC. The latter will be reviewed separately.
Differential Revision: https://reviews.llvm.org/D116851 (part 1 of 2)
We currently have two similar implementations of this concept:
isNoAliasCall() only checks for the noalias return attribute.
isNoAliasFn() also checks for allocation functions.
We should switch to only checking the attribute. SLC is responsible
for inferring the noalias return attribute for non-new allocation
functions (with a missing case fixed in
348bc76e35).
For new, clang is responsible for setting the attribute,
if -fno-assume-sane-operator-new is not passed.
Differential Revision: https://reviews.llvm.org/D116800
We're testing that the RegionLoop pointer is null in the first part of the check, so we need to check that it's non-null before dereferencing it in a later part of the check.
strdup/strndup are already partially implemented, move remaining comment to relevant place. Remaining named routines are copy routines and mostly handled via intrinsics already - they do not allocate new memory.
This is in preparation for D115545 which attempts to delete discardable functions if they are unused. With that change, shifting RefSCCs becomes noticeable in compile time. This change makes the LCG update negligible again.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D116776
This is a recurring pattern, we can consolidate three copies into one. The main motivation is to reduce usages of isMallocLike.
The original commit (which was quickly reverted) didn't account for the fact that the allocation function could be an invoke; test coverage for that case is added in this commit.
This patch adds a couple of NewPM function passes (dot-dom and
dot-dom-only) that dump DomTree into .dot files.
Reviewed-By: aeubanks
Differential Revision: https://reviews.llvm.org/D116629
This reverts commit 640beb38e7.
That commit caused performance degradation in the Quicksilver test QS:sGPU and a functional test failure in rocPRIM (rocprim.device_segmented_radix_sort).
Reverting until we have a better solution to s_cselect_b64 codegen cleanup
Change-Id: Ibf8e397df94001f248fba609f072088a46abae08
Reviewed By: kzhuravl
Differential Revision: https://reviews.llvm.org/D115960
Change-Id: Id169459ce4dfffa857d5645a0af50b0063ce1105
The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consistent with onlyReadsMemory, which we use for the corresponding readonly case as well.
In particular, this also preserves undef when loading from padding,
rather than converting it to zero through a different codepath.
This is the remaining part of D115924.
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.
We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.
This is part of D115924.
This was noted in post-commit review for D116322 / 0edf99950e .
I am not seeing how to expose the bug in a test though because
we don't pass an assumption cache into this analysis from there.
This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them.
Motivation comes from compiling a simple example with -ftrapv.
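For reference, a sketch of the kind of source that motivates this (with
-ftrapv, signed arithmetic is emitted via overflow intrinsics plus a trap):
```
// With -ftrapv, the i++ below is lowered using llvm.sadd.with.overflow
// followed by a branch to a trap. Proving the loop's trip count lets
// SCEV-based logic discharge that overflow check.
void fill(int n, int *a) {
  for (int i = 0; i < n; ++i)
    a[i] = 0;
}
```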
Differential Revision: https://reviews.llvm.org/D116499
0a00d64 turned an early exit here into an assertion, but the assertion
can be triggered, as PR52920 shows.
The later code is agnostic to the accessed type, so just drop the
assert. The patch also adds tests for LAA directly and
loop-load-elimination to show the behavior is sane.
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,
```
float s = 0;
for (int i = 0; i < N; ++i)
  s += (float) i*i;
```
which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.
In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.
Differential Revision: https://reviews.llvm.org/D113973
This function returns an upper bound on the number of bits needed
to represent the signed value. Use "Max" to match similar functions
in KnownBits like countMaxActiveBits.
Rename APInt::getMinSignedBits->getSignificantBits. Keeping the old
name around to keep this patch size down. Will do a bulk rename as
follow up.
Rename KnownBits::countMaxSignedBits->countMaxSignificantBits.
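A quick illustration of the renamed query (a sketch, relying on the usual
APInt semantics, where the result is BitWidth - NumSignBits + 1):
```
#include "llvm/ADT/APInt.h"
using llvm::APInt;

void significantBitsExamples() {
  // All 32 bits of -1 are sign bits, so a single bit suffices.
  APInt MinusOne(32, -1, /*isSigned=*/true);
  unsigned A = MinusOne.getSignificantBits(); // 1

  // The value 1 needs one value bit plus the sign bit.
  APInt One(32, 1);
  unsigned B = One.getSignificantBits(); // 2
  (void)A;
  (void)B;
}
```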
Reviewed By: lebedev.ri, RKSimon, spatel
Differential Revision: https://reviews.llvm.org/D116522
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
We can fold an equality or unsigned icmp between base+offset1 and
base+offset2 with inbounds offsets by comparing the offsets directly.
This replaces a pair of specialized folds that tried to reason
based on the GEP structure instead. One of those folds was plain
wrong (because it does not account for negative offsets), while
the other is unnecessarily complicated and limited (e.g. it will
fail with bitcasts involved).
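In source terms, the fold amounts to something like the following
illustrative C analogue (not the InstCombine code itself):
```
// With a common base and inbounds offsets, the pointer comparison can
// be decided by comparing the offsets directly.
bool samePlace(int *base, long i, long j) {
  return &base[i] == &base[j]; // folds to i == j
}
```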
The disadvantage of this change is that it requires data layout,
so the fold is no longer performed by datalayout-independent
constant folding. I don't think this is a loss in practice, but
it does regress the ConstantExprFold.ll test, which checks folding
without running any passes.
Differential Revision: https://reviews.llvm.org/D116332
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.
This patch is modeled on a similar construct that was added with
D59386.
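In ConstantRange terms, the idea is roughly the following (a sketch under
assumed ConstantRange/Operator APIs, not the patch's code):
```
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// With both nuw and nsw present, the result range is the intersection of
// the two no-wrap regions, at least as tight as either flag alone.
ConstantRange addWithBothFlags(const ConstantRange &L,
                               const ConstantRange &R) {
  return L.addWithNoWrap(R, OverflowingBinaryOperator::NoSignedWrap |
                                OverflowingBinaryOperator::NoUnsignedWrap);
}
```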
I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).
InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.
Fixes#52884
Differential Revision: https://reviews.llvm.org/D116322
Remove the special casing for intrinsics in MemoryLocation::getForDest()
and handle them through the general attribute based code. On the DSE
side, this means that isRemovable() now needs to handle more than a
hardcoded list of intrinsics. We consider everything apart from
volatile memory intrinsics and lifetime markers to be removable.
This allows us to perform DSE on intrinsics that DSE has not been
specially taught about, using a matrix store as an example here.
There is an interesting test change for invariant.start, but I
believe that optimization is correct. It only looks a bit odd
because the code is immediate UB anyway.
Differential Revision: https://reviews.llvm.org/D116210
Adding the following fold opportunity:
((A | B) ^ A) & ((A | B) ^ B) --> 0
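The identity is easy to verify exhaustively for small bit widths (a
standalone check, not part of the patch):
```
#include <cassert>
#include <cstdint>

int main() {
  // (A | B) ^ A == B & ~A and (A | B) ^ B == A & ~B; the two results
  // share no bits, so their 'and' is always 0.
  for (unsigned I = 0; I < 256; ++I)
    for (unsigned J = 0; J < 256; ++J) {
      uint8_t A = I, B = J;
      assert((uint8_t)(((A | B) ^ A) & ((A | B) ^ B)) == 0);
    }
  return 0;
}
```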
Reviewed By: spatel, rampitec
Differential Revision: https://reviews.llvm.org/D115755
When looking at building the generator for regalloc, we realized we'd
need quite a bit of custom logic, and that perhaps it'd be easier to
just have each usecase (each kind of mlgo policy) have its own
stand-alone test generator.
This patch just consolidates the old `config.py` and
`generate_mock_model.py` into one file, and does away with
subdirectories under Analysis/models.
As reames mentioned on related reviews, we don't need the nocapture
requirement here. First of all, from an API perspective, this is
not something that MemoryLocation::getForDest() should be checking
in the first place, because it does not affect which memory this
particular call can access; it's an orthogonal concern that should
be handled by the caller if necessary.
However, for both of the motivating users in DSE and InstCombine,
we don't need the nocapture requirement, because the capture can
either be purely local to the call (a pointer identity check that
is irrelevant to us), be part of the return value (which we check
is unused), or be written in the dest location, which we have
determined to be dead.
This allows us to remove the special handling for libcalls as well.
Differential Revision: https://reviews.llvm.org/D116148
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.
This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.
This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).
Differential Revision: https://reviews.llvm.org/D116031
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
This is a reapply of a8a51fe5, which was reverted in 1ba99e due to a failing compiler-rt test. That test was a false positive because it was checking asan failures without accounting for the fact that the call could be validly optimized out. I hopefully managed to stabilize that test in 9b955f. (That's a speculative fix, as the disk consumption needed to build compiler-rt tests locally is absurd.)
Original commit message follows..
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.
Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).
In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.
Differential Revision: https://reviews.llvm.org/D115904
The availability of SVE should be sufficient to enable scalable
auto-vectorization.
This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.
Differential Revision: https://reviews.llvm.org/D115651
This reverts commit 9fd4f80e33.
This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
Before this change, AAResults::getModRefInfo() was missing a case for
callbr instructions (asm goto), which may read/write memory. In PR52735,
this led to a miscompile where a load was incorrectly eliminated.
Add this missing case, as well as an assert verifying that all
memory-accessing instructions are handled properly.
Fixes#52735.
Differential Revision: https://reviews.llvm.org/D115992
The majority of this change is sinking logic from instcombine into MemoryLocation such that it can be generically reused. If we have a call with a single analyzable write to an argument, we can treat that as-if it were a store of unknown size.
Merging the code in this way unblocks DSE in the store to dead memory code paths. In theory, it should also enable classic DSE of such calls, but the code appears to not know how to use object sizes to refine unknown access bounds (yet).
In addition, this does make the isAllocRemovable path slightly stronger by reusing the libfunc and additional intrinsics bits which are already in getForDest.
Differential Revision: https://reviews.llvm.org/D115904
Pull out an explicit check rather than relying on the fact that the callee operand is not a data operand. The only real value is it gives us a clear place to move the comment, and makes the code slightly more understandable.
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.
We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.
Differential Revision: https://reviews.llvm.org/D115924
Preserve the invariant that the offset reported in the case of a
`PartialAlias` between `Loc1` and `Loc2` is such that
`Loc1 + Offset = Loc2`, where `Loc1` and `Loc2` are the first and
the second argument, respectively, in alias queries.
Differential Revision: https://reviews.llvm.org/D115927
This patch updates applyLoopGuards to first collect all conditions and
then applies them in reverse order. This ensures the SCEVs with the
shortest dependency chains are constructed first, limiting the required
stack size.
This fixes a crash reported in D113578.
Note that the order conditions are applied can impact the accuracy of
the result, mostly due to missing min/max simplifications when
constructing SCEVs.
The changed test highlights the impact of the evaluation order. I will
follow up with a SCEV patch to improve min/max simplifications to get
the same results for both orders.
After the switch to the new pass manager, we have observed multiple
instances of catastrophic inlining, where the inliner produces huge
functions with many hundreds of thousands of instructions from small
input IR. We were forced to back out the switch to the new pass
manager for this reason. This patch fixes at least one of the root
cause issues.
LLVM uses a bottom-up inliner, and the fact that functions are processed
bottom-up is not just a question of optimality -- it is an important
requirement to prevent runaway inlining. The premise of the current
inlining approach and cost model is that after all calls inside a function
have been inlined, it may get large enough that inlining it into its
callers is no longer considered profitable. This safeguard does not
exist if inlining doesn't happen bottom-up, as inlining the callees,
and their callees, and their callees etc. will always seem individually
profitable, and the inliner can easily flatten the whole call tree.
There are instances where we necessarily have to deviate from bottom-up
inlining: When inlining in an SCC there is no natural "bottom", so
inlining effectively happens top-down. This requires special care,
and the inliner avoids exponential blowup by ensuring that functions
in the SCC grow in a balanced way and will eventually hit the threshold.
However, there is one instance where the inlining advisor explicitly
violates the bottom-up principle: Deferred inlining tries to "defer"
inlining a call if it determines that inlining the caller into all
its call-sites would be more profitable. Something very important to
understand about deferred inlining is that it doesn't make one inlining
choice in place of another -- it effectively chooses to do both. If we
have a call chain A -> B -> C and cost modelling tells us that inlining
B -> C is profitable, but we defer this and instead inline A -> B first,
then we'll now have a call A -> C, and the cost model will (a few special
cases notwithstanding) still tell us that this is profitable. So the end
result is that we inlined *both* B and C, even though under the usual
cost model function B would have been too large to further inline after
C has been integrated into it.
Because deferred inlining violates the bottom-up invariant of the inliner,
it can result in exponential inlining. The exponential-deferred-inlining.ll
test case illustrates this on a simple example (see
https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a
much more catastrophic case with about 5000x size blowup). If the call
chain A -> B -> C is not a chain but a tree of calls, then we end up
deferring inlining across the tree and end up flattening everything into
the root node.
This patch proposes to address this by disabling deferred inlining
entirely (currently still behind an option). Beyond the issue of
exponential inlining, I don't think that the whole concept makes sense,
at least as long as deferred inlining still ends up inlining both call
edges.
I believe the motivation for having deferred inlining in the first place
is that you might have a small wrapper function with local linkage that
could be eliminated if inlined. This would automatically happen if there
was a single caller, due to the large "last call to local" bonus. However,
this bonus is not extended if there are multiple callers, even if we
would eventually end up inlining into all of them (if the bonus were
extended).
Now, unlike the normal inlining cost model, the deferred inlining cost
model does look at all callers, and will extend the "last call to local"
bonus if it determines that we could inline all of them as long as we
defer the current inlining decision. This makes very little sense.
The "last call to local" bonus doesn't really cost model anything.
It's basically an "infinite" bonus that ensures we always inline the
last call to a local. The fact that it's not literally infinite just
prevents inlining of huge functions, which can easily result in
scalability issues. I very much doubt that it was an intentional
cost-modelling choice to say that getting rid of a small local function
is worth adding 15000 instructions elsewhere, yet this is exactly how
this value is getting used here.
The main alternative I see to complete removal is to change deferred
inlining to an actual either/or decision. That is, to mark deferred
calls as noinline so we're actually trading off one inlining decision
against another, and not just adding a side-channel to the cost model
to do both.
Apart from fixing the catastrophic inlining case, the effect on rustc
is a modest compile-time improvement on average (up to 8% for a
parsing-type crate, where tree-like calls are expected) and pretty
neutral where run-time performance is concerned (mix of small wins
and losses, usually in the sub-1% category).
Differential Revision: https://reviews.llvm.org/D115497
A well-formed IR function definition must have an entry basic block, and
a well-formed IR basic block must have exactly one terminator, so the
emptiness check can be simplified.
Also simplify the test a bit.
Reviewed By: luna
Differential Revision: https://reviews.llvm.org/D115780
These are deprecated and should be replaced with getAlign().
Some of these asserts don't do anything because Load/Store/AllocaInst never have a 0 align value.
These flags are documented as generating poison values for particular input values. As such, we should really be consistent about their handling with how we handle nsw/nuw/exact/inbounds.
Differential Revision: https://reviews.llvm.org/D115460
This reverts commit ac60263ad1.
It looks like the test fails on certain non-Darwin system, even though
the triple is explicitly set to macos. Revert while I investigate.
memset_pattern{4,8,16} writes to the first argument. Use getForDest
to return the corresponding MemoryLocation.
Reviewed By: ab
Differential Revision: https://reviews.llvm.org/D114906
Usually the case where the types are the same ends up being handled
fine because it's legal to do a trivial bitcast to the same type.
However, this is not true for aggregate types. Short-circuit the
whole code if the types match exactly to account for this.
Reverts 02940d6d22. Fixes breakage in the modules build.
LLVM loops cannot represent irreducible structures in the CFG. This
change introduces the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instantiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696
Refer to https://llvm.org/PR52546.
Simplifies the following cases:
not(X) == 0 -> X != 0 -> X
not(X) <=u 0 -> X >u 0 -> X
not(X) >=s 0 -> X <s 0 -> X
not(X) != 1 -> X == 1 -> X
not(X) <=u 1 -> X >=u 1 -> X
not(X) >s 1 -> X <=s -1 -> X
Differential Revision: https://reviews.llvm.org/D114666
This prepares it for the regalloc work. Part of it is making model
evaluation across 'development' and 'release' scenarios more reusable.
This patch:
- extends support to tensors of any shape (not just scalars, like we had
in the inliner -Oz case). While the tensor shape can be anything, we
assume row-major layout and expose the tensor as a buffer.
- exposes the NoInferenceModelRunner, which we use in the 'development'
mode to keep the evaluation code path consistent and simplify logging,
as we'll want to reuse it in the regalloc case.
Differential Revision: https://reviews.llvm.org/D115306
memset_pattern{4,8} behave as memset_pattern16, with the only difference
being the size of the pattern location.
Reviewed By: ab
Differential Revision: https://reviews.llvm.org/D114905
In the isDependence function the code does not try hard enough
to determine the dependence between types. If the types are
different it simply gives up, whereas in fact what we really
care about are the type sizes. I've changed the code to compare
sizes instead of types.
Reviewed By: fhahn, sdesmalen
Differential Revision: https://reviews.llvm.org/D108763
LLVM loops cannot represent irreducible structures in the CFG. This
change introduces the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.
The cycle analysis is implemented as a generic template and then
instantiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.
This review is a restart of an older review request:
https://reviews.llvm.org/D83094
Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>
Differential Revision: https://reviews.llvm.org/D112696