+/-0 is obviously foldable. Other non-special, non-subnormal
values are also probably OK. For denormal values, check
the calling function's denormal mode. For now, don't fold
denormals to the input in IEEE mode because, as far as I know,
the LangRef is still pretending LLVM's float isn't IEEE.
Also folds undef to 0, although NaN may make more sense. Skips
folding NaNs and infinities, although it should be OK to fold those
in a future change.
In practice this means that we can speculate more loads in SROA.
This comes up, for example, in https://godbolt.org/z/G8716s6sj,
although we are missing the second half of the puzzle to optimize that case.
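A rough sketch of the folding decision described above, assuming the usual APFloat/DenormalMode/Function APIs; the helper name and exact handling are illustrative, not the actual ConstantFolding change:

```cpp
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/FloatingPointMode.h"
#include "llvm/IR/Function.h"
#include <optional>

// Illustrative helper: what could llvm.canonicalize(C) fold to for a
// constant operand C, given the calling function's denormal mode?
static std::optional<llvm::APFloat>
foldCanonicalizeSketch(const llvm::Function &F, const llvm::APFloat &Val) {
  using llvm::APFloat;
  using llvm::DenormalMode;
  if (Val.isNaN() || Val.isInfinity())
    return std::nullopt; // Skipped for now, per the text above.
  if (!Val.isDenormal())
    return Val; // +/-0 and other non-special, non-subnormal values fold to themselves.
  DenormalMode Mode = F.getDenormalMode(Val.getSemantics());
  if (Mode.Input == DenormalMode::PreserveSign)
    return APFloat::getZero(Val.getSemantics(), /*Negative=*/Val.isNegative());
  if (Mode.Input == DenormalMode::PositiveZero)
    return APFloat::getZero(Val.getSemantics(), /*Negative=*/false);
  return std::nullopt; // IEEE input mode: don't fold denormals for now.
}
```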
A target can report whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it as it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but also not slower than before.
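A minimal sketch of the interface change, with all other parameters omitted and an invented example target (the real overloads also take the value type, address space, alignment and memory-operand flags):

```cpp
// Invented example target, showing only the out-parameter that changes.
struct ExampleTargetLowering {
  // Before this patch the out-parameter was a bool; now it is an unsigned
  // "speed". The direct translation of existing targets writes 0 for the old
  // false ("slow") and 1 for the old true ("fast"), keeping the change NFC.
  bool allowsMisalignedMemoryAccesses(unsigned *Fast = nullptr) const {
    if (Fast)
      *Fast = 1; // this example target treats misaligned accesses as fast
    return true; // and they are legal
  }
};

// Sketch of the intended later use in the load/store vectorizer: compare the
// speed of the vectorized access against the scalar one, not just "is fast".
inline bool vectorAccessNotSlower(unsigned ScalarSpeed, unsigned VectorSpeed) {
  return VectorSpeed >= ScalarSpeed;
}
```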
Differential Revision: https://reviews.llvm.org/D124217
If globals-aa is enabled, then because of the deletion of memory instructions, there
may be a call instruction that is not in ModOrRefSet but is a MemoryUseOrDef.
This causes a crash in the process of cloning uses and defs.
Fixes https://github.com/llvm/llvm-project/issues/58719
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D137553
After D138014 we do not support using AST with IR that is being
mutated. As such, we also no longer need to track unknown
instructions using WeakVH. Replace with AssertingVH to make sure
that they are not invalidated.
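A toy contrast between the two handle types (this is not the AliasSetTracker code, just the generic ValueHandle behavior the change relies on):

```cpp
#include "llvm/IR/Instruction.h"
#include "llvm/IR/ValueHandle.h"

void trackSketch(llvm::Instruction *I) {
  // A WeakVH silently becomes null if I is deleted...
  llvm::WeakVH Weak(I);
  // ...while an AssertingVH asserts (in builds with assertions) if I is
  // deleted while still tracked, enforcing the "IR does not change" contract.
  llvm::AssertingVH<llvm::Instruction> Hard(I);
  (void)Weak;
  (void)Hard;
}
```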
This restricts usage of AliasSetTracker to IR that does not change.
We used to use it during LICM where the underlying IR could change,
but remaining uses all use AST as part of a separate analysis phase.
This is split out from D137955, which makes use of the new guarantee
to switch to BatchAA.
Differential Revision: https://reviews.llvm.org/D138014
Implements ThinLTO summary support for memprof-related metadata.
This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.
To reduce space in both the bitcode format and the in-memory index,
we do two things (see the sketch after this list):
1. We keep a single vector of all unique stack id hashes, and record the
index into this vector in the callsite and allocation memprof
summaries.
2. When building the combined index during the LTO link, the callsite
and allocation memprof summaries are only kept on the FunctionSummary
of the prevailing copy.
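A rough sketch of the layout described in the list above; the struct and field names are invented for illustration and are not the actual ModuleSummaryIndex members:

```cpp
#include <cstdint>
#include <vector>

// All unique stack id hashes are stored once, in a single table.
struct IndexSketch {
  std::vector<uint64_t> StackIds;
};

// Callsite and allocation memprof summaries refer to that table by index
// instead of repeating the 64-bit hashes.
struct CallsiteSummarySketch {
  std::vector<unsigned> StackIdIndices; // indices into IndexSketch::StackIds
};
```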
Differential Revision: https://reviews.llvm.org/D135714
This reverts commit bf8381a8bc.
There is a layering violation: LLVMAnalysis depends on LLVMCore, so
LLVMCore should not include the LLVMAnalysis header
llvm/Analysis/ModuleSummaryAnalysis.h
Enable using -module-summary with -S
(similar to what can currently be achieved with opt <input> -o - | llvm-dis).
This is a recommit of ef9e62469.
Test plan: ninja check-all
Differential revision: https://reviews.llvm.org/D137768
In `MLInlineAdvisor::getAdviceImpl`, we call `getCachedFPI` twice, once
for the caller, once for the callee, so the second may invalidate the
reference obtained by the first because the underlying implementation of
the cache is a `DenseMap`. `std::map` doesn't have that problem.
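A minimal illustration of the hazard, using a generic DenseMap rather than the advisor's actual cache:

```cpp
#include "llvm/ADT/DenseMap.h"

void invalidationHazard() {
  llvm::DenseMap<int, int> Cache;
  const int &CallerEntry = Cache[1]; // first lookup/insert ("caller")
  const int &CalleeEntry = Cache[2]; // second insert ("callee") may grow the
                                     // table and invalidate CallerEntry
  (void)CalleeEntry;
  // Reading CallerEntry here is unsafe if the second insert caused a rehash;
  // std::map, by contrast, never invalidates references on insert.
  (void)CallerEntry;
}
```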
For min and max reduction idioms, the identity (i.e. neutral) element
should be the datatype's highest and lowest possible value, respectively.
The current implementation in IVDescriptors incorrectly returns -Inf for FMin
reduction and +Inf for FMax reduction. This patch fixes this bug, which
was causing incorrect reduction computation results in loops vectorized
by LV.
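A small worked example of why the identity matters, assuming plain IEEE floats with no fast-math flags (for a max reduction, -Inf plays the corresponding role):

```cpp
#include <cmath>
#include <limits>
#include <vector>

float fminReduce(const std::vector<float> &V) {
  // Seeding with the datatype's highest value leaves the result unchanged;
  // seeding with -Inf (the old, incorrect FMin identity) would make every
  // such reduction return -Inf regardless of the input.
  float Acc = std::numeric_limits<float>::infinity();
  for (float X : V)
    Acc = std::fmin(Acc, X);
  return Acc;
}
```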
Differential Revision: https://reviews.llvm.org/D137220
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList), on the other hand, needs to switch to working with
the memory attribute instead.
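A small sketch of that transparent mapping, assuming the existing Function helpers; the underlying MemoryEffects plumbing is omitted:

```cpp
#include "llvm/IR/Function.h"

void markNoMemoryAccess(llvm::Function &F) {
  // The high-level setter keeps working; it now emits the memory attribute
  // (memory(none)) rather than the removed function-level readnone.
  F.setDoesNotAccessMemory();
  // The high-level query likewise reads the memory attribute.
  bool NoAccess = F.doesNotAccessMemory();
  (void)NoAccess;
}
```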
Differential Revision: https://reviews.llvm.org/D135780
InstCombine does some basic store-to-load forwarding. One case it
currently misses is where the store is actually a memset.
This patch adds support for this case. This is a minimal
implementation that only handles a load at the memset base address,
without an offset.
GVN is already capable of performing this optimization. Having it
in InstCombine can help with phase ordering issues, similar to the
existing store-to-load forwarding.
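The new fold expressed in C++ terms for illustration (the actual transform operates on IR): a load at the memset's base address is replaced by the byte-splatted fill value:

```cpp
#include <cstdint>
#include <cstring>

uint32_t beforeFold(uint32_t *P) {
  std::memset(P, 0xAB, 16);
  return *P; // load at the memset base address, no offset
}

uint32_t afterFold(uint32_t *P) {
  std::memset(P, 0xAB, 16);
  return 0xABABABABu; // forwarded: each of the four loaded bytes is 0xAB
}
```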
Differential Revision: https://reviews.llvm.org/D137323
CVP currently only tries to simplify comparisons if there is a
constant operand. However, even if both are non-constant, we may
be able to determine the result of the comparison based on range
information.
IPSCCP is already capable of doing this, but because it runs very
early, it may miss some cases.
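An example of the kind of comparison this enables, written in C++ for illustration: neither operand is a constant, but their value ranges cannot overlap, so the result is known:

```cpp
bool alwaysTrueCompare(unsigned X, unsigned Y) {
  unsigned A = X % 10;        // A is in the range [0, 9]
  unsigned B = (Y % 10) + 10; // B is in the range [10, 19]
  return A < B;               // provable from the ranges alone: always true
}
```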
Differential Revision: https://reviews.llvm.org/D137253
Fix typo pointed out by Roman Divacky.
There should be no functional change, as the rest of the code will
return nullptr for undef anyway. The condition is just there for
clarity.
Use DL-aware ConstantFoldCompareInstOperands() API instead of
ConstantExpr API. The practical effect of this is that SCCP can
now fold comparisons that require DL.
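An illustrative call to the DL-aware API (simplified; the real function also takes further optional arguments such as TargetLibraryInfo):

```cpp
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/InstrTypes.h"

llvm::Constant *foldCmpSketch(llvm::CmpInst::Predicate Pred, llvm::Constant *L,
                              llvm::Constant *R, const llvm::DataLayout &DL) {
  // Unlike the ConstantExpr interface, this fold can consult the DataLayout,
  // e.g. for comparisons that depend on pointer size.
  return llvm::ConstantFoldCompareInstOperands(Pred, L, R, DL);
}
```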
Having all these instruction-specific overloads does not seem to
provide any compile-time benefit, so drop them in favor of the
generic methods accepting "const Instruction *". Only leave behind
the per-instruction AAQI overloads, which are part of the internal
implementation.
It was pointed out the verifier rejects inttoptr and ptrtoint casts with inputs and outputs whose scalability doesn't match. As such, checking the input type separately from the type of the cast itself is redundant.
This is a follow-up to D136470 which extends the same scheme used there to ComputeNumSignBits and isKnownNonZero. As a reminder, for scalable vectors we track a single bit which is implicitly broadcast to all lanes. We do not know how many lanes there are statically, and thus have to be conservative along paths which require exact sizes.
Differential Revision: https://reviews.llvm.org/D137046
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.
It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.
The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.
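A sketch of how such a mask is applied, using the ModRefInfo enum and its & operator (the helper itself is invented for illustration, and the header location may vary between LLVM versions):

```cpp
#include "llvm/Support/ModRef.h"

llvm::ModRefInfo applyInvarianceMask(llvm::ModRefInfo MRI,
                                     bool GloballyInvariant,
                                     bool LocallyInvariant) {
  using llvm::ModRefInfo;
  ModRefInfo Mask = ModRefInfo::ModRef; // no extra knowledge: no restriction
  if (GloballyInvariant)
    Mask = ModRefInfo::NoModRef;        // nothing can mod or ref this memory
  else if (LocallyInvariant)
    Mask = ModRefInfo::Ref;             // reads are possible, writes are not
  return MRI & Mask;                    // combine with the & operator
}
```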
Differential Revision: https://reviews.llvm.org/D136659
programUndefinedIfUndefOrPoison used to eagerly propagate the fact that
a value is poison to the users of the value. The problem is that if the
value has a lot of uses (orders of magnitude more than the scanning
limit we use in this function), then we spend the bulk of our time in
eagerly propagating the poison property, which we will mostly never use
later anyway due to the scanning limit.
I have a test case (of ~50k lines of machine-generated C++) where this
results in ~60% of a 35s compilation time being spent doing just this
eager propagation.
This patch changes programUndefinedIfUndefOrPoison to only propagate to
instructions actually visited, looking back to see if their operands are
poison. This should be equivalent and no functional change is intended,
but we regain virtually all of the 60% compilation time spent in this
function in my test case (i.e.: a 2.5x total compilation speedup).
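A toy sketch of the lazy direction described above, with invented types: instead of pushing the poison property to every user up front, only the operands of instructions actually visited are consulted:

```cpp
#include <unordered_set>
#include <vector>

struct ToyInst {
  std::vector<const ToyInst *> Operands;
};

// True if I is poison because one of its operands is already known to be
// poison; instructions are added to KnownPoison only as they are visited,
// so the work stays bounded by the scanning limit.
bool operandIsPoison(const std::unordered_set<const ToyInst *> &KnownPoison,
                     const ToyInst &I) {
  for (const ToyInst *Op : I.Operands)
    if (KnownPoison.count(Op))
      return true;
  return false;
}
```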
Differential Revision: https://reviews.llvm.org/D137027
Rather than switching to a new AAQI instance with empty cache when
MayBeCrossIteration is toggled, include the value in the cache key.
The implementation redundantly includes the information in both sides
of the pair, but that seems simpler than trying to store it only on
one side.
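A toy illustration of the keying change, with invented types rather than the real AAQueryInfo definitions: the flag becomes part of the key on each side of the queried pair instead of forcing a fresh cache:

```cpp
#include <map>
#include <utility>

// (pointer, MayBeCrossIteration) forms one side of the cache key.
using LocKey = std::pair<const void *, bool>;

// The alias cache is keyed on both locations, so the flag is redundantly
// present on both sides of the pair, as noted above.
using AliasCacheSketch = std::map<std::pair<LocKey, LocKey>, int>;
```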
Differential Revision: https://reviews.llvm.org/D136175
We have similar code to translate a demanded elements mask for a shuffle's operands in multiple places - this patch adds a helper function to VectorUtils and updates a number of locations to use it directly.
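A conceptual version of such a helper (simplified; the real VectorUtils helper works on APInt element masks and handles more shuffle cases): split the demanded elements of a shuffle result into demanded elements for each source operand:

```cpp
#include <vector>

void splitShuffleDemandedElts(const std::vector<int> &ShuffleMask,
                              const std::vector<bool> &DemandedResult,
                              unsigned SrcNumElts,
                              std::vector<bool> &DemandedLHS,
                              std::vector<bool> &DemandedRHS) {
  DemandedLHS.assign(SrcNumElts, false);
  DemandedRHS.assign(SrcNumElts, false);
  for (unsigned I = 0; I != ShuffleMask.size(); ++I) {
    if (!DemandedResult[I] || ShuffleMask[I] < 0)
      continue; // result lane not demanded, or undef mask element
    unsigned Idx = static_cast<unsigned>(ShuffleMask[I]);
    if (Idx < SrcNumElts)
      DemandedLHS[Idx] = true;              // lane comes from the first operand
    else
      DemandedRHS[Idx - SrcNumElts] = true; // lane comes from the second operand
  }
}
```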
Differential Revision: https://reviews.llvm.org/D136832
This extends the computeKnownBits analysis to support scalable vectors. The critical detail is in deciding how to represent the demanded elements of a vector whose length is unknown at compile time.
For this patch, I adopt the convention that we track one bit which corresponds to all lanes. That is, that bit is implicitly broadcast to all lanes of the scalable vector resulting in all lanes being demanded. This is the same convention we use in getSplatValue in SelectionDAG.
Note that this convention doesn't actually impact much. Most of the code is agnostic to the interpretation of the demanded elements, and the few cases which actually care need case-by-case handling anyway. In this patch, I just bail out of those cases.
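A small sketch of the convention, using the standard APInt API; the helper is illustrative rather than the exact ValueTracking code:

```cpp
#include "llvm/ADT/APInt.h"

llvm::APInt demandedEltsSketch(bool IsScalable, unsigned FixedNumElts) {
  if (IsScalable)
    return llvm::APInt(1, 1); // a single bit, implicitly broadcast to every lane
  return llvm::APInt::getAllOnes(FixedNumElts); // fixed vectors: one bit per lane
}
```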
A prior patch (D128159) proposed using a different convention in SDAG. I don't see any strong reason to prefer one scheme over the other, so I propose we go with this one as it's conceptually the simplest. Getting known and demanded bit optimizations unblocked at all is a significant win.
I've locally implemented this scheme in reasonably large parts of ValueTracking.cpp and the SelectionDAG equivalents, and have not hit any blockers. If this is approved, I plan to post a series of patches plumbing this through all the relevant parts.
In the discussion on that patch, a preference was expressed for introducing some form of abstraction around the demanded elements. I'll note that I've played with several variations on that idea locally, and have yet to find anything which results in more readable code. If anyone has concrete ideas in this area, I'm happy to explore them in follow-up patches. I'd strongly prefer to be making API changes in an NFC manner with tests in place.
Differential Revision: https://reviews.llvm.org/D136470
The writeonly attribute for memset_pattern16 (and other referenced
libcalls) is being added by InferFunctionAttrs nowadays. No need
to special-case it here.