Commit Graph

11972 Commits

Author SHA1 Message Date
Max Kazantsev 24c1d36bb5 [SCEV][NFC] Rename MaxNotTaken -> ConstantMaxNotTaken
We are going to introduce SymbolicMaxNotTaken, avoid mixing
things up.

Differential Revision: https://reviews.llvm.org/D138568
Reviewed By: lebedev.ri
2022-11-24 17:26:37 +07:00
Fangrui Song c2250d8bc0 [CSSPGO] Move cl::opt inside llvm:: after D100528 and D108342 2022-11-23 23:08:49 -08:00
Vasileios Porpodas af4e856fa7 [NFC] Replaced BB->getInstList().{erase(),pop_front(),pop_back()} with eraseFromParent().
Differential Revision: https://reviews.llvm.org/D138617
2022-11-23 22:47:46 -08:00
Aiden Grossman 65abca4611 [MLGO] Fix InlineAdvisor and ModelUnderTrainingRunner after hasValue removal
Recentlyin 4b6b248, llvm::Optional's hasValue method was removed as
described in
https://discourse.llvm.org/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor
This breaks InlineAdvisor and ModelUnderTrainingRunner. This patch fixes
them by changing the method to has_value, which hasValue was evaluating
to before.

Differential Revision: https://reviews.llvm.org/D138635
2022-11-24 03:48:34 +00:00
Fangrui Song f199f97c72 ModuleSummaryAnalysis: Internalize some cl::opt 2022-11-23 17:33:34 -08:00
Vasileios Porpodas 8b9a62ee49 [NFC] Use BB->size() instead of BB->getInstList().size().
Differential Revision: https://reviews.llvm.org/D138616
2022-11-23 17:25:53 -08:00
Max Kazantsev f5eeda037f [SCEV][NFC] Fix typo in comment 2022-11-23 14:38:24 +07:00
Max Kazantsev f285be6214 [SCEV][NFC] Introduce API for getting basic block's symbolic max exit count
Currently, it just returns exact exit count. This is a refectoring step
before it is actually implemented.
2022-11-22 16:02:58 +07:00
Max Kazantsev b803957bce [SCEV][NFC] Call getExitCount with SymbolicMaximum when computing loop symbolic max
Currently this is NFC, because SymbolicMaximum for BB is not implemented and just
reuses exact result. However, from code purity perspective, it's a necessary step
to do. Plans to implement symbolic max for blocks are underway.
2022-11-22 15:27:14 +07:00
Florian Hahn 5dad4c6788
[SCEV] Iteratively compute ranges for deeply nested expressions.
At the moment, getRangeRef may overflow the stack for very deeply nested
expressions.

This patch introduces a new getRangeRefIter function, which first builds
a worklist of N-ary expressions and phi nodes, followed by their
operands iteratively.

getRangeRef has been extended to also take a Depth argument and it
switches to use getRangeRefIter once the depth reaches a certain
threshold.

This ensures compile-time is not impacted in general. Note that
the iterative algorithm may lead to a slightly different evaluation
order, which could result in slightly worse ranges for cyclic phis.

https://llvm-compile-time-tracker.com/compare.php?from=23c3eb7cdf3478c9db86f6cb5115821a8f0f5f40&to=e0e09fa338e77e53242bfc846e1484350ad79773&stat=instructions

Fixes #49579.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D130728
2022-11-21 21:56:14 +00:00
Kazu Hirata 5fef791cb9 [Analysis] Remove getInverseMinMaxPred
The last uses of getInverseMinMaxPred were removed on February 24,
2022 in commit 5423b0a525.
2022-11-20 13:30:19 -08:00
Kazu Hirata 7b91798a5d [Analysis] Use llvm::Optional::value_or (NFC) 2022-11-19 21:11:10 -08:00
Kazu Hirata 5d1ae6346b [Analysis] Teach getOptionalIntLoopAttribute to return std::optional (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-19 15:36:50 -08:00
Matt Arsenault 841a0edd03 ConstantFolding: Constant fold some canonicalizes
+/-0 is obviously foldable. Other non-special, non-subnormal
values are also probably OK. For denormal values, check
the calling function's denormal mode. For now, don't fold
denormals to the input for IEEE mode because as far as I know
the langref is still pretending LLVM's float isn't IEEE.

Also folds undef to 0, although NaN may make more sense. Skips
folding nans and infinities, although it should be OK to fold those
in a future change.
2022-11-18 10:35:19 -08:00
Roman Lebedev be1f994311
[Analysis] `isSafeToLoadUnconditionally()`: `lifetime` intrinsics can be ignored
In practice this means that we can speculate more loads in SROA.
This e.g. comes up in https://godbolt.org/z/G8716s6sj,
although we are missing second half of the puzzle to optimize that.
2022-11-17 20:48:27 +03:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Matt Arsenault af7e80b7cb ValueTracking: Look through copysign in isKnownNeverInfinity 2022-11-17 08:52:09 -08:00
Matt Arsenault 29e4363dda ValueTracking: Look through fneg in isKnownNeverNaN 2022-11-17 08:35:49 -08:00
luxufan 3053a323c7 [MemorySSA] Relax assert condition in createDefinedAccess
If globals-aa is enabled, because of the deletion of memory instructions, there
may be call instruction that is not in ModOrRefSet but is a MemoryUseOrDef.
This causes the crash in the process of cloning uses and defs.

Fixes https://github.com/llvm/llvm-project/issues/58719

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D137553
2022-11-17 23:43:20 +08:00
Matt Arsenault ba1669c81f ValueTracking: Look through fabs and fneg in isKnownNeverInfinity 2022-11-17 00:06:15 -08:00
Matt Arsenault d24fe812ec ValueTracking: Look through canonicalize in isKnownNeverInfinity 2022-11-17 00:06:15 -08:00
Teresa Johnson 9eacbba290 Restore "[MemProf] ThinLTO summary support" with more fixes
This restores commit 98ed423361 and
follow on fix 00c22351ba, which were
reverted in 5d938eb6f7 due to an
MSVC bot failure. I've included a fix for that failure.

Differential Revision: https://reviews.llvm.org/D135714
2022-11-16 09:42:41 -08:00
Matt Arsenault ad3c91eedc MemoryBuiltins: Don't check for unsized allocas
The verifier rejects these.
2022-11-16 09:13:11 -08:00
Jeremy Morse 5d938eb6f7 Revert "Restore "[MemProf] ThinLTO summary support" with fixes"
This reverts commit 00c22351ba.
This reverts commit 98ed423361.

Seemingly MSVC has some kind of issue with this patch, in terms of linking:

  https://lab.llvm.org/buildbot/#/builders/123/builds/14137

I'll post more detail on D135714 momentarily.
2022-11-16 11:21:02 +00:00
Matt Arsenault fde4ef1973 InstSimplify: Fold arithmetic_fence as idempotent 2022-11-15 22:29:34 -08:00
Teresa Johnson 98ed423361 Restore "[MemProf] ThinLTO summary support" with fixes
This restores 4745945500, which was
reverted in commit 452a14efc8, along with
fixes for a couple of bot failures.
2022-11-15 08:55:17 -08:00
Nikita Popov 8051c1db95 [AST] Don't use WeakVH for unknown insts (NFCI)
After D138014 we do not support using AST with IR that is being
mutated. As such, we also no longer need to track unknown
instructions using WeakVH. Replace with AssertingVH to make sure
that they are not invalidated.
2022-11-15 16:46:05 +01:00
Teresa Johnson 452a14efc8 Revert "[MemProf] ThinLTO summary support"
This reverts commit 4745945500.

Revert while I try to fix a couple of non-Linux build failures.
2022-11-15 07:39:40 -08:00
Nikita Popov db5855d0e4 [AST] Restrict AliasSetTracker to immutable IR
This restricts usage of AliasSetTracker to IR that does not change.
We used to use it during LICM where the underlying IR could change,
but remaining uses all use AST as part of a separate analysis phase.

This is split out from D137955, which makes use of the new guarantee
to switch to BatchAA.

Differential Revision: https://reviews.llvm.org/D138014
2022-11-15 16:27:28 +01:00
Teresa Johnson 4745945500 [MemProf] ThinLTO summary support
Implements the ThinLTO summary support for memprof related metadata.

This includes support for the assembly format, and for building the
summary from IR during ModuleSummaryAnalysis.

To reduce space in both the bitcode format and the in memory index,
we do 2 things:
1. We keep a single vector of all uniq stack id hashes, and record the
   index into this vector in the callsite and allocation memprof
   summaries.
2. When building the combined index during the LTO link, the callsite
   and allocation memprof summaries are only kept on the FunctionSummary
   of the prevailing copy.

Differential Revision: https://reviews.llvm.org/D135714
2022-11-15 06:45:12 -08:00
Fangrui Song 77bf0df376 Revert "[opt][clang] Enable using -module-summary/-flto=thin with -S/-emit-llvm"
This reverts commit bf8381a8bc.

There is a layering violation: LLVMAnalysis depends on LLVMCore, so
LLVMCore should not include LLVMAnalysis header
llvm/Analysis/ModuleSummaryAnalysis.h
2022-11-14 15:51:03 -08:00
Alexander Shaposhnikov bf8381a8bc [opt][clang] Enable using -module-summary/-flto=thin with -S/-emit-llvm
Enable using -module-summary with -S
(similarly to what currently can be achieved with opt <input> -o - | llvm-dis).
This is a recommit of ef9e62469.

Test plan: ninja check-all

Differential revision: https://reviews.llvm.org/D137768
2022-11-14 23:24:08 +00:00
Sanjay Patel 0a37290dc8 [InstSimplify] restrict logic fold with partial undef vector
https://alive2.llvm.org/ce/z/4ncsnX

Fixes #58977
2022-11-14 12:09:23 -05:00
Nikita Popov b71bf080d0 [AA] Move MayBeCrossIteration into AAQI (NFC)
Move the MayBeCrossIteration flag from BasicAA into AAQI. This is
in preparation for exposing it to users of the AA API.
2022-11-14 16:42:22 +01:00
Nikita Popov 458ae539df [AST] Remove legacy AliasSetPrinter pass
A NewPM version of this pass exists, drop the legacy version of
this testing-only pass.
2022-11-14 15:50:38 +01:00
Florian Hahn 758699c399
[VectorUtils] Skip interleave members with diff type and alloca sizes.
Currently, codegen doesn't support cases where the type size doesn't
match the alloc size. Skip them for now.

Fixes #58722.
2022-11-13 22:06:20 +00:00
Matt Arsenault 4f2f7e84ff Analysis: Reorder code in isDereferenceableAndAlignedPointer
GEPs should be the most common and basic case, so try that first.
2022-11-11 16:38:51 -08:00
Sanjay Patel 21f1b2da95 [InstSimplify] fold fsub nnan with Inf operand
Similar to fbc2c8f2fb, but if we have a non-canonical
fsub with constant operand 1, then flip the sign of the
Infinity:
https://alive2.llvm.org/ce/z/vKWfhW

If Infinity is operand 0, then the sign remains:
https://alive2.llvm.org/ce/z/73d97C
2022-11-11 08:42:44 -05:00
Sanjay Patel fbc2c8f2fb [InstSimplify] fold X +nnan Inf
If we exclude NaN (and therefore the opposite Inf),
anything plus Inf is Inf:
https://alive2.llvm.org/ce/z/og3dj9
2022-11-10 17:13:26 -05:00
Matt Arsenault 7dd27a75a2 InstSimplify: Fold fdiv nnan ninf x, 0 -> poison
https://alive2.llvm.org/ce/z/JxX5in
2022-11-07 08:43:22 -08:00
Nikita Popov a50c269c73 [InstCombine] Handle load smaller than one byte in memset forward
APInt::getSplat() requires that the new size is >= the original
one. If we're loading less than 8 bits, truncate instead.

Fixes https://github.com/llvm/llvm-project/issues/58845.
2022-11-07 17:04:27 +01:00
David Green b46427b9a2 [InstSimplify] (~A & B) | ~(A | B) --> ~A with logical and
According to https://alive2.llvm.org/ce/z/opsdrb, it is valid to convert
(~A & B) | ~(A | B) --> ~A even if the And is a Logical And. This came
up from the vector masking of predicated blocks.

Differential Revision: https://reviews.llvm.org/D137435
2022-11-07 10:03:18 +00:00
Mircea Trofin 5617fb1411 [MLGO][NFC] Use std::map instead of DenseMap to avoid use after free
In `MLInlineAdvisor::getAdviceImpl`, we call `getCachedFPI` twice, once
for the caller, once for the callee, so the second may invalidate the
reference obtained by the first because the underlying implementation of
the cache is a `DenseMap`. `std::map` doesn't have that problem.
2022-11-04 16:07:24 -07:00
Nikita Popov 2f211f865d [LVI] Improve debug message (NFC) 2022-11-04 16:58:02 +01:00
Karthik Senthil d9c52c31a0 [LV][IVDescriptors] Fix recurrence identity element for FMin and FMax reductions
For a min and max reduction idioms, the identity (i.e. neutral) element
should be datatype's highest and lowest possible values respectively.
Current implementation in IVDescriptors incorrectly returns -Inf for FMin
reduction and +Inf for FMax reduction. This patch fixes this bug which
was causing incorrect reduction computation results in loops vectorized
by LV.

Differential Revision: https://reviews.llvm.org/D137220
2022-11-04 10:39:37 -04:00
Nikita Popov 304f1d59ca [IR] Switch everything to use memory attribute
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.

The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.

High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.

Differential Revision: https://reviews.llvm.org/D135780
2022-11-04 10:21:38 +01:00
Nikita Popov 2ddcf721a0 [InstCombine] Perform memset -> load forwarding
InstCombine does some basic store to load forwarding. One case it
currently misses is the case where the store is actually a memset.
This patch adds support for this case. This is a minimal
implementation that only handles a load at the memset base address,
without an offset.

GVN is already capable of performing this optimization. Having it
in InstCombine can help with phase ordering issues, similar to the
existing store to load forwarding.

Differential Revision: https://reviews.llvm.org/D137323
2022-11-03 16:03:57 +01:00
Nikita Popov 68b24c3b44 [CVP] Simplify comparisons without constant operand
CVP currently only tries to simplify comparisons if there is a
constant operand. However, even if both are non-constant, we may
be able to determine the result of the comparison based on range
information.

IPSCCP is already capable of doing this, but because it runs very
early, it may miss some cases.

Differential Revision: https://reviews.llvm.org/D137253
2022-11-03 15:35:27 +01:00
Nikita Popov 96a74c4527 [ValueLattice] Fix typo in condition (NFC)
Fix typo pointed out by Roman Divacky.

There should be no functional change, as the rest of the code will
return nullptr for undef anyway. The condition is just there for
clarity.
2022-11-02 17:52:13 +01:00
Nikita Popov 134bda4b61 [ValueLattice] Use DL-aware folding in getCompare()
Use DL-aware ConstantFoldCompareInstOperands() API instead of
ConstantExpr API. The practical effect of this is that SCCP can
now fold comparisons that require DL.
2022-11-02 10:41:11 +01:00
Nikita Popov 28b31d9ccc [ValueLattice] Move getCompare() out of line (NFC)
This is a fairly large method that is unlikely to benefit from
inlining.
2022-11-02 10:33:44 +01:00
Nikita Popov 41dba9e6a3 [AA] Remove some overloads (NFC)
Having all these instruction-specific overloads does not seem to
provide any compile-time benefit, so drop them in favor of the
generic methods accepting "const Instruction *". Only leave behind
the per-instruction AAQI overloads, which are part of the internal
implementation.
2022-11-02 10:21:10 +01:00
Philip Reames 9472a810ed Address post commit review feedback from D137046
It was pointed out the verifier rejects inttoptr and ptrtoint casts with inputs and outputs whose scalability doesn't match.  As such, checking the input type separately from the type of the cast itself is redundant.
2022-11-01 13:36:13 -07:00
Philip Reames 2e999b7dd1 Allow scalable vectors in ComputeNumSignBits and isKnownNonNull
This is a follow up to D136470 which extends the same scheme used there to ComputeNumSignBits and isKnownNonNull. As a reminder, for scalable vectors we track a single bit which is implicitly broadcast to all lanes. We do not know how many lanes there are statically, and thus have to be conservative along paths which require exact sizes.

Differential Revision: https://reviews.llvm.org/D137046
2022-11-01 09:29:42 -07:00
Nikita Popov 45143240b2 [AA] Add missing const qualifier (NFC) 2022-11-01 16:17:18 +01:00
Patrick Walton 01859da84b [AliasAnalysis] Introduce getModRefInfoMask() as a generalization of pointsToConstantMemory().
The pointsToConstantMemory() method returns true only if the memory pointed to
by the memory location is globally invariant. However, the LLVM memory model
also has the semantic notion of *locally-invariant*: memory that is known to be
invariant for the life of the SSA value representing that pointer. The most
common example of this is a pointer argument that is marked readonly noalias,
which the Rust compiler frequently emits.

It'd be desirable for LLVM to treat locally-invariant memory the same way as
globally-invariant memory when it's safe to do so. This patch implements that,
by introducing the concept of a *ModRefInfo mask*. A ModRefInfo mask is a bound
on the Mod/Ref behavior of an instruction that writes to a memory location,
based on the knowledge that the memory is globally-constant memory (in which
case the mask is NoModRef) or locally-constant memory (in which case the mask
is Ref). ModRefInfo values for an instruction can be combined with the
ModRefInfo mask by simply using the & operator. Where appropriate, this patch
has modified uses of pointsToConstantMemory() to instead examine the mask.

The most notable optimization change I noticed with this patch is that now
redundant loads from readonly noalias pointers can be eliminated across calls,
even when the pointer is captured. Internally, before this patch,
AliasAnalysis was assigning Ref to reads from constant memory; now AA can
assign NoModRef, which is a tighter bound.

Differential Revision: https://reviews.llvm.org/D136659
2022-10-31 13:03:41 -07:00
Philip Reames 93798fb740 Address post commit style comment from 087bb0f 2022-10-31 11:16:14 -07:00
Geza Lore d5e59e99f4 [ValueTracking] Improve performance of programUndefinedIfUndefOrPoison (NFC)
programUndefinedIfUndefOrPoison used to eagerly propagate the fact that
a value is poison to the users of the value. The problem is that if the
value has a lot of uses (orders of magnitude more than the scanning
limit we use in this function), then we spend the bulk of our time in
eagerly propagating the poison property, which we will mostly never use
later anyway due to the scanning limit.

I have a test case (of ~50k lines of machine generated C++), where this
results in ~60% of 35s compilation time being spent doing just this
eager propagation.

This patch changes programUndefinedIfUndefOrPoison to only propagate to
instructions actually visited, looking back to see if their operands are
poison. This should be equivalent and no functional change is intended,
but we regain virtually all of the 60% compilation time spent in this
function in my test case (i.e.: a 2.5x total compilation speedup).

Differential Revision: https://reviews.llvm.org/D137027
2022-10-31 10:20:11 +01:00
Nikita Popov efbb4d0245 [BasicAA] Include MayBeCrossIteration in cache key
Rather than switching to a new AAQI instance with empty cache when
MayBeCrossIteration is toggled, include the value in the cache key.

The implementation redundantly include the information in both sides
of the pair, but that seems simpler than trying to store it only on
one side.

Differential Revision: https://reviews.llvm.org/D136175
2022-10-31 09:59:42 +01:00
Philip Reames 35a1161c24 [ValueTracking] Assert known bits sanity in isKnownNonZero
These are the same asserts we have in other query routines; cover this interface too.
2022-10-30 10:53:52 -07:00
Simon Pilgrim 55a11b542e [VectorUtils] Add getShuffleDemandedElts helper
We have similar code to translate a demanded elements mask for a shuffle's operands in multiple places - this patch adds a helper function to VectorUtils and updates a number of locations to use it directly.

Differential Revision: https://reviews.llvm.org/D136832
2022-10-30 17:03:55 +00:00
Philip Reames 087bb0f1fe Allow scalable vectors in computeKnownBits
This extends the computeKnownBits analysis to support scalable vectors. The critical detail is in deciding how to represent the demanded elements of a vector whose length is unknown at compile time.

For this patch, I adopt the convention that we track one bit which corresponds to all lanes. That is, that bit is implicitly broadcast to all lanes of the scalable vector resulting in all lanes being demanded. This is the same convention we use in getSplatValue in SelectionDAG.

Note that this convention doesn't actually impact much. Most of the code is agnostic to the interpretation of the demanded elements, and the few cases which actually care need case by case handling anyways. In this patch, I just bail out of those cases.

A prior patch (D128159) proposed using a different convention in SDAG. I don't see any strong reason to prefer one scheme over the other, so I propose we go with this one as it's conceptually the simplest. Getting known and demanded bit optimizations unblocked at all is a significant win.

I've locally implemented this scheme in reasonable large parts of ValueTracking.cpp and SelectionDAG equivalents, and have not hit any blockers. If this is approved, I plan to post a series of patches plumbing this through all the relevant parts.

In the discussion on that patch, a preference was expressed for introducing some form of abstraction around the demanded elements. I'll note that I've played with several variations on that idea locally, and have yet to find anything which results in more readable code. If anyone has concrete ideas in this area, I'm happy to explore in follow up patches. I'd strongly prefer to be making API changes in NFC manner with tests in place.

Differential Revision: https://reviews.llvm.org/D136470
2022-10-30 08:44:37 -07:00
Nikita Popov 8e5f57d738 [BasicAA] Remove redundant libcall handling
The writeonly attribute for memset_pattern16 (and other referenced
libcalls) is being added by InferFunctionAttrs nowadays. No need
to special-case it here.
2022-10-27 12:01:33 +02:00
Haojian Wu 41b1669ca5 Fix a -Wunused-const-variable warning. 2022-10-27 10:51:28 +02:00
Nikita Popov 6c269a3f89 [BasicAA] Replace VisitedPhiBBs with a single flag
When looking through phis, BasicAA has to guard against the
possibility that values from two separate cycle iterations are
being compared -- in this case, even though the SSA values may
be the same, they cannot be considered as equal.

This is currently done by keeping a set of VisitedPhiBBs for any
phis we looked through, and then checking whether the relevant
instruction is reachable from one of the phis.

This patch replaces this set with a single flag. If the flag is
set, then we will not assume equality for any instruction part
of a cycle. While this is nominally less accurate, it makes
essentially no difference in practice. Here are the AA stats
for test-suite:

    aa.NumMayAlias  |   3072005 |  3072016
    aa.NumMustAlias |    337858 |   337854
    aa.NumNoAlias   |  13255345 | 13255349

The motivation for the change is to expose the MayBeCrossIteration
flag to AA users, which will allow fixing miscompiles related to
incorrect handling of cross-iteration AA queries.

Differential Revision: https://reviews.llvm.org/D136174
2022-10-27 10:29:41 +02:00
Philip Reames 269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorize intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or with a huge remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization are intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straight forward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
Kazu Hirata 3f8d2c917c Ensure newlines at the end of files (NFC) 2022-10-22 09:29:40 -07:00
Arthur Eubanks 4153f989ba [ObjCARC] Remove legacy PM versions of optimization passes
This doesn't touch objc-arc-contract because that's in the codegen pipeline.
However, this does move its corresponding initialize function into initializeCodegen().

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D135041
2022-10-21 13:40:54 -07:00
Nikita Popov eb470e67c1 [ModuleSummaryAnalysis] Use helper methods to check readnone/readonly (NFC)
This makes sure that this code continue working when switching to
the memory attribute.

A caveat here is that onlyReadsMemory() will also true for readnone.
To be conservative, I'm explicitly excluding that case here.
2022-10-21 12:18:57 +02:00
Paul Walker ab8257ca0e [NFC] Fix a few whitespace inconsistencies. 2022-10-20 14:52:25 +00:00
Florian Hahn 1625224fbb
[SCEV] Replace assert with returning CouldNotComp in computeMaxBECountForLT.
This patch removes the bail out for signed predicates and non-positive
strides in howManyLessThans and updates computeMaxBECountForLT to return
SCEVCouldNotCompute for signed predicates with negative strides.

AFAICT bail-out was only added because computeMaxBECountForLT may not
handle negative signed strides correctly. Instead of not calling
computeMaxBECountForLT at all because we bail out earlier, we can
instead return SCEVCouldNotCompute in computeMaxBECountForLT.

The max backedge taken count will be computed as the max value of the
symbolic backedge taken count.

This improves precision in cases where we can compute symbolic backedge
taken counts and also fixes a crash.

Fixes #57818.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135667
2022-10-19 11:24:10 +01:00
Nikita Popov 747f27d97d [AA] Rename getModRefBehavior() to getMemoryEffects() (NFC)
Follow up on D135962, renaming the method name to match the new
type name.
2022-10-19 11:03:54 +02:00
Nikita Popov 1a9d9823c5 [AA] Rename uses of FunctionModRefBehavior (NFC)
Followup to D135962 to rename remaining uses of
FunctionModRefBehavior to MemoryEffects. Does not touch API names
yet, but also updates variables names FMRB/MRB to ME, to match the
new type name.
2022-10-19 10:54:47 +02:00
Arthur Eubanks 743087fb63 Port print-cfg-sccs to new pass manager
This is actually used, see https://discourse.llvm.org/t/use-print-callgrapg-sccs-from-opt/65782.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D135718
2022-10-18 08:47:08 -07:00
Florian Hahn a8e9742bd4
[IndVarSimplify] Clear block and loop dispositions after moving instr.
Moving an instruction can invalidate the cached block dispositions of
the corresponding SCEV. Invalidate the cached dispositions.

Also fixes a copy-paste error in forgetBlockAndLoopDispositions where
the start expression S was removed from BlockDispositions in the loop
but not the current values. This was also exposed by the new test case.

Fixes #58439.
2022-10-18 16:18:14 +01:00
Nikita Popov d06131fda2 [AST] Pass BatchAA to mergeSetIn() (NFCI) 2022-10-18 16:54:55 +02:00
Nikita Popov e162a73e41 [CFG] Add const qualifier to isPotentiallyReachableFromMany() (NFC)
Accept a const pointer for StopBB. Unfortunately the worklist has
to use non-const pointers due to LoopInfo interaction.
2022-10-18 10:06:07 +02:00
Daniel Sanders 021e6e05d3 [instsimplify] Move (extelt (inselt Vec, Value, Index), Index) -> Value from InstCombine
As requested in https://reviews.llvm.org/D135625#3858141

Differential Revision: https://reviews.llvm.org/D136099
2022-10-17 15:22:06 -07:00
Nikita Popov ac74e7a780 [InstSimplify] Only check self-simplify in simplifyInstruction()
InstSimplify currently checks whether the instruction simplifies
back to itself, and returns undef in that case. Generally, this
should only occur in unreachable code.

However, this was also done for the simplifyInstructionWithOperands()
API. In that case, the instruction only serves as a template that
provides the opcode and other non-operand data. In this case,
simplifying back to the same "instruction" may be expected. This
caused PR58401 in conjunction with D134954.

As such, move this check into simplifyInstruction() only. The only
other caller of simplifyInstructionWithOperands() also handles the
self-simplification case explicitly.
2022-10-17 15:52:38 +02:00
Nikita Popov 436fb27186 [BasicAA] Support loop phis in pointsToConstantMemory()
When looking for underlying objects, if we encounter one that we
have already seen, then we should skip it (as it has already been
checked) rather than bail out. In particular, this adds support
for the case where we have a loop use of a phi recurrence.
2022-10-17 12:34:55 +02:00
Florian Hahn 16cf666bb7
[Loop] Move block and loop dispo invalidation to makeLoopInvariant.
makeLoopInvariant may recursively move its operands to make them
invariant, before moving the passed in instruction. Those recursively
moved instructions are currently missed when invalidating block and loop
dispositions.

To address this, move the invalidation code to Loop::makeLoopInvariant.

Fixes #58314.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D135909
2022-10-14 21:58:14 +01:00
Nikita Popov 237b962031 [BasicAA] Account for cycles when checking for same select condition
If we have translated across a cycle backedge, the same SSA value
for the condition might be referring to two different loop iterations.
Use the isValueEqualInPotentialCycles() helper to avoid assuming
equality in that case.
2022-10-14 10:37:40 +02:00
Nikita Popov 03f9d0ff22 [TBAA] Model call accessing immutable type as readnone
Accesses to constant memory are not observable and should be
reported as readnone, not readonly. This is consistent with what
we do for normal (non-call) instructions: For those, the TBAA
metadata will result in pointsToConstantMemory() returning true,
which will then result in a NoModRef result, not a Ref result.

Differential Revision: https://reviews.llvm.org/D135864
2022-10-14 10:08:37 +02:00
Jacob Hegna 17095dfe36 Move interpreter check before modifying the allocation type. 2022-10-12 19:50:36 +00:00
Jacob Hegna 9d93a98f85 [MLGO] Force persistency in tflite buffers.
When training large models, we encounter use-after-free bugs when
writing to the input tensors for various MLGO models. This patch fixes the
issue by marking the tensors as "persistent".

Differential Revision: https://reviews.llvm.org/D135739
2022-10-12 19:50:36 +00:00
Arthur Eubanks 60e4af7ab8 [CallGraph] Port -print-callgraph-sccs to new pass manager
And remove the legacy opt-specific pass.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D135487
2022-10-11 14:43:16 -07:00
Nikita Popov 884bb97dca [MustExec][LICM] Handle latch being part of an inner cycle (PR57780)
The algorithm in allLoopPathsLeadToBlock() does not handle the case
where the loop latch is part of the predecessor set correctly: In
this case, we may take the backedge (escaping to a different loop
iteration) and not execute other latch successors. This can happen
if the latch is part of an inner cycle.

Fixes https://github.com/llvm/llvm-project/issues/57780.

Differential Revision: https://reviews.llvm.org/D134279
2022-10-11 09:30:13 +02:00
Florian Hahn 4b599fa1ee
[SCEV] Verify block disposition cache.
This extends the existing SCEV verification to catch cache invalidation
issues as in #57837.

The validation logic is similar to the recently added loop disposition
cache validation in bb68b2402d.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134531
2022-10-10 20:42:19 +01:00
Shubham Narlawar b920407cf5 [LICM] Disable thread-safety checks in single-thread model
If the single-thread model is used, or the
-licm-force-thread-model-single flag is specified, skip checks
related to thread-safety. This means that store promotion for
conditionally executed stores only requires proof of
dereferenceability and writability, but not of thread-safety. For
example, this enables promotion of stores to (non-constant) globals,
as well as captured allocas.

Fixes https://github.com/llvm/llvm-project/issues/50537.

Differential Revision: https://reviews.llvm.org/D130466
2022-10-10 16:51:16 +02:00
Florian Hahn 19ad1cd5ce
Recommit "[SCEV] Support clearing Block/LoopDispositions for a single value."
This reverts commit 92f698f01f.

The updated version of the patch includes handling for non-SCEVable
types. A test case has been added in ec86e9a99b.
2022-10-07 20:15:44 +01:00
Florian Hahn 92f698f01f
Revert "[SCEV] Support clearing Block/LoopDispositions for a single value."
This reverts commit 9e931439dd.

This commit causes a crash when TSan, e.g. with
https://lab.llvm.org/buildbot/#/builders/70/builds/28309/steps/10/logs/stdio

Reverting while I extract a reproducer and submit a fix.
2022-10-07 17:58:54 +01:00
Florian Hahn 9e931439dd
[SCEV] Support clearing Block/LoopDispositions for a single value.
Extend forgetBlockAndLoopDisposition to allow clearing information for a
single value. This can be useful when only a single value is changed,
e.g. because the instruction is moved.

We also need to clear the cached values for all SCEV users, because they
may depend on the starting value's disposition.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D134614
2022-10-07 16:07:17 +01:00
Bjorn Pettersson 01e1f32971 [ValueTracking][SimplifyLibCalls] Fix bug in getConstantDataArrayInfo for wchar_t
When SimplifyLibCalls is dealing with wchar_t (e.g. optimizing wcslen)
it uses ValueTracking helpers with a CharSize/ElementSize that isn't
8, but rather 16 or 32 (to match with the size in bits of a wchar_t).

Problem I've seen is that llvm::getConstantDataArrayInfo is taking
both an "ElementSize" argument (basically indicating size of a
char/element in bits) and an "Offset" which afaict is an offset
in the unit "number of elements". Then it also use
stripAndAccumulateConstantOffsets to get a "StartIdx" which afaict
is calculated in bytes. The returned Slice.Length is based on
arithmetics that add/subtract variables that are having different
units (bytes vs elements). Most notably I think the "StartIdx" must
be scaled using the "ElementSize" to get correct results.

The symptom of the above problem was seen in the wcslen-1.ll test
case which miscompiled.

This patch is supposed to resolve the bug by converting between
bytes and elements when needed.

Differential Revision: https://reviews.llvm.org/D135263
2022-10-07 15:29:32 +02:00
Nikita Popov ccf53cae32 [ValueTracking] Remove unused Offset argument in getConstantStringInfo() (NFC) 2022-10-07 11:35:55 +02:00
Nikita Popov c5bf452022 [AA] Pass AAResults through AAQueryInfo
Currently, AAResultBase (from which alias analysis providers inherit)
stores a reference back to the AAResults aggregation it is part of,
so it can perform recursive alias analysis queries via
getBestAAResults().

This patch removes the back-reference from AAResultBase to AAResults,
and instead passes the used aggregation through the AAQueryInfo.
This can be used to perform recursive AA queries using the full
aggregation.

Differential Revision: https://reviews.llvm.org/D94363
2022-10-06 10:10:19 +02:00
Nikita Popov 6053b37e45 [AA] Thread AAQI through getModRefBehavior() (NFC)
This is in preparation for D94363, as we will need AAQI to
perform the recursive call to the function variant.
2022-10-06 09:57:42 +02:00
Sanjay Patel 0a1210e482 [InstSimplify] try harder to fold fmul with 0.0 operand
https://alive2.llvm.org/ce/z/oShzr3

This was noted as a missing fold in D134876 (with additional
examples based on issue #58046).

I'm assuming that fmul with a zero operand is rare enough
that the use of ValueTracking will not noticeably increase
compile-time.

This adjusts a PowerPC codegen test that was added with D88388
because it would get folded away and no longer provide coverage
for the bug fix.
2022-10-04 11:20:01 -04:00
Sanjay Patel 7f7a0f2f83 [InstSimplify] reduce code duplication for fmul folds; NFC
This is a modification of the earlier attempt from:
7b7940f9da

For fma callers, we only want to swap a 0.0 or 1.0 constant.
2022-10-04 10:29:53 -04:00
Nikita Popov 6e504d637d [ValueTracking] Handle constant exprs in isKnownNonZero()
Handle constant expressions by falling through to the general
operator-based code. In particular, this adds support for bitcast
and GEP expressions.
2022-10-04 11:58:07 +02:00
Nikita Popov 45dec8f5fd [ValueTracking] Avoid known bits fallthrough for freeze (NFCI)
The known bits logic should never produce a better result than
the direct recursive non-zero query here, so skip the fallthrough.
2022-10-04 11:02:31 +02:00