This fixes 69 llvm tests that failed when EXPENSIVE_CHECKS was enabled.
llvm/test/Transforms/IROutliner/outlining-commutative-operands-opposite-order.ll
is one example.
When we have EXPENSIVE_CHECKS, _GLIBCXX_DEBUG is defined. This means
that libstdc++ will call the compare function to check if it is
implemented correctly (that !(a < a) is true).
This happens even if there is only one item, and here we expect
to see either a single `ret void` or multiple `ret`s of constant
integers. Don't sort if we have only 1 item, but do assert that it
is the single `ret void` we expect. In the comparator, assert that
neither Value is a nullptr in case one ended up in the list somehow.
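For illustration, a minimal sketch of the guard (plain ints standing in for the outliner's Value pointers; names are hypothetical, not the actual IROutliner code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Skip the sort entirely for a single element so the _GLIBCXX_DEBUG
// self-comparison check never invokes the comparator.
void sortOutputs(std::vector<const int *> &Outputs) {
  if (Outputs.size() == 1) {
    assert(Outputs.front() != nullptr && "expected the single 'ret void' entry");
    return; // nothing to sort, the comparator is never called
  }
  std::stable_sort(Outputs.begin(), Outputs.end(),
                   [](const int *LHS, const int *RHS) {
                     assert(LHS && RHS && "nullptr ended up in the list");
                     return *LHS < *RHS; // compare the constant values
                   });
}
```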
Reviewed By: AndrewLitteken
Differential Revision: https://reviews.llvm.org/D130230
Closing https://github.com/llvm/llvm-project/issues/56919
It is meaningless to preserve the lifetime markers for the spilled
allocas in the coroutine frames, and keeping them would block some
optimizations too.
A const reference is preferred over a non-null const pointer.
`Type *` is kept as is to match the other overload.
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D131197
COFF has a verifier check that private global variables don't have a comdat of the same name.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D131043
1) The overloaded (instruction-based) method is a wrapper around the current (opcode-based) method.
2) This patch also changes a few callsites (VectorCombine.cpp,
SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.
Differential Revision: https://reviews.llvm.org/D131114
In contrast to AAPotentialValues, the constant values version can
contain implicit `undef` in the set. We had an assertion that could
misfire before. Handle it properly now.
As discussed in [0], this diff adds the `skipprofile` attribute to
prevent the function from being profiled while allowing profiled
functions to be inlined into it. The `noprofile` attribute remains
unchanged.
The `noprofile` attribute is used for functions to which it is
dangerous to add instrumentation, while the `skipprofile` attribute is
used to reduce code size or performance overhead.
[0] https://discourse.llvm.org/t/why-does-the-noprofile-attribute-restrict-inlining/64108
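For illustration, a hedged sketch of how an instrumentation pass might consult the two attributes (the enum attribute kinds are assumed from this summary, not verified against the final headers):

```cpp
#include "llvm/IR/Function.h"
using namespace llvm;

// Instrumentation skips functions carrying either attribute; per the
// summary, only noprofile additionally restricts inlining of profiled
// callees into the function.
static bool shouldInstrument(const Function &F) {
  return !F.hasFnAttribute(Attribute::NoProfile) &&
         !F.hasFnAttribute(Attribute::SkipProfile);
}
```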
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D130807
BoundsChecking uses ObjectSizeOffsetEvaluator to keep track of the
underlying size/offset of pointers in allocations. However,
ObjectSizeOffsetVisitor (something ObjectSizeOffsetEvaluator
uses to check for constant sizes/offsets)
doesn't quite treat sizes and offsets the same way as
BoundsChecking. BoundsChecking wants to know the size of the
underlying allocation and the current pointer's offset within
it, but ObjectSizeOffsetVisitor only cares about the size
from the pointer to the end of the underlying allocation.
This only comes up when merging two size/offset pairs. Add a new mode to
ObjectSizeOffsetVisitor which cares about the underlying size/offset
rather than the size from the current pointer to the end of the
allocation.
Fixes a false positive with -fsanitize=bounds.
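A hedged numeric illustration of the merge difference (plain values, not the actual ObjectSizeOffsetVisitor API): two pointers into one 16-byte allocation, at offsets 4 and 8.

```cpp
#include <algorithm>
#include <cstdio>

int main() {
  // "Remaining bytes" mode only keeps bytes from the pointer to the end,
  // so merging loses where each pointer sits inside the object.
  unsigned AllocSize = 16, Off1 = 4, Off2 = 8;
  unsigned BytesToEnd = std::min(AllocSize - Off1, AllocSize - Off2);
  std::printf("remaining-size mode: %u bytes to end\n", BytesToEnd);
  // The new mode keeps the underlying allocation size and the offset
  // bounds, which is what BoundsChecking needs in both directions.
  std::printf("alloc-size mode: size=%u, offset in [%u, %u]\n", AllocSize,
              std::min(Off1, Off2), std::max(Off1, Off2));
}
```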
Reviewed By: vitalybuka, asbirlea
Differential Revision: https://reviews.llvm.org/D131001
This is the 2nd patch of the two-patch series (D130188, D130189) that
fixes PR56275 (https://github.com/llvm/llvm-project/issues/56275), a
missed opportunity for loop interchange.
As follow-up on the dependence analysis (DA) patch D130188, this patch
normalizes DA results in loop interchange, such that negative dependence
vectors queried by loop interchange are reversed to be non-negative.
Now all tests in PR56275 can be interchanged. Those tests are added
as the lit test `pr56275.ll`.
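A minimal sketch of the normalization idea (hypothetical helper, not the actual DependenceInfo API): if the leading non-zero component of a dependence vector is negative, the whole vector is reversed.

```cpp
#include <vector>

// e.g. [-1, 2, 0] becomes [1, -2, 0]; the sign of the first non-zero
// entry decides whether the vector needs reversing.
void normalizeDepVector(std::vector<int> &Dep) {
  for (int D : Dep) {
    if (D == 0)
      continue;
    if (D < 0)
      for (int &E : Dep)
        E = -E;
    break;
  }
}
```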
Reviewed By: kawashima-fj, bmahjour, Meinersbur, #loopoptwg
Differential Revision: https://reviews.llvm.org/D130189
During LTO a local promoted to a global gets a unique suffix based on
a hash of the module IR. This means that changes in the local's module
can affect the contents in another module that imported it (because the name
of the imported promoted local is changed, but that doesn't reflect a
real change in the importing module). So any tool that's
validating changes to the importing module will see a superficial change.
Instead of using the module hash, we can use the "source_filename" if it
exists to generate a unique identifier that doesn't change due to LTO
shenanigans.
Differential Revision: https://reviews.llvm.org/D128863
This is mostly a stylistic change to make the uniform memop widening cost
code fit more naturally with the surrounding code. It's not strictly
speaking NFC as I added in the store-with-invariant-value case, and we
could in theory have a target where a gather/scatter is cheaper than a
single load/store... but it's probably NFC in practice. Note that the
scatter/gather result can still be overridden later if the result is
uniform-by-parts.
Mark ModRefInfo as a bitmask enum, which allows using normal
& and | operators on it. This supersedes various functions like
unionModRef() and intersectModRef(). I think this makes the code
cleaner than going through helper functions...
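The shape of the change, sketched (close to, but not guaranteed to match, the final code):

```cpp
#include "llvm/ADT/BitmaskEnum.h"
#include <cstdint>

namespace llvm {
enum class ModRefInfo : uint8_t {
  NoModRef = 0,
  Ref = 1,
  Mod = 2,
  ModRef = Ref | Mod,
  LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/ModRef),
};

// Former helpers like unionModRef(A, B) become plain `A | B`, and
// intersectModRef(A, B) becomes `A & B`:
inline bool isModSet(ModRefInfo MRI) {
  return (MRI & ModRefInfo::Mod) != ModRefInfo::NoModRef;
}
} // namespace llvm
```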
Differential Revision: https://reviews.llvm.org/D130870
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
This extends the handling of uniform memory operations to handle the case where a store is storing a loop invariant value. Unlike the general case of a store to an invariant address where we must use the last active lane, in this case we can use any lane since all lanes must produce the same result.
For context, the basic structure of the existing code and how the change fits in:
* First, we select a widening strategy. (The result is irrelevant for this patch.)
* Then we determine if a computation is uniform within all lanes of VF. (Note this is the uniform-per-part definition, not LAI's uniform across all unrolled iterations definition.)
* If it is, we overrule the widening strategy, and unconditionally scalarize.
* VPReplicationRecipe - which is what actually does the scalarization - knows how to handle uniform-per-part values including for scalable vectors. However, we do need to know that the expression is safe to execute without predication - e.g. the uniform mem op was unconditional in the original loop. (This part was split off and already landed.)
An obvious question is why not simply implement the generic case? The answer is that I'm going to, but doing so without a canonicalization towards uniform causes regressions due to bad interaction with scalarization/uniformity of values feeding the uniform mem-op. This patch is needed to avoid those regressions.
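For reference, the kind of store this patch targets, in source form (hypothetical example):

```cpp
// Both the address and the stored value are loop-invariant, so every
// vector lane would write the same value to the same location; any
// active lane can therefore perform the store.
void f(int *flag, const int *a, int *out, int n) {
  for (int i = 0; i < n; ++i) {
    out[i] = a[i] * 2;
    *flag = 1; // uniform mem op storing a loop-invariant value
  }
}
```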
Differential Revision: https://reviews.llvm.org/D130364
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.
Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.
Added an extra test here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D128342
Currently the API getExtendedAddReductionCost is used to determine the cost of an extended Add reduction with an optional Mul. For Arm, it covers the relevant cases. But other targets, for example RISCV, support other kinds of extended reductions, such as FAdd.
This patch does the following changes:
1. Split getExtendedAddReductionCost into 2 new APIs: getExtendedReductionCost, which handles extended reductions and takes an additional Opcode input; and getMulAccReductionCost, which handles the MLA cases that getExtendedAddReductionCost covered.
2. Refactor getReductionPatternCost, adding some constraint conditions to make sure getMulAccReductionCost only handles the reduction of Add + Mul.
Differential Revision: https://reviews.llvm.org/D130868
Reflect in the pointer's offset the length of the leading part
of the consumed string preceding the first converted digit.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D130912
SimplifyCFG does some common code hoisting, which is limited to hoisting a
sequence of identical instructions in identical order and stops at the first
non-identical instruction.
This patch allows hoisting instruction pairs over same-length sequences of
non-matching instructions. The linear asymptotic complexity of the algorithm
stays the same; there's an extra parameter `simplifycfg-hoist-common-skip-limit`
serving to limit compilation time and/or the size of the hoisted live ranges.
The patch improves SPECv6/525.x264_r by about 10%.
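A hedged source-level illustration (hypothetical example, not from the patch):

```cpp
// 'x - y' is computed identically in both arms but sits behind a pair of
// mismatched multiplies. Hoisting previously stopped at the mismatch; with
// the skip limit it can step over same-length runs of non-matching
// instructions and still hoist the identical pair.
int example(bool c, int x, int y) {
  int u, v;
  if (c) {
    u = x * 3; // mismatched pair: hoisting used to stop here
    v = x - y; // identical pair: now hoistable past the mismatch
  } else {
    u = y * 5;
    v = x - y;
  }
  return u + v;
}
```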
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D129370
This is a follow-up to 2ebfda2417
(replace "if" with "else if" since the cases nuw/nsw
were meant to be handled separately).
Test plan:
1/ ninja check-llvm check-clang check-lld
2/ Bootstrapped LLVM/Clang pass tests
The isa<Constant> check could misfire on an instruction with 2 constant
operands. This bug was introduced with bb789381fc (D36988).
See issue #56810 for a C source example that exposed the bug.
https://alive2.llvm.org/ce/z/3jYbEH
We should choose one of these forms, and the option that uses
the narrow type allows the motivating example from issue #56294
to reduce. In the best case (no 'not' needed and 'trunc' remains),
this does remove an instruction.
Note that there is what looks like a regression because there
is an existing canonicalization that turns trunc into and+icmp.
That is a long-standing transform, and I'm not sure what effect
reversing it would have.
We already had the reasoning about uniform mem op loads; if the address is accessed at least once, we know the instruction doesn't need to be predicated to ensure fault safety. For stores, we do need to ensure that the values visible in memory are the same with and without predication. The easiest sub-case to check for is that all the values being stored are the same. Since we know that at least one lane is active, this tells us that the value must be visible.
Warning on confusing terminology: "uniform" vs "uniform mem op" mean two different things here, and this patch is specific to the latter. It would *not* be legal to make this same change for merely "uniform" operations.
Differential Revision: https://reviews.llvm.org/D130637
This patch introduces the inline cost priority into the
module inliner, which uses the same computation as
InlineCost.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D130012
This works with any logic + extend:
https://alive2.llvm.org/ce/z/vzsqQD
The motivating case is from issue #56294, but that's still not optimal
(it should simplify completely).
I am playing with the LoopDataPrefetch pass and found out that it
bails out on pointers in a non-zero address space. This
patch adds a target callback to check whether an address space is to
be considered for prefetching. The default implementation still only
allows address space 0, so this is NFCI.
This does not currently affect any known targets, but seems to be
generally useful for the future.
Differential Revision: https://reviews.llvm.org/D129795
The behaviour of this patch is not great, but it has some side-effects
that are required for OpenMPOpt to work. The problem is that when we use
`-mlink-builtin-bitcode` we only import used symbols from the runtime.
Then OpenMPOpt will insert calls to symbols that were not previously
included. This patch removed that implicit behaviour, as these functions
were kept alive by the `noinline` attribute simply because it kept calls
to them in the module. This caused regressions in some tests that relied
on some OpenMPOpt passes without using LTO. Reverting for the LLVM 15
release, but will try to fix it more correctly on main.
This reverts commit d61d72dae6.
Fixes #56752
Instructions between two adjacent loops will be hoisted above the first
loop, or sunk below the second to facilitate loop fusion. Hoisting will
be attempted for an instruction that dominates the first loop.
Otherwise, sinking the instruction will be attempted.
Instructions with side effects will not be considered for sinking or
hoisting. Hoisting/sinking of any instructions between loops will only
be performed if all the instructions can be moved. As well,
sinking/hoisting is considered for each instruction in isolation,
without taking into account sinking/hoisting decisions for other
instructions in the preheader.
Differential Revision: https://reviews.llvm.org/D118076
Added an alloca optimization that was missed during the implementation of D112098.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D130503
This is an alternate to D129155 that uses TTI.haveFastSqrt() to avoid a
potential miscompile for programs with reads of errno. Moving the transform
to AggressiveInstCombine provides access to TTI.
If a sqrt call has "nnan", that implies that the input argument is never
negative because sqrt of {negative number} --> NAN.
If the argument is never negative and the call can be lowered without a
libcall, then we can assume that errno accesses are unchanged after lowering,
so the call can be translated to the LLVM intrinsic (which is expected to
become inline code).
This affects codegen for targets like x86 that have sqrt instructions, but
still have to conservatively assume that a libcall may be needed to set
errno as shown in issue #52620 and issue #56383.
This patch won't solve those examples - we will need to extend this to use
CannotBeOrderedLessThanZero or similar, enhance that analysis for new
operators, and/or deal with llvm.assume too.
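For context, a small example of the kind of call affected (hedged; the exact conditions are as described above):

```cpp
#include <cmath>

// When the call carries "nnan" (or errno is known unused, e.g. under
// -fno-math-errno), sqrt can be translated to the llvm.sqrt intrinsic and
// lowered to a single hardware instruction on x86; otherwise codegen must
// keep a conservative libcall path so errno can be set for negative inputs.
double root(double x) { return std::sqrt(x); }
```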
Differential Revision: https://reviews.llvm.org/D129167
WinEHPrepare marks any function call from EH funclets as unreachable, if it's not a nounwind intrinsic or has no proper funclet bundle operand. This
affects ARC intrinsics on Windows, because they are lowered to regular function calls in the PreISelIntrinsicLowering pass. It caused silent binary truncations and crashes during unwinding with the GNUstep ObjC runtime: https://github.com/gnustep/libobjc2/issues/222
This patch adds a new function `llvm::IntrinsicInst::mayLowerToFunctionCall()` that aims to collect all affected intrinsic IDs.
* Clang CodeGen uses it to determine whether or not it must emit a funclet bundle operand.
* PreISelIntrinsicLowering asserts that the function returns true for all ObjC runtime calls it lowers.
* LLVM uses it to determine whether or not a funclet bundle operand must be propagated to inlined call sites.
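A hedged usage sketch of the new query (the signature is assumed from this summary to be a static check on the intrinsic ID):

```cpp
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Plain calls always need the funclet operand bundle; intrinsics only if
// they may be lowered to a real function call (e.g. ObjC ARC intrinsics).
static bool needsFuncletBundle(const CallBase &CB) {
  if (const auto *II = dyn_cast<IntrinsicInst>(&CB))
    return IntrinsicInst::mayLowerToFunctionCall(II->getIntrinsicID());
  return true;
}
```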
Reviewed By: theraven
Differential Revision: https://reviews.llvm.org/D128190
Turning on opaque pointers has uncovered an issue with WPD where we currently pattern match away `assume(type.test)` in WPD so that a later LTT doesn't resolve the type test to undef and introduce an `assume(false)`. The pattern matching can fail in cases where we transform two `assume(type.test)`s into `assume(phi(type.test.1, type.test.2))`.
Currently we create `assume(type.test)` for all virtual calls that might be devirtualized. This is to support `-Wl,--lto-whole-program-visibility`.
To prevent this, all virtual calls that may not be in the same LTO module instead use a new `llvm.public.type.test` intrinsic in place of the `llvm.type.test`. Then when we know if `-Wl,--lto-whole-program-visibility` is passed or not, we can either replace all `llvm.public.type.test` with `llvm.type.test`, or replace all `llvm.public.type.test` with `true`. This prevents WPD from trying to pattern match away `assume(type.test)` for public virtual calls when failing the pattern matching will result in miscompiles.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D128955
We previously used the `noinline` attribute to specify some definitions
which should be kept alive in the runtime. These were then stripped
immediately in the OpenMPOpt module pass. However, since the changes in
D130298, we now explicitly state which functions will have external
visibility in the bitcode library. Additionally, the OpenMPOpt module pass
should run before the inliner pass, so this shouldn't make a difference
in whether or not the functions will be alive for the initial pass of
OpenMPOpt. This should simplify the interface, and additionally save
time spent on scanning function names for noinline.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D130368
In D129523, it was noted that there are some questionable naked casts
from Instruction to BinaryOperator, which could be addressed by doing a
dyn_cast directly to BinaryOperator, avoiding the need for the later cast.
This cleans up that casting.
Reviewed By: nikic, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D130448
In D129523, it was noted that the approach to check whether a value can
have FastMathFlags was done in different ways, and they should be made
consistent. This patch makes minor changes to fix that.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D130408
If we look at a write, we should not enact the "has been written to"
logic introduced to avoid spurious write -> read dependences. Doing so
led to the elimination of stores we needed, which is obviously bad.
The name `getEntrySamples` was misleading for 2 reasons. First, it's
close in name to `Function::getEntryCount`, but the equivalent here is
`getHeadSamples`; second, as opposed to the other get* APIs in
`FunctionSamples`, it performs an estimate/heuristic rather than just
retrieving raw data (or a non-heuristic derivative of that data, like
`getMaxCountInside`).
The new name should more clearly communicate its intent; and, being
close (in name) to `getHeadSamples`, it should allow the reader to
discover the relation between them.
Also updated the doc comments for both `getHeadSamples[Estimate]` so a
reader may better understand the relation between them.
Differential Revision: https://reviews.llvm.org/D130281
Reorganize the code to make it clear what is and isn't handled, and why.
Restructure the bailout to remove the (false and confusing) dependence on
CM_Scalarize; just return an invalid cost and propagate it; that's what it
is for.
The internalize pass supports an option to provide a list of symbols
that should not be internalized. This is useful for retaining certain
definitions that should be kept alive. However, this interface is
somewhat difficult to use as it requires knowing every single symbol's
name and specifying it. Many APIs provide common prefixes for the
symbols exported by the library, so it would make sense to be able to
match these using a simple glob pattern. This patch changes the handling
from a simple string comparison to a glob pattern match.
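A minimal sketch of glob-based matching with llvm::GlobPattern (the pass's actual plumbing differs; the prefix below is just an example):

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/GlobPattern.h"

// Keep every symbol matching a user-provided pattern, e.g. "__omp_*",
// instead of requiring each exported name to be listed exactly.
static bool isPreservedSymbol(llvm::StringRef Name, llvm::StringRef Glob) {
  llvm::Expected<llvm::GlobPattern> Pat = llvm::GlobPattern::create(Glob);
  if (!Pat) {
    llvm::consumeError(Pat.takeError()); // invalid glob: preserve nothing
    return false;
  }
  return Pat->match(Name);
}
```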
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D130319
If a function is non-recursive we only performed intra-procedural
reasoning for reachability (via AA::isPotentiallyReachable). However,
if it is re-entrant, that doesn't mean an access can't be reached from
a second invocation. Instead of this problematic logic in the
reachability reasoning, we utilize logic in AAPointerInfo: if a
location is for sure written by a function, even one that can be
re-entrant or recursive, we know only intra-procedural reasoning is
sufficient.
The existing code doesn't expect dummy values (undef, poison, null-derived
constants etc.) as arguments of these intrinsics. However, they can be there
in unreachable code. Currently we fail trying to find a base for them.
Handle these cases separately: return null as the base for them, to be
consistent with the handling in the main algorithm in findBaseDefiningValue.
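A hedged sketch of the early-out (names approximate; not copied from RewriteStatepointsForGC):

```cpp
#include "llvm/IR/Constants.h"
using namespace llvm;

// undef/poison/null-derived constants only show up here in unreachable
// code; report "no base" instead of failing while searching for one.
// (isa<UndefValue> also covers poison, a subclass of undef.)
static Value *baseForDummyValue(Value *V) {
  if (isa<UndefValue>(V) || isa<ConstantPointerNull>(V))
    return nullptr;
  return V; // real values go through the main algorithm
}
```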
Differential Revision: https://reviews.llvm.org/D129561
Reviewed By: apilipenko
If we have a dominating must-write access we do not need to know the
initial value of some object to perform reasoning about the potential
values. The dominating must-write has overwritten the initial value.
This code confuses LV's "Uniform" and LVL/LAI's "Uniform". Despite the
common name, these are different.
* LV's notion means that only the first lane *of each unrolled part* is
required. That is, lanes within a single unroll factor are considered
uniform. This allows e.g. widenable memory ops to be considered
uses of uniform computations.
* LVL and LAI's notion refers to all lanes across all unrollings.
IsUniformMem is in turn defined in terms of LAI's notion. Thus a
UniformMemOp is a memory operation with a loop-invariant address,
which means the same address is accessed in every iteration.
The tweaked piece of code was trying to match a uniform mem op (i.e.
fully loop invariant address), but instead checked for LV's notion of
uniformity. In theory, this meant with UF > 1, we could speculate
a load which wasn't safe to execute.
This ends up being mostly silent in current code as it is nearly
impossible to create a case where this difference is visible. The
closest I've come is the test case from 54cb87, but even then, the
incorrect result is only visible in the vplan debug output; before this
change we sink the unsafely speculated load back into the user's predicate
blocks before emitting IR. Both before and after IR are correct so the
differences aren't "interesting".
The other test changes are uninteresting. They're cases where LV's uniform
analysis is slightly weaker than SCEV isLoopInvariant.
This probably should have been part of D123089, but the effects of it
don't show up until we start removing functions from the table in
D130107. Oops.
Differential Revision: https://reviews.llvm.org/D130184
The InstCombine test is reduced from issue #56601. Without the more
liberal match for ConstantExpr, we try to rearrange constants in
Negator forever.
Alternatively, we could adjust the definition of m_ImmConstant to be
more conservative, but that's probably a larger patch, and I don't
see any downside to changing m_ConstantExpr. We never capture and
modify a ConstantExpr; transforms just want to avoid it.
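For reference, the distinction in matcher terms (simplified sketch using the public PatternMatch API):

```cpp
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// m_ImmConstant matches a Constant that is *not* a ConstantExpr; Negator
// uses it when a transform must not recurse into constant expressions.
// m_ConstantExpr, by contrast, matches exactly the ConstantExpr case.
static bool isImmediateConstant(Value *V) {
  Constant *C;
  return match(V, m_ImmConstant(C));
}
```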
Differential Revision: https://reviews.llvm.org/D130286
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:
1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.
Currently the default option is "disabled", but this will be
changed in a later patch.
I've added new tests to show the options behave as expected here:
Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll
Differential Revision: https://reviews.llvm.org/D129560
Replace the value-accepting isReallocLikeFn() overload with a
getReallocatedOperand() function, which returns which operand is
the one being reallocated. Currently, this is always the first one,
but once allockind(realloc) is respected, the reallocated operand
will be determined by the allocptr parameter attribute.
Remove isFreeCall() in favor of getFreedOperand(). Replace the
two remaining uses with a getFreedOperand() != nullptr check, as
they only care that something is getting freed. (The usage in DSE
is correct as such. The allocator-related checks in CFLGraph look
rather questionable in general.)
Use getFreedOperand() instead of isFreeCall() to remove the
implicit assumption that any pointer operand to a free function
is the operand being freed. This won't actually matter until we
handle allockind(free).
We currently assume in a number of places that free-like functions
free their first argument. This is true for all hardcoded free-like
functions, but with the new attribute-based design, the freed
argument is supposed to be indicated by the allocptr attribute.
To make sure we handle this correctly once allockind(free) is
respected, add a getFreedOperand() helper which returns the freed
argument, rather than just indicating whether the call frees *some*
argument.
This migrates most but not all users of isFreeCall() to the new
API. The remaining users are a bit more tricky.
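A hedged usage sketch of the new helper (signature assumed from this summary; the helper lives alongside the other MemoryBuiltins queries):

```cpp
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/InstrTypes.h"
using namespace llvm;

// Instead of `isFreeCall(CB) && CB->getArgOperand(0) == Ptr`, ask which
// operand is actually freed; once allockind("free") is respected the
// freed operand may no longer be the first argument.
static bool callFrees(const CallBase *CB, const Value *Ptr,
                      const TargetLibraryInfo *TLI) {
  return getFreedOperand(CB, TLI) == Ptr;
}
```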
Reapply the patch with getObjectSize() replaced by getAllocSize().
The former will also look through calls that return their argument,
and we'll end up placing dereferenceable attributes on intrinsics
like llvm.launder.invariant.group. While this isn't wrong, it also
doesn't seem to be particularly useful. For now, use getAllocSize()
instead, which sticks closer to the original behavior of this code.
-----
This code is just interested in the allocsize, not any other
allocator properties.
We were quite conservative when it came to PHI node handling, in order to
avoid recursive reasoning. Now we check more directly whether we have seen
a PHI already or not. This allows non-recursive PHI chains to be handled.
This also exposed a bug as we did only model the effect of one loop
traversal. `phi_no_store_3` has been adapted to show how we would have
used `undef` instead of `1` before. With this patch we don't replace
it at all, which is expected as we do not argue about loop iterations
(or alignments).
If we only have exact accesses, we should never require the bit-pattern
to be uniform (in this case 0). Only a non-exact access should force us
to require only 0 values.