The existing predicate doesn't work for a single-element
vector, so make sure we are not crossing scalar/vector types.
Test (was crashing) based on the post-commit example for:
4827771234
When X is a power-of-two or zero and zero input is poison:
ctlz(i32 X) ^ 31 --> cttz(X)
cttz(i32 X) ^ 31 --> ctlz(X)
https://alive2.llvm.org/ce/z/Cs7sFE
Need either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.
Differential Revision: https://reviews.llvm.org/D126877
This patch fixes an issue in which CorrelatedValuePropagation::processSRem
would create new instructions to represent the SRem instruction, but would not
correctly copy any existing debug location metadata to the new instruction.
Differential Revision: https://reviews.llvm.org/D132218
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D132585
Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs.
This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in 8201e3ef5c.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D129997
Originally landed as
commit 08860f525a ("[Local] Allow creating callbr with duplicate successors")
Reverted in
commit 1cf6b93df1 ("Revert "[Local] Allow creating callbr with duplicate successors"")
This is NOT nfc. Specifically, the following behavior changes:
* Pointers are now allowed. Both uniform, and constants.
* FP uniform non-constants can now be recognized.
* FP undefs are no longer considered constant. This matches int behavior which we had tests for. FP behavior was untested. Its not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.
This simplifies the code and fixes handling for the callbr case,
where the instruction needs to be inserted in the normal
destination, rather than after the terminator.
Originally part of D129660.
Transforms occasionally want to insert an instruction directly
after the definition point of a value. This involves quite a few
different edge cases, e.g. for phi nodes the next insertion point
is not the next instruction, and for invokes and callbrs its not
even in the same block. Additionally, the insertion point may not
exist at all if catchswitch is involved.
This adds a general Instruction::getInsertionPointAfterDef() API to
implement the necessary logic. For now it is used in two places
where this should be mostly NFC. I will follow up with additional
uses where this fixes specific bugs in the existing implementations.
Differential Revision: https://reviews.llvm.org/D129660
For (X op Y) op Z --> (Y op Z) op X
we can still do transform when Y is multi-use. In D131356 limit it to one-use,
this patch remove this limit.
This is still not a complete solution, I add a todo test to show it.
In this case, X and Y are both multi use, we can't differentiate how to convert based on this.
But at least we don't make the code worse,and it can solve half the scenarios.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.
Differential Revision: https://reviews.llvm.org/D132949
Removed EnableFP parameter in getOperandInfo function since it is not
needed, the operands kinds also controlled by the operation code, which
allows to remove extra check for the type of the operands. Also, added
analysis for uniform constant float values.
This change currently does not trigger any changes in the code since TTI
does not do analysis for constant floats, so it can be considered NFC.
Tested with llvm-test-suite + SPEC2017, no changes.
Differential Revision: https://reviews.llvm.org/D132886
We aleady support the transform: `(X+C1)*CI -> X*CI+C1*CI`
Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X|C1)*CI`,
so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI`
Fixes https://github.com/llvm/llvm-project/issues/57278
Reviewed By: bcl5980, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D132658
Taking the example from the test included in this patch:
$ cat test.cpp -n
1 void fun(int *a, int cond) {
2 if (cond)
3 a[1] = 1;
4 else
5 a[1] = 2;
6 }
mldst-motion will merge and sink the stores in if.then and if.else into
if.end. The resultant PHI, gep and store should be attributed line zero
with the innermost common scope rather than picking a debug location from
one of the original stores.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D132741
Current implementation promotes a non-cold function in the SampleFDO profile
into a hot function in the FDO profile. This is too aggressive. This patch
promotes a hot functions in the SampleFDO profile into a hot function, and a
warm function in SampleFDO into a warm function in FDO.
Differential Revision: https://reviews.llvm.org/D132601
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
This patch changes order of searching for reductions vs other vectorization possibilities.
The idea is if we do not match a reduction it won't be harmful for further attempts to
find vectorizable operations on a vector build sequences. But doing it in the opposite
order we have good chance to ruin opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early as 2-way vectorization
may effectively prohibit wider ones leading to producing less effective code.
Differential Revision: https://reviews.llvm.org/D132590
This fixes https://github.com/llvm/llvm-project/issues/57336. It was exposed by a recent SCEV change, but appears to have been a long standing issue.
Note that the whole insert into the loop instead of a split exit edge is slightly contrived to begin with; it's there solely because IndVarSimplify preserves the CFG.
Differential Revision: https://reviews.llvm.org/D132571
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, need to take into account the vectorized
insertelement instruction. The top of the buildvector seuences is the
topmost vectorized insertelement instruction, because it will have
> than 1 use after the vectorization.
For the affected test case improves througput from 21 to 16 (per
llvm-mca).
Differential Revision: https://reviews.llvm.org/D132740
https://alive2.llvm.org/ce/z/j_8Wz9
The arithmetic shift was converted to logical shift with:
246078604c
That does not seem to uncover any other missing/conflicting folds,
so convert directly to signbit test + cast.
We still need to fold the pattern with logical shift to test + cast.
This allows reducing patterns where the output type is not
the same as the input value:
https://alive2.llvm.org/ce/z/nydwFVFixes#57394
The use of std::clamp should be safe here. MinRZ is at most 32, while
kMaxRZ is 1 << 18, so we have MinRZ <= kMaxRZ, avoiding the undefind
behavior of std::clamp.
This addresses a suggestion to simplify the check from D131989. This
also makes it easier to ensure that VPHeaderPHIRecipe::classof checks
for all header phi ids.
This patch replaces calls to greatestCommonDivisor with std::gcd where
both arguments are known to be of unsigned. This means that
std::common_type_t of the two argument types should just be the wider
one of the two.
This patch replaces calls to GreatestCommonDivisor64 with std::gcd
where both arguments are known to be of unsigned types no larger than
64 bits in size.
Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also
adds missing checks for VPPointerInductionRecipe to
VPHeaderPHIRecipe::classof.
Split off from D119661.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D131989
If the shift constant has undefined lanes, we can assume those
are the same as the defined lanes in these transforms:
https://alive2.llvm.org/ce/z/t6TTJ2
Replace undef with poison in the test while here to support
the transition away from undef.
If constant shadown enabled we had false reports because
!isZeroValue() does not guaranty that the values is actually not zero.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D132761
MisExpect was occasionally crashing under SampleProfiling, due to a division by zero.
We worked around that in D124302 by changing the assert to an early return.
This patch is intended to add a test case for the crashing scenario and
re-enable MisExpect for SampleProfiling.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D124481
I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them.
For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPLan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of vplan uniform here.
With this commit, we now attach an `DISubprogram` to the LLVM-generated
`_NoopCoro_ResumeDestroy` function. Thereby, lldb can show a
`std::coroutine_handle` to a `std::noop_coroutine` as
```
continuation = coro frame = 0x555555560d98 {
resume = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy)
destroy = 0x0000555555555c50 (a.out`__NoopCoro_ResumeDestroy)
}
```
instead of
```
continuation = coro frame = 0x555555560d98 {
resume = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211)
destroy = 0x0000555555555c50 (a.out`___lldb_unnamed_symbol211)
}
```
I renamed the function from `NoopCoro.ResumeDestroy` to
`_NoopCoro_ResumeDestroy` because:
* the leading `_` makes sure this is a reserved name and should not
clash with any user-provided names
* the `.` was replaced by a `_`, so the name is now a valid identifier
in C, making it allows me to type its name in the debugger
Differential Revision: https://reviews.llvm.org/D132580
Adds a pass ExpandLargeDivRem to expand div/rem instructions
with more than 128 bits into a loop computing that value.
As discussed on https://reviews.llvm.org/D120327, this approach has the advantage
that it is independent of the runtime library. This also helps the clang driver,
which otherwise would need to understand enough about the runtime library
to know whether to allow _BitInts with more than 128 bits.
Targets are still free to disable this pass and instead provide a faster
implementation in a runtime library.
Fixes https://github.com/llvm/llvm-project/issues/44994
Differential Revision: https://reviews.llvm.org/D126644
Fixes https://github.com/llvm/llvm-project/issues/57221.
This limits the tryWidenCondBranchToCondBranch transform making it
work only if the false block of widenable condition branch
has no successors.
If that block has successors, then SimplifyCondBranchToCondBranch
may undo the transform done by tryWidenCondBranchToCondBranch, which
would lead to infinite cycle of transformation and eventually
an assert failing.
Differential Revision: https://reviews.llvm.org/D132356
Closing https://github.com/llvm/llvm-project/issues/57339
The root cause for this issue is an pre-mature optimization to eliminate
the index for the final suspend point since we feel like we can judge
if a coroutine is suspended at the final suspend by if resume_fn_addr is
null. However this is not true if the coroutine exists via an exception
in promise.unhandled_exception(). According to
[dcl.fct.def.coroutine]p14:
> If the evaluation of the expression promise.unhandled_exception()
> exits via an exception, the coroutine is considered suspended at the
> final suspend point.
But from the perspective of the implementation, we can't set the coro
index to the final suspend point directly since it breaks the states.
To fix the issue, we block the optimization if we find there is any
unwind coro end, which indicates that it is possible that the coroutine
exists via an exception from promise.unhandled_exception().
Test Plan: folly
Since SCEV learned to look through single value phis with
20d798bd47, whenever we add
a new input to a Phi, we should make sure that the old cached
value is dropped. Otherwise, it may lead to various miscompiles,
such as breach of dominance as shown in the bug
https://github.com/llvm/llvm-project/issues/57335
This patch complete TODO left in D66965, and achieve
related pattern for bitreverse.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D132431
The goal is to separate collecting items for post-processing
and processing them. Post processing also outlined as
dedicated method.
Differential Revision: https://reviews.llvm.org/D132603
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Relands 67504c9549 with a fix for
32-bit builds.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
The SROA algorithm won't work for Scalable Vectors, since we don't
know how many bytes are loaded/stored. Bail out if a Scalable
Vector is seen.
Differential Revision: https://reviews.llvm.org/D132417
~(A * C1) + A --> (A * (1 - C1)) - 1
This is a non-obvious mix of bitwise logic and math:
https://alive2.llvm.org/ce/z/U7ACVT
The pattern may be produced by Negator from the more typical
code seen in issue #57255.
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.
Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.
KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.
A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:
```
.c:
int f(void);
int (*p)(void) = f;
p();
.s:
.4byte __kcfi_typeid_f
.global f
f:
...
```
Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.
As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.
Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.
Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay
Differential Revision: https://reviews.llvm.org/D119296
When rebasing the review which became f79214d1, I forgot to adjust for the changed semantics introduced by 531dd3634. Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst. I noticed this while doing a follow up change.
This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.
The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others. This one was chosen mostly because it was straight forward, and none of the others seemed oviously better.
I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches.
Differential Revision: https://reviews.llvm.org/D130164
The diff modifies ext-tsp code layout algorithm in the following ways:
(i) fixes merging of cold block chains (this is a port of D129397);
(ii) adjusts the cost model utilized for optimization;
(iii) adjusts some APIs so that the implementation can be used in BOLT; this is
a prerequisite for D129895.
The only non-trivial change is (ii). Here we introduce different weights for
conditional and unconditional branches in the cost model. Based on the new model
it is slightly more important to increase the number of "fall-through
unconditional" jumps, which makes sense, as placing two blocks with an
unconditional jump next to each other reduces the number of jump instructions in
the generated code. Experimentally, this makes a mild impact on the performance;
I've seen up to 0.2%-0.3% perf win on some benchmarks.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D129893
The stronger one-use checks prevented transforms like this:
(x * y) + x --> x * (y + 1)
(x * y) - x --> x * (y - 1)
https://alive2.llvm.org/ce/z/eMhvQa
This is one of the IR transforms suggested in issue #57255.
This should be better in IR because it removes a use of a
variable operand (we already fold the case with a constant
multiply operand).
The backend should be able to re-distribute the multiply if
that's better for the target.
Differential Revision: https://reviews.llvm.org/D132412
The existing cost model for fixed-order recurrences models the phi as an
extract shuffle of a v1 vector. The shuffle produced should be a splice,
as they take two vectors inputs are extracting from a subset of the
lanes. On certain architectures the existing cost model can drastically
under-estimate the correct cost for the shuffle, so this changes it to a
SK_Splice and passes a correct Mask through to the getShuffleCost call.
I believe this might be the first use of a SK_Splice shuffle cost model
outside of scalable vectors, and some targets may require additions to
the cost-model to correctly account for them. In tree targets appear to
all have been updated where needed.
Differential Revision: https://reviews.llvm.org/D132308
The array size specification of the an alloca can be any integer,
so zext or trunc it to intptr before attempting to multiply it
with an intptr constant.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D131846
We were passing the type of `Val` to `getShadowOriginPtr`, rather
than the type of `Val`'s shadow resulting in broken IR. The fix
is simple.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D131845
This change came in a few hours ago and introduced a warning. The fix
is trivial, so I'm providing it. The original change was reviewed here:
https://reviews.llvm.org/D132331
For mingw target, if a symbol is not marked DSO local, a `.refptr` is
generated for it. This makes CFG check calls use an extra pointer
dereference, which adds extra overhead compared to the MSVC version,
so mark the CFG guard check funciton pointer DSO local to stop it.
This should have no effect on MSVC target.
Also adapt the existing cfguard tests to run for mingw targets, so that
this change is checked.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D132331
Canonicalize ((x + C1) & C2) --> ((x & C2) + C1) for suitable constants
C1 and C2, instead of the other way round. This should allow more
constant ADDs to be matched as part of addressing modes for loads and
stores.
Differential Revision: https://reviews.llvm.org/D130080
Don't demand low order bits from the LHS of an Add if:
- they are not demanded in the result, and
- they are known to be zero in the RHS, so they can't possibly
overflow and affect higher bit positions
This is intended to avoid a regression from a future patch to change
the order of canonicalization of ADD and AND.
Differential Revision: https://reviews.llvm.org/D130075
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.
This completes the client side transition to the OperandValueInfo version of this routine. Backend TTI implementations still use the prior versions for now.
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling. We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.
This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works. Target TTI implementations still use the split flags. I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
Initial implementation had too weak requirements to positive/negative
range crossings. Not crossing zero with nuw is not enough for two reasons:
- If ArLHS has negative step, it may turn from positive to negative
without crossing 0 boundary from left to right (and crossing right to
left doesn't count for unsigned);
- If ArLHS crosses SINT_MAX boundary, it still turns from positive to
negative;
In fact we require that ArLHS always stays non-negative or negative,
which an be enforced by the following set of preconditions:
- both nuw and nsw;
- positive step (looks liftable);
Because of positive step, boundary crossing is only possible from left
part to the right part. And because of no-wrap flags, it is guaranteed
to never happen.
(X op Y) op Z --> (Y op Z) op X
This isn't a complete solution (see TODO tests for possible refinements),
but it shows some nice wins and doesn't seem to cause any harm. I think
the most potential danger is from conflicting with other folds and causing
an infinite loop - that's the reason for avoiding patterns with constant
operands.
Alternatively, we could try this in the reassociate pass, but we would not
immediately see all of the logic folds that instcombine provides. I also
looked at improving ValueTracking's isImpliedCondition() (and we should
still add some enhancements there), but that would not work in general for
bitwise logic reduction.
The tests that reduce completely to 0/-1 are motivated by issue #56653.
Differential Revision: https://reviews.llvm.org/D131356
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.
Differential Revision: https://reviews.llvm.org/D132287
SLP vectorizer tries to find the reductions starting the operands of the
instructions with no-users/void returns/etc. But such operands can be
postponable instructions, like Cmp, InsertElement or InsertValue. Such
operands still must be postponed, vectorizer should not try to vectorize
them immediately.
Differential Revision: https://reviews.llvm.org/D131965
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.
Differential Revision: https://reviews.llvm.org/D126885
(~A | C) | (A ^ B) --> ~(A & B) | C
https://alive2.llvm.org/ce/z/Qw3aiJ
This extends the existing fold (just above the new match)
to peek through another 'or' instruction.
This should let the motivating case from issue #57174
simplify completely.
Use `std::clamp` directly from the standard library now that LLVM is built with
C++17 standards mode.
Differential Revision: https://reviews.llvm.org/D131869
If the incoming previous value of a fixed-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous, all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.
At the moment, fixed-order recurrences are modeled as a chain of
first-order recurrences.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D119661
The code would use first non-phi instruction as an insertion point, however
this could lead to freeze getting inserted between phi and landingpad
causing a verifier assert.
Differential Revision: https://reviews.llvm.org/D132105
This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is *not* an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it *not* to be a predicated instruction, but *were* considering it to be scalable with predication.
As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious.
Differential Revision: https://reviews.llvm.org/D131093
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.
Original Patch by @samparker (Sam Parker)
Differential Revision: https://reviews.llvm.org/D79483
NewGVN tables are not cleared out between the initial run of NewGVN and the verification. In case of phi-of-ops optimization, OpSafeForPHIOfOps goes out of sync between the two runs. One operand might not be safe for one basic block, but it might be safe for one of its successors. In this case, the operand will be added in OpSafeForPHIOfOps map. In verification phase, we reuse OpSafeForPHIOfOps without updating it again. As a result, the operand will be considered safe for phi-of-ops optimization even for the case that it is not. This patch fixes this problem.
Fix for 53807.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D130910
In D131869 we noticed that we jump through some hoops because we parse the
tolerance option used in MisExpect.cpp into a 64-bit integer. This is
unnecessary, since the value can only be in the range [0, 100).
This patch changes the underlying type to be 32-bit from where it is
parsed in Clang through to it's use in LLVM.
Reviewed By: jloser
Differential Revision: https://reviews.llvm.org/D131935
If a function only has a few instructions, instrumentation can significantly increase the size and performance overhead of that function. Add the `-pgo-function-size-threshold` option to select a size threshold so these small functions are not instrumented.
A similar option `-fxray-instruction-threshold=<N>` is used for XRay to reduce binary size overhead [1].
[1] https://www.llvm.org/docs/XRay.html
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D131816
Currently, clang ignores the 0 initialisation in finite math
For example:
```
double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
f_prod *= arr[i];
}
```
Clang will ignore that `f_prod` is set to zero and it will generate assembly to iterate over the loop.
Reviewed By: fhahn, spatel
Differential Revision: https://reviews.llvm.org/D131672
We don't have a dominator tree in this pass, so we
can't bail out sooner by checking for unreachable
code, but this is a minimal fix for the example in
issue #56875.
Currently, we try to vectorize values, feeding into stores, only if
slp-vectorize-hor-store option is provided. We can safely enable
vectorization of the value operand of a single store in the basic block,
if the operand value is used only in store.
It should enable extra vectorization and should not increase compile
time significantly.
Fixes https://github.com/llvm/llvm-project/issues/51320
Differential Revision: https://reviews.llvm.org/D131894
This should fix these build bot errors:
Step 6 (build-check-mlir-build-only) failure: build (failure)
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(124): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(129): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1386): warning C4996: 'llvm::Optional<llvm::fp::ExceptionBehavior>::getValue': Use value instead.
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\Transforms\Scalar\EarlyCSE.cpp(1388): warning C4996: 'llvm::Optional<llvm::RoundingMode>::getValue': Use value instead.
Previously we would only CSE constrained FP intrinsics in the default
floating point environment. Exception behavior of "strict" is still not
allowed since we are not allowed to remove any traps in that case.
There are no restrictions on CSE across function calls inside a function.
Differential Revision: https://reviews.llvm.org/D112256
Using the legacy PM for the optimization pipeline is deprecated and in
the process of being removed. This is a small step in that direction.
For an example of migrating to the new PM:
853b57fe80
The basic patterns look like this:
https://alive2.llvm.org/ce/z/MDj9EC
The tests have a use of the overflow value too.
Otherwise, existing folds should reduce already.
This was noted as a missing IR fold in:
926e7312b2
Hopefully, this makes it easier to implement a backend
fix because we should get the same IR regardless of
whether the source used builtins or inline code.