This is an unusual canonicalization because we create an extra instruction,
but it's likely better for analysis and codegen (similar reasoning as D133399).
InstCombine::Negator may create this kind of multiply from negate and shift,
but this should not conflict because of the narrow negation.
I don't know how to create a fully general proof for this kind of transform in
Alive2, but here's an example with bitwidths similar to one of the regression
tests:
https://alive2.llvm.org/ce/z/J3jTjR
Differential Revision: https://reviews.llvm.org/D133667
We don't combine generic shuffles together in IR, but select
shuffles are a special-case because a select shuffle of a
select shuffle is just another select shuffle; codegen is
expected to efficiently lower those (select shuffles are also
the canonical form of a vector select with constant condition).
This is a test to verify that we do not crash with the
problem noted in issue #57986. The root problem should
be fixed with a prior change to InstSimplify.
For the case where the constant is a power of two rather than zero,
the fold is incorrect, because it fails to check that the bit set
in the LHS matches the bit in the RHS.
Rather than fixing this, remove the power of two handling entirely,
as a different fold will already canonicalize such comparisons to
use a zero constant.
Fixes https://github.com/llvm/llvm-project/issues/57899.
Perform the simplifyWithOpReplaced() fold even for non-bool
selects. This subsumes a number of recently added folds for
zext/sext of the condition.
We still need to manually handle variations with both sext/zext
and not, because simplifyWithOpReplaced() only performs one
level of replacements.
We can handle vectors inside simplifyWithOpReplaced(), as long as
cross-lane operations are excluded. The equality can hold (or not
hold) for each vector lane independently, so we shouldn't use the
replacement value from other lanes.
I believe the only operations relevant here are shufflevector (where
all previous bugs were seen) and calls (which might use shuffle-like
intrinsics and would require more careful classification).
Differential Revision: https://reviews.llvm.org/D134348
`(A * -2**C) + B --> B - (A << C)`
https://alive2.llvm.org/ce/z/A6BWkf
This inverts what Negator was doing before:
D134310 / 0f32a5dea0
Analysis and codegen are generally better without multiply,
so we should favor this form even if we trade add for sub
(because those are generally equivalent cost operations).
This stops Negator from transforming:
`C1 - shl X, C2 --> mul X, (1<<C2) + C1`
...in the general case. There does not seem to be any analysis
benefit to using mul in IR, and there's definitely downside in
codegen (particularly when the multiply has to be expanded).
If `C1` is 0, then there's a stronger argument that the single
mul is a better canonicalization than negate-of-shl, but we may
want to remove that too.
This was noted as a potential conflict for D133667.
Differential Revision: https://reviews.llvm.org/D134310
These patterns were previously only implemented for i1 type but can be extended for any integer type by also handling zext and sext operands.
Differential Revision: https://reviews.llvm.org/D134142
If one of the operands in a matrix multiplication is negated we can optimise the equation by moving the negation to the smallest element of the operands or the result.
Reviewed By: spatel, fhahn
Differential Revision: https://reviews.llvm.org/D133300
This was originally part of D133788. There are no visible
regressions. All of the diffs show a large unsigned constant
becoming a small negative constant. This should be better
for analysis (and slightly less compile-time) and codegen.
This is a disguised sign-bit test with offset:
(X / +DivC) >> (Width - 1) --> ext (X <= -DivC)
(X / -DivC) >> (Width - 1) --> ext (X >= +DivC)
https://alive2.llvm.org/ce/z/cO8JO4
We don't match/test poison in the sdiv constant because
that would be immediate undefined behavior.
This is safe when the mul does not overflow:
https://alive2.llvm.org/ce/z/LedVVP
This could be extended to handle non-zero compare constants
and non-squared multiplies.
This patch enables a multi-use demanded bits fold (motivated by issue #57576):
https://alive2.llvm.org/ce/z/DsZakh
This mimics transforms that we already do on the single-use path.
Originally, this patch did not include the last part to form a constant, but
that can be removed independently to reduce risk. It's not clear what the
effect of either change will be when viewed end-to-end.
This is expected to be neutral or a slight win for compile-time.
See the "add-demand2" series for experimental timing results:
https://llvm-compile-time-tracker.com/?config=NewPM-O3&stat=instructions&remote=rotateright
Differential Revision: https://reviews.llvm.org/D133788
Fixes#57531
This transformation may be particularly useful on x86-64,
because x & (x - 1) can be performed by a single blsr instruction.
Differential Revision: https://reviews.llvm.org/D133362
The LLVM performance tips suggest that allocas should be placed at the
beginning of the entry block. So far, llvm doesn’t provide any helper to
find that position.
Add BasicBlock::getFirstNonPHIOrDbgOrAlloca and IRBuilder::SetInsertPointPastAllocas(Function*)
that get an insert position after the (static) allocas at the start of a
function and use it in ShadowStackGCLowering.
Differential Revision: https://reviews.llvm.org/D132554
(trunc (1 << Y) to iN) == 2**C --> Y == C
(trunc (1 << Y) to iN) != 2**C --> Y != C
https://alive2.llvm.org/ce/z/xnFPo5
Follow-up to d9e1f9d759. This was a suggested
enhancement mentioned in issue #51889.
LLVM contains a helpful function for getting the size of a C-style
array: `llvm::array_lengthof`. This is useful prior to C++17, but not as
helpful for C++17 or later: `std::size` already has support for C-style
arrays.
Change call sites to use `std::size` instead.
Differential Revision: https://reviews.llvm.org/D133429
(trunc (1 << Y) to iN) == 0 --> Y u>= N
(trunc (1 << Y) to iN) != 0 --> Y u< N
These can be generalized in several ways as noted by the TODO
items, but this handles the pattern in the motivating bug report.
Fixes#51889
Differential Revision: https://reviews.llvm.org/D115480
This transform came up as a potential DAGCombine in D133282,
so I wanted to see how it escaped in IR too.
We do general folds in InstCombiner::SimplifySelectsFeedingBinaryOp()
by checking if either arm of a select simplifies when the trailing
binop is threaded into the select.
So as long as one side simplifies, it's a good fold to combine a
negate and add into 1 subtract.
This is an example with a zero arm in the select:
https://alive2.llvm.org/ce/z/Hgu_Tj
And this models the tests with a cancelling 'not' op:
https://alive2.llvm.org/ce/z/BuzVV_
Differential Revision: https://reviews.llvm.org/D133369
This pattern is handled more generally in SimplifySelectsFeedingBinaryOp().
Tests to confirm that added to the add.ll test file in the previous commit.
This reverts commit c911befaec.
It has broken LLDB Arm/AArch64 Linux buildbots. I dont really understand
the underlying reason. Reverting for now make buildbot green.
https://reviews.llvm.org/D133036
When we do extractvalue (any_mul_with_overflow X, -1) --> (-X and icmp),
which left partly failed to match vector constant with poison element.
This patch try to fix it.
Alive2: https://alive2.llvm.org/ce/z/2rGp_3
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D132996
The existing predicate doesn't work for a single-element
vector, so make sure we are not crossing scalar/vector types.
Test (was crashing) based on the post-commit example for:
4827771234
When X is a power-of-two or zero and zero input is poison:
ctlz(i32 X) ^ 31 --> cttz(X)
cttz(i32 X) ^ 31 --> ctlz(X)
https://alive2.llvm.org/ce/z/Cs7sFE
Transforms occasionally want to insert an instruction directly
after the definition point of a value. This involves quite a few
different edge cases, e.g. for phi nodes the next insertion point
is not the next instruction, and for invokes and callbrs its not
even in the same block. Additionally, the insertion point may not
exist at all if catchswitch is involved.
This adds a general Instruction::getInsertionPointAfterDef() API to
implement the necessary logic. For now it is used in two places
where this should be mostly NFC. I will follow up with additional
uses where this fixes specific bugs in the existing implementations.
Differential Revision: https://reviews.llvm.org/D129660
For (X op Y) op Z --> (Y op Z) op X
we can still do transform when Y is multi-use. In D131356 limit it to one-use,
this patch remove this limit.
This is still not a complete solution, I add a todo test to show it.
In this case, X and Y are both multi use, we can't differentiate how to convert based on this.
But at least we don't make the code worse,and it can solve half the scenarios.
We aleady support the transform: `(X+C1)*CI -> X*CI+C1*CI`
Here the case is a little special as the form of `(X+C1)*CI` is transformed into `(X|C1)*CI`,
so we should also support the transform: `(X|C1)*CI -> X*CI+C1*CI`
Fixes https://github.com/llvm/llvm-project/issues/57278
Reviewed By: bcl5980, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D132658
https://alive2.llvm.org/ce/z/j_8Wz9
The arithmetic shift was converted to logical shift with:
246078604c
That does not seem to uncover any other missing/conflicting folds,
so convert directly to signbit test + cast.
We still need to fold the pattern with logical shift to test + cast.
This allows reducing patterns where the output type is not
the same as the input value:
https://alive2.llvm.org/ce/z/nydwFVFixes#57394