This was originally intended with D48893, but as discussed there, we
have to make the folds safe from producing extra poison. This should
give the single binop folds the same capabilities as the existing
folds for 2-binops+shuffle.
LLVM binary opcode review: there are a total of 18 binops. There are 7
commutative binops (add, mul, and, or, xor, fadd, fmul) which we already
fold. We're able to fold 6 more opcodes with this patch (shl, lshr, ashr,
fdiv, udiv, sdiv). There are no folds for srem/urem/frem AFAIK. We don't
bother with sub/fsub with constant operand 1 because those are
canonicalized to add/fadd. 7 + 6 + 3 + 2 = 18.
llvm-svn: 336684
The case with 2 variables is more complicated than the case where
we eliminate the shuffle entirely because a shuffle with an undef
mask element creates an undef result.
I'm not aware of any current analysis/transform that recognizes that
undef propagating to a div/rem/shift, but we have to guard against
the possibility.
llvm-svn: 336668
getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.
llvm-svn: 336617
As noted in D48987, there are many different ways for this transform to go wrong.
In particular, the poison potential for shifts means we have to more careful with those ops.
I added tests to make that behavior visible for all of the different cases that I could find.
This is a partial fix. To make this review easier, I did not make changes for the single binop
pattern (handled in foldSelectShuffleWith1Binop()). I also left out some potential optimizations
noted with TODO comments. I'll follow-up once we're confident that things are correct here.
The goal is to correct all marked FIXME tests to either avoid the shuffle transform or do it safely.
Note that distinguishing when the shuffle mask contains undefs and using getBinOpIdentity() allows
for some improvements to div/rem patterns, so there are wins along with the missed opportunities
and fixes.
Differential Revision: https://reviews.llvm.org/D49047
llvm-svn: 336546
As discussed in D48987 and D48893, there are many different
ways to go wrong depending on the binop (and as shown here
we already do go wrong in some cases).
llvm-svn: 336450
As the test diffs show, the current users of getBinOpIdentity()
are InstCombine and Reassociate. SLP vectorizer is a candidate
for using this functionality too (D28907).
The InstCombine shuffle improvements are part of the planned
enhancements noted in D48830.
InstCombine actually has several other uses of getBinOpIdentity()
via SimplifyUsingDistributiveLaws(), but we don't call that for
any FP ops. Fixing that might be another part of removing the
custom reassociation in InstCombine that is only done for fadd+fmul.
llvm-svn: 336215
This is the last significant change suggested in PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806#c5
...though there are several follow-ups noted in the code comments
in this patch to complete this transform.
It's possible that a binop feeding a select-shuffle has been eliminated
by earlier transforms (or the code was just written like this in the 1st
place), so we'll fail to match the patterns that have 2 binops from:
D48401,
D48678,
D48662,
D48485.
In that case, we can try to materialize identity constants for the remaining
binop to fill in the "ghost" lanes of the vector (where we just want to pass
through the original values of the source operand).
I added comments to ConstantExpr::getBinOpIdentity() to show planned follow-ups.
For now, we only handle the 5 commutative integer binops (add/mul/and/or/xor).
Differential Revision: https://reviews.llvm.org/D48830
llvm-svn: 336196
This extends D48485 to allow another pair of binops (add/or) to be combined either
with or without a leading shuffle:
or X, C --> add X, C (when X and C have no common bits set)
Here, we need value tracking to determine that the 'or' can be reversed into an 'add',
and we've added general infrastructure to allow extending to other opcodes or moving
to where other passes could use that functionality.
Differential Revision: https://reviews.llvm.org/D48662
llvm-svn: 336128
Due to current limitations in constant analysis, we need flags
on add or mul to show propagation for the potential transform
suggested in these tests (no other binops currently report
identity constants).
llvm-svn: 336101
This was discussed in D48401 as another improvement for:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have 2 different variable values, then we shuffle (select) those lanes,
shuffle (select) the constants, and then perform the binop. This eliminates a binop.
The new shuffle uses the same shuffle mask as the existing shuffle, so there's no
danger of creating a difficult shuffle.
All of the earlier constraints still apply, but we also check for extra uses to
avoid creating more instructions than we'll remove.
Additionally, we're disallowing the fold for div/rem because that could expose a
UB hole.
Differential Revision: https://reviews.llvm.org/D48678
llvm-svn: 335974
This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806
We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other
direction because that's generally better of course). This allows us to remove the shuffle
as we do in the regular opcodes-are-the-same cases.
This requires a small hack to make sure we don't introduce any extra poison:
https://rise4fun.com/Alive/ZGv
Other examples of opcodes where this would work are add+sub and fadd+fsub, but we already
canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. There
are planned enhancements for opcode transforms such or -> add.
Note that there's a different fold needed if we've already managed to simplify away a binop
as seen in the test based on PR37806, but we manage to get that one case here because this
fold is positioned above the demanded elements fold currently.
Differential Revision: https://reviews.llvm.org/D48485
llvm-svn: 335888
This one shows another pattern that we'll need to match
in some cases, but the current ordering of folds allows
us to match this as 2 binops before simplification takes
place.
llvm-svn: 335347
With non-commutative binops, we could be using the same
variable value as operand 0 in 1 binop and operand 1 in
the other, so we have to check for that possibility and
bail out.
llvm-svn: 335312
This is the simplest case from PR37806:
https://bugs.llvm.org/show_bug.cgi?id=37806
If we have a common variable operand used in a pair of binops with vector constants
that are vector selected together, then we can constant shuffle the constant vectors
to eliminate the shuffle instruction.
This has some tricky parts that are hopefully addressed in the tests and their
respective comments:
1. If the shuffle mask contains an undef element, then that lane of the result is
undef:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
Therefore, we can replace the constant in that lane with an undef value except
for div/rem. With div/rem, an undef in the divisor would cause the whole op to
be undef. So I'm using the same hack as in D47686 - replace the undefs with '1'.
2. Intersect the wrapping and FMF of the original binops for the new binop. There
should be no extra poison or fast-math potential in the new binop that wasn't
possible in the original code.
3. Disregard other uses. Given that we're eliminating uses (shortening the
dependency chain), I think that's always the right IR canonicalization. But
I purposely chose the udiv test to demonstrate the scenario where both
intermediate values have other uses because that seems likely worse for
codegen with an expensive math op. This seems like a very rare possibility to
me, so I don't think it requires a backend patch first.
Differential Revision: https://reviews.llvm.org/D48401
llvm-svn: 335283