Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 362143
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 362142
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62664
llvm-svn: 362133
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62663
llvm-svn: 362132
Summary:
This was the root cause of the endless combine loop in D62257
https://rise4fun.com/Alive/d3W
Reviewers: RKSimon, spatel, craig.topper, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62662
llvm-svn: 362131
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.
Reviewers: spatel, RKSimon, craig.topper, t.p.northover
Reviewed By: craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62661
llvm-svn: 362130
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.
Some cases may highlight missing constant-folds.
Reverts r361871,r361872,r361873,r361874.
llvm-svn: 362109
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361874
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361873
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361872
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361871
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..
https://rise4fun.com/Alive/AAq
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, JDevlieghere, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62294
llvm-svn: 361856
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.
It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..
https://rise4fun.com/Alive/ZRl
Reviewers: RKSimon, craig.topper, spatel, arsenm
Reviewed By: RKSimon, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62263
llvm-svn: 361855
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.
No surprising test changes.
https://rise4fun.com/Alive/pbT
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62257
llvm-svn: 361854
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?
The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`
https://rise4fun.com/Alive/ffh
Reviewers: RKSimon, craig.topper, spatel, t.p.northover
Reviewed By: t.p.northover
Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62252
llvm-svn: 361853
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.
AArch64 test changes all look good (`neg` created), or neutral.
X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).
I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.
I'm unable to interpret AMDGPU change, looks neutral-ish?
This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].
https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)
Reviewers: craig.topper, RKSimon, spatel, arsenm
Reviewed By: RKSimon
Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62223
llvm-svn: 361852
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
This commit was reverted because of the build failure.
The reason was mlformed patch.
Build failure fixed.
llvm-svn: 361741
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For the divergent targets
same value type requires different register classes dependent on the value divergence.
Reviewers: rampitec, nhaehnle
Differential Revision: https://reviews.llvm.org/D59990
llvm-svn: 361644
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.
computeKnownBits then uses this function to improve codegen, notably vector code after legalization.
A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.
This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.
Differential Revision: https://reviews.llvm.org/D61887
llvm-svn: 361620
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.
llvm-svn: 361547
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.
This was noted as a potential risk (and that risk might increase) with:
D62191
llvm-svn: 361268
This changes the isShift variable to include the constant operand
check that was previously in the if statement.
While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.
llvm-svn: 361168
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
https://rise4fun.com/Alive/ml6
So, why is it there then?
This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?
Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin
Reviewed By: spatel
Subscribers: javed.absar, llvm-commits, spatel
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61918
llvm-svn: 361044
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.
The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling though the cracks because of
combine ordering and multiple uses.
llvm-svn: 360585
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().
I think this comes from a bad copy/paste, Sorry for not catching that during the
code review.
Fixes PR41855.
Reviewers: niravd, gchatelet, EricWF
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61846
llvm-svn: 360566
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:
t56: v8i32 = add t59, t58
t68: v4i64 = bitcast t56
t73: v2i64 = extract_subvector t68, Constant:i64<2>
t96: v4i32 = bitcast t73
There are a few wins and neutral diffs in the other tests.
Differential Revision: https://reviews.llvm.org/D61806
llvm-svn: 360541
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61511
llvm-svn: 360357
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check.
Differential Revision: https://reviews.llvm.org/D61236
llvm-svn: 360226
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.
This patch adds some limits to the number of nodes explored for the
cases mentioned above.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61397
llvm-svn: 360171
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent it's creation.
Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.
llvm-svn: 360090
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).
Differential Revision: https://reviews.llvm.org/D61398
llvm-svn: 360054
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.
This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.
Original commit message:
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
Differential Revision: https://reviews.llvm.org/D61149
llvm-svn: 359793
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry),
if adde is not legal for the target. Even it's at type-legalize phase.
Because adde is special and will not be legalized at operation-legalize phase later.
This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922
Differential Revision: https://reviews.llvm.org//D60854
llvm-svn: 359532
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.
Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.
Reviewers: spatel, craig.topper, tstellar
Reviewed By: spatel
Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61199
llvm-svn: 359476
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
Differential Revision: https://reviews.llvm.org/D61149
llvm-svn: 359398
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).
Differential Revision: https://reviews.llvm.org/D61068
llvm-svn: 359293
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.
That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).
Differential Revision: https://reviews.llvm.org/D61028
llvm-svn: 359147
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.
Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.
llvm-svn: 358984
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.
This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).
To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.
Reviewers: spatel, RKSimon, craig.topper, kparzysz
Reviewed By: spatel
Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59758
llvm-svn: 358965
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.
shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask
preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair
llvm-svn: 358526
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect it had no known observable impact, so no tests were
changed by this patch.
Differential Revision: https://reviews.llvm.org/D60314
llvm-svn: 358502
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.
The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).
Differential Revision: https://reviews.llvm.org/D60545
llvm-svn: 358291
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...
The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.
Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.
It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.
We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.
Differential Revision: https://reviews.llvm.org/D60514
llvm-svn: 358172
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.
Differential Revision: https://reviews.llvm.org/D60020
llvm-svn: 358027
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.
For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.
Other targets should likely enable the hook in a similar way.
Differential Revision: https://reviews.llvm.org/D60150
llvm-svn: 357760
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).
As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.
llvm-svn: 357580
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.
Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
Reviewed By: jyknight
Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58070
llvm-svn: 357283
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.
Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.
Reviewers: efriedma, RKSimon, craig.topper, jyknight
Reviewed By: jyknight
Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58068
llvm-svn: 357279
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).
We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).
It looks like we miss this pattern in IR too.
In one of the zext examples here, we have shuffle masks like this:
Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>
...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.
Differential Revision: https://reviews.llvm.org/D59961
llvm-svn: 357258
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.
%z = zext i8 %x to i32
%dec = add i32 %z, -1
%r = sext i32 %dec to i64
=>
%z2 = zext i8 %x to i64
%r = add i64 %z2, -1
https://rise4fun.com/Alive/kPP
The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.
But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.
llvm-svn: 357254
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.
Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.
Differential Revision: https://reviews.llvm.org/D59654
llvm-svn: 357161
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.
This is mostly NFC.
Reviewers: courbet
Reviewed By: courbet
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59794
llvm-svn: 357070
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.
Patch by Ulrich Weigand.
Review: Sanjay Patel
https://reviews.llvm.org/D59822
llvm-svn: 357052
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.
Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.
Many minor changes, mostly positive.
llvm-svn: 356996
This helps us relax the extension of a lot of scalar elements before they are inserted into a vector.
Its exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions.
Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes.
Differential Revision: https://reviews.llvm.org/D59484
llvm-svn: 356989
SDNodes can only have 64k operands and for some inputs (e.g. large
number of stores), we can reach this limit when creating TokenFactor
nodes. This patch is a follow up to D56740 and updates a few more places
that potentially can create TokenFactors with too many operands.
Reviewers: efriedma, craig.topper, aemerson, RKSimon
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D59156
llvm-svn: 356668
In r311255 we added a case where we split vectors whose elements are
all derived from the same input vector so that we could shuffle it
more efficiently. In doing so, createBuildVecShuffle was taught to
adjust for the fact that all indices would be based off of the first
vector when this happens, but it's possible for the code that checked
that to fire incorrectly if we happen to have a BUILD_VECTOR of
extracts from subvectors and don't hit this new optimization.
Instead of trying to detect if we've split the vector by checking if
we have extracts from the same base vector, we can just pass that
information into createBuildVecShuffle, avoiding the miscompile.
Differential Revision: https://reviews.llvm.org/D59507
llvm-svn: 356476
These changes are related to PR37743 and include:
SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
Add promoting the integer ABS node in the LegalizeIntegerType.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Add some integer abs testcases for different typesizes for Thumb arch
Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA, Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)
The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).
Change integer abs testcases for codegen with the ABS node support for AArch64.
Indicate that the ABS is legal for the i64 type when the NEON is supported.
Change the integer abs testcases to show changing of codegen.
Add combine and legalization of ABS nodes for Thumb arch.
Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743
Patch by: @ikulagin (Ivan Kulagin)
Differential Revision: https://reviews.llvm.org/D49837
llvm-svn: 356468
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
Delete temporarily constructed node uses for analysis after it's use,
holding onto original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.
Reviewers: spatel, RKSimon, craig.topper
Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57921
llvm-svn: 356382
Fold (x & ~y) | y and it's four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.
This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.
Differential Revision: https://reviews.llvm.org/D59174
llvm-svn: 356333
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.
llvm-svn: 356327
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.
Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.
Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.
This also includes a few more changes to make this work somewhat
reasonably:
* Add support for expanding VECREDUCE in SDAG. Usually
experimental.vector.reduce is expanded prior to codegen, but if the
target does have native vector reduce, it may of course still be
necessary to expand due to legalization issues. This uses a shuffle
reduction if possible, followed by a naive scalar reduction.
* Allow the result type of integer VECREDUCE to be larger than the
vector element type. For example we need to be able to reduce a v8i8
into an (nominally) i32 result type on AArch64.
* Use the vector operand type rather than the scalar result type to
determine the action, so we can control exactly which vector types are
supported. Also change the legalize vector op code to handle
operations that only have vector operands, but no vector results, as
is the case for VECREDUCE.
* Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
explicitly specify for which vector types the reductions are supported.
This does not handle anything related to VECREDUCE_STRICT_*.
Differential Revision: https://reviews.llvm.org/D58015
llvm-svn: 355860