Summary:
Eli pointed out that it's unsafe to combine the shifts to ISD::SHL etc.,
because those are not defined for b > sizeof(a) * 8, even after some of
the combiners run.
However, PPCISD::SHL defines that behavior (as the instructions themselves).
Move the combination to the backend.
The tests in shift_mask.ll still pass.
Reviewers: echristo, hfinkel, efriedma, iteratee
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D33076
llvm-svn: 302937
This reverts r302712.
The change fails with ASAN enabled:
ERROR: AddressSanitizer: use-after-poison on address ... at ...
READ of size 2 at ... thread T0
#0 ... in llvm::SDNode::getNumValues() const <snip>/include/llvm/CodeGen/SelectionDAGNodes.h:855:42
#1 ... in llvm::SDNode::hasAnyUseOfValue(unsigned int) const <snip>/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7270:3
#2 ... in llvm::SDValue::use_empty() const <snip> include/llvm/CodeGen/SelectionDAGNodes.h:1042:17
#3 ... in (anonymous namespace)::DAGCombiner::MergeConsecutiveStores(llvm::StoreSDNode*) <snip>/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12944:7
Reviewers: niravd
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33081
llvm-svn: 302746
Summary:
Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.
Fixes PR32086.
Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30471
llvm-svn: 302712
Before r247167, the pass manager builder controlled which AA
implementations were used, exporting them all in the AliasAnalysis
analysis group.
Now, AAResultsWrapperPass always uses BasicAA, but still uses other AA
implementations if made available in the pass pipeline.
But regardless, SDAGISel is required at O0, and really doesn't need to
be doing fancy optimizations based on useful AA results.
Don't require AA at CodeGenOpt::None, and only use it otherwise.
This does have a functional impact (and one testcase is pessimized
because we can't reuse a load). But I think that's desirable no matter
what.
Note that this alone doesn't result in less DT computations: TwoAddress
was previously able to reuse the DT we computed for SDAG. That will be
fixed separately.
Differential Revision: https://reviews.llvm.org/D32766
llvm-svn: 302611
Remove an extra canonicalization step if ISD::ABS is going to be used anyway.
Updated x86 abs combine to check that we are lowering from both canonicalizations.
llvm-svn: 302337
Summary: Do the transform when the carry isn't used. It's a pattern exposed when legalizing large integers.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32755
llvm-svn: 302047
Summary:
This is the corresponding llvm change to D28037 to ensure no performance
regression.
Reviewers: bogner, kbarton, hfinkel, iteratee, echristo
Subscribers: nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28329
llvm-svn: 301990
Summary: This is a common pattern that arise when legalizing large integers operations. Only do it when Y + 1 cannot overflow as this would change the carry behavior of uaddo .
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32687
llvm-svn: 301922
Summary: Common pattern when legalizing large integers operations. Similar to D32687, when the carry isn't used.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Differential Revision: https://reviews.llvm.org/D32738
llvm-svn: 301919
The existing code only looks at half of the tree when matching bswap + rol patterns ending in an OR tree (as opposed to a cascade).
Patch originally introduced by Jim Lewis.
Submitted on the behalf of Dinar Temirbulatov.
Differential Revision: https://reviews.llvm.org/D32039
llvm-svn: 301907
We discussed shrinking/widening of selects in IR in D26556, and I'll try to get back to that
patch eventually. But I'm hoping that this transform is less iffy in the DAG where we can check
legality of the select that we want to produce.
A few things to note:
1. We can't wait until after legalization and do this generically because (at least in the x86
tests from PR14657), we'll have PACKSS and bitcasts in the pattern.
2. This might benefit more of the SSE codegen if we lifted the legal-or-custom requirement, but
that requires a closer look to make sure we don't end up worse.
3. There's a 'vblendv' opportunity that we're missing that results in andn/and/or in some cases.
That should be fixed next.
4. I'm assuming that AVX1 offers the worst of all worlds wrt uneven ISA support with multiple
legal vector sizes, but if there are other targets like that, we should add more tests.
5. There's a codegen miracle in the multi-BB tests from PR14657 (the gcc auto-vectorization tests):
despite IR that is terrible for the target, this patch allows us to generate the optimal loop
code because something post-ISEL is hoisting the splat extends above the vector loops.
Differential Revision: https://reviews.llvm.org/D32620
llvm-svn: 301781
Summary: As per discution on how to get better codegen an large int legalization, it became clear that using a glue for the carry was preventing several desirable optimizations. Passing the carry down as a value allow for more flexibility.
Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29872
llvm-svn: 301775
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
This patch uses various APInt methods to reduce the number of temporary APInts. These were all found while working through converting SelectionDAG's computeKnownBits to also use the KnownBits struct recently added to the ValueTracking version.
llvm-svn: 301618
Besides better codegen, the motivation is to be able to canonicalize this pattern
in IR (currently we don't) knowing that the backend is prepared for that.
This may also allow removing code for special constant cases in
DAGCombiner::foldSelectOfConstants() that was added in D30180.
Differential Revision: https://reviews.llvm.org/D31944
llvm-svn: 301457
While we use BaseIndexOffset in FindBetterNeighborChains to
appropriately realize they're almost the same address and should be
improved concurrently we do not use it in isAlias using the non-index
understanding FindBaseOffset instead. Adding a BaseIndexOffset check
in isAlias like should allow indexed stores to be merged.
FindBaseOffset to be excised in subsequent patch.
Reviewers: jyknight, aditya_nandakumar, bogner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31987
llvm-svn: 301187
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
I've changed one of the tests to not fold away, but we didn't and still don't do the transform
that the comment claims we do (and I don't know why we'd want to do that).
Follow-up to:
https://reviews.llvm.org/rL300725https://reviews.llvm.org/rL300763
llvm-svn: 300772
This allows forming more 'not' ops, so we get improvements for ISAs that have and-not.
Follow-up to:
https://reviews.llvm.org/rL300725
llvm-svn: 300763
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in
SimplifyDemandedBits().
Some notes for reference:
1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions.
Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since
they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards
improving the vector codegen in that patch without writing any actual new code.
Differential Revision: https://reviews.llvm.org/D32230
llvm-svn: 300725
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.
Reviewers: jyknight, bogner, javed.absar, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32086
llvm-svn: 300561
This is a follow-on to r299096 which added support for fmadd.
Subtract does not have the case where with two multiply operands we commute in
order to fuse with the multiply with the fewer uses.
llvm-svn: 299572