Commit Graph

3329 Commits

Author SHA1 Message Date
Sanjay Patel e60cb7d1be [InstSimplify] insertelement V, undef, ? --> V
This was part of InstCombine, but it's better placed in
InstSimplify. InstCombine also had an unreachable but weaker
fold for insertelement with undef index, so that is deleted.

llvm-svn: 361559
2019-05-23 21:49:47 +00:00
Sanjay Patel 3249be1e03 [InstCombine] be more careful when transforming a shuffle mask
This is reduced from a fuzzer test:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14890

Usually, demanded elements should be able to simplify shuffle
mask elements that are pointing to undef elements of its source
operands, but that doesn't happen in the test case.

llvm-svn: 361533
2019-05-23 18:46:03 +00:00
Craig Topper 9816d55776 [X86][InstCombine] Remove InstCombine code that turns X86 round intrinsics into llvm.ceil/floor. Remove some isel patterns that existed because that was happening.
We were turning roundss/sd/ps/pd intrinsics with immediates of 1 or 2 into
llvm.floor/ceil.  The llvm.ceil/floor intrinsics are supposed to correspond
to the libm functions.  For the libm functions we need to disable the
precision exception so the llvm.floor/ceil functions should always map to
encodings 0x9 and 0xA.

We had a mix of isel patterns where some used 0x9 and 0xA and others used
0x1 and 0x2. We need to be consistent and always use 0x9 and 0xA.

Since we have no way in isel of knowing where the llvm.ceil/floor came
from, we can't map X86 specific intrinsics with encodings 1 or 2 to it.
We could map 0x9 and 0xA to llvm.ceil/floor instead, but I'd really like
to see a use case and optimization advantage first.

I've left the backend test cases to show the blend we now emit without
the extra isel patterns. But I've removed the InstCombine tests completely.

llvm-svn: 361425
2019-05-22 20:04:55 +00:00
Sanjay Patel 6a554188aa [InstCombine] fold shuffles of insert_subvectors
This should be a valid exception to the general rule of not creating new shuffle masks in IR...
because we already do it. :)
Also, DAG combining/legalization will undo this by widening the shuffle back out if needed.

Explanation for how we already do this: SLP or vector source can create chains of insert/extract
as shown in 1 of the examples from PR16739:
https://godbolt.org/z/NlK7rA
https://bugs.llvm.org/show_bug.cgi?id=16739

And we expect instcombine or DAGCombine to clean that up by creating relatively simple shuffles.

Differential Revision: https://reviews.llvm.org/D62024

llvm-svn: 361338
2019-05-22 00:32:25 +00:00
Cameron McInally 8bec58d5f7 [NFC][InstCombine] Add FIXME for one-use check on constant negation transforms.
llvm-svn: 361197
2019-05-20 21:00:42 +00:00
Cameron McInally 2557ca296a [InstCombine] Add visitFNeg(...) visitor for unary Fneg
Also, break out a helper function, namely foldFNegIntoConstant(...), which performs transforms common between visitFNeg(...) and visitFSub(...).

Differential Revision: https://reviews.llvm.org/D61693

llvm-svn: 361188
2019-05-20 19:10:30 +00:00
Sanjay Patel 926e47751b [InstCombine] move bitcast after insertelement-with-bitcasted-operands
llvm-svn: 361058
2019-05-17 18:06:12 +00:00
Roman Lebedev 3275060fe8 [InstCombine] canShiftBinOpWithConstantRHS(): drop bogus signbit check
Summary:
In D61918 i was looking at dropping it in DAGCombiner `visitShiftByConstant()`,
but as @craig.topper pointed out, it was copied from here.

That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
3. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6

So, why is it there then?
As far as i can tell, it dates all the way back to original check-in rL7793.
I think we should just drop it.

Reviewers: spatel, craig.topper, efriedma, majnemer

Reviewed By: spatel

Subscribers: llvm-commits, craig.topper

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61938

llvm-svn: 361043
2019-05-17 15:52:49 +00:00
Sanjay Patel 760f61ab36 [InstCombine] try harder to form rotate (funnel shift) (PR20750)
We have a similar match for patterns ending in a truncate. This
should be ok for all targets because the default expansion would
still likely be better from replacing 2 'and' ops with 1.

Attempt to show the logic equivalence in Alive (which doesn't
currently have funnel-shift in its vocabulary AFAICT):

  %shamt = zext i8 %i to i32
  %m = and i32 %shamt, 31
  %neg = sub i32 0, %shamt
  %and4 = and i32 %neg, 31
  %shl = shl i32 %v, %m
  %shr = lshr i32 %v, %and4
  %or = or i32 %shr, %shl
  =>
  %a = and i8 %i, 31
  %shamt2 = zext i8 %a to i32
  %neg2 = sub i32 0, %shamt2
  %and4 = and i32 %neg2, 31
  %shl = shl i32 %v, %shamt2
  %shr = lshr i32 %v, %and4
  %or = or i32 %shr, %shl

https://rise4fun.com/Alive/V9r

llvm-svn: 360605
2019-05-13 17:28:19 +00:00
Cameron McInally e75412ab47 Add InstCombine::visitFNeg(...)
Differential Revision: https://reviews.llvm.org/D61784

llvm-svn: 360461
2019-05-10 20:01:04 +00:00
Craig Topper 51a17df45d [InstCombine] When turning sext into zext due to known bits, return the new ZExt instead of calling replaceinstuseswith
The worklist loop that we're returning back to should be able to do the repacement itself. This is how we normally do replacements.

My main motivation was that I observed that we weren't preserving the name of the result when we do this transform. The replacement code in the worklist loop will call takeName as part of the replacement.

Differential Revision: https://reviews.llvm.org/D61695

llvm-svn: 360284
2019-05-08 20:59:21 +00:00
Robert Lougher 8681ef8f41 [InstCombine] Add new combine to add folding
(X | C1) + C2 --> (X | C1) ^ C1 iff (C1 == -C2)

I verified the correctness using Alive:
https://rise4fun.com/Alive/YNV

This transform enables the following transform that already exists in
instcombine:

(X | Y) ^ Y --> X & ~Y

As a result, the full expected transform is:

(X | C1) + C2 --> X & ~C1 iff (C1 == -C2)

There already exists the transform in the sub case:

(X | Y) - Y --> X & ~Y

However this does not trigger in the case where Y is constant due to an earlier
transform:

X - (-C) --> X + C

With this new add fold, both the add and sub constant cases are handled.

Patch by Chris Dawson.

Differential Revision: https://reviews.llvm.org/D61517

llvm-svn: 360185
2019-05-07 19:36:41 +00:00
Sanjay Patel 6a281a7545 [InstCombine] allow sinking fneg operands through an FP min/max
Fundamentally/generally, we should not have to rely on bailouts/crippling of
folds. In this particular case, I think we always recognize the inverted
predicate min/max pattern, so there should not be any loss of optimization.
Codegen looks better because we are eliminating an fneg.

llvm-svn: 360180
2019-05-07 18:58:07 +00:00
Sanjay Patel a6019d5164 [InstCombine] sink FP negation of operands through select
We don't always get this:

Cond ? -X : -Y --> -(Cond ? X : Y)

...even with the legacy IR form of fneg in the case with extra uses,
and we miss matching with the newer 'fneg' instruction because we
are expecting binops through the rest of the path.

Differential Revision: https://reviews.llvm.org/D61604

llvm-svn: 360075
2019-05-06 20:34:05 +00:00
Sanjay Patel a64bd09ec4 [InstCombine] reduce code duplication; NFC
llvm-svn: 360059
2019-05-06 17:39:18 +00:00
Sanjay Patel 62f457b137 [InstCombine] reduce code duplication; NFCI
llvm-svn: 360051
2019-05-06 15:35:02 +00:00
Simon Pilgrim 7a2e855a0f Move Value *RHSCIOp def into the scope where its actually used. NFCI.
llvm-svn: 359973
2019-05-05 10:27:45 +00:00
Philip Reames 84e54eb471 [InstCombine] Limit a vector demanded elts rule which was producing invalid IR.
The demanded elts rules introduced for GEPs in https://reviews.llvm.org/rL356293 replaced vector constants with undefs (by design).  It turns out that the LangRef disallows such cases when indexing structs.  The right fix is probably to relax the langref requirement, and update other passes to expect the result, but for the moment, limit the transform to avoid compiler crashes.

This should fix https://bugs.llvm.org/show_bug.cgi?id=41624.

llvm-svn: 359633
2019-04-30 23:09:26 +00:00
Sanjay Patel a706b9a90e [InstCombine] reduce code duplication; NFC
Follow-up to:
rL359482

Avoid this potential problem throughout by giving the type a name
and verifying the assumption that both operands are the same type.

llvm-svn: 359485
2019-04-29 19:23:44 +00:00
Simon Pilgrim e3c8776172 [InstCombine] visitFCmpInst - appease copy+paste pattern warning. NFCI.
PVS Studio's copy+paste recognizer was seeing this as a typo, technically Op0/Op1 in a fcmp should always be the same type, but we might as well avoid the issue.

Reported in https://www.viva64.com/en/b/0629/

llvm-svn: 359482
2019-04-29 18:52:19 +00:00
Simon Pilgrim 48a3b54572 [InstCombine][X86] Tweak generic expansion of PACKSS/PACKUS to shuffle then truncate. NFCI.
This has no effect on constant folding but will be useful when we expand non-saturating PACKSS/PACKUS intrinsics.

llvm-svn: 359191
2019-04-25 13:51:57 +00:00
Simon Pilgrim 4b7d3c4831 Fix include order. NFCI.
llvm-svn: 359177
2019-04-25 09:49:37 +00:00
Philip Reames 88cd69b56f Consolidate existing utilities for interpreting vector predicate maskes [NFC]
llvm-svn: 359163
2019-04-25 02:30:17 +00:00
Philip Reames 7c8647b26f [InstCombine] Be consistent w/handling of masked intrinsics style wise [NFC]
llvm-svn: 359160
2019-04-25 01:18:56 +00:00
Simon Pilgrim 55f14dac74 [InstCombine][X86] Use generic expansion of PACKSS/PACKUS for constant folding. NFCI.
This patch rewrites the existing PACKSS/PACKUS constant folding code to expand as a generic expansion.

This is a first NFCI step toward expanding PACKSS/PACKUS intrinsics which are acting as non-saturating truncations (although technically the expansion could be used in all cases - but we'll probably want to be conservative).

llvm-svn: 359111
2019-04-24 16:53:17 +00:00
Philip Reames 2ce017026a [InstCombine] Convert a masked.load of a dereferenceable address to an unconditional load
If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.

Differential Revision: https://reviews.llvm.org/D59703

llvm-svn: 359000
2019-04-23 15:25:14 +00:00
Philip Reames d748689c7f [InstCombine] Eliminate stores to constant memory
If we have a store to a piece of memory which is known constant, then we know the store must be storing back the same value. As a result, the store (or memset, or memmove) must either be down a dead path, or a noop. In either case, it is valid to simply remove the store.

The motivating case for this involves a memmove to a buffer which is constant down a path which is dynamically dead.

Note that I'm choosing to implement the less aggressive of two possible semantics here. We could simply say that the store *is undefined*, and prune the path. Consensus in the review was that the more aggressive form might be a good follow on change at a later date.

Differential Revision: https://reviews.llvm.org/D60659

llvm-svn: 358919
2019-04-22 20:28:19 +00:00
Philip Reames d8d9b7b20e [InstSimplify] Move masked.gather w/no active lanes handling to InstSimplify from InstCombine
In the process, use the existing masked.load combine which is slightly stronger, and handles a mix of zero and undef elements in the mask.  

llvm-svn: 358913
2019-04-22 19:30:01 +00:00
Philip Reames 88679717ce [InstCombine] Factor out unreachable inst idiom creation [NFC]
In InstCombine, we use an idiom of "store i1 true, i1 undef" to indicate we've found a path which we've proven unreachable.  We can't actually insert the unreachable instruction since that would require changing the CFG.  We leave that to simplifycfg later.

This just factors out that idiom creation so we don't duplicate the same mostly undocument idiom creation in multiple places.

llvm-svn: 358600
2019-04-17 17:37:58 +00:00
Nikita Popov 5ecd6a48b9 [InstCombine] Prune fshl/fshr with masked operands
If a constant shift amount is used, then only some of the LHS/RHS
operand bits are demanded and we may be able to simplify based on
that. InstCombineSimplifyDemanded already had the necessary support
for that, we just weren't calling it with fshl/fshr as root.

In particular, this allows us to relax some masked funnel shifts
into simple shifts, as shown in the tests.

Patch by Shawn Landden.

Differential Revision: https://reviews.llvm.org/D60660

llvm-svn: 358515
2019-04-16 19:05:49 +00:00
Nikita Popov 79dffc67b5 [IR] Add WithOverflowInst class
This adds a WithOverflowInst class with a few helper methods to get
the underlying binop, signedness and nowrap type and makes use of it
where sensible. There will be two more uses in D60650/D60656.

The refactorings are all NFC, though I left some TODOs where things
could be improved. In particular we have two places where add/sub are
handled but mul isn't.

Differential Revision: https://reviews.llvm.org/D60668

llvm-svn: 358512
2019-04-16 18:55:16 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Sanjay Patel 5e13cd2e61 [InstCombine] canonicalize fdiv after fmul if reassociation is allowed
(X / Y) * Z --> (X * Z) / Y

This can allow other optimizations/reassociations as shown in the test diffs.

llvm-svn: 358404
2019-04-15 13:23:38 +00:00
Nikita Popov 95e5f28337 [InstCombine] Remove redundant/bogus mul_with_overflow combines
As pointed out in D60518 folding mulo(%x, undef) to {undef, undef}
isn't correct. As a correct version of this already exists in
InstructionSimplify (bd8056ef32/lib/Analysis/InstructionSimplify.cpp (L4750-L4757)) this is just
dead code though. Drop it together with the mul(%x, 0) -> {0, false}
fold that is also already handled by InstSimplify.

Differential Revision: https://reviews.llvm.org/D60649

llvm-svn: 358339
2019-04-13 19:43:35 +00:00
Chen Zheng 87dd0e06dc [InstCombine] Canonicalize (-X srem Y) to -(X srem Y).
Differential Revision: https://reviews.llvm.org/D60647

llvm-svn: 358328
2019-04-13 09:21:22 +00:00
Philip Reames b091cc081d [InstCombine] Fix a nasty miscompile introduced w/masked.gather demanded elts
This fixes a miscompile which was introduced in r356510 (https://reviews.llvm.org/D57372).

The problem is that the original patch removed pointer operands where the load results we're demanded, but without considering the legality of the load itself.  If the masked.gather had active, but undemanded, lanes, then we could end up creating a load which loaded from an undef address.  The result could be a segfault, or, in theory, an arbitrary read from a random memory location into an used register.  

llvm-svn: 358299
2019-04-12 18:26:56 +00:00
Nikita Popov 0a8228fd28 [InstCombine] Handle ssubo always overflow
Following D60483 and D60497, this adds support for AlwaysOverflows
handling for ssubo. This is the last case we can handle right now.

Differential Revision: https://reviews.llvm.org/D60518

llvm-svn: 358100
2019-04-10 16:32:15 +00:00
Nikita Popov 7a543c3758 [InstCombine] ssubo X, C -> saddo X, -C
ssubo X, C is equivalent to saddo X, -C. Make the transformation in
InstCombine and allow the logic implemented for saddo to fold prior
usages of add nsw or sub nsw with constants.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D60061

llvm-svn: 358099
2019-04-10 16:27:36 +00:00
Nikita Popov ef23e88480 [InstCombine] Handle saddo always overflow
Followup to D60483: Handle AlwaysOverflow conditions for saddo as
well.

Differential Revision: https://reviews.llvm.org/D60497

llvm-svn: 358095
2019-04-10 16:18:01 +00:00
Nikita Popov 09020ec2a7 [InstCombine] Handle usubo always overflow
Check AlwaysOverflow condition for usubo. The implementation is the
same as the existing handling for uaddo and umulo. Handling for saddo
and ssubo will follow (smulo doesn't have the necessary ValueTracking
support).

Differential Revision: https://reviews.llvm.org/D60483

llvm-svn: 358052
2019-04-10 07:10:53 +00:00
Nikita Popov 596cbeb705 [InstCombine] Directly call computeOverflow methods in OptimizeOverflowCheck; NFC
Instead of using the willOverflow helpers. This makes it easier to
extend handling of AlwaysOverflows.

llvm-svn: 358051
2019-04-10 07:10:44 +00:00
Chen Zheng 5e13ff1da2 [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y).
Differential Revision: https://reviews.llvm.org/D60395

llvm-svn: 358050
2019-04-10 06:52:09 +00:00
Nikita Popov 2f5e9de8d1 Revert "[InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y)."
This reverts commit 1383a91689.

sdiv-canonicalize.ll fails after this revision. The fold needs to be
moved outside the branch handling constant operands. However when this
is done there are further test changes, so I'm reverting this in the
meantime.

llvm-svn: 358026
2019-04-09 18:32:38 +00:00
Nikita Popov eda3b9326e [InstCombine] Restructure OptimizeOverflowCheck; NFC
Change the code to always handle the unsigned+signed cases together
with the same basic structure for add/sub/mul. The simple folds are
always handled first and then the ValueTracking overflow checks are
used.

llvm-svn: 358025
2019-04-09 18:32:28 +00:00
Chen Zheng 1383a91689 [InstCombine] [InstCombine] Canonicalize (-X s/ Y) to -(X s/ Y).
Differential Revision: https://reviews.llvm.org/D60395

llvm-svn: 358017
2019-04-09 16:34:31 +00:00
Sanjay Patel 49d9d17a77 [InstCombine] prevent possible miscompile with sdiv+negate of vector op
Similar to:
rL358005

Forego folding arbitrary vector constants to fix a possible miscompile bug.
We can enhance the transform if we do want to handle the more complicated
vector case.

llvm-svn: 358013
2019-04-09 15:13:03 +00:00
Sanjay Patel f62dcea7ed [InstCombine] prevent possible miscompile with negate+sdiv of vector op
// 0 - (X sdiv C)  -> (X sdiv -C)  provided the negation doesn't overflow.

This fold has been around for many years and nobody noticed the potential
vector miscompile from overflow until recently...
So it seems unlikely that there's much demand for a vector sdiv optimization
on arbitrary vector constants, so just limit the matching to splat constants
to avoid the possible bug.

Differential Revision: https://reviews.llvm.org/D60426

llvm-svn: 358005
2019-04-09 14:09:06 +00:00
Sanjay Patel 773e04c883 [InstCombine] peek through fdiv to find a squared sqrt
A more general canonicalization between fdiv and fmul would not
handle this case because that would have to be limited by uses
to prevent 2 values from becoming 3 values:
(x/y) * (x/y) --> (x*x) / (y*y)

(But we probably should still have that limited -- but more general --
canonicalization independently of this change.)

llvm-svn: 357943
2019-04-08 21:23:50 +00:00
Sanjay Patel b33938df7a [InstCombine] remove overzealous assert for shuffles (PR41419)
As the TODO indicates, instsimplify could be improved.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=41419

llvm-svn: 357910
2019-04-08 13:28:29 +00:00
Simon Pilgrim b4f1bfa659 [InstCombine][X86] Expand MOVMSK to generic IR (PR39927)
First step towards removing the MOVMSK intrinsics completely - this patch expands MOVMSK to the pattern:

e.g. PMOVMSKB(v16i8 x):
%cmp = icmp slt <16 x i8> %x, zeroinitializer
%int = bitcast <16 x i8> %cmp to i16
%res = zext i16 %int to i32

Which is correctly handled by ISel and FastIsel (give or take an annoying movzx move....): https://godbolt.org/z/rkrSFW

Differential Revision: https://reviews.llvm.org/D60256

llvm-svn: 357909
2019-04-08 13:17:51 +00:00
Chen Zheng 923c7c9daa [InstCombine] sdiv exact flag fixup.
Differential Revision: https://reviews.llvm.org/D60396

llvm-svn: 357904
2019-04-08 12:08:03 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Luqman Aden 8911c5be46 [InstCombine] Combine no-wrap sub and icmp w/ constant.
Teach InstCombine the transformation `(icmp P (sub nuw|nsw C2, Y), C) -> (icmp swap(P) Y, C2-C)`

Reviewers: majnemer, apilipenko, sanjoy, spatel, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: dmgreen, lebedev.ri, nikic, hiraditya, JDevlieghere, jfb, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59916

llvm-svn: 357674
2019-04-04 07:08:30 +00:00
David Bolvansky 937720e75b [InstCombine] Simplify ctpop with bitreverse/bswap
Summary: Fixes PR41337

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60148

llvm-svn: 357564
2019-04-03 08:08:44 +00:00
David Bolvansky 5ba60b22a4 [InstCombine] Simplify ctlz/cttz with bitreverse
Summary: Fixes PR41273

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60096

llvm-svn: 357521
2019-04-02 20:13:28 +00:00
Mikael Holmen 150a7ec2dc [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
Summary:
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

Reviewers: reames, spatel

Reviewed By: spatel

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60058

llvm-svn: 357389
2019-04-01 14:10:10 +00:00
Mikael Holmen 3e527cd823 Revert "[InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder"
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.

I'll recommit with a better commit message with reference to the
phabricator review.

llvm-svn: 357387
2019-04-01 14:06:45 +00:00
Mikael Holmen d66a47f90a [InstCombine] Handle vector gep with scalar argument in evaluateInDifferentElementOrder
This fixes PR41270.

The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.

llvm-svn: 357385
2019-04-01 13:48:56 +00:00
Sanjay Patel 97d1bc4454 [InstCombine] eliminate commuted select-shuffles + binop (PR41304)
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:

LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2

PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.

As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.

Differential Revision: https://reviews.llvm.org/D60048

llvm-svn: 357382
2019-04-01 13:36:40 +00:00
Sanjay Patel b276dd195a [InstCombine] canonicalize select shuffles by commuting
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.

Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.

We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.

So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.

Differential Revision: https://reviews.llvm.org/D60016

llvm-svn: 357366
2019-03-31 15:01:30 +00:00
Sanjay Patel 3f4d1b4abd [InstCombine] move shuffle canonicalizations before other transforms
This may not be NFC, but I'm not sure how to expose any diffs in
tests. In theory, it should be slightly more efficient and possibly
more profitable to do the canonicalizations (which can increase the
undef elements in the mask) ahead of SimplifyDemandedVectorElts().

llvm-svn: 357272
2019-03-29 16:49:38 +00:00
Nikita Popov 7462303e06 [InstCombine] Use uadd.sat and usub.sat for canonicalization
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.

rL357012 already introduced use of uadd.sat for the add+umin pattern.

Differential Revision: https://reviews.llvm.org/D58872

llvm-svn: 357103
2019-03-27 17:56:15 +00:00
Sanjay Patel 81e8d76f5b [InstCombine] form uaddsat from add+umin (PR14613)
This is the last step towards solving the examples shown in:
https://bugs.llvm.org/show_bug.cgi?id=14613

With this change, x86 should end up with psubus instructions
when those are available.

All known codegen issues with expanding the saturating intrinsics
were resolved with:
D59006 / rL356855

We also have some early evidence in D58872 that using the intrinsics
will lead to better perf. If some target regresses from this, custom
lowering of the intrinsics (as in the above for x86) may be needed.

llvm-svn: 357012
2019-03-26 17:50:08 +00:00
Tim Renouf 94c163c34e InstCombineSimplifyDemanded: Allow v3 results for AMDGCN buffer and image intrinsics
This helps to avoid the situation where RA spots that only 3 of the
v4f32 result of a load are used, and immediately reallocates the 4th
register for something else, requiring a stall waiting for the load.

Differential Revision: https://reviews.llvm.org/D58906

Change-Id: I947661edfd5715f62361a02b100f14aeeada29aa
llvm-svn: 356768
2019-03-22 15:53:50 +00:00
Craig Topper 16dc165046 [InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use.
If they have other users we'll just end up increasing the instruction count.

We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed.

Fixes PR41164.

Differential Revision: https://reviews.llvm.org/D59630

llvm-svn: 356690
2019-03-21 17:50:49 +00:00
Philip Reames 60212be619 [instcombine] Add some todos, and arrange code for readibility
llvm-svn: 356642
2019-03-21 03:23:40 +00:00
Philip Reames e4588bbf80 Simplify operands of masked stores and scatters based on demanded elements
If we know we're not storing a lane, we don't need to compute the lane. This could be improved by using the undef element result to further prune the mask, but I want to separate that into its own change since it's relatively likely to expose other problems.

Differential Revision: https://reviews.llvm.org/D57247

llvm-svn: 356590
2019-03-20 18:44:58 +00:00
Nikita Popov 37cf25c3c6 [InstCombine] Fold add nuw + uadd.with.overflow
Fold add nuw and uadd.with.overflow with constants if the
addition does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D59471

llvm-svn: 356584
2019-03-20 18:00:27 +00:00
Philip Reames 484d07c828 [instcombine] Add todos describing missing transforms for masked.* intrinsics
llvm-svn: 356536
2019-03-20 03:36:05 +00:00
Philip Reames 70537abe52 Demanded elements support for masked.load and masked.gather
Teach instcombine to propagate demanded elements through a masked load or masked gather instruction. This is in the broader context of improving vector pointer instcombine under https://reviews.llvm.org/D57140.

Differential Revision: https://reviews.llvm.org/D57372

llvm-svn: 356510
2019-03-19 20:10:00 +00:00
Sanjay Patel 5b820323ca [InstCombine] fold logic-of-nan-fcmps (PR41069)
Combine 2 fcmps that are checking for nan-ness:
   and (fcmp ord X, 0), (and (fcmp ord Y, 0), Z) --> and (fcmp ord X, Y), Z
   or  (fcmp uno X, 0), (or  (fcmp uno Y, 0), Z) --> or  (fcmp uno X, Y), Z

This is an exact match for a minimal reassociation pattern.
If we want to handle this more generally that should go in
the reassociate pass and allow removing this code.

This should fix:
https://bugs.llvm.org/show_bug.cgi?id=41069

llvm-svn: 356471
2019-03-19 16:39:17 +00:00
Sanjay Patel 6063393536 [InstCombine] allow general vector constants for funnel shift to shift transforms
Follow-up to:
rL356338
rL356369

We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.

llvm-svn: 356372
2019-03-18 14:27:51 +00:00
Sanjay Patel 84de8a30a0 [InstCombine] extend rotate-left-by-constant canonicalization to funnel shift
Follow-up to:
rL356338

Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.

llvm-svn: 356369
2019-03-18 14:10:11 +00:00
Sanjay Patel b3bcd95771 [InstCombine] canonicalize rotate right by constant to rotate left
This was noted as a backend problem:
https://bugs.llvm.org/show_bug.cgi?id=41057
...and subsequently fixed for x86:
rL356121
But we should canonicalize these in IR for the benefit of all targets
and improve IR analysis such as CSE.

llvm-svn: 356338
2019-03-17 19:08:00 +00:00
Philip Reames 68a2e4d48b [SimplifyDemandedVec] Strengthen handling all undef lanes (particularly GEPs)
A change of two parts:
1) A generic enhancement for all callers of SDVE to exploit the fact that if all lanes are undef, the result is undef.
2) A GEP specific piece to strengthen/fix the vector index undef element handling, and call into the generic infrastructure when visiting the GEP.

The result is that we replace a vector gep with at least one undef in each lane with a undef.  We can also do the same for vector intrinsics.  Once the masked.load patch (D57372) has landed, I'll update to include call tests as well.

Differential Revision: https://reviews.llvm.org/D57468

llvm-svn: 356293
2019-03-15 19:54:06 +00:00
Matt Arsenault 1d83670dbd AMDGPU: Remove intrinsic operand assert
Before r355981, this was under LLVM_DEBUG. I don't think the assert is
quite right, but this really should be a verifier check. Instcombine
should not be asserting on this sort of thing.

llvm-svn: 356219
2019-03-14 23:45:09 +00:00
Sanjay Patel de1d5d3675 [InstCombine] canonicalize funnel shift constant shift amount to be modulo bitwidth
The shift argument is defined to be modulo the bitwidth, so if that argument
is a constant, we can always reduce the constant to its minimal form to allow
better CSE and other follow-on transforms.

We need to be careful to ignore constant expressions here, or we will likely
infinite loop. I'm adding a general vector constant query for that case.

Differential Revision: https://reviews.llvm.org/D59374

llvm-svn: 356192
2019-03-14 19:22:08 +00:00
Matt Arsenault caf1316f71 IR: Add immarg attribute
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.

Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.

This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.

llvm-svn: 355981
2019-03-12 21:02:54 +00:00
Nikita Popov 884feb1b69 [InstCombine] Fold add nsw + sadd.with.overflow
Fold `add nsw` and `sadd.with.overflow` with constants if the addition
does not overflow.

Part of https://bugs.llvm.org/show_bug.cgi?id=38146.

Patch by Dan Robertson.

Differential Revision: https://reviews.llvm.org/D58881

llvm-svn: 355530
2019-03-06 18:30:00 +00:00
Davide Italiano 672bec223d [InstCombine] Mark debug values as unavailable after DCE.
Fixes PR40838.

llvm-svn: 355301
2019-03-04 04:38:58 +00:00
Sanjay Patel 1f65903dc1 [InstCombine] move add after smin/smax
Follow-up to rL355221.
This isn't specifically called for within PR14613,
but we'll get there eventually if it's not already
requested in some other bug report.

https://rise4fun.com/Alive/5b0

  Name: smax
  Pre: WillNotOverflowSignedSub(C1,C0)
  %a = add nsw i8 %x, C0
  %cond = icmp sgt i8 %a, C1
  %r = select i1 %cond, i8 %a, i8 C1
  =>
  %c2 = icmp sgt i8 %x, C1-C0
  %u2 = select i1 %c2, i8 %x, i8 C1-C0
  %r = add nsw i8 %u2, C0

  Name: smin
  Pre: WillNotOverflowSignedSub(C1,C0)
  %a = add nsw i32 %x, C0
  %cond = icmp slt i32 %a, C1
  %r = select i1 %cond, i32 %a, i32 C1
  =>
  %c2 = icmp slt i32 %x, C1-C0
  %u2 = select i1 %c2, i32 %x, i32 C1-C0
  %r = add nsw i32 %u2, C0

llvm-svn: 355272
2019-03-02 16:45:10 +00:00
Philip Reames cf0a978e1f [InstCombine] Extend saturating idempotent atomicrmw transform to FP
I'm assuming that the nan propogation logic for InstructonSimplify's handling of fadd and fsub is correct, and applying the same to atomicrmw.

Differential Revision: https://reviews.llvm.org/D58836

llvm-svn: 355222
2019-03-01 19:50:36 +00:00
Sanjay Patel 6e1e7e1c3e [InstCombine] move add after umin/umax
In the motivating cases from PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
...moving the add enables us to narrow the
min/max which eliminates zext/trunc which
enables signficantly better vectorization.
But that bug is still not completely fixed.

https://rise4fun.com/Alive/5KQ

  Name: umax
  Pre: C1 u>= C0
  %a = add nuw i8 %x, C0
  %cond = icmp ugt i8 %a, C1
  %r = select i1 %cond, i8 %a, i8 C1
  =>
  %c2 = icmp ugt i8 %x, C1-C0
  %u2 = select i1 %c2, i8 %x, i8 C1-C0
  %r = add nuw i8 %u2, C0

  Name: umin
  Pre: C1 u>= C0
  %a = add nuw i32 %x, C0
  %cond = icmp ult i32 %a, C1
  %r = select i1 %cond, i32 %a, i32 C1
  =>
  %c2 = icmp ult i32 %x, C1-C0
  %u2 = select i1 %c2, i32 %x, i32 C1-C0
  %r = add nuw i32 %u2, C0

llvm-svn: 355221
2019-03-01 19:42:40 +00:00
Philip Reames 77982868c5 [InstCombine] Extend "idempotent" atomicrmw optimizations to floating point
An idempotent atomicrmw is one that does not change memory in the process of execution.  We have already added handling for the various integer operations; this patch extends the same handling to floating point operations which were recently added to IR.

Note: At the moment, we canonicalize idempotent fsub to fadd when ordering requirements prevent us from using a load.  As discussed in the review, I will be replacing this with canonicalizing both floating point ops to integer ops in the near future.

Differential Revision: https://reviews.llvm.org/D58251

llvm-svn: 355210
2019-03-01 18:00:07 +00:00
Sanjay Patel 4a47f5f550 [InstCombine] fold adds of constants separated by sext/zext
This is part of a transform that may be done in the backend:
D13757
...but it should always be beneficial to fold this sooner in IR
for all targets.

https://rise4fun.com/Alive/vaiW

  Name: sext add nsw
  %add = add nsw i8 %i, C0
  %ext = sext i8 %add to i32
  %r = add i32 %ext, C1
  =>
  %s = sext i8 %i to i32
  %r = add i32 %s, sext(C0)+C1

  Name: zext add nuw
  %add = add nuw i8 %i, C0
  %ext = zext i8 %add to i16
  %r = add i16 %ext, C1
  =>
  %s = zext i8 %i to i16
  %r = add i16 %s, zext(C0)+C1

llvm-svn: 355118
2019-02-28 19:05:26 +00:00
Bjorn Pettersson d30f308a9f Add support for computing "zext of value" in KnownBits. NFCI
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.

This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.

Reviewers: craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58650

llvm-svn: 355099
2019-02-28 15:45:29 +00:00
Sanjay Patel e8bf0f79bd [InstCombine] canonicalize more unsigned saturated add with 'not'
Yet another pattern variation suggested by:
https://bugs.llvm.org/show_bug.cgi?id=14613

There are 8 more potential commuted patterns here on top of the
8 that were already handled (rL354221, rL354276, rL354393).
We have the obvious commute of the 'add' + commute of the cmp
predicate/operands (ugt/ult) + commute of the select operands:

Name: base
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %y, %x
%r = select i1 %c, i32 -1, i32 %a
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

Name: ugt + commute select
%notx = xor i32 %x, -1
%a = add i32 %notx, %y
%c = icmp ugt i32 %x, %y
%r = select i1 %c, i32 %a, i32 -1
=>
%c2 = icmp ult i32 %a, %y
%r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/den

llvm-svn: 354887
2019-02-26 15:18:49 +00:00
Sanjay Patel 9907d3c8b4 [InstCombine] canonicalize add/sub with bool
add A, sext(B) --> sub A, zext(B)

We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.

The backend should be prepared for this change after:
D57401
rL353433

This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.

The seeming regression in the fuzzer test was discussed in:
D58359

We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.

llvm-svn: 354748
2019-02-24 16:57:45 +00:00
Sanjay Patel c1e0184317 [InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: uaddsat, -1 fval
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %notx, %y
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

  Name: uaddsat, -1 fval + ult
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %y, %notx
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/nTp

llvm-svn: 354393
2019-02-19 22:14:21 +00:00
Sanjay Patel dcb93c0dda [InstCombine] rearrange saturated add folds; NFC
This is no-functional-change-intended, but that was also
true when it was part of rL354276, and I managed to lose
2 predicates for the fold with constant...causing much bot
distress. So this time I'm adding a couple of negative tests
to avoid that.

llvm-svn: 354384
2019-02-19 21:46:13 +00:00
Sanjay Patel 8a35d339c9 Revert "[InstCombine] reduce even more unsigned saturated add with 'not' op"
This reverts commit 079b610c29.
Bots are failing after this change on a stage 2 compile of clang.

llvm-svn: 354277
2019-02-18 16:04:22 +00:00
Sanjay Patel 079b610c29 [InstCombine] reduce even more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: uaddsat, -1 fval
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %notx, %y
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

  Name: uaddsat, -1 fval + ult
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %y, %notx
  %r = select i1 %c, i32 %a, i32 -1
  =>
  %a = add i32 %x, %y
  %c2 = icmp ugt i32 %y, %a
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/nTp

llvm-svn: 354276
2019-02-18 15:21:39 +00:00
Sanjay Patel b341ee7071 [InstCombine] reduce more unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

  Name: not op
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ult i32 %notx, %y
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, %y
  %c2 = icmp ult i32 %a, %y
  %r = select i1 %c2, i32 -1, i32 %a

  Name: not op ugt
  %notx = xor i32 %x, -1
  %a = add i32 %x, %y
  %c = icmp ugt i32 %y, %notx
  %r = select i1 %c, i32 -1, i32 %a
  =>
  %a = add i32 %x, %y
  %c2 = icmp ult i32 %a, %y
  %r = select i1 %c2, i32 -1, i32 %a

https://rise4fun.com/Alive/niom

(The matching here is still incomplete.)

llvm-svn: 354224
2019-02-17 16:48:50 +00:00
Sanjay Patel bee2073542 [InstCombine] reduce unsigned saturated add with 'not' op
We want to use the sum in the icmp to allow matching with
m_UAddWithOverflow and eliminate the 'not'. This is discussed
in D51929 and is another step towards solving PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613

(The matching here is incomplete. Trying to take minimal steps
to make sure we don't induce infinite looping from existing
canonicalizations of the 'select'.)

llvm-svn: 354221
2019-02-17 15:58:48 +00:00
Philip Reames 8220ecbce1 [InstCombine] Address a couple stylistic issues pointed out by reviewer [NFC]
Better addressing comments from https://reviews.llvm.org/D58290.

llvm-svn: 354171
2019-02-15 21:31:39 +00:00
Philip Reames cae6c767e8 [InstCombine] Convert atomicrmws to xchg or store where legal
Implement two more transforms of atomicrmw:
1) We can convert an atomicrmw which produces a known value in memory into an xchg instead.
2) We can convert an atomicrmw xchg w/o users into a store for some orderings.

Differential Revision: https://reviews.llvm.org/D58290

llvm-svn: 354170
2019-02-15 21:23:51 +00:00
Sanjay Patel 8a2b543a13 [InstCombine] fix crash while trying to narrow a binop of shuffles (PR40734)
https://bugs.llvm.org/show_bug.cgi?id=40734

llvm-svn: 354144
2019-02-15 16:31:55 +00:00
Philip Reames db57ef6238 [InstCombine] Add todos for possible atomicrmw transforms
llvm-svn: 354059
2019-02-14 20:48:36 +00:00
Philip Reames 485474208e Canonicalize all integer "idempotent" atomicrmw ops
For "idempotent" atomicrmw instructions which we can't simply turn into load, canonicalize the operation and constant. This reduces the matching needed elsewhere in the optimizer, but doesn't directly impact codegen.

For any architecture where OR/Zero is not a good default choice, you can extend the AtomicExpand lowerIdempotentRMWIntoFencedLoad mechanism. I reviewed X86 to make sure this works well, haven't audited other backends.

Differential Revision: https://reviews.llvm.org/D58244

llvm-svn: 354058
2019-02-14 20:41:17 +00:00
Philip Reames 97067d3c73 Teach instcombine about remaining "idempotent" atomicrmw types
Expand on Quentin's r353471 patch which converts some atomicrmws into loads. Handle remaining operation types, and fix a slight bug. Atomic loads are required to have alignment. Since this was within the InstCombine fixed point, somewhere else in InstCombine was adding alignment before the verifier saw it, but still, we should fix.

Terminology wise, I'm using the "idempotent" naming that is used for the same operations in AtomicExpand and X86ISelLoweringInfo. Once this lands, I'll add similar tests for AtomicExpand, and move the pattern match function to a common location. In the review, there was seemingly consensus that "idempotent" was slightly incorrect for this context.  Once we setle on a better name, I'll update all uses at once.

Differential Revision: https://reviews.llvm.org/D58242

llvm-svn: 354046
2019-02-14 18:39:14 +00:00