Commit Graph

2449 Commits

Author SHA1 Message Date
Reid Kleckner eb9dd5b87f Reland "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This re-lands r299875.

I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:

       // Collect any return attributes from the call.
  -    if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
  -      newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
  -                                                  oldAttrs.getRetAttributes()));
  +    newAttrs.push_back(oldAttrs.getRetAttributes());

Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
  AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)

That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.

llvm-svn: 299899
2017-04-10 23:31:05 +00:00
Reid Kleckner 211b1f324f Revert "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies"
This reverts r299875. A Linux bot came back with a test failure:
http://bb.pgr.jp/builders/test-clang-i686-linux-RA/builds/741/steps/test_clang/logs/Clang%20%3A%3A%20CodeGen__2006-05-19-SingleEltReturn.c

llvm-svn: 299878
2017-04-10 20:34:19 +00:00
Reid Kleckner 324c99dee5 [IR] Make AttributeSetNode public, avoid temporary AttributeList copies
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.

I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.

My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.

I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.

Reviewers: pete, chandlerc, sanjoy, hfinkel

Reviewed By: pete

Subscribers: jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D31198

llvm-svn: 299875
2017-04-10 20:18:10 +00:00
Sanjay Patel e4159d2238 [InstCombine] improve variable names; NFCI
llvm-svn: 299871
2017-04-10 19:38:36 +00:00
Craig Topper 0d830ff7bf [InstCombine] Use commutable matchers and m_OneUse in visitSub to shorten code. Add missing test cases.
In one case I removed commute handling for a multiply with a constant since we'll eventually get the constant on the right hand side.

llvm-svn: 299863
2017-04-10 18:09:25 +00:00
Craig Topper 98851adc2a [InstCombine] Use m_c_Add to shorten some code. Add testcases for this fold since they were missing. NFC
llvm-svn: 299853
2017-04-10 16:59:40 +00:00
Sanjay Patel 570e35c157 [InstCombine] fix matching of or-of-icmps constants (PR32524)
Also, make the same change in and-of-icmps and remove a hack for detecting that case.

Finally, add some FIXME comments because the code duplication here is awful.

This should fix the remaining IR problem noted in:
https://bugs.llvm.org/show_bug.cgi?id=32524

llvm-svn: 299851
2017-04-10 16:55:57 +00:00
Craig Topper 3eec73e20b [InstCombine] Support folding of add instructions with vector constants into select operations
We currently only fold scalar add of constants into selects. This improves this to support vectors too.

Differential Revision: https://reviews.llvm.org/D31683

llvm-svn: 299847
2017-04-10 16:40:00 +00:00
Craig Topper 31cc143b51 [InstCombine] Use commutable and/or/xor matchers to simplify some code
Summary:
This is my first time using the commutable matchers so wanted to make sure I was doing it right.

Are there any other matcher tricks to further shrink this? Can we commute the whole match so we don't have to LHS and RHS separately?

Reviewers: davide, spatel

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31680

llvm-svn: 299840
2017-04-10 07:13:40 +00:00
Craig Topper 838d13e7ee [InstCombine] Make sure we preserve fast math flags when folding fp instructions into phi nodes
Summary: I noticed in the select folding code that we copied fast math flags, but did not do the same for the similar handling in phi nodes. This patch fixes that to do the same thing as select

Reviewers: spatel, davide, majnemer, hfinkel

Reviewed By: davide

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31690

llvm-svn: 299838
2017-04-10 07:00:10 +00:00
Craig Topper d8840d7b10 [InstCombine] use m_c_And and m_c_Xor to handle commuted versions of a transform.
llvm-svn: 299837
2017-04-10 06:53:28 +00:00
Craig Topper 7639460367 [InstCombine] Remove unnecessary dyn_cast to BinaryOperator around some matcher checks in visitXor.
The matchers themselves should be enough.

llvm-svn: 299835
2017-04-10 06:53:23 +00:00
Craig Topper 4738321f0c [InstCombine] Make the (A|B)^B -> A & ~B transform code consistent with the very similar (A&B)^B -> ~A & B code. This should be NFC except for the addition of hasOneUse check.
I think this code is still overly complicated and should use matchers, but first I wanted to make it consistent.

llvm-svn: 299834
2017-04-10 06:53:21 +00:00
Craig Topper 4f16d82d6b [InstCombine] Use m_OneUse to shorten some code. NFC
llvm-svn: 299833
2017-04-10 06:53:19 +00:00
Sanjay Patel 16a054d5c7 [InstCombine] remove dead cases from icmp pair switches; NFCI
"PredicatesFoldable" returns false for signed/unsigned mismatched pairs,
so these cases should never exist. We'll default to 'unreachable' on those 
predicate combos instead.

Most of what's left in these switches belongs in InstSimplify (and may 
already be there), so there's probably more that can be done to reduce
this code.

llvm-svn: 299829
2017-04-09 21:51:34 +00:00
Craig Topper afa07c5ef6 [InstCombine] Extend some OR combines to support vectors.
This adds support for these combines for vectors
(X^C)|Y -> (X|Y)^C iff Y&C == 0
Y|(X^C) -> (X|Y)^C iff Y&C == 0

llvm-svn: 299822
2017-04-09 06:12:41 +00:00
Craig Topper e63c21b1ba [InstCombine] Extend a canonicalization check to apply to vector constants too.
llvm-svn: 299821
2017-04-09 06:12:39 +00:00
Craig Topper 437c97622b [InstCombine] Use the SubOne helper function to shorten some code. NFC
llvm-svn: 299819
2017-04-09 06:12:34 +00:00
Craig Topper 9d1821b262 [InstCombine] rename variable for easier reading; NFC
We usually give constants a 'C' somewhere in the name...

llvm-svn: 299818
2017-04-09 06:12:31 +00:00
Craig Topper 33e0dbcc58 [InstCombine] Handle more commuted cases of ((A & B) | ~A) -> (~A | B)
llvm-svn: 299747
2017-04-07 07:32:00 +00:00
Craig Topper 72a622cac7 [InstCombine] Add more commuted patterns to support folding ((~A & B) | A) -> (A | B).
llvm-svn: 299737
2017-04-07 00:29:47 +00:00
Craig Topper a521c30dc6 [InstCombine] Remove testing assert I accidentally left in r299710.
llvm-svn: 299715
2017-04-06 21:29:43 +00:00
Craig Topper b4da6840d8 [InstCombine] When checking to see if we can turn subtracts of 2^n - 1 into xor, we only need to call computeKnownBits on the RHS not the whole subtract. While there use isMask instead of isPowerOf2(C+1)
Calling computeKnownBits on the RHS should allows us to recurse one step further. isMask is equivalent to the isPowerOf2(C+1) except in the case where C is all ones. But that was already handled earlier by creating a not which is an Xor with all ones. So this should be fine.

llvm-svn: 299710
2017-04-06 21:06:03 +00:00
Craig Topper 7226d796aa [InstCombine] Remove redundant combine from visitAnd
This combine is fully handled by SimplifyDemandedInstructionBits as of r299658 where I fixed this code to ensure the Add/Sub had only a single user. Otherwise it would fire and create additional instructions. That fix resulted in an improvement to code generated for tsan which is why I committed it before deleting.

Differential Revision: https://reviews.llvm.org/D31543

llvm-svn: 299704
2017-04-06 20:41:48 +00:00
Craig Topper 3fc1225c18 [InstCombine] Fix a case where we weren't checking that an instruction had a single use resulting in extra instructions being created.
llvm-svn: 299658
2017-04-06 16:42:46 +00:00
Sanjay Patel 50c82c4395 [InstCombine] add fold for icmp with or mask of low bits (PR32542)
We already have these 'and' folds:

// X & -C == -C -> X >  u ~C
// X & -C != -C -> X <= u ~C
//   iff C is a power of 2

...but we were missing the 'or' siblings.

http://rise4fun.com/Alive/n6

This should improve:
https://bugs.llvm.org/show_bug.cgi?id=32524
...but there are 2 or more other pieces to fix still.

Differential Revision: https://reviews.llvm.org/D31712

llvm-svn: 299570
2017-04-05 17:57:05 +00:00
Sanjay Patel 519a87a468 [InstCombine] fix formatting and variable names; NFCI
There must be some opportunity to refactor big chunks of nearly duplicated code in FoldOrOfICmps / FoldAndOfICmps.
Also, none of this works with vectors, but it should.

llvm-svn: 299568
2017-04-05 17:38:34 +00:00
Sanjay Patel 0bf0abedf6 [InstCombine] rename variable for easier reading; NFC
We usually give constants a 'C' somewhere in the name...

llvm-svn: 299474
2017-04-04 22:06:03 +00:00
Craig Topper c745b6a1f6 [InstCombine] Turn subtract of vectors of i1 into xor like we do for scalar i1. Matches what we already do for add.
llvm-svn: 299472
2017-04-04 21:44:56 +00:00
Craig Topper 86173600ec [InstCombine] Support folding and/or/xor with a constant vector RHS into selects and phis
Currently we only fold with ConstantInt RHS. This generalizes to any Constant RHS.

Differential Revision: https://reviews.llvm.org/D31610

llvm-svn: 299466
2017-04-04 20:26:25 +00:00
Craig Topper e06b6bcfa1 [InstCombine] Use setAllBits in place of getAllOnesValue since we know the bitwidths are the same. NFCI
llvm-svn: 299413
2017-04-04 05:03:02 +00:00
Zvi Rackover 82bf48d8b9 InstCombine: Use the InstSimplify hook for shufflevector
Summary: Start using the recently added InstSimplify hook for shuffles in the respective InstCombine visitor.

Reviewers: spatel, RKSimon, craig.topper, majnemer

Reviewed By: majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D31526

llvm-svn: 299412
2017-04-04 04:47:57 +00:00
Craig Topper 1604f0773b [InstCombine] Remove canonicalization for (X & C1) | C2 --> (X | C2) & (C1|C2) when C1 & C2 have common bits.
It turns out that SimplifyDemandedInstructionBits will get called earlier and remove bits from C1 first. Effectively doing (X & (C1&C2)) | C2. So by the time it got to this check there could be no common bits.

I think the DAGCombiner has the same check but its check can be executed because it handles demanded bits later. I'll look at it next.

llvm-svn: 299384
2017-04-03 20:41:47 +00:00
Craig Topper 3882613956 [DAGCombine][InstCombine] Fix inverted if condition in equivalent comments in DAGCombine and InstCombine. NFC
llvm-svn: 299378
2017-04-03 19:18:48 +00:00
Craig Topper 79120e80b8 Revert r299337 "[InstCombine] Remove redundant combine from visitAnd"
One of the tsan bots started failing at this commit. I don't see anything obviously wrong with the commit so trying this to see if it recovers.

Failing log: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-autoconf/builds/6792

llvm-svn: 299366
2017-04-03 17:22:23 +00:00
Sanjay Patel 77bf622db6 [InstCombine] fix formatting for foldLogOpOfMaskedICmps and related bits; NFCI
1. Improve enum, function, and variable names.
2. Improve comments.
3. Fix variable capitalization.
4. Run clang-format.

As an existing code comment suggests, this should work with vector types / splat constants too,
so making this look right first will reduce the diffs needed for that change.

llvm-svn: 299365
2017-04-03 16:53:12 +00:00
Craig Topper d33ee1b960 [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Craig Topper d0b053d229 [InstCombine] Make foldOpWithConstantIntoOperand take a BinaryOperator instead of a generic Instruction.
It blindly assumes there are two operands so make it explicit.

llvm-svn: 299351
2017-04-03 07:08:08 +00:00
Craig Topper 07944f891c [InstCombine] Remove a And transform that should be handled by SimplifyDemandedInstructionBits. NFCI
llvm-svn: 299349
2017-04-03 06:02:09 +00:00
Craig Topper 70e4f434ae [InstCombine] Make InstCombiner::OptAndOp take a BinaryOperator instead of an Instruction.
The callers have already performed the necessary cast before calling. This allows us to remove a comment that says the instruction must be a BinaryOperator and make it explicit in the argument type.

Had to add a default case to the switch because BinaryOperator::getOpcode() returns a BinaryOps enum.

llvm-svn: 299339
2017-04-02 17:57:30 +00:00
Craig Topper d133591a7e [InstCombine] Remove redundant combine from visitAnd
As far as I can tell this combine is fully handled by SimplifyDemandedInstructionBits.

I was only looking at this because it is the only user of APIntOps::isShiftedMask which is itself broken. As demonstrated by r299187. I was going to fix isShiftedMask and needed to make sure we had coverage for the new cases it would expose to this combine. But looks like we can nuke it instead.

Differential Revision: https://reviews.llvm.org/D31543

llvm-svn: 299337
2017-04-02 17:34:30 +00:00
Craig Topper 47fd2de304 [APInt] Fix bugs in isShiftedMask to match behavior of the similar function in MathExtras.h
This removes a parameter from the routine that was responsible for a lot of the issue. It was a bit count that had to be set to the BitWidth of the APInt and would get passed to getLowBitsSet. This guaranteed the call to getLowBitsSet would create an all ones value. This was then compared to (V | (V-1)). So the only shifted masks we detected had to have the MSB set.

The one in tree user is a transform in InstCombine that never fires due to earlier transforms covering the case better. I've submitted a patch to remove it completely, but for now I've just adapted it to the new interface for isShiftedMask.

llvm-svn: 299273
2017-03-31 22:23:42 +00:00
Craig Topper e625d74271 [InstCombine] When adding an Instruction and its Users to the worklist at the same time, make sure we put the Users in first. Then put in the instruction.
This way we ensure we immediately revisit the instruction and do any additional optimizations before visiting the users. Otherwise we might visit the users, then the instruction, then users again, then instruction again.

llvm-svn: 299267
2017-03-31 21:35:30 +00:00
Craig Topper 885fa12e8a [APInt] Remove shift functions from APIntOps namespace. Replace the few users with the APInt class methods. NFCI
llvm-svn: 299248
2017-03-31 20:01:16 +00:00
Joerg Sonnenberger 28bed106e0 Do not translate rint into nearbyint, but truncate it like nearbyint.
A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.

This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.

llvm-svn: 299247
2017-03-31 19:58:07 +00:00
Dehao Chen fed890ea3a Fix the InstCombine to reserve the VP metadata and sets correct call count.
Summary: Currently the VP metadata was dropped when InstCombine converts a call to direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded.

Reviewers: eraman, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31344

llvm-svn: 299228
2017-03-31 15:59:52 +00:00
Craig Topper 79e5bc528d [InstCombine] Fix typo last->least. NFC
llvm-svn: 299123
2017-03-30 22:28:55 +00:00
Simon Pilgrim 68168d17b9 Spelling mistakes in comments. NFCI.
Based on corrections mentioned in patch for clang for PR27635

llvm-svn: 299072
2017-03-30 12:59:53 +00:00
Matthew Simpson c8f0aeccda [InstCombine] Correct the check for vector GEPs
Some of the GEP combines (e.g., descaling) can't handle vector GEPs. We have an
existing check that attempts to bail out if given a vector GEP. However, the
check only tests the GEP's pointer operand. A GEP results in a vector of
pointers if at least one of its operands is vector-typed (e.g., its pointer
operand could be a scalar, but its index could be a vector). We should just
check the type of the GEP itself. This should fix PR32414.

Reference: https://bugs.llvm.org/show_bug.cgi?id=32414
Differential Revision: https://reviews.llvm.org/D31470

llvm-svn: 299017
2017-03-29 18:23:08 +00:00
Anna Thomas 923e574bff [InstCombine] For select rule, use positive check of constant int for select operand. NFCI
llvm-svn: 298906
2017-03-28 09:32:24 +00:00
Anna Thomas f57ae33381 [InstCombine] Avoid incorrect folding of select into phi nodes when incoming element is a vector type
Summary:
We are incorrectly folding selects into phi nodes when the incoming value of a phi
node is a constant vector. This optimization is done in `FoldOpIntoPhi` when the
select condition is a phi node with constant incoming values.
Without the fix, we are miscompiling (i.e. incorrectly folding the
select into the phi node) when the vector contains non-zero
elements.
This patch fixes the miscompile and we will correctly fold based on the
select vector operand (see added test cases).

Reviewers: majnemer, sanjoy, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31189

llvm-svn: 298845
2017-03-27 13:52:51 +00:00
Craig Topper 47596dd4cc [InstCombine] Change the interface of SimplifyDemandedBits so that it takes the instruction and operand instead of the Use.
The first thing it did was get the User for the Use to get the instruction back. This requires looking through the Uses for the User using the waymarking walk. That's pretty fast, but its probably still better to just pass the Instruction we already had.

llvm-svn: 298772
2017-03-25 06:52:52 +00:00
Craig Topper 8fbb74b5b2 Revert r298711 "[InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits"
Tsan bot is failing.

llvm-svn: 298745
2017-03-24 22:12:10 +00:00
Matt Arsenault 4c7795dd31 AMDGPU: Fold rcp/rsq of undef to undef
llvm-svn: 298725
2017-03-24 19:04:57 +00:00
Craig Topper d4521c2fc2 [InstCombine] Provide a way to calculate KnownZero/One for Add/Sub in SimplifyDemandedUseBits without recursing into ComputeKnownBits
SimplifyDemandedUseBits for Add/Sub already recursed down LHS and RHS for simplifying bits. If that didn't provide any simplifications we fall back to calling computeKnownBits which will recurse again. Instead just take the known bits for LHS and RHS we already have and call into a new function in ValueTracking that can calculate the known bits given the LHS/RHS bits.

llvm-svn: 298711
2017-03-24 16:56:51 +00:00
Craig Topper 36f2e0eee8 [InstCombine] Use range-based for loop. NFC
llvm-svn: 298680
2017-03-24 02:58:02 +00:00
Craig Topper df73e7c5b7 [InstCombine] Fix 80 column violation I accidentally introduced. NFC
llvm-svn: 298679
2017-03-24 02:57:59 +00:00
Craig Topper 74494d0179 [InstCombine] Remove some code from visitAnd that dealt with trying to reduce the LHS of a sub to 0. This should now be fully handled by SimplifyDemandedInstructionBits now.
Now that we call ShrinkDemandedConstant on the RHS of sub this should be taken care of. This code doesn't trigger on any in tree regressions, but did before ShrinkDemandedConstant was added to the RHS.

llvm-svn: 298644
2017-03-23 21:00:13 +00:00
Sanjay Patel 2f602cea41 [InstCombine] canonicalize insertelement of scalar constant ahead of insertelement of variable
insertelement (insertelement X, Y, IdxC1), ScalarC, IdxC2 -->
insertelement (insertelement X, ScalarC, IdxC2), Y, IdxC1

As noted in the code comment and seen in the test changes, the motivation is that by pulling
constant insertion up, we may be able to constant fold some insertelement instructions.

Differential Revision: https://reviews.llvm.org/D31196

llvm-svn: 298520
2017-03-22 17:10:44 +00:00
Craig Topper 07f2915ad8 [InstCombine] Teach SimplifyDemandedUseBits to shrink Constants on the left side of subtracts
Summary: Subtracts can have constants on the left side, but we don't shrink them based on demanded bits. This patch fixes that to match the right hand side.

Reviewers: davide, majnemer, spatel, sanjoy, hfinkel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31119

llvm-svn: 298478
2017-03-22 04:03:53 +00:00
Reid Kleckner b518054b87 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Artur Pilipenko 4cc6130f52 NFC. InstCombiner::visitFAdd extract LHSIntVal/RHSIntVal local variables
llvm-svn: 298359
2017-03-21 11:32:15 +00:00
Matt Arsenault 6b00d40900 InstCombine: Check source value precision when reducing cast intrinsic
Missed this check when porting from the libcall version.

llvm-svn: 298312
2017-03-20 21:59:24 +00:00
Craig Topper d92d2fc763 [InstCombine] Print a debug message when we constant fold an operand during worklist creation
InstCombine tries to constant fold instruction operands during worklist building, but we don't print that we're doing this.

We also set a change flag here that causes us to rebuild and rerun the worklist one more time even if processing the worklist itself created no additional changes. So in the log I saw two inst combine runs that visited all instructions without printing that anything was changed. I may be submitting another patch to remove the change flag unless I can find some reason why we should be doing that.

Differential Revision: https://reviews.llvm.org/D31091

llvm-svn: 298264
2017-03-20 16:31:14 +00:00
Craig Topper ff9749f759 [InstCombine] Remove duplicate code in SimplifyDemandedUseBits for URem. NFC
llvm-svn: 298231
2017-03-19 21:45:57 +00:00
Craig Topper 3a86a04404 [InstCombine] Use setHighBits/setLowBits/setBitsFrom in place of getLowBitsSet/getHighBitsSet.
llvm-svn: 298204
2017-03-19 05:49:16 +00:00
Adrian Prantl 47ea6478ed Salvage debug info from instructions about to be deleted
[Reapplies r297971 and punting on finding a better API for findDbgValues()]

This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.

In the example in the testcase (which was extracted from XNU) there is a sequence of

 %4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
 %5 = bitcast %struct.entry* %4 to i8*, !dbg !42
 %add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
 %6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
 call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34

When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:

- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic

The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.

rdar://problem/30725338

Differential Revision: https://reviews.llvm.org/D30919

llvm-svn: 297994
2017-03-16 21:14:09 +00:00
Sanjay Patel 6105bb5eaf [InstCombine] avoid breaking up bitcasted vector min/max patterns (PR32306)
As the related tests show, we're not canonicalizing to this form for scalars or vectors yet,
but this solves the immediate problem in:
https://bugs.llvm.org/show_bug.cgi?id=32306

llvm-svn: 297989
2017-03-16 20:42:45 +00:00
Adrian Prantl fa9e84eb6d Revert commit r297971 because of issues reported by msan.
llvm-svn: 297982
2017-03-16 20:11:54 +00:00
Adrian Prantl 4377314a98 Salvage debug info from instructions about to be deleted
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.

In the example in the testcase (which was extracted from XNU) there is a sequence of

  %4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
  %5 = bitcast %struct.entry* %4 to i8*, !dbg !42
  %add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
  %6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
  call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34

When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:

- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic

The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.

rdar://problem/30725338

Differential Revision: https://reviews.llvm.org/D30919

llvm-svn: 297971
2017-03-16 18:22:52 +00:00
Bjorn Pettersson c98dabb1a0 [InstCombine] Liberate assert in InstCombiner::visitZExt
Summary:
The call to canEvaluateZExtd in InstCombiner::visitZExt may
return with BitsToClear == SrcTy->getScalarSizeInBits(), but
there is an assert that BitsToClear should be smaller than
SrcTy->getScalarSizeInBits().

I have a test case that triggers the assert, but it only happens
for my downstream target. I've not been able to trigger it for
any upstream target.

The assert triggered for a piece of code such as this
  %shr1 = lshr i16 undef, 15
  ...
  %shr2 = lshr i16 %shr1, 1
  %conv = zext i16 %shr2 to i32

Normally the lshr instructions are constant folded before we
visit the zext (that is why it is so hard to reproduce).
The original pattern, before instcombine, is of course a lot more
complicated in my test case. The shift count in the second lshr
is for example determined by the outcome of a PHI instruction.
It seems like other rewrites by instcombine leads up to
the pattern above. And then the zext is pulled from the
worklist, and visited (hitting the assert), before we detect
that the lshr instrucions can be constant folded.

Anyway, since the canEvaluateZExtd may return with BitsToClear
equal to SrcTy->getScalarSizeInBits(), and since the rewrite
that converts the expression type to avoid a zero extend works
also for the case where SrcBitsKept ends up being zero, then
it should be OK to liberate the assert to
  assert(BitsToClear <= SrcTy->getScalarSizeInBits() &&
         "Unreasonable BitsToClear");

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D30993

llvm-svn: 297952
2017-03-16 13:22:01 +00:00
Sanjay Patel a0a5682d00 [InstCombine] improve readability; NFCI
llvm-svn: 297755
2017-03-14 17:27:27 +00:00
Matt Arsenault d81f557fe2 AMDGPU: Fold icmp/fcmp into icmp intrinsic
The typical use is a library vote function which
compares to 0. Fold the user condition into the intrinsic.

llvm-svn: 297650
2017-03-13 18:14:02 +00:00
Matt Arsenault a3bdd8f27b AMDGPU: Fix insertion point when reducing load intrinsics
The insertion point may be later than the next instruction,
so it is necessary to set it when replacing the call.

llvm-svn: 297439
2017-03-10 05:25:49 +00:00
Matt Arsenault efe949cc67 AMDGPU: Support for SimplifyDemandedVectorElts for load intrinsics
llvm-svn: 297408
2017-03-09 20:34:27 +00:00
Sanjay Patel 62906af379 [InstCombine] avoid crashing on shuffle shrinkage when input type is not same as result type
llvm-svn: 297280
2017-03-08 15:02:23 +00:00
Sanjay Patel fe9705149b [InstCombine] shrink truncated insertelement into undef vector
This is the 2nd part of solving:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

D30123 moves the trunc ahead of the shuffle, and this moves the trunc ahead of the insertelement. 
We're limiting this transform to undef rather than any constant to avoid backend problems.

Differential Revision: https://reviews.llvm.org/D30137

llvm-svn: 297242
2017-03-07 23:27:14 +00:00
Sanjay Patel 53fa17a014 [InstCombine] shrink truncated splat shuffle (2nd try)
This was committed at r297155 and reverted at r297166 because of an
over-reaching clang test. That should be fixed with r297189.

This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

This keeps with our general approach: changing arbitrary shuffles is off-limts,
but changing splat is ok. The transform is very similar to the existing
shrinkBitwiseLogic() canonicalization.

Differential Revision: https://reviews.llvm.org/D30123

llvm-svn: 297232
2017-03-07 21:45:16 +00:00
Sanjay Patel 6d30606168 revert r297155 because there's a clang test that depends on InstCombine:
tools/clang/test/CodeGen/zvector.c

llvm-svn: 297166
2017-03-07 17:41:45 +00:00
Sanjay Patel defdb7bed5 [InstCombine] shrink truncated splat shuffle
This is one part of solving a recent bug report:
http://lists.llvm.org/pipermail/llvm-dev/2017-February/110293.html

This keeps with our general approach: changing arbitrary shuffles is off-limts, 
but changing splat is ok. The transform is very similar to the existing 
shrinkBitwiseLogic() canonicalization.

Differential Revision: https://reviews.llvm.org/D30123

llvm-svn: 297155
2017-03-07 16:10:36 +00:00
Sanjay Patel c3b4735b6f [InstCombine] use dyn_cast instead of isa+cast; NFCI
llvm-svn: 297092
2017-03-06 23:25:28 +00:00
Simon Pilgrim e938a152c5 Use APInt::getLowBitsSet instead of APInt::getBitsSet for lower bit mask creation
llvm-svn: 296882
2017-03-03 16:56:33 +00:00
Bjorn Pettersson e5027cfbcc [InstCombine] Avoid faulty combines of select-cmp-br
Summary:
When InstCombine is optimizing certain select-cmp-br patterns
it replaces the result of the select in uses outside of the
basic block containing the select. This is only legal if the
path from the select to the outside use is disjoint from all
other paths out from the originating basic block.

The problem found was that InstCombiner::replacedSelectWithOperand
did not consider the case when both edges out from the br pointed
to the same label. In that case the paths aren't disjoint and the
transformation is illegal. This patch avoids the faulty rewrites
by verifying that there is a single flow to the successor where
we want to replace uses.

Reviewers: llvm-commits, spatel, majnemer

Differential Revision: https://reviews.llvm.org/D30455

llvm-svn: 296752
2017-03-02 15:18:58 +00:00
Mikael Holmen 760dc9aba7 Remove sometimes faulty rewrite of memcpy in instcombine.
Summary:
Solves PR 31990.

The bad rewrite could replace a memcpy of one word with
 store i4 -1
while it should actually be
 store i8 -1

Hopefully opt and llc has improved enough so the original optimization
done by the code isn't needed anymore.

One already existing testcase is affected. It originally tested that
the memcpy was replaced with
 load double
but since we now remove that rewrite it will be
 load i64
instead.

Patch suggestion by Eli Friedman.

Reviewers: eli.friedman, majnemer, efriedma

Reviewed By: efriedma

Subscribers: efriedma, llvm-commits

Differential Revision: https://reviews.llvm.org/D30254

llvm-svn: 296585
2017-03-01 06:45:20 +00:00
Matt Arsenault cdb468c0f9 AMDGPU: Basic folds for fmed3 intrinsic
Constant fold, canonicalize constants to RHS,
reduce to minnum/maxnum when inputs are nan/undef.

llvm-svn: 296409
2017-02-27 23:08:49 +00:00
Yaxun Liu e6d1ce59c0 [InstCombine] Fix bug in pointer replacement
This optimisation was crashing when there was a chain of more than one bitcast
instruction to replace, as a result of the changes in D27283.

Patch by James Price.

Differential Revision: https://reviews.llvm.org/D30347

llvm-svn: 296163
2017-02-24 20:27:25 +00:00
Sanjay Patel ec9a8de0e6 [InstCombine] don't try SimplifyDemandedInstructionBits from zext/sext because it's slow and unnecessary
This one seems more obvious than D30270 that it can't make improvements because an extension always needs
all of the incoming bits. There's one specific transform in SimplifyDemandedInstructionBits of converting
a sext to a zext when the sign-bit is known zero, but that is handled explicitly in visitSext() with
ComputeSignBit().

Like D30270, there are no IR differences (other than instruction names) for the case in PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037
...and no regression test differences.

Zext/sext are a smaller part of the profile, but this still appears to shave off another 0.5% or so from
'opt -O2'.

Differential Revision: https://reviews.llvm.org/D30280

llvm-svn: 296129
2017-02-24 15:18:42 +00:00
Sanjay Patel 68e4cb3c86 [InstCombine] use loop instead of recursion to peek through FPExt; NFCI
llvm-svn: 295992
2017-02-23 16:39:51 +00:00
Sanjay Patel adf2ab16e4 [InstCombine] use 'match' to reduce code; NFCI
llvm-svn: 295991
2017-02-23 16:26:03 +00:00
Matt Arsenault d4bca1e9ef AMDGPU: Replace disabled exp inputs with undef
llvm-svn: 295914
2017-02-23 00:44:03 +00:00
Matt Arsenault f5262256a1 AMDGPU: Add replacement bfe intrinsics
llvm-svn: 295899
2017-02-22 23:04:58 +00:00
Sanjay Patel 4805ce0b17 [InstCombine] don't try SimplifyDemandedInstructionBits from add/sub because it's slow and unlikely to succeed
Notably, no regression tests change when we remove these calls, and these are expensive calls.

The motivation comes from the general acknowledgement that the compiler is getting slower:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109188.html
http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html

And specifically the test case attached to PR32037:
https://bugs.llvm.org//show_bug.cgi?id=32037

Profiling the middle-end (opt) part of the compile:
$ ./opt -O2 row_common.bc -o /dev/null

...visitAdd and visitSub are near the top of the instcombine list, and the calls to SimplifyDemandedInstructionBits()
are high within each of those. Those calls account for 1%+ of the opt time in either debug or release profiles. And 
that's the rough win I see from this patch when testing opt built release from r295864 on an iMac with Haswell 4GHz
(model 4790K).

It seems unlikely that we'd be able to eliminate add/sub or change their operands given that add/sub normally affect
all bits, and the PR32037 example shows no IR difference after this change using -O2.

Also worth noting - the code comment in visitAdd:
// This handles stuff like (X & 254)+1 -> (X&254)|1
...isn't true. That transform is handled later with a call to haveNoCommonBitsSet().

Differential Revision: https://reviews.llvm.org/D30270

llvm-svn: 295898
2017-02-22 23:01:12 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
Sanjay Patel cb731f1538 [InstCombine] canonicalize non-obivous forms of integer min/max
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use 
matchSelectPattern or m_SMax and friends.

The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392
https://reviews.llvm.org/rL289738

If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering. 

llvm-svn: 295758
2017-02-21 19:33:53 +00:00
Anna Thomas ec36f3b79a [InstCombine] Do not exercise nested max/min pattern on abs
Summary:
This is a fix for assertion failure in
`getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern.

We should not be invoking the simplification rule for
ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations.

Added a test case which would cause an assertion failure without the patch.

Reviewers: sanjoy, majnemer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30051

llvm-svn: 295719
2017-02-21 14:40:28 +00:00
Sanjay Patel 53c5c3d65d [InstCombine] add nsw/nuw X, signbit --> or X, signbit
Changing to 'or' (rather than 'xor' when no wrapping flags are set)
allows icmp simplifies to happen as expected.

Differential Revision: https://reviews.llvm.org/D29729

llvm-svn: 295574
2017-02-18 22:20:09 +00:00
Eugene Leviant 958fcd7502 InstCombine: fix extraction when performing vector/array punning
Differential revision: https://reviews.llvm.org/D29491

llvm-svn: 295429
2017-02-17 07:36:03 +00:00
Matt Arsenault 920576042d InstCombine: Canonicalize fast fmuladd to fmul + fadd
llvm-svn: 295353
2017-02-16 18:46:24 +00:00
Craig Topper 3731f4d173 [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit.
llvm-svn: 295294
2017-02-16 07:35:23 +00:00
Sanjay Patel 845ea963aa [InstCombine] improve formatting; NFC
llvm-svn: 295237
2017-02-15 21:31:34 +00:00
Sanjay Patel 45b7e69fef [InstCombine] fold icmp sgt/slt (add nsw X, C2), C --> icmp sgt/slt X, (C - C2)
I found one special case of this transform for 'slt 0', so I removed that and added the general transform.

Alive code to check correctness:

Name: slt_no_overflow
Pre: WillNotOverflowSignedSub(C1, C2)
%a = add nsw i8 %x, C2
%b = icmp slt %a, C1
  =>
%b = icmp slt %x, C1 - C2

Name: sgt_no_overflow
Pre: WillNotOverflowSignedSub(C1, C2)
%a = add nsw i8 %x, C2
%b = icmp sgt %a, C1
  =>
%b = icmp sgt %x, C1 - C2

http://rise4fun.com/Alive/MH

Differential Revision: https://reviews.llvm.org/D29774

llvm-svn: 294898
2017-02-12 16:40:30 +00:00
Benjamin Kramer 03ab8a366e [InstCombine] Move class into anonymous namespace. NFC.
This is necessary to avoid warnings from GCC.
InstCombineLoadStoreAlloca.cpp:238:7: error: 'PointerReplacer' declared
with greater visibility than the type of its field 'PointerReplacer::IC'

llvm-svn: 294794
2017-02-10 22:26:35 +00:00
Benjamin Kramer 684c87be4f [InstCombine] Silence unused variable warning in Release builds.
llvm-svn: 294788
2017-02-10 22:04:17 +00:00
Yaxun Liu ba01ed00fe Fix invalid addrspacecast due to combining alloca with global var
For function-scope variables with large initialisation list, FE usually 
generates a global variable to hold the initializer, then generates 
memcpy intrinsic to initialize the alloca. InstCombiner::visitAllocaInst 
identifies such allocas which are accessed only by reading and replaces 
them with the global variable. This is done by casting the global variable 
to the type of the alloca and replacing all references.

However, when the global variable is in a different address space which 
is disjoint with addr space 0 (e.g. for IR generated from OpenCL, 
global variable cannot be in private addr space i.e. addr space 0), casting 
the global variable to addr space 0 results in invalid IR for certain 
targets (e.g. amdgpu).

To fix this issue, when the global variable is not in addr space 0, 
instead of casting it to addr space 0, this patch chases down the uses 
of alloca until reaching the load instructions, then replaces load from 
alloca with load from the global variable. If during the chasing 
bitcast and GEP are encountered, new bitcast and GEP based on the global 
variable are generated and used in the load instructions.

Differential Revision: https://reviews.llvm.org/D27283

llvm-svn: 294786
2017-02-10 21:46:07 +00:00
Sanjay Patel f38bab73aa [InstCombine] allow (X * C2) << C1 --> X * (C2 << C1) for vectors
This fold already existed for vectors but only when 'C1' was a splat
constant (but 'C2' could be any constant). 

There were no tests for any vector constants, so I'm adding a test
that shows non-splat constants for both operands.  

llvm-svn: 294650
2017-02-09 23:13:04 +00:00
Sanjay Patel ae3b43e488 [InstCombine] use m_APInt to allow demanded bits analysis on splat constants
llvm-svn: 294628
2017-02-09 21:43:06 +00:00
Sanjay Patel 6dd2eae76a [InstCombine] add local name for repeated calls; NFC
llvm-svn: 294470
2017-02-08 16:19:36 +00:00
Igor Laevsky a9b6872908 [InstComobineCalls] Fix buildbot failures after r294453.
Some targets don't support uint64_t options. Change type to unsigned.

Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294461
2017-02-08 15:21:48 +00:00
Igor Laevsky 900ffa34c8 [InstCombineCalls] Unfold element atomic memcpy instruction
Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294453
2017-02-08 14:32:04 +00:00
Igor Laevsky 4b317fa24e [InstCombineCalls] Remove zero length atomic memcpy intrinsics
Differential Revision: https://reviews.llvm.org/D28909

llvm-svn: 294452
2017-02-08 14:23:47 +00:00
David Blaikie 4c01af203e Fix the -Werror build for some sign-comparisons
llvm-svn: 294331
2017-02-07 18:58:17 +00:00
Davide Italiano 2133bf5562 [InstCombine] Make max size array combine a tunable.
Requested by Sanjoy/Hal a while ago, and forgotten by me
(r283612).

llvm-svn: 294323
2017-02-07 17:56:50 +00:00
Paul Robinson 383c5c228f Merge DebugLoc on combined stores; in this case, when combining stores
from the end of two blocks, merge instead of arbitrarily picking one.

Differential Revision: http://reviews.llvm.org/D29504

llvm-svn: 294251
2017-02-06 22:19:04 +00:00
Sanjay Patel cf4c90f3d3 [InstCombine] simplify dyn_cast + isa; NFCI
llvm-svn: 294198
2017-02-06 17:16:16 +00:00
Sanjay Patel 0fe32ac256 [InstCombine] treat i1 as a special type in shouldChangeType()
This patch is based on the llvm-dev discussion here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/109631.html

Folding to i1 should always be desirable because that's better for value tracking 
and we have special folds for i1 types.

I checked for other users of shouldChangeType() where this might have an effect, 
but we already handle the i1 case differently than other types in all of those cases.

Side note: the default datalayout includes i1, so it seems we only find this gap in 
shouldChangeType + phi folding for the case when there is (1) an explicit datalayout 
without i1, (2) casting to i1 from a legal type, and (3) a phi with exactly 2 incoming
casted operands (as Björn mentioned).

Differential Revision: https://reviews.llvm.org/D29336

llvm-svn: 294066
2017-02-03 23:13:11 +00:00
Sanjay Patel 73fc8ddb06 [InstCombine] fix operand-complexity-based canonicalization (PR28296)
The code comments didn't match the code logic, and we didn't actually distinguish the fake unary (not/neg/fneg) 
operators from arguments. Adding another level to the weighting scheme provides more structure and can help 
simplify the pattern matching in InstCombine and other places.

I fixed regressions that would have shown up from this change in:
rL290067
rL290127

But that doesn't mean there are no pattern-matching logic holes left; some combines may just be missing regression tests.

Should fix:
https://llvm.org/bugs/show_bug.cgi?id=28296

Differential Revision: https://reviews.llvm.org/D27933

llvm-svn: 294049
2017-02-03 21:43:34 +00:00
Sanjay Patel c56d1ccd79 [InstCombine] move folds for shift-shift pairs; NFCI
Although this is 'no-functional-change-intended', I'm adding tests
for shl-shl and lshr-lshr pairs because there is no existing test 
coverage for those folds.

It seems like we should be able to remove some code from foldShiftedShift()
at this point because we're handling those patterns on the general path.

llvm-svn: 293814
2017-02-01 21:31:34 +00:00
Sanjoy Das e0e5795f6b [InstCombine] Allow InstCombine to merge adjacent guards
Summary:
If there are two adjacent guards with different conditions, we can
remove one of them and include its condition into the condition of
another one. This patch allows InstCombine to merge them by the
following pattern:

    guard(a); guard(b) -> guard(a & b).

Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29378

llvm-svn: 293778
2017-02-01 16:34:55 +00:00
Davide Italiano aec4617dc8 [Instcombine] Combine consecutive identical fences
Differential Revision:  https://reviews.llvm.org/D29314

llvm-svn: 293661
2017-01-31 18:09:05 +00:00
Arnold Schwaighofer c368563bd6 Don't combine stores to a swifterror pointer operand to a different type
llvm-svn: 293658
2017-01-31 17:53:49 +00:00
Sanjay Patel 2217f75ad1 fix formatting; NFC
llvm-svn: 293652
2017-01-31 17:25:42 +00:00
Silviu Baranga c6d21eba0e [InstCombine] Make sure that LHS and RHS have the same type in
transformToIndexedCompare

If they don't have the same type, the size of the constant
index would need to be adjusted (and this wouldn't be always
possible).

Alternatively we could try the analysis with the initial
RHS value, which would guarantee that the two sides have
the same type. However it is unlikely that in practice this
would pass our transformation requirements.

Fixes PR31808 (https://llvm.org/bugs/show_bug.cgi?id=31808).

llvm-svn: 293629
2017-01-31 14:04:15 +00:00
Sanjay Patel 8c5f236197 [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants
llvm-svn: 293570
2017-01-30 23:35:52 +00:00
Sanjay Patel 0c39d56a60 [InstCombine] enable more lshr(shl X, C1), C2 folds for vectors with splat constants
llvm-svn: 293562
2017-01-30 23:01:05 +00:00
Sanjay Patel 373db5ba6c [InstCombine] enable (X >>?exact C1) << C2 --> X >>?exact (C1-C2) for vectors with splat constants
llvm-svn: 293524
2017-01-30 18:40:23 +00:00
Sanjay Patel 062c14af5c [InstCombine] use auto with obvious type; NFC
llvm-svn: 293508
2017-01-30 17:38:55 +00:00
Sanjay Patel 77732d5033 [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1-C2) for vectors with splat constants
llvm-svn: 293507
2017-01-30 17:19:32 +00:00
Sanjay Patel 8e644c08ee [InstCombine] fixed to propagate 'exact' on lshr
The original shift is bigger, so this may qualify as 'obvious', 
but here's an attempt at an Alive-based proof:

Name: exact
Pre: (C1 u< C2)
%a = shl i8 %x, C1
%b = lshr exact i8 %a, C2 
  =>
%c = lshr exact i8 %x, C2 - C1
%b = and i8 %c, ((1 << width(C1)) - 1) u>> C2

Optimization is correct!

llvm-svn: 293498
2017-01-30 16:53:03 +00:00
Sanjay Patel 1196d7cd7f [InstCombine] enable lshr(shl X, C1), C2 folds for vectors with splat constants
llvm-svn: 293489
2017-01-30 16:11:40 +00:00
Sanjay Patel 062adaab83 [InstCombine] enable (X >>?,exact C1) << C2 --> X << (C2 - C1) for vectors with splats
llvm-svn: 293435
2017-01-29 17:11:18 +00:00
Sanjay Patel febcb9ce54 [InstCombine] move icmp transforms that might be recognized as min/max and inf-loop (PR31751)
This is a minimal patch to avoid the infinite loop in:
https://llvm.org/bugs/show_bug.cgi?id=31751

But the general problem is bigger: we're not canonicalizing all of the min/max forms reported
by value tracking's matchSelectPattern(), and we don't define min/max consistently. Some code
uses matchSelectPattern(), other code uses matchers like m_Umax, and others have their own
inline definitions which may be subtly different from any of the above.

The reason that the test cases in this patch need a cast op to trigger is because we don't
(yet) canonicalize all min/max forms based on matchSelectPattern() in 
canonicalizeMinMaxWithConstant(), but we do make min/max+cast transforms based on 
matchSelectPattern() in visitSelectInst().

The location of the icmp transforms that trigger the inf-loop seems arbitrary at best, so
I'm moving those behind the min/max fence in visitICmpInst() as the quick fix.

llvm-svn: 293345
2017-01-27 23:26:27 +00:00
Justin Lebar 25ebe2d767 [NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC.
llvm-svn: 293253
2017-01-27 02:04:07 +00:00
Justin Lebar e3ac0fb948 [NVPTX] Fix use-after-stack-free bug in InstCombineCalls.
Introduced in r293244.

llvm-svn: 293251
2017-01-27 01:49:39 +00:00
Justin Lebar 698c31b8db [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls.
Summary:
There are many NVVM intrinsics that we can't entirely get rid of, but
that nonetheless often correspond to target-generic LLVM intrinsics.

For example, if flush denormals to zero (ftz) is enabled, we can convert
@llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32.  On the other hand, if ftz is
disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a
non-ftz PTX instruction.  In this case, we can, however, simplify the
non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32.

These transformations are particularly useful because they let us
constant fold instructions that appear in libdevice, the bitcode library
that ships with CUDA and essentially functions as its libm.

Reviewers: tra

Subscribers: hfinkel, majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D28794

llvm-svn: 293244
2017-01-27 00:58:58 +00:00
Sanjoy Das 7516192a71 Revert a couple of InstCombine/Guard checkins
This change reverts:

r293061: "[InstCombine] Canonicalize guards for NOT OR condition"
r293058: "[InstCombine] Canonicalize guards for AND condition"

They miscompile cases like:

```
declare void @llvm.experimental.guard(i1, ...)

define void @test_guard_not_or(i1 %A, i1 %B) {
  %C = or i1 %A, %B
  %D = xor i1 %C, true
  call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ]
  ret void
}
```

because they do transfer the `i32 20, i32 30` parameters to newly
created guard instructions.

llvm-svn: 293227
2017-01-26 23:38:11 +00:00
Sanjay Patel 50753f02c2 [InstCombine] fold (X >>u C) << C --> X & (-1 << C)
We already have this fold when the lshr has one use, but it doesn't need that
restriction. We may be able to remove some code from foldShiftedShift().

Also, move the similar:
(X << C) >>u C --> X & (-1 >>u C)
...directly into visitLShr to help clean up foldShiftByConstOfShiftByConst().

That whole function seems questionable since it is called by commonShiftTransforms(),
but there's really not much in common if we're checking the shift opcodes for every
fold.

llvm-svn: 293215
2017-01-26 22:08:10 +00:00
Sanjay Patel b0d96d327e [InstCombine] use m_APInt to allow (X << C) >>u C --> X & (-1 >>u C) with splat vectors
llvm-svn: 293208
2017-01-26 20:52:27 +00:00
Craig Topper b6122122c9 [X86] Add demanded elts support for the inputs to pclmul intrinsic
This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors.

Differential Revision: https://reviews.llvm.org/D28979

llvm-svn: 293151
2017-01-26 05:17:13 +00:00
Artur Pilipenko b85f7a5d99 [InstCombine] Canonicalize guards for NOT OR condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29075

Patch by Maxim Kazantsev.

llvm-svn: 293061
2017-01-25 14:45:12 +00:00
Simon Pilgrim 6f6b279109 [InstCombine][SSE] Add support for PACKSS/PACKUS constant folding
Differential Revision: https://reviews.llvm.org/D28949

llvm-svn: 293060
2017-01-25 14:37:24 +00:00
Artur Pilipenko 4df4c4a4aa [InstCombine] Canonicalize guards for AND condition
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D29074

Patch by Maxim Kazantsev.

llvm-svn: 293058
2017-01-25 14:20:52 +00:00
Artur Pilipenko e812ca00bb [InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine

Reviewed By: majnemer, apilipenko

Differential Revision: https://reviews.llvm.org/D29071

Patch by Maxim Kazantsev.

llvm-svn: 293056
2017-01-25 14:12:12 +00:00
Amaury Sechet d90f5f6698 Use InstCombine's builder in foldSelectCttzCtlz instead of creating a new one.
Summary: As per title. This will add the instructiions we are interested in in the worklist.

Reviewers: mehdi_amini, majnemer, andreadb

Differential Revision: https://reviews.llvm.org/D29081

llvm-svn: 292957
2017-01-24 17:48:25 +00:00
Amaury Sechet 5da456e6a1 Fix formating in foldSelectCttzCtlz. NFC
llvm-svn: 292934
2017-01-24 14:22:27 +00:00
Simon Pilgrim 78f8630ac0 [InstCombine][X86] MULDQ/MULUDQ undef -> zero
Added early out for single undef input - we were already supporting (and testing) this in the constant folding code, we just do it quicker now

Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst

llvm-svn: 292913
2017-01-24 11:07:41 +00:00
Matt Arsenault 954a624fb9 SimplifyLibCalls: Replace more unary libcalls with intrinsics
llvm-svn: 292855
2017-01-23 23:55:08 +00:00
Simon Pilgrim f6f3a36159 [InstCombine][X86] Add MULDQ/MULUDQ constant folding support
llvm-svn: 292793
2017-01-23 15:22:59 +00:00
Simon Pilgrim bb13fdabec [InstCombine][X86] MULDQ/MULUDQ undef -> zero
Match generic mul behaviour so that <X x i64> multiply and muldq/muludq pattern act the same

llvm-svn: 292784
2017-01-23 12:07:32 +00:00
Sanjay Patel 478a83c905 [InstCombine] use m_APInt to allow ashr folds for vectors with splat constants
We may be able to assert that no shl-shl or lshr-lshr pairs ever get here
because we should have already handled those in foldShiftedShift().

llvm-svn: 292726
2017-01-21 17:59:59 +00:00
Simon Pilgrim a50a93fcd0 [InstCombine][X86] Add MULDQ/MULUDQ undef handling
llvm-svn: 292627
2017-01-20 18:20:30 +00:00
Simon Pilgrim 51b3b98e3a [InstCombine][SSE] Add DemandedElts support for PACKSS/PACKUS instructions
Simplify a packss/packus truncation based on the elements of the mask that are actually demanded.

Differential Revision: https://reviews.llvm.org/D28777

llvm-svn: 292591
2017-01-20 09:28:21 +00:00
Davide Italiano 2ef8c4e708 [InstCombine] Simplify gep (gep p, a), (b-a)
Patch by Andrea Canciani.

Differential Revision:  https://reviews.llvm.org/D27413

llvm-svn: 292506
2017-01-19 18:51:56 +00:00
Sanjay Patel 291c3d8ff2 [InstCombine] icmp Pred (shl nsw X, C1), C0 --> icmp Pred X, C0 >> C1
Try harder to fold icmp with shl nsw as discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108749.html

This is similar to the 'shl nuw' transforms that were added with D25913.

This may eventually help solve:
https://llvm.org/bugs/show_bug.cgi?id=30773

Differential Revision: https://reviews.llvm.org/D28406

llvm-svn: 292492
2017-01-19 16:12:10 +00:00
Sanjay Patel ae23d65a7d [InstCombine] add an assert to make a shl+icmp transform assumption explicit; NFCI
llvm-svn: 292440
2017-01-18 21:16:12 +00:00
Sanjay Patel 589de5ea4e [InstCombine] remove a redundant check; NFCI
I missed deleting this check when I refactored this chunk in:
https://reviews.llvm.org/rL292260

llvm-svn: 292433
2017-01-18 20:09:59 +00:00
Simon Pilgrim fe2c0ed4cf [InstCombine][AVX2] Add DemandedElts support for VPERMD/VPERMPS shuffles
Simplify a vpermv shuffle mask based on the elements of the mask that are actually demanded.

llvm-svn: 292371
2017-01-18 14:47:49 +00:00
Simon Pilgrim a22c3a1c0f [InstCombine] Remove unnecessary intrinsics demanded elts handling
As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need.

llvm-svn: 292365
2017-01-18 13:44:04 +00:00
Sanjay Patel 14715b3c2a [InstCombine] refactor foldICmpShlConstant(); NFCI
This reduces the size of and increases the symmetry with the planned functional change in:
https://reviews.llvm.org/D28406

llvm-svn: 292260
2017-01-17 21:25:16 +00:00
David Majnemer de55c606d1 [InstCombine] Fold ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2)
This further extends r292179 to support additional binary operators
beyond subtraction.

llvm-svn: 292238
2017-01-17 18:08:06 +00:00
Sanjay Patel 5424bd2625 [InstCombine] reduce indent; NFCI
llvm-svn: 292230
2017-01-17 16:59:09 +00:00
Simon Pilgrim d4eb800b03 [InstCombine][X86][AVX] Add DemandedElts support for VPERMILPD/VPERMILPS instructions
Simplify a vpermilvar shuffle mask based on the elements of the mask that are actually demanded.

llvm-svn: 292209
2017-01-17 11:35:03 +00:00
Sanjoy Das 679bc32c6a [InstCombine] Don't DSE across readnone functions that may throw
Summary: Depends on D28740

Reviewers: dberlin, chandlerc, hfinkel, majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D28742

llvm-svn: 292197
2017-01-17 05:45:09 +00:00
David Majnemer 36d382b773 [InstCombine] Fold ((C1-zext(X)) & C2) -> zext((C1-X) & C2)
This is valid if C2 fits within the bitwidth of X thanks to two's
complement modulo arithmetic.

llvm-svn: 292179
2017-01-17 00:45:57 +00:00
Matt Arsenault 7233344c28 SimplifyLibCalls: Replace fabs libcalls with intrinsics
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.

llvm-svn: 292172
2017-01-17 00:10:40 +00:00
Sanjay Patel da5682afdd [InstCombine] use m_APInt instead of faking it
llvm-svn: 292164
2017-01-16 21:24:41 +00:00
Sanjay Patel 65cce20caa [InstCombine] fix names in canEvaluateShiftedShift(); NFC
It's not clear what 'First' and 'Second' mean, so use 'Inner' and 'Outer'
to match foldShiftedShift() and add comments with formulas, so it's easier
to see what's going on.

llvm-svn: 292153
2017-01-16 20:05:26 +00:00
Sanjay Patel ab8b32de71 [InstCombine] use m_APInt to allow shift-shift folds for vectors with splat constants
Some existing 'FIXME' tests are still not folded because of splat holes in value tracking.

llvm-svn: 292151
2017-01-16 19:35:45 +00:00
Sanjay Patel 646734a6cd [InstCombine] refactor shift-of-shift folds; NFCI
Reduces code duplication and makes it easier to extend these folds for vectors.

llvm-svn: 292145
2017-01-16 17:27:50 +00:00
Simon Pilgrim 73a68c25a0 [InstCombine][SSE] Add DemandedElts support for PSHUFB instructions
Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded.

Differential Revision: https://reviews.llvm.org/D28745

llvm-svn: 292101
2017-01-16 11:30:41 +00:00
Sanjay Patel 20aaf58543 [InstCombine] fix formatting; NFC
llvm-svn: 292073
2017-01-15 17:55:35 +00:00
Sanjay Patel 5f8451afad [InstCombine] use m_APInt to allow ashr folds for vectors with splat constants
llvm-svn: 292064
2017-01-15 16:38:19 +00:00
Chandler Carruth ca68a3ec47 [PM] Introduce an analysis set used to preserve all analyses over
a function's CFG when that CFG is unchanged.

This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.

I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.

Differential Revision: https://reviews.llvm.org/D28627

llvm-svn: 292054
2017-01-15 06:32:49 +00:00
Chandler Carruth 2f19a324cb [PM] The assumption cache is fundamentally designed to be self-updating,
mark it as never invalidated in the new PM.

The old PM already required this to work, and after a discussion with
Hal this seems to really be the only sensible answer. The cache
gracefully degrades as the IR is mutated, and most things which do this
should already be incrementally updating the cache.

This gets rid of a bunch of logic preserving and testing the
invalidation of this analysis.

llvm-svn: 292039
2017-01-15 00:26:18 +00:00
Chandler Carruth 5edfd4d99e [PM] Fix instcombine's analysis preservation in the new pass manager to
cover domtree and alias analysis. These are the pretty clear analyses
that we would always want to survive this pass.

To make these survive, we also need to preserve the assumption cache.

Added a test that verifies the important bits of this preservation.

llvm-svn: 292037
2017-01-14 23:25:22 +00:00
Sanjay Patel ca3124f74b [InstCombine] clean up visitAshr(); NFCI
llvm-svn: 292036
2017-01-14 23:13:50 +00:00
Sanjay Patel 40f401776b [InstCombine] optimize unsigned icmp of increment
Allows LLVM to optimize sequences like the following:

%add = add nuw i32 %x, 1
%cmp = icmp ugt i32 %add, %y

Into:

%cmp = icmp uge i32 %x, %y

Previously, only signed comparisons were being handled.

Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to
'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization
seems like it might have far-reaching ramifications so I kept this simple for now.

Patch by Matti Niemenmaa!

Differential Revision: https://reviews.llvm.org/D24700

llvm-svn: 291975
2017-01-13 23:25:46 +00:00
Sanjay Patel 2d4b456427 [InstCombine] use m_APInt to allow lshr folds for vectors with splat constants
llvm-svn: 291972
2017-01-13 23:04:10 +00:00
Sanjay Patel acd24c7b6a [InstCombine] use 'match' and other clean-up; NFCI
llvm-svn: 291937
2017-01-13 18:52:10 +00:00
Sanjay Patel b22f6c5f26 [InstCombine] use m_APInt to allow shl folds for vectors with splat constants
llvm-svn: 291934
2017-01-13 18:39:09 +00:00
Sanjay Patel cf08203105 [InstCombine] use Op0/Op1 local variables more consistently with shifts; NFC
llvm-svn: 291923
2017-01-13 18:08:25 +00:00
Sanjay Patel 5178363687 [InstCombine] if the condition of a select may be known via assumes, eliminate the select
This is a limited solution for PR31512:
https://llvm.org/bugs/show_bug.cgi?id=31512

The motivation is that we will need to increase usage of llvm.assume and/or metadata to solve PR28430:
https://llvm.org/bugs/show_bug.cgi?id=28430

...and this kind of simplification is needed to take advantage of that extra information.

The 'not' test case would be handled by:
https://reviews.llvm.org/D28485

Differential Revision:
https://reviews.llvm.org/D28337

llvm-svn: 291915
2017-01-13 17:02:42 +00:00
Robert Lougher f5df7a18dd [DebugInfo] Add const to DILocation variable declaration; NFC.
llvm-svn: 291785
2017-01-12 18:29:28 +00:00
Hal Finkel 8a9a783f2c Make processing @llvm.assume more efficient - Add affected values to the assumption cache
Here's my second try at making @llvm.assume processing more efficient. My
previous attempt, which leveraged operand bundles, r289755, didn't end up
working: it did make assume processing more efficient but eliminating the
assumption cache made ephemeral value computation too expensive. This is a
more-targeted change. We'll keep the assumption cache, but extend it to keep a
map of affected values (i.e. values about which an assumption might provide
some information) to the corresponding assumption intrinsics. This allows
ValueTracking and LVI to find assumptions relevant to the value being queried
without scanning all assumptions in the function. The fact that ValueTracking
started doing O(number of assumptions in the function) work, for every
known-bits query, has become prohibitively expensive in some cases.

As discussed during the review, this is a pragmatic fix that, longer term, will
likely be replaced by a more-principled solution (perhaps based on an extended
SSA form).

Differential Revision: https://reviews.llvm.org/D28459

llvm-svn: 291671
2017-01-11 13:24:24 +00:00
Sanjay Patel db0938fd9a [InstCombine] add a wrapper for a common pair of transforms; NFCI
Some of the callers are artificially limiting this transform to integer types;
this should make it easier to incrementally remove that restriction.

llvm-svn: 291620
2017-01-10 23:49:07 +00:00
Matt Arsenault 3f509042b0 InstCombine: Set operands instead of creating new call
llvm-svn: 291612
2017-01-10 23:17:52 +00:00
Matt Arsenault fdb78f8bae InstCombine: fdiv -x, -y -> fdiv x, y
llvm-svn: 291611
2017-01-10 23:08:54 +00:00
Sanjay Patel 940c06188e fix comment typos; NFC
llvm-svn: 291447
2017-01-09 16:27:56 +00:00
Matt Arsenault 3bdd75d01e InstCombine: Fold cos(-x) -> cos(x)
Also cos(fabs(x)) -> cos(x)

llvm-svn: 291022
2017-01-04 22:49:03 +00:00
David Majnemer cb892e9066 [InstCombine] Move casts around shift operations
It is possible to perform a left shift before zero extending if the
shift would only shift out zeros.

llvm-svn: 290928
2017-01-04 02:21:34 +00:00
David Majnemer 022d2a563b [InstCombine] Combine adds across a zext
We can perform the following:
(add (zext (add nuw X, C1)), C2) -> (zext (add nuw X, C1+C2))

This is only possible if C2 is negative and C2 is greater than or equal to negative C1.

llvm-svn: 290927
2017-01-04 02:21:31 +00:00
Matt Arsenault 56ff4839ae InstCombine: Fold fabs on select of constants
llvm-svn: 290913
2017-01-03 22:40:34 +00:00
Sanjay Patel f0d1e77373 [InstCombine] use 'match' to reduce code bloat; NFCI
I wrote this patch before seeing the comment in:
https://reviews.llvm.org/D27114
...that suggests we should actually be canonicalizing the other way.

So just in case we decide this is the right way, we might as well
have a cleaner implementation.

llvm-svn: 290912
2017-01-03 22:25:31 +00:00
Matt Arsenault b264c94963 InstCombine: Add fma with constant transforms
DAGCombine already does these.

llvm-svn: 290860
2017-01-03 04:32:35 +00:00
Matt Arsenault 1cc294c85d InstCombine: Add fma + fabs/fneg transforms
fma (fneg x), (fneg y), z -> fma x, y, z
fma (fabs x), (fabs x), z -> fma x, x, z

llvm-svn: 290859
2017-01-03 04:32:31 +00:00
Sanjay Patel b38ad88e9f [InstCombine] use combineMetadataForCSE instead of copying it; NFCI
llvm-svn: 290844
2017-01-02 23:25:28 +00:00
Craig Topper d00db69227 [InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input.
This was already done for the SSE/SSE2 version of the intrinsics.

llvm-svn: 290776
2016-12-31 00:45:06 +00:00
Craig Topper 991636312b [InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones.
This saves InstCombine the burden of having to optimize the select later.

llvm-svn: 290774
2016-12-30 23:06:28 +00:00
David Majnemer 5ec5f278c9 [InstCombine] Address post-commit feedback
llvm-svn: 290741
2016-12-30 03:36:17 +00:00
David Majnemer a1cfd7c5f8 [InstCombine] More thoroughly canonicalize the position of zexts
We correctly canonicalized (add (sext x), (sext y)) to (sext (add x, y))
where possible.  However, we didn't perform the same canonicalization
for zexts or for muls.

llvm-svn: 290733
2016-12-30 00:28:58 +00:00
Craig Topper 17b5568bc7 [InstCombine] Use getVectorNumElements instead of explicitly casting to VectorType and calling getNumElements. NFC
llvm-svn: 290707
2016-12-29 07:03:18 +00:00
Craig Topper 62f06e241b [InstCombine] Fix typo in comment. NFC
llvm-svn: 290706
2016-12-29 05:38:31 +00:00
Craig Topper 2e18bcfc60 [InstCombine] Use a 32-bits instead of 64-bits for storing the number of elements in VectorType for a ShuffleVector. While there getVectorNumElements to avoid an explicit cast. NFC
llvm-svn: 290705
2016-12-29 04:24:32 +00:00
Craig Topper 1a8a3377cc [InstCombine][X86] If the lowest element of a scalar intrinsic isn't used make sure we add it to the worklist so we can DCE it sooner.
We bypassed the intrinsic and returned the passthru operand, but we should also add the intrinsic to the worklist since its now dead. This can allow DCE to find it sooner and remove it. Similar was done for InsertElement when the inserted element isn't demanded.

llvm-svn: 290704
2016-12-29 03:30:17 +00:00
Craig Topper 28ec3460e4 [InstCombine] Remove a piece of a comment that said that InstCombiner contains pass infrastructure. That hasn't been true since r226618. NFC
llvm-svn: 290648
2016-12-28 03:12:42 +00:00
Michael Kuperstein cd7ad7130f [InstCombine] Canonicalize insert splat sequences into an insert + shuffle
This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.

This fixes PR31286.

Differential Revision: https://reviews.llvm.org/D27992

llvm-svn: 290641
2016-12-28 00:18:08 +00:00
Craig Topper 72f2d4e8d6 [InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

This builds on r290554 which added supported for 128 and 256-bit.

llvm-svn: 290582
2016-12-27 05:30:09 +00:00
Craig Topper 7f8540b5e7 [AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.

llvm-svn: 290566
2016-12-27 01:56:30 +00:00
Craig Topper 020b228155 [AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
llvm-svn: 290559
2016-12-27 00:23:16 +00:00
Simon Pilgrim c9cf7fc7a4 [InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.

Differential Revision: https://reviews.llvm.org/D28119

llvm-svn: 290554
2016-12-26 23:28:17 +00:00
Craig Topper 7b788ada2d [AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION.
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.

I'll do the same thing for packed add/sub/mul/div in a future patch.

Reviewers: delena, RKSimon, zvi, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27879

llvm-svn: 290535
2016-12-26 06:33:19 +00:00
Craig Topper e328045711 [AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.

We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.

Reviewers: zvi, delena, spatel, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27825

llvm-svn: 290530
2016-12-25 23:58:57 +00:00
David Majnemer b0761a0c1b Revert "[InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp"
This reverts commit r289813, it caused PR31449.

llvm-svn: 290266
2016-12-21 19:21:59 +00:00
George Burgess IV 3f08914e7e [Analysis] Centralize objectsize lowering logic.
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.

llvm-svn: 290214
2016-12-20 23:46:36 +00:00
Sanjay Patel 5a443ac000 [InstCombine] use commutative matcher for pattern with commutative operators
This is a case that was missed in:
https://reviews.llvm.org/rL290067
...and it would regress if we fix operand complexity (PR28296).

llvm-svn: 290127
2016-12-19 18:35:37 +00:00
Sanjay Patel dd46b52942 [InstCombine] add folds for icmp (umin|umax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (https://reviews.llvm.org/D27531)
https://reviews.llvm.org/rL290111

llvm-svn: 290118
2016-12-19 17:32:37 +00:00
Sanjay Patel 8296c6c96f [InstCombine] add folds for icmp (smax X, Y), X
This is a follow-up to:
https://reviews.llvm.org/rL289855 (D27531)

llvm-svn: 290111
2016-12-19 16:28:53 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Sanjay Patel 2b9d4b4daf [InstCombine] use commutative matchers for patterns with commutative operators
Background/motivation - I was circling back around to:
https://llvm.org/bugs/show_bug.cgi?id=28296

I made a simple patch for that and noticed some regressions, so added test cases for
those with rL281055, and this is hopefully the minimal fix for just those cases.

But as you can see from the surrounding untouched folds, we are missing commuted patterns
all over the place, and of course there are no regression tests to cover any of those cases.

We could sprinkle "m_c_" dust all over this file and catch most of the missing folds, but 
then we still wouldn't have test coverage, and we'd still miss some fraction of commuted 
patterns because they require adjustments to the match order.

I'm aware of the concern about the potential compile-time performance impact of adding 
matches like this (currently being discussed on llvm-dev), but I don't think there's any
evidence yet to suggest that handling commutative pattern matching more thoroughly is not
a worthwhile goal of InstCombine.

Differential Revision: https://reviews.llvm.org/D24419

llvm-svn: 290067
2016-12-18 18:49:48 +00:00
Craig Topper e32b5fd7f9 [InstCombine] Simplify code slightly. NFC
llvm-svn: 290046
2016-12-17 18:10:04 +00:00
Sanjay Patel d640641a61 [InstCombine] add folds for icmp (smin X, Y), X
Min/max canonicalization (r287585) exposes the fact that we're missing combines for min/max patterns. 
This patch won't solve the example that was attached to that thread, so something else still needs fixing.

The line between InstCombine and InstSimplify gets blurry here because sometimes the icmp instruction that
we want to fold to already exists, but sometimes it's the swapped form of what we want.

Corresponding changes for smax/umin/umax to follow.

Differential Revision: https://reviews.llvm.org/D27531

llvm-svn: 289855
2016-12-15 19:13:37 +00:00
Ehsan Amiri 795b0671c5 [InstCombine] New opportunities for FoldAndOfICmp and FoldXorOfICmp
A number of new patterns for simplifying and/xor of icmp:

(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.

(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
   %s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.

llvm-svn: 289813
2016-12-15 12:25:13 +00:00
Craig Topper ab5f355d8c [AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts.
llvm-svn: 289759
2016-12-15 03:49:45 +00:00
Hal Finkel 3ca4a6bcf1 Remove the AssumptionCache
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...

llvm-svn: 289756
2016-12-15 03:02:15 +00:00
Hal Finkel cb9f78e1c3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
Robert Lougher cfd7198698 [InstCombine] Folding of a compare with RHS const should merge debug locations
If all the operands to a phi node are compares that have a RHS constant,
instcombine will try to pull them through the phi node, combining them into
a single operation. When it does this, the debug location of the new op
should be the merged debug locations of the phi node arguments.

Patch 8 of 8 for D26256.  Folding of a compare that has a RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289704
2016-12-14 20:27:22 +00:00
Robert Lougher c9f7354776 [InstCombine] Folding of a binop with RHS const should merge the debug locations
If all the operands to a phi node are a binop with a RHS constant, instcombine
will try to pull them through the phi node, combining them into a single
operation. When it does this, the debug location of the new op should be the
merged debug locations of the phi node arguments.

Patch 7 of 8 for D26256.  Folding of a binop with RHS constant.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289699
2016-12-14 20:07:49 +00:00
Robert Lougher f02d9b8325 [InstCombine] When folding casts through a phi node merge the debug locations
If all the operands to a phi node are a cast, instcombine will try to pull
them through the phi node, combining them into a single cast. When it does
this, the debug location of the new cast should be the merged debug locations
of the phi node arguments.

Patch 6 of 8 for D26256.  Folding of a cast operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289693
2016-12-14 19:24:01 +00:00
Robert Lougher 373e36a410 [InstCombine] Folding loads through a phi node should merge the debug locations
If all the operands to a phi node are a load, instcombine will try to pull
them through the phi node, combining them into a single load. When it does
this, the debug location of the new load should be the merged debug locations
of the phi node arguments.

Patch 5 of 8 for D26256.  Folding of a load operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289688
2016-12-14 19:02:14 +00:00
Robert Lougher 8fc1e89bbb [InstCombine] When folding GEP through a phi node merge the debug locations
If all the operands to a phi node are getelementptr, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the new getelementptr
should be the merged debug locations of the phi node arguments.

Patch 4 of 8 for D26256.  Folding of a getelementptr operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289684
2016-12-14 18:37:50 +00:00
Robert Lougher 4b0790d488 [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 3 of 8 for D26256.  Folding of a compare operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289681
2016-12-14 18:14:57 +00:00
Robert Lougher 2428a4050f [InstCombine] Merge debug locations when folding through a phi node
If all the operands to a phi node are of the same operation, instcombine
will try to pull them through the phi node, combining them into a single
operation.  When it does this, the debug location of the operation should
be the merged debug locations of the phi node arguments.

Patch 2 of 8 for D26256.  Folding of a binary operation.

Differential Revision: https://reviews.llvm.org/D26256

llvm-svn: 289679
2016-12-14 17:49:19 +00:00
Stephan Bergmann 17c7f70362 Replace APFloatBase static fltSemantics data members with getter functions
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.

Differential Revision: https://reviews.llvm.org/D26671

llvm-svn: 289647
2016-12-14 11:57:17 +00:00
Craig Topper aeaa52cc11 [X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics.
llvm-svn: 289639
2016-12-14 07:46:12 +00:00
Craig Topper 268b3abe6d [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar add/sub/mul/div/max/min intrinsics better.
Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking.

llvm-svn: 289636
2016-12-14 06:06:58 +00:00
Craig Topper dfd268d76b [X86][InstCombine] Handle scalar fmadd intrinsics correctly in SimplifyDemandedVectorElts.
Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly.

llvm-svn: 289632
2016-12-14 05:43:05 +00:00
Craig Topper eb6a20e79e [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0.

Also calculate UndefElts correctly.

Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.

llvm-svn: 289629
2016-12-14 03:17:30 +00:00
Craig Topper a0372dec26 [X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar min/max/cmp intrinsics more correctly.
Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused.

Also calculate UndefElts correctly.

Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support.

llvm-svn: 289628
2016-12-14 03:17:27 +00:00
Craig Topper ac75bca1eb [X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly.
Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed.

Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support.

llvm-svn: 289523
2016-12-13 07:45:45 +00:00
Sanjay Patel e730ce87a5 [InstCombine] fix bug when offsetting case values of a switch (PR31260)
We could truncate the condition and then try to fold the add into the
original condition value causing wrong case constants to be used.

Move the offset transform ahead of the truncate transform and return
after each transform, so there's no chance of getting confused values.

Fix for:
https://llvm.org/bugs/show_bug.cgi?id=31260

llvm-svn: 289442
2016-12-12 16:13:52 +00:00
Sanjay Patel 87e2f677d7 [InstCombine] clean up range-for-loops in visitSwitchInst(); NFCI
llvm-svn: 289439
2016-12-12 15:52:56 +00:00
Craig Topper 7fc6d34ed1 [InstCombine][XOP] The instructions for the scalar frcz intrinsics are defined to put 0 in the upper bits, not pass bits through like other intrinsics. So we should return a zero vector instead.
llvm-svn: 289411
2016-12-11 22:32:38 +00:00
Craig Topper 23ebd9564f [X86][InstCombine] Add support for scalar FMA intrinsics to SimplifyDemandedVectorElts.
This teaches SimplifyDemandedElts that the FMA can be removed if the lower element isn't used. It also teaches it that if upper elements of the first operand aren't used then we can simplify them.

llvm-svn: 289377
2016-12-11 08:54:52 +00:00
Craig Topper 61b280e7b0 [X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for scalar FMA intrinsics.
These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them.

llvm-svn: 289372
2016-12-11 07:42:06 +00:00
Craig Topper d96395365a [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for scalar cmp intrinsics with masking and rounding.
These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead.

llvm-svn: 289371
2016-12-11 07:42:04 +00:00
Craig Topper 790d0fa569 [AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding.
These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well.

llvm-svn: 289370
2016-12-11 07:42:01 +00:00
Craig Topper 58917f3508 [AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit.
llvm-svn: 289354
2016-12-11 01:59:36 +00:00
Craig Topper 9a63d7ade5 [X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant.
llvm-svn: 289348
2016-12-11 00:23:50 +00:00
Sanjay Patel 4c48bbe94d [InstCombine] add helper for shift-by-shift folds; NFCI
These are currently limited to integer types, but we should
be able to extend to splat vectors and possibly general vectors.

llvm-svn: 289343
2016-12-10 22:16:29 +00:00
Sanjay Patel b7f8cb698c [InstCombine] change select type to eliminate bitcasts
This solves a secondary problem seen in PR6137:
https://llvm.org/bugs/show_bug.cgi?id=6137#c6

This is similar to the bitwise logic op fold added with:
https://reviews.llvm.org/rL287707

And like that patch, I'm artificially restricting the
transform from vector <-> scalar types until we're sure
that the backend can handle that. 

llvm-svn: 288584
2016-12-03 15:25:16 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00