Commit Graph

10250 Commits

Author SHA1 Message Date
Bjorn Pettersson d065c81164 [CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.

Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
  APInt::getLowBitsSet(VTSize, Scale - 1)

This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.

Reviewers: RKSimon, bevinh, leonardchan, spatel

Reviewed By: RKSimon

Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67071

llvm-svn: 371309
2019-09-07 12:16:23 +00:00
Bjorn Pettersson 5e331e4ce8 [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Philip Reames 3a49ca331f Update CodeGen to use hasMetadata as appropriate [NFC]
My intial grepping for rL370933 missed a directory worth of cases.

llvm-svn: 370942
2019-09-04 17:46:55 +00:00
Bjorn Pettersson b0eb394417 [CodeGen] Use FSHR in DAGTypeLegalizer::ExpandIntRes_MULFIX
Summary:
Simplify the right shift of the intermediate result (given
in four parts) by using funnel shift.

There are some impact on lit tests, but that seems to be
related to register allocation differences due to how FSHR
is expanded on X86 (giving a slightly different operand order
for the OR operations compared to the old code).

Reviewers: leonardchan, RKSimon, spatel, lebedev.ri

Reviewed By: RKSimon

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67036

llvm-svn: 370813
2019-09-03 19:35:07 +00:00
Craig Topper 9c74c77404 [LegalizeDAG] Pass DAG to two calls to SDNode::dump in debug prints so that they will print target specific nodes correctly.
The dump methods can only print target node names correctly if
they can get access to the TLI object.

llvm-svn: 370694
2019-09-03 02:51:14 +00:00
Sanjay Patel 4e54cf3e0e [DAGCombiner] try to form test+set out of shift+mask patterns
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697

As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.

This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.

Differential Revision: https://reviews.llvm.org/D66687

llvm-svn: 370668
2019-09-02 14:52:09 +00:00
Sanjay Patel c882208367 [DAGCombiner] improve throughput of shift+logic+shift
The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Differential Revision: https://reviews.llvm.org/D67021

llvm-svn: 370617
2019-09-01 18:38:15 +00:00
Shiva Chen adfdcb9c26 [TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs
Summary:
  This fixes the bugzilla id 43183 which triggerd by the following commit:
  [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall

llvm-svn: 370604
2019-09-01 04:52:54 +00:00
Sanjay Patel 9e57b49392 [DAGCombiner] clean up code in visitShiftByConstant()
This is not quite NFC because the SDLoc propagation is changed,
but there are no regression test diffs from that.

llvm-svn: 370587
2019-08-31 15:08:58 +00:00
Amaury Sechet 82825ab882 [DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.
Summary: The combiner transforms (shl X, 1) into (add X, X).

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66882

llvm-svn: 370578
2019-08-31 11:40:02 +00:00
James Molloy e62c509cd4 [DAGCombiner] Don't create illegal narrow stores
Narrowing stores when the target doesn't support the narrow version
forces the target to expand into a load-modify-store sequence, which
is highly suboptimal. The information narrowing throws away (legality
of the inverse transform) is hard to re-analyze. If the target doesn't
support a store of the narrow type, don't narrow even in pre-legalize
mode.

No test as this is DAGCombiner and depends on target bits.

llvm-svn: 370576
2019-08-31 10:46:16 +00:00
Bjorn Pettersson e27c74abb6 [CodeGen] Refactor DAGTypeLegalizer::ExpandIntRes_MULFIX. NFC
Restructured the code a little bit in preparation for adding
UMULFIXSAT. I think it will be easier to understand the code
if not interleaving the codegen for signed/unsigned/saturated
cases that much.

llvm-svn: 370569
2019-08-31 09:28:50 +00:00
Simon Pilgrim 3be7081aa1 [DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI.
SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent.

llvm-svn: 370498
2019-08-30 18:19:02 +00:00
Simon Pilgrim 2d1e0899e9 [TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node.
Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.

Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests

llvm-svn: 370497
2019-08-30 17:58:55 +00:00
Simon Pilgrim ab8cb1a3c5 [DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI.
llvm-svn: 370489
2019-08-30 17:21:20 +00:00
Simon Pilgrim c2fed1dc8a [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI.
llvm-svn: 370478
2019-08-30 15:17:37 +00:00
Simon Pilgrim 3367669668 [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts.
llvm-svn: 370471
2019-08-30 13:30:37 +00:00
Simon Pilgrim 8e1989e79a [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types.
This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.

llvm-svn: 370468
2019-08-30 12:22:06 +00:00
Simon Pilgrim 7cbf823f93 [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros
Return a proper zero vector, just in case some elements are undef.

Noticed by inspection after dealing with a similar issue in PR43159.

llvm-svn: 370460
2019-08-30 10:42:14 +00:00
Dan Gohman 8cfeeaf9de [CodeGen] Fix lowering for returning the result of an extractvalue
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.

This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.

Differential Revision: https://reviews.llvm.org/D66978

llvm-svn: 370430
2019-08-30 04:33:22 +00:00
Simon Pilgrim ea67741899 [DAGCombine] Fix shadow variable warnings. NFCI.
llvm-svn: 370365
2019-08-29 14:34:07 +00:00
Simon Pilgrim 6c2fc64edc Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 370333
2019-08-29 11:18:53 +00:00
Simon Pilgrim 27f43e6b1a Fix shadow variable warning. NFCI.
llvm-svn: 370332
2019-08-29 11:16:32 +00:00
Amaury Sechet 8365e42010 [DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y)
Summary: This is beneficial when the shuffle is only used once and end up being generated in a few places when some node is combined into a shuffle.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66718

llvm-svn: 370326
2019-08-29 10:35:51 +00:00
Simon Pilgrim dfb2a19ac2 LegalizeSetCCCondCode - Reduce scope of NeedSwap to fix cppcheck warning. NFCI.
No need for this to be defined outside the only switch case its used in.

llvm-svn: 370320
2019-08-29 10:11:34 +00:00
Craig Topper 1aadf6f39f [X86] Make inline assembly 'x' and 'v' constraints work for f128.
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.

Fixes PR43157

llvm-svn: 370293
2019-08-29 05:13:56 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixed the issue that RV64 didn't clear the upper bits
when return complex floating value with lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Kevin P. Neal ddf13c00ed [FPEnv] Add fptosi and fptoui constrained intrinsics.
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.

Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.

Reviewed by:	Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by:	Craig Topper
Differential Revision:	http://reviews.llvm.org/D63782

llvm-svn: 370228
2019-08-28 16:33:36 +00:00
Simon Pilgrim 14e07d7f4b [DAGCombine] Fix cppcheck shadow variable warning. NFCI.
We already have an outer Ops variable.

llvm-svn: 370197
2019-08-28 12:48:41 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
Simon Pilgrim c5b38e2869 [DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI.
These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors.

llvm-svn: 370189
2019-08-28 11:50:36 +00:00
Matt Arsenault 2910184936 DAG: computeNumSignBits for MUL
Copied directly from the IR version.

Most of the testcases I've added for this are somewhat problematic
because they really end up testing the yet to be implemented version
for MUL_I24/MUL_U24.

llvm-svn: 370099
2019-08-27 19:05:33 +00:00
Sanjay Patel b516f1afdd [DAGCombiner] cancel fnegs from multiplied operands of FMA
(-X) * (-Y) + Z --> X * Y + Z

This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.

We do handle the simpler case already:
(-X) * (-Y) --> X * Y

And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.

Differential Revision: https://reviews.llvm.org/D66755

llvm-svn: 370071
2019-08-27 15:17:46 +00:00
Amaury Sechet f28dee2cff [DAGCombiner] Add node to the worklist in topological order in parallelizeChainedStores
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66659

llvm-svn: 370056
2019-08-27 13:27:57 +00:00
Amaury Sechet a1e5ef3fd4 [DAGCombiner] Add node to the worklist in topological order after relegalization.
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66702

llvm-svn: 370040
2019-08-27 11:06:09 +00:00
Craig Topper 243ede9970 [SelectionDAGBuilder] Hide existence of ConstantDataVector vector from visitGetElementPtr.
ConstantDataVector is a specialized verison of ConstantVector
that stores data in a packed array of bits instead of as
individual pointers to other Constants. But we really shouldn't
expose that if we can void it. And we should handle regular
ConstantVector equally well.

This removes a dyn_cast to ConstantDataVector and just calls
getSplatValue directly on a Constant* if the type is a vector.

llvm-svn: 370018
2019-08-27 06:39:50 +00:00
Craig Topper 4a3f62f9fd [SelectionDAGBuilder] Fix typo in comment. NFC
llvm-svn: 370017
2019-08-27 06:38:51 +00:00
Richard Trieu 58e67b8aa3 Revert r369927 - [DAGCombiner] Remove a bunch of redundant AddToWorklist calls.
This change causes instrumented builds of Clang to have a fatal error in the
backend.  https://reviews.llvm.org/D66537 has the details.

llvm-svn: 370006
2019-08-27 02:04:11 +00:00
Craig Topper 846429de74 [DAGCombiner][X86] Teach SimplifyVBinOp to fold VBinOp (concat X, undef/constant), (concat Y, undef/constant) -> concat (VBinOp X, Y), VecC
This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it.

Differential Revision: https://reviews.llvm.org/D66680

llvm-svn: 369937
2019-08-26 17:59:11 +00:00
Amaury Sechet b7075e40f3 [DAGCombiner] Remove a bunch of redundant AddToWorklist calls.
Summary:
This comes as a first step toward processing the DAG nodes in topological orders. Doing so ensure that arguments of a node are combined before the node itself is combined, which exposes ore opportunities for optimization and/or reduce the amount of patterns a node has to match for.

DAGCombiner adding nodes to the worklist is various places causes the nodes to be in a different order from what is expected. In addition, this is reduant because these nodes end up being added to the worklist anyways due to the machinery at line 1621.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66537

llvm-svn: 369927
2019-08-26 17:02:12 +00:00
Craig Topper b8b90ac1c5 [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef
Summary:
Concat_vectors is more canonical during early DAG combine. For example, its what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner than can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.

I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66504

llvm-svn: 369872
2019-08-25 17:59:49 +00:00
Nikita Popov aa71c977ba [SDAG] Fold umul_lohi with 0 or 1 multiplicand
These can turn up during multiplication legalization. In principle
these should also apply to smul_lohi, but I wasn't able to figure
out how to produce those with the necessary operands.

Differential Revision: https://reviews.llvm.org/D66380

llvm-svn: 369864
2019-08-25 08:04:22 +00:00
Benjamin Kramer dc5f805d31 Do a sweep of symbol internalization. NFC.
llvm-svn: 369803
2019-08-23 19:59:23 +00:00
Craig Topper e7211bb567 [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression
Patch showing the effect of enabling bool vector oversimplification.

Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:

insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i

Preventing the removal of the AND to clear the upper bits of result

Differential Revision: https://reviews.llvm.org/D53022

llvm-svn: 369780
2019-08-23 17:14:58 +00:00
Simon Pilgrim 04906ef1f2 [DAGCombine] GetNegatedExpression - add FMA\FMAD support
If the accumulator and either of the multiply operands are negatable then we can we negate the entire expression.

Differential Revision: https://reviews.llvm.org/D63141

llvm-svn: 369746
2019-08-23 10:49:46 +00:00
Amaury Sechet 95cf66de7c [DAGCombiner] Remove explicit call to AddToWorklist in sqrt and reciprocal computations
Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel

Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66548

llvm-svn: 369662
2019-08-22 15:35:45 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contain argument flags which will pass to makeLibCall function.
The patch should not has any functionality changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Amaury Sechet c0f190a048 [DAGCombiner] Remove mostly redundant calls to AddToWorklist
Summary:
These calls change the order in which some nodes are processed and so have an effect on codegen.

The change in fixup-bw-copy.ll is due to (and (load anyext)) gets transformed into (load zext) while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66543

llvm-svn: 369561
2019-08-21 18:51:08 +00:00
Amaury Sechet 045f33aec9 [DAGCombiner] Various nits. NFC
llvm-svn: 369520
2019-08-21 12:01:37 +00:00
Craig Topper ba375263e8 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Roman Lebedev edfaee0811 [TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling
Summary:
The general fold is only valid for positive divisors.
Which effectively means, it is invalid for `INT_MIN` divisors,
and we currently bailout if we see them.

But that is too strict, we can just fix-up the results.
For that, let's do a second computation 'in parallel':
```
Name: srem -> and
Pre: isPowerOf2(C)
%o = srem i8 %X, C
%r = icmp eq %o, 0
  =>
%n = and i8 %X, C-1
%r = icmp eq %n, 0
```
https://rise4fun.com/Alive/Sup

And then just blend results: if the divisor was `INT_MIN`,
pick the value we got via bit-test,
else pick the value from general fold.

There's interesting observation - `ISD::ROTR` is set to
`LegalizeAction::Expand` before AVX512, so we should not
treat `INT_MIN` divisor as even; and as it can be seen
while `@test_srem_odd_even_one` improves on all run-lines,
`@test_srem_odd_even_INT_MIN` only improves for AVX512.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66300

llvm-svn: 369268
2019-08-19 15:01:42 +00:00
Craig Topper f43106e341 [SelectionDAG] Add a node creation debug message to getMachineNode.
llvm-svn: 369204
2019-08-18 06:28:00 +00:00
Bjorn Pettersson 9dddd26e31 [DAGCombiner] Add simple folds for SMULFIX/UMULFIX/SMULFIXSAT
Summary:
Add the following DAGCombiner folds for mulfix being
one of SMULFIX/UMULFIX/SMULFIXSAT:
  (mulfix x, undef, scale) -> 0
  (mulfix x, 0, scale) -> 0

Also added canonicalization of constants to RHS.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66052

llvm-svn: 369103
2019-08-16 13:16:48 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Simon Pilgrim d4df81f463 Remove SmallBitVector.h include. NFCI.
SmallBitVector/BitVector types aren't used at all in the cpp file.

llvm-svn: 369008
2019-08-15 14:40:37 +00:00
Simon Pilgrim 983e9118a2 Remove BitVector.h include. NFCI.
BitVector type isn't used at all in the cpp file.

llvm-svn: 369007
2019-08-15 14:39:28 +00:00
Simon Pilgrim ed804dad1e [DAGCombine] MergeConsecutiveStores - fix cppcheck/MSVC extension warning. NFCI.
Set the StartIdx type to size_t so that it matches the StoreNodes SmallVector size() and index types.

Silences the MSVC analyzer warning that unsigned increment might overflow before exceeding size_t on 64-bit targets - this isn't likely to happen but it means we use consistent types and reduces the warning "noise" a little.

llvm-svn: 368998
2019-08-15 13:07:14 +00:00
Sanjay Patel 57d459309d [SDAG][x86] check for relaxed math when matching an FP reduction
If the last step in an FP add reduction allows reassociation and doesn't care
about -0.0, then we are free to recognize that computation as a reduction
that may reorder the intermediate steps.

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than
the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Differential Revision: https://reviews.llvm.org/D66236

llvm-svn: 368995
2019-08-15 12:43:15 +00:00
Florian Hahn de1d6c8220 Add ptrmask intrinsic
This patch adds a ptrmask intrinsic which allows masking out bits of a
pointer that must be zero when accessing it, because of ABI alignment
requirements or a restriction of the meaningful bits of a pointer
through the data layout.

This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
pointers) and allows us to not lose information about the underlying
object.

Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune

Reviewed by: sanjoy, jdoerfert

Differential Revision: https://reviews.llvm.org/D59065

llvm-svn: 368986
2019-08-15 10:12:26 +00:00
Craig Topper e7ea06b7d2 [SelectionDAGBuilder] Teach gather/scatter getUniformBase to look through vector zeroinitializer indices in addition to scalar zeroes.
llvm-svn: 368926
2019-08-14 21:38:56 +00:00
Sanjay Patel ecccf29e6c [SDAG] move variable closer to use; NFC
llvm-svn: 368905
2019-08-14 19:46:15 +00:00
Roman Lebedev 676594305a [CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case)
Summary:
This implements an optimization described in Hacker's Delight 10-17:
when `C` is constant, the result of `X % C == 0` can be computed
more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

One huge caveat: this signed case is only valid for positive divisors.

While we can freely negate negative divisors, we can't negate `INT_MIN`,
so for now if `INT_MIN` is encountered, we bailout.
As a follow-up, it should be possible to handle that more gracefully
via extra `and`+`setcc`+`select`.

This passes llvm's test-suite, and from cursory(!) cross-examination
the folds (the assembly) match those of GCC, and manual checking via alive
did not reveal any issues (other than the `INT_MIN` case)

Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65366

llvm-svn: 368702
2019-08-13 14:57:37 +00:00
Roman Lebedev f4de7eda4a [TargetLowering][NFC] prepareUREMEqFold(): fixup comment
The comment initially matched the code, but the code was incorrect
and was fixed after the initial revert back back when it was introduced,
but the comment was never updated.

llvm-svn: 368701
2019-08-13 14:57:08 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
Simon Pilgrim 05e8209e33 [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE
llvm-svn: 368553
2019-08-12 10:56:05 +00:00
Bjorn Pettersson 27038a3780 [SelectionDAG] Widen vector results of SMULFIX/UMULFIX/SMULFIXSAT
Summary:
After the commits that changed x86 backend to widen vectors
instead of using promotion some of our downstream tests
started to fail. It was noticed that WidenVectorResult has
been missing support for SMULFIX/UMULFIX/SMULFIXSAT. This
patch adds the missing functionality.

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66051

llvm-svn: 368540
2019-08-11 19:27:06 +00:00
Sanjay Patel 26b2c11451 [DAGCombiner] exclude x*2.0 from normal negation profitability rules
This is the codegen part of fixing:
https://bugs.llvm.org/show_bug.cgi?id=32939

Even with the optimal/canonical IR that is ideally created by D65954,
we would reverse that transform in DAGCombiner and end up with the same
asm on AArch64 or x86.

I see 2 options for trying to correct this:

  1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch).
  2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case
     transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul()
     that matches the transform done by DAGCombiner.

This seems like the less intrusive patch, but if there's some other reason to
prefer 1 option over the other, we can change to the other option.

Differential Revision: https://reviews.llvm.org/D66016

llvm-svn: 368490
2019-08-09 21:37:32 +00:00
Sanjay Patel 0b4ae34c2f [DAGCombiner] remove redundant fold for X*1.0; NFC
This is handled at node creation time (similar to X/1.0)
after:
rL357029
(no fast-math-flags needed)

llvm-svn: 368443
2019-08-09 14:30:59 +00:00
Craig Topper 9158e54270 [SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer.
We may be able to look to how VSELECT is handled to further
improve this, but this appears to be neutral or an improvement
on the test cases we have.

llvm-svn: 368344
2019-08-08 21:14:08 +00:00
Craig Topper bce4d79f37 [LegalizeTypes] Remove SplitVSETCC helper and just call SplitVecRes_SETCC.
llvm-svn: 368343
2019-08-08 21:13:58 +00:00
Simon Pilgrim e2e366797e [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368276
2019-08-08 10:37:03 +00:00
Amy Huang 0b870b969f Recommit "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
with a fix to clear the SDNode map when SelectionDAG is cleared.

llvm-svn: 368230
2019-08-07 22:49:40 +00:00
Simon Pilgrim 0eafe011ca [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE
In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway

llvm-svn: 368155
2019-08-07 11:43:13 +00:00
Aditya Nandakumar c8ac029d0a [GISel]: Add GISelKnownBits analysis
https://reviews.llvm.org/D65698

This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries(within a pass and across passes) in the future.
This patch only adds the basic pass boiler plate, and implements a lazy
non caching knownbits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.

llvm-svn: 368065
2019-08-06 17:18:29 +00:00
Simon Pilgrim dae5ddad9d [TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops
If we demand no bits/elts from an Op, just return UNDEF

llvm-svn: 368043
2019-08-06 14:30:42 +00:00
Ulrich Weigand 7b24dd741c [Strict FP] Allow custom operation actions
This patch changes the DAG legalizer to respect the operation actions
set by the target for strict floating-point operations. (Currently, the
legalizer will usually fall back to mutate to the non-strict action
(which is assumed to be legal), and only skip mutation if the strict
operation is marked legal.)

With this patch, if whenever a strict operation is marked as Legal or
Custom, it is passed to the target as usual. Only if it is marked as
Expand will the legalizer attempt to mutate to the non-strict operation.
Note that this will now fail if the non-strict operation is itself
marked as Custom -- the target will have to provide a Custom definition
for the strict operation then as well.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D65226

llvm-svn: 368012
2019-08-06 10:43:13 +00:00
Cullen Rhodes ced419f4d7 [SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER
Summary:
Before this patch MGATHER/MSCATTER is capable of representing all
common addressing modes, but only when illegal types are used.
This patch adds an IndexType property so more representations
are available when using legal types only.

Original modes:
 vector of bases
 base + vector of signed scaled offsets

New modes:
 base + vector of signed unscaled offsets
 base + vector of unsigned scaled offsets
 base + vector of unsigned unscaled offsets

The current behaviour of addressing modes for gather/scatter remains
unchanged.

Patch by Paul Walker.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D65636

llvm-svn: 368008
2019-08-06 09:46:13 +00:00
Matt Arsenault f4d3113a5f CodeGen: Migration to using Register
llvm-svn: 367974
2019-08-06 03:59:31 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Sanjay Patel eaf13044bd [DAGCombiner][x86] prevent infinite loop from truncate/extend transforms
The test case is based on the example from the post-commit thread for:
https://reviews.llvm.org/rGc9171bd0a955

This replaces the x86-specific simple-type check from:
rL367766
with a check in the DAGCombiner. Adding the check isn't
strictly necessary after the fix from:
rL367768
...but it seems likely that we're heading for trouble if
we are creating weird types in this transform.

I combined the earlier legality check into the initial
clause to simplify the code.

So we should only try the trunc/sext transform at the
earliest combine stage, but we limit the transform to
simple types anyway because the TLI hook is probably
too lax about what it considers a free truncate.

llvm-svn: 367834
2019-08-05 11:27:07 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Guillaume Chatelet 65e4b47aad [LLVM][Alignment] Introduce Alignment Type in DataLayout
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65521

Make getFunctionPtrAlign() return MaybeAlign

llvm-svn: 367817
2019-08-05 09:00:43 +00:00
Craig Topper 5a4989e2ac [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users.
Summary:
The SimplifyDemandedVectorElts function can replace with undef
when no elements are demanded, but due to how it interacts with
TargetLoweringOpts, it can only do this when the node has
no other users.

Remove a now unneeded DAG combine from the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65713

llvm-svn: 367788
2019-08-04 17:30:41 +00:00
Craig Topper 76f0f2e0f0 [SelectionDAG] Add node creation debug message to getMemIntrinsicNode.
llvm-svn: 367771
2019-08-04 02:32:06 +00:00
Craig Topper 2edeb8a11a [DAGCombiner] Prevent the combine added in r367710 from creating illegal types after type legalization.
This is further fix for PR42880.

Sanjay already disabled the X86 TLI hook for non-simple types,
but we should really call isTypeLegal here if we're after type
legalization.

llvm-svn: 367768
2019-08-03 23:09:13 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Simon Pilgrim 794f7591ec [TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple.
Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple.

llvm-svn: 367721
2019-08-02 21:07:07 +00:00
Philip Reames 511be2a158 [Statepoints] Fix overalignment of loads in no-realign-stack functions
This really should have been part of 366765.  For some reason, I forgot to handle the corresponding load side, and the readable test cases (using deopt vs statepoints) turned out to be overly reduced.  Oops.

As seen in the test change, the problem was that we were using a load with alignment expectations rather than the unaligned variant when the stack alignment was less than that prefered type alignment.

llvm-svn: 367718
2019-08-02 20:17:37 +00:00
Sanjay Patel 68264558f9 [DAGCombiner] try to convert opposing shifts to casts
This reverses a questionable IR canonicalization when a truncate
is free:

sra (add (shl X, N1C), AddC), N1C -->
sext (add (trunc X to (width - N1C)), AddC')

https://rise4fun.com/Alive/slRC

More details in PR42644:
https://bugs.llvm.org/show_bug.cgi?id=42644

I limited this to pre-legalization for code simplicity because that
should be enough to reverse the IR patterns. I don't have any
evidence (no regression test diffs) that we need to try this later.

Differential Revision: https://reviews.llvm.org/D65607

llvm-svn: 367710
2019-08-02 19:33:46 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Craig Topper a9ed5436bd [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, its not any better to decompose it.

This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false illegal types and letting them get handled after type legalization, but then we can't recognize and i64 constant splat on 32-bit targets since will be destroyed by type legalization. We could special case vectors of i64 to avoid that...

Differential Revision: https://reviews.llvm.org/D65533

llvm-svn: 367601
2019-08-01 18:49:07 +00:00
Simon Pilgrim 1d183b407a [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling
Allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367588
2019-08-01 17:46:44 +00:00
Craig Topper 388df2ea19 [SelectionDAG] Use APInt::isSubsetOf/intersects to simplify some code.
Also use KnownBits::isNegative/isNonNegative to further simplify.

llvm-svn: 367518
2019-08-01 06:06:21 +00:00
Amy Huang 153f20057c Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and
and partial fix.
Causes windows buildbot errors.

This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and
53da7ca943.

llvm-svn: 367496
2019-07-31 23:59:31 +00:00
Michael Berg 005d705d43 Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context.

Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper

Reviewed By: spatel

Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D65170

llvm-svn: 367486
2019-07-31 21:57:28 +00:00
Amy Huang 27a73dd02c Fix to r367374 "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
after windows buildbot failure.

Added a check that the MachineInstr exists and is a call before trying
to add symbols around it.

llvm-svn: 367483
2019-07-31 21:03:38 +00:00
Peter Collingbourne 33773d5cfc SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned.
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().

Differential Revision: https://reviews.llvm.org/D65465

llvm-svn: 367474
2019-07-31 20:14:09 +00:00
Wei Mi f49c107f06 [DAGCombine] Limit the number of times for the same store and root nodes
to bail out in store merging dependence check.

We run into a case where dependence check in store merging bail out many times
for the same store and root nodes in a huge basicblock. That increases compile
time by almost 100x. The patch add a map to track how many times the bailing
out happen for the same store and root, and if it is over a limit, stop
considering the store with the same root as a merging candidate.

Differential Revision: https://reviews.llvm.org/D65174

llvm-svn: 367472
2019-07-31 19:59:24 +00:00
Amy Huang 53da7ca943 [MS] Emit S_HEAPALLOCSITE debug info in SelectionDAG
Summary: This emits labels around heapallocsite calls in SelectionDAG.

Reviewers: rnk

Subscribers: MatzeB, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61105

llvm-svn: 367374
2019-07-31 00:16:13 +00:00
Wei Mi 888efda280 [DAGCombiner] Add an option to control whether or not to enable store merging.
Add an option to control whether or not to enable store merging in dag combiner
so we can workaround some bugs more easily.

Differential Revision: https://reviews.llvm.org/D65482

llvm-svn: 367365
2019-07-30 23:14:56 +00:00
Simon Pilgrim f8a7e9de06 [DAGCombine] narrowInsertExtractVectorBinOp - early out for binops that change value type. NFCI.
This is implicit in the value type checks in getSubVectorSrc - this just makes it upfront and obvious.

llvm-svn: 367220
2019-07-29 11:34:45 +00:00
Simon Pilgrim 76f2f04d9d [DAGCombine] narrowInsertExtractVectorBinOp - early out for illegal op. NFCI.
If the subvector binop is illegal then early-out and avoid the subvector searches.

llvm-svn: 367181
2019-07-27 19:42:58 +00:00
Simon Pilgrim 603f94aa2a [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied)
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.

This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.

Fixes PR42777

llvm-svn: 367174
2019-07-27 14:11:59 +00:00
Simon Pilgrim 8a52671782 [SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit.
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.

This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. 

This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.

llvm-svn: 367171
2019-07-27 12:48:46 +00:00
Simon Pilgrim 3ff6126487 [TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.

llvm-svn: 367169
2019-07-27 12:23:36 +00:00
Nico Weber 13f337c4cb Revert r367091, it caused PR42777.
llvm-svn: 367118
2019-07-26 14:58:42 +00:00
Simon Pilgrim a424a1f351 [SelectionDAG] GetDemandedBits - update SIGN_EXTEND_INREG op to just call SimplifyMultipleUseDemandedBits.
llvm-svn: 367098
2019-07-26 10:03:07 +00:00
Simon Pilgrim 9758407bf1 [TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support.
llvm-svn: 367096
2019-07-26 09:41:08 +00:00
Simon Pilgrim d0164fc525 [SelectionDAG] GetDemandedBits - update OR/XOR ops to just call SimplifyMultipleUseDemandedBits.
Eventually all of these will be moved over, but we create nodes in GetDemandedBits recursion at the moment which causes regressions when we try to remove them all.

llvm-svn: 367092
2019-07-26 09:13:29 +00:00
Simon Pilgrim b32ceb79b0 [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support.
This allows us to peek through BITCASTs and attempt simplify the source operand, and then bitcast back.

llvm-svn: 367091
2019-07-26 08:38:39 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into endless combine loop.

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
Simon Pilgrim 2bf871be4c Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 366935
2019-07-24 17:44:22 +00:00
Simon Pilgrim 7d318b2bb1 [DAGCombine] matchBinOpReduction - add partial reduction matching
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:

e.g. <8 x i32> reduction pattern in a <16 x i32> vector:

<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>

matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.

I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.

Fixes the x86 partial reduction sum cases in PR33758 and PR42023.

Differential Revision: https://reviews.llvm.org/D65047

llvm-svn: 366933
2019-07-24 17:29:56 +00:00
Simon Pilgrim 3f01c7197f [SelectionDAG] makeEquivalentMemoryOrdering - early out for equal chains (PR42727)
If we are already using the same chain for the old/new memory ops then just return.

Fixes PR42727 which had getLoad() reusing an existing node.

llvm-svn: 366922
2019-07-24 16:53:14 +00:00
Sanjay Patel 10dad95a75 [SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC
We canonicalize to the add form, so create that directly for efficiency.

llvm-svn: 366914
2019-07-24 15:43:50 +00:00
Simon Pilgrim 0e8359aec1 [TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support.
If all the demanded elts are from one operand and are inline, then we can use the operand directly.

The changes are mainly from SSE41 targets which has blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.

llvm-svn: 366817
2019-07-23 15:35:55 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Craig Topper a658cb0b12 [DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI
The function was calling getNode() on an SDValue to return and the
caller turned the result back into a SDValue. So just return the
original SDValue to avoid this.

llvm-svn: 366779
2019-07-23 05:13:39 +00:00
Craig Topper f5247244f2 [DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI
llvm-svn: 366778
2019-07-23 05:13:35 +00:00
Richard Trieu 81a5045cd6 Move variable out from debug only section.
MFI is no longer just needed for an assert.  Move it out of the debug only
section to allow non-assert builds to be able to find it.

llvm-svn: 366773
2019-07-23 02:59:15 +00:00
Philip Reames 2f5543aa72 [Statepoints] Fix a bug in statepoint lowering for functions w/no-realign-stack
We were silently using the ABI alignment for all of the stores generated for deopt and gc values.  We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to a unaligned location.

The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help.  So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.

Note that there's a separate performance possibility here.  Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths.  But that's a later patch, if at all.  :)

llvm-svn: 366765
2019-07-22 23:33:18 +00:00
Matt Arsenault 542720b2bc TableGen: Support physical register inputs > 255
This was truncating register value that didn't fit in unsigned char.
Switch AMDGPU sendmsg intrinsics to using a tablegen pattern.

llvm-svn: 366695
2019-07-22 15:02:34 +00:00
Christudasan Devadasan 006cf8c03d Added address-space mangling for stack related intrinsics
Modified the following 3 intrinsics:
int_addressofreturnaddress,
int_frameaddress & int_sponentry.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64561

llvm-svn: 366679
2019-07-22 12:42:48 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
Roman Lebedev cd9b19484b [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
   be illegal.
   * There is no ban in Hacker's Delight
   * I think the ban came from the same bug that caused the miscompile
      in the base patch - in `floor((2^W - 1) / D)` we were dividing by
      `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
      which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
   that the fold is invalid for `D = 0`. Further considerations:
   * We know that
     * `(X u% 1) == 0`  can be constant-folded to `1`,
     * `(X u% 1) != 0`  can be constant-folded to `0`,
   *  Also, we know that
     * `X u<= -1` can be constant-folded to `1`,
     * `X u>  -1` can be constant-folded to `0`,
   * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
   * We know will end up with the following:
       `(setule/setugt (rotr (mul N, P), K), Q)`
   * Therefore, for given new DAG nodes and comparison predicates
     (`ule`/`ugt`), we will still produce the correct answer if:
     `Q` is a all-ones constant; and both `P` and `K` are *anything*
     other than `undef`.
   * The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
   their values for the lanes where divisor was `1`.

Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00

Reviewed By: RKSimon

Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63963

llvm-svn: 366637
2019-07-20 16:33:15 +00:00
Matt Arsenault 5905aae169 DAG: Handle dbg_value for arguments split into multiple subregs
This was handled previously for arguments split due to not fitting in
an MVT. This was dropping the register for argument registers split
due to TLI::getRegisterTypeForCallingConv.

llvm-svn: 366574
2019-07-19 13:36:46 +00:00
Amy Huang f332fe642c [COFF] Change a variable type to be const in the HeapAllocSite map.
llvm-svn: 366479
2019-07-18 18:22:52 +00:00
Simon Pilgrim 8b525e357f [DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI.
NFC step towards reusing this in other EXTRACT_SUBVECTOR combines.

llvm-svn: 366435
2019-07-18 13:45:53 +00:00
Evgeniy Stepanov d752f5e953 Basic codegen for MTE stack tagging.
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.

Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.

Differential Revision: https://reviews.llvm.org/D64172

llvm-svn: 366360
2019-07-17 19:24:02 +00:00
Amaury Sechet f34a69c2e2 [DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry.
Summary:
As per title. DAGCombiner only mathes the special case where b = 0, this patches extends the pattern to match any value of b.

Depends on D57302

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59208

llvm-svn: 366214
2019-07-16 15:17:00 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Simon Pilgrim 701e2c0d71 [DAGCombine] narrowExtractedVectorBinOp - wrap subvector extraction in helper. NFCI.
First step towards supporting 'free' subvector extractions other than concat_vectors.

llvm-svn: 365896
2019-07-12 13:00:35 +00:00
Simon Pilgrim d0307f93a7 [DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support
We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y).

This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well.

In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops.

The fast-isel-store.ll load folding regression is annoying but I don't think is that critical.

Differential Revision: https://reviews.llvm.org/D63653

llvm-svn: 365785
2019-07-11 14:45:03 +00:00
Tim Northover f2d6597653 OpaquePtr: use byval accessor instead of inspecting pointer type. NFC.
The accessor can deal with both "byval(ty)" and "ty* byval" forms
seamlessly.

llvm-svn: 365769
2019-07-11 13:12:38 +00:00
Sanjay Patel 138328e45c [SDAG] commute setcc operands to match a subtract
If we have:

R = sub X, Y
P = cmp Y, X

...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.

Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.

There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".

Differential Revision: https://reviews.llvm.org/D63958

llvm-svn: 365711
2019-07-10 23:23:54 +00:00
Michael Berg f4572249d7 Move three folds for FADD, FSUB and FMUL in the DAG combiner away from Unsafe to more aligned checks that reflect context
Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang.  I have migrated the fold guards to reflect the expectations of handing nan and zero contexts directly (NoNan, NSZ) and some tests with it.  Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context. 

Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm

Reviewed By: arsenm

Subscribers: michele.scandale, wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D64450

llvm-svn: 365679
2019-07-10 18:23:26 +00:00
Nick Desaulniers 8728e45706 [TargetLowering] support BlockAddress as "i" inline asm constraint
Summary:
This allows passing address of labels to inline assembly "i" input
constraints.

Fixes pr/42502.

Reviewers: ostannard

Reviewed By: ostannard

Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64167

llvm-svn: 365664
2019-07-10 17:08:25 +00:00
Simon Pilgrim 94c84aca5d [DAGCombine] visitINSERT_SUBVECTOR - use uint64_t subvector index. NFCI.
Keep the uint64_t type from getZExtValue() to stop truncation/extension overflow warnings in MSVC in subvector index math.

llvm-svn: 365621
2019-07-10 12:21:35 +00:00
Simon Pilgrim bb1167a3a1 Fix const/non-const lambda return type warning. NFCI.
llvm-svn: 365613
2019-07-10 10:45:09 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Simon Pilgrim 57603cbde8 [DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI.
Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math.

llvm-svn: 365504
2019-07-09 15:28:57 +00:00
Tim Northover 60afa49abe OpaquePtr: add Type parameter to Loads analysis API.
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.

Most callers had an obvious memory operation handy to provide this type, but a
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.

llvm-svn: 365468
2019-07-09 11:35:35 +00:00
Bjorn Pettersson 051a6a1c33 [SelectionDAG] Simplify some calls to getSetCCResultType. NFC
DAGTypeLegalizer and SelectionDAGLegalize has helper
functions wrapping the call to TLI.getSetCCResultType(...).
Use those helpers in more places.

llvm-svn: 365456
2019-07-09 10:27:51 +00:00
Bjorn Pettersson 59029017a6 [LegalizeTypes] Fix saturation bug for smul.fix.sat
Summary:
Make sure we use SETGE instead of SETGT when checking
if the sign bit is zero at SMULFIXSAT expansion.

The faulty expansion occured when doing "expand" of
SMULFIXSAT and the scale was exactly matching the
size of the smaller type. For example doing
  i64 Z = SMULFIXSAT X, Y, 32
and expanding X/Y/Z into using two i32 values.

The problem was that we sometimes did not saturate
to min when overflowing.

Here is an example using Q3.4 numbers:

Consider that we are multiplying X and Y.
  X = 0x80 (-8.0 as Q3.4)
  Y = 0x20 (2.0 as Q3.4)
To avoid loss of precision we do a widening
multiplication, getting a 16 bit result
  Z = 0xF000 (-16.0 as Q7.8)

To detect negative overflow we should check if
the five most significant bits in Z are less than -1.
Assume that we name the 4 most significant bits
as HH and the next 4 bits as HL. Then we can do the
check by examining if
 (HH < -1) or (HH == -1 && "sign bit in HL is zero").

The fault was that we have been doing the check as
 (HH < -1) or (HH == -1 && HL > 0)
instead of
 (HH < -1) or (HH == -1 && HL >= 0).

In our example HH is -1 and HL is 0, so the old
code did not trigger saturation and simply truncated
the result to 0x00 (0.0). With the bugfix we instead
detect that we should saturate to min, and the result
will be set to 0x80 (-8.0).

Reviewers: leonardchan, bevinh

Reviewed By: leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64331

llvm-svn: 365455
2019-07-09 10:24:50 +00:00
Guillaume Chatelet 336f3e1601 Fixing @llvm.memcpy not honoring volatile.
This is explicitly not addressing target-specific code, or calls to memcpy.

Summary: https://bugs.llvm.org/show_bug.cgi?id=42254

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63215

llvm-svn: 365449
2019-07-09 09:53:36 +00:00
Reid Kleckner 2f07c2e9d9 Standardize on MSVC behavior for triples with no environment
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.

Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.

Addresses PR42491

Reviewers: compnerd

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64109

llvm-svn: 365387
2019-07-08 21:05:20 +00:00
Simon Pilgrim 9285bf0fb9 [TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases.
Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well).

A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts.

llvm-svn: 365309
2019-07-08 11:00:39 +00:00
Simon Pilgrim 9c68aa33e3 [DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. NFCI.
llvm-svn: 365269
2019-07-06 18:32:15 +00:00
Craig Topper e9aed963ce [DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) if the Carry comes from the uaddo.
Summary:
The uaddo won't be removed and the addcarry will still be
dependent on the uaddo. So we'll just increase the use count
of X and Y and potentially require a COPY.

Reviewers: spatel, RKSimon, deadalnix

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64190

llvm-svn: 365149
2019-07-04 18:18:46 +00:00
Francis Visoiu Mistrih 83bbe2f418 [CodeGen] Make branch funnels pass the machine verifier
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.

This is an attempt to fix it.

1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList =
(outs)`

2) After that we hit an assert: ``` Assertion failed: (Op.getValueType()
!= MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue
operands should occur at end of operand list!"), function AddOperand,
file
/Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp,
line 461.  ```

The chain operand was added at the beginning of the operand list. Move
that to the end.

3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
livein lists. Add the `EFLAGS` to all the new MBBs that we create.

PR39436

Differential Review: https://reviews.llvm.org/D54155

llvm-svn: 365058
2019-07-03 17:16:45 +00:00
Amaury Sechet 57dfacb32d Use getAllOnesConstants instead of -1 in DAGCombiner. NFC
llvm-svn: 365054
2019-07-03 16:34:36 +00:00
Amaury Sechet bddb8c3597 [DAGCombine] More diamong carry pattern optimization.
Summary:
This diff improve the capability of DAGCOmbine to generate linear carries propagation in presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded one.

Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation.

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57302

llvm-svn: 365051
2019-07-03 16:15:59 +00:00
James Molloy fa4aac7335 [SelectionDAG] Propagate alias metadata to target intrinsic nodes
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.

Differential revision: https://reviews.llvm.org/D64131

llvm-svn: 365043
2019-07-03 14:33:29 +00:00
Roman Lebedev c4b83a6054 [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.

Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.

Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.

Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel

Reviewed By: efriedma

Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64090

llvm-svn: 365010
2019-07-03 09:41:35 +00:00
Roman Lebedev 7c8ee375d8 [NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963
llvm-svn: 364921
2019-07-02 13:21:23 +00:00
Zi Xuan Wu 7ae536a1ce [DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function
For a given floating point load / store pair, if the load value isn't used by any other operations, 
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.

And we can exploiting much more when there are other operation nodes with chain operand between the load/store pair 
so long as we keep the chain ordering original. We only replace the register used to load/store from float to integer.

I only add testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in ARM target.

Differential Revision: https://reviews.llvm.org/D60601

llvm-svn: 364883
2019-07-02 02:54:52 +00:00
Benjamin Kramer ed13fef477 [SelectionDAG] Do minnum->minimum at legalization time instead of building time
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.

llvm-svn: 364743
2019-07-01 11:00:23 +00:00
Craig Topper 4d0feb28ec [SelectionDAG] Use the memory VT instead of result VT for FoldingSet profiling in getMaskedLoad/getMaskedStore.
This matches what is done by the Profile function. Otherwise CSE
won't work properly.

llvm-svn: 364717
2019-06-30 06:46:33 +00:00
Roman Lebedev 29d05c005f [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364600
2019-06-27 21:52:10 +00:00
Roman Lebedev 0a2b7b79fa Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)"
*Appears* to break test-suite on
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790

FAIL: burg.execution_time
FAIL: spiff.execution_time
FAIL: employ.execution_time
FAIL: llu.execution_time
FAIL: gramschmidt.execution_time
FAIL: fdtd-apml.execution_time

This reverts commit r364563.

llvm-svn: 364568
2019-06-27 17:22:31 +00:00
Roman Lebedev 0627b09863 [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Original patch author: @hermord (Dmytro Shynkevych)!

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364563
2019-06-27 16:45:42 +00:00
Simon Pilgrim 83e1a1e79b [TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support.
llvm-svn: 364548
2019-06-27 14:25:54 +00:00
Simon Pilgrim c692a8dc51 [TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts
llvm-svn: 364541
2019-06-27 13:48:43 +00:00
Djordje Todorovic 7eeeb5947e [ISEL][X86] Tracking of registers that forward call arguments
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.

([9/13] Introduce the debug entry values.)

Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>

Differential Revision: https://reviews.llvm.org/D60715

llvm-svn: 364516
2019-06-27 10:51:15 +00:00
Roman Lebedev b0ecc1cc6b [X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how do deal with the last pattern c.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62793

llvm-svn: 364418
2019-06-26 12:19:39 +00:00
Simon Pilgrim a6319e5f83 [DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases  (e.g. extract_subvector(v2i64 bitcast(v16i8)))

llvm-svn: 364405
2019-06-26 11:17:38 +00:00
QingShan Zhang e0e7d4c366 Teach the DAGCombine to fold this pattern(c1 and c2 is constant).
// fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
// fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
// fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
Sign extend the operands if it is any_extend, to keep the signess of the operands that, the other combine rule would apply. The any_extend is handled as zero extend for constants. i.e.

t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
t2: i64 = any_extend t1
 -->
t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
 -->
t4: i64 = sign_extend_inreg t3

Differential Revision: https://reviews.llvm.org/D63318

llvm-svn: 364382
2019-06-26 05:12:53 +00:00
Simon Pilgrim 9762b26032 [DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal
Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ......

llvm-svn: 364326
2019-06-25 16:00:16 +00:00
Sanjay Patel 685c5cbc65 [SDAG] expand ctpop != 1
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

This is the inverted predicate sibling pattern that was added with:
D63004

This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to modified.

llvm-svn: 364319
2019-06-25 14:46:52 +00:00
Simon Pilgrim 1a18bb6f25 [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases.

Reapplies rL363856.

llvm-svn: 364311
2019-06-25 13:25:57 +00:00
Simon Pilgrim 36953ce769 [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

Reapplies rL363850 but now with legality checks added at rL364290

llvm-svn: 364303
2019-06-25 12:57:43 +00:00
Sanjay Patel e4ef62291b [SDAG] improve expansion of ctpop+setcc
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.

llvm-svn: 364302
2019-06-25 12:49:35 +00:00
Simon Pilgrim 69fc111184 [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

Reapplies rL363802 but now with legality checks added at rL364290

llvm-svn: 364299
2019-06-25 12:19:12 +00:00
Simon Pilgrim b23c942ce4 [VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.

This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.

Part of the yak shaving to solve the crashes from rL364264 and rL364272

llvm-svn: 364295
2019-06-25 11:31:37 +00:00
Simon Pilgrim 49b3778e32 [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.

We'll improve X86 legality in future commits.

llvm-svn: 364290
2019-06-25 10:51:15 +00:00
Roman Lebedev cdd43eac4f [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.

Reviewers: RKSimon, craig.topper, xbolva00, spatel

Reviewed By: xbolva00, spatel

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63390

llvm-svn: 364286
2019-06-25 10:01:42 +00:00
Craig Topper 079924b0b7 Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..."
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"

We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit result type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.

The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.

llvm-svn: 364264
2019-06-25 01:32:42 +00:00
Roland Froese ea08248b2b [CodeGen] Add missing vector type legalization for ctlz_zero_undef
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.

Differential Revision: https://reviews.llvm.org/D63463

llvm-svn: 364221
2019-06-24 19:27:07 +00:00
Matt Arsenault e3a676e9ad CodeGen: Introduce a class for registers
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().

llvm-svn: 364191
2019-06-24 15:50:29 +00:00
Simon Pilgrim 69144a925e [DAGCombine] visitMUL - allow shift by zero in MulByConstant.
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). Its better to let the shift by zero to occur and perform any cleanup afterward.

Fixes OSS Fuzz #15429

llvm-svn: 364179
2019-06-24 12:47:17 +00:00
Craig Topper 6ddc7912b0 [SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store.
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.

The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.

llvm-svn: 364151
2019-06-23 07:00:46 +00:00
Simon Pilgrim 0da13ed1f6 [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI.
llvm-svn: 364076
2019-06-21 16:44:51 +00:00
Simon Pilgrim ca9933c22d [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code.
Move the "extract from insert detection code" into a lambda helper function.

llvm-svn: 364059
2019-06-21 14:46:21 +00:00
Simon Pilgrim 801c0f12b0 [DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible.
Better handling of out-of-i64-range values due to large integer types or from fuzz tests.

llvm-svn: 363955
2019-06-20 17:36:23 +00:00
Jordan Rupprecht 02508decf4 [DAGCombiner][NFC] Remove unused var
llvm-svn: 363954
2019-06-20 17:30:01 +00:00
Simon Pilgrim 1d8093249f [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363929
2019-06-20 14:42:27 +00:00
Simon Pilgrim 98a0ac5c0f [DAGCombine] Add TODOs for some combines that should support non-uniform vectors
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.

llvm-svn: 363924
2019-06-20 12:48:49 +00:00
Simon Pilgrim a487628270 [DAGCombine] Reduce scope of ShAmtVal variable. NFCI.
Fixes cppcheck warning.

Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here.

llvm-svn: 363921
2019-06-20 10:56:37 +00:00
Simon Pilgrim 046d49a8dc [DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue().
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....

llvm-svn: 363887
2019-06-19 22:14:24 +00:00
Simon Pilgrim f05369768c [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case.

llvm-svn: 363856
2019-06-19 18:34:58 +00:00
Simon Pilgrim 6016fb726c [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

llvm-svn: 363850
2019-06-19 18:00:24 +00:00
Simon Pilgrim c3994f77cb [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

llvm-svn: 363802
2019-06-19 13:58:02 +00:00
Simon Pilgrim 9eed5d2f78 [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363793
2019-06-19 12:41:37 +00:00
Simon Pilgrim 8c49366c9b [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.

llvm-svn: 363792
2019-06-19 12:25:29 +00:00
Simon Pilgrim bb6b856183 [DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI.
llvm-svn: 363789
2019-06-19 11:31:26 +00:00
Simon Pilgrim d954a53633 [DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI.
We pre-extend, not post.

llvm-svn: 363787
2019-06-19 11:17:48 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Simon Pilgrim 5bef886cd8 [TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling
Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch.

llvm-svn: 363716
2019-06-18 18:22:30 +00:00