11688 Commits

Author SHA1 Message Date
Simon Pilgrim 80715b7124 SelectionDAG.cpp - remove non-constant EXTRACT_SUBVECTOR/INSERT_SUBVECTOR handling. NFC.
Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.
2020-05-14 13:23:00 +01:00
Eric Christopher bfa200ebcf Remove an unused variable. 2020-05-13 15:13:02 -07:00
Eli Friedman ed428c429e [SelectionDAG] Require constant index for INSERT/EXTRACT_SUBVECTOR.
It sounds like an interesting idea in theory, but nothing is actually
taking advantage of it, and specifying/implementing the edge cases is
painful. So just forbid it.

Differential Revision: https://reviews.llvm.org/D79814
2020-05-13 13:08:59 -07:00
Craig Topper 8c72b0271b [CodeGen] Use Align in MachineConstantPool. 2020-05-12 10:06:40 -07:00
James Y Knight e9536795a3 Add comment for SelectionDAGBuilder::SL field. 2020-05-12 10:46:08 -04:00
David Sherwood 42c7a6d52b [CodeGen] Fix incorrect uses of getVectorNumElements()
I have fixed up some places in SelectionDAG::getNode() where we
used to assert that the number of vector elements for two types
are the same. I have changed such cases to assert that the
element counts are the same instead. I've added new tests that
exercise the code paths for all the truncations. All the extend
operations are covered by this existing test:

  CodeGen/AArch64/sve-sext-zext.ll

For the ISD::SETCC case I fixed, the code path is exercised by
these existing tests:

  CodeGen/AArch64/sve-fcmp.ll
  CodeGen/AArch64/sve-intrinsics-int-compares-with-imm.ll

Differential Revision: https://reviews.llvm.org/D79399
2020-05-12 07:50:37 +01:00
Eli Friedman c9c930ae67 [SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment.
allocas in LLVM IR have a specified alignment. When that alignment is
specified, the alloca has at least that alignment at runtime.

If the specified type of the alloca has a higher preferred alignment,
SelectionDAG currently ignores that specified alignment, and increases
the alignment. It does this even if it would trigger stack realignment.
I don't think this makes sense, so this patch changes that.

I was looking into this for SVE in particular: for SVE, overaligning
vscale'ed types is extra expensive because it requires realigning the
stack multiple times, or using dynamic allocation. (This currently isn't
implemented.)

I updated the expected assembly for a couple tests; in particular, for
arg-copy-elide.ll, the optimization in question does not increase the
alignment the way SelectionDAG normally would. For the rest, I just
increased the specified alignment on the allocas to match what
SelectionDAG was inferring.

Differential Revision: https://reviews.llvm.org/D79532
2020-05-11 17:39:00 -07:00
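The alignment rule described in c9c930ae67 can be sketched as below. This is a minimal model, not the code from the patch: `chooseAllocaAlign` and its parameters are hypothetical names, and alignments are plain byte counts assumed to be powers of two.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not an LLVM API). All arguments are power-of-two
// alignments in bytes.
// The alloca may be raised toward the type's preferred alignment, but only
// up to the stack alignment; an explicitly specified alignment is always
// honoured, even if it exceeds the stack alignment.
inline uint64_t chooseAllocaAlign(uint64_t SpecifiedAlign,
                                  uint64_t PrefTypeAlign,
                                  uint64_t StackAlign) {
  return std::max(SpecifiedAlign, std::min(PrefTypeAlign, StackAlign));
}
```

Under this model, an alloca with specified alignment 4, preferred type alignment 32 and stack alignment 16 gets 16 rather than 32, so the preferred alignment alone never forces stack realignment.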
Sam McCall 728cf6d86b Revert "[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression"
This reverts commit 3c44c441db.

Causes infloops on some inputs, see https://reviews.llvm.org/D77319 for repro
2020-05-11 16:44:01 +02:00
QingShan Zhang 3c44c441db [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression
We have getNegatibleCost/getNegatedExpression to evaluate the cost of negating an expression and to negate it.
However, while negating the expression the cost might change, because we are changing the DAG,
and we then hit an assertion if we negated the wrong expression, because the cost can no longer be trusted.

This patch removes getNegatibleCost so that it cannot get out of sync with getNegatedExpression,
and checks the cost while negating the expression. It also reduces the duplicated code between
getNegatibleCost and getNegatedExpression, and fixes the crash for the test in D76638.

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319
2020-05-11 02:41:10 +00:00
Craig Topper bebdc62c3f [SelectionDAG] Remove ConstantPoolSDNode::getAlignment.
Use getAlign instead.

Differential Revision: https://reviews.llvm.org/D79459
2020-05-08 16:04:11 -07:00
Craig Topper d1119980e5 [SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.

Removing getAlignment() will be done as a follow up.

Differential Revision: https://reviews.llvm.org/D79436
2020-05-08 16:04:11 -07:00
Simon Pilgrim 70293ba26f [DAG] SimplifyMultipleUseDemandedBits - remove superfluous bitcasts
If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely.

Differential Revision: https://reviews.llvm.org/D79572
2020-05-08 19:04:49 +01:00
aartbik 771d30c647 [llvm] [CodeGen] Fixed vector halving bug for masked store
Summary:
Note that this fix is very similar to what has already been
done for the masked load in https://reviews.llvm.org/D78608

Bugs:
https://bugs.llvm.org/show_bug.cgi?id=45563
https://bugs.llvm.org/show_bug.cgi?id=45833

Reviewers: craig.topper, nicolasvasilache, mehdi_amini

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79611
2020-05-07 19:01:40 -07:00
Kerry McLaughlin a31f4c52bf [SVE][CodeGen] Fix legalisation for scalable types
Summary:
This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.

For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.

In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>)

Reviewers: sdesmalen, efriedma, huntergr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78812
2020-05-07 10:01:31 +01:00
Craig Topper 7b9d6673bf [SelectionDAG] When splitting gather operands in type legalization, set MMO size to UnknownSize
I missed this case when I did the same for gather results and scatter
operands in c69a4d6bef.
2020-05-06 19:57:14 -07:00
LemonBoy 7fa5abd343 [SelectionDAG] Fix assertion failure with big shift amounts
Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for the integer type it's applied to.

Fixes the regression introduced by D79096

Differential Revision: https://reviews.llvm.org/D79405
2020-05-06 11:58:37 -07:00
Sanjay Patel 2f1fe1864d [DAGCombiner] sink target-supported FP<->int cast op after concat vectors
Try to combine N short vector cast ops into 1 wide vector cast op:
concat (cast X), (cast Y)... -> cast (concat X, Y...)

This is part of solving PR45794:
https://bugs.llvm.org/show_bug.cgi?id=45794

As noted in the code comment, this is uglier than I was hoping because
the opcode determines whether we pass the source or destination type
to isOperationLegalOrCustom(). Also IIUC, there's no way to validate
what the other (dest or src) type is. Without the extra legality check
on that, there's an ARM regression test in:
test/CodeGen/ARM/isel-v8i32-crash.ll
...that will crash trying to lower an unsupported v8f32 to v8i16.

Differential Revision: https://reviews.llvm.org/D79360
2020-05-06 10:25:58 -04:00
David Sherwood cd3a54c55a [CodeGen] Fix warnings due to SelectionDAG::getSplatSourceVector
Summary:
I have fixed several places in getSplatSourceVector and isSplatValue
to work correctly with scalable vectors. I added new support for
the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
support with scalable vectors. In other places I have tried to do
the sensible thing, such as bail out for vector types we don't yet
support or don't intend to support.

It's not possible to add IR test cases to cover these changes, since
they are currently only ever exercised on certain targets, e.g.
only X86 targets use the result of getSplatSourceVector. I've
assumed that X86 tests already exist to test these code paths for
fixed vectors. However, I have added some AArch64 unit tests that
test the specific functions I have changed.

Differential revision: https://reviews.llvm.org/D79083
2020-05-05 08:45:41 +01:00
Alex Richardson d1ff003fbb [SelectionDAGBuilder] Stop setting alignment to one for hidden sret values
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using a pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and were falling
back to a byte load + shift + or sequence.

This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
2020-05-04 14:44:39 +01:00
LemonBoy 6d103ca855 [SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.

The technique employed by `ExpandLoad`  is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.

Differential Revision: https://reviews.llvm.org/D79096
2020-05-02 15:18:10 -07:00
Simon Pilgrim a09a3c6d3e Revert rG8e05ac0a510c - "[DAGCombine] visitTRUNCATE - remove GetDemandedBits call"
Causing buildbot failures
2020-05-02 20:08:33 +01:00
Simon Pilgrim 8e05ac0a51 [DAGCombine] visitTRUNCATE - remove GetDemandedBits call
rL368553 added SimplifyMultipleUseDemandedBits handling for ISD::TRUNCATE to SimplifyDemandedBits so we don't need to duplicate this (and it gets rid of another GetDemandedBits call which is slowly being replaced with SimplifyMultipleUseDemandedBits anyhow).
2020-05-02 19:52:17 +01:00
Simon Pilgrim 7cb5a51f38 [DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 16:20:51 +01:00
Simon Pilgrim 65d32a9892 [DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector 2020-05-01 16:20:51 +01:00
Simon Pilgrim e3c0be596c [DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 13:48:07 +01:00
Craig Topper 6a1ad76dab [X86] Don't return true from isTruncateFree for vectors
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.

Differential Revision: https://reviews.llvm.org/D79045
2020-04-30 16:43:35 -07:00
Simon Pilgrim 96238486ed [DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine
X86 matches several 'shift+xor' funnel shift patterns:

  fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y))  -> (fshl x0, x1, y)
  fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y))  -> (fshr x0, x1, y)
  fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)

These patterns are also what we end up with under the proposed expansion changes in D77301.

This patch moves these to DAGCombine's generic MatchFunnelPosNeg.

All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.

Reviewed By: @spatel

Differential Revision: https://reviews.llvm.org/D78935
2020-04-30 12:57:17 +01:00
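The three 'shift+xor' folds listed in 96238486ed can be checked against reference funnel shifts. The sketch below assumes 32-bit operands and a shift amount y in the range 0..31 (so that `y ^ 31` equals `31 - y`); the function names are illustrative and do not come from DAGCombine.

```cpp
#include <cstdint>

// Reference funnel shifts on the 64-bit concatenation X0:X1, with the
// shift amount taken modulo 32.
inline uint32_t fshl32(uint32_t X0, uint32_t X1, uint32_t Y) {
  Y &= 31;
  return Y == 0 ? X0 : (X0 << Y) | (X1 >> (32 - Y));
}
inline uint32_t fshr32(uint32_t X0, uint32_t X1, uint32_t Y) {
  Y &= 31;
  return Y == 0 ? X1 : (X0 << (32 - Y)) | (X1 >> Y);
}

// (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y)); requires Y <= 31.
inline uint32_t fshlXorForm(uint32_t X0, uint32_t X1, uint32_t Y) {
  return ((X1 >> 1) >> (Y ^ 31)) | (X0 << Y);
}
// (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y)); requires Y <= 31.
inline uint32_t fshrXorForm(uint32_t X0, uint32_t X1, uint32_t Y) {
  return ((X0 << 1) << (Y ^ 31)) | (X1 >> Y);
}
// (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)); requires Y <= 31.
inline uint32_t fshrAddForm(uint32_t X0, uint32_t X1, uint32_t Y) {
  return ((X0 + X0) << (Y ^ 31)) | (X1 >> Y);
}
```

Splitting the shift into a fixed shift by 1 followed by a shift by `y ^ 31` keeps every individual shift amount below 32, which is why the y == 0 case needs no special handling in the pattern forms.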
Simon Pilgrim 6547a5ceb2 [DAG] Add TODO comment regarding ADD(X,X) -> SHL(X,1) canonicalization
As discussed on D78935
2020-04-30 12:57:16 +01:00
David Sherwood 058cd8c5be [CodeGen] Add support for inserting elements into scalable vectors
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed length vectors. For scalable vectors it
is assumed that the backend will do the right thing in the same way
that we have to deal with variable lane indices.

In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.

This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.

Differential Revision: https://reviews.llvm.org/D78992
2020-04-30 11:14:04 +01:00
QingShan Zhang b5f89744cc [DAGCombine] Checking the cost directly to improve the code readability
Call getNegatedExpression(Cost) and check the Cost to make the code more clear.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D78347
2020-04-29 01:49:39 +00:00
Craig Topper e13c141a91 [SelectionDAGBuilder] Use CallBase::isInlineAsm in a couple places. NFC
These lines were just changed from using CallBase::getCalledValue
to getCalledOperand. Go ahead and change them to isInlineAsm.
2020-04-27 23:00:44 -07:00
Craig Topper a58b62b4a2 [IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand().
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().

I also made a few cleanups in here. For example, removing uses
of getElementType on a pointer when we could just use getFunctionType
from the call.

Differential Revision: https://reviews.llvm.org/D78882
2020-04-27 22:17:03 -07:00
David Sherwood 096b25a8d8 [CodeGen] Use SPLAT_VECTOR for zeroinitialiser with scalable types
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D78636
2020-04-27 15:57:59 +01:00
QingShan Zhang 2957fa0cd1 [NFC][DAGCombine] Adding three helper functions and change the getNegatedExpression to negateExpression
This is an NFC patch for D77319. The idea is to hide getNegatibleCost inside getNegatedExpression(),
have it return null if the cost is expensive, and add some helper functions for ease of use. It also
renames the old getNegatedExpression to negateExpression to avoid a semantic conflict.

Reviewed By: RKSimon

Differential revision: https://reviews.llvm.org/D78291
2020-04-27 04:11:42 +00:00
aartbik 907871d9ad [llvm] [CodeGen] Fixed vector halving bug for masked load
Summary:
Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should
eventually yield VL=8/6. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().

Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.

Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Reviewers: craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78608
2020-04-23 15:12:44 -07:00
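The halving arithmetic in 907871d9ad can be illustrated with a small model. This is a sketch only, assuming the low half takes as many elements as half of the enveloping power-of-two length and the high half takes the remainder; `envelopingPow2` and `splitVectorLength` are hypothetical names, not functions from the legalizer.

```cpp
#include <algorithm>
#include <utility>

// Smallest power of two that is >= N (expects N >= 1).
inline unsigned envelopingPow2(unsigned N) {
  unsigned P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Split a vector length VL (expects VL >= 2) into {Lo, Hi} element counts.
// The low part is sized by halving the enveloping power-of-two length;
// the high part holds whatever elements remain.
inline std::pair<unsigned, unsigned> splitVectorLength(unsigned VL) {
  unsigned Half = envelopingPow2(VL) / 2;
  unsigned Lo = std::min(Half, VL);
  return {Lo, VL - Lo};
}
```

For VL=14 the enveloping length is 16, the low half holds 8 elements, and the high half holds the remaining 14 - 8 = 6.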
Christopher Tetreault ccd623eae3 [SVE] Remove calls to isScalable from CodeGen
Reviewers: efriedma, sdesmalen, stoklund, sunfish

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77755
2020-04-23 12:58:52 -07:00
Alex Richardson bbcfce4bad Use FrameIndexTy for stack protector
Using getValueType() is not correct for architectures extended with CHERI since
we need a pointer type and not the value that is loaded. While stack
protector is useless when you have CHERI (since CHERI provides much
stronger security guarantees), we still have a test to check that we can
generate correct code for checks. Merging b281138a1b
into our tree broke this test. Fix by using TLI.getFrameIndexTy().

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D77785
2020-04-23 13:12:27 +01:00
Craig Topper 05a11974ae [CallSite removal] Remove unneeded includes of CallSite.h. NFC 2020-04-22 00:07:13 -07:00
Kang Zhang a8e15ee04a [CodeGen] Support freeze expand for ppc_fp128
Summary:
Patch D29014 added the new ISD::FREEZE node and handles the
integer case.
Patch D76980 added SoftenFloatRes_FREEZE for floating point.
But we still lack expansion for ppc_fp128, which causes an assertion failure in
some cases.
This patch adds support for freeze expansion of ppc_fp128.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78278
2020-04-20 07:27:41 +00:00
Simon Pilgrim 46de0d5fe9 SelectionDAGBuilder.h - remove unused includes + forward declarations. NFC.
Replace SelectionDAG.h include with SelectionDAG forward declaration.
2020-04-19 12:38:41 +01:00
Simon Pilgrim 032738d17e InstrEmitter.h - reduce SelectionDAG.h include to SelectionDAGNodes.h include.
Add SDDbgLabel/TargetLowering forward declarations.
Add the full SelectionDAG.h include to InstrEmitter.cpp.
2020-04-19 11:52:31 +01:00
Christopher Tetreault c858debebc Remove asserting getters from base Type
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: dexonsmith, sdesmalen, efriedma

Reviewed By: efriedma

Subscribers: cfe-commits, hiraditya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D77278
2020-04-17 14:03:31 -07:00
Fraser Cormack c819ef9653 Provide operand indices to adjustSchedDependency
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.

Differential Revision: https://reviews.llvm.org/D77135
2020-04-17 11:08:44 +01:00
Craig Topper 944cc5e0ab [SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP.
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.

This simplifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.

CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.

Differential Revision: https://reviews.llvm.org/D76947
2020-04-16 17:49:22 -07:00
Eli Friedman 7c10541e56 [SelectionDAG] Fix usage of Align constructing MachineMemOperands.
The "Align" passed into getMachineMemOperand etc. is the alignment of
the MachinePointerInfo, not the alignment of the memory operation.
(getAlign() on a MachineMemOperand automatically reduces the alignment
to account for this.)

We were passing on wrong (overconservative) alignment in a bunch of
places. Fix a bunch of these, mostly in legalization.  And while I'm
here, switch to the new Align APIs.

The test changes are all scheduling changes: the biggest effect of
preserving large alignments is that it improves alias analysis, so the
scheduler has more freedom.

(I was originally just trying to do a minor cleanup in
SelectionDAGBuilder, but I accidentally went deeper down the rabbit
hole.)

Differential Revision: https://reviews.llvm.org/D77687
2020-04-15 13:01:41 -07:00
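The distinction drawn in 7c10541e56 between the alignment of the base pointer and the alignment of the access itself can be modelled as follows. This is a simplified sketch with a hypothetical helper name, similar in spirit to LLVM's `commonAlignment`; it assumes the base alignment is a power of two and the offset is a byte count from that base.

```cpp
#include <cstdint>

// Hypothetical model: alignment guaranteed for an access located Offset
// bytes past a pointer that is BaseAlign-aligned. The result is the
// largest power of two dividing both BaseAlign and Offset, i.e. the
// lowest set bit of (BaseAlign | Offset).
inline uint64_t alignAtOffset(uint64_t BaseAlign, uint64_t Offset) {
  uint64_t Bits = BaseAlign | Offset;
  return Bits & (~Bits + 1);
}
```

With a 16-byte-aligned base, an access at offset 8 is only known to be 8-byte aligned, while an access at offset 32 keeps the full 16. Recording the base alignment and letting the reduction happen on query is what avoids the overconservative values the message describes.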
Benjamin Kramer d790bd3999 Unbreak the build 2020-04-15 15:54:47 +02:00
Victor Campos d85b3877dc [CodeGen][ARM] Error when writing to specific reserved registers in inline asm
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.

Example:
  int foo() {
    register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
    __asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
    return a;
  }

In contrast, GCC issues an error in the same scenario.

This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.

This is ARM only. Therefore the implementation of other targets'
counterparts remains to be done.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76848
2020-04-15 14:40:42 +01:00
QingShan Zhang c9f9c79c5a [NFC][DAGCombine] Change the value of NegatibleCost to make it align with the semantics
This is a minor NFC change to make the code clearer. NegatibleCost has three values:
cheaper, neutral, and expensive. Typically, a smaller value means a lower cost.
The current implementation orders them the other way round, which makes code such as the following hard to read:
If (CostX > CostY) negate(X)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D77993
2020-04-15 02:20:58 +00:00
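The ordering that c9f9c79c5a argues for can be sketched like this. The enumerator values and the helper `preferNegatingX` are assumptions made for illustration; the point is only that a smaller value should mean a lower cost, so the comparison reads naturally.

```cpp
// Illustrative ordering (values are an assumption): smaller means cheaper.
enum class NegatibleCost { Cheaper = 0, Neutral = 1, Expensive = 2 };

// Hypothetical helper: negate X when doing so is no more costly than
// negating Y. Scoped enums of the same type compare by underlying value.
inline bool preferNegatingX(NegatibleCost CostX, NegatibleCost CostY) {
  return CostX <= CostY;
}
```

With this ordering the check matches the intuition "pick the lower cost", rather than requiring the reader to remember that a larger value is better.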
Craig Topper 3043093822 [CallSite removal][CodeGen] Replace ImmutableCallSite with CallBase in isInTailCallPosition. 2020-04-13 23:04:57 -07:00
Craig Topper 113f37a1f9 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Matt Arsenault e6605a209c DAG: Fix wrong legality check for ISD::FMAD
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the plain isOperationLegal.

This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.

The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
2020-04-13 10:25:39 -07:00
Craig Topper dbb272b0a3 [CallSite removal][FastISel] Use CallBase instead of CallSite in fastLowerCall. 2020-04-12 18:02:24 -07:00
Craig Topper 95192f548d [CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface.
Differential Revision: https://reviews.llvm.org/D77929
2020-04-12 11:26:25 -07:00
Jonathan Roelofs 41f13f1f64 reland: [DAG] Fix PR45049: LegalizeTypes crash
Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG
does, leading to accidental SDValue removal before its reference count was
truly zero.

Differential Revision: https://reviews.llvm.org/D76994

Reviewed-By: bjope

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049

Reverted in 3ce77142a6 because the previous patch
broke the expensive-checks bots. The new patch removes the broken check.
2020-04-12 09:52:17 -06:00
Craig Topper 5b42399029 [CallSite removal][FastISel] Remove uses of CallSite.
Differential Revision: https://reviews.llvm.org/D77933
2020-04-11 20:52:45 -07:00
Craig Topper 806763efcf [CallSite removal][SelectionDAGBuilder] Use CallBase instead of ImmutableCallSite in visitPatchpoint.
Differential Revision: https://reviews.llvm.org/D77932
2020-04-11 13:07:31 -07:00
Sanjay Patel 1318ddbc14 [VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.

While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
2020-04-11 10:05:49 -04:00
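What "narrowing" a shuffle mask means in 1318ddbc14 can be shown with a standalone sketch. It uses `std::vector` instead of LLVM's container types, and `narrowMaskSketch` is a hypothetical name; the overflow assert mentioned in the commit is not modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the renamed helper: each mask element that
// selects one wide element is replaced by Scale consecutive elements that
// select the corresponding narrow elements. Negative (sentinel) entries
// are repeated unchanged.
inline std::vector<int> narrowMaskSketch(int Scale,
                                         const std::vector<int> &Mask) {
  assert(Scale > 0 && "scale factor must be positive");
  std::vector<int> Narrowed;
  Narrowed.reserve(Mask.size() * static_cast<std::size_t>(Scale));
  for (int M : Mask)
    for (int I = 0; I < Scale; ++I)
      Narrowed.push_back(M < 0 ? M : Scale * M + I);
  return Narrowed;
}
```

For example, narrowing the mask {1, 0, -1} by a factor of 2 gives {2, 3, 0, 1, -1, -1}: wide element 1 becomes narrow elements 2 and 3, wide element 0 becomes 0 and 1, and the undefined lane stays undefined.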
Craig Topper 9c1842d8af Change FastISel::CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This is the same as what was done to the CallLoweringInfo in
TargetLowering.h in r309159.

This is just a step on the way to replacing this with CallBase.
2020-04-10 23:45:36 -07:00
Craig Topper f49f6cf91e [CallSite removal][SelectionDAGBuilder] Remove most CallSite usage from visitInlineAsm.
I only left it at the interface to ParseConstraints since that
needs updates to other callers in different files. I'll do that
as a follow up.

Differential Revision: https://reviews.llvm.org/D77892
2020-04-10 19:23:33 -07:00
Christopher Tetreault 889f6606ed Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: stoklund, sdesmalen, efriedma

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77272
2020-04-10 14:53:43 -07:00
Simon Pilgrim a88cc20456 ProfileSummaryInfo.h - remove unnecessary includes. NFC
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.

This exposed a couple of source files that were implicitly relying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.
2020-04-10 16:25:48 +01:00
Serguei Katkov 4275eb1331 Re-land [Codegen/Statepoint] Allow usage of registers for non gc deopt values.
The change introduces the usage of physical registers for non-gc deopt values.
This requires runtime support to know how to take a value from a register.
By default usage is off and can be switched on by option.

The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77797
2020-04-10 10:13:39 +07:00
Serguei Katkov 44f0d7f136 Revert "[Codegen/Statepoint] Allow usage of registers for non gc deopt values."
This reverts commit a0275705bb.

It causes buildbot failures building LLVM with BUILD_SHARED_LIBS due to a linker error.
2020-04-09 18:24:47 +07:00
Serguei Katkov a0275705bb [Codegen/Statepoint] Allow usage of registers for non gc deopt values.
The change introduces the usage of physical registers for non-gc deopt values.
This requires runtime support to know how to take a value from a register.
By default usage is off and can be switched on by option.

The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.

Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77371
2020-04-09 16:57:35 +07:00
Jay Foad c63aed890e [KnownBits] Move AND, OR and XOR logic into KnownBits
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.

This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74060
2020-04-09 10:10:37 +01:00
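The shared AND/OR/XOR logic that c63aed890e moves into KnownBits follows standard known-bits rules, sketched below on a fixed 32-bit width. `KnownBits32` is a simplified stand-in written for this illustration; the real class is built on APInt and has a richer interface.

```cpp
#include <cstdint>

// Simplified 32-bit model: a set bit in Zero means that bit is known to
// be 0, a set bit in One means it is known to be 1. Bits set in neither
// are unknown.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// AND: a result bit is 0 if either input bit is 0, 1 only if both are 1.
inline KnownBits32 operator&(KnownBits32 L, KnownBits32 R) {
  return {L.Zero | R.Zero, L.One & R.One};
}
// OR: a result bit is 0 only if both input bits are 0, 1 if either is 1.
inline KnownBits32 operator|(KnownBits32 L, KnownBits32 R) {
  return {L.Zero & R.Zero, L.One | R.One};
}
// XOR: a result bit is known only when both input bits are known; it is
// 0 when they agree and 1 when they differ.
inline KnownBits32 operator^(KnownBits32 L, KnownBits32 R) {
  return {(L.Zero & R.Zero) | (L.One & R.One),
          (L.Zero & R.One) | (L.One & R.Zero)};
}
```

Expressing the rules once as operator overloads lets each client write `Known = LHS & RHS` instead of repeating the mask arithmetic.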
Matt Arsenault 586769cce2 DAG: Use Register 2020-04-08 13:44:31 -04:00
Matt Arsenault dcce3ef1d2 FastISel: Partially use Register
Doesn't try to convert the cases that depend on generated code.
2020-04-08 12:10:58 -04:00
Matt Arsenault aa26dd9858 CodeGen: Use Register in more places 2020-04-07 15:59:40 -04:00
Craig Topper c41685b16f [SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector.
This removes a call to getScalarType from a bunch of call sites.
It also makes the behavior consistent with SIGN_EXTEND_INREG.

Differential Revision: https://reviews.llvm.org/D77631
2020-04-07 11:34:08 -07:00
Matt Arsenault b281138a1b DAG: Use the correct getPointerTy in a few places
These should not be assuming address space 0. Calling getPointerTy is
generally the wrong thing to do, since you should already know the
type from the incoming IR.
2020-04-07 12:45:41 -04:00
Serguei Katkov b7e3759e17 [DAG] Consolidate require spill slot logic in lambda. NFC.
Move the logic whether lowering of deopt value requires a spill slot in
a separate lambda.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77629
2020-04-07 16:43:47 +07:00
Pierre-vh 4fc59a468f Revert "[CodeGen][SelectionDAG] Flip Booleans More Often"
This reverts commit 23342bdcc8.
2020-04-07 09:09:10 +01:00
Pierre-vh 23342bdcc8 [CodeGen][SelectionDAG] Flip Booleans More Often
Differential Revision: https://reviews.llvm.org/D77201
2020-04-07 08:19:57 +01:00
Nick Desaulniers 5bc291be71 [SelectionDAG] fix predecessor list for INLINEASM_BRs' parent
Summary:
A bug report mentioned that LLVM was producing jumps off the end of a
function when using "asm goto with outputs". Further digging pointed to
MachineBasicBlocks that had their address taken and were indirect
targets of INLINEASM_BR being removed by BranchFolder, because their
predecessor list was empty, so they appeared to have no entry.

This was a cascading failure caused earlier, during Pre-RA instruction
scheduling. We have a few special cases in Pre-RA instruction scheduling
where we split a MachineBasicBlock in two.  This requires careful
handling of predecessor and successor lists for a MachineBasicBlock that
was split, and careful handling of PHI MachineInstrs that referred to the
MachineBasicBlock before it was split.

The clue that led to this fix was the observation that many callers of
MachineBasicBlock::splice() frequently call
MachineBasicBlock::transferSuccessorsAndUpdatePHIs() to update their PHI
nodes after a splice. We don't want to reuse that method, as we have
custom successor transferring logic for this block split.

This patch fixes 2 pre-existing bugs, and adds tests.

The first bug was that MachineBasicBlock::splice() correctly handles
updating most successors and predecessors; we don't need to do anything
more than removing the previous fallthrough block from the first half of
the split block post splice. Previously, we were updating the successor
list incorrectly (updating successors updates predecessors).

The second bug was that PHI nodes that needed registers from the first
half of the split block were not having entries populated.  The register
live out information was correct, and the FuncInfo->PHINodesToUpdate was
correct. Specifically, the check in SelectionDAGISel::FinishBasicBlock:

    for (unsigned i = 0, e = FuncInfo->PHINodesToUpdate.size(); i != e; ++i) {
      MachineInstrBuilder PHI(*MF, FuncInfo->PHINodesToUpdate[i].first);
      if (!FuncInfo->MBB->isSuccessor(PHI->getParent()))
        continue;
      PHI.addReg(FuncInfo->PHINodesToUpdate[i].second).addMBB(FuncInfo->MBB);

was `continue`ing because FuncInfo->MBB tracks the second half of
the post-split block; no one was updating PHI entries for the first half
of the post-split block.

SelectionDAGBuilder::UpdateSplitBlock() already expects to perform
special handling for MachineBasicBlocks that were split post calls to
ScheduleDAGSDNodes::EmitSchedule(), so I'm confident that it's both
correct for ScheduleDAGSDNodes::EmitSchedule() to return the second half
of the split block `CopyBB` which updates `FuncInfo->MBB` (ie. the
current MachineBasicBlock being processed), and perform special handling
for this in SelectionDAGBuilder::UpdateSplitBlock().

Reviewers: void, craig.topper, efriedma

Reviewed By: void, efriedma

Subscribers: hfinkel, fhahn, MatzeB, efriedma, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76961
2020-04-06 13:46:39 -07:00
Craig Topper 07ed1fb597 [SelectionDAGBuilder] Fix ISD::FREEZE creation for structs with fields of different types.
The previous code used the type of the first field for the VT
passed to getNode for every field.

I've based the implementation here off what is done in visitSelect
as it removes the need to special case aggregates.

Differential Revision: https://reviews.llvm.org/D77093
2020-04-06 11:03:40 -07:00
Matt Arsenault 70726cec5b DAG: Combine extract_vector_elt of concat_vectors
Fixes extra canonicalize regressions when legalizing
vector fminnum/fmaxnum.
2020-04-06 09:26:29 -04:00
Guillaume Chatelet ff858d7781 [Alignment][NFC] Add DebugStr and operator*
Summary:
This is a roll forward of D77394 minus AlignmentFromAssumptions (which needs to be addressed separately)
Differences from D77394:
 - DebugStr() now prints the alignment value or `None` and no more `Align(x)` or `MaybeAlign(x)`
   - This is to keep Warning message consistent (CodeGen/SystemZ/alloca-04.ll)
 - Removed a few unneeded headers from Alignment (since it's included everywhere it's better to keep the dependencies to a minimum)

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77537
2020-04-06 12:09:45 +00:00
Guillaume Chatelet 6000478f39 Revert "[Alignment][NFC] Add DebugStr and operator*"
This reverts commit 1e34ab98fc.
2020-04-06 07:55:25 +00:00
Guillaume Chatelet 1e34ab98fc [Alignment][NFC] Add DebugStr and operator*
Summary:
Also updates files to use them.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77394
2020-04-06 07:12:46 +00:00
Craig Topper 97e57f3b24 [DAGCombiner] Use getAnyExtOrTrunc instead of getSExtOrTrunc in the zext(setcc) combine.
We're ANDing with 1 right after which will cause the SIGN_EXTEND to
be combined to ANY_EXTEND later. Might as well just start with an
ANY_EXTEND.

While there, create the AND using the getZeroExtendInReg
helper to remove the need to explicitly create the VecOnes constant.
2020-04-05 22:44:45 -07:00
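The reasoning in 97e57f3b24 (an AND with 1 makes the choice of extension irrelevant) can be checked with a small standalone sketch. It uses plain integers rather than SelectionDAG nodes, and the helper names `signExtend8`, `anyExtend8`, and `maskLowBit` are hypothetical, not LLVM APIs. An any-extend is modeled here by letting the caller pick arbitrary upper bits.

```cpp
#include <cstdint>

// Sign-extend an 8-bit value to 32 bits.
uint32_t signExtend8(uint8_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int8_t>(v)));
}

// Model of an any-extend: the upper 24 bits are unspecified, so the caller
// supplies arbitrary "garbage" for them (assumption made for illustration).
uint32_t anyExtend8(uint8_t v, uint32_t garbageUpper) {
  return (garbageUpper << 8) | v;
}

// The AND with 1 that follows the extension in the combine.
uint32_t maskLowBit(uint32_t v) { return v & 1u; }
```

Whatever the upper bits are, `maskLowBit` discards them, so both extensions give the same final value for any input.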
Craig Topper 586c051a27 [DAGCombiner] Replace a hardcoded constant in visitZERO_EXTEND with a proper check for the condition it's trying to protect.
This code is replacing a shift with a new shift on an extended type.
If the shift amount type can't represent the maximum shift amount
for the new type, the amount needs to be extended to a type that
can.

Previously, the code just hardcoded a check for 256 bits which
seems to have been an assumption that the original shift amount
was MVT::i8. But that seems more catered to a specific target
like X86 that uses i8 as its legal shift amount type. Other
targets may use different types.

This commit changes the code to look at the real type of the shift
amount and makes sure it has enough bits for the Log2 of the
new type. There are similar checks to this in SelectionDAGBuilder
and LegalizeIntegerTypes.
2020-04-05 20:35:57 -07:00
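The condition described in 586c051a27 can be sketched outside of LLVM as follows. This is only an illustration of the arithmetic, assuming that a value of N bits has valid shift amounts 0 to N-1 and therefore needs ceil(log2(N)) bits to represent them. The names `log2Ceil` and `shiftAmountTypeIsWideEnough` are hypothetical and are not the functions used in the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest k such that 2^k >= n, for n >= 1.
unsigned log2Ceil(uint32_t n) {
  assert(n >= 1 && "log2Ceil requires a positive value");
  unsigned k = 0;
  while ((uint64_t{1} << k) < n)
    ++k;
  return k;
}

// True if a shift-amount type with amtBits bits can represent every valid
// shift amount (0 .. valueBits-1) of a value that is valueBits wide.
bool shiftAmountTypeIsWideEnough(unsigned amtBits, unsigned valueBits) {
  return amtBits >= log2Ceil(valueBits);
}
```

Under this model an 8-bit amount type is sufficient for values up to 256 bits wide, while a narrower amount type stops being sufficient at much smaller widths, which is why the check depends on the real amount type and not on a fixed constant.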
Zuojian Lin a58c8a7866 Remove the additional constant which requires an extra register for statepoint lowering.
The newly-created constant zero will need an extra register to hold it
in the current statepoint lowering implementation. Remove it if there
exists one.
2020-04-05 11:22:09 -04:00
Jonathan Roelofs 3ce77142a6 Revert "[DAG] Fix PR45049: LegalizeTypes crash"
This reverts commit 17673ae0b2.
2020-04-04 13:47:22 -06:00
Jonathan Roelofs 17673ae0b2 [DAG] Fix PR45049: LegalizeTypes crash
Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG
does, leading to accidental SDValue removal before its reference count was
truly zero.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049

https://reviews.llvm.org/D76994
2020-04-04 13:36:22 -06:00
Matt Arsenault 30ebafaa56 CodeGen: Convert some TII hooks to use Register 2020-04-03 14:52:54 -04:00
Guillaume Chatelet 9068bccbae [Alignment][NFC] Deprecate InstrTypes getRetAlignment/getParamAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77312
2020-04-03 13:21:58 +00:00
Serguei Katkov bd1d70bf0e [DAG] Change isGCValue detection for statepoint lowering
isGCValue should detect whether the deopt value is a GC pointer.
Currently it checks by finding the value in SI.Bases and SI.Ptrs.
However, these data structures contain only those values which
have a corresponding gc.relocate call. So we can miss a GC value if it
does not have a gc.relocate call (dead after the call).

Instead, ask the GC strategy whether the pointer is a GC pointer, or
conservatively consider any pointer to be one.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77130
2020-04-03 12:36:13 +07:00
Simon Pilgrim b02c7a8152 Fix "result of 32-bit shift implicitly converted to 64 bits" MSVC warning. NFCI.
The shift of 1 by an amount that is never more than 31 means that the warning is a false positive, but the change is safe and fixes Werror builds.
2020-04-02 12:02:04 +01:00
Jessica Clarke 616289ed29 [LegalizeTypes][RISCV] Correctly sign-extend comparison for ATOMIC_CMP_XCHG
Summary:
Currently, the comparison argument used for ATOMIC_CMP_XCHG is legalised
with GetPromotedInteger, which leaves the upper bits of the value
undefined. Since this is used for comparing in an LR/SC loop with a
full-width comparison, we must sign extend it. We introduce a new
getExtendForAtomicCmpSwapArg to complement getExtendForAtomicOps, since
many targets have compare-and-swap instructions (or pseudos) that
correctly handle an any-extend input, and the existing function
determines the extension of the result, whereas we are concerned with
the input.

This is related to https://reviews.llvm.org/D58829, which solved the
issue for ATOMIC_CMP_SWAP_WITH_SUCCESS, but not the simpler
ATOMIC_CMP_SWAP.

Reviewers: asb, lenary, efriedma

Reviewed By: asb

Subscribers: arichardson, hiraditya, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, jfb, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, evandro, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74453
2020-04-01 15:51:26 +01:00
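The failure mode fixed by 616289ed29 can be modeled with plain integers. The sketch below assumes a 64-bit register holding a 32-bit value, with the loaded value sign-extended before a full-width compare; that framing is an assumption made for illustration, and the helper names are hypothetical. An any-extended operand is modeled by letting the caller choose the upper bits.

```cpp
#include <cstdint>

// Model of a 32-bit load whose result is sign-extended into a 64-bit register
// (assumed behaviour of the load inside the LR/SC loop).
int64_t loadSignExtended(uint32_t mem) {
  return static_cast<int64_t>(static_cast<int32_t>(mem));
}

// Model of an any-extended operand: the upper 32 bits are unspecified, so the
// caller supplies them.
int64_t anyExtend(uint32_t v, uint32_t garbageUpper) {
  return static_cast<int64_t>((uint64_t{garbageUpper} << 32) | v);
}

// Explicit sign extension of the comparison operand.
int64_t signExtend(uint32_t v) {
  return static_cast<int64_t>(static_cast<int32_t>(v));
}

// Full-width comparison, as performed in the loop.
bool fullWidthEqual(int64_t a, int64_t b) { return a == b; }
```

With `mem == expected == 0x80000000`, the any-extended operand with zero upper bits compares unequal to the sign-extended load even though the low 32 bits match, whereas the sign-extended operand compares equal.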
Guillaume Chatelet 1dffa2550b [Alignment][NFC] Transition to MachineFrameInfo::getObjectAlign()
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77215
2020-04-01 14:08:28 +00:00
Guillaume Chatelet 3a78f44daf [Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77212
2020-04-01 13:22:11 +00:00
Guillaume Chatelet c7468c1696 [Alignment][NFC] Use Align in SelectionDAG::getMemIntrinsicNode
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, nemanjai, hiraditya, kbarton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77149
2020-04-01 09:32:05 +00:00
Qiu Chaofan 95bcab8272 [DAGCombiner] Require ninf for sqrt recip estimation
Currently, DAG combiner uses (fmul (rsqrt x) x) to estimate square
root of x. However, this method would return NaN if x is +Inf, which
is incorrect.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76853
2020-04-01 16:23:43 +08:00
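The +Inf problem named in 95bcab8272 follows directly from IEEE 754 arithmetic and can be reproduced with a short sketch. An exact `1.0 / sqrt(x)` stands in for the hardware reciprocal square root estimate here, which is an assumption; `rsqrtExact` and `sqrtViaRsqrt` are hypothetical names.

```cpp
#include <cmath>
#include <limits>

// Stand-in for a reciprocal square root estimate (exact here, for simplicity).
double rsqrtExact(double x) { return 1.0 / std::sqrt(x); }

// Square root computed as x * rsqrt(x), the form used by the estimate.
double sqrtViaRsqrt(double x) { return x * rsqrtExact(x); }
```

For `x = +Inf`, `rsqrtExact` returns 0 and `Inf * 0` is NaN, while `std::sqrt(+Inf)` is `+Inf`. For ordinary finite inputs such as 4.0 the two forms agree, which is why the transform is only valid when infinities can be ruled out.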
Craig Topper f92563f907 [VectorUtils][X86] De-templatize scaleShuffleMask and 2 X86 shuffle mask helpers and move their implementation to cpp files
Summary: These were templated due to SelectionDAG using int masks for shuffles and IR using unsigned masks for shuffles. But now that D72467 has landed we have an int mask version of IRBuilder::CreateShuffleVector. So just use int instead of a template

Reviewers: spatel, efriedma, RKSimon

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D77183
2020-04-01 00:46:48 -07:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Guillaume Chatelet 998118c3d3 [Alignment][NFC] Deprecate MachineMemOperand::getMachineMemOperand version that takes an untyped alignment.
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77138
2020-03-31 16:05:31 +00:00
Guillaume Chatelet b9810988b2 [Alignment][NFC] Transitioning more getMachineMemOperand call sites
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77127
2020-03-31 11:04:10 +00:00
Denis Antrushin 47107dc3bd [Statepoint] Fix StatepointLoweringInfo::GCTransitionArgs initialization
Summary:
In method SelectionDAGBuilder::LowerStatepoint, array SI.GCTransitionArgs
is initialized from the wrong part of the ImmutableStatepoint class.
We copy gc args instead of transition args.

Reviewers: reames, skatkov

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77075
2020-03-31 11:45:06 +03:00
Guillaume Chatelet c9d5c19597 [Alignment][NFC] Transitioning more getMachineMemOperand call sites
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77121
2020-03-31 08:36:18 +00:00
Craig Topper 2a07221cf3 [SelectionDAG] Add an assert that the input VT and output VT for ISD::FREEZE are the same.
Differential Revision: https://reviews.llvm.org/D77092
2020-03-30 23:21:58 -07:00
Juneyoung Lee 519f5c3796 [LegalizeTypes] Add SoftenFloatRes_FREEZE
Summary: This adds SoftenFloatRes_FREEZE.

Reviewers: bkramer, JamesNagurne, craig.topper, efriedma

Reviewed By: craig.topper

Subscribers: AbigailLinden, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76980
2020-03-31 10:16:38 +09:00
Nick Desaulniers f086941765 [SelectionDAGISel] small cleanup to INLINEASM_BR selection. NFC
Summary:
This code was throwing away the opcode for a boolean, which was then
reconstructing the opcode from that boolean.  Just pass the opcode, and
forget the boolean.

Reviewers: srhines

Reviewed By: srhines

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77100
2020-03-30 15:32:06 -07:00
Jakub Kuderski 77ce2e21a8 [AMDGPU] Add Relocation Constant Support
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.

The intrinsic takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.

`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.

Author: csyonghe <yonghe@google.com>

Reviewers: tpr, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76440
2020-03-30 13:49:20 -04:00
Guillaume Chatelet bdf77209b9 [Alignment][NFC] Use Align version of getMachineMemOperand
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77059
2020-03-30 15:46:27 +00:00
Guillaume Chatelet 01ba2ad9ef [Alignment][NFC] Provide tightened up functions in SelectionDAG, MachineFunction and MachineMemOperand
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77046
2020-03-30 13:03:27 +00:00
Guillaume Chatelet b91535f6c7 [Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment
Summary:
Also deprecate getOriginalAlignment; getAlignment will take much more time as it is pervasive through the codebase (including TableGened files).

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76933
2020-03-30 07:26:48 +00:00
Reid Kleckner e5bf5037d8 [CodeGen] Fix sinking local values in lpads with phis
There was already a test case for landingpads to handle this case, but I
had forgotten to consider PHI instructions preceding the EH_LABEL in the
landingpad.

PR45261
2020-03-28 11:10:33 -07:00
Nemanja Ivanovic 4821411347 [DAGCombine] Fix splitting indexed loads in ForwardStoreValueToDirectLoad()
In DAGCombiner::visitLOAD() we perform some checks before breaking up an indexed
load. However, we don't do the same checking in ForwardStoreValueToDirectLoad()
which can lead to failures later during combining
(see: https://bugs.llvm.org/show_bug.cgi?id=45301).

This patch just adds the same checks to this function as well.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45301

Differential revision: https://reviews.llvm.org/D76778
2020-03-27 18:03:47 -05:00
Guillaume Chatelet 74eac9031a [Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76925
2020-03-27 15:49:13 +00:00
Juneyoung Lee 1bcc500b48 [DAGCombine] Add basic optimizations for FREEZE in SelDag
Summary: This patch is the first effort at adding basic optimizations for FREEZE in SelDag.

Reviewers: spatel, lebedev.ri

Reviewed By: spatel

Subscribers: xbolva00, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76707
2020-03-27 12:20:39 +09:00
Craig Topper 9f7d4150b9 [X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelectionDAG.
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.

This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.

An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.

Differential Revision: https://reviews.llvm.org/D76649
2020-03-26 14:10:20 -07:00
Qiu Chaofan 172456c775 [Legalizer] Fix some flags miss in vector results
In some scalarize/split result methods (unary, binary, ...), flags in
SDNode were not passed down, which may lead to unexpected results in
unsafe floating-point optimization. This patch fixes them. (maybe not
complete)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D76832
2020-03-26 22:01:19 +08:00
Benjamin Kramer 0019c2f194 [SelectionDAG] Don't crash when freezing illegal float types 2020-03-24 19:45:19 +01:00
Juneyoung Lee 7802be4a3d [SelDag] Add FREEZE
Summary:
- Add FREEZE node to SelDag
- Lower FreezeInst (in IR) to FREEZE node
- Add Legalization for FREEZE node

Reviewers: qcolombet, bogner, efriedma, lebedev.ri, nlopes, craig.topper, arsenm

Reviewed By: lebedev.ri

Subscribers: wdng, xbolva00, Petar.Avramovic, liuz, lkail, dylanmckay, hiraditya, Jim, arsenm, craig.topper, RKSimon, spatel, lebedev.ri, regehr, trentxintong, nlopes, mkuper, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D29014
2020-03-24 23:04:58 +09:00
Matt Arsenault aa63eb6a46 GlobalISel: Add computeKnownBitsForTargetInstr
I think we can save the MRI argument from these since it's in
GISelKnownBits already, but currently not accessible.

Implementation deferred to avoid dependency on other patches.
2020-03-23 15:02:30 -04:00
Sanjay Patel 0eeee83d75 [VectorUtils] move x86's scaleShuffleMask to generic VectorUtils
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)

This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats mask values as signed.

The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).

Differential Revision: https://reviews.llvm.org/D76508
2020-03-23 09:58:55 -04:00
Guillaume Chatelet 3ba550a05a [Alignment][NFC] Use TFL::getStackAlign()
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76551
2020-03-23 13:48:29 +01:00
Jay Foad 7cdbf1ed4b Make use of APInt::countLeadingOnes. NFC. 2020-03-23 09:08:20 +00:00
Sam Parker 62fdb1f534 [DAGCombine] Skip PostInc combine with later users
When deciding whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any proceed
the current node, then don't do the combine.
The change only seems to be affecting the Arm backend, which I was
surprised at, but it appears to fix a lot of our issues around MVE
masked load/stores having to store a temporary address after an early
post-increment on a shared base address.

Differential Revision: https://reviews.llvm.org/D75847
2020-03-23 08:39:53 +00:00
Sam Parker 8e45eaf1da [NFC][DAGCombine] Refactor post-inc logic
Extract the decision to combine into a post-inc address into a
couple of functions to make the logic more clear and re-usable.

Differential Revision: https://reviews.llvm.org/D76060
2020-03-23 08:32:20 +00:00
Qiu Chaofan 763871053c [DAGCombiner] Require nsz for aggressive fma fold
For folding pattern `x-(fma y,z,u*v) -> (fma -y,z,(fma -u,v,x))`, if
`yz` is 1, `uv` is -1 and `x` is -0, sign of result would be changed.

Differential Revision: https://reviews.llvm.org/D76419
2020-03-22 23:10:07 +08:00
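The sign change described in 763871053c can be reproduced with `std::fma` on doubles. The sketch below writes out both sides of the fold as ordinary functions; `original` and `folded` are hypothetical names, and default round-to-nearest mode is assumed.

```cpp
#include <cmath>

// x - fma(y, z, u*v): the expression before the fold.
double original(double x, double y, double z, double u, double v) {
  return x - std::fma(y, z, u * v);
}

// fma(-y, z, fma(-u, v, x)): the expression after the fold.
double folded(double x, double y, double z, double u, double v) {
  return std::fma(-y, z, std::fma(-u, v, x));
}
```

Taking `y = z = u = 1`, `v = -1`, and `x = -0.0` gives `y*z = 1` and `u*v = -1`, as in the commit message. The original form computes `-0.0 - (+0.0)`, which is `-0.0`, while the folded form computes `-1 + 1`, which is `+0.0`. The values compare equal but their sign bits differ, so the fold is only safe when signed zeros may be ignored.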
Simon Pilgrim c5fd9e3888 [DAG] Don't permit EXTLOAD when combining FSHL/FSHR consecutive loads (PR45265)
Technically we can permit EXTLOAD of the LHS operand but only if all the extended bits are shifted out. Until we have test coverage for that case, I'm just disabling this to fix PR45265.
2020-03-21 10:52:41 +00:00
Pirama Arumuga Nainar edcfb47ff6 [DAGCombiner] Do not fold truncate(build_vector(..)) if it creates an illegal type
Summary:
It can be the case that a vector type is legal but the corresponding
scalar type is not legal for an architecture (i8 vs. v16i8 on AArch64).
Check if the scalar type created when folding
  truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))

is legal if we are running after the type legalizer.

This fixes https://github.com/android/ndk/issues/1207.

Reviewers: RKSimon, srhines

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76312
2020-03-20 09:20:16 -07:00
Bjorn Pettersson d168b77780 [DAGCombiner] Fix non-determinism problem related to argument evaluation order in visitFDIV
Summary:
For some reason the order in which we call getNegatedExpression
for the involved operands, after a call to isCheaperToUseNegatedFPOps,
seems to matter. This patch includes a new test case in
test/CodeGen/X86/fdiv.ll that crashes if we reverse the order of
those calls. Before this patch that could happen depending on
which compiler was used when building LLVM. With my GCC
version (7.4.0) I got the crash, because it seems like it is
using a different order for the argument evaluation compared
to clang.

All other users of isCheaperToUseNegatedFPOps already used this
pattern with unfolded/ordered calls to getNegatedExpression, so
this patch is aligning visitFDIV with the other use cases.

This patch simply deals with the non-determinism for FDIV, while
the underlying problem with getNegatedExpression is discussed
further in D76439.

Reviewers: spatel, RKSimon

Reviewed By: spatel

Subscribers: hiraditya, mgrang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76319
2020-03-20 16:11:17 +01:00
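The compiler dependence mentioned in d168b77780 comes from a general C++ rule: the order in which function arguments are evaluated is unspecified. The sketch below shows the pattern with a hypothetical side-effecting helper `negate` that records the order of its calls. None of these names come from LLVM; they only stand in for calls such as getNegatedExpression.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a call with side effects: records its call order.
int negate(std::vector<std::string> &log, const std::string &name, int v) {
  log.push_back(name);
  return -v;
}

int divide(int a, int b) { return a / b; }

// The two negate() calls are function arguments, so their relative order is
// unspecified and may differ between compilers.
int unorderedCalls(std::vector<std::string> &log, int a, int b) {
  return divide(negate(log, "N0", a), negate(log, "N1", b));
}

// Hoisting the calls into separate statements fixes the order: N0, then N1.
int orderedCalls(std::vector<std::string> &log, int a, int b) {
  int n0 = negate(log, "N0", a);
  int n1 = negate(log, "N1", b);
  return divide(n0, n1);
}
```

Both functions return the same value here, but only `orderedCalls` guarantees the contents of `log`. When the side effects matter, the sequenced form behaves the same regardless of which compiler built the code.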
Sanjay Patel 56da41393d [SDAG] reduce code duplication in getNegatedExpression(); NFCI 2020-03-19 13:55:15 -04:00
Cullen Rhodes 5ce38fcbac [ValueTypes] Add support for scalable EVTs
Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
  add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
  ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
  `EVT::getExtendedVectorElementCount` - latter is currently unused.

Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75672
2020-03-19 11:04:15 +00:00
Craig Topper c69a4d6bef [SelectionDAG] When splitting gathers/scatters in type legalization, set MMO size to UnknownSize
Gather/scatter don't access one memory location, they access multiple disjoint locations. So using a fixed size isn't accurate. But we don't have a way to represent the true behavior so just use UnknownSize.

Previously we "split" the memory VT and used that size for the MMO of each half. But the memory VT is scalar so splitting usually just returned the original scalar VT, but on 32-bit X86 if the scalar VT was i64 it probably returned i32?

Differential Revision: https://reviews.llvm.org/D76388
2020-03-18 16:07:15 -07:00
Craig Topper 498b53890d [SelectionDAGBuilder][FPEnv] Take into account SelectionDAG continuous CSE when setting the nofpexcept flag for constrained intrinsics
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.

This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.

Differential Revision: https://reviews.llvm.org/D75224
2020-03-18 13:37:17 -07:00
QingShan Zhang d577193c0f [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d
If it is a*b-c*d, it could be also folded into fma(a, b, -c*d) or fma(-c, d, a*b).
This patch is trying to respect the uses of a*b and c*d to make the best choice.

Differential Revision: https://reviews.llvm.org/D75982
2020-03-18 03:34:27 +00:00
Simon Pilgrim 68224c1952 [TargetLowering] Only demand a rotation's modulo amount bits
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.

Differential Revision: https://reviews.llvm.org/D76201
2020-03-17 21:23:46 +00:00
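The observation in 68224c1952 can be illustrated with a 32-bit rotate written in plain C++. Since 32 is a power of two, reducing the amount modulo 32 is the same as keeping its low 5 bits, so the upper bits of the amount never influence the result. `rotl32` is a hypothetical helper for illustration, not an LLVM function.

```cpp
#include <cstdint>

// Rotate left by amt modulo 32. Only the low 5 bits of amt are used.
uint32_t rotl32(uint32_t x, uint32_t amt) {
  amt &= 31u;              // modulo 32 == mask with (bitwidth - 1)
  if (amt == 0)
    return x;              // avoid an out-of-range shift by 32 below
  return (x << amt) | (x >> (32u - amt));
}
```

Rotating by 33 gives the same result as rotating by 1, and rotating by 32 returns the input unchanged, so a demanded-bits analysis needs to look at only the low bits of the amount operand.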
Craig Topper 98369178bc [SelectionDAGBuilder] Don't set MachinePointerInfo for gather when we find a uniform base
I believe we were previously calculating a pointer info with the scalar base and an offset of 0. But that's not really where the gather is pointing. The offset is a function of the indices of the GEP we looked through.

Also set the size of the MachineMemOperand to UnknownSize

Differential Revision: https://reviews.llvm.org/D76157
2020-03-17 11:03:45 -07:00
Simon Pilgrim c9656a3b31 [DAGCombiner] matchRotateSub - handle shift amount truncation
Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified by getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well.

This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.
2020-03-17 16:01:23 +00:00
Simon Pilgrim 2b3b453a82 [TargetLowering] Only demand a funnelshift's modulo amount bits
ISD::FSHL/FSHR shift amount values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
2020-03-16 13:52:17 +00:00
Simon Pilgrim 5641804298 [DAG] MatchRotate - Add funnel shift by variable support
Followup to D75114, this patch reuses the existing MatchRotate ROTL/ROTR rotation pattern code to also recognize the more general FSHL/FSHR funnel shift patterns when we have variable shift amounts, matched with MatchFunnelPosNeg which acts in an (almost) equivalent manner to MatchRotatePosNeg.
2020-03-15 11:50:45 +00:00
Brian Cain ad7b930bd1 Initialize IsFast* values
We must initialize these values in case some targets do not assign to
them in allowsMemoryAccess().
2020-03-13 17:46:32 -05:00
Craig Topper 431df3d873 [SelectionDAGBuilder] Simplify the struct type handling in getUniformBase. 2020-03-13 14:00:21 -07:00
QingShan Zhang e601196833 [NFC][DAGCombine] Move the fold of a*b-c and a-b*c into lambda function
This will help the review of https://reviews.llvm.org/D75982. It is
a simple code refactor.
2020-03-13 02:35:46 +00:00
Simon Pilgrim 2a2d242017 [DAGCombine] foldVSelectOfConstants - ensure constants are same type
Fix bug identified by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21167, foldVSelectOfConstants must ensure that the 2 build vectors have scalars of the same type before trying to compare APInt values.
2020-03-12 20:02:05 +00:00
Thomas Lively 4e589e6c26 [WebAssembly] Fix SIMD shift unrolling to avoid assertion failure
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76043
2020-03-12 12:20:14 -07:00
Andrzej Warzynski 46b9f14d71 [AArch64][SVE] Add intrinsics for non-temporal scatters/gathers
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
  * aarch64_sve_ldnt1_gather_index
  * aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.

As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.

The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
  * GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
  * SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1

The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D75601
2020-03-12 13:55:56 +00:00
Tres Popp bbe6764711 Remove unused variable.
Delete dead code from 8fffa40400.
2020-03-12 08:42:57 +01:00
Philip Reames 8fffa40400 [GC] Remove redundant entries in stackmap section (and test it this time)
This is a reimplementation of the optimization removed in D75964. The actual spill/fill optimization is handled by D76013, this one just worries about reducing the stackmap section size itself by eliminating redundant entries. As noted in the comments, we could go a lot further here, but avoiding the degenerate invoke case as we did before is probably "enough" in practice.

Differential Revision: https://reviews.llvm.org/D76021
2020-03-11 21:24:48 -07:00
Bill Wendling 6aebf0ee56 Specify branch probabilities for callbr dests
Summary:
callbr's indirect branches aren't expected to be taken, so reduce their
probabilities to 0 while increasing the default destination to 1. This
allows some code improvements through block placement.

Reviewers: nickdesaulniers

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72656
2020-03-11 20:33:48 -07:00
Philip Reames 8f997b4f01 [GC] Loosen ordering on statepoint reloads to allow CSE
We just removed a broken duplicate elimination algorithm in D75964, and after landing that it occurred to me that duplicate elimination is simply CSE. SelectionDAG has built-in CSE, so why wasn't that triggering? Well, it turns out we were overly conservative in the memory states for our reloads, and CSE (rightly) considers the incoming memory state for a load part of the identity of the load.

By loosening the chain and allowing reordering, we also allow CSE. As shown in the test case, doing iterative CSE as we go is enough to eliminate duplicate stores in later statepoints as well. We key our (block local) slot map by SDValue, so commoning a previous pair of loads at construction time means we also common following stores.

Differential Revision: https://reviews.llvm.org/D76013
2020-03-11 12:30:06 -07:00
Simon Pilgrim d8f9416fdc [DAG] MatchRotate - Add funnel shift by immediate support
This patch reuses the existing MatchRotate ROTL/ROTR rotation pattern code to also recognize the more general FSHL/FSHR funnel shift patterns when we have constant shift amounts.

Differential Revision: https://reviews.llvm.org/D75114
2020-03-11 18:55:18 +00:00
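As a rough illustration of the pattern this kind of matching looks for, the sketch below checks in plain Python that a shift/or pair with a constant amount computes the same value as a funnel shift left. The function names are hypothetical and this is not the MatchRotate implementation; it only models the arithmetic on `bw`-bit values for amounts strictly between 0 and `bw`.

```python
def fshl(x, y, c, bw):
    """Funnel shift left: high bw bits of (x:y) << (c % bw)."""
    mask = (1 << bw) - 1
    concat = ((x & mask) << bw) | (y & mask)
    return ((concat << (c % bw)) >> bw) & mask


def shift_or_pattern(x, y, c, bw):
    """The (x << c) | (y >> (bw - c)) form, valid for 0 < c < bw."""
    mask = (1 << bw) - 1
    return ((x << c) & mask) | ((y & mask) >> (bw - c))
```

When `x` and `y` are the same value, the pattern degenerates to a rotate left, which is why the rotation matching code can be reused for the more general case.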
Philip Reames e671641844 [GC] Remove buggy untested optimization from statepoint lowering
A downstream test case (see included reduced test) revealed that we have a bug in how we handle duplicate relocations. If we have the same SDValue relocated twice, and that value happens to be a constant (such as null), we only export one of the two llvm::Values. Exporting on a per llvm::Value basis is required to allow lowering of gc.relocates in following basic blocks (e.g. invokes). Without it, we end up with a use of an undefined vreg and bad things happen.

Rather than fixing the optimization - which appears to be hard - I propose we simply remove it. There are no tests in tree that change with this code removed. If we find out later that this did matter for something, we can reimplement a variation of this in CodeGenPrepare to catch the easy cases without complicating the lowering code.

Thanks to Denis and Serguei who did all the hard work of figuring out what went wrong here. The patch is by far the easy part. :)

Differential Revision: https://reviews.llvm.org/D75964
2020-03-11 10:03:24 -07:00
Simon Pilgrim e71fb46a8f [TargetLowering] SimplifyDemandedVectorElts - add DemandedElts mask to ISD::BITCAST SimplifyDemandedBits call.
This fixes most of the regressions introduced in the rG4bc6f6332028 bugfix. The vector-trunc.ll issue should be fixed by D66004.
2020-03-10 13:39:10 +00:00
Djordje Todorovic c15c68abdc [CallSiteInfo] Enable the call site info only for -g + optimizations
Emit call site info only in the case of '-g' + 'O>0' level.

Differential Revision: https://reviews.llvm.org/D75175
2020-03-09 12:12:44 +01:00
Simon Pilgrim 7202d9cde9 [DAG] Combine fshl/fshr(load1,load0,c) if we have consecutive loads
As noted on D75114, if both arguments of a funnel shift are consecutive loads we are missing the opportunity to combine them into a single load.

Differential Revision: https://reviews.llvm.org/D75624
2020-03-06 11:36:18 +00:00
QingShan Zhang 3906ae387f [DAGCombine] Check the uses of negated floating constant and remove the hack
PowerPC hits an assertion for much the same reason as https://reviews.llvm.org/D70975.
Although a hack is already in place, it still fails in some cases: when operand 0 is NOT
a const fp but another fma that has a const fp operand, and that const fp is negated, which results in multiple uses.

A better fix is to check the uses of the negated const fp. If its negated value already has
uses, negating is beneficial because no extra node is added.

Differential revision: https://reviews.llvm.org/D75501
2020-03-05 03:42:50 +00:00
Sanjay Patel 29a2b20ab3 [SDAG] simplify FP binops to undef
As discussed in the commit thread for rGa253a2a and D73978, we can do more undef folding for FP ops.
The nnan and ninf fast-math-flags specify that if an operand is the disallowed value, the result is
poison, so we can produce an undef result.

But this doesn't work as expected (the undef operand cases remain) because of a Flags propagation
problem in SelectionDAGBuilder.

I've added DAGCombiner calls to enable these for the other cases because we've shown in other
patches that (because of the limited way that SDAG iterates), it is possible to miss simplifications
like this if they are done only at node creation time.

Several potential follow-ups to expand on this patch are possible.

Differential Revision: https://reviews.llvm.org/D75576
2020-03-04 10:42:16 -05:00
Craig Topper d8ad7cc088 [DAGCombiner][X86] Improve narrowExtractedVectorLoad to handle cases where the element size isn't byte sized but the subvector is.
Summary:
Follow up from D75377. If the subvector is byte sized and the
index is aligned to the subvector size, we can shrink the load.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: dbabokin, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75434
2020-03-03 08:41:31 -08:00
Jordan Rupprecht d7803c3832 Add default case to fix -Wswitch errors 2020-03-02 14:23:46 -08:00
Craig Topper adc69729ec [TargetLowering] Fix what look like copy/paste mistakes in compare with infinity handling SimplifySetCC.
I expect that the isCondCodeLegal checks should match that CC of
the node that we're going to create.

Rewriting to a switch to minimize repeated mentions of the same
constants.
2020-03-02 14:12:16 -08:00
Simon Pilgrim d20fb7ea13 Fix shadow variable warning. NFC. 2020-03-02 11:41:20 +00:00
Simon Pilgrim e4380b07cc Fix operator precedence warning. NFCI. 2020-03-02 10:56:58 +00:00
Craig Topper 0cd6712a7a [DAGCombiner][X86] Disable narrowExtractedVectorLoad if the element type size isn't byte sized
The address calculation for the offset assumes that you can calculate the offset by multiplying the index by the store size of the element. But that only works if the element's store size is exactly its real size since we store vectors tightly packed in memory. There are improvements we could make to this like special casing extracting element 0. I think we could also handle cases where the extracted VT is byte sized and the index is aligned with the extract element count.

Differential Revision: https://reviews.llvm.org/D75377
2020-03-01 18:13:25 -08:00
Craig Topper 211fb91f10 [DAGCombiner] Don't emit select_cc from visitSINT_TO_FP/visitUINT_TO_FP. Use plain select instead.
Select_cc isn't used by all targets. X86 doesn't have optimizations
for it.

Since we already know the input to the sint_to_fp/uint_to_fp is
a setcc we can just emit a plain select using that setcc as the
condition. Other DAG combines can turn that into a select_cc on
targets that support it.

Differential Revision: https://reviews.llvm.org/D75415
2020-03-01 10:52:17 -08:00
Sanjay Patel 619d7dc39a [DAGCombiner] recognize shuffle (shuffle X, Mask0), Mask --> splat X
We get the simple cases of this via demanded elements and other folds,
but that doesn't work if the values have >1 use, so add a dedicated
match for the pattern.

We already have this transform in IR, but it doesn't help the
motivating x86 tests (based on PR42024) because the shuffles don't
exist until after legalization and other combines have happened.
The AArch64 test shows a minimal IR example of the problem.

Differential Revision: https://reviews.llvm.org/D75348
2020-03-01 09:10:25 -05:00
David Green 1de1070559 [DAGCombine] Fix alias analysis for unaligned accesses
The alias analysis in DAG Combine looks at the BaseAlign, the Offset and
the Size of two accesses, and determines if they are known to access
different parts of memory by the fact that they are different offsets
from inside that "alignment window". It does not seem to account for
accesses that are not a multiple of the size, and may overflow from one
alignment window into another.

For example, in the test case we have a 19-byte memset that is split into
a 16-byte NEON store and an unaligned 4-byte store with a 15-byte
offset. This 15-byte offset (with a base align of 8) wraps around to the
next alignment window. When compared to an access that is at a 16-byte
offset (of the same 4-byte size and 8-byte base align), the two accesses
are said not to alias.

I've fixed this here by just ensuring that the offsets are a multiple of
the size, ensuring that they don't overlap by wrapping. Fixes PR45035,
which was exposed by the UseAA changes in the arm backend.

Differential Revision: https://reviews.llvm.org/D75238
2020-02-28 18:44:36 +00:00
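The numbers in the example above can be replayed with a small model. This is a simplified Python sketch of the "alignment window" reasoning, assuming equal access sizes, a shared base alignment, and non-negative offsets; the function names and the `require_size_multiple` flag are hypothetical and exist only to contrast the behaviour before and after the fix.

```python
def ranges_overlap(off0, size0, off1, size1):
    """Ground truth: do the byte ranges [off, off + size) intersect?"""
    return off0 < off1 + size1 and off1 < off0 + size0


def window_says_no_alias(off0, off1, size, base_align, require_size_multiple):
    """Return True when the alignment-window check would claim 'no alias'."""
    if off0 == off1 or base_align <= size:
        return False
    # The fix described above: only trust the window when both offsets
    # are multiples of the access size, so an access cannot wrap into
    # the next window.
    if require_size_multiple and (off0 % size != 0 or off1 % size != 0):
        return False
    a0 = off0 % base_align
    a1 = off1 % base_align
    return a0 + size <= a1 or a1 + size <= a0
```

With offsets 15 and 16, size 4, and base alignment 8, the unguarded check reports no alias even though bytes 16 to 18 are shared; the guarded check declines to answer.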
Simon Pilgrim 4bc6f63320 [TargetLowering] SimplifyDemandedBits - fix SCALAR_TO_VECTOR knownbits bug
We can only report the knownbits for a SCALAR_TO_VECTOR node if we only demand the 0'th element - the upper elements are undefined and shouldn't be trusted.

This is causing a number of regressions that need addressing but we need to get the bugfix in first.
2020-02-28 15:23:37 +00:00
serge-sans-paille 6d15c4deab No longer generate calls to *_finite
According to Joseph Myers, a libm maintainer

> They were only ever an ABI (selected by use of -ffinite-math-only or
> options implying it, which resulted in the headers using "asm" to redirect
> calls to some libm functions), not an API. The change means that ABI has
> turned into compat symbols (only available for existing binaries, not for
> anything newly linked, not included in static libm at all, not included in
> shared libm for future glibc ports such as RV32), so, yes, in any case
> where tools generate direct calls to those functions (rather than just
> following the "asm" annotations on function declarations in the headers),
> they need to stop doing so.

As a consequence, we should no longer assume these symbols are available on the
target system.

Still keep the TargetLibraryInfo for constant folding.

Differential Revision: https://reviews.llvm.org/D74712
2020-02-28 10:07:37 +01:00
Krzysztof Parzyszek fd7c2e24c1 [SDAG] Add SDNode::values() = make_range(values_begin(), values_end())
Also use it in a few places to simplify code a little bit.  NFC
2020-02-26 12:07:38 -06:00
Sanjay Patel b3d0c79836 [DAGCombiner] avoid narrowing fake fneg vector op
This may inhibit vector narrowing in general, but there's
already an inconsistency in the way that we deal with this
pattern as shown by the test diff.

We may want to add a dedicated function for narrowing fneg.
It's often folded into some other op, so moving it away from
other math ops may cause regressions that we would not see
for normal binops.

See D73978 for more details.
2020-02-26 11:25:56 -05:00
Simon Pilgrim bbb0933e3d [DAG] visitRotate - modulo non-uniform constant rotation amounts 2020-02-26 15:43:12 +00:00
Craig Topper 735d27dc40 [SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output to ISD::FLT_ROUNDS_
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.

This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've updated all in-tree targets to connect their chain through their lowering code.

Differential Revision: https://reviews.llvm.org/D75132
2020-02-25 16:58:23 -08:00
Roman Lebedev d20907d1de [Codegen] Revert rL354676/rL354677 and followups - introduced PR43446 miscompile
This reverts https://reviews.llvm.org/D58468
(rL354676, 44037d7a63),
and all and any follow-ups to that code block.

https://bugs.llvm.org/show_bug.cgi?id=43446
2020-02-25 20:30:12 +03:00
Bill Wendling 23c2a5ce33 Allow "callbr" to return non-void values
Summary:
Terminators in LLVM aren't prohibited from returning values. This means that
the "callbr" instruction, which is used for "asm goto", can support "asm goto
with outputs."

This patch removes all restrictions against "callbr" returning values. The
heavy lifting is done by the code generator. The "INLINEASM_BR" instruction's
a terminator, and the code generator doesn't allow non-terminator instructions
after a terminator. In order to correctly model the feature, we need to copy
outputs from "INLINEASM_BR" into virtual registers. Of course, those copies
aren't terminators.

To get around this issue, we split the block containing the "INLINEASM_BR"
right before the "COPY" instructions. This results in two cheats:

  - Any physical registers defined by "INLINEASM_BR" need to be marked as
    live-in into the block with the "COPY" instructions. This violates an
    assumption that physical registers aren't marked as "live-in" until after
    register allocation. But it seems as if the live-in information only
    needs to be correct after register allocation. So we're able to get away
    with this.

  - The indirect branches from the "INLINEASM_BR" are moved to the "COPY"
    block. This is to satisfy PHI nodes.

I've been told that MLIR can support this handily, but until we're able to
use it, we'll have to stick with the above.

Reviewers: jyknight, nickdesaulniers, hfinkel, MaskRay, lattner

Reviewed By: nickdesaulniers, MaskRay, lattner

Subscribers: rriddle, qcolombet, jdoerfert, MatzeB, echristo, MaskRay, xbolva00, aaron.ballman, cfe-commits, JonChesterfield, hiraditya, llvm-commits, rnk, craig.topper

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D69868
2020-02-24 18:29:06 -08:00
Craig Topper a5fa778882 [LegalizeTypes] Scalarize non-byte sized loads in WidenRecRes_Load and SplitVecResLoad
Should fix PR42803 and PR44902

Differential Revision: https://reviews.llvm.org/D74590
2020-02-24 15:14:33 -08:00
Simon Pilgrim 53b597cfa2 [SelectionDAG] Merge constant SDNode arithmetic into foldConstantArithmetic
This is the second patch as part of https://bugs.llvm.org/show_bug.cgi?id=36544

Merging in the ConstantSDNode variant of FoldConstantArithmetic. After this, I will begin merging in FoldConstantVectorArithmetic

I've ensured this patch can build & pass all lit tests in Windows and Linux environments.

Patch by @justice_adams (Justice Adams)

Differential Revision: https://reviews.llvm.org/D74881
2020-02-24 18:54:22 +00:00
Bevin Hansson 6e561d1c94 [Intrinsic] Add fixed point saturating division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:

```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```

These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.

Reviewers: bjope, leonardchan, craig.topper

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71550
2020-02-24 10:50:52 +01:00
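A scaled, saturating division of this kind can be modelled in a few lines. The following Python sketch is illustrative only: the function names are hypothetical, rounding toward zero is an assumption of the sketch (the intrinsics may round differently), and division by zero is simply rejected here.

```python
def sdiv_fix_sat(a, b, scale, bits):
    """Signed fixed-point division: (a << scale) / b, clamped to bits-wide range."""
    if b == 0:
        raise ZeroDivisionError("division by zero is not modelled")
    num = a << scale
    q = abs(num) // abs(b)       # truncate toward zero (assumption)
    if (num < 0) != (b < 0):
        q = -q
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, q))   # saturate instead of wrapping


def udiv_fix_sat(a, b, scale, bits):
    """Unsigned variant, clamped to [0, 2**bits - 1]."""
    if b == 0:
        raise ZeroDivisionError("division by zero is not modelled")
    return min((1 << bits) - 1, (a << scale) // b)
```

With a scale of 2, the value 6 represents 1.5 and the value 2 represents 0.5, so `sdiv_fix_sat(6, 2, 2, 8)` yields 12, which represents 3.0.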
Craig Topper 3a6bb32bd2 [SelectionDAG] Remove ISD::LIFETIME_START/LIFETIME_END from assert in getMemIntrinsicNode.
These appear to have their own SDNode type and shouldn't use
MemIntrinsicSDNode.
2020-02-23 22:32:36 -08:00
Sanjay Patel a253a2a793 [SDAG] fold fsub -0.0, undef to undef rather than NaN
A question about this behavior came up on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139003.html
...and as part of backend improvements in D73978.

We decided not to implement a more general change that would have
folded any FP binop with nearly arbitrary constant + undef operand
to undef because that is not theoretically correct (even if it is
practically correct).

This is the SDAG-equivalent to the IR change in D74713.
2020-02-23 11:36:53 -05:00
Francesco Petrogalli 31ec721516 [llvm][CodeGen] DAG Combiner folds for vscale.
Summary:
This patch simplifies the DAGs generated when using the intrinsic `@llvm.vscale.*` as follows:

* Fold (add (vscale * C0), (vscale * C1)) to (vscale * (C0 + C1)).
* Canonicalize (sub X, (vscale * C)) to (add X,  (vscale * -C)).
* Fold (mul (vscale * C0), C1) to (vscale * (C0 * C1)).
* Fold (shl (vscale * C0), C1) to (vscale * (C0 << C1)).

The test `sve-gep-ll` has been updated to reflect the folding introduced by this patch.

Reviewers: efriedma, sdesmalen, andwar, rengolin

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74782
2020-02-21 18:03:12 +00:00
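The four rewrites listed above are ordinary algebraic identities, which can be spot-checked with plain integers. This Python sketch is only a sanity check of the arithmetic, with a hypothetical function name; it does not model DAG nodes or fixed-width wraparound.

```python
def vscale_folds_hold(vscale, c0, c1, x):
    """Check each listed fold for one concrete vscale value (c1 must be >= 0)."""
    return all([
        # add (vscale * C0), (vscale * C1) -> vscale * (C0 + C1)
        vscale * c0 + vscale * c1 == vscale * (c0 + c1),
        # sub X, (vscale * C) -> add X, (vscale * -C)
        x - vscale * c0 == x + vscale * (-c0),
        # mul (vscale * C0), C1 -> vscale * (C0 * C1)
        (vscale * c0) * c1 == vscale * (c0 * c1),
        # shl (vscale * C0), C1 -> vscale * (C0 << C1)
        (vscale * c0) << c1 == vscale * (c0 << c1),
    ])
```

Each fold moves the constant arithmetic inside the multiplier, so a single `vscale * C` node remains.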
Simon Pilgrim 42ec6fdce9 [TargetLowering] Apply basic shift combines before recursive SimplifyDemandedBits calls.
Minor refactor/cleanup before we begin adding non-uniform support.
2020-02-21 16:31:20 +00:00
Simon Pilgrim 86c52af05a [TargetLowering] SimplifyDemandedBits - use getValidShiftAmountConstant helper.
Use the SelectionDAG::getValidShiftAmountConstant helper to get const/constsplat shift amounts, which allows us to drop the out of range shift amount early-out.

First step towards better non-uniform shift amount support in SimplifyDemandedBits.
2020-02-21 14:23:53 +00:00
Eli Friedman c767cf24e4 [SVE] Add support for lowering GEPs involving scalable vectors.
This includes both GEPs where the indexed type is a scalable vector, and
GEPs where the result type is a scalable vector.

Differential Revision: https://reviews.llvm.org/D73602
2020-02-20 13:45:41 -08:00
Simon Pilgrim f9c326364e [DAGCombiner] Use SDValue::getConstantOperandAPInt helper where possible. NFC. 2020-02-20 18:23:05 +00:00
Simon Pilgrim fc2b4a02b1 [DAGCombine] visitEXTRACT_VECTOR_ELT - add SimplifyDemandedBits multi use support
Similar to what we already do with SimplifyDemandedVectorElts, call SimplifyDemandedBits across all the extracted elements of the source vector, treating it as single use.

There's a minor regression in store-weird-sizes.ll which will be addressed in an upcoming SimplifyDemandedBits patch.
2020-02-20 15:49:38 +00:00
Djordje Todorovic 2f215cf36a Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGfaff707db82d.
A failure was found on an ARM 2-stage buildbot.
Investigation is needed.
2020-02-20 14:41:39 +01:00
Djordje Todorovic faff707db8 Reland "[DebugInfo] Enable the debug entry values feature by default"
Differential Revision: https://reviews.llvm.org/D73534
2020-02-19 11:12:26 +01:00
Thomas Lively 7b64a59060 Reland "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
This reverts commit 649aba93a2, now that
the approach started there has been shown to be workable in the patch
series culminating in https://reviews.llvm.org/D74192.
2020-02-18 13:49:46 -08:00
Simon Pilgrim d6eef0614f [TargetLowering] Add SimplifyMultipleUseDemandedBits 'all elements' helper wrapper. NFC. 2020-02-18 19:53:50 +00:00
Huihui Zhang 8ee0e1dc02 [NFC] Silence compiler warning [-Wmissing-braces]. 2020-02-18 10:37:12 -08:00
Djordje Todorovic 2bf44d11cb Revert "Reland "[DebugInfo] Enable the debug entry values feature by default""
This reverts commit rGa82d3e8a6e67.
2020-02-18 16:38:11 +01:00
Djordje Todorovic a82d3e8a6e Reland "[DebugInfo] Enable the debug entry values feature by default"
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-18 14:41:08 +01:00
James Clarke b3cd44f80b Use SETNE directly rather than SUB/SETNE 0 for stack guard check
Summary:
Backends should fold the subtraction into the comparison, but not all
seem to. Moreover, on targets where pointers are not integers, such as
CHERI, an integer subtraction is not appropriate. Instead we should just
compare the two pointers directly, as this should work everywhere and
potentially generate more efficient code.

Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish

Reviewed By: lebedev.ri

Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74454
2020-02-18 13:21:26 +00:00
Simon Pilgrim a1585aec6f [SelectionDAG] Make the "getValidShiftAmount" helpers available. NFCI.
These are going to be useful in TargetLowering::SimplifyDemandedBits, so expose these helpers outside of SelectionDAG.cpp

Also add a getValidShiftAmountConstant early-out to getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant so we can use them for scalar cases as well.
2020-02-17 16:28:46 +00:00
Sander de Smalen a7a96c726e [AArch64] Implement passing SVE vectors by ref for AAPCS.
Summary:
This patch implements the part of the calling convention
where SVE Vectors are passed by reference. This means the
caller must allocate stack space for these objects and
pass the address to the callee.

Reviewers: efriedma, rovka, cameron.mcinally, c-rhodes, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71216
2020-02-17 15:20:28 +00:00
Sjoerd Meijer dad5f00e3b [DAGCombine] Combine pattern for REV16
This adds another pattern to the combiner for a case that we were not handling
to generate the REV16 instruction for ARM/Thumb2 and a bswap+ror on X86.

Differential Revision: https://reviews.llvm.org/D74032
2020-02-17 14:54:17 +00:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
Simon Pilgrim ce2b5f1569 Fix gcc9.2 -Winit-list-lifetime warning. NFCI.
Reported by @lbenes (Luke Benes)
2020-02-15 16:48:51 +00:00
Diogo Sampaio 8bc790f9e6 [AArch64][FPenv] Update chain of int to fp conversion
Summary:
When using strict fp, the chain must be updated
when performing integer type promotion of an
operand of an integer to floating point conversion.

Reviewers: craig.topper, john.brawn

Reviewed By: craig.topper

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74597
2020-02-15 05:07:34 +00:00
Vedant Kumar 3091049446 Add dbgs() output to help track down missing DW_AT_location bugs, NFC 2020-02-13 14:38:44 -08:00
Fangrui Song 0dce409cee [AsmPrinter] De-capitalize Emit{Function,BasicBlock}* and Emit{Start,End}OfAsmFile 2020-02-13 13:22:49 -08:00
Simon Pilgrim 32176133fa Move FIXME to start of comment so visual studio actually tags it. NFC. 2020-02-13 14:28:50 +00:00
Serguei Katkov a6f38b4697 [Statepoint] Remove redundant clear of call target on register
Patchable statepoint is lowered into sequence of nops, so zeroed call target
should not be on register. It is better to use getTargetConstant instead
of getConstant to select zero constant for call target.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D74465
2020-02-13 10:25:50 +07:00
Jay Foad 32aac25637 [KnownBits] Introduce anyext instead of passing a flag into zext
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.

I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.

NFC.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74482
2020-02-12 19:06:53 +00:00
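The distinction between the two operations can be sketched with a toy known-bits tracker. This Python class is a loose model written for illustration, with plain integers standing in for bit masks; it is not the `llvm::KnownBits` class, and the field and method names are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBits:
    zero: int   # mask of bits known to be 0
    one: int    # mask of bits known to be 1
    width: int  # number of tracked bits

    def zext(self, new_width):
        """The value is zero extended: the new high bits are known zero."""
        new_bits = ((1 << new_width) - 1) & ~((1 << self.width) - 1)
        return KnownBits(self.zero | new_bits, self.one, new_width)

    def anyext(self, new_width):
        """The value is extended with unknown bits: nothing is learned."""
        return KnownBits(self.zero, self.one, new_width)
```

Naming the operation after what happens to the tracked value removes the need for a boolean flag at every call site.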
Simon Pilgrim 9eb426c88c [TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.

This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied.

It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on.

Differential Revision: https://reviews.llvm.org/D74221
2020-02-12 11:51:42 +00:00
Djordje Todorovic 97ed706a96 Revert "[DebugInfo] Enable the debug entry values feature by default"
This reverts commit rG9f6ff07f8a39.

Found a test failure on clang-with-thin-lto-ubuntu buildbot.
2020-02-12 11:59:04 +01:00
Djordje Todorovic 9f6ff07f8a [DebugInfo] Enable the debug entry values feature by default
This patch enables the debug entry values feature.

  - Remove the (CC1) experimental -femit-debug-entry-values option
  - Enable it for x86, arm and aarch64 targets
  - Resolve the test failures
  - Leave the llc experimental option for targets that do not
    support the CallSiteInfo yet

Differential Revision: https://reviews.llvm.org/D73534
2020-02-12 10:25:14 +01:00
Nicolai Hähnle 07a5b849f7 SelectionDAG: Fix bug in ClusterNeighboringLoads
Summary:
The method attempts to find loads that can be legally clustered by
looking for loads consuming the same chain glue token.

However, the old code looks at _all_ users of values produced by the
chain node -- including uses of the loaded/returned value of volatile
loads or atomics. This could lead to circular dependencies which then
failed during scheduling.

With this change, we filter out users by getResNo, i.e. by which
SDValue value they use, to ensure that we only look at users of the
chain glue token.

This appears to be a rather old bug, which is perhaps surprising.
However, the test case is actually quite fragile (i.e., it is hidden
by fairly small changes), and the test _must_ use volatile loads for
the bug to manifest.

Reviewers: arsenm, bogner, craig.topper, foad

Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74253
2020-02-12 09:12:55 +01:00
Craig Topper 0daf9b8e41 [X86][LegalizeTypes] Add SoftPromoteHalf support for STRICT_FP_EXTEND and STRICT_FP_ROUND
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.

Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
2020-02-11 22:30:04 -08:00
Sebastian Neubauer 7cddd15e56 [SelectionDAG] Optimize build_vector of truncates and shifts
Add a simplification to fuse a manual vector extract with shifts and
truncate into a bitcast.

Unpacking and packing values into vectors is only optimized with
extractelement instructions, not when manually unpacked using shifts
and truncates.
This patch simplifies shifts and truncates into a bitcast if possible.

Simplify (build_vec (trunc $1)
                    (trunc (srl $1 width))
                    (trunc (srl $1 (2 * width))) ...)
to (bitcast $1)

Differential Revision: https://reviews.llvm.org/D73892
2020-02-10 15:04:07 +01:00
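The equivalence behind this simplification can be checked on concrete values. The Python sketch below assumes a little-endian element order and a 32-bit source split into four 8-bit lanes; the function names are hypothetical and the code only models the arithmetic, not the DAG combine itself.

```python
import struct


def build_vector_of_truncs(value, elt_bits, num_elts):
    """Manual unpacking: lane i is trunc(srl(value, i * elt_bits))."""
    mask = (1 << elt_bits) - 1
    return [(value >> (i * elt_bits)) & mask for i in range(num_elts)]


def bitcast_i32_to_v4i8_le(value):
    """Reinterpret a 32-bit integer as four bytes, little-endian (assumption)."""
    return list(struct.pack("<I", value))
```

Both routes produce the same lanes, which is what allows the shift and truncate chain to be replaced by a single reinterpretation of the bits.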
Craig Topper eeb63944e4 [LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results.
Remove code from LegalizeTypes that allowed this to work.

We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
2020-02-08 09:52:31 -08:00
Craig Topper 2af1640f9a [LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF.
Summary:
For CTTZ we place a set bit just past where the non-promoted type
stopped so the extended bits won't be used for the count. For
CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in
the original type and we end up counting into the extended bits.
So we can just use ANY_EXTEND for both cases.

This matches what is done in type legalization for these operations.
We make no effort to force the upper bits to zero.

Differential Revision: https://reviews.llvm.org/D74111
2020-02-07 22:25:56 -08:00
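The reasoning above can be replayed with a small model: if a bit is set just past the narrow type, the contents of the extended bits cannot affect the count. This Python sketch uses hypothetical helper names and assumes that CTTZ of zero returns the bit width.

```python
def cttz(x, bits):
    """Count trailing zeros of a bits-wide value; returns bits when x is 0."""
    x &= (1 << bits) - 1
    if x == 0:
        return bits
    return (x & -x).bit_length() - 1


def promoted_cttz(x_ext, narrow_bits, wide_bits):
    """CTTZ on the promoted type, with a guard bit set just past the narrow type."""
    return cttz(x_ext | (1 << narrow_bits), wide_bits)
```

Here `x_ext` stands for an any-extended value whose upper bits may hold arbitrary garbage, which the guard bit makes irrelevant.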
Vedant Kumar 0d0ef315cb [MachineInstr] Add isCandidateForCallSiteEntry predicate
Add the isCandidateForCallSiteEntry predicate to MachineInstr to
determine whether a DWARF call site entry should be created for an
instruction.

For now, it's enough to have any call instruction that doesn't belong to
a blacklisted set of opcodes. For these opcodes, a call site entry isn't
meaningful.

Differential Revision: https://reviews.llvm.org/D74159
2020-02-07 10:10:41 -08:00
Guillaume Chatelet f85d3408e6 [NFC] Introduce an API for MemOp
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73964
2020-02-07 11:32:27 +01:00
Jeremy Morse 6531a78ac4 Revert "[DebugInfo] Remove some users of DBG_VALUEs IsIndirect field"
This reverts commit ed29dbaafa.

I'm backing out D68945, which as the discussion for D73526 shows, doesn't
seem to handle the -O0 path through the codegen backend correctly. I'll
reland the patch when a fix is worked out, apologies for all the churn.
The two parent commits are part of this revert too.

Conflicts:
	llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
	llvm/test/DebugInfo/X86/dbg-addr-dse.ll

SelectionDAGBuilder conflict is due to a nearby change in e39e2b4a79
that's technically unrelated. dbg-addr-dse.ll conflicted because
41206b61e3 (legitimately) changes the order of two lines.

There are further modifications to dbg-value-func-arg.ll: it landed after
the patch being reverted, and I've converted indirection to be represented
by the isIndirect field rather than DW_OP_deref.
2020-02-06 14:41:40 +00:00
Jeremy Morse ece761427f Revert "[DebugInfo][DAG] Distinguish different kinds of location indirection"
This reverts commit 3137fe4d23.

I'm backing out D68945, which this patch is a follow up for. It'll be
re-landed when D68945 is fixed.

The changes to dbg-value-func-arg.ll occur because our handling of certain
kinds of location now mixes up indirection that happens at different points
in a DIExpression. While this is a regression, it's a return to the prior
behaviour while a better patch is sought.
2020-02-06 14:41:40 +00:00
Jeremy Morse ed5998d21e Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit 2d3174c4df.

The overall solution for this problem is reverting D68945, which wasn't
handling the -O0 path through the codegen backend correctly. See the
discussion in D73526.
2020-02-06 14:41:39 +00:00
Simon Pilgrim 4592bb7195 visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI.
This always gets called at least once.
2020-02-05 13:30:54 +00:00
Thomas Lively 649aba93a2 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
Reid Kleckner 2d89e0a098 [SEH] Remove CATCHPAD SDNode and X86::EH_RESTORE MachineInstr
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.

Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.

Fixes PR44697

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D73752
2020-02-04 15:13:12 -08:00
Matt Arsenault a3c814d234 Separately track input and output denormal mode
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
2020-02-04 12:59:21 -05:00
Simon Pilgrim 3dd688a9ee [DAG] OptLevelChanger - fix uninitialized variable analyzer warning (PR44471)
Ensure that OptLevelChanger::SavedFastISel is initialized in the constructor.

This should be NFC - as the equivalent 'same opt level' early-out is used in the destructor as well, so SavedFastISel is only actually referenced in the general case.

Differential Revision: https://reviews.llvm.org/D73875
2020-02-04 10:54:33 +00:00
David Green 362d00e051 [ARM][VecReduce] Force expand vector_reduce_fmin
Under MVE, we do not have any lowering for fminimum, which a
vector_reduce_fmin without NoNan will be expanded into. As with the
other recent patches, force this to expand in the pre-isel pass. Note
that Neon lowering would be OK because the scalar fminimum uses the
vector VMIN instruction, but it is probably better to just rely on the
scalar operations, which is what is done here.

Also fixes what appears to be the reversal of INF vs -INF in the
vector_reduce_fmin widening code.
2020-02-04 09:36:59 +00:00
Guillaume Chatelet b8144c0536 [NFC] Encapsulate MemOp logic
Summary:
This patch simply introduces functions instead of directly accessing the fields.
This helps introduce additional check logic. A second patch will add simplifying functions.

Reviewers: courbet

Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73945
2020-02-04 10:36:26 +01:00
Simon Pilgrim 61621f826a [TargetLowering] SimplifyDemandedBits - add basic KnownBits ZEXTLoad handling
We have to be careful in SimplifyDemandedBits with loads in case we attempt to combine back to a constant (which then gets turned into a constant pool load again), but we can at least set the upper KnownBits for a ZEXTLoad to zero.
2020-02-03 16:50:04 +00:00
Guillaume Chatelet 333f2ad8b8 [Alignment][NFC] Use Align for getMemcpy/Memmove/Memset
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73885
2020-02-03 17:13:19 +01:00
Guillaume Chatelet fc19465965 [Alignment][NFC] Use Align for code creating MemOp
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73874
2020-02-03 14:10:30 +01:00
Guillaume Chatelet 75d9994a51 Fix broken invariant
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73791
2020-02-03 11:01:05 +01:00
Craig Topper 943b5561d6 [LegalizeTypes][X86] Add a new strategy for type legalizing f16 type that softens it to i16, but promotes to f32 around arithmetic ops.
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html

The current strategy for f16 is to promote the type to float everywhere except where the specific width is required, such as loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang, which is a storage-only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.

It is also not obvious how to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions, which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.

This patch implements a different strategy where conversions are emitted directly around arithmetic operations. Otherwise, the value is passed around as an i16, including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.

Differential Revision: https://reviews.llvm.org/D73749
2020-02-01 11:21:04 -08:00
Matt Arsenault 792d9b5719 DAG: Check if a value is divergent before requiresUniformRegister
This avoids a potentially expensive scan if we already know it doesn't
matter.
2020-01-31 15:27:18 -08:00
Simon Pilgrim 8fbc7fd567 [DAG] SimplifyMultipleUseDemandedBits - peek through unused ISD::INSERT_SUBVECTOR subvectors
If we don't demand any elements of the inserted subvector then just skip it.
2020-01-31 18:57:22 +00:00
Simon Pilgrim 5702dadf6f [DAG] Enable ISD::INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::INSERT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.
2020-01-31 18:02:34 +00:00
Guillaume Chatelet 3c89b75f23 [NFC] Introduce a type to model memory operation
Summary: This is a first step before changing the types to llvm::Align and introducing functions to ease client code.

Reviewers: courbet

Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73785
2020-01-31 17:29:01 +01:00
Leonard Chan 2d3174c4df [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in corrupt
symbols when debugging.

This is an attempt to reland with a few fixes for buildbot since I
haven't merged from master in a bit.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 17:09:42 -08:00
Leonard Chan 3b23453b6c Revert "[SafeStack][DebugInfo] Insert DW_OP_deref in correct location"
This reverts commit fff6a1b0f1.

This was breaking a bunch of buildbots.
2020-01-30 16:18:41 -08:00
Leonard Chan fff6a1b0f1 [SafeStack][DebugInfo] Insert DW_OP_deref in correct location
This patch addresses the issue found in https://bugs.llvm.org/show_bug.cgi?id=44585
where a DW_OP_deref was placed at the end of a dwarf expression, resulting in
corrupt symbols when debugging.

Differential Revision: https://reviews.llvm.org/D73526
2020-01-30 15:58:37 -08:00
Simon Pilgrim 57b0d33224 [DAGCombiner] ISD::AND/OR/XOR - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-30 12:02:53 +00:00
Simon Pilgrim a967aa2706 [DAGCombiner] ISD::SDIV/UDIV/SREM/UREM - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-30 12:02:52 +00:00
Simon Pilgrim f7245ef897 [DAGCombiner] ISD::SHL/SRA/SRL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 18:49:42 +00:00
Simon Pilgrim 25b8e96388 [DAGCombiner] ISD::MUL - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 17:26:22 +00:00
Simon Pilgrim 4b04e11735 [DAGCombiner] Sub/SUBSAT - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us.
2020-01-29 16:57:13 +00:00
Simon Pilgrim 48bd6a0986 [DAGCombiner] visitIMINMAX - use general SelectionDAG::FoldConstantArithmetic
This handles all the constant splat / opaque testing for us instead of the ConstantSDNode variant where we have to do it ourselves.
2020-01-29 16:57:13 +00:00
Benjamin Kramer adcd026838 Make llvm::StringRef to std::string conversions explicit.
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-28 23:25:25 +01:00
Wang, Pengfei 3239b5034e [FPEnv] Add pragma FP_CONTRACT support under strict FP.
Summary: Support pragma FP_CONTRACT under strict FP.

Reviewers: craig.topper, andrew.w.kaylor, uweigand, RKSimon, LiuChen3

Subscribers: hiraditya, jdoerfert, cfe-commits, llvm-commits, LuoYuanke

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D72820
2020-01-28 20:43:43 +08:00
Guillaume Chatelet 879c825cb8 [intrinsics] Add @llvm.memcpy.inline intrinsics
Summary:
This is a follow up on D61634. It adds an LLVM IR intrinsic to allow better implementation of memcpy from C++.
A follow up CL will add the intrinsics in Clang.

Reviewers: courbet, theraven, t.p.northover, jdoerfert, tejohnson

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71710
2020-01-28 09:42:01 +01:00
Simon Pilgrim e7e043724e [DAG] Enable ISD::EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling
This allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits to create a simpler ISD::EXTRACT_SUBVECTOR, which is particularly useful for cases where we're splitting into subvectors anyhow.

2020-01-27 21:17:47 +00:00
Wang, Pengfei 17b8f96d65 [FPEnv] Divide macro INSTRUCTION into INSTRUCTION and DAG_INSTRUCTION,
and macro FUNCTION likewise. NFCI.

Some functions, like fmuladd, don't really have a node, so we should separate
their declarations from those that have a node to avoid introducing fake nodes.

Differential Revision: https://reviews.llvm.org/D72871
2020-01-27 10:38:05 +08:00
Simon Pilgrim 4a5f9d9faf [TargetLowering] Respect recursive depth in SimplifyDemandedBits call to ComputeNumSignBits 2020-01-26 10:01:56 +00:00
Simon Pilgrim 3daa71ee00 [SelectionDAG] ComputeNumSignBits - add DemandedElts support for MIN/MAX ops 2020-01-25 20:21:14 +00:00
Simon Pilgrim 3f8916b2e8 [SelectionDAG] ComputeNumSignBits - add support for rotate non-uniform vector amounts 2020-01-25 19:15:05 +00:00
Simon Pilgrim e3c26a9d1b [SelectionDAG] ComputeNumSignBits - add support for rotate uniform vector amounts 2020-01-25 18:55:47 +00:00
Simon Pilgrim c8de7c8f50 [TargetLowering] SimplifyDemandedBits - Remove ashr if all our demandedbits already match the sign bit
Differential Revision: https://reviews.llvm.org/D73412
2020-01-25 17:36:46 +00:00
@justice_adams (Justice Adams) daee63f974 [SelectionDag] Updated FoldConstantArithmetic method signature in preparation for merge with FoldConstantVectorArithmetic
Updated FoldConstantArithmetic method signature to match that of
FoldConstantVectorArithmetic in preparation for merging the two
functions together

https://bugs.llvm.org/show_bug.cgi?id=36544

This is the first step in combining the various
FoldConstantArithmetic and FoldConstantVectorArithmetic
functions into one FoldConstantArithmetic function.

Differential Revision: https://reviews.llvm.org/D72870
2020-01-24 18:00:58 -05:00
Craig Topper d3bf06bc81 [DAGCombiner] Add combine for (not (strict_fsetcc)) to create a strict_fsetcc with the opposite condition.
Unlike the existing code that I modified here, I only handle the
case where the strict_fsetcc has a single use. Not sure exactly
how to handle multiple uses.

Testing this on X86 is hard because we already have other
combines that get rid of the lowered version of the integer setcc that
this xor will eventually become. So this combine really just
saves a bunch of extra nodes being created. Not sure about other
targets.

Differential Revision: https://reviews.llvm.org/D71816
2020-01-24 14:15:36 -08:00
Stanislav Mekhanoshin 7a94d4f4ee Allow combining of extract_subvector to extract element
Differential Revision: https://reviews.llvm.org/D73132
2020-01-24 10:50:26 -08:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler's understanding of the `Log2_64` implementation to produce good code. One could use `Align()` as a replacement, but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Simon Pilgrim 0b45c2264a [SelectionDAG] rot(x, y) --> x iff ComputeNumSignBits(x) == BitWidth(x)
Rotating a 0/-1 value by any amount will always result in the same 0/-1 value.
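
The fold above can be checked on plain 32-bit integers. This is a minimal sketch, not the SelectionDAG code: `numSignBits`, `rotl32`, and `foldRotate` are hypothetical helpers invented for illustration, and `numSignBits` only mimics the idea behind ComputeNumSignBits for a fully known constant.

```cpp
#include <cstdint>

// Hypothetical helper: number of leading bits equal to the sign bit,
// counting the sign bit itself (so the result is in [1, 32]).
unsigned numSignBits(uint32_t x) {
  const uint32_t sign = x >> 31;
  unsigned n = 0;
  for (int bit = 31; bit >= 0 && ((x >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// Rotate left by amt (taken modulo 32), avoiding an undefined shift by 32.
uint32_t rotl32(uint32_t x, unsigned amt) {
  amt &= 31u;
  return amt == 0 ? x : (x << amt) | (x >> (32u - amt));
}

// Hypothetical fold: if every bit is a copy of the sign bit, the value is
// 0 or -1 and the rotate can be replaced by its input.
uint32_t foldRotate(uint32_t x, unsigned amt) {
  if (numSignBits(x) == 32)
    return x;
  return rotl32(x, amt);
}
```

Only 0 and 0xFFFFFFFF reach the early return here, which is why the rotate amount does not matter for them.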
2020-01-24 10:35:57 +00:00
Simon Pilgrim e25eee4db7 [SelectionDAG] ComputeNumSignBits - add ISD::ADD demanded elts support 2020-01-23 17:48:07 +00:00
Simon Pilgrim 0fec8acdd8 [SelectionDAG] ComputeNumSignBits - add ISD::ADD vector support
Add missing handling for (ADD (AND X, 1), -1) uniform vectors
2020-01-23 16:42:12 +00:00
Simon Pilgrim fc5bbbf328 [SelectionDAG] ComputeNumSignBits - add ISD::SUB demanded elts support 2020-01-23 16:20:48 +00:00
Simon Pilgrim 48d4ba8fb2 [SelectionDAG] Compute Known + Sign Bits - merge INSERT_VECTOR_ELT known/unknown index paths
Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call.
2020-01-23 13:31:37 +00:00
Simon Pilgrim 03cae086f4 [SelectionDAG] ComputeKnownBits - merge EXTRACT_VECTOR_ELT known/unknown index paths
Match the approach in SimplifyDemandedBits/ComputeNumSignBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits call.
2020-01-23 11:29:16 +00:00
Simon Pilgrim 98da49d979 [SelectionDAG] Compute Known + Sign Bits - merge INSERT_SUBVECTOR known/unknown index paths
Match the approach in SimplifyDemandedBits where we calculate the demanded elts and then have a common path for the ComputeKnownBits/ComputeNumSignBits call, additionally we only ever need original demanded elts of the base vector even if the index is unknown.
2020-01-23 11:29:15 +00:00
Stanislav Mekhanoshin 2d0fcf786c Precommit NFC part of DAGCombiner change. NFC.
This is NFC part of DAGCombiner::visitEXTRACT_SUBVECTOR()
change in the D73132.
2020-01-22 09:01:22 -08:00
Sander de Smalen 4cf16efe49 [AArch64][SVE] Add patterns for unpredicated load/store to frame-indices.
This patch also fixes up a number of cases in DAGCombine and
SelectionDAGBuilder where the size of a scalable vector is used in a
fixed-width context (thus triggering an assertion failure).

Reviewers: efriedma, c-rhodes, rovka, cameron.mcinally

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71215
2020-01-22 14:32:27 +00:00
Simon Pilgrim 80656fd7ae [SelectionDAG] getShiftAmountConstant - assert the type is an integer. 2020-01-22 13:52:44 +00:00
Sander de Smalen 67d4c9924c Add support for (expressing) vscale.
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

  getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16 bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).

Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner

Reviewed by: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68203
2020-01-22 10:09:27 +00:00
Thomas Lively 3ef169e586 [WebAssembly][InstrEmitter] Foundation for multivalue call lowering
Summary:
WebAssembly is unique among upstream targets in that it does not at
any point use physical registers to store values. Instead, it uses
virtual registers to model positions in its value stack. This means
that some target-independent lowering activities that would use
physical registers need to use virtual registers instead for
WebAssembly and similar downstream targets. This CL generalizes the
existing `usesPhysRegsForPEI` lowering hook to
`usesPhysRegsForValues` in preparation for using it in more places.

One such place is in InstrEmitter for instructions that have variadic
defs. On register machines, it only makes sense for these defs to be
physical registers, but for WebAssembly they must be virtual registers
like any other values. This CL changes InstrEmitter to check the new
target lowering hook to determine whether variadic defs should be
physical or virtual registers.

These changes are necessary to support a generalized CALL instruction
for WebAssembly that is capable of returning an arbitrary number of
arguments. Fully implementing that instruction will require additional
changes that are described in comments here but left for a follow up
commit.

Reviewers: aheejin, dschuff, qcolombet

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71484
2020-01-21 11:13:46 -08:00
Simon Pilgrim f04284cf1d [TargetLowering] SimplifyDemandedBits ISD::SRA multi-use handling
Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses
2020-01-21 15:12:07 +00:00
Simon Pilgrim 47f99d2ca8 [SelectionDAG] GetDemandedBits - remove ANY_EXTEND handling
Rely on SimplifyMultipleUseDemandedBits fallback instead.
2020-01-21 14:39:00 +00:00
Simon Pilgrim 651fa669a2 [TargetLowering] SimplifyDemandedBits ANY_EXTEND/ANY_EXTEND_VECTOR_INREG multi-use handling
Call SimplifyMultipleUseDemandedBits to peek through extended source args with multiple uses
2020-01-21 14:07:19 +00:00
Simon Pilgrim 5f5f478564 [DAG] Fold extract_vector_elt (scalar_to_vector), K to undef (K != 0)
This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead if the extract is not of the first element.

This helps with some cases where 3-vectors are legalized and avoids processing the 4th component.
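
The rule in this fold can be sketched outside of LLVM as follows. This is an illustration only, assuming C++17: `extractFromScalarToVector` is a hypothetical name, and `std::nullopt` stands in for an undef result.

```cpp
#include <optional>

// Hypothetical model of extract_vector_elt (scalar_to_vector S), K:
// only element 0 of a scalar_to_vector is defined, so any other index
// yields "undef" (modelled here as std::nullopt) rather than S.
std::optional<int> extractFromScalarToVector(int scalar, unsigned index) {
  if (index == 0)
    return scalar;
  return std::nullopt;
}
```

With this model, reading the 4th component of a widened 3-vector produces no value to process, which matches the motivation given above.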

Original Patch by: arsenm (Matt Arsenault)

Differential Revision: https://reviews.llvm.org/D51589
2020-01-21 10:58:30 +00:00
Simon Pilgrim 8d2e6bdbe1 [TargetLowering] SimplifyDemandedBits - Pull out InDemandedMask variable to ISD::SHL. NFCI.
Matches ISD::SRA + ISD::SRL variants.
2020-01-21 10:40:18 +00:00
Simon Pilgrim 9c06c10fba [SelectionDAG] GetDemandedBits - fallback to SimplifyMultipleUseDemandedBits by default.
First step towards removing SelectionDAG::GetDemandedBits entirely since it is so similar to SimplifyMultipleUseDemandedBits anyhow.
2020-01-20 16:51:52 +00:00
Michael Liao 6d0d86a64d [DAG] Add helper for creating constant vector index with correct type. NFC. 2020-01-18 01:23:36 -05:00
Simon Pilgrim 1dc2f25790 [SelectionDAG] ComputeKnownBits - assert we're computing the 0'th (difference) result for the SUB/SUBC cases
Matches what we already do for the ADD/ADDC/ADDE case.
2020-01-17 13:53:57 +00:00
Simon Pilgrim f611158350 [SelectionDAG] Better ISD::ANY_EXTEND/ISD::ANY_EXTEND_VECTOR_INREG ComputeKnownBits support
Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - it's just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.
2020-01-17 11:37:58 +00:00
Davide Italiano 30a8865142 [FastISel] Lower `llvm.dbg.value(undef, ...` correctly.
Summary:
Instead of just dropping them.

<rdar://problem/58657146>

Reviewers: aprantl, vsk, ab, paquette, echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72877
2020-01-16 16:22:20 -08:00
Craig Topper 61a89e17df [LegalizeDAG][Mips] Add an assert to protect a uint_to_fp implementation from double rounding. Add a i32->f32 uint_to_fp implementation that avoids this code.
The algorithm here only works if the sint_to_fp doesn't do any
rounding. Otherwise it can round before the offset fixup is
applied. Add an assert to protect this.

To avoid breaking the one test in tree that tested this code
with a set of types that fail the assert, I've enabled i32->f32
to use the i64->f32 algorithm. This only occurs when f64 isn't
a legal type. If f64 is legal then we do i32->f64->f32 instead.
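
The general shape of the "signed convert plus offset fixup" approach can be sketched as below. This is not the LegalizeDAG code: `uintToDoubleViaSigned` is a hypothetical helper, and it uses i32 -> f64, a case where the signed conversion is exact, so the fixup cannot be affected by an earlier rounding step.

```cpp
#include <cstdint>

// Hypothetical helper: convert an unsigned 32-bit value to double using
// only a signed conversion followed by an offset fixup.
double uintToDoubleViaSigned(uint32_t u) {
  // Reinterpret the bits as a signed value without an out-of-range cast.
  const int32_t s =
      (u & 0x80000000u)
          ? static_cast<int32_t>(static_cast<int64_t>(u) - 4294967296LL)
          : static_cast<int32_t>(u);
  // Exact: every int32_t fits in the 53-bit significand of a double.
  double d = static_cast<double>(s);
  // Offset fixup: inputs with the top bit set were seen as negative,
  // so add 2^32 to recover the unsigned value.
  if (s < 0)
    d += 4294967296.0;
  return d;
}
```

If the signed conversion had to round (for example i64 -> f32), the rounding would happen before the fixup is added, which is the double rounding hazard the assert guards against.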

Differential Revision: https://reviews.llvm.org/D72794
2020-01-16 11:08:16 -08:00
Matt Arsenault d0943537e1 GlobalISel: Apply target MMO flags to atomics
Unify MMO flag handling with SelectionDAG like with loads and stores.
2020-01-16 13:49:43 -05:00
Matt Arsenault 0d0fce42b0 GlobalISel: Preserve load/store metadata in IRTranslator
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.

Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferenceable which should also be fixed.
2020-01-16 13:49:43 -05:00
Jeremy Morse c969335abd Revert "[PHIEliminate] Move dbg values after phi and label"
Testing compiler-rt, a new assertion failure occurs when building
the GwpAsanTestObjects object. I'm uploading a reproducer to D70597.

This reverts commit 75188b01e9.
2020-01-16 14:01:27 +00:00
Chris Ye 75188b01e9 [PHIEliminate] Move dbg values after phi and label
If there are DBG_VALUEs between phi and label (after phi and before label),
DBG_VALUE will block PHI lowering after the LABEL. Moving all DBG_VALUEs
after Labels in the function ScheduleDAGSDNodes::EmitSchedule to avoid
impacting PHI lowering.

  before:
     PHI
     DBG_VALUE
     LABEL
  after: (move DBG_VALUE after label)
     PHI
     LABEL
     DBG_VALUE
  then: (phi lowering after label)
     LABEL
     COPY
     DBG_VALUE

Fixes the issue: https://bugs.llvm.org/show_bug.cgi?id=43859

Differential Revision: https://reviews.llvm.org/D70597
2020-01-16 11:58:09 +00:00
Craig Topper 5cf1b01a01 [LegalizeDAG][TargetLowering] Move vXi64/i64->vXf32/f32 uint_to_fp legalizing code from TargetLowering::expandUINT_TO_FP back to LegalizeDAG.
This was moved in October 2018, but we don't appear to be using
this for vectors on any in tree target.

Moving it back simplifies D72794 so we can share the code for i32->f32.
2020-01-15 22:04:50 -08:00
Michael Liao 8d07f8d98c [DAGCombine] Replace `getIntPtrConstant()` with `getVectorIdxTy()`.
- Prefer `getVectorIdxTy()` as the index operand type for
  `EXTRACT_SUBVECTOR` as targets expect different types by overloading
  `getVectorIdxTy()`.
2020-01-14 17:03:05 -05:00
Craig Topper 9ee90ea55c [LegalizeTypes] Remove untested code from ExpandIntOp_UINT_TO_FP
This code is untested in tree because the "APFloat::semanticsPrecision(sem) >= SrcVT.getSizeInBits() - 1" check is false for most combinations for int and fp types except maybe i32 and f64. For that you would need i32 to be an illegal type, but f64 to be legal and have custom handling for legalizing the split sint_to_fp. The precision check itself was added in 2010 to fix a double rounding issue in the algorithm that would occur if the sint_to_fp was not able to do the conversion without rounding.

Differential Revision: https://reviews.llvm.org/D72728
2020-01-14 13:15:29 -08:00
Ulrich Weigand 81ee484484 [FPEnv] Fix chain handling regression after 04a8696
Code in getRoot made the assumption that every node in PendingLoads
must always itself have a dependency on the current DAG root node.

After the changes in 04a8696, it turns out that this assumption no
longer holds true, causing wrong codegen in some cases (e.g. stores
after constrained FP intrinsics might get deleted).

To fix this, we now need to make sure that the TokenFactor created
by getRoot always includes the previous root, if there is no implicit
dependency already present.

The original getControlRoot code already has exactly this check,
so this patch simply reuses that code now for getRoot as well.
This fixes the regression.

NFC if no constrained FP intrinsic is present.
2020-01-14 14:10:57 +01:00
Simon Pilgrim c05a11108b [SelectionDAG] ComputeKnownBits - merge getValidMinimumShiftAmountConstant() and generic ISD::SHL handling.
As mentioned by @nikic on rGef5debac4302, we can merge the guaranteed bottom zero bits from the shifted value, and then, if a min shift amount is known, zero out the bottom bits as well.
2020-01-14 11:51:41 +00:00
Simon Pilgrim a43b0065c5 [SelectionDAG] ComputeKnownBits - merge getValidMinimumShiftAmountConstant() and generic ISD::SRL handling.
As mentioned by @nikic on rGef5debac4302 (although that was just about SHL), we can merge the guaranteed top zero bits from the shifted value, and then, if a min shift amount is known, zero out the top bits as well.

SHL tests / handling will be added in a follow up patch.
2020-01-14 11:41:47 +00:00
Craig Topper 26c7a4ed10 [LegalizeIntegerTypes][X86] Add support for expanding input of STRICT_SINT_TO_FP/STRICT_UINT_TO_FP into a libcall.
Needed to support i128->fp128 on 32-bit X86.

Add full set of strict sint_to_fp/uint_to_fp conversion tests for fp128.
2020-01-13 13:11:12 -08:00
Daniel Sanders a0f4600f4f Rework be15dfa88f such that it works with GlobalISel which doesn't use EVT
Summary:
be15dfa88f broke GlobalISel's usage of getSetCCInverse() which currently
appears to be limited to our out-of-tree backend. GlobalISel doesn't use
EVT's and isn't able to derive them from the information it has as it
doesn't distinguish between integer and floating point types (that
distinction is made by operations rather than values). Bring back the
bool version of getSetCCInverse() in a way that doesn't break the intent
of be15dfa88f but also allows GlobalISel to continue using it.

Reviewers: spatel, bogner, arichardson

Reviewed By: arichardson

Subscribers: rovka, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72309
2020-01-13 12:19:37 -08:00
Simon Pilgrim c6fcd5d115 [SelectionDAG] ComputeNumSignBits add getValidMaximumShiftAmountConstant() for ISD::SHL support
Allows us to handle non-uniform SHL shifts to determine the minimum number of sign bits remaining (based off the maximum shift amount value)
2020-01-13 18:02:37 +00:00
Andrew Wei 05366870ee [LegalizeTypes] Add SoftenFloatResult support for STRICT_SINT_TO_FP/STRICT_UINT_TO_FP
Some targets like arm/riscv with soft-float crash during compilation when using the -fno-unsafe-math-optimization option.
This patch will add the missing strict FP support to SoftenFloatRes_XINT_TO_FP.

Differential Revision: https://reviews.llvm.org/D72277
2020-01-14 01:01:56 +08:00
Simon Pilgrim 38e2c01221 [SelectionDAG] ComputeNumSignBits add getValidMinimumShiftAmountConstant() ISD::SRA support
Allows us to handle more non-uniform SRA sign bits cases
2020-01-13 16:55:02 +00:00
Simon Pilgrim 376bc39c82 [SelectionDAG] ComputeNumSignBits - Use getValidShiftAmountConstant for shift opcodes
getValidShiftAmountConstant handles out of bounds shift amounts for us, allowing us to remove the local handling.
2020-01-13 14:12:12 +00:00
Simon Pilgrim 6d1a8fd447 [SelectionDAG] ComputeKnownBits - Add DemandedElts support to getValidShiftAmountConstant/getValidMinimumShiftAmountConstant() 2020-01-13 14:12:12 +00:00
Ulrich Weigand 04a86966fb [FPEnv] Fix chain handling for fpexcept.strict nodes
We need to ensure that fpexcept.strict nodes are not optimized away even if
the result is unused. To do that, we need to chain them into the block's
terminator nodes, like already done for PendingExcepts.

This patch adds two new lists of pending chains, PendingConstrainedFP and
PendingConstrainedFPStrict to hold constrained FP intrinsic nodes without
and with fpexcept.strict markers. This not only solves the above
problem, but also relaxes chains a bit further by no longer flushing all
FP nodes before a store or other memory access. (They are still flushed
before nodes with other side effects.)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D72341
2020-01-13 14:38:49 +01:00
Simon Pilgrim ef5debac43 [SelectionDAG] ComputeKnownBits add getValidMinimumShiftAmountConstant() ISD::SHL support
As mentioned on D72573
2020-01-13 12:02:13 +00:00
Simon Pilgrim 8f49204f26 [SelectionDAG] ComputeKnownBits - minimum leading/trailing zero bits in LSHR/SHL (PR44526)
As detailed in https://blog.regehr.org/archives/1709 we don't make use of the known leading/trailing zeros for shifted values in cases where we don't know the shift amount value.

This patch adds support to SelectionDAG::ComputeKnownBits to use KnownBits::countMinTrailingZeros and countMinLeadingZeros to set the minimum guaranteed leading/trailing known zero bits.
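
The underlying observation can be checked with a small standalone sketch. It is not the ComputeKnownBits implementation: it works on concrete 32-bit values rather than KnownBits, and `countTrailingZeros`, `countLeadingZeros` and `shiftsPreserveKnownZeros` are hypothetical helpers written for this example.

```cpp
#include <cstdint>

// Number of zero bits below the lowest set bit (32 for zero).
unsigned countTrailingZeros(uint32_t v) {
  if (v == 0) return 32;
  unsigned n = 0;
  while ((v & 1u) == 0) { v >>= 1; ++n; }
  return n;
}

// Number of zero bits above the highest set bit (32 for zero).
unsigned countLeadingZeros(uint32_t v) {
  if (v == 0) return 32;
  unsigned n = 0;
  while ((v & 0x80000000u) == 0) { v <<= 1; ++n; }
  return n;
}

// For every in-range shift amount, SHL keeps at least the original number of
// trailing zeros and LSHR keeps at least the original number of leading zeros.
bool shiftsPreserveKnownZeros(uint32_t v) {
  unsigned tz = countTrailingZeros(v);
  unsigned lz = countLeadingZeros(v);
  for (unsigned amt = 0; amt < 32; ++amt) {
    if (countTrailingZeros(v << amt) < tz) return false;
    if (countLeadingZeros(v >> amt) < lz) return false;
  }
  return true;
}
```

So even with an unknown shift amount, the minimum trailing zeros of the operand carry over to a SHL result, and the minimum leading zeros carry over to a LSHR result.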

Differential Revision: https://reviews.llvm.org/D72573
2020-01-13 11:08:12 +00:00
Craig Topper efb674ac2f [LegalizeVectorOps] Parallelize the lo/hi part of STRICT_UINT_TO_FLOAT legalization.
The lo and hi computations are independent. Give them the same input
chain and TokenFactor the results together.
2020-01-11 17:50:30 -08:00
Craig Topper ed679804d5 [TargetLowering][X86] Connect the chain from STRICT_FSETCC in TargetLowering::expandFP_TO_UINT and X86TargetLowering::FP_TO_INTHelper. 2020-01-11 17:50:20 -08:00
Craig Topper ddfcd82bdc [LegalizeVectorOps] Expand vector MERGE_VALUES immediately.
Custom legalization can produce MERGE_VALUES to return multiple
results. We can expand them immediately instead of leaving them
around for DAG combine to clean up.
2020-01-11 17:50:20 -08:00
Craig Topper 5a9954c02a [LegalizeVectorOps] Remove some of the simpler Expand methods. Pass Results vector to a couple. NFCI
Some of the simplest handlers just call TLI and if that fails,
they fall back to unrolling. For those just inline the TLI call
and share the unrolling call with the default case of Expand.

For ExpandFSUB and ExpandBITREVERSE, pass the Results vector so that it's
obvious they sometimes don't return results and want to defer to LegalizeDAG.
2020-01-11 12:14:19 -08:00
Craig Topper 9fe6f36c1a [LegalizeVectorOps] Only pass SDNode* instead of SDValue to all of the Expand* and Promote* methods.
All the Expand* and Promote* functions assume they are being
called with result 0 anyway. Just hardcode result 0 into them.
2020-01-11 11:41:23 -08:00
Craig Topper bb2553175a [TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RuntimeLibcalls.def and all associated usages
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with an ogt/olt/oge/ole check. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.

This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.

Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn

Reviewed By: efriedma

Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72536
2020-01-10 19:30:08 -08:00
Craig Topper 71cee21861 [TargetLowering] Use SelectionDAG::getSetCC and remove a repeated call to getSetCCResultType in softenSetCCOperands. NFCI 2020-01-10 13:24:00 -08:00
Craig Topper b590e0fd81 [TargetLowering][ARM][X86] Change softenSetCCOperands handling of ONE to avoid spurious exceptions for QNANs with strict FP quiet compares
ONE is currently softened to OGT | OLT. But the libcalls for OGT and OLT will trigger an exception for QNAN, at least for X86 with libgcc. UEQ on the other hand uses UO | OEQ. The UO and OEQ libcalls will not trigger an exception for QNAN.

This patch changes ONE to use the inverse of the UEQ lowering. So we now produce O & UNE. Technically the existing behavior was correct for a signalling ONE, but since I don't know how to generate one of those from clang that seemed like something we can deal with later as we would need to fix other predicates as well. Also removing spurious exceptions seemed better than missing an exception.
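
The predicate algebra can be sketched with ordinary C++ comparisons. This models only the truth values of the predicates, not the libcalls or their exception behaviour; the function names are hypothetical and exist only for this illustration.

```cpp
#include <cmath>

// "Ordered" (O): neither operand is a NaN.
bool isOrdered(double a, double b) {
  return !std::isnan(a) && !std::isnan(b);
}

// Old softening: ONE == OGT | OLT.
bool oneViaOgtOlt(double a, double b) { return a > b || a < b; }

// New softening: ONE == O & UNE. In C++, a != b already behaves like UNE:
// it is true when the operands differ or when either one is a NaN.
bool oneViaOrdUne(double a, double b) { return isOrdered(a, b) && a != b; }

// UEQ == UO | OEQ; ONE is its logical inverse.
bool ueq(double a, double b) { return !isOrdered(a, b) || a == b; }
```

Both forms of ONE agree on every input, including NaNs, which is why the lowering can be swapped without changing the comparison result.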

There are also problems with quiet OGT/OLT/OLE/OGE, but those are harder to fix.

Differential Revision: https://reviews.llvm.org/D72477
2020-01-10 11:00:17 -08:00
Craig Topper f678fc7660 [LegalizeVectorOps] Improve handling of multi-result operations.
This system wasn't very well designed for multi-result nodes. As
a consequence they weren't consistently registered in the
LegalizedNodes map leading to nodes being revisited for different
results.

I've removed the "Result" variable from the main LegalizeOp method
and used a SDNode* instead. The result number from the incoming
Op SDValue is only used for deciding which result to return to the
caller. When LegalizeOp is called it should always register a
legalized result for all of its results. Future calls for any other
result should be pulled from the LegalizedNodes map.

Legal nodes will now register all of their results in the map
instead of just the one we were called for.

The Expand and Promote handling now uses a vector of results similar
to LegalizeDAG. Each of the new results is then re-legalized and
logged in the LegalizedNodes map for all of the Results for the
node being legalized. None of the handlers register their own
results now. And none call ReplaceAllUsesOfValueWith now.

Custom handling now always passes result number 0 to LowerOperation.
This matches what LegalizeDAG does. Since the introduction of
STRICT nodes, I've encountered several issues with X86's custom
handling being called with an SDValue pointing at the chain and
our custom handlers using that to get a VT instead of result 0.
This should prevent us from having any more of those issues. On
return we will update the LegalizedNodes map for all results so
we shouldn't call the custom handler again for each result number.

I want to push SDNode* further into the Expand and Promote
handlers, but I've left that for a follow-up to keep this patch size
down. I've created a dummy SDValue(Node, 0) to keep the handlers
working.

Differential Revision: https://reviews.llvm.org/D72224
2020-01-10 10:14:58 -08:00
Ulrich Weigand f0fd11df7d [FPEnv] Invert sense of MIFlag::FPExcept flag
In D71841 we inverted the sense of the SDNode-level flag to ensure all nodes
default to potentially raising FP exceptions unless otherwise specified --
i.e. if we forget to propagate the flag somewhere, the effect is now only
lost performance, not incorrect code.

However, the related flag at the MI level still defaults to nodes not raising
FP exceptions unless otherwise specified. To be fully on the (conservatively)
safe side, we should invert that flag as well.

This patch does so by replacing MIFlag::FPExcept with MIFlag::NoFPExcept.
(Note that this does also introduce an incompatible change in the MIR format.)

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D72466
2020-01-10 15:34:50 +01:00
Peng Guo cfd8498401 [MIR] Fix cyclic dependency of MIR formatter
Summary:
Move MIR formatter pointer from TargetMachine to TargetInstrInfo to
avoid cyclic dependency between target & codegen.

Reviewers: dsanders, bkramer, arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72485
2020-01-10 11:18:12 +01:00
Matt Arsenault f33f3d98e9 DAG: Don't use unchecked dyn_cast 2020-01-09 17:37:52 -05:00
Matt Arsenault 255cc5a760 CodeGen: Use LLT instead of EVT in getRegisterByName
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
2020-01-09 17:37:52 -05:00
Craig Topper b705fe5686 [TargetLowering][X86] Teach SimplifyDemandedBits to handle cases where only the sign bit is demanded from a SETCC and can be passed through
If we're doing a compare that only tests the sign bit and only the sign bit is demanded, we can just bypass the node. This removes one of the blend dependencies in our v2i64->v2f32 uint_to_fp codegen on pre-sse4.2 targets.

Differential Revision: https://reviews.llvm.org/D72356
2020-01-09 10:21:25 -08:00
Sanjay Patel cb5612e2df [DAGCombiner] reduce extract subvector of concat
If we are extracting a chunk of a vector that's a fraction of an
operand of the concatenated vector operand, we can extract directly
from one of those original operands.

This is another suggestion from PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024#c2

But I'm not sure yet if it will make any difference on those patterns.
It seems to help a few existing AVX512 tests though.

Differential Revision: https://reviews.llvm.org/D72361
2020-01-09 09:38:12 -05:00
QingShan Zhang d48ac7d54d [DAGCombine] Fold the (fma -x, y, -z) to -(fma x, y, z)
This is a positive combination as long as the NEG is NOT free,
as we are reducing the number of NEG from two to one.
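
A numeric spot check of the identity, as a sketch only. It assumes the default round-to-nearest mode, compares with `==` (which does not distinguish +0.0 from -0.0), and says nothing about how DAGCombiner decides whether the fold is profitable. The helper names are hypothetical.

```cpp
#include <cmath>

// Left-hand side of the fold: two negations feeding the fma.
double fmaNegOperands(double x, double y, double z) {
  return std::fma(-x, y, -z);
}

// Right-hand side of the fold: a single negation of the fma result.
double negFma(double x, double y, double z) {
  return -std::fma(x, y, z);
}
```

For example, `fmaNegOperands(1.5, 2.0, 0.25)` and `negFma(1.5, 2.0, 0.25)` both evaluate to `-3.25`.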

Differential Revision: https://reviews.llvm.org/D72312
2020-01-09 04:33:46 +00:00
Daniel Sanders de3d0ee023 Revert "Revert "[MIR] Target specific MIR formatting and parsing""
There was an unguarded dereference of MF in a function that permitted
nullptr. Fixed

This reverts commit 71d64f72f9.
2020-01-08 20:03:29 -08:00
Nico Weber 71d64f72f9 Revert "[MIR] Target specific MIR formatting and parsing"
This reverts commit 3ef05d85be.
It broke check-llvm on many bots, see comments on D69836.
2020-01-08 22:50:49 -05:00
Peng Guo 3ef05d85be [MIR] Target specific MIR formatting and parsing
Summary:
Added MIRFormatter for target specific MIR formatting and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.

* Target specific immediate mnemonics need to start with "." followed by
  an identifier string. When the MIR parser sees an immediate it will call
  the target specific parsing function.

* Custom pseudo source values need to start with custom followed by
  a double-quoted string. The MIR parser will pass the quoted string to the
  target specific PSV parsing function.

* MIRFormatter has 2 helper functions to facilitate LLVM value printing
  and parsing for custom PSVs if they refer to LLVM values.

Patch by Peng Guo

Reviewers: dsanders, arsenm

Reviewed By: dsanders

Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69836
2020-01-08 18:48:02 -08:00
Daniel Sanders 5ab6fa7b70 Revert "[MIR] Target specific MIR formatting and parsing"
Forgot to credit Peng in the commit message.

This reverts commit be841f89d0.
2020-01-08 18:48:02 -08:00
Peng Guo be841f89d0 [MIR] Target specific MIR formatting and parsing
Summary:
Added MIRFormatter for target specific MIR formatting and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.

* Target specific immediate mnemonics need to start with "." followed by
  an identifier string. When the MIR parser sees an immediate it will call
  the target specific parsing function.

* Custom pseudo source values need to start with custom followed by
  a double-quoted string. The MIR parser will pass the quoted string to the
  target specific PSV parsing function.

* MIRFormatter has 2 helper functions to facilitate LLVM value printing
  and parsing for custom PSVs if they refer to LLVM values.

Reviewers: dsanders, arsenm

Reviewed By: dsanders

Subscribers: wdng, jvesely, nhaehnle, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69836
2020-01-08 18:34:21 -08:00
Simon Pilgrim 108279948d [SelectionDAG] Use llvm::Optional<APInt> for FoldValue.
Use llvm::Optional<APInt> instead of std::pair<APInt, bool> with the bool second being used to report success/failure of fold.
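
The shape of the change can be sketched as follows. This is not the SelectionDAG code: `std::optional` (C++17) stands in for `llvm::Optional`, `uint32_t` stands in for `APInt`, and `foldUDivPair` / `foldUDiv` are hypothetical folds chosen only because division by zero gives a natural failure case.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Before: success or failure is reported through the bool in the pair, and
// the caller has to remember to check it before using the value.
std::pair<uint32_t, bool> foldUDivPair(uint32_t lhs, uint32_t rhs) {
  if (rhs == 0)
    return {0u, false};
  return {lhs / rhs, true};
}

// After: a missing value is the failure signal, so there is no separate flag
// that can get out of sync with the result.
std::optional<uint32_t> foldUDiv(uint32_t lhs, uint32_t rhs) {
  if (rhs == 0)
    return std::nullopt;
  return lhs / rhs;
}
```

A caller would then write something like `if (auto folded = foldUDiv(a, b)) use(*folded);`, where `use` is a placeholder for whatever consumes the folded constant.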
2020-01-08 16:09:24 +00:00
Sanjay Patel 780ba1f22b [DAGCombiner] clean up extract-of-concat fold; NFC
This hopes to improve readability and adds an assert.
The functional change noted by the TODO comment is
proposed in:
D72361
2020-01-08 10:15:33 -05:00
Bevin Hansson 8e2b44f7e0 [Intrinsic] Add fixed point division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:

  llvm.sdiv.fix.*
  llvm.udiv.fix.*

These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.

Patch by: ebevhan

Reviewers: bjope, leonardchan, efriedma, craig.topper

Reviewed By: craig.topper

Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70007
2020-01-08 15:17:46 +01:00
Wang, Pengfei 9a621de1ec [X86] Adding fp128 support for strict fcmp
Summary: Adding fp128 support for strict fcmp

Reviewers: craig.topper, LiuChen3, andrew.w.kaylor, RKSimon, uweigand

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71897
2020-01-08 12:59:31 +08:00
Bill Wendling e886e762dd Revert "Allow output constraints on "asm goto""
This reverts commit 52366088a8.

I accidentally pushed this before supporting changes.
2020-01-07 13:44:08 -08:00
Bill Wendling 52366088a8 Allow output constraints on "asm goto"
Summary:
Remove the restrictions that prevent "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.

Reviewers: jyknight, nickdesaulniers, hfinkel

Reviewed By: jyknight, nickdesaulniers

Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69876
2020-01-07 13:40:26 -08:00
Sanjay Patel 58e2e92a57 [DAGCombiner] reduce shuffle of concat of same vector
This is possibly a small part towards solving PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

The vectorizer is creating shuffles of concat like this:

%63 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
%64 = shufflevector <8 x i64> %63, <8 x i64> undef, <8 x i32> <i32 0, i32 4, i32 1, i32 5, i32 2, i32 6, i32 3, i32 7>

That might be fixable in the vectorizers, but we're not allowed to fold that into a single shuffle in instcombine,
so we should have a backend backstop to convert that into the likely simpler form:

%64 = shufflevector <4 x i64> %x, <4 x i64> undef, <8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>
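
How the two masks collapse into one can be sketched by composing them. This is a simplified model rather than the DAGCombiner logic: it assumes every mask index refers to the first operand (as in the example above, where the second operand is undef), and `composeMasks` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Composes two single-input shuffle masks: applying `inner` to a vector and
// then `outer` to that result selects the same lanes as applying the
// returned mask once to the original vector.
std::vector<int> composeMasks(const std::vector<int>& inner,
                              const std::vector<int>& outer) {
  std::vector<int> result;
  result.reserve(outer.size());
  for (int idx : outer)
    result.push_back(inner[static_cast<std::size_t>(idx)]);
  return result;
}
```

Composing the concat mask `<0, 1, 2, 3, 0, 1, 2, 3>` with the interleave mask `<0, 4, 1, 5, 2, 6, 3, 7>` gives `<0, 0, 1, 1, 2, 2, 3, 3>`, the single-shuffle form shown above.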

Differential Revision: https://reviews.llvm.org/D72300
2020-01-07 09:48:59 -05:00
Craig Topper 62f3403bfc [LegalizeTypes] Add widening support for STRICT_FSETCC/FSETCCS
This patch adds widening which really just scalarizes because we don't have a strategy for the extra elements we would need to pad with.

Differential Revision: https://reviews.llvm.org/D72193
2020-01-06 13:45:55 -08:00
Simon Pilgrim 6fa6000e3e [DAG] DAGCombiner::XformToShuffleWithZero - use APInt::extractBits helper. NFCI. 2020-01-06 13:17:02 +00:00
James Henderson d68904f957 [NFC] Fix trivial typos in comments
Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D72143

Patch by Kazuaki Ishizaki.
2020-01-06 10:50:26 +00:00
Craig Topper 19ace449a3 [TargetLowering] Use SETCC input type to call getBooleanContents instead of the setcc result type.
This isn't a functional change since we also check the bit width is the
same and the input type is integer. This guarantees the input and
output type are the same. But passing the input type makes the code
more readable.
2020-01-05 23:15:49 -08:00
QingShan Zhang b9780f4f80 [DAGCombine] Don't check the legality of type when combine the SIGN_EXTEND_INREG
This is the DAG node for SIGN_EXTEND_INREG :

t21: v4i32 = sign_extend_inreg t18, ValueType:ch:v4i16

It has two operands. The first one is the value it wants to extend, and the second
one is the type that specifies how to extend the value. In this example, it
sign-extends t18 (v4i32) from v4i16 to v4i32. That matches
the semantics of this C code:

vector int foo(vector int m) {
   return m << 16 >> 16;
}

And it could be any vector type for which the hardware supports the operation,
even though the type 'v4i16' is NOT legal for the target. When we try to combine
the srl + sra, what we currently do is call TLI.isOperationLegal(), which
also checks the legality of the type. That doesn't make sense.
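
For a single lane, the semantics described above can be sketched in portable C++. `signExtendInReg16` is a hypothetical scalar helper, not an LLVM API; it avoids shifting negative values by using the xor-and-subtract form instead of `<< 16 >> 16`.

```cpp
#include <cstdint>

// Sign-extends the low 16 bits of a 32-bit lane in place, which is what
// sign_extend_inreg with a 16-bit element type does per lane.
int32_t signExtendInReg16(int32_t m) {
  uint32_t low = static_cast<uint32_t>(m) & 0xFFFFu;
  // Flip the 16-bit sign bit, then subtract it back out: values with bit 15
  // set end up negative, all others are unchanged.
  return static_cast<int32_t>(low ^ 0x8000u) - 0x8000;
}
```

The upper 16 bits of the input are ignored, so `0x12348000` maps to `-32768` and `0x0000FFFF` maps to `-1`. The narrow type only describes which bits to extend from, which is why its legality as a standalone type is not the relevant question.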

Differential Revision: https://reviews.llvm.org/D70230
2020-01-06 03:00:58 +00:00
Craig Topper 4e37d60f2a [LegalizeVectorOps][X86] Enable expansion of vector fp_to_uint in LegalizeVectorOps to avoid scalarization.
The code here isn't great in all cases. Particularly v4f64->v4i32
on 64-bit AVX targets. But there is some improvement in some
configurations.

There are definitely some issues with computeNumSignBits with
X86ISD::STRICT_FCMP. As well as not being able to propagate sign
bits through merge_values nodes that get created during custom
legalization.
2020-01-04 19:18:54 -08:00
Craig Topper 16a67d252c [TargetLowering] In expandFP_TO_UINT, add proper extend or truncate for the condition to feed the DstVT select.
Previously, for vectors we created a vselect with a condition that
didn't match what the target wanted according to getSetCCResultType.

To make up for this, X86 had a special DAG combine to detect if
the condition was all sign bits and then insert its own truncate
or extend. By adding the extend/truncate here explicitly we can
avoid that.
2020-01-04 18:15:20 -08:00
Craig Topper 285d5e6b8b [LegalizeVectorOps] Split most of ExpandStrictFPOp into a separate UnrollStrictFPOp method. Call that method from ExpandUINT_TO_FLOAT.
ExpandStrictFPOp calls ExpandUINT_TO_FLOAT. Previously, ExpandUINT_TO_FLOAT
returned SDValue() if it wasn't able to handle and needed to unroll.
Then ExpandStrictFPOp would detect this SDValue() and do the unroll.

After this change, ExpandUINT_TO_FLOAT will directly call
UnrollStrictFPOp and return the unrolled result.
2020-01-04 17:03:50 -08:00
Simon Pilgrim eb0e1978df [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT (REAPPLIED)
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen is to be refactored in the near-term, so this isn't considered a major issue.

Reapplied after reversion at rL368660 due to PR42982 which was fixed at rGca7fdd41bda0.

Differential Revision: https://reviews.llvm.org/D65887
2020-01-04 13:15:50 +00:00
Sanjay Patel ca7fdd41bd [DAGCombiner] fix miscompile in translating (X & undef) to shuffle
See PR42982 for more context:
https://bugs.llvm.org/show_bug.cgi?id=42982
2020-01-03 14:58:49 -05:00
Craig Topper 7cdc60c3db [LegalizeVectorOps] Pass the post-UpdateNodeOperands version of Op to ExpandLoad/ExpandStore
UpdateNodeOperands might CSE to another existing node. So we should make sure we're legalizing that node otherwise we might fail to hook up the operands properly. I've moved the result registration up to the caller to avoid having to pass both Result and Op into the functions where it might be confusing which is which.

This addresses 2 other issues pointed out in D71861.

Differential Revision: https://reviews.llvm.org/D72021
2020-01-03 11:53:08 -08:00
Reid Kleckner 9c2b72821b Move tail call disabling code to target independent code
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).

There's no major functionality change, except for targets that never
implemented this check.

This LLVM attribute was originally added in
d9699bc7bd (2015).

Reviewers: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D72118
2020-01-03 11:27:41 -08:00
Roman Lebedev 0727e2b90c [DAGCombiner][X86][AArch64] Generalize `A-(A&B)`->`A&(~B)` fold (PR44448)
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)'

Even if we don't manage to fold `~` into B,
we have likely formed an `ANDN` node.
Also, this way there are fewer similar-but-duplicate folds.

Name: X - (X & Y)  ->  X & (~Y)
%o = and i32 %X, %Y
%r = sub i32 %X, %o
  =>
%n = xor i32 %Y, -1
%r = and i32 %X, %n

https://rise4fun.com/Alive/kOUl
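
As a complement to the Alive proof, the identity can be checked by brute force on small values. This is only a sketch with hypothetical helper names; it tests all pairs of 8-bit values, whereas the proof linked above covers the general case.

```cpp
#include <cstdint>

// Original pattern: X - (X & Y).
uint32_t subOfAnd(uint32_t a, uint32_t b) { return a - (a & b); }

// Folded pattern: X & ~Y.
uint32_t andNot(uint32_t a, uint32_t b) { return a & ~b; }

// Exhaustively compares both forms over all pairs of 8-bit values.
bool foldHoldsFor8Bit() {
  for (uint32_t a = 0; a < 256; ++a)
    for (uint32_t b = 0; b < 256; ++b)
      if (subOfAnd(a, b) != andNot(a, b))
        return false;
  return true;
}
```

The identity holds because `X & Y` only contains bits that are already set in `X`, so the subtraction never borrows and simply clears those bits.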

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:47 +03:00
Roman Lebedev 86403c0ff8 [DAGCombiner] `~(add X, -1)` -> `neg X` fold
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)',
but we currently fail to sink that '~' into `(B - 1)`.

Name: ~(X - 1)  ->  (0 - X)
%o = add i32 %X, -1
%r = xor i32 %o, -1
  =>
%r = sub i32 0, %X

https://rise4fun.com/Alive/rjU
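
The same identity on unsigned 32-bit values, as a short sketch with hypothetical helper names. Unsigned arithmetic is used so that wraparound is well defined in C++.

```cpp
#include <cstdint>

// Original pattern: ~(X - 1), written in the IR as xor (add X, -1), -1.
uint32_t notOfDecrement(uint32_t x) { return ~(x - 1u); }

// Folded pattern: 0 - X.
uint32_t negate(uint32_t x) { return 0u - x; }
```

This follows from the two's complement rule `-v == ~v + 1`: substituting `v = X - 1` gives `~(X - 1) == -(X - 1) - 1 == -X`.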
2020-01-03 17:55:46 +03:00
Roman Lebedev 3d492d7503 [DAGCombine][X86][Thumb2/LowOverheadLoops] `A - (A & C)` -> `A & (~C)` fold (PR44448)
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.

There is the @llvm.ptrmask() intrinsic, which may or may not be helpful,
but I'm not sure it is fully considered canonical yet;
likely not everything is fully aware of it.

Name: PR44448  ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
  =>
%r = and i32 %ptr, ~C

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 17:55:45 +03:00
Roman Lebedev 1711be78f7 [NFC][DAGCombine] Clarify comment for 'A - (A & (B - 1))' fold 2020-01-03 17:55:42 +03:00
Roman Lebedev 8dab0a4a7d [DAGCombine][X86][AArch64] 'A - (A & (B - 1))' -> 'A & (0 - B)' fold (PR44448)
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.

There is the @llvm.ptrmask() intrinsic, which may or may not be helpful,
but I'm not sure it is fully considered canonical yet;
likely not everything is fully aware of it.

https://rise4fun.com/Alive/ZVdp

Name: ptr - (ptr & (alignment-1))  ->  ptr & (0 - alignment)
  %mask = add i64 %alignment, -1
  %bias = and i64 %ptr, %mask
  %r = sub i64 %ptr, %bias
=>
  %highbitmask = sub i64 0, %alignment
  %r = and i64 %ptr, %highbitmask
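
In source terms this is the familiar "align down" idiom. The sketch below mirrors the two IR sequences with hypothetical helper names and treats the pointer as a plain 64-bit integer; rounding down to a multiple is only meaningful when `alignment` is a power of two.

```cpp
#include <cstdint>

// Original pattern: ptr - (ptr & (alignment - 1)).
uint64_t alignDownViaBias(uint64_t ptr, uint64_t alignment) {
  uint64_t mask = alignment - 1;  // low bits that must be cleared
  uint64_t bias = ptr & mask;     // distance above the aligned address
  return ptr - bias;
}

// Folded pattern: ptr & (0 - alignment).
uint64_t alignDownViaMask(uint64_t ptr, uint64_t alignment) {
  uint64_t highBitMask = uint64_t{0} - alignment;
  return ptr & highBitMask;
}
```

Both functions return `0x1230` for `ptr = 0x1237` and `alignment = 16`, but the folded form needs one operation fewer.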

See
  https://bugs.llvm.org/show_bug.cgi?id=44448
  https://reviews.llvm.org/D71499
2020-01-03 13:58:36 +03:00
Matt Arsenault 0d9f919b73 DAG: Use TargetConstant for FENCE operands 2020-01-02 17:16:10 -05:00
Fangrui Song 87fb204e8f [SelectionDAG] Simplify SelectionDAGBuilder::visitInlineAsm 2020-01-02 09:44:23 -08:00
Ulrich Weigand 63336795f0 [FPEnv] Default NoFPExcept SDNodeFlag to false
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.

This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.

In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of original nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.

To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:

- For machine nodes, check the mayRaiseFPException property of
  the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode

The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there is a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)
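
Why a false default is the safe choice can be sketched with a toy flag set. `NodeFlags`, `intersectFlags` and `mayRaiseFPException` are hypothetical stand-ins written for this example, not the SDNodeFlags API; only the one flag discussed here is modelled.

```cpp
// Hypothetical stand-in for a node's flag set.
struct NodeFlags {
  // Conservative default: a node is assumed to possibly raise an FP
  // exception unless something explicitly says otherwise.
  bool NoFPExcept = false;
};

// Keeps only the flags that are set on both inputs, in the spirit of the
// intersectFlagsWith call mentioned above.
NodeFlags intersectFlags(NodeFlags a, NodeFlags b) {
  NodeFlags result;
  result.NoFPExcept = a.NoFPExcept && b.NoFPExcept;
  return result;
}

// A node may raise an FP exception unless it is explicitly marked.
bool mayRaiseFPException(NodeFlags flags) { return !flags.NoFPExcept; }
```

With this default, a transform that creates a new node and forgets to copy the flags produces a node that is treated as possibly raising exceptions. That can cost an optimization, but it cannot make the code incorrect.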

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71841
2020-01-02 16:59:45 +01:00
Mark de Wever 8dc7b982b4 [NFC] Fixes -Wrange-loop-analysis warnings
This avoids new warnings now that D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71857
2020-01-01 20:01:37 +01:00
Matt Arsenault 4d7201e7b9 DAG: Stop trying to fold FP -(x-y) -> y-x in getNode with nsz
This was increasing the number of instructions when fsub was legalized
on AMDGPU with no signed zeros enabled. This fold should be guarded by
hasOneUse, and I don't think getNode should be doing that. The same
fold is already done as a regular combine through isNegatibleForFree.

This does require duplicating, even though isNegatibleForFree does
this combine already (and properly checks hasOneUse) to avoid one PPC
regression. In the regression, the outer fneg has nsz but the fsub
operand does not. isNegatibleForFree only sees the operand, and
doesn't see it's used from a nsz context. A nsz parameter needs to be
added and threaded through isNegatibleForFree to avoid this.
2019-12-31 22:49:51 -05:00
Craig Topper 4ae3120ed8 [LegalizeVectorOps][AArch64] Stop asking for v4f16 fp_round and fp_extend to be promoted.
These operations are needed as building blocks for promoting so they
can't be promoted themselves.

This appeared to work because the fp_extend query type for operation
actions is the result type, not the input type so it never triggered
in the legalizer.

For fp_round, the vector op legalizer just ended up creating a
nop fp_extend that was elided by getNode, followed by a nop
fp_round that was also elided by getNode. This was followed by
a final fp_round from v4f32 back to v4f16 which was CSEd to the
original node. Then legalize vector ops just believed that node
legalized to itself. LegalizeDAG took another crack at promoting
it, but didn't have a handler so just skipped it with a debug
message saying it wasn't promoted.

This patch just removes the operation actions to avoid this
nonsense. Found while trying to refactor LegalizeVectorOps to
handle multiple result nodes better.
2019-12-31 15:04:12 -08:00
Craig Topper 787e078f3e [TargetLowering][AMDGPU] Make scalarizeVectorLoad return a pair of SDValues instead of creating a MERGE_VALUES node. NFCI
This allows us to clean up some places that were peeking through
the MERGE_VALUES node after the call. By returning the SDValues
directly, we can clean that up.

Unfortunately, there are several call sites in AMDGPU that wanted
the MERGE_VALUES and now need to create their own.
2019-12-30 19:36:04 -08:00
Fangrui Song 6f9b4c6826 [SelectionDAG] Simplify SelectionDAGBuilder::visitInlineAsm
Indirect C_Immediate or C_Other constraints have been excluded.

Also simplify an unneeded change to indirect 'X' by D60942.
2019-12-29 20:53:30 -08:00
Fangrui Song 5edb40c022 [SelectionDAG] Disallow indirect "i" constraint
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.

They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
2019-12-29 16:50:42 -08:00
Simon Pilgrim 34769e0783 SimplifyDemandedBits - Remove duplicate getOperand() call. NFC.
Pulled out from D56387 - cleanup variable names, move shift amount legalization inside if() of its only user and remove duplicate getOperand() call.
2019-12-28 16:42:50 +00:00
Craig Topper a3f8964813 [TargetLowering] Update comment to reference the correct compiler-rt function the code is based on. NFC 2019-12-27 22:49:04 -08:00
Fangrui Song 044cc919f4 Delete setjmp_undefined_for_msvc workaround after llvm.setjmp was removed 2019-12-27 18:09:22 -08:00
Fangrui Song 7a7334663c Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821
Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp

*wipes tear*
2019-12-27 00:00:14 -08:00
Craig Topper 53ee806d93 [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict.
The float libcalls are inlined in MSVC's math header where they
just cast to double and use the double libcall. Do the same when
we emit libcalls.
2019-12-26 20:22:24 -08:00
Kristina Bessonova cdd25a4c74 [DebugInfo][SelectionDAG] Change order while transferring SDDbgValue to another node
SelectionDAG::transferDbgValues() can 'reattach' SDDbgValue from one to
another node, but doesn't change its source order. If the destination node has
the order greater than the SDDbgValue, there are two possible issues
revealed later:

* If debug info is attached to an instruction that is the first definition
of a register, this ends up with a def-after-use and the debug info
gets 'undef' later.

* If MIR has another definition of a register above the debug info,
the debug info may represent a source variable incorrectly because
it appears (significantly) before the instruction corresponding
to this debug info.

So, the patch changes the order of an SDDbgValue when it is moved
to a node with greater order.

Reviewers: dblaikie, jmorse, aprantl

Reviewed By: aprantl

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71175
2019-12-26 21:01:59 +03:00
Wang, Pengfei 472bded3ed [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend

Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71871
2019-12-26 08:15:13 +08:00
Fangrui Song e0d855b399 [SelectionDAG] Change SelectionDAGISel::{funcInfo,SDB} to use unique_ptr
CurDAG is referenced more than 2000 times and used in many generated .cpp
files. Don't touch it for now.
2019-12-23 22:41:05 -08:00
Fangrui Song 01b98e6fd5 [SelectionDAG] Don't repeatedly add a node to the worklist in ComputeLiveOutVRegInfo. NFC
For the sqlite3 amalgamation, this decreases the number of Worklist.push_back calls (603084) by 10%.
2019-12-23 22:04:14 -08:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, including [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.
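
As an illustration of the STRICT_UINT_TO_FP point above, here is a sketch of one well-known way to cover the unsigned 64-bit range with a single signed conversion: select the input before converting, rather than selecting between two converted results. This models the arithmetic only and is not necessarily the exact node sequence the patch emits; `sint_to_fp` and `uint_to_fp` are hypothetical stand-ins.

```python
MASK64 = (1 << 64) - 1

def sint_to_fp(v):
    """Stand-in for one signed i64 -> f64 conversion (correctly rounded)."""
    assert -(1 << 63) <= v < (1 << 63)
    return float(v)

def uint_to_fp(x):
    """Convert an unsigned 64-bit integer using exactly one signed conversion."""
    assert 0 <= x <= MASK64
    top_bit_set = x >= (1 << 63)
    # Select the *input* first. Large values are halved, and the bit shifted
    # out is OR-ed back in as a sticky bit so the rounding decision is kept.
    halved = (x >> 1) | (x & 1)
    f = sint_to_fp(halved if top_bit_set else x)
    # Doubling a double is exact here, so it undoes the halving.
    return f + f if top_bit_set else f
```

Because only one conversion is performed, there is no unused second conversion that could raise a spurious exception.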

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Carl Ritson 2791667d2e [DAGCombiner] Check term use before applying aggressive FSUB optimisations
Summary:
Without this check unnecessary FMA instructions are generated when the FSUB terms are reused.
This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation.
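
To illustrate the precision remark, the sketch below compares a product that is rounded before the subtraction with a fused multiply-subtract that rounds only once. The fused form is emulated with exact rational arithmetic; the function names and input values are chosen for illustration and are not taken from the patch or its tests.

```python
from fractions import Fraction

def fused_mul_sub(a, b, c):
    """a*b - c with a single rounding at the end (emulates an FMA)."""
    return float(Fraction(a) * Fraction(b) - Fraction(c))

def unfused_mul_sub(a, b, c):
    """a*b - c where the product is rounded to double before subtracting."""
    return a * b - c

a = 1.0 + 2.0**-27
p = a * a   # the rounded product, as another user of the same term would see it
```

With these inputs, `unfused_mul_sub(a, a, p)` is exactly zero, while `fused_mul_sub(a, a, p)` is a small nonzero residue, so two uses of the "same" product disagree.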

Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel

Reviewed By: arsenm

Subscribers: jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71656
2019-12-23 09:37:58 +09:00
Valentin Churavy fb0ccff6e5 [SelectionDAG] Copy FP flags when visiting a binary instruction.
Summary:
We noticed in Julia that the sequence below no longer turned into
a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6.

```
    %29 = fmul contract <4 x double> %wide.load, %wide.load16
    %30 = fmul contract <4 x double> %wide.load13, %wide.load17
    %31 = fmul contract <4 x double> %wide.load14, %wide.load18
    %32 = fmul contract <4 x double> %wide.load15, %wide.load19
    %33 = fadd fast <4 x double> %vec.phi, %29
    %34 = fadd fast <4 x double> %vec.phi10, %30
    %35 = fadd fast <4 x double> %vec.phi11, %31
    %36 = fadd fast <4 x double> %vec.phi12, %32
```

Unlike Clang, Julia doesn't set the `unsafe-fp-math=true` function
attribute, but rather emits more local instruction flags.

This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further.

Reviewers: spatel, mcberg2017

Reviewed By: spatel

Subscribers: chriselrod, merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71495
2019-12-22 14:29:36 -05:00
Craig Topper e6e23a24be [LegalizeDAG] Add return to the strict node handling in PromoteLegalINT_TO_FP to prevent an invalid strict fp node from being created by falling into non-strict code path. 2019-12-19 11:39:50 -08:00
Liu, Chen3 2f932b5729 Enable STRICT_FP_TO_SINT/UINT on X86 backend
This patch is mainly for custom lowering the vector operation.

Differential Revision: https://reviews.llvm.org/D71592
2019-12-19 14:49:13 +08:00
Ulrich Weigand 1946461344 [FPEnv] Strict versions of llvm.minimum/llvm.maximum
Add new intrinsics
   llvm.experimental.constrained.minimum
   llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.

Includes SystemZ back-end support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71624
2019-12-18 21:35:28 +01:00
Craig Topper cfe316007f [SelectionDAGBuilder] Use getConstant instead of getTargetConstant to build the offset for struct types in getUniformBase.
getTargetConstant prevents any optimizations from operating on the
value and basically says it has already been iseled. But since we
want the index to be in a register, this isn't true.

Prior to this we were generating a vbroadcast with an immediate
argument which is illegal and was flagged by the expensive checks
bot.
2019-12-18 10:44:28 -08:00
stozer 89d19d60ad Reapply: [DebugInfo] Correctly handle salvaged casts and split fragments at ISel
This reverts commit 1f3dd83cc1, reapplying
commit bb1b0bc4e5.

The original commit failed on some builds seemingly due to the use of a
bracketed constructor with an std::array, i.e. `std::array<> arr({...})`.
2019-12-18 16:26:42 +00:00
stozer 1f3dd83cc1 Revert "[DebugInfo] Correctly handle salvaged casts and split fragments at ISel"
Reverted due to build failure on windows bots.

This reverts commit bb1b0bc4e5.
2019-12-18 11:46:10 +00:00
stozer bb1b0bc4e5 [DebugInfo] Correctly handle salvaged casts and split fragments at ISel
Previously, LLVM had no functional way of performing casts inside of a
DIExpression(), which made salvaging cast instructions other than Noop
casts impossible. This patch enables the salvaging of casts by using the
DW_OP_LLVM_convert operator for SExt and Trunc instructions.

There is another issue which is exposed by this fix, in which fragment
DIExpressions (which are preserved more readily by this patch) for
values that must be split across registers in ISel trigger an assertion,
as the 'split' fragments extend beyond the bounds of the fragment
DIExpression causing an error. This patch also fixes this issue by
checking the fragment status of DIExpressions which are to be split, and
dropping fragments that are invalid.
2019-12-18 11:09:18 +00:00
Wang, Pengfei 8cc0b58673 [X86] Add calculation for elements in structures in getting uniform base for the Gather/Scatter intrinsic.
Summary: Add calculation for elements in structures in getting uniform
base for the Gather/Scatter intrinsic.

Reviewers: craig.topper, c-rhodes, RKSimon

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71442
2019-12-18 12:24:58 +08:00
Craig Topper c36773c78e [FPEnv][LegalizeTypes] Make ScalarizeVecOp_STRICT_FP_ROUND do its own replacements and return SDValue()
The caller will assert for nodes with more than 2 results unless
we return a null SDValue.

I tried to test this by copying an AArch64 test for ScalarizeVecOp_FP_ROUND.
While it did hit the assert, and this commit fixed that, it also
hit a later problem that couldn't be fixed without adding strict
FP support to AArch64.
2019-12-17 15:17:43 -08:00
Craig Topper 84d8fa30f9 [FPEnv][LegalizeTypes][LegalizeDAG][AArch64] Few fixes/improvements for legalizing fp<->int conversion nodes.
This started with adding a test to get code coverage on
ScalarizeVecOp_UnaryOp_StrictFP by copying an existing AArch64 test
and using constrained sitofp/uitofp intrinsics.

This found 3 separate issues:
-ScalarizeVecOp_UnaryOp_StrictFP needs to do its own replacement
 because the caller can't handle replacing multiple results.
-Missing integer promotion support for sitofp/uitofp
-Chain result not always assigned in ExpandLegalINT_TO_FP.

Committing them together so I can add the test case.
2019-12-17 14:37:00 -08:00
Sanjay Patel 6a77e36975 [SDAG] adjust isNegatibleForFree calculation to avoid crashing
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets to show the problem more
generally.

We check the number of uses as a predicate for whether some value is free to negate,
but that use count can change as we rewrite the expression in getNegatedExpression().
So something that was marked free to negate during the cost evaluation phase becomes
not free to negate during the rewrite phase (or the inverse - something that was not
free becomes free). This can lead to a crash/assert because we expect that everything
in an expression that is negatible to be handled in the corresponding code within
getNegatedExpression().

This patch adds a hack to work-around the case where we probably no longer detect
that either multiply operand of an FMA isNegatibleForFree which is assumed to be
true when we started rewriting the expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-17 13:49:15 -05:00
Sanjay Patel 5b0251da1c Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit 36b1232ec5.
Need to adjust commit message - that was a leftover from the earlier version.
2019-12-17 13:47:59 -05:00
Sanjay Patel 36b1232ec5 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets to show the problem more
generally.

We check the number of uses as a predicate for whether some value is free to negate,
but that use count can change as we rewrite the expression in getNegatedExpression().
So something that was marked free to negate during the cost evaluation phase becomes
not free to negate during the rewrite phase (or the inverse - something that was not
free becomes free). This can lead to a crash/assert because we expect that everything
in an expression that is negatible to be handled in the corresponding code within
getNegatedExpression().

This patch adds a hack to work-around the case where we probably no longer detect
that either multiply operand of an FMA isNegatibleForFree which is assumed to be
true when we started rewriting the expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-17 13:46:06 -05:00
Amaury Séchet ff6567cc77 [DAGCombiner] Add node back in the worklist in topological order in CommitTargetLoweringOpt
Summary:
Right now, DAGCombiner processes the nodes in an implementation-defined order. This tends to be fragile, as an optimisation may or may not kick in depending on the traversal order.

This is part of a larger effort to get the DAGCombiner to process its nodes in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70921
2019-12-17 18:26:16 +01:00
Kevin P. Neal b1d8576b0a This adds constrained intrinsics for the signed and unsigned conversions
of integers to floating point.

This includes some of Craig Topper's changes for promotion support from
D71130.

Differential Revision: https://reviews.llvm.org/D69275
2019-12-17 10:06:51 -05:00
Craig Topper 13ce7c1291 [LegalizeTypes] Pre-size the SmallVectors in ScalarizeVecRes_StrictFPOp and SplitVecRes_StrictFPOp so we don't have to call push_back. NFCI
This avoids grow checking/handling in each iteration of the loop.
2019-12-16 23:42:13 -08:00
Craig Topper c738ebc1f5 [LegalizeTypes] Remove ScalarizeVecRes_STRICT_FP_ROUND in favor of just using ScalarizeVecRes_StrictFPOp. NFCI
It looks like ScalarizeVecRes_StrictFPOp can handle a variable
number of arguments with scalar and vector types so it should
be sufficient.
2019-12-16 23:42:13 -08:00
Craig Topper c4d2bb1ede [LegalizeTypes] Remove the call to SplitVecRes_UnaryOp from SplitVecRes_StrictFPOp. NFCI
It doesn't seem to do anything that SplitVecRes_StrictFPOp can't
do. SplitVecRes_StrictFPOp already handles nodes with a variable
number of arguments and a mix of scalar and vector arguments.
2019-12-16 23:42:13 -08:00
Craig Topper 4e48513b47 [SelectionDAG] Add the fpexcept flag to the SelectionDAG dumping output so we can better see when it's not propagating.
We're currently losing this flag in type legalization and probably
other places when we expand strict fp nodes. This will make
reading logs easier.
2019-12-16 18:05:11 -08:00
Sanjay Patel 2afe864118 [DAG] Add SimplifyDemandedBits support for BSWAP
This exposes a shortcoming for AArch64, and that is tracked by PR40881:
https://bugs.llvm.org/show_bug.cgi?id=40881

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D58017
2019-12-15 08:52:34 -05:00
Craig Topper 1dc0c8af5e [LegalizeTypes] Teach BitcastToInt_ATOMIC_SWAP to only create FP16_TO_FP when called from PromoteFloatResult.
There's also a call from SoftenFloatResult that should not be promoted.

The changed test case would fail with the new RUN line prior to
this change.
2019-12-14 15:05:32 -08:00
Craig Topper 95ce8f9498 [LegalizeTypes] In PromoteFloatOp_SETCC, don't bother querying for transforming the result type.
The result type is already legal, so it doesn't need to be
transformed.
2019-12-14 15:05:32 -08:00
Alex Richardson 11448eeb72 [NFC] Use SelectionDAG::getMemBasePlusOffset() instead of getNode(ISD::ADD)
Summary:
To find potential opportunities to use getMemBasePlusOffset() I looked at
all ISD::ADD uses found with the regex getNode\(ISD::ADD,.+,.+Ptr
in lib/CodeGen/SelectionDAG. If this patch is accepted I will convert
the files in the individual backends too.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR)
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71207
2019-12-13 21:40:03 +00:00
Alex Richardson fc83f53a86 [NFC] Implement SelectionDAG::getObjectPtrOffset() using getMemBasePlusOffset()
Summary:
This change is preparatory work to use this helper function in more places.
In order to make this change, getMemBasePlusOffset() has been extended to
also take a SDNodeFlags parameter.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR)
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71206
2019-12-13 21:40:03 +00:00
Alex Richardson ea8888d1af [NFC] Add a SDValue overload for SelectionDAG::getMemBasePlusOffset()
Summary:
This change is preparatory work to use this helper function in more places.
Currently the function only allows integer constant offsets, but there
are cases where we can use an existing SDValue parameter.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR)
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel, craig.topper

Reviewed By: spatel, craig.topper

Subscribers: craig.topper, merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71205
2019-12-13 21:40:03 +00:00
Alex Richardson d9bb70acd7 [NFC] Change SelectionDAG::getMemBasePlusOffset() to use int64_t
Summary:
This change is preparatory work to use this helper function in more places.
Currently the function only allows positive offsets, but there are cases
where we want to subtract an offset from an existing pointer.

The motivation for this change is our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project). We use a separate register
type to store pointers (128-bit capabilities, which are effectively
unforgeable and monotonic fat pointers). These capabilities permit a
reduced set of operations and therefore use a separate ValueType (iFATPTR)
to represent pointers implemented as capabilities.
Therefore, we need to avoid using ISD::ADD for our patterns that operate
on pointers and need to use a function that chooses ISD::ADD or a new
ISD::PTRADD opcode depending on the value type.

We originally added a new DAG.getPointerAdd() function, but after this
patch series we can modify the implementation of getMemBasePlusOffset()
instead. Avoiding direct uses of ISD::ADD for pointer types will
significantly reduce the amount of assertion/instruction selection
failures for us in future upstream merges.

Reviewers: spatel

Reviewed By: spatel

Subscribers: merge_guards_bot, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71204
2019-12-13 21:40:03 +00:00
Sanjay Patel 2f0c7fd2db [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc (2nd try)
The initial attempt (rG8963332c33) botched the logic by reversing
the source/dest types. Added x86 tests for additional coverage.
The vector tests show a potential improvement (fold vector load
instead of broadcasting), but that's a known/existing problem.

This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
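
The two Alive patterns above can be checked on concrete values with a small model of the IR operations. This is a sanity-check sketch only; the helper names (`lshr`, `trunc`, `before`, `after_general`, `after_special`) are hypothetical and it does not replace the Alive proof.

```python
def lshr(value, amount, bits):
    """Logical shift right of a `bits`-wide unsigned value."""
    return (value & ((1 << bits) - 1)) >> amount

def trunc(value, bits):
    """Keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)

def before(x, c1, c2):
    """lshr i64, trunc to i16, lshr i16 (the original sequence)."""
    s = lshr(x, c1, 64)
    t = trunc(s, 16)
    return lshr(t, c2, 16)

def after_general(x, c1, c2):
    """Single wide shift, mask, then trunc (requires c1 + c2 < 64)."""
    s2 = lshr(x, c1 + c2, 64)
    a = s2 & ((1 << (16 - c2)) - 1)
    return trunc(a, 16)

def after_special(x, c2):
    """c1 == 48: the wide shift already clears the high bits, so no mask."""
    s2 = lshr(x, 48 + c2, 64)
    return trunc(s2, 16)
```

For every 64-bit `x`, `before(x, c1, c2)` and `after_general(x, c1, c2)` agree whenever `c2 < 16` and `c1 + c2 < 64`.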
2019-12-13 14:03:54 -05:00
Nicola Zaghen 97572775d2 Reland [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered.

This fixes the buildbot failures.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-13 14:30:21 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
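
The reason the integer and floating-point inverses differ can be shown with plain comparisons. The sketch below does not model LLVM's condition-code encoding; the two function names are hypothetical. For integers, the inverse of "less than" is "greater or equal". For floating point, the inverse must also be true when either operand is NaN.

```python
import math

def int_inverse_of_less_than(a, b):
    """Inverse of a < b for integers (and for pointers compared as integers)."""
    return a >= b

def fp_inverse_of_less_than(a, b):
    """Inverse of an ordered a < b for floats: unordered OR a >= b."""
    return math.isnan(a) or math.isnan(b) or a >= b
```

Picking the wrong family of inverse for a type therefore changes the answer, which is why the type itself, rather than a boolean derived from `isInteger()`, is the safer input.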

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Kerry McLaughlin 4194ca8e5a Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
Craig Topper 5c80a4f454 [LegalizeTypes] Remove unnecessary if before calling ReplaceValueWith on the chain in SoftenFloatRes_LOAD.
I believe this is a leftover from when fp128 was softened to fp128
on X86-64. In that case type legalization must have been able to
create a load that was the same as N which would make this
replacement fail or assert. Since we no longer do that, this
check should be unneeded.
2019-12-13 00:14:41 -08:00
Sanjay Patel 9432937190 Revert "[DAGCombiner] fold shift-trunc-shift to shift-mask-trunc"
This reverts commit 8963332c33.
There was a logic bug typo in this code, but it wasn't visible in the asm for the tests.
2019-12-12 16:24:40 -05:00
Sanjay Patel 8963332c33 [DAGCombiner] fold shift-trunc-shift to shift-mask-trunc
This fold is done in IR by instcombine, and we have a special
form of it already here in DAGCombiner, but we want the more
general transform too:
https://rise4fun.com/Alive/3jZm

Name: general
Pre: (C1 + zext(C2) < 64)
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%a = and i64 %s2, zext((1 << (16 - C2)) - 1)
%r = trunc %a to i16

Name: special
Pre: C1 == 48
%s = lshr i64 %x, C1
%t = trunc i64 %s to i16
%r = lshr i16 %t, C2
=>
%s2 = lshr i64 %x, C1 + zext(C2)
%r = trunc %s2 to i16

...because D58017 exposes a regression without this fold.
2019-12-12 15:44:13 -05:00
Sanjay Patel b39009bf1d [DAGCombiner] improve readability
This is not quite NFC because I changed the SDLoc to use the more
standard 'N' (the starting node for the fold).

This transform is a special-case of a more general fold that we
do in IR, but it seems like the general fold is needed here too
to avoid a potential regression seen in D58017.

https://rise4fun.com/Alive/3jZm
2019-12-12 13:16:50 -05:00
stozer e39e2b4a79 [DebugInfo] Prevent invalid fragments at ISel from dropping debug info
During SelectionDAG, if a value which is associated with a DBG_VALUE
needs to be split across multiple registers, the DBG_VALUE will be split
into a set of fragment expressions to recreate the original value.

If one or more of these fragments cannot be created, they would
previously be silently dropped, causing the old debug value to live past
its expiry date. This patch fixes this issue by keeping invalid
fragments while setting their value as Undef.

Differential revision: https://reviews.llvm.org/D70248
2019-12-12 12:28:39 +00:00
Nicola Zaghen f798eb21ec Temporarily Revert "[DataLayout] Fix occurrences that size and range of pointers are assumed to be the same."
This reverts commit 5f6208778f.

This caused failures in Transforms/PhaseOrdering/scev-custom-dl.ll
const: Assertion `getBitWidth() == CR.getBitWidth() && "ConstantRange types don't agree!"' failed.
2019-12-12 10:29:54 +00:00
Nicola Zaghen 5f6208778f [DataLayout] Fix occurrences that size and range of pointers are assumed to be the same.
GEP index size can be specified in the DataLayout, introduced in D42123. However, there were still places
in which getIndexSizeInBits was used interchangeably with getPointerSizeInBits. This notably caused issues
with Instcombine's visitPtrToInt; but the unit tests were incorrect, so this remained undiscovered.

Differential Revision: https://reviews.llvm.org/D68328

Patch by Joseph Faulls!
2019-12-12 10:07:01 +00:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Sanjay Patel cdf5cfea8e Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit d1f0bdf2d2.
The patch can cause infinite loops in DAGCombiner.
2019-12-11 16:56:58 -05:00
Craig Topper 4b452952fe [LegalizeTypes] In SoftenFloatRes_FP_EXTEND, move the check for input already being promoted above the check for fp16 converting to something other than fp32.
The fp16 to larger-than-fp32 case inserts an extend that needs to be
re-legalized if fp16 is promoted. But if we check for fp16
promotion first, then we can avoid emitting the fp_extend
altogether.
2019-12-11 12:48:08 -08:00
Sanjay Patel d1f0bdf2d2 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.

We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().

This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-11 13:30:39 -05:00
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
Craig Topper d4345636e6 [LegalizeTypes] Remove manual worklist management from SoftenFloatRes_FP_EXTEND.
I think this is no longer needed. The system should take care
of legalizing any new nodes that are added. I think this might
have been needed prior to r371709 or r307053.
2019-12-10 22:33:31 -08:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow-up to D69281; it enables X86 backend support for FP comparison.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Mikael Holmen 4763267eee [LegalizeTypes] Bugfixes for big-endian targets when handling BITCASTs
Summary:
This fixes PR44135.

The special case when we promote a bitcast from a vector to an int
needs special handling when we are on a big-endian target.

Prior to this fix, for the added vec_to_int we see the following in the
SelectionDAG printouts

Type-legalized selection DAG: %bb.1 'foo:bb.1'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
      t17: v4i32 = bitcast t2
    t23: i32 = extract_vector_elt t17, Constant:i32<3>
  t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23
  t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

and I think here the extract_vector_elt is wrong and extracts the value
from the wrong index.

The program should return the 32 bits made up of the elements at
index 4 and 5 in the vec6 array, but with

    t23: i32 = extract_vector_elt t17, Constant:i32<3>

as far as I can tell, we will extract values that originally didn't even
exist in the vec6 vector.

If we would instead extract the element at index 2 we would get the wanted
values.

With this fix we insert a right shift after the bitcast in
DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us

Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
      t23: v4i32 = bitcast t2
    t27: i32 = extract_vector_elt t23, Constant:i32<2>
  t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27
  t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

So now we get

    t27: i32 = extract_vector_elt t23, Constant:i32<2>

which is what we want.

Similarly, the new int_to_vec testcase exposes a bug where we cast the other
direction. Then we instead need to add a left shift before the bitcast on
big-endian targets for the bits in the input integer to end up at the expected
place in the vector.
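
The index arithmetic in the vec_to_int case can be checked with a byte-level model of a big-endian bitcast (store the vector to memory, reload it with the new element type). This is a sketch that assumes the source is a six-element i16 vector widened to v8i16 with two padding lanes; the function name and the sample values are hypothetical.

```python
import struct

def bitcast_v8i16_to_v4i32_be(elems):
    """Reinterpret eight big-endian u16 lanes as four big-endian u32 lanes."""
    raw = struct.pack(">8H", *elems)        # lane 0 at the lowest address
    return list(struct.unpack(">4I", raw))

# Six real elements followed by two padding lanes (assumed to be zero here).
vec = [0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0x0000, 0x0000]
wide = bitcast_v8i16_to_v4i32_be(vec)
```

In this model, `wide[2]` holds original elements 4 and 5, while `wide[3]` covers only the padding lanes, which matches the commit's observation that index 2 is the wanted one.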

Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker

Reviewed By: efriedma

Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70942
2019-12-10 11:22:35 +01:00
Hiroshi Yamauchi d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Ulrich Weigand 9db13b5a7d [FPEnv] Constrained FCmp intrinsics
This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instructions to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281
2019-12-07 11:28:39 +01:00
Craig Topper 28b573d249 [TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest to use the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
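
The revised sequence can be sketched in plain C++ for the f64 -> u64 case. This is only an illustration of the arithmetic, not the TargetLowering code; the function name is made up for the example, and the input is assumed to be in range for FP_TO_UINT (not NaN, 0 <= Src < 2^64).

```cpp
#include <cstdint>

// Hypothetical scalar model of the expansion for double -> uint64_t.
// Assumes 0 <= Src < 2^64; anything else is outside what FP_TO_UINT defines.
uint64_t fpToUintViaSint(double Src) {
  const double Two63 = 9223372036854775808.0;              // 2^63 as a double
  const bool Sel = Src < Two63;                            // already in signed range?
  const double FltOfs = Sel ? 0.0 : Two63;                 // FP offset to subtract
  const uint64_t IntOfs = Sel ? 0 : 0x8000000000000000ULL; // bit to restore afterwards
  // Src - 0 is exact, and Src - 2^63 only clears the top bit, so the
  // subtraction is exact in both cases.
  const int64_t Signed = static_cast<int64_t>(Src - FltOfs);
  return static_cast<uint64_t>(Signed) ^ IntOfs;
}
```

Because the offset is selected before the subtraction, the subtraction is never evaluated on a small positive value with a large offset, which is the case that raised Inexact in the old sequence.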

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105
2019-12-06 14:11:04 -08:00
Hiroshi Yamauchi 2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e1407.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Hiroshi Yamauchi 9a0b5e1407 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
John Brawn 984f1bb3e7 [LegalizeTypes] Add missing case for STRICT_FP_ROUND softening
This fixes a test failure in test/CodeGen/ARM/fp-intrinsics.ll.
2019-12-06 15:54:27 +00:00
Ulrich Weigand daee549b17 [FPEnv][SelectionDAG] Relax chain requirements
This patch implements the following changes:

1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.

This patch therefore changes the way chains for constrained intrinsics
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG

2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.

This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.

In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening.

3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations.  This was not possible in the past due to the
problems described under 1) and 2) above.

With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.

Differential Revision: https://reviews.llvm.org/D70913
2019-12-06 11:02:11 +01:00
Amy Huang 9e978bb01c Add support for lowering 32-bit/64-bit pointers
Summary:
This follows a previous patch that changes the X86 datalayout to represent
mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces
(https://reviews.llvm.org/D64931)

This patch implements the address space cast lowering to the corresponding
sign extension, zero extension, or truncate instructions.

Related to https://bugs.llvm.org/show_bug.cgi?id=42359

Reviewers: rnk, craig.topper, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69639
2019-12-04 11:39:03 -08:00
Ulrich Weigand c3d05c1b52 [SelectionDAG] Expand nnan FMINNUM/FMAXNUM to select sequence
InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select
sequences (where the fcmp is marked nnan).  Currently, if the
target does not otherwise handle these nodes, they'll get expanded
to libcalls to fmin/fmax.  However, these functions may reside in
libm, which may introduce a library dependency that was not originally
present in the source code, potentially resulting in link failures.

To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM
to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the
libcall. This is done only if the node is marked as "nnan"; in this case,
the expansion to compare+select is always correct. This also suffices to
catch all cases where FMINNUM/FMAXNUM was synthesized as above.
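
As a rough scalar illustration (not the DAG code itself), the compare+select form looks like this in C++. The function names are invented for the example, and the result is only guaranteed to match fmin/fmax when neither input is NaN, which is what the "nnan" flag promises.

```cpp
// Hypothetical scalar models of the nnan expansion. With no NaN inputs a
// single ordered compare plus a select picks a correct minimum/maximum.
double fminnumNoNaN(double A, double B) { return A < B ? A : B; }
double fmaxnumNoNaN(double A, double B) { return A > B ? A : B; }
```

No call into libm is involved, which is the point of the change.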

Differential Revision: https://reviews.llvm.org/D70965
2019-12-04 10:32:35 +01:00
Craig Topper f586fd44e4 [FPEnv] [PowerPC] Lowering ppc_fp128 StrictFP Nodes to libcalls
This is an alternative to D64662 that shares more code between
strict and non-strict nodes. It's modeled after the implementation
that I did for softening.

Differential Revision: https://reviews.llvm.org/D70867
2019-12-03 14:11:21 -08:00
Roman Lebedev 9a20c79ddc
[NFC][KnownBits] Add getMinValue() / getMaxValue() methods
As it can be seen from accompanying cleanup, it is not unheard of
to write `~Known.Zero` meaning "what maximal value can this KnownBits
produce". But I think `~Known.Zero` isn't *that* self-explanatory,
as compared to a method with a name.

Note that not all `~Known.Zero` places were cleaned up,
only those where this arguably improves things.
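
A toy 8-bit model of the idea, for illustration only. The struct below is a stand-in written for this example and is not the real llvm::KnownBits class (which works on APInt); it assumes the usual convention that Zero and One are masks of bits known to be 0 and 1.

```cpp
#include <cstdint>

// Hypothetical 8-bit stand-in for KnownBits.
struct ToyKnownBits {
  uint8_t Zero; // bits known to be 0
  uint8_t One;  // bits known to be 1

  // Smallest possible value: every unknown bit is 0.
  uint8_t getMinValue() const { return One; }
  // Largest possible value: every bit not known to be 0 is 1.
  uint8_t getMaxValue() const { return static_cast<uint8_t>(~Zero); }
};
```

With the top four bits known zero and the low bit known one, the value lies between 0x01 and 0x0F.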
2019-12-03 20:04:51 +03:00
Amaury Séchet b4980f7781 [SelectionDAG] Reorder ViewXXXDAGs declarations to match execution order. NFC 2019-12-03 16:26:12 +01:00
Craig Topper 039664db87 [LegalizeDAG] Return true from ExpandNode for some nodes that don't have expand support.
These nodes have a FIXME that they only get here because a Custom
handler returned SDValue() instead of the original Op.

Even though we aren't expanding them, we should return true here to
prevent ConvertNodeToLibcall from also trying to process them until
the FIXME has been addressed.

I'm hoping to add checking to ConvertNodeToLibcall to make sure
we don't give it nodes it doesn't have support for.
2019-12-02 23:39:20 -08:00
Craig Topper f92000187e [LegalizeDAG] When expanding vector SRA/SRL/SHL add the new BUILD_VECTOR to the Results vector instead of just calling ReplaceNode
The code that processes the Results vector also calls ReplaceNode
and makes ExpandNode return true.

If we don't add it to the Results node, we end up returning false
from ExpandNode. This causes ConvertNodeToLibcall to be called next.
But ConvertNodeToLibcall doesn't do anything for shifts so they
just pass through unmodified. Except for printing a debug message.

Ultimately, I'd like to add more checks to ExpandNode and
ConvertNodeToLibcall to make sure we don't have nodes marked as
Expand that don't have any Expand or libcall handling.
2019-12-02 23:07:39 -08:00
Amaury Séchet c594d14d40 [DAGCombine] Factor oplist operations. NFC 2019-12-02 19:12:03 +01:00
Amaury Séchet d8d5106225 [SelectionDAG] Reduce assumptions made about levels. NFC 2019-12-02 17:43:13 +01:00
Craig Topper 2f3e8cb313 [LegalizeTypes] Add strict FP support to SoftenFloatRes_FP_ROUND. Fix mistake in SoftenFloatRes_FP_EXTEND.
These will be needed for ARM fp-intrinsics.ll which is currently
XFAILed.

One of the getOperand calls in SoftenFloatRes_FP_EXTEND was not
taking strict FP into account. It only affected the call
to setTypeListBeforeSoften which only has an effect on some targets.
2019-11-28 15:32:09 -08:00
Craig Topper 68ddf434c0 [LegalizeTypes] In SoftenFloatRes_FNEG, always generate integer arithmetic, never fall back to using fsub.
We would previously fallback if the type wasn't f32/f64/f128. But
I don't think any of the other floating point types ever go through
the softening code anyway. So this code is dead.
2019-11-28 15:30:34 -08:00
Craig Topper 2485fa7739 [LegalizeTypes] Use SoftenFloatRes_Unary in SoftenFloatRes_FCBRT to reduce code.
We don't have a STRICT_CBRT ISD opcode, but we can still
use SoftenFloatRes_Unary to simplify some code.
2019-11-28 15:30:34 -08:00
Amaury Séchet ca818f4550 [DAGCombiner] Peek through vector concats when trying to combine shuffles.
Summary: This combine showed up as needed when exploring the regression when processing the DAG in topological order.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68195
2019-11-28 23:57:29 +01:00
Craig Topper 735f4793f1 [LegalizeTypes] Remove dead code related to softening f16 which we no longer do.
f16 is promoted to f32 if it is not legal on the target.

Found while reviewing what else needed to be done for strict FP in
the softening code.
2019-11-27 22:10:30 -08:00
Craig Topper ed521fef03 [LegalTypes][X86] Add SoftenFloatOperand support for STRICT_FP_TO_SINT/STRICT_FP_TO_UINT. 2019-11-27 21:16:13 -08:00
Craig Topper 1727c4f1a2 [LegalizeTypes][X86] Add ExpandIntegerResult support for STRICT_FP_TO_SINT/STRICT_FP_TO_UINT. 2019-11-27 18:41:45 -08:00
Craig Topper ebfff46c8d [LegalizeTypes][FPEnv][X86] Add initial support for softening strict fp nodes
This is based on what's required for softening fp128 operations on 32-bit X86 assuming f32/f64/f80 are legal. So there could be some things missing.

Differential Revision: https://reviews.llvm.org/D70654
2019-11-27 10:50:10 -08:00
Craig Topper 350565dbc0 [LegalizeTypes] Add SoftenFloatOp_Unary to reduce some duplication for softening LRINT/LLRINT/LROUND/LLROUND
Summary: This will be enhanced in a follow up to add strict fp support

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70751
2019-11-26 17:37:51 -08:00
Craig Topper 9b08366f57 [LegalizeTypes] Add SoftenFloatRes_Unary and SoftenFloatRes_Binary functions to factor repeated patterns out of many of the SoftenFloatRes_* functions
This has been factored out of D70654 which will add strict FP support to these functions. By making the helpers we avoid repeating even more code.

Differential Revision: https://reviews.llvm.org/D70736
2019-11-26 12:52:17 -08:00
Craig Topper ee3b375b4c [LegalizeDAG] Use getOperationAction instead of getStrictFPOperationAction for STRICT_LRINT/LROUND/LLRINT/LLROUND. 2019-11-26 11:57:45 -08:00
David Green b5315ae8ff [Codegen][ARM] Add addressing modes from masked loads and stores
MVE has a basic symmetry between its normal load/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.

To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.

This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
   Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
   legality of masked operations as well as normal ones. This array is
   fairly small, so doubling the size still won't make it very large.
   Offset masked loads can then be controlled with
   setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
   CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
   the same way.
- The ARM backend is then adjusted to make use of these indexed masked
   loads/stores.
- The X86 backend is adjusted in a way that is hopefully not a functional change.

Differential Revision: https://reviews.llvm.org/D70176
2019-11-26 16:21:01 +00:00
Luís Marques 6fd4c42fa8 [LegalizeTypes][RISCV] Soften FCOPYSIGN operand
Summary: Adds support for softening FCOPYSIGN operands.
Adds RISC-V tests that exercise the new softening code.

Reviewers: asb, lenary, efriedma
Reviewed By: efriedma
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70679
2019-11-26 15:22:55 +00:00
Craig Topper 3dc7c5f7d8 [LegalizeTypes] Remove code to create ISD::FP_TO_FP16 from SoftenFloatRes_FTRUNC.
There seems to have been a misunderstanding of what ISD::FTRUNC
represents. ISD::FTRUNC is equivalent to llvm.trunc which takes
a floating point value, truncates it without changing the size
of the value and returns it.

Despite its similar name, it's different from the fptrunc instruction
in IR which changes a floating point value to a smaller floating
point value. fptrunc is represented by ISD::FP_ROUND in SelectionDAG.
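
The difference can be shown with the closest C++ analogues (an analogy only; the function names are invented for the example): std::trunc keeps the type and drops the fraction, while a narrowing cast changes the type.

```cpp
#include <cmath>

// Analogue of ISD::FTRUNC / llvm.trunc: same type in and out,
// rounds toward zero to an integral value.
double ftruncLike(double V) { return std::trunc(V); }

// Analogue of IR fptrunc / ISD::FP_ROUND: converts to a smaller
// floating-point type, keeping the value as closely as it can.
float fpRoundLike(double V) { return static_cast<float>(V); }
```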

Since the ISD::FP_TO_FP16 node takes a floating point value and
converts it to f16 it's more similar to ISD::FP_ROUND. In fact there
is identical code to what is being removed here in SoftenFloatRes_FP_ROUND.

I assume this bug was never encountered because it would require
f16 to be legalized by softening rather than the default of
promoting.
2019-11-25 18:18:40 -08:00
Sanjay Patel 214683f3b2 [DAGCombiner] avoid crash on out-of-bounds insert index (PR44139)
We already have this simplification at node-creation-time, but
the test from:
https://bugs.llvm.org/show_bug.cgi?id=44139
...shows that we can combine our way to an assert/crash too.
2019-11-25 16:24:06 -05:00
Craig Topper d6ec6e4bf6 [TargetLowering] Merge ExpandChainLibCall with makeLibCall
I need to be able to drop an operand for STRICT_FP_ROUND handling on X86. Merging these functions gives me the ArrayRef interface that passes the return type, operands, and debugloc instead of the Node.

Differential Revision: https://reviews.llvm.org/D70503
2019-11-25 10:52:49 -08:00
Clement Courbet cb15ba84fe Reland "[DAGCombiner] Allow zextended load combines."
Check that the generated type is simple.
2019-11-22 14:47:18 +01:00
Roman Lebedev 96cf5c8d47
[Codegen] TargetLowering::prepareUREMEqFold(): `x u% C1 ==/!= C2` (PR35479)
Summary:
The current lowering is:
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n3, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/2xC
https://rise4fun.com/Alive/jpb5

However, we can support non-tautological cases `C1 u> C2` too.
Said handling consists of two parts:
* `C2 u<= (-1 %u C1)`. It just works. We only have to change `(X % C1) == C2` into `((X - C2) % C1) == 0`
```
Name: (X % C1) == C2 -> (X - C2) * C3 <= C4   iff C2 u<= (-1 %u C1)
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 && C2 u<= (-1 %u C1)
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = (-1 /u C1)
%n0 = sub i8 %x, C2
%n1 = mul i8 %n0, C3
%n2 = lshr i8 %n1, countTrailingZeros(C1) ; rotate right
%n3 = shl i8 %n1, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n4 = or i8 %n2, %n3 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n4, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/m4P
https://rise4fun.com/Alive/SKrx
* `C2 u> (-1 %u C1)`. We also have to change `(X % C1) == C2` into `((X - C2) % C1) == 0`,
  and we have to decrement C4:
```
Name: (X % C1) == C2 -> (X - C2) * C3 <= C4   iff C2 u> (-1 %u C1)
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1 && C2 u> (-1 %u C1)
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = (-1 /u C1)-1
%n0 = sub i8 %x, C2
%n1 = mul i8 %n0, C3
%n2 = lshr i8 %n1, countTrailingZeros(C1) ; rotate right
%n3 = shl i8 %n1, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n4 = or i8 %n2, %n3 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n4, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/d40
https://rise4fun.com/Alive/8cF
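
The two non-tautological cases above can be checked with a small 8-bit C++ model. This is a sketch written for illustration, not the prepareUREMEqFold() code: the helper names are invented, C1 must be non-zero, and only the `C2 u< C1` case is modelled (the tautological `C1 u<= C2` fixup is left out).

```cpp
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^8 (Newton iteration).
static uint8_t inverseMod256(uint8_t Odd) {
  uint8_t Inv = Odd;                                 // correct to 3 bits
  Inv = static_cast<uint8_t>(Inv * (2 - Odd * Inv)); // correct to 6 bits
  Inv = static_cast<uint8_t>(Inv * (2 - Odd * Inv)); // correct to 12 bits
  return Inv;
}

static uint8_t rotateRight8(uint8_t V, unsigned Amt) {
  Amt %= 8;
  return Amt == 0 ? V : static_cast<uint8_t>((V >> Amt) | (V << (8 - Amt)));
}

static unsigned countTrailingZeros8(uint8_t V) { // requires V != 0
  unsigned N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Models (X % C1) == C2 for C1 != 0 and C2 < C1 without a division on X.
bool uremEquals(uint8_t X, uint8_t C1, uint8_t C2) {
  const unsigned TZ = countTrailingZeros8(C1);
  const uint8_t C3 = inverseMod256(static_cast<uint8_t>(C1 >> TZ));
  // C4 is decremented when C2 u> (-1 %u C1), as in the second case above.
  const uint8_t C4 =
      static_cast<uint8_t>(0xFF / C1 - (C2 > 0xFF % C1 ? 1 : 0));
  const uint8_t Product = static_cast<uint8_t>((X - C2) * C3);
  return rotateRight8(Product, TZ) <= C4;
}
```

For example, uremEquals(10, 6, 4) models `10 % 6 == 4`, the pattern exercised by the `@t32_6_4` test mentioned below (at 8 bits here rather than 32).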

I believe this concludes `x u% C1 ==/!= C2` lowering.
In fact, clang may now be better in this regard than gcc:
as it can be seen from `@t32_6_4` test, we do lower `x % 6 == 4`
via this pattern, while gcc does not: https://godbolt.org/z/XNU2z9
And all the general alive proofs say this is legal.
And manual checking agrees: https://rise4fun.com/Alive/WA2

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=35479 | PR35479 ]].

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: nick, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70053
2019-11-22 15:22:42 +03:00
Roman Lebedev 3f46022e33
[Codegen] TargetLowering::prepareUREMEqFold(): `x u% C1 ==/!= C2` with tautological C1 u<= C2 (PR35479)
Summary:
This is a preparatory cleanup before I add more
of this fold to deal with comparisons with non-zero.

In essence, the current lowering is:
```
Name: (X % C1) == 0 -> X * C3 <= C4
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, 0
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%r = icmp ule i8 %n3, %C4
```
https://rise4fun.com/Alive/oqd
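
The same comparison-with-zero lowering, written as an 8-bit C++ model for illustration. The helper names are invented for this sketch and C1 is assumed to be non-zero; it is not the code in TargetLowering.

```cpp
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^8 (Newton iteration).
static uint8_t inverseMod256(uint8_t Odd) {
  uint8_t Inv = Odd;                                 // correct to 3 bits
  Inv = static_cast<uint8_t>(Inv * (2 - Odd * Inv)); // correct to 6 bits
  Inv = static_cast<uint8_t>(Inv * (2 - Odd * Inv)); // correct to 12 bits
  return Inv;
}

static uint8_t rotateRight8(uint8_t V, unsigned Amt) {
  Amt %= 8;
  return Amt == 0 ? V : static_cast<uint8_t>((V >> Amt) | (V << (8 - Amt)));
}

static unsigned countTrailingZeros8(uint8_t V) { // requires V != 0
  unsigned N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Models (X % C1) == 0 for C1 != 0 as: rotr(X * C3, tz(C1)) u<= C4.
bool isDivisibleBy(uint8_t X, uint8_t C1) {
  const unsigned TZ = countTrailingZeros8(C1);
  const uint8_t C3 = inverseMod256(static_cast<uint8_t>(C1 >> TZ)); // odd part inverse
  const uint8_t C4 = static_cast<uint8_t>(0xFF / C1);               // -1 /u C1
  const uint8_t Product = static_cast<uint8_t>(X * C3);
  return rotateRight8(Product, TZ) <= C4;
}
```

Multiplying by C3 and rotating maps each multiple m * C1 to m, so the multiples are exactly the values that land at or below C4.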

It kinda just works, really no weird edge-cases.
But it isn't all that great for when comparing with non-zero.
In particular, given `(X % C1) == C2`, there will be problems
in the always-false tautological case where `C2 u>= C1`:
https://rise4fun.com/Alive/pH3

That case is tautological, always-false:
```
Name: (X % Y) u>= Y
%o0 = urem i8 %x, %y
%r = icmp uge i8 %o0, %y
  =>
%r = false
```
https://rise4fun.com/Alive/ofu

While we can't/shouldn't get such tautological case normally,
we do deal with non-splat vectors, so unless we want to give up
in this case, we need to fixup/short-circuit such lanes.

There are two lowering variants:
1. We can blend between whatever computed result and the correct tautological result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%res = icmp ule i8 %n3, %C4
%r = select i1 %is_tautologically_false, i1 0, i1 %res
```
https://rise4fun.com/Alive/PjT5
https://rise4fun.com/Alive/1KV

2. We can invert the comparison result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
  =>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n3, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/2xC
https://rise4fun.com/Alive/jpb5

3. We can expand into `and`/`or`:
https://rise4fun.com/Alive/WGn
https://rise4fun.com/Alive/lcb5

Blend-one is likely better since we avoid having to load the
replacement from constant pool. `xor` is second best since
it's still pretty general. I'm not adding `and`/`or` variants.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: nick, hiraditya, xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70051
2019-11-22 15:16:03 +03:00
Clement Courbet 88e205525c Revert "[DAGCombiner] Allow zextended load combines."
Breaks some bots.
2019-11-22 09:01:08 +01:00
Clement Courbet 036790f988 [DAGCombiner] Allow zextended load combines.
Summary: or(zext(load8(base)), zext(load8(base+1)) -> zext(load16 base)

Reviewers: apilipenko, RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70487
2019-11-22 08:40:19 +01:00
Pengfei Wang 22a0edd070 [FPEnv] Add an option to disable strict float node mutating to a normal
float node

This patch adds an option 'disable-strictnode-mutation' to prevent strict
nodes from mutating to normal nodes.
So we can make sure that the patch which sets strict-node as legal works
correctly.

Patch by Chen Liu(LiuChen3)

Differential Revision: https://reviews.llvm.org/D70226
2019-11-21 18:07:11 -08:00
Craig Topper 7696b99258 [LegalizeDAG][X86] Add support for turning STRICT_FADD/SUB/MUL/DIV into libcalls. Use it for fp128 on x86-64.
This requires a minor hack for f32/f64 strict fadd/fsub to avoid
turning those into libcalls.
2019-11-21 16:19:25 -08:00
Hiroshi Yamauchi 52e377497d [PGO][PGSO] DAG.shouldOptForSize part.
Summary:
(Split off of D67120)

SelectionDAG::shouldOptForSize changes for profile guided size optimization.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70095
2019-11-21 14:16:00 -08:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraries:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined in lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Clement Courbet 252567377c [DAGCombine][NFC] Use ArrayRef and correctly size SmallVectors.
In preparation for D70487.
2019-11-21 08:53:37 +01:00
Craig Topper c9e8e808cf [SelectionDAG][X86] Mutate strictFP nodes to non-strict in DoInstructionSelection when the node is marked Expand rather than when it is not Legal.
This allows operations that are marked Custom, but have some type
combinations that are legal to get past this code.

Add custom mutation code to X86's Select function for the nodes
that don't have isel patterns yet.
2019-11-20 10:36:02 -08:00
David Zarzycki 257acbf6ae
[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY
Summary:
Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.
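
A small 8-bit C++ model of why this is sound (illustrative only; the struct and function names are invented for the example). The two carries of the chained uaddo can never both be set, so merging them with OR or XOR gives the same bit, and that bit is the carry-out of a single add-with-carry.

```cpp
#include <cstdint>

struct AddResult {
  uint8_t Sum;
  bool Carry;
};

// Model of UADDO: 8-bit add plus overflow flag.
AddResult uaddo(uint8_t X, uint8_t Y) {
  const unsigned Wide = static_cast<unsigned>(X) + Y;
  return {static_cast<uint8_t>(Wide), Wide > 0xFF};
}

// Model of ADDCARRY: 8-bit add with carry-in and carry-out.
AddResult addcarry(uint8_t X, uint8_t Y, bool CarryIn) {
  const unsigned Wide = static_cast<unsigned>(X) + Y + (CarryIn ? 1u : 0u);
  return {static_cast<uint8_t>(Wide), Wide > 0xFF};
}

// The "diamond": two chained UADDOs whose carries are merged with OR.
AddResult uaddoDiamond(uint8_t X, uint8_t Y, bool CarryIn) {
  const AddResult First = uaddo(X, Y);
  const AddResult Second = uaddo(First.Sum, CarryIn ? 1 : 0);
  return {Second.Sum, First.Carry || Second.Carry};
}
```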

Work remaining: match ADD, etc.

Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast

Reviewed By: lebedev.ri

Subscribers: chfast, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70079
2019-11-20 16:25:42 +02:00
Serge Pavlov ea8678d1c7 Move floating point related entities to namespace level
This is a recommit of commit e6584b2b7b, which was reverted in
30e7ee3c4b together with af57dbf12e.
Original message is below.

Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. However, this was inconvenient, as it required
including the constrained intrinsics definitions even when they were not
needed. The long scope prefix also reduced readability.

This change moves these definitions to the namespace llvm::fp.
No functional changes.

Differential Revision: https://reviews.llvm.org/D69552
2019-11-20 19:05:46 +07:00
Serge Pavlov 0c50c0b055 [FEnv] File with properties of constrained intrinsics
Summary
In several places we need to enumerate all constrained intrinsics or IR
nodes that should be represented by them. It is easy to miss some of
the cases. To make working with these intrinsics more convenient and
robust, this change introduces a file containing definitions of all
constrained intrinsics and some of their properties. This file can be
included to generate constrained intrinsics processing code.

Reviewers: kpn, andrew.w.kaylor, cameron.mcinally, uweigand

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69887
2019-11-20 13:30:07 +07:00
Craig Topper c4b41e8d1d [LegalizeDAG][X86] Enable STRICT_FP_TO_SINT/UINT to be promoted
Differential Revision: https://reviews.llvm.org/D70220
2019-11-19 16:14:37 -08:00
Matt Arsenault 7fe9435dc8 Work on cleaning up denormal mode handling
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.

This is in preparation for introducing FP type specific variants of
the denormal-fp-mode attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.

Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.

The verifier could use a check to make sure the denorm-fp-mode
attribute is sane, but there currently isn't one.

Currently, DAGCombiner incorrectly assumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume ieee behavior if the attribute isn't present.
2019-11-19 22:01:14 +05:30
Matt Arsenault b696b9dba7 DAG: Add function context to isFMAFasterThanFMulAndFAdd
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.

AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
2019-11-19 19:25:26 +05:30
Craig Topper dc02eb1909 [SelectionDAG] Merge the two identical ExpandChainLibCall methods from LegalizeTypes and LegalizeDAG to one version in TargetLowering.
Reviewers: RKSimon, efriedma, spatel

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70354
2019-11-18 20:22:33 -08:00
Craig Topper 6e20d70a69 [LegalizeDAG] Convert strict fp nodes to libcalls without losing the chain.
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.

This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But getting the ordering right seems more important.

Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.

Differential Revision: https://reviews.llvm.org/D70334
2019-11-18 11:24:08 -08:00
Eric Christopher 30e7ee3c4b Temporarily Revert "Add support for options -frounding-math, -ftrapping-math, -ffp-model=, and -ffp-exception-behavior="
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.

This reverts commits af57dbf12e and e6584b2b7b
2019-11-18 10:46:48 -08:00
Graham Hunter 3f08ad611a [SVE][CodeGen] Scalable vector MVT size queries
* Implements scalable size queries for MVTs, split out from D53137.

* Contains a fix for FindMemType to avoid using scalable vector type
  to contain non-scalable types.

* Explicit casts for several places where implicit integer sign
  changes or promotion from 32 to 64 bits caused problems.

* CodeGenDAGPatterns will treat scalable and non-scalable vector types
  as different.

Reviewers: greened, cameron.mcinally, sdesmalen, rovka

Reviewed By: rovka

Differential Revision: https://reviews.llvm.org/D66871
2019-11-18 12:30:59 +00:00
Craig Topper bfbbf0aba8 [LegalizeTypes] Remove SoftenFloat handling from ExpandIntRes_LLROUND_LLRINT and remove assert from the strict fp path.
These were both recently added. While the call to GetSoftenedFloat
is a little more optimal, we don't do it in the expand for
FP_TO_SINT/UINT so there's no real reason to do it here. This
avoids a FIXME for strict fp.
2019-11-17 23:48:31 -08:00
Craig Topper 5a56d2aa33 [LegalizeTypes] Remove unnecessary conversion from EVT to MVT to MVT::SimpleValueType just to assign back to EVT. NFC 2019-11-17 23:48:31 -08:00
Craig Topper af435286e5 [LegalizeTypes][X86] Add support for expanding the result type of STRICT_LLROUND and STRICT_LLRINT.
This doesn't handle softening the input type, but we don't handle
softening any of the strict nodes yet. Skipping that made it easy
to reuse an existing function for creating a libcall from a node
with a chain.
2019-11-17 20:03:05 -08:00
Craig Topper 1b0efe2b17 [LegalizeTypes] When expanding the integer result of LLROUND/LLRINT, also call GetSoftenedFloat if the floating point input needs to be softened.
Before this we were emitting a bitcast to integer from the lowering
code that itself will need to be legalized. By calling
GetSoftenedFloat we get the integer conversion in one step without
needing to relegalize a bitcast.
2019-11-17 13:31:30 -08:00
Craig Topper 9b515b6dd9 [LegalizeTypes] Remove PromoteFloat support from ExpandIntRes_LLROUND_LLRINT.
This code isn't exercised, and was in the wrong place. If we need
this, we would need to promote the type before figuring out which
libcall to use.

I'm choosing to remove it rather than fixing since we don't
support PromoteFloat for LRINT/LROUND/LLRINT/LLROUND when the
result type is legal so I don't see much reason to support it
for the case where the result type isn't legal.
2019-11-17 13:31:30 -08:00
Craig Topper d4ba11ae32 [LegalizeTypes] Merge ExpandIntRes_LLROUND and ExpandIntRes_LLRINT into one function that handles both. NFC
These two functions were the same except for which libcall gets
emitted. Just merge them into one.

This is prep work for some other work including strict fp support.
2019-11-17 13:31:30 -08:00
Serge Pavlov e6584b2b7b Move floating point related entities to namespace level
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. However, this was inconvenient because it required
including constrained intrinsics definitions even when they were not needed.
Also, using the long scope prefix reduced readability.

This change moves these definitions to the namespace llvm::fp.
No functional changes.

Differential Revision: https://reviews.llvm.org/D69552
2019-11-15 19:56:33 +07:00
Reid Kleckner 5fe3f00ae2 Replace wrongly deleted header banner, fix formatting
I reviewed the diff hunks of 05da2fe521 that don't contain
'#include' lines, and found two unintended changes. I deleted a header
banner inadvertently while inserting a header, and changed the
indentation of a constructor in an odd way. Add back the banner, and
reformat the constructor.
2019-11-14 10:21:42 -08:00
Paweł Bylica 1c247dd028 [DAGCombiner] Drop redundant DAG method param. NFC 2019-11-14 14:02:53 +01:00
Paweł Bylica 9b89bda517 [DAGCombiner] Use TLI field already available. NFC 2019-11-14 14:02:52 +01:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
causes lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Sander de Smalen 9a1c243aa5 [AArch64][SVE] Allocate locals that are scalable vectors.
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.

Reviewers: ostannard, efriedma, rengolin, cameron.mcinally

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D70080
2019-11-13 09:45:24 +00:00
joanlluch d384ad6b63 [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by

```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.

Updates the MSP430 target with a custom implementation.

This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.

Existing tests apply, a few more have been added.

Reviewers: asl, spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70042
2019-11-13 09:23:08 +01:00
aqjune e87d71668e [IR] Redefine Freeze instruction
Summary:
This patch redefines freeze instruction from being UnaryOperator to a subclass of UnaryInstruction.

ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
`freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor has visitFreeze now because freeze is not unaryop anymore.

Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri

Reviewed By: craig.topper, lebedev.ri

Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69932
2019-11-12 10:49:00 +09:00
joanlluch e0012c5d6a [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (3)
Summary:
Additional filtering of undesired shifts for targets that do not support them efficiently.

Related to D69116 and D69120.

Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code:

```
define i16 @testShiftBits(i16 %a) {
entry:
  %and = and i16 %a, -64
  %cmp = icmp eq i16 %and, 64
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}

define i16 @testShiftBits_11(i16 %a) {
entry:
  %cmp = icmp ugt i16 %a, 63
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}

define i16 @testShiftBits_12(i16 %a) {
entry:
  %cmp = icmp ult i16 %a, 64
  %conv = zext i1 %cmp to i16
  ret i16 %conv
}
```
The attached diff file shows the piece of code in TargetLowering that is responsible for the generation of shifts in relation to the IR above.

Before applying this patch, shifts will be generated to replace non-legal icmp immediates. However, shifts may be undesired if they are even more expensive for the target.
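
The relationship between the comparisons in the IR above and their shift-based replacements can be checked outside LLVM. The following is a minimal sketch, assuming the shift forms have the shape `(a >> 6) <cmp> k`; this shape and all function names are illustrative assumptions, not the exact DAG that TargetLowering produces.

```cpp
#include <cassert>
#include <cstdint>

// The three comparisons from the IR above, on a 16-bit value.
// -64 as an i16 is the bit mask 0xFFC0.
inline bool maskEq64(uint16_t a) { return (a & 0xFFC0u) == 64u; }
inline bool ugt63(uint16_t a) { return a > 63u; }
inline bool ult64(uint16_t a) { return a < 64u; }

// Assumed shift-based equivalents (illustrative, not copied from LLVM).
inline bool maskEq64Shift(uint16_t a) { return (a >> 6) == 1; }
inline bool ugt63Shift(uint16_t a) { return (a >> 6) != 0; }
inline bool ult64Shift(uint16_t a) { return (a >> 6) == 0; }

// Exhaustively compare both forms over every 16-bit input.
inline void checkShiftForms() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    const uint16_t a = static_cast<uint16_t>(v);
    assert(maskEq64(a) == maskEq64Shift(a));
    assert(ugt63(a) == ugt63Shift(a));
    assert(ult64(a) == ult64Shift(a));
  }
}
```

Both forms compute the same result; which one is cheaper depends on whether the target has an efficient multi-bit shift, which is the decision the hook leaves to the target.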

For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case, the target is not suitable for showing improvements related to this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation always returns true, therefore the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this.

The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. This is simulated with the implementation of a command line flag that was incorporated in D69975.

This patch belongs to an initiative to "relax" the generation of shifts by LLVM for targets requiring it.

Reviewers: spatel, lebedev.ri, asl

Reviewed By: spatel

Subscribers: lenary, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69326
2019-11-11 10:18:25 +01:00
Eli Friedman 5df3a87224 [AArch64][X86] Don't assume __powidf2 is available on Windows.
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.

(I think this started showing up recently because we added an
optimization that converts pow to powi.)

Differential Revision: https://reviews.llvm.org/D69013
2019-11-08 12:43:21 -08:00
Sanjay Patel 777d1d1d98 [SDAG] reduce code duplication; NFC 2019-11-07 10:28:45 -05:00
Sanjay Patel 2fdd58c506 [SDAG] reduce code duplication; NFC 2019-11-07 10:15:17 -05:00
Philip Reames db036ee0a4 [X86/Atomics] Correct a few transforms for new atomic lowering
This is a partial fix for the issues described in commit message of 027aa27 (the revert of G24609).  Unfortunately, I can't provide test coverage for it on its own as the only (known) wrong example is still wrong, but due to a separate issue.

These fixes cover cases where, while performing unrelated DAG combines, we were dropping the atomicity flags entirely.
2019-11-05 13:20:08 -08:00
Thomas Preud'homme 646896a442 Fix PR40644: miscompile indexed FP constant store
Summary:
Functions replaceStoreOfFPConstant() and OptimizeFloatStore() both
unconditionally replace a store of a float with a store of an integer.
However, this generates wrong code when the store that is replaced is an
indexed or truncating store. This commit solves this issue by adding an
early return in these functions when the store being considered is not a
normal store.

Bug was only observed on out of tree targets, hence the lack of testcase
in this commit.

Reviewers: efriedma

Subscribers: hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68420
2019-11-05 11:07:52 +00:00
aqjune 58acbce3de [IR] Add Freeze instruction
Summary:
- Define Instruction::Freeze, let it be UnaryOperator
- Add support for freeze to LLLexer/LLParser/BitcodeReader/BitcodeWriter
  The format is `%x = freeze <ty> %v`
- Add support for freeze instruction to llvm-c interface.
- Add m_Freeze in PatternMatch.
- Erase freeze when lowering IR to SelDag.

Reviewers: deadalnix, hfinkel, efriedma, lebedev.ri, nlopes, jdoerfert, regehr, filcab, delcypher, whitequark

Reviewed By: lebedev.ri, jdoerfert

Subscribers: jfb, kristof.beyls, hiraditya, lebedev.ri, steven_wu, dexonsmith, xbolva00, delcypher, spatel, regehr, trentxintong, vsk, filcab, nlopes, mehdi_amini, deadalnix, llvm-commits

Differential Revision: https://reviews.llvm.org/D29011
2019-11-05 15:54:56 +09:00
Sanjay Patel 113181e9bd [DAGCombine][MSP430] use shift amount threshold in DAGCombine (2/2)
Continuation of:
D69116

Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559

See also D69099 and D69116

Use the TLI hook in DAGCombine.cpp to guard against creating
shift nodes that are not optimal for a target.

Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69120
2019-11-04 13:41:41 -05:00
Ulrich Weigand 664f84e246 [FPEnv][SelectionDAG] Refactor strict FP node construction
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments.  That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.

Extracted from the patch for D69281.

NFC.
2019-11-04 17:45:54 +01:00
Dávid Bolvanský a18a8db0d4 [SelectionDAG] Fixed null check after dereferencing warning. NFCI. 2019-11-03 19:34:03 +01:00
Simon Pilgrim 095d2a4ced FastISel - fix uninitialized variable warnings in constructor. NFCI. 2019-11-02 18:03:22 +00:00
Simon Pilgrim 97725707f4 Fix uninitialized variable warning. NFCI. 2019-11-02 14:42:38 +00:00
Craig Topper 96bb076621 [TargetLowering] Move the setBooleanContents check on (xor (setcc), (setcc)) == / != 1 -> (setcc) != / == (setcc) to the right place
We need to be checking the value types for the inner setccs not
the outer setcc. We need to ensure those setccs produce a 0/1
value or that the xor is on the i1 type. I think at the time
this code was originally written, getBooleanContents didn't
take any arguments so this was probably correct. But now we can
have a different boolean contents for integer and floating point.

Not sure why the other combines below the xor were also checking
the boolean contents. None of them involve any setccs other than
the outer one and they only produce a new setcc.

Differential Revision: https://reviews.llvm.org/D69480
2019-11-01 14:43:17 -07:00
Matt Arsenault 6221767055 DAG: Add DAG argument to isFPExtFoldable
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
2019-10-31 22:32:45 -07:00
Hiroshi Yamauchi 0d987e411a [PGO][PGSO] TargetLowering/TargetTransformationInfo/SwitchLoweringUtils part.
Summary:
(Split of off D67120)

TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile
guided size optimization.

Reviewers: davidxl

Subscribers: eraman, hiraditya, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69580
2019-10-31 13:22:56 -07:00
Matt Arsenault 1725f28841 DAG: Add new control for ISD::FMAD formation
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.
2019-10-31 07:51:38 -07:00
Jeremy Morse 3137fe4d23 [DebugInfo][DAG] Distinguish different kinds of location indirection
From SelectionDAG's point of view, debug variable locations specified with
dbg.declare and dbg.addr are indirect -- they specify the address of
something. But calling conventions might mean that a Value is placed on
the stack somewhere, and this too is indirection. Previously this was
mixed up in the "IsIndirect" field of DBG_VALUE insts; this patch
separates them by encoding the indirection in a DIExpression.

If we have a dbg.declare or dbg.addr, then the expression produces an
address that then becomes a DWARF memory location. We can represent
this by putting a DW_OP_deref on the _end_ of the expression. If a Value
has been placed on the stack, then we need to put a DW_OP_deref on the
_start_ of the expression, to load the Value from the stack and have
the rest of the expression operate on it.

Differential Revision: https://reviews.llvm.org/D69028
2019-10-30 18:41:44 +00:00
Kevin P. Neal 72bc291f94 [NFC] Move this set of STRICT_* cases to be next to the non-strict cases.
Requested by Cameron McInally in D69275.
2019-10-30 13:32:27 -04:00
Jay Foad 86549c7528 [SelectionDAG] Add support for FP_ROUND in WidenVectorOperand.
Summary:
This is used on AMDGPU for rounding from v3f64 (which is illegal) to
v3f32 (which is legal).

Subscribers: jvesely, nhaehnle, tpr, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69339
2019-10-30 15:18:21 +00:00
Philip Reames 2460989eab [SelectionDAG] Enable lowering unordered atomics loads w/LoadSDNode (and stores w/StoreSDNode) by default
Enable the new SelectionDAG representation for unordered loads and stores introduced in r371441 by default.  As a reminder, the new lowering changes the representation of an unordered atomic load from an AtomicSDNode - which is essentially a black box which gets passed through without combines messing with it - to a LoadSDNode w/an atomic marker on the MMO. The latter parallels the way we handle volatiles, and I've audited the code to ensure that every location which checks one checks the other.

This has been fairly heavily fuzzed, and I examined diffs in a reasonably large corpus of assembly by hand, so I'm reasonably sure this is correct for the common case.  Late in the review for this, it was discovered that I hadn't correctly handled cases which could be legalized into CAS operations.  This points out that there's a strong bias in the IR of the frontend I'm working with towards only legal atomics.  If there are problems with this patch, the most likely area will be legalization.

Differential Revision: https://reviews.llvm.org/D69219
2019-10-29 12:46:24 -07:00
Greg Bedwell 1ba72a81ca Fix some spelling mistakes in comments. NFC 2019-10-29 12:41:24 +00:00
Amy Huang 742043047c Recommit "Add a heap alloc site marker field to the ExtraInfo in MachineInstrs"
Summary:
Fixes some things from original commit at https://reviews.llvm.org/D69136. The main
change is that the heap alloc marker is always stored as ExtraInfo in the machine
instruction instead of in the PointerSumType because it cannot hold more than
4 pointer types.

Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Use this marker to track heap alloc site call instructions.

Reviewers: rnk

Subscribers: MatzeB, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69536
2019-10-28 16:59:32 -07:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
Sanjay Patel 1ebd4a2e3a [DAGCombiner] widen any_ext of popcount based on target support
This enhances D69127 (rGe6c145e0548e3b3de6eab27e44e1504387cf6b53)
to handle the looser "any_extend" cast in addition to zext.

This is a prerequisite step for canonicalizing in the other direction
(narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688
2019-10-28 10:07:12 -04:00
David Green ba2c625531 [Codegen][ARM] Add float softening for cbrt
We would previously have no soft-float softening for cbrt, so could hit
a crash failing to select. This fills in what appears to be missing.

Differential Revision: https://reviews.llvm.org/D69345
2019-10-28 11:08:55 +00:00
Kerry McLaughlin da720a38b9 [AArch64][SVE] Implement masked load intrinsics
Summary:
Adds support for codegen of masked loads, with non-extending,
zero-extending and sign-extending variants.

Reviewers: huntergr, rovka, greened, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, samparker, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68877
2019-10-28 10:06:14 +00:00
Sanjay Patel 85a2146c15 [SDAG] fold insert_vector_elt with undef index
Similar to:
rG4c47617627fb

This makes the DAG behavior consistent with IR's insertelement.

https://bugs.llvm.org/show_bug.cgi?id=42689

I've tried to maintain test intent for AArch64 and WebAssembly
by replacing undef index operands with something else.
2019-10-27 15:28:43 -04:00
Craig Topper f067dd839e [LegalizeTypes] When promoting BITREVERSE/BSWAP don't take the shift amount into account when determining the shift amount VT.
If the target's preferred shift amount VT can't hold any shift
amount for the promoted VT, we should use i32. The specific shift
amount shouldn't matter. The type will be adjusted later when the
shift itself is type legalized. This avoids an assert in getNode.

Fixes PR43820.
2019-10-27 12:20:35 -07:00
Craig Topper 73f255b83a [TargetLowering] Add getBooleanContents check to "SETCC (SETCC), [0|1], [EQ|NE] -> SETCC" combine.
This combine is only valid if the inner setcc produces a 0/1 result
or the inner type is MVT::i1.

I haven't seen this cause any issues, just happened to notice it
while reviewing combines in this function.

While there also fix another call to use the value type from the
SDValue for the operand instead of calling SDNode::getValueType(0).
Though it's likely the use is result 0, it's not guaranteed.
2019-10-27 10:07:15 -07:00
Sanjay Patel 4c47617627 [SDAG] fold extract_vector_elt with undef index
This makes the DAG behavior consistent with IR's extractelement after:
rGb32e4664a715

https://bugs.llvm.org/show_bug.cgi?id=42689

I've tried to maintain test intent for WebAssembly.
The AMDGPU test is trying to test for crashing or other bad behavior,
but I'm not sure if that's possible after this change.
2019-10-25 19:27:26 -04:00
Amy Huang 64c1f6602a Revert "Add an instruction marker field to the ExtraInfo in MachineInstrs."
Reverting commit b85b4e5a6f due to some
buildbot failures and out-of-memory errors.
2019-10-25 12:41:34 -07:00
Sanjay Patel e6c145e054 [DAGCombiner] widen zext of popcount based on target support
zext (ctpop X) --> ctpop (zext X)

This is a prerequisite step for canonicalizing in the other direction (narrow the popcount) in IR - PR43688:
https://bugs.llvm.org/show_bug.cgi?id=43688

I'm not sure if any other targets are affected, but I found a missing fold for PPC, so added tests based on that.
The reason we widen all the way to 64-bit in these tests is because the initial DAG looks something like this:

  t5: i8 = ctpop t4
  t6: i32 = zero_extend t5  <-- created based on IR, but unused node?
    t7: i64 = zero_extend t5
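
The fold `zext (ctpop X) --> ctpop (zext X)` holds because zero extension only adds zero bits, so the population count is unchanged. A minimal standalone sketch of that reasoning follows; the helper names are hypothetical and the code does not use any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Count set bits in an 8-bit value (the narrow ctpop).
inline unsigned ctpop8(uint8_t x) {
  unsigned n = 0;
  for (int i = 0; i < 8; ++i) n += (x >> i) & 1u;
  return n;
}

// Count set bits in a 64-bit value (the wide ctpop).
inline unsigned ctpop64(uint64_t x) {
  unsigned n = 0;
  for (int i = 0; i < 64; ++i) n += static_cast<unsigned>((x >> i) & 1u);
  return n;
}

// zext (ctpop X): count in the narrow type, then widen the result.
inline uint64_t zextOfCtpop(uint8_t x) { return static_cast<uint64_t>(ctpop8(x)); }

// ctpop (zext X): widen the operand first, then count.
inline uint64_t ctpopOfZext(uint8_t x) { return ctpop64(static_cast<uint64_t>(x)); }

// Exhaustive check over all i8 inputs.
inline void checkCtpopFold() {
  for (unsigned v = 0; v < 256; ++v) {
    const uint8_t x = static_cast<uint8_t>(v);
    assert(zextOfCtpop(x) == ctpopOfZext(x));
  }
}
```

The same argument would not hold for sign extension, which can add set bits above the original width.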

Differential Revision: https://reviews.llvm.org/D69127
2019-10-25 14:10:51 -04:00
Amy Huang b85b4e5a6f Add an instruction marker field to the ExtraInfo in MachineInstrs.
Summary:
Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.

Also undo the workaround in r375137.

Reviewers: rnk

Subscribers: MatzeB, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69136
2019-10-25 09:21:10 -07:00
Itay Bookstein 59a51d84b3 [CodeGen][SelectionDAG] Fix tiny bug in ExpandIntRes_UADDSUBO
Summary:
Ternary expression checks for ISD::ADD instead of ISD::UADDO inside DAGTypeLegalizer::ExpandIntRes_UADDSUBO.
This means the ternary expression will evaluate to ISD::SUBCARRY for both ISD::UADDO and ISD::USUBO nodes.
Targets are likely to implement both, so impact will be very limited in practice.
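
The effect of the wrong comparison can be illustrated with a small model. This is a sketch only: the `Opcode` enum is a stand-in for the ISD opcodes named above, the function names are hypothetical, and the use of ADDCARRY for the true branch is an assumption rather than something stated in the summary.

```cpp
#include <cassert>

// Stand-ins for the ISD opcodes mentioned in the summary (not the LLVM enum).
enum class Opcode { ADD, UADDO, USUBO, ADDCARRY, SUBCARRY };

// Before the fix: the node is UADDO or USUBO, never ADD, so the condition
// is always false and SUBCARRY is always selected.
inline Opcode carryOpBefore(Opcode nodeOpc) {
  return nodeOpc == Opcode::ADD ? Opcode::ADDCARRY : Opcode::SUBCARRY;
}

// After the fix: compare against UADDO so each node gets its matching op.
inline Opcode carryOpAfter(Opcode nodeOpc) {
  return nodeOpc == Opcode::UADDO ? Opcode::ADDCARRY : Opcode::SUBCARRY;
}

inline void checkCarryOps() {
  assert(carryOpBefore(Opcode::UADDO) == Opcode::SUBCARRY);  // the bug
  assert(carryOpBefore(Opcode::USUBO) == Opcode::SUBCARRY);
  assert(carryOpAfter(Opcode::UADDO) == Opcode::ADDCARRY);   // fixed
  assert(carryOpAfter(Opcode::USUBO) == Opcode::SUBCARRY);
}
```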

Reviewers: bogner, lebedev.ri

Reviewed By: lebedev.ri

Subscribers: lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68123
2019-10-25 18:10:51 +03:00
Simon Pilgrim a18818207a Fix cppcheck shadow variable warning. NFCI. 2019-10-24 22:14:36 +01:00
Hans Wennborg 684ebc605e Revert 4334892e7b "[DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into the x node."
This broke various Windows builds, see comments on the Phabricator
review.

This also reverts the follow-up 20bf0cf.

> Summary:
> This fold helps recover from the rest of the D62266 ARM regressions.
> https://rise4fun.com/Alive/TvpC
>
> Note that while the fold is quite flexible, I've restricted it
> to the single interesting pattern at the moment.
>
> Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix
>
> Reviewed By: deadalnix
>
> Subscribers: javed.absar, kristof.beyls, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D62450
2019-10-23 19:52:02 +02:00
Roman Lebedev 20bf0cf2f0 [TargetLowering] optimizeSetCCToComparisonWithZero(): add extra sanity checks (PR43769)
We should do the fold only if both constants are plain,
non-opaque constants, at least that is the DAG.FoldConstantArithmetic()
requirement.
And if the constant we are comparing with is zero - we shouldn't be
trying to do this fold in the first place.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43769
2019-10-23 12:01:40 +03:00
Roman Lebedev 4334892e7b [DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into the x node.
Summary:
This fold helps recover from the rest of the D62266 ARM regressions.
https://rise4fun.com/Alive/TvpC

Note that while the fold is quite flexible, I've restricted it
to the single interesting pattern at the moment.
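
As an illustration of why the rewrite is sound, the sketch below models `x` as `y + k` in 8-bit wrapping arithmetic, so that `-c` folds into the constant `k`. The choice of `y + k` is a hypothetical example of a node that can absorb `-c`; the commit message does not spell out the exact pattern it restricts itself to.

```cpp
#include <cassert>
#include <cstdint>

// Direct form: (y + k) == c, with i8 wrapping arithmetic.
inline bool compareDirect(uint8_t y, uint8_t k, uint8_t c) {
  return static_cast<uint8_t>(y + k) == c;
}

// Folded form: (y + (k - c)) == 0. The constant (k - c) is computed once,
// so the comparison is against zero and no separate subtract is needed.
inline bool compareFolded(uint8_t y, uint8_t k, uint8_t c) {
  const uint8_t folded = static_cast<uint8_t>(k - c);
  return static_cast<uint8_t>(y + folded) == 0;
}

// Check all y and c for a few representative constants k.
inline void checkCompareFold() {
  const uint8_t ks[] = {0, 1, 0x7F, 0x80, 0xFF};
  for (uint8_t k : ks)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned c = 0; c < 256; ++c)
        assert(compareDirect(static_cast<uint8_t>(y), k, static_cast<uint8_t>(c)) ==
               compareFolded(static_cast<uint8_t>(y), k, static_cast<uint8_t>(c)));
}
```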

Reviewers: efriedma, craig.topper, spatel, RKSimon, deadalnix

Reviewed By: deadalnix

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62450
2019-10-22 22:56:35 +03:00
Guillaume Chatelet 5df90cd71c [Alignment][NFC] TargetCallingConv::setByValAlign
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69248

llvm-svn: 375410
2019-10-21 12:05:33 +00:00
Guillaume Chatelet bac5f6bd21 [Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69243

llvm-svn: 375407
2019-10-21 11:01:55 +00:00
Sanjay Patel a298964d22 [TargetLowering][DAGCombine][MSP430] add/use hook for Shift Amount Threshold (1/2)
Provides a TLI hook to allow targets to relax the emission of shifts, thus enabling
codegen improvements on targets with no multiple shift instructions and cheap selects
or branches.

Contributes to a Fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559

Patch by: @joanlluch (Joan LLuch)

Differential Revision: https://reviews.llvm.org/D69116

llvm-svn: 375347
2019-10-19 16:57:02 +00:00
Reid Kleckner 904cd3e06b Prune a LegacyDivergenceAnalysis and MachineLoopInfo include each
Now X86ISelLowering doesn't depend on many IR analyses.

llvm-svn: 375320
2019-10-19 01:31:09 +00:00
Reid Kleckner 0ad6c191de Prune Analysis includes from SelectionDAG.h
Only forward declarations are needed here. Follow-on to r375311.

llvm-svn: 375319
2019-10-19 01:07:48 +00:00
Graham Hunter 84da2596f9 [AArch64][SVE] Add SPLAT_VECTOR ISD Node
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.

Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.

At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.

I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.

Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy

Reviewed By: jmolloy

Differential Revision: https://reviews.llvm.org/D47775

llvm-svn: 375222
2019-10-18 11:48:35 +00:00
David Green e6f313b380 [Codegen] Alter the default promotion for saturating adds and subs
The default promotion for the add_sat/sub_sat nodes currently does:
    ANY_EXTEND iN to iM
    SHL by M-N
    [US][ADD|SUB]SAT
    L/ASHR by M-N

If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.
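
A minimal sketch of the "wider add plus min/max" idea for i8 promoted to i32 follows. It illustrates the arithmetic only, is not the legalizer code, and the function names are hypothetical; only the add variants are shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Signed i8 saturating add, computed in the promoted 32-bit type.
inline int8_t saddSat8(int8_t a, int8_t b) {
  const int32_t wide = int32_t{a} + int32_t{b};  // cannot overflow in 32 bits
  const int32_t clamped =
      std::min<int32_t>(std::max<int32_t>(wide, INT8_MIN), INT8_MAX);
  return static_cast<int8_t>(clamped);
}

// Unsigned i8 saturating add: only the upper bound can be exceeded.
inline uint8_t uaddSat8(uint8_t a, uint8_t b) {
  const uint32_t wide = uint32_t{a} + uint32_t{b};
  return static_cast<uint8_t>(std::min<uint32_t>(wide, UINT8_MAX));
}

inline void checkSaturatingAdds() {
  assert(saddSat8(100, 100) == 127);     // clamps at the upper bound
  assert(saddSat8(-100, -100) == -128);  // clamps at the lower bound
  assert(saddSat8(1, 2) == 3);           // in range, unchanged
  assert(uaddSat8(200, 100) == 255);
  assert(uaddSat8(1, 2) == 3);
}
```

No shifts or overflow flags are involved; the wider type has enough headroom that the exact sum is available and only needs clamping.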

Differential Revision: https://reviews.llvm.org/D68926

llvm-svn: 375211
2019-10-18 09:47:48 +00:00
Sam Parker 39af8a3a3b [DAGCombine][ARM] Enable extending masked loads
Add generic DAG combine for extending masked loads.

Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.

Differential Revision: https://reviews.llvm.org/D68337

llvm-svn: 375085
2019-10-17 07:55:55 +00:00
Craig Topper 8995daafa0 [LegalizeTypes] Don't use PromoteTargetBoolean in WidenVecOp_SETCC.
Similar to r374970, but I don't have a test for this.

PromoteTargetBoolean is intended to be used for legalizing an
operand that needs to be promoted. It picks its type based on
the return from getSetccResultType and is intended to be used
when we have freedom to pick the new type. But the return type
we need for WidenVecOp_SETCC is completely determined by the
type of the input node.

llvm-svn: 374972
2019-10-16 03:29:24 +00:00
Craig Topper 7b49e8ac35 [LegalizeTypes] Don't call PromoteTargetBoolean from SplitVecOp_VSETCC.
PromoteTargetBoolean calls getSetccResultType to get the return
type. But we were passing it the setcc result type rather than the
setcc input type. This causes an issue on X86 with avx512vl where
the setcc result type for vXf16 vectors is vXi16 while the
result type for vXi16 vectors is vXi1.

There's really no guarantee that getSetccResultType is the type
we need here. So now we just grab the extend type from
getExtendForContent and extend to the original result VT of the
node we're splitting.

llvm-svn: 374970
2019-10-16 02:50:04 +00:00
David Zarzycki 59390efef2 [X86] Make memcmp() use PTEST if possible and also enable AVX1
llvm-svn: 374922
2019-10-15 17:40:12 +00:00
Sanjay Patel d545c9056e [DAGCombiner] fold select-of-constants based on sign-bit test
Examples:
  i32 X > -1 ? C1 : -1 --> (X >>s 31) | C1
  i8 X < 0 ? C1 : 0 --> (X >>s 7) & C1

This is a small generalization of a fold requested in PR43650:
https://bugs.llvm.org/show_bug.cgi?id=43650

The sign-bit of the condition operand can be used as a mask for the true operand:
https://rise4fun.com/Alive/paT
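The two examples can be checked with a small scalar model. This is only a sketch with hypothetical function names, and it assumes `>>` on a negative signed value is an arithmetic shift (guaranteed since C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>

// i32: X > -1 ? C1 : -1  -->  (X >>s 31) | C1
int32_t select_signbit_or(int32_t x, int32_t c1) { return x > -1 ? c1 : -1; }
int32_t folded_signbit_or(int32_t x, int32_t c1) {
  // x >> 31 is 0 for non-negative x and all-ones (-1) for negative x.
  return (x >> 31) | c1;
}

// i8: X < 0 ? C1 : 0  -->  (X >>s 7) & C1
int8_t select_signbit_and(int8_t x, int8_t c1) {
  return static_cast<int8_t>(x < 0 ? c1 : 0);
}
int8_t folded_signbit_and(int8_t x, int8_t c1) {
  // The shifted sign bit acts as an all-ones or all-zeros mask for C1.
  return static_cast<int8_t>((x >> 7) & c1);
}
```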

Note that we already handle some of the patterns (isNegative + scalar) because
there's an over-specialized, yet over-reaching fold for that in foldSelectCCToShiftAnd().
It doesn't use any TLI hooks, so I can't easily rip out that code even though we're
duplicating part of it here. This fold is guarded by TLI.convertSelectOfConstantsToMath(),
so it should not cause problems for targets that prefer select over shift.

Also worth noting: I thought we could generalize this further to include the case where
the true operand of the select is not constant, but Alive says that may allow poison to
pass through where it does not in the original select form of the code.

Differential Revision: https://reviews.llvm.org/D68949

llvm-svn: 374902
2019-10-15 15:23:57 +00:00
Jeremy Morse ed29dbaafa [DebugInfo] Remove some users of DBG_VALUEs IsIndirect field
This patch kills off a significant user of the "IsIndirect" field of
DBG_VALUE machine insts. Brought up in PR41675, IsIndirect is
technically redundant as it can be expressed by the DIExpression of a
DBG_VALUE inst, and it isn't helpful to have two ways of expressing
things.

Rather than setting IsIndirect, have DBG_VALUE creators add an extra deref
to the inst's DIExpression. There should now be no appearances of
IsIndirect=True from isel down to LiveDebugVariables / VirtRegRewriter,
which is ensured by an assertion in LDVImpl::handleDebugValue. This means
we also get to delete the IsIndirect handling in LiveDebugVariables. Tests
can be upgraded by for example swapping the following IsIndirect=True
DBG_VALUE:

  DBG_VALUE $somereg, 0, !123, !DIExpression(DW_OP_foo)

With one where the indirection is in the DIExpression, by _appending_
a deref:

  DBG_VALUE $somereg, $noreg, !123, !DIExpression(DW_OP_foo, DW_OP_deref)

Which both mean the same thing. 

Most of the test changes in this patch are updates of that form; also some
changes in how the textual assembly printer handles these insts.

Differential Revision: https://reviews.llvm.org/D68945

llvm-svn: 374877
2019-10-15 10:46:24 +00:00
Joerg Sonnenberger 9681ea9560 Reapply r374743 with a fix for the ocaml binding
Add a pass to lower is.constant and objectsize intrinsics

This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374784
2019-10-14 16:15:14 +00:00
Dmitri Gribenko 1a21f98ac3 Revert "Add a pass to lower is.constant and objectsize intrinsics"
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218

llvm-svn: 374768
2019-10-14 12:22:48 +00:00
Joerg Sonnenberger e4300c392d Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374743
2019-10-13 23:00:15 +00:00
David Green 7c30af8e65 Revert 374373: [Codegen] Alter the default promotion for saturating adds and subs
This commit is not extending the promoted integers as it should. Reverting
whilst I look into the details.

llvm-svn: 374592
2019-10-11 20:33:03 +00:00
Sanjay Patel 3b581ac80f [DAGCombiner] fold vselect-of-constants to shift
The diffs suggest that we are missing some more basic
analysis/transforms, but this keeps the vector path in
sync with the scalar (rL374397). This is again a
preliminary step for introducing the reverse transform
in IR as proposed in D63382.

llvm-svn: 374555
2019-10-11 14:17:56 +00:00
Sanjay Patel 7b904ce724 [DAGCombiner] fold select-of-constants to shift
This reverses the scalar canonicalization proposed in D63382.

Pre: isPowerOf2(C1)
%r = select i1 %cond, i32 C1, i32 0
=>
%z = zext i1 %cond to i32
%r = shl i32 %z, log2(C1)

https://rise4fun.com/Alive/Z50
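A scalar model of the transform above, for illustration only. The helper names are hypothetical, and the precondition `isPowerOf2(C1)` is enforced with an assertion rather than by any TLI hook.

```cpp
#include <cassert>
#include <cstdint>

// log2 of a power of two (hypothetical helper for this sketch).
unsigned log2_pow2(uint32_t c) {
  assert(c != 0 && (c & (c - 1)) == 0 && "precondition: isPowerOf2(C1)");
  unsigned n = 0;
  while ((c >>= 1) != 0)
    ++n;
  return n;
}

// %r = select i1 %cond, i32 C1, i32 0
uint32_t select_pow2(bool cond, uint32_t c1) { return cond ? c1 : 0u; }

// %z = zext i1 %cond to i32 ; %r = shl i32 %z, log2(C1)
uint32_t shift_pow2(bool cond, uint32_t c1) {
  uint32_t z = cond ? 1u : 0u;  // zext of the i1 condition
  return z << log2_pow2(c1);
}
```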

x86 already tries to fold this pattern, but it isn't done
uniformly, so we still see a diff. AArch64 probably should
enable the TLI hook to benefit too, but that's a follow-on.

llvm-svn: 374397
2019-10-10 17:52:02 +00:00
David Green 94d379095a [Codegen] Alter the default promotion for saturating adds and subs
The default promotion for the add_sat/sub_sat nodes currently does:
   1. ANY_EXTEND iN to iM
   2. SHL by M-N
   3. [US][ADD|SUB]SAT
   4. L/ASHR by M-N
If the promoted add_sat or sub_sat node is not legal, this can produce code
that effectively does a lot of shifting (and requiring large constants to be
materialised) just to use the overflow flag. It is simpler to just do the
saturation manually, using the higher bitwidth addition and a min/max against
the saturating bounds. That is what this patch attempts to do.

Differential Revision: https://reviews.llvm.org/D68643

llvm-svn: 374373
2019-10-10 16:04:49 +00:00
Sanjay Patel 7f0e7c0b1c [DAGCombiner] reduce code duplication; NFC
llvm-svn: 374370
2019-10-10 15:38:29 +00:00
Amaury Sechet aaf0507896 [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 374340
2019-10-10 13:20:10 +00:00
Philip Reames 931120846e Conservatively add volatility and atomic checks in a few places
As background, starting in D66309, I'm working on supporting unordered atomics analogous to volatile flags on normal LoadSDNode/StoreSDNodes for X86.

As part of that, I spent some time going through usages of LoadSDNode and StoreSDNode looking for cases where we might have missed a volatility check or need an atomic check. I couldn't find any cases that clearly miscompile - i.e. no test cases - but a couple of pieces of code look suspicious, though I can't figure out how to exercise them.

This patch adds defensive checks and asserts in the places my manual audit found. If anyone has any ideas on how to either a) disprove any of the checks, or b) hit the bug they might be fixing, I welcome suggestions.

Differential Revision: https://reviews.llvm.org/D68419

llvm-svn: 374261
2019-10-09 23:43:33 +00:00
Evandro Menezes e60415a0db [Support] Add mathematical constants
Add own version of the mathematical constants from the upcoming C++20 `std::numbers`.

Differential revision: https://reviews.llvm.org/D68257

llvm-svn: 374207
2019-10-09 19:58:01 +00:00
Kevin P. Neal 1c3d19c82d [FPEnv] Add constrained intrinsics for lrint and lround
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.

Reviewed by:	andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by:	craig.topper
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373900
2019-10-07 13:20:00 +00:00
Simon Pilgrim b4ba3cbda0 [X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217)
If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.

This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.

Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper.

Fixes PR43217

Differential Revision: https://reviews.llvm.org/D68544

llvm-svn: 373871
2019-10-06 21:11:45 +00:00
Craig Topper 842dde6be4 [LegalizeTypes][X86] When splitting a vselect for type legalization, don't split a setcc condition if the setcc input is legal and vXi1 conditions are supported
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.

Reviewers: RKSimon, spatel, efriedma

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68359

llvm-svn: 373863
2019-10-06 18:43:03 +00:00
Sanjay Patel f643fabb52 Revert [DAGCombine] Match more patterns for half word bswap
This reverts r373850 (git commit 25ba49824d)

This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680

llvm-svn: 373853
2019-10-06 15:27:34 +00:00
Amaury Sechet 25ba49824d [DAGCombine] Match more patterns for half word bswap
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68250

llvm-svn: 373850
2019-10-06 14:14:55 +00:00
Craig Topper 2decdf42b9 [FastISel] Copy the inline assembly dialect to the INLINEASM instruction.
Fixes PR43575.

llvm-svn: 373836
2019-10-05 23:21:17 +00:00
Philip Reames d5a4dad206 Fix a *nasty* miscompile in experimental unordered atomic lowering
This is an omission in rL371441.  Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't being ordered with respect to side effects which followed before the end of the block.

Included test case is how I spotted this.  We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with.  I'm sure it showed up a bunch of other ways as well.

Spotted via manual inspection of assembly differences in a corpus with and without the new experimental mode.  Finding this with testing would have been "unpleasant".

llvm-svn: 373814
2019-10-05 00:32:10 +00:00
Eli Friedman 23ae13d51f [ScheduleDAG] When a node is cloned, add an edge between the nodes.
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node.  Make sure this assumption actually holds.

Fixes a "Node emitted out of order - early" assertion on the testcase.

This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.

Differential Revision: https://reviews.llvm.org/D68068

llvm-svn: 373782
2019-10-04 19:51:40 +00:00
Sanjay Patel 288079aafd [DAGCombiner] add operation legality checks before creating shift ops (PR43542)
As discussed on llvm-dev and:
https://bugs.llvm.org/show_bug.cgi?id=43542
...we have transforms that assume shift operations are legal and transforms to
use them are profitable, but that may not hold for simple targets.

In this case, the MSP430 target custom lowers shifts by repeating (many)
simpler/fixed ops. That can be avoided by keeping this code as setcc/select.

Differential Revision: https://reviews.llvm.org/D68397

llvm-svn: 373666
2019-10-03 21:34:04 +00:00
Craig Topper 2772b970e3 [LegalizeTypes] Check for already split condition before calling SplitVecRes_SETCC in SplitRes_SELECT.
No point in manually splitting the SETCC if it was already done.

llvm-svn: 373535
2019-10-02 22:34:49 +00:00
Hans Wennborg 9330005a54 Reapply r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)"
This was reverted in r373454 due to breaking the expensive-checks bot.
This version addresses that by omitting the addSuccessorWithProb() call
when omitting the range check.

> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373477
2019-10-02 14:35:06 +00:00
Hans Wennborg 372aece777 Revert r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)"
This broke http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19967

> Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
>
> This is modeled after the same functionality for jump tables, which was
> added in r357067.
>
> Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373454
2019-10-02 12:08:44 +00:00
Hans Wennborg cbefc36fcc Switch lowering: omit range check for bit tests when default is unreachable (PR43129)
This is modeled after the same functionality for jump tables, which was
added in r357067.

Differential revision: https://reviews.llvm.org/D68131

llvm-svn: 373431
2019-10-02 08:32:15 +00:00
Simon Pilgrim 3c912c4abe [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks.

The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing them to work with target specific combines and opcodes (e.g. X86's FMA variants).

Unlike SimplifyDemandedBits, we can't just handle target nodes through a Target callback; we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.).

Partial reversion of rL372756 - I've identified the infinite loop issue inside the X86 override but haven't fixed it yet so I've only (re)committed the common TargetLowering refactoring part of the patch.

Differential Revision: https://reviews.llvm.org/D67557

llvm-svn: 373343
2019-10-01 15:32:04 +00:00
Matt Arsenault f24ac13aaa TLI: Remove DAG argument from getRegisterByName
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.

The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.

llvm-svn: 373292
2019-10-01 01:44:39 +00:00
Amaury Sechet e6f98c0073 [DAGCombiner] Clang format MatchRotate. NFC
llvm-svn: 373269
2019-09-30 21:41:52 +00:00
Daniel Sanders cbe13a1461 [globalisel][knownbits] Allow targets to call GISelKnownBits::computeKnownBitsImpl()
Summary:
It seems we missed that the target hook can't query the known-bits for the
inputs to a target instruction. Fix that oversight

Reviewers: aditya_nandakumar

Subscribers: rovka, hiraditya, volkan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67380

llvm-svn: 373264
2019-09-30 20:55:53 +00:00
Amaury Sechet 496c0564f1 [DAGCombiner] Update MatchRotate so that it returns an SDValue. NFC
llvm-svn: 373260
2019-09-30 20:47:23 +00:00
Tamas Berghammer 421a186fb4 Support MemoryLocation::UnknownSize in TargetLowering::IntrinsicInfo
Summary:
Previously IntrinsicInfo::size was an unsigned, which can't represent the
64-bit value used by MemoryLocation::UnknownSize.

Reviewers: jmolloy

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68219

llvm-svn: 373214
2019-09-30 14:44:24 +00:00
Hans Wennborg dc7dbb1a88 NFC changes to SelectionDAGBuilder::visitBitTestHeader(), preparing for PR43129
llvm-svn: 373191
2019-09-30 08:47:53 +00:00
Roger Ferrer Ibanez 5a2a14db0b [TargetLowering] Simplify expansion of S{ADD,SUB}O
ISD::SADDO uses the suggested sequence described in the section §2.4 of
the RISCV Spec v2.2. ISD::SSUBO uses the dual approach but checking for
(non-zero) positive.
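The overflow checks described above can be modelled in scalar code as follows. This is a sketch rather than the TargetLowering expansion itself: the function names are hypothetical, and the wrapping arithmetic goes through unsigned types to avoid signed-overflow undefined behaviour (the conversion back to `int32_t` is modular, which C++20 guarantees and mainstream compilers already implement).

```cpp
#include <cstdint>
#include <utility>

// Signed add with overflow flag: overflow iff (b < 0) != (sum < a).
// Adding a negative b must decrease the result; adding a non-negative b
// must not. Any disagreement means the addition wrapped.
std::pair<int32_t, bool> saddo(int32_t a, int32_t b) {
  int32_t sum = static_cast<int32_t>(uint32_t(a) + uint32_t(b));
  return {sum, (b < 0) != (sum < a)};
}

// Signed sub with overflow flag, the dual check: overflow iff
// (b > 0) != (diff < a). Subtracting a (non-zero) positive b must decrease
// the result; otherwise the result must not decrease.
std::pair<int32_t, bool> ssubo(int32_t a, int32_t b) {
  int32_t diff = static_cast<int32_t>(uint32_t(a) - uint32_t(b));
  return {diff, (b > 0) != (diff < a)};
}
```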

Differential Revision: https://reviews.llvm.org/D47927

llvm-svn: 373187
2019-09-30 07:58:50 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Thomas Raoux 3c8c667235 [TargetLowering] Make allowsMemoryAccess method virtual.
Rename the old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess calls the alignment-related function by default
and can be overridden by the target to indicate whether the memory access is legal or
not.

Differential Revision: https://reviews.llvm.org/D67121

llvm-svn: 372935
2019-09-26 00:16:01 +00:00
Sanjay Patel 831a7e7068 [DAGCombiner] add one-use restriction to vector transform with cheap extract
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with code as we had - a scalar op that is
redundant with a vector op.

llvm-svn: 372886
2019-09-25 15:08:33 +00:00
Simon Pilgrim 20f4afc5a7 [DAG] Pull out minimum shift value calc into a helper function. NFCI.
llvm-svn: 372856
2019-09-25 12:28:56 +00:00
Ilya Biryukov 60e5e0b667 Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Reason: this caused severe compile time regressions in JAX.
See email thread  of original revision on llvm-commits for details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

llvm-svn: 372756
2019-09-24 13:48:02 +00:00
Craig Topper a533e87792 [X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend.
These intrinsics should be shift by immediate, but gcc allows any
i32 scalar and clang needs to match that. So we try to detect the
non-constant case and move the data from an integer register to an
MMX register.

Previously this was done by creating a v2i32 build_vector and
bitcast in SelectionDAGBuilder. This had to be done early since
v2i32 isn't a legal type. The bitcast+build_vector would be DAG
combined to X86ISD::MMX_MOVW2D which isel will turn into a
GPR->MMX MOVD.

This commit just moves the whole thing to lowering and emits
the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The
test changes just seem to be due to nodes being linearized in a
different order.

llvm-svn: 372535
2019-09-23 01:05:33 +00:00
Simon Pilgrim c8a9ae4ce2 [SelectionDAG] computeKnownBits/ComputeNumSignBits - cleanup demanded/unknown paths. NFCI.
Merge the calls, just adjust the demandedelts if we have a valid extract_subvector constant index, else demand all elts.

llvm-svn: 372521
2019-09-22 18:47:12 +00:00
Craig Topper 1b7b4b467f [SelectionDAG][Mips][Sparc] Don't allow SimplifyDemandedBits to constant fold TargetConstant nodes to a Constant.
Summary:
After the switch in SimplifyDemandedBits, it tries to create a
constant when possible. If the original node is a TargetConstant
the default in the switch will call computeKnownBits on the
TargetConstant which will succeed. This results in the
TargetConstant becoming a Constant. But TargetConstant exists to
avoid being changed.

I've fixed the two cases that relied on this in tree by explicitly
making the nodes constant instead of target constant. The Sparc
case is an old bug. The Mips case was recently introduced now that
ImmArg on intrinsics gets turned into a TargetConstant when the
SelectionDAG is created. I've removed the ImmArg since it lowers
to generic code.

Reviewers: arsenm, RKSimon, spatel

Subscribers: jyknight, sdardis, wdng, arichardson, hiraditya, fedor.sergeev, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67802

llvm-svn: 372409
2019-09-20 16:49:51 +00:00
David Tellenbach 2a47c77e72 [FastISel] Fix insertion of unconditional branches during FastISel
The insertion of an unconditional branch during FastISel can differ depending on
building with or without debug information. This happens because FastISel::fastEmitBranch
emits an unconditional branch depending on the size of the current basic block
without distinguishing between debug and non-debug instructions.

This patch fixes this issue by ignoring debug instructions when getting the size
of the basic block.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: ormris, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67703

llvm-svn: 372389
2019-09-20 13:22:59 +00:00
Matt Arsenault 3ecab8e455 Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This reverts r372314, reapplying r372285 and the commits which depend
on it (r372286-r372293, and r372296-r372297)

This was missing one switch to getTargetConstant in an untested case.

llvm-svn: 372338
2019-09-19 16:26:14 +00:00
Simon Pilgrim af6043557d [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
This patch converts the DAGCombine isNegatibleForFree/GetNegatedExpression into overridable TLI hooks and includes a demonstration X86 implementation.

The intention is to let us extend existing FNEG combines to work more generally with negatible float ops, allowing them to work with target specific combines and opcodes (e.g. X86's FMA variants).

Unlike SimplifyDemandedBits, we can't just handle target nodes through a Target callback; we need to do this as an override to allow targets to handle generic opcodes as well. This does mean that the target implementations have to duplicate some checks (recursion depth etc.).

I've only begun to replace X86's FNEG handling here, handling FMADDSUB/FMSUBADD negation and some low impact codegen changes (some FMA negation propagation). We can build on this in future patches.

Differential Revision: https://reviews.llvm.org/D67557

llvm-svn: 372333
2019-09-19 15:02:47 +00:00
Amaury Sechet 9e94ef42ba [DAGCombiner] Add node to the worklist in topological order in scalarizeExtractedVectorLoad
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66661

llvm-svn: 372327
2019-09-19 14:22:11 +00:00
Simon Pilgrim c65dd89804 [DAG] Add SelectionDAG::MaxRecursionDepth constant
As commented on D67557 we have a lot of uses of depth checks all using magic numbers.

This patch adds the SelectionDAG::MaxRecursionDepth constant and moves over some general cases to use this explicitly.

Differential Revision: https://reviews.llvm.org/D67711

llvm-svn: 372315
2019-09-19 12:58:43 +00:00
Hans Wennborg 13bdae8541 Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics"
This broke the Chromium build, causing it to fail with e.g.

  fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15>

See llvm-commits thread of r372285 for details.

This also reverts r372286, r372287, r372288, r372289, r372290, r372291,
r372292, r372293, r372296, and r372297, which seemed to depend on the
main commit.

> Encode them directly as an imm argument to G_INTRINSIC*.
>
> Since intrinsics can now define what parameters are required to be
> immediates, avoid using registers for them. Intrinsics could
> potentially want a constant that isn't a legal register type. Also,
> since G_CONSTANT is subject to CSE and legalization, transforms could
> potentially obscure the value (and create extra work for the
> selector). The register bank of a G_CONSTANT is also meaningful, so
> this could throw off future folding and legalization logic for AMDGPU.
>
> This will be much more convenient to work with than needing to call
> getConstantVRegVal and checking if it may have failed for every
> constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with
> immarg operands, many of which need inspection during lowering. Having
> to find the value in a register is going to add a lot of boilerplate
> and waste compile time.
>
> SelectionDAG has always provided TargetConstant for constants which
> should not be legalized or materialized in a register. The distinction
> between Constant and TargetConstant was somewhat fuzzy, and there was
> no automatic way to force usage of TargetConstant for certain
> intrinsic parameters. They were both ultimately ConstantSDNode, and it
> was inconsistently used. It was quite easy to mis-select an
> instruction requiring an immediate. For SelectionDAG, start emitting
> TargetConstant for these arguments, and using timm to match them.
>
> Most of the work here is to cleanup target handling of constants. Some
> targets process intrinsics through intermediate custom nodes, which
> need to preserve TargetConstant usage to match the intrinsic
> expectation. Pattern inputs now need to distinguish whether a constant
> is merely compatible with an operand or whether it is mandatory.
>
> The GlobalISelEmitter needs to treat timm as a special case of a leaf
> node, similar to MachineBasicBlock operands. This should also enable
> handling of patterns for some G_* instructions with immediates, like
> G_FENCE or G_EXTRACT.
>
> This does include a workaround for a crash in GlobalISelEmitter when
> ARM tries to use "imm" in an output with a "timm" pattern source.

llvm-svn: 372314
2019-09-19 12:33:07 +00:00
Matt Arsenault d8399d12cd GlobalISel: Don't materialize immarg arguments to intrinsics
Encode them directly as an imm argument to G_INTRINSIC*.

Since intrinsics can now define what parameters are required to be
immediates, avoid using registers for them. Intrinsics could
potentially want a constant that isn't a legal register type. Also,
since G_CONSTANT is subject to CSE and legalization, transforms could
potentially obscure the value (and create extra work for the
selector). The register bank of a G_CONSTANT is also meaningful, so
this could throw off future folding and legalization logic for AMDGPU.

This will be much more convenient to work with than needing to call
getConstantVRegVal and checking if it may have failed for every
constant intrinsic parameter. AMDGPU has quite a lot of intrinsics with
immarg operands, many of which need inspection during lowering. Having
to find the value in a register is going to add a lot of boilerplate
and waste compile time.

SelectionDAG has always provided TargetConstant for constants which
should not be legalized or materialized in a register. The distinction
between Constant and TargetConstant was somewhat fuzzy, and there was
no automatic way to force usage of TargetConstant for certain
intrinsic parameters. They were both ultimately ConstantSDNode, and it
was inconsistently used. It was quite easy to mis-select an
instruction requiring an immediate. For SelectionDAG, start emitting
TargetConstant for these arguments, and using timm to match them.

Most of the work here is to cleanup target handling of constants. Some
targets process intrinsics through intermediate custom nodes, which
need to preserve TargetConstant usage to match the intrinsic
expectation. Pattern inputs now need to distinguish whether a constant
is merely compatible with an operand or whether it is mandatory.

The GlobalISelEmitter needs to treat timm as a special case of a leaf
node, similar to MachineBasicBlock operands. This should also enable
handling of patterns for some G_* instructions with immediates, like
G_FENCE or G_EXTRACT.

This does include a workaround for a crash in GlobalISelEmitter when
ARM tries to use "imm" in an output with a "timm" pattern source.

llvm-svn: 372285
2019-09-19 01:33:14 +00:00
Roman Lebedev c00f318224 [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold
Summary:
`DAGCombiner::visitADDLikeCommutative()` already has a sibling fold:
`(add X, Carry) -> (addcarry X, 0, Carry)`

This fold, as suggested by @efriedma, helps recover from //some//
of the regressions of D62266
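The arithmetic identity behind the fold can be sketched on 32-bit wrapping integers. This models only the sum result of `addcarry` (the real node also produces a carry-out, which is ignored here), treats Carry as a zero-extended boolean, and uses hypothetical function names.

```cpp
#include <cstdint>

// Sum result of addcarry(a, b, carry) on wrapping 32-bit values: a + b + carry.
uint32_t addcarry_sum(uint32_t a, uint32_t b, bool carry) {
  return a + b + (carry ? 1u : 0u);
}

// (sub Carry, X), with Carry zero-extended to i32.
uint32_t sub_carry_form(bool carry, uint32_t x) {
  return (carry ? 1u : 0u) - x;
}

// (addcarry (sub 0, X), 0, Carry)
uint32_t addcarry_form(bool carry, uint32_t x) {
  return addcarry_sum(0u - x, 0u, carry);
}
```

Both forms compute `Carry - X` modulo 2^32; the second keeps the carry as an explicit carry-in operand.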

Reviewers: efriedma, deadalnix

Subscribers: javed.absar, kristof.beyls, llvm-commits, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62392

llvm-svn: 372259
2019-09-18 20:48:27 +00:00
Craig Topper b5ffbd0b14 [SimplifyDemandedBits] Use APInt::intersects instead of ANDing and comparing to 0 separately. NFC
llvm-svn: 372158
2019-09-17 18:19:02 +00:00
Simon Pilgrim b743e94cdc [TargetLowering] SimplifyDemandedBits - add EXTRACT_SUBVECTOR support.
Call SimplifyDemandedBits on the source vector.

llvm-svn: 371923
2019-09-14 16:38:26 +00:00
Craig Topper 4d1df2aa23 [TargetRegisterInfo] Remove SVT argument from getCommonSubClass.
This was added to support fp128 on x86-64, but appears to be
unneeded now. This may be because the FR128 register class
added back then was merged with the VR128 register class later.

llvm-svn: 371815
2019-09-13 05:24:37 +00:00
Philip Reames 079e210463 [SDAG] Update generic code to conservatively check for isAtomic in addition to isVolatile
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later.  See D66309 for context.

Differential Revision: https://reviews.llvm.org/D66318

llvm-svn: 371786
2019-09-12 22:49:17 +00:00
Craig Topper efe6724b9f [DAGCombiner][X86] Pass the CmpOpVT to reduceSelectOfFPConstantLoads so X86 can exclude fp128 compares.
The X86 decision assumes the compare will produce a result in an XMM
register, but that can't happen for an fp128 compare since those
go to a libcall that returns an i32. Pass the VT so X86 can check
the type.

llvm-svn: 371775
2019-09-12 21:30:18 +00:00
Craig Topper 344c398e2a [SelectionDAGBuilder] Simplify loop in visitSelect back to how it was before r255558.
This code was changed to accommodate fp128 being softened to itself
during type legalization on x86-64. This was done in order to create
libcalls while having fp128 as a legal type. We're now doing the
libcall creation during LegalizeDAG and the type legalization changes
to enable the old behavior have been removed. So this change to
SelectionDAGBuilder is no longer needed.

llvm-svn: 371771
2019-09-12 21:00:32 +00:00
Simon Pilgrim da59a6bf7d [DAGCombine] visitFDIV - Use isCheaperToUseNegatedFPOps helper for (fdiv (fneg X), (fneg Y)) -> (fdiv X, Y). NFCI.
Minor cleanup to use equivalent helper code.

llvm-svn: 371724
2019-09-12 11:03:09 +00:00
Tim Northover f1c2892912 AArch64: support arm64_32, an ILP32 slice for watchOS.
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.

llvm-svn: 371722
2019-09-12 10:22:23 +00:00
Qiu Chaofan b7fb5d0f6f [DAGCombiner] Improve division estimation of floating points.
The current implementation of division estimation loses precision because it
estimates the reciprocal first and then multiplies. This patch reorders the
arithmetic operations in the last iteration in DAGCombiner to improve
accuracy.
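
The two orderings described above might look roughly like the following sketch. Everything here is an illustration under stated assumptions: `roughReciprocal` is a hypothetical stand-in for a hardware estimate instruction, the iteration count is arbitrary, and the code is not the DAGCombiner implementation. It shows only the shape of the computation and does not try to demonstrate the precision difference.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal-estimate instruction:
// returns 1/b with a deliberate relative error of about 1e-3.
inline double roughReciprocal(double b) { return (1.0 / b) * (1.0 + 1e-3); }

// One Newton-Raphson refinement step for r ~= 1/b.
inline double refineReciprocal(double b, double r) {
  return r + r * (1.0 - b * r);
}

// Reciprocal-first shape: refine the reciprocal fully, then multiply once.
inline double divideReciprocalFirst(double a, double b) {
  double r = roughReciprocal(b);
  r = refineReciprocal(b, r);
  r = refineReciprocal(b, r);
  return a * r;
}

// Reordered shape: the last iteration refines the quotient itself by
// correcting q with the residual a - b*q.
inline double divideRefineQuotient(double a, double b) {
  double r = roughReciprocal(b);
  r = refineReciprocal(b, r);
  const double q = a * r;
  return q + r * (a - b * q);
}
```

In the reordered form the final correction is computed against the original operands `a` and `b`, rather than relying on the refined reciprocal alone.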

Reviewed By: Sanjay Patel, Jinsong Ji

Differential Revision: https://reviews.llvm.org/D66050

llvm-svn: 371713
2019-09-12 07:51:24 +00:00
Craig Topper b8dd075275 [LegalizeTypes] Remove code for softening a float type to itself.
This was previously used to turn fp128 operations into libcalls
on X86. This is now done through op legalization after r371672.

This restores much of this code to before r254653.

llvm-svn: 371709
2019-09-12 05:55:14 +00:00
Guillaume Chatelet b6722af068 [Alignment] Use Align for TargetLowering::MinStackArgumentAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67288

llvm-svn: 371498
2019-09-10 09:01:18 +00:00
Craig Topper e8b432fa0e [LegalizeTypes] Teach SoftenFloatOp_SELECT_CC to handle operand 2 or 3 being softened.
This can only happen on X86 when fp128 is a legal type, but we
go through softening to generate libcalls. This causes fp128 to
be softened to fp128 instead of an integer type. This can be
removed if D67128 lands.

llvm-svn: 371493
2019-09-10 07:56:02 +00:00
Philip Reames 20aafa3156 Introduce infrastructure for an incremental port of SelectionDAG atomic load/store handling
This is the first patch in a large sequence. The eventual goal is to have unordered atomic loads and stores - and possibly ordered atomics as well - handled through the normal ISEL codepaths for loads and stores. Today, they're handled with instances of AtomicSDNode. As a result, all transforms need to be duplicated to work for unordered atomics. The benefit of the current design is that it's harder to introduce a silent miscompile by adding a transform which forgets about atomicity.  See the thread on llvm-dev titled "FYI: proposed changes to atomic load/store in SelectionDAG" for further context.

Note that this patch is NFC unless the experimental flag is set.

The basic strategy I plan on taking is:

    1. Introduce infrastructure and a flag for testing (this patch)
    2. Audit uses of isVolatile, and apply isAtomic conservatively*
    3. Piecemeal conservative* update of generic code and x86 backend code in individual reviews w/tests for cases which didn't check volatile, but can be found with inspection
    4. Flip the flag at the end (with minimal diffs)
    5. Work through the todo list identified in (2) and (3), exposing performance ops

(*) The "conservative" bit here is aimed at minimizing the number of diffs involved in (4). Ideally, there'd be none. In practice, getting it down to something reviewable by a human is the actual goal. Note that there are (currently) no paths which produce LoadSDNode or StoreSDNode with atomic MMOs, so we don't need to worry about preserving any behaviour there.

We've taken a very similar strategy twice before with success - once at IR level, and once at the MI level (post ISEL). 

Differential Revision: https://reviews.llvm.org/D66309

llvm-svn: 371441
2019-09-09 19:23:22 +00:00
Craig Topper 5ebd0a6e88 [SelectionDAG] Remove ISD::FP_ROUND_INREG
I don't think anything in tree creates this node. So all of this
code appears to be dead.

Code coverage agrees
http://lab.llvm.org:8080/coverage/coverage-reports/llvm/coverage/Users/buildslave/jenkins/workspace/clang-stage2-coverage-R/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp.html

Differential Revision: https://reviews.llvm.org/D67312

llvm-svn: 371431
2019-09-09 17:54:44 +00:00
Tim Northover 36147adc0b GlobalISel: add combiner to form indexed loads.
Loosely based on DAGCombiner version, but this part is slightly simpler in
GlobalISel because all address calculation is performed by G_GEP. That makes
the inc/dec distinction moot so there's just pre/post to think about.

No targets can handle it yet so testing is via a special flag that overrides
target hooks.

llvm-svn: 371384
2019-09-09 10:04:23 +00:00
Craig Topper dac34f52d3 [DAGCombiner][X86][ARM] Teach visitMULO to fold multiplies with 0 to 0 and no carry.
I modified the ARM test to use two inputs instead of 0 so the
test hopefully still tests what was intended.

llvm-svn: 371344
2019-09-08 19:24:39 +00:00
Bjorn Pettersson d065c81164 [CodeGen] Handle SMULFIXSAT with scale zero in TargetLowering::expandFixedPointMul
Summary:
Normally TargetLowering::expandFixedPointMul would handle
SMULFIXSAT with scale zero by using an SMULO to compute the
product and determine if saturation is needed (if overflow
happened). But if SMULO isn't custom/legal it falls through
and uses the same technique, using MULHS/SMUL_LOHI, as used
for non-zero scales.

Problem was that when checking for overflow (handling saturation)
when not using MULO we did not expect to find a zero scale. So
we ended up in an assertion when doing
  APInt::getLowBitsSet(VTSize, Scale - 1)

This patch fixes the problem by adding a new special case for
how saturation is computed when scale is zero.

Reviewers: RKSimon, bevinh, leonardchan, spatel

Reviewed By: RKSimon

Subscribers: wuzish, nemanjai, hiraditya, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67071

llvm-svn: 371309
2019-09-07 12:16:23 +00:00
Bjorn Pettersson 5e331e4ce8 [Intrinsic] Add the llvm.umul.fix.sat intrinsic
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
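
To make the semantics described above concrete, here is a small 8-bit model of an unsigned saturating fixed-point multiply. The name `umulFixSat8` is hypothetical, and truncating the discarded fractional bits is an assumption of this sketch rather than something this message specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative 8-bit model of an unsigned saturating fixed-point multiply.
// `scale` is the number of fractional bits shared by both operands (0..8 here).
constexpr std::uint8_t umulFixSat8(std::uint8_t a, std::uint8_t b,
                                   unsigned scale) {
  // Full-width product, so no bits are lost before rescaling.
  const std::uint32_t wide = static_cast<std::uint32_t>(a) * b;
  // Drop `scale` fractional bits (truncation is an assumption of this sketch).
  const std::uint32_t shifted = wide >> scale;
  // Clamp to the largest value representable in the operand type.
  return static_cast<std::uint8_t>(std::min<std::uint32_t>(shifted, 0xFFu));
}
```

With four fractional bits, `umulFixSat8(16, 16, 4)` models 1.0 * 1.0 and yields 16 (1.0 again), while `umulFixSat8(255, 255, 4)` overflows the 8-bit range and saturates to 255.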

This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.

Patch by: leonardchan, bjope

Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel

Reviewed By: leonardchan

Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57836

llvm-svn: 371308
2019-09-07 12:16:14 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Philip Reames 3a49ca331f Update CodeGen to use hasMetadata as appropriate [NFC]
My initial grepping for rL370933 missed a directory worth of cases.

llvm-svn: 370942
2019-09-04 17:46:55 +00:00
Bjorn Pettersson b0eb394417 [CodeGen] Use FSHR in DAGTypeLegalizer::ExpandIntRes_MULFIX
Summary:
Simplify the right shift of the intermediate result (given
in four parts) by using funnel shift.

There is some impact on lit tests, but that seems to be
related to register allocation differences due to how FSHR
is expanded on X86 (giving a slightly different operand order
for the OR operations compared to the old code).

Reviewers: leonardchan, RKSimon, spatel, lebedev.ri

Reviewed By: RKSimon

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, pzheng, bevinh, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67036

llvm-svn: 370813
2019-09-03 19:35:07 +00:00
Craig Topper 9c74c77404 [LegalizeDAG] Pass DAG to two calls to SDNode::dump in debug prints so that they will print target specific nodes correctly.
The dump methods can only print target node names correctly if
they can get access to the TLI object.

llvm-svn: 370694
2019-09-03 02:51:14 +00:00
Sanjay Patel 4e54cf3e0e [DAGCombiner] try to form test+set out of shift+mask patterns
The motivating bugs are:
https://bugs.llvm.org/show_bug.cgi?id=41340
https://bugs.llvm.org/show_bug.cgi?id=42697

As discussed there, we could view this as a failure of IR canonicalization,
but then we would need to implement a backend fixup with target overrides
to get this right in all cases. Instead, we can just view this as a codegen
opportunity. It's not even clear for x86 exactly when we should favor
test+set; some CPUs have better theoretical throughput for the ALU ops than
bt/test.

This patch is made more complicated than I expected because there's an early
DAGCombine for 'and' that can change types of the intermediate ops via
trunc+anyext.

Differential Revision: https://reviews.llvm.org/D66687

llvm-svn: 370668
2019-09-02 14:52:09 +00:00
Sanjay Patel c882208367 [DAGCombiner] improve throughput of shift+logic+shift
The motivating case for this is a long way from here:
https://bugs.llvm.org/show_bug.cgi?id=43146
...but I think this is where we have to start.

We need to canonicalize/optimize sequences of shift and logic to ease
pattern matching for things like bswap and improve perf in general.
But without the artificial limit of '!LegalTypes' (early combining),
there are a lot of test diffs, and not all are good.

In the minimal tests added for this proposal, x86 should have better
throughput in all cases. AArch64 is neutral for scalar tests because
it can fold shifts into bitwise logic ops.

There are 3 shift opcodes and 3 logic opcodes for a total of 9 possible patterns:
https://rise4fun.com/Alive/VlI
https://rise4fun.com/Alive/n1m
https://rise4fun.com/Alive/1Vn

Differential Revision: https://reviews.llvm.org/D67021

llvm-svn: 370617
2019-09-01 18:38:15 +00:00
Shiva Chen adfdcb9c26 [TargetLowering] Fix Bugzilla ID 43183 to avoid soften comparison broken with constant inputs
Summary:
  This fixes Bugzilla ID 43183, which was triggered by the following commit:
  [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall

llvm-svn: 370604
2019-09-01 04:52:54 +00:00
Sanjay Patel 9e57b49392 [DAGCombiner] clean up code in visitShiftByConstant()
This is not quite NFC because the SDLoc propagation is changed,
but there are no regression test diffs from that.

llvm-svn: 370587
2019-08-31 15:08:58 +00:00
Amaury Sechet 82825ab882 [DAGCombiner] Match (add X, X) as (shl X, 1) when detecting rotate.
Summary: The combiner transforms (shl X, 1) into (add X, X).
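
A short sketch of why the rotate matcher needs to accept both spellings: for unsigned 32-bit values, a rotate-left by one built from `x << 1` and one built from `x + x` compute the same result. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate-left-by-one written with an explicit shift.
constexpr std::uint32_t rotl1ViaShl(std::uint32_t x) {
  return (x << 1) | (x >> 31);
}

// The same rotate after (shl X, 1) has been rewritten to (add X, X).
constexpr std::uint32_t rotl1ViaAdd(std::uint32_t x) {
  return (x + x) | (x >> 31);
}

// The top bit wraps around to bit 0 in both spellings.
static_assert(rotl1ViaShl(0x80000001u) == 0x00000003u, "shl form");
static_assert(rotl1ViaAdd(0x80000001u) == 0x00000003u, "add form");
```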

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66882

llvm-svn: 370578
2019-08-31 11:40:02 +00:00
James Molloy e62c509cd4 [DAGCombiner] Don't create illegal narrow stores
Narrowing stores when the target doesn't support the narrow version
forces the target to expand into a load-modify-store sequence, which
is highly suboptimal. The information narrowing throws away (legality
of the inverse transform) is hard to re-analyze. If the target doesn't
support a store of the narrow type, don't narrow even in pre-legalize
mode.

No test as this is DAGCombiner and depends on target bits.

llvm-svn: 370576
2019-08-31 10:46:16 +00:00
Bjorn Pettersson e27c74abb6 [CodeGen] Refactor DAGTypeLegalizer::ExpandIntRes_MULFIX. NFC
Restructured the code a little bit in preparation for adding
UMULFIXSAT. I think it will be easier to understand the code
if not interleaving the codegen for signed/unsigned/saturated
cases that much.

llvm-svn: 370569
2019-08-31 09:28:50 +00:00
Simon Pilgrim 3be7081aa1 [DAGCombine] ReduceLoadWidth - remove duplicate SDLoc. NFCI.
SDLoc(N0) and SDLoc(cast<LoadSDNode>(N0)) should be equivalent.

llvm-svn: 370498
2019-08-30 18:19:02 +00:00
Simon Pilgrim 2d1e0899e9 [TargetLowering] SimplifyDemandedBits ADD/SUB/MUL - correctly inherit SDNodeFlags from the original node.
Just disable NSW/NUW flags. This matches what we're already doing for the other situations for these nodes, it was just missed for the demanded constant case.

Noticed by inspection - confirmed in offline discussion with @spatel. I've checked we have test coverage in the x86 extract-bits.ll and extract-lowbits.ll tests

llvm-svn: 370497
2019-08-30 17:58:55 +00:00
Simon Pilgrim ab8cb1a3c5 [DAGCombine] visitVSELECT - remove equivalent getValueType() call. NFCI.
llvm-svn: 370489
2019-08-30 17:21:20 +00:00
Simon Pilgrim c2fed1dc8a [DAGCombine] visitVSELECT - remove duplicate getOperand calls. NFCI.
llvm-svn: 370478
2019-08-30 15:17:37 +00:00
Simon Pilgrim 3367669668 [DAGCombine] visitVSELECT - use getShiftAmountTy for shift amounts.
llvm-svn: 370471
2019-08-30 13:30:37 +00:00
Simon Pilgrim 8e1989e79a [DAGCombine] visitMULHS - use getScalarValueSizeInBits() to make safe for vector types.
This is hidden behind a (scalar-only) isOneConstant(N1) check at the moment, but once we get around to adding vector support we need to ensure we're dealing with the scalar bitwidth, not the total.

llvm-svn: 370468
2019-08-30 12:22:06 +00:00
Simon Pilgrim 7cbf823f93 [DAGCombine] visitMULHS/visitMULHU - isBuildVectorAllZeros doesn't mean node is all zeros
Return a proper zero vector, just in case some elements are undef.

Noticed by inspection after dealing with a similar issue in PR43159.

llvm-svn: 370460
2019-08-30 10:42:14 +00:00
Dan Gohman 8cfeeaf9de [CodeGen] Fix lowering for returning the result of an extractvalue
When the number of return values exceeds the number of registers available,
SelectionDAGBuilder::visitRet transforms a function's return to use a
pointer to a buffer to hold return values. When the returned value is an
operator such as extractvalue, the value may have a non-zero result number.
Add that number to the indexing when obtaining the values to store.

This fixes https://bugs.llvm.org/show_bug.cgi?id=43132.

Differential Revision: https://reviews.llvm.org/D66978

llvm-svn: 370430
2019-08-30 04:33:22 +00:00
Simon Pilgrim ea67741899 [DAGCombine] Fix shadow variable warnings. NFCI.
llvm-svn: 370365
2019-08-29 14:34:07 +00:00
Simon Pilgrim 6c2fc64edc Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 370333
2019-08-29 11:18:53 +00:00
Simon Pilgrim 27f43e6b1a Fix shadow variable warning. NFCI.
llvm-svn: 370332
2019-08-29 11:16:32 +00:00
Amaury Sechet 8365e42010 [DAGCombiner] (insert_vector_elt (vector_shuffle X, Y), (extract_vector_elt X, N), IdxC) -> (vector_shuffle X, Y)
Summary: This is beneficial when the shuffle is only used once and ends up being generated in a few places when some node is combined into a shuffle.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66718

llvm-svn: 370326
2019-08-29 10:35:51 +00:00
Simon Pilgrim dfb2a19ac2 LegalizeSetCCCondCode - Reduce scope of NeedSwap to fix cppcheck warning. NFCI.
No need for this to be defined outside the only switch case it's used in.

llvm-svn: 370320
2019-08-29 10:11:34 +00:00
Craig Topper 1aadf6f39f [X86] Make inline assembly 'x' and 'v' constraints work for f128.
Including a type legalizer fix to make bitcast operand promotion
work correctly when getSoftenedFloat returns f128 instead of i128.

Fixes PR43157

llvm-svn: 370293
2019-08-29 05:13:56 +00:00
Shiva Chen b39876d8cd [RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall
The patch fixes an issue where RV64 didn't clear the upper bits
when returning a complex floating-point value with the lp64 ABI.

float _Complex
complex_add(float _Complex a, float _Complex b)
{
   return a + b;
}

RealResult = zero_extend(RealA + RealB)
ImageResult = ImageA + ImageB
Return (RealResult | (ImageResult << 32))

The patch introduces shouldExtendTypeInLibCall target hook to suppress
the AssertZext generation when lowering floating LibCall.

Thanks to Eli's comments from the Bugzilla
https://bugs.llvm.org/show_bug.cgi?id=42820

Differential Revision: https://reviews.llvm.org/D65497

llvm-svn: 370275
2019-08-28 23:40:37 +00:00
Kevin P. Neal ddf13c00ed [FPEnv] Add fptosi and fptoui constrained intrinsics.
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.

Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.

Reviewed by:	Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by:	Craig Topper
Differential Revision:	http://reviews.llvm.org/D63782

llvm-svn: 370228
2019-08-28 16:33:36 +00:00
Simon Pilgrim 14e07d7f4b [DAGCombine] Fix cppcheck shadow variable warning. NFCI.
We already have an outer Ops variable.

llvm-svn: 370197
2019-08-28 12:48:41 +00:00
Amaury Sechet 4f4387dd12 [TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles
Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicitly check for both options, but other places do not when they would benefit from doing so. This patch refactors the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66804

llvm-svn: 370190
2019-08-28 12:00:06 +00:00
Simon Pilgrim c5b38e2869 [DAGCombine] Remove LoadedSlice::Cost default 'ForCodeSize' constructor arguments. NFCI.
These were always being passed in and it allowed me to add the explicit tag to stop a cppcheck warning about 1 argument constructors.

llvm-svn: 370189
2019-08-28 11:50:36 +00:00
Matt Arsenault 2910184936 DAG: computeNumSignBits for MUL
Copied directly from the IR version.

Most of the testcases I've added for this are somewhat problematic
because they really end up testing the yet to be implemented version
for MUL_I24/MUL_U24.

llvm-svn: 370099
2019-08-27 19:05:33 +00:00
Sanjay Patel b516f1afdd [DAGCombiner] cancel fnegs from multiplied operands of FMA
(-X) * (-Y) + Z --> X * Y + Z

This is a missing optimization that shows up as a potential regression in D66050,
so we should solve it first. We appear to be partly missing this fold in IR as well.

We do handle the simpler case already:
(-X) * (-Y) --> X * Y

And it might be beneficial to make the constraint less conservative (eg, if both
operands are cheap, but not necessarily cheaper), but that causes infinite looping
for the existing fmul transform.
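
The fold at the top of this message can be checked numerically with `std::fma`: negating both multiplied operands leaves the exact product unchanged, so the fused result is identical. This is only an illustration with hypothetical helper names; it says nothing about the cost heuristics discussed above.

```cpp
#include <cassert>
#include <cmath>

// (-X) * (-Y) + Z computed as a single fused multiply-add.
inline double fmaNegatedOperands(double x, double y, double z) {
  return std::fma(-x, -y, z);
}

// X * Y + Z, the form the fold produces.
inline double fmaPlainOperands(double x, double y, double z) {
  return std::fma(x, y, z);
}
```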

Differential Revision: https://reviews.llvm.org/D66755

llvm-svn: 370071
2019-08-27 15:17:46 +00:00
Amaury Sechet f28dee2cff [DAGCombiner] Add node to the worklist in topological order in parallelizeChainedStores
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66659

llvm-svn: 370056
2019-08-27 13:27:57 +00:00
Amaury Sechet a1e5ef3fd4 [DAGCombiner] Add node to the worklist in topological order after relegalization.
Summary: As per title.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66702

llvm-svn: 370040
2019-08-27 11:06:09 +00:00
Craig Topper 243ede9970 [SelectionDAGBuilder] Hide existence of ConstantDataVector vector from visitGetElementPtr.
ConstantDataVector is a specialized version of ConstantVector
that stores data in a packed array of bits instead of as
individual pointers to other Constants. But we really shouldn't
expose that if we can avoid it. And we should handle regular
ConstantVector equally well.

This removes a dyn_cast to ConstantDataVector and just calls
getSplatValue directly on a Constant* if the type is a vector.

llvm-svn: 370018
2019-08-27 06:39:50 +00:00
Craig Topper 4a3f62f9fd [SelectionDAGBuilder] Fix typo in comment. NFC
llvm-svn: 370017
2019-08-27 06:38:51 +00:00
Richard Trieu 58e67b8aa3 Revert r369927 - [DAGCombiner] Remove a bunch of redundant AddToWorklist calls.
This change causes instrumented builds of Clang to have a fatal error in the
backend.  https://reviews.llvm.org/D66537 has the details.

llvm-svn: 370006
2019-08-27 02:04:11 +00:00
Craig Topper 846429de74 [DAGCombiner][X86] Teach SimplifyVBinOp to fold VBinOp (concat X, undef/constant), (concat Y, undef/constant) -> concat (VBinOp X, Y), VecC
This improves the combine I included in D66504 to handle constants in the upper operands of the concat. If we can constant fold them away we can pull the concat after the bin op. This helps with chains of madd reductions on X86 from loop unrolling. The loop madd reduction pattern creates pmaddwd with half the width of the add that follows it using zeroes to fill the upper bits. If we have two of these added together we can pull the zeroes through the accumulating add and then shrink it.

Differential Revision: https://reviews.llvm.org/D66680

llvm-svn: 369937
2019-08-26 17:59:11 +00:00
Amaury Sechet b7075e40f3 [DAGCombiner] Remove a bunch of redundant AddToWorklist calls.
Summary:
This comes as a first step toward processing the DAG nodes in topological order. Doing so ensures that arguments of a node are combined before the node itself is combined, which exposes more opportunities for optimization and/or reduces the number of patterns a node has to match.

DAGCombiner adding nodes to the worklist in various places causes the nodes to be in a different order from what is expected. In addition, this is redundant because these nodes end up being added to the worklist anyway due to the machinery at line 1621.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66537

llvm-svn: 369927
2019-08-26 17:02:12 +00:00
Craig Topper b8b90ac1c5 [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef
Summary:
Concat_vectors is more canonical during early DAG combine. For example, it's what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner that can pull concat_vectors through a shuffle. See partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here. I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.

I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering where insert_subvector probably is what we want to produce so I've enabled this via a boolean passed to the function.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66504

llvm-svn: 369872
2019-08-25 17:59:49 +00:00
Nikita Popov aa71c977ba [SDAG] Fold umul_lohi with 0 or 1 multiplicand
These can turn up during multiplication legalization. In principle
these should also apply to smul_lohi, but I wasn't able to figure
out how to produce those with the necessary operands.

Differential Revision: https://reviews.llvm.org/D66380

llvm-svn: 369864
2019-08-25 08:04:22 +00:00
Benjamin Kramer dc5f805d31 Do a sweep of symbol internalization. NFC.
llvm-svn: 369803
2019-08-23 19:59:23 +00:00
Craig Topper e7211bb567 [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. Add a hack to X86 to avoid a regression
Patch showing the effect of enabling bool vector oversimplification.

Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:

insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i

Preventing the removal of the AND to clear the upper bits of result

Differential Revision: https://reviews.llvm.org/D53022

llvm-svn: 369780
2019-08-23 17:14:58 +00:00
Simon Pilgrim 04906ef1f2 [DAGCombine] GetNegatedExpression - add FMA\FMAD support
If the accumulator and either of the multiply operands are negatable then we can negate the entire expression.

Differential Revision: https://reviews.llvm.org/D63141

llvm-svn: 369746
2019-08-23 10:49:46 +00:00
Amaury Sechet 95cf66de7c [DAGCombiner] Remove explicit call to AddToWorklist in sqrt and reciprocal computations
Summary: These nodes end up being processed regardless due to DAGCombiner ensuring arguments are processed. This changes the order in which nodes are processed, which fixes an issue on PowerPC.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri, mcberg2017, stefanp, hfinkel

Subscribers: nemanjai, MaskRay, jsji, steven.zhang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66548

llvm-svn: 369662
2019-08-22 15:35:45 +00:00
Shiva Chen 72a41e7b0d [TargetLowering] Remove optional arguments passing to makeLibCall
The patch introduces the MakeLibCallOptions struct as suggested by @efriedma on D65497.
The struct contains argument flags which will be passed to the makeLibCall function.
The patch should not have any functional changes.

Differential Revision: https://reviews.llvm.org/D65795

llvm-svn: 369622
2019-08-22 04:59:43 +00:00
Amaury Sechet c0f190a048 [DAGCombiner] Remove mostly redundant calls to AddToWorklist
Summary:
These calls change the order in which some nodes are processed and so have an effect on codegen.

The change in fixup-bw-copy.ll is due to (and (load anyext)) getting transformed into (load zext), while previously the and was removed by SimplifyDemandedBits, so the (load anyext) remained.

Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66543

llvm-svn: 369561
2019-08-21 18:51:08 +00:00
Amaury Sechet 045f33aec9 [DAGCombiner] Various nits. NFC
llvm-svn: 369520
2019-08-21 12:01:37 +00:00
Craig Topper ba375263e8 [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.

This helps our vXi1 code see the full concat operation and allows it to optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Differential Revision: https://reviews.llvm.org/D66456

llvm-svn: 369459
2019-08-20 22:12:50 +00:00
Roman Lebedev edfaee0811 [TargetLowering] x s% C == 0 fold: vector divisor with INT_MIN handling
Summary:
The general fold is only valid for positive divisors.
Which effectively means, it is invalid for `INT_MIN` divisors,
and we currently bailout if we see them.

But that is too strict, we can just fix-up the results.
For that, let's do a second computation 'in parallel':
```
Name: srem -> and
Pre: isPowerOf2(C)
%o = srem i8 %X, C
%r = icmp eq %o, 0
  =>
%n = and i8 %X, C-1
%r = icmp eq %n, 0
```
https://rise4fun.com/Alive/Sup

And then just blend results: if the divisor was `INT_MIN`,
pick the value we got via bit-test,
else pick the value from general fold.
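
The bit-test equivalence above, including the `INT_MIN` divisor, can be checked exhaustively for i8 with a sketch like the following. The function names are hypothetical and this is not the TargetLowering code; it only mirrors the Alive precondition `isPowerOf2(C)` on 8-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Direct form: (X srem C) == 0. Operands are promoted to int, so the
// remainder is computed without overflow for the divisors used here.
constexpr bool viaSrem(std::int8_t x, std::int8_t c) { return x % c == 0; }

// Bit-test form: (X & (C - 1)) == 0, evaluated on the 8-bit patterns.
constexpr bool viaMask(std::int8_t x, std::int8_t c) {
  const std::uint8_t mask =
      static_cast<std::uint8_t>(static_cast<std::uint8_t>(c) - 1u);
  return (static_cast<std::uint8_t>(x) & mask) == 0;
}

// Compare both forms for every i8 X and every i8 power-of-two bit pattern,
// where -128 is the INT_MIN case (bit pattern 0x80).
inline bool bitTestMatchesSremForI8() {
  const int divisors[] = {1, 2, 4, 8, 16, 32, 64, -128};
  for (int c : divisors)
    for (int x = -128; x <= 127; ++x)
      if (viaSrem(static_cast<std::int8_t>(x), static_cast<std::int8_t>(c)) !=
          viaMask(static_cast<std::int8_t>(x), static_cast<std::int8_t>(c)))
        return false;
  return true;
}
```

For the divisor -128 the mask becomes 0x7F, so only 0 and -128 pass the bit test, which matches the values whose signed remainder is zero.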

There's an interesting observation - `ISD::ROTR` is set to
`LegalizeAction::Expand` before AVX512, so we should not
treat `INT_MIN` divisor as even; and as it can be seen
while `@test_srem_odd_even_one` improves on all run-lines,
`@test_srem_odd_even_INT_MIN` only improves for AVX512.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66300

llvm-svn: 369268
2019-08-19 15:01:42 +00:00
Craig Topper f43106e341 [SelectionDAG] Add a node creation debug message to getMachineNode.
llvm-svn: 369204
2019-08-18 06:28:00 +00:00
Bjorn Pettersson 9dddd26e31 [DAGCombiner] Add simple folds for SMULFIX/UMULFIX/SMULFIXSAT
Summary:
Add the following DAGCombiner folds for mulfix being
one of SMULFIX/UMULFIX/SMULFIXSAT:
  (mulfix x, undef, scale) -> 0
  (mulfix x, 0, scale) -> 0

Also added canonicalization of constants to RHS.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66052

llvm-svn: 369103
2019-08-16 13:16:48 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
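
As an illustration of what such a mechanical replacement looks like at a call site, here is a minimal sketch. The `Node` type and `makeNode` helper are hypothetical and only serve the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical type used only for illustration.
struct Node {
  std::string Name;
  int Id;
  Node(std::string N, int I) : Name(std::move(N)), Id(I) {}
};

// Before the migration this call would have been spelled
// llvm::make_unique<Node>(Name, Id); the arguments are unchanged.
std::unique_ptr<Node> makeNode(const std::string &Name, int Id) {
  return std::make_unique<Node>(Name, Id);
}
```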

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Simon Pilgrim d4df81f463 Remove SmallBitVector.h include. NFCI.
SmallBitVector/BitVector types aren't used at all in the cpp file.

llvm-svn: 369008
2019-08-15 14:40:37 +00:00
Simon Pilgrim 983e9118a2 Remove BitVector.h include. NFCI.
BitVector type isn't used at all in the cpp file.

llvm-svn: 369007
2019-08-15 14:39:28 +00:00
Simon Pilgrim ed804dad1e [DAGCombine] MergeConsecutiveStores - fix cppcheck/MSVC extension warning. NFCI.
Set the StartIdx type to size_t so that it matches the StoreNodes SmallVector size() and index types.

Silences the MSVC analyzer warning that unsigned increment might overflow before exceeding size_t on 64-bit targets - this isn't likely to happen but it means we use consistent types and reduces the warning "noise" a little.

llvm-svn: 368998
2019-08-15 13:07:14 +00:00
Sanjay Patel 57d459309d [SDAG][x86] check for relaxed math when matching an FP reduction
If the last step in an FP add reduction allows reassociation and doesn't care
about -0.0, then we are free to recognize that computation as a reduction
that may reorder the intermediate steps.

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than
the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Differential Revision: https://reviews.llvm.org/D66236

llvm-svn: 368995
2019-08-15 12:43:15 +00:00
Florian Hahn de1d6c8220 Add ptrmask intrinsic
This patch adds a ptrmask intrinsic which allows masking out bits of a
pointer that must be zero when accessing it, because of ABI alignment
requirements or a restriction of the meaningful bits of a pointer
through the data layout.

This avoids doing a ptrtoint/inttoptr round trip in some cases (e.g. tagged
pointers) and allows us to not lose information about the underlying
object.
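
For context, the tagged-pointer case mentioned above looks roughly like this at the source level. This is a sketch with hypothetical helper names; it shows the integer round trip that a frontend could, in principle, express with the ptrmask intrinsic instead.

```cpp
#include <cassert>
#include <cstdint>

// Low bits that are free to hold a tag when the pointee is 8-byte aligned.
constexpr std::uintptr_t TagMask = 0x7;

// Attach a small tag to an aligned pointer (hypothetical helper).
inline std::uintptr_t tagPointer(const void *P, std::uintptr_t Tag) {
  assert((reinterpret_cast<std::uintptr_t>(P) & TagMask) == 0);
  assert(Tag <= TagMask);
  return reinterpret_cast<std::uintptr_t>(P) | Tag;
}

// Recover the original pointer by clearing the tag bits. In IR this is
// the ptrtoint / and / inttoptr sequence that a pointer mask can replace.
inline void *clearTag(std::uintptr_t Tagged) {
  return reinterpret_cast<void *>(Tagged & ~TagMask);
}
```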

Reviewers: nlopes, efriedma, hfinkel, sanjoy, jdoerfert, aqjune

Reviewed by: sanjoy, jdoerfert

Differential Revision: https://reviews.llvm.org/D59065

llvm-svn: 368986
2019-08-15 10:12:26 +00:00
Craig Topper e7ea06b7d2 [SelectionDAGBuilder] Teach gather/scatter getUniformBase to look through vector zeroinitializer indices in addition to scalar zeroes.
llvm-svn: 368926
2019-08-14 21:38:56 +00:00
Sanjay Patel ecccf29e6c [SDAG] move variable closer to use; NFC
llvm-svn: 368905
2019-08-14 19:46:15 +00:00
Roman Lebedev 676594305a [CodeGen][SelectionDAG] More efficient code for X % C == 0 (SREM case)
Summary:
This implements an optimization described in Hacker's Delight 10-17:
when `C` is constant, the result of `X % C == 0` can be computed
more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

One huge caveat: this signed case is only valid for positive divisors.

While we can freely negate negative divisors, we can't negate `INT_MIN`,
so for now if `INT_MIN` is encountered, we bail out.
As a follow-up, it should be possible to handle that more gracefully
via extra `and`+`setcc`+`select`.

This passes llvm's test-suite, and from cursory(!) cross-examination
the folds (the assembly) match those of GCC, and manual checking via alive
did not reveal any issues (other than the `INT_MIN` case)
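
The idea can be sketched in plain C++ for the simplest case, an odd positive divisor of at least 3 on 32-bit values. This is an illustration of the Hacker's Delight test rather than the code in this patch; the helper names are hypothetical, and even divisors (which additionally need a rotate) are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd D modulo 2^32 via Newton's iteration.
// Each step doubles the number of correct low bits (3 -> 6 -> 12 -> 24 -> 48).
uint32_t inverseMod2_32(uint32_t D) {
  assert((D & 1) && "only odd values are invertible modulo 2^32");
  uint32_t Inv = D; // correct to 3 bits, since D * D == 1 (mod 8) for odd D
  for (int I = 0; I < 4; ++I)
    Inv *= 2u - D * Inv;
  return Inv;
}

// Returns whether X % D == 0 without computing a remainder.
// Sketch only: restricted to odd D >= 3.
bool isMultipleOfOddDivisor(int32_t X, int32_t D) {
  assert(D >= 3 && (D & 1) && "sketch handles odd divisors >= 3 only");
  uint32_t UD = static_cast<uint32_t>(D);
  uint32_t P = inverseMod2_32(UD);
  uint32_t A = 0x7FFFFFFFu / UD; // floor((2^31 - 1) / D)
  // X is a multiple of D exactly when X * P, read as signed, lies in
  // [-A, A]; adding A turns that into a single unsigned comparison.
  uint32_t Q = static_cast<uint32_t>(X) * P + A;
  return Q <= 2u * A;
}
```

The multiply, add and compare replace the division that a literal `X % D` would need.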

Reviewers: RKSimon, spatel, hermord, craig.topper, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, thakis, javed.absar, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65366

llvm-svn: 368702
2019-08-13 14:57:37 +00:00
Roman Lebedev f4de7eda4a [TargetLowering][NFC] prepareUREMEqFold(): fixup comment
The comment initially matched the code, but the code was incorrect
and was fixed after the initial revert back when it was introduced,
but the comment was never updated.

llvm-svn: 368701
2019-08-13 14:57:08 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
Simon Pilgrim 05e8209e33 [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::TRUNCATE
llvm-svn: 368553
2019-08-12 10:56:05 +00:00
Bjorn Pettersson 27038a3780 [SelectionDAG] Widen vector results of SMULFIX/UMULFIX/SMULFIXSAT
Summary:
After the commits that changed x86 backend to widen vectors
instead of using promotion some of our downstream tests
started to fail. It was noticed that WidenVectorResult has
been missing support for SMULFIX/UMULFIX/SMULFIXSAT. This
patch adds the missing functionality.

Reviewers: craig.topper, RKSimon

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66051

llvm-svn: 368540
2019-08-11 19:27:06 +00:00
Sanjay Patel 26b2c11451 [DAGCombiner] exclude x*2.0 from normal negation profitability rules
This is the codegen part of fixing:
https://bugs.llvm.org/show_bug.cgi?id=32939

Even with the optimal/canonical IR that is ideally created by D65954,
we would reverse that transform in DAGCombiner and end up with the same
asm on AArch64 or x86.

I see 2 options for trying to correct this:

  1. Limit isNegatibleForFree() by special-casing the fmul pattern (this patch).
  2. Avoid creating (fmul X, 2.0) in the 1st place by adding a special-case
     transform to SelectionDAG::getNode() and/or SelectionDAGBuilder::visitFMul()
     that matches the transform done by DAGCombiner.

This seems like the less intrusive patch, but if there's some other reason to
prefer 1 option over the other, we can change to the other option.

Differential Revision: https://reviews.llvm.org/D66016

llvm-svn: 368490
2019-08-09 21:37:32 +00:00
Sanjay Patel 0b4ae34c2f [DAGCombiner] remove redundant fold for X*1.0; NFC
This is handled at node creation time (similar to X/1.0)
after:
rL357029
(no fast-math-flags needed)

llvm-svn: 368443
2019-08-09 14:30:59 +00:00
Craig Topper 9158e54270 [SelectionDAG][X86] Move setcc mask splitting for mload/mstore/mgather/mscatter from DAGCombiner to the type legalizer.
We may be able to look to how VSELECT is handled to further
improve this, but this appears to be neutral or an improvement
on the test cases we have.

llvm-svn: 368344
2019-08-08 21:14:08 +00:00
Craig Topper bce4d79f37 [LegalizeTypes] Remove SplitVSETCC helper and just call SplitVecRes_SETCC.
llvm-svn: 368343
2019-08-08 21:13:58 +00:00
Simon Pilgrim e2e366797e [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT
This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.

In particular this helps remove some unnecessary scalar->vector->scalar patterns.

The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.

Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368276
2019-08-08 10:37:03 +00:00
Amy Huang 0b870b969f Recommit "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
with a fix to clear the SDNode map when SelectionDAG is cleared.

llvm-svn: 368230
2019-08-07 22:49:40 +00:00
Simon Pilgrim 0eafe011ca [TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::VECTOR_SHUFFLE
In particular this helps the SSE vector shift cvttps2dq+add+shl pattern by avoiding the need for zeros in shuffle style extensions to vXi32 types as we'll be shifting out those bits anyway

llvm-svn: 368155
2019-08-07 11:43:13 +00:00
Aditya Nandakumar c8ac029d0a [GISel]: Add GISelKnownBits analysis
https://reviews.llvm.org/D65698

This adds a KnownBits analysis pass for GISel. This was done as a
pass (compared to static functions) so that we can add other features
such as caching queries (within a pass and across passes) in the future.
This patch only adds the basic pass boilerplate, and implements a lazy
non caching knownbits implementation (ported from SelectionDAG). I've
also hooked up the AArch64PreLegalizerCombiner pass to use this - there
should be no compile time regression as the analysis is lazy.

llvm-svn: 368065
2019-08-06 17:18:29 +00:00
Simon Pilgrim dae5ddad9d [TargetLowering] SimplifyMultipleUseDemandedBits - return UNDEF for undemanded ops
If we demand no bits/elts from an Op, just return UNDEF

llvm-svn: 368043
2019-08-06 14:30:42 +00:00
Ulrich Weigand 7b24dd741c [Strict FP] Allow custom operation actions
This patch changes the DAG legalizer to respect the operation actions
set by the target for strict floating-point operations. (Currently, the
legalizer will usually fall back to mutate to the non-strict action
(which is assumed to be legal), and only skip mutation if the strict
operation is marked legal.)

With this patch, whenever a strict operation is marked as Legal or
Custom, it is passed to the target as usual. Only if it is marked as
Expand will the legalizer attempt to mutate to the non-strict operation.
Note that this will now fail if the non-strict operation is itself
marked as Custom -- the target will have to provide a Custom definition
for the strict operation then as well.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D65226

llvm-svn: 368012
2019-08-06 10:43:13 +00:00
Cullen Rhodes ced419f4d7 [SelectionDAG] Extend base addressing modes supported by MGATHER/MSCATTER
Summary:
Before this patch MGATHER/MSCATTER is capable of representing all
common addressing modes, but only when illegal types are used.
This patch adds an IndexType property so more representations
are available when using legal types only.

Original modes:
 vector of bases
 base + vector of signed scaled offsets

New modes:
 base + vector of signed unscaled offsets
 base + vector of unsigned scaled offsets
 base + vector of unsigned unscaled offsets

The current behaviour of addressing modes for gather/scatter remains
unchanged.
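
A per-lane address computation for the "base + vector of offsets" forms can be sketched as follows. The enum and function names are hypothetical stand-ins for the IndexType property, and the 32-bit index extended to a 64-bit address is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for the IndexType property (names are hypothetical).
enum class IndexKind {
  SignedScaled,
  SignedUnscaled,
  UnsignedScaled,
  UnsignedUnscaled
};

// Address of one gather/scatter lane: the index is sign- or zero-extended
// to the address width, then multiplied by the scale only for scaled kinds.
uint64_t laneAddress(uint64_t Base, uint32_t Index, uint64_t Scale,
                     IndexKind Kind) {
  bool IsSigned = Kind == IndexKind::SignedScaled ||
                  Kind == IndexKind::SignedUnscaled;
  bool IsScaled = Kind == IndexKind::SignedScaled ||
                  Kind == IndexKind::UnsignedScaled;
  uint64_t Ext =
      IsSigned ? static_cast<uint64_t>(
                     static_cast<int64_t>(static_cast<int32_t>(Index)))
               : static_cast<uint64_t>(Index);
  return Base + Ext * (IsScaled ? Scale : 1);
}
```

The same index bits can therefore name very different addresses depending on the kind, which is why the property has to be carried on the node.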

Patch by Paul Walker.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D65636

llvm-svn: 368008
2019-08-06 09:46:13 +00:00
Matt Arsenault f4d3113a5f CodeGen: Migration to using Register
llvm-svn: 367974
2019-08-06 03:59:31 +00:00
Matt Arsenault 3922392969 AMDGPU: Correct behavior of f16 buffer loads
Don't assume format loads for f16. Also fixes support for targets
without i16.

llvm-svn: 367879
2019-08-05 15:59:07 +00:00
Sanjay Patel eaf13044bd [DAGCombiner][x86] prevent infinite loop from truncate/extend transforms
The test case is based on the example from the post-commit thread for:
https://reviews.llvm.org/rGc9171bd0a955

This replaces the x86-specific simple-type check from:
rL367766
with a check in the DAGCombiner. Adding the check isn't
strictly necessary after the fix from:
rL367768
...but it seems likely that we're heading for trouble if
we are creating weird types in this transform.

I combined the earlier legality check into the initial
clause to simplify the code.

So we should only try the trunc/sext transform at the
earliest combine stage, but we limit the transform to
simple types anyway because the TLI hook is probably
too lax about what it considers a free truncate.

llvm-svn: 367834
2019-08-05 11:27:07 +00:00
Guillaume Chatelet c97a3d15d2 [LLVM][Alignment] Introduce Alignment Type
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Reviewed By: jfb

Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65514

llvm-svn: 367828
2019-08-05 11:02:05 +00:00
Guillaume Chatelet 65e4b47aad [LLVM][Alignment] Introduce Alignment Type in DataLayout
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jfb, jakehehrlich

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65521

Make getFunctionPtrAlign() return MaybeAlign

llvm-svn: 367817
2019-08-05 09:00:43 +00:00
Craig Topper 5a4989e2ac [TargetLowering][X86] Teach SimplifyDemandedVectorElts to replace the base vector of INSERT_SUBVECTOR with undef if none of the elements are demanded even if the node has other users.
Summary:
The SimplifyDemandedVectorElts function can replace with undef
when no elements are demanded, but due to how it interacts with
TargetLoweringOpts, it can only do this when the node has
no other users.

Remove a now unneeded DAG combine from the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65713

llvm-svn: 367788
2019-08-04 17:30:41 +00:00
Craig Topper 76f0f2e0f0 [SelectionDAG] Add node creation debug message to getMemIntrinsicNode.
llvm-svn: 367771
2019-08-04 02:32:06 +00:00
Craig Topper 2edeb8a11a [DAGCombiner] Prevent the combine added in r367710 from creating illegal types after type legalization.
This is a further fix for PR42880.

Sanjay already disabled the X86 TLI hook for non-simple types,
but we should really call isTypeLegal here if we're after type
legalization.

llvm-svn: 367768
2019-08-03 23:09:13 +00:00
Bill Wendling 41a2847a9a Emit diagnostic if an inline asm constraint requires an immediate
Summary:
An inline asm call can result in an immediate after inlining. Therefore emit a
diagnostic here if constraint requires an immediate but one isn't supplied.

Reviewers: joerg, mgorny, efriedma, rsmith

Reviewed By: joerg

Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60942

llvm-svn: 367750
2019-08-03 05:52:47 +00:00
Simon Pilgrim 794f7591ec [TargetLowering] SimplifyMultipleUseDemandedBits - don't assume INSERT_VECTOR_ELT value type is simple.
Noticed by inspection - this was copied from the X86 target equivalent where we can assume its legal/simple.

llvm-svn: 367721
2019-08-02 21:07:07 +00:00
Philip Reames 511be2a158 [Statepoints] Fix overalignment of loads in no-realign-stack functions
This really should have been part of 366765.  For some reason, I forgot to handle the corresponding load side, and the readable test cases (using deopt vs statepoints) turned out to be overly reduced.  Oops.

As seen in the test change, the problem was that we were using a load with alignment expectations rather than the unaligned variant when the stack alignment was less than the preferred type alignment.

llvm-svn: 367718
2019-08-02 20:17:37 +00:00
Sanjay Patel 68264558f9 [DAGCombiner] try to convert opposing shifts to casts
This reverses a questionable IR canonicalization when a truncate
is free:

sra (add (shl X, N1C), AddC), N1C -->
sext (add (trunc X to (width - N1C)), AddC')

https://rise4fun.com/Alive/slRC

More details in PR42644:
https://bugs.llvm.org/show_bug.cgi?id=42644

I limited this to pre-legalization for code simplicity because that
should be enough to reverse the IR patterns. I don't have any
evidence (no regression test diffs) that we need to try this later.
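
The equivalence can be checked on scalars. The sketch below fixes a 32-bit width and a shift amount of 16, and assumes that AddC' is the high 16 bits of AddC; both function names are hypothetical. The low 16 bits of `X << 16` are zero, so the low half of AddC never carries into the high half, and that is why only its high bits matter.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned N = 16; // shift amount; the narrow type is 32 - 16 = 16 bits

// Original pattern: sra (add (shl X, N), AddC), N
// Assumes two's complement conversions and an arithmetic `>>` on signed
// values (guaranteed since C++20, true on mainstream compilers before).
int32_t shiftForm(int32_t X, int32_t AddC) {
  uint32_t Sum = (static_cast<uint32_t>(X) << N) + static_cast<uint32_t>(AddC);
  return static_cast<int32_t>(Sum) >> N;
}

// Rewritten pattern: sext (add (trunc X to i16), AddC')
// with AddC' assumed to be the high 16 bits of AddC.
int32_t castForm(int32_t X, int32_t AddC) {
  uint16_t NarrowX = static_cast<uint16_t>(X);
  uint16_t NarrowC = static_cast<uint16_t>(static_cast<uint32_t>(AddC) >> N);
  uint16_t Sum = static_cast<uint16_t>(NarrowX + NarrowC);
  return static_cast<int32_t>(static_cast<int16_t>(Sum));
}
```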

Differential Revision: https://reviews.llvm.org/D65607

llvm-svn: 367710
2019-08-02 19:33:46 +00:00
Daniel Sanders 2bea69bf65 Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC
llvm-svn: 367633
2019-08-01 23:27:28 +00:00
Craig Topper a9ed5436bd [X86] In decomposeMulByConstant, legalize the VT before querying whether the multiply is legal
If a type is larger than a legal type and needs to be split, we would previously allow the multiply to be decomposed even if the split multiply is legal. Since the shift + add/sub code would also need to be split, it's not any better to decompose it.

This patch figures out what type the mul will eventually be legalized to and then uses that type for the query. I tried just returning false for illegal types and letting them get handled after type legalization, but then we can't recognize an i64 constant splat on 32-bit targets since it will be destroyed by type legalization. We could special case vectors of i64 to avoid that...
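
For readers unfamiliar with the decomposition being discussed, the simplest forms can be sketched on scalars as below. The helpers are hypothetical and cover only constants of the form 2^n + 1 and 2^n - 1; the target hook may accept other shapes as well.

```cpp
#include <cassert>
#include <cstdint>

// True if V has exactly one bit set.
bool isPow2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// log2 of a power of two.
unsigned log2Exact(uint64_t V) {
  unsigned N = 0;
  while (V > 1) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Tries to compute X * C as one shift plus one add or sub.
// Returns false when C is not of the form 2^n + 1 or 2^n - 1.
bool mulByShiftAddSub(uint64_t X, uint64_t C, uint64_t &Result) {
  if (C > 1 && isPow2(C - 1)) { // C == 2^n + 1
    Result = (X << log2Exact(C - 1)) + X;
    return true;
  }
  if (isPow2(C + 1)) { // C == 2^n - 1
    Result = (X << log2Exact(C + 1)) - X;
    return true;
  }
  return false;
}
```

When the vector type has to be split, both the shift and the add/sub are split too, so the decomposed form costs twice as much there just as the multiply does.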

Differential Revision: https://reviews.llvm.org/D65533

llvm-svn: 367601
2019-08-01 18:49:07 +00:00
Simon Pilgrim 1d183b407a [TargetLowering] SimplifyMultipleUseDemandedBits - Add ISD::INSERT_VECTOR_ELT handling
Allow us to peek through vector insertions to avoid dependencies on entire insertion chains.

llvm-svn: 367588
2019-08-01 17:46:44 +00:00
Craig Topper 388df2ea19 [SelectionDAG] Use APInt::isSubsetOf/intersects to simplify some code.
Also use KnownBits::isNegative/isNonNegative to further simplify.
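
On plain integers the two predicates amount to the following. This is an analogue on `uint64_t` written for illustration, not the APInt implementation.

```cpp
#include <cassert>
#include <cstdint>

// Analogue of APInt::isSubsetOf: every bit set in A is also set in B.
bool isSubsetOf(uint64_t A, uint64_t B) { return (A & ~B) == 0; }

// Analogue of APInt::intersects: A and B share at least one set bit.
bool intersects(uint64_t A, uint64_t B) { return (A & B) != 0; }
```

Spelling these as named predicates instead of open-coded mask expressions is the kind of simplification the commit title refers to.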

llvm-svn: 367518
2019-08-01 06:06:21 +00:00
Amy Huang 153f20057c Revert "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG" and
partial fix.
Causes windows buildbot errors.

This reverts commit 6e65c34523963094acd0d6c94a5f5c64b32fe6aa and
53da7ca943.

llvm-svn: 367496
2019-07-31 23:59:31 +00:00
Michael Berg 005d705d43 Migrate some more fadd and fsub cases away from UnsafeFPMath control to utilize NoSignedZerosFPMath options control
Summary: Honoring no signed zeroes is also available as a user control through clang separately regardless of fastmath or UnsafeFPMath context, DAG guards should reflect this context.

Reviewers: spatel, arsenm, hfinkel, wristow, craig.topper

Reviewed By: spatel

Subscribers: rampitec, foad, nhaehnle, wuzish, nemanjai, jvesely, wdng, javed.absar, MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D65170

llvm-svn: 367486
2019-07-31 21:57:28 +00:00
Amy Huang 27a73dd02c Fix to r367374 "[MS] Emit S_HEAPALLOCSITE debug info in Selection DAG"
after windows buildbot failure.

Added a check that the MachineInstr exists and is a call before trying
to add symbols around it.

llvm-svn: 367483
2019-07-31 21:03:38 +00:00
Peter Collingbourne 33773d5cfc SelectionDAG, MI, AArch64: Widen target flags fields/arguments from unsigned char to unsigned.
This makes the field wider than MachineOperand::SubReg_TargetFlags so that
we don't end up silently truncating any higher bits. We should still catch
any bits truncated from the MachineOperand field as a consequence of the
assertion in MachineOperand::setTargetFlags().

Differential Revision: https://reviews.llvm.org/D65465

llvm-svn: 367474
2019-07-31 20:14:09 +00:00
Wei Mi f49c107f06 [DAGCombine] Limit the number of times for the same store and root nodes
to bail out in store merging dependence check.

We ran into a case where the dependence check in store merging bails out many times
for the same store and root nodes in a huge basic block. That increases compile
time by almost 100x. The patch adds a map to track how many times the bailout
happens for the same store and root, and if it is over a limit, stops
considering the store with the same root as a merging candidate.

Differential Revision: https://reviews.llvm.org/D65174

llvm-svn: 367472
2019-07-31 19:59:24 +00:00
Amy Huang 53da7ca943 [MS] Emit S_HEAPALLOCSITE debug info in SelectionDAG
Summary: This emits labels around heapallocsite calls in SelectionDAG.

Reviewers: rnk

Subscribers: MatzeB, aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61105

llvm-svn: 367374
2019-07-31 00:16:13 +00:00
Wei Mi 888efda280 [DAGCombiner] Add an option to control whether or not to enable store merging.
Add an option to control whether or not to enable store merging in dag combiner
so we can workaround some bugs more easily.

Differential Revision: https://reviews.llvm.org/D65482

llvm-svn: 367365
2019-07-30 23:14:56 +00:00
Simon Pilgrim f8a7e9de06 [DAGCombine] narrowInsertExtractVectorBinOp - early out for binops that change value type. NFCI.
This is implicit in the value type checks in getSubVectorSrc - this just makes it upfront and obvious.

llvm-svn: 367220
2019-07-29 11:34:45 +00:00
Simon Pilgrim 76f2f04d9d [DAGCombine] narrowInsertExtractVectorBinOp - early out for illegal op. NFCI.
If the subvector binop is illegal then early-out and avoid the subvector searches.

llvm-svn: 367181
2019-07-27 19:42:58 +00:00
Simon Pilgrim 603f94aa2a [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support (Reapplied)
This allows us to peek through BITCASTs, attempt to simplify the source operand, and then bitcast back.

This reapplies rL367091 which was reverted at rL367118 - we were inconsistently peeking through the bitcasts to the source value.

Fixes PR42777

llvm-svn: 367174
2019-07-27 14:11:59 +00:00
Simon Pilgrim 8a52671782 [SelectionDAG] Check for any recursion depth greater than or equal to limit instead of just equal the limit.
If anything called the recursive isKnownNeverNaN/computeKnownBits/ComputeNumSignBits/SimplifyDemandedBits/SimplifyMultipleUseDemandedBits with an incorrect depth then we could continue to recurse if we'd already exceeded the depth limit.

This replaces the limit check (Depth == 6) with a (Depth >= 6) to make sure that we don't circumvent it. 

This causes a couple of regressions as a mixture of calls (SimplifyMultipleUseDemandedBits + combineX86ShufflesRecursively) were calling with depths that were already over the limit. I've fixed SimplifyMultipleUseDemandedBits to not do this. combineX86ShufflesRecursively is trickier as we get a lot of regressions if we reduce its own limit from 8 to 6 (it also starts at Depth == 1 instead of Depth == 0 like the others....) - I'll see what I can do in future patches.
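
Why the comparison operator matters can be shown with a toy recursion. The function below is a hypothetical model, not LLVM code: it walks a chain of nodes and counts how many it visits before the depth check stops it.

```cpp
#include <cassert>

constexpr unsigned MaxDepth = 6;

// Walks a chain of `Remaining` nodes and returns how many were visited.
// ExactCheck models the old `Depth == 6` test; otherwise the new
// `Depth >= 6` test is used.
unsigned visitChain(unsigned Remaining, unsigned Depth, bool ExactCheck) {
  bool AtLimit = ExactCheck ? (Depth == MaxDepth) : (Depth >= MaxDepth);
  if (AtLimit || Remaining == 0)
    return 0;
  return 1 + visitChain(Remaining - 1, Depth + 1, ExactCheck);
}
```

Starting at depth 0, both checks stop after six nodes. A caller that enters at depth 7 skips past the exact check and walks the whole chain, whereas the `>=` form stops immediately.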

llvm-svn: 367171
2019-07-27 12:48:46 +00:00
Simon Pilgrim 3ff6126487 [TargetLowering] Add depth limit to SimplifyMultipleUseDemandedBits
We're getting reports of massive compile time increases because SimplifyMultipleUseDemandedBits was losing track of the depth and not earlying-out. No repro yet, but consider this a pre-emptive commit.

llvm-svn: 367169
2019-07-27 12:23:36 +00:00
Nico Weber 13f337c4cb Revert r367091, it caused PR42777.
llvm-svn: 367118
2019-07-26 14:58:42 +00:00
Simon Pilgrim a424a1f351 [SelectionDAG] GetDemandedBits - update SIGN_EXTEND_INREG op to just call SimplifyMultipleUseDemandedBits.
llvm-svn: 367098
2019-07-26 10:03:07 +00:00
Simon Pilgrim 9758407bf1 [TargetLowering] SimplifyMultipleUseDemandedBits - add SIGN_EXTEND_INREG support.
llvm-svn: 367096
2019-07-26 09:41:08 +00:00
Simon Pilgrim d0164fc525 [SelectionDAG] GetDemandedBits - update OR/XOR ops to just call SimplifyMultipleUseDemandedBits.
Eventually all of these will be moved over, but we create nodes in GetDemandedBits recursion at the moment which causes regressions when we try to remove them all.

llvm-svn: 367092
2019-07-26 09:13:29 +00:00
Simon Pilgrim b32ceb79b0 [TargetLowering] SimplifyMultipleUseDemandedBits - add BITCAST pass through support.
This allows us to peek through BITCASTs and attempt to simplify the source operand, and then bitcast back.

llvm-svn: 367091
2019-07-26 08:38:39 +00:00
Roman Lebedev 017e272c3a [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold
Summary:
This was originally reported in D62818.
https://rise4fun.com/Alive/oPH

InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression
will be hoisted out of a loop if `Y` is invariant and `X` is not.
But as it is seen from the diffs here, if it didn't get hoisted,
the produced assembly is almost universally worse.

Much like with my recent "hoist add/sub by/from const" patches,
we should get almost universal win if we hoist constant,
there is almost always an "and/test by imm" instruction,
but "shift of imm" not so much, so we may avoid having to
materialize the immediate, and thus need one less register.
And since we now shift not by constant, but by something else,
the live-range of that something else may reduce.

Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit`
instruction pattern. And to not get into an endless combine loop.
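
Both directions of the fold can be checked on 32-bit scalars. The four helpers are hypothetical and assume a shift amount below 32. A bit of `X` at position i meets a bit of `C` at position i + Y in either form, so the shift can be moved from the constant to the value without changing the zero test.

```cpp
#include <cassert>
#include <cstdint>

// Before: the constant is shifted, (X & (C l>> Y)) != 0.
bool constLshr(uint32_t X, uint32_t C, unsigned Y) {
  return (X & (C >> Y)) != 0;
}
// After: the value is shifted the other way, ((X << Y) & C) != 0.
bool valueShl(uint32_t X, uint32_t C, unsigned Y) {
  return ((X << Y) & C) != 0;
}

// Before: (X & (C << Y)) != 0.
bool constShl(uint32_t X, uint32_t C, unsigned Y) {
  return (X & (C << Y)) != 0;
}
// After: ((X l>> Y) & C) != 0.
bool valueLshr(uint32_t X, uint32_t C, unsigned Y) {
  return ((X >> Y) & C) != 0;
}
```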

Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm

Reviewed By: spatel

Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62871

llvm-svn: 366955
2019-07-24 22:57:22 +00:00
Simon Pilgrim 2bf871be4c Fix signed/unsigned comparison warning. NFCI.
llvm-svn: 366935
2019-07-24 17:44:22 +00:00
Simon Pilgrim 7d318b2bb1 [DAGCombine] matchBinOpReduction - add partial reduction matching
This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector:

e.g. <8 x i32> reduction pattern in a <16 x i32> vector:

<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>

matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.
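
The three masks listed above follow a simple rule, which the sketch below reproduces. The function is hypothetical and only generates the expected masks; it does not model the matcher itself. Undef lanes are written as -1.

```cpp
#include <cassert>
#include <vector>

constexpr int Undef = -1;

// Shuffle masks for a log2 reduction of the low NumReduced elements of a
// VecWidth-wide vector, from the widest stage to the narrowest. Each stage
// moves the upper half of the live elements down onto the lower half.
std::vector<std::vector<int>> partialReductionMasks(unsigned NumReduced,
                                                    unsigned VecWidth) {
  assert(NumReduced != 0 && (NumReduced & (NumReduced - 1)) == 0);
  assert(NumReduced <= VecWidth);
  std::vector<std::vector<int>> Masks;
  for (unsigned Half = NumReduced / 2; Half >= 1; Half /= 2) {
    std::vector<int> Mask(VecWidth, Undef);
    for (unsigned I = 0; I < Half; ++I)
      Mask[I] = static_cast<int>(Half + I);
    Masks.push_back(Mask);
  }
  return Masks;
}
```

Calling `partialReductionMasks(8, 16)` yields the three masks from the example, and `partialReductionMasks(16, 16)` gives the full-width case with one extra leading stage.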

I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.

Fixes the x86 partial reduction sum cases in PR33758 and PR42023.

Differential Revision: https://reviews.llvm.org/D65047

llvm-svn: 366933
2019-07-24 17:29:56 +00:00
Simon Pilgrim 3f01c7197f [SelectionDAG] makeEquivalentMemoryOrdering - early out for equal chains (PR42727)
If we are already using the same chain for the old/new memory ops then just return.

Fixes PR42727 which had getLoad() reusing an existing node.

llvm-svn: 366922
2019-07-24 16:53:14 +00:00
Sanjay Patel 10dad95a75 [SDAG] convert (sub x, 1) to (add x, -1) in ctpop expansion; NFC
We canonicalize to the add form, so create that directly for efficiency.

llvm-svn: 366914
2019-07-24 15:43:50 +00:00
Simon Pilgrim 0e8359aec1 [TargetLowering] SimplifyMultipleUseDemandedBits - add VECTOR_SHUFFLE support.
If all the demanded elts are from one operand and are inline, then we can use the operand directly.

The changes are mainly from SSE41 targets, which have blendvpd but not cmpgtq, allowing the v2i64 comparison to be simplified as we only need the signbit from alternate v4i32 elements.

llvm-svn: 366817
2019-07-23 15:35:55 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demand all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed it's ok for these cases to be fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Craig Topper a658cb0b12 [DAGCombiner] Make ShrinkLoadReplaceStoreWithStore return an SDValue instead of an SDNode*. NFCI
The function was calling getNode() on an SDValue to return and the
caller turned the result back into a SDValue. So just return the
original SDValue to avoid this.

llvm-svn: 366779
2019-07-23 05:13:39 +00:00
Craig Topper f5247244f2 [DAGCombiner] Use SDNode::isOperandOf to simplify some code. NFCI
llvm-svn: 366778
2019-07-23 05:13:35 +00:00
Richard Trieu 81a5045cd6 Move variable out from debug only section.
MFI is no longer just needed for an assert.  Move it out of the debug only
section to allow non-assert builds to be able to find it.

llvm-svn: 366773
2019-07-23 02:59:15 +00:00
Philip Reames 2f5543aa72 [Statepoints] Fix a bug in statepoint lowering for functions w/no-realign-stack
We were silently using the ABI alignment for all of the stores generated for deopt and gc values.  We'd gotten the alignment of the stack slot itself properly reduced (via MachineFrameInfo's clamping), but having the MMO on the store incorrect was enough for us to generate an aligned store to an unaligned location.

The simplest fix would have been to just pass the alignment to the helper function, but once we do that, the helper function doesn't really help.  So, inline it and directly call the MMO version of DAG.getStore with a properly constructed MMO.

Note that there's a separate performance possibility here.  Even if we *can* realign stacks, we probably don't *want to* if all of the stores are in slowpaths.  But that's a later patch, if at all.  :)

llvm-svn: 366765
2019-07-22 23:33:18 +00:00
Matt Arsenault 542720b2bc TableGen: Support physical register inputs > 255
This was truncating register values that didn't fit in unsigned char.
Switch AMDGPU sendmsg intrinsics to using a tablegen pattern.

llvm-svn: 366695
2019-07-22 15:02:34 +00:00
Christudasan Devadasan 006cf8c03d Added address-space mangling for stack related intrinsics
Modified the following 3 intrinsics:
int_addressofreturnaddress,
int_frameaddress & int_sponentry.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64561

llvm-svn: 366679
2019-07-22 12:42:48 +00:00
Oliver Stannard 6771a89fa0 [IPRA][ARM] Make use of the "returned" parameter attribute
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.

Differential revision: https://reviews.llvm.org/D64986

llvm-svn: 366669
2019-07-22 08:44:36 +00:00
Roman Lebedev cd9b19484b [Codegen][SelectionDAG] X u% C == 0 fold: non-splat vector improvements
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
   be illegal.
   * There is no ban in Hacker's Delight
   * I think the ban came from the same bug that caused the miscompile
      in the base patch - in `floor((2^W - 1) / D)` we were dividing by
      `D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
      which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
   that the fold is invalid for `D = 0`. Further considerations:
   * We know that
     * `(X u% 1) == 0`  can be constant-folded to `1`,
     * `(X u% 1) != 0`  can be constant-folded to `0`,
   *  Also, we know that
     * `X u<= -1` can be constant-folded to `1`,
     * `X u>  -1` can be constant-folded to `0`,
   * https://godbolt.org/z/7jnZJX https://rise4fun.com/Alive/oF6p
   * We know we will end up with the following:
       `(setule/setugt (rotr (mul N, P), K), Q)`
   * Therefore, for given new DAG nodes and comparison predicates
     (`ule`/`ugt`), we will still produce the correct answer if:
     `Q` is an all-ones constant; and both `P` and `K` are *anything*
     other than `undef`.
   * The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
   their values for the lanes where divisor was `1`.
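
The argument in point 3 can be illustrated with a small scalar sketch. It assumes 32-bit lanes, and the helper names `comparisonConstant` and `foldedCompare` are hypothetical, not taken from the patch. For `D = 1` the comparison constant is all-ones, so the `ule` form holds for any `P` and `K`:

```cpp
#include <cstdint>

// Comparison constant Q = floor((2^W - 1) / D) for W = 32 (d must be nonzero).
uint32_t comparisonConstant(uint32_t d) { return UINT32_MAX / d; }

// Scalar model of (setule (rotr (mul n, p), k), q) on a 32-bit lane.
bool foldedCompare(uint32_t n, uint32_t p, unsigned k, uint32_t q) {
  uint32_t m = n * p;  // wraps modulo 2^32
  k &= 31u;
  uint32_t r = k == 0 ? m : (m >> k) | (m << (32u - k));
  return r <= q;
}
```

With `q = comparisonConstant(1)` every input compares as divisible, which matches `(X u% 1) == 0` folding to `1`.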

Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00

Reviewed By: RKSimon

Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63963

llvm-svn: 366637
2019-07-20 16:33:15 +00:00
Matt Arsenault 5905aae169 DAG: Handle dbg_value for arguments split into multiple subregs
This was handled previously for arguments split due to not fitting in
an MVT. This was dropping the register for argument registers split
due to TLI::getRegisterTypeForCallingConv.

llvm-svn: 366574
2019-07-19 13:36:46 +00:00
Amy Huang f332fe642c [COFF] Change a variable type to be const in the HeapAllocSite map.
llvm-svn: 366479
2019-07-18 18:22:52 +00:00
Simon Pilgrim 8b525e357f [DAGCombine] Pull getSubVectorSrc helper out of narrowInsertExtractVectorBinOp. NFCI.
NFC step towards reusing this in other EXTRACT_SUBVECTOR combines.

llvm-svn: 366435
2019-07-18 13:45:53 +00:00
Evgeniy Stepanov d752f5e953 Basic codegen for MTE stack tagging.
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.

Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.

Differential Revision: https://reviews.llvm.org/D64172

llvm-svn: 366360
2019-07-17 19:24:02 +00:00
Amaury Sechet f34a69c2e2 [DAGCombiner] fold (addcarry (xor a, -1), b, c) -> (subcarry b, a, !c) and flip carry.
Summary:
As per title. DAGCombiner only matches the special case where b = 0; this patch extends the pattern to match any value of b.
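
The identity behind the fold can be checked exhaustively on 8-bit values. This is a standalone sketch, not DAGCombiner code; `addCarry`, `subCarry` and `foldHoldsForAllInputs` are hypothetical names chosen for illustration:

```cpp
#include <cstdint>

// Result of an 8-bit add-with-carry or subtract-with-borrow.
struct CarryResult {
  uint8_t value;
  bool carry;  // carry-out for add, borrow-out for subtract
};

// addcarry a, b, c: computes a + b + c and reports the carry-out.
CarryResult addCarry(uint8_t a, uint8_t b, bool c) {
  unsigned wide = unsigned(a) + unsigned(b) + unsigned(c);
  return {uint8_t(wide), wide > 0xFFu};
}

// subcarry a, b, c: computes a - b - c and reports the borrow-out.
CarryResult subCarry(uint8_t a, uint8_t b, bool c) {
  int wide = int(a) - int(b) - int(c);
  return {uint8_t(wide), wide < 0};
}

// Checks (addcarry (xor a, -1), b, c) against (subcarry b, a, !c):
// the values must match and the carry-out must be the flipped borrow-out.
bool foldHoldsForAllInputs() {
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      for (int c = 0; c < 2; ++c) {
        CarryResult lhs = addCarry(uint8_t(~a), uint8_t(b), c != 0);
        CarryResult rhs = subCarry(uint8_t(b), uint8_t(a), c == 0);
        if (lhs.value != rhs.value || lhs.carry == rhs.carry)
          return false;
      }
  return true;
}
```

The check works because `~a + b + c` equals `(2^W - 1 - a) + b + c`, which carries out exactly when `b - a - !c` does not borrow.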

Depends on D57302

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59208

llvm-svn: 366214
2019-07-16 15:17:00 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Simon Pilgrim 701e2c0d71 [DAGCombine] narrowExtractedVectorBinOp - wrap subvector extraction in helper. NFCI.
First step towards supporting 'free' subvector extractions other than concat_vectors.

llvm-svn: 365896
2019-07-12 13:00:35 +00:00
Simon Pilgrim d0307f93a7 [DAGCombine] narrowInsertExtractVectorBinOp - add CONCAT_VECTORS support
We already split extract_subvector(binop(insert_subvector(v,x),insert_subvector(w,y))) -> binop(x,y).

This patch adds support for extract_subvector(binop(concat_vectors(),concat_vectors())) cases as well.

In particular this means we don't have to wait for X86 lowering to convert concat_vectors to insert_subvector chains, which helps avoid some cases where demandedelts/combine calls occur too late to split large vector ops.

The fast-isel-store.ll load folding regression is annoying but I don't think is that critical.

Differential Revision: https://reviews.llvm.org/D63653

llvm-svn: 365785
2019-07-11 14:45:03 +00:00
Tim Northover f2d6597653 OpaquePtr: use byval accessor instead of inspecting pointer type. NFC.
The accessor can deal with both "byval(ty)" and "ty* byval" forms
seamlessly.

llvm-svn: 365769
2019-07-11 13:12:38 +00:00
Sanjay Patel 138328e45c [SDAG] commute setcc operands to match a subtract
If we have:

R = sub X, Y
P = cmp Y, X

...then flipping the operands in the compare instruction can allow using a subtract that sets compare flags.

Motivated by diffs in D58875 - not sure if this changes anything there,
but this seems like a good thing independent of that.

There's a more involved version of this transform already in IR (in instcombine
although that seems misplaced to me) - see "swapMayExposeCSEOpportunities()".

Differential Revision: https://reviews.llvm.org/D63958

llvm-svn: 365711
2019-07-10 23:23:54 +00:00
Michael Berg f4572249d7 Move three folds for FADD, FSUB and FMUL in the DAG combiner away from Unsafe to more aligned checks that reflect context
Summary: Unsafe does not map well alone for each of these three cases as it is missing NoNan context when accessed directly with clang.  I have migrated the fold guards to reflect the expectations of handling nan and zero contexts directly (NoNan, NSZ) and some tests with it.  Unsafe does include NSZ, however there is already precedent for using the target option directly to reflect that context.

Reviewers: spatel, wristow, hfinkel, craig.topper, arsenm

Reviewed By: arsenm

Subscribers: michele.scandale, wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D64450

llvm-svn: 365679
2019-07-10 18:23:26 +00:00
Nick Desaulniers 8728e45706 [TargetLowering] support BlockAddress as "i" inline asm constraint
Summary:
This allows passing address of labels to inline assembly "i" input
constraints.

Fixes pr/42502.

Reviewers: ostannard

Reviewed By: ostannard

Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64167

llvm-svn: 365664
2019-07-10 17:08:25 +00:00
Simon Pilgrim 94c84aca5d [DAGCombine] visitINSERT_SUBVECTOR - use uint64_t subvector index. NFCI.
Keep the uint64_t type from getZExtValue() to stop truncation/extension overflow warnings in MSVC in subvector index math.

llvm-svn: 365621
2019-07-10 12:21:35 +00:00
Simon Pilgrim bb1167a3a1 Fix const/non-const lambda return type warning. NFCI.
llvm-svn: 365613
2019-07-10 10:45:09 +00:00
Craig Topper 84a1f07363 [X86][AMDGPU][DAGCombiner] Move call to allowsMemoryAccess into isLoadBitCastBeneficial/isStoreBitCastBeneficial to allow X86 to bypass it
Basically the problem is that X86 doesn't set the Fast flag from
allowsMemoryAccess on certain CPUs due to slow unaligned memory
subtarget features. This prevents bitcasts from being folded into
loads and stores. But all vector loads and stores of the same width
are the same cost on X86.

This patch merges the allowsMemoryAccess call into isLoadBitCastBeneficial to allow X86 to skip it.

Differential Revision: https://reviews.llvm.org/D64295

llvm-svn: 365549
2019-07-09 19:55:28 +00:00
Simon Pilgrim 57603cbde8 [DAGCombine] LoadedSlice - keep getOffsetFromBase() uint64_t offset. NFCI.
Keep the uint64_t type from getOffsetFromBase() to stop truncation/extension overflow warnings in MSVC in alignment math.

llvm-svn: 365504
2019-07-09 15:28:57 +00:00
Tim Northover 60afa49abe OpaquePtr: add Type parameter to Loads analysis API.
This makes the functions in Loads.h require a type to be specified
independently of the pointer Value so that when pointers have no structure
other than address-space, it can still do its job.

Most callers had an obvious memory operation handy to provide this type, but
SROA and ArgumentPromotion were doing more complicated analysis. They get
updated to merge the properties of the various instructions they were
considering.

llvm-svn: 365468
2019-07-09 11:35:35 +00:00
Bjorn Pettersson 051a6a1c33 [SelectionDAG] Simplify some calls to getSetCCResultType. NFC
DAGTypeLegalizer and SelectionDAGLegalize have helper
functions wrapping the call to TLI.getSetCCResultType(...).
Use those helpers in more places.

llvm-svn: 365456
2019-07-09 10:27:51 +00:00
Bjorn Pettersson 59029017a6 [LegalizeTypes] Fix saturation bug for smul.fix.sat
Summary:
Make sure we use SETGE instead of SETGT when checking
if the sign bit is zero at SMULFIXSAT expansion.

The faulty expansion occurred when doing "expand" of
SMULFIXSAT and the scale was exactly matching the
size of the smaller type. For example doing
  i64 Z = SMULFIXSAT X, Y, 32
and expanding X/Y/Z into using two i32 values.

The problem was that we sometimes did not saturate
to min when overflowing.

Here is an example using Q3.4 numbers:

Consider that we are multiplying X and Y.
  X = 0x80 (-8.0 as Q3.4)
  Y = 0x20 (2.0 as Q3.4)
To avoid loss of precision we do a widening
multiplication, getting a 16 bit result
  Z = 0xF000 (-16.0 as Q7.8)

To detect negative overflow we should check if
the five most significant bits in Z are less than -1.
Assume that we name the 4 most significant bits
as HH and the next 4 bits as HL. Then we can do the
check by examining if
 (HH < -1) or (HH == -1 && "sign bit in HL is zero").

The fault was that we have been doing the check as
 (HH < -1) or (HH == -1 && HL > 0)
instead of
 (HH < -1) or (HH == -1 && HL >= 0).

In our example HH is -1 and HL is 0, so the old
code did not trigger saturation and simply truncated
the result to 0x00 (0.0). With the bugfix we instead
detect that we should saturate to min, and the result
will be set to 0x80 (-8.0).
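
The Q3.4 example above can be reproduced with a small standalone model. This is only a sketch of the HH/HL check as described in this message, not the legalizer code; `signedNibble` and `negativeOverflow` are hypothetical names, and the `strictCompare` flag selects the old `HL > 0` test:

```cpp
#include <cstdint>

// Sign-extends the low 4 bits of v to a signed value in [-8, 7].
int signedNibble(unsigned v) {
  int n = int(v & 0xFu);
  return n >= 8 ? n - 16 : n;
}

// Returns true if the Q7.8 product of two Q3.4 values is too negative to
// fit back into Q3.4. strictCompare == true models the old, faulty check.
bool negativeOverflow(int8_t x, int8_t y, bool strictCompare) {
  int product = int(x) * int(y);            // widening multiply, Q7.8
  unsigned bits = unsigned(product) & 0xFFFFu;
  int hh = signedNibble(bits >> 12);        // 4 most significant bits
  int hl = signedNibble(bits >> 8);         // next 4 bits
  bool hlCheck = strictCompare ? (hl > 0) : (hl >= 0);
  return hh < -1 || (hh == -1 && hlCheck);
}
```

For X = 0x80 and Y = 0x20 the product is 0xF000, so HH is -1 and HL is 0: the strict comparison misses the overflow while the `>=` form detects it.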

Reviewers: leonardchan, bevinh

Reviewed By: leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64331

llvm-svn: 365455
2019-07-09 10:24:50 +00:00
Guillaume Chatelet 336f3e1601 Fixing @llvm.memcpy not honoring volatile.
This is explicitly not addressing target-specific code, or calls to memcpy.

Summary: https://bugs.llvm.org/show_bug.cgi?id=42254

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63215

llvm-svn: 365449
2019-07-09 09:53:36 +00:00
Reid Kleckner 2f07c2e9d9 Standardize on MSVC behavior for triples with no environment
Summary:
This makes it so that IR files using triples without an environment work
out of the box, without normalizing them.

Typically, the MSVC behavior is more desirable. For example, it tends to
enable things like constant merging, use of associative comdats, etc.

Addresses PR42491

Reviewers: compnerd

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64109

llvm-svn: 365387
2019-07-08 21:05:20 +00:00
Simon Pilgrim 9285bf0fb9 [TargetLowering] SimplifyDemandedBits - just call computeKnownBits for BUILD_VECTOR cases.
Don't do this locally, computeKnownBits does this better (and can handle non-constant cases as well).

A next step would be to actually simplify non-constant elements - building on what we already do in SimplifyDemandedVectorElts.

llvm-svn: 365309
2019-07-08 11:00:39 +00:00
Simon Pilgrim 9c68aa33e3 [DAGCombine] convertBuildVecZextToZext - remove duplicate getOpcode() call. NFCI.
llvm-svn: 365269
2019-07-06 18:32:15 +00:00
Craig Topper e9aed963ce [DAGCombiner] Don't combine (addcarry (uaddo X, Y), 0, Carry) -> (addcarry X, Y, Carry) if the Carry comes from the uaddo.
Summary:
The uaddo won't be removed and the addcarry will still be
dependent on the uaddo. So we'll just increase the use count
of X and Y and potentially require a COPY.

Reviewers: spatel, RKSimon, deadalnix

Reviewed By: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64190

llvm-svn: 365149
2019-07-04 18:18:46 +00:00
Francis Visoiu Mistrih 83bbe2f418 [CodeGen] Make branch funnels pass the machine verifier
We previously marked all the tests with branch funnels as
`-verify-machineinstrs=0`.

This is an attempt to fix it.

1) `ICALL_BRANCH_FUNNEL` has no defs. Mark it as `let OutOperandList =
(outs)`

2) After that we hit an assert: ``` Assertion failed: (Op.getValueType()
!= MVT::Other && Op.getValueType() != MVT::Glue && "Chain and glue
operands should occur at end of operand list!"), function AddOperand,
file
/Users/francisvm/llvm/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp,
line 461.  ```

The chain operand was added at the beginning of the operand list. Move
that to the end.

3) After that we hit another verifier issue in the pseudo expansion
where the registers used in the cmps and jmps are not added to the
livein lists. Add the `EFLAGS` to all the new MBBs that we create.

PR39436

Differential Review: https://reviews.llvm.org/D54155

llvm-svn: 365058
2019-07-03 17:16:45 +00:00
Amaury Sechet 57dfacb32d Use getAllOnesConstants instead of -1 in DAGCombiner. NFC
llvm-svn: 365054
2019-07-03 16:34:36 +00:00
Amaury Sechet bddb8c3597 [DAGCombine] More diamond carry pattern optimization.
Summary:
This diff improves the capability of DAGCombine to generate linear carries propagation in presence of a diamond pattern. It is now able to match a large variety of different patterns rather than some hardcoded one.

Arguably, the codegen in test cases is not better, but this is to be expected. The goal of this transformation is more about canonicalisation than actual optimisation.

Reviewers: hfinkel, RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D57302

llvm-svn: 365051
2019-07-03 16:15:59 +00:00
James Molloy fa4aac7335 [SelectionDAG] Propagate alias metadata to target intrinsic nodes
When a target intrinsic has been determined to touch memory, we construct a MachineMemOperand during SDAG construction. In this case, we should propagate AAMDNodes metadata to the MachineMemOperand where available.

Differential revision: https://reviews.llvm.org/D64131

llvm-svn: 365043
2019-07-03 14:33:29 +00:00
Roman Lebedev c4b83a6054 [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.

Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefers that same pattern for vectors.

Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.

Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel

Reviewed By: efriedma

Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64090

llvm-svn: 365010
2019-07-03 09:41:35 +00:00
Roman Lebedev 7c8ee375d8 [NFC][TargetLowering] Some preparatory cleanups around 'prepareUREMEqFold()' from D63963
llvm-svn: 364921
2019-07-02 13:21:23 +00:00
Zi Xuan Wu 7ae536a1ce [DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function
For a given floating point load / store pair, if the load value isn't used by any other operations, 
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.

We can exploit much more when there are other operation nodes with a chain operand between the load/store pair,
so long as we keep the original chain ordering. We only replace the register used to load/store from float to integer.

I only added a testcase for ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in the ARM target.

Differential Revision: https://reviews.llvm.org/D60601

llvm-svn: 364883
2019-07-02 02:54:52 +00:00
Benjamin Kramer ed13fef477 [SelectionDAG] Do minnum->minimum at legalization time instead of building time
The SDAGBuilder behavior stems from the days when we didn't have fast
math flags available in SDAG. We do now and doing the transformation in
the legalizer has the advantage that it also works for vector types.

llvm-svn: 364743
2019-07-01 11:00:23 +00:00
Craig Topper 4d0feb28ec [SelectionDAG] Use the memory VT instead of result VT for FoldingSet profiling in getMaskedLoad/getMaskedStore.
This matches what is done by the Profile function. Otherwise CSE
won't work properly.

llvm-svn: 364717
2019-06-30 06:46:33 +00:00
Roman Lebedev 29d05c005f [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

This is a recommit, the original commit rL364563 was reverted in rL364568
because test-suite detected miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.
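
A scalar sketch of the divisibility test may help when reading the notes above. It assumes 32-bit unsigned values and a nonzero constant divisor; the helpers `inverseMod2_32`, `rotr32` and `isDivisibleBy` are hypothetical and are not the code added by this patch:

```cpp
#include <cstdint>

// Multiplicative inverse of an odd value modulo 2^32 (Newton's iteration).
uint32_t inverseMod2_32(uint32_t d0) {
  uint32_t inv = d0;           // correct to 3 bits for any odd d0
  for (int i = 0; i < 4; ++i)  // each step doubles the number of correct bits
    inv *= 2u - d0 * inv;
  return inv;
}

// Rotate right by k bits.
uint32_t rotr32(uint32_t v, unsigned k) {
  k &= 31u;
  return k == 0 ? v : (v >> k) | (v << (32u - k));
}

// Tests x % d == 0 for a nonzero d without computing the remainder.
bool isDivisibleBy(uint32_t x, uint32_t d) {
  unsigned k = 0;
  uint32_t d0 = d;
  while ((d0 & 1u) == 0) {     // split d into d0 * 2^k with d0 odd
    d0 >>= 1;
    ++k;
  }
  uint32_t p = inverseMod2_32(d0);
  uint32_t q = UINT32_MAX / d; // divide by d, not by d0
  return rotr32(x * p, k) <= q;
}
```

Dividing by `d0` instead of `d` when computing `q` is the mistake that led to the revert mentioned above; with `d = 6` it would wrongly accept `x = 9`.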

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364600
2019-06-27 21:52:10 +00:00
Roman Lebedev 0a2b7b79fa Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)"
*Appears* to break test-suite on
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790

FAIL: burg.execution_time
FAIL: spiff.execution_time
FAIL: employ.execution_time
FAIL: llu.execution_time
FAIL: gramschmidt.execution_time
FAIL: fdtd-apml.execution_time

This reverts commit r364563.

llvm-svn: 364568
2019-06-27 17:22:31 +00:00
Roman Lebedev 0627b09863 [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
Summary:
I'm submitting a new revision since i don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych)

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Original patch author: @hermord (Dmytro Shynkevych)!

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C` == 0 optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I hadn't managed to find a better way to not generate worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` regresses, and is being fixed by a followup patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364563
2019-06-27 16:45:42 +00:00
Simon Pilgrim 83e1a1e79b [TargetLowering] SimplifyDemandedVectorElts - add shift/rotate support.
llvm-svn: 364548
2019-06-27 14:25:54 +00:00
Simon Pilgrim c692a8dc51 [TargetLowering] SimplifyDemandedBits - use DemandedElts to better identify partial splat shift amounts
llvm-svn: 364541
2019-06-27 13:48:43 +00:00
Djordje Todorovic 7eeeb5947e [ISEL][X86] Tracking of registers that forward call arguments
While lowering calls, collect info about registers that forward arguments
into following function frame. We store such info into the MachineFunction
of the call. This is used very late when dumping DWARF info about
call site parameters.

([9/13] Introduce the debug entry values.)

Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>

Differential Revision: https://reviews.llvm.org/D60715

llvm-svn: 364516
2019-06-27 10:51:15 +00:00
Roman Lebedev b0ecc1cc6b [X86] X86DAGToDAGISel::matchBitExtract(): pattern b: truncation awareness
Summary:
(Not so) boringly identical to pattern a (D62786)
Not yet sure how to deal with the last pattern c.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62793

llvm-svn: 364418
2019-06-26 12:19:39 +00:00
Simon Pilgrim a6319e5f83 [DAGCombine] visitEXTRACT_SUBVECTOR - add TODO for extract_subvector(bitcast()) support
We support 'big to little' (e.g. extract_subvector(v16i8 bitcast(v2i64))) but not 'little to big' cases  (e.g. extract_subvector(v2i64 bitcast(v16i8)))

llvm-svn: 364405
2019-06-26 11:17:38 +00:00
QingShan Zhang e0e7d4c366 Teach the DAGCombine to fold this pattern(c1 and c2 is constant).
// fold (sext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
// fold (zext (select cond, c1, c2)) -> (select cond, zext c1, zext c2)
// fold (aext (select cond, c1, c2)) -> (select cond, sext c1, sext c2)
Sign extend the operands if it is any_extend, to keep the signedness of the operands so that the other combine rules still apply. The any_extend is handled as zero extend for constants. i.e.

t1: i8 = select t0, Constant:i8<-1>, Constant:i8<0>
t2: i64 = any_extend t1
 -->
t3: i64 = select t0, Constant:i64<-1>, Constant:i64<0>
 -->
t4: i64 = sign_extend_inreg t3

Differential Revision: https://reviews.llvm.org/D63318

llvm-svn: 364382
2019-06-26 05:12:53 +00:00
Simon Pilgrim 9762b26032 [DAGCombine] combineRepeatedFPDivisors - recognize -1.0 / X as a reciprocal
Fixes issue identified by @nemanjai (Nemanja Ivanovic) in D62963 / rL363040 - infinite loop due to GetNegatedExpression fighting combineRepeatedFPDivisors resulting in fneg(fdiv(x,splat)) -> fneg(fmul(x,1.0/splat)) -> fmul(x,-1.0/splat) -> fmul(x,(-1.0 * 1.0)/splat) ......

llvm-svn: 364326
2019-06-25 16:00:16 +00:00
Sanjay Patel 685c5cbc65 [SDAG] expand ctpop != 1
Change the generic ctpop expansion to more efficiently handle a
check for not-a-power-of-two value:
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)
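
As a quick sanity check of the expansion, here is a scalar sketch on 32-bit values. The helper names `popCount` and `notPowerOfTwo` are hypothetical:

```cpp
#include <cstdint>

// Reference population count: clear the lowest set bit until none remain.
unsigned popCount(uint32_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// (ctpop x) != 1  -->  (x == 0) || ((x & (x - 1)) != 0)
bool notPowerOfTwo(uint32_t x) {
  return x == 0 || (x & (x - 1)) != 0;
}
```

`x & (x - 1)` clears the lowest set bit, so it is nonzero exactly when more than one bit is set; the `x == 0` term covers the zero-bits case.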

This is the inverted predicate sibling pattern that was added with:
D63004

This should have been done before I changed IR canonicalization to
favor this form with:
rL364246
...so if this requires revert/changing, the earlier commit may also
need to be modified.

llvm-svn: 364319
2019-06-25 14:46:52 +00:00
Simon Pilgrim 1a18bb6f25 [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Add 'lowest' demanded elt -> bitcast fold to all *_EXTEND_VECTOR_INREG cases.

Reapplies rL363856.

llvm-svn: 364311
2019-06-25 13:25:57 +00:00
Simon Pilgrim 36953ce769 [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

Reapplies rL363850 but now with legality checks added at rL364290

llvm-svn: 364303
2019-06-25 12:57:43 +00:00
Sanjay Patel e4ef62291b [SDAG] improve expansion of ctpop+setcc
This should not cause any visible change in output, but it's
more efficient because we were producing non-canonical 'sub x, 1'
and 'setcc ugt x, 0'. As mentioned in the TODO, we should also
be handling the inverse predicate.

llvm-svn: 364302
2019-06-25 12:49:35 +00:00
Simon Pilgrim 69fc111184 [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

Reapplies rL363802 but now with legality checks added at rL364290

llvm-svn: 364299
2019-06-25 12:19:12 +00:00
Simon Pilgrim b23c942ce4 [VectorLegalizer] ExpandANY_EXTEND_VECTOR_INREG/ExpandZERO_EXTEND_VECTOR_INREG - widen source vector
The *_EXTEND_VECTOR_INREG opcodes were relaxed back around rL346784 to support source vector widths that are smaller than the output - it looks like the legalizers were never updated to account for this.

This patch inserts the smaller source vector into an undef vector of the same width of the result before performing the shuffle+bitcast to correctly handle this.

Part of the yak shaving to solve the crashes from rL364264 and rL364272

llvm-svn: 364295
2019-06-25 11:31:37 +00:00
Simon Pilgrim 49b3778e32 [TargetLowering] SimplifyDemandedBits - legal checks for SIGN/ZERO_EXTEND -> ZERO/ANY_EXTEND
As part of the fix for rL364264 + rL364272 - limit the *_EXTEND conversion to !TLO.LegalOperations || isOperationLegal cases.

We'll improve X86 legality in future commits.

llvm-svn: 364290
2019-06-25 10:51:15 +00:00
Roman Lebedev cdd43eac4f [Codegen] TargetLowering::SimplifySetCC(): omit urem when possible
Summary:
This addresses the regression that is being exposed by D50222 in `test/CodeGen/X86/jump_sign.ll`
The missing fold, at least partially, looks trivial:
https://rise4fun.com/Alive/Zsln
i.e. if we are comparing with zero, and comparing the `urem`-by-non-power-of-two,
and the `urem` is of something that may at most have a single bit set (or no bits set at all),
the `urem` is not needed.
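
The claim can be checked by brute force for small divisors. This is a sketch under the stated conditions (32-bit values, divisor neither zero nor a power of two); `isPowerOfTwo` and `uremCanBeOmitted` are hypothetical helpers, and `maxDivisor` must stay below `UINT32_MAX` so the loop terminates:

```cpp
#include <cstdint>

bool isPowerOfTwo(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// For every x with at most one bit set and every non-power-of-two divisor c
// in [3, maxDivisor], checks that (x u% c) == 0 agrees with x == 0.
bool uremCanBeOmitted(uint32_t maxDivisor) {
  for (uint32_t c = 3; c <= maxDivisor; ++c) {
    if (isPowerOfTwo(c))
      continue;
    for (int bit = -1; bit < 32; ++bit) {
      // bit == -1 models x == 0; otherwise x has exactly one bit set.
      uint32_t x = bit < 0 ? 0u : uint32_t(1) << bit;
      if ((x % c == 0) != (x == 0))
        return false;
    }
  }
  return true;
}
```

A divisor that is not a power of two has an odd factor greater than one, which can never divide a power of two, so only `x == 0` leaves a zero remainder.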

Reviewers: RKSimon, craig.topper, xbolva00, spatel

Reviewed By: xbolva00, spatel

Subscribers: xbolva00, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63390

llvm-svn: 364286
2019-06-25 10:01:42 +00:00
Craig Topper 079924b0b7 Revert r363802, r363850, and r363856 "[TargetLowering] SimplifyDemandedBits..."
This reverts the following patches.
"[TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG"
"[TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support"

We can end up with an any_extend_vector_inreg with a 256 bit result type
and a 128 bit input type. This is allowed by the ISD opcode, but the
generic operation legalizer is only able to expand cases where the
total vector width is the same.

The X86 backend creates these mismatched cases for zext_vec_inreg/sext_vec_inreg.
The SimplifyDemandedBits changes are allowing those nodes to become
aext_vec_inreg. For the zext/sext cases, the X86 backend has Custom
handling and never lets them get to the generic legalizer. We need to do the same
for aext_vec_inreg.

llvm-svn: 364264
2019-06-25 01:32:42 +00:00
Roland Froese ea08248b2b [CodeGen] Add missing vector type legalization for ctlz_zero_undef
Widen vector result type for ctlz_zero_undef and cttz_zero_undef the same as
ctlz and cttz.

Differential Revision: https://reviews.llvm.org/D63463

llvm-svn: 364221
2019-06-24 19:27:07 +00:00
Matt Arsenault e3a676e9ad CodeGen: Introduce a class for registers
Avoids using a plain unsigned for registers throughout codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().

llvm-svn: 364191
2019-06-24 15:50:29 +00:00
Simon Pilgrim 69144a925e [DAGCombine] visitMUL - allow shift by zero in MulByConstant.
This can occur under certain circumstances when undefs are created later on in the constant multipliers (e.g. in this case due to SimplifyDemandedVectorElts). It's better to let the shift by zero occur and perform any cleanup afterward.

Fixes OSS Fuzz #15429

llvm-svn: 364179
2019-06-24 12:47:17 +00:00
Craig Topper 6ddc7912b0 [SelectionDAG] Remove the code that attempts to calculate the alignment for the second half of a split masked load/store.
The code divides the alignment by 2 if the original alignment is
equal to the original VT size. But this wouldn't be correct
if the alignment was larger than the VT size.

The memory operand object already takes care of calling MinAlign
on the base alignment and the memory pointer offset. So we don't
need any special code at all.
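
To make the difference concrete, here is a standalone sketch. It assumes sizes and alignments in bytes; `minAlign` is a hypothetical helper modelled on the MinAlign behaviour described above, and `oldSecondHalfAlign` is a simplified model of the removed heuristic, not the original code:

```cpp
#include <cstdint>

// Largest power of two that divides both a and b (at least one nonzero).
uint64_t minAlign(uint64_t a, uint64_t b) {
  uint64_t bits = a | b;
  return bits & (~bits + 1);  // isolate the lowest set bit
}

// Simplified model of the removed heuristic: halve the alignment only when
// it equals the size of the original vector.
uint64_t oldSecondHalfAlign(uint64_t align, uint64_t sizeInBytes) {
  return align == sizeInBytes ? align / 2 : align;
}
```

For a 32-byte vector whose second half starts at offset 16, a base alignment of 64 gives `minAlign(64, 16) == 16`, whereas the heuristic would keep 64 for an address that is only 16-byte aligned.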

llvm-svn: 364151
2019-06-23 07:00:46 +00:00
Simon Pilgrim 0da13ed1f6 [DAGCombine] narrowExtractedVectorBinOp - pull out repeated getOpcode(). NFCI.
llvm-svn: 364076
2019-06-21 16:44:51 +00:00
Simon Pilgrim ca9933c22d [DAGCombine] narrowInsertExtractVectorBinOp - reuse "extract from insert" detection code.
Move the "extract from insert detection code" into a lambda helper function.

llvm-svn: 364059
2019-06-21 14:46:21 +00:00
Simon Pilgrim 801c0f12b0 [DAGCombiner] Use getAPIntValue() instead of getZExtValue() where possible.
Better handling of out-of-i64-range values due to large integer types or from fuzz tests.

llvm-svn: 363955
2019-06-20 17:36:23 +00:00
Jordan Rupprecht 02508decf4 [DAGCombiner][NFC] Remove unused var
llvm-svn: 363954
2019-06-20 17:30:01 +00:00
Simon Pilgrim 1d8093249f [DAGCombiner] Support (shl (zext (srl x, C)), C) -> (zext (shl (srl x, C), C)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 
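
As an illustration only (not part of the commit), the arithmetic behind this fold can be checked on scalars, with a "non-uniform" vector modelled as lanes that each carry their own shift amount. The sketch assumes a 16-bit source zero-extended to 32 bits; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (shl (zext (srl x, c)), c): shift right in 16 bits, widen, shift left. */
static uint32_t shl_of_zext(uint16_t x, unsigned c) {
    assert(c < 16);
    uint32_t wide = (uint32_t)(uint16_t)(x >> c);
    return wide << c;
}

/* (zext (shl (srl x, c), c)): do both shifts in 16 bits, then widen. */
static uint32_t zext_of_shl(uint16_t x, unsigned c) {
    assert(c < 16);
    uint16_t narrow = (uint16_t)((uint16_t)(x >> c) << c);
    return (uint32_t)narrow;
}

/* Non-uniform case: every lane has its own shift amount. */
static int fold_holds_per_lane(const uint16_t x[4], const unsigned c[4]) {
    for (int i = 0; i < 4; ++i)
        if (shl_of_zext(x[i], c[i]) != zext_of_shl(x[i], c[i]))
            return 0;
    return 1;
}
```

Both forms clear the low `c` bits and never produce bits above bit 15, so they agree lane by lane regardless of whether the shift amounts are equal across lanes.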

llvm-svn: 363929
2019-06-20 14:42:27 +00:00
Simon Pilgrim 98a0ac5c0f [DAGCombine] Add TODOs for some combines that should support non-uniform vectors
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.

llvm-svn: 363924
2019-06-20 12:48:49 +00:00
Simon Pilgrim a487628270 [DAGCombine] Reduce scope of ShAmtVal variable. NFCI.
Fixes cppcheck warning.

Use the more capable getAPIntVal() instead of getZExtValue() as well since I'm here.

llvm-svn: 363921
2019-06-20 10:56:37 +00:00
Simon Pilgrim 046d49a8dc [DAGCombine] Use ConstantSDNode::getAPIntValue() instead of getZExtValue().
Use getAPIntValue() in a few more places. Most of the time getZExtValue() is fine, but occasionally there's fuzzed code or someone decides to create i65536 or something.....

llvm-svn: 363887
2019-06-19 22:14:24 +00:00
Simon Pilgrim f05369768c [TargetLowering] SimplifyDemandedBits - add ANY_EXTEND_VECTOR_INREG support
Move 'lowest' demanded elt -> bitcast fold out of ZERO_EXTEND_VECTOR_INREG into ANY_EXTEND_VECTOR_INREG case.

llvm-svn: 363856
2019-06-19 18:34:58 +00:00
Simon Pilgrim 6016fb726c [TargetLowering] SimplifyDemandedBits ZERO_EXTEND_VECTOR_INREG -> ANY_EXTEND_VECTOR_INREG
Simplify ZERO_EXTEND_VECTOR_INREG if the extended bits are not required.

Matches what we already do for ZERO_EXTEND.

llvm-svn: 363850
2019-06-19 18:00:24 +00:00
Simon Pilgrim c3994f77cb [TargetLowering] SimplifyDemandedBits SIGN_EXTEND_VECTOR_INREG -> ANY/ZERO_EXTEND_VECTOR_INREG
Simplify SIGN_EXTEND_VECTOR_INREG if the extended bits are not required/known zero.

Matches what we already do for SIGN_EXTEND.

llvm-svn: 363802
2019-06-19 13:58:02 +00:00
Simon Pilgrim 9eed5d2f78 [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

llvm-svn: 363793
2019-06-19 12:41:37 +00:00
Simon Pilgrim 8c49366c9b [DAGCombiner] Support (shl (ext (shl x, c1)), c2) -> 0 non-uniform folds.
Use matchBinaryPredicate instead of isConstOrConstSplat to let us handle non-uniform shift cases. 

This requires us to tweak matchBinaryPredicate to allow it to (optionally) handle constants with different type widths.

llvm-svn: 363792
2019-06-19 12:25:29 +00:00
Simon Pilgrim bb6b856183 [DAGCombiner] visitSHL - pull out repeated shift amount VT. NFCI.
llvm-svn: 363789
2019-06-19 11:31:26 +00:00
Simon Pilgrim d954a53633 [DAGCombine] Fix (shl (ext (shl x, c1)), c2) -> (shl (ext x), (add c1, c2)) comment. NFCI.
We pre-extend, not post.

llvm-svn: 363787
2019-06-19 11:17:48 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Simon Pilgrim 5bef886cd8 [TargetLowering] SimplifyDemandedBits - Cleanup ANY_EXTEND handling
Match SIGN_EXTEND + ZERO_EXTEND handling - will be adding ANY_EXTEND_VECTOR_INREG support in a future patch.

llvm-svn: 363716
2019-06-18 18:22:30 +00:00
Simon Pilgrim 032b54f8e8 [TargetLowering] SimplifyDemandedBits - Merge ZERO_EXTEND+ZERO_EXTEND_VECTOR_INREG handling
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.

llvm-svn: 363713
2019-06-18 18:08:30 +00:00
Simon Pilgrim b6e7108dcd [TargetLowering] SimplifyDemandedBits - Merge SIGN_EXTEND+SIGN_EXTEND_VECTOR_INREG handling
Other than adding consistent demanded elts handling which was a trivial addition, the other differences in functionality will be added in later patches.

llvm-svn: 363710
2019-06-18 17:57:53 +00:00
Simon Pilgrim 9aa25be149 [TargetLowering] SimplifyDemandedVectorElts - support MUL and ANY_EXTEND_VECTOR_INREG
Also fold ANY_EXTEND_VECTOR_INREG -> BITCAST if we only need the bottom element.

Fixes temporary regression introduced in rL363693.

llvm-svn: 363694
2019-06-18 15:49:35 +00:00
Simon Pilgrim 83bacd8d72 [SelectionDAG] Legalize vaargs that require vector splitting
This adds vector splitting for vaarg instructions during type legalization

Committed on behalf of @luke (Luke Lau)

Differential Revision: https://reviews.llvm.org/D60762

llvm-svn: 363671
2019-06-18 12:24:02 +00:00
Luis Marques 2e46312ffd [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting
Some GEPs were not being split, presumably because that split would just be 
undone by the DAGCombiner. Not performing those splits can prevent important 
optimizations, such as preventing the element indices / member offsets from 
being (partially) folded into load/store instruction immediates. This patch:

- Makes the splits also occur in the cases where the base address and the GEP 
  are in the same BB.
- Ensures that the DAGCombiner doesn't reassociate them back again.

Differential Revision: https://reviews.llvm.org/D60294

llvm-svn: 363544
2019-06-17 10:54:12 +00:00
Simon Pilgrim ef78e55205 [SelectionDAG] Fold insert_subvector(undef, extract_subvector(v, c), c) -> v in getNode
This is already done in DAGCombiner::visitINSERT_SUBVECTOR, but this helps a number of shuffles across different vector widths recognise when they come from the same source.

llvm-svn: 363542
2019-06-17 10:14:52 +00:00
Michael Berg ad6bb86b2d adding more fmf propagation for selects plus updated tests
llvm-svn: 363484
2019-06-15 04:53:51 +00:00
Fangrui Song 968b5f84af Revert "adding more fmf propagation for selects plus tests"
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

llvm-svn: 363482
2019-06-15 03:51:08 +00:00
Michael Berg 69394bedc5 adding more fmf propagation for selects plus tests
llvm-svn: 363474
2019-06-14 23:30:52 +00:00
Simon Pilgrim 4e0648a541 [TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123)
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.

This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.

If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.

Differential Revision: https://reviews.llvm.org/D63075

llvm-svn: 363179
2019-06-12 17:14:03 +00:00
Simon Pilgrim 266f43964e [TargetLowering] Add allowsMemoryAccess(MachineMemOperand) helper wrapper. NFCI.
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.

llvm-svn: 363048
2019-06-11 11:00:23 +00:00
Simon Pilgrim 287e78c82b [DAGCombine] GetNegatedExpression - constant float vector support (PR42105)
Add support for negation of constant build vectors.

Differential Revision: https://reviews.llvm.org/D62963

llvm-svn: 363040
2019-06-11 09:44:33 +00:00
Sander de Smalen cbeb563cfb Change semantics of fadd/fmul vector reductions.
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:

  llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
  llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul

and adds functionality to auto-upgrade existing LLVM IR and bitcode.

Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D60261

llvm-svn: 363035
2019-06-11 08:22:10 +00:00
Francis Visoiu Mistrih a438432acc [FastISel] Skip creating unnecessary vregs for arguments
This behavior was added in r130928 for both FastISel and SD, and then
disabled in r131156 for FastISel.

This re-enables it for FastISel with the corresponding fix.

This is triggered only when FastISel can't lower the arguments and falls
back to SelectionDAG for it.

FastISel contains a map of "register fixups" where at the end of the
selection phase it replaces all uses of a register with another
register that FastISel sometimes pre-assigned. Code at the end of
SelectionDAGISel::runOnMachineFunction is doing the replacement at the
very end of the function, while other pieces that come in before that
look through the MachineFunction and assume everything is done. In this
case, the real issue is that the code emitting COPY instructions for the
liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg
assigned to the physreg is used, and if it's not, it will skip the COPY.
If a register wasn't replaced with its assigned fixup yet, the copy will
be skipped and we'll end up with uses of undefined registers.

This fix moves the replacement of registers before the emission of
copies for the live-ins.

The initial motivation for this fix is to enable tail calls for
swiftself functions, which were blocked because we couldn't prove that
the swiftself argument (which is callee-save) comes from a function
argument (live-in), because there was an extra copy (vreg to vreg).

A few tests are affected by this:

* llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21
(callee-save) but never reload it because it's attached to the return.
We now don't even spill it anymore.
* llvm/test/CodeGen/*/swiftself.ll: we tail-call now.
* llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this
test was not really testing the right thing, but it worked because the
same registers were re-used.
* llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes
* llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy
* llvm/test/CodeGen/Mips/*: get rid of spills and copies
* llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack
* llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack
* llvm/test/CodeGen/X86/swifterror.ll: same as AArch64
* llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed

Differential Revision: https://reviews.llvm.org/D62361

llvm-svn: 362963
2019-06-10 16:53:37 +00:00
QingShan Zhang ab846da7e8 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity was found in SPEC 2017 557.xz_r, where the pattern is used by the SHA encrypt/decrypt code. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the target
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

=>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

=>
*((i32)p) = BSWAP(val);
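
As an illustration only (not part of the commit), the equivalence behind the second pattern can be checked without depending on host endianness: storing the bytes of `val` most-significant first yields the same memory as storing the bytes of `BSWAP(val)` least-significant first. The helper names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Four narrow stores, least-significant byte first. */
static void store_bytes_le(uint8_t *p, uint32_t val) {
    p[0] = (val >> 0) & 0xFF;
    p[1] = (val >> 8) & 0xFF;
    p[2] = (val >> 16) & 0xFF;
    p[3] = (val >> 24) & 0xFF;
}

/* Four narrow stores, most-significant byte first. */
static void store_bytes_be(uint8_t *p, uint32_t val) {
    p[0] = (val >> 24) & 0xFF;
    p[1] = (val >> 16) & 0xFF;
    p[2] = (val >> 8) & 0xFF;
    p[3] = (val >> 0) & 0xFF;
}

/* Portable byte swap of a 32-bit value. */
static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 1 when the reversed-order stores match an LE store of BSWAP(val). */
static int be_store_matches_bswap(uint32_t val) {
    uint8_t a[4], b[4];
    store_bytes_be(a, val);
    store_bytes_le(b, bswap32(val));
    return memcmp(a, b, sizeof a) == 0;
}
```

On a little-endian target, `store_bytes_le` is what a single 32-bit store does, so the reversed sequence collapses to one BSWAP plus one wide store.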

Differential Revision: https://reviews.llvm.org/D62897

llvm-svn: 362921
2019-06-10 05:40:21 +00:00
David Bolvansky dcf5e6abdf [TargetLowering] Simplify (ctpop x) == 1
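
The commit message does not spell out the replacement form. As an illustration only, and assuming the simplification is the usual "exactly one bit set" test, (x != 0) && ((x & (x - 1)) == 0), the equivalence can be sketched as follows. All names are hypothetical.

```c
#include <stdint.h>

/* Reference population count: clear the lowest set bit until none remain. */
static unsigned popcount32(uint32_t x) {
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        ++n;
    }
    return n;
}

/* The original form: (ctpop x) == 1. */
static int single_bit_via_ctpop(uint32_t x) {
    return popcount32(x) == 1;
}

/* Assumed simplified form: non-zero, and clearing the lowest set bit
 * leaves nothing behind. */
static int single_bit_via_bitwise(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```

The bitwise form avoids a population count entirely, which matters on targets without a native ctpop instruction.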
Reviewers: craig.topper, spatel, RKSimon, bkramer

Reviewed By: spatel

Subscribers: javed.absar, lebedev.ri, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63004

llvm-svn: 362912
2019-06-09 18:18:57 +00:00
Simon Pilgrim 5f337149fa Use for-range loop. NFCI.
llvm-svn: 362897
2019-06-09 09:07:30 +00:00
Simon Pilgrim 6bae6d5a5d [DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI.
Same codegen, only differ by the oneuse limit for the sextload case.

llvm-svn: 362880
2019-06-08 17:02:00 +00:00
Amara Emerson 829037a914 Factor out SelectionDAG's switch analysis and lowering into a separate component.
In order for GlobalISel to re-use the significant amount of analysis and
optimization code in SDAG's switch lowering, we first have to extract it and
create an interface to be used by both frameworks.

No test changes as it's NFC.

Differential Revision: https://reviews.llvm.org/D62745

llvm-svn: 362857
2019-06-08 00:05:17 +00:00
Simon Pilgrim f0240ee76d [DAGCombine] visitAND - fix local shadow variable warnings. NFCI.
llvm-svn: 362825
2019-06-07 18:36:43 +00:00
Simon Pilgrim 4c9db2045a [DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI.
llvm-svn: 362820
2019-06-07 18:07:06 +00:00
Jason Liu 60ec248148 [AIX] Implement function descriptor on SDAG
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
 * A function descriptor (Name)
 * A function entry point (.Name)

The descriptor structure on AIX is the same as those in the ELF V1 ABI:
 * The address of the entry point of the function.
 * The TOC base address for the function.
 * The environment pointer.

The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
 function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".

Which symbol gets referenced depends on the context:
 * Taking the address of the function references the descriptor symbol.
 * Calling the function references the entry point symbol.

(2) On the implementation side, for a direct function call target on AIX we
 create the proper MCSymbol SDNode (e.g. ".foo") while constructing the SDAG,
 replacing the original TargetGlobalAddress SDNode. Later stages can then
 take advantage of this MCSymbol.

Patch by: Xiangling_L

Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara

Differential Revision: https://reviews.llvm.org/D62532

llvm-svn: 362735
2019-06-06 19:13:36 +00:00
Simon Pilgrim 842c7792aa [DAGCombine] MergeConsecutiveStores - improve non-temporal load\store handling (PR42123)
This patch is the first step towards ensuring MergeConsecutiveStores correctly handles non-temporal loads\stores:

1 - When merging load\stores we must ensure that they all have the same non-temporal flag. This is unlikely to occur, but can in strange cases where we're storing at the end of one page and the beginning of another.

2 - The merged load\store node must retain the non-temporal flag.

Differential Revision: https://reviews.llvm.org/D62910

llvm-svn: 362723
2019-06-06 17:04:13 +00:00
Simon Pilgrim da993d08c8 [DAGCombine] Cleanup isNegatibleForFree/GetNegatedExpression. NFCI.
Prep work for PR42105 - clang-format, use auto for cast and merge nested if()s

llvm-svn: 362695
2019-06-06 10:21:18 +00:00
Ulrich Weigand 6c5d5ce551 Allow target to handle STRICT floating-point nodes
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions as depending
on a floating-point status and control register, or mark instructions as
possibly trapping).

This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.

To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:

- A new MCID flag "mayRaiseFPException" that the target should set on any
  instruction that possibly can raise FP exception according to the
  architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
  instruction resulting from expansion of any constrained FP intrinsic.

Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).

Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.

The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transferred to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.

This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.

Reviewed By: andrew.w.kaylor

Differential Revision: https://reviews.llvm.org/D55506

llvm-svn: 362663
2019-06-05 22:33:10 +00:00
Tim Northover 607c8a9d14 IR: make getParamByValType Just Work. NFC.
Most parts of LLVM don't care whether the byval type is derived from an
explicit Attribute or from the parameter's pointee type, so it makes
sense for the main access function to just return the right value.

The very few users who do care (only BitcodeReader so far) can find out
how it's specified by accessing the Attribute directly.

llvm-svn: 362642
2019-06-05 20:37:47 +00:00
Simon Pilgrim 77d6adc491 Fix shadow local variable warning. NFCI.
llvm-svn: 362622
2019-06-05 17:26:29 +00:00
Simon Pilgrim 5a81af547c [TargetLowering] SimplifyDemandedBits - pull out shift value type. NFCI.
Will be used more in an upcoming patch.

llvm-svn: 362595
2019-06-05 10:59:04 +00:00
Johannes Doerfert 6b432dca5d [SelectionDAG][FIX] Allow "returned" arguments to be bit-casted
Summary:
An argument that is returned by a function but bitcast beforehand can still
be annotated as "returned". Make sure we do not crash in this case.

Reviewers: sunfish, stephenwlin, niravd, arsenm

Subscribers: wdng, hiraditya, bollu, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59917

llvm-svn: 362546
2019-06-04 20:34:43 +00:00
Nemanja Ivanovic aed7227b71 Revert r362472 as it is breaking PPC build bots
The patch https://reviews.llvm.org/rL362472 broke PPC LNT buildbots.
Reverting it to bring the bots back to green.

llvm-svn: 362539
2019-06-04 18:48:43 +00:00
Craig Topper 09a4415803 [DAGCombiner][X86] Fold (not (neg X)) -> (add X, -1)
This is a special case of a more general transform (not (sub Y, X)) -> (add X, ~Y). InstCombine knows the general form. I've restricted to the special case to fix the motivating case PR42118. I tried handling any case where Y was constant, but got some changes on some Mips tests that I couldn't quickly prove were beneficial.

Fixes PR42118
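
As an illustration only (not part of the commit), both the special case and the general transform follow from the two's complement identity ~v == -v - 1. A minimal sketch on 32-bit unsigned values, with hypothetical function names:

```c
#include <stdint.h>

/* Special case, before the fold: (not (neg x)). */
static uint32_t not_of_neg(uint32_t x) {
    return ~(0u - x);
}

/* Special case, after the fold: (add x, -1). */
static uint32_t add_all_ones(uint32_t x) {
    return x + UINT32_MAX;
}

/* General form, before: (not (sub y, x)). */
static uint32_t not_of_sub(uint32_t y, uint32_t x) {
    return ~(y - x);
}

/* General form, after: (add x, ~y). */
static uint32_t add_of_not(uint32_t y, uint32_t x) {
    return x + ~y;
}
```

Setting y to 0 in the general form gives ~y == -1, which recovers the special case the commit implements.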

Differential Revision: https://reviews.llvm.org/D62828

llvm-svn: 362533
2019-06-04 17:44:18 +00:00
Sanjay Patel 1e63dd0b44 [SelectionDAG][x86] limit post-legalization store merging by type
The proposal in D62498 showed that x86 would benefit from vector
store splitting, but that may conflict with the generic DAG
combiner's store merging transforms.

Add memory type to the existing TLI hook that enables the merging
transforms, so we can limit those changes to scalars only for x86.

llvm-svn: 362507
2019-06-04 15:15:59 +00:00
Roman Lebedev 3dce0326fe [DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952)
Summary:
This *might* be the last fold for `sink-addsub-of-const.ll`, but i'm not sure yet.

As far as i can tell, there are no regressions here (ignoring x86-32),
all changes are either good or neutral.

This, almost surprisingly to me, fixes the motivational tests (in `shift-amount-mod.ll`)
`@reg32_lshr_by_sub_from_negated` from [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/vMd3
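
As an illustration only (not part of the commit), the fold is plain reassociation in modular arithmetic, so it also holds when intermediate results wrap. A minimal sketch on 32-bit unsigned values, with hypothetical function names:

```c
#include <stdint.h>

/* Before the fold: (C - x) - y. */
static uint32_t sub_then_sub(uint32_t c, uint32_t x, uint32_t y) {
    return (c - x) - y;
}

/* After the fold: C - (x + y), leaving a single subtraction from the constant. */
static uint32_t sub_of_sum(uint32_t c, uint32_t x, uint32_t y) {
    return c - (x + y);
}
```

The folded form keeps the constant as the outermost left operand, which is the shape the shift-amount tests mentioned above need.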

Reviewers: RKSimon, t.p.northover, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: sdardis, javed.absar, arichardson, kristof.beyls, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62774

llvm-svn: 362488
2019-06-04 11:06:21 +00:00
Roman Lebedev be6ce7b3f2 [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold
Summary:
All changes except ARM look **great**.
https://rise4fun.com/Alive/R2M

The regression `test/CodeGen/ARM/addsubcarry-promotion.ll`
is recovered fully by D62392 + D62450.

Reviewers: RKSimon, craig.topper, spatel, rogfer01, efriedma

Reviewed By: efriedma

Subscribers: dmgreen, javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62266

llvm-svn: 362487
2019-06-04 11:06:08 +00:00
Simon Pilgrim ad298f86b7 [SelectionDAG] ComputeNumSignBits - support constant pool values from target
As I mentioned on D61887, we don't get as many hits on ComputeNumSignBits as we did on computeKnownBits.

The case we do get is interesting though - it allows us to use the 'ConditionalNegate' combine in combineLogicBlendIntoPBLENDV to remove a select.

It comes too late for SSE41 (BLENDV) cases, but SSE2 tests can hit it now. We should probably try to make use of this for SSE41+ targets as well - avoiding variable blends is usually a good idea. I'll investigate as a followup.

Differential Revision: https://reviews.llvm.org/D62777

llvm-svn: 362486
2019-06-04 10:49:06 +00:00
Simon Pilgrim 3178546a27 [SelectionDAG] ComputeNumSignBits - clang-format + improve *EXTLOAD comments. NFCI.
Pre-commit requested for D62777.

llvm-svn: 362485
2019-06-04 10:17:56 +00:00
Simon Pilgrim 3018d505a3 [SelectionDAG] Add fpto[us]i(undef) --> undef constant fold
Follow up to D62807.

Differential Revision: https://reviews.llvm.org/D62811

llvm-svn: 362483
2019-06-04 10:04:55 +00:00
QingShan Zhang 11de0e71b0 [DAGCombine] Match a pattern where a wide type scalar value is stored by several narrow stores
This opportunity was found in SPEC 2017 557.xz_r, where the pattern is used by the SHA encrypt/decrypt code. See sha-2/sha512.c

static void store64(u64 x, unsigned char* y)
{
    for(int i = 0; i != 8; ++i)
        y[i] = (x >> ((7-i) * 8)) & 255;
}

static u64 load64(const unsigned char* y)
{
    u64 res = 0;
    for(int i = 0; i != 8; ++i)
        res |= (u64)(y[i]) << ((7-i) * 8);
    return res;
}
The load64 has been implemented by https://reviews.llvm.org/D26149
This patch is trying to implement the store pattern.

Match a pattern where a wide type scalar value is stored by several narrow
stores. Fold it into a single store or a BSWAP and a store if the target
supports it.

Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;

=>
*((i32)p) = val;

i8 *p = ...
i32 val = ...
p[0] = (val >> 24) & 0xFF;
p[1] = (val >> 16) & 0xFF;
p[2] = (val >> 8) & 0xFF;
p[3] = (val >> 0) & 0xFF;

=>
*((i32)p) = BSWAP(val);

Differential Revision: https://reviews.llvm.org/D61843

llvm-svn: 362472
2019-06-04 08:53:53 +00:00
Michael Berg 6ff978ee05 Propagate fmf for setcc in SDAG for select folds
llvm-svn: 362448
2019-06-03 21:53:26 +00:00
Michael Berg 0b7f98da65 Propagate fmf for setcc/select folds
Summary: This change facilitates propagating fmf which was placed on setcc from fcmp through folds with selects so that back ends can model this path for arithmetic folds on selects in SDAG.

Reviewers: qcolombet, spatel

Reviewed By: qcolombet

Subscribers: nemanjai, jsji

Differential Revision: https://reviews.llvm.org/D62552

llvm-svn: 362439
2019-06-03 19:12:15 +00:00
Simon Pilgrim cb7e4e8193 [SelectionDAG] Add [us]itofp(undef) --> 0 constant fold (PR39205)
We were missing this fold in the DAG, which I've copied directly from llvm::ConstantFoldCastInstruction

Differential Revision: https://reviews.llvm.org/D62807

llvm-svn: 362397
2019-06-03 13:02:07 +00:00
Florian Hahn e71963c850 Recommit r360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
If we hit the limit, we do expand the outstanding tokenfactors.
Otherwise, we might drop nodes with users in the unexpanded
tokenfactors. This fixes the crashes reported by Jordan Rupprecht.

Reviewers: niravd, spatel, craig.topper, rupprecht

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D62633

llvm-svn: 362350
2019-06-03 01:30:19 +00:00
Craig Topper 50b35caf30 [DAGCombiner][X86] Fold away masked store and scatter with all zeroes mask.
Similar to what was done for masked load and gather.

llvm-svn: 362342
2019-06-02 22:52:38 +00:00
Craig Topper a7bc31ebc6 [DAGCombiner] Replace masked loads with a zero mask with the passthru value
Similar to what was recently done for gathers in r362015.
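
As an illustration only (not part of the commit), the reasoning can be sketched with a scalar model of a masked load: each lane reads memory when its mask bit is set and otherwise takes the passthru value, so an all-zero mask never touches memory and the result is exactly the passthru. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a masked load: memory is read only for enabled lanes. */
static void masked_load(int32_t *dst, const int32_t *mem, const uint8_t *mask,
                        const int32_t *passthru, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = mask[i] ? mem[i] : passthru[i];
}

/* Returns 1 when no lane is enabled. */
static int mask_is_all_zero(const uint8_t *mask, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (mask[i])
            return 0;
    return 1;
}

/* Folded form: with an all-zero mask, the load is replaced by the passthru. */
static void masked_load_folded(int32_t *dst, const int32_t *mem,
                               const uint8_t *mask, const int32_t *passthru,
                               size_t n) {
    if (mask_is_all_zero(mask, n)) {
        memcpy(dst, passthru, n * sizeof *dst);
        return;
    }
    masked_load(dst, mem, mask, passthru, n);
}

/* Usage: both forms agree for a zero mask, even with no valid memory. */
static int zero_mask_demo(void) {
    const int32_t passthru[4] = {1, 2, 3, 4};
    const uint8_t mask[4] = {0, 0, 0, 0};
    int32_t plain[4], folded[4];
    masked_load(plain, NULL, mask, passthru, 4);
    masked_load_folded(folded, NULL, mask, passthru, 4);
    return memcmp(plain, passthru, sizeof plain) == 0 &&
           memcmp(folded, passthru, sizeof folded) == 0;
}
```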

llvm-svn: 362337
2019-06-02 18:58:46 +00:00
Simon Pilgrim 7a869e7036 [DAGCombine] Fold insert_subvector(bitcast(x),bitcast(y),c1) -> bitcast(insert_subvector(x,y),c2)
Move this combine from x86 into generic DAGCombine, which currently only manages cases where the bitcast is between types of the same scalarsize.

Differential Revision: https://reviews.llvm.org/D59188

llvm-svn: 362324
2019-06-02 14:42:11 +00:00
Simon Pilgrim ffb4d2bff7 [DAG] isBitwiseNot / isConstOrConstSplat - add support for build vector undefs + truncation (PR41020)
Add (opt-in) support for implicit truncation to isConstOrConstSplat, which allows us to match truncated 'all ones' cases in isBitwiseNot.

PR41020 compares against using ISD::isBuildVectorAllOnes() instead, but that predicate silently accepts any UNDEF elements in the build vector which might not be what we want in isBitwiseNot - so I've added an opt-in 'AllowUndefs' flag that is set to false by default but will allow us to enable it on individual cases where it's safe.

Differential Revision: https://reviews.llvm.org/D62783

llvm-svn: 362323
2019-06-02 11:56:39 +00:00
Simon Pilgrim 88522ce388 [TargetLowering] SimplifyDemandedBits - don't use OriginalDemanded variables in analysis.
These might have been replaced in multiple use cases.

llvm-svn: 362322
2019-06-02 10:12:55 +00:00
Simon Pilgrim 30a6caa3e7 [TargetLowering] SimplifyDemandedVectorElts - use same arg names as SimplifyDemandedBits. NFCI.
Helps with debugging as we recurse between them.

llvm-svn: 362321
2019-06-02 10:03:56 +00:00
Craig Topper f58ef87bb7 [DAGCombiner] Replace two unchecked dyn_casts with casts.
The results of the dyn_casts were immediately dereferenced on the next line
so they had better not be null.

I don't think there's any way for these dyn_casts to fail, so use a cast
instead of adding a null check.

llvm-svn: 362315
2019-06-02 03:31:01 +00:00
Craig Topper bc9e04d0c3 [SelectionDAG] Make the code in mutateStrictFPToFP less aware of how many operands each node has. NFCI
Just copy all of the operands except the chain and call MorphNode on that.
This removes the IsUnary and IsTernary flags.

Also always get the result type from the result type of the original
nodes. Previously we got it from the operand except for two nodes
where that didn't work.

llvm-svn: 362269
2019-05-31 22:18:45 +00:00
Kevin P. Neal ac79007205 Revert revert of r362112 with minor SystemZ test file corrections.
[FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes

This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	https://reviews.llvm.org/D62546

llvm-svn: 362241
2019-05-31 16:32:12 +00:00
Roman Lebedev 46511d75b5 [DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts
I don't have a test case for these, but there is a test case for D62266
where, even after all the constant-folding patches, we still end up
with endless combine loop. Which makes sense, since we don't constant
fold for opaque constants.

llvm-svn: 362156
2019-05-30 21:10:37 +00:00
Roman Lebedev a4e3b50e26 [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold. Try 2
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 362146
2019-05-30 20:37:49 +00:00
Roman Lebedev 57aa36ff91 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 3
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 362145
2019-05-30 20:37:39 +00:00
Roman Lebedev 63b4741534 [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 3
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave vector decrement,
don't transform it into vector add all-ones..

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, and then reverted in
rL362109 to fix missing constant folds that were causing
endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 362144
2019-05-30 20:37:29 +00:00
Roman Lebedev 05ad5fd213 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 3
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs, then recommitted in
rL361872 and reverted again in rL362109 to fix missing constant folds
that were causing endless combine loops.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 362143
2019-05-30 20:37:18 +00:00
Roman Lebedev 1d9ec7a81b [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 3
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs, then recommitted in
rL361871 and reverted again in rL362109 to fix missing constant folds
that were causing endless combine loops.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 362142
2019-05-30 20:36:54 +00:00
Roman Lebedev 7eb8b5b5dd [DAGCombine] ((c1-A)-c2) -> ((c1-c2)-A) constant-fold
Summary: https://rise4fun.com/Alive/B0A

Reviewers: t.p.northover, RKSimon, spatel, craig.topper

Reviewed By: RKSimon

Subscribers: javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62691

llvm-svn: 362135
2019-05-30 19:27:51 +00:00
Roman Lebedev 691b5e2ecc [DAGCombine] (A-C1)-C2 -> A-(C1+C2) constant-fold
Summary: https://rise4fun.com/Alive/Mb1M

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62689

llvm-svn: 362134
2019-05-30 19:27:42 +00:00
Roman Lebedev 0a3dbbcdfb [DAGCombine] (A+C1)-C2 -> A+(C1-C2) constant-fold
Summary:
Direct sibling of D62662, the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62664

llvm-svn: 362133
2019-05-30 19:27:32 +00:00
Roman Lebedev 9ff3159b4a [DAGCombine] Use FoldConstantArithmetic() to perform C2-(A+C1) -> (C2-C1)-A fold
Summary:
No tests change, and i'm not sure how to test this, but it's better safe than sorry.

Reviewers: spatel, RKSimon, craig.topper, t.p.northover

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62663

llvm-svn: 362132
2019-05-30 19:27:26 +00:00
Roman Lebedev cc9a9cf237 [DAGCombine] ((A-c1)+c2) -> (A+(c2-c1)) constant-fold
Summary:
This was the root cause of the endless combine loop in D62257

https://rise4fun.com/Alive/d3W

Reviewers: RKSimon, spatel, craig.topper, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62662

llvm-svn: 362131
2019-05-30 19:27:19 +00:00
Roman Lebedev ef95679741 [DAGCombine] Use FoldConstantArithmetic() to perform ((c1-A)+c2) -> (c1+c2)-A fold
Summary: No tests change, and i'm not sure how to test this, but it's better safe than sorry.

Reviewers: spatel, RKSimon, craig.topper, t.p.northover

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62661

llvm-svn: 362130
2019-05-30 19:27:10 +00:00
Tim Northover b7141207a4 Reapply: IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

The original commit did not remap byval types when linking modules, which broke
LTO. This version fixes that.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362128
2019-05-30 18:48:23 +00:00
Kevin P. Neal 51ce0b196a Correct error in revert of r362112.
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362118
2019-05-30 17:21:45 +00:00
Kevin P. Neal d3db7b40b0 Revert r362112, it broke the bots with the message "Unsupported vector argument or return type"
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362117
2019-05-30 17:10:21 +00:00
Kevin P. Neal 2e1807678d [FPEnv] Added a special UnrollVectorOp method to deal with the chain on StrictFP opcodes
This change creates UnrollVectorOp_StrictFP. The purpose of this is to address a failure that consistently occurs when calling StrictFP functions on vectors whose number of elements is 3 + 2n on most platforms, such as PowerPC or SystemZ. The old UnrollVectorOp method does not expect that the vector that it will unroll will have a chain, so it has an assert that prevents it from running if this is the case. This new StrictFP version of the method deals with the chain while unrolling the vector. With this new function in place during vector widening, llc can run vector-constrained-fp-intrinsics.ll for SystemZ successfully.

Submitted by:	Drew Wock <drew.wock@sas.com>
Reviewed by:	Cameron McInally, Kevin P. Neal
Approved by:	Cameron McInally
Differential Revision:	http://reviews.llvm.org/D62546

llvm-svn: 362112
2019-05-30 16:44:47 +00:00
Roman Lebedev 019d270e43 [DAGCombine] Revert of recommit of "binop-with-const hoisting" patches
I was looking into an endless combine loop the uncommitted follow-up patch
was causing, and it appears even these patches can exhibit such an
endless loop. The root cause is that we try to hoist one binop (add/sub) with
constant operand, and if we get two such binops both of which are
eligible for this hoisting, we get stuck.

Some cases may highlight missing constant-folds.
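
One plausible reading of the loop described above, sketched as a toy rewriter. This is not SelectionDAG code: the `Expr` struct, the rule functions and `combine` are all hypothetical, and the model only covers expressions of the form `(x op1 c1) op2 c2`. It assumes two hoisting rules shaped like `(x - C) + y -> (x + y) - C` and `(x + C) - y -> (x - y) + C`, applied with `y` also constant, and shows that a constant fold run first breaks the cycle.

```cpp
#include <cassert>

// Toy expression (x innerOp innerC) outerOp outerC, ops limited to '+'/'-'.
struct Expr {
  char innerOp;
  int innerC;
  char outerOp;
  int outerC;
  bool folded; // true once both constants were merged into innerC
};

// Models "(x - C) + y -> (x + y) - C" when y happens to be a constant.
bool hoistSubOutOfAdd(Expr &e) {
  if (e.folded || e.innerOp != '-' || e.outerOp != '+')
    return false;
  e = Expr{'+', e.outerC, '-', e.innerC, false};
  return true;
}

// Models "(x + C) - y -> (x - y) + C" when y happens to be a constant.
bool hoistAddOutOfSub(Expr &e) {
  if (e.folded || e.innerOp != '+' || e.outerOp != '-')
    return false;
  e = Expr{'-', e.outerC, '+', e.innerC, false};
  return true;
}

// Models the constant fold "(x - c1) + c2 -> x + (c2 - c1)".
bool foldConstants(Expr &e) {
  if (e.folded || e.innerOp != '-' || e.outerOp != '+')
    return false;
  e = Expr{'+', e.outerC - e.innerC, '+', 0, true};
  return true;
}

// Applies rules until nothing changes or `limit` rewrites were performed.
// Returns the number of rewrites; hitting `limit` stands in for a hang.
int combine(Expr e, bool withConstantFold, int limit) {
  int rewrites = 0;
  while (rewrites < limit) {
    bool changed = (withConstantFold && foldConstants(e)) ||
                   hoistSubOutOfAdd(e) || hoistAddOutOfSub(e);
    if (!changed)
      break;
    ++rewrites;
  }
  return rewrites;
}
```

Starting from `(x - 3) + 5`, the two hoisting rules alone undo each other until the rewrite limit is reached, while enabling the constant fold reaches a fixed point after a single rewrite.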

Reverts r361871,r361872,r361873,r361874.

llvm-svn: 362109
2019-05-30 16:07:11 +00:00
Tim Northover 71ee3d0237 Revert "IR: add optional type to 'byval' function parameters"
The IRLinker doesn't delve into the new byval attribute when mapping types, and
this breaks LTO.

llvm-svn: 362029
2019-05-29 20:46:38 +00:00
Benjamin Kramer 107f8d9873 [DAGCombiner] Replace gathers with a zero mask with the passthru value
These can be created by the legalizer when splitting a larger gather.

See https://llvm.org/PR42055 for a motivating example.
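
Why the replacement is safe can be seen from a scalar model of a masked gather: a lane only reads memory when its mask bit is set, so an all-zero mask leaves every lane equal to the passthru value. The sketch below is an illustration with made-up names, not the DAGCombiner implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a masked gather: lane i loads memory[indices[i]] when
// mask[i] is set, and otherwise keeps passthru[i].
std::vector<int> maskedGather(const std::vector<int> &memory,
                              const std::vector<std::size_t> &indices,
                              const std::vector<bool> &mask,
                              const std::vector<int> &passthru) {
  std::vector<int> result(passthru);
  for (std::size_t i = 0; i < result.size(); ++i)
    if (mask[i])
      result[i] = memory[indices[i]];
  return result;
}
```

With an all-false mask the function returns `passthru` unchanged, which is the value the combine substitutes for the whole gather.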

Differential Revision: https://reviews.llvm.org/D62613

llvm-svn: 362015
2019-05-29 19:24:19 +00:00
Tim Northover 6e07f16fae IR: add optional type to 'byval' function parameters
When we switch to opaque pointer types we will need some way to describe
how many bytes a 'byval' parameter should occupy on the stack. This adds
a (for now) optional extra type parameter.

If present, the type must match the pointee type of the argument.

Note to front-end maintainers: if this causes test failures, it's probably
because the "byval" attribute is printed after attributes without any parameter
after this change.

llvm-svn: 362012
2019-05-29 19:12:48 +00:00
Adhemerval Zanella 6d7bf5e8df [CodeGen] Add lrint/llrint builtins
This patch adds ISD::LRINT and ISD::LLRINT along with new
intrinsics.  The changes are as straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an integer.

The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch.  The current semantics just route it to the
libm symbol.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62017

llvm-svn: 361875
2019-05-28 20:47:44 +00:00
Roman Lebedev dfc34f0211 [DAGCombine] (x - C) - y -> (x - y) - C fold. Try 2
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

This is a recommit, originally committed in rL361856, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361874
2019-05-28 20:40:10 +00:00
Roman Lebedev d485c6bc9f [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold. Try 2
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave the vector decrement as-is
and don't transform it into a vector add of all-ones.

https://rise4fun.com/Alive/ZRl

This is a recommit, originally committed in rL361855, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361873
2019-05-28 20:40:03 +00:00
Roman Lebedev 96c9986199 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold. Try 2
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

This is a recommit, originally committed in rL361853, but reverted
to investigate test-suite compile-time hangs.

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361872
2019-05-28 20:39:55 +00:00
Roman Lebedev 2feb7e56e2 [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold. Try 2
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

This is a recommit, originally committed in rL361852, but reverted
to investigate test-suite compile-time hangs.

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361871
2019-05-28 20:39:39 +00:00
Roman Lebedev 272d70c366 Revert DAGCombine "hoist binop with const" folds
Appear to introduce test-suite compile-time hang.

http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/22825

This reverts r361852,r361853,r361854,r361855,r361856

llvm-svn: 361865
2019-05-28 19:04:21 +00:00
Roman Lebedev 7669665432 [DAGCombine] (x - C) - y -> (x - y) - C fold
Summary:
Again only vectors affected. Frustrating. Let me take a look into that..

https://rise4fun.com/Alive/AAq

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62294

llvm-svn: 361856
2019-05-28 17:54:21 +00:00
Roman Lebedev 8c9b3e4e4a [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold
Summary:
This prevents regressions in next patch,
and somewhat recovers from the regression to AMDGPU test in D62223.

It is indeed not great that we leave the vector decrement as-is
and don't transform it into a vector add of all-ones.

https://rise4fun.com/Alive/ZRl

Reviewers: RKSimon, craig.topper, spatel, arsenm

Reviewed By: RKSimon, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62263

llvm-svn: 361855
2019-05-28 17:54:13 +00:00
Roman Lebedev 6a24c9b9ab [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold
Summary:
Only vector tests are being affected here,
since subtraction by scalar constant is rewritten
as addition by negated constant.

No surprising test changes.

https://rise4fun.com/Alive/pbT

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62257

llvm-svn: 361854
2019-05-28 17:54:04 +00:00
Roman Lebedev 1499f65ac1 [DAGCombiner][X86][AArch64][SPARC][SystemZ] y - (x + C) -> (y - x) - C fold
Summary:
Direct sibling of D62223 patch.
While i don't have a direct motivational pattern for this,
it would seem to make sense to handle both patterns (or none),
for symmetry?

The aarch64 changes look neutral;
sparc and systemz look like improvement (one less instruction each);
x86 changes - 32bit case improves, 64bit case shows that LEA no longer
gets constructed, which may be because that whole test is `-mattr=+slow-lea,+slow-3ops-lea`

https://rise4fun.com/Alive/ffh

Reviewers: RKSimon, craig.topper, spatel, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, jyknight, javed.absar, kristof.beyls, fedor.sergeev, jrtc27, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62252

llvm-svn: 361853
2019-05-28 17:53:54 +00:00
Roman Lebedev 19f51ec04a [DAGCombiner][X86][AArch64][AMDGPU] (x + C) - y -> (x - y) + C fold
Summary:
The main motivation is shown by all these `neg` instructions that are now created.
In particular, the `@reg32_lshr_by_negated_unfolded_sub_b` test.

AArch64 test changes all look good (`neg` created), or neutral.

X86 changes look neutral (vectors), or good (`neg` / `xor eax, eax` created).

I'm not sure about `X86/ragreedy-hoist-spill.ll`, it looks like the spill
is now hoisted into preheader (which should still be good?),
2 4-byte reloads become 1 8-byte reload, and are elsewhere,
but i'm not sure how that affects that loop.

I'm unable to interpret AMDGPU change, looks neutral-ish?

This is hopefully a step towards solving [[ https://bugs.llvm.org/show_bug.cgi?id=41952 | PR41952 ]].

https://rise4fun.com/Alive/pkdq (we are missing more patterns, i'll submit them later)

Reviewers: craig.topper, RKSimon, spatel, arsenm

Reviewed By: RKSimon

Subscribers: bjope, qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, javed.absar, dstuttard, tpr, t-tye, kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62223

llvm-svn: 361852
2019-05-28 17:53:43 +00:00
Simon Pilgrim 9cd9624fb6 [DAG] LegalizeVectorTypes - reduce scope of local variables. NFCI.
Move the element index/count variables into the block where they are actually used - appeases cppcheck and helps avoid shadow variable warnings.

llvm-svn: 361821
2019-05-28 13:46:26 +00:00
Benjamin Kramer 57e267a2e9 [X86] Custom lower CONCAT_VECTORS of v2i1
The generic legalizer cannot handle this. Add an assert instead of
silently miscompiling vectors with elements smaller than 8 bits.

llvm-svn: 361814
2019-05-28 12:52:57 +00:00
Sanjay Patel 2f99d009c1 [SelectionDAG] fold concat of extract subvectors
This is derived from the related fold for build vectors.
We also have a version of this in DAGCombiner. The benefit of
having this fold at node creation time is (1) efficiency and
(2) preventing infinite looping from creating patterns that
should not exist in the first place.

Currently, the inf-loop could happen with MergeConsecutiveStores()
because it naively creates concat of extracts when forming a wider
vector store. That could fight with target-specific store narrowing.
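
As an illustration of the kind of pattern involved, assuming the fold covers a concat whose operands are consecutive extracts of one source vector: concatenating the low and high halves of `X` gives back `X`, so the concat node never needs to exist. The helpers below are hypothetical stand-ins operating on `std::vector`, not SelectionDAG nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for EXTRACT_SUBVECTOR: copies `count` elements starting at `start`.
std::vector<int> extractSubvector(const std::vector<int> &v, std::size_t start,
                                  std::size_t count) {
  return std::vector<int>(v.begin() + start, v.begin() + start + count);
}

// Stand-in for CONCAT_VECTORS with two operands.
std::vector<int> concatVectors(const std::vector<int> &lo,
                               const std::vector<int> &hi) {
  std::vector<int> result(lo);
  result.insert(result.end(), hi.begin(), hi.end());
  return result;
}
```

For an 8-element `X`, `concatVectors(extractSubvector(X, 0, 4), extractSubvector(X, 4, 4))` compares equal to `X`.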

llvm-svn: 361780
2019-05-27 20:26:21 +00:00
Sanjay Patel e13ae3e4d8 [SelectionDAG] fix formatting and redundant comments; NFC
There's a possible missing fold here for extracting from the
same source vector. It's similar to a check that we use to
squash a build vector with all extracted elements from the
same source vector.

llvm-svn: 361778
2019-05-27 18:26:43 +00:00
Michael Liao 9c70c574b4 [SelectionDAG] Enhance the simplification of `copyto` from `implicit-def`.
Summary:
- The current implementation simplifies the case where the source of
  `copyto` is `implicit-def`ed. However, it only works when that
  `implicit-def` is single-used since it detects that from
  `implicit-def` and cannot determine which destination vreg should be
  used if there are multiple uses.
- This patch moves that detection to the point where `copyto` is being
  emitted. If that `copyto`'s source is defined from `implicit-def`, it
  simplifies it. Hence, it works even when that `implicit-def` has
  multiple uses.
- Apart from simplifying the internal IR, this won't improve the quality
  of code generation. However, it helps to detect `implicit-def` in a
  straightforward manner in some passes, such as `si-i1-copies`. A test
  case is added.

Reviewers: sunfish, nhaehnle

Subscribers: jvesely, hiraditya, asbirlea, llvm-commits, yaxunl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62342

llvm-svn: 361777
2019-05-27 18:26:29 +00:00
Simon Pilgrim ebb053b139 [SelectionDAG] GetDemandedBits - add demanded elements wrapper implementation
The DemandedElts variable is pretty much inert at the moment - the original GetDemandedBits implementation calls it with an 'all ones' DemandedElts value so the function is active and behaves exactly as it used to.

llvm-svn: 361773
2019-05-27 16:39:25 +00:00
Alexander Timofeev ba447bae74 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For divergent targets,
the same value type requires different register classes depending on the value's divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

This commit was reverted because of a build failure.
The reason was a malformed patch.
The build failure is fixed.

llvm-svn: 361741
2019-05-26 20:33:26 +00:00
Simon Pilgrim 06e02856ab [SelectionDAG] GetDemandedBits - cleanup to more closely match SimplifyDemandedBits. NFCI.
Prep work before adding demanded elts support.

llvm-svn: 361739
2019-05-26 18:58:14 +00:00
Simon Pilgrim 2916b9e28c [SelectionDAG] MaskedValueIsZero - add demanded elements implementation
Will be used in an upcoming patch but I've updated the original implementation to call this to ensure test coverage.

llvm-svn: 361738
2019-05-26 18:43:44 +00:00
Sanjay Patel 91131b6500 [SelectionDAG] soften assertion when legalizing narrow vector FP ops
The test based on PR42010:
https://bugs.llvm.org/show_bug.cgi?id=42010
...may show an inaccuracy for PPC's target defs, but we should not
be so aggressive with an assert here. There's no telling what out-of-tree
targets look like.

llvm-svn: 361696
2019-05-25 13:48:07 +00:00
Peter Collingbourne 3b93737446 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

llvm-svn: 361688
2019-05-25 01:52:38 +00:00
Alexander Timofeev dffedea014 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven it is necessary to assign
the correct register classes to the cross block values beforehand. For divergent targets,
the same value type requires different register classes depending on the value's divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

llvm-svn: 361644
2019-05-24 15:32:18 +00:00
Simon Pilgrim 95b8d9bbf8 [SelectionDAG] computeKnownBits - support constant pool values from target
This patch adds the overridable TargetLowering::getTargetConstantFromLoad function which allows targets to return any constant value loaded by a LoadSDNode node - only X86 makes use of this so far but everything should be in place for other targets.

computeKnownBits then uses this function to improve codegen, notably vector code after legalization.

A future commit will do the same for ComputeNumSignBits but computeKnownBits sees the bigger benefit.

This required a couple of fixes:
* SimplifyDemandedBits must early-out for getTargetConstantFromLoad cases to prevent infinite loops of constant regeneration (similar to what we already do for BUILD_VECTOR).
* Fix a DAGCombiner::visitTRUNCATE issue as we had trunc(shl(v8i32),v8i16) <-> shl(trunc(v8i16),v8i32) infinite loops after legalization on AVX512 targets.

Differential Revision: https://reviews.llvm.org/D61887

llvm-svn: 361620
2019-05-24 10:03:11 +00:00
Tim Northover 3d7a057b0d CodeGen: factor out swifterror value tracking.
llvm-svn: 361607
2019-05-24 08:39:43 +00:00
Sanjay Patel 7d6c0bce50 [DAGCombiner] make folds of binops safe for opcodes that produce >1 value
This is no-functional-change-intended currently because the definition
of isBinOp() only includes opcodes that produce 1 value. But if we
share that implementation with isCommutativeBinOp() as proposed in
D62191, then we need to make sure that the callers bail out for
opcodes that they are not prepared to handle correctly.

llvm-svn: 361547
2019-05-23 20:17:25 +00:00
Kees Cook c2187c20a4 [TargetLowering] Extend bool args to inline-asm according to getBooleanType
Summary:
This extends Krzysztof Parzyszek's X86-specific solution
(https://reviews.llvm.org/D60208) to the generic code pointed out by
James Y Knight.

Reviewers: kparzysz, craig.topper, nickdesaulniers

Subscribers: efriedma, sdardis, nemanjai, javed.absar, eraman, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, srhines, void, nickdesaulniers, jyknight

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60224

llvm-svn: 361404
2019-05-22 16:16:15 +00:00
Kees Cook a7a687e500 [TargetLowering] Add blank line (test commit)
llvm-svn: 361403
2019-05-22 16:02:13 +00:00
Leonard Chan 0bada7ce6c [Intrinsic] Signed Fixed Point Saturation Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
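
A rough scalar model of the operation described above, for 32-bit operands. The function name is hypothetical, and rounding toward zero is a choice made for this sketch only; the commit message does not state a rounding mode. The sketch assumes `scale` is at most 31.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed fixed-point multiply with saturation. `a` and `b` carry `scale`
// fractional bits; the result uses the same scale and is clamped to int32_t.
int32_t smulFixSat(int32_t a, int32_t b, unsigned scale) {
  // The exact product of two 32-bit values always fits in 64 bits.
  int64_t product = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  // Drop `scale` fractional bits. Integer division truncates toward zero.
  int64_t scaled = product / (int64_t{1} << scale);
  const int64_t lo = std::numeric_limits<int32_t>::min();
  const int64_t hi = std::numeric_limits<int32_t>::max();
  if (scaled < lo)
    return static_cast<int32_t>(lo);
  if (scaled > hi)
    return static_cast<int32_t>(hi);
  return static_cast<int32_t>(scaled);
}
```

With a scale of 2, the inputs 6 (1.5) and 10 (2.5) give 15 (3.75); products that fall outside the 32-bit range clamp to the nearest representable bound instead of wrapping.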

Differential Revision: https://reviews.llvm.org/D55720

llvm-svn: 361289
2019-05-21 19:17:19 +00:00
Sanjay Patel 10f6b39899 [SelectionDAG] fold insert subvector of undef into undef
DAGCombiner simplifies this more liberally as:
  // If inserting an UNDEF, just return the original vector.
  if (N1.isUndef())
    return N0;

So there's no way to make this visible in output AFAIK, but
doing this at node creation time should be slightly more efficient.

llvm-svn: 361287
2019-05-21 18:53:53 +00:00
Sanjay Patel 51dc59d090 [SelectionDAG] remove redundant code; NFCI
getNode() squashes concatenation of undefs via FoldCONCAT_VECTORS():
  // Concat of UNDEFs is UNDEF.
  if (llvm::all_of(Ops, [](SDValue Op) { return Op.isUndef(); }))
    return DAG.getUNDEF(VT);

llvm-svn: 361284
2019-05-21 18:28:22 +00:00
Sanjay Patel 78c3f58122 [DAGCombiner] prevent unsafe reassociation of FP ops
There are no FP callers of DAGCombiner::reassociateOps() currently,
but we can add a fast-math check to make sure this API is not being
misused.

This was noted as a potential risk (and that risk might increase) with:
D62191
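
The underlying hazard is that floating-point addition is not associative, so regrouping operands can change the result unless fast-math permits it. A small standalone illustration (the function names are made up for this example):

```cpp
#include <cassert>

// Evaluates (a + b) + c.
double addLeftToRight(double a, double b, double c) { return (a + b) + c; }

// Evaluates a + (b + c), the reassociated form.
double addRightToLeft(double a, double b, double c) { return a + (b + c); }
```

With `a = 1e20`, `b = -1e20`, `c = 1.0` under IEEE double arithmetic, the first grouping yields `1.0`, while the second yields `0.0` because `1.0` is absorbed when added to `-1e20`.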

llvm-svn: 361268
2019-05-21 14:47:38 +00:00
Dylan McKay e967308da4 Add TargetLoweringInfo hook for explicitly setting the ABI calling convention endianness
Summary:
The endianness used in the calling convention does not always match the
endianness of the target on all architectures, namely AVR.

When an argument is too large to be legalised by the architecture and is
split for the ABI, a new hook TargetLoweringInfo::shouldSplitFunctionArgumentsAsLittleEndian
is queried to find the endianness that function arguments must be laid
out in.
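
To make the effect of such a hook concrete, here is a toy model of splitting one 32-bit argument into two 16-bit parts. The `splitArgument` helper and its boolean parameter are hypothetical and only mimic the decision the hook controls; they are not the LLVM API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Splits a 32-bit argument into two 16-bit parts. When `littleEndianParts`
// is true the low half comes first, otherwise the high half comes first.
std::array<uint16_t, 2> splitArgument(uint32_t value, bool littleEndianParts) {
  uint16_t low = static_cast<uint16_t>(value & 0xFFFFu);
  uint16_t high = static_cast<uint16_t>(value >> 16);
  if (littleEndianParts)
    return {low, high};
  return {high, low};
}
```

For `0x12345678` the little-endian order is `{0x5678, 0x1234}` and the big-endian order is `{0x1234, 0x5678}`; a mismatch between caller and callee on this order would swap the halves of the argument.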

This approach was recommended by Eli Friedman.

Originally reported in https://github.com/avr-rust/rust/issues/129.

Patch by Carl Peto.

Reviewers: bogner, t.p.northover, RKSimon, niravd, efriedma

Reviewed By: efriedma

Subscribers: JDevlieghere, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62003

llvm-svn: 361222
2019-05-21 06:38:02 +00:00
Craig Topper 97d4f7c194 [SelectionDAGBuilder] Flush PendingExports before creating INLINEASM_BR node for asm goto.
Since INLINEASM_BR is a terminator we need to flush the pending exports before
emitting it. If we don't do this, a TokenFactor can be inserted between it and
the BR instruction emitted to finish the callbr lowering.

It looks like nodes are glued to the INLINEASM_BR so I had to make sure we emit
the TokenFactor before that.

Differential Revision: https://reviews.llvm.org/D59981

llvm-svn: 361177
2019-05-20 17:08:02 +00:00
Craig Topper af7a188453 [Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.

Differential Revision: https://reviews.llvm.org/D62026

llvm-svn: 361169
2019-05-20 16:27:09 +00:00
Craig Topper 203bfdd0f0 [DAGCombiner] Refactor code in visitShiftByConstant slightly to make it more readable. NFC
This changes the isShift variable to include the constant operand
check that was previously in the if statement.

While there fix an 80 column violation and an unnecessary use of
getNode. Also fix variable name capitalization.

llvm-svn: 361168
2019-05-20 16:26:55 +00:00
Nikita Popov 9060b6df97 [SDAG] Vector op legalization for overflow ops
Fixes issue reported by aemerson on D57348. Vector op legalization
support is added for uaddo, usubo, saddo and ssubo (umulo and smulo
were already supported). As usual, by extracting TargetLowering methods
and calling them from vector op legalization.

Vector op legalization doesn't really deal with multiple result nodes,
so I'm explicitly performing a recursive legalization call on the
result value that is not being legalized.

There are some existing test changes because expansion happens
earlier, so we don't get a DAG combiner run in between anymore.

Differential Revision: https://reviews.llvm.org/D61692

llvm-svn: 361166
2019-05-20 16:09:22 +00:00
Guillaume Chatelet e386a01e84 [NFC] Refactor visitIntrinsicCall so it doesn't return a const char*
Summary: API simplification

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61306

llvm-svn: 361140
2019-05-20 11:01:30 +00:00
Petar Jovanovic e85bbf564d [DebugInfoMetadata] Refactor DIExpression::prepend constants (NFC)
Refactor DIExpression::With* into a flag enum in order to be less
error-prone to use (as discussed on D60866).

Patch by Djordje Todorovic.

Differential Revision: https://reviews.llvm.org/D61943

llvm-svn: 361137
2019-05-20 10:35:57 +00:00
Guillaume Chatelet a760e69840 Revert "[NFC] Refactor visitIntrinsicCall so it doesn't return a const char*"
This reverts commit 706d3cd6388cc3446aab282f3af879862b10cbed.

llvm-svn: 361130
2019-05-20 09:00:12 +00:00
Guillaume Chatelet fa8c152576 [NFC] Refactor visitIntrinsicCall so it doesn't return a const char*
Summary: API simplification

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61306

llvm-svn: 361129
2019-05-20 08:52:10 +00:00
Roman Lebedev 64c756b991 [DAGCombiner] visitShiftByConstant(): drop bogus signbit check
Summary:
That check claims that the transform is illegal otherwise.
That isn't true:
1. For `ISD::ADD`, we only process `ISD::SHL` outer shift => sign bit does not matter
   https://rise4fun.com/Alive/K4A
2. For `ISD::AND`, there is no restriction on constants:
   https://rise4fun.com/Alive/Wy3
3. For `ISD::OR`, there is no restriction on constants:
   https://rise4fun.com/Alive/GOH
4. For `ISD::XOR`, there is no restriction on constants:
   https://rise4fun.com/Alive/ml6
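
The claims in the list above can be spot-checked by brute force for the `shl` case at 8 bits: shifting the result of the binop equals applying the binop to the shifted operands, whatever the sign bit of the constant. The helper names below are invented for this sketch, and the Alive links remain the general proofs.

```cpp
#include <cassert>
#include <cstdint>

// Applies one of the binops from the list above to two 8-bit values.
uint8_t applyBinop(char op, uint8_t a, uint8_t b) {
  switch (op) {
  case '+':
    return static_cast<uint8_t>(a + b);
  case '&':
    return static_cast<uint8_t>(a & b);
  case '|':
    return static_cast<uint8_t>(a | b);
  default: // '^'
    return static_cast<uint8_t>(a ^ b);
  }
}

// Checks shl(binop(x, c1), c2) == binop(shl(x, c2), shl(c1, c2)) for all
// 8-bit x and c1 and every shift amount 0..7.
bool shlDistributesOver(char op) {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c1 = 0; c1 < 256; ++c1)
      for (unsigned c2 = 0; c2 < 8; ++c2) {
        uint8_t lhs = static_cast<uint8_t>(
            applyBinop(op, static_cast<uint8_t>(x), static_cast<uint8_t>(c1))
            << c2);
        uint8_t rhs = applyBinop(op, static_cast<uint8_t>(x << c2),
                                 static_cast<uint8_t>(c1 << c2));
        if (lhs != rhs)
          return false;
      }
  return true;
}
```

`shlDistributesOver` returns `true` for each of `'+'`, `'&'`, `'|'` and `'^'`, including constants with the top bit set.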

So, why is it there then?

This changes the testcase that was touched by @spatel in rL347478,
but i'm not sure that test tests anything particular?

Reviewers: RKSimon, spatel, craig.topper, jojo, rengolin

Reviewed By: spatel

Subscribers: javed.absar, llvm-commits, spatel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61918

llvm-svn: 361044
2019-05-17 15:52:58 +00:00
Tim Renouf e3cbdaf1b5 [CodeGen] Fixed de-optimization of legalize subvector extract
The recent introduction of v3i32 etc as an MVT, and its use in AMDGPU
3-dword memory instructions, caused a de-optimization problem for code
with such a load that then bitcasts via vector of i8, because v12i8 is
not an MVT so it legalizes the bitcast by widening it.

This commit adds the ability to widen a bitcast using extract_subvector
on the result, so the value does not need to go via memory.

Differential Revision: https://reviews.llvm.org/D60457

Change-Id: Ie4abb7760547e54a2445961992eafc78e80d4b64
llvm-svn: 360942
2019-05-16 21:49:06 +00:00
Adhemerval Zanella 73643b5041 [CodeGen] Add lround/llround builtins
This patch adds ISD::LROUND and ISD::LLROUND along with new
intrinsics.  The changes are as straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an integer.

The idea is to optimize lround/llround generation for AArch64
in a subsequent patch.  The current semantics just route it to the
libm symbol.

llvm-svn: 360889
2019-05-16 13:15:27 +00:00
Reid Kleckner 4882490349 [codeview] Fix SDNode representation of annotation labels
Before this change, they were erroneously constructed with the EH_LABEL
SDNode opcode, which caused other passes to interact with them in
incorrect ways. See the FIXME about fastisel that this addresses in the
existing test case.

Fixes PR41890

llvm-svn: 360818
2019-05-15 21:46:05 +00:00
Clement Courbet d9d0665d1c [DAGCombiner][NFC] Add a comment.
As suggested in D61846.

llvm-svn: 360755
2019-05-15 08:21:18 +00:00
Sanjay Patel 99d6420a82 [SDAG] fix unused variable warning and unneeded indirection; NFC
llvm-svn: 360640
2019-05-14 00:57:31 +00:00
Sanjay Patel 3a13d970aa [SDAG, x86] allow targets to override test for binop opcodes
This follows the pattern of the existing isCommutativeBinOp().

x86 shows improvements from vector narrowing for the min/max opcodes.

llvm-svn: 360639
2019-05-14 00:39:40 +00:00
Nick Desaulniers c33f754e74 [TargetLowering] Handle multi depth GEPs w/ inline asm constraints
Summary:
X86TargetLowering::LowerAsmOperandForConstraint had better support than
TargetLowering::LowerAsmOperandForConstraint for arbitrary depth
getelementpointers for "i", "n", and "s" extended inline assembly
constraints. Hoist its support from the derived class into the base
class.

Link: https://github.com/ClangBuiltLinux/linux/issues/469

Reviewers: echristo, t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, E5ten, kees, jyknight, nemanjai, javed.absar, eraman, hiraditya, jsji, llvm-commits, void, craig.topper, nathanchance, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61560

llvm-svn: 360604
2019-05-13 17:27:44 +00:00
Simon Pilgrim d3cedee3c6 [TargetLowering] Add SimplifyDemandedBits support for ZERO_EXTEND_VECTOR_INREG
More work for PR39709.

llvm-svn: 360592
2019-05-13 15:51:26 +00:00
Sanjay Patel 05dafb1c97 [DAGCombiner] narrow vector binop with inserts/extract
We catch most of these patterns (on x86 at least) by matching
a concat vectors opcode early in combining, but the pattern may
emerge later using insert subvector instead.

The AVX1 diffs for add/sub overflow show another missed narrowing
pattern. That one may be falling through the cracks because of
combine ordering and multiple uses.

llvm-svn: 360585
2019-05-13 14:31:14 +00:00
Kevin P. Neal 5987749e33 Add constrained fptrunc and fpext intrinsics.
The new fptrunc and fpext intrinsics are constrained versions of the
regular fptrunc and fpext instructions.

Reviewed by:	Andrew Kaylor, Craig Topper, Cameron McInally, Conner Abbot
Approved by:	Craig Topper
Differential Revision: https://reviews.llvm.org/D55897

llvm-svn: 360581
2019-05-13 13:23:30 +00:00
Simon Pilgrim d845bc3d0c TargetLowering::SimplifyDemandedBits - early-out for UNDEF ops. NFCI.
llvm-svn: 360579
2019-05-13 12:44:03 +00:00
Clement Courbet 9afc4764dd [DAGCombiner] Fix invalid alias analysis.
Summary:
When we know for sure whether two addresses do or do not alias, we
should immediately return from DAGCombiner::isAlias().

I think this comes from a bad copy/paste. Sorry for not catching that during the
code review.

Fixes PR41855.

Reviewers: niravd, gchatelet, EricWF

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61846

llvm-svn: 360566
2019-05-13 09:07:37 +00:00
Craig Topper 61e556d2bd Recommit r358887 "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"
I've included a new fix in X86RegisterInfo to prevent PR41619 without
reintroducing r359392. We might be able to improve that in the base class
implementation of shouldRewriteCopySrc somehow. But this hopefully enables
forward progress on SimplifyDemandedBits improvements for now.

Original commit message:

This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra (srl (and x, c1 << c2), c2) -> (and (srl x, c2), c1) combine to encourage BFE creation. I investigated putting this in DAGCombine
but it caused a lot of noise on other targets - some improvements, some regressions.

The X86 changes are all definite wins.

llvm-svn: 360552
2019-05-13 04:03:35 +00:00
Sanjay Patel a09e686821 [DAGCombiner] try to move bitcast after extract_subvector
I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast is sitting between the math
and the extract subvector and thwarting our pattern matching for narrowing:

       t56: v8i32 = add t59, t58
      t68: v4i64 = bitcast t56
    t73: v2i64 = extract_subvector t68, Constant:i64<2>
  t96: v4i32 = bitcast t73

There are a few wins and neutral diffs in the other tests.

Differential Revision: https://reviews.llvm.org/D61806

llvm-svn: 360541
2019-05-12 14:43:20 +00:00
Simon Pilgrim 605a840747 [DAG] Add SimplifyDemandedBits support for BITREVERSE
Pulled out of D58017 while I continue to investigate the BSWAP regression on PPC

llvm-svn: 360534
2019-05-11 20:56:05 +00:00
Simon Pilgrim aeed0a30c0 SelectionDAGISel::CodeGenAndEmitDAG - remove unused variable. NFCI.
llvm-svn: 360514
2019-05-11 11:00:37 +00:00
Jordan Rupprecht 16c7fbd112 Revert [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
This reverts r360171 (git commit a9d6c32eaf). A repro showing the asan/msan failures is forthcoming.

llvm-svn: 360481
2019-05-10 23:20:02 +00:00
Craig Topper 114f763f37 [LegalizeVectorOps] Remove calls to LegalizeOp on the return value from ExpandLoad/ExpandStore.
We already updated the LegalizedNodes map at the end of the Expand call. This
would have marked the new node as being mapped to itself. So the LegalizeOp
call will find that and immediately return.

llvm-svn: 360472
2019-05-10 21:42:27 +00:00
Nikita Popov 9f7537bd48 [SDAG] Recursively legalize both vector mulo results
Split out from D61692 per RKSimon's suggestion. Vector op
legalization will automatically recursively legalize the returned
SDValue, but we need to take care of the other results ourselves.
Otherwise it will end up getting legalized only during op
legalization, by which point it might be too late (though I'm not
aware of any specific cases right now).

There are codegen differences because expansion occurs earlier now
and we don't get a DAGCombiner run in between.

Differential Revision: https://reviews.llvm.org/D61744

llvm-svn: 360470
2019-05-10 20:42:48 +00:00
Sanjay Patel b37ddeafc0 [DAGCombiner] reduce code duplication; NFC
llvm-svn: 360462
2019-05-10 20:02:30 +00:00
Tim Northover 6c1e3f9493 SelectionDAG: accommodate atomic floating stores.
We were applying a pointer truncation to floating types, which crashed LLVM.
That is Not A Good Thing(TM).

llvm-svn: 360421
2019-05-10 11:23:04 +00:00
Cameron McInally 156eb28289 [CodeGen] Add comment about FSUB <-> FNEG xforms
Differential Revision: https://reviews.llvm.org/D61741

llvm-svn: 360366
2019-05-09 19:28:52 +00:00
Florian Hahn be10bc71f9 [DAGCombiner] Limit number of nodes explored as store candidates.
To find the candidates to merge stores we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with a large number of stores.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61511

llvm-svn: 360357
2019-05-09 17:05:52 +00:00
Leonard Chan 95b7abdcc5 [SelectionDAG] Expand ADD/SUBCARRY
This patch allows for expansion of ADDCARRY and SUBCARRY when the target does not support it.
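
Note: a sketch of what an add-with-carry computes when only plain adds and
unsigned compares are available. It illustrates the semantics only and is
not claimed to be the exact node sequence the expansion emits; the struct
and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct AddCarryResult {
  uint32_t Sum;
  bool CarryOut;
};

// Sum = A + B + CarryIn, with the carry out recovered from unsigned wraparound:
// an unsigned add wrapped if and only if the result is smaller than an operand.
inline AddCarryResult addCarry(uint32_t A, uint32_t B, bool CarryIn) {
  uint32_t Partial = A + B;
  bool WrappedFirst = Partial < A;
  uint32_t Sum = Partial + (CarryIn ? 1u : 0u);
  bool WrappedSecond = Sum < Partial;
  // At most one of the two adds can wrap, so OR-ing the flags is enough.
  return {Sum, WrappedFirst || WrappedSecond};
}
```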

Differential Revision: https://reviews.llvm.org/D61411

llvm-svn: 360303
2019-05-09 01:17:48 +00:00
Sanjay Patel 902b3ecdad [SelectionDAG] fold 'fneg undef' to undef
This is extracted from the original draft of D61419 with some additional tests.
We don't currently get this in IR (it's conservatively turned into a NaN),
but presumably that'll get updated as we add real IR support for 'fneg'
rather than 'fsub -0.0, x'.

The x86-32 run shows the following, and I haven't looked further to see why,
but that seems to be independent:
  Legalizing: t1: f32 = undef
  Trying to expand node
  Creating fp constant: t4: f32 = ConstantFP<0.000000e+00>

Differential Revision: https://reviews.llvm.org/D61516

llvm-svn: 360296
2019-05-08 22:19:52 +00:00
Craig Topper 493aec3ef5 [FastISel][X86] Support FNeg instruction in target independent fast isel handling
This patch adds support for calling selectFNeg for FNeg instructions in addition to the fsub idiom

Differential Revision: https://reviews.llvm.org/D61624

llvm-svn: 360273
2019-05-08 17:27:08 +00:00
Simon Pilgrim 2788ad3ee2 [LegalizeDAG] Assert non-power-of-2 load/store op splits are in range. NFCI.
Fixes static analyzer undefined/out-of-range shift warnings.

llvm-svn: 360245
2019-05-08 11:22:10 +00:00
Simon Pilgrim 97a0c54179 Fix cppcheck operator precedence warning. NFCI.
llvm-svn: 360234
2019-05-08 10:07:34 +00:00
QingShan Zhang e065af6a42 [NFC] Add a static function to do the endian check
Add a new function to do the endian check, as I will commit another patch later, which will also need the endian check. 

Differential Revision: https://reviews.llvm.org/D61236

llvm-svn: 360226
2019-05-08 07:21:37 +00:00
Florian Hahn a9d6c32eaf [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases and the large token factors cause additional
scalability problems elsewhere.

This patch adds some limits to the number of nodes explored for the
cases mentioned above.

Reviewers: niravd, spatel, craig.topper

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D61397

llvm-svn: 360171
2019-05-07 16:47:27 +00:00
Simon Pilgrim 3044ac058b Avoid use-after-move warnings by using swap instead. NFCI.
Swap should be as quick in these cases, and leaves the original variables in a known (empty) state.
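
Note: a small sketch of the idiom, assuming a std::vector for illustration
(the containers touched by the patch may differ). The function name is
hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hands the contents of Pending to the caller. Swapping with a fresh, empty
// vector leaves Pending in a known empty state, whereas a moved-from vector
// is only guaranteed to be in a valid but unspecified state.
inline std::vector<int> takeAll(std::vector<int> &Pending) {
  std::vector<int> Taken; // known to be empty
  std::swap(Taken, Pending);
  return Taken;
}
```

After the call, Pending can be reused without tripping use-after-move
diagnostics.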

llvm-svn: 360164
2019-05-07 15:45:00 +00:00
Craig Topper c6d445f9c1 [FastISel][X86] If selectFNeg fails, fall back to SelectionDAG not treating it as an fsub.
Summary:
If fneg lowering for fsub -0.0, x fails we currently fall back to treating it as an fsub. This has different behavior for nans than the xor with sign bit trick we normally try to do. On X86, the xor trick for double fails fast-isel in 32-bit mode with sse2 due to 64 bit integer types not being available. With -O2 we would always use an xorpd for this case. If we use subsd, this creates an observable behavior difference between -O0 and -O2. So fall back to SelectionDAG if we can't fast-isel it, that way SelectionDAG will use the xorpd.
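
Note: a sketch of the "xor with sign bit" negation mentioned above, written
on the bit pattern of an IEEE-754 double (an assumption of this example).
The helper names are hypothetical. Because it only flips bit 63, it also
flips the sign of a NaN, which a subtraction is not required to do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit double");

// Returns true if the sign bit of X is set (works for NaNs and zeros too).
inline bool signBitSet(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return (Bits >> 63) != 0;
}

// Negates X by flipping only the sign bit of its bit pattern.
inline double fnegViaXor(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits ^= UINT64_C(1) << 63;
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}
```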

I believe this patch is restoring the behavior prior to r345295 from last October. This was missed then because our fast isel case in 32-bit mode aborted fast-isel earlier for another reason. But I've added new tests to cover that.

Reviewers: andrew.w.kaylor, cameron.mcinally, spatel, efriedma

Reviewed By: cameron.mcinally

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61622

llvm-svn: 360111
2019-05-07 04:25:24 +00:00
Craig Topper 39f1a97417 [FastISel] Pass the fneg input operand to hasTrivialKill in FastISel::selectFNeg.
We're trying to calculate the kill flag for OpReg which is the input so we need to pass the input here.

llvm-svn: 360097
2019-05-06 23:09:09 +00:00
Philip Reames 2f53d79bff Fix pr33010, a 2 year old crashing regression
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>.  The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this.  Instead, simply prevent its creation.

Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.  

llvm-svn: 360090
2019-05-06 22:09:31 +00:00
Craig Topper ad56843dd7 [SelectionDAG][X86] Support inline assembly returning an mmx register into a type with fewer than 64 bits.
It's possible to use the 'y' mmx constraint with a type narrower than 64-bits.

This patch supports this by bitcasting the mmx type to 64-bits and then
truncating to the desired type.

There are probably other missing type combinations we need to support, but this
is the case we have a bug report for.

Fixes PR41748.

Differential Revision: https://reviews.llvm.org/D61582

llvm-svn: 360069
2019-05-06 19:50:14 +00:00
Craig Topper 55a71b575c Revert r359392 and r358887
Reverts "[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead"
Reverts "[TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling"

Eric Christopher and Jorge Gorbe Moya reported some issues with these patches to me off list.

Removing the CodeGenOnly instructions has changed how fneg is handled during fast-isel with sse/sse2. We're now emitting fsub -0.0, x instead of
moving to the integer domain (in a GPR), xoring the sign bit, and then moving back to xmm. This is because the fast isel table no longer
contains an entry for (f32/f64 bitcast (i32/i64)) so the target independent fneg code fails. The use of fsub changes the behavior of nan with
respect to -O2 codegen which will always use a pxor. NOTE: We still have a difference with double with -m32 since the move to GPR doesn't work
there. I'll file a separate PR for that and add test cases.

Since removing the CodeGenOnly instructions was fixing PR41619, I'm reverting r358887 which exposed that PR. Though I wouldn't be surprised
if that bug can still be hit independent of that.

This should hopefully get Google back to green. I'll work with Simon and other X86 folks to figure out how to move forward again.

llvm-svn: 360066
2019-05-06 19:29:24 +00:00
Nikita Popov cfe786a195 [SDAG][AArch64] Boolean and/or reduce to umax/min reduce (PR41635)
This addresses one half of https://bugs.llvm.org/show_bug.cgi?id=41635
by combining a VECREDUCE_AND/OR into VECREDUCE_UMIN/UMAX (if latter is
legal but former is not) for zero-or-all-ones boolean reductions (which
are detected based on sign bits).
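
Note: a scalar sketch of why the rewrite is sound, assuming every lane is
either 0 or all ones. The reduction helpers are hypothetical stand-ins for
the VECREDUCE_* nodes, modelled on 8-bit lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

inline uint8_t reduceAnd(const std::vector<uint8_t> &Lanes) {
  uint8_t R = 0xFF;
  for (uint8_t V : Lanes)
    R = static_cast<uint8_t>(R & V);
  return R;
}

inline uint8_t reduceOr(const std::vector<uint8_t> &Lanes) {
  uint8_t R = 0x00;
  for (uint8_t V : Lanes)
    R = static_cast<uint8_t>(R | V);
  return R;
}

inline uint8_t reduceUMin(const std::vector<uint8_t> &Lanes) {
  uint8_t R = 0xFF;
  for (uint8_t V : Lanes)
    R = std::min(R, V);
  return R;
}

inline uint8_t reduceUMax(const std::vector<uint8_t> &Lanes) {
  uint8_t R = 0x00;
  for (uint8_t V : Lanes)
    R = std::max(R, V);
  return R;
}
```

With only 0x00 and 0xFF lanes, AND is all ones exactly when the unsigned
minimum is, and OR is all ones exactly when the unsigned maximum is.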

Differential Revision: https://reviews.llvm.org/D61398

llvm-svn: 360054
2019-05-06 16:17:17 +00:00
Craig Topper f723490e76 [SelectionDAG] Replace llvm_unreachable at the end of getCopyFromParts with a report_fatal_error.
Based on PR41748, not all cases are handled in this function.

llvm_unreachable is treated as an optimization hint that can prune code paths
in a release build. This causes weird behavior when PR41748 is encountered on a
release build. It appears to generate an fp_round instruction from the floating
point code.

Making this a report_fatal_error prevents incorrect optimization of the code
and will instead generate a message to file a bug report.

llvm-svn: 360008
2019-05-06 04:01:49 +00:00
Simon Pilgrim 0f89b76b84 [SelectionDAG] Use any_of/all_of where possible. NFCI.
llvm-svn: 359974
2019-05-05 10:30:04 +00:00
Simon Pilgrim 5d3b100750 [DAGCombine] Remove repeated variables. NFCI.
llvm-svn: 359915
2019-05-03 18:20:28 +00:00
Simon Pilgrim 308b5ec1ff [TargetLowering] SimplifySetCC - remove repeated variable. NFCI.
Also reduce scope of Temp variable.

llvm-svn: 359911
2019-05-03 18:02:33 +00:00
Simon Pilgrim d857f64c31 [SelectionDAG] CreateTopologicalOrder - don't use iterator
We shouldn't use an iterator to loop across a std::vector when the same loop is adding elements to that std::vector

Found by cppcheck
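
Note: a generic sketch of the hazard and the fix, not the code from
CreateTopologicalOrder itself. push_back may reallocate and invalidate
iterators, so the loop below walks by index. The function name is
hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Processes a worklist that grows while it is being walked: every value
// greater than 1 enqueues its half. The index stays valid across
// reallocation and Work.size() is re-read on every iteration.
inline std::vector<int> expandWorklist(std::vector<int> Work) {
  for (std::size_t I = 0; I != Work.size(); ++I) {
    int V = Work[I]; // copy first: push_back may move the storage
    if (V > 1)
      Work.push_back(V / 2);
  }
  return Work;
}
```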

llvm-svn: 359900
2019-05-03 15:50:37 +00:00
Simon Pilgrim bc876df3a5 [TargetLowering] ShrinkDemandedConstant - reduce scope of TLO.DAG variable. NFCI.
Only ever used in one block

llvm-svn: 359890
2019-05-03 14:38:24 +00:00
Simon Pilgrim e798e3a346 [TargetLowering] expandUnalignedStore - cleanup EVT variables. NFCI.
Avoid duplicated EVTs and rename Store/Load VTs to avoid -Wshadow warnings.

llvm-svn: 359877
2019-05-03 12:55:25 +00:00
Simon Pilgrim 42d2b604b5 [SelectionDAG] Use INT_MIN as (1 << 31) is UB for signed integers. NFCI.
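
Note: a sketch of the issue in the title, assuming a 32-bit int. Shifting
a signed 1 into the sign bit overflows, while INT_MIN or an unsigned shift
yields the same bit pattern without undefined behaviour. The helper names
are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

// Well defined: the shift is performed on an unsigned operand.
inline uint32_t signBitMask32() { return UINT32_C(1) << 31; }

// INT_MIN converted to unsigned has exactly the sign-bit pattern.
inline bool intMinMatchesSignBit() {
  return static_cast<uint32_t>(INT_MIN) == signBitMask32();
}
```
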
llvm-svn: 359873
2019-05-03 11:32:00 +00:00
Simon Pilgrim bfd00a6440 [SelectionDAG] computeKnownBits - remove some duplicate/shadow variables. NFCI.
llvm-svn: 359872
2019-05-03 11:11:03 +00:00
Craig Topper e8a1cde886 [SelectionDAG] Add asserts to verify the vectorness of input and output types of TRUNCATE/ZERO_EXTEND/ANY_EXTEND/SIGN_EXTEND agree
As a result of the underlying cause of PR41678 we created an ANY_EXTEND node with a scalar result type and v1i1 input type. Ideally we would have asserted for this instead of letting it go through to instruction selection and generate bad machine IR

Differential Revision: https://reviews.llvm.org/D61463

llvm-svn: 359836
2019-05-02 22:26:26 +00:00
Sanjay Patel 1972826178 [DAGCombiner] try repeated fdiv divisor transform before building estimate (2nd try)
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.

This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.

Original commit message:

This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359793
2019-05-02 15:02:08 +00:00
Sanjay Patel 284472be6d [SelectionDAG] remove constant folding limitations based on FP exceptions
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.

There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.

Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.

Differential Revision: https://reviews.llvm.org/D61331

llvm-svn: 359791
2019-05-02 14:47:59 +00:00
Sanjay Patel 64d5751254 Revert "[DAGCombiner] try repeated fdiv divisor transform before building estimate"
This reverts commit fb9a5307a9 (rL359398)
because it can cause an infinite loop due to opposing combines.

llvm-svn: 359695
2019-05-01 16:06:21 +00:00
Tim Northover ee2474df9f DAG: allow DAG pointer size different from memory representation.
In preparation for supporting ILP32 on AArch64, this modifies the SelectionDAG
builder code so that pointers are allowed to have a larger type when "live" in
the DAG compared to memory.

Pointers get zero-extended whenever they are loaded, and truncated prior to
stores.  In addition, a few not quite so obvious locations need updating:

  * A GEP that has not been marked inbounds needs to enforce the IR-documented
    2s-complement wrapping at the memory pointer size. Inbounds GEPs are
    undefined if they overflow the address space, so no additional operations
    are needed.
  * Signed comparisons would give incorrect results if performed on the
    zero-extended values.
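
Note: a sketch of the signed-comparison pitfall in the second bullet,
modelling 32-bit in-memory pointers that are 64 bits wide in the DAG. All
function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 32-bit in-memory pointer value the way loads do: zero-extension.
inline uint64_t zeroExtend(uint32_t P) { return P; }

// Interpret a 32-bit pattern as a signed value, widened to 64 bits.
inline int64_t signExtend(uint32_t P) {
  int64_t Wide = static_cast<int64_t>(P);
  return (P & UINT32_C(0x80000000)) ? Wide - (INT64_C(1) << 32) : Wide;
}

// Signed "less than" evaluated at the 32-bit memory width.
inline bool signedLessAtMemoryWidth(uint32_t A, uint32_t B) {
  return signExtend(A) < signExtend(B);
}

// The same comparison done naively on the zero-extended 64-bit values.
inline bool signedLessOnZeroExtended(uint32_t A, uint32_t B) {
  return static_cast<int64_t>(zeroExtend(A)) <
         static_cast<int64_t>(zeroExtend(B));
}
```

For A = 0xFFFFFFFF and B = 1 the two functions disagree, so the comparison
has to account for the memory width.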

This shouldn't affect CodeGen for now, but will become active when the AArch64
ILP32 support is committed.

llvm-svn: 359676
2019-05-01 12:37:30 +00:00
Sanjay Patel 0387bf5269 [SelectionDAG] remove div-by-zero constant folding restriction
We don't have this restriction in IR, so it should not be here
either simply out of consistency. Code that wants to handle FP
exceptions is expected to use the 'strict' variants of these
nodes.

We don't get the frem case because frem by 0.0 produces NaN (invalid),
and that's the remaining check here (so the removed check for frem
was dead code AFAIK).

This is the only place in SDAG that uses "HasFPExceptions", so I
think we should remove that entirely as a follow-up patch.

llvm-svn: 359566
2019-04-30 14:37:15 +00:00
Sjoerd Meijer 0ed4619679 [TargetLowering] findOptimalMemOpLowering. NFCI.
This was a local static function in SelectionDAG, which I've promoted to
TargetLowering so that I can reuse it to estimate the cost of a memory
operation in D59787.

Differential Revision: https://reviews.llvm.org/D59766

llvm-svn: 359543
2019-04-30 10:09:15 +00:00
Sjoerd Meijer 180f1ae57c [TargetLowering] Change getOptimalMemOpType to take a function attribute list
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.

This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.

Differential Revision: https://reviews.llvm.org/D59785

llvm-svn: 359537
2019-04-30 08:38:12 +00:00
Zi Xuan Wu 49d60fdc2e [DAGCombiner] Do not generate ISD::ADDE node if adde is not legal for the target when combine ISD::TRUNC node
Do not combine (trunc adde(X, Y, Carry)) into (adde trunc(X), trunc(Y), Carry)
if adde is not legal for the target, even during the type-legalize phase,
because adde is special and will not be legalized later in the operation-legalize phase.

This fixes: PR40922
https://bugs.llvm.org/show_bug.cgi?id=40922

Differential Revision: https://reviews.llvm.org//D60854

llvm-svn: 359532
2019-04-30 03:01:14 +00:00
Bjorn Pettersson 820994572c [DAG] Refactor DAGCombiner::ReassociateOps
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.

Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that this should be NFC.

Reviewers: spatel, craig.topper, tstellar

Reviewed By: spatel

Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61199

llvm-svn: 359476
2019-04-29 17:50:10 +00:00
Sanjay Patel fb9a5307a9 [DAGCombiner] try repeated fdiv divisor transform before building estimate
This was originally part of D61028, but it's an independent diff.

If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division is only 3 cycle throughput, so that's probably the better perf
default option and avoids problems from x86's inaccurate estimates.

The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.

Differential Revision: https://reviews.llvm.org/D61149

llvm-svn: 359398
2019-04-28 12:23:43 +00:00
Simon Pilgrim ef54b1dddf [DAGCombine] Cleanup visitEXTRACT_SUBVECTOR. NFCI.
Use ArrayRef::slice, reduce some rather awkward long lines for legibility and run clang-format.

llvm-svn: 359326
2019-04-26 17:49:02 +00:00
Simon Pilgrim 5d6ef94c36 [X86][SSE] Disable shouldFoldConstantShiftPairToMask for btver1/btver2 targets (PR40758)
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency - so it doesn't make sense to replace a shl+lshr with a shift+and pair as it requires an additional mask (with the extra constant pool, loading and register pressure costs).
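
Note: a scalar sketch of the two equivalent forms being traded off, for a
32-bit value and a constant shift amount below 32. The function names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Clears the top C bits with a shift pair: shl followed by lshr.
inline uint32_t clearHighBitsWithShifts(uint32_t X, unsigned C) {
  assert(C < 32 && "shift amount must be in range");
  return (X << C) >> C;
}

// Clears the top C bits with a single AND, which needs a mask constant.
inline uint32_t clearHighBitsWithMask(uint32_t X, unsigned C) {
  assert(C < 32 && "shift amount must be in range");
  return X & (UINT32_MAX >> C);
}
```

Both return the same value; the hook only decides which form is cheaper on
a given target.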

Differential Revision: https://reviews.llvm.org/D61068

llvm-svn: 359293
2019-04-26 10:49:13 +00:00
Craig Topper f9c30eddd0 [SelectionDAG][X86] Use stack load/store in PromoteIntRes_BITCAST when the input needs to be split and the output type is a vector.
We had special case handling here, but it uses a scalar any_extend for the
promotion then bitcasts to the final type. This won't split up the input data
into multiple promoted elements like we need.

This patch falls back to doing the conversion through memory.

Fixes PR41594 which I believe was reflected in the bitcast-vector-bool.ll
changes. The changes to vector-half-conversions.ll are fixing a previously
unknown miscompile from this issue.

Differential Revision: https://reviews.llvm.org/D61114

llvm-svn: 359219
2019-04-25 18:19:59 +00:00
Amy Huang 68c9199493 Recommitting r358783 and r358786 "[MS] Emit S_HEAPALLOCSITE debug info" with fixes for buildbot error (undefined assembler label).
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D61083

llvm-svn: 359149
2019-04-24 23:02:48 +00:00
Sanjay Patel 6f41bf948b [DAGCombiner] scale repeated FP divisor by splat factor
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).
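
Note: a scalar sketch of the 'x/y' into 'x * (1.0/y)' rewrite for a
4-element vector whose divisor is the same in every lane. The function name
is hypothetical. In general the product can round differently from a true
division, so this is not a bit-exact replacement.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Divides every lane by the same divisor using one scalar division and
// four multiplies instead of four divisions.
inline std::array<float, 4> divideBySplat(const std::array<float, 4> &X,
                                          float Y) {
  float Recip = 1.0f / Y; // the single scalar fdiv
  std::array<float, 4> R{};
  for (std::size_t I = 0; I != X.size(); ++I)
    R[I] = X[I] * Recip;
  return R;
}
```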

Differential Revision: https://reviews.llvm.org/D61028

llvm-svn: 359147
2019-04-24 22:28:58 +00:00
Bjorn Pettersson 71e8c6f20f Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.

It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with having those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.

Reviewers: hfinkel, materi, jkorous

Reviewed By: jkorous

Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61038

llvm-svn: 359072
2019-04-24 06:55:50 +00:00
Amy Huang fc79ab9857 Revert "[MS] Emit S_HEAPALLOCSITE debug info" because of ToTWin64(db)
buildbot failure.

This reverts commit d07d6d6177 and
c774f687b6.

llvm-svn: 359034
2019-04-23 21:12:58 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Sanjay Patel 06ff5eae5b [DAGCombiner] generalize binop-of-splats scalarization
If we only match build vectors, we can miss some patterns
that use shuffles as seen in the affected tests.

Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.

llvm-svn: 358984
2019-04-23 13:16:41 +00:00
Bjorn Pettersson f97b29be88 [DAGCombiner] Combine OR as ADD when no common bits are set
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.

This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific this could happen if we already have rewritten an
ADD into OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).

To make the DAG combiner less sensitive to if ADD or OR is
used for these "no common bits set" ADD/OR operations we
now apply most of the ADD combines also to an OR operation,
when value tracking indicates that the operands have no
common bits set.
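
Note: a sketch of the underlying identity on concrete values. The combiner
reasons from known bits rather than actual values, so this only shows why
the two opcodes are interchangeable in that case. The function names are
hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position is set in both operands.
inline bool haveNoCommonBits(uint32_t A, uint32_t B) { return (A & B) == 0; }

// With no common bits there are no carries, so OR computes the same value
// as ADD.
inline uint32_t addAsOr(uint32_t A, uint32_t B) {
  assert(haveNoCommonBits(A, B) && "operands must be disjoint");
  return A | B;
}
```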

Reviewers: spatel, RKSimon, craig.topper, kparzysz

Reviewed By: spatel

Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59758

llvm-svn: 358965
2019-04-23 10:01:08 +00:00
Sanjay Patel bf8aacb715 [SelectionDAG] move splat util functions up from x86 lowering
This was supposed to be NFC, but the change in SDLoc
definitions causes instruction scheduling changes.

There's nothing x86-specific in this code, and it can
likely be used from DAGCombiner's simplifyVBinOp().

llvm-svn: 358930
2019-04-22 22:43:36 +00:00
Simon Pilgrim 6276ce0142 [TargetLowering][AMDGPU][X86] Improve SimplifyDemandedBits bitcast handling
This patch adds support for BigBitWidth -> SmallBitWidth bitcasts, splitting the DemandedBits/Elts accordingly.

The AMDGPU backend needed an extra  (srl (and x, c1 << c2), c2) -> (and (srl(x, c2), c1) combine to encourage BFE creation, I investigated putting this in DAGCombine but it caused a lot of noise on other targets - some improvements, some regressions.
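
Note: a scalar sketch of the srl/and combine described above, for 32-bit
values and a shift amount below 32. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// (srl (and x, c1 << c2), c2): mask in place, then shift down.
inline uint32_t maskThenShift(uint32_t X, uint32_t C1, unsigned C2) {
  assert(C2 < 32 && "shift amount must be in range");
  return (X & (C1 << C2)) >> C2;
}

// (and (srl x, c2), c1): shift down first, then mask the low bits.
inline uint32_t shiftThenMask(uint32_t X, uint32_t C1, unsigned C2) {
  assert(C2 < 32 && "shift amount must be in range");
  return (X >> C2) & C1;
}
```

The second form exposes a shift followed by a low mask, which is the shape
a bitfield extract matches.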

The X86 changes are all definite wins.

Differential Revision: https://reviews.llvm.org/D60462

llvm-svn: 358887
2019-04-22 14:04:35 +00:00
Sanjay Patel 9bc6c77220 [DAGCombiner] make variable name less ambiguous; NFC
llvm-svn: 358886
2019-04-22 13:42:50 +00:00
Sanjay Patel d6989daae9 [DAGCombiner] prepare shuffle-of-splat to handle more patterns; NFC
llvm-svn: 358884
2019-04-22 13:36:07 +00:00
Amy Huang c774f687b6 [MS] Emit S_HEAPALLOCSITE debug info
Summary:
This emits labels around heapallocsite calls and S_HEAPALLOCSITE debug
info in codeview. Currently only changes FastISel, so emitting labels still
needs to be implemented in SelectionDAG.

Reviewers: hans, rnk

Subscribers: aprantl, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D60800

llvm-svn: 358783
2019-04-19 21:09:11 +00:00
Sanjay Patel e197c617a6 [SelectionDAG] soften splat mask assert/unreachable (PR41535)
These are general queries, so they should not die when given
a degenerate input like an all undef mask. Callers should be
able to deal with an op that will eventually be simplified away.

llvm-svn: 358761
2019-04-19 15:31:11 +00:00
Simon Pilgrim e7fe6dd5ed [DAGCombine] Add SimplifyDemandedBits helper that handles demanded elts mask as well
The other SimplifyDemandedBits helpers become wrappers to this new demanded elts variant.

llvm-svn: 358585
2019-04-17 15:45:44 +00:00
Florian Hahn 258a425c69 [ScheduleDAGRRList] Recompute topological ordering on demand.
Currently there is a single point in ScheduleDAGRRList, where we
actually query the topological order (besides init code). Currently we
are recomputing the order after adding a node (which does not have
predecessors) and then we add predecessors edge-by-edge.

We can avoid adding edges one-by-one after we added a new node. In that case, we can
just rebuild the order from scratch after adding the edges to the DAG
and avoid all the updates to the ordering.

Also, we can delay updating the DAG until we query the DAG, if we keep a
list of added edges. Depending on the number of updates, we can either
apply them when needed or recompute the order from scratch.

This brings the geomean compile time of CTMark with -O1 down 0.3% on X86,
with no regressions.

Reviewers: MatzeB, atrick, efriedma, niravd, paquette

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D60125

llvm-svn: 358583
2019-04-17 15:05:29 +00:00
Simon Pilgrim e5573f4f4e [TargetLowering] Rename preferShiftsToClearExtremeBits and shouldFoldShiftPairToMask (PR41359)
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.

shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask

preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair

llvm-svn: 358526
2019-04-16 20:57:28 +00:00
Luis Marques eda370d4c8 [DAGCombiner] Add missing flag to addressing mode check
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect it had no known observable impact, so no tests were
changed by this patch.

Differential Revision:  https://reviews.llvm.org/D60314

llvm-svn: 358502
2019-04-16 15:09:18 +00:00
Tim Northover 2be3f868f9 DAG: propagate ConsecutiveRegs flags to returns too.
Arguments already have a flag to inform backends when they have been split up.
The AArch64 arm64_32 ABI makes use of these on return types too, so that code
emitted for armv7k can be ABI-compliant.

There should be no CodeGen changes yet, just making more information available.

llvm-svn: 358399
2019-04-15 12:04:10 +00:00
Tim Northover 9db00f7e5b DAG: propagate whether an arg is a pointer for CallingConv decisions.
The arm64_32 ABI specifies that pointers (despite being 32-bits) should be
zero-extended to 64-bits when passed in registers for efficiency reasons. This
means that the SelectionDAG needs to be able to tell the backend that an
argument was originally a pointer, which is implemented here.

Additionally, some memory intrinsics need to be declared as taking an i8*
instead of an iPTR.

There should be no CodeGen change yet, but it will be triggered when AArch64
backend support for ILP32 is added.

llvm-svn: 358398
2019-04-15 12:03:54 +00:00
Bjorn Pettersson 60569363a5 [SelectionDAG] Use KnownBits::computeForAddSub/computeForAddCarry
Summary:
Use KnownBits::computeForAddSub/computeForAddCarry
in SelectionDAG::computeKnownBits when doing value
tracking for addition/subtraction.

This should improve the precision of the known bits,
as we only used to make a simple estimate of known
zeroes. The KnownBits support functions are also
able to deduce bits that are known to be one in the
result.
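
To illustrate how known ones (and not just known zeroes) can be deduced for an addition, here is a small Python sketch of carry-based known-bits propagation. It is not the KnownBits implementation; the function name `known_bits_add` and the `(zero, one)` mask representation are assumptions made for this example:

```python
def known_bits_add(lhs_zero, lhs_one, rhs_zero, rhs_one, width):
    """Return (zero_mask, one_mask) for lhs + rhs at the given bit width.

    Each operand is described by a mask of bits known to be 0 and a mask
    of bits known to be 1. Unknown bits are set in neither mask.
    """
    mask = (1 << width) - 1
    # Sum of the largest possible operands (every unknown bit set to 1).
    max_sum = ((~lhs_zero & mask) + (~rhs_zero & mask)) & mask
    # Sum of the smallest possible operands (every unknown bit set to 0).
    min_sum = (lhs_one + rhs_one) & mask
    # The carry into a bit is known 0 if even the maximal sum has no carry
    # there, and known 1 if even the minimal sum has a carry there.
    carry_zero = ~(max_sum ^ lhs_zero ^ rhs_zero) & mask
    carry_one = min_sum ^ lhs_one ^ rhs_one
    # A result bit is known only if both operand bits and the carry are known.
    known = (carry_zero | carry_one) & (lhs_zero | lhs_one) & (rhs_zero | rhs_one)
    return (~max_sum & known) & mask, min_sum & known
```

For example, adding a value known to be even to a value known to be odd yields a result whose lowest bit is known to be one, which a zeroes-only estimate cannot express.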

Reviewers: spatel, RKSimon, nikic, lebedev.ri

Reviewed By: nikic

Subscribers: nikic, javed.absar, lebedev.ri, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60460

llvm-svn: 358372
2019-04-15 07:19:11 +00:00
Sanjay Patel 5e4ad39af7 [DAGCombiner] narrow shuffle of concatenated vectors
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
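
The mask arithmetic behind this fold can be sketched in Python. This is an illustration under stated assumptions, not the DAGCombiner code: vectors are plain lists, `UNDEF` is modelled as `None`, and the helper names are hypothetical:

```python
UNDEF = None


def shuffle(a, b, mask):
    """Two-operand shuffle: mask entries index into the concatenation a + b."""
    src = a + b
    return [UNDEF if m is UNDEF else src[m] for m in mask]


def narrow_concat_shuffle(x, y, mask):
    """Compute shuffle(concat(x, undef), concat(y, undef), mask) with half-width shuffles."""
    n = len(x)

    def remap(m):
        if m is UNDEF:
            return UNDEF
        if m < n:                  # lane taken from X
            return m
        if 2 * n <= m < 3 * n:     # lane taken from Y
            return m - n
        return UNDEF               # lane taken from the undef padding

    narrow = [remap(m) for m in mask]
    # Mask0 is the low half of the remapped mask, Mask1 the high half.
    return shuffle(x, y, narrow[:n]) + shuffle(x, y, narrow[n:])
```

The wide and narrow forms agree lane for lane, with any lane that read the undef padding staying undef.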

The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.

The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).

Differential Revision: https://reviews.llvm.org/D60545

llvm-svn: 358291
2019-04-12 16:31:56 +00:00
Craig Topper 3b1239d2a8 [TargetLowering][X86] Teach SimplifyDemandedBits to use ShrinkDemandedOp on ISD::SHL nodes.
If the upper bits of the SHL result aren't used, we might be able to use a narrower shift. For example, on X86 this can turn a 64-bit into 32-bit enabling a smaller encoding.
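
The arithmetic fact that makes the narrowing safe can be checked with a short Python sketch. The helper `shl` is a hypothetical model of a fixed-width left shift, and the check assumes the shift amount is smaller than the narrow width:

```python
def shl(value, amount, bits):
    """Left shift modelled at a fixed bit width (high bits are discarded)."""
    return (value << amount) & ((1 << bits) - 1)


def low32_of_shl64(value, amount):
    """64-bit shift whose upper 32 result bits are not demanded."""
    return shl(value, amount, 64) & 0xFFFFFFFF


def narrow_shl32(value, amount):
    """The same low 32 bits, computed with a 32-bit shift of the truncated input."""
    return shl(value & 0xFFFFFFFF, amount, 32)
```

Both functions return the same value for every shift amount below 32, because the low 32 bits of a left shift depend only on the low 32 bits of the input.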

Differential Revision: https://reviews.llvm.org/D60358

llvm-svn: 358257
2019-04-12 06:49:28 +00:00
Sanjay Patel fd314eca8f [DAGCombiner] refactor narrowing of extracted vector binop; NFC
There's a TODO comment about handling patterns with insert_subvector,
and we do want to match that.

llvm-svn: 358187
2019-04-11 15:59:47 +00:00
Sanjay Patel c0f4a35e68 [DAGCombiner][x86] scalarize inserted vector FP ops
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...

The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.

Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.

It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements, so this transform can apply more widely. In a non-loose FP environment,
we are likely simplifying FP elements to NaN values rather than undefs.

We also need to allow more opcodes down this path. Eg, we don't handle FP min/max flavors
yet.

Differential Revision: https://reviews.llvm.org/D60514

llvm-svn: 358172
2019-04-11 14:21:57 +00:00
David Green 0861c87b06 Revert rL357745: [SelectionDAG] Compute known bits of CopyFromReg
Certain optimisations from ConstantHoisting and CGP rely on Selection DAG not
seeing through to the constant in other blocks. Revert this patch while we come
up with a better way to handle that.

I will try to follow this up with some better tests.

llvm-svn: 358113
2019-04-10 18:00:41 +00:00
Craig Topper 61e77b11d1 [DAGCombiner][X86][SystemZ] Canonicalize SSUBO with immediate RHS to SADDO by negating the immediate.
This lines up with what we do for regular subtract and it matches up better with X86 assumptions in isel patterns that add with immediate is more canonical than sub with immediate.
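
A brute-force Python check of the underlying equivalence at 8 bits. The helpers `ssubo` and `saddo` are hypothetical models that return a `(wrapped_result, overflow)` pair; they are not the ISD node semantics verbatim:

```python
def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def in_range(value, bits):
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def ssubo(x, c, bits=8):
    """Signed subtract with overflow flag."""
    exact = x - c
    return to_signed(exact, bits), not in_range(exact, bits)


def saddo(x, c, bits=8):
    """Signed add with overflow flag."""
    exact = x + c
    return to_signed(exact, bits), not in_range(exact, bits)


def negate(c, bits=8):
    """Two's complement negation, which wraps for the minimum value."""
    return to_signed(-c, bits)
```

For every immediate except the minimum signed value, `ssubo(x, c)` and `saddo(x, negate(c))` agree on both the result and the overflow flag. The minimum value negates to itself, so the overflow flags diverge there and that immediate has to be excluded.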

Differential Revision: https://reviews.llvm.org/D60020

llvm-svn: 358027
2019-04-09 18:33:56 +00:00
Simon Pilgrim d7cc0ec581 [TargetLowering] SimplifyDemandedBits - add ISD::INSERT_SUBVECTOR support
llvm-svn: 358019
2019-04-09 16:52:21 +00:00
Simon Pilgrim 55f79ef9fe [TargetLowering] SimplifyDemandedBits - Remove GetDemandedSrcMask lambda. NFCI.
An older version of this could return false but now that this always succeeds we can just inline and simplify it.

llvm-svn: 357999
2019-04-09 12:29:26 +00:00
Simon Pilgrim 345eacd555 [TargetLowering] SimplifyDemandedBits - call SimplifyDemandedBits in bitcast handling
When bitcasting from a source op to a larger bitwidth op, split the demanded bits and OR them on top of one another and demand those merged bits in the SimplifyDemandedBits call on the source op.
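
The "split and OR" step can be sketched as follows. This is a Python illustration with a hypothetical function name, treating demanded-bit masks as plain integers:

```python
def demanded_src_bits(demanded, wide_bits, src_bits):
    """Fold a wide demanded-bits mask onto the narrower source element width.

    The wide mask is cut into src_bits-sized chunks and the chunks are ORed
    together, so a source bit is demanded if any chunk demands that position.
    """
    assert wide_bits % src_bits == 0
    chunk_mask = (1 << src_bits) - 1
    merged = 0
    for offset in range(0, wide_bits, src_bits):
        merged |= (demanded >> offset) & chunk_mask
    return merged
```

For a 64-bit value built from 32-bit elements, a demanded mask of `0x000000FF00000F00` folds to `0xFFF` on each source element.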

llvm-svn: 357992
2019-04-09 10:27:59 +00:00
Simon Pilgrim 9f74df7d5b [TargetLowering] SimplifyDemandedBits - use DemandedElts in bitcast handling
Be more selective in the SimplifyDemandedBits -> SimplifyDemandedVectorElts bitcast call based on the demanded elts.

llvm-svn: 357942
2019-04-08 20:59:38 +00:00
Simon Pilgrim 561ba38623 [DAG] Pull out ComputeNumSignBits call to make debugging easier. NFCI.
llvm-svn: 357861
2019-04-07 11:49:33 +00:00
Simon Pilgrim 17586cda4a [SelectionDAG] Add fcmp UNDEF handling to SelectionDAG::FoldSetCC
Second half of PR40800, this patch adds DAG undef handling to fcmp instructions to match the behavior in llvm::ConstantFoldCompareInstruction, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involves a lot of tweaking to reduced tests as bugpoint loves to reduce fcmp arguments to undef........

Differential Revision: https://reviews.llvm.org/D60006

llvm-svn: 357765
2019-04-05 14:56:21 +00:00
Sanjay Patel 50a8652785 [DAGCombiner][x86] scalarize splatted vector FP ops
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.

For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.

Other targets should likely enable the hook in a similar way.

Differential Revision: https://reviews.llvm.org/D60150

llvm-svn: 357760
2019-04-05 13:32:17 +00:00
Piotr Sobczak 0376ac1d94 [SelectionDAG] Compute known bits of CopyFromReg
Summary:
Teach SelectionDAG how to compute known bits of ISD::CopyFromReg if
the virtual reg used has one def only.

This can be particularly useful when calling isBaseWithConstantOffset()
with the ISD::CopyFromReg argument, as more optimizations may get enabled
in the result.

Also add a missing truncation on X86, found by testing of this patch.

Change-Id: Id1c9fceec862d118c54a5b53adf72ada5d6daefa

Reviewers: bogner, craig.topper, RKSimon

Reviewed By: RKSimon

Subscribers: lebedev.ri, nemanjai, jvesely, nhaehnle, javed.absar, jsji, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59535

llvm-svn: 357745
2019-04-05 07:44:09 +00:00
Serguei Katkov c39636cc2c [FastISel] Fix crash for gc.relocate lowering
Safepoint lowering checks that all gc.relocates observed in a safepoint
must be lowered. However, FastISel is able to skip a dead gc.relocate.

To resolve this issue we just ignore dead gc.relocate in the check.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60184

llvm-svn: 357742
2019-04-05 05:41:08 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Serguei Katkov fb44846e37 [FastISel] Fix the crash in gc.result lowering
FastISel falls back to SelectionDAGISel when it cannot handle an instruction.
This works as follows:
Working in reverse order, try to select each instruction using FastISel. If FastISel cannot handle an instruction
and that instruction is a call, fall back to SelectionDAGISel for it and then continue with fast instruction selection.

However, if the unhandled instruction is not a call or a statepoint-related instruction, fall back to SelectionDAGISel for all remaining
instructions in the basic block.

However, the gc.result instruction was missed, so gc.result could be processed earlier than its statepoint,
breaking the invariant that gc.result must be handled after the statepoint.

Test is updated because in the current form fast-isel cannot handle ret instruction (due to i1 ret type without explicit ext)
and as a result test does not check fast-isel at all.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D60182

llvm-svn: 357672
2019-04-04 04:19:56 +00:00
Simon Pilgrim 8d248dbd77 [DAGCombiner] Rename variables Demanded -> DemandedBits/DemandedElts. NFCI.
Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such an annoyance.

llvm-svn: 357602
2019-04-03 16:00:59 +00:00
Sanjay Patel 00dae6b22d [DAGCombiner] loosen restrictions for moving shuffles after vector binop
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (eg, mul x, x).

As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.

llvm-svn: 357580
2019-04-03 13:42:06 +00:00
Simon Pilgrim 02599de2e1 [DAGCombine] Don't use getZExtValue() until we know the constant is in range.
Noticed during prep for a patch for PR40758.

llvm-svn: 357571
2019-04-03 11:00:55 +00:00
Hans Wennborg 94b867dc7c Revert r357256 "[DAGCombine] Improve Lifetime node chains."
As it caused a pathological compile-time regression in V8, see PR41352.

> Improve both start and end lifetime nodes chain dependencies.
>
> Reviewers: courbet
>
> Reviewed By: courbet
>
> Subscribers: hiraditya, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D59795

This also reverts the follow-up r357309:

> [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
>
> Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357563
2019-04-03 07:41:58 +00:00
Sanjay Patel 7cb7daabbb [DAGCombiner] reduce code duplication; NFC
llvm-svn: 357498
2019-04-02 17:20:54 +00:00
Nirav Dave 54f7118de5 [DAGCombiner] Rewrite ImproveLifetimeNodeChain to avoid DAG loop.
Avoid EXPENSIVE_CHECK failure. NFCI.

llvm-svn: 357309
2019-03-29 20:26:23 +00:00
Nirav Dave 7e84cacdbd [DAG] Avoid redundancy in StoreMerge TokenFactor generation.
Avoid generating redundant TokenFactor when all merged stores have
the same chain.

llvm-svn: 357299
2019-03-29 18:50:22 +00:00
Nirav Dave fe59e14031 [DAGCombine] Prune unused nodes.
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.

Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight

Reviewed By: jyknight

Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58070

llvm-svn: 357283
2019-03-29 17:35:56 +00:00
Nirav Dave 610036c506 [DAG] Set up infrastructure to avoid smart constructor-based dangling nodes
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.

Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.

Reviewers: efriedma, RKSimon, craig.topper, jyknight

Reviewed By: jyknight

Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58068

llvm-svn: 357279
2019-03-29 17:26:40 +00:00
Sanjay Patel 12685d0f7c [DAGCombiner] simplify shuffle of shuffle
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).

We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).

It looks like we miss this pattern in IR too.

In one of the zext examples here, we have shuffle masks like this:

Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>

...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.
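
The two masks above can be checked mechanically. The following Python sketch uses hypothetical helpers, models `u` as `None`, and assumes the outer shuffle only reads lanes of its first operand, as it does in this example:

```python
U = None  # undef lane


def compose(outer, inner):
    """Mask of outer(inner(v)): each outer lane looks up the inner mask."""
    return [U if m is U else inner[m] for m in outer]


def matches_where_defined(candidate, reference):
    """True if every defined lane of candidate equals the same lane of reference."""
    return all(c is U or c == r for c, r in zip(candidate, reference))


shuf0 = [0, U, 3, 7, 0, U, 3, 7]
shuf = [4, U, 6, 7, U, U, U, U]
composed = compose(shuf, shuf0)
```

Here `composed` is `[0, U, 3, 7, U, U, U, U]`, which agrees with `shuf0` in every defined lane, so the outer shuffle can be replaced by `Shuf0` itself without creating a new value.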

Differential Revision: https://reviews.llvm.org/D59961

llvm-svn: 357258
2019-03-29 14:20:38 +00:00
Nirav Dave 9259de217e [DAGCombine] Improve Lifetime node chains.
Improve both start and end lifetime nodes chain dependencies.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59795

llvm-svn: 357256
2019-03-29 14:09:47 +00:00
Sanjay Patel 665a385035 [DAGCombiner] fold sext into decrement
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.

  %z = zext i8 %x to i32
  %dec = add i32 %z, -1
  %r = sext i32 %dec to i64
  =>
  %z2 = zext i8 %x to i64
  %r = add i64 %z2, -1

https://rise4fun.com/Alive/kPP
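
As a complement to the Alive proof, the same equivalence can be checked exhaustively for i8 inputs with a short Python sketch. The helper names are hypothetical and integers are modelled as unsigned bit patterns:

```python
def mask(bits):
    return (1 << bits) - 1


def zext(value, from_bits):
    """Zero-extend: keep only the low from_bits bits."""
    return value & mask(from_bits)


def sext(value, from_bits, to_bits):
    """Sign-extend a from_bits pattern to to_bits."""
    value &= mask(from_bits)
    if value >> (from_bits - 1):
        value |= mask(to_bits) & ~mask(from_bits)
    return value


def add(a, b, bits):
    """Wrapping add at the given width."""
    return (a + b) & mask(bits)


def before(x):
    # zext i8 -> i32, add -1, sext i32 -> i64
    return sext(add(zext(x, 8), -1, 32), 32, 64)


def after(x):
    # zext i8 -> i64, add -1
    return add(zext(x, 8), -1, 64)
```

The only input where the intermediate result is negative is `x == 0`, and both forms produce the all-ones i64 value there.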

The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.

But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.

llvm-svn: 357254
2019-03-29 13:49:08 +00:00
Hans Wennborg 800b12f90a Switch lowering: exploit unreachable fall-through when lowering case range cluster
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.

  switch i32 %i, label %default [
    i32 1,  label %bb1
    i32 2,  label %bb1
    i32 3,  label %bb1
    i32 4,  label %bb2
    i32 5,  label %bb2
    i32 6,  label %bb2
  ]
  default: unreachable
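
The effect on the emitted checks can be modelled in Python. This is only a sketch of the control flow, with hypothetical function names; which comparison the lowering actually emits is not stated in the commit message:

```python
def dispatch_two_checks(i):
    """Lowering that does not exploit the unreachable default."""
    if 1 <= i <= 3:
        return "bb1"
    if 4 <= i <= 6:
        return "bb2"
    return "default"


def dispatch_one_check(i):
    """Lowering that assumes i is always one of the listed cases (1..6)."""
    # With the fall-through unreachable, one comparison separates the clusters.
    return "bb1" if i <= 3 else "bb2"
```

The two functions agree on every value the switch can legally see, namely 1 through 6.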

llvm-svn: 357252
2019-03-29 13:40:05 +00:00
Craig Topper ea626d8bdb [SelectionDAGBuilder] Fix 80 column violation. NFC
llvm-svn: 357213
2019-03-28 20:52:22 +00:00
Nirav Dave 8b9c9822a1 [DAG] Fix Lifetime Node ID hashing.
llvm-svn: 357179
2019-03-28 15:53:01 +00:00
Sanjay Patel ffa8d3def7 [DAGCombiner] fold sext into negation
As noted in D59818:
  %z = zext i8 %x to i32
  %neg = sub i32 0, %z
  %r = sext i32 %neg to i64
  =>
  %z2 = zext i8 %x to i64
  %r = sub i64 0, %z2

https://rise4fun.com/Alive/KzSR

llvm-svn: 357178
2019-03-28 15:46:02 +00:00
Simon Pilgrim 38a0616c1d [DAGCombiner] Fold truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.

Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.

Differential Revision: https://reviews.llvm.org/D59654

llvm-svn: 357161
2019-03-28 11:34:21 +00:00
Nirav Dave 6b741a8038 [DAGCombiner] Teach TokenFactor pruning to peek through lifetime nodes
Summary: Lifetime nodes were inhibiting TokenFactor simplification, which in turn inhibited chain-based optimizations.

Reviewers: courbet, jyknight

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59897

llvm-svn: 357121
2019-03-27 20:37:08 +00:00
Justin Bogner b1650f0da9 [LegalizeVectorTypes] Allow single loads and stores for more short vectors
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)

This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.

Patch by Guillaume Marques. Thanks!

Differential Revision: https://reviews.llvm.org/D56201

llvm-svn: 357120
2019-03-27 20:35:56 +00:00
Nirav Dave c6dfaa0e83 Revert r356996 "[DAG] Avoid smart constructor-based dangling nodes."
This patch appears to trigger very large compile time increases in
halide builds.

llvm-svn: 357116
2019-03-27 19:54:41 +00:00
Nikita Popov 6d855ea024 [ConstantRange] Rename isWrappedSet() to isUpperWrapped()
Split out from D59749. The current implementation of isWrappedSet()
doesn't do what it says on the tin, and treats ranges like
[X, Max] as wrapping, because they are represented as [X, 0) when
using half-inclusive ranges. This also makes it inconsistent with
the semantics of isSignWrappedSet().

This patch renames isWrappedSet() to isUpperWrapped(), in preparation
for the introduction of a new isWrappedSet() method with corrected
behavior.
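
The distinction can be sketched in Python for unsigned half-open ranges `[lower, upper)`. Both function bodies are assumptions made for illustration: the first mirrors the behavior described above, and the second is one plausible reading of the "corrected" semantics, not the eventual LLVM implementation:

```python
def is_upper_wrapped(lower, upper):
    """Described behavior: any range whose lower bound exceeds its upper bound."""
    return lower > upper


def is_wrapped(lower, upper):
    """Assumed corrected behavior: [X, 0) ends at the maximum value, so it does not wrap."""
    return lower > upper and upper != 0
```

With 8-bit values, the inclusive range `[200, 255]` is stored as `[200, 0)`. It counts as upper-wrapped but, under the assumed corrected definition, not as wrapped, while `[250, 5)` is both.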

llvm-svn: 357107
2019-03-27 18:19:33 +00:00
Nirav Dave b5630a2ab1 [DAGCombiner] Unify Lifetime and memory Op aliasing.
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.

This is mostly NFC.

Reviewers: courbet

Reviewed By: courbet

Subscribers: hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59794

llvm-svn: 357070
2019-03-27 14:14:46 +00:00
Nirav Dave 96a264e053 [DAGCombine] Refactor GatherAllAliases. NFCI.
llvm-svn: 357069
2019-03-27 14:14:35 +00:00
Hans Wennborg 5c0d7a24e8 Re-commit r355490 "[CodeGen] Omit range checks from jump tables when lowering switches with unreachable default"
Original commit by Ayonam Ray.

This commit adds a regression test for the issue discovered in the
previous commit: that the range check for the jump table can only be
omitted if the fall-through destination of the jump table is
unreachable, which isn't necessarily true just because the default of
the switch is unreachable.

This addresses the missing optimization in PR41242.

> During the lowering of a switch that would result in the generation of a
> jump table, a range check is performed before indexing into the jump
> table, for the switch value being outside the jump table range and a
> conditional branch is inserted to jump to the default block. In case the
> default block is unreachable, this conditional jump can be omitted. This
> patch implements omitting this conditional branch for unreachable
> defaults.
>
> Differential Revision: https://reviews.llvm.org/D52002
> Reviewers: Hans Wennborg, Eli Freidman, Roman Lebedev

llvm-svn: 357067
2019-03-27 14:10:11 +00:00
Jonas Paulsson 38342a5185 [DAGCombiner] Don't allow addcarry if the carry producer is illegal.
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.

Patch by Ulrich Weigand.

Review: Sanjay Patel
https://reviews.llvm.org/D59822

llvm-svn: 357052
2019-03-27 08:41:46 +00:00
Sanjay Patel bb5cba3cca [SDAG] add simplifications for FP at node creation time
We have the folds for fadd/fsub/fmul already in DAGCombiner,
so it may be possible to remove that code if we can guarantee that
these ops are zapped before they can exist.

llvm-svn: 357029
2019-03-26 20:54:15 +00:00
Nirav Dave a28c514581 [DAG] Avoid smart constructor-based dangling nodes.
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations or not fully pruning unused result values. This can
result in nodes that are never added to the worklist and therefore can
not be pruned.

Add a node inserter as the current node deleter to make sure such
nodes have the chance of being pruned.

Many minor changes, mostly positive.

llvm-svn: 356996
2019-03-26 15:08:14 +00:00
Simon Pilgrim e24441aab0 [TargetLowering] Add SimplifyDemandedBits support for ISD::INSERT_VECTOR_ELT
This helps us relax the extension of a lot of scalar elements before they are inserted into a vector.

It exposes an issue in DAGCombiner::convertBuildVecZextToZext as some/all the zero-extensions may be relaxed to ANY_EXTEND, so we need to handle that case to avoid a couple of AVX2 VPMOVZX test regressions.

Once this is in it should be easier to fix a number of remaining failures to fold loads into VBROADCAST nodes.

Differential Revision: https://reviews.llvm.org/D59484

llvm-svn: 356989
2019-03-26 12:32:01 +00:00
Yi Kong 74b874ac4c Fix nondeterminism introduced in r353954
DenseMap iteration order is not guaranteed, use MapVector instead.

Fix provided by srhines.

Differential Revision: https://reviews.llvm.org/D59807

llvm-svn: 356988
2019-03-26 12:18:08 +00:00
Simon Pilgrim 167af1bafb [SelectionDAG] Add icmp UNDEF handling to SelectionDAG::FoldSetCC
First half of PR40800, this patch adds DAG undef handling to icmp instructions to match the behaviour in llvm::ConstantFoldCompareInstruction and SimplifyICmpInst, this permits constant folding of vector comparisons where some elements had been reduced to UNDEF (by SimplifyDemandedVectorElts etc.).

This involved a lot of tweaking to reduced tests as bugpoint loves to reduce icmp arguments to undef........

Differential Revision: https://reviews.llvm.org/D59363

llvm-svn: 356938
2019-03-25 18:51:57 +00:00
Craig Topper 07e3071854 [LegalizeDAG] Expand i16 bswap directly to a rotate by 8 instead of relying on DAG combine.
An i16 bswap can be implemented with an i16 rotate by 8. We previously emitted
a shift and OR sequence that DAG combine should be able to turn back into
rotate. But we might as well go there directly. If rotate isn't legal,
LegalizeDAG should further legalize it to either the opposite rotate, or the
shift and OR pattern.
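
The equivalence between the three forms is easy to confirm exhaustively. A minimal Python sketch with hypothetical helper names:

```python
def bswap16(x):
    """Byte swap written as the shift-and-OR sequence."""
    return ((x & 0xFF) << 8) | (x >> 8)


def rotl16(x, n):
    """Rotate a 16-bit value left by n bits."""
    n %= 16
    return ((x << n) | (x >> (16 - n))) & 0xFFFF


def rotr16(x, n):
    """Rotate a 16-bit value right by n bits (the opposite rotate)."""
    return rotl16(x, (16 - n) % 16)
```

For a rotate amount of 8 the left and right rotates coincide, which is why either direction works as a legalization fallback.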

I don't know of any way to get the existing DAG combine reliance to fail. So
I don't know any way to add new tests for this that wouldn't have worked
previously.

llvm-svn: 356860
2019-03-24 17:02:14 +00:00
Simon Pilgrim 94e8f152c1 [TargetLowering] SimplifyDemandedBits trunc(srl(x, C1)) - early out for out of range C1. NFCI.
llvm-svn: 356810
2019-03-22 20:53:49 +00:00
Florian Hahn 71033f2987 [DAGCombiner] Use getTokenFactor in a few more cases.
SDNodes can only have 64k operands and for some inputs (e.g. large
number of stores), we can reach this limit when creating TokenFactor
nodes. This patch is a follow up to D56740 and updates a few more places
that potentially can create TokenFactors with too many operands.
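
One way to stay under an operand limit is to fold the excess operands into nested TokenFactors. The Python sketch below illustrates that idea only; the function name, the tuple representation, and the chunking strategy are assumptions and are not taken from the helper referenced above:

```python
def build_token_factor(chains, limit):
    """Return a nested ("TF", operands) tree where no node exceeds `limit` operands."""
    assert limit >= 2
    vals = list(chains)
    while len(vals) > limit:
        # Replace the last `limit` operands with a single nested TokenFactor.
        tail = tuple(vals[-limit:])
        del vals[-limit:]
        vals.append(("TF", tail))
    return ("TF", tuple(vals))
```

Each iteration shrinks the operand list by `limit - 1` entries, so the loop terminates and every original chain stays reachable from the root.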

Reviewers: efriedma, craig.topper, aemerson, RKSimon

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D59156

llvm-svn: 356668
2019-03-21 14:32:09 +00:00
Simon Pilgrim da4992bf8d [DAGCombine] SimplifySelectCC - call FoldSetCC with the setcc result type
We were calling FoldSetCC with the compare operand type instead of the result type.

Found by OSS-Fuzz #13838 (https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13838)

llvm-svn: 356667
2019-03-21 14:07:18 +00:00
Simon Pilgrim 54ed653870 [SelectionDAG] Add scalarization of ABS node (PR41149)
Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D59577

llvm-svn: 356656
2019-03-21 11:18:54 +00:00
Simon Pilgrim 51f65171e9 Remove out of date comment. NFCI.
DAGCombiner::convertBuildVecZextToZext just requires the extractions to be sequential, they don't have to start from 0'th index.

llvm-svn: 356552
2019-03-20 12:24:15 +00:00
Simon Pilgrim 77482120da Fix for ABS legalization on PPC buildbot.
llvm-svn: 356498
2019-03-19 18:55:46 +00:00
Philip Reames db65a5b776 Allow unordered loads to be considered invariant in CodeGen
The actual code change is fairly straightforward, but exercising it isn't. First, it turned out we weren't adding the appropriate flags in SelectionDAG. Second, it turned out that we've got some optimization gaps, so obvious test cases don't work.

My first attempt (in atomic-unordered.ll) points out a deficiency in our peephole-opt folding logic which I plan to fix separately. Instead, I'm exercising this through MachineLICM.

Differential Revision: https://reviews.llvm.org/D59375

llvm-svn: 356494
2019-03-19 18:27:18 +00:00
Justin Bogner b353d6887e [DAGCombine] Fix a miscompile when reducing BUILD_VECTORs to a shuffle
In r311255 we added a case where we split vectors whose elements are
all derived from the same input vector so that we could shuffle it
more efficiently. In doing so, createBuildVecShuffle was taught to
adjust for the fact that all indices would be based off of the first
vector when this happens, but it's possible for the code that checked
that to fire incorrectly if we happen to have a BUILD_VECTOR of
extracts from subvectors and don't hit this new optimization.

Instead of trying to detect if we've split the vector by checking if
we have extracts from the same base vector, we can just pass that
information into createBuildVecShuffle, avoiding the miscompile.

Differential Revision: https://reviews.llvm.org/D59507

llvm-svn: 356476
2019-03-19 16:52:00 +00:00
Simon Pilgrim a56f2822d0 [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect
These changes are related to PR37743 and include:

    SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.

    Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.

    Add promoting the integer ABS node in the LegalizeIntegerType.

    Expand-based legalization of integer result for the ABS nodes.

    Expand-based legalization of ABS vector operations.

    Add some integer abs testcases for different typesizes for Thumb arch

    Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
        tmp = (SRA, Hi, 31)
        Lo = (UADDO tmp, Lo)
        Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
        Lo = (XOR tmp, Lo)

    The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given an ABS node, detect the following pattern:
        (ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

    Change integer abs testcases for codegen with the ABS node support for AArch64.
        Indicate that the ABS is legal for the i64 type when the NEON is supported.
        Change the integer abs testcases to show changing of codegen.

    Add combine and legalization of ABS nodes for Thumb arch.

    Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
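
The four-step i64 ABS expansion listed above for X86 can be traced with a Python model that works on 32-bit halves. The function name is hypothetical, and the input is taken as a 64-bit two's complement bit pattern:

```python
M32 = 0xFFFFFFFF
M64 = 0xFFFFFFFFFFFFFFFF


def abs_i64_via_i32(pattern):
    """Compute abs of a 64-bit two's complement pattern using 32-bit operations."""
    lo = pattern & M32
    hi = (pattern >> 32) & M32
    tmp = M32 if hi >> 31 else 0           # tmp = (SRA Hi, 31): all ones if negative
    total = tmp + lo                       # Lo = (UADDO tmp, Lo)
    lo_sum, carry = total & M32, total >> 32
    hi_sum = (tmp + hi + carry) & M32      # ADDCARRY tmp, Hi, carry-out of the low add
    hi_out = hi_sum ^ tmp                  # Hi = (XOR tmp, ...)
    lo_out = lo_sum ^ tmp                  # Lo = (XOR tmp, Lo)
    return (hi_out << 32) | lo_out
```

This is the usual `(x + sign) ^ sign` identity split across two registers. As with any two's complement ABS, the minimum 64-bit value maps to itself.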

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Patch by: @ikulagin (Ivan Kulagin)

Differential Revision: https://reviews.llvm.org/D49837

llvm-svn: 356468
2019-03-19 16:24:55 +00:00
Adhemerval Zanella 664c1ef528 [TargetLowering] Add code size information on isFPImmLegal. NFC
This allows better code size for aarch64 floating point materialization
in a future patch.

Reviewers: evandro

Differential Revision: https://reviews.llvm.org/D58690

llvm-svn: 356389
2019-03-18 18:40:07 +00:00
Nirav Dave 55c921f4bf [DAG] Cleanup unused node in SimplifySelectCC.
Delete the temporarily constructed node used for analysis after its use,
holding onto original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.

Reviewers: spatel, RKSimon, craig.topper

Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57921

llvm-svn: 356382
2019-03-18 17:02:38 +00:00
David Stenberg 8a2e4af7e7 [DebugInfo] Ignore bitcasts when lowering stack arg dbg.values
Summary:
Look past bitcasts when looking for parameter debug values that are
described by frame-index loads in `EmitFuncArgumentDbgValue()`.

In the attached test case we would be left with an undef `DBG_VALUE`
for the parameter without this patch.

A similar fix was done for parameters passed in registers in D13005.

This fixes PR40777.

Reviewers: aprantl, vsk, jmorse

Reviewed By: aprantl

Subscribers: bjope, javed.absar, jdoerfert, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D58831

llvm-svn: 356363
2019-03-18 11:27:32 +00:00
Tim Renouf c302b9b5fe [CodeGen] Prepare for introduction of v3 and v5 MVTs
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:

* Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp
  mechanism in TargetLoweringBase::getTypeConversion.

* Cope with SETCC and VSELECT for odd-width i1 vector when the other
  vectors are legal type.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58899

Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8
llvm-svn: 356350
2019-03-17 21:43:12 +00:00
Nikita Popov 9a4453592b [DAGCombine] Fold (x & ~y) | y patterns
Fold (x & ~y) | y and its four commuted variants to x | y. This pattern
can in particular appear when a vselect c, x, -1 is expanded to
(x & ~c) | (-1 & c) and combined to (x & ~c) | c.
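
The identity behind this fold can be checked exhaustively on a small bit
width. The following is a minimal sketch in Python, not code from the patch;
the 4-bit width and the helper names (`bit_not`, `commuted_forms`,
`fold_holds`) are assumptions made for illustration only.

```python
# Illustrative check, not taken from the LLVM patch.
# WIDTH is an arbitrary small bit width chosen so the search is exhaustive.
WIDTH = 4
MASK = (1 << WIDTH) - 1


def bit_not(v):
    """Bitwise NOT restricted to WIDTH bits."""
    return ~v & MASK


def commuted_forms(x, y):
    """Return (x & ~y) | y written with each operand order."""
    ny = bit_not(y)
    return [
        (x & ny) | y,
        (ny & x) | y,
        y | (x & ny),
        y | (ny & x),
    ]


def fold_holds():
    """True if every commuted form equals x | y for all WIDTH-bit inputs."""
    return all(
        form == (x | y)
        for x in range(MASK + 1)
        for y in range(MASK + 1)
        for form in commuted_forms(x, y)
    )


def expanded_select(x, c):
    """The expanded form from the message: (x & ~c) | (-1 & c)."""
    all_ones = MASK  # -1 in WIDTH bits
    return (x & bit_not(c)) | (all_ones & c)
```

Since `-1 & c` is just `c`, `expanded_select(x, c)` reduces to
`(x & ~c) | c`, which the fold then rewrites to `x | c`.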

This change has some overlap with D59066, which avoids creating a
vselect of this form in the first place during uaddsat expansion.

Differential Revision: https://reviews.llvm.org/D59174

llvm-svn: 356333
2019-03-17 15:45:38 +00:00
Sanjay Patel 6a6e808b69 [TargetLowering] improve the default expansion of uaddsat/usubsat
This is a subset of what was proposed in:
D59006
...and may overlap with test changes from:
D59174
...but it seems like a good general optimization to turn selects
into bitwise-logic when possible because we never know exactly
what can happen at this stage of DAG combining depending on how
the target has defined things.
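
As a rough illustration of turning a select into bitwise logic, the sketch
below models saturating add and subtract on 8-bit values in Python. It is not
the code from D59066: the bit width, the function names, and the exact shape
of the bitwise form (OR with an all-ones overflow mask for uaddsat, AND with
the inverted mask for usubsat) are assumptions made for this example.

```python
# Illustrative model, not taken from the LLVM patch.
WIDTH = 8
MASK = (1 << WIDTH) - 1


def uaddsat_select(a, b):
    """Select form: all-ones on unsigned overflow, otherwise the wrapped sum."""
    total = (a + b) & MASK
    return MASK if total < a else total


def uaddsat_bitwise(a, b):
    """Bitwise form: OR the sum with a mask that is all-ones on overflow."""
    total = (a + b) & MASK
    overflow_mask = -(total < a) & MASK  # 0 or all-ones
    return total | overflow_mask


def usubsat_select(a, b):
    """Select form: zero on unsigned underflow, otherwise the difference."""
    diff = (a - b) & MASK
    return 0 if a < b else diff


def usubsat_bitwise(a, b):
    """Bitwise form: AND the difference with the inverted underflow mask."""
    diff = (a - b) & MASK
    underflow_mask = -(a < b) & MASK  # 0 or all-ones
    return diff & ~underflow_mask & MASK


def forms_agree():
    """True if the select and bitwise forms match for all WIDTH-bit inputs."""
    return all(
        uaddsat_select(a, b) == uaddsat_bitwise(a, b)
        and usubsat_select(a, b) == usubsat_bitwise(a, b)
        for a in range(MASK + 1)
        for b in range(MASK + 1)
    )
```

The two forms agree on every input pair in this model, which is why replacing
the select does not change the result.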

Differential Revision: https://reviews.llvm.org/D59066

llvm-svn: 356332
2019-03-17 14:57:40 +00:00
Simon Pilgrim 3b0a6c69ee [DAGCombine] combineShuffleOfScalars - handle non-zero SCALAR_TO_VECTOR indices (PR41097)
rL356292 reduces the size of scalar_to_vector if we know the upper bits are undef - which means that shuffles may find they are suddenly referencing scalar_to_vector elements other than zero - so make sure we handle this as undef.

llvm-svn: 356327
2019-03-16 17:36:26 +00:00
Heejin Ahn 66ce419468 [WebAssembly] Make rethrow take an except_ref type argument
Summary:
In the new wasm EH proposal, `rethrow` takes an `except_ref` argument.
This change was missing in r352598.

This patch adds `llvm.wasm.rethrow.in.catch` intrinsic. This is an
intrinsic that will eventually be lowered to the wasm `rethrow`
instruction, but this intrinsic can appear only within a catchpad or a
cleanuppad scope. Also this intrinsic needs to be invokable - otherwise
EH pad successor for it will not be correctly generated in clang.

This also adds lowering logic for this intrinsic in
`SelectionDAGBuilder::visitInvoke`. This routine is basically a
specialized and simplified version of
`SelectionDAGBuilder::visitTargetIntrinsic`, but we can't use it
because it is only for `CallInst`s.

This deletes the previous `llvm.wasm.rethrow` intrinsic and related
tests, which was meant to be used within a `__cxa_rethrow` library
function. It turned out this needs some more logic, so the intrinsic for
this purpose will be added later.

LateEHPrepare takes a result value of `catch` and inserts it into
matching `rethrow` as an argument.

`RETHROW_IN_CATCH` is a pseudo instruction that serves as a link between
`llvm.wasm.rethrow.in.catch` and the real wasm `rethrow` instruction. To
generate a `rethrow` instruction, we need an `except_ref` argument,
which is generated from `catch` instruction. But `catch` instructions are
added in LateEHPrepare pass, so we use `RETHROW_IN_CATCH`, which takes
no argument, until we are able to correctly lower it to `rethrow` in
LateEHPrepare.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59352

llvm-svn: 356316
2019-03-16 05:38:57 +00:00
Simon Pilgrim 8fbe439345 [SelectionDAG] Add SimplifyDemandedBits handling for ISD::SCALAR_TO_VECTOR
Fixes a lot of constant folding mismatches between i686 and x86_64

llvm-svn: 356273
2019-03-15 17:00:55 +00:00
Nirav Dave ee5183c796 [DAGCombiner] Fix Comment. NFC.
llvm-svn: 356069
2019-03-13 17:44:40 +00:00
Nirav Dave d6351340bb [DAGCombiner] If a TokenFactor would be merged into its user, consider the user later.
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes us consider whether
we can do the merge immediately.
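
To make the merge itself concrete, here is a toy model in Python of inlining
a single-use TokenFactor into the TokenFactor that uses it. It only sketches
the flattening step, not the DAGCombiner worklist change this patch makes;
the `Node` class, the `use_count` mapping, and `merge_token_factor` are
hypothetical names invented for the example.

```python
# Toy model, not LLVM code. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    name: str
    is_token_factor: bool = False
    operands: list = field(default_factory=list)


def merge_token_factor(tf, use_count):
    """Return tf's operands with single-use TokenFactor operands inlined."""
    merged = []
    for op in tf.operands:
        if op.is_token_factor and use_count.get(op.name, 0) == 1:
            # Only one user: its chains can be hoisted into the user.
            candidates = merge_token_factor(op, use_count)
        else:
            # Leaf chain, or a TokenFactor with other users: keep as-is.
            candidates = [op]
        for cand in candidates:
            if cand not in merged:  # identity comparison; drops duplicates
                merged.append(cand)
    return merged


def build_example():
    """Build outer = TF(inner, c) where inner = TF(a, b)."""
    a, b, c = Node("a"), Node("b"), Node("c")
    inner = Node("inner", is_token_factor=True, operands=[a, b])
    outer = Node("outer", is_token_factor=True, operands=[inner, c])
    return outer
```

In this model, `inner` is folded into `outer` only when `use_count` records
exactly one use for it; with more users it stays as a separate operand.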

Most test changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.

CodeGen tests with non-reordering changes:

  X86/aligned-variadic.ll -- memory-based add folded into stored leaq
  value.

  X86/constant-combiners.ll -- Optimizes out overlap between stores.

  X86/pr40631_deadstore_elision -- folds constant byte store into
  preceding quad word constant store.

Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet

Reviewed By: courbet

Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59260

llvm-svn: 356068
2019-03-13 17:07:09 +00:00
Clement Courbet 3bb5d0bb9b Re-land r354244 "[DAGCombiner] Eliminate dead stores to stack."
Always check candidates for hasOtherUses(), not only stores.

llvm-svn: 356050
2019-03-13 13:56:23 +00:00