Commit Graph

10971 Commits

Author SHA1 Message Date
Craig Topper 76123d338d [DAGCombiner] visitSIGN_EXTEND_INREG should fold sext_vector_inreg(undef) to 0 not undef.
We need to ensure that the sign bits of the result all match
so we can't fold to undef.

Similar to PR46585.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D83163
2020-07-04 14:35:49 -07:00
Craig Topper 120c5f1057 [DAGCombiner] Don't fold zext_vector_inreg/sext_vector_inreg(undef) to undef. Fold to 0.
zext_vector_inreg needs to produces 0s in the extended bits and
sext_vector_inreg needs to produce upper bits that are all the
same. So we should fold them to a 0 vector instead of undef.

Fixes PR46585.
2020-07-04 11:42:53 -07:00
Simon Pilgrim 56a8a5c9fe [DAG] matchBinOpReduction - match subvector reduction patterns beyond a matched shufflevector reduction
Currently matchBinOpReduction only handles shufflevector reduction patterns, but in many cases these only occur in the final stages of a reduction, once we're down to legal vector widths.

Before this its likely that we are performing reductions using subvector extractions to repeatedly split the source vector in half and perform the binop on the halves.

Assuming we've found a non-partial reduction, this patch continues looking for subvector reductions as far as it can beyond the last shufflevector.

Fixes PR37890
2020-07-04 15:28:15 +01:00
Guillaume Chatelet 87e2751cf0 [Alignment][NFC] Use proper getter to retrieve alignment from ConstantInt and ConstantSDNode
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D83082
2020-07-03 08:06:43 +00:00
Sanjay Patel bc110de78a [SelectionDAG] don't split branch on logic-of-vector-compares
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.

The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.

For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.

Differential Revision: https://reviews.llvm.org/D82602
2020-07-02 17:05:24 -04:00
Sander de Smalen 143e324e75 [CodeGen][SVE] Don't drop scalable flag in DAGCombiner::visitEXTRACT_SUBVECTOR
There was a rogue 'assert' in AArch64ISelLowering for the tuple.get intrinsics,
that shouldn't really have been there (I suspect this was a remnant from when
we expected the wider vector always to have come from a vector CONCAT).

When I tried to create a more minimal reproducer, I found a bug in
DAGCombiner where it drops the scalable flag when trying to fold:

      extract_subv (bitcast X), Index --> bitcast (extract_subv X, Index')

This patch fixes both issues.

Reviewers: david-arm, efriedma, spatel

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82910
2020-07-02 10:16:43 +01:00
David Sherwood c7df35d2b2 [CodeGen] Fix warnings in getCopyToPartsVector
Whilst trying to assemble the following test:

  clang/test/CodeGen/aarch64-sve-intrinsics/acle_sve_set2.c

I discovered we were hitting some warnings about possible invalid
calls to getVectorNumElements() in getCopyToPartsVector(). I've
tried to fix these by using ElementCount types where possible and
I've made the assumption that we don't support using a fixed width
vector to copy parts of a scalable vector, and vice versa. Looking
at how the copy is implemented I think that's the right thing for
now.

Differential Revision: https://reviews.llvm.org/D82744
2020-07-02 09:08:20 +01:00
Craig Topper 51e92b223b [X86] Speculatively apply the same fix from 361853c96f to PromoteIntOp_MGATHER.
The UpdateNodeOperands here is also subject to CSE.
2020-07-01 11:57:59 -07:00
Craig Topper 361853c96f [LegalizeTypes] Properly handle the case when UpdateNodeOperands in PromoteIntOp_MLOAD triggers CSE instead of updating the node in place.
The caller can't handle the node having multiple results like a
masked load does. So we need to detect the case and do our own
result replacement.

Fixes PR46532.
2020-07-01 11:48:50 -07:00
David Sherwood f11305780f [CodeGen] Fix warnings in DAGCombiner::visitSCALAR_TO_VECTOR
In visitSCALAR_TO_VECTOR we try to optimise cases such as:

  scalar_to_vector (extract_vector_elt %x)

into vector shuffles of %x. However, it led to numerous warnings
when %x is a scalable vector type, so for now I've changed the
code to only perform the combination on fixed length vectors.
Although we probably could change the code to work with scalable
vectors in certain cases, without a proper profit analysis it
doesn't seem worth it at the moment.

This change fixes up one of the warnings in:

  llvm/test/CodeGen/AArch64/sve-merging-stores.ll

I've also added a simplified version of the same test to:

  llvm/test/CodeGen/AArch64/sve-fp.ll

which already has checks for no warnings.

Differential Revision: https://reviews.llvm.org/D82872
2020-07-01 18:47:13 +01:00
James Y Knight 4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Guillaume Chatelet ef36f5143d [Alignment] TargetLowering::hasPairedLoad must use Align for RequiredAlignment
As per documentation of `hasPairLoad`:
"`RequiredAlignment` gives the minimal alignment constraints that must be met to be able to select this paired load."
In this sense, `0` is strictly equivalent to `1`. We make this obvious by using `Align` instead of unsigned.
There is only one implementor of this interface.

Differential Revision: https://reviews.llvm.org/D82958
2020-07-01 14:32:30 +00:00
Guillaume Chatelet d3085c2501 [Alignment][NFC] Transition and simplify calls to DL::getABITypeAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82956
2020-07-01 14:31:56 +00:00
Guillaume Chatelet 27bbc8ede1 [Alignment][NFC] Migrate TargetTransformInfo::CreateVariableSizedObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82939
2020-07-01 14:31:21 +00:00
David Sherwood 97a7a9abb2 [CodeGen] Fix up warnings in visitEXTRACT_SUBVECTOR
It's perfectly valid to do certain DAG combines where we extract
subvectors from a concat vector when we have scalable vector types.
However, we can do this in a way that avoids generating compiler
warnings by replacing calls to getVectorNumElements() with
getVectorMinNumElements(). Due to the way subvector extracts are
designed to work with scalable vector types this is ok.

This eliminates some warnings from existing tests in this file:

  llvm/test/CodeGen/AArch64/sve-intrinsics-loads.ll

Differential Revision: https://reviews.llvm.org/D82655
2020-07-01 15:10:53 +01:00
Guillaume Chatelet 28de229bc6 [Alignment][NFC] Migrate MachineFrameInfo::CreateStackObject to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82894
2020-07-01 07:28:11 +00:00
Guillaume Chatelet c1cd61e02a [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82849
2020-06-30 13:12:31 +00:00
Guillaume Chatelet 306d7c6929 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemmove to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82850
2020-06-30 12:46:59 +00:00
Guillaume Chatelet 6a6af30d43 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82851
2020-06-30 12:46:26 +00:00
Guillaume Chatelet 5f8bdb3e6a [Alignment][NFC] TargetLowering::allowsMemoryAccess
Second patch of a series to adapt TargetLowering::allowsXXX functions

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82785
2020-06-30 08:17:00 +00:00
David Sherwood c02332a693 [CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D82746
2020-06-30 08:11:41 +01:00
David Sherwood 46a7f4d6f4 [SVE][CodeGen] Fix bug in DAGCombiner::reduceBuildVecToShuffle
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of a SVE vector element and
therefore bail out.

Differential Revision: https://reviews.llvm.org/D82564
2020-06-30 07:28:15 +01:00
Simon Pilgrim 3521ecf1f8 [X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes
If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops.

This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases.

Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>.

Differential Revision: https://reviews.llvm.org/D82257
2020-06-29 12:19:05 +01:00
Simon Pilgrim 973685fc78 [TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
2020-06-29 11:46:58 +01:00
Guillaume Chatelet 3500d9ec95 Fix invalid alignment in DAGCombiner::isLegalNarrowLdSt
`ShAmt / 8` can be a non power of two, this can lead to an invalid alignment.
context: https://reviews.llvm.org/D41350#inline-749165

Differential Revision: https://reviews.llvm.org/D82565
2020-06-29 09:22:15 +00:00
Simon Pilgrim 6bdb3ce452 [DAG] reduceBuildVecExtToExtBuildVec - don't combine if it would break a splat.
reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen.

We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors.

It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one.

Fixes PR46461.
2020-06-27 11:03:57 +01:00
Sanjay Patel e7f7715eb9 [DAGCombiner] rename variables for readability; NFC
PR46406 shows a pattern where we can do better, so try to clean this up
before adding more code.
2020-06-26 14:22:11 -04:00
Sjoerd Meijer 243a5329d4 [SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.

Differential Revision: https://reviews.llvm.org/D82292
2020-06-26 07:46:38 +01:00
Eli Friedman e9d4e34ab8 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Kerry McLaughlin 5080503174 [SVE][CodeGen] Legalisation of vsetcc with scalable types
Summary: Changes SplitVecOp_VSETCC to use getVectorElementCount()

Reviewers: sdesmalen, efriedma, dancgr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79167
2020-06-23 11:56:29 +01:00
Simon Pilgrim bcc0dc3832 [DAG] visitSIGN_EXTEND_INREG - rename EVT variable. NFCI.
We had a EVT type variable called EVT, which isn't a good idea....
2020-06-23 10:45:27 +01:00
Paul Walker 499c63288f [SVE] Code generation for fixed length vector loads & stores.
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.

Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:

  V = load(Addr) =>
    V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))

  store(V, (Addr)) =>
    masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr))

Reviewers: rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80385
2020-06-23 09:39:03 +00:00
Simon Pilgrim 0acd22b8fb StatepointLowering.cpp - fix implicit CommandLine.h dependency. NFC.
StatepointLowering defines a cl::opt but don't include CommandLine.h.
2020-06-23 09:43:39 +01:00
Michael Liao b1360caa82 [SDAG] Add new AssertAlign ISD node.
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
  where these alignments are retrieved from alignment attributes in LLVM
  IR. These tracked alignments could help DAG combining and lowering
  generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
  we only generate AssertAlign nodes on return values from intrinsic
  calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
  new (base + offset) patterns.

Reviewers: arsenm, bogner

Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81711
2020-06-23 00:51:11 -04:00
Simon Pilgrim 48d1a2d6d0 [DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI.
We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.
2020-06-22 14:24:39 +01:00
Simon Pilgrim ecc5d7ee0d [DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases
For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly.

We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.
2020-06-22 12:35:32 +01:00
David Sherwood 7edc7f6edb [CodeGen] Fix SimplifyDemandedBits for scalable vectors
For now I have changed SimplifyDemandedBits and it's various callers
to assume we know nothing for scalable vectors and to ignore the
demanded bits completely. I have also done something similar for
SimplifyDemandedVectorElts. These changes fix up lots of warnings
due to calls to EVT::getVectorNumElements() for types with scalable
vectors. These functions are all used for optimisations, rather than
functional requirements. In future we can revisit this code if
there is a need to improve code quality for SVE.

Differential Revision: https://reviews.llvm.org/D80537
2020-06-19 07:59:35 +01:00
David Sherwood 9e811b0d93 [CodeGen] Fix ComputeNumSignBits for scalable vectors
When trying to calculate the number of sign bits for scalable vectors
we should just bail out for now and pretend we know nothing.

Differential Revision: https://reviews.llvm.org/D81093
2020-06-19 07:58:42 +01:00
Simon Pilgrim 2474421398 [TargetLowering] SimplifyMultipleUseDemandedBits - drop already extended ISD::SIGN_EXTEND_INREG nodes.
If the source of the SIGN_EXTEND_INREG node is already sign extended, use the source directly.
2020-06-18 16:41:08 +01:00
Lucas Prates a255931c40 [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.

Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.

This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.

Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer

Reviewed By: ostannard

Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75169
2020-06-18 13:15:13 +01:00
David Sherwood 65912a9768 [CodeGen] Fix warnings in foldCONCAT_VECTORS
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.

I discovered these warnings when compiling the structured loads tests in
this file:

  test/CodeGen/AArch64/sve-intrinsics-loads.ll

Differential Revision: https://reviews.llvm.org/D81936
2020-06-18 09:29:37 +01:00
Aaron Smith 7e01675ea5 [SelectionDAG] Add MVT::bf16 to getConstantFP()
Summary:
This was probably overlooked in recent bfloat patches.
Needed to handle bf16 constants in SelectionDAG.

  ConstantFP:bf16<APFloat(0)>

Reviewers: stuij

Reviewed By: stuij

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81779
2020-06-16 15:10:05 -07:00
Qiu Chaofan f8ef7c99a0 [DAGCombiner] Require ninf for division estimation
Current implementation of division estimation isn't correct for some
cases like 1.0/0.0 (result is nan, not expected inf).

And this change exposes a potential infinite loop: we use
isConstOrConstSplatFP in combineRepeatedFPDivisors to look up if the
divisor is some constant. But it doesn't work after legalized on some
platforms. This patch restricts the method to act before LegalDAG.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D80542
2020-06-14 22:58:22 +08:00
Amanieu d'Antras 6973125cb7 Fix FastISel dropping srcloc metadata from InlineAsm
Summary:
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46060

I've also added the Extra_IsConvergent flag which was missing from FastISel.

Reviewers: echristo

Reviewed By: echristo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80759
2020-06-13 16:52:37 +01:00
Michael Liao e7b920e6fe [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2))
Reviewers: arsenm

Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81708
2020-06-12 13:53:08 -04:00
Simon Pilgrim 5509e2cc2e [DAG] foldAddSubOfSignBit - add support for non-uniform vector constants 2020-06-12 14:58:15 +01:00
David Sherwood bd97342a0c [CodeGen] Let computeKnownBits do something sensible for scalable vectors
Until we have a real need for computing known bits for scalable
vectors I have simply changed the code to bail out for now and
pretend we know nothing. I've also fixed up some simple callers of
computeKnownBits too.

Differential Revision: https://reviews.llvm.org/D80437
2020-06-11 08:17:11 +01:00
Sanjay Patel 702cf93356 [DAGCombiner] allow more folding of fadd + fmul into fma
If fmul and fadd are separated by an fma, we can fold them together
to save an instruction:
fadd (fma A, B, (fmul C, D)), N1 --> fma(A, B, fma(C, D, N1))

The fold implemented here is actually a specialization - we should
be able to peek through >1 fma to find this pattern. That's another
patch if we want to try that enhancement though.

This transform was guarded by the TLI hook enableAggressiveFMAFusion(),
so it was done for some in-tree targets like PowerPC, but not AArch64
or x86. The hook is protecting against forming a potentially more
expensive computation when fma takes longer to execute than a single
fadd. That hook may be needed for other transforms, but in this case,
we are replacing fmul+fadd with fma, and the fma should never take
longer than the 2 individual instructions.

'contract' FMF is all we need to allow this transform. That flag
corresponds to -ffp-contract=fast in Clang, so we are allowed to form
fma ops freely across expressions.

Differential Revision: https://reviews.llvm.org/D80801
2020-06-09 10:41:27 -04:00
Guillaume Chatelet 800e100588 Revert "[Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess"
This reverts commit f21c52667e.
2020-06-09 10:43:59 +00:00
Simon Wallis 4dba59689d [ARM] prologue instructions emitted for naked function with >64 byte argument
Summary:

The naked function attribute is meant to suppress all function
prologue/epilogue instructions.

On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.

Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().

Checking in ARMFrameLowering::determineCalleeSaves() is too late.

A test case is included.

Reviewers: llvm-commits, olista01, danielkiss

Reviewed By: danielkiss

Subscribers: kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80715

Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
2020-06-09 11:33:03 +01:00
Guillaume Chatelet f21c52667e [Alignment][NFC] Migrate TargetLowering::allowsMemoryAccess
Summary:
Note to downstream target maintainers: this might silently change the semantics of your code if you override `TargetLowering::allowsMemoryAccess` without marking it override.

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81379
2020-06-09 10:11:07 +00:00
Guillaume Chatelet e26ed6bdae Fix unused variable warning 2020-06-09 08:56:05 +00:00
David Sherwood cc8872400c [CodeGen] Ensure callers of CreateStackTemporary use sensible alignments
In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.

In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.

I added a test to

  CodeGen/AArch64/build-one-lane.ll

that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.

Differential Revision: https://reviews.llvm.org/D80370
2020-06-09 08:10:17 +01:00
Christopher Tetreault caa2fddce7 [SVE] Eliminate calls to default-false VectorType::get() from CodeGen
Reviewers: efriedma, c-rhodes, david-arm, spatel, craig.topper, aqjune, paquette, arsenm, gchatelet

Reviewed By: spatel, gchatelet

Subscribers: wdng, tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80313
2020-06-08 10:26:10 -07:00
Sanjay Patel 302cc8a121 [DAGCombiner] clean-up FMA+FMUL folds; NFC
D80801 suggests some readability improvements before mocing this block.
2020-06-06 10:32:54 -04:00
Matt Arsenault 45e1a22a92 GlobalISel: Make known bits/alignment API more consistent
Just computing the alignment makes sense without caring about the
general known bits, such as for non-integral pointers. Separate the
two and start calling into the TargetLowering hooks for frame indexes.

Start calling the TargetLowering implementation for FrameIndexes,
which improves the AMDGPU matching for stack addressing modes. Also
introduce a new hook for returning known alignment of target
instructions. For AMDGPU, it would be useful to report the known
alignment implied by certain intrinsic calls.

Also stop using MaybeAlign.
2020-06-05 14:57:22 -04:00
Sander de Smalen 937cb7a8c7 Reland D80640: [CodeGen][SVE] Calculate correct type legalization for scalable vectors.
This reverts commit 9bcef270d7.
2020-06-05 18:09:31 +01:00
Sander de Smalen 9bcef270d7 Revert "[CodeGen][SVE] Calculate correct type legalization for scalable vectors."
Seems to break some buildbots, reverting the patch for now.

This reverts commit 164f4b9d26.
2020-06-05 16:03:52 +01:00
Sander de Smalen 164f4b9d26 [CodeGen][SVE] Calculate correct type legalization for scalable vectors.
This patch updates TargetLoweringBase::computeRegisterProperties and
TargetLoweringBase::getTypeConversion to support scalable vectors,
and make the right calls on how to legalise them. These changes are required
to legalise both MVTs and EVTs.

Reviewers: efriedma, david-arm, ctetreau

Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80640
2020-06-05 15:20:34 +01:00
Kerry McLaughlin 89fc0166f5 [CodeGen][SVE] Legalisation of extends with scalable types
Summary:
This patch adds legalisation of extensions where the operand
of the extend is a legal scalable type but the result is not.

EXTRACT_SUBVECTOR is used to split the result, before
being replaced by target-specific [S|U]UNPK[HI|LO] operations.

For example:

```
zext <vscale x 16 x i8> %a to <vscale x 16 x i16>
```
should emit:

```
uunpklo z2.h, z0.b
uunpkhi z1.h, z0.b
```

Reviewers: sdesmalen, efriedma, david-arm

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79587
2020-06-05 12:08:42 +01:00
Philip Reames 4c735439fd [Statepoint] Migrate a few tests to gc-live bundle format and fix assert
The assert was missed in 0e7c7705, migrating the test revealed the problem.
2020-06-04 18:15:58 -07:00
Matt Arsenault af867b7850 DAG: Change computeKnownBitsForFrameIndex to be usable by GISel
This wasn't getting much value from the DAG or depth arguments, since
it's only called on the frame index root nodes. FrameIndexes can also
only return a scalar value, so it also didn't need DemandedElts.
2020-06-04 10:50:26 -04:00
Sanjay Patel 652b3757c8 [x86] add test/code comment for chain value use (PR46195); NFC 2020-06-04 09:15:17 -04:00
Simon Pilgrim adf10dcf2e [DAG] scalarizeBinOpOfSplats - extract from the source of splat vector (PR46189)
D79003/rG9fa58d1bf2f8 exposed an issue with scalarizeBinOpOfSplats that we were extracting from the splatted vector result instead of the source, the splat index is only valid for the source vector not the result, which may contain undefs, including at the splat index.
2020-06-04 11:58:59 +01:00
Tim Northover 87e24c3200 Revert "[DAGCombiner] avoid unnecessary indirection from SDNode/SDValue; NFCI"
This reverts commit 21dadd774f.

In at least PromoteIntBinOps, they wanted to know about users of *all* values
produced by the node not just the integer being promoted. For example not
replacing chain users if the operation was a load breaks the ordering of the
DAG.
2020-06-04 11:53:14 +01:00
Madhur Amilkanthwar b3cff3c720 Utility to dump .dot representation of SelectionDAG without firing viewer
Summary:
This patch adds support for dumping .dot
representation of SelectionDAG. It is inspired from the fact that,
a developer may want to just dump the graph at
a predictable path with a simple name to compare.
The exisitng utility (i.e. viewGraph) are overkill
for this motive hence this patch adds the requires support
while using the core routines from GraphWriter.

Example usage: DAG.dumpDotGraph("/tmp/graph.dot", "MyGraph")
will create /tmp/graph.dot file when DAG is an
object of SelectionDAG class.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D80711
2020-06-04 11:51:48 +05:30
Philip Reames ab6779bbd8 [Statepoint] Remove last of old ImmutableStatepoint code
To do so, I had to sink the old school inline operand handling into GCStatepointInst which is non ideal.  This code should be removed shortly and I was able to at least clean it up a bunch.
2020-06-03 20:31:17 -07:00
Philip Reames 91dd2f2536 [Statepoint] Delete more dead code from old wrappers
The verify() routine duplicates IR/Verifier.cpp checks, so while not technically dead it doesn't add any value either.
2020-06-03 20:10:30 -07:00
Henry Kao c57e41c000 [CodeGen][SVE] Replace deprecated calls in getCopyFromPartsVector()
Summary: Replaced getVectorNumElements() with getVectorElementCount(). Added operator overloads for class ElementCount. Fixes warning in several AArch64 unit tests.

Reviewers: sdesmalen, kmclaughlin, dancgr, efriedma, each, andwar, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80826
2020-06-03 11:20:02 -04:00
Simon Pilgrim ea80b40669 [DAG] SimplifyDemandedBits - peek through SHL if we only demand sign bits.
If we're only demanding the (shifted) sign bits of the shift source value, then we can use the value directly.

This handles SimplifyDemandedBits/SimplifyMultipleUseDemandedBits for both ISD::SHL and X86ISD::VSHLI.

Differential Revision: https://reviews.llvm.org/D80869
2020-06-03 16:11:54 +01:00
Simon Pilgrim c438b257f1 [DAG] GetDemandedBits - don't bother asserting for a non-null cast<> result. NFC.
cast<> will assert on failure anyhow.

This lets us fold the cast<> with the getAPIntValue() that uses it.
2020-06-03 12:43:07 +01:00
Kadir Cetinkaya af86a10bad
[llvm] Fix unused variable warning 2020-06-02 22:46:24 +02:00
Amy Kwan a3ada630d8 [DAGCombiner] Combine shifts into multiply-high
This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.

For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.

Moreover, this DAG combine focuses on catching the pattern:
(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)
to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.

The patch performs the following checks:
- Operation is a right shift arithmetic (sra) or logical (srl)
- Input to the shift is a multiply
- Both operands to the shift are sext/zext nodes
- The extends into the multiply are both the same
- The narrow type is half the width of the wide type
- The shift amount is the width of the narrow type
- The respective mulh operation is legal

Differential Revision: https://reviews.llvm.org/D78272
2020-06-02 15:22:48 -05:00
Denis Antrushin fa818ded24 [StatepointLowering] Handle UNDEF gc values.
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
2020-06-02 10:18:33 +03:00
Matt Arsenault 836c7dcf12 DAG: Fix getNode dropping flags if there's a glue output
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.

Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.

Also fix a second case where flags were dropped, but in this case it
seems it just didn't handle this number of operands.

Test will be included in future AMDGPU patch.
2020-06-01 13:48:02 -04:00
Florian Hahn ec25a71eb7 [ScheduleDAG] Avoid unnecessary recomputation of topological order.
In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.

    define i11129 @test1() {
      %L1 = load i11129, i11129* undef
      %B30 = ashr i11129 %L1, %L1
      store i11129 %B30, i11129* undef
      ret i11129 %L1
    }

This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not safe
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.

Reviewers: MatzeB, atrick, efriedma, niravd, paquette

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D59722
2020-05-31 11:04:35 +01:00
Craig Topper a4dd45b7d0 [DAGCombiner] Move debug message and statistic update into CommitTargetLoweringOpt.
This code was repeated in two callers of CommitTargetLoweringOpt.
But CommitTargetLoweringOpt is also called from TargetLowering.
We should print a message for those calls to. So sink the
repeated code into CommitTargetLoweringOpt to catch those calls.
2020-05-30 19:47:07 -07:00
Simon Pilgrim 63824ad947 [TargetLowering] SimplifyDemandedBits - remove shift amount clamps from getValidShiftAmountConstant calls. NFC.
getValidShiftAmountConstant only returns a value if the shift amount is in range, so we don't need to check it again.
2020-05-30 14:04:55 +01:00
Simon Pilgrim 9d0bfcec83 [SelectionDAG] ComputeNumSignBits - use Valid Min/Max shift amount helpers directly. NFCI.
We are calling getValidShiftAmountConstant first followed by getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant if that failed. But both are used in the same way in ComputeNumSignBits and the Min/Max variants call getValidShiftAmountConstant internally anyhow.
2020-05-30 14:02:14 +01:00
Simon Pilgrim 81b50a7823 [SelectionDAG] Remove repeated getOperand() call. NFC. 2020-05-30 10:21:36 +01:00
Tobias Bosch 6a4714030e [DebugInfo][DAG] Don't reuse debug location on COPY if width changes.
Summary:
This caused incorrect debug information for parameters:
Previously, after a COPY of a parameter that changes the width,
we would emit a DBG_VALUE that continues to be associated to that
parameter, even though it now used a different width.
This made the LiveDebugValues pass assume the parameter value
got clobbered and it stopped tracking the parameter entry
value, leading to incorrect debug information.

Fixes https://bugs.llvm.org/show_bug.cgi?id=39715

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80819
2020-05-29 13:24:33 -07:00
Zequan Wu 80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Guozhi Wei 40c08367e4 [DAGCombiner] Add command line options to guard store width reduction
optimizations

As discussed in the thread http://lists.llvm.org/pipermail/llvm-dev/2020-May/141838.html,
some bit field access width can be reduced by ReduceLoadOpStoreWidth, some
can't. If two accesses are very close, and the first access width is reduced,
the second is not. Then the wide load of second access will be stalled for long
time.

This patch add command line options to guard ReduceLoadOpStoreWidth and
ShrinkLoadReplaceStoreWithStore, so users can use them to disable these
store width reduction optimizations.

Differential Revision: https://reviews.llvm.org/D80745
2020-05-29 09:41:41 -07:00
David Sherwood d8a78889f6 [CodeGen] Fix warning in visitShuffleVector
Make sure we only ask for the number of elements after we've
bailed out for scalable vectors.

Differential revision: https://reviews.llvm.org/D80632
2020-05-29 17:09:59 +01:00
Sanjay Patel 21dadd774f [DAGCombiner] avoid unnecessary indirection from SDNode/SDValue; NFCI 2020-05-29 09:31:52 -04:00
Florian Hahn d20a3d35e1 [DAGComb] Do not turn insert_elt into shuffle for single elt vectors.
Currently combineInsertEltToShuffle turns insert_vector_elt into a
vector_shuffle, even if the inserted element is a vector with a single
element. In this case, it should be unlikely that the additional shuffle
would be more efficient than a insert_vector_elt.

Additionally, this fixes a infinite cycle in DAGCombine, where
combineInsertEltToShuffle turns a insert_vector_elt into a shuffle,
which gets turned back into a insert_vector_elt/extract_vector_elt by
a custom AArch64 lowering (in visitVECTOR_SHUFFLE).

Such insert_vector_elt and extract_vector_elt combinations can be
lowered efficiently using mov on AArch64.

There are 2 test changes in arm64-neon-copy.ll: we now use one or two
mov instructions instead of a single zip1. The reason that we need a
second mov in ins1f2 is that we have to move the result to the result
register and is not really related to the DAGCombine fold I think.
But in any case, on most uarchs, mov should be cheaper than zip1. On a
Cortex-A75 for example, zip1 is twice as expensive as mov
(https://developer.arm.com/docs/101398/latest/arm-cortex-a75-software-optimization-guide-v20)

Reviewers: spatel, efriedma, dmgreen, RKSimon

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D80710
2020-05-29 13:21:13 +01:00
Paul Walker 92f3d29af0 [SelectionDAG] Update getNode asserts for EXTRACT/INSERT_SUBVECTOR.
Summary:
The description of EXTACT_SUBVECTOR and INSERT_SUBVECTOR has been
changed to accommodate scalable vectors (see ISDOpcodes.h). This
patch updates the asserts used to verify these requirements when
using SelectionDAG's getNode interface.

This patch introduces the MVT function getVectorMinNumElements
that can be used against fixed-length and scalable vectors when
only the known minimum vector length is required.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80709
2020-05-29 11:02:18 +00:00
David Sherwood 4265f1d23c [CodeGen] Fix warnings in getZeroExtendInReg
We should be using getVectorElementCount() to assert that two types
have the same numbers of elements. I encountered the warnings while
compiling this test:

  CodeGen/AArch64/sve-intrinsics-ld1.ll

Differential Revision: https://reviews.llvm.org/D80616
2020-05-29 11:51:07 +01:00
David Sherwood b147b88c84 [CodeGen] Add support for extracting elements of scalable vectors
I have tried to ensure that SelectionDAG and DAGCombiner do
sensible things for scalable vectors, and added support for a
limited number of simple folds. Codegen support for the vector
extract patterns have also been added to the AArch64 backend.

New vector extract tests have been added here:

  CodeGen/AArch64/sve-extract-element.ll

and I have also added new folds using inserts and extracts here:

  CodeGen/AArch64/sve-insert-element.ll

Differential Revision: https://reviews.llvm.org/D80208
2020-05-29 07:49:43 +01:00
Eric Christopher bce702e5f2 unsigned -> Register for readability. 2020-05-28 15:21:55 -07:00
Philip Reames 9d06547794 [Statepoints] Sink routines for grabbing projections to GCStatepointInst [NFC]
Mechanical movement, nothing more.
2020-05-28 13:51:59 -07:00
Philip Reames a0d2fd4a1f [Statepoint] Sink actual_args and gc_args to GCStatepointInst [NFC]
These are the two operand sets which are expected to survive more than another week or so.  Instead of bothering to update the deopt and gc-transition operands, we'll just wait until those are removed and delete the code.

For those following along, this is likely to be the last (major) change in this sequence for about a week.  I want to wait until all of this has been merged downstream to ensure I haven't introduced any bugs (and migrate some downstream code to the new interfaces).  Once that's done, we should be able to delete Statepoint/ImmutableStatepoint without too much work.
2020-05-28 13:51:59 -07:00
Philip Reames 4d6cda9bda [Statepoint] Use iterate_range.empty [NFC] 2020-05-28 13:51:59 -07:00
Philip Reames 58beb76b7b [Statepoint] Convert a few more isStatepoint calls to idiomatic isa/cast
I'd apparently only grepped in the lib directories and missed a few used in the Statepoint header itself.  Beyond simple mechanical cleanup, changed the type of one routine to reflect the fact it also returns a statepoint.
2020-05-28 11:35:36 -07:00
Philip Reames 501aa47ab8 [Statepoint] Sink logic about actual callee into GCStatepointInst
Sinking logic around actual callee from Statepoint to GCStatepointInst.  While doing so, adjust naming to be consistent about refering to "actual" callee and follow precedent on naming from CallBase otherwise.

Use the result to simplify one consumer.  This is mostly just to ensure the new code is exercised, but is also a helpful cleanup on it's own.
2020-05-28 10:53:39 -07:00
Nikita Popov e0e5c64460 [SDAG] Don't require LazyBlockFrequencyInfo at optnone
While LazyBlockFrequencyInfo itself is lazy, the dominator tree
and loop info analyses it requires are not. Drop the dependency
on this pass in SelectionDAGIsel at O0.
This makes for a ~0.6% O0 compile-time improvement.

Differential Revision: https://reviews.llvm.org/D80387
2020-05-28 18:48:33 +02:00
Philip Reames 98a87c65a3 [Statepoint] Reduce scope of usage of ImmutableStatepoint
Can't quite fully remove it yet as some more items need sunk the GCStatepointInst class from the wrapper, but we can at least reduce scope.
2020-05-27 18:57:42 -07:00
Philip Reames 87bea912c2 [Statepoint] Replace uses of isX functions with idiomatic isa<X>
Now that all of the statepoint related routines have classes with isa support, let's cleanup.

I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
2020-05-27 18:32:28 -07:00
Matt Arsenault dda82986f9 DAG: Fix expansion of DYNAMIC_STACKALLOC for StackGrowsUp targets
Can't test this since I can't directly use the default expansion for
AMDGPU. It needs to scale the amount by the wave size, rather than use
the raw byte size value.
2020-05-27 18:45:40 -04:00
Philip Reames 1af3705c7f Start migrating away from statepoint's inline length prefixed argument bundles
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.

This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.

Differential Revision: https://reviews.llvm.org/D8059
2020-05-27 09:16:10 -07:00
Simon Pilgrim 50db8402fc ResourcePriorityQueue.h - reduce unnecessary includes to forward declarations. NFC.
Move includes to ResourcePriorityQueue.cpp
2020-05-26 19:22:14 +01:00
Matt Arsenault 9786e7552d Revert "[AMDGPU] NFC target dependent requiresUniformRegister refactored out"
This reverts commit fb38b98338.

This will regress compile time.
2020-05-26 12:58:18 -04:00
alex-t fb38b98338 [AMDGPU] NFC target dependent requiresUniformRegister refactored out
Summary: Target specific method encapsulated into the Target Lowering Info.

Reviewers: rampitec, vpykhtin

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70085
2020-05-26 19:49:20 +03:00
Serge Pavlov 4d20e31f73 [FPEnv] Intrinsic llvm.roundeven
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.

Differential Revision: https://reviews.llvm.org/D75670
2020-05-26 19:24:58 +07:00
Sanjay Patel f368040c14 [DAGCombiner] try to move splat after binop with splat constant
binop (splat X), (splat C) --> splat (binop X, C)
binop (splat C), (splat X) --> splat (binop C, X)

We do this in IR, and there's a similar fold for the case with 2
non-constant operands just above the code diff in this patch.

This was discussed in D79718, and the extra shuffle in the test
(llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it
was noticed disappears because demanded elements analysis is no
longer blocked. The large majority of the test diffs seem to be
benign code scheduling changes, but I do see another type of win:
moving the splat later allows binop narrowing in some cases.

Regressions were avoided on x86 and ARM with the INSERT_VECTOR_ELT
restriction.

Differential Revision: https://reviews.llvm.org/D79886
2020-05-26 08:12:46 -04:00
Simon Pilgrim 8f48814879 FunctionLoweringInfo.h - move APInt.h dependency to FunctionLoweringInfo.cpp. NFC. 2020-05-25 12:58:35 +01:00
Simon Pilgrim 9fa58d1bf2 [DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling
For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies.

This helps with PR40502.

Differential Revision: https://reviews.llvm.org/D79003
2020-05-25 12:41:22 +01:00
Simon Pilgrim 1603106725 [TargetLowering] Improve expandFunnelShift shift amount masking
For the 'inverse shift', we currently always perform a subtraction of the original (masked) shift amount.

But for the case where we are handling power-of-2 type widths, we can replace:

(sub bw-1, (and amt, bw-1) ) -> (and (xor amt, bw-1), bw-1) -> (and ~amt, bw-1)

This allows x86 shifts to fold away the and-mask.

Followup to D77301 + D80466.

http://volta.cs.utah.edu:8080/z/Nod0Gr

Differential Revision: https://reviews.llvm.org/D80489
2020-05-24 11:25:09 +01:00
Amy Kwan b631f86ac5 [TLI][PowerPC] Introduce TLI query to check if MULH is cheaper than MUL + SHIFT
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.

Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.

This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.

Differential Revision: https://reviews.llvm.org/D78271
2020-05-23 16:47:12 -05:00
Simon Pilgrim fe0006c882 TargetLowering.h - remove unnecessary TargetMachine.h include. NFC
Replace with forward declaration and move dependency down to source files that actually need it.

Both TargetLowering.h and TargetMachine.h are 2 of the most expensive headers (top 10) in the ClangBuildAnalyzer report when building llc.
2020-05-23 19:49:38 +01:00
Craig Topper 7392820f98 [Align] Remove operations on MaybeAlign that asserted that it had a defined value.
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.

I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.

I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.

Differential Revision: https://reviews.llvm.org/D80455
2020-05-22 21:54:28 -07:00
Simon Pilgrim b9def827b7 StatepointLowering.h - remove unused includes. NFC. 2020-05-22 10:49:11 +01:00
Marcello Maggioni dbaed589ab [SelectionDAG] Add the option of disabling generic combines.
Summary:
For some targets generic combines don't really do much and they
consume a disproportionate amount of time.
There's not really a mechanism in SDISel to tactically disable
combines, but we can have a switch to disable all of them and
let the targets just implement what they specifically need.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79112
2020-05-21 20:11:29 +00:00
Denis Antrushin dedcefe09d [Statepoint] Constant fold FP deopt args.
We do not have any special handling for constant FP deopt arguments.
They are just spilled to stack or generated in register by MOVS
instruction. This is inefficient and, when we have too many such
constant arguments, may result in register allocation failure.
Instead, we can bitcast such constant FP operands to appropriately
sized integer and record as constant into statepoint and later, into
StackMap.

Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D80318
2020-05-21 11:02:54 +03:00
Craig Topper ae5ab2f40a [LegalizeDAG] Modify ExpandLegalINT_TO_FP to swap data for little/big endian instead of the pointers.
Will make it easier to pass the pointer info and alignment
correctly to the loads/stores.

While there also make the i32 stores independent and use a token
factor to join before the load.
2020-05-20 22:29:59 -07:00
Craig Topper 17bd86bc9b [LegalizeVectorTypes] Create correct memoperands in SplitVecRes_INSERT_SUBVECTOR.
Previously this code just used a default constructed
MachinePointerInfo. But we know the accesses are to a fixed stack
object or at least somewhere on the stack.

While there fix the alignment passed to the full vector load/stores.

I don't think this function is currently exercised in tree so I
don't know how to test it. I just noticed it when I removed
non-constant index support in this function.

Differential Revision: https://reviews.llvm.org/D80058
2020-05-20 15:06:36 -07:00
Arthur Eubanks 8a88755610 Reland [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Reverted due to unexpectedly passing tests, added REQUIRES: asserts for reland.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 11:25:44 -07:00
Arthur Eubanks b8cbff51d3 Revert "[X86] Codegen for preallocated"
This reverts commit 810567dc69.

Some tests are unexpectedly passing
2020-05-20 10:04:55 -07:00
Arthur Eubanks 810567dc69 [X86] Codegen for preallocated
See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.
Does not yet handle musttail. That requires a LangRef change first.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

```
struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}
```

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77689
2020-05-20 09:20:38 -07:00
QingShan Zhang 2b59e9f1bd [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.

This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319
2020-05-20 02:12:16 +00:00
Matt Arsenault 3e315697ac DAG: Use correct pointer size for llvm.ptrmask
This was ignoring the address space, and would assert on address
spaces with a different size from the default.
2020-05-18 16:46:11 -04:00
Nikita Popov 52e98f620c [Alignment] Remove unnecessary getValueOrABITypeAlignment calls (NFC)
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
2020-05-17 22:19:15 +02:00
Craig Topper 796ae8cf82 [LegalizeDAG] Use MachinePointerInfo::getUnknownStack in place of MachinePointerInfo() in a couple places. NFC
We know the pointer somewhere on the stack, we just don't know
exactly where since the index may be variable.

Differential Revision: https://reviews.llvm.org/D80060
2020-05-16 15:48:16 -07:00
Eli Friedman 4f04db4b54 AllocaInst should store Align instead of MaybeAlign.
Along the lines of D77454 and D79968.  Unlike loads and stores, the
default alignment is getPrefTypeAlign, to match the existing handling in
various places, including SelectionDAG and InstCombine.

Differential Revision: https://reviews.llvm.org/D80044
2020-05-16 14:53:16 -07:00
Craig Topper 13d44b2a0c [LegalizeDAG] Use getMemBasePlusOffset to simplify some code. Use other signature of getMemBasePlusOffset in another location. NFCI
The code was calculating an offset from a stack pointer SDValue.
This is exactly what getMemBasePlusOffset does. I also replaced
sizeof(int) with a hardcoded 4. We know the type we're operating
on is 4 bytes. But the size of int that the source code is being
compiled with isn't guaranteed to be 4 bytes.

While here replace another use of getMemBasePlusOffset that was
proceeded with a call to getConstant with the other signature
that call getConstant internally.
2020-05-16 01:02:08 -07:00
Craig Topper 45c7b3fd91 [LegalizeVectorTypes] Remove non-constnat INSERT_SUBVECTOR handling. NFC
Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.
2020-05-15 23:56:13 -07:00
Eli Friedman 11aa3707e3 StoreInst should store Align, not MaybeAlign
This is D77454, except for stores.  All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.

Differential Revision: https://reviews.llvm.org/D79968
2020-05-15 12:26:58 -07:00
David Sherwood fb1c55b57d [CodeGen] Fix FoldConstantVectorArithmetic for scalable vectors
For now I have changed FoldConstantVectorArithmetic to return early
if we encounter a scalable vector, since the subsequent code assumes
you can perform lane-wise constant folds. However, in future work we
should be able to extend this to look at splats of a constant value
and fold those if possible. I have also added the same code to
FoldConstantArithmetic, since that deals with vectors too.

The warnings I fixed in this patch were being generated by this
existing test:

  CodeGen/AArch64/sve-int-arith.ll

Differential Revision: https://reviews.llvm.org/D79421
2020-05-15 14:58:44 +01:00
Simon Pilgrim 9d4b4f344d DAGCombiner.cpp - remove non-constant EXTRACT_SUBVECTOR/INSERT_SUBVECTOR handling. NFC.
Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.
2020-05-15 12:41:35 +01:00
David Sherwood 8ce4a8f6df [CodeGen] Refactor CreateStackTemporary
I've created a new variant of CreateStackTemporary that takes
TypeSize and Align arguments, and made the older instances of
CreateStackTemporary call this new function. This refactoring is
in preparation for more patches in this area related to scalable
vectors and improving the alignment calculations.

Differential Revision: https://reviews.llvm.org/D79933
2020-05-15 07:29:13 +01:00
Eli Friedman accc6b5545 LoadInst should store Align, not MaybeAlign.
The fact that loads and stores can have the alignment missing is a
constant source of confusion: code that usually works can break down in
rare cases.  So fix the LoadInst API so the alignment is never missing.

To reduce the number of changes required to make this work, IRBuilder
and certain LoadInst constructors will grab the module's datalayout and
compute the alignment automatically.  This is the same alignment
instcombine would eventually apply anyway; we're just doing it earlier.
There's a minor risk that the way we're retrieving the datalayout
could break out-of-tree code, but I don't think that's likely.

This is the last in a series of patches, so most of the necessary
changes have already been merged.

Differential Revision: https://reviews.llvm.org/D77454
2020-05-14 13:19:21 -07:00
Simon Pilgrim acb6f1ae09 TargetLowering.cpp - remove non-constant EXTRACT_SUBVECTOR/INSERT_SUBVECTOR handling. NFC.
Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.
2020-05-14 18:13:58 +01:00
Jay Foad 17941437a2 [TargetLowering] Improve expansion of FSHL/FSHR
Use an extra shift-by-1 instead of a compare and select to handle the
shift-by-zero case. This sometimes saves one instruction (if the compare
couldn't be combined with a previous instruction). It also works better
on targets that don't have good select instructions.

Note that currently this change doesn't affect most targets because
expandFunnelShift is not used because funnel shift intrinsics are
lowered early in SelectionDAGBuilder. But there is work afoot to change
that; see D77152.

Differential Revision: https://reviews.llvm.org/D77301
2020-05-14 16:36:22 +01:00
Simon Pilgrim 80715b7124 SelectionDAG.cpp - remove non-constant EXTRACT_SUBVECTOR/INSERT_SUBVECTOR handling. NFC.
Now that D79814 has landed, we can assume that subvector ops use constant, in-range indices.
2020-05-14 13:23:00 +01:00
Eric Christopher bfa200ebcf Remove an unused variable. 2020-05-13 15:13:02 -07:00
Eli Friedman ed428c429e [SelectionDAG] Require constant index for INSERT/EXTRACT_SUBVECTOR.
It sounds like an interesting idea in theory, but nothing is actually
taking advantage of it, and specifying/implementing the edge cases is
painful. So just forbid it.

Differential Revision: https://reviews.llvm.org/D79814
2020-05-13 13:08:59 -07:00
Craig Topper 8c72b0271b [CodeGen] Use Align in MachineConstantPool. 2020-05-12 10:06:40 -07:00
James Y Knight e9536795a3 Add comment for SelectionDAGBuilder::SL field. 2020-05-12 10:46:08 -04:00
David Sherwood 42c7a6d52b [CodeGen] Fix incorrect uses of getVectorNumElements()
I have fixed up some places in SelectionDAG::getNode() where we
used to assert that the number of vector elements for two types
are the same. I have changed such cases to assert that the
element counts are the same instead. I've added new tests that
exercise the code paths for all the truncations. All the extend
operations are covered by this existing test:

  CodeGen/AArch64/sve-sext-zext.ll

For the ISD::SETCC case I fixed this code path is exercised by
these existing tests:

  CodeGen/AArch64/sve-fcmp.ll
  CodeGen/AArch64/sve-intrinsics-int-compares-with-imm.ll

Differential Revision: https://reviews.llvm.org/D79399
2020-05-12 07:50:37 +01:00
Eli Friedman c9c930ae67 [SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment.
allocas in LLVM IR have a specified alignment. When that alignment is
specified, the alloca has at least that alignment at runtime.

If the specified type of the alloca has a higher preferred alignment,
SelectionDAG currently ignores that specified alignment, and increases
the alignment. It does this even if it would trigger stack realignment.
I don't think this makes sense, so this patch changes that.

I was looking into this for SVE in particular: for SVE, overaligning
vscale'ed types is extra expensive because it requires realigning the
stack multiple times, or using dynamic allocation. (This currently isn't
implemented.)

I updated the expected assembly for a couple tests; in particular, for
arg-copy-elide.ll, the optimization in question does not increase the
alignment the way SelectionDAG normally would. For the rest, I just
increased the specified alignment on the allocas to match what
SelectionDAG was inferring.

Differential Revision: https://reviews.llvm.org/D79532
2020-05-11 17:39:00 -07:00
Sam McCall 728cf6d86b Revert "[DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression"
This reverts commit 3c44c441db.

Causes infloops on some inputs, see https://reviews.llvm.org/D77319 for repro
2020-05-11 16:44:01 +02:00
QingShan Zhang 3c44c441db [DAGCombine] Remove the getNegatibleCost to avoid the out of sync with getNegatedExpression
We have the getNegatibleCost/getNegatedExpression to evaluate the cost and negate the expression.
However, during negating the expression, the cost might change as we are changing the DAG,
and then, hit the assertion if we negated the wrong expression as the cost is not trustful anymore.

This patch is target to remove the getNegatibleCost to avoid the out of sync with getNegatedExpression,
and check the cost during negating the expression. It also reduce the duplicated code between
getNegatibleCost and getNegatedExpression. And fix the crash for the test in D76638

Reviewed By: RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D77319
2020-05-11 02:41:10 +00:00
Craig Topper bebdc62c3f [SelectionDAG] Remove ConstantPoolSDNode::getAlignment.
Use getAlign instead.

Differential Revision: https://reviews.llvm.org/D79459
2020-05-08 16:04:11 -07:00
Craig Topper d1119980e5 [SelectionDAG] Use Align/MaybeAlign for ConstantPoolSDNode.
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.

Removing getAlignment() will be done as a follow up.

Differential Revision: https://reviews.llvm.org/D79436
2020-05-08 16:04:11 -07:00
Simon Pilgrim 70293ba26f [DAG] SimplifyMultipleUseDemandedBits - remove superfluous bitcasts
If the SimplifyMultipleUseDemandedBits calls BITCASTs that peek through back to the original type then we can remove the BITCASTs entirely.

Differential Revision: https://reviews.llvm.org/D79572
2020-05-08 19:04:49 +01:00
aartbik 771d30c647 [llvm] [CodeGen] Fixed vector halving bug for masked store
Summary:
Note that this fix is very similar to what has already been
done for the masked load in https://reviews.llvm.org/D78608

Bugs:
https://bugs.llvm.org/show_bug.cgi?id=45563
https://bugs.llvm.org/show_bug.cgi?id=45833

Reviewers: craig.topper, nicolasvasilache, mehdi_amini

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79611
2020-05-07 19:01:40 -07:00
Kerry McLaughlin a31f4c52bf [SVE][CodeGen] Fix legalisation for scalable types
Summary:
This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.

For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.

In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>)

Reviewers: sdesmalen, efriedma, huntergr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78812
2020-05-07 10:01:31 +01:00
Craig Topper 7b9d6673bf [SelectionDAG] When splitting gather operands in type legalization, set MMO size to UnknownSize
I missed this case when I did the same for gather results and scatter
operands in c69a4d6bef.
2020-05-06 19:57:14 -07:00
LemonBoy 7fa5abd343 [SelectionDAG] Fix assertion failure with big shift amounts
Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for integer type it's applied to.

Fixes the regression introduced by D79096

Differential Revision: https://reviews.llvm.org/D79405
2020-05-06 11:58:37 -07:00
Sanjay Patel 2f1fe1864d [DAGCombiner] sink target-supported FP<->int cast op after concat vectors
Try to combine N short vector cast ops into 1 wide vector cast op:
concat (cast X), (cast Y)... -> cast (concat X, Y...)

This is part of solving PR45794:
https://bugs.llvm.org/show_bug.cgi?id=45794

As noted in the code comment, this is uglier than I was hoping because
the opcode determines whether we pass the source or destination type
to isOperationLegalOrCustom(). Also IIUC, there's no way to validate
what the other (dest or src) type is. Without the extra legality check
on that, there's an ARM regression test in:
test/CodeGen/ARM/isel-v8i32-crash.ll
...that will crash trying to lower an unsupported v8f32 to v8i16.

Differential Revision: https://reviews.llvm.org/D79360
2020-05-06 10:25:58 -04:00
David Sherwood cd3a54c55a [CodeGen] Fix warnings due to SelectionDAG::getSplatSourceVector
Summary:
I have fixed several places in getSplatSourceVector and isSplatValue
to work correctly with scalable vectors. I added new support for
the ISD::SPLAT_VECTOR DAG node as one of the obvious cases we can
support with scalable vectors. In other places I have tried to do
the sensible thing, such as bail out for vector types we don't yet
support or don't intend to support.

It's not possible to add IR test cases to cover these changes, since
they are currently only ever exercised on certain targets, e.g.
only X86 targets use the result of getSplatSourceVector. I've
assumed that X86 tests already exist to test these code paths for
fixed vectors. However, I have added some AArch64 unit tests that
test the specific functions I have changed.

Differential revision: https://reviews.llvm.org/D79083
2020-05-05 08:45:41 +01:00
Alex Richardson d1ff003fbb [SelectionDAGBuilder] Stop setting alignment to one for hidden sret values
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and and were to falling
back to a byte load + shift + or sequence.

This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
2020-05-04 14:44:39 +01:00
LemonBoy 6d103ca855 [SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.

The technique employed by `ExpandLoad`  is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.

Differential Revision: https://reviews.llvm.org/D79096
2020-05-02 15:18:10 -07:00
Simon Pilgrim a09a3c6d3e Revert rG8e05ac0a510c - "[DAGCombine] visitTRUNCATE - remove GetDemandedBits call"
Causing buildbot failures
2020-05-02 20:08:33 +01:00
Simon Pilgrim 8e05ac0a51 [DAGCombine] visitTRUNCATE - remove GetDemandedBits call
rL368553 added SimplifyMultipleUseDemandedBits handling for ISD::TRUNCATE to SimplifyDemandedBits so we don't need to duplicate this (and it gets rid of another GetDemandedBits call which is slowly being replaced with SimplifyMultipleUseDemandedBits anyhow).
2020-05-02 19:52:17 +01:00
Simon Pilgrim 7cb5a51f38 [DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 16:20:51 +01:00
Simon Pilgrim 65d32a9892 [DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector 2020-05-01 16:20:51 +01:00
Simon Pilgrim e3c0be596c [DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 13:48:07 +01:00
Craig Topper 6a1ad76dab [X86] Don't return true from isTruncateFree for vectors
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.

Differential Revision: https://reviews.llvm.org/D79045
2020-04-30 16:43:35 -07:00
Simon Pilgrim 96238486ed [DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine
X86 matches several 'shift+xor' funnel shift patterns:

  fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y))  -> (fshl x0, x1, y)
  fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y))  -> (fshr x0, x1, y)
  fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)

These patterns are also what we end up with the proposed expansion changes in D77301.

This patch moves these to DAGCombine's generic MatchFunnelPosNeg.

All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.

Reviewed By: @spatel

Differential Revision: https://reviews.llvm.org/D78935
2020-04-30 12:57:17 +01:00
Simon Pilgrim 6547a5ceb2 [DAG] Add TODO comment regarding ADD(X,X) -> SHL(X,1) canonicalization
As discussed on D78935
2020-04-30 12:57:16 +01:00
David Sherwood 058cd8c5be [CodeGen] Add support for inserting elements into scalable vectors
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed length vectors. For scalable vectors it
is assumed that the backend will do the right thing in the same way
that we have to deal with variable lane indices.

In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.

This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.

Differential Revision: https://reviews.llvm.org/D78992
2020-04-30 11:14:04 +01:00
QingShan Zhang b5f89744cc [DAGCombine] Checking the cost directly to improve the code readability
Call getNegatedExpression(Cost) and check the Cost to make the code more clear.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D78347
2020-04-29 01:49:39 +00:00
Craig Topper e13c141a91 [SelectionDAGBuilder] Use CallBase::isInlineAsm in a couple places. NFC
These lines were just changed from using CallBase::getCalledValue
to getCallledOperand. Go aheand change them to isInlineAsm.
2020-04-27 23:00:44 -07:00
Craig Topper a58b62b4a2 [IR] Replace all uses of CallBase::getCalledValue() with getCalledOperand().
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().

I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.

Differential Revision: https://reviews.llvm.org/D78882
2020-04-27 22:17:03 -07:00
David Sherwood 096b25a8d8 [CodeGen] Use SPLAT_VECTOR for zeroinitialiser with scalable types
Summary:
When generating code for the LLVM IR zeroinitialiser operation, if
the vector type is scalable we should be using SPLAT_VECTOR instead
of BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D78636
2020-04-27 15:57:59 +01:00
QingShan Zhang 2957fa0cd1 [NFC][DAGCombine] Adding three helper functions and change the getNegatedExpression to negateExpression
This is a NFC patch for D77319. The idea is to hide the getNegatibleCost inside the getNegatedExpression()
to have it return null if the cost is expensive, and add some helper function for easy to use. And
rename the old getNegatedExpression to negateExpression to avoid the semantic conflict.

Reviewed By: RKSimon

Differential revision: https://reviews.llvm.org/D78291
2020-04-27 04:11:42 +00:00
aartbik 907871d9ad [llvm] [CodeGen] Fixed vector halving bug for masked load
Summary:
Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should yields
should eventually yield V=8/5. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().

Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.

Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Reviewers: craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78608
2020-04-23 15:12:44 -07:00
Christopher Tetreault ccd623eae3 [SVE] Remove calls to isScalable from CodeGen
Reviewers: efriedma, sdesmalen, stoklund, sunfish

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77755
2020-04-23 12:58:52 -07:00
Alex Richardson bbcfce4bad Use FrameIndexTy for stack protector
Using getValueType() is not correct for architectures extended with CHERI since
we need a pointer type and not the value that is loaded. While stack
protector is useless when you have CHERI (since CHERI provides much
stronger security guarantees), we still have a test to check that we can
generate correct code for checks. Merging b281138a1b
into our tree broke this test. Fix by using TLI.getFrameIndexTy().

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D77785
2020-04-23 13:12:27 +01:00
Craig Topper 05a11974ae [CallSite removal] Remove unneeded includes of CallSite.h. NFC 2020-04-22 00:07:13 -07:00
Kang Zhang a8e15ee04a [CodeGen] Support freeze expand for ppc_fp128
Summary:
The patch D29014 has added the new ISD::FREEZE and can deal with the
integer.
The patch D76980 has added SoftenFloatRes_FREEZE for float point.
But we still lack of expand for ppc_fp128, this will cause assertion for
some cases.
This patch is to support freeze expand for ppc_fp128.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78278
2020-04-20 07:27:41 +00:00
Simon Pilgrim 46de0d5fe9 SelectionDAGBuilder.h - remove unused includes + forward declarations. NFC.
Replace SelectionDAG.h include with SelectionDAG forward declaration.
2020-04-19 12:38:41 +01:00
Simon Pilgrim 032738d17e InstrEmitter.h - reduce SelectionDAG.h include to SelectionDAGNodes.h include.
Add SDDbgLabel/TargetLowering forward declarations.
Add the full SelectionDAG.h include to InstrEmitter.cpp.
2020-04-19 11:52:31 +01:00
Christopher Tetreault c858debebc Remove asserting getters from base Type
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: dexonsmith, sdesmalen, efriedma

Reviewed By: efriedma

Subscribers: cfe-commits, hiraditya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D77278
2020-04-17 14:03:31 -07:00
Fraser Cormack c819ef9653 Provide operand indices to adjustSchedDependency
This allows targets to know exactly which operands are contributing to
the dependency, which is required for targets with per-operand
scheduling models.

Differential Revision: https://reviews.llvm.org/D77135
2020-04-17 11:08:44 +01:00
Craig Topper 944cc5e0ab [SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP.
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.

This simplfifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.

CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.

Differential Revision: https://reviews.llvm.org/D76947
2020-04-16 17:49:22 -07:00
Eli Friedman 7c10541e56 [SelectionDAG] Fix usage of Align constructing MachineMemOperands.
The "Align" passed into getMachineMemOperand etc. is the alignment of
the MachinePointerInfo, not the alignment of the memory operation.
(getAlign() on a MachineMemOperand automatically reduces the alignment
to account for this.)

We were passing on wrong (overconservative) alignment in a bunch of
places. Fix a bunch of these, mostly in legalization.  And while I'm
here, switch to the new Align APIs.

The test changes are all scheduling changes: the biggest effect of
preserving large alignments is that it improves alias analysis, so the
scheduler has more freedom.

(I was originally just trying to do a minor cleanup in
SelectionDAGBuilder, but I accidentally went deeper down the rabbit
hole.)

Differential Revision: https://reviews.llvm.org/D77687
2020-04-15 13:01:41 -07:00
Benjamin Kramer d790bd3999 Unbreak the build 2020-04-15 15:54:47 +02:00
Victor Campos d85b3877dc [CodeGen][ARM] Error when writing to specific reserved registers in inline asm
Summary:
No error or warning is emitted when specific reserved registers are
written to in inline assembly. Therefore, writes to the program counter
or to the frame pointer, for instance, were permitted, which could have
led to undesirable behaviour.

Example:
  int foo() {
    register int a __asm__("r7"); // r7 = frame-pointer in M-class ARM
    __asm__ __volatile__("mov %0, r1" : "=r"(a) : : );
    return a;
  }

In contrast, GCC issues an error in the same scenario.

This patch detects writes to specific reserved registers in inline
assembly for ARM and emits an error in such case. The detection works
for output and input operands. Clobber operands are not handled here:
they are already covered at a later point in
AsmPrinter::emitInlineAsm(const MachineInstr *MI). The registers
covered are: program counter, frame pointer and base pointer.

This is ARM only. Therefore the implementation of other targets'
counterparts remain open to do.

Reviewers: efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76848
2020-04-15 14:40:42 +01:00
QingShan Zhang c9f9c79c5a [NFC][DAGCombine] Change the value of NegatibleCost to make it align with the semantics
This is a minor NFC change to make the code more clear. We have the NegatibleCost that
has cheaper, neutral, and expensive. Typically, the smaller one means the less cost.
It is inverse for current implementation, which makes following code not easy to read.
If (CostX > CostY) negate(X)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D77993
2020-04-15 02:20:58 +00:00
Craig Topper 3043093822 [CallSite removal][CodeGen] Replace ImmutableCallSite with CallBase in isInTailCallPosition. 2020-04-13 23:04:57 -07:00
Craig Topper 113f37a1f9 [CallSite removal][TargetLowering] Replace ImmutableCallSite with CallBase
Differential Revision: https://reviews.llvm.org/D77995
2020-04-13 13:50:15 -07:00
Matt Arsenault e6605a209c DAG: Fix wrong legality check for ISD::FMAD
Since 1725f28841, this should check
isFMADLegalForFAddFSub rather than the the plain isOperationLegal.

This would assert in a subset of cases due to an oddity in how FMAD is
selected. We will allow FMA formation pre-legalize, but not FMAD even
in cases where it would be valid.

The current hook requires passing in the root fadd/fsub. However, in
this distributed case, this would be far more complicated to pass in
the relevant operand. AMDGPU doesn't get any value from the node, and
only needs the type and is the only implementor, so I'm not sure why
we have this complexity. Just rename and expand the assert to avoid
the more complicated checks spread through the distribution logic.
2020-04-13 10:25:39 -07:00
Craig Topper dbb272b0a3 [CallSite removal][FastISel] Use CallBase instead of CallSite in fastLowerCall. 2020-04-12 18:02:24 -07:00
Craig Topper 95192f548d [CallSite removal][TargetLowering] Use CallBase instead of CallSite in TargetLowering::ParseConstraints interface.
Differential Revision: https://reviews.llvm.org/D77929
2020-04-12 11:26:25 -07:00
Jonathan Roelofs 41f13f1f64 reland: [DAG] Fix PR45049: LegalizeTypes crash
Sometimes LegalizeTypes knows about common subexpressions before SelectionDAG
does, leading to accidental SDValue removal before its reference count was
truly zero.

Differential Revision: https://reviews.llvm.org/D76994

Reviewed-By: bjope

Fixes: https://bugs.llvm.org/show_bug.cgi?id=45049

Reverted in 3ce77142a6 because the previous patch
broke the expensive-checks bots. The new patch removes the broken check.
2020-04-12 09:52:17 -06:00
Craig Topper 5b42399029 [CallSite removal][FastISel] Remove uses of CallSite.
Differential Revision: https://reviews.llvm.org/D77933
2020-04-11 20:52:45 -07:00
Craig Topper 806763efcf [CallSite removal][SelectionDAGBuilder] Use CallBase instead of ImmutableCallSite in visitPatchpoint.
Differential Revision: https://reviews.llvm.org/D77932
2020-04-11 13:07:31 -07:00
Sanjay Patel 1318ddbc14 [VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.

While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
2020-04-11 10:05:49 -04:00
Craig Topper 9c1842d8af Change FastISel::CallLoweringInfo::CS to be an ImmutableCallSite instead of a pointer. NFCI.
This is the same as what was done to the CallLoweringInfo in
TargetLowering.h in r309159.

This is just a step on the way to replacing this with CallBase.
2020-04-10 23:45:36 -07:00
Craig Topper f49f6cf91e [CallSite removal][SelectionDAGBuilder] Remove most CallSite usage from visitInlineAsm.
I only left it at the interface to ParseConstraints since that
needs updates to other callers in different files. I'll do that
as a follow up.

Differential Revision: https://reviews.llvm.org/D77892
2020-04-10 19:23:33 -07:00
Christopher Tetreault 889f6606ed Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: stoklund, sdesmalen, efriedma

Reviewed By: sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77272
2020-04-10 14:53:43 -07:00
Simon Pilgrim a88cc20456 ProfileSummaryInfo.h - remove unnecessary includes. NFC
Remove a number of includes that aren't necessary (nor are we relying on the remaining includes to provide the declarations), we just needed a llvm::Instruction forward declaration.

This exposed a couple of source files that were implicitly replying on the includes for their use of llvm::SmallSet or std::set, requiring local includes to be added there instead.
2020-04-10 16:25:48 +01:00
Serguei Katkov 4275eb1331 Re-land [Codegen/Statepoint] Allow usage of registers for non gc deopt values.
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.

The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.

Reviewers: reames, danstrushin
Reviewed By: dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77797
2020-04-10 10:13:39 +07:00
Serguei Katkov 44f0d7f136 Revert "[Codegen/Statepoint] Allow usage of registers for non gc deopt values."
This reverts commit a0275705bb.

It causes buildbot failures building LLVM with BUILD_SHARED_LIBS due to a linker error.
2020-04-09 18:24:47 +07:00
Serguei Katkov a0275705bb [Codegen/Statepoint] Allow usage of registers for non gc deopt values.
The change introduces the usage of physical registers for non-gc deopt values.
This require runtime support to know how to take a value from register.
By default usage is off and can be switched on by option.

The change also introduces additional fix-up patch which forces the spilling
of caller saved registers (clobbered after the call) and re-writes statepoint
to use spill slots instead of caller saved registers.

Reviewers: reames, dantrushin
Reviewed By: reames, dantrushin
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D77371
2020-04-09 16:57:35 +07:00
Jay Foad c63aed890e [KnownBits] Move AND, OR and XOR logic into KnownBits
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.

This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74060
2020-04-09 10:10:37 +01:00
Matt Arsenault 586769cce2 DAG: Use Register 2020-04-08 13:44:31 -04:00
Matt Arsenault dcce3ef1d2 FastISel: Partially use Register
Doesn't try to convert the cases that depend on generated code.
2020-04-08 12:10:58 -04:00