Commit Graph

1137 Commits

Author SHA1 Message Date
Eli Friedman 6c04b7dd4f [AArch64] Optimize overflow checks for [s|u]mul.with.overflow.i32.
Saves one instruction for signed, uses a cheaper instruction for
unsigned.

Differential Revision: https://reviews.llvm.org/D105770
2021-07-12 15:30:42 -07:00
Benjamin Kramer 0da3573a9e [AArch64] Silence unused variable warning. NFC.
AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable]
  auto OpCode = N->getOpcode();
       ^
2021-07-12 16:01:11 +02:00
Michael Liao 8253fa2298 Fix warning '-Wparentheses'. NFC. 2021-07-12 09:25:30 -04:00
David Truby c305557acd [llvm][sve] Lowering for VLS truncating stores
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

Differential Revision: https://reviews.llvm.org/D104471
2021-07-12 11:14:17 +01:00
Bradley Smith 026bb84bcd [AArch64][SVE] Add ISel patterns for floating point compare with zero instructions
Additionally, lower the floating point compare SVE intrinsics to
SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns.

Differential Revision: https://reviews.llvm.org/D105486
2021-07-08 10:46:12 +00:00
Bradley Smith 5ab9000fbb [AArch64][SVE] Fix selection failures for scalable MLOAD nodes with passthru
Differential Revision: https://reviews.llvm.org/D105348
2021-07-06 14:17:23 +00:00
Caroline Concatto a2c5c56055 [AArch64][CostModel] Add cost model for experimental.vector.splice
This patch adds a new  ShuffleKind SK_Splice and then handle the cost in
getShuffleCost, as in experimental.vector.reverse.

Differential Revision: https://reviews.llvm.org/D104630
2021-07-05 14:30:24 +01:00
Bradley Smith cc273983f7 [AArch64][SVE] Improve fixed length codegen for common vector shuffle case
Improve codegen when lowering the common vector shuffle case from the
vectorizer (op1[last]:op2[0:last-1]). This patch only handles this
common case as it is difficult to handle this more generally when using
fixed length vectors, due to being unable to use the SVE ext instruction.

Differential Revision: https://reviews.llvm.org/D105289
2021-07-05 12:09:27 +01:00
Paul Walker 287d39dd5a [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Krzysztof Parzyszek df88c26f0d [OpaquePtr] Add type parameter to emitLoadLinked
Differential Revision: https://reviews.llvm.org/D105353
2021-07-02 13:07:40 -05:00
Florian Hahn 1a248233a5
[AArch64] Use custom lowering for fp16 vector copysign.
The custom copysign lowering already supports fp16. Use it.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D105277
2021-07-02 11:15:30 +01:00
Eli Friedman 0176ac9503 [AArch64] Optimize SVE bitcasts of unpacked types.
Target-independent code only knows how to spill to the stack; instead,
use AArch64ISD::REINTERPRET_CAST.

Differential Revision: https://reviews.llvm.org/D104573
2021-07-01 15:35:48 -07:00
Bradley Smith 2668727929 [SelectionDAG] Implement PromoteIntRes_INSERT_SUBVECTOR
Inserting into a smaller-than-legal scalable vector would result in an
internal compiler error. For example, inserting a <vscale x 4 x i8> into
a <vscale x 8 x i8> (both illegal vector types for SVE) would cause a
crash.

This crash was happening because there was no code to promote (legalise)
the result of an INSERT_SUBVECTOR node.

This patch implements PromoteIntRes_INSERT_SUBVECTOR, which legalises
the ISD node. This is currently done by going through memory. This is
necessary because of the requirement that the SubVec parameter of the
INSERT_SUBVECTOR node must be smaller than the Vec parameter, which
means that INSERT_SUBVECTOR cannot always have a legal result/operand
types.

Co-Authored-by: Joe Ellis <joe.ellis@arm.com>

Differential Revision: https://reviews.llvm.org/D102766
2021-07-01 17:05:53 +01:00
Bradley Smith 01b846674d [AArch64][SVE] Add support for fixed length MSCATTER/MGATHER
Since gather lowering can now lower to nodes that may need expansion via
the vector legalizer, do MGATHER lowering via vector legalizer.

Additionally, as part of adding passthru support for fixed typed
gathers, fix passthru support for scalable types.

Depends on D104910

Differential Revision: https://reviews.llvm.org/D104217
2021-07-01 12:13:59 +01:00
Fangrui Song 17858da022 [AArch64] Remove unneeded ExternalSymbolSDNode code for machine constraint "S". NFC
ExternalSymbolSDNode is implicitly generated libcalls but with an address taking
operation we cannot reference an ExternalSymbolSDNode.
2021-06-30 17:52:56 -07:00
Sjoerd Meijer b062fff87a Recommit "[AArch64] Custom lower <4 x i8> loads"
This recommits D104782 including a fix for adding a wrong operand to the new
load node.

Differential Revision: https://reviews.llvm.org/D105110
2021-06-30 09:18:06 +01:00
Dylan Fleming c3d3defd11 [SVE] Added CodeGen support for inserting an element into a predicate vector
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104722
2021-06-29 14:55:40 +01:00
Sjoerd Meijer 3a7cea2858 Revert "[AArch64] Custom lower <4 x i8> loads"
This reverts commit 51e434fc25 because of a
build bot failure in test-suite::GCC-C-execute-pr60960.test that I need to
investigate.
2021-06-28 17:44:46 +01:00
Bradley Smith c089e29aa4 [AArch64][SVE] DAG combine SETCC_MERGE_ZERO of a SETCC_MERGE_ZERO
This helps remove extra comparisons when generating masks for fixed
length masked operations.

Differential Revision: https://reviews.llvm.org/D104910
2021-06-28 15:06:06 +01:00
David Green 2887f14639 [ISel] Port AArch64 SABD and UABD to DAGCombine
This ports the AArch64 SABD and USBD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
a ubds/abdu which is zexted to the original type.

Differential Revision: https://reviews.llvm.org/D91937
2021-06-26 19:34:16 +01:00
Sjoerd Meijer 51e434fc25 [AArch64] Custom lower <4 x i8> loads
This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2
SSHLL instructions to extend it to e.g. a <4 x i32> vector. Before, it was
really inefficient and expensive to construct a <4 x i32> for this as 4 byte
loads and 4 moves were used. With this improvement SLP vectorisation might for
example become profitable, see D103629.

Differential Revision: https://reviews.llvm.org/D104782
2021-06-25 09:53:51 +01:00
Martin Storsjö 42f74e8249 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Sander de Smalen d5e14ba88c [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount())
  or if the number of elements is halfed/doubled, it uses .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will need
  then need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the '/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
David Spickett e4ecd83fe9 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
Huihui Zhang 1c096bf09f [SVE][LSR] Teach LSR to enable simple scaled-index addressing mode generation for SVE.
Currently, Loop strengh reduce is not handling loops with scalable stride very well.

Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance,
(refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added).

Memory accesses are incremented by "16*vscale", while induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula
i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)}". So that addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.

This patch allow LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y",
and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR
is missing candidate formula with proper scaled factor to leverage target scaled-index
addressing mode.

Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vector. But allow simple valid scale to pass through.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D103939
2021-06-14 16:42:34 -07:00
Jingu Kang 08ce52ef5e [AArch64] Improve SAD pattern
Given a vecreduce_add node, detect the below pattern and convert it to the node
sequence with UABDL, [S|U]ADB and UADDLP.

i32 vecreduce_add(
 v16i32 abs(
   v16i32 sub(
    v16i32 [sign|zero]_extend(v16i8 a), v16i32 [sign|zero]_extend(v16i8 b))))
=================>
i32 vecreduce_add(
  v4i32 UADDLP(
    v8i16 add(
      v8i16 zext(
        v8i8 [S|U]ABD low8:v16i8 a, low8:v16i8 b
      v8i16 zext(
        v8i8 [S|U]ABD high8:v16i8 a, high8:v16i8 b

Differential Revision: https://reviews.llvm.org/D104042
2021-06-14 15:48:51 +01:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
David Green 12f53e5392 [AArch64] Remove AArch64ISD::NEG
This NEG node is just a vector negation, easily represented as a SUB
zero. Removing it from the one place it is generated is essentially an
NFC, but can allow some extra folding. The updated tests are now loading
different constant literals, which have already been negated.

Differential Revision: https://reviews.llvm.org/D103703
2021-06-05 19:54:42 +01:00
Bradley Smith a85f5874e2 [AArch64] Remove SETCC of CSEL when the latter's condition can be inverted
setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X

Where X is a condition code setting instruction.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D103256
2021-06-04 15:53:21 +01:00
Irina Dobrescu e971099a9b [AArch64] Optimise bitreverse lowering in ISel
Differential Revision: https://reviews.llvm.org/D103105
2021-06-02 12:51:12 +01:00
Jessica Paquette e7f501b5e7 [GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width
Also add a target hook which allows us to get around custom legalization on
AArch64.

Differential Revision: https://reviews.llvm.org/D99283
2021-06-01 10:56:17 -07:00
Eli Friedman 0b3b0a727a [AArch64][RISCV] Make sure isel correctly honors failure orderings.
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.

At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standad and LLVM now accept arbitrary combinations of
success/failure orderings.

This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.

Differential Revision: https://reviews.llvm.org/D103284
2021-05-28 12:47:40 -07:00
Tim Northover 9ff2eb1ea5 SwiftTailCC: teach verifier musttail rules applicable to this CC.
SwiftTailCC has a different set of requirements than the C calling convention
for a tail call. The exact argument sequence doesn't have to match, but fewer
ABI-affecting attributes are allowed.

Also make sure the musttail diagnostic triggers if a musttail call isn't
actually a tail call.
2021-05-28 11:12:00 +01:00
Bradley Smith f3c577ed38 [AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP
Depends on D102607

Differential Revision: https://reviews.llvm.org/D102777
2021-05-25 12:54:55 +01:00
Bradley Smith e40513252a [AArch64][SVE] Add fixed length codegen for FP_ROUND/FP_EXTEND
Depends on D102498

Differential Revision: https://reviews.llvm.org/D102607
2021-05-24 13:02:30 +01:00
Bradley Smith 4bc14be259 [AArch64][SVE] Improve codegen for fixed length vector concat
Differential Revision: https://reviews.llvm.org/D102498
2021-05-24 12:56:02 +01:00
David Truby bf3b6cf920 [llvm][sve] Lowering for VLS MLOAD/MSTORE
This adds custom lowering for the MLOAD and MSTORE ISD nodes when
passed fixed length vectors in SVE. This is done by converting the
vectors to VLA vectors and using the VLA code generation.

Fixed length extending loads and truncating stores currently produce
correct code, but do not use the built in extend/truncate in the
load and store instructions. This will be fixed in a future patch.

Differential Revision: https://reviews.llvm.org/D101834
2021-05-20 10:50:59 +00:00
Andrew Savonichev a647100b43 [AArch64] Combine vector shift instructions in SelectionDAG
bswap.v2i16 + sitofp in LLVM IR generate a sequence of:

  - REV32 + USHR for bswap.v2i16
  - SHL + SSHR + SCVTF for sext to v2i32 and scvt

The shift instructions are excessive as noted in PR24820, and they can
be optimized to just SSHR.

Differential Revision: https://reviews.llvm.org/D102333
2021-05-20 10:50:13 +03:00
Eli Friedman 3dd49ec194 [AArch64][SVE] Implement extractelement of i1 vectors.
The implementation just extends the vector to a larger element type, and
extracts from that.  Not fancy, but generates reasonable code.

There was discussion in the review of doing the promotion in
target-independent code, but I'm sticking with this to avoid making
LegalizeDAG infrastructure more complicated.

Differential Revision: https://reviews.llvm.org/D87651
2021-05-17 14:51:11 -07:00
Irina Dobrescu 50511df32e [AArch64] Lower bitreverse in ISel
Adding lowering support for bitreverse.

Previously, lowering bitreverse would expand it into a series of other instructions. This patch makes it so this produces a single rbit instruction instead.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102397
2021-05-17 13:35:27 +01:00
Tim Northover 82a0e808bb IR/AArch64/X86: add "swifttailcc" calling convention.
Swift's new concurrency features are going to require guaranteed tail calls so
that they don't consume excessive amounts of stack space. This would normally
mean "tailcc", but there are also Swift-specific ABI desires that don't
naturally go along with "tailcc" so this adds another calling convention that's
the combination of "swiftcc" and "tailcc".

Support is added for AArch64 and X86 for now.
2021-05-17 10:48:34 +01:00
Jacob Bramley 900c898994 [AArch64] Lower fpto*i.sat intrinsics.
AArch64's fctv* instructions implement the saturating behaviour that the
fpto*i.sat intrinsics require, in cases where the destination width
matches the saturation width. Lowering them removes a lot of unnecessary
generated code.

Only scalar lowerings are supported for now.

Differential Revision: https://reviews.llvm.org/D102353
2021-05-17 10:19:19 +01:00
Bradley Smith 90ffcb1245 [AArch64][SVE] Add unpredicated vector BIC ISD node
Addition of this node allows us to better utilize the different forms of
the SVE BIC instructions, including using the alias to an AND (immediate).

Differential Revision: https://reviews.llvm.org/D101831
2021-05-14 16:12:13 +01:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
Peter Waller 6e6f9a636b [AArch64][SVE] Improve sve.convert.to.svbool lowering
The sve.convert.to.svbool lowering has the effect of widening a logical
<M x i1> vector representing lanes into a physical <16 x i1> vector
representing bits in a predicate register.

In general, if converting to svbool, the contents of lanes in the
physical register might not be known. For sve.convert.to.svbool the new
lanes are specified to be zeroed, requiring 'and' instructions to mask
off the new lanes. For lanes coming from a ptrue or a comparison,
however, they are known to be zero.

CodeGen Before:
  ptrue p0.s, vl16
  ptrue p1.s
  ptrue p2.b
  and   p0.b, p2/z, p0.b, p1.b
  ret

After:
  ptrue	p0.s, vl16
  ret

Differential Revision: https://reviews.llvm.org/D101544
2021-05-12 10:57:25 +01:00
Bradley Smith 635164b95a [AArch64][SVE] Improve SVE codegen for fixed length BITCAST
Expanding a fixed length operation involves wrapping the operation in an
insert/extract subvector pair, as such, when this is done to bitcast we
end up with an extract_subvector of a bitcast. DAGCombine tries to
convert this into a bitcast of an extract_subvector which restores the
initial fixed length bitcast, causing an infinite loop of legalization.

As part of this patch, we must make sure the above DAGCombine does not
trigger after legalization if the created bitcast would not be legal.

Differential Revision: https://reviews.llvm.org/D101990
2021-05-10 14:43:53 +01:00
Bradley Smith 65c89cd1a6 [AArch64][SVE] Better utilisation of unpredicated forms of remaining intrinsics
When using predicated intrinsics, if the predicate used is all lanes active,
use an unpredicated form of the instruction, additionally this allows for
better use of immediate forms.

This only includes instructions where the unpredicated/predicated forms
matched in such a way that instruction selection would not introduce extra
ptrue instructions. This allows us to convert the intrinsics directly to
architecture independent ISD nodes.

Depends on D101062

Differential Revision: https://reviews.llvm.org/D101828
2021-05-10 13:06:02 +01:00
Bradley Smith f8f953c2a6 [AArch64][SVE] Better utilisation of unpredicated forms of arithmetic intrinsics
When using predicated arithmetic intrinsics, if the predicate used is all
lanes active, use an unpredicated form of the instruction, additionally
this allows for better use of immediate forms.

This also includes a new complex isel pattern which allows matching an
all active predicate when the types are different but the predicate is a
superset of the type being used. For example, to allow a b8 ptrue for a
b32 predicate operand.

This only includes instructions where the unpredicated/predicated forms
are mismatched between variants, meaning that the removal of the
predicate is done during instruction selection in order to prevent
spurious re-introductions of ptrue instructions.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101062
2021-05-10 13:05:37 +01:00
Sander de Smalen 407a33889d [AArch64][SVE] Fix isel failure for FP-extending loads
DAGCombiner tries to combine a (fpext (load)) to (fround (extload))
but SVE has no FP-extending loads. By marking these as expand,
the combine no longer happens.

This also fixes a similar issue for fptrunc, where the source type
is not a legal type.

Reviewed By: bsmith, kmclaughlin

Differential Revision: https://reviews.llvm.org/D102053
2021-05-10 11:27:38 +01:00
Jun Ma b3aeb13892 [AArch64][SVE] Remove index_vector node.
Since index_vector is lowered into step_vector in D100816, we can just remove
index_vector, use step_vector for codegen directly.

Differential Revision: https://reviews.llvm.org/D101593
2021-05-10 11:08:58 +08:00