Commit Graph

5409 Commits

Author SHA1 Message Date
Simon Pilgrim 2cbf9fd402 [DAG] DAGCombiner::visitVECTOR_SHUFFLE - recognise INSERT_SUBVECTOR patterns
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.

This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.

This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.

Fixes PR50053

Differential Revision: https://reviews.llvm.org/D107068
2021-08-05 15:40:48 +01:00
Jessica Paquette ca2e053652 [AArch64][GlobalISel] Legalize wide vector G_PHIs
Clamp the max number of elements when legalizing G_PHI. This allows us to
legalize some common fallbacks like 4 x s64.

Here's an example: https://godbolt.org/z/6YocsEYTd

Had to add -global-isel-abort=0 to legalize-phi.mir to account for the
G_EXTRACT_VECTOR_ELT from the 32 x s8 G_PHI.

Differential Revision: https://reviews.llvm.org/D107508
2021-08-04 16:48:59 -07:00
Jessica Paquette d9279843b1 [AArch64][GlobalISel] Widen G_PHI before clamping it during legalization
This allows us to handle weird types like s88; we first widen to s128, then
clamp back down to s64.

https://godbolt.org/z/9xqbP46Mz

Also this makes it possible for GISel to legalize the case in pr48188.ll. It
now does the same thing as SDAG, although regalloc chooses different registers.

Differential Revision: https://reviews.llvm.org/D107417
2021-08-04 10:25:14 -07:00
Jessica Paquette 7d97de60b3 [AArch64][GlobalISel] Widen G_FPTO*I before clamping
Going through our legalization rules and doing some cleanup.

Widening and then clamping is usually easier than clamping and then widening.

This allows us to legalize some weird types like s88.

Differential Revision: https://reviews.llvm.org/D107413
2021-08-04 10:19:26 -07:00
Bradley Smith d9cc5d84e4 [AArch64][SVE] Combine bitcasts of predicate types with vector inserts/extracts of loads/stores
An insert subvector that is inserting the result of a vector predicate
sized load into undef at index 0, whose result is casted to a predicate
type, can be combined into a direct predicate load. Likewise the same
applies to extract subvector but in reverse.

The purpose of this optimization is to clean up cases that will be
introduced in a later patch where casts to/from predicate types from i8
types will use insert subvector, rather than going through memory early.

This optimization is done in SVEIntrinsicOpts rather than InstCombine to
re-introduce scalable loads as late as possible, to give other
optimizations the best chance possible to do a good job.

Differential Revision: https://reviews.llvm.org/D106549
2021-08-04 15:51:14 +00:00
Simon Wallis 9269752671 [AArch64] Fix assert AArch64TargetLowering::ReplaceNodeResults
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788

The fix is to provide missing expansions for:
  case ISD::STRICT_FP_TO_UINT:
  case ISD::STRICT_FP_TO_SINT:

A test case is provided.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D107452
2021-08-04 16:18:19 +01:00
Arthur Eubanks ad25344620 [MC][CodeGen] Emit constant pools earlier
Previously we would emit constant pool entries for ldr inline asm at the
very end of AsmPrinter::doFinalization(). However, if we're emitting
dwarf aranges, that would end all sections with aranges. Then if we have
constant pool entries to be emitted in those same sections, we'd hit an
assert that the section has already been ended.

We want to emit constant pool entries before emitting dwarf aranges.
This patch splits out arm32/64's constant pool entry emission into its
own MCTargetStreamer virtual method.

Fixes PR51208

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D107314
2021-08-03 20:55:31 -07:00
Jessica Paquette 5643736378 [AArch64][GlobalISel] Widen G_SELECT before clamping it
This allows us to handle the s88 G_SELECTS:

https://godbolt.org/z/5s18M4erY

Weird types like this can result in weird merges.

Widening to s128 first and then clamping down avoids that situation.

Differential Revision: https://reviews.llvm.org/D107415
2021-08-03 18:31:17 -07:00
David Green bd07c2e266 [AArch64] Prefer fmov over orr v.16b when copying f32/f64
This changes the lowering of f32 and f64 COPY from a 128bit vector ORR to
a fmov of the appropriate type. At least on some CPU's with 64bit NEON
data paths this is expected to be faster, and shouldn't be slower on any
CPU that treats fmov as a register rename.

Differential Revision: https://reviews.llvm.org/D106365
2021-08-03 17:25:40 +01:00
Jason Molenda 0d8cd4e2d5 [AArch64InstPrinter] Change printAddSubImm to comment imm value when shifted
Add a comment when there is a shifted value,
    add x9, x0, #291, lsl #12 ; =1191936
but not when the immediate value is unshifted,
    subs x9, x0, #256 ; =256
when the comment adds nothing additional to the reader.

Differential Revision: https://reviews.llvm.org/D107196
2021-08-03 02:28:46 -07:00
Cullen Rhodes a02bbeeae7 [AArch64][AsmParser] NFC: Use helpers in matrix tile list parsing 2021-08-03 08:13:01 +00:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Jessica Paquette bd13c8e610 [AArch64][GlobalISel] Emit extloads for ZExt/SExt values in assignValueToAddress
When a value is expected to be extended, we should emit an extended load rather
than a normal G_LOAD.

Add checklines to arm64-abi.ll which show that we now emit the correct loads.

For ease of comparison: https://godbolt.org/z/8WvY6EfdE

Differential Revision: https://reviews.llvm.org/D107313
2021-08-02 14:48:44 -07:00
Irina Dobrescu b01417d3c5 [AArch64] Optimise min/max lowering in ISel
Differential Revision: https://reviews.llvm.org/D106561
2021-08-02 13:40:21 +01:00
Cullen Rhodes 7ed0120d84 [AArch64][AsmParser] NFC: Parser.Lex() -> Lex()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D107146
2021-08-02 09:48:41 +00:00
Eli Friedman bdd55b2f18 Fix the default alignment of i1 vectors.
Currently, the default alignment is much larger than the actual size of
the vector in memory.  Fix this to use a sane default.

For SVE, temporarily remove lowering of load/store operations for
predicates with less than 16 elements. The layout the backend was
assuming for SVE predicates with less than 16 elements doesn't agree
with the frontend. More work probably needs to be done here.

This change is, strictly speaking, not backwards-compatible at the
bitcode level. But probably nobody is actually depending on that; i1
vectors in memory are rare, and the code that does use them probably
ends up forcing the alignment to something sane anyway.  If we think
this is a concern, I can restrict this to scalable vectors for now
(where it's actually causing issues for me at the moment).

Differential Revision: https://reviews.llvm.org/D88994
2021-07-31 14:09:59 -07:00
Matt Arsenault ebc17a0d68 GlobalISel: Scalarize unaligned vector stores
This has the same problems and limitations as the load path.
2021-07-31 10:37:15 -04:00
Alexandros Lamprineas 7d940432c4 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-07-31 09:51:28 +01:00
Cullen Rhodes 3a349d2269 [AArch64][SME] Introduce feature for streaming mode
The Scalable Matrix Extension (SME) introduces a new execution mode
called Streaming SVE mode. In streaming mode a substantial subset of the
SVE and SVE2 instruction set is available, along with new outer product,
load, store, extract and insert instructions that operate on the new
architectural register state for the matrix.

To support streaming mode this patch introduces a new subtarget feature
+streaming-sve. If enabled, the subset of SVE(2) instructions are
available. The existing behaviour for SVE(2) remains unchanged, the
subset of instructions that are legal in streaming mode are enabled if
either +sve[2] or +streaming-sve is specified. Instructions that are
illegal in streaming mode remain predicated on +sve[2].

The SME target feature has been updated to imply +streaming-sve rather
than +sve.

The following changes are made to the SVE(2) tests:
  * For instructions that are legal in streaming mode:
    - added RUN line to verify +streaming-sve enables the instruction.
    - updated diagnostic to 'instruction requires: streaming-sve or sve'.
  * For instructions that are illegal in streaming-mode:
    - added RUN line to verify +streaming-sve does not enable the
      instruction.

SVE(2) instructions that are legal in streaming mode have:

  if !HaveSVE[2]() && !HaveSME() then UNDEFINED;

at the top of the pseudocode in the XML.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D106272
2021-07-30 07:30:45 +00:00
Adrian Prantl c5d84d2eb3 GlobalISel/AArch64: don't optimize away redundant branches at -O0
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.

The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:

public func f(b: Bool) {
  guard b else {
    return // I would like to set a breakpoint here.
  }
  ...
}

The patch modifies two places in GlobalISEL: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.

Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.

rdar://79515454

This tenatively reapplies the patch without modifications, the LLDB
test that has blocked this from landing previously has since been
modified to hopefully no longer be sensitive to this change.

Differential Revision: https://reviews.llvm.org/D105238
2021-07-29 16:04:22 -07:00
Bradley Smith 191831e380 [AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter, (e.g.. p0.b rather than p0.d).

Fixes PR51182.

Differential Revision: https://reviews.llvm.org/D106943
2021-07-29 11:22:17 +00:00
Cullen Rhodes 08d92dbbff [AArch64][AsmParser] NFC: Parser.getTok() -> getTok()
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106949
2021-07-29 10:18:54 +00:00
Amara Emerson da61ab8475 [AArch64][GlobalISel] More widenToNextPow2 changes, this time for arithmetic/bitwise ops. 2021-07-29 03:02:29 -07:00
Jessica Paquette 5a333dc5da [AArch64][GlobalISel] Improve legalization for odd-type G_LOAD
Swap the order of widening so that we widen to the next power-of-2 first when
legalizing G_LOAD.

Also, provide a minimum type for the power of 2 to disallow s2 + s1. Clamping
ought to disallow s2 and s1, but I think it's better to be explicit about the
expected minimum size.

We probably need a similar change for G_STORE, but it seems to be a bit more
finnicky. So, let's just handle G_LOAD for now.

Differential Revision: https://reviews.llvm.org/D107013
2021-07-28 17:19:14 -07:00
Jessica Paquette c0a41c3d3b [AArch64][GlobalISel] Improve legalization for odd-sized G_ICMP/G_CONSTANT
We were handing types like s88 like

1) clamp to the range
2) widen to the next power of 2

This isn't desirable because it causes an odd breakdown for types like s88.
If we widen to the next power of 2 (s128) first, then we get a clean breakdown
when we clamp back to s64.

Differential Revision: https://reviews.llvm.org/D106998
2021-07-28 15:31:33 -07:00
Amara Emerson a11d9a1f48 [AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection.
Causes a fallback because of lack of regclasses on vregs, unless its without
asserts, where we end up crashing later in codegen.
2021-07-27 12:13:56 -07:00
Cullen Rhodes 2e27c4e1f1 [AArch64][SME] Add zero instruction
This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.

  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
  zero {za1.d,za3.d,za5.d,za7.d}

The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.

  * Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing
    all eight 64-bit element tiles ZA0.D to ZA7.D.
  * Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.

The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.

  * An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and
    ZA5.D is disassembled as {ZA0.S, ZA1.S}.
  * An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and
    ZA6.D is disassembled as {ZA0.H}.
  * An all-ones immediate is disassembled as {ZA}.
  * An all-zeros immediate is disassembled as an empty list {}.

This patch adds the MatrixTileList asm operand and related parsing to support
this.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105575
2021-07-27 08:35:45 +00:00
Craig Topper 2ea9db0c49 [AArch64] Fix -Wparentheses warning with gcc 5.4. NFC 2021-07-26 21:08:56 -07:00
Jon Roelofs f2e8e46d78 Revert "[AArch64][GlobalISel] Legalize ctpop s128"
This reverts commit 97e95fea53.

It broke test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll. Not sure why I didn't see that.
2021-07-26 17:06:43 -07:00
Jon Roelofs 97e95fea53 [AArch64][GlobalISel] Legalize ctpop s128
Differential revision: https://reviews.llvm.org/D106494
2021-07-26 16:33:50 -07:00
Amara Emerson 172051a1f4 [AArch64][GlobalISel] Add identity combines to post-legal combiner.
We see some shifts of zero emitted during legalization.

Differential Revision: https://reviews.llvm.org/D106816
2021-07-26 15:17:11 -07:00
Amara Emerson c658b472f3 [GlobalISel] Add a constant folding combine.
Use it AArch64 post-legal combiner. These don't always get folded because when
the instructions are created the constants are obscured by artifacts.

Differential Revision: https://reviews.llvm.org/D106776
2021-07-26 14:53:33 -07:00
Amara Emerson 6af8d36054 [AArch4][GlobalISel] Post-legalize combine s64 = G_MERGE s32, 0 -> G_ZEXT.
These are generated as a byproduce of legalization.

Differential Revision: https://reviews.llvm.org/D106768
2021-07-26 10:58:04 -07:00
Amara Emerson 0d41d21929 [AArch64][GlobalISel] Enable some select combines after legalization.
The legalizer generates selects for some operations, which can have constant
condition values, resulting in lots of dead code if it's not folded away.

Differential Revision: https://reviews.llvm.org/D106762
2021-07-26 10:40:32 -07:00
Amara Emerson dec34104bf [GlobalISel] Add combine for merge(unmerge) and use AArch64 postlegal-combiner.
Differential Revision: https://reviews.llvm.org/D106761
2021-07-26 10:37:31 -07:00
Paul Walker 3b77e2737c [SVE] Use reg+reg addressing mode for immediate offsets.
For reg+imm SVE addressing mode imm is implictly scaled by VL,
making them impractical for truely immediate offsets.  However, if
the offset can be unscaled based on the storage element type we
can use the reg+reg SVE addressing mode and thus either reduce the
number of generate add instructions or replace them with a mov
instruction that can be hoisted from the hot code path.

Differential Revision: https://reviews.llvm.org/D106744
2021-07-26 16:24:16 +01:00
Bradley Smith 81eafb8a37 [AArch64][SVE] Break false dependencies for inactive lanes of unary operations
Differential Revision: https://reviews.llvm.org/D105889
2021-07-26 15:01:21 +00:00
Tim Northover a487a49acc AArch64: support i128 (& larger) returns in GlobalISel 2021-07-26 14:16:35 +01:00
Caroline Concatto bf28111ebd [AArch65][SVE] Remove vector_splice from AddedComplexity pattern
The pattern for vector_splice with Index equal or bigger than
zero was misplaced in the AddedComplexity = 1 pattern in the AArch64
tablegen file. This patch fixes it by removing vector_splice pattern
from inside AddedComplexity = 1.
2021-07-26 13:35:51 +01:00
Caroline Concatto 0bfc26e3a4 [SVE][AArch64] Improve code generation for vector_splice for Imm > 0
This patch implements vector_splice in tablegen for all cases when the
Immediate is positive and lower than the known minimum value of
a scalable vector.
Vector_splice can be implemented using SVE instruction EXT.
For instance :
    @llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
    @llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
        EXT  Vector_1, Vector_2, Imm              // Vector_1 = B, C, D + Vector_2 = E

Depends on D105633

Differential Revision: https://reviews.llvm.org/D106273
2021-07-26 11:45:46 +01:00
Caroline Concatto 73e4e9cd00 [AArch64][SVE] Improve code generation for vector_splice for Imm == -1
This patch implements vector_splice in tablegen for:
  a) when the immediate is equal to -1 (Imm==1) and uses:
       INSR  +  LASTB
For instance :
@llvm.experimental.vector.splice(Vector_1, Vector_2, -1)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <D, E, F, G>
    LAST   RegLast, Vector_1                 // RegLast = D
    INSR   Res, (Vector_1 >> 1), RegLast     // Res = D + E, F, G

Differential Revision: https://reviews.llvm.org/D105633
2021-07-26 11:25:01 +01:00
Cullen Rhodes e6ff9179ce [AArch64][AsmParser] NFC: Parser.getTok().getLoc() -> getLoc()
Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106635
2021-07-26 09:36:34 +00:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
Kyungwoo Lee 6530ea4095 [AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog
The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760
2021-07-25 10:51:11 -07:00
Amara Emerson acbc0c5f0e [AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping.
For types like s96, we don't want to clamp to s64, we want to first widen to
s128 and then narrow it. Otherwise we end up with impossible to legalize types.
2021-07-24 15:50:43 -07:00
Benjamin Kramer dd70cd089a [llvm][sve] Silence unused variable warning in Release builds. NFC 2021-07-23 16:16:35 +02:00
David Truby 1528a4d400 [llvm][sve] Lowering for VLS truncating stores
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.

Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.

Differential Revision: https://reviews.llvm.org/D104471
2021-07-23 14:04:55 +01:00
David Green 38986c6782 [AArch64] Add worst case shuffle costs
This adds some missing single source shuffle costs for AArch64, of i16
and i8 vectors. v4i16 are the same as v4i32 with a worse case cost of 3
coming from the perfect shuffle tables. The larger vector sizes expand
into a constant pool, plus a load (and adrp) and a tbl. I arbitrarily
chose 8 for the cost to be expensive but not too expensive.

Differential Revision: https://reviews.llvm.org/D106241
2021-07-23 09:01:58 +01:00
Cullen Rhodes fde7550094 [AArch64][AsmParser] NFC: when creating a token IsSuffix=false should be default
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106568
2021-07-23 06:36:06 +00:00
Vitaly Buka 44ba8c691c [NFC][asan] Always pass Dominator Trees into forAllReachableExits 2021-07-22 18:01:38 -07:00
David Green c9cebda772 [AArch64] Adjust the cost of integer sum reductions
This changes the cost to (LT.first-1) * cost(add) + 2, where the cost of
an add is assumed to be 1. This brings it inline with the other
reductions.

Differential Revision: https://reviews.llvm.org/D106240
2021-07-22 18:19:54 +01:00
Cullen Rhodes 00e87e1c5b [AArch64][SME] Improve diagnostic for vector select register
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D106540
2021-07-22 13:46:40 +00:00
Jessica Paquette c75a2bbe08 [AArch64][GlobalISel] Change | -> || in an if
I wrote the wrong type of OR by mistake.
2021-07-21 14:57:31 -07:00
Jessica Paquette d0af732bd0 [AArch64][GlobalISel] Widen s2 and s4 G_IMPLICIT_DEF + G_FREEZE
These had

```
.clampScalar(0, s1, 64)
.widenScalarToNextPow2(0, 8)
```

If you have s2 or s4, then `widenScalarToNextPow2` does nothing.

This changes the `widenScalarToNextPow2` rule to use s8 as the minimum type
instead, allowing us to correctly widen s2 and s4.

This does not impact s1, since it's marked as legal already.

Differential Revision: https://reviews.llvm.org/D106413
2021-07-21 12:59:20 -07:00
Eli Friedman 0ca46a1757 [SelectionDAG] Fix the representation of ISD::STEP_VECTOR.
The existing rule about the operand type is strange.  Instead, just say
the operand is a TargetConstant with the right width.  (Legalization
ignores TargetConstants, so it doesn't matter if that width is legal.)

Highlights:

1. I had to substantially rewrite the AArch64 isel patterns to expect a
TargetConstant.  Nothing too exotic, but maybe a little hairy. Maybe
worth considering a target-specific node with some dagcombines instead
of this complicated nest of isel patterns.
2. Our behavior on RV32 for vectors of i64 has changed slightly. In
particular, we correctly preserve the width of the arithmetic through
legalization.  This changes the DAG a bit. Maybe room for
improvement here.
3. I explicitly defined the behavior around overflow. This is necessary
to make the DAGCombine transforms legal, and I don't think it causes any
practical issues.

Differential Revision: https://reviews.llvm.org/D105673
2021-07-21 10:58:40 -07:00
Cullen Rhodes 008c755d76 [AArch64][SME] Support .arch and .arch_extension assembler directives
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105566
2021-07-21 08:40:27 +00:00
Tim Northover 291e0daa6e AArch64: support 8 & 16-bit atomic operations in GlobalISel
We have SelectionDAG patterns for 8 & 16-bit atomic operations, but they
assume the value types will have been legalized to 32-bits. So this adds
the ability to widen them to both AArch64 & generic GISel
infrastructure.
2021-07-21 09:35:14 +01:00
Cullen Rhodes 2d80bbd939 [AArch64][SME] Add mova instructions
This patch adds the mova instruction to insert/extract an SVE vector
register to/from a ZA tile vector.

The preferred MOV aliases are also implemented.

Depends on D105572.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm, CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105574
2021-07-21 08:20:01 +00:00
Cullen Rhodes 6c32cfe85c [AArch64][SME] Add ldr and str instructions
The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105573
2021-07-21 08:17:13 +00:00
Jon Roelofs 75187aa352 [AArch64][GlobalISel] Legalize ctpop for v2s64, v2s32, v4s32, v4s16, v8s16
https://llvm.godbolt.org/z/nTTK6M5qe

Differential revision: https://reviews.llvm.org/D106388
2021-07-20 15:37:56 -07:00
Jessica Paquette 8f54ebd51d [AArch64][GlobalISel] Select llvm.aarch64.neon.st2 intrinsics
Add manual selection code similar to the code in AArch64ISelDAGToDAG, and add
`createTuple` helpers similar to the code there as well.

This accounted for around 111 fallbacks while building clang for AArch64 with
GlobalISel.

This also should make it easy to add selection code for other store
intrinsics.

As a minor cleanup, this uses `createQTuple` in the other place where we use
REG_SEQUENCE.

Differential Revision: https://reviews.llvm.org/D106332
2021-07-20 13:23:46 -07:00
Eli Friedman 664a1fd9f0 [AArch64] Use the CMP_SWAP_128 variants added in 843c6140.
Accidentally forgot to flip the opcode... and I didn't notice because it
was working fine for the GlobalISel.
2021-07-20 13:23:27 -07:00
Fangrui Song 0c0549fbb3 [AArch64] Delete unused Opcode after D106039 2021-07-20 12:51:44 -07:00
Eli Friedman 843c614058 [AArch64] Fix i128 cmpxchg using ldxp/stxp.
Basically two parts to this fix:

1. Stop using AtomicExpand to expand cmpxchg i128
2. Fix AArch64ExpandPseudoInsts to use a correct expansion.

From ARM architecture reference:

To atomically load two 64-bit quantities, perform a Load-Exclusive
pair/Store-Exclusive pair sequence of reading and writing the same value
for which the Store-Exclusive pair succeeds, and use the read values
from the Load-Exclusive pair.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51102

Differential Revision: https://reviews.llvm.org/D106039
2021-07-20 12:38:12 -07:00
Bradley Smith 191f9fa5d2 [AArch64][SVE] Move instcombine like transforms out of SVEIntrinsicOpts
Instead move them to the instcombine that happens in AArch64TargetTransformInfo.

Differential Revision: https://reviews.llvm.org/D106144
2021-07-20 14:17:30 +00:00
Sander de Smalen eb1a5120b8 [AArch64][SVE][InstCombine] last{a,b} of a splat vector
Replace last{a,b}(splat(X)) with X, irrespective of the predicate.

Patch by/Committing on behalf of: Usman Nadeem (mnadeem)

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D105520
2021-07-20 09:44:43 +01:00
Cullen Rhodes 15af3aaa2e [AArch64][SME] Add system registers and related instructions
This patch adds the new system registers introduced in SME:

  - ID_AA64SMFR0_EL1 (ro) SME feature identifier.
  - SMCR_ELx (r/w) streaming mode control register for configuring
    effective SVE Streaming SVE Vector length when the PE is in
    Streaming SVE mode.
  - SVCR (r/w) streaming vector control register, visible at all
    exception levels. Provides access to PSTATE.SM and PSTATE.ZA
    using MSR and MRS instructions.
  - SMPRI_EL1 (r/w) streaming mode execution priority register.
  - SMPRIMAP_EL2 (r/w) streaming mode priority mapping register.
  - SMIDR_EL1 (ro) streaming mode identification register.
  - TPIDR2_EL0 (r/w) for use by SME software to manage per-thread
    SME context.
  - MPAMSM_EL1 (r/w) MPAM (v8.4) streaming mode register, for
    labelling memory accesses performed in streaming mode.

Also added in this patch are the SME mode change instructions.
Three MSR immediate instructions are implemented to set or clear
PSTATE.SM, PSTATE.ZA, or both respectively:

  - MSR SVCRSM, #<imm1>
  - MSR SVCRZA, #<imm1>
  - MSR SVCRSMZA, #<imm1>

The following smstart/smstop aliases are also implemented for
convenience:

  smstart    -> MSR SVCRSMZA, #1
  smstart sm -> MSR SVCRSM,   #1
  smstart za -> MSR SVCRZA,   #1

  smstop     -> MSR SVCRSMZA, #0
  smstop sm  -> MSR SVCRSM,   #0
  smstop za  -> MSR SVCRZA,   #0

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105576
2021-07-20 08:06:26 +00:00
Amara Emerson 56a6686e0c [AArch64][GlobalISel] Don't form truncstores in postlegalizer-lowering for s128.
We don't support truncating s128 stores, so don't form them.
2021-07-20 00:04:34 -07:00
Matt Arsenault 30fa074c0a AArch64/GlobalISel: Preserve memory types 2021-07-19 20:21:05 -04:00
Amy Huang fd972bb9fd Revert "[llvm][sve] Lowering for VLS truncating stores" because it
causes a seg fault (see https://reviews.llvm.org/D104471).

This reverts commit c305557acd.
2021-07-19 11:03:33 -07:00
David Green 5561ad8b36 [ARM] Remove PromotedBitwiseVT for NEON types
This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32,
treating them the same as the AArch64 and MVE backends where we just add
the relevant patterns for each legal type. This prevents a lot of
bitcasts from being added to the DAG, which have the potential to make
optimizations more difficult. It does mean adding extra patterns, and
some codegen can change due to the types now being legal, not promoted.

Differential Revision: https://reviews.llvm.org/D105588
2021-07-19 16:36:33 +01:00
Matt Arsenault e574fd9d52 AArch64/GlobalISel: Cleanup unnecessary size checks in call lowering
The CCValAssign types should now be accurate, so these are no longer
necessary.
2021-07-19 11:01:30 -04:00
Florian Mayer d23f26f0af [NFC] [MTE] helper for stack tagging lifetimes.
Reviewed By: eugenis, vitalybuka

Differential Revision: https://reviews.llvm.org/D106135
2021-07-19 11:09:16 +01:00
Cullen Rhodes f91eaa7007 [AArch64][SME] Add SVE2 instructions added in SME
This patch adds support for the following instructions:

    SCLAMP, UCLAMP, REV, DUP (predicate)

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D105577
2021-07-19 08:03:05 +00:00
Sander de Smalen 0ed0573527 [AArch64][SVE] Optimize bitcasts between unpacked half/i16 vectors.
The case for nxv2f32/nxv2i32 was already covered by D104573.
This patch builds on top of that by making the mechanism work for
nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D106138
2021-07-19 08:29:28 +01:00
Jon Roelofs 5cd63e9ec2 [AArch64][GlobalISel] Legalize bswap <2 x i16>
Differential revision: https://reviews.llvm.org/D105935
2021-07-17 15:31:15 -07:00
Eli Friedman e41e865b15 [AArch64] Prepare for changes to STEP_VECTOR.
Rewrite patterns to assume that the operand of STEP_VECTOR is a
constant. The old patterns will stop working when the operand is changed
from a Constant to a TargetConstant. (See D105673.)

Add test coverage for certain patterns that weren't exercised by
existing regression tests.

Differential Revision: https://reviews.llvm.org/D105847
2021-07-17 14:13:41 -07:00
Nikita Popov 6d3e7c783b [OpaquePtr] Remove uses of CreateConstGEP1_32() without element type
Remove uses of to-be-deprecated API. I've fallen back to calling
getPointerElementType() in some cases where the correct type wasn't
immediately obvious to me.
2021-07-17 18:32:36 +02:00
Nicholas Guy 9769535efd [AArch64] Update Cortex-A55 SchedModel to improve LDP scheduling
Specifying the latencies of specific LDP variants appears to improve
performance almost universally.

Differential Revision: https://reviews.llvm.org/D105882
2021-07-16 12:00:57 +01:00
Cullen Rhodes 99eb96f031 [AArch64][SME] Add load and store instructions
This patch adds support for following contiguous load and store
instructions:

  * LD1B, LD1H, LD1W, LD1D, LD1Q
  * ST1B, ST1H, ST1W, ST1D, ST1Q

A new register class and operand is added for the 32-bit vector select
register W12-W15. The differences in the following tests which have been
re-generated are caused by the introduction of this register class:

  * llvm/test/CodeGen/AArch64/GlobalISel/irtranslator-inline-asm.ll
  * llvm/test/CodeGen/AArch64/GlobalISel/regbank-inlineasm.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming-reserved-regs.mir
  * llvm/test/CodeGen/AArch64/stp-opt-with-renaming.mir

D88663 attempts to resolve the issue with the store pair test
differences in the AArch64 load/store optimizer.

The GlobalISel differences are caused by changes in the enum values of
register classes, tests have been updated with the new values.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: CarolineConcatto

Differential Revision: https://reviews.llvm.org/D105572
2021-07-16 10:11:10 +00:00
Matt Arsenault e91da668d0 GlobalISel: Track argument pointeriness with arg flags
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
2021-07-15 19:11:40 -04:00
Jessica Paquette 46c8e7122b [AArch64][GlobalISel] Clamp <n x p0> vecs when legalizing G_EXTRACT_VECTOR_ELT
This case was missing from G_EXTRACT_VECTOR_ELT. It's the same as for s64.

https://godbolt.org/z/Tnq4acY8z

Differential Revision: https://reviews.llvm.org/D105952
2021-07-15 14:05:28 -07:00
Simon Pilgrim 91e151476c [TTI] Consistently make getMinVectorRegisterBitWidth() methods const. NFCI.
The underlying getMinVectorRegisterBitWidth() methods are const, but it was missed in a couple of TargetTransformInfo wrappers.

Noticed while working on D103925
2021-07-15 13:27:55 +01:00
Irina Dobrescu 831ee6b0c3 [AArch64][GlobalISel] Optimise lowering for some vector types for min/max
Differential Revision: https://reviews.llvm.org/D105696
2021-07-15 11:34:32 +01:00
Cullen Rhodes dfa76933c2 [AArch64][SME] Add outer product instructions
This patch adds support for the following outer product instructions:

  * BFMOPA, BFMOPS, FMOPA, FMOPS, SMOPA, SMOPS, SUMOPA, SUMOPS, UMOPA,
    UMOPS, USMOPA, USMOPS.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105571
2021-07-15 09:51:06 +00:00
Jon Roelofs 0e49c54a8c [AArch64] Fix selection of G_UNMERGE <2 x s16>
Differential revision: https://reviews.llvm.org/D106007
2021-07-14 13:40:56 -07:00
Eli Friedman 1e30bf8621 [SelectionDAG] Add an overload of getStepVector that assumes step 1.
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).

Differential Revision: https://reviews.llvm.org/D105850
2021-07-14 11:37:01 -07:00
Sander de Smalen eac1670739 [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid.
At the moment, <vscale x 1 x eltty> are not yet fully handled by the
code-generator, so to avoid vectorizing loops with that VF, we mark the
cost for these types as invalid.
The reason for not adding a new "TTI::getMinimumScalableVF" is because
the type is supposed to be a type that can be legalized. It partially is,
although the support for these types need some more work.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D103882
2021-07-14 16:44:22 +01:00
Stephen Tozer 810e4c3c66 [DebugInfo] Correctly update dbg.values with duplicated location ops
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.

Differential Revision: https://reviews.llvm.org/D105831
2021-07-14 11:17:24 +01:00
Cullen Rhodes c08dabb0f4 [AArch64][SME] Add matrix register definitions and parsing support
SME introduces the ZA array, a new piece of architectural register state
consisting of a matrix of [SVLb x SVLb] bytes, where SVL is the
implementation defined Streaming SVE vector length and SVLb is the
number of 8-bit elements in a vector of SVL bits.

SME instructions consist of three types of matrix operands:

  * Tiles: a ZA tile is a square, two-dimensional sub-array of elements
  within the ZA array. These tiles make up the larger accumulator array
  and the granularity varies based on the element size, i.e.
    - ZAQ0..ZAQ15 (smallest tile granule)
    - ZAD0..ZAD7
    - ZAS0..ZAS3
    - ZAH0..ZAH1
    or ZAB0       (largest tile granule, single tile)
  * Tile vectors: similar to regular tiles, but have an extra 'h' or 'v'
  to tell how the vector at [reg+offset] is layed out in the tile,
  horizontally or vertically. E.g. za1h.h or za15v.q, which corresponds
  to vectors in registers ZAH1 and ZAQ15, respectively.
  * Accumulator matrix: this is the entire accumulator array ZA.

This patch adds the register classes and related operands and parsing
for SME instructions operating on the accumulator array.

The ADDHA and ADDVA instructions which operate on tiles are also added
in this patch to make some use of the code added, later patches will
make use of the other operands introduced here.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Co-authored by: Sander de Smalen (@sdesmalen)

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105570
2021-07-14 08:25:49 +00:00
Jon Roelofs 87c6bf92a9 [AArch64] rm unused subreg's 2021-07-13 18:06:31 -07:00
Jon Roelofs 6377388c32 [AArch64] Fix AArch64::dsub's size 2021-07-13 18:06:31 -07:00
Jessica Paquette 5bd7cc4f42 [AArch64][GlobalISel] Mark v2s64 -> v2p0 G_INTTOPTR as legal
Allow

```
%x:_<2 x p0> = G_INTTOPTR %y:_<2 x s64>
```

This shows up when building clang for AArch64 with GlobalISel.

Also show that we can select it.

This should match SDAG's behaviour: https://godbolt.org/z/33oqYoaYv

Differential Revision: https://reviews.llvm.org/D105944
2021-07-13 17:28:14 -07:00
Jon Roelofs eba638dbbb [AArch64][GlobalISel] Legalize load <2 x i16>
Differential revision: https://reviews.llvm.org/D105913
2021-07-13 11:12:05 -07:00
Jon Roelofs 43c7ca8e49 [AArch64][GlobalISel] Legalize store <2 x i16>
Differential revision: https://reviews.llvm.org/D105912
2021-07-13 11:12:05 -07:00
Matt Arsenault 77a608d9de GlobalISel: Remove getIntrinsicID utility function
This is redundant with a method directly on MachineInstr
2021-07-13 11:04:10 -04:00
Tim Northover 7802f62b3f AArch64: use 4-byte slots for arm64_32 pointers in a tail call 2021-07-13 11:08:59 +01:00
Jon Roelofs 6611fbc62a [AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC 2021-07-12 15:37:11 -07:00
Eli Friedman 6c04b7dd4f [AArch64] Optimize overflow checks for [s|u]mul.with.overflow.i32.
Saves one instruction for signed, uses a cheaper instruction for
unsigned.

Differential Revision: https://reviews.llvm.org/D105770
2021-07-12 15:30:42 -07:00
Benjamin Kramer 0da3573a9e [AArch64] Silence unused variable warning. NFC.
AArch64ISelLowering.cpp:15167:8: warning: unused variable 'OpCode' [-Wunused-variable]
  auto OpCode = N->getOpcode();
       ^
2021-07-12 16:01:11 +02:00