The isEssentiallyExtractHighSubvector function currently calls
getVectorNumElements on a type that in specific cases might be scalable.
Since this function only has correct behaviour at the moment on scalable
types anyway, the function can just return false when given a fixed type.
Differential Revision: https://reviews.llvm.org/D109163
When lowering a fixed length gather/scatter the index type is assumed to
be the same as the memory type, this is incorrect in cases where the
extension of the index has been folded into the addressing mode.
For now add a temporary workaround to fix the codegen faults caused by
this by preventing the removal of this extension. At a later date the
lowering for SVE gather/scatters will be redesigned to improve the way
addressing modes are handled.
As a short term side effect of this change, the addressing modes
generated for fixed length gather/scatters will not be optimal.
Differential Revision: https://reviews.llvm.org/D109145
For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D108706
This patch solely moves convert operation between SVE predicate pattern
and element number into two small functions. It's pre-commit patch for optimize
pture with known sve register width.
Differential Revision: https://reviews.llvm.org/D108705
This allows the instruction selector to realize that it can directly
broadcast the low byte of the memset value, rather than replicating
it to a 64-bit GPR before broadcasting.
This fixes PR50985.
Differential Revision: https://reviews.llvm.org/D108354
If Orig produces more than one value (rare) with different types (rarer) then
we need to make sure we check against the one that Orig actually represents,
not just the first type.
Unfortunately because of the combination of things that need to happen I wasn't
able to produce a test.
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal.
This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands.
Differential Revision: https://reviews.llvm.org/D107597
This patch enables extending loads for fixed length SVE code generation.
There is a slight regression here in the mulh tests; since these tests
load the parameter and then extend it these are treated as extending
loads which are merged, preventing the mulh instruction from being
generated. As this affects scalable SVE codegen as well this should be
addressed in a separate patch.
Reviewed By: bsmith
Differential Revision: https://reviews.llvm.org/D107057
This was checking extends as shuffles, where as we should be checking
the operands. This helps sink the shuffles, creating more addl/subl
instructions.
Differential Revision: https://reviews.llvm.org/D107623
I was originally going to try to implement this in target-independent
code, but it's actually sort of tricky to generate the correct sequence
for vectors like nxv2f32. So just stick this in target-specific code,
at least for now.
Differential Revision: https://reviews.llvm.org/D107608
The patterns for fixed length gather/scatter with 32-bit offsets and
64-bit memory type are slightly different that the rest of the patterns,
as such the lowering needs to be slightly different to ensure the
correct types are used.
Differential Revision: https://reviews.llvm.org/D107576
IR typically creates INSERT_SUBVECTOR patterns as a widening of the subvector with undefs to pad to the destination size, followed by a shuffle for the actual insertion - SelectionDAGBuilder has to do something similar for shuffles when source/destination vectors are different sizes.
This combine attempts to recognize these patterns by looking for a shuffle of a subvector (from a CONCAT_VECTORS) that starts at a modulo of its size into an otherwise identity shuffle of the base vector.
This uncovered a couple of target-specific issues as we haven't often created INSERT_SUBVECTOR nodes in generic code - aarch64 could only handle insertions into the bottom of undefs (i.e. a vector widening), and x86-avx512 vXi1 insertion wasn't keeping track of undef elements in the base vector.
Fixes PR50053
Differential Revision: https://reviews.llvm.org/D107068
Don't know how to custom expand this
UNREACHABLE executed at llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:16788
The fix is to provide missing expansions for:
case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_TO_SINT:
A test case is provided.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107452
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.
Differential Revision: https://reviews.llvm.org/D94097
An incorrect mask type when lowering an SVE gather/scatter was causing
a codegen fault which manifested as the incorrect predicate size being
used for an SVE gather/scatter, (e.g.. p0.b rather than p0.d).
Fixes PR51182.
Differential Revision: https://reviews.llvm.org/D106943
This patch implements vector_splice in tablegen for all cases when the
Immediate is positive and lower than the known minimum value of
a scalable vector.
Vector_splice can be implemented using SVE instruction EXT.
For instance :
@llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E
Depends on D105633
Differential Revision: https://reviews.llvm.org/D106273
This patch implements vector_splice in tablegen for:
a) when the immediate is equal to -1 (Imm==1) and uses:
INSR + LASTB
For instance :
@llvm.experimental.vector.splice(Vector_1, Vector_2, -1)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <D, E, F, G>
LAST RegLast, Vector_1 // RegLast = D
INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G
Differential Revision: https://reviews.llvm.org/D105633
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.
Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.
Differential Revision: https://reviews.llvm.org/D104471
Basically two parts to this fix:
1. Stop using AtomicExpand to expand cmpxchg i128
2. Fix AArch64ExpandPseudoInsts to use a correct expansion.
From ARM architecture reference:
To atomically load two 64-bit quantities, perform a Load-Exclusive
pair/Store-Exclusive pair sequence of reading and writing the same value
for which the Store-Exclusive pair succeeds, and use the read values
from the Load-Exclusive pair.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51102
Differential Revision: https://reviews.llvm.org/D106039
This removes the promotion of NEON AND, OR and XOR nodes to v2i32/v4i32,
treating them the same as the AArch64 and MVE backends where we just add
the relevant patterns for each legal type. This prevents a lot of
bitcasts from being added to the DAG, which have the potential to make
optimizations more difficult. It does mean adding extra patterns, and
some codegen can change due to the types now being legal, not promoted.
Differential Revision: https://reviews.llvm.org/D105588
The case for nxv2f32/nxv2i32 was already covered by D104573.
This patch builds on top of that by making the mechanism work for
nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D106138
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).
Differential Revision: https://reviews.llvm.org/D105850
This adds custom lowering for truncating stores when operating on
fixed length vectors in SVE. It also includes a DAG combine to
fold extends followed by truncating stores into non-truncating
stores in order to prevent this pattern appearing once truncating
stores are supported.
Currently truncating stores are not used in certain cases where
the size of the vector is larger than the target vector width.
Differential Revision: https://reviews.llvm.org/D104471
Additionally, lower the floating point compare SVE intrinsics to
SETCC_MERGE_ZERO ISD nodes to avoid duplicating ISel patterns.
Differential Revision: https://reviews.llvm.org/D105486
This patch adds a new ShuffleKind SK_Splice and then handle the cost in
getShuffleCost, as in experimental.vector.reverse.
Differential Revision: https://reviews.llvm.org/D104630
Improve codegen when lowering the common vector shuffle case from the
vectorizer (op1[last]:op2[0:last-1]). This patch only handles this
common case as it is difficult to handle this more generally when using
fixed length vectors, due to being unable to use the SVE ext instruction.
Differential Revision: https://reviews.llvm.org/D105289