Commit Graph

1262 Commits

Author SHA1 Message Date
David Green 0cd8063960 [AArch64] Generate CCMP from And CSel
LLVM has a couple of ways of producing ccmp: either from chains in isel
or from a later ifcvt-style pass. This adds a simple DAG combine to
capture more cases, converting and(csel(0, 1, cc0), csel(0, 1, cc1))
into a csel(ccmp(.., cc0)), depending on cc1 (a SUBS in this case).

Differential Revision: https://reviews.llvm.org/D118327
2022-02-02 13:48:16 +00:00
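
As a rough illustration (a hypothetical fragment, not taken from the patch), C code of the following shape tends to produce the and(csel(0, 1, cc0), csel(0, 1, cc1)) pattern the new combine targets, since each comparison materialises a boolean that is then ANDed:

int both_hold(int a, int b, int c, int d)
{
	/* Two independent compares whose boolean results are ANDed; with the
	   combine the second compare can become a ccmp feeding a single
	   cset/csel, rather than two csets plus an and. */
	return (a == b) & (c < d);
}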
David Sherwood 11cf807796 [AArch64][CodeGen] Always use SVE (when enabled) to lower integer divides
This patch adds custom lowering support for ISD::SDIV and ISD::UDIV
when SVE is enabled, regardless of the minimum SVE vector length. We do
this because NEON simply does not have vector integer divide support, so
we want to take advantage of these instructions in SVE.

As part of this patch I've also simplified LowerToPredicatedOp to avoid
re-asking the same question about whether we should be using SVE for
fixed-length vectors. Once we've made the decision to call
LowerToPredicatedOp, we simply assert that SVE should be used.

I've updated the 128-bit min SVE vector bits tests here:

  CodeGen/AArch64/sve-fixed-length-int-div.ll
  CodeGen/AArch64/sve-fixed-length-int-rem.ll

Differential Revision: https://reviews.llvm.org/D117871
2022-02-02 09:46:02 +00:00
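
For context, a hypothetical loop like the one below (names are illustrative) has no direct NEON lowering for the vector divide, so when it is vectorised to fixed-length vectors the SVE divide instructions are the natural target; this patch makes that lowering apply regardless of the minimum SVE vector length:

void sdiv_arrays(int *restrict a, const int *restrict b, int n)
{
	for (int i = 0; i < n; ++i)
		a[i] = a[i] / b[i];	/* NEON has no vector integer divide; SVE sdiv/udiv can be used */
}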
Cullen Rhodes 16d464a291 [AArch64][SVE] NFC: tidy up isel lowering
Whilst adding legal types <-> register classes for Streaming SVE in
D118561, I noticed that the hasSVE predication block sets operation actions
for opcodes that may not be legal in Streaming SVE. Move these operations to
the later hasSVE block which has loops over the same types.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118560
2022-02-02 09:02:20 +00:00
Nikita Popov a1dc6d4b83 [AArch64] Do not use ABI alignment for mops.memset.tag
Pointer element types do not imply that the pointer is ABI aligned.
We should either use an explicit align attribute here, or fall back
to an alignment of 1. This fixes a new element type access
introduced in D117764.

I don't think this makes any practical difference though, as the
lowering does not depend on alignment.

Differential Revision: https://reviews.llvm.org/D118681
2022-02-01 14:37:53 +01:00
Fangrui Song 51ed14d224 [AArch64] Temporarily use getPointerElementType to fix -Wdeprecated-declarations. NFC 2022-01-31 19:16:11 -08:00
tyb0807 5aa08bf708 [AArch64][SelectionDAG] CodeGen for Armv8.8/9.3 MOPS
New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc.
Each intrinsic is translated to one of these in SelectionDAGBuilder
via EmitTargetCodeForMOPS.

A custom lowering routine for INTRINSIC_W_CHAIN is added to handle
llvm.aarch64.mops.memset.tag. This takes a separate path from the common
intrinsics but ultimately ends up in the same EmitMOPS().

This is part 4/4 of a series of patches split from
https://reviews.llvm.org/D117405 to facilitate reviewing.

Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu.

Differential Revision: https://reviews.llvm.org/D117764
2022-01-31 20:56:27 +00:00
Florian Hahn 23091f7d50 [AArch64] Bail out for float operands in SetCC optimization.
The optimization added in D118139 causes a crash on the added test case
while trying to zero extend a vector of floats.

Fix the crash by bailing out for floating point operands.

Reviewed By: DavidTruby

Differential Revision: https://reviews.llvm.org/D118615
2022-01-31 18:20:47 +00:00
Paul Walker 30efee764d [SVE] Remove AArch64ISD::PFALSE.
AArch64ISD::PFALSE does not provide any value; in fact it can
prevent common combines from firing. We only needed to lower
to PFALSE until ISD::SPLAT_VECTOR became generally available.

Differential Revision: https://reviews.llvm.org/D118469
2022-01-29 11:31:00 +00:00
Paul Walker 3bc876d0a3 [AArch64] Add isel for bitcasting between bfloat and half types.
Differential Revision: https://reviews.llvm.org/D118420
2022-01-29 11:26:13 +00:00
Ahmed Bougacha 634ca7349d [ObjCARC] Require the function argument in the clang.arc.attachedcall bundle.
Currently, the clang.arc.attachedcall bundle takes an optional function
argument.  Depending on whether the argument is present, calls with this
bundle have the following semantics:

- on x86, with the argument present, the call is lowered to:
    call _target
    mov rax, rdi
    call _objc_retainAutoreleasedReturnValue

- on AArch64, without the argument, the call is lowered to:
    bl _target
    mov x29, x29

  and the objc runtime call is expected to be emitted separately.

That's because, on x86, the objc runtime checks for both the mov and
the call, and treats the combination as the ARC autorelease elision
marker.

But on AArch64, it only checks for the dedicated NOP marker, as that's
historically been sufficiently unique.  Thanks to that, the runtime call
wasn't required to be adjacent to the NOP marker, so it wasn't emitted
as part of the bundle sequence.

This patch unifies both architectures: on AArch64, we now emit all
3 instructions for the bundle.  This guarantees that the runtime call
is adjacent to the marker in the sequence, and that's information the
runtime can use to further optimize this.

This helps simplify some of the handling, in particular
BundledRetainClaimRVs, which no longer needs to know whether the bundle
is sufficient or not: it should now always be.

Note that this does not include an AutoUpgrade for the nullary bundles,
as they are only produced in ObjCContract as part of the obj/asm emission
pipeline, and are not expected to be in bitcode.

Differential Revision: https://reviews.llvm.org/D118214
2022-01-28 12:41:45 -08:00
David Truby 81bd67e18a [AArch64][SVE][VLS] Move extends into arguments of comparisons
When a comparison is extended and it would be free to extend the
arguments to that comparison, we can propagate the extend into those arguments.
This prevents extra instructions from being generated to extend the result of the
comparison, which is not free to extend.

This is a resubmission of D116812 with fixes that need another review.

Differential Revision: https://reviews.llvm.org/D118139
2022-01-28 14:16:08 +00:00
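
As a sketch of the kind of pattern this helps (illustrative only, not one of the patch's tests): the narrow comparison below produces a result that is then widened to i32, and when extending the comparison's inputs is free the extend can be moved onto them instead of the result:

void cmp_widen(const signed char *a, const signed char *b, int *out, int n)
{
	for (int i = 0; i < n; ++i)
		out[i] = a[i] < b[i];	/* i8 compare whose result is widened to i32 */
}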
Sander de Smalen af1c8f0d14 [AArch64][SVE] Folds VSELECT if the predicate is all active.
This adds the following changes:

* Fold: vselect(<all active predicate>, x, y) => x
* Extend isAllActivePredicate to take vscale_range into account, e.g.
  isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true.
  isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true.

Differential Revision: https://reviews.llvm.org/D118147
2022-01-27 15:58:56 +00:00
Sander de Smalen 417a75c6d0 [AArch64][SVE] Avoid using ptrue for ptest in VECREDUCE_OR.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118145
2022-01-27 11:44:49 +00:00
Sander de Smalen c9da81d997 [AArch64][SVE] Implement missing lowering for extract_subvector for predicates.
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D118057
2022-01-27 11:01:11 +00:00
Sander de Smalen d58757e522 [AArch64][SVE] Implement PFALSE with explicit AArch64ISD node.
The ISel patterns for PFALSE help recognise the instructions as being
free of side-effects, which helps MachineCSE remove redundant
PFALSE instructions.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D118054
2022-01-27 10:30:13 +00:00
Benjamin Kramer f15014ff54 Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17"
This reverts commit ef82063207.

- It conflicts with the existing llvm::size in STLExtras, which will now
  never be called.
- Calling it without llvm:: breaks C++17 compat
2022-01-26 16:55:53 +01:00
serge-sans-paille ef82063207 Rename llvm::array_lengthof into llvm::size to match std::size from C++17
As a consequence, move llvm::array_lengthof from STLExtras.h to
STLForwardCompat.h (which is included by STLExtras.h so no build
breakage expected).
2022-01-26 16:17:45 +01:00
Maciej Gabka c5263cd518 Restrict performPostLD1Combine to 64 and 128 bit vectors
When wider vectors are used, for example fixed-width SVE,
there are no patterns to select AArch64ISD::LD1LANEpost
nodes, so we should do an early exit.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D117674
2022-01-26 09:57:44 +00:00
Paul Walker d95cf1f6cf [SVE] Enable ISD::ABDS/U ISel for scalable vectors.
NOTE: This patch also includes tests that highlight those cases
where the existing DAG combine doesn't yet work well for SVE.

Differential Revision: https://reviews.llvm.org/D117873
2022-01-25 12:14:53 +00:00
Nikita Popov aa97bc116d [NFC] Remove uses of PointerType::getElementType()
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().

This is part of D117885, in preparation for deprecating the API.
2022-01-25 09:44:52 +01:00
Peter Waller d4a6bf4d1a Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons"
This reverts commit db04d3e30b, which
causes a buildbot failure.
2022-01-20 12:01:23 +00:00
Adrian Tong b6a7ae2c5d Optimize shift and accumulate pattern in AArch64.
AArch64 supports unsigned shift right and accumulate. When we see an
unsigned shift right followed by an OR, we can turn them into a USRA
instruction, provided the operands of the OR have no common bits.

Differential Revision: https://reviews.llvm.org/D114405
2022-01-20 01:57:40 +00:00
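
A small NEON intrinsics sketch (hypothetical, assuming <arm_neon.h> and an AArch64 target) of the shape described above: the two OR operands provably share no bits, so the OR behaves like an add and the shift-plus-accumulate can be expressed as usra:

#include <arm_neon.h>

uint32x4_t or_shift(uint32x4_t acc, uint32x4_t x)
{
	uint32x4_t hi = vshlq_n_u32(acc, 29);	/* only the top 3 bits can be set */
	uint32x4_t lo = vshrq_n_u32(x, 3);	/* only the low 29 bits can be set */
	return vorrq_u32(hi, lo);		/* no common bits: OR == ADD, a usra candidate */
}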
David Truby db04d3e30b [AArch64][SVE][VLS] Move extends into arguments of comparisons
When a comparison is extended and it would be free to extend the
arguments to that comparison, we can propagate the extend into those arguments.
This prevents extra instructions from being generated to extend the result of the
comparison, which is not free to extend.

Differential Revision: https://reviews.llvm.org/D116812
2022-01-19 14:11:45 +00:00
Jim Lin d6b0734837 [NFC] Use Register instead of unsigned 2022-01-19 20:17:04 +08:00
Akshay Kumar 6f61fe7de9 [AArch64] Custom lowering of COPYSIGN to SIMD should check for NEON availability
For the following test case, clang crashes for the ARM64 architecture:
$ cat crash.c
double crash(double a, double b)
{
	return __builtin_copysign(a, b);
}

$ clang -O2 -march=armv8-a+nosimd --target=arm64 -S crash.c -o /dev/null
fatal error: error in backend: Cannot select: 0x7fae361bb4e8: v2i64 = AArch64ISD::BIT 0x7fae361bb210, 0x7fae361bb278, 0x7fae361bb480
Fix: PR51806

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D116581
2022-01-18 00:25:15 +05:30
David Sherwood 3a272d1eaf [SVE][CodeGen] Use splice instruction when lowering VECTOR_SPLICE
For certain negative indices passed to the VECTOR_SPLICE operation
we can actually directly use the SVE splice instruction by creating
the appropriate predicate. The predicate needs to be constructed in
such a way that all but the last -idx elements are false. We can do
this efficiently using a combination of 'ptrue' (with the appropriate
fixed pattern, e.g. vl1, vl2, etc.) and 'rev'. The advantage of using
these instructions to generate the predicate is that they do not set any
flags, unlike the whilelo instruction. This is critical when the splice
operation is in a loop, since we want MachineLICM to hoist the
predicate generation out of the loop.

Differential Revision: https://reviews.llvm.org/D115863
2022-01-11 11:58:17 +00:00
Jay Foad 3f3fe4a5cf [GlobalISel] Fix typo Extact to Extract in function name. NFC. 2022-01-07 11:13:35 +00:00
Nicholas Guy 13992498cd [AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops
Differential Revision: https://reviews.llvm.org/D114879
2022-01-05 12:54:31 +00:00
Paul Walker 4325fd7402 [AArch64ISelLowering] Don't look through scalable extract_subvector when optimising DUPLANE.
When constructDup is passed an extract_subvector it tries to use
extract_subvector's operand directly when creating the DUPLANE.
This is invalid when extracting from a scalable vector because the
necessary DUPLANE ISel patterns do not exist.

NOTE: This patch is an update to https://reviews.llvm.org/D110524
that originally fixed this but introduced a bug when the result
VT is 64 bits. I've restructured the code so the critical final
else block is entered when necessary.

Differential Revision: https://reviews.llvm.org/D116442
2022-01-05 11:56:59 +00:00
Andrew Wei 03dc2975d0 [AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2
Attempt to lower a shuffle as a permute instruction (zip/uzp/trn) for fixed-length SVE.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D113376
2021-12-21 18:39:09 +08:00
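
For illustration (a hypothetical example, assuming clang vector extensions and a suitable -msve-vector-bits setting so the fixed-length type is handled by SVE), an interleaving shuffle mask of this form corresponds to the zip1 permute:

typedef int v8si __attribute__((vector_size(32)));

v8si interleave_lo(v8si a, v8si b)
{
	/* Interleave the low halves of a and b: the classic zip1 pattern. */
	return __builtin_shufflevector(a, b, 0, 8, 1, 9, 2, 10, 3, 11);
}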
David Truby 7e44eb079d [AArch64][SVE] Improve code generation for VLS i1 masks
This patch partially resolves an issue for VLS code generation
where a mask is generated from a smaller width integer comparison
than the instruction using the mask requires.

Instead of sign-extending a p register by converting it to a z
register, extending that, and converting back, we just
do an unpack of the p register.

A separate issue causes the code generation to still be poor when
the mask generation would fit in a neon register, as we then use
a neon comparison operation and have to convert that to a p register.
This will be resolved in a separate patch.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D111221
2021-12-17 16:26:49 +00:00
David Truby 5c9684704d [DAG][sve] Lowering for VLS masked truncating stores
This extends the custom lowering for truncating stores on
fixed length vectors in SVE to support masked truncating stores.
It also adds a DAG combine for truncates followed by masked
stores.

Reviewed By: peterwaller-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108115
2021-12-17 15:04:45 +00:00
Andrew Wei dc7b672f96 [AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw
Attempt to lower a shuffle as a permute instruction (rev/revb/revh/revw) for fixed-length SVE.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D114960
2021-12-15 21:53:00 +08:00
John Brawn dc9f65be45 [AArch64][SVE] Fix handling of stack protection with SVE
Fix a couple of things that were causing stack protection to not work
correctly in functions that have scalable vectors on the stack:
 * Use TypeSize when determining if accesses to a variable are
   considered out-of-bounds so that the behaviour is correct for
   scalable vectors.
 * When stack protection is enabled move the stack protector location
   to the top of the SVE locals, so that any overflow in them (or the
   other locals which are below that) will be detected.

Fixes: https://github.com/llvm/llvm-project/issues/51137

Differential Revision: https://reviews.llvm.org/D111631
2021-12-14 11:30:48 +00:00
Peter Waller 921e89c59a [SVE] Only combine (fneg (fma)) => FNMLA with nsz
-(Za + Zm * Zn) != (-Za + Zm * (-Zn))
when the FMA produces a zero output (e.g. all zero inputs can produce -0
output)

Add a PatFrag to check for the presence of nsz on the fneg, and add tests
which ensure the combine does not fire in the absence of nsz.

See https://reviews.llvm.org/D90901 for a similar discussion on X86.

Differential Revision: https://reviews.llvm.org/D109525
2021-12-13 11:33:07 +00:00
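
A standalone C sketch (hypothetical, just spelling out the signed-zero corner case mentioned above) of why the rewrite is only sound under nsz:

#include <stdio.h>

int main(void)
{
	double Za = -0.0, Zm = 1.0, Zn = 0.0;
	double lhs = -(Za + Zm * Zn);		/* -(-0.0 + +0.0) = -(+0.0) = -0.0 */
	double rhs = (-Za) + Zm * (-Zn);	/* +0.0 + (-0.0) = +0.0 */
	printf("%g %g\n", lhs, rhs);		/* prints "-0 0": the signs of zero differ */
	return 0;
}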
Matt Devereau 2e585dd91a [AArch64][SVE] Lower vector.insert to predicated merged MOV
Use predicated SEL for vector.insert instead of going through memory

Differential Revision: https://reviews.llvm.org/D115259
2021-12-13 11:17:55 +00:00
Peter Waller ed43aab98d [AArch64][SVE] Fix fptrunc store for fixed len vector
Restrict duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE DAG combines to only
larger-than-NEON types, as these are the ones for which there is custom
lowering.

Update tests so that they go through memory to improve validation.

Differential Revision: https://reviews.llvm.org/D115166
2021-12-07 12:22:07 +00:00
Peter Waller a6f751c34e [AArch64][SVE] Fix ICE extracting fixedvec from scalable load
f526c600c0 had a concern raised because of an invalid typesize request
on a scalable vector, which this patch addresses.

Prevent shouldReduceLoadWidth from attempting to query the bit size, and
add a regression test in sve-extract-fixed-vector.ll.

Differential Revision: https://reviews.llvm.org/D115156
2021-12-06 16:49:43 +00:00
Matt Devereau 4244f95cc6 [AArch64][SVE] Enable bf16 vector.insert
Allow passthrough bf16 registers for vector.insert

Differential Revision: https://reviews.llvm.org/D114858
2021-12-02 12:59:19 +00:00
Bradley Smith fd9069ffce [AArch64][SVE] Duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE dag combines
By duplicating these DAG combines we can bypass the legality checks that
they do. This allows us to perform these combines on larger-than-legal
fixed types, which in turn brings the same benefits D114580 brought,
but for those larger types.

Depends on D114580

Differential Revision: https://reviews.llvm.org/D114628
2021-12-01 15:33:53 +00:00
David Green 9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi has less strict semantics
than the fptosisat, with fewer values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
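
As a hypothetical source-level example (names and types are illustrative), a clamped conversion like the following is the kind of fptosi-plus-smin/smax sequence that can become fptosi.sat:

#include <limits.h>

int clamp_to_int(float x)
{
	long long v = (long long)x;	/* fptosi to a wide integer */
	if (v > INT_MAX)		/* smin against INT_MAX */
		v = INT_MAX;
	if (v < INT_MIN)		/* smax against INT_MIN */
		v = INT_MIN;
	return (int)v;			/* the clamp saturates the convert to i32 */
}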
Hans Wennborg a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
David Green 52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi has less strict semantics
than the fptosisat, with fewer values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
David Sherwood 84364bdaab [CodeGen][AArch64] Bail out in performConcatVectorsCombine for scalable vectors
I tried to exercise the existing combine patterns in performConcatVectorsCombine
for scalable vectors and at the moment it doesn't seem possible. Parts of
the code currently assume we're dealing with fixed-width vectors with calls
to getVectorNumElements(); therefore I've decided to simply bail out early
for scalable vectors.

Added a test here to show that we don't crash when attempting to combine
truncate + concat:

  CodeGen/AArch64/concat_vector-truncate-combine.ll

Differential Revision: https://reviews.llvm.org/D114600
2021-11-29 14:26:14 +00:00
Bradley Smith 6180806632 [AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom
This allows the generic DAG combine to fold fp_extend/fp_trunc into
loads/stores, which we can then lower into an integer extending
load/truncating store plus an FP_EXTEND/FP_ROUND.

The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked
types, hence lowering them introduces an unpack/zip. By allowing these
nodes to be combined with loads/stores we make it much easier to have
this unpack/zip combined into the load/store by our custom lowering.

Differential Revision: https://reviews.llvm.org/D114580
2021-11-29 11:56:07 +00:00
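
A hypothetical fixed-length example of the code that benefits: an fp_extend fed directly by a load (or an fp_round feeding a store) can now be folded into an extending load or truncating store during lowering:

void widen_copy(double *restrict dst, const float *restrict src, int n)
{
	for (int i = 0; i < n; ++i)
		dst[i] = src[i];	/* load + fp_extend, foldable into an extending load */
}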
David Sherwood a31f4bdfe8 [CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask
In the most common cases the @llvm.get.active.lane.mask intrinsic maps directly
to the SVE whilelo instruction, which already takes overflow into account.
However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower
this immediately to a generic sequence of instructions that explicitly
take overflow into account. This makes it very difficult to then later
transform back into a single whilelo instruction. Therefore, this patch
introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if
we should lower/expand this to a sequence of generic ISD nodes, or instead
just leave it as an intrinsic for the target to lower.

You can see the significant improvement in code quality for some of the
tests in this file:

  CodeGen/AArch64/active_lane_mask.ll

Differential Revision: https://reviews.llvm.org/D114542
2021-11-29 08:08:17 +00:00
Bradley Smith eafbaca977 [AArch64][SVE] Generate ASRD instructions for power of 2 signed divides
Differential Revision: https://reviews.llvm.org/D113281
2021-11-26 11:08:27 +00:00
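
For example (illustrative only), a signed divide by a power of two in a vectorisable loop is exactly the shape that can now use asrd:

void div_by_8(int *a, int n)
{
	for (int i = 0; i < n; ++i)
		a[i] = a[i] / 8;	/* signed divide by a power of two -> asrd with SVE */
}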
Simon Pilgrim 63b1e58f07 [DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED)
If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift.

For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task.

Reapplied with a fix for the AArch64 rev patterns to match the ARM fix.

https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount)
https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount)
https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount)
https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount)

Differential Revision: https://reviews.llvm.org/D114354
2021-11-25 11:14:15 +00:00
Bradley Smith 080ef0b6a6 [AArch64][SVE] Recognize all ones mask during fixed mask generation
Differential Revision: https://reviews.llvm.org/D114431
2021-11-24 13:55:06 +00:00
David Green 760d4d03d5 [AArch64] Sink splat shuffles to lane index intrinsics
This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.

Differential Revision: https://reviews.llvm.org/D112994
2021-11-22 08:11:35 +00:00
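
A rough NEON intrinsics sketch (hypothetical, assuming <arm_neon.h>): when a dup like the one below is defined far from its user, for example hoisted out of a loop, sinking it next to the intrinsic lets ISel pick the by-lane form of the instruction instead of materialising a separate dup:

#include <arm_neon.h>

int32x4_t mla_lane(int32x4_t acc, int32x4_t a, int32x2_t c)
{
	int32x4_t s = vdupq_lane_s32(c, 1);	/* splat of lane 1 of c */
	return vmlaq_s32(acc, a, s);		/* candidate for the mla ..., v.s[1] lane form */
}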