Commit Graph

1329 Commits

Author SHA1 Message Date
David Green 60f57b3658 [AArch64] Ensure fixed point fptoi_sat has correct saturation width
D113200 introduced an error when converting FP_TO_SI_SAT with a
multiply into a fixed-point floating-point convert. The saturation
bitwidth needs to be equal to the floating-point width, or else the
routine would truncate the result as opposed to saturating it.
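
For illustration, a minimal LLVM IR sketch of the shape involved (the
scale factor and types here are hypothetical): a multiply by a power of
two feeding a saturating convert can fold to a single fixed-point fcvtzs,
but only when the saturation width equals the floating-point width:

  define i32 @fixed_convert(float %x) {
    %mul = fmul float %x, 16.0
    %sat = call i32 @llvm.fptosi.sat.i32.f32(float %mul)
    ret i32 %sat    ; i32 sat width == f32 width, so: fcvtzs w0, s0, #4
  }
  declare i32 @llvm.fptosi.sat.i32.f32(float)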

Fixes #54601
2022-03-29 10:12:44 +01:00
Shao-Ce SUN 662b9fa02c [NFC][CodeGen] Add a setTargetDAGCombine overload that takes ArrayRef
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557
2022-03-29 09:53:24 +08:00
zhongyunde c3fe025bd4 [AArch64][SelectionDAG] Refactor to support more scalable vector extending loads
Following the discussion in D120953, we should first exclude all scalable
vector extending loads and then selectively enable those which we
directly support.

This patch refactors for the above (truncating stores are not touched),
so that more scalable vector types can reduce the number of masked loads
in favour of more unpklo/hi instructions.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122281
2022-03-27 21:18:01 +08:00
David Green 693d3b7e76 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.
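
As a sketch (hypothetical function; the lane choices are arbitrary), the
kind of buildvector this targets: every lane comes from a different
source vector, which a single tbl with a constant-pool mask can build
without any GPR round-trips:

  define <8 x i8> @build_from_3(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {
    %a0 = extractelement <16 x i8> %a, i64 0
    %a1 = extractelement <16 x i8> %a, i64 1
    %b0 = extractelement <16 x i8> %b, i64 0
    %b1 = extractelement <16 x i8> %b, i64 1
    %c0 = extractelement <16 x i8> %c, i64 0
    %c1 = extractelement <16 x i8> %c, i64 1
    %v0 = insertelement <8 x i8> undef, i8 %a0, i64 0
    %v1 = insertelement <8 x i8> %v0, i8 %b0, i64 1
    %v2 = insertelement <8 x i8> %v1, i8 %c0, i64 2
    %v3 = insertelement <8 x i8> %v2, i8 %a1, i64 3
    %v4 = insertelement <8 x i8> %v3, i8 %b1, i64 4
    %v5 = insertelement <8 x i8> %v4, i8 %c1, i64 5
    ret <8 x i8> %v5
  }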

Differential Revision: https://reviews.llvm.org/D121137
2022-03-26 21:10:43 +00:00
Simon Pilgrim e699b5da44 [AArch64] isProfitableToHoist - remove nullptr test
User is dereferenced on the main codepath so the null test is likely superfluous
2022-03-25 10:27:16 +00:00
David Green 3d8d60e147 Revert "[AArch64] Lower 3 and 4 sources buildvectors to TBL"
This reverts commit ec93b28909 as problems
with it have been reported.
2022-03-25 10:03:10 +00:00
David Green ec93b28909 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.

Differential Revision: https://reviews.llvm.org/D121137
2022-03-24 10:02:33 +00:00
David Green 54bc9ad2e8 [AArch64] Make some methods static. NFC 2022-03-24 08:55:27 +00:00
David Spickett c3b98194df Reland "[llvm][AArch64] Insert "bti j" after call to setjmp"
This reverts commit edb7ba714a.

This changes BLR_BTI to take variable_ops, meaning that we can accept
a register or a label. The pattern still expects one argument, so we'll
never get more than one. Then later we can check the type of the operand
to choose BL or BLR to emit.

(This is what BLR_RVMARKER does, but I missed this detail the first time around.)

Also require NoSLSBLRMitigation which I missed in the first version.
2022-03-23 11:43:43 +00:00
David Spickett edb7ba714a Revert "[llvm][AArch64] Insert "bti j" after call to setjmp"
This reverts commit eb5ecbbcbb
due to failures on buildbots with expensive checks enabled.
2022-03-23 10:43:20 +00:00
David Spickett eb5ecbbcbb [llvm][AArch64] Insert "bti j" after call to setjmp
Some implementations of setjmp will end with a br instead of a ret.
This means that the next instruction after a call to setjmp must be
a "bti j" (j for jump) to make this work when branch target identification
is enabled.

The BTI extension was added in armv8.5-a but the bti instruction is in the
hint space. This means we can emit it for any architecture version as long
as branch target enforcement flags are passed.

The starting point for the hint number is 32; "call" adds 2 and "jump" adds 4.
Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see
at the start of functions).

The existing Arm command line option -mno-bti-at-return-twice has been
applied to AArch64 as well.

Support is added to SelectionDAG ISel and GlobalISel. FastISel will
defer to SelectionDAG.
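
A rough IR-level sketch (hypothetical function; assumes branch-target
enforcement is enabled via the usual function attributes/flags), with the
expected assembly shown as comments:

  declare i32 @setjmp(i8*) returns_twice

  define i32 @f(i8* %env) {
    %r = call i32 @setjmp(i8* %env)
    ; emitted: bl setjmp
    ;          hint #36    ; i.e. "bti j"
    ret i32 %r
  }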

Based on the change done for M profile Arm in https://reviews.llvm.org/D112427

Fixes #48888

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D121707
2022-03-23 09:51:02 +00:00
zhongyunde 828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for
extensions from legal types.
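
A hedged IR sketch of the kind of extending load this enables (types are
for illustration; the result is twice the width of a legal SVE register,
so the extend can lower to a single ld1h plus sunpklo/sunpkhi rather than
two masked loads):

  define <vscale x 8 x i32> @sext_load(<vscale x 8 x i16>* %p) {
    %v = load <vscale x 8 x i16>, <vscale x 8 x i16>* %p
    %e = sext <vscale x 8 x i16> %v to <vscale x 8 x i32>
    ret <vscale x 8 x i32> %e
  }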

Test cases for both normal and masked loads have been added to guard
against compile crashes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
chenglin.bi dd3b90e4d7 [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
When N > 12, (2^N - 1) is not a legal add immediate (isLegalAddImmediate
will return false). And if the SetCC input uses this number, the DAG
combiner will generate one more SRL instruction. So combine
[setcc (srl x, imm), 0, ne] into [setcc (and x, (-1 << imm)), 0, ne] to
get better optimization in emitComparison.
Fix https://github.com/llvm/llvm-project/issues/54283
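
A small IR sketch of the shape being combined (imm = 16 here, chosen
arbitrarily): the shift-then-compare becomes a compare of a masked value,
which fits a single ands/tst with a logical immediate:

  define i1 @high_bits_set(i64 %x) {
    %s = lshr i64 %x, 16
    %c = icmp ne i64 %s, 0    ; becomes: tst x0, #0xffffffffffff0000
    ret i1 %c
  }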

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121449
2022-03-19 13:04:16 +00:00
Paul Walker f46fe36d59 [AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands.
When inverting the compare predicate, trySwapVSelectOperands
incorrectly uses the type of the select's cond operand rather
than the type of cond's operands. This means we're treating all
inversions as if they're integer.

Differential Revision: https://reviews.llvm.org/D121968
2022-03-19 12:36:14 +00:00
David Green fe6057a293 [AArch64] Custom lower concat(v4i8 load, ...)
We already have custom lowering for v4i8 load, which loads as a f32,
converts to a vector and bitcasts and extends the result to a v4i16.
This adds some custom lowering of concat(v4i8 load, ...) to keep the
result as an f32 and create a buildvector of the resulting f32 loads.
This helps avoid creating all the extends and bitcasts, which are often
difficult to fully clean up.
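
A sketch of IR that produces such a concat (hypothetical; with this
change each v4i8 load can stay as a single 32-bit load in the FP
registers):

  define <8 x i16> @concat_loads(<4 x i8>* %p, <4 x i8>* %q) {
    %a = load <4 x i8>, <4 x i8>* %p
    %b = load <4 x i8>, <4 x i8>* %q
    %c = shufflevector <4 x i8> %a, <4 x i8> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    %e = zext <8 x i8> %c to <8 x i16>
    ret <8 x i16> %e
  }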

Differential Revision: https://reviews.llvm.org/D121400
2022-03-18 11:58:02 +00:00
David Green 0b6df40c52 [AArch64] Combine ISD::AND into AArch64ISD::ANDS
If we already have an AArch64ISD::ANDS node with identical operands, we
can merge any ISD::AND into it, reducing the instruction count by
calculating the value and the flags in a single operation. This code is
taken from the X86 backend, and could also handle AArch64ISD::ADDS and
AArch64ISD::SUBS, but I couldn't find any test cases where it came up.
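
A sketch of when this can fire (hypothetical function): both the value
and the flags of the same and are needed, so a single ands provides both
instead of an and plus a separate tst:

  define i64 @and_and_flags(i64 %x, i64 %y) {
    %a = and i64 %x, %y                ; value is used below
    %c = icmp eq i64 %a, 0             ; flags of the same operands
    %r = select i1 %c, i64 %y, i64 %a  ; ands + csel
    ret i64 %r
  }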

Differential Revision: https://reviews.llvm.org/D118584
2022-03-17 09:44:11 +00:00
zhongyunde 3568333815 [AArch64] Perform last active true vector combine
Test bit of lane EC-1 can use the P register directly, e.g.:
Materialize : Idx = (add (mul vscale, NumEls), -1)
               i1 = extract_vector_elt t37, Constant:i64<Idx>
    ... into: "ptrue p, all" + PTEST
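
At the IR level the pattern looks something like this sketch (nxv4i1
chosen for illustration):

  define i1 @last_active(<vscale x 4 x i1> %p) {
    %vs  = call i64 @llvm.vscale.i64()
    %n   = mul i64 %vs, 4              ; element count
    %idx = add i64 %n, -1              ; lane EC-1
    %bit = extractelement <vscale x 4 x i1> %p, i64 %idx
    ret i1 %bit
  }
  declare i64 @llvm.vscale.i64()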

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121180
2022-03-15 01:25:03 +08:00
Arthur Eubanks 250620f76e [OpaquePtr][AArch64] Use elementtype on ldxr/stxr
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D120527
2022-03-14 10:09:59 -07:00
David Sherwood aeeb1199b4 [AArch64][SVE] Change the asserts in LowerToPredicatedOp to check for legal types
When building the LLVM test suite with SVE I discovered a crash
when compiling some Halide tests, which occurs because we try to
use SVE to lower 64-bit vector multiplies and there is no
vscale_range attribute on the function. In this case the min SVE
vector bits was 0, which caused an assert in LowerToPredicatedOp
to fire. I have amended the asserts in this function to check that the
fixed-width type is legal. If the fixed-width type is larger than NEON
and is legal then it must be because we've set the min SVE vector
bits to something > 128. Or if the min SVE bits is 0, then the only
legal types allowed are 128 bit types - for any other types the assert
will fire.

Tests added here:

  CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll

Differential Revision: https://reviews.llvm.org/D121297
2022-03-11 09:57:58 +00:00
Philippe Valembois 26cd258420 [AArch64] Use correct calling convention for each vararg
While checking whether tail call optimization is possible, the calling
convention applied to fixed arguments is not the correct one.
For DarwinPCS this implies that all arguments of a vararg function will
go on the stack although fixed ones could go in registers.

This prevents non-virtual thunks from being tail-call optimized although
they are marked as musttail.

Differential Revision: https://reviews.llvm.org/D120622
2022-03-10 15:07:25 -08:00
David Green 4899e2cab4 [AArch64] Fix typo in comment. NFC 2022-03-10 15:03:27 +00:00
David Green 21a97a2ac1 [AArch64] TBL uses zero for out of range elements.
A TBL instruction will use zero for any out of range values. We can use
this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need
to materialise the zero.

Differential Revision: https://reviews.llvm.org/D121139
2022-03-10 14:45:13 +00:00
zhongyunde c22c8b151b [AArch64] Perform first active true vector combine
Materialize : i1 = extract_vector_elt t37, Constant:i64<0>
   ... into: "ptrue p, all" + PTEST
Test bit of lane 0 can use the P register directly, and the instruction
"ptrue all" is loop invariant, which will be beneficial to SVE after
being hoisted out of the loop.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120891
2022-03-08 01:10:21 +08:00
David Green d9633d1490 [AArch64] Turn truncating buildvectors into truncates
When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is
split several times, creating an illegal v4i8 concat that gets expanded
into a BUILD_VECTOR. After some combining and other legalisation, it
ends up as a buildvector that extracts from 4 vectors, looking like
BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is
really a v16i32->v16i8 truncate in disguise.

This adds a ReconstructTruncateFromBuildVector method to detect the
pattern, converting it back into the legal "concat(trunc(concat(trunc(a),
trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted
nodes could also be v4i16, in which case the truncates are not needed.
All those truncates and concats then become uzp1s, which is much
better than expanding by moving vector lanes around.

Differential Revision: https://reviews.llvm.org/D119469
2022-03-07 09:42:54 +00:00
Benjamin Kramer fbce4a7803 Drop some more global std::maps. NFCI. 2022-03-06 13:28:29 +01:00
Karl Meakin 43a0016f3d Extend `performANDCSELCombine` to `performANDORCSELCombine`
Differential Revision: https://reviews.llvm.org/D120422
2022-03-04 15:09:59 +00:00
Paul Walker 42b4a6227e [DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation.
When triggered during operation legalisation, the affected combine
generates a splat_vector that, when custom lowered for SVE fixed
length code generation, results in the original pre-combine sequence,
and thus we enter a legalisation/combine hang.

NOTE: The patch contains no tests because I observed this issue
only when combined with other work that might never become public.
The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific
test was not possible so I'm hoping the DAGCombiner fix can be seen
as obvious. The AArch64ISelLowering change is required to maintain
existing code quality.

Differential Revision: https://reviews.llvm.org/D120735
2022-03-04 11:54:03 +00:00
David Green e348b09bb5 [AArch64] Turn UZP1 with undef operand into truncate
This turns uzp1(x, undef) into concat(truncate(x), undef), as the truncate
is simpler and can often be optimized away, and it helps some of the
insert-subvector tests optimize more cleanly.

Differential Revision: https://reviews.llvm.org/D120879
2022-03-04 11:12:26 +00:00
Cullen Rhodes 616586794b [AArch64] Add legal types for Streaming SVE
The compiler currently crashes for scalable types when compiling with
+sme, e.g.

  define <vscale x 4 x i32> @foo(<vscale x 4 x i32> %a) {
    ret <vscale x 4 x i32> %a
  }

since it doesn't know how to legalize the types. Because SME implies a
subset of SVE (+streaming-sve), the hasSVE predication in the backend
needs extending to consider types/operations that are legal in
Streaming SVE.

This is the first patch adding legal types <-> register classes. Before
making the change +sve(2) was temporarily replaced with +sme in all the
intrinsics tests to see what failed, and again after making the change.
For all the tests that passed after adding the legal types another RUN
line has been added for +streaming-sve. More patches to follow.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118561
2022-03-03 09:51:14 +00:00
Sander de Smalen eac2638ec1 [AArch64][SVE] Fold away SETCC if original input was predicate vector.
This adds the following two folds:

Fold 1:
   setcc_merge_zero(
       all_active, extend(nxvNi1 ...), != splat(0))
  -> nxvNi1 ...

Fold 2:
   setcc_merge_zero(
       pred, extend(nxvNi1 ...), != splat(0))
  -> nxvNi1 and(pred, ...)

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D119334
2022-02-28 14:12:43 +00:00
Sander de Smalen 201e3686ab [AArch64][SVE] Handle more cases in findMoreOptimalIndexType.
This patch addresses @paulwalker-arm's comment on D117900 to
only update/write the by-ref operands iff the function returns
true. It also handles a few more cases where a series of added
offsets can be folded into the base pointer, rather than just looking
at a single offset.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119728
2022-02-28 12:13:52 +00:00
Paul Walker 16ee102964 [SVE] Add missing splat patterns for bfloat vectors.
Differential Revision: https://reviews.llvm.org/D120496
2022-02-25 16:53:39 +00:00
Paul Walker 8ca5be93cc [SVE] Don't custom lower constant predicate ISD:SPLAT_VECTOR operations.
Differential Revision: https://reviews.llvm.org/D120340
2022-02-25 11:32:37 +00:00
Arthur Eubanks 6aa285eb85 [OpaquePtr][AArch64] Use load/store value type instead of pointer type for ldnt1/stnt1 alignment 2022-02-24 16:56:13 -08:00
Craig Topper c7d6448d03 [DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable.
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.

This patch passes them by value which is consistent with the vast
majority of code.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D120420
2022-02-23 12:40:45 -08:00
David Green 774b571546 [AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors
We have a combine for converting mul(dup(ext(..)), ...) into
mul(ext(dup(..)), ..), for allowing more uses of smull and umull
instructions. Currently it looks for vector insert and shuffle vectors
to detect the element that we can convert to a vector extend. Not all
cases will have a shufflevector/insert element though.

This started by extending the recognition to buildvectors (with elements
that may be individually extended). The new method seems to cover all
the cases that the old method captured though, as the shuffle will
eventually be lowered to buildvectors, so the old method has been
removed to keep the code a little simpler. The new code detects legal
build_vector(ext(a), ext(b), ..), converting them to ext(build_vector(a,
b, ..)) provided all the extends/types match up.
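
A sketch of the buildvector case this covers (hypothetical types; the
trailing lanes are simply left undef): the individually extended elements
form a buildvector, which the combine rewrites so the multiply can select
as smull:

  define <8 x i16> @smull_bv(<8 x i8> %x, i8 %a, i8 %b) {
    %ea = sext i8 %a to i16
    %eb = sext i8 %b to i16
    %v0 = insertelement <8 x i16> undef, i16 %ea, i64 0
    %v1 = insertelement <8 x i16> %v0, i16 %eb, i64 1
    %xe = sext <8 x i8> %x to <8 x i16>
    %m  = mul <8 x i16> %xe, %v1
    ret <8 x i16> %m
  }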

Differential Revision: https://reviews.llvm.org/D120018
2022-02-22 23:37:22 +00:00
Reid Kleckner ecb27004ec Revert "[AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors"
This reverts commit 9fc1a0dcb7.

We have bisected a compiler crash to this revision and will provide a
test case soon.
2022-02-22 10:31:09 -08:00
Sunho Kim d6a9eec238 [AARCH64][DAGCombine] Add combine for negation of CSEL absolute value pattern.
This folds a negation through a csel, which can come up during the
lowering of negative abs.
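
A sketch of the motivating pattern, written with the generic abs
intrinsic: the negate can fold through the csel produced when abs is
lowered:

  define i32 @neg_abs(i32 %x) {
    %a = call i32 @llvm.abs.i32(i32 %x, i1 false)
    %n = sub i32 0, %a    ; -abs(x)
    ret i32 %n
  }
  declare i32 @llvm.abs.i32(i32, i1)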

Fixes https://github.com/llvm/llvm-project/issues/51558.

Differential Revision: https://reviews.llvm.org/D112204
2022-02-22 09:59:36 +00:00
David Green 9fc1a0dcb7 [AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors
We have a combine for converting mul(dup(ext(..)), ...) into
mul(ext(dup(..)), ..), for allowing more uses of smull and umull
instructions. Currently it looks for vector insert and shuffle vectors
to detect the element that we can convert to a vector extend. Not all
cases will have a shufflevector/insert element though.

This started by extending the recognition to buildvectors (with elements
that may be individually extended). The new method seems to cover all
the cases that the old method captured though, as the shuffle will
eventually be lowered to buildvectors, so the old method has been
removed to keep the code a little simpler. The new code detects legal
build_vector(ext(a), ext(b), ..), converting them to ext(build_vector(a,
b, ..)) provided all the extends/types match up.

Differential Revision: https://reviews.llvm.org/D120018
2022-02-21 15:44:30 +00:00
Serguei Katkov 7f2293ba25 [STATEPOINT] Mark LR as early-clobber implicit def.
LR is modified at the moment of the call and before any use is read.

Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120114
2022-02-21 10:37:43 +07:00
David Green b8801ba050 [AArch64] Common patterns between UMULL and int_aarch64_neon_umull
We have some duplicate patterns between the AArch64ISD::UMULL (/SMULL)
and the int_aarch64_neon_umull (/smull) intrinsics. They did not
replicate all the patterns though, leaving some gaps on instructions
like umlal2 from codegen. This commons all the patterns by converting
all int_aarch64_neon_umull intrinsics to UMULL nodes and removing the
duplicate for umull/smull intrinsics, so that all instructions go
through the same tablegen pattern.

This improves some of the longer-than-legal mla patterns, helping them
replace ext with umlal2.

Differential Revision: https://reviews.llvm.org/D119887
2022-02-19 14:38:57 +00:00
Simon Pilgrim f60d101b00 Fix Wdocumentation unknown parameter warning 2022-02-19 13:00:59 +00:00
John Brawn bbd7eac800 [AArch64] Remove an unused variable in my previous patch 2022-02-17 16:39:44 +00:00
John Brawn 8e17c9613f [AArch64] Add some missing strict FP vector lowering
Also add a test for the codegen of strict FP vector operations so
these changes get tested.

Differential Revision: https://reviews.llvm.org/D117795
2022-02-17 16:10:31 +00:00
Matt Devereau 2f2dcb4fb1 [AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations
(vselect (setcc ( condcode) (_) (_)) (a)          (op (a) (b)))
=> (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a))

As a follow up to D117689, invert the operand order and condition
in order to fold vselects into predicated instructions.

Differential Revision: https://reviews.llvm.org/D119424
2022-02-17 16:01:17 +00:00
John Brawn d916856bee [AArch64] Allow strict opcodes in faddp patterns
This also requires adjustment to code in AArch64ISelLowering so that
vector_extract is distributed over strict_fadd.
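
A sketch of the strict pattern this lets us select as faddp (using the
constrained fadd intrinsic; note the strictfp attributes):

  define double @strict_faddp(<2 x double> %v) strictfp {
    %e0 = extractelement <2 x double> %v, i64 0
    %e1 = extractelement <2 x double> %v, i64 1
    %s = call double @llvm.experimental.constrained.fadd.f64(double %e0, double %e1, metadata !"round.dynamic", metadata !"fpexcept.strict") strictfp
    ret double %s
  }
  declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)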

Differential Revision: https://reviews.llvm.org/D118489
2022-02-17 13:11:55 +00:00
John Brawn d4342efb69 [AArch64] Add instruction selection for strict FP
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'.

FP16 and vector instructions additionally require some extra work in
lowering and legalization, so we can't set IsStrictFPEnabled just yet.
Also more work needs to be done for full strict fp support (marking
instructions that can raise exceptions as such, and modelling FPCR use
for controlling rounding).

Differential Revision: https://reviews.llvm.org/D114946
2022-02-17 13:11:54 +00:00
David Green f3bc7fd546 [AArch64] Cleanup for performCommonVectorExtendCombine. NFC
This is some NFC (hopefully!) cleanup for performCommonVectorExtendCombine
and related methods, removing conditions that cannot occur and otherwise
cleaning up the code a little.
2022-02-17 10:03:28 +00:00
zhongyunde 064b2a6dc6 [DAGCombiner][AArch64] Enhance to fold CSNEG into CSINC instruction
Perform the scalar expression combine in the form of:
  CSNEG(1, c, cc) + b  =>  cc  ? b+1 : b-c => CSINC(b-c, b, !cc)
  CSNEG(c, -1, cc) + b =>  cc  ? b+c : b+1 => CSINC(b+c, b, cc)
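
An IR sketch of the first form (the compare producing cc is arbitrary):

  define i32 @csneg_add(i32 %b, i32 %c, i32 %x) {
    %cc = icmp eq i32 %x, 0
    %nc = sub i32 0, %c
    %s  = select i1 %cc, i32 1, i32 %nc   ; csneg 1, c, eq
    %r  = add i32 %s, %b                  ; combines to: sub + csinc
    ret i32 %r
  }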

Fix https://github.com/llvm/llvm-project/issues/53071

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D119105
2022-02-16 09:39:38 +08:00
David Green 4072e362c0 [ISel] Port AArch64 HADD and RHADD to ISel
This ports the aarch64 combines for HADD and RHADD over to DAG combine,
so that they can be used in more architectures (notably MVE in a
followup patch). They are renamed to AVGFLOOR and AVGCEIL in the
process, to avoid confusion with instructions such as X86 hadd. The code
was also rewritten slightly to remove the AArch64 idiosyncrasies.

The general pattern for a AVGFLOORS is
  %xe = sext i8 %x to i32
  %ye = sext i8 %y to i32
  %a = add i32 %xe, %ye
  %r = lshr i32 %a, 1
  %t = trunc i32 %r to i8

An AVGFLOORU is equivalent, using zext instead. Because of the truncate,
lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes
an extra rounding, so includes an extra add of 1.
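
For comparison, a sketch of the unsigned ceiling form (what can select as
urhadd on AArch64), with zext and the extra add of 1:

  %xe = zext i8 %x to i32
  %ye = zext i8 %y to i32
  %a  = add i32 %xe, %ye
  %a1 = add i32 %a, 1
  %r  = lshr i32 %a1, 1
  %t  = trunc i32 %r to i8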

Differential Revision: https://reviews.llvm.org/D106237
2022-02-11 18:28:56 +00:00