Commit Graph

5822 Commits

Author SHA1 Message Date
David Green bccbf5276e [AArch64] Remove isDef32
isDef32 would attempt to guess which SelectionDAG nodes were 32-bit
sources, relying on 32-bit AArch64 instructions implicitly zeroing the
upper half of the register to avoid emitting zexts whose upper bits were
expected to already be zero. This was a bit fragile though, needing to
guess correctly at the opcodes that do not become 32-bit defs later in
ISel.

This patch removes isDef32, relying on the AArch64MIPeephole optimizer
to remove redundant SUBREG_TO_REG nodes. A part of
SelectArithExtendedRegister was left with the same logic as a heuristic
to prevent some regressions from it picking less optimal sequences.
The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from an
FPR will become an FMOVSWr, which it lowers immediately to make sure that
remains true through register allocation.

Fixes #55833

Differential Revision: https://reviews.llvm.org/D127154
2022-06-07 18:57:59 +01:00
Matt Arsenault 56303223ac llvm-reduce: Don't assert on functions which don't track liveness
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
2022-06-07 10:00:25 -04:00
David Green 6468feaeac [AArch64] Regenerate arm64-shifted-sext.ll and add a test from #55833. NFC 2022-06-07 13:55:53 +01:00
Michael Kitzan b7fcf6632f [GISel] Add new combines for G_ADD
Patch adds new GICombineRules for G_ADD:

G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y

Patch additionally adds new combine tests for AArch64 target for
these new rules.
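Both rules are plain modular-arithmetic identities, so they hold at any bit width even when the subtraction wraps. A minimal Python sketch that checks them exhaustively for small widths; `wrap`, `g_add`, `g_sub` and `check_add_sub_fold` are hypothetical helpers modelling N-bit registers, not GlobalISel APIs:

```python
def wrap(value, bits):
    # Model an N-bit register: keep only the low `bits` bits.
    return value & ((1 << bits) - 1)

def g_add(a, b, bits):
    return wrap(a + b, bits)

def g_sub(a, b, bits):
    return wrap(a - b, bits)

def check_add_sub_fold(bits):
    # Exhaustively check G_ADD(x, G_SUB(y, x)) == y and
    # G_ADD(G_SUB(y, x), x) == y for every pair of N-bit values.
    for x in range(1 << bits):
        for y in range(1 << bits):
            if g_add(x, g_sub(y, x, bits), bits) != y:
                return False
            if g_add(g_sub(y, x, bits), x, bits) != y:
                return False
    return True
```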

Reviewed by: paquette

Differential Revision: https://reviews.llvm.org/D87936
2022-06-06 11:19:45 -07:00
David Green 4ea1b43527 [AArch64] Generate ADDP from shuffled add
This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>)) into
shuffle(addp(x), <0,0,1,1,2,2,...>). The ADDP instruction takes two
vectors and returns one, adding adjacent pairs. So we match x in a
custom combine as it is lowered from a v8i32. The original code
would be two rev64s and two adds; the new code is a single addp
with a zip1/zip2 shuffle, producing smaller code.
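The equivalence can be checked on plain lists. A minimal Python sketch, assuming x is a v8i32 split into two v4i32 halves that are fed to ADDP as its two operands; the helper names are hypothetical, and integer wraparound is ignored because both sides perform exactly the same additions:

```python
def add_with_swapped_pairs(x):
    # add(x, shuffle(x, <1,0,3,2,5,4,...>)): lane i ^ 1 is lane i's pair partner.
    swapped = [x[i ^ 1] for i in range(len(x))]
    return [a + b for a, b in zip(x, swapped)]

def addp(a, b):
    # ADDP: add adjacent pairs of the concatenation of a and b.
    ab = a + b
    return [ab[i] + ab[i + 1] for i in range(0, len(ab), 2)]

def addp_then_duplicate(x):
    # shuffle(addp(lo, hi), <0,0,1,1,2,2,...>) with x split into two halves.
    half = len(x) // 2
    pairs = addp(x[:half], x[half:])
    return [pairs[i // 2] for i in range(len(x))]
```

In this model, with p the ADDP result, the first four result lanes equal zip1(p, p) and the last four equal zip2(p, p).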

Differential Revision: https://reviews.llvm.org/D126686
2022-06-06 11:39:51 +01:00
Paul Walker 2dde272db7 [SVE] Refactor sve-bitcast.ll to include all combinations for legal types.
Patch enables custom lowering for MVT::nxv4bf16 because otherwise
the refactored test file triggers a selection failure.

The reason for the refactoring is to highlight cases where the
generated code is wrong.
2022-06-03 12:09:19 +01:00
David Green 79e3b043e5 [AArch64] Add extra addp codegen tests. NFC 2022-06-03 11:36:40 +01:00
Serguei Katkov 24e16e4af2 [SSAUpdaterImpl] Do not generate phi node with all the same incoming values
If all values available to a basic block are the same, do not build a new phi node;
just use that value.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126525
2022-06-03 12:24:33 +07:00
Serguei Katkov c4d955dd7f [MachineSSAUpdate] Add a test for redundant phi generation. 2022-06-03 11:27:14 +07:00
Paul Walker 48ea26a387 [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.
LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering
scalable vector floating point INSERT_SUBVECTOR. However, these
nodes only make sense for integer types and thus isel patterns do
not exist for floating point, which leads to isel failures.

This patch ensures floating point operands are cast to integer
before the core lowering takes place.

Fixes: #55037

Differential Revision: https://reviews.llvm.org/D126487
2022-06-02 14:51:04 +01:00
Nikita Popov 41d5033eb1 [IR] Enable opaque pointers by default
This enables opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Hendrik Greving a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c302.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving 430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Fangrui Song 873d2aff42 [AArch64][test] Replace -march with -mtriple for llc RUN lines
-march is error-prone: -march inherits the OS and environment from the default
target triple. Use -mtriple which is more common.
2022-05-31 22:39:43 -07:00
Alexander Shaposhnikov a72cc958a3 [CodeGen][AArch64] Add support for LDAPR
This diff adds support for LDAPR (RCPC extension)
(https://github.com/llvm/llvm-project/issues/55561).

Differential revision: https://reviews.llvm.org/D126250

Test plan: ninja check-all
2022-05-31 21:40:50 +00:00
Sander de Smalen 9c38fc111b [AArch64] Remove references to Streaming SVE from target features.
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977
2022-05-31 16:25:01 +02:00
David Green 5cb14dc5a3 [AArch64] Look through copy in MachineCombiner FMUL patterns.
This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISel, they may also appear as FMUL(COPY(DUP(..))); this patch now
looks through the no-op COPY.

Differential Revision: https://reviews.llvm.org/D126632
2022-05-31 09:28:00 +01:00
Edd Barrett d245974e1a Test stackmap support for floating point types.
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).

One curiosity: the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! Until now I had only ever
seen this field as zero for Register entries.
2022-05-30 10:49:32 +01:00
David Green 99b0078064 [AArch64] Tests for showing MachineCombiner COPY patterns. NFC 2022-05-30 10:47:44 +01:00
David Green 9a3144d078 [AArch64] Reuse larger DUP if available
If both a v2i32 DUP(x) and a v4i32 DUP(x) node exist, we can reuse the
larger node using a vector extract to obtain the smaller. This comes up
in the smull/smlal code, but needs a small fixup to allow the smull2
code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still
match smull2 extracts.

Differential Revision: https://reviews.llvm.org/D126449
2022-05-29 19:42:13 +01:00
Serge Pavlov bdd0093f4d [GlobalISel] Add G_IS_FPCLASS
Add a generic opcode to represent `llvm.is_fpclass` intrinsic.

Differential Revision: https://reviews.llvm.org/D121454
2022-05-27 13:49:47 +07:00
Rahman Lavaee 3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c5.
2022-05-26 18:45:40 -07:00
Rahman Lavaee 4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on the PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` is used, Propeller cannot enforce a global section ordering, as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of the basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used both by the `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Adrian Tong 7c13ae6490 Give option to use isCopyInstr to determine which MI is
treated as Copy instruction in MCP.

This is then used in AArch64 to remove copy instructions after tail
duplication has run in machine block placement.

Differential Revision: https://reviews.llvm.org/D125335
2022-05-26 18:43:16 +00:00
Chen Zheng d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf and fix module build
failure:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
   MachineCycleInfoWrapperPass is an analysis pass and should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.

Otherwise, there are module conflicts for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf.

MachineCycle can handle irreducible loops. Natural loop
analysis (MachineLoop) cannot return the correct loop depth if
the loop is irreducible, and MachineSink is sensitive
to the loop depth; see MachineSinking::isProfitableToSinkTo().

This patch uses MachineCycle so that we can handle
irreducible loops better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Chen Zheng 80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf.
Cause build failure on lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Paul Walker 6f215ca680 [SelectionDAG] Add support to widen ISD::STEP_VECTOR operations.
Fixes: #55165

Differential Revision: https://reviews.llvm.org/D126168
2022-05-24 22:42:37 +01:00
Chen Zheng 62a9b36fcf [MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loops. Natural loop
analysis (MachineLoop) cannot return the correct loop depth if
the loop is irreducible, and MachineSink is sensitive
to the loop depth; see MachineSinking::isProfitableToSinkTo().

This patch uses MachineCycle so that we can handle
irreducible loops better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-24 01:16:19 -04:00
Craig Topper 569d8945f3 [DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2.
If the VT is i2, then 2 is really -2.
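A small Python sketch of the problem (helper names are hypothetical): with two-bit signed values the range is [-2, 1], so the constant 2 is read as -2, and smulo(x, 2) and saddo(x, x) can disagree about overflow:

```python
def to_signed(value, bits):
    # Interpret the low `bits` bits of `value` as a two's complement number.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def signed_overflows(result, bits):
    # True if `result` does not fit in a signed `bits`-bit integer.
    return not (-(1 << (bits - 1)) <= result < (1 << (bits - 1)))

def smulo_overflow(x, c, bits):
    return signed_overflows(to_signed(x, bits) * to_signed(c, bits), bits)

def saddo_overflow(x, y, bits):
    return signed_overflows(to_signed(x, bits) + to_signed(y, bits), bits)
```

For x = 1, 1 * -2 = -2 fits in i2, but 1 + 1 = 2 does not, so the fold would report a spurious overflow. At i8 the two checks agree for every x.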

The test has not been committed yet, but the diff shows the change.

Fixes PR55644.

Differential Revision: https://reviews.llvm.org/D126213
2022-05-23 11:13:57 -07:00
Craig Topper 75eb0576de [AArch64] Add test case for pr55644. NFC 2022-05-23 11:13:57 -07:00
Edd Barrett c5e5cf1258 Test stackmap support for i128
This diff adds tests that check the currently-working stackmap cases for i128.
This will help ensure no regressions are later introduced by D125680 (when
ready).

Note that i128 stackmap support is currently incomplete, so we can't test all
i128 functionality:

    i128 constants >= 2^{63} crash LLVM
    non-constant i128s crash LLVM

So this change tests only constant i128 operands of value < 2^{63}.

A couple of incorrect comments are also fixed.
2022-05-23 11:56:24 +01:00
Simon Pilgrim dd231f02a3 [AArch64] Regenerate andandshift.ll test checks 2022-05-23 11:48:24 +01:00
Andre Vieira 572fc7d2fd [AArch64] Order STP Q's by ascending address
This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule
STP Q's to the same base-address in ascending order of offsets. We have found
this to improve performance on Neoverse N1 and should not hurt other AArch64
cores.

Differential Revision: https://reviews.llvm.org/D125377
2022-05-23 09:50:44 +01:00
Florian Hahn 0cc981e021 [AArch64] implement isReassocProfitable, disable for (u|s)mlal.
Currently reassociating add expressions can lead to failing to select
(u|s)mlal. Implement isReassocProfitable to skip reassociating
expressions that can be lowered to (u|s)mlal.

The same issue exists for the *mlsl variants as well, but the DAG
combiner doesn't use the isReassocProfitable hook before reassociating.
To be fixed in a follow-up commit as this requires DAGCombiner changes
as well.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125895
2022-05-23 09:39:00 +01:00
David Green 6ef5e242f2 [AArch64] Fix assumptions on input type of tryCombineFixedPointConvert
It is possible for the input type to not be v2i64 or v4i32, so weaken
the assertion to a return, fixing the crash in the new test.

Fixes #55606
2022-05-23 08:55:54 +01:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Bill Wendling d497129f9b [AArch64] Use proper instruction mnemonics for FPRs
The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D.

Differential Revision: https://reviews.llvm.org/D126083
2022-05-20 12:02:26 -07:00
Rahul Anand R 534ea8bca5 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a target-specific optimization that replaces
the cmp and csel from cttz with an and mask.
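A minimal Python sketch of why the mask works, assuming the predicated pattern is `x == 0 ? 0 : cttz(x)` and that cttz of zero returns the bit width (as the non-poison form of llvm.cttz does). For a power-of-two width, cttz(0) & (width - 1) is 0, while every non-zero input has cttz < width and is unchanged by the mask. The helper names are hypothetical, and the truncated-width case is not modelled:

```python
def cttz(x, bits):
    # Count trailing zeros of a `bits`-wide value; returns `bits` for x == 0.
    if x == 0:
        return bits
    count = 0
    while x & 1 == 0:
        x >>= 1
        count += 1
    return count

def predicated_cttz(x, bits):
    # The cmp + csel form: x == 0 ? 0 : cttz(x).
    return 0 if x == 0 else cttz(x, bits)

def masked_cttz(x, bits):
    # The AND form: cttz(x) & (bits - 1); assumes `bits` is a power of two.
    return cttz(x, bits) & (bits - 1)
```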

Recommitted with a fix for truncated value sizes.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-20 13:41:32 +01:00
Bill Wendling 6e00a34cdb [AArch64] Add support for -fzero-call-used-regs
Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D124836
2022-05-19 16:58:28 -07:00
David Green 602f81ec33 [AArch64] Fix zero element TBL indices
A TBL instruction will fill out-of-range values with 0's, something used
in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for
v16i8, but for v8i8 the input is still treated as a v16i8, so
out-of-range values (like a lane index of 8) would end up loading values
from the top half of the input register. Clean this up by detecting the
out of range values and making sure they really use out of range values.
There is a fix for swapped indices of 64bit input vectors too, which
could be incorrectly adjusted if the zerovector was the first operand.

Fixes #55545

Differential Revision: https://reviews.llvm.org/D125865
2022-05-19 13:54:35 +01:00
David Green dd644ddf85 [AArch64] Extend zero vector TBL codegen tests. NFC 2022-05-19 13:01:55 +01:00
Jon Roelofs d699e54ca2 Fix an or+and miscompile w/ GlobalISel
Fixes #55284
2022-05-18 19:09:47 -07:00
Michael Kitzan 29bebb0237 [GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM
I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold
FMIN/MAX instrs with NaNs.

The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes:
G_FMINNUM(X, NaN) -> X
G_FMAXNUM(X, NaN) -> X
G_FMINIMUM(X, NaN) -> NaN
G_FMAXIMUM(X, NaN) -> NaN
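A minimal Python model of the two NaN behaviours behind these rules, assuming quiet NaNs only and ignoring signed-zero ordering (the function names are illustrative): the *NUM opcodes return the non-NaN operand, while FMINIMUM/FMAXIMUM propagate the NaN.

```python
import math

def fminnum(a, b):
    # minNum-style: if one operand is NaN, return the other operand.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)

def fmaxnum(a, b):
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def fminimum(a, b):
    # minimum-style: a NaN in either operand propagates to the result.
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)

def fmaximum(a, b):
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return max(a, b)
```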

The patch adds AArch64 tests for these combines as well.

Reviewed by: arsenm

Differential revision: https://reviews.llvm.org/D125819
2022-05-18 12:08:53 -07:00
Craig Topper 46eef76876 [DAGCombiner] Fix bug in MatchBSwapHWordLow.
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.

If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.
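A minimal Python sketch of the condition, looking only at the low 16 (demanded) bits; the helper names are hypothetical. The unmasked `a >> 8` moves bits 23:16 into bits 15:8, where they are ORed with the SHL result:

```python
def bswap32(a):
    # Reverse the four bytes of a 32-bit value.
    return int.from_bytes((a & 0xFFFFFFFF).to_bytes(4, "little"), "big")

def or_of_shifts_low16(a):
    # Low 16 bits of (a >> 8) | (a << 8), with no mask on the SRL.
    return ((a >> 8) | (a << 8)) & 0xFFFF

def bswap_hword_low(a):
    # (bswap a) >> 16, i.e. the two low bytes of `a` swapped.
    return bswap32(a) >> 16
```

With a = 0x00001234 both forms give 0x3412, and junk in bits 31:24 (for example 0xFF001234) does not matter. With a = 0x00AB1234, however, the OR gives 0xBF12 while the byte swap gives 0x3412.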

Fixes PR55484.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125641
2022-05-18 09:23:18 -07:00
Florian Hahn a74e075908 [AArch64] Add tests showing reassoc breaks (s|u)ml(a|s)l selection. 2022-05-18 16:40:28 +01:00
Simon Pilgrim 939affc67d [AArch64] neon-vmull-high-p64.ll - fix name/check mismatch identified in D125604
Typos meant that we weren't actually checking the function name, which wasn't accounting for mangling
2022-05-18 13:24:28 +01:00
Simon Pilgrim 1584b2c74e [AArch64] fp16-v8-instructions.ll - remove some old defunct CHECKS identified in D125604
Typos meant that the update script never removed them
2022-05-18 12:49:05 +01:00
David Green 4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-lane movs to the AArch64 perfect shuffle
tables, lowering the costs a little more. This is a rough
improvement in general, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane movs already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
David Green 8311fb7512 [AArch64] Extra tests useful for D-lane shuffles. NFC 2022-05-17 11:15:55 +01:00
Martin Storsjö 64a3c63e01 [MC] [Win64EH] Check for matches between epilogs and the prolog on ARM64
This allows sharing opcodes between prolog and epilog even when there
is more than one epilog.

I didn't make any handcrafted special MC level testcases for this (yet
at least), but it does seem to have the expected effect on two existing
CodeGen level testcases.

Differential Revision: https://reviews.llvm.org/D125619
2022-05-17 00:41:39 +03:00
Martin Storsjö cabefea2ec [MC] [Win64EH] Try writing an ARM64 "packed epilog" even if the epilog doesn't share opcodes with the prolog
The "packed epilog" form only implies that the epilog is located
exactly at the end of the function (so the location of the epilog
is implicit from the epilog opcodes), but it doesn't have to share
opcodes with the prolog - as long as the total number of opcode
bytes and the offset to the epilog fit within the bitfields.

This avoids writing a 4 byte epilog scope in many cases. (I haven't
measured how much this shrinks actual xdata sections in practice
though.)

Differential Revision: https://reviews.llvm.org/D125536
2022-05-17 00:41:39 +03:00
Paul Walker ee8aa351e4 [AArch64] Use ADDV for boolean xor reductions.
NEON does not have native support for xor reductions. However, when
reducing predicate vectors the operation is synonymous with an add
reduction that is supported.
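For i1 lanes, the xor of all lanes is the parity of the number of set lanes, which is bit 0 of their sum. A minimal Python sketch of that identity, modelling each predicate lane as 0 or 1 (helper names hypothetical); wrapping the sum to the lane width would not change bit 0:

```python
from functools import reduce
import operator

def xor_reduce(lanes):
    # Boolean xor reduction over 0/1 lanes.
    return reduce(operator.xor, lanes, 0)

def add_reduce_bit0(lanes):
    # ADDV-style sum of the lanes; the i1 result is bit 0 of the sum.
    return sum(lanes) & 1
```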

Differential Revision: https://reviews.llvm.org/D125605
2022-05-16 22:34:12 +01:00
David Green 5d29d75273 [AArch64] Predicate SSHLL;SCVTF patterns behind UseAlternateSExtLoadCVTF32
There have been some patterns in the AArch64 backend to optimize code of
the form:
  ldrsh w8, [x0]
  scvtf s0, w8
to:
  ldr h0, [x0]
  sshll v0.4s, v0.4h, #0
  scvtf s0, s0
The idea is to remove the GPR->FPR move, but in reality it makes code
larger and slower (or the same) on all the CPUs I tried.

This patch adds the UseAlternateSExtLoadCVTF32 predicate, similar to a
nearby related pattern.

Differential Revision: https://reviews.llvm.org/D125470
2022-05-16 18:00:30 +01:00
Craig Topper 74f6ded49d [AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC
This bug is in generic DAG combine and easily reproducible on many
targets.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D125640
2022-05-16 09:28:11 -07:00
David Green 7272a8c23c [AArch64] Update check lines in arm64-scvt.ll. NFC 2022-05-16 15:50:39 +01:00
Bradley Smith 7ff5148d64 [DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine
Differential Revision: https://reviews.llvm.org/D125367
2022-05-16 11:25:20 +00:00
Tim Northover 1ddc6ab1a9 AArch64: support ISel for fence instructions
Only the most conservative of the DAG patterns matched, leaving GISel with "dmb
ish" everywhere which is inefficient.
2022-05-16 12:01:18 +01:00
David Green 4c3e51ecfa [AArch64] Handle 64bit vectors in tryCombineFixedPointConvert
Under some situations we can visit 64bit vector extract elements in
tryCombineFixedPointConvert, where an assert fires as they are expected
to have been converted to 128bit. Turn the assert into an if statement,
bailing out and letting the extract be handled first.

Also invert some ifs, using early exits to reduce indentation.

Fixes #55417
2022-05-16 11:08:47 +01:00
Alex Richardson c8b44600c5 [AArch64] Avoid emitting MOVID when NEON is disabled
Previously, creating a zero floating-point constant used MOVID even when
NEON was disabled which resulted in the following fatal error:
`Attempting to emit MOVID instruction but the Feature_HasNEON predicate(s) are not met`

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125237
2022-05-14 14:40:51 +00:00
Alex Richardson 09551251e3 [AArch64] Add missing HasNEON predicates to int->float patterns
I was trying to compile code with -march=+nosimd and hit various
instruction predicate verification errors; this patch should address the
ones I saw in integer to floating-point conversions.

I noticed that for signed conversions, some non-NEON instruction sequences
are shorter. I don't know if the longer one is still faster on current
architectures (the patterns date back to the initial backend import).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125308
2022-05-14 14:15:36 +00:00
Alex Richardson f8639133b5 [AArch64] Baseline test for D125307
Differential Revision: https://reviews.llvm.org/D125240
2022-05-14 14:15:36 +00:00
Eli Friedman 96c2a0c9ff [GlobalIsel] Fix fallback if stack protector isn't supported.
When GlobalISel fails, we need to report the error, and we need to set
the FailedISel property.  We skipped those steps if stack protector
insertion failed, which led to a very strange miscompile.

Differential Revision: https://reviews.llvm.org/D125584
2022-05-13 14:17:27 -07:00
Amara Emerson 41fef10449 [GlobalISel] Combine G_SHL, G_ASHR, G_SHL of undef shifts to undef.
Differential Revision: https://reviews.llvm.org/D125041
2022-05-13 12:20:34 -07:00
Sam Parker 6d53d35efd [TypePromotion] Avoid some unnecessary truncs
Recommit.

Check for legal zext 'sinks' before inserting a trunc.

Differential Revision: https://reviews.llvm.org/D115451
2022-05-13 09:45:20 +01:00
Sam Parker 84b5f7c38c [NFC][TypePromotion][AArch64] Tests
Simplify existing test and also add it as a codegen test for aarch64.
2022-05-13 09:27:42 +01:00
Karl Meakin 0298cce257 [AArch64] Add `foldADCToCINC` DAG combine.
Differential revision: https://reviews.llvm.org/D123781
2022-05-12 22:21:20 +01:00
Karl Meakin d29fc6e7d2 [AArch64] Replace `performANDSCombine` with `performFlagSettingCombine`.
`performFlagSettingCombine` is a generalised version of `performANDSCombine` which also works on `ADCS` and `SBCS`.

Differential revision: https://reviews.llvm.org/D124464
2022-05-12 22:17:23 +01:00
Craig Topper cec249c60d [TypePromotion] Promote undef by converting to 0.
If we're promoting an undef I think that means that we expect the
upper bits are zero. undef doesn't guarantee that.

This patch replaces undef with 0 to ensure this. This matches how
a zext or sext of undef would be folded by InstCombine/InstSimplify.

I haven't found a failure from this; I was just thinking through the code.

Differential Revision: https://reviews.llvm.org/D123174
2022-05-12 09:09:24 -07:00
Nikita Popov 44d85259d0 [AArch64] Preserve chain when lowering fixed length load to SVE (PR55281)
When a fixed length load is lowered to an SVE masked load, the
result chain is currently set to the input chain of the old load,
rather than the result chain of the new load. This may cause stores
to be incorrectly reordered.

Fixes https://github.com/llvm/llvm-project/issues/55281.

Differential Revision: https://reviews.llvm.org/D125464
2022-05-12 16:03:32 +02:00
David Green 442c351b2b Revert "[AArch64] Generate AND in place of CSEL for predicated CTTZ"
This reverts commit 7dcd0ea683 due to
issues reported postcommit with the correctness of truncated cttzs.
2022-05-10 17:17:03 +01:00
Rosie Sumpter 1a2665902f [AArch64][SVE] Improve codegen when extracting first lane of active lane mask
When extracting the first lane of a predicate created using the
llvm.get.active.lane.mask intrinsic, it should give the same codegen as
when the predicate is created using the llvm.aarch64.sve.whilelo
intrinsic, since get.active.lane.mask is lowered to whilelo. This patch
ensures the codegen is the same by recognizing
llvm.get.active.lane.mask as a flag-setting operation in this case.

Differential Revision: https://reviews.llvm.org/D125215
2022-05-09 13:56:04 +01:00
Alban Bridonneau fef81131d9 [SVE] Optimize new cases for lowerConvertToSVBool
Converts to SVBool are already considered as a nop, if they
are converting an operand from a ptrue or a cmp, because
they zero the extra predicate lanes by construction.

This patch adds 2 similar cases:
- Wide cmps, which were not directly recognized by the test
for other forms of cmp
- Splats of 1, which will be generated as ptrue, and as such
will also zero the extra predicate lanes.

Reviewed By: paulwalker-arm, peterwaller-arm

Differential Revision: https://reviews.llvm.org/D124908
2022-05-09 10:17:57 +00:00
Rahul Anand R 7dcd0ea683 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a target-specific optimization that replaces
the cmp and csel from cttz with an and mask.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-09 10:28:20 +01:00
David Green 830c18047b [AArch64] Add missing NVCAST patterns.
There were apparently some missing NVCAST patterns. This fills them in
using foreach, as opposed to having to specify them individually.

Fixes #55321
2022-05-07 21:08:14 +01:00
Amaury Séchet 06fad8bc05 [DAGCombine] Add node in the worklist in topological order in CombineTo
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.

This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review to start the discussion with people working on the backend so we can find a good way forward.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124743
2022-05-07 16:24:31 +00:00
Kazu Hirata 26ba347fbb [AArch64] Add llvm/test/CodeGen/AArch64/i256-math.ll
This patch adds a test case for i256 additions and subtractions.  I'm
leaving out multiplications for now, which would result in very long
sequences.

Differential Revision: https://reviews.llvm.org/D125125
2022-05-06 14:26:12 -07:00
Kazu Hirata fffb6e6afd [AArch64] Fix sub with carry
13403a70e4 introduced a bug where we
generate the outgoing carry inverted, which in turn breaks the
lowering of @llvm.usub.sat.i128, returning the normal difference on
saturation and zero otherwise.

Note that AArch64 has peculiar semantics where the subtraction
instructions generate borrow inverted.  The problem is that we mix the
two forms of semantics -- the normal carry and inverted carry -- in
the area of extended precision subtractions.  Specifically, we have
three problems:

- lowerADDSUBCARRY takes the non-inverted incoming carry from a
  subtraction and feeds it to SBCS without inverting it first.

- lowerADDSUBCARRY makes available the outgoing carry from SBCS
  without inverting it.

- foldOverflowCheck folds:

  (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  in turn generates a borrow, *clearing* the carry flag.  Instead, we
  should fold:

  (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  does not generate a borrow, *setting* the carry flag.
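A minimal Python model of the flag semantics described above (function names hypothetical; only the C flag is modelled, with C set meaning "no borrow"). It checks that the old fold inverts the incoming carry while the corrected fold preserves it, and sketches a 128-bit usub.sat built from SUBS/SBCS under that convention:

```python
MASK64 = (1 << 64) - 1

def cmp_carry(a, b):
    # AArch64 CMP/SUBS set C when no borrow occurs, i.e. a >= b unsigned.
    return 1 if a >= b else 0

def cset_lo(carry):
    # CSET Rd, LO produces 1 when the carry flag is clear.
    return 1 if carry == 0 else 0

def carry_after_old_fold(carry):
    # Carry flag produced by (CMP (CSET LO carry) 1).
    return cmp_carry(cset_lo(carry), 1)

def carry_after_new_fold(carry):
    # Carry flag produced by (CMP 0 (CSET LO carry)).
    return cmp_carry(0, cset_lo(carry))

def usub_sat_u128(a, b):
    a_lo, b_lo = a & MASK64, b & MASK64
    a_hi, b_hi = a >> 64, b >> 64
    # Low half: SUBS; C = 1 means no borrow.
    lo = (a_lo - b_lo) & MASK64
    c = cmp_carry(a_lo, b_lo)
    # High half: SBCS subtracts the borrow (1 - C) and sets C the same way.
    borrow_in = 1 - c
    hi = (a_hi - b_hi - borrow_in) & MASK64
    c_out = cmp_carry(a_hi, b_hi + borrow_in)
    # A borrow out of the top half (C clear) saturates the result to 0.
    return (hi << 64) | lo if c_out else 0
```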

IIUC, we should use the normal (that is, non-inverted) semantics for
carry everywhere.

This patch fixes the three problems above.

This patch does not add any new testcases because we have plenty of
them covering the instruction in question.  In particular,
@u128_saturating_sub is identical to the testcase in the motivating
issue.

Fixes: #55253

Differential Revision: https://reviews.llvm.org/D124976
2022-05-06 11:04:17 -07:00
Craig Topper 76f90a9d71 [SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift.
Otherwise we have garbage in the upper bits that can affect the
results of the UREM.

Fixes PR55296.

Differential Revision: https://reviews.llvm.org/D125076
2022-05-06 09:26:30 -07:00
David Green 115c188807 [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type
we can convert shuffle(bitcast(X)) into bitcast(shuffle(X)), performing
the shuffle in the wider type and simplifying the instruction sequence.
A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.
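A minimal Python sketch of the mask test, assuming the narrow mask must consist of consecutive, aligned runs of `factor` indices (for example 2 when i32 lanes are viewed as i64). The helper name is hypothetical, and undef mask elements are not handled by this sketch:

```python
def widen_shuffle_mask(mask, factor):
    # Try to express `mask` over narrow elements as a mask over elements
    # `factor` times wider; return None if that is not possible.
    if len(mask) % factor:
        return None
    wide = []
    for i in range(0, len(mask), factor):
        group = mask[i:i + factor]
        first = group[0]
        # Each group must start on a wide-element boundary and be consecutive.
        if first % factor or group != list(range(first, first + factor)):
            return None
        wide.append(first // factor)
    return wide
```

For instance, `widen_shuffle_mask([2, 3, 0, 1], 2)` gives `[1, 0]`, matching the v4i32 to v2i64 example, while `[1, 0, 3, 2]` cannot be widened.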

The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.

Differential Revision: https://reviews.llvm.org/D123801
2022-05-06 10:50:31 +01:00
Amara Emerson 586802eb72 [GlobalISel] Re-generate some tests. 2022-05-05 14:14:36 -07:00
Craig Topper 084f967370 [SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef.
The result of sign_extend_inreg needs to have as many sign bits
as requested by the VT argument. The easiest way to guarantee this
is to fold it to 0.
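A minimal Python sketch of why 0 is a safe fold, for an i32 sext_inreg from i8 (helper names hypothetical). The result must have at least 32 - 8 + 1 = 25 sign bits; 0 has 32, whereas an arbitrary bit pattern chosen for undef may not:

```python
def sext_inreg(x, from_bits, bits):
    # Sign-extend the low `from_bits` bits of x to a `bits`-wide value.
    low = x & ((1 << from_bits) - 1)
    if low >> (from_bits - 1):
        low -= 1 << from_bits
    return low & ((1 << bits) - 1)

def num_sign_bits(x, bits):
    # Number of leading bits equal to the sign bit (always at least 1).
    sign = (x >> (bits - 1)) & 1
    count = 0
    for i in range(bits - 1, -1, -1):
        if (x >> i) & 1 != sign:
            break
        count += 1
    return count
```

For example, `sext_inreg(0x80, 8, 32)` is `0xFFFFFF80` with exactly 25 sign bits, while a value such as `0x00008000` has only 16.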

SystemZ test was modified to avoid using undef.

Fixes https://github.com/llvm/llvm-project/issues/55178

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124696
2022-05-05 09:45:35 -07:00
Amara Emerson 87e3646a1f [AArch64][GlobalISel] Add undef combines to postlegalizer combiner. 2022-05-05 09:22:08 -07:00
David Green c7a6b11b7e [ARM][AArch64] Add some extra shuffle conversion test coverage. NFC
This adds a big-endian run line for the AArch64 TRN tests and
regenerates the check lines, along with adding an extra MVE VMOVN case
and regenerating vector-DAGCombine.ll for easier updating.
2022-05-05 15:27:44 +01:00
Bradley Smith 8f623f4ab0 [AArch64][SVE] Restore SP from FP when SVE CSRs and variable sized objects are present
Without SVE, after a dynamic stack allocation has modified the SP, it is
presumed that a frame pointer restoration will revert the SP back to
its correct value prior to any caller stack being restored. However the
SVE frame is restored using the stack pointer directly, as it is located
after the frame pointer. This means that in the presence of a dynamic
stack allocation, any SVE callee state gets corrupted as SP has the
incorrect value when the SVE state is restored.

To address this issue, when variable sized objects and SVE CSRs are
present, treat the stack as having been realigned, hence restoring the
stack pointer from the frame pointer prior to restoring the SVE state.

Differential Revision: https://reviews.llvm.org/D124615
2022-05-04 12:57:03 +00:00
Alex Borcan afaa56df7a Implement support for __llvm_addrsig for MachO in llvm-mc
The __llvm_addrsig section is a section that the linker needs for safe ICF.
This was not yet implemented for MachO; this patch implements it.
It has been tested with a safe deduplication implementation inside lld.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D123751
2022-05-03 18:19:18 -04:00
Jon Roelofs e1c808b36e Fix zero-width bitfield extracts to emit 0
Fixes #55129
2022-05-03 14:46:42 -07:00
Philipp Tomsich 64816e68f4 [AArch64] Support for Ampere1 core
Add support for the Ampere Computing Ampere1 core.
Ampere1 implements the AArch64 state and is compatible with ARMv8.6-A.

Differential Revision: https://reviews.llvm.org/D117112
2022-05-03 15:54:02 +01:00
Bradley Smith 96bbd359ed [AArch64][SVE] Only fold frame indexes referencing SVE objects into SVE loads/stores
Currently we always fold frame indexes into SVE load/store instructions;
however, these instructions can only encode VL-scaled offsets. This means
that when we are accessing a fixed length stack object with these
instructions, the folded in frame index gets pulled back out during frame
lowering. This can cause issues when we have no spare registers and no
emergency spill slot.

Rather than causing issues like this, don't fold in frame indexes that
reference fixed length objects.

Fixes: #55041

Differential Revision: https://reviews.llvm.org/D124457
2022-05-03 09:48:13 +00:00
Sanjay Patel 747c6a0c73 [SDAG] fix miscompile when casting int->FP->int
This is the codegen equivalent of D124692.

As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.
https://alive2.llvm.org/ce/z/KtaDmd
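
One way a cast-pair fold can go wrong, shown here only as an illustration (this is not necessarily the exact case from the issue, and `badFold`/`goodFold` are hypothetical): after an unsigned int-to-FP conversion the value is non-negative, so replacing the round trip with an integer extension must zero-extend even when the final conversion is signed.

```cpp
#include <cassert>
#include <cstdint>

// uitofp i8 -> fptosi i16, done literally.
int16_t roundTrip(uint8_t x) {
  float f = static_cast<float>(x);  // exact: float represents 0..255 exactly
  return static_cast<int16_t>(f);   // in range for int16_t, so well defined
}

// Hypothetical wrong replacement: reinterpret the 8 source bits as signed,
// then extend (uint8_t -> int8_t is modular, guaranteed since C++20).
int16_t badFold(uint8_t x) {
  return static_cast<int16_t>(static_cast<int8_t>(x));
}

// Hypothetical value-preserving replacement: zero-extend, since the source
// conversion was unsigned.
int16_t goodFold(uint8_t x) { return static_cast<int16_t>(x); }
```

For x = 255 the round trip yields 255, the zero-extending fold agrees, and the sign-extending fold yields -1.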

Differential Revision: https://reviews.llvm.org/D124771
2022-05-02 14:57:27 -04:00
Sanjay Patel cb3fb08508 [AArch64] add tests for int->FP->int casts; NFC
Copied from x86 tests for multi-target coverage.
Also, provides coverage for target-specific asm
testing for Alive2 or its follow-ons.

See #55150 and D124692
2022-05-02 09:18:12 -04:00
Paul Walker f10a8f6752 [LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG
SIGN_EXTEND_INREG expansion can trigger a TypeSize error because
"VT.getSizeInBits() == 1" is used to detect a boolean without
first verifying VT is a scalar.
2022-04-30 19:21:48 +01:00
Craig Topper 6affe87bda [DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)

We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.
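
The two identities above can be checked on 32-bit values with the sketch below, assuming C1 + C2 == 32; the helper names are hypothetical. It deliberately omits the AND masks: when the combine looked through an AND to find a shift, that mask still has to be applied to the result, which is the step this commit restores.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by c (mod 32).
uint32_t rotl32(uint32_t x, unsigned c) {
  c &= 31;
  return c == 0 ? x : (x << c) | (x >> (32 - c));
}

// Hypothetical: (shl (X | Y), C1) | (srl X, C2) with C2 = 32 - C1.
uint32_t lhsForm1(uint32_t x, uint32_t y, unsigned c1) {
  assert(c1 > 0 && c1 < 32);
  return ((x | y) << c1) | (x >> (32 - c1));
}

// Hypothetical: (shl X, C1) | (srl (X | Y), C2) with C2 = 32 - C1.
uint32_t lhsForm2(uint32_t x, uint32_t y, unsigned c1) {
  assert(c1 > 0 && c1 < 32);
  return (x << c1) | ((x | y) >> (32 - c1));
}

// Rewritten forms: (rotl X, C1) | (shl Y, C1) and (rotl X, C1) | (srl Y, C2).
uint32_t rhsForm1(uint32_t x, uint32_t y, unsigned c1) {
  return rotl32(x, c1) | (y << c1);
}
uint32_t rhsForm2(uint32_t x, uint32_t y, unsigned c1) {
  return rotl32(x, c1) | (y >> (32 - c1));
}
```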

I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.

Fixes PR55201.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124711
2022-04-30 11:02:30 -07:00
Craig Topper 808c33ace5 [RISCV][AArch64] Pre-commit tests for D124711. NFC 2022-04-30 10:59:20 -07:00
Saleem Abdulrasool 24ba1302b3 AArch64: modify Swift async frame record storage on Windows
The frame layout on Windows differs from that on other platforms. It
will spill the registers in descending numeric value (i.e. x30, x29,
...). Furthermore, the x29, x30 pair is particularly important as it
is used for fast stack walking. As a result, we cannot simply
insert the Swift async frame record in between the stores. To keep the
search mechanism simple, always spill the async frame record
prior to the spilled registers.

This was caught by an assertion failure in the frame lowering code when
building the runtime for Windows AArch64.

Fixes: #55058

Differential Revision: https://reviews.llvm.org/D124498
Reviewed By: mstorsjo
2022-04-30 09:01:33 -07:00
Craig Topper 65dbd8d793 [SelectionDAG] Pre-commit test for D124696. NFC 2022-04-29 17:24:13 -07:00
Paul Walker b481512485 [SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.
This is essentially a refactoring patch but allows more cases to
be caught, hence the output changes to some tests.

Differential Revision: https://reviews.llvm.org/D122994
2022-04-29 17:42:33 +01:00
Paul Walker 23c509754d [DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
2022-04-29 14:20:13 +01:00
Paul Walker 59588f0a3d [SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.
getGatherScatterIndexIsExtended currently looks through all
SIGN_EXTEND_INREG operations regardless of their input type.  This
patch restricts the code to only look through i32->i64 extensions,
which are the ones supported implicitly by SVE addressing modes.

Differential Revision: https://reviews.llvm.org/D123318
2022-04-29 14:20:13 +01:00
Paul Walker 7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.
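
A small numeric model of why the scaling matters, assuming each lane's address is base + index * scale, where scale is the implicit element-size scaling. The helpers are hypothetical. Moving the splat B into the base is only equivalent if the base absorbs B * scale rather than B:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one gather/scatter lane with an implicitly
// scaled index: address = base + index * scale.
uint64_t laneAddress(uint64_t base, uint64_t index, uint64_t scale) {
  return base + index * scale;
}

// Hypothetical: move a splatted addend out of the index into the base.
// The base must grow by splat * scale, not by splat, unless scale == 1.
uint64_t foldSplatIntoBase(uint64_t base, uint64_t splat, uint64_t scale) {
  return base + splat * scale;
}
```

With A = 3, B = 100 and 8-byte elements, the original lane address is 824, the naive base(B) rewrite gives 124, and only a base of 800 reproduces 824.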

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
Nikita Popov 4e545bdb35 [SimplifyCFG] Thread branches on same condition in more cases (PR54980)
SimplifyCFG implements basic jump threading, if a branch is
performed on a phi node with constant operands. However,
InstCombine canonicalizes such phis to the condition value of a
previous branch, if possible. SimplifyCFG does support this as
well, but only in the very limited case where the same condition
is used in a direct predecessor -- notably, this does not include
the common diamond pattern (i.e. two consecutive if/elses on the
same condition).

This patch extends the code to look back a limited number of
blocks to find a branch on the same value, rather than only
looking at the direct predecessor.

Fixes https://github.com/llvm/llvm-project/issues/54980.

Differential Revision: https://reviews.llvm.org/D124159
2022-04-29 09:44:05 +02:00
Paul Walker 3c382ed71f [AArch64][SVE] Remove BIC from logical operation DestructiveBinaryComm patterns
This reverts part of https://reviews.llvm.org/D124224 that causes
an assert because the register allocator triggers a pathological
situation where there's no safe way to insert a zeroing MOVPRFX
instruction.
2022-04-22 15:07:55 +01:00
zhongyunde e1afae0311 [AArch64][SVE] Add some logical operation DestructiveBinaryComm patterns
Add DestructiveBinaryComm* patterns for ORR, EOR, AND and BIC.
These instructions require the destination register to be the same as the
first source register, so using movprfx should be beneficial to performance.
Note: BIC (i.e. A & ~B) is not a commutative operation.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D124224
2022-04-22 20:31:00 +08:00
Daniel Kiss de07cde67b [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of the paciasp/pacibsp
instructions, so emit .cfi_negate_ra_state for them too.
With the Armv8.3 instruction set, retaa/retab perform the authentication and the
return in one step; here we can't emit .cfi_negate_ra_state because it would have
to be placed after the ret* instruction.
the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2022-04-22 13:25:57 +02:00
Karl Meakin 81904454f7 [AArch64] Add `foldOverflowCheck` DAG combine
Differential Revision: https://reviews.llvm.org/D123779
2022-04-21 14:56:38 +01:00
Karl Meakin 13403a70e4 [AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY
Differential Revision: https://reviews.llvm.org/D123322
2022-04-21 14:56:37 +01:00
Pengxuan Zheng 38612fbc89 Reland "[COFF, ARM64] Add __break intrinsic"
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

Reland after fixing the test failure. The failure was due to conflict with a
change (D122983) which was merged right before this patch.

Reviewed By: rnk, mstorsjo

Differential Revision: https://reviews.llvm.org/D124032
2022-04-20 13:01:30 -07:00
Pengxuan Zheng bff8356b19 Revert "[COFF, ARM64] Add __break intrinsic"
This reverts commit 8a9b4fb4aa.
2022-04-20 11:57:49 -07:00
Pengxuan Zheng 8a9b4fb4aa [COFF, ARM64] Add __break intrinsic
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

Reviewed By: rnk, mstorsjo

Differential Revision: https://reviews.llvm.org/D124032
2022-04-20 11:20:26 -07:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process long shuffles (working across several actual
vector registers) most effectively if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
In future patches, the same function can also be used to improve the cost
model for shuffles.
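
The underlying idea of splitting a long mask per register can be sketched as follows. This is a hypothetical helper illustrating the idea, not the interface of `llvm::processShuffleMasks`. It assumes mask values index into the concatenation of all source registers, with -1 for undef. A result register that reads from a single source register can be handled by a one-register shuffle (or a copy), and one that reads none is undef:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (illustration only): for each result register of
// `regElts` elements, collect the source registers its lanes read from.
std::vector<std::set<int>> sourceRegsPerResultReg(const std::vector<int>& mask,
                                                  int regElts) {
  assert(regElts > 0 && mask.size() % regElts == 0);
  std::vector<std::set<int>> result(mask.size() / regElts);
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i] >= 0)  // skip undef lanes
      result[i / regElts].insert(mask[i] / regElts);
  return result;
}
```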

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process long shuffles (working across several actual
vector registers) most effectively if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
In future patches, the same function can also be used to improve the cost
model for shuffles.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
Matt Arsenault d16945d31b AArch64/GlobalISel: Add -global-isel-abort=1 to select tests
Otherwise the legalizer verifier error isn't triggered since the
default is fallback.
2022-04-19 21:04:32 -04:00
David Green 73dc996428 [AArch64] Add lane moves to PerfectShuffle tables
This teaches the perfect shuffle tables about lane inserts, which can
help reduce the cost of many entries. Many of the shuffle masks are
one-away from being correct, and a simple lane move can be a lot simpler
than trying to use ext/zip/etc. Because they are not exactly like the
other masks handled in the perfect shuffle tables, they require special
casing to generate them, with a special InsOp Operator.

The lane to insert into is encoded as the RHSID, and the lane to move from
is taken from the original mask. This helps reduce the maximum perfect
shuffle entry cost to 3, with many more shuffles being generatable in a
single instruction.
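
As a rough illustration of the "one lane away" idea (a hypothetical helper, not the table generator or the InsOp lowering), the check below accepts a 4-lane, two-input mask when it matches some cheaper base mask in every lane but one, and reports which lane to insert into and which element to move:

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <utility>

using Mask4 = std::array<int, 4>;

// Hypothetical helper: lanes 0-3 select from the LHS vector, 4-7 from the
// RHS vector. If `mask` agrees with `base` in every lane except exactly one,
// return {destination lane, source element}; otherwise return std::nullopt
// (including when the masks are identical and no move is needed).
std::optional<std::pair<int, int>> singleLaneMove(const Mask4& mask,
                                                  const Mask4& base) {
  std::optional<std::pair<int, int>> move;
  for (int lane = 0; lane < 4; ++lane) {
    assert(mask[lane] >= 0 && mask[lane] < 8);
    if (mask[lane] == base[lane])
      continue;
    if (move)
      return std::nullopt;  // a second lane differs
    move = std::make_pair(lane, mask[lane]);
  }
  return move;
}
```

For example, <0,1,6,3> is the identity mask <0,1,2,3> with lane 2 taken from element 6 (RHS lane 2), which a single lane insert can produce.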

Differential Revision: https://reviews.llvm.org/D123386
2022-04-19 14:49:50 +01:00
David Green cc9495f679 [AArch64] Only mark cost 1 perfect shuffles as legal
The perfect shuffle tables encode a cost of either 0 (a nop-copy) or 1
(a single instruction) with a cost encoding of 0 in the upper 2 bits.
All perfect shuffles with any cost are then marked as legal shuffles
though (the maximum encoded cost is 3), which can confuse the DAG
combiner into thinking the shuffles are cheaper than they should be.

Limiting legal shuffles to single instructions seems to do better in
most cases, producing fewer instructions for complex shuffles. There are
some cases that now become tbl, which may be better or worse depending
on whether the instruction is in a loop and the tbl load can be hoisted
out.

Differential Revision: https://reviews.llvm.org/D123377
2022-04-19 12:58:55 +01:00
David Green 50af82701c [AArch64] Cost all perfect shuffles entries as cost 1
A brief introduction to perfect shuffles - AArch64 NEON has a number of
shuffle operations - dups, zips, exts, movs etc that can in some way
shuffle around the lanes of a vector. Given a shuffle of size 4 with 2
inputs, some shuffle masks can be easily codegen'd to a single
instruction. A <0,0,1,1> mask for example is a zip LHS, LHS. This is
great, but some masks are not so simple, like a <0,0,1,2>. It turns out
we can generate that from zip LHS, <0,2,0,2>, having generated
<0,2,0,2> from uzp LHS, LHS, producing the result in 2 instructions.

It is not obvious from a given mask how to get there though. So we have
a simple program (PerfectShuffle.cpp in the util folder) that can scan
through all combinations of 4-element vectors and generate the perfect
combination of results needed for each shuffle mask (for some definition
of perfect). This is run offline to generate a table that is queried for
generating shuffle instructions. (Because the table could get quite big,
it is limited to 4 element vectors).

In the perfect shuffle tables zip, uzp and trn shuffles were being costed
as 2, which is higher than needed and skews the perfect shuffle tables
to create inefficient combinations. This sets them to 1 and regenerates
the tables. The codegen will usually be better and the costs should be
more precise (but it can get less second-order reuse of values from
multiple shuffles; these cases should be fixed up in subsequent patches).
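
The two-instruction example from the introduction can be checked with a small model of the lane semantics, assuming "zip" and "uzp" there refer to the ZIP1 and UZP1 forms. The helpers are hypothetical; element values stand in for LHS lane numbers so results read directly as masks:

```cpp
#include <array>
#include <cassert>

using V4 = std::array<int, 4>;

// Hypothetical model of ZIP1: interleave the low halves of a and b.
V4 zip1(const V4& a, const V4& b) { return {a[0], b[0], a[1], b[1]}; }

// Hypothetical model of UZP1: even-numbered elements of the concatenation a:b.
V4 uzp1(const V4& a, const V4& b) { return {a[0], a[2], b[0], b[2]}; }
```

With lhs = {0, 1, 2, 3}, zip1(lhs, lhs) gives <0,0,1,1>, uzp1(lhs, lhs) gives <0,2,0,2>, and zipping lhs with that intermediate gives <0,0,1,2>.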

Differential Revision: https://reviews.llvm.org/D123379
2022-04-19 12:05:05 +01:00
chenglin.bi 222adf338a [Arch64][SelectionDAG] Add target-specific implementation of srem
1. Expanding X%C to the equivalent of X-X/C*C is not always the fastest path when there is no paired SDIV, so first check whether the target has a faster lowering for SREM on its own.
2. Add an AArch64 fast path for the SREM-only, power-of-two case.
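
C and C++ `%` truncates toward zero, so the remainder takes the sign of the dividend. A branchless way to compute a signed remainder by 2^k takes |x| & (2^k - 1) and negates it when x is negative. This is a sketch of the arithmetic only, with a hypothetical helper name; the commit message does not spell out the exact instruction sequence the AArch64 path emits:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: x srem (1 << k) without a division, for 1 <= k <= 31.
int32_t sremPow2(int32_t x, unsigned k) {
  assert(k >= 1 && k <= 31);
  uint32_t mask = (uint32_t{1} << k) - 1;
  uint32_t ux = static_cast<uint32_t>(x);
  uint32_t posRem = ux & mask;         // low bits of x, used when x >= 0
  uint32_t negRem = (0u - ux) & mask;  // low bits of |x|, used when x < 0
  // negRem <= mask < 2^31, so negating it as int32_t cannot overflow.
  return x < 0 ? -static_cast<int32_t>(negRem) : static_cast<int32_t>(posRem);
}
```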

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-19 02:49:42 +08:00
Momchil Velikov e0ff354b83 [AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer
[Re-commit after fixing a dereference of "end" iterator]

The AArch64LoadStoreOptimizer pass may merge a register
increment/decrement with a following memory operation. In doing so, it
may break CFI by moving a stack pointer adjustment past the CFI
instruction that described *that* adjustment.

This patch fixes this issue by moving said CFI instruction after the
merged instruction, where the SP increment/decrement actually takes
place.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D114547
2022-04-18 12:09:44 +01:00
chenglin.bi acfc025a72 Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem"
This reverts commit 9d9eddd3dd.
2022-04-18 10:35:09 +08:00
chenglin.bi 9d9eddd3dd [Arch64][SelectionDAG] Add target-specific implementation of srem
Expanding X%C to the equivalent of X-X/C*C is not always the fastest path when there is no paired SDIV, so first check whether the target has a faster lowering for SREM on its own. Add an AArch64 fast path for the SREM-only, power-of-two case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-16 12:29:11 +08:00
Momchil Velikov 24c84bd236 [AArch64] Async unwind - Fix MTE codegen emitting frame adjustments in a loop
When untagging the stack, the compiler may emit a sequence like:
```
        .LBB0_1:
          st2g sp, [sp], #32
          sub x8, x8, #32
          cbnz x8, .LBB0_1
          stg sp, [sp], #16
```
These stack adjustments cannot be described by CFI instructions.

This patch disables merging of SP update with untagging, i.e. makes the
compiler use an additional scratch register (there should be plenty
available at this point as we are in the epilogue) and generate:
```
            mov     x9, sp
            mov     x8, #256
            stg     x9, [x9], #16
    .LBB0_1:
            sub     x8, x8, #32
            st2g    x9, [x9], #32
            cbnz    x8, .LBB0_1
            add     sp, sp, #272
```
Merging is disabled only when we need to generate asynchronous unwind
tables.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D114548
2022-04-15 14:00:23 +01:00
John Brawn 27a8735a44 [AArch64] Add mayRaiseFPException to appropriate instructions
This is mostly handled by adding "let mayRaiseFPException = 1" before
the definition of the relevant instruction classes, but there are a
couple of complications:
 * When we have a multiclass where currently some instantiations are
   of instructions that can raise an exception and others aren't we
   need to split that into two multiclasses, one inheriting from the
   other using a multiclass parameter to enable exceptions.
 * In a couple of places in the globalisel instruction selector we
   need to manually set the NoFPExcept flag. There's also another
   place that looks like it should need it, but that code is never hit
   for those opcodes due to them being handled by the generic
   instruction selector, so I've instead just removed them from the
   switch.

Differential Revision: https://reviews.llvm.org/D115352
2022-04-14 16:51:22 +01:00
John Brawn 12c1022679 [AArch64] Lowering and legalization of strict FP16
For strict FP16 to work correctly, some changes are needed in lowering and
legalization:
 * SelectionDAGLegalize::PromoteNode was missing handling for some
   strict fp opcodes.
 * Some of the custom lowering of strict fp operations needed to be
   adjusted to work with FP16.
 * Custom lowering needed to be added for round-to-int operations.

With this, and the previous patches for the rest of the strict fp
isel, we can set IsStrictFPEnabled = true.

Differential Revision: https://reviews.llvm.org/D115620
2022-04-14 16:51:22 +01:00
David Green 1ba8f4f67d [AArch64] Move v4i8 concat load lowering to a combine.
The existing code was not updating the uses of loads that it recreated,
leading to incorrect chains which could break the ordering between
nodes. This moves the code to a combine instead, and makes sure we
update the chain references. This does mean it happens earlier -
potentially before the concats are simplified. This can lead to
inefficiencies in the codegen, which will be fixed in followups.
2022-04-14 15:19:33 +01:00
Paul Walker 0c44115e51 [SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER.
The lowering code did not use the scale operand of MGATHER/MSCATTER
nodes, but instead assumed scaled indices were always scaled based
on the element type of the memory type. This patch adds the missing
support by rewriting the nodes as unscaled variants.

Differential Revision: https://reviews.llvm.org/D123670
2022-04-14 11:54:46 +01:00
Momchil Velikov 62d4686be3 Revert "[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer"
This reverts commit ecbf32dd88.

It's possible this patch is the reason for an assertion failure
`!NodePtr->isKnownSentinel()` in `AArch64LoadStoreOpt::mergeUpdateInsn`
(https://lab.llvm.org/buildbot/#/builders/185/builds/1555) reverting while I
investigate.
2022-04-14 09:33:40 +01:00
David Green 4585bff408 [AArch64] Add new shuffles tests, and regenerate aarch64-wide-shuffle.ll and neon-wide-splat.ll. NFC 2022-04-13 18:10:49 +01:00
chenglin.bi 82e5976b7d [AArch64][SelectionDAG] stick all the power-of-two tests in a separate file; NFC
Baseline tests for D122968 (issue #54649).
2022-04-14 00:48:28 +08:00
Momchil Velikov ecbf32dd88 [AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer
The AArch64LoadStoreOptimizer pass may merge a register
increment/decrement with a following memory operation. In doing so, it
may break CFI by moving a stack pointer adjustment past the CFI
instruction that described *that* adjustment.

This patch fixes this issue by moving said CFI instruction after the
merged instruction, where the SP increment/decrement actually takes
place.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D114547
2022-04-13 17:04:53 +01:00
Alex Richardson ee44896cf4 [AArch64] Add missing HasNEON predicate in scalar FABD patterns
I was trying to compile with -march=+nosimd and hit the following assertion:
`Attempting to emit FABD64 instruction but the Feature_HasNEON predicate(s) are not met`.
This adds a HasNEON predicate to the patterns which was omitted in commit
21d9b33d62 for some reason.
The new code generation matches GCC with -mcpu=<cpu>+nosimd:
https://godbolt.org/z/n1Y7xh5jo

Differential Revision: https://reviews.llvm.org/D123491
2022-04-13 09:30:11 +00:00
Alex Richardson 32a353a5e0 [AArch64] Baseline test for D123491 2022-04-13 09:30:11 +00:00
David Sherwood 44271e7c55 [AArch64][SVE] Fix lowering of "fcmp ueq/one" when using SVE
We were previously lowering to the incorrect instructions for the
setcc DAG node when using the SETUEQ and SETONE floating point
condition codes. I have fixed this by marking the SETONE code
as Expand and letting the SETUNE code be legal. I have also
fixed up the patterns for FCMNE_PPzZZ and FCMNE_PPzZ0 to use
the correct opcode.

Differential Revision: https://reviews.llvm.org/D121905
2022-04-13 10:24:03 +01:00
Daniel Kiss b0343a38a5 Support the min of module flags when linking, use for AArch64 BTI/PAC-RET
LTO objects might be compiled with different `mbranch-protection` flags, which causes an error in the linker.
Such a setup is allowed in a normal build; with this change it is also possible with LTO.

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D123493
2022-04-13 09:31:51 +02:00
Matt Arsenault 6009122250 AArch64/GlobalISel: Remove pointless s1 legalize rules
These have no net effect on the legalize rules.
2022-04-12 16:54:04 -04:00
Matt Arsenault 3f2cc7cc2b GlobalISel: Fix lowerSelect handling of boolean high bits
This was making several invalid assumptions about the incoming
select. First, it was assuming the incoming condition was either s1 or
already sign extended, not accounting for different boolean high bits
behavior between scalar and vector conditions. We only had a vector
boolean due to the intermediate step vector select, which is now
avoided.

Second, it was assuming it can use the result vector type as a boolean
mask. These types don't have anything to do with each other, and only make
sense in the context of the expansion to bit operations. Since these
logically are part of the same lowering, do the complete expansion in
a single step.

The added select_v4s1_s1 test does fail to legalize, since it seems
AArch64's vector legalization support is pretty incomplete.
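
The expansion to bit operations can be sketched on scalars as follows, assuming the boolean's meaningful bit is bit 0 and its high bits are unspecified (as with a widened s1). `selectViaBits` is a hypothetical helper, not the LegalizerHelper code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: select(cond, a, b) as bit operations. The mask is
// built from bit 0 of the boolean only, so garbage in the high bits of
// `cond` cannot leak into the result.
uint32_t selectViaBits(uint32_t cond, uint32_t a, uint32_t b) {
  uint32_t mask = 0u - (cond & 1u);  // 0xFFFFFFFF if bit 0 is set, else 0
  return (a & mask) | (b & ~mask);
}
```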
2022-04-12 16:54:03 -04:00
Ahmed Bougacha cfa4fe7c51 [AArch64][LOH] Don't ignore regmasks in bundles by iterating over instrs.
The LOH pass iterates over instructions to build its custom register
state machine, but it uses the top-level bundle iterator.
This should be okay, because when the wrapper BUNDLE MI is built,
it aggregates the register defs/uses in its instructions into MOs.

However, that doesn't apply to regmasks, and accumulating regmasks
across multiple instructions would be messy business.
There are a couple AnalyzePhysRegInBundle (/Virt) helpers that
do look at regmasks, but those don't fit in very well here.

AArch64 has started to use a few bundle instructions, specifically
as glorified pseudos for variant call instructions, which have regmasks.
So the LOH pass ends up ignoring regmasks.

Concretely, this has been wrong for a while, but, on aarch64, the
most common bundle (rv_marker call) was always followed by the
attached call instruction, a plain BL with a regmask, which
was properly detected by the pass.

However, we recently started keeping the attached call in the bundle,
so the regmask is now ignored, and the pass happily combines ADRPs of,
say, x8, across the bundle, resulting in corrupt pointers later.
2022-04-12 10:34:54 -07:00
Ahmed Bougacha f3e76dcae3 [AArch64] Cleanup call-rv-marker.ll test. NFC.
This was doing -iphoneos instead of -ios. While there,
remove an old TODO and cleanup some alignment.
2022-04-12 10:34:54 -07:00
Momchil Velikov d0ea42a7c1 [AArch64] Async unwind - function epilogues
Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D112330
2022-04-12 16:50:50 +01:00
Matt Arsenault 7e8ff962b3 AArch64/GlobalISel: Regenerate mir test checks
Minimizes the test diffs in future changes from introduction of -NEXT.
2022-04-11 20:12:22 -04:00
Matt Arsenault 492d0eab89 AArch64/GlobalISel: Remove IR section from a test 2022-04-11 19:43:37 -04:00
Biplob Mishra d06fb9045b AArch64 adding more tests to show the simple scenarios for or/and combine 2022-04-11 20:54:12 +01:00
Momchil Velikov b4ad28da19 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by the linear
(non-CFG-aware) nature of the unwind tables.

Unlike the `CFIInstrInserter` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustment and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue and
    has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

From the point of view of the unwind tables, the "has/does not have
call frame" state at beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
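
The state computation and the layout comparison described above can be sketched as follows. This assumes a toy CFG in which blocks are stored in layout order and a reverse post-order is supplied. `Block`, `Fixup`, and `computeCfiFixups` are hypothetical names; the real pass works on MachineBasicBlocks and emits the CFI directives rather than returning tags:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG model; blocks are stored in layout order.
struct Block {
  std::vector<std::size_t> succs;
  bool hasPrologue = false;  // contains the (single) prologue
  bool hasEpilogue = false;  // contains a complete epilogue
};

enum class Fixup { None, ResetToInitial, RestoreFromPrologue };

// `rpo` is a reverse post-order of block indices starting at the entry.
std::vector<Fixup> computeCfiFixups(const std::vector<Block>& blocks,
                                    const std::vector<std::size_t>& rpo) {
  const std::size_t n = blocks.size();
  std::vector<bool> frameIn(n, false), frameOut(n, false);

  // One RPO walk. This assumes, per the constraints above, that all
  // predecessors of a block agree on its entry state, so back edges carry
  // the same state the successor already has.
  for (std::size_t b : rpo) {
    bool state = frameIn[b];
    if (blocks[b].hasPrologue) state = true;
    if (blocks[b].hasEpilogue) state = false;
    frameOut[b] = state;
    for (std::size_t s : blocks[b].succs) frameIn[s] = state;
  }

  // The unwind tables are linear: at the start of block i they describe the
  // state at the end of block i - 1 in layout order (no frame at entry).
  std::vector<Fixup> fixups(n, Fixup::None);
  for (std::size_t i = 0; i < n; ++i) {
    bool tableState = i == 0 ? false : frameOut[i - 1];
    if (tableState == frameIn[i]) continue;
    // Restore/remember when a frame is expected, otherwise reset to initial.
    fixups[i] = frameIn[i] ? Fixup::RestoreFromPrologue : Fixup::ResetToInitial;
  }
  return fixups;
}
```

For instance, with a shrink-wrapped layout entry, prologue block, frameless return, epilogue block, the frameless return needs a reset and the epilogue block that follows it needs a restore.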

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-11 13:27:26 +01:00
Sanjay Patel 2ed15984b4 [SDAG] try to reduce compare of funnel shift equal 0
fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0
fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0

This is similar to an existing setcc-of-rotate fold, but the
matching requires more checks for the more general funnel op:
https://alive2.llvm.org/ce/z/Ab2jDd

We are effectively decomposing the funnel shift into logical
shifts, reassociating, and removing a shift.

This should get us the final improvements for x86-64 that were
originally shown in D111530
( https://github.com/llvm/llvm-project/issues/49541 );
x86-32 still shows some SHLD/SHRD, so the pattern is not
matching there yet.

Differential Revision: https://reviews.llvm.org/D122919
2022-04-11 07:44:58 -04:00
Tim Northover 901831a4e6 Revert "AArch64: take compact unwind frame size from last CFI instruction."
It was on ToT when I pushed and committed unintentionally.
2022-04-11 12:25:58 +01:00
Tim Northover 9fe32ca697 AArch64: add nvcast patterns for v1f64 2022-04-11 12:24:48 +01:00
Tim Northover 4120a3abdd AArch64: take compact unwind frame size from last CFI instruction.
Asynchronous exception support for the prologue means that there can be
multiple .cfi_def_cfa_offset instructions in a single function, which tripped
up an assertion in the compact unwind generator.

In reality the compact unwind format is far too restrictive to represent
asynchronous frames so if we ever wanted that on Darwin we'd fall back to DWARF
(possibly keeping compact unwind around for synchronous users). So the compact
format should continue to represent the synchronous situation, and the
assertion can be removed.
2022-04-11 12:24:48 +01:00
Tim Northover 6c85668d28 Tail calls: look through AssertZExt to find register copy.
arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and
this is modelled in the IR by inserting an AssertZExt after the CopyFromReg.
The function deciding whether registers that need to be preserved actually are
preserved wasn't expecting this, so it banned perfectly legitimate tail calls.
2022-04-11 12:24:47 +01:00
Alexander Shaposhnikov 626039cdcc [AArch64] Split fuse-literals feature
This diff splits the fuse-literals feature and enables fuse-adrp-add by default;
in particular, it adjusts instruction scheduling to place ADRP+ADD pairs together.
This also enables the linker to apply the relaxations described in
d2ca58c54b.

Differential revision: https://reviews.llvm.org/D120104

Test plan: make check-all
2022-04-11 05:27:11 +00:00
Karl Meakin 784b9d468a [AArch64] Update tests with the `update_llc_test_checks.py` script (NFC)
Reviewed By: Kmeakin

Differential Revision: https://reviews.llvm.org/D123317
2022-04-07 18:06:15 +01:00
Paul Walker a88e8374db [SVE] Add more gather/scatter tests to highlight bugs in their generated code. 2022-04-07 17:13:48 +01:00
Martin Storsjö 8d7a17b7c8 [AArch64] Fix the upper limit for folded address offsets for COFF
In COFF, the immediates in IMAGE_REL_ARM64_PAGEBASE_REL21 relocations
are limited to 21 bit signed, i.e. the offset has to be less than
(1 << 20). The previous limit did intend to cover for this case, but
had missed that the 21 bit field was signed.
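
A sketch of the signed range check described above, assuming a plain two's-complement 21-bit immediate. `fitsSigned21` and `fitsUnsigned21` are hypothetical helpers, and LLVM's actual folding limit may be stricter than this:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: a 21-bit signed field holds values in
// [-(1 << 20), (1 << 20) - 1].
bool fitsSigned21(int64_t offset) {
  return offset >= -(int64_t{1} << 20) && offset < (int64_t{1} << 20);
}

// Hypothetical, for contrast: treating the field as unsigned would admit
// offsets up to (1 << 21) - 1, which a signed field cannot encode.
bool fitsUnsigned21(int64_t offset) {
  return offset >= 0 && offset < (int64_t{1} << 21);
}
```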

This fixes issue https://github.com/llvm/llvm-project/issues/54753.

Differential Revision: https://reviews.llvm.org/D123160
2022-04-06 22:54:13 +03:00
Daniil Kovalev 62a983ebc5 Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0.

As discussed in D120714 with @thakis, the patch added unneeded complexity
without noticeable benefits.
2022-04-06 20:32:53 +03:00
Paul Walker 7d3af9ef0f [DAGCombine] insert_subvector undef, (splat X), N2 -> splat X
Differential Revision: https://reviews.llvm.org/D120328
2022-04-06 17:15:38 +01:00
Paul Walker 5e407f0887 [SVE] Add gather/scatter tests to highlight bugs in their generated code. 2022-04-06 15:30:29 +01:00
chenglin.bi 87f0d55304 [AArch64] Fold lsr+bfi in tryBitfieldInsertOpFromOr
In tryBitfieldInsertOpFromOr, if the newly created LSR node's source
is itself an LSR by an immediate shift amount, try to fold them.
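
The arithmetic behind folding two immediate logical right shifts is shown below. `foldedLsr` is a hypothetical helper; the ISel code also has to respect the bitfield being inserted, which is not modelled here:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: (x >> a) >> b on 32 bits equals x >> (a + b) when
// a + b < 32, and 0 otherwise (every bit has been shifted out).
uint32_t foldedLsr(uint32_t x, unsigned a, unsigned b) {
  assert(a < 32 && b < 32);
  return a + b < 32 ? x >> (a + b) : 0;
}
```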

Fixes https://github.com/llvm/llvm-project/issues/54696

Reviewed By: efriedma, benshi001

Differential Revision: https://reviews.llvm.org/D122915
2022-04-06 22:02:31 +08:00
zhongyunde 9a2d5cc1da [SVE][AArch64] Enable first active true vector combine for INTRINSIC_WO_CHAIN
WHILELO/WHILELS instructions are very important for SVE loops, and they
are themselves flag-setting operations, so add them.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D122796
2022-04-06 21:01:37 +08:00
zhongyunde 19e5235147 [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD
Following the discussion in D122281, we are missing an ISD::AND combine for MLOAD
because it relies on BuildVectorSDNode, which fails for scalable vectors.
This patch is intended to handle that, so we can circle back to the type MVT::nxv2i32.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122703
2022-04-06 20:50:42 +08:00
Daniil Kovalev 83a798d4b0 [CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.

Differential Revision: https://reviews.llvm.org/D120714
2022-04-06 14:09:32 +03:00
Matt Arsenault 634bf829a8 MachineVerifier: Diagnose undef set on full register defs
An undef def of a full register would assert in LiveIntervalCalc.
2022-04-05 22:19:17 -04:00
zhongyunde 251637690a [AArch64] Enhance last active true vector combine
Extracting the last active element produces LASTB + WHILELS, and WHILELS
is itself a flag-setting operation, so perform the combine in this case too.

Reviewed By: paulwalker-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D122551
2022-04-06 09:54:28 +08:00
Jessica Paquette 6c9bc2dd1c [GlobalISel] NFC: Add test coverage for s144 and s142
144 = 16 * 9

For types where s16 is legal.

It may be interesting to break these down into 16-bit chunks rather than 32
or 64 bits.

Add tests for some opcodes, just so we get some test coverage drawing attention
to this.
2022-04-05 15:26:46 -07:00
Jessica Paquette 30922d62f4 [GlobalISel] NFC: Add some test coverage for s158
158 = 32 * 5 - 2

This is a wide type which may benefit from a different widening scheme than
types which are multiples of 64. For example, if 32-bit and 64-bit scalars
are both allowed, and a type is a multiple of 32, or is closer to a multiple
of 32, it *may* be better to

- Widen to the next multiple of 32
- Break up the type into 32-bit chunks

Anyway, we don't have any test coverage for this at all, so for the sake of
making sure we test it, let's add some test coverage.
2022-04-05 15:11:22 -07:00
Jessica Paquette 5830afa532 [GlobalISel] NFC: Regen some tests + improve test coverage for wide even types
It turns out we don't do an awesome job with weird types like s318 (and other
types near them, like s316).

We don't have any test coverage for those types, so let's add some so it's
easier to see the impact of legalization improvements on them when we make
changes.

Since the test generator was changed, it's easier to update relevant tests prior
to changing things rather than squinting at a bunch of "ah, CHECK is now
CHECK-NEXT" lines. So, let's just regenerate a bunch of tests while we're
here.

Unfortunately the "CHECK-NEXT" scheme doesn't work with legalize-cmp for some
reason, and the test will fail. So keep that one having CHECK lines.
2022-04-05 12:13:22 -07:00
Biplob Mishra 90853d8f37 Adding new tests to demonstrate code patterns with multiple or/and which can be combined with a single mask 2022-04-05 14:17:02 +01:00
Biplob Mishra edb4520205 The rev16 instruction is generated for a half-word byte swap on a 32-bit input, expressed as bswap+rotr. This is not the case for a 64-bit input.
This patch generates the rev16 instruction in the AArch64 backend for a half-word byte swap on a 64-bit input.
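
For reference, a half-word byte swap exchanges the two bytes of every 16-bit lane. The Python model below covers only the semantics. It does not reproduce the DAG pattern the patch matches, and the function names are illustrative. It shows why bswap+rotr covers the 32-bit case:

```python
def bswap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF


def rev16(x: int, width: int) -> int:
    """Swap the two bytes inside every 16-bit half-word of a width-bit value."""
    low_bytes = int("00ff" * (width // 16), 16)  # 0x00ff00ff...
    return ((x >> 8) & low_bytes) | ((x << 8) & (low_bytes << 8))


# On 32 bits, the half-word byte swap is exactly bswap followed by a rotate by 16.
assert rev16(0x11223344, 32) == rotr32(bswap32(0x11223344), 16) == 0x22114433
# The 64-bit form applies the same swap to all four half-words.
assert rev16(0x1122334455667788, 64) == 0x2211443366558877
```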

Differential Revision: https://reviews.llvm.org/D122643
2022-04-05 13:43:11 +01:00
Biplob Mishra f2b4b2ebe7 Reverting changes to correct the commit message 2022-04-05 13:38:14 +01:00
Biplob Mishra afca54f0cf [ARM][AArch64] Optimize pattern for converting a half word byte swap in a 64-bit input to a rev16 instruction. Differential Revision: https://reviews.llvm.org/D122643 2022-04-05 12:23:09 +01:00
Muhammad Omair Javaid 0320115c16 Revert "[CodeGen] Async unwind - add a pass to fix CFI information"
This reverts commit 980c3e6dd2.

This commit had failing tests with clang crashing across various
AArch64/Linux buildbots.

https://lab.llvm.org/buildbot/#/builders/179/builds/3346

Differential Revision: https://reviews.llvm.org/D114545
2022-04-05 13:12:30 +05:00
David Green 3b9833597e [AArch64] Alter mull buildvectors(ext(..)) combine to work on shuffles
D120018 altered this combine to work on buildvectors as opposed to
shuffle dup's. This works well for dups and other things that are
expanded into buildvectors. Some shuffles are legal though, and stay as
vector_shuffle through lowering. This expands the transform to also
handle shuffles, so that we can turn mul(shuffle(sext into
mul(sext(shuffle and more readily make smull/umull instructions. This
can come up from the SLP vectorizer adding shuffles that are costed from
extends.
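
The transform is sound because sign/zero extension acts on each lane independently, so it commutes with a shuffle. Moving the extend next to the multiply is what lets it match smull/umull. In the small Python model below, the lane values and the mask are arbitrary examples:

```python
def sext(lanes, from_bits):
    """Sign-extend each lane from from_bits to a Python int."""
    sign = 1 << (from_bits - 1)
    return [((v & ((1 << from_bits) - 1)) ^ sign) - sign for v in lanes]


def shuffle(lanes, mask):
    """Single-source vector_shuffle: pick lanes by index."""
    return [lanes[i] for i in mask]


a = [0x7F, 0x80, 0xFF, 0x01]  # i8 lanes
b = [0x10, 0xF0, 0x02, 0x81]
mask = [1, 1, 3, 0]

# mul(shuffle(sext(a)), sext(b)) == mul(sext(shuffle(a)), sext(b)):
# the extension is lane-wise, so it commutes with any lane permutation/duplication.
lhs = [x * y for x, y in zip(shuffle(sext(a, 8), mask), sext(b, 8))]
rhs = [x * y for x, y in zip(sext(shuffle(a, mask), 8), sext(b, 8))]
assert lhs == rhs
```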

Differential Revision: https://reviews.llvm.org/D123012
2022-04-04 23:07:47 +01:00
David Green a70480dd13 [AArch64] Add some tests for mul(shuffle(ext. NFC 2022-04-04 22:54:55 +01:00
Momchil Velikov 980c3e6dd2 [CodeGen] Async unwind - add a pass to fix CFI information
This pass inserts the necessary CFI instructions to compensate for the
inconsistency of the call-frame information caused by the linear (non-CFG
aware) nature of the unwind tables.

Unlike the `CFIInstrInserter` pass, this one almost always emits only
`.cfi_remember_state`/`.cfi_restore_state`, which results in smaller
unwind tables and also transparently handles custom unwind info
extensions like CFA offset adjustment and save locations of SVE
registers.

This pass takes advantage of the constraints that LLVM imposes on the
placement of save/restore points (cf. `ShrinkWrap.cpp`):

  * there is a single basic block, containing the function prologue

  * possibly multiple epilogue blocks, where each epilogue block is
    complete and self-contained, i.e. CSR restore instructions (and the
    corresponding CFI instructions) are not split across two or more
    blocks.

  * prologue and epilogue blocks are outside of any loops

Thus, during execution, at the beginning and at the end of each basic
block the function can be in one of two states:

  - "has a call frame", if the function has executed the prologue, and
    has not executed any epilogue

  - "does not have a call frame", if the function has not executed the
    prologue, or has executed an epilogue

These properties can be computed for each basic block by a single RPO
traversal.

In order to accommodate backends which do not generate unwind info in
epilogues we compute an additional property "strong no call frame on
entry" which is set for the entry point of the function and for every
block reachable from the entry along a path that does not execute the
prologue. If this property holds, it takes precedence over the "has a
call frame" property.

From the point of view of the unwind tables, the "has/does not have
call frame" state at the beginning of each block is determined by the
state at the end of the previous block, in layout order.

Where these states differ, we insert compensating CFI instructions,
which come in two flavours:

- CFI instructions, which reset the unwind table state to the
    initial one.  This is done by a target-specific hook and is
    expected to be trivial to implement, for example it could be:
```
     .cfi_def_cfa <sp>, 0
     .cfi_same_value <rN>
     .cfi_same_value <rN-1>
     ...
```
where `<rN>` are the callee-saved registers.

- CFI instructions, which reset the unwind table state to the one
    created by the function prologue. These are the sequence:
```
       .cfi_restore_state
       .cfi_remember_state
```
In this case we also insert a `.cfi_remember_state` after the
last CFI instruction in the function prologue.
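
The two-pass scheme described above can be modelled in a few lines of Python. This is a simplified sketch under the stated assumptions (a single prologue block, self-contained epilogues, prologue/epilogue outside loops), not the actual LLVM pass. `Block`, `fixup_cfi` and the placeholder string standing in for the target hook are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical model of a machine basic block."""
    name: str
    has_prologue: bool = False
    has_epilogue: bool = False
    succs: list = field(default_factory=list)  # successor block names


def reverse_post_order(blocks, entry):
    seen, post = set(), []

    def dfs(name):
        seen.add(name)
        for succ in blocks[name].succs:
            if succ not in seen:
                dfs(succ)
        post.append(name)

    dfs(entry)
    return post[::-1]


def fixup_cfi(layout):
    """layout: blocks in layout order, entry first. Returns (inserts, remember):
    inserts maps a block name to the CFI placed at its start; remember says
    whether a .cfi_remember_state is needed after the prologue's last CFI."""
    blocks = {b.name: b for b in layout}
    entry = layout[0].name
    frame_on_entry = {entry: False}
    strong_no_frame = {entry: True}
    frame_on_exit = {}

    # Single RPO walk: propagate the "has a call frame" state along CFG edges.
    for name in reverse_post_order(blocks, entry):
        b = blocks[name]
        frame_on_exit[name] = b.has_prologue or (
            frame_on_entry[name] and not b.has_epilogue)
        for succ in b.succs:
            frame_on_entry.setdefault(succ, frame_on_exit[name])
            strong_no_frame[succ] = strong_no_frame.get(succ, False) or (
                strong_no_frame[name] and not b.has_prologue)

    # Layout walk: the unwind tables see the state left by the previous block.
    inserts, remember = {}, False
    has_frame = frame_on_exit[entry]
    for b in layout[1:]:
        if b.name not in frame_on_entry:  # unreachable: leave it alone
            continue
        wants_frame = frame_on_entry[b.name] and not strong_no_frame[b.name]
        if wants_frame != has_frame:
            if wants_frame:
                inserts[b.name] = [".cfi_restore_state", ".cfi_remember_state"]
                remember = True
            else:
                inserts[b.name] = ["<target hook: reset to initial state>"]
        has_frame = frame_on_exit[b.name]
    return inserts, remember


# A shrink-wrapped function whose early-return block is laid out between
# the prologue block and the epilogue block.
layout = [
    Block("entry", succs=["prolog", "ret_early"]),
    Block("prolog", has_prologue=True, succs=["epilog"]),
    Block("ret_early"),
    Block("epilog", has_epilogue=True),
]
inserts, remember = fixup_cfi(layout)
assert inserts == {
    "ret_early": ["<target hook: reset to initial state>"],
    "epilog": [".cfi_restore_state", ".cfi_remember_state"],
}
assert remember
```

With the epilogue block placed directly after the prologue block, the layout order already matches the CFG state and no compensation is emitted.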

Reviewed By: MaskRay, danielkiss, chill

Differential Revision: https://reviews.llvm.org/D114545
2022-04-04 14:38:22 +01:00
Dávid Bolvanský fb65aaf0be [NFCI] Fixed missing colon in CHECK directives - part 2 2022-04-03 14:42:59 +02:00
Sanjay Patel ec0b332cd8 [AArch64] add tests for funnel+or == 0; NFC
These are copied from x86 ( 1074bdfb52 ) to
provide more coverage for a potential generic combine.
2022-04-01 13:39:25 -04:00
Nicholas Guy 7d676714fb [AArch64] Set MaxBytesForLoopAlignment for more targets
Differential Revision: https://reviews.llvm.org/D122566
2022-03-31 11:37:11 +01:00
Sanjay Patel e18cc5277f [SDAG] try to canonicalize logical shift after bswap
When shifting by a byte-multiple:
bswap (shl X, C) --> lshr (bswap X), C
bswap (lshr X, C) --> shl (bswap X), C
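
A Python check of both rewrites for byte-multiple shift amounts, together with a counterexample showing why the shift amount must be a multiple of 8. The helper names are illustrative:

```python
import random


def bswap(x: int, width: int) -> int:
    """Reverse the byte order of an unsigned width-bit value."""
    return int.from_bytes(x.to_bytes(width // 8, "little"), "big")


def shl(x: int, c: int, width: int) -> int:
    return (x << c) & ((1 << width) - 1)


def lshr(x: int, c: int) -> int:
    return x >> c


random.seed(1)
for width in (16, 32, 64):
    for c in range(0, width, 8):  # byte-multiple shift amounts only
        x = random.getrandbits(width)
        assert bswap(shl(x, c, width), width) == lshr(bswap(x, width), c)
        assert bswap(lshr(x, c), width) == shl(bswap(x, width), c, width)

# With a shift that is not a multiple of 8, the identity breaks:
assert bswap(shl(1, 4, 32), 32) != lshr(bswap(1, 32), 4)
```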

This is the backend version of D122010 and an alternative
suggested in D120648.
There's an extra check, not present in the rough draft, to make sure
the shift amount is valid.

I'm not sure if there is a larger motivating case for RISCV (bug report?),
but the ARM diffs show a benefit from having a late version of the
transform (because we do not combine the loads in IR).

Differential Revision: https://reviews.llvm.org/D122655
2022-03-30 09:29:32 -04:00
Eli Friedman a8ebd85e46 [MC] Make MCAsmInfo::isAcceptableChar reflect MCAsmInfo::doesAllowAtInName
On targets which don't allow "@" in unquoted identifiers, make sure we
don't emit them; otherwise, we can't parse our own output.

Differential Revision: https://reviews.llvm.org/D122516
2022-03-29 14:01:32 -07:00
David Green 60f57b3658 [AArch64] Ensure fixed point fptoi_sat has correct saturation width
D113200 introduced an error where it was converting FP_TO_SI_SAT with
multiply to a fixed point floating point convert. The saturation
bitwidth needs to be equal to the floating point width, or else the
routine would truncate the result as opposed to saturating it.
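
To see why the saturation width matters, compare two approaches: saturating directly to the narrow type, and saturating at a wider width and then truncating. This Python sketch ignores the fixed-point scaling and uses 32-bit saturation as a stand-in for the floating-point width. The helper names are illustrative:

```python
import math


def fptosi_sat(value: float, bits: int) -> int:
    """Saturating float -> signed int conversion (llvm.fptosi.sat semantics):
    round toward zero, clamp to the signed range, NaN -> 0."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if math.isnan(value):
        return 0
    if value >= hi:
        return hi
    if value <= lo:
        return lo
    return math.trunc(value)


def wrap_signed(v: int, bits: int) -> int:
    """Truncate v to its low `bits` bits and reinterpret them as signed."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


# Saturating straight to 16 bits clamps; saturating at 32 bits and then
# truncating to 16 bits wraps instead.
assert fptosi_sat(1e6, 16) == 32767
assert wrap_signed(fptosi_sat(1e6, 32), 16) == 16960
```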

Fixes #54601
2022-03-29 10:12:44 +01:00
zhongyunde 2b3becb41d [AArch64][GlobalISel] Add new MOVI pattern for fp constants
GlobalISel is used at -O0, so add a MOVI pattern for it,
similar to what GCC does (https://godbolt.org/z/8j6fzG3h6).

Fix https://github.com/llvm/llvm-project/issues/53651

Reviewed By: dmgreen, paquette

Differential Revision: https://reviews.llvm.org/D122559
2022-03-29 10:57:22 +08:00
zhongyunde c3fe025bd4 [AArch64][SelectionDAG] Refactor to support more scalable vector extending loads
Following the discussion in D120953, we should first exclude all scalable vector
extending loads and then selectively enable those which we directly support.

This patch refactors the code accordingly (truncating stores are not touched), and
more scalable vector types will try to reduce the number of masked loads in favour
of more unpklo/hi instructions.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122281
2022-03-27 21:18:01 +08:00
David Green 693d3b7e76 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.

Differential Revision: https://reviews.llvm.org/D121137
2022-03-26 21:10:43 +00:00
zhongyunde 758be63ac6 [test][AArch64] Add a test case for D121180 NFC
Currently, the last active true vector combine is only performed when
we're extracting from a flag-setting operation. However,
extracting the last active element produces LASTB + WHILELS,
and WHILELS is itself a flag-setting operation, so
precommit this case to test the potential further optimization.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122453
2022-03-26 19:12:16 +08:00
David Green 3d8d60e147 Revert "[AArch64] Lower 3 and 4 sources buildvectors to TBL"
This reverts commit ec93b28909 as problems
with it have been reported.
2022-03-25 10:03:10 +00:00
Momchil Velikov 50a97aacac [AArch64] Async unwind - function prologues
Re-commit of 32e8b550e5

This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says
the CFA is in `sp`, even though `sp` has already been offset by 16 bytes.

A correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-03-24 16:16:44 +00:00
David Green ec93b28909 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.

Differential Revision: https://reviews.llvm.org/D121137
2022-03-24 10:02:33 +00:00
David Green 311bdbc9b7 [AArch64] Add tests showing inefficient TBL3/4 generation. NFC 2022-03-23 16:43:23 +00:00
David Spickett c3b98194df Reland "[llvm][AArch64] Insert "bti j" after call to setjmp"
This reverts commit edb7ba714a.

This changes BLR_BTI to take variable_ops, meaning that we can accept
a register or a label. The pattern still expects one argument so we'll
never get more than one. Then later we can check the type of the operand
to choose BL or BLR to emit.

(this is what BLR_RVMARKER does but I missed this detail of it first time around)

Also require NoSLSBLRMitigation which I missed in the first version.
2022-03-23 11:43:43 +00:00
David Spickett edb7ba714a Revert "[llvm][AArch64] Insert "bti j" after call to setjmp"
This reverts commit eb5ecbbcbb
due to failures on buildbots with expensive checks enabled.
2022-03-23 10:43:20 +00:00
David Spickett eb5ecbbcbb [llvm][AArch64] Insert "bti j" after call to setjmp
Some implementations of setjmp will end with a br instead of a ret.
This means that the next instruction after a call to setjmp must be
a "bti j" (j for jump) to make this work when branch target identification
is enabled.

The BTI extension was added in armv8.5-a but the bti instruction is in the
hint space. This means we can emit it for any architecture version as long
as branch target enforcement flags are passed.

The starting point for the hint number is 32; call adds 2 and jump adds 4.
Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see
at the start of functions).
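
The hint arithmetic written out. The function is illustrative, and the "bti jc" value simply follows from adding both offsets:

```python
def bti_hint_number(call: bool, jump: bool) -> int:
    """HINT immediate for a BTI variant: base 32, +2 for call, +4 for jump."""
    return 32 + (2 if call else 0) + (4 if jump else 0)


assert bti_hint_number(call=False, jump=True) == 36  # bti j
assert bti_hint_number(call=True, jump=False) == 34  # bti c
assert bti_hint_number(call=True, jump=True) == 38   # bti jc
```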

The existing Arm command line option -mno-bti-at-return-twice has been
applied to AArch64 as well.

Support is added to SelectionDAG ISel and GlobalISel. FastISel will
defer to SelectionDAG.

Based on the change done for M profile Arm in https://reviews.llvm.org/D112427

Fixes #48888

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D121707
2022-03-23 09:51:02 +00:00
zhongyunde 828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
This tries to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for extensions
from legal types.

Test cases for both normal and masked loads are added to guard against compiler crashes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
chenglin.bi dd3b90e4d7 [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
When N > 12, (2^N - 1) is not a legal add immediate (isLegalAddImmediate returns false),
and if a SetCC input uses this number, the DAG combiner generates one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] into [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison.
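
A Python sketch of both halves of the argument. The first part is an ADD-immediate encodability check, a simplified stand-in for isLegalAddImmediate that covers non-negative values only. The second is a randomized check that the srl and and forms of the comparison agree:

```python
import random

WIDTH = 64
MASK = (1 << WIDTH) - 1


def is_add_imm_encodable(imm: int) -> bool:
    """ADD/SUB (immediate) take a 12-bit unsigned value, optionally LSL #12.
    Sketch for non-negative immediates only."""
    return (imm >> 12) == 0 or ((imm & 0xFFF) == 0 and (imm >> 24) == 0)


def ne_zero_via_srl(x: int, imm: int) -> bool:
    # setcc (srl x, imm), 0, ne
    return (x >> imm) != 0


def ne_zero_via_and(x: int, imm: int) -> bool:
    # setcc (and x, (-1 << imm)), 0, ne: the mask keeps bits [imm, 64)
    return (x & (-1 << imm) & MASK) != 0


assert is_add_imm_encodable((1 << 12) - 1)
assert not is_add_imm_encodable((1 << 13) - 1)

random.seed(2)
for _ in range(1000):
    imm = random.randrange(1, WIDTH)
    x = random.getrandbits(random.randrange(1, WIDTH + 1))
    assert ne_zero_via_srl(x, imm) == ne_zero_via_and(x, imm)
```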
Fix https://github.com/llvm/llvm-project/issues/54283

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121449
2022-03-19 13:04:16 +00:00
Paul Walker f46fe36d59 [AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands.
When inverting the compare predicate trySwapVSelectOperands is
incorrectly using the type of the select's cond operand rather
than the type of cond's operands. This means we're treating all
inversions as if they're integer.

Differential Revision: https://reviews.llvm.org/D121968
2022-03-19 12:36:14 +00:00
David Green fe6057a293 [AArch64] Custom lower concat(v4i8 load, ...)
We already have custom lowering for v4i8 load, which loads as an f32,
converts to a vector and bitcasts and extends the result to a v4i16.
This adds some custom lowering of concat(v4i8 load, ...) to keep the
result as an f32 and create a buildvector of the resulting f32 loads.
This helps not create all the extends and bitcasts, which are often
difficult to fully clean up.

Differential Revision: https://reviews.llvm.org/D121400
2022-03-18 11:58:02 +00:00
David Green 0fa4aeb453 [AArch64] Add extra insert-subvector tests. NFC 2022-03-17 15:29:07 +00:00
David Green 0b6df40c52 [AArch64] Combine ISD::AND into AArch64ISD::ANDS
If we already have an AArch64ISD::ANDS node with identical operands, we
can merge any ISD::AND into it, reducing the instruction count by
calculating the value and the flags in a single operation. This code is
taken from the X86 backend, and could also handle AArch64ISD::ADDS and
AArch64ISD::SUBS, but I couldn't find any test cases where it came up.

Differential Revision: https://reviews.llvm.org/D118584
2022-03-17 09:44:11 +00:00
David Green 09a2b5b506 [AArch64] Regenerate and extend peephole-and-tst.ll tests. NFC 2022-03-16 09:44:20 +00:00
Matthias Gehre 09854f2af3 [SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers
Emit calls to __divei4 and friends for division/remainder of large integers.

This fixes https://github.com/llvm/llvm-project/issues/44994.

The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329

The compiler-rt part is in https://reviews.llvm.org/D120327

Differential Revision: https://reviews.llvm.org/D120329
2022-03-16 09:36:28 +00:00
Amara Emerson 8cbf18cb04 [GlobalISel] Fix store merging incorrectly merging volatile stores.
The existing volatile checks only handle aliasing hazards between stores,
but that isn't enough since by that point volatile stores may have already
been added to the current candidate group.
2022-03-14 13:48:51 -07:00
Florian Mayer 628c537b32 [MTE] Add test that stack tagging does not mess up stack coloring.
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D121433
2022-03-14 13:36:21 -07:00
Mircea Trofin 294eca35a0 [regalloc] Remove -consider-local-interval-cost
Discussed extensively on D98232. The functionality introduced in D35816
never worked correctly. In D98232, it was fixed, but, as it was
introducing a large compile-time regression, and the value of the
original patch was called into doubt, we disabled it by default
everywhere. A year later, it appears that caused no grief, so it seems
safe to remove the disabled code.

This should be accompanied by re-opening bug 26810.

Differential Revision: https://reviews.llvm.org/D121128
2022-03-14 10:49:16 -07:00
zhongyunde 3568333815 [AArch64] Perform last active true vector combine
Testing the bit of lane EC-1 can use the P register directly, e.g.:
Materialize : Idx = (add (mul vscale, NumEls), -1)
               i1 = extract_vector_elt t37, Constant:i64<Idx>
    ... into: "ptrue p, all" + PTEST

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121180
2022-03-15 01:25:03 +08:00
Arthur Eubanks 250620f76e [OpaquePtr][AArch64] Use elementtype on ldxr/stxr
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D120527
2022-03-14 10:09:59 -07:00
Sanjay Patel c2592c374e [SDAG] simplify bitwise logic with repeated operand
We do not have general reassociation here (and probably
do not need it), but I noticed these were missing in
patches/tests motivated by D111530, so we can at
least handle the simplest patterns.

The VE test diff looks correct, but we miss that
pattern in IR currently:
https://alive2.llvm.org/ce/z/u66_PM
2022-03-13 11:12:30 -04:00
Sanjay Patel 9f4caf55db [AArch64] add tests for bitwise logic reassociation; NFC
Chooses from a variety of scalar/vector/illegal types
because that should not inhibit any folds.
2022-03-13 11:12:30 -04:00
David Sherwood aeeb1199b4 [AArch64][SVE] Change the asserts in LowerToPredicatedOp to check for legal types
When building the LLVM test suite with SVE I discovered a crash
when compiling some Halide tests, which occurs because we try to
use SVE to lower 64-bit vector multiplies and there is no
vscale_range attribute on the function. In this case the min SVE
vector bits was 0, which caused an assert in LowerToPredicatedOp
to fire. I have amended the asserts in this function to check that the
fixed-width type is legal. If the fixed-width type is larger than NEON
and is legal then it must be because we've set the min SVE vector
bits to something > 128. Or if the min SVE bits is 0, then the only
legal types allowed are 128 bit types - for any other types the assert
will fire.

Tests added here:

  CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll

Differential Revision: https://reviews.llvm.org/D121297
2022-03-11 09:57:58 +00:00
Philippe Valembois 26cd258420 [AArch64] Use correct calling convention for each vararg
While checking if tail call optimization is possible, the calling
convention applied to fixed arguments is not the correct one.
For DarwinPCS, this implies that all arguments of a vararg function
go to the stack, although fixed ones can go in registers.

This prevents non-virtual thunks from being tail-call optimized although they are
marked as musttail.

Differential Revision: https://reviews.llvm.org/D120622
2022-03-10 15:07:25 -08:00
David Green 21a97a2ac1 [AArch64] TBL uses zero for out of range elements.
A TBL instruction will use zero for any out of range values. We can use
this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need
to materialise the zero.
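
A Python model of TBL's out-of-range behaviour on byte vectors. The lane values and index vectors are arbitrary, and the exact rewrite done in GenerateTBL is not reproduced. The example shows that a TBL2 reading zeros from a materialised zero register gives the same result as a TBL1 that uses out-of-range indices instead:

```python
def tbl(table_regs, indices):
    """Model of AArch64 TBL on byte vectors: table_regs holds 1-4 registers of
    16 bytes each; any index past the end of the table yields 0."""
    table = [byte for reg in table_regs for byte in reg]
    return [table[i] if i < len(table) else 0 for i in indices]


data = list(range(100, 116))  # one 16-byte register
zeros = [0] * 16              # a materialised zero register

# TBL2 that interleaves data with bytes read from the zero register...
idx_tbl2 = [i // 2 if i % 2 == 0 else 16 for i in range(16)]
# ...equals TBL1 where those lanes use an out-of-range index instead.
idx_tbl1 = [i // 2 if i % 2 == 0 else 0xFF for i in range(16)]
assert tbl([data, zeros], idx_tbl2) == tbl([data], idx_tbl1)
```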

Differential Revision: https://reviews.llvm.org/D121139
2022-03-10 14:45:13 +00:00
David Green 43591be2aa [AArch64] Extra tests for tbl with zero elements. NFC 2022-03-10 13:51:04 +00:00
Xiang1 Zhang c31014322c TLS loads optimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Saleem Abdulrasool c31f0a0050 AArch64: correct epilogue/prologue emission for swift async
The prologue and epilogue emission were unbalanced in light of different
strategies of async frame context emission.  Adjust the epilogue emission
to match the prologue emission.  This makes the elision work properly as
well as the deployment based.  Due to the fact that the epilogue always
was clearing a bit (which should not be set in the first place), the
client would not notice the behavioural issue unless the deployment
version was in effect.
2022-03-09 18:41:10 +00:00
Sanjay Patel 341623653d [SDAG] match rotate pattern with extra 'or' operation
This is another fold generalized from D111530.
We can find a common source for a rotate operation hidden inside an 'or':
https://alive2.llvm.org/ce/z/9pV8hn
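
The principle can be checked in Python on one representative shape. The exact patterns matched are in the alive2 proof above. This sketch only shows that an unrelated operand `z` sitting inside the `or` chain does not change the rotate hidden there:

```python
import random

WIDTH = 32
MASK = (1 << WIDTH) - 1


def rotl(x: int, c: int) -> int:
    c %= WIDTH
    return ((x << c) | (x >> (WIDTH - c))) & MASK


random.seed(3)
for _ in range(1000):
    x, z = random.getrandbits(WIDTH), random.getrandbits(WIDTH)
    c = random.randrange(1, WIDTH)
    # or (or (shl x, c), z), (srl x, WIDTH - c)
    hidden = (((x << c) & MASK) | z) | (x >> (WIDTH - c))
    assert hidden == rotl(x, c) | z
```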

Deciding when this is profitable vs. a funnel-shift is tricky, but this
does not show any regressions: if a target has a rotate but it does not
have a funnel-shift, then try to form the rotate here. That is why we
don't have x86 test diffs for the scalar tests that are duplicated from
AArch64 ( 74a65e3834 ) - shld/shrd are available. That also makes it
difficult to show vector diffs - the only case where I found a diff was
on x86 AVX512 or XOP with i64 elements.

There's an additional check for a legal type to avoid a problem seen
with x86-32 where we form a 64-bit rotate but then it gets split
inefficiently. We might avoid that by adding more rotate folds, but
I didn't check to see what is missing on that path.

This gets most of the motivating patterns for AArch64 / ARM that are in
D111530.

We still need a couple of enhancements to setcc pattern matching with
rotate/funnel-shift to get the rest.

Differential Revision: https://reviews.llvm.org/D120933
2022-03-09 13:19:00 -05:00
Florian Hahn 3836003e87 [AArch64] Add test for D120481 with multiple uses. 2022-03-08 11:11:03 +00:00
zhongyunde c22c8b151b [AArch64] Perform first active true vector combine
Materialize : i1 = extract_vector_elt t37, Constant:i64<0>
   ... into: "ptrue p, all" + PTEST
Testing the bit of lane 0 can use the P register directly, and the instruction "ptrue all"
is loop invariant, which will be beneficial to SVE after hoisting it out of the loop.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120891
2022-03-08 01:10:21 +08:00
David Green d9633d1490 [AArch64] Turn truncating buildvectors into truncates
When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is
split several times, creating an illegal v4i8 concat that gets expanded
into a BUILD_VECTOR. After some combining and other legalisation, it
ends up as a buildvector that extracts from 4 vectors, looking like
BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is
really a v16i32->v16i8 truncate in disguise.

This adds a ReconstructTruncateFromBuildVector method to detect the
pattern, converting it back into the legal "concat(trunc(concat(trunc(a),
trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted
nodes could also be v4i16, in which case the truncates are not needed.
All those truncates and concats then become uzip1's, which is much
better than expanding by moving vector lanes around.
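
The equivalence is easy to check lane by lane in Python. Truncating in two steps (i32 to i16, then i16 to i8) gives the same low bytes as truncating directly, so the buildvector and the concat/trunc tree agree. Lists stand in for vectors, and the lane values are arbitrary:

```python
def trunc(lanes, bits):
    """Lane-wise truncation to the low `bits` bits."""
    return [v & ((1 << bits) - 1) for v in lanes]


# Four v4i32 sources a..d (arbitrary example values).
a = [0x11223344, 0xDEADBEEF, 0x00000001, 0xFFFFFFFF]
b = [0x01020304, 0x05060708, 0x090A0B0C, 0x0D0E0F10]
c = [0xAAAAAAAA, 0x55555555, 0x12345678, 0x87654321]
d = [0x00000100, 0x000000FF, 0x7FFFFFFF, 0x80000000]

# BUILDVECTOR(a0..a3, b0..b3, c0..c3, d0..d3), each lane truncated to i8.
build_vector = trunc(a, 8) + trunc(b, 8) + trunc(c, 8) + trunc(d, 8)

# concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d))))
tree = trunc(trunc(a, 16) + trunc(b, 16), 8) + trunc(trunc(c, 16) + trunc(d, 16), 8)

assert build_vector == tree
```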

Differential Revision: https://reviews.llvm.org/D119469
2022-03-07 09:42:54 +00:00
David Green 4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
David Green 84ccd015e7 [AArch64] Some tests to show reconstructing truncates. NFC 2022-03-05 18:35:43 +00:00
Karl Meakin 1d8093fe1e [AArch64] fix i128-math.ll 2022-03-05 17:51:58 +00:00
Karl Meakin f3e254b3f3 [AArch64] Add test for i128 overflow/saturation ops (NFC)
This test exposes opportunities for future optimization work

Differential Revision: https://reviews.llvm.org/D121013
2022-03-05 17:25:04 +00:00
Sanjay Patel f4b53972ce [SDAG] fold bitwise logic with shifted operands
This extends acb96ffd14 to 'and' and 'xor' opcodes.

Copying from that message:

LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR
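
A randomized Python check of the fold for and/or/xor combined with shl and lshr. Arithmetic shifts are not modelled here. The check works because a logical shift distributes over any bitwise logic op:

```python
import operator
import random

WIDTH = 32
MASK = (1 << WIDTH) - 1


def shl(v: int, s: int) -> int:
    return (v << s) & MASK


def lshr(v: int, s: int) -> int:
    return v >> s


random.seed(4)
for logic in (operator.and_, operator.or_, operator.xor):
    for sh in (shl, lshr):
        for _ in range(200):
            x0, x1, z = (random.getrandbits(WIDTH) for _ in range(3))
            y = random.randrange(WIDTH)
            before = logic(logic(sh(x0, y), z), sh(x1, y))  # LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y)
            after = logic(sh(logic(x0, x1), y), z)          # LOGIC (SH (LOGIC X0, X1), Y), Z
            assert before == after
```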

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().
2022-03-05 11:14:45 -05:00
Sanjay Patel 90c2330c15 [AArch64][x86] add tests for bitwise logic + shifts; NFC
Copy tests from ecf606cb43 and replace 'or' with 'xor' / 'and'.
This provides coverage for an enhancement of D120516 / acb96ffd14
2022-03-05 11:14:45 -05:00
Hans Wennborg 85c53c7092 Revert "[AArch64] Async unwind - function prologues"
It caused builds to assert with:

  (StackSize == 0 && "We already have the CFA offset!"),
  function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624.

when targeting iOS. See comment on the code review for reproducer.

> This patch rearranges emission of CFI instructions, so the resulting
> DWARF and `.eh_frame` information is precise at every instruction.
>
> The current state is that the unwind info is emitted only after the
> function prologue. This is fine for synchronous (e.g. C++) exceptions,
> but the information is generally incorrect when the program counter is
> at an instruction in the prologue or the epilogue, for example:
>
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> after the `stp` is executed the (initial) rule for the CFA still says
> the CFA is in the `sp`, even though it's already offset by 16 bytes
>
> A correct unwind info could look like:
> ```
> stp     x29, x30, [sp, #-16]!           // 16-byte Folded Spill
> .cfi_def_cfa_offset 16
> mov     x29, sp
> .cfi_def_cfa w29, 16
> ...
> ```
>
> Having this information precise up to an instruction is useful for
> sampling profilers that would like to get a stack backtrace. The end
> goal (towards this patch is just a step) is to have fully working
> `-fasynchronous-unwind-tables`.
>
> Reviewed By: danielkiss, MaskRay
>
> Differential Revision: https://reviews.llvm.org/D111411

This reverts commit 32e8b550e5.
2022-03-04 17:36:26 +01:00
zhongyunde 7a605ab7bf [AArch64] Use simd mov to materialize big fp constants
mov w8, #1325400064 + fmov s0, w8 ==> movi v0.2s, 0x4f, lsl 24
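
The constant above is 2^31 as a single-precision float. The Python sketch below shows why it qualifies: its bit pattern is a single non-zero byte in the top position, which fits MOVI's 32-bit shifted-immediate form. Only the LSL variant is modelled. MOVI has other immediate forms, and which ones the patches select is not shown. The helper names are illustrative:

```python
import struct


def f32_bits(value: float) -> int:
    """Raw IEEE-754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def movi_lsl_imm(bits: int):
    """Return (imm8, shift) if the 32-bit pattern is one byte at bit 0, 8, 16
    or 24 (MOVI's 32-bit shifted-immediate form), else None."""
    for shift in (0, 8, 16, 24):
        if (bits & ~(0xFF << shift) & 0xFFFFFFFF) == 0:
            return (bits >> shift) & 0xFF, shift
    return None


bits = f32_bits(2.0 ** 31)
assert bits == 1325400064 == 0x4F000000
assert movi_lsl_imm(bits) == (0x4F, 24)     # movi v0.2s, #0x4f, lsl #24
assert movi_lsl_imm(f32_bits(1.5)) is None  # 0x3FC00000 spans two bytes
```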
Fix https://github.com/llvm/llvm-project/issues/53651

Reviewed By: dmgreen, fhahn

Differential Revision: https://reviews.llvm.org/D120452
2022-03-04 11:34:20 -05:00
Karl Meakin 43a0016f3d Extend `performANDCSELCombine` to `performANDORCSELCombine`
Differential Revision: https://reviews.llvm.org/D120422
2022-03-04 15:09:59 +00:00
David Green e348b09bb5 [AArch64] Turn UZP1 with undef operand into truncate
This turns uzp1(x, undef) into concat(truncate(x), undef), as the truncate
is simpler and can often be optimized away, and it helps some of the
insert-subvector tests optimize more cleanly.
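
Suppose the 16-bit lanes being unzipped are really the halves of 32-bit lanes. Then the even-numbered elements are exactly the low halves, which is a truncate. The Python sketch below uses lists for vectors and None for undef. The little-endian bitcast view is an assumption made for illustration:

```python
def uzp1(vn, vm):
    """UZP1: the even-numbered elements of the concatenation vn ++ vm."""
    return (vn + vm)[0::2]


def as_i16_lanes(lanes32):
    """Reinterpret 32-bit lanes as 16-bit lanes (little-endian lane order)."""
    out = []
    for v in lanes32:
        out += [v & 0xFFFF, v >> 16]
    return out


x = [0x11112222, 0x33334444, 0x55556666, 0x77778888]  # v4i32
undef = [None] * 8                                     # stand-in for undef

result = uzp1(as_i16_lanes(x), undef)
# The low half is truncate(x) to i16; the high half comes only from undef.
assert result[:4] == [v & 0xFFFF for v in x]
assert result[4:] == [None] * 4
```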

Differential Revision: https://reviews.llvm.org/D120879
2022-03-04 11:12:26 +00:00
Sander de Smalen 7c65d2288b [AArch64] Improve access to fixed-width object when stack has SVE.
When the stack has SVE objects, fixed-width objects are often better accessed
from the SP, instead of the FP, because part/all of the fixed-width offset
can be folded into the (non-scalable) addressing mode, where otherwise an
ADDVL would be required.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D120738
2022-03-04 09:33:59 +00:00
Sander de Smalen d363bddac5 [AArch64] NFC: Add test for access to fixed-width stack object when stack has SVE.
In this case, the object would benefit from being accessed from the SP, as that
would avoid the redundant ADDVL, since most of the offset can currently be
folded into the addressing mode.
2022-03-04 09:33:59 +00:00
David Green 04661a4d8e [AArch64] Additional insert-subvector codegen tests. NFC 2022-03-04 09:04:09 +00:00
Sanjay Patel 74a65e3834 [AArch64][x86] add tests for rotate/funnel combines; NFC 2022-03-03 15:22:35 -05:00
Florian Hahn 0f261256e0 [AArch64] Use first op of FADDPv* instead of implicit def.
This patch updates the FADDPv* patterns that only use the lower half of
the result register. For those patterns, the second operand does not
matter, because the result lanes it produces are never used.

Instead of introducing new implicit defs for those operands, just use
the first operand. The problem with using new implicit defs is that
register allocation can introduce unnecessary dependencies by using a
different register than the first operand.

For motivating cases, see the changes in the fadd_reduction_*_in_loop
cases. Without this change, the first faddp in the loop has an
unnecessary additional dependency through v0, which is also used for
a cross-iteration reduction.

This can noticeably impact performance: for slightly bigger loops,
this change can improve performance by 15%.

Reviewed By: sdesmalen, t.p.northover

Differential Revision: https://reviews.llvm.org/D120706
2022-03-03 13:32:09 +00:00
Cullen Rhodes e4fa8291a2 [AArch64] Allow copying of SVE registers in Streaming SVE
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118562
2022-03-03 09:51:14 +00:00
Cullen Rhodes 616586794b [AArch64] Add legal types for Streaming SVE
The compiler currently crashes for scalable types when compiling with
+sme, e.g.

  define <vscale x 4 x i32> @foo(<vscale x 4 x i32> %a) {
    ret <vscale x 4 x i32> %a
  }

since it doesn't know how to legalize the types. SME implies a subset of
SVE (+streaming-sve), so the hasSVE checks in the backend need extending
to consider types/operations that are legal in Streaming SVE.

This is the first patch, adding legal types <-> register classes. Before
making the change, +sve(2) was temporarily replaced with +sme in all the
intrinsics tests to see what failed, and the same was done again after
making the change. For all the tests that passed after adding the legal
types, another RUN line has been added for +streaming-sve. More patches
to follow.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118561
2022-03-03 09:51:14 +00:00
Momchil Velikov 63c9aca12a Revert "[AArch64] Async unwind - function epilogues"
This reverts commit 74319d6794.

It causes test failures that look like an infinite loop in asan/hwasan
unwinding.
2022-03-02 15:01:57 +00:00
Momchil Velikov 74319d6794 [AArch64] Async unwind - function epilogues
As the counterpart of https://reviews.llvm.org/D111411, this change makes
the unwind information instruction-precise in function epilogues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112330
2022-03-02 13:15:11 +00:00
Xiang1 Zhang 65588a0776 Revert "TLS loads optimization (hoist)"
Revert to allow for more review.

This reverts commit 30e612ebdf.
2022-03-02 14:10:11 +08:00
Xiang1 Zhang 30e612ebdf TLS loads optimization (hoist)
Reviewed By: Wang Pheobe, Topper Craig

Differential Revision: https://reviews.llvm.org/D120000
2022-03-02 10:37:24 +08:00
Cameron McInally 70629d570b [SVE] Update patterns to commute FMLS multiplication operands
Use PatFrags to commute the multiplication operands of an AArch64ISD::FMA_PRED
node, allowing unpredicated FMLS instructions to match.

Reviewed by: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120570
2022-03-01 12:53:14 -08:00
Florian Hahn bb746716c2
[AArch64] Add tests with unnecessary dependency with faddp lowering.
The added tests highlight an unnecessary cross-iteration dependency when
lowering reductions to faddp. This dependency can negatively impact
performance.
2022-03-01 10:30:34 +00:00
Florian Hahn 70c398c198
[AArch64] Use common CHECK prefix for test, reducing duplicated checks.
Use the common CHECK prefix for the RUN lines with and without fullfp16,
so no duplicated checks are generated for tests not using fp16.
2022-03-01 10:30:29 +00:00
Sander de Smalen eac2638ec1 [AArch64][SVE] Fold away SETCC if original input was predicate vector.
This adds the following two folds:

Fold 1:
   setcc_merge_zero(
       all_active, extend(nxvNi1 ...), != splat(0))
  -> nxvNi1 ...

Fold 2:
   setcc_merge_zero(
       pred, extend(nxvNi1 ...), != splat(0))
  -> nxvNi1 and(pred, ...)
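
The following is a lane-wise model that checks both folds exhaustively for 4-lane predicates. It assumes that `extend` is either a sign or a zero extension, so false maps to 0 and true maps to a non-zero value. It also assumes that `setcc_merge_zero` sets inactive lanes to false. The helper names are illustrative and are not backend functions.

```python
from itertools import product


def extend(p, signed=True):
    """Widen each i1 lane: true -> -1 (sext) or 1 (zext), false -> 0."""
    return [(-1 if signed else 1) if b else 0 for b in p]


def setcc_merge_zero_ne0(pred, vals):
    """Compare each active lane with != 0; inactive lanes become false."""
    return [g and v != 0 for g, v in zip(pred, vals)]


def check_folds(n=4):
    for p in product([False, True], repeat=n):
        p = list(p)
        for signed in (False, True):
            ext = extend(p, signed)
            # Fold 1: all-active predicate gives back the original predicate.
            if setcc_merge_zero_ne0([True] * n, ext) != p:
                return False
            # Fold 2: an arbitrary predicate gives and(pred, p).
            for pred in product([False, True], repeat=n):
                expected = [g and b for g, b in zip(pred, p)]
                if setcc_merge_zero_ne0(list(pred), ext) != expected:
                    return False
    return True


print(check_folds())
```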

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D119334
2022-02-28 14:12:43 +00:00
Momchil Velikov 32e8b550e5 [AArch64] Async unwind - function prologues
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.

The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:

```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
mov	x29, sp
.cfi_def_cfa w29, 16
...
```

After the `stp` is executed, the (initial) rule for the CFA still says
the CFA is `sp`, even though `sp` has already been decremented by 16 bytes.

Correct unwind info could look like:
```
stp	x29, x30, [sp, #-16]!           // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov	x29, sp
.cfi_def_cfa w29, 16
...
```
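
The following works through the CFA arithmetic for the prologue above. It assumes an arbitrary example value of 0x1000 for `sp` on entry. On AArch64 the CFA is the value `sp` had on entry, and the initial rule is `sp + 0`. Once the pre-indexed `stp` has run, only the rule `sp + 16` (and later `x29 + 16`) still evaluates to that value. The `cfa` helper is illustrative only.

```python
ENTRY_SP = 0x1000  # arbitrary example value of sp on function entry


def cfa(rule_reg, rule_offset, regs):
    """Evaluate a DWARF-style 'CFA = reg + offset' rule."""
    return regs[rule_reg] + rule_offset


regs = {"sp": ENTRY_SP, "x29": 0}
true_cfa = ENTRY_SP  # initial AArch64 rule: CFA = sp + 0

# stp x29, x30, [sp, #-16]!   (pre-index: sp is decremented first)
regs["sp"] -= 16
stale = cfa("sp", 0, regs)   # initial rule, no new CFI emitted yet
fixed = cfa("sp", 16, regs)  # after .cfi_def_cfa_offset 16

# mov x29, sp ; .cfi_def_cfa w29, 16
regs["x29"] = regs["sp"]
framed = cfa("x29", 16, regs)

print(hex(true_cfa), hex(stale), hex(fixed), hex(framed))
```

Without the `.cfi_def_cfa_offset 16`, an unwinder sampling at the `mov` would compute a CFA that is 16 bytes too low.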

Having this information precise at every instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards which this patch is just one step) is to have fully working
`-fasynchronous-unwind-tables`.

Reviewed By: danielkiss, MaskRay

Differential Revision: https://reviews.llvm.org/D111411
2022-02-28 13:37:57 +00:00
Sander de Smalen 201e3686ab [AArch64][SVE] Handle more cases in findMoreOptimalIndexType.
This patch addresses @paulwalker-arm's comment on D117900 to
only update/write the by-ref operands iff the function returns
true. It also handles a few more cases where a series of added
offsets can be folded into the base pointer, rather than just looking
at a single offset.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119728
2022-02-28 12:13:52 +00:00
Sanjay Patel acb96ffd14 [SDAG] fold bitwise logic with shifted operands
LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z

https://alive2.llvm.org/ce/z/QmR9rR

This is a reassociation + factoring fold. The common shift operation is moved
after a bitwise logic op on 2 input operands.
We get simpler cases of these patterns in IR, but I suspect we would miss all
of these exact tests in IR too. We also handle the simpler form of this plus
several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().
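
The following spot-checks the identity behind the fold on 32-bit values using random inputs. The Alive2 link gives the actual proof. The sketch uses `or` for LOGIC, the only opcode this patch handles, and `shl` for SH. The shift amount is kept below the bit width, since larger amounts are poison in IR.

```python
import random

MASK = 0xFFFFFFFF  # model i32 arithmetic


def shl(x, y):
    return (x << y) & MASK


def before(x0, x1, y, z):
    # LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) with LOGIC = or, SH = shl
    return (shl(x0, y) | z) | shl(x1, y)


def after(x0, x1, y, z):
    # LOGIC (SH (LOGIC X0, X1), Y), Z: one shift instead of two
    return shl(x0 | x1, y) | z


def spot_check(trials=10000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        x0, x1, z = (rng.getrandbits(32) for _ in range(3))
        y = rng.randrange(32)
        if before(x0, x1, y, z) != after(x0, x1, y, z):
            return False
    return True


print(spot_check())
```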

This is a partial implementation of a transform suggested in D111530
(only handles 'or' bitwise logic as a first step - need to stamp out more
tests for other opcodes).
Several of the same tests added for D111530 are altered here (but not
fully optimized). I'm not sure yet if this would help/hinder that patch,
but this should be an improvement for all tests added with ecf606cb43
since it removes a shift operation in those examples.

Differential Revision: https://reviews.llvm.org/D120516
2022-02-27 09:54:12 -05:00
Simon Pilgrim fadd20f80d [DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold
As reported on D120192
2022-02-27 11:25:22 +00:00
Florian Hahn c679fbee2a
[AArch64] Add tests for tbl + cmp splitting.
Additional tests showing potential for follow-ups after
D120571.
2022-02-25 17:59:44 +00:00
Paul Walker 16ee102964 [SVE] Add missing splat patterns for bfloat vectors.
Differential Revision: https://reviews.llvm.org/D120496
2022-02-25 16:53:39 +00:00
Paul Walker 7ab78f34cd [SVE] Refactor complex immediate pattern used by CPY/DUP.
SelectSVE8BitLslImm didn't account for constant values that have a
larger bit width than the result vector's element type. This only
seems to affect a single corner case when lowering fixed-length
vectors, but the code itself was also inconsistent with how other
related complex patterns are implemented, so I've taken the
opportunity to refactor it.

Differential Revision: https://reviews.llvm.org/D120440
2022-02-25 16:12:35 +00:00
Florian Hahn 166968a892
[AArch64] Add test cases where zext can be lowered to series of tbl.
Add a set of tests for upcoming patches that allow lowering vector zext
using AArch64 tbl instructions instead of shifts.
2022-02-25 15:36:32 +00:00
Sanjay Patel ecf606cb43 [AArch64][x86] add tests for bitwise logic + shifts; NFC 2022-02-24 16:01:16 -05:00
Simon Pilgrim 370ebc9d9a [DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2))))
If the shift amount is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift, perform the bswap at half the bitwidth, and zero-extend the result.
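
The following sketches the identity on 32-bit and 64-bit values. It assumes `bw/2 <= c < bw`, which is the "at least half the bitwidth" condition. `bswap`, `lhs` and `rhs` are illustrative names, not DAGCombiner functions.

```python
def bswap(x, bits):
    """Reverse the byte order of an unsigned `bits`-wide value."""
    return int.from_bytes(x.to_bytes(bits // 8, "little"), "big")


def lhs(x, c, bw=32):
    # bswap(shl(x, c)) at the full bit width
    return bswap((x << c) & ((1 << bw) - 1), bw)


def rhs(x, c, bw=32):
    # zext(bswap(trunc(shl(x, c - bw/2)))) computed at half width
    half = bw // 2
    narrow = (x << (c - half)) & ((1 << half) - 1)  # trunc to bw/2 bits
    return bswap(narrow, half)  # Python ints make the zext implicit


print(hex(lhs(0x12345678, 16)), hex(rhs(0x12345678, 16)))
```

The lower half of `shl(x, c)` is zero, so the full-width bswap only moves the upper half into the low bytes. That upper half is exactly the truncated, less-shifted value.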

Based on PR51391 + PR53867

Differential Revision: https://reviews.llvm.org/D120192
2022-02-24 19:33:51 +00:00
David Green b3e9fdd170 [AArch64] Regenerate dp1.ll test, NFC
The old check lines did not show enough context to reveal issues.
Regenerate the test with auto-generated check lines to make it clearer.
2022-02-24 19:33:45 +00:00
Momchil Velikov 17e85cd410 [AArch64] Async unwind - Always place the first LDP at the end when ReverseCSRRestoreSeq is true
This patch is in preparation for the async unwind CFI.

Put the first `LDP` at the end, so that the load-store optimizer can run
and merge the `LDP` and the `ADD` into a post-index `LDP`.

Do this always, and as early as the initial creation of the CSR restore
instructions, even if that `LDP` is not guaranteed to be mergeable with a
subsequent `SP` increment.

This greatly simplifies the CFI generation for prologue, as otherwise
we have to take extra steps to ensure reordering does not cross CFI
instructions.

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112328
2022-02-24 18:48:07 +00:00