llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	bccbf5276e	[AArch64] Remove isDef32 isDef32 would attempt to make a guess at which SelectionDag nodes were 32bit sources, and use the nature of 32bit AArch64 instructions implicitly zeroing the upper register half to not emit zext that were expected to already be zero. This was a bit fragile though, needing to guess at the correct opcodes that do not become 32bit defs later in ISel. This patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes. A part of SelectArithExtendedRegister was left with the same logic as a heuristic to prevent some regressions from it picking less optimal sequences. The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a FPR will become a FMOVSWr, which it lowers immediately to make sure that remains true through register allocation. Fixes #55833 Differential Revision: https://reviews.llvm.org/D127154	2022-06-07 18:57:59 +01:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
David Green	6468feaeac	[AArch64] Regenerate arm64-shifted-sext.ll and add a test from #55833 . NFC	2022-06-07 13:55:53 +01:00
Michael Kitzan	b7fcf6632f	[GISel] Add new combines for G_ADD Patch adds new GICombineRules for G_ADD: G_ADD(x, G_SUB(y, x)) -> y G_ADD(G_SUB(y, x), x) -> y Patch additionally adds new combine tests for AArch64 target for these new rules. Reviewed by: paquette Differential Revision: https://reviews.llvm.org/D87936	2022-06-06 11:19:45 -07:00
David Green	4ea1b43527	[AArch64] Generate ADDP from shuffled add This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>), into shuffle(addp(x), <0,0,1,1,2,2,..>. The ADDP instruction takes two vectors and returns one, adding adjacent pairs. So we match x in a custom combine as it is lowered from a v8i32. The original code would be 2 rev64 and 2 add, with the new code being a single addp with a zip1;zip2 shuffle, producing smaller code. Differential Revision: https://reviews.llvm.org/D126686	2022-06-06 11:39:51 +01:00
Paul Walker	2dde272db7	[SVE] Refactor sve-bitcast.ll to include all combinations for legal types. Patch enables custom lowering for MVT::nxv4bf16 because otherwise the refactored test file triggers a selection failure. The reason for the refactoring is to highlight cases where the generated code is wrong.	2022-06-03 12:09:19 +01:00
David Green	79e3b043e5	[AArch64] Add extra addp codegen tests. NFC	2022-06-03 11:36:40 +01:00
Serguei Katkov	24e16e4af2	[SSAUpdaterImpl] Do not generate phi node with all the same incoming values If all available vals to basic block are the same - do not build new phi node and just use this value. Reviewed By: sameerds Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D126525	2022-06-03 12:24:33 +07:00
Serguei Katkov	c4d955dd7f	[MachineSSAUpdate] Add a test for redundant phi generation.	2022-06-03 11:27:14 +07:00
Paul Walker	48ea26a387	[SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR. LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering scalable vector floating point INSERT_SUBVECTOR. However, these nodes only make sense for integer types and thus isel patterns do not exist for floating point, which leads to isel failures. This patch ensures floating point operands are cast to integer before the core lowering takes place. Fixes: #55037 Differential Revision: https://reviews.llvm.org/D126487	2022-06-02 14:51:04 +01:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit `430ac5c302`. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Fangrui Song	873d2aff42	[AArch64][test] Replace -march with -mtriple for llc RUN lines -march is error-prone: -march inherits the OS and environment from the default target triple. Use -mtriple which is more common.	2022-05-31 22:39:43 -07:00
Alexander Shaposhnikov	a72cc958a3	[CodeGen][AArch64] Add support for LDAPR This diff adds support for LDAPR (RCPC extension) (https://github.com/llvm/llvm-project/issues/55561). Differential revision: https://reviews.llvm.org/D126250 Test plan: ninja check-all	2022-05-31 21:40:50 +00:00
Sander de Smalen	9c38fc111b	[AArch64] Remove references to Streaming SVE from target features. Following discussion on D120261 and D121208 it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as its higher level features, rather than trying to model this runtime mode through explicit feature flags. This patch is largely NFC. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D125977	2022-05-31 16:25:01 +02:00
David Green	5cb14dc5a3	[AArch64] Look through copy in MachineCombiner FMUL patterns. This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also be FMUL(COPY(DUP(..))), which this patch now ignores the no-op COPY in. Differential Revision: https://reviews.llvm.org/D126632	2022-05-31 09:28:00 +01:00
Edd Barrett	d245974e1a	Test stackmap support for floating point types. It appears that float support is complete, or at least, the stackmap records emitted are not inconceivable (I must admit that I don't know about many of the architectures under test here). One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect) quirk of the stackmap format: in the case of a Register record, the Offset or SmallConstant field can encode a sub-register index! I've only ever seen this field zero for Register entries up until now.	2022-05-30 10:49:32 +01:00
David Green	99b0078064	[AArch64] Tests for showing MachineCombiner COPY patterns. NFC	2022-05-30 10:47:44 +01:00
David Green	9a3144d078	[AArch64] Reuse larger DUP if available If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the larger node using a vector extract to obtain the smaller. This comes up in the smull/smlal code, but needs a small fixup to allow the smull2 code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still match smull2 extracts. Differential Revision: https://reviews.llvm.org/D126449	2022-05-29 19:42:13 +01:00
Serge Pavlov	bdd0093f4d	[GlobalISel] Add G_IS_FPCLASS Add a generic opcode to represent `llvm.is_fpclass` intrinsic. Differential Revision: https://reviews.llvm.org/D121454	2022-05-27 13:49:47 +07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` is used, Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Adrian Tong	7c13ae6490	Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP. This is then used in AArch64 to remove copy instructions after taildup ran in machine block placement Differential Revision: https://reviews.llvm.org/D125335	2022-05-26 18:43:16 +00:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply `62a9b36fcf` and fix module build failure: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is an analysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicts for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been committed yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
Craig Topper	75eb0576de	[AArch64] Add test case for pr55644. NFC	2022-05-23 11:13:57 -07:00
Edd Barrett	c5e5cf1258	Test stackmap support for i128 This diff adds tests that check the currently-working stackmap cases for i128. This will help ensure no regressions are later introduced by D125680 (when ready). Note that i128 stackmap support is currently incomplete, so we can't test all i128 functionality: i128 constants >= 2^{63} crash LLVM, and non-constant i128s crash LLVM. So this change tests only constant i128 operands of value < 2^{63}. A couple of incorrect comments are also fixed.	2022-05-23 11:56:24 +01:00
Simon Pilgrim	dd231f02a3	[AArch64] Regenerate andandshift.ll test checks	2022-05-23 11:48:24 +01:00
Andre Vieira	572fc7d2fd	[AArch64] Order STP Q's by ascending address This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule STP Q's to the same base-address in ascending order of offsets. We have found this to improve performance on Neoverse N1 and should not hurt other AArch64 cores. Differential Revision: https://reviews.llvm.org/D125377	2022-05-23 09:50:44 +01:00
Florian Hahn	0cc981e021	[AArch64] implement isReassocProfitable, disable for (u|s)mlal. Currently reassociating add expressions can lead to failing to select (u|s)mlal. Implement isReassocProfitable to skip reassociating expressions that can be lowered to (u|s)mlal. The same issue exists for the *mlsl variants as well, but the DAG combiner doesn't use the isReassocProfitable hook before reassociating. To be fixed in a follow-up commit as this requires DAGCombiner changes as well. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125895	2022-05-23 09:39:00 +01:00
David Green	6ef5e242f2	[AArch64] Fix assumptions on input type of tryCombineFixedPointConvert It is possible for the input type to not be v2i64 or v4i32, so weaken the assertion to a return, fixing the crash in the new test. Fixes #55606	2022-05-23 08:55:54 +01:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Bill Wendling	d497129f9b	[AArch64] Use proper instruction mnemonics for FPRs The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D. Differential Revision: https://reviews.llvm.org/D126083	2022-05-20 12:02:26 -07:00
Rahul Anand R	534ea8bca5	[AArch64] Generate AND in place of CSEL for predicated CTTZ This patch implements a target-specific optimization that replaces the cmp and csel from cttz with an and mask. Recommitted with a fix for truncated value sizes. Differential Revision: https://reviews.llvm.org/D123782	2022-05-20 13:41:32 +01:00
Bill Wendling	6e00a34cdb	[AArch64] Add support for -fzero-call-used-regs Support the "-fzero-call-used-regs" option on AArch64. This involves much less specialized code than the X86 version. Most of the checks can be done with TableGen. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D124836	2022-05-19 16:58:28 -07:00
David Green	602f81ec33	[AArch64] Fix zero element TBL indices A TBL instruction will fill out-of-range values with 0's, something used in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for v16i8, but for v8i8 the input is still treated as a v16i8, so out-of-range values (like a lane index of 8) would end up loading values from the top half of the input register. Clean this up by detecting the out of range values and making sure they really use out of range values. There is a fix for swapped indices of 64bit input vectors too, which could be incorrectly adjusted if the zerovector was the first operand. Fixes #55545 Differential Revision: https://reviews.llvm.org/D125865	2022-05-19 13:54:35 +01:00
David Green	dd644ddf85	[AArch64] Extend zero vector TBL codegen tests. NFC	2022-05-19 13:01:55 +01:00
Jon Roelofs	d699e54ca2	Fix an or+and miscompile w/ GlobalISel Fixes #55284	2022-05-18 19:09:47 -07:00
Michael Kitzan	29bebb0237	[GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold FMIN/MAX instrs with NaNs. The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes: G_FMINNUM(X, NaN) -> X G_FMAXNUM(X, NaN) -> X G_FMINIMUM(X, NaN) -> NaN G_FMAXIMUM(X, NaN) -> NaN The patch adds AArch64 tests for these combines as well. Reviewed by: arsenm Differential revision: https://reviews.llvm.org/D125819	2022-05-18 12:08:53 -07:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Florian Hahn	a74e075908	[AArch64] Add tests showing reassoc breaks (s|u)ml(a|s)l selection.	2022-05-18 16:40:28 +01:00
Simon Pilgrim	939affc67d	[AArch64] neon-vmull-high-p64.ll - fix name/check mismatch identified in D125604 Typos meant that we weren't actually checking the function name, which wasn't accounting for mangling	2022-05-18 13:24:28 +01:00
Simon Pilgrim	1584b2c74e	[AArch64] fp16-v8-instructions.ll - remove some old defunct CHECKS identified in D125604 Typos meant that the update script never removed them	2022-05-18 12:49:05 +01:00
David Green	4c6a070a2c	[AArch64] Teach perfect shuffles tables about D-lane movs Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle tables, lowering the costs a little more. This is a rough improvement in general, especially if you ignore mov v0.16b, v2.16b type moves that are often artefacts of the calling convention. The D register movs are encoded as (0x4 | LaneIdx), and to generate a D register move we are required to bitcast into a higher type, but it is otherwise very similar to the S-lane mov's already supported. Differential Revision: https://reviews.llvm.org/D125477	2022-05-17 18:16:45 +01:00
David Green	8311fb7512	[AArch64] Extra tests useful for D-lane shuffles. NFC	2022-05-17 11:15:55 +01:00
Martin Storsjö	64a3c63e01	[MC] [Win64EH] Check for matches between epilogs and the prolog on ARM64 This allows sharing opcodes between prolog and epilog even when there is more than one epilog. I didn't make any handcrafted special MC level testcases for this (yet at least), but it does seem to have the expected effect on two existing CodeGen level testcases. Differential Revision: https://reviews.llvm.org/D125619	2022-05-17 00:41:39 +03:00
Martin Storsjö	cabefea2ec	[MC] [Win64EH] Try writing an ARM64 "packed epilog" even if the epilog doesn't share opcodes with the prolog The "packed epilog" form only implies that the epilog is located exactly at the end of the function (so the location of the epilog is implicit from the epilog opcodes), but it doesn't have to share opcodes with the prolog - as long as the total number of opcode bytes and the offset to the epilog fit within the bitfields. This avoids writing a 4 byte epilog scope in many cases. (I haven't measured how much this shrinks actual xdata sections in practice though.) Differential Revision: https://reviews.llvm.org/D125536	2022-05-17 00:41:39 +03:00
Paul Walker	ee8aa351e4	[AArch64] Use ADDV for boolean xor reductions. NEON does not have native support for xor reductions. However, when reducing predicate vectors the operation is synonymous with an add reduction that is supported. Differential Revision: https://reviews.llvm.org/D125605	2022-05-16 22:34:12 +01:00
David Green	5d29d75273	[AArch64] Predicate SSHLL;SCVTF patterns behind UseAlternateSExtLoadCVTF32 There have been some patterns in the AArch64 backend to optimize code of the form: ldrsh w8, [x0]; scvtf s0, w8 to: ldr h0, [x0]; sshll v0.4s, v0.4h, #0; scvtf s0, s0 The idea is to remove the GPR->FPR move, but in reality it makes code larger and slower (or the same) on all the cpus I tried. This patch adds the UseAlternateSExtLoadCVTF32 predicate similar to nearby related patterns. Differential Revision: https://reviews.llvm.org/D125470	2022-05-16 18:00:30 +01:00
Craig Topper	74f6ded49d	[AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC This bug is in generic DAG combine and easily reproducible on many targets. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D125640	2022-05-16 09:28:11 -07:00
David Green	7272a8c23c	[AArch64] Update check lines in arm64-scvt.ll. NFC	2022-05-16 15:50:39 +01:00
Bradley Smith	7ff5148d64	[DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine Differential Revision: https://reviews.llvm.org/D125367	2022-05-16 11:25:20 +00:00
Tim Northover	1ddc6ab1a9	AArch64: support ISel for fence instructions Only the most conservative of the DAG patterns matched, leaving GISel with "dmb ish" everywhere which is inefficient.	2022-05-16 12:01:18 +01:00
David Green	4c3e51ecfa	[AArch64] Handle 64bit vectors in tryCombineFixedPointConvert Under some situations we can visit 64bit vector extract elements in tryCombineFixedPointConvert, where an assert fires as they are expected to have been converted to 128bit. Turn the assert into an if statement, bailing out and letting the extract be handled first. Also invert some ifs, using early exits to reduce indentation. Fixes #55417	2022-05-16 11:08:47 +01:00
Alex Richardson	c8b44600c5	[AArch64] Avoid emitting MOVID when NEON is disabled Previously, creating a zero floating-point constant used MOVID even when NEON was disabled which resulted in the following fatal error: `Attempting to emit MOVID instruction but the Feature_HasNEON predicate(s) are not met` Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125237	2022-05-14 14:40:51 +00:00
Alex Richardson	09551251e3	[AArch64] Add missing HasNEON predicates to int->float patterns I was trying to compile code with -march=+nosimd and hit various instruction predicate verification errors; this patch should address the ones I saw in integer to floating-point conversions. I noticed that for signed conversions, some non-NEON instruction sequences are shorter. I don't know if the longer one is still faster on current architectures (the patterns date back to the initial backend import). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125308	2022-05-14 14:15:36 +00:00
Alex Richardson	f8639133b5	[AArch64] Baseline test for D125307 Differential Revision: https://reviews.llvm.org/D125240	2022-05-14 14:15:36 +00:00
Eli Friedman	96c2a0c9ff	[GlobalIsel] Fix fallback if stack protector isn't supported. When GlobalISel fails, we need to report the error, and we need to set the FailedISel property. We skipped those steps if stack protector insertion failed, which led to a very strange miscompile. Differential Revision: https://reviews.llvm.org/D125584	2022-05-13 14:17:27 -07:00
Amara Emerson	41fef10449	[GlobalISel] Combine G_SHL, G_ASHR, G_SHL of undef shifts to undef. Differential Revision: https://reviews.llvm.org/D125041	2022-05-13 12:20:34 -07:00
Sam Parker	6d53d35efd	[TypePromotion] Avoid some unnecessary truncs Recommit. Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-05-13 09:45:20 +01:00
Sam Parker	84b5f7c38c	[NFC][TypePromotion][AArch64] Tests Simplify existing test and also add it as a codegen test for aarch64.	2022-05-13 09:27:42 +01:00
Karl Meakin	0298cce257	[AArch64] Add `foldADCToCINC` DAG combine. Differential revision: https://reviews.llvm.org/D123781	2022-05-12 22:21:20 +01:00
Karl Meakin	d29fc6e7d2	[AArch64] Replace `performANDSCombine` with `performFlagSettingCombine`. `performFlagSettingCombine` is a generalised version of `performANDSCombine` which also works on `ADCS` and `SBCS`. Differential revision: https://reviews.llvm.org/D124464	2022-05-12 22:17:23 +01:00
Craig Topper	cec249c60d	[TypePromotion] Promote undef by converting to 0. If we're promoting an undef I think that means that we expect the upper bits are zero. undef doesn't guarantee that. This patch replaces undef with 0 to ensure this. This matches how a zext or sext of undef would be folded by InstCombine/InstSimplify. I haven't found a failure from this; I was just thinking through the code. Differential Revision: https://reviews.llvm.org/D123174	2022-05-12 09:09:24 -07:00
Nikita Popov	44d85259d0	[AArch64] Preserve chain when lowering fixed length load to SVE (PR55281) When a fixed length load is lowered to an SVE masked load, the result chain is currently set to the input chain of the old load, rather than the result chain of the new load. This may cause stores to be incorrectly reordered. Fixes https://github.com/llvm/llvm-project/issues/55281. Differential Revision: https://reviews.llvm.org/D125464	2022-05-12 16:03:32 +02:00
David Green	442c351b2b	Revert "[AArch64] Generate AND in place of CSEL for predicated CTTZ" This reverts commit `7dcd0ea683` due to issues reported postcommit with the correctness of truncated cttzs.	2022-05-10 17:17:03 +01:00
Rosie Sumpter	1a2665902f	[AArch64][SVE] Improve codegen when extracting first lane of active lane mask When extracting the first lane of a predicate created using the llvm.get.active.lane.mask intrinsic, it should give the same codegen as when the predicate is created using the llvm.aarch64.sve.whilelo intrinsic, since get.active.lane.mask is lowered to whilelo. This patch ensures the codegen is the same by recognizing llvm.get.active.lane.mask as a flag-setting operation in this case. Differential Revision: https://reviews.llvm.org/D125215	2022-05-09 13:56:04 +01:00
Alban Bridonneau	fef81131d9	[SVE] Optimize new cases for lowerConvertToSVBool Converts to SVBool are already considered as a nop, if they are converting an operand from a ptrue or a cmp, because they zero the extra predicate lanes by construction. This patch adds 2 similar cases: - The wide cmp, which was not directly recognized by the test for other forms of cmp - Splats of 1, which will be generated as ptrue, and as such will also zero the extra predicate lanes. Reviewed By: paulwalker-arm, peterwaller-arm Differential Revision: https://reviews.llvm.org/D124908	2022-05-09 10:17:57 +00:00
Rahul Anand R	7dcd0ea683	[AArch64] Generate AND in place of CSEL for predicated CTTZ This patch implements a target-specific optimization that replaces the cmp and csel from cttz with an and mask. Differential Revision: https://reviews.llvm.org/D123782	2022-05-09 10:28:20 +01:00
David Green	830c18047b	[AArch64] Add missing NVCAST patterns. There were apparently some missing NVCAST patterns. This fills them in using foreach, as opposed to having to specify them individually. Fixes #55321	2022-05-07 21:08:14 +01:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Kazu Hirata	26ba347fbb	[AArch64] Add llvm/test/CodeGen/AArch64/i256-math.ll This patch adds a test case for i256 additions and subtractions. I'm leaving out multiplications for now, which would result in very long sequences. Differential Revision: https://reviews.llvm.org/D125125	2022-05-06 14:26:12 -07:00
Kazu Hirata	fffb6e6afd	[AArch64] Fix sub with carry `13403a70e4` introduced a bug where we generate the outgoing carry inverted, which in turn breaks the lowering of @llvm.usub.sat.i128, returning the normal difference on saturation and zero otherwise. Note that AArch64 has peculiar semantics where the subtraction instructions generate borrow inverted. The problem is that we mix the two forms of semantics -- the normal carry and inverted carry -- in the area of extended precision subtractions. Specifically, we have three problems: - lowerADDSUBCARRY takes the non-inverted incoming carry from a subtraction and feeds it to SBCS without inverting it first. - lowerADDSUBCARRY makes available the outgoing carry from SBCS without inverting it. - foldOverflowCheck folds: (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry) When the incoming carry flag is set, CSET LO results in zero. CMP in turn generates a borrow, clearing the carry flag. Instead, we should fold: (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry) When the incoming carry flag is set, CSET LO results in zero. CMP does not generate a borrow, setting the carry flag. IIUC, we should use the normal (that is, non-inverted) semantics for carry everywhere. This patch fixes the three problems above. This patch does not add any new testcases because we have a plenty of them covering the instruction in question. In particular, @u128_saturating_sub is identical to the testcase in the motivating issue. Fixes: #55253 Differential Revision: https://reviews.llvm.org/D124976	2022-05-06 11:04:17 -07:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
Amara Emerson	586802eb72	[GlobalISel] Re-generate some tests.	2022-05-05 14:14:36 -07:00
Craig Topper	084f967370	[SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef. The result of sign_extend_inreg needs to have as many sign bits as requested by the VT argument. The easiest way to guarantee this is to fold it to 0. SystemZ test was modified to avoid using undef. Fixes https://github.com/llvm/llvm-project/issues/55178 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124696	2022-05-05 09:45:35 -07:00
Amara Emerson	87e3646a1f	[AArch64][GlobalISel] Add undef combines to postlegalizer combiner.	2022-05-05 09:22:08 -07:00
David Green	c7a6b11b7e	[ARM][AArch64] Add some extra shuffle conversion test coverage. NFC This adds a big endian run line for the AArch64 TRN tests and regenerated the check lines, along with adding an extra MVE VMOVN case and regenerating vector-DAGCombine.ll for easier updating.	2022-05-05 15:27:44 +01:00
Bradley Smith	8f623f4ab0	[AArch64][SVE] Restore SP from FP when SVE CSRs and variable sized objects are present Without SVE, after a dynamic stack allocation has modified the SP, it is presumed that a frame pointer restoration will revert the SP back to its correct value prior to any caller stack being restored. However the SVE frame is restored using the stack pointer directly, as it is located after the frame pointer. This means that in the presence of a dynamic stack allocation, any SVE callee state gets corrupted as SP has the incorrect value when the SVE state is restored. To address this issue, when variable sized objects and SVE CSRs are present, treat the stack as having been realigned, hence restoring the stack pointer from the frame pointer prior to restoring the SVE state. Differential Revision: https://reviews.llvm.org/D124615	2022-05-04 12:57:03 +00:00
Alex Borcan	afaa56df7a	Implement support for __llvm_addrsig for MachO in llvm-mc The __llvm_addrsig section is a section that the linker needs for safe icf. This was not yet implemented for MachO - this is the implementation. It has been tested with a safe deduplication implementation inside lld. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D123751	2022-05-03 18:19:18 -04:00
Jon Roelofs	e1c808b36e	Fix zero-width bitfield extracts to emit 0 Fixes #55129	2022-05-03 14:46:42 -07:00
Philipp Tomsich	64816e68f4	[AArch64] Support for Ampere1 core Add support for the Ampere Computing Ampere1 core. Ampere1 implements the AArch64 state and is compatible with ARMv8.6-A. Differential Revision: https://reviews.llvm.org/D117112	2022-05-03 15:54:02 +01:00
Bradley Smith	96bbd359ed	[AArch64][SVE] Only fold frame indexes referencing SVE objects into SVE loads/stores Currently we always fold frame indexes into SVE load/store instructions, however these instructions can only encode VL scaled offsets. This means that when we are accessing a fixed length stack object with these instructions, the folded in frame index gets pulled back out during frame lowering. This can cause issues when we have no spare registers and no emergency spill slot. Rather than causing issues like this, don't fold in frame indexes that reference fixed length objects. Fixes: #55041 Differential Revision: https://reviews.llvm.org/D124457	2022-05-03 09:48:13 +00:00
Sanjay Patel	747c6a0c73	[SDAG] fix miscompile when casting int->FP->int This is the codegen equivalent of D124692. As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. https://alive2.llvm.org/ce/z/KtaDmd Differential Revision: https://reviews.llvm.org/D124771	2022-05-02 14:57:27 -04:00
Sanjay Patel	cb3fb08508	[AArch64] add tests for int->FP->int casts; NFC Copied from x86 tests for multi-target coverage. Also, provides coverage for target-specific asm testing for Alive2 or its follow-ons. See #55150 and D124692	2022-05-02 09:18:12 -04:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect for a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1) (shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
Craig Topper	808c33ace5	[RISCV][AArch64] Pre-commit tests for D124711. NFC	2022-04-30 10:59:20 -07:00
Saleem Abdulrasool	24ba1302b3	AArch64: modify Swift async frame record storage on Windows The frame layout on Windows differs from that on other platforms. It will spill the registers in descending numeric value (i.e. x30, x29, ...). Furthermore, the x29, x30 pair is particularly important as it is used for the fast stack walking. As a result, we cannot simply insert the Swift async frame record in between the store. To provide the simplistic search mechanism, always spill the async frame record prior to the spilled registers. This was caught by the assertion failure in the frame lowering code when building the runtime for Windows AArch64. Fixes: #55058 Differential Revision: https://reviews.llvm.org/D124498 Reviewed By: mstorsjo	2022-04-30 09:01:33 -07:00
Craig Topper	65dbd8d793	[SelectionDAG] Pre-commit test for D124696. NFC	2022-04-29 17:24:13 -07:00
Paul Walker	b481512485	[SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine. This is essentially a refactoring patch but allows more cases to be caught, hence the output changes to some tests. Differential Revision: https://reviews.llvm.org/D122994	2022-04-29 17:42:33 +01:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Paul Walker	59588f0a3d	[SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost. getGatherScatterIndexIsExtended currently looks through all SIGN_EXTEND_INREG operations regardless of their input type. This patch restricts the code to only look through i32->i64 extensions, which are the ones supported implicitly by SVE addressing modes. Differential Revision: https://reviews.llvm.org/D123318	2022-04-29 14:20:13 +01:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Nikita Popov	4e545bdb35	[SimplifyCFG] Thread branches on same condition in more cases (PR54980) SimplifyCFG implements basic jump threading, if a branch is performed on a phi node with constant operands. However, InstCombine canonicalizes such phis to the condition value of a previous branch, if possible. SimplifyCFG does support this as well, but only in the very limited case where the same condition is used in a direct predecessor -- notably, this does not include the common diamond pattern (i.e. two consecutive if/elses on the same condition). This patch extends the code to look back a limited number of blocks to find a branch on the same value, rather than only looking at the direct predecessor. Fixes https://github.com/llvm/llvm-project/issues/54980. Differential Revision: https://reviews.llvm.org/D124159	2022-04-29 09:44:05 +02:00
Paul Walker	3c382ed71f	[AArch64][SVE] Remove BIC from logical operation DestructiveBinaryComm patterns This reverts part of https://reviews.llvm.org/D124224 that causes an assert because the register allocator triggers a pathological situation where there's no safe way to insert a zeroing MOVPRFX instruction.	2022-04-22 15:07:55 +01:00
zhongyunde	e1afae0311	[AArch64][SVE] Add some logical operation DestructiveBinaryComm patterns Add DestructiveBinaryComm* patterns for ORR, EOR, AND and BIC. The above instructions require that the source and destination registers are equal, so using movprfx should be beneficial to performance. Note: BIC (i.e. A & ~B) is not a commutative operation. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D124224	2022-04-22 20:31:00 +08:00
Daniel Kiss	de07cde67b	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. autiasp, autibsp instructions are the counterpart of paciasp/pacibsp instructions therefore let's emit .cfi_negate_ra_state for these too. In case of the Armv8.3 instruction set the retaa/retab will do the return and authentication in one step; here we can't emit the .cfi_negate_ra_state because it would point after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2022-04-22 13:25:57 +02:00
Karl Meakin	81904454f7	[AArch64] Add `foldOverflowCheck` DAG combine Differential Revision: https://reviews.llvm.org//D123779	2022-04-21 14:56:38 +01:00
Karl Meakin	13403a70e4	[AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY Differential Revision: https://reviews.llvm.org/D123322	2022-04-21 14:56:37 +01:00
Pengxuan Zheng	38612fbc89	Reland "[COFF, ARM64] Add __break intrinsic" https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170 Reland after fixing the test failure. The failure was due to conflict with a change (D122983) which was merged right before this patch. Reviewed By: rnk, mstorsjo Differential Revision: https://reviews.llvm.org/D124032	2022-04-20 13:01:30 -07:00
Pengxuan Zheng	bff8356b19	Revert "[COFF, ARM64] Add __break intrinsic" This reverts commit `8a9b4fb4aa`.	2022-04-20 11:57:49 -07:00
Pengxuan Zheng	8a9b4fb4aa	[COFF, ARM64] Add __break intrinsic https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170 Reviewed By: rnk, mstorsjo Differential Revision: https://reviews.llvm.org/D124032	2022-04-20 11:20:26 -07:00
Alexey Bataev	2cca53c815	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 09:37:16 -07:00
Alexey Bataev	5f7ac15912	Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer." This reverts commit `2f49163b33` to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284	2022-04-20 06:35:55 -07:00
Alexey Bataev	2f49163b33	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build more correct representation of register shuffles, improve number of recognised buildvector sequences. Also, same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 05:32:56 -07:00
Matt Arsenault	d16945d31b	AArch64/GlobalISel: Add -global-isel-abort=1 to select tests Otherwise the legalizer verifier error isn't triggered since the default is fallback.	2022-04-19 21:04:32 -04:00
David Green	73dc996428	[AArch64] Add lane moves to PerfectShuffle tables This teaches the perfect shuffle tables about lane inserts, that can help reduce the cost of many entries. Many of the shuffle masks are one-away from being correct, and a simple lane move can be a lot simpler than trying to use ext/zip/etc. Because they are not exactly like the other masks handled in the perfect shuffle tables, they require special casing to generate them, with a special InsOp Operator. The lane to insert into is encoded as the RHSID, and the move from is grabbed from the original mask. This helps reduce the maximum perfect shuffle entry cost to 3, with many more shuffles being generatable in a single instruction. Differential Revision: https://reviews.llvm.org/D123386	2022-04-19 14:49:50 +01:00
David Green	cc9495f679	[AArch64] Only mark cost 1 perfect shuffles as legal The perfect shuffle tables encode a cost of either 0 (a nop-copy) or 1 (a single instruction) with a cost encoding of 0 in the upper 2 bits. All perfect shuffles with any cost are then marked as legal shuffles though (the maximum encoded cost is 3), which can confuse the DAG combiner into thinking the shuffles are cheaper than they should be. Limiting legal shuffles to single instructions seems to do better in most cases, producing fewer instructions for complex shuffles. There are some cases that now become tbl, which may be better or worse depending on whether the instruction is in a loop and the tbl load can be hoisted out. Differential Revision: https://reviews.llvm.org/D123377	2022-04-19 12:58:55 +01:00
David Green	50af82701c	[AArch64] Cost all perfect shuffles entries as cost 1 A brief introduction to perfect shuffles - AArch64 NEON has a number of shuffle operations - dups, zips, exts, movs etc that can in some way shuffle around the lanes of a vector. Given a shuffle of size 4 with 2 inputs, some shuffle masks can be easily codegen'd to a single instruction. A <0,0,1,1> mask for example is a zip LHS, LHS. This is great, but some masks are not so simple, like a <0,0,1,2>. It turns out we can generate that from zip LHS, <0,2,0,2>, having generated <0,2,0,2> from uzp LHS, LHS, producing the result in 2 instructions. It is not obvious from a given mask how to get there though. So we have a simple program (PerfectShuffle.cpp in the util folder) that can scan through all combinations of 4-element vectors and generate the perfect combination of results needed for each shuffle mask (for some definition of perfect). This is run offline to generate a table that is queried for generating shuffle instructions. (Because the table could get quite big, it is limited to 4 element vectors). In the perfect shuffle tables zip, uzp and trn shuffles were being cost as 2, which is higher than needed and skews the perfect shuffle tables to create inefficient combinations. This sets them to 1 and regenerates the tables. The codegen will usually be better and the costs should be more precise (but it can get less second-order re-use of values from multiple shuffles; these cases should be fixed up in subsequent patches). Differential Revision: https://reviews.llvm.org/D123379	2022-04-19 12:05:05 +01:00
chenglin.bi	222adf338a	[Arch64][SelectionDAG] Add target-specific implementation of srem 1. X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. 2. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-19 02:49:42 +08:00
Momchil Velikov	e0ff354b83	[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer [Re-commit after fixing a dereference of "end" iterator] The AArch64LoadStoreOptimizer pass may merge a register increment/decrement with a following memory operation. In doing so, it may break CFI by moving a stack pointer adjustment past the CFI instruction that described that adjustment. This patch fixes this issue by moving said CFI instruction after the merged instruction, where the SP increment/decrement actually takes place. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114547	2022-04-18 12:09:44 +01:00
chenglin.bi	acfc025a72	Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem" This reverts commit `9d9eddd3dd`.	2022-04-18 10:35:09 +08:00
chenglin.bi	9d9eddd3dd	[Arch64][SelectionDAG] Add target-specific implementation of srem X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-16 12:29:11 +08:00
Momchil Velikov	24c84bd236	[AArch64] Async unwind - Fix MTE codegen emitting frame adjustments in a loop When untagging the stack, the compiler may emit a sequence like: ``` .LBB0_1: st2g sp, [sp], #32 sub x8, x8, #32 cbnz x8, .LBB0_1 stg sp, [sp], #16 ``` These stack adjustments cannot be described by CFI instructions. This patch disables merging of SP update with untagging, i.e. makes the compiler use an additional scratch register (there should be plenty available at this point as we are in the epilogue) and generate: ``` mov x9, sp mov x8, #256 stg x9, [x9], #16 .LBB0_1: sub x8, x8, #32 st2g x9, [x9], #32 cbnz x8, .LBB0_1 add sp, sp, #272 ``` Merging is disabled only when we need to generate asynchronous unwind tables. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D114548	2022-04-15 14:00:23 +01:00
John Brawn	27a8735a44	[AArch64] Add mayRaiseFPException to appropriate instructions This is mostly handled by adding "let mayRaiseFPException = 1" before the definition of the relevant instruction classes, but there are a couple of complications: * When we have a multiclass where currently some instantiations are of instructions that can raise an exception and others aren't we need to split that into two multiclasses, one inheriting from the other using a multiclass parameter to enable exceptions. * In a couple of places in the globalisel instruction selector we need to manually set the NoFPExcept flag. There's also another place that looks like it should need it, but that code is never hit for those opcodes due to them being handled by the generic instruction selector, so I've instead just removed them from the switch. Differential Revision: https://reviews.llvm.org/D115352	2022-04-14 16:51:22 +01:00
John Brawn	12c1022679	[AArch64] Lowering and legalization of strict FP16 For strict FP16 to work correctly needs some changes in lowering and legalization: * SelectionDAGLegalize::PromoteNode was missing handling for some strict fp opcodes. * Some of the custom lowering of strict fp operations needed to be adjusted to work with FP16. * Custom lowering needed to be added for round-to-int operations. With this, and the previous patches for the rest of the strict fp isel, we can set IsStrictFPEnabled = true. Differential Revision: https://reviews.llvm.org/D115620	2022-04-14 16:51:22 +01:00
David Green	1ba8f4f67d	[AArch64] Move v4i8 concat load lowering to a combine. The existing code was not updating the uses of loads that it recreated, leading to incorrect chains which could break the ordering between nodes. This moves the code to a combine instead, and makes sure we update the chain references. This does mean it happens earlier - potentially before the concats are simplified. This can lead to inefficiencies in the codegen, which will be fixed in followups.	2022-04-14 15:19:33 +01:00
Paul Walker	0c44115e51	[SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER. The lowering code did not use the scale operand of MGATHER/MSCATTER nodes, but instead assumed scaled indices were always scaled based on the element type of the memory type. This patch adds the missing support by rewriting the nodes as unscaled variants. Differential Revision: https://reviews.llvm.org/D123670	2022-04-14 11:54:46 +01:00
Momchil Velikov	62d4686be3	Revert "[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer" This reverts commit `ecbf32dd88`. It's possible this patch is the reason for an assertion failure `!NodePtr->isKnownSentinel()` in `AArch64LoadStoreOpt::mergeUpdateInsn` (https://lab.llvm.org/buildbot/#/builders/185/builds/1555); reverting while I investigate.	2022-04-14 09:33:40 +01:00
David Green	4585bff408	[AArch64] Add new shuffles tests, and regenerate aarch64-wide-shuffle.ll and neon-wide-splat.ll. NFC	2022-04-13 18:10:49 +01:00
chenglin.bi	82e5976b7d	[AArch64][SelectionDAG] stick all the power-of-two tests in a separate file; NFC Baseline tests for D122968 (issue #54649).	2022-04-14 00:48:28 +08:00
Momchil Velikov	ecbf32dd88	[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer The AArch64LoadStoreOptimizer pass may merge a register increment/decrement with a following memory operation. In doing so, it may break CFI by moving a stack pointer adjustment past the CFI instruction that described that adjustment. This patch fixes this issue by moving said CFI instruction after the merged instruction, where the SP increment/decrement actually takes place. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114547	2022-04-13 17:04:53 +01:00
Alex Richardson	ee44896cf4	[AArch64] Add missing HasNEON predicate in scalar FABD patterns I was trying to compile with -march=+nosimd and hit the following assertion: `Attempting to emit FABD64 instruction but the Feature_HasNEON predicate(s) are not met`. This adds a HasNEON predicate to the patterns which was omitted in commit `21d9b33d62` for some reason. The new code generation matches GCC with -mcpu=<cpu>+nosimd: https://godbolt.org/z/n1Y7xh5jo Differential Revision: https://reviews.llvm.org/D123491	2022-04-13 09:30:11 +00:00
Alex Richardson	32a353a5e0	[AArch64] Baseline test for D123491	2022-04-13 09:30:11 +00:00
David Sherwood	44271e7c55	[AArch64][SVE] Fix lowering of "fcmp ueq/one" when using SVE We were previously lowering to the incorrect instructions for the setcc DAG node when using the SETUEQ and SETONE floating point condition codes. I have fixed this by marking the SETONE code as Expand and letting the SETUNE code be legal. I have also fixed up the patterns for FCMNE_PPzZZ and FCMNE_PPzZ0 to use the correct opcode. Differential Revision: https://reviews.llvm.org/D121905	2022-04-13 10:24:03 +01:00
Daniel Kiss	b0343a38a5	Support the min of module flags when linking, use for AArch64 BTI/PAC-RET LTO objects might be compiled with different `mbranch-protection` flags, which will cause an error in the linker. Such a setup is allowed in a normal build; with this change it becomes possible with LTO as well. Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D123493	2022-04-13 09:31:51 +02:00
Matt Arsenault	6009122250	AArch64/GlobalISel: Remove pointless s1 legalize rules These have no net effect on the legalize rules.	2022-04-12 16:54:04 -04:00
Matt Arsenault	3f2cc7cc2b	GlobalISel: Fix lowerSelect handling of boolean high bits This was making several invalid assumptions about the incoming select. First, it was assuming the incoming condition was either s1 or already sign extended, not accounting for different boolean high bits behavior between scalar and vector conditions. We only had a vector boolean due to the intermediate step vector select, which is now avoided. Second, it was assuming it can use the result vector type as a boolean mask. These types don't have anything to do with each other, and this only makes sense in the context of the expansion to bit operations. Since these logically are part of the same lowering, do the complete expansion in a single step. The added select_v4s1_s1 test does fail to legalize, since it seems AArch64's vector legalization support is pretty incomplete.	2022-04-12 16:54:03 -04:00
Ahmed Bougacha	cfa4fe7c51	[AArch64][LOH] Don't ignore regmasks in bundles by iterating over instrs. The LOH pass iterates over instructions to build its custom register state machine, but it uses the top-level bundle iterator. This should be okay, because when the wrapper BUNDLE MI is built, it aggregates the register defs/uses in its instructions into MOs. However, that doesn't apply to regmasks, and accumulating regmasks across multiple instructions would be messy business. There are a couple AnalyzePhysRegInBundle (/Virt) helpers that do look at regmasks, but those don't fit in very well here. AArch64 has started to use a few bundle instructions, specifically as glorified pseudos for variant call instructions, which have regmasks. So the LOH pass ends up ignoring regmasks. Concretely, this has been wrong for a while, but, on aarch64, the most common bundle (rv_marker call) was always followed by the attached call instruction, a plain BL with a regmask. Which was properly detected by the pass. However, we recently started keeping the attached call in the bundle, so the regmask is now ignored. And the pass happily combines ADRPs, of say, x8, across the bundle, resulting in corrupt pointers later.	2022-04-12 10:34:54 -07:00
Ahmed Bougacha	f3e76dcae3	[AArch64] Cleanup call-rv-marker.ll test. NFC. This was doing -iphoneos instead of -ios. While there, remove an old TODO and cleanup some alignment.	2022-04-12 10:34:54 -07:00
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Matt Arsenault	7e8ff962b3	AArch64/GlobalISel: Regenerate mir test checks Minimizes the test diffs in future changes from introduction of -NEXT.	2022-04-11 20:12:22 -04:00
Matt Arsenault	492d0eab89	AArch64/GlobalISel: Remove IR section from a test	2022-04-11 19:43:37 -04:00
Biplob Mishra	d06fb9045b	AArch64 adding more tests to show the simple scenarios for or/and combine	2022-04-11 20:54:12 +01:00
Momchil Velikov	b4ad28da19	[CodeGen] Async unwind - add a pass to fix CFI information This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by linear (non-CFG aware) nature of the unwind tables. Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers. This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`): * there is a single basic block, containing the function prologue * possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks. * prologue and epilogue blocks are outside of any loops Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states: - "has a call frame", if the function has executed the prologue, or has not executed any epilogue - "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue These properties can be computed for each basic block by a single RPO traversal. From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order. Where these states differ, we insert compensating CFI instructions, which come in two flavours: - CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be: ``` .cfi_def_cfa <sp>, 0 .cfi_same_value <rN> .cfi_same_value <rN-1> ... ``` where `<rN>` are the callee-saved registers. - CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence: ``` .cfi_restore_state .cfi_remember_state ``` In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue. Reviewed By: MaskRay, danielkiss, chill Differential Revision: https://reviews.llvm.org/D114545	2022-04-11 13:27:26 +01:00
Sanjay Patel	2ed15984b4	[SDAG] try to reduce compare of funnel shift equal 0 fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0 fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0 This is similar to an existing setcc-of-rotate fold, but the matching requires more checks for the more general funnel op: https://alive2.llvm.org/ce/z/Ab2jDd We are effectively decomposing the funnel shift into logical shifts, reassociating, and removing a shift. This should get us the final improvements for x86-64 that were originally shown in D111530 ( https://github.com/llvm/llvm-project/issues/49541 ); x86-32 still shows some SHLD/SHRD, so the pattern is not matching there yet. Differential Revision: https://reviews.llvm.org/D122919	2022-04-11 07:44:58 -04:00
Tim Northover	901831a4e6	Revert "AArch64: take compact unwind frame size from last CFI instruction." It was on ToT when I pushed and committed unintentionally.	2022-04-11 12:25:58 +01:00
Tim Northover	9fe32ca697	AArch64: add nvcast patterns for v1f64	2022-04-11 12:24:48 +01:00
Tim Northover	4120a3abdd	AArch64: take compact unwind frame size from last CFI instruction. Asynchronous exception support for the prologue means that there can be multiple .cfi_def_cfa_offset instructions in a single function, which tripped up an assertion in the compact unwind generator. In reality the compact unwind format is far too restrictive to represent asynchronous frames so if we ever wanted that on Darwin we'd fall back to DWARF (possibly keeping compact unwind around for synchronous users). So the compact format should continue to represent the synchronous situation, and the assertion can be removed.	2022-04-11 12:24:48 +01:00
Tim Northover	6c85668d28	Tail calls: look through AssertZExt to find register copy. arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and this is modelled in the IR by inserting an AssertZExt after the CopyFromReg. The function deciding whether registers that need to be preserved actually are wasn't expecting this so it banned perfectly legitimate tail calls.	2022-04-11 12:24:47 +01:00
Alexander Shaposhnikov	626039cdcc	[AArch64] Split fuse-literals feature This diff splits fuse-literals feature and enables fuse-adrp-add by default, in particular, it adjusts instruction scheduling to place ADRP+ADD pairs together. This also enables the linker to apply the relaxations described in `d2ca58c54b`. Differential revision: https://reviews.llvm.org/D120104 Test plan: make check-all	2022-04-11 05:27:11 +00:00
Karl Meakin	784b9d468a	[AArch64] Update tests with the `update_llc_test_checks.py` script (NFC) Reviewed By: Kmeakin Differential Revision: https://reviews.llvm.org/D123317	2022-04-07 18:06:15 +01:00
Paul Walker	a88e8374db	[SVE] Add more gather/scatter tests to highlight bugs in their generated code.	2022-04-07 17:13:48 +01:00
Martin Storsjö	8d7a17b7c8	[AArch64] Fix the upper limit for folded address offsets for COFF In COFF, the immediates in IMAGE_REL_ARM64_PAGEBASE_REL21 relocations are limited to 21 bit signed, i.e. the offset has to be less than (1 << 20). The previous limit did intend to cover for this case, but had missed that the 21 bit field was signed. This fixes issue https://github.com/llvm/llvm-project/issues/54753. Differential Revision: https://reviews.llvm.org/D123160	2022-04-06 22:54:13 +03:00
Daniil Kovalev	62a983ebc5	Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if" This reverts commit `83a798d4b0`. As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.	2022-04-06 20:32:53 +03:00
Paul Walker	7d3af9ef0f	[DAGCombine] insert_subvector undef, (splat X), N2 -> splat X Differential Revision: https://reviews.llvm.org/D120328	2022-04-06 17:15:38 +01:00
Paul Walker	5e407f0887	[SVE] Add gather/scatter tests to highlight bugs in their generated code.	2022-04-06 15:30:29 +01:00
chenglin.bi	87f0d55304	[AArch64] Fold lsr+bfi in tryBitfieldInsertOpFromOr In tryBitfieldInsertOpFromOr, if the new created LSR Node's source is LSR with Imm shift, try to fold them. Fixes https://github.com/llvm/llvm-project/issues/54696 Reviewed By: efriedma, benshi001 Differential Revision: https://reviews.llvm.org/D122915	2022-04-06 22:02:31 +08:00
zhongyunde	9a2d5cc1da	[SVE][AArch64] Enable first active true vector combine for INTRINSIC_WO_CHAIN The WHILELO/LS instruction is very important for SVE loops, and is itself a flag-setting operation, so add it. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D122796	2022-04-06 21:01:37 +08:00
zhongyunde	19e5235147	[AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD According to the discussion in D122281, we are missing an ISD::AND combine for MLOAD because it relies on BuildVectorSDNode, which fails for scalable vectors. This patch is intended to handle that, so we can circle back the type MVT::nxv2i32 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122703	2022-04-06 20:50:42 +08:00
Daniil Kovalev	83a798d4b0	[CodeGen] Place SDNode debug ID declaration under appropriate #if Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed. Differential Revision: https://reviews.llvm.org/D120714	2022-04-06 14:09:32 +03:00
Matt Arsenault	634bf829a8	MachineVerifier: Diagnose undef set on full register defs An undef def of a full register would assert in LiveIntervalCalc.	2022-04-05 22:19:17 -04:00
zhongyunde	251637690a	[AArch64] Enhance last active true vector combine Last active extracting will output LASTB + WHILELS, and the WHILELS itself is a flag-setting operation, so prefer performing the combine. Reviewed By: paulwalker-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D122551	2022-04-06 09:54:28 +08:00
Jessica Paquette	6c9bc2dd1c	[GlobalISel] NFC: Add test coverage for s144 and s142 144 = 16 * 9 For types where s16 is legal. It may be interesting to break these down into 16-bit chunks rather than 32 or 64 bits. Add tests for some opcodes, just so we get some test coverage drawing attention to this.	2022-04-05 15:26:46 -07:00
Jessica Paquette	30922d62f4	[GlobalISel] NFC: Add some test coverage for s158 158 = 32 * 5 - 2 This is a wide type which may benefit from a different widening scheme than types which are multiples of 64. For example, if 32-bit and 64-bit scalars are both allowed, and a type is a multiple of 32, or is closer to a multiple of 32, it may be better to - Widen to the wide multiple of 32 - Break up the type into 32-bit chunks Anyway, we don't have any test coverage for this at all, so for the sake of making sure we test it, let's add some test coverage.	2022-04-05 15:11:22 -07:00
Jessica Paquette	5830afa532	[GlobalISel] NFC: Regen some tests + improve test coverage for wide even types It turns out we don't do an awesome job with weird types like s318 (and other types near them, like s316). We don't have any test coverage for those types, so let's add some so it's easier to see the impact of legalization improvements on them when we make changes. Since the test generator was changed, it's easier to update relevant tests prior to changing things rather than squinting at a bunch of "ah, CHECK is now CHECK-NEXT" lines. So, let's just regenerate a bunch of tests while we're here. Unfortunately the "CHECK-NEXT" scheme doesn't work with legalize-cmp for some reason, and the test will fail. So keep that one having CHECK lines.	2022-04-05 12:13:22 -07:00
Biplob Mishra	90853d8f37	Adding new tests to demonstrate code patterns with multiple or/and which can be combined with a single mask	2022-04-05 14:17:02 +01:00
Biplob Mishra	edb4520205	rev16 instruction is being generated for a half word byte swap on a 32-bit input as a bswap+rotr. This is not true for a 64-bit input. This patch implements the rev16 instruction for a AArch64 backend for a half word byte swap on a 64-bit input. Differential Revision: https://reviews.llvm.org/D122643	2022-04-05 13:43:11 +01:00
Biplob Mishra	f2b4b2ebe7	Reverting changes to correct the commit message	2022-04-05 13:38:14 +01:00
Biplob Mishra	afca54f0cf	[ARM][AArch64] Optimize pattern for converting a half word byte swap in a 64-bit input to a rev16 instruction. Differential Revision: https://reviews.llvm.org/D122643	2022-04-05 12:23:09 +01:00
Muhammad Omair Javaid	0320115c16	Revert "[CodeGen] Async unwind - add a pass to fix CFI information" This reverts commit `980c3e6dd2`. This commit had failing tests with clang crashing across various AArch64/Linux buildbots. https://lab.llvm.org/buildbot/#/builders/179/builds/3346 Differential Revision: https://reviews.llvm.org/D114545	2022-04-05 13:12:30 +05:00
David Green	3b9833597e	[AArch64] Alter mull buildvectors(ext(..)) combine to work on shuffles D120018 altered this combine to work on buildvectors as opposed to shuffle dup's. This works well for dups and other things that are expanded into buildvectors. Some shuffles are legal though, and stay as vector_shuffle through lowering. This expands the transform to also handle shuffles, so that we can turn mul(shuffle(sext into mul(sext(shuffle and more readily make smull/umull instructions. This can come up from the SLP vectorizer adding shuffles that are costed from extends. Differential Revision: https://reviews.llvm.org/D123012	2022-04-04 23:07:47 +01:00
David Green	a70480dd13	[AArch64] Add some tests for mul(shuffle(ext. NFC	2022-04-04 22:54:55 +01:00
Momchil Velikov	980c3e6dd2	[CodeGen] Async unwind - add a pass to fix CFI information This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by the linear (non-CFG aware) nature of the unwind tables. Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers. This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`): * there is a single basic block, containing the function prologue * possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks. * prologue and epilogue blocks are outside of any loops Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states: - "has a call frame", if the function has executed the prologue, or has not executed any epilogue - "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue These properties can be computed for each basic block by a single RPO traversal. In order to accommodate backends which do not generate unwind info in epilogues we compute an additional property "strong no call frame on entry" which is set for the entry point of the function and for every block reachable from the entry along a path that does not execute the prologue. If this property holds, it takes precedence over the "has a call frame" property. From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order. 
Where these states differ, we insert compensating CFI instructions, which come in two flavours: - CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be: ``` .cfi_def_cfa <sp>, 0 .cfi_same_value <rN> .cfi_same_value <rN-1> ... ``` where `<rN>` are the callee-saved registers. - CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence: ``` .cfi_restore_state .cfi_remember_state ``` In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue. Reviewed By: MaskRay, danielkiss, chill Differential Revision: https://reviews.llvm.org/D114545	2022-04-04 14:38:22 +01:00
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Sanjay Patel	ec0b332cd8	[AArch64] add tests for funnel+or == 0; NFC These are copied from x86 ( `1074bdfb52` ) to provide more coverage for a potential generic combine.	2022-04-01 13:39:25 -04:00
Nicholas Guy	7d676714fb	[AArch64] Set MaxBytesForLoopAlignment for more targets Differential Revision: https://reviews.llvm.org/D122566	2022-03-31 11:37:11 +01:00
Sanjay Patel	e18cc5277f	[SDAG] try to canonicalize logical shift after bswap When shifting by a byte-multiple: bswap (shl X, C) --> lshr (bswap X), C bswap (lshr X, C) --> shl (bswap X), C This is the backend version of D122010 and an alternative suggested in D120648. There's an extra check to make sure the shift amount is valid that was not in the rough draft. I'm not sure if there is a larger motivating case for RISCV (bug report?), but the ARM diffs show a benefit from having a late version of the transform (because we do not combine the loads in IR). Differential Revision: https://reviews.llvm.org/D122655	2022-03-30 09:29:32 -04:00
Eli Friedman	a8ebd85e46	[MC] Make MCAsmInfo::isAcceptableChar reflect MCAsmInfo::doesAllowAtInName On targets which don't allow "@" in unquoted identifiers, make sure we don't emit them; otherwise, we can't parse our own output. Differential Revision: https://reviews.llvm.org/D122516	2022-03-29 14:01:32 -07:00
David Green	60f57b3658	[AArch64] Ensure fixed point fptoi_sat has correct saturation width D113200 introduced an error where it was converting FP_TO_SI_SAT with multiply to a fixed point floating point convert. The saturation bitwidth needs to be equal to the floating point width, or else the routine would truncate the result as opposed to saturating it. Fixes #54601	2022-03-29 10:12:44 +01:00
zhongyunde	2b3becb41d	[AArch64][GlobalISel] Add new MOVI pattern for fp constants GlobalISel is used at -O0, so add a MOVI pattern for it, similar to what gcc does (https://godbolt.org/z/8j6fzG3h6). Fix https://github.com/llvm/llvm-project/issues/53651 Reviewed By: dmgreen, paquette Differential Revision: https://reviews.llvm.org/D122559	2022-03-29 10:57:22 +08:00
zhongyunde	c3fe025bd4	[AArch64][SelectionDAG] Refactor to support more scalable vector extending loads According to the discussion in D120953, we should first exclude all scalable vector extending loads and then selectively enable those which we directly support. This patch refactors the code accordingly (truncating stores are not touched), and more scalable vector types will try to reduce the number of masked loads in favour of more unpklo/hi instructions. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122281	2022-03-27 21:18:01 +08:00
David Green	693d3b7e76	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TBL3 and TBL4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-26 21:10:43 +00:00
zhongyunde	758be63ac6	[test][AArch64] Add a test case for D121180 NFC Now, perform last active true vector combine only where we're extracting from a flag-setting operation. But in fact, the last active extracting will output LASTB + WHILELS, and the WHILELS itself is a flag-setting operation, so precommit this case to test the potentially further optimization. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122453	2022-03-26 19:12:16 +08:00
David Green	3d8d60e147	Revert "[AArch64] Lower 3 and 4 sources buildvectors to TBL" This reverts commit `ec93b28909` as problems with it have been reported.	2022-03-25 10:03:10 +00:00
Momchil Velikov	50a97aacac	[AArch64] Async unwind - function prologues Re-commit of `32e8b550e5` This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` after the `stp` is executed the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards which this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-03-24 16:16:44 +00:00
David Green	ec93b28909	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TBL3 and TBL4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-24 10:02:33 +00:00
David Green	311bdbc9b7	[AArch64] Add tests showing inefficient TBL3/4 generation. NFC	2022-03-23 16:43:23 +00:00
David Spickett	c3b98194df	Reland "[llvm][AArch64] Insert "bti j" after call to setjmp" This reverts commit `edb7ba714a`. This changes BLR_BTI to take variable_ops meaning that we can accept a register or a label. The pattern still expects one argument so we'll never get more than one. Then later we can check the type of the operand to choose BL or BLR to emit. (this is what BLR_RVMARKER does but I missed this detail of it first time around) Also require NoSLSBLRMitigation which I missed in the first version.	2022-03-23 11:43:43 +00:00
David Spickett	edb7ba714a	Revert "[llvm][AArch64] Insert "bti j" after call to setjmp" This reverts commit `eb5ecbbcbb` due to failures on buildbots with expensive checks enabled.	2022-03-23 10:43:20 +00:00
David Spickett	eb5ecbbcbb	[llvm][AArch64] Insert "bti j" after call to setjmp Some implementations of setjmp will end with a br instead of a ret. This means that the next instruction after a call to setjmp must be a "bti j" (j for jump) to make this work when branch target identification is enabled. The BTI extension was added in armv8.5-a but the bti instruction is in the hint space. This means we can emit it for any architecture version as long as branch target enforcement flags are passed. The starting point for the hint number is 32 then call adds 2, jump adds 4. Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see at the start of functions). The existing Arm command line option -mno-bti-at-return-twice has been applied to AArch64 as well. Support is added to SelectionDAG Isel and GlobalIsel. FastIsel will defer to SelectionDAG. Based on the change done for M profile Arm in https://reviews.llvm.org/D112427 Fixes #48888 Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D121707	2022-03-23 09:51:02 +00:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Try to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD extensions from legal types are supported. Test cases for both normal and masked loads are added to guard against compile crashes. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
chenglin.bi	dd3b90e4d7	[AArch64] Combine ISD::SETCC into AArch64ISD::ANDS When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false). And if the SetCC input uses this number, the DAG combiner will generate one more SRL instruction. So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison Fix https://github.com/llvm/llvm-project/issues/54283 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121449	2022-03-19 13:04:16 +00:00
Paul Walker	f46fe36d59	[AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands. When inverting the compare predicate trySwapVSelectOperands is incorrectly using the type of the select's cond operand rather than the type of cond's operands. This means we're treating all inversions as if they're integer. Differential Revision: https://reviews.llvm.org/D121968	2022-03-19 12:36:14 +00:00
David Green	fe6057a293	[AArch64] Custom lower concat(v4i8 load, ...) We already have custom lowering for v4i8 load, which loads as a f32, converts to a vector and bitcasts and extends the result to a v4i16. This adds some custom lowering of concat(v4i8 load, ...) to keep the result as an f32 and create a buildvector of the resulting f32 loads. This helps not create all the extends and bitcasts, which are often difficult to fully clean up. Differential Revision: https://reviews.llvm.org/D121400	2022-03-18 11:58:02 +00:00
David Green	0fa4aeb453	[AArch64] Add extra insert-subvector tests. NFC	2022-03-17 15:29:07 +00:00
David Green	0b6df40c52	[AArch64] Combine ISD::AND into AArch64ISD::ANDS If we already have a AArch64ISD::ANDS node with identical operands, we can merge any ISD::AND into it, reducing the instruction count by calculating the value and the flags in a single operation. This code is taken from the X86 backend, and could also handle AArch64ISD::ADDS and AArch64ISD::SUBS, but I couldn't find any test cases where it came up. Differential Revision: https://reviews.llvm.org/D118584	2022-03-17 09:44:11 +00:00
David Green	09a2b5b506	[AArch64] Regenerate and extend peephole-and-tst.ll tests. NFC	2022-03-16 09:44:20 +00:00
Matthias Gehre	09854f2af3	[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers Emit calls to __divei4 and friends for division/remainder of large integers. This fixes https://github.com/llvm/llvm-project/issues/44994. The overall RFC is in https://discourse.llvm.org/t/rfc-add-support-for-division-of-large-bitint-builtins-selectiondag-globalisel-clang/60329 The compiler-rt part is in https://reviews.llvm.org/D120327 Differential Revision: https://reviews.llvm.org/D120329	2022-03-16 09:36:28 +00:00
Amara Emerson	8cbf18cb04	[GlobalISel] Fix store merging incorrectly merging volatile stores. The existing volatile checks only handle aliasing hazards between stores, but that isn't enough since by that point volatile stores may have already been added to the current candidate group.	2022-03-14 13:48:51 -07:00
Florian Mayer	628c537b32	[MTE] Add test that stack tagging does not mess up stack coloring. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D121433	2022-03-14 13:36:21 -07:00
Mircea Trofin	294eca35a0	[regalloc] Remove -consider-local-interval-cost Discussed extensively on D98232. The functionality introduced in D35816 never worked correctly. In D98232, it was fixed, but, as it was introducing a large compile-time regression, and the value of the original patch was called into doubt, we disabled it by default everywhere. A year later, it appears that caused no grief, so it seems safe to remove the disabled code. This should be accompanied by re-opening bug 26810. Differential Revision: https://reviews.llvm.org/D121128	2022-03-14 10:49:16 -07:00
zhongyunde	3568333815	[AArch64] Perform last active true vector combine Test bit of lane EC-1 can use P register directly, eg: Materialize : Idx = (add (mul vscale, NumEls), -1) i1 = extract_vector_elt t37, Constant:i64<Idx> ... into: "ptrue p, all" + PTEST Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121180	2022-03-15 01:25:03 +08:00
Arthur Eubanks	250620f76e	[OpaquePtr][AArch64] Use elementtype on ldxr/stxr Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120527	2022-03-14 10:09:59 -07:00
Sanjay Patel	c2592c374e	[SDAG] simplify bitwise logic with repeated operand We do not have general reassociation here (and probably do not need it), but I noticed these were missing in patches/tests motivated by D111530, so we can at least handle the simplest patterns. The VE test diff looks correct, but we miss that pattern in IR currently: https://alive2.llvm.org/ce/z/u66_PM	2022-03-13 11:12:30 -04:00
Sanjay Patel	9f4caf55db	[AArch64] add tests for bitwise logic reassociation; NFC Chooses from a variety of scalar/vector/illegal types because that should not inhibit any folds.	2022-03-13 11:12:30 -04:00
David Sherwood	aeeb1199b4	[AArch64][SVE] Change the asserts in LowerToPredicatedOp to check for legal types When building the LLVM test suite with SVE I discovered a crash when compiling some Halide tests, which occurs because we try to use SVE to lower 64-bit vector multiplies and there is no vscale_range attribute on the function. In this case the min SVE vector bits was 0, which caused an assert in LowerToPredicatedOp to fire. I have amended the asserts in this function to check that the fixed-width type is legal. If the fixed-width type is larger than NEON and is legal then it must be because we've set the min SVE vector bits to something > 128. Or if the min SVE bits is 0, then the only legal types allowed are 128 bit types - for any other types the assert will fire. Tests added here: CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll Differential Revision: https://reviews.llvm.org/D121297	2022-03-11 09:57:58 +00:00
Philippe Valembois	26cd258420	[AArch64] Use correct calling convention for each vararg While checking if tail call optimization is possible, the calling convention applied to fixed arguments is not the correct one. This implies for DarwinPCS that all arguments of a vararg function will go to the stack although fixed ones can go in registers. This prevents non-virtual thunks from being tail optimized although they are marked as musttail. Differential Revision: https://reviews.llvm.org/D120622	2022-03-10 15:07:25 -08:00
David Green	21a97a2ac1	[AArch64] TBL uses zero for out of range elements. A TBL instruction will use zero for any out of range values. We can use this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need to materialise the zero. Differential Revision: https://reviews.llvm.org/D121139	2022-03-10 14:45:13 +00:00
David Green	43591be2aa	[AArch64] Extra tests for tbl with zero elements. NFC	2022-03-10 13:51:04 +00:00
Xiang1 Zhang	c31014322c	TLS loads optimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Saleem Abdulrasool	c31f0a0050	AArch64: correct epilogue/prologue emission for swift async The prologue and epilogue emission were unbalanced in light of different strategies of async frame context emission. Adjust the epilogue emission to match the prologue emission. This makes the elision work properly as well as the deployment based. Due to the fact that the epilogue always was clearing a bit (which should not be set in the first place), the client would not notice the behavioural issue unless the deployment version was in effect.	2022-03-09 18:41:10 +00:00
Sanjay Patel	341623653d	[SDAG] match rotate pattern with extra 'or' operation This is another fold generalized from D111530. We can find a common source for a rotate operation hidden inside an 'or': https://alive2.llvm.org/ce/z/9pV8hn Deciding when this is profitable vs. a funnel-shift is tricky, but this does not show any regressions: if a target has a rotate but it does not have a funnel-shift, then try to form the rotate here. That is why we don't have x86 test diffs for the scalar tests that are duplicated from AArch64 ( `74a65e3834` ) - shld/shrd are available. That also makes it difficult to show vector diffs - the only case where I found a diff was on x86 AVX512 or XOP with i64 elements. There's an additional check for a legal type to avoid a problem seen with x86-32 where we form a 64-bit rotate but then it gets split inefficiently. We might avoid that by adding more rotate folds, but I didn't check to see what is missing on that path. This gets most of the motivating patterns for AArch64 / ARM that are in D111530. We still need a couple of enhancements to setcc pattern matching with rotate/funnel-shift to get the rest. Differential Revision: https://reviews.llvm.org/D120933	2022-03-09 13:19:00 -05:00
Florian Hahn	3836003e87	[AArch64] Add test for D120481 with multiple uses.	2022-03-08 11:11:03 +00:00
zhongyunde	c22c8b151b	[AArch64] Perform first active true vector combine Materialize : i1 = extract_vector_elt t37, Constant:i64<0> ... into: "ptrue p, all" + PTEST Test bit of lane 0 can use P register directly, and the instruction "ptrue all" is loop invariant, which will be beneficial to SVE after hoisting out of the loop. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D120891	2022-03-08 01:10:21 +08:00
David Green	d9633d1490	[AArch64] Turn truncating buildvectors into truncates When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is split several times, creating an illegal v4i8 concat that gets expanded into a BUILD_VECTOR. After some combining and other legalisation, it ends up as a buildvector that extracts from 4 vectors, looking like BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is really a v16i32->v16i8 truncate in disguise. This adds a ReconstructTruncateFromBuildVector method to detect the pattern, converting it back into the legal "concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted nodes could also be v4i16, in which case the truncates are not needed. All those truncates and concats then become uzp1's, which is much better than expanding by moving vector lanes around. Differential Revision: https://reviews.llvm.org/D119469	2022-03-07 09:42:54 +00:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
David Green	84ccd015e7	[AArch64] Some tests to show reconstructing truncates. NFC	2022-03-05 18:35:43 +00:00
Karl Meakin	1d8093fe1e	[AArch64] fix i128-math.ll	2022-03-05 17:51:58 +00:00
Karl Meakin	f3e254b3f3	[AArch64] Add test for i128 overflow/saturation ops (NFC) This test exposes opportunities for future optimization work Differential Revision: https://reviews.llvm.org/D121013	2022-03-05 17:25:04 +00:00
Sanjay Patel	f4b53972ce	[SDAG] fold bitwise logic with shifted operands This extends `acb96ffd14` to 'and' and 'xor' opcodes. Copying from that message: LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands().	2022-03-05 11:14:45 -05:00
Sanjay Patel	90c2330c15	[AArch64][x86] add tests for bitwise logic + shifts; NFC Copy tests from `ecf606cb43` and replace 'or' with 'xor' / 'and'. This provides coverage for an enhancement of D120516 / `acb96ffd14`	2022-03-05 11:14:45 -05:00
Hans Wennborg	85c53c7092	Revert "[AArch64] Async unwind - function prologues" It caused builds to assert with: (StackSize == 0 && "We already have the CFA offset!"), function generateCompactUnwindEncoding, file AArch64AsmBackend.cpp, line 624. when targeting iOS. See comment on the code review for reproducer. > This patch rearranges emission of CFI instructions, so the resulting > DWARF and `.eh_frame` information is precise at every instruction. > > The current state is that the unwind info is emitted only after the > function prologue. This is fine for synchronous (e.g. C++) exceptions, > but the information is generally incorrect when the program counter is > at an instruction in the prologue or the epilogue, for example: > > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > after the `stp` is executed the (initial) rule for the CFA still says > the CFA is in the `sp`, even though it's already offset by 16 bytes > > A correct unwind info could look like: > ``` > stp x29, x30, [sp, #-16]! // 16-byte Folded Spill > .cfi_def_cfa_offset 16 > mov x29, sp > .cfi_def_cfa w29, 16 > ... > ``` > > Having this information precise up to an instruction is useful for > sampling profilers that would like to get a stack backtrace. The end > goal (towards this patch is just a step) is to have fully working > `-fasynchronous-unwind-tables`. > > Reviewed By: danielkiss, MaskRay > > Differential Revision: https://reviews.llvm.org/D111411 This reverts commit `32e8b550e5`.	2022-03-04 17:36:26 +01:00
zhongyunde	7a605ab7bf	[AArch64] Use simd mov to materialize big fp constants mov w8, #1325400064 + fmov s0, w8 ==> movi v0.2s, 0x4f, lsl 24 Fix https://github.com/llvm/llvm-project/issues/53651 Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D120452	2022-03-04 11:34:20 -05:00
Karl Meakin	43a0016f3d	Extend `performANDCSELCombine` to `performANDORCSELCombine` Differential Revision: https://reviews.llvm.org/D120422	2022-03-04 15:09:59 +00:00
David Green	e348b09bb5	[AArch64] Turn UZP1 with undef operand into truncate This turns uzp1(x, undef) to concat(truncate(x), undef), as the truncate is simpler and can often be optimized away, and it helps some of the insert-subvector tests optimize more cleanly. Differential Revision: https://reviews.llvm.org/D120879	2022-03-04 11:12:26 +00:00
Sander de Smalen	7c65d2288b	[AArch64] Improve access to fixed-width object when stack has SVE. When the stack has SVE objects, fixed-width objects are often better accessed from the SP, instead of the FP, because part/all of the fixed-width offset can be folded into the (non-scalable) addressing mode, where otherwise an ADDVL would be required. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D120738	2022-03-04 09:33:59 +00:00
Sander de Smalen	d363bddac5	[AArch64] NFC: Add test for access to fixed-width stack object when stack has SVE. In this case, the access would benefit from being accessed from the SP, as that would avoid the redundant ADDVL, since most of the offset can currently be folded into the addressing mode.	2022-03-04 09:33:59 +00:00
David Green	04661a4d8e	[AArch64] Additional insert-subvector codegen tests. NFC	2022-03-04 09:04:09 +00:00
Sanjay Patel	74a65e3834	[AArch64][x86] add tests for rotate/funnel combines; NFC	2022-03-03 15:22:35 -05:00
Florian Hahn	0f261256e0	[AArch64] Use first op of FADDPv* instead of implicit def. This patch updates the FADDPv* patterns that only use the lower half of the result register. For those patterns, the second operand does not matter because its results won't be used. Instead of introducing new implicit defs for those operands, just use the first operand. The problem with using new implicit defs is that register allocation can introduce unnecessary dependencies by using a different register than the first operand. For motivating cases, see the changes in the fadd_reduction_*_in_loop cases. Without this change, the first faddp in the loop has an unnecessary additional dependency through v0, which is also used for a cross-iteration reduction. This can noticeably impact performance. For slightly bigger loops, this change can improve performance by 15%. Reviewed By: sdesmalen, t.p.northover Differential Revision: https://reviews.llvm.org/D120706	2022-03-03 13:32:09 +00:00
Cullen Rhodes	e4fa8291a2	[AArch64] Allow copying of SVE registers in Streaming SVE Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118562	2022-03-03 09:51:14 +00:00
Cullen Rhodes	616586794b	[AArch64] Add legal types for Streaming SVE The compiler currently crashes for scalable types when compiling with +sme, e.g. define <vscale x 4 x i32> @foo(<vscale x 4 x i32> %a) { ret <vscale x 4 x i32> %a } since it doesn't know how to legalize the types. SME implies a subset of SVE (+streaming-sve), the hasSVE predication in the backend needs extending to consider types/operations that are legal in Streaming SVE. This is the first patch adding legal types <-> register classes. Before making the change +sve(2) was temporarily replaced with +sme in all the intrinsics tests to see what failed, and again after making the change. For all the tests that passed after adding the legal types another RUN line has been added for +streaming-sve. More patches to follow. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118561	2022-03-03 09:51:14 +00:00
Momchil Velikov	63c9aca12a	Revert "[AArch64] Async unwind - function epilogues" This reverts commit `74319d6794`. It causes test failures that look like an infinite loop in asan/hwasan unwinding.	2022-03-02 15:01:57 +00:00
Momchil Velikov	74319d6794	[AArch64] Async unwind - function epilogues Counterpart of https://reviews.llvm.org/D111411 this change makes the unwind information instruction precise in function epilogues. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112330	2022-03-02 13:15:11 +00:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Cameron McInally	70629d570b	[SVE] Update patterns to commute FMLS multiplication operands Use PatFrags to commute the multiplication operands of an AArch64ISD::FMA_PRED node, allowing unpredicated FMLS instructions to match. Reviewed by: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120570	2022-03-01 12:53:14 -08:00
Florian Hahn	bb746716c2	[AArch64] Add tests with unnecessary dependency with faddp lowering. The added tests highlight an unnecessary cross-iteration dependency when lowering reductions to faddp. This dependency can negatively impact performance.	2022-03-01 10:30:34 +00:00
Florian Hahn	70c398c198	[AArch64] Use common CHECK prefix for test, reducing duplicated checks. Use the common CHECK prefix with runlines with and without fullfp16. This means no duplicated checks are generated for tests not using fp16.	2022-03-01 10:30:29 +00:00
Sander de Smalen	eac2638ec1	[AArch64][SVE] Fold away SETCC if original input was predicate vector. This adds the following two folds: Fold 1: setcc_merge_zero( all_active, extend(nxvNi1 ...), != splat(0)) -> nxvNi1 ... Fold 2: setcc_merge_zero( pred, extend(nxvNi1 ...), != splat(0)) -> nxvNi1 and(pred, ...) Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D119334	2022-02-28 14:12:43 +00:00
Momchil Velikov	32e8b550e5	[AArch64] Async unwind - function prologues This patch rearranges emission of CFI instructions, so the resulting DWARF and `.eh_frame` information is precise at every instruction. The current state is that the unwind info is emitted only after the function prologue. This is fine for synchronous (e.g. C++) exceptions, but the information is generally incorrect when the program counter is at an instruction in the prologue or the epilogue, for example: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill mov x29, sp .cfi_def_cfa w29, 16 ... ``` After the `stp` is executed, the (initial) rule for the CFA still says the CFA is in the `sp`, even though it's already offset by 16 bytes. A correct unwind info could look like: ``` stp x29, x30, [sp, #-16]! // 16-byte Folded Spill .cfi_def_cfa_offset 16 mov x29, sp .cfi_def_cfa w29, 16 ... ``` Having this information precise up to an instruction is useful for sampling profilers that would like to get a stack backtrace. The end goal (towards which this patch is just a step) is to have fully working `-fasynchronous-unwind-tables`. Reviewed By: danielkiss, MaskRay Differential Revision: https://reviews.llvm.org/D111411	2022-02-28 13:37:57 +00:00
Sander de Smalen	201e3686ab	[AArch64][SVE] Handle more cases in findMoreOptimalIndexType. This patch addresses @paulwalker-arm's comment on D117900 to only update/write the by-ref operands iff the function returns true. It also handles a few more cases where a series of added offsets can be folded into the base pointer, rather than just looking at a single offset. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D119728	2022-02-28 12:13:52 +00:00
Sanjay Patel	acb96ffd14	[SDAG] fold bitwise logic with shifted operands LOGIC (LOGIC (SH X0, Y), Z), (SH X1, Y) --> LOGIC (SH (LOGIC X0, X1), Y), Z https://alive2.llvm.org/ce/z/QmR9rR This is a reassociation + factoring fold. The common shift operation is moved after a bitwise logic op on 2 input operands. We get simpler cases of these patterns in IR, but I suspect we would miss all of these exact tests in IR too. We also handle the simpler form of this plus several other folds in DAGCombiner::hoistLogicOpWithSameOpcodeHands(). This is a partial implementation of a transform suggested in D111530 (only handles 'or' bitwise logic as a first step - need to stamp out more tests for other opcodes). Several of the same tests added for D111530 are altered here (but not fully optimized). I'm not sure yet if this would help/hinder that patch, but this should be an improvement for all tests added with `ecf606cb43` since it removes a shift operation in those examples. Differential Revision: https://reviews.llvm.org/D120516	2022-02-27 09:54:12 -05:00
Simon Pilgrim	fadd20f80d	[DAG] Ensure type is legal for bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) fold As reported on D120192	2022-02-27 11:25:22 +00:00
Florian Hahn	c679fbee2a	[AArch64] Add tests for tbl + cmp splitting. Additional tests showing potential for follow-ups after D120571.	2022-02-25 17:59:44 +00:00
Paul Walker	16ee102964	[SVE] Add missing splat patterns for bfloat vectors. Differential Revision: https://reviews.llvm.org/D120496	2022-02-25 16:53:39 +00:00
Paul Walker	7ab78f34cd	[SVE] Refactor complex immediate pattern used by CPY/DUP. SelectSVE8BitLslImm didn't account for constant values that have a larger bit width than the result vector's element type. This only seems to affect a single corner case when lowering fixed length vectors but the code itself is also not consistent with how other related complex patterns are implemented so I've taken the opportunity to refactor the code. Differential Revision: https://reviews.llvm.org/D120440	2022-02-25 16:12:35 +00:00
Florian Hahn	166968a892	[AArch64] Add test cases where zext can be lowered to series of tbl. Add a set of tests for upcoming patches that allow lowering vector zext using AArch64 tbl instructions instead of shifts.	2022-02-25 15:36:32 +00:00
Sanjay Patel	ecf606cb43	[AArch64][x86] add tests for bitwise logic + shifts; NFC	2022-02-24 16:01:16 -05:00
Simon Pilgrim	370ebc9d9a	[DAG] Attempt to fold bswap(shl(x,c)) -> zext(bswap(trunc(shl(x,c-bw/2)))) If the shl is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift and perform the bswap at half the bitwidth and just zero extend. Based off PR51391 + PR53867 Differential Revision: https://reviews.llvm.org/D120192	2022-02-24 19:33:51 +00:00
David Green	b3e9fdd170	[AArch64] Regenerate dp1.ll test, NFC The old check lines were not showing enough context to show issues. Regenerate the test with the auto-check lines to be clearer.	2022-02-24 19:33:45 +00:00
Momchil Velikov	17e85cd410	[AArch64] Async unwind - Always place the first LDP at the end when ReverseCSRRestoreSeq is true This patch is in preparation for the async unwind CFI. Put the first `LDP` at the end, so that the load-store optimizer can run and merge the `LDP` and the `ADD` into a post-index `LDP`. Do this always and as early as at the time of the initial creation of the CSR restore instructions, even if that `LDP` is not guaranteed to be mergeable with a subsequent `SP` increment. This greatly simplifies the CFI generation for prologue, as otherwise we have to take extra steps to ensure reordering does not cross CFI instructions. Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D112328	2022-02-24 18:48:07 +00:00
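
Two of the DAG folds listed above (`370ebc9d9a` for `bswap(shl(x,c))`, and `acb96ffd14` for bitwise logic with shifted operands) are plain integer identities, so they can be sanity-checked outside LLVM. The sketch below is only an illustration in Python and is not taken from either patch: the helper names `bswap`, `bswap_fold_holds` and `or_shift_fold_holds` are hypothetical, values are modelled as unsigned Python integers masked to a fixed width, and "SH" is assumed to mean a left shift or a logical right shift. Only the `or` form is checked, since the commit message says that is the only opcode handled so far.

```python
import operator


def bswap(value, bits):
    """Reverse the byte order of an unsigned `bits`-wide integer."""
    return int.from_bytes(value.to_bytes(bits // 8, "little"), "big")


def bswap_fold_holds(bits, xs):
    """Check bswap(shl(x, c)) == zext(bswap(trunc(shl(x, c - bits/2))))
    for every x in `xs` and every shift amount c with bits/2 <= c < bits."""
    half = bits // 2
    full_mask = (1 << bits) - 1
    half_mask = (1 << half) - 1
    for x in xs:
        for c in range(half, bits):
            original = bswap((x << c) & full_mask, bits)
            # Shift by the reduced amount, then truncate to the half width.
            narrow = (x << (c - half)) & half_mask
            # Zero extension is implicit for non-negative Python ints.
            folded = bswap(narrow, half)
            if original != folded:
                return False
    return True


def or_shift_fold_holds(bits, shift):
    """Check OR(OR(SH(x0, y), z), SH(x1, y)) == OR(SH(OR(x0, x1), y), z)
    exhaustively for `bits`-wide values; `shift` is operator.lshift
    or operator.rshift (logical, because values are unsigned)."""
    mask = (1 << bits) - 1
    values = range(1 << bits)
    for x0 in values:
        for x1 in values:
            for z in values:
                for y in range(bits):
                    before = ((shift(x0, y) & mask) | z) | (shift(x1, y) & mask)
                    after = (shift(x0 | x1, y) & mask) | z
                    if before != after:
                        return False
    return True
```

For example, `bswap_fold_holds(16, range(1 << 16))` covers every 16-bit input, and `or_shift_fold_holds(4, operator.lshift)` covers every 4-bit combination. The small widths are chosen only to keep the exhaustive loops fast; they say nothing about the legality checks the real combines perform.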

5822 Commits