llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Devereau	cc3ef26f60	[AArch64][SVE] Add sve.dupq.lane(insert(constant vector, 0), 0) ld1rq tests	2022-06-24 07:40:31 +00:00
Lian Wang	770fe864fe	[SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128239	2022-06-24 02:32:53 +00:00
Bradley Smith	6f27df5084	[AArch64][SVE] Match (add x (lsr/asr y c)) -> usra/ssra x y c Differential Revision: https://reviews.llvm.org/D128045	2022-06-23 14:56:21 +00:00
Florian Mayer	9320a32bb9	[MTE] [HWASan] Use LoopInfo for reachability queries. The reachability queries default to "reachable" after exploring too many basic blocks. LoopInfo helps it skip over the whole loop. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D127917	2022-06-22 15:28:49 -07:00
Florian Mayer	476ced4b89	[MTE] [HWASan] Support diamond lifetimes. We were overly conservative and required a ret statement to be dominated completely by a single lifetime.end marker. This is quite restrictive and leads to two problems: * limits coverage of use-after-scope, as we degenerate to use-after-return; * increases stack usage in programs, as we have to remove all lifetime markers if we degenerate to use-after-return, which prevents reuse of stack slots by the stack coloring algorithm. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D127905	2022-06-22 11:16:34 -07:00
David Sherwood	aa0a413df8	[AArch64][SME] Add some SME PSTATE setting/query intrinsics This patch adds support for: * Querying the PSTATE.SM state with @llvm.aarch64.sme.get.pstatesm * Reading/writing the TPIDR2 register with new @llvm.aarch64.sme.get.tpidr2 and @llvm.aarch64.sme.set.tpidr2 intrinsics. Tests added here: CodeGen/AArch64/sme-get-pstatesm.ll CodeGen/AArch64/sme-read-write-tpidr2.ll Differential Revision: https://reviews.llvm.org/D127957	2022-06-22 10:26:45 +01:00
Paul Walker	696169a35d	[SVE] Add isel patterns that match "FpImm - A" to the immediate form of FSUBR. Differential Revision: https://reviews.llvm.org/D128200	2022-06-22 00:11:24 +01:00
Paul Walker	7b285ae0e8	[SVE] Lower "unpredicated" sabd/uabd intrinsics to ISD::ABDS/U. This enables an existing transformation that when combined with an add will emit saba/uaba instructions. Differential Revision: https://reviews.llvm.org/D128198	2022-06-22 00:02:51 +01:00
Martin Sebor	b19194c032	[InstCombine] handle subobjects of constant aggregates Remove the known limitation of the library function call folders to only work with top-level arrays of characters (as per the TODO comment in the code) and allows them to also fold calls involving subobjects of constant aggregates such as member arrays.	2022-06-21 11:55:14 -06:00
David Green	3f81841474	[AArch64] Add Extract(DUP(C)) as a canonical constant. As a followup to D128144, this adds extract(DUP(C)) as a canonical constant to prevent it being transformed back into a BUILD_VECTOR, leading to an infinite loop.	2022-06-21 09:51:22 +01:00
Serguei Katkov	163c77b2e0	[AARCH64 folding] Do not fold any copy with NZCV There is no instruction to fold NZCV, so, just do not do it. Without the fix the added test case crashes with an assert "Mismatched register size in non subreg COPY" Reviewed By: danilaml Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D127294	2022-06-21 10:38:49 +07:00
Luo, Yuanke	44e8a205f4	[fastregalloc] Enhance the heuristics for liveout in self loop. For below case, virtual register is defined twice in the self loop. We don't need to spill %0 after the third instruction `%0 = def (tied %0)`, because it is defined in the second instruction `%0 = def`. 1 bb.1 2 %0 = def 3 %0 = def (tied %0) 4 ... 5 jmp bb.1 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D125079	2022-06-21 09:18:49 +08:00
David Green	c0ecbfa4fd	[AArch64] Known bits for AArch64ISD::DUP An AArch64ISD::DUP is just a splat, where the known bits for each lane are the same as the input. This teaches that to computeKnownBitsForTargetNode. Problems arise for constants though, as a constant BUILD_VECTOR can be lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then turn back into a constant BUILD_VECTOR leading to an infinite cycle. This has been prevented by adding a isTargetCanonicalConstantNode node to prevent the conversion back into a BUILD_VECTOR. Differential Revision: https://reviews.llvm.org/D128144	2022-06-20 19:11:57 +01:00
David Sherwood	013358632e	[AArch64][SME] Add the zero intrinsic The SME zero instruction takes a mask as an input declaring which 64-bit element tiles should be zeroed. There is a 1:1 mapping between the zero intrinsic and the instruction, however we also want to make the register allocator aware that some tile registers are being written to. We can actually just use the custom inserter for a pseudo instruction to correctly mark all the appropriate registers in the mask as implicitly defined by the operation. Differential Revision: https://reviews.llvm.org/D127843	2022-06-20 14:27:59 +01:00
Simon Pilgrim	e4a124dda5	[DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m) Similar to the existing (shl (srl x, c1), c2) fold Part of the work to fix the regressions in D77804 Differential Revision: https://reviews.llvm.org/D125836	2022-06-20 08:37:38 +01:00
Lian Wang	ab25e263a9	[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D127710	2022-06-20 06:30:26 +00:00
Simon Pilgrim	1ebe5cac46	[DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification	2022-06-19 15:35:29 +01:00
Sanjay Patel	f126643862	[AArch64] add tests for masked subtract; NFC	2022-06-17 14:56:32 -04:00
Paul Walker	0e21f1d56a	[SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases. WidenVecOp_INSERT_SUBVECTOR only supported cases where widening effectively converts the insert into a copy. However, when the widened subvector is no bigger than the vector being inserted into and we can be sure there's no loss of data, we can simply emit another INSERT_SUBVECTOR. Fixes: #54982 Differential Revision: https://reviews.llvm.org/D127508	2022-06-17 12:39:42 +00:00
Paul Walker	fcd058acc9	[SVE][CodeGen] Restructure SVE fixed length tests to use update_llc_test_checks. Most tests have been updated to make use of vscale_range to reduce the number of RUN lines. For the remaining RUN lines the check prefixes have been updated to ensure the original expectation of the manual CHECK lines is maintained after update_llc_test_checks is run.	2022-06-17 00:30:56 +01:00
David Green	527b8ccde5	[AArch64] Regenerate 3 codegen test files. NFC	2022-06-16 18:23:05 +01:00
Adrian Tong	55311801f0	Allow bitwidth difference when checking for isOneOrOneSplat. This helps handling a case where the BUILD_VECTOR has i16 element type and i32 constant operands t2: v8i16 = setcc t8, t17, setult:ch t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ... t4: v8i16 = and t2, t3 t5: v8i16 = add t8, t4 This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove t3 and t4 from the DAG. Differential Revision: https://reviews.llvm.org/D127354	2022-06-16 16:04:20 +00:00
David Sherwood	6f6fa5aa10	[AArch64][SME] Add SME cntsb/h/w/d intrinsics These intrinsics return the number of elements in a streaming vector, for example aarch64.sme.cntsw returns the number of 32-bit elements. When in streaming mode these are equivalent to aarch64.sve.cntb/h/w/d with an input value of 1. I have implemented these intrinsics using the rdsvl instruction and added tests here: CodeGen/AArch64/SME/sme-intrinsics-rdsvl.ll Differential Revision: https://reviews.llvm.org/D127853	2022-06-16 10:50:25 +01:00
Simon Pilgrim	adfcdb0d0d	[AArch64] Add test case from D127354	2022-06-15 12:21:00 +01:00
David Sherwood	db7061e2ca	[NFC] Move tests CodeGen/AArch64/SME/sme-* -> CodeGen/AArch64/sme-*	2022-06-15 11:10:29 +01:00
David Sherwood	5fa2416ea0	[AArch64][SME] Add SME read/write intrinsics that map to the mova instruction This patch adds implementations for the read/write SME ACLE intrinsics: @llvm.aarch64.sme.read.horiz @llvm.aarch64.sme.read.vert @llvm.aarch64.sme.write.horiz @llvm.aarch64.sme.write.vert These all map to the SME mova instruction. Differential Revision: https://reviews.llvm.org/D127414	2022-06-15 10:31:07 +01:00
David Sherwood	bd61664167	[AArch64][SME] Add ldr/str (fill/spill) intrinsics This patch adds implementations for the fill/spill SME ACLE intrinsics: @llvm.aarch64.sme.ldr @llvm.aarch64.sme.str Differential Revision: https://reviews.llvm.org/D127317	2022-06-14 13:58:22 +01:00
Florian Hahn	e5c4308ba1	[InterleavedLoadComb] Rename uses when inserting new uses. This fixes a crash due to uses needing to be renamed.	2022-06-14 13:15:23 +01:00
Rosie Sumpter	2c4e44752d	[AArch64][SME] Add load/store intrinsics This patch adds implementations for the load/store SME ACLE intrinsics: - @llvm.aarch64.sme.ld1* - @llvm.aarch64.sme.st1* Differential Revision: https://reviews.llvm.org/D127210	2022-06-14 11:11:22 +01:00
Serguei Katkov	095bf6be28	[Greedy RegAlloc] Fix the handling of split register in last chance re-coloring. This is a fix for https://github.com/llvm/llvm-project/issues/55827. When the register we are trying to re-color is split, the original register (which we tried to recover) has no uses after the split. However, in rollback actions we assign a physical register back to it. Later this causes various assertions. One of them is in the attached test. This CL fixes this by avoiding assigning a physical register back to a register which has no uses or whose live interval is now empty. Reviewed By: arsenm, qcolombet Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D127281	2022-06-14 12:04:17 +07:00
Amaury Séchet	9ecf423453	[AArch64] Autogenerate sve-fixed-length tests. NFC As per title. This makes it easier to work on changes that require "shotgun diffs" over the codebase. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D127118	2022-06-13 12:50:07 +00:00
Simon Pilgrim	7d8fd4f5db	[DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other folds interfere Another issue unearthed by D127115 We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with). D127115 makes this particularly difficult as we're almost guaranteed to have lost the sequence before all possible insertions have been folded. This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before it's too late. Differential Revision: https://reviews.llvm.org/D127595	2022-06-13 11:48:18 +01:00
zhongyunde	3cefcdb8c6	[test] Add test for D126700 NFC	2022-06-13 18:37:29 +08:00
David Green	dbac0e83d1	[AArch64] Mark smull and umull as commutative.	2022-06-13 09:24:15 +01:00
David Green	963c0a0147	[AArch64] Look through bitcast when looking for extract_high subvector Since D61806, DAGCombiner has folded subvector_extract(bitcast(..)) to bitcast(subvector_extract(..)), which would place a bitcast between a subvector_extract and the operation that could be converted to a high neon instruction (like smull2). This adds better matching for the subvector_extract, through the tablegen extract_high PatFrags to optionally skip the bitcast under little endian, still matching an extract of the high half of the input vector. I didn't update the extract_high of a duplicate patterns, as the ComplexPattern needs named operands. I did add an extract_high_dup_v8i16 PatFrag to abstract away the common code, which can be extended in a future patch. Differential Revision: https://reviews.llvm.org/D126782	2022-06-12 10:59:09 +01:00
Amaury Séchet	982f65a68e	Autogenerate sve-fixed-length-frame-offests-crash.ll . NFC	2022-06-12 01:54:10 +00:00
Amaury Séchet	d35da7f78a	Autogenerate sve-fixed-length-bitselect.ll . NFC	2022-06-12 01:53:05 +00:00
Simon Pilgrim	44a0cd25df	[DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling Check if we're just replacing one v1x?? vector with another	2022-06-11 19:30:00 +01:00
David Green	338fd211e7	[AArch64] Generate FADDP from shuffled fadd As a follow up to D126686, this does the same fold for floating point add and shuffle. In this case it is limited to reassoc either x[0]+x[1] or x[1]+x[0] for both result[0] and result[1]. Differential Revision: https://reviews.llvm.org/D127087	2022-06-11 14:16:37 +01:00
David Green	82fcd7397a	[AArch64] Add extra faddp codegen tests. NFC	2022-06-11 12:57:48 +01:00
Paul Walker	10d55c4634	[SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST. Differential Revision: https://reviews.llvm.org/D127322	2022-06-11 10:41:13 +01:00
Eli Friedman	0ff51d5dde	Fix interaction of CFI instructions with MachineOutliner. 1. When checking if a candidate contains a CFI instruction, actually iterate over all of the instructions, instead of stopping halfway through. 2. Make sure copied CFI directives refer to the correct instruction. Fixes https://github.com/llvm/llvm-project/issues/55842 Differential Revision: https://reviews.llvm.org/D126930	2022-06-10 13:37:49 -07:00
Guillaume Chatelet	38637ee477	[clang] Add support for __builtin_memset_inline In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`. The idea is to get support from the compiler to easily write efficient memory function implementations. This patch could be split in two: - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics. - and another one for the Clang part providing the intrinsic as a builtin. Differential Revision: https://reviews.llvm.org/D126903	2022-06-10 13:13:59 +00:00
Ahmed Bougacha	c68b469e07	[AArch64][SVE] Don't crash on pre-legalizer types in extload combine. This was assuming the vector types were MVTs, but they don't have to be. Note that the concrete output of the test isn't very useful, since it's dominated by nonsensical calling convention lowering for the weird types. Differential Revision: https://reviews.llvm.org/D126505	2022-06-09 10:33:21 -07:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Florian Mayer	0593ce5f0b	[MC] Add 'G' to augmentation string for MTE instrumented functions This was agreed on in https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html The thread proposed two options * add a character to augmentation string and handle in libunwind * use a separate personality function. It was determined that this is the simpler and better option. This is part of ARM's AArch64 ABI: https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22 The next step after this is teaching libunwind to untag when this augmentation character is set. Reviewed By: MaskRay, eugenis Differential Revision: https://reviews.llvm.org/D127007	2022-06-08 12:36:32 -07:00
David Green	a1aef4f374	[AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt The ToBeRemoved is used to remove any MachineInstructions that are no longer needed, making sure we don't invalidate the iterator that is currently in use by erasing the instruction straight away. This makes issues for keeping the code in SSA form though, where subsequent transforms that require SSA form may have been broken by previous peepholes. If, instead, we use make_early_inc_range the iteration issue shouldn't be present, so long as we do not remove the subsequent instruction in the peephole optimizations. That way the code between transforms is kept in SSA form, meaning hopefully fewer things that can go wrong. Differential Revision: https://reviews.llvm.org/D127296	2022-06-08 17:26:07 +01:00
David Green	33ead6e444	[AArch64] Add tests for bitcast high register extracts. NFC	2022-06-08 15:26:31 +01:00
Paul Walker	d88354213c	[SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST. Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work with TypeSize directly rather than silently casting to unsigned. To accomplish this I've extended TypeSize with an interface that essentially allows TypeSize division when both operands have the same number of dimensions. There still exists combinations of scalable vector bitcasts that cause compiler crashes. I call these out by adding "is missing" entries to sve-bitcast. Depends on D126957. Fixes: #55114 Differential Revision: https://reviews.llvm.org/D127126	2022-06-08 10:30:07 +01:00
Paul Walker	a1121c31d8	[SVE] Fix incorrect code generation for bitcasts of unpacked vector types. Bitcasting between unpacked scalable vector types of different element counts is not a NOP because the live elements are laid out differently. 01234567 e.g. nxv2i32 = XX??XX?? nxv4f16 = X?X?X?X? Differential Revision: https://reviews.llvm.org/D126957	2022-06-08 10:30:07 +01:00
David Green	bccbf5276e	[AArch64] Remove isDef32 isDef32 would attempt to make a guess at which SelectionDag nodes were 32bit sources, and use the nature of 32bit AArch64 instructions implicitly zeroing the upper register half to not emit zext that were expected to already be zero. This was a bit fragile though, needing to guess at the correct opcodes that do not become 32bit defs later in ISel. This patch removed isDef32, relying on the AArch64MIPeephole optimizer to remove redundant SUBREG_TO_REG nodes. A part of SelectArithExtendedRegister was left with the same logic as a heuristic to prevent some regressions from it picking less optimal sequences. The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a FPR will become a FMOVSWr, which it lowers immediately to make sure that remains true through register allocation. Fixes #55833 Differential Revision: https://reviews.llvm.org/D127154	2022-06-07 18:57:59 +01:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
David Green	6468feaeac	[AArch64] Regenerate arm64-shifted-sext.ll and add a test from #55833 . NFC	2022-06-07 13:55:53 +01:00
Michael Kitzan	b7fcf6632f	[GISel] Add new combines for G_ADD Patch adds new GICombineRules for G_ADD: G_ADD(x, G_SUB(y, x)) -> y G_ADD(G_SUB(y, x), x) -> y Patch additionally adds new combine tests for AArch64 target for these new rules. Reviewed by: paquette Differential Revision: https://reviews.llvm.org/D87936	2022-06-06 11:19:45 -07:00
David Green	4ea1b43527	[AArch64] Generate ADDP from shuffled add This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>)) into shuffle(addp(x), <0,0,1,1,2,2,...>). The ADDP instruction takes two vectors and returns one, adding adjacent pairs. So we match x in a custom combine as it is lowered from a v8i32. The original code would be 2 rev64 and 2 add, with the new code being a single addp with a zip1;zip2 shuffle, producing smaller code. Differential Revision: https://reviews.llvm.org/D126686	2022-06-06 11:39:51 +01:00
Paul Walker	2dde272db7	[SVE] Refactor sve-bitcast.ll to include all combinations for legal types. Patch enables custom lowering for MVT::nxv4bf16 because otherwise the refactored test file triggers a selection failure. The reason for the refactoring is to highlight cases where the generated code is wrong.	2022-06-03 12:09:19 +01:00
David Green	79e3b043e5	[AArch64] Add extra addp codegen tests. NFC	2022-06-03 11:36:40 +01:00
Serguei Katkov	24e16e4af2	[SSAUpdaterImpl] Do not generate phi node with all the same incoming values If all available vals to basic block are the same - do not build new phi node and just use this value. Reviewed By: sameerds Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D126525	2022-06-03 12:24:33 +07:00
Serguei Katkov	c4d955dd7f	[MachineSSAUpdate] Add a test for redundant phi generation.	2022-06-03 11:27:14 +07:00
Paul Walker	48ea26a387	[SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR. LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering scalable vector floating point INSERT_SUBVECTOR. However, these nodes only make sense for integer types and thus isel patterns do not exist for floating point, which leads to isel failures. This patch ensures floating point operands are cast to integer before the core lowering takes place. Fixes: #55037 Differential Revision: https://reviews.llvm.org/D126487	2022-06-02 14:51:04 +01:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enabled opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit `430ac5c302`. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Fangrui Song	873d2aff42	[AArch64][test] Replace -march with -mtriple for llc RUN lines -march is error-prone: -march inherits the OS and environment from the default target triple. Use -mtriple which is more common.	2022-05-31 22:39:43 -07:00
Alexander Shaposhnikov	a72cc958a3	[CodeGen][AArch64] Add support for LDAPR This diff adds support for LDAPR (RCPC extension) (https://github.com/llvm/llvm-project/issues/55561). Differential revision: https://reviews.llvm.org/D126250 Test plan: ninja check-all	2022-05-31 21:40:50 +00:00
Sander de Smalen	9c38fc111b	[AArch64] Remove references to Streaming SVE from target features. Following discussion on D120261 and D121208 it seems better to remove the concept of Streaming SVE from the subtarget/assembler predicates and instead reason about 'SVE' and 'SME' as its higher level features, rather than trying to model this runtime mode through explicit feature flags. This patch is largely NFC. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D125977	2022-05-31 16:25:01 +02:00
David Green	5cb14dc5a3	[AArch64] Look through copy in MachineCombiner FMUL patterns. This is a small addition to D99662, which added machine combiner patterns for FMUL(DUP(..)). Due to the way these are generated from ISel, they may also be FMUL(COPY(DUP(..))), which this patch now ignores the no-op COPY in. Differential Revision: https://reviews.llvm.org/D126632	2022-05-31 09:28:00 +01:00
Edd Barrett	d245974e1a	Test stackmap support for floating point types. It appears that float support is complete, or at least, the stackmap records emitted are not inconceivable (I must admit that I don't know about many of the architectures under test here). One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect) quirk of the stackmap format: in the case of a Register record, the Offset or SmallConstant field can encode a sub-register index! I've only ever seen this field zero for Register entries up until now.	2022-05-30 10:49:32 +01:00
David Green	99b0078064	[AArch64] Tests for showing MachineCombiner COPY patterns. NFC	2022-05-30 10:47:44 +01:00
David Green	9a3144d078	[AArch64] Reuse larger DUP if available If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the larger node using a vector extract to obtain the smaller. This comes up in the smull/smlal code, but needs a small fixup to allow the smull2 code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still match smull2 extracts. Differential Revision: https://reviews.llvm.org/D126449	2022-05-29 19:42:13 +01:00
Serge Pavlov	bdd0093f4d	[GlobalISel] Add G_IS_FPCLASS Add a generic opcode to represent `llvm.is_fpclass` intrinsic. Differential Revision: https://reviews.llvm.org/D121454	2022-05-27 13:49:47 +07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Adrian Tong	7c13ae6490	Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP. This is then used in AArch64 to remove copy instructions after taildup ran in machine block placement Differential Revision: https://reviews.llvm.org/D125335	2022-05-26 18:43:16 +00:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply `62a9b36fcf` and fix module build failure: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is an analysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicts for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Paul Walker	6f215ca680	[SelectionDAG] Add support to widen ISD::STEP_VECTOR operations. Fixes: #55165 Differential Revision: https://reviews.llvm.org/D126168	2022-05-24 22:42:37 +01:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Craig Topper	569d8945f3	[DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2. If the VT is i2, then 2 is really -2. Test has not been committed yet, but diff shows the change. Fixes PR55644. Differential Revision: https://reviews.llvm.org/D126213	2022-05-23 11:13:57 -07:00
Craig Topper	75eb0576de	[AArch64] Add test case for pr55644. NFC	2022-05-23 11:13:57 -07:00
Edd Barrett	c5e5cf1258	Test stackmap support for i128 This diff adds tests that check the currently-working stackmap cases for i128. This will help ensure no regressions are later introduced by D125680 (when ready). Note that i128 stackmap support is currently incomplete, so we can't test all i128 functionality: * i128 constants >= 2^{63} crash LLVM * non-constant i128s crash LLVM So this change tests only constant i128 operands of value < 2^{63}. A couple of incorrect comments are also fixed.	2022-05-23 11:56:24 +01:00
Simon Pilgrim	dd231f02a3	[AArch64] Regenerate andandshift.ll test checks	2022-05-23 11:48:24 +01:00
Andre Vieira	572fc7d2fd	[AArch64] Order STP Q's by ascending address This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule STP Q's to the same base-address in ascending order of offsets. We have found this to improve performance on Neoverse N1 and should not hurt other AArch64 cores. Differential Revision: https://reviews.llvm.org/D125377	2022-05-23 09:50:44 +01:00
Florian Hahn	0cc981e021	[AArch64] implement isReassocProfitable, disable for (u|s)mlal. Currently reassociating add expressions can lead to failing to select (u|s)mlal. Implement isReassocProfitable to skip reassociating expressions that can be lowered to (u|s)mlal. The same issue exists for the *mlsl variants as well, but the DAG combiner doesn't use the isReassocProfitable hook before reassociating. To be fixed in a follow-up commit as this requires DAGCombiner changes as well. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125895	2022-05-23 09:39:00 +01:00
David Green	6ef5e242f2	[AArch64] Fix assumptions on input type of tryCombineFixedPointConvert It is possible for the input type to not be v2i64 or v4i32, so weaken the assertion to a return, fixing the crash in the new test. Fixes #55606	2022-05-23 08:55:54 +01:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Bill Wendling	d497129f9b	[AArch64] Use proper instruction mnemonics for FPRs The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D. Differential Revision: https://reviews.llvm.org/D126083	2022-05-20 12:02:26 -07:00
Rahul Anand R	534ea8bca5	[AArch64] Generate AND in place of CSEL for predicated CTTZ This patch implements a target-specific optimization that replaces the cmp and csel from cttz with an and mask. Recommitted with a fix for truncated value sizes. Differential Revision: https://reviews.llvm.org/D123782	2022-05-20 13:41:32 +01:00
Bill Wendling	6e00a34cdb	[AArch64] Add support for -fzero-call-used-regs Support the "-fzero-call-used-regs" option on AArch64. This involves much less specialized code than the X86 version. Most of the checks can be done with TableGen. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D124836	2022-05-19 16:58:28 -07:00
David Green	602f81ec33	[AArch64] Fix zero element TBL indices A TBL instruction will fill out-of-range values with 0's, something used in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for v16i8, but for v8i8 the input is still treated as a v16i8, so out-of-range values (like a lane index of 8) would end up loading values from the top half of the input register. Clean this up by detecting the out of range values and making sure they really use out of range values. There is a fix for swapped indices of 64bit input vectors too, which could be incorrectly adjusted if the zerovector was the first operand. Fixes #55545 Differential Revision: https://reviews.llvm.org/D125865	2022-05-19 13:54:35 +01:00
David Green	dd644ddf85	[AArch64] Extend zero vector TBL codegen tests. NFC	2022-05-19 13:01:55 +01:00
Jon Roelofs	d699e54ca2	Fix an or+and miscompile w/ GlobalISel Fixes #55284	2022-05-18 19:09:47 -07:00
Michael Kitzan	29bebb0237	[GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold FMIN/MAX instrs with NaNs. The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes: G_FMINNUM(X, NaN) -> X G_FMAXNUM(X, NaN) -> X G_FMINIMUM(X, NaN) -> NaN G_FMAXIMUM(X, NaN) -> NaN The patch adds AArch64 tests for these combines as well. Reviewed by: arsenm Differential revision: https://reviews.llvm.org/D125819	2022-05-18 12:08:53 -07:00
Craig Topper	46eef76876	[DAGCombiner] Fix bug in MatchBSwapHWordLow. This function tries to match (a >> 8) \| (a << 8) as (bswap a) >> 16. If the SRL isn't masked and the high bits aren't demanded, we still need to ensure that bits 23:16 are zero. After the right shift they will be in bits 15:8 which is where the important bits from the SHL end up. It's only a bswap if the OR on bits 15:8 only takes the bits from the SHL. Fixes PR55484. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125641	2022-05-18 09:23:18 -07:00
Florian Hahn	a74e075908	[AArch64] Add tests showing reassoc breaks (s\|u)ml(a\|s)l selection.	2022-05-18 16:40:28 +01:00
Simon Pilgrim	939affc67d	[AArch64] neon-vmull-high-p64.ll - fix name/check mismatch identified in D125604 Typos meant that we weren't actually checking the function name, which wasn't accounting for mangling	2022-05-18 13:24:28 +01:00
Simon Pilgrim	1584b2c74e	[AArch64] fp16-v8-instructions.ll - remove some old defunct CHECKS identified in D125604 Typos meant that the update script never removed them	2022-05-18 12:49:05 +01:00
David Green	4c6a070a2c	[AArch64] Teach perfect shuffles tables about D-lane movs Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle tables, lowering the costs a little more. This is a rough improvement in general, especially if you ignore mov v0.16b, v2.16b type moves that are often artefacts of the calling convention. The D register movs are encoded as (0x4 \| LaneIdx), and to generate a D register move we are required to bitcast into a higher type, but it is otherwise very similar to the S-lane movs already supported. Differential Revision: https://reviews.llvm.org/D125477	2022-05-17 18:16:45 +01:00
David Green	8311fb7512	[AArch64] Extra tests useful for D-lane shuffles. NFC	2022-05-17 11:15:55 +01:00
Martin Storsjö	64a3c63e01	[MC] [Win64EH] Check for matches between epilogs and the prolog on ARM64 This allows sharing opcodes between prolog and epilog even when there is more than one epilog. I didn't make any handcrafted special MC level testcases for this (yet at least), but it does seem to have the expected effect on two existing CodeGen level testcases. Differential Revision: https://reviews.llvm.org/D125619	2022-05-17 00:41:39 +03:00
Martin Storsjö	cabefea2ec	[MC] [Win64EH] Try writing an ARM64 "packed epilog" even if the epilog doesn't share opcodes with the prolog The "packed epilog" form only implies that the epilog is located exactly at the end of the function (so the location of the epilog is implicit from the epilog opcodes), but it doesn't have to share opcodes with the prolog - as long as the total number of opcode bytes and the offset to the epilog fit within the bitfields. This avoids writing a 4 byte epilog scope in many cases. (I haven't measured how much this shrinks actual xdata sections in practice though.) Differential Revision: https://reviews.llvm.org/D125536	2022-05-17 00:41:39 +03:00
Paul Walker	ee8aa351e4	[AArch64] Use ADDV for boolean xor reductions. NEON does not have native support for xor reductions. However, when reducing predicate vectors the operation is synonymous with an add reduction that is supported. Differential Revision: https://reviews.llvm.org/D125605	2022-05-16 22:34:12 +01:00
David Green	5d29d75273	[AArch64] Predicate SSHLL;SCVTF patterns behind UseAlternateSExtLoadCVTF32 There have been some patterns in the AArch64 backend to optimize code of the form: ldrsh w8, [x0] scvtf s0, w8 to: ldr h0, [x0] sshll v0.4s, v0.4h, #0 scvtf s0, s0 The idea is to remove the GPR->FPR move, but in reality it makes code larger and slower (or the same) on all the CPUs I tried. This patch adds the UseAlternateSExtLoadCVTF32 predicate, similar to the nearby related pattern. Differential Revision: https://reviews.llvm.org/D125470	2022-05-16 18:00:30 +01:00
Craig Topper	74f6ded49d	[AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC This bug is in generic DAG combine and easily reproducible on many targets. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D125640	2022-05-16 09:28:11 -07:00
David Green	7272a8c23c	[AArch64] Update check lines in arm64-scvt.ll. NFC	2022-05-16 15:50:39 +01:00
Bradley Smith	7ff5148d64	[DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine Differential Revision: https://reviews.llvm.org/D125367	2022-05-16 11:25:20 +00:00
Tim Northover	1ddc6ab1a9	AArch64: support ISel for fence instructions Only the most conservative of the DAG patterns matched, leaving GISel with "dmb ish" everywhere which is inefficient.	2022-05-16 12:01:18 +01:00
David Green	4c3e51ecfa	[AArch64] Handle 64bit vectors in tryCombineFixedPointConvert Under some situations we can visit 64bit vector extract elements in tryCombineFixedPointConvert, where an assert fires as they are expected to have been converted to 128bit. Turn the assert into an if statement, bailing out and letting the extract be handled first. Also invert some ifs, using early exits to reduce indentation. Fixes #55417	2022-05-16 11:08:47 +01:00
Alex Richardson	c8b44600c5	[AArch64] Avoid emitting MOVID when NEON is disabled Previously, creating a zero floating-point constant used MOVID even when NEON was disabled which resulted in the following fatal error: `Attempting to emit MOVID instruction but the Feature_HasNEON predicate(s) are not met` Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125237	2022-05-14 14:40:51 +00:00
Alex Richardson	09551251e3	[AArch64] Add missing HasNEON predicates to int->float patterns I was trying to compile code with -march=+nosimd and hit various instruction predicate verification errors; this patch should address the ones I saw in integer to floating-point conversions. I noticed that for signed conversions, some non-NEON instruction sequences are shorter. I don't know if the longer one is still faster on current architectures (the patterns date back to the initial backend import). Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D125308	2022-05-14 14:15:36 +00:00
Alex Richardson	f8639133b5	[AArch64] Baseline test for D125307 Differential Revision: https://reviews.llvm.org/D125240	2022-05-14 14:15:36 +00:00
Eli Friedman	96c2a0c9ff	[GlobalIsel] Fix fallback if stack protector isn't supported. When GlobalISel fails, we need to report the error, and we need to set the FailedISel property. We skipped those steps if stack protector insertion failed, which led to a very strange miscompile. Differential Revision: https://reviews.llvm.org/D125584	2022-05-13 14:17:27 -07:00
Amara Emerson	41fef10449	[GlobalISel] Combine G_SHL, G_ASHR, G_SHL of undef shifts to undef. Differential Revision: https://reviews.llvm.org/D125041	2022-05-13 12:20:34 -07:00
Sam Parker	6d53d35efd	[TypePromotion] Avoid some unnecessary truncs Recommit. Check for legal zext 'sinks' before inserting a trunc. Differential Revision: https://reviews.llvm.org/D115451	2022-05-13 09:45:20 +01:00
Sam Parker	84b5f7c38c	[NFC][TypePromotion][AArch64] Tests Simplify existing test and also add it as a codegen test for aarch64.	2022-05-13 09:27:42 +01:00
Karl Meakin	0298cce257	[AArch64] Add `foldADCToCINC` DAG combine. Differential revision: https://reviews.llvm.org/D123781	2022-05-12 22:21:20 +01:00
Karl Meakin	d29fc6e7d2	[AArch64] Replace `performANDSCombine` with `performFlagSettingCombine`. `performFlagSettingCombine` is a generalised version of `performANDSCombine` which also works on `ADCS` and `SBCS`. Differential revision: https://reviews.llvm.org/D124464	2022-05-12 22:17:23 +01:00
Craig Topper	cec249c60d	[TypePromotion] Promote undef by converting to 0. If we're promoting an undef I think that means that we expect the upper bits are zero. undef doesn't guarantee that. This patch replaces undef with 0 to ensure this. This matches how a zext or sext of undef would be folded by InstCombine/InstSimplify. I haven't found a failure from this; I was just thinking through the code. Differential Revision: https://reviews.llvm.org/D123174	2022-05-12 09:09:24 -07:00
Nikita Popov	44d85259d0	[AArch64] Preserve chain when lowering fixed length load to SVE (PR55281) When a fixed length load is lowered to an SVE masked load, the result chain is currently set to the input chain of the old load, rather than the result chain of the new load. This may cause stores to be incorrectly reordered. Fixes https://github.com/llvm/llvm-project/issues/55281. Differential Revision: https://reviews.llvm.org/D125464	2022-05-12 16:03:32 +02:00
David Green	442c351b2b	Revert "[AArch64] Generate AND in place of CSEL for predicated CTTZ" This reverts commit `7dcd0ea683` due to issues reported postcommit with the correctness of truncated cttzs.	2022-05-10 17:17:03 +01:00
Rosie Sumpter	1a2665902f	[AArch64][SVE] Improve codegen when extracting first lane of active lane mask When extracting the first lane of a predicate created using the llvm.get.active.lane.mask intrinsic, it should give the same codegen as when the predicate is created using the llvm.aarch64.sve.whilelo intrinsic, since get.active.lane.mask is lowered to whilelo. This patch ensures the codegen is the same by recognizing llvm.get.active.lane.mask as a flag-setting operation in this case. Differential Revision: https://reviews.llvm.org/D125215	2022-05-09 13:56:04 +01:00
Alban Bridonneau	fef81131d9	[SVE] Optimize new cases for lowerConvertToSVBool Converts to SVBool are already considered a nop if they are converting an operand from a ptrue or a cmp, because those zero the extra predicate lanes by construction. This patch adds 2 similar cases: - The wide cmp, which was not directly recognized by the test for other forms of cmp - Splats of 1, which will be generated as ptrue, and as such will also zero the extra predicate lanes. Reviewed By: paulwalker-arm, peterwaller-arm Differential Revision: https://reviews.llvm.org/D124908	2022-05-09 10:17:57 +00:00
Rahul Anand R	7dcd0ea683	[AArch64] Generate AND in place of CSEL for predicated CTTZ This patch implements a target specific optimization that replaces the cmp and csel from cttz with an and mask. Differential Revision: https://reviews.llvm.org/D123782	2022-05-09 10:28:20 +01:00
David Green	830c18047b	[AArch64] Add missing NVCAST patterns. There were apparently some missing NVCAST patterns. This fills them in using foreach, as opposed to having to specify them individually. Fixes #55321	2022-05-07 21:08:14 +01:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Kazu Hirata	26ba347fbb	[AArch64] Add llvm/test/CodeGen/AArch64/i256-math.ll This patch adds a test case for i256 additions and subtractions. I'm leaving out multiplications for now, which would result in very long sequences. Differential Revision: https://reviews.llvm.org/D125125	2022-05-06 14:26:12 -07:00
Kazu Hirata	fffb6e6afd	[AArch64] Fix sub with carry `13403a70e4` introduced a bug where we generate the outgoing carry inverted, which in turn breaks the lowering of @llvm.usub.sat.i128, returning the normal difference on saturation and zero otherwise. Note that AArch64 has peculiar semantics where the subtraction instructions generate borrow inverted. The problem is that we mix the two forms of semantics -- the normal carry and inverted carry -- in the area of extended precision subtractions. Specifically, we have three problems: - lowerADDSUBCARRY takes the non-inverted incoming carry from a subtraction and feeds it to SBCS without inverting it first. - lowerADDSUBCARRY makes available the outgoing carry from SBCS without inverting it. - foldOverflowCheck folds: (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry) When the incoming carry flag is set, CSET LO results in zero. CMP in turn generates a borrow, clearing the carry flag. Instead, we should fold: (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry) When the incoming carry flag is set, CSET LO results in zero. CMP does not generate a borrow, setting the carry flag. IIUC, we should use the normal (that is, non-inverted) semantics for carry everywhere. This patch fixes the three problems above. This patch does not add any new testcases because we have plenty of them covering the instruction in question. In particular, @u128_saturating_sub is identical to the testcase in the motivating issue. Fixes: #55253 Differential Revision: https://reviews.llvm.org/D124976	2022-05-06 11:04:17 -07:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
Amara Emerson	586802eb72	[GlobalISel] Re-generate some tests.	2022-05-05 14:14:36 -07:00
Craig Topper	084f967370	[SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef. The result of sign_extend_inreg needs to have as many sign bits as requested by the VT argument. The easiest way to guarantee this is to fold it to 0. SystemZ test was modified to avoid using undef. Fixes https://github.com/llvm/llvm-project/issues/55178 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124696	2022-05-05 09:45:35 -07:00
Amara Emerson	87e3646a1f	[AArch64][GlobalISel] Add undef combines to postlegalizer combiner.	2022-05-05 09:22:08 -07:00
David Green	c7a6b11b7e	[ARM][AArch64] Add some extra shuffle conversion test coverage. NFC This adds a big endian run line for the AArch64 TRN tests and regenerated the check lines, along with adding an extra MVE VMOVN case and regenerating vector-DAGCombine.ll for easier updating.	2022-05-05 15:27:44 +01:00
Bradley Smith	8f623f4ab0	[AArch64][SVE] Restore SP from FP when SVE CSRs and variable sized objects are present Without SVE, after a dynamic stack allocation has modified the SP, it is presumed that a frame pointer restoration will revert the SP back to its correct value prior to any caller stack being restored. However the SVE frame is restored using the stack pointer directly, as it is located after the frame pointer. This means that in the presence of a dynamic stack allocation, any SVE callee state gets corrupted as SP has the incorrect value when the SVE state is restored. To address this issue, when variable sized objects and SVE CSRs are present, treat the stack as having been realigned, hence restoring the stack pointer from the frame pointer prior to restoring the SVE state. Differential Revision: https://reviews.llvm.org/D124615	2022-05-04 12:57:03 +00:00
Alex Borcan	afaa56df7a	Implement support for __llvm_addrsig for MachO in llvm-mc The __llvm_addrsig section is a section that the linker needs for safe icf. This was not yet implemented for MachO - this is the implementation. It has been tested with a safe deduplication implementation inside lld. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D123751	2022-05-03 18:19:18 -04:00
Jon Roelofs	e1c808b36e	Fix zero-width bitfield extracts to emit 0 Fixes #55129	2022-05-03 14:46:42 -07:00
Philipp Tomsich	64816e68f4	[AArch64] Support for Ampere1 core Add support for the Ampere Computing Ampere1 core. Ampere1 implements the AArch64 state and is compatible with ARMv8.6-A. Differential Revision: https://reviews.llvm.org/D117112	2022-05-03 15:54:02 +01:00
Bradley Smith	96bbd359ed	[AArch64][SVE] Only fold frame indexes referencing SVE objects into SVE loads/stores Currently we always fold frame indexes into SVE load/store instructions, however these instructions can only encode VL scaled offsets. This means that when we are accessing a fixed length stack object with these instructions, the folded in frame index gets pulled back out during frame lowering. This can cause issues when we have no spare registers and no emergency spill slot. Rather than causing issues like this, don't fold in frame indexes that reference fixed length objects. Fixes: #55041 Differential Revision: https://reviews.llvm.org/D124457	2022-05-03 09:48:13 +00:00
Sanjay Patel	747c6a0c73	[SDAG] fix miscompile when casting int->FP->int This is the codegen equivalent of D124692. As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. https://alive2.llvm.org/ce/z/KtaDmd Differential Revision: https://reviews.llvm.org/D124771	2022-05-02 14:57:27 -04:00
Sanjay Patel	cb3fb08508	[AArch64] add tests for int->FP->int casts; NFC Copied from x86 tests for multi-target coverage. Also, provides coverage for target-specific asm testing for Alive2 or its follow-ons. See #55150 and D124692	2022-05-02 09:18:12 -04:00
Paul Walker	f10a8f6752	[LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG SIGN_EXTEND_INREG expansion can trigger a TypeSize error because "VT.getSizeInBits() == 1" is used to detect a boolean without first verifying VT is a scalar.	2022-04-30 19:21:48 +01:00
Craig Topper	6affe87bda	[DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask. We try to match as a disguised rotate by constant of these forms (shl (X \| Y), C1) \| (srl X, C2) --> (rotl X, C1) \| (shl Y, C1) (shl X, C1) \| (srl (X \| Y), C2) --> (rotl X, C1) \| (srl Y, C2) We may have also looked through an AND to find the shift. If we did, we need to apply a mask to the result. I'll add an AArch64 test and pre-commit it and the RISC-V test tomorrow. Fixes PR55201. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124711	2022-04-30 11:02:30 -07:00
Craig Topper	808c33ace5	[RISCV][AArch64] Pre-commit tests for D124711. NFC	2022-04-30 10:59:20 -07:00
Saleem Abdulrasool	24ba1302b3	AArch64: modify Swift async frame record storage on Windows The frame layout on Windows differs from that on other platforms. It will spill the registers in descending numeric value (i.e. x30, x29, ...). Furthermore, the x29, x30 pair is particularly important as it is used for the fast stack walking. As a result, we cannot simply insert the Swift async frame record in between the store. To provide the simplistic search mechanism, always spill the async frame record prior to the spilled registers. This was caught by the assertion failure in the frame lowering code when building the runtime for Windows AArch64. Fixes: #55058 Differential Revision: https://reviews.llvm.org/D124498 Reviewed By: mstorsjo	2022-04-30 09:01:33 -07:00
Craig Topper	65dbd8d793	[SelectionDAG] Pre-commit test for D124696. NFC	2022-04-29 17:24:13 -07:00
Paul Walker	b481512485	[SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine. This is essentially a refactoring patch but allows more cases to be caught, hence the output changes to some tests. Differential Revision: https://reviews.llvm.org/D122994	2022-04-29 17:42:33 +01:00
Paul Walker	23c509754d	[DAGCombiner] Stop invalid sign conversion in refineIndexType. When looking through extends of gather/scatter indices it's safe to convert a known positive signed index to unsigned, but unsigned indices must remain unsigned. Depends On D123318 Differential Revision: https://reviews.llvm.org/D123326	2022-04-29 14:20:13 +01:00
Paul Walker	59588f0a3d	[SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost. getGatherScatterIndexIsExtended currently looks through all SIGN_EXTEND_INREG operations regardless of their input type. This patch restricts the code to only look through i32->i64 extensions, which are the ones supported implicitly by SVE addressing modes. Differential Revision: https://reviews.llvm.org/D123318	2022-04-29 14:20:13 +01:00
Paul Walker	7a0b897e86	[DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling refineUniformBase and selectGatherScatterAddrMode both attempt the transformation: base(0) + index(A+splat(B)) => base(B) + index(A) However, this is only safe when index is not implicitly scaled. Differential Revision: https://reviews.llvm.org/D123222	2022-04-29 12:35:16 +01:00
Nikita Popov	4e545bdb35	[SimplifyCFG] Thread branches on same condition in more cases (PR54980) SimplifyCFG implements basic jump threading, if a branch is performed on a phi node with constant operands. However, InstCombine canonicalizes such phis to the condition value of a previous branch, if possible. SimplifyCFG does support this as well, but only in the very limited case where the same condition is used in a direct predecessor -- notably, this does not include the common diamond pattern (i.e. two consecutive if/elses on the same condition). This patch extends the code to look back a limited number of blocks to find a branch on the same value, rather than only looking at the direct predecessor. Fixes https://github.com/llvm/llvm-project/issues/54980. Differential Revision: https://reviews.llvm.org/D124159	2022-04-29 09:44:05 +02:00
Paul Walker	3c382ed71f	[AArch64][SVE] Remove BIC from logical operation DestructiveBinaryComm patterns This reverts part of https://reviews.llvm.org/D124224 that causes an assert because the register allocator triggers a pathological situation where there's no safe way to insert a zeroing MOVPRFX instruction.	2022-04-22 15:07:55 +01:00
zhongyunde	e1afae0311	[AArch64][SVE] Add some logical operation DestructiveBinaryComm patterns Add DestructiveBinaryComm* patterns for ORR, EOR, AND and BIC. The above instructions require that the source and destination registers are equal, so using movprfx should be beneficial to performance. Note: BIC (i.e. A & ~B) is not a commutative operation. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D124224	2022-04-22 20:31:00 +08:00
Daniel Kiss	de07cde67b	[AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions. The autiasp and autibsp instructions are the counterparts of the paciasp/pacibsp instructions, so emit .cfi_negate_ra_state for these too. With the Armv8.3 instruction set, retaa/retab do the return and authentication in one step; here we can't emit .cfi_negate_ra_state because it would point after the ret* instruction. Reviewed By: nickdesaulniers, MaskRay Differential Revision: https://reviews.llvm.org/D111780	2022-04-22 13:25:57 +02:00
Karl Meakin	81904454f7	[AArch64] Add `foldOverflowCheck` DAG combine Differential Revision: https://reviews.llvm.org//D123779	2022-04-21 14:56:38 +01:00
Karl Meakin	13403a70e4	[AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY Differential Revision: https://reviews.llvm.org/D123322	2022-04-21 14:56:37 +01:00
Pengxuan Zheng	38612fbc89	Reland "[COFF, ARM64] Add __break intrinsic" https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170 Reland after fixing the test failure. The failure was due to conflict with a change (D122983) which was merged right before this patch. Reviewed By: rnk, mstorsjo Differential Revision: https://reviews.llvm.org/D124032	2022-04-20 13:01:30 -07:00
Pengxuan Zheng	bff8356b19	Revert "[COFF, ARM64] Add __break intrinsic" This reverts commit `8a9b4fb4aa`.	2022-04-20 11:57:49 -07:00
Pengxuan Zheng	8a9b4fb4aa	[COFF, ARM64] Add __break intrinsic https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170 Reviewed By: rnk, mstorsjo Differential Revision: https://reviews.llvm.org/D124032	2022-04-20 11:20:26 -07:00
Alexey Bataev	2cca53c815	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build a more correct representation of register shuffles and improve the number of recognised buildvector sequences. Also, the same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 09:37:16 -07:00
Alexey Bataev	5f7ac15912	Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer." This reverts commit `2f49163b33` to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284	2022-04-20 06:35:55 -07:00
Alexey Bataev	2f49163b33	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build a more correct representation of register shuffles and improve the number of recognised buildvector sequences. Also, the same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 05:32:56 -07:00
Matt Arsenault	d16945d31b	AArch64/GlobalISel: Add -global-isel-abort=1 to select tests Otherwise the legalizer verifier error isn't triggered since the default is fallback.	2022-04-19 21:04:32 -04:00
David Green	73dc996428	[AArch64] Add lane moves to PerfectShuffle tables This teaches the perfect shuffle tables about lane inserts, that can help reduce the cost of many entries. Many of the shuffle masks are one-away from being correct, and a simple lane move can be a lot simpler than trying to use ext/zip/etc. Because they are not exactly like the other masks handled in the perfect shuffle tables, they require special casing to generate them, with a special InsOp Operator. The lane to insert into is encoded as the RHSID, and the move from is grabbed from the original mask. This helps reduce the maximum perfect shuffle entry cost to 3, with many more shuffles being generatable in a single instruction. Differential Revision: https://reviews.llvm.org/D123386	2022-04-19 14:49:50 +01:00
David Green	cc9495f679	[AArch64] Only mark cost 1 perfect shuffles as legal The perfect shuffle tables encode a cost of either 0 (a nop-copy) or 1 (a single instruction) with a cost encoding of 0 in the upper 2 bits. All perfect shuffles with any cost are then marked as legal shuffles though (the maximum encoded cost is 3), which can confuse the DAG combiner into thinking the shuffles are cheaper than they should be. Limiting legal shuffles to single instructions seems to do better in most cases, producing fewer instructions for complex shuffles. There are some cases that now become tbl, which may be better or worse depending on whether the instruction is in a loop and the tbl load can be hoisted out. Differential Revision: https://reviews.llvm.org/D123377	2022-04-19 12:58:55 +01:00
David Green	50af82701c	[AArch64] Cost all perfect shuffles entries as cost 1 A brief introduction to perfect shuffles - AArch64 NEON has a number of shuffle operations - dups, zips, exts, movs etc that can in some way shuffle around the lanes of a vector. Given a shuffle of size 4 with 2 inputs, some shuffle masks can be easily codegen'd to a single instruction. A <0,0,1,1> mask for example is a zip LHS, LHS. This is great, but some masks are not so simple, like a <0,0,1,2>. It turns out we can generate that from zip LHS, <0,2,0,2>, having generated <0,2,0,2> from uzp LHS, LHS, producing the result in 2 instructions. It is not obvious from a given mask how to get there though. So we have a simple program (PerfectShuffle.cpp in the util folder) that can scan through all combinations of 4-element vectors and generate the perfect combination of results needed for each shuffle mask (for some definition of perfect). This is run offline to generate a table that is queried for generating shuffle instructions. (Because the table could get quite big, it is limited to 4 element vectors). In the perfect shuffle tables zip, uzp and trn shuffles were being cost as 2, which is higher than needed and skews the perfect shuffle tables to create inefficient combinations. This sets them to 1 and regenerates the tables. The codegen will usually be better and the costs should be more precise (but it can get less second-order re-use of values from multiple shuffles; these cases should be fixed up in subsequent patches). Differential Revision: https://reviews.llvm.org/D123379	2022-04-19 12:05:05 +01:00
chenglin.bi	222adf338a	[Arch64][SelectionDAG] Add target-specific implementation of srem 1. Expanding X%C to the equivalent X-X/C*C is not always the fastest path if no paired SDIV exists. So first check whether the target has a faster path for srem alone. 2. Add an AArch64 faster path for the srem-only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-19 02:49:42 +08:00
Momchil Velikov	e0ff354b83	[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer [Re-commit after fixing a dereference of "end" iterator] The AArch64LoadStoreOptimizer pass may merge a register increment/decrement with a following memory operation. In doing so, it may break CFI by moving a stack pointer adjustment past the CFI instruction that described that adjustment. This patch fixes this issue by moving said CFI instruction after the merged instruction, where the SP increment/decrement actually takes place. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114547	2022-04-18 12:09:44 +01:00
chenglin.bi	acfc025a72	Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem" This reverts commit `9d9eddd3dd`.	2022-04-18 10:35:09 +08:00
chenglin.bi	9d9eddd3dd	[Arch64][SelectionDAG] Add target-specific implementation of srem Expanding X%C to the equivalent X-X/C*C is not always the fastest path when there is no SDIV to pair it with. So first check whether the target has a faster lowering for a standalone srem. Add an AArch64 fast path for the SREM-only power-of-2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-16 12:29:11 +08:00
Momchil Velikov	24c84bd236	[AArch64] Async unwind - Fix MTE codegen emitting frame adjustments in a loop When untagging the stack, the compiler may emit a sequence like: ``` .LBB0_1: st2g sp, [sp], #32 sub x8, x8, #32 cbnz x8, .LBB0_1 stg sp, [sp], #16 ``` These stack adjustments cannot be described by CFI instructions. This patch disables merging of SP update with untagging, i.e. makes the compiler use an additional scratch register (there should be plenty available at this point as we are in the epilogue) and generate: ``` mov x9, sp mov x8, #256 stg x9, [x9], #16 .LBB0_1: sub x8, x8, #32 st2g x9, [x9], #32 cbnz x8, .LBB0_1 add sp, sp, #272 ``` Merging is disabled only when we need to generate asynchronous unwind tables. Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D114548	2022-04-15 14:00:23 +01:00
John Brawn	27a8735a44	[AArch64] Add mayRaiseFPException to appropriate instructions This is mostly handled by adding "let mayRaiseFPException = 1" before the definition of the relevant instruction classes, but there are a couple of complications: * When we have a multiclass where currently some instantiations are of instructions that can raise an exception and others aren't we need to split that into two multiclasses, one inheriting from the other using a multiclass parameter to enable exceptions. * In a couple of places in the globalisel instruction selector we need to manually set the NoFPExcept flag. There's also another place that looks like it should need it, but that code is never hit for those opcodes due to them being handled by the generic instruction selector, so I've instead just removed them from the switch. Differential Revision: https://reviews.llvm.org/D115352	2022-04-14 16:51:22 +01:00
John Brawn	12c1022679	[AArch64] Lowering and legalization of strict FP16 For strict FP16 to work correctly needs some changes in lowering and legalization: * SelectionDAGLegalize::PromoteNode was missing handling for some strict fp opcodes. * Some of the custom lowering of strict fp operations needed to be adjusted to work with FP16. * Custom lowering needed to be added for round-to-int operations. With this, and the previous patches for the rest of the strict fp isel, we can set IsStrictFPEnabled = true. Differential Revision: https://reviews.llvm.org/D115620	2022-04-14 16:51:22 +01:00
David Green	1ba8f4f67d	[AArch64] Move v4i8 concat load lowering to a combine. The existing code was not updating the uses of loads that it recreated, leading to incorrect chains which could break the ordering between nodes. This moves the code to a combine instead, and makes sure we update the chain references. This does mean it happens earlier - potentially before the concats are simplified. This can lead to inefficiencies in the codegen, which will be fixed in followups.	2022-04-14 15:19:33 +01:00
Paul Walker	0c44115e51	[SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER. The lowering code did not use the scale operand of MGATHER/MSCATTER nodes, but instead assumed scaled indices were always scaled based on the element type of the memory type. This patch adds the missing support by rewriting the nodes as unscaled variants. Differential Revision: https://reviews.llvm.org/D123670	2022-04-14 11:54:46 +01:00
Momchil Velikov	62d4686be3	Revert "[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer" This reverts commit `ecbf32dd88`. It's possible this patch is the reason for an assertion failure `!NodePtr->isKnownSentinel()` in `AArch64LoadStoreOpt::mergeUpdateInsn` (https://lab.llvm.org/buildbot/#/builders/185/builds/1555); reverting while I investigate.	2022-04-14 09:33:40 +01:00
David Green	4585bff408	[AArch64] Add new shuffles tests, and regenerate aarch64-wide-shuffle.ll and neon-wide-splat.ll. NFC	2022-04-13 18:10:49 +01:00
chenglin.bi	82e5976b7d	[AArch64][SelectionDAG] stick all the power-of-two tests in a separate file; NFC Baseline tests for D122968 (issue #54649).	2022-04-14 00:48:28 +08:00
Momchil Velikov	ecbf32dd88	[AArch64] Async unwind - Adjust unwind info in AArch64LoadStoreOptimizer The AArch64LoadStoreOptimizer pass may merge a register increment/decrement with a following memory operation. In doing so, it may break CFI by moving a stack pointer adjustment past the CFI instruction that described that adjustment. This patch fixes this issue by moving said CFI instruction after the merged instruction, where the SP increment/decrement actually takes place. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D114547	2022-04-13 17:04:53 +01:00
Alex Richardson	ee44896cf4	[AArch64] Add missing HasNEON predicate in scalar FABD patterns I was trying to compile with -march=+nosimd and hit the following assertion: `Attempting to emit FABD64 instruction but the Feature_HasNEON predicate(s) are not met`. This adds a HasNEON predicate to the patterns which was omitted in commit `21d9b33d62` for some reason. The new code generation matches GCC with -mcpu=<cpu>+nosimd: https://godbolt.org/z/n1Y7xh5jo Differential Revision: https://reviews.llvm.org/D123491	2022-04-13 09:30:11 +00:00
Alex Richardson	32a353a5e0	[AArch64] Baseline test for D123491	2022-04-13 09:30:11 +00:00
David Sherwood	44271e7c55	[AArch64][SVE] Fix lowering of "fcmp ueq/one" when using SVE We were previously lowering to the incorrect instructions for the setcc DAG node when using the SETUEQ and SETONE floating point condition codes. I have fixed this by marking the SETONE code as Expand and letting the SETUNE code be legal. I have also fixed up the patterns for FCMNE_PPzZZ and FCMNE_PPzZ0 to use the correct opcode. Differential Revision: https://reviews.llvm.org/D121905	2022-04-13 10:24:03 +01:00
Daniel Kiss	b0343a38a5	Support the min of module flags when linking, use for AArch64 BTI/PAC-RET LTO objects might be compiled with different `mbranch-protection` flags, which will cause an error in the linker. Such a setup is allowed in a normal build; with this change it becomes possible with LTO as well. Reviewed By: pcc Differential Revision: https://reviews.llvm.org/D123493	2022-04-13 09:31:51 +02:00
Matt Arsenault	6009122250	AArch64/GlobalISel: Remove pointless s1 legalize rules These have no net effect on the legalize rules.	2022-04-12 16:54:04 -04:00
Matt Arsenault	3f2cc7cc2b	GlobalISel: Fix lowerSelect handling of boolean high bits This was making several invalid assumptions about the incoming select. First, it was assuming the incoming condition was either s1 or already sign extended, not accounting for different boolean high bits behavior between scalar and vector conditions. We only had a vector boolean due to the intermediate step vector select, which is now avoided. Second, it was assuming it can use the result vector type as a boolean mask. These types don't have anything to do with each other, and this only makes sense in the context of the expansion to bit operations. Since these logically are part of the same lowering, do the complete expansion in a single step. The added select_v4s1_s1 test does fail to legalize, since it seems AArch64's vector legalization support is pretty incomplete.	2022-04-12 16:54:03 -04:00
Ahmed Bougacha	cfa4fe7c51	[AArch64][LOH] Don't ignore regmasks in bundles by iterating over instrs. The LOH pass iterates over instructions to build its custom register state machine, but it uses the top-level bundle iterator. This should be okay, because when the wrapper BUNDLE MI is built, it aggregates the register defs/uses in its instructions into MOs. However, that doesn't apply to regmasks, and accumulating regmasks across multiple instructions would be messy business. There are a couple AnalyzePhysRegInBundle (/Virt) helpers that do look at regmasks, but those don't fit in very well here. AArch64 has started to use a few bundle instructions, specifically as glorified pseudos for variant call instructions, which have regmasks. So the LOH pass ends up ignoring regmasks. Concretely, this has been wrong for a while, but, on aarch64, the most common bundle (rv_marker call) was always followed by the attached call instruction, a plain BL with a regmask. Which was properly detected by the pass. However, we recently started keeping the attached call in the bundle, so the regmask is now ignored. And the pass happily combines ADRPs, of say, x8, across the bundle, resulting in corrupt pointers later.	2022-04-12 10:34:54 -07:00
Ahmed Bougacha	f3e76dcae3	[AArch64] Cleanup call-rv-marker.ll test. NFC. This was doing -iphoneos instead of -ios. While there, remove an old TODO and cleanup some alignment.	2022-04-12 10:34:54 -07:00
Momchil Velikov	d0ea42a7c1	[AArch64] Async unwind - function epilogues Reviewed By: MaskRay, chill Differential Revision: https://reviews.llvm.org/D112330	2022-04-12 16:50:50 +01:00
Matt Arsenault	7e8ff962b3	AArch64/GlobalISel: Regenerate mir test checks Minimizes the test diffs in future changes from introduction of -NEXT.	2022-04-11 20:12:22 -04:00
Matt Arsenault	492d0eab89	AArch64/GlobalISel: Remove IR section from a test	2022-04-11 19:43:37 -04:00
Biplob Mishra	d06fb9045b	AArch64 adding more tests to show the simple scenarios for or/and combine	2022-04-11 20:54:12 +01:00
Momchil Velikov	b4ad28da19	[CodeGen] Async unwind - add a pass to fix CFI information This pass inserts the necessary CFI instructions to compensate for the inconsistency of the call-frame information caused by the linear (non-CFG aware) nature of the unwind tables. Unlike the `CFIInstrInserter` pass, this one almost always emits only `.cfi_remember_state`/`.cfi_restore_state`, which results in smaller unwind tables and also transparently handles custom unwind info extensions like CFA offset adjustment and save locations of SVE registers. This pass takes advantage of the constraints that LLVM imposes on the placement of save/restore points (cf. `ShrinkWrap.cpp`): * there is a single basic block, containing the function prologue * possibly multiple epilogue blocks, where each epilogue block is complete and self-contained, i.e. CSR restore instructions (and the corresponding CFI instructions) are not split across two or more blocks. * prologue and epilogue blocks are outside of any loops Thus, during execution, at the beginning and at the end of each basic block the function can be in one of two states: - "has a call frame", if the function has executed the prologue, or has not executed any epilogue - "does not have a call frame", if the function has not executed the prologue, or has executed an epilogue These properties can be computed for each basic block by a single RPO traversal. From the point of view of the unwind tables, the "has/does not have call frame" state at beginning of each block is determined by the state at the end of the previous block, in layout order. Where these states differ, we insert compensating CFI instructions, which come in two flavours: - CFI instructions, which reset the unwind table state to the initial one. This is done by a target specific hook and is expected to be trivial to implement, for example it could be: ``` .cfi_def_cfa <sp>, 0 .cfi_same_value <rN> .cfi_same_value <rN-1> ... ``` where `<rN>` are the callee-saved registers. - CFI instructions, which reset the unwind table state to the one created by the function prologue. These are the sequence: ``` .cfi_restore_state .cfi_remember_state ``` In this case we also insert a `.cfi_remember_state` after the last CFI instruction in the function prologue. Reviewed By: MaskRay, danielkiss, chill Differential Revision: https://reviews.llvm.org/D114545	2022-04-11 13:27:26 +01:00
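The scheme described in the message above can be sketched on a toy control-flow graph. This is an illustrative model only, not the pass from D114545: `compute_fixups`, the block names, and the two fix-up labels are hypothetical, and the model assumes the prologue or epilogue covers a whole block. It computes the "has a call frame" state with one reverse post-order traversal, then compares each block's entry state with the exit state of its layout predecessor.

```python
RESET_TO_INITIAL = "reset_to_initial"              # target-specific hook in the message
RESTORE_PROLOGUE_STATE = "restore_prologue_state"  # .cfi_restore_state + .cfi_remember_state


def compute_fixups(layout, succs, prologue, epilogues):
    """Return {block: fix-up kind} for blocks whose entry state differs
    from the exit state of the previous block in layout order."""
    entry = layout[0]

    # Reverse post-order of the blocks reachable from the entry block.
    seen, post = set(), []

    def dfs(block):
        seen.add(block)
        for s in succs.get(block, []):
            if s not in seen:
                dfs(s)
        post.append(block)

    dfs(entry)
    rpo = post[::-1]

    # One RPO pass: the function has no call frame on entry.
    frame_in = {entry: False}
    frame_out = {}
    for block in rpo:
        if block == prologue:
            out = True
        elif block in epilogues:
            out = False
        else:
            out = frame_in[block]
        frame_out[block] = out
        for s in succs.get(block, []):
            frame_in.setdefault(s, out)

    # Compare against the layout predecessor, skipping unreachable blocks.
    fixups = {}
    for prev, cur in zip(layout, layout[1:]):
        if prev not in frame_out or cur not in frame_in:
            continue
        if frame_out[prev] != frame_in[cur]:
            fixups[cur] = RESTORE_PROLOGUE_STATE if frame_in[cur] else RESET_TO_INITIAL
    return fixups


# Shrink-wrapped toy function: "entry" branches either to an early return
# without a frame or to "setup" (the prologue), which falls into "done".
layout = ["entry", "setup", "early_ret", "done"]
succs = {"entry": ["setup", "early_ret"], "setup": ["done"]}
fixups = compute_fixups(layout, succs, prologue="setup", epilogues={"done"})
print(fixups)
```

In this layout `early_ret` follows the prologue block but runs without a frame, so it gets the reset-to-initial fix-up, while `done` follows a frameless block but runs with a frame, so it gets the restore/remember pair.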
Sanjay Patel	2ed15984b4	[SDAG] try to reduce compare of funnel shift equal 0 fshl (or X, Y), X, C ==/!= 0 --> or (shl Y, C), X ==/!= 0 fshl X, (or X, Y), C ==/!= 0 --> or (srl Y, BW-C), X ==/!= 0 This is similar to an existing setcc-of-rotate fold, but the matching requires more checks for the more general funnel op: https://alive2.llvm.org/ce/z/Ab2jDd We are effectively decomposing the funnel shift into logical shifts, reassociating, and removing a shift. This should get us the final improvements for x86-64 that were originally shown in D111530 ( https://github.com/llvm/llvm-project/issues/49541 ); x86-32 still shows some SHLD/SHRD, so the pattern is not matching there yet. Differential Revision: https://reviews.llvm.org/D122919	2022-04-11 07:44:58 -04:00
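The two folds in the message above can be checked exhaustively at small bit widths. This is a minimal sketch: `fshl` is a hypothetical Python model of the funnel-shift-left operation (concatenate the two operands, shift left, keep the high word), and `folds_hold` is a hypothetical helper that compares both sides of each fold for every X, Y and every shift amount C with 0 < C < BW. The linked Alive2 proof remains the authoritative check.

```python
def fshl(a, b, c, bw):
    """Funnel shift left: high bw bits of (a:b) shifted left by c mod bw."""
    mask = (1 << bw) - 1
    c %= bw
    if c == 0:
        return a & mask
    return ((a << c) | (b >> (bw - c))) & mask


def folds_hold(bw):
    """Check both compare-with-zero folds for all bw-bit X, Y and 0 < C < bw."""
    mask = (1 << bw) - 1
    for x in range(1 << bw):
        for y in range(1 << bw):
            for c in range(1, bw):
                # fshl (or X, Y), X, C == 0  <=>  or (shl Y, C), X == 0
                lhs1 = fshl(x | y, x, c, bw) == 0
                rhs1 = (((y << c) & mask) | x) == 0
                # fshl X, (or X, Y), C == 0  <=>  or (srl Y, BW-C), X == 0
                lhs2 = fshl(x, x | y, c, bw) == 0
                rhs2 = ((y >> (bw - c)) | x) == 0
                if lhs1 != rhs1 or lhs2 != rhs2:
                    return False
    return True


print(folds_hold(4))
```

The folds work because the X contribution to the funnel shift is a rotate of X, which is zero exactly when X is zero, leaving only the shifted Y bits to test.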
Tim Northover	901831a4e6	Revert "AArch64: take compact unwind frame size from last CFI instruction." It was on ToT when I pushed and committed unintentionally.	2022-04-11 12:25:58 +01:00
Tim Northover	9fe32ca697	AArch64: add nvcast patterns for v1f64	2022-04-11 12:24:48 +01:00
Tim Northover	4120a3abdd	AArch64: take compact unwind frame size from last CFI instruction. Asynchronous exception support for the prologue means that there can be multiple .cfi_def_cfa_offset instructions in a single function, which tripped up an assertion in the compact unwind generator. In reality the compact unwind format is far too restrictive to represent asynchronous frames so if we ever wanted that on Darwin we'd fall back to DWARF (possibly keeping compact unwind around for synchronous users). So the compact format should continue to represent the synchronous situation, and the assertion can be removed.	2022-04-11 12:24:48 +01:00
Tim Northover	6c85668d28	Tail calls: look through AssertZExt to find register copy. arm64_32 guarantees the high 32 bits of pointer parameters are passed as 0, and this is modelled in the IR by inserting an AssertZExt after the CopyFromReg. The function deciding whether registers that need to be preserved actually are wasn't expecting this so it banned perfectly legitimate tail calls.	2022-04-11 12:24:47 +01:00
Alexander Shaposhnikov	626039cdcc	[AArch64] Split fuse-literals feature This diff splits fuse-literals feature and enables fuse-adrp-add by default, in particular, it adjusts instruction scheduling to place ADRP+ADD pairs together. This also enables the linker to apply the relaxations described in `d2ca58c54b`. Differential revision: https://reviews.llvm.org/D120104 Test plan: make check-all	2022-04-11 05:27:11 +00:00
Karl Meakin	784b9d468a	[AArch64] Update tests with the `update_llc_test_checks.py` script (NFC) Reviewed By: Kmeakin Differential Revision: https://reviews.llvm.org/D123317	2022-04-07 18:06:15 +01:00
Paul Walker	a88e8374db	[SVE] Add more gather/scatter tests to highlight bugs in their generated code.	2022-04-07 17:13:48 +01:00


5822 Commits