Commit Graph

5955 Commits

Author SHA1 Message Date
jacquesguan e60eb7053d recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat."
With fix for AArch64 and Hexagon test cases.
2022-07-21 17:34:34 +08:00
David Green 23d6186be0 [SelectionDAG] Fix fptoi.sat scalable vector lowering
Vector fptosi_sat and fptoui_sat were being expanded by unrolling the
vector operation. This doesn't work for scalable vectors, so this patch
adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable.

Scalable tests are added for AArch64 and RISCV. Some of the AArch64
fptoi_sat operations should be legal, but that will be handled in
another patch.

Differential Revision: https://reviews.llvm.org/D130028
2022-07-21 08:00:22 +01:00
Simon Pilgrim 9fc347aa4e [DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.

This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notably XOR -> NOT conversions) to occur.

Differential Revision: https://reviews.llvm.org/D129641
2022-07-20 10:49:31 +01:00
Alexandros Lamprineas 051738b08c Reland "[AArch64] Add a tablegen pattern for UZP2."
Converts concat_vectors((trunc (lshr)), (trunc (lshr))) to UZP2
when the shift amount is half the width of the vector element.

Prioritize the ADDHN(2), SUBHN(2) patterns over UZP2.
Fixes https://github.com/llvm/llvm-project/issues/52919

Differential Revision: https://reviews.llvm.org/D130061
2022-07-20 09:47:32 +01:00
chenglin.bi d337c1f256 [AArch64] Use SUBXrx64 for dynamic stack to refer to sp
When we lower a dynamic stack allocation, we need to subtract the pattern `x15 << 4` from sp.
A subtract instruction with an arithmetic shifted register (SUBXrs) can't refer to sp, so for now we need two extra movs like:

```
mov x0, sp
sub x0, x0, x15, lsl #4
mov sp, x0
```
If we want to refer to sp in the subtract instruction like this:
```
sub	sp, sp, x15, lsl #4
```
We must use the arithmetic extended register version (SUBXrx).
So in this patch, when we find a sub with sp as its src0 operand, we try to select SUBXrx64.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D129932
2022-07-20 11:46:10 +08:00
Jez Ng 2e2737cdf9 [MC][MachO] Change addrsig format + ensure its size is properly set
There were two problems with the previous setup:

1. We weren't setting its size, which caused problems when `__llvm_addrsig`
   wasn't the last section. In particular, `__debug_line` (if created) is
   generated and placed after `__llvm_addrsig`, and would result in an
   invalid object file w/ overlapping sections being emitted.

2. The symbol indices could be invalidated if e.g. `llvm-strip` ran on
   the object file. See discussion [here][1].

To fix both these issues, we use symbol relocations instead of encoding
symbol indices directly in the section contents. The section itself
doesn't contain any data. That sidesteps the layout problem in addition
to solving the second issue.

The corresponding LLD change to read in this new format: {D128938}.
It will fix the icf-safe.ll test failure on this diff.

[1]: https://discourse.llvm.org/t/problems-with-mach-o-address-significance-table-generation/63392/

Reviewed By: #lld-macho, alx32

Differential Revision: https://reviews.llvm.org/D127637
2022-07-19 21:22:23 -04:00
Nick Desaulniers 1cf6b93df1 Revert "[Local] Allow creating callbr with duplicate successors"
This reverts commit 08860f525a.

Crashes during PPC64LE linux kernel builds as reported by @nathanchance.
https://reviews.llvm.org/D129997#3663632
2022-07-19 15:03:27 -07:00
David Truby 4c82f56d8f [llvm][SVE] Remove redundant and when comparing against extending load
When determining if an `and` should be merged into an extending load,
the constant argument to the `and` is currently not checked if the
argument requires truncation. This prevents the combine from happening when
the vector width is half the normal available vector width for SVE VLA
vectors.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D129281
2022-07-19 17:08:32 +01:00
Nikita Popov 08860f525a [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This
patch removes a limitation which prevents optimizations from actually
producing such callbrs.

Differential Revision: https://reviews.llvm.org/D129997
2022-07-19 14:28:22 +02:00
Cullen Rhodes f7b2d4aac6 [AArch64] Add patterns to fold zext(cmpeq(x, splat(0)))
Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129626
2022-07-19 08:14:38 +00:00
Rosie Sumpter 05d424d165 [AArch64][SVE] Fold fadda(ptrue, x, select(mask, y, -0.0)) into fadda(mask, x, y)
This patch adds an SVE pattern to recognize the use of a select with an
fadda in the form fadda(ptrue, x, select(mask, y, -0.0)). In this case
the select can be folded away, with the select mask used as the
predicate for fadda. This improves the codegen when vectorizing loops
with ordered fp reductions.
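
For illustration, a minimal IR sketch of the kind of pattern being matched; the nxv4f32 variant and all value names are chosen for illustration, not taken from the patch:

```
define float @fadda_sel(<vscale x 4 x i1> %mask, float %init, <vscale x 4 x float> %y) {
  ; splat of -0.0, the identity value for the masked-off lanes
  %negz.ins = insertelement <vscale x 4 x float> poison, float -0.000000e+00, i32 0
  %negz = shufflevector <vscale x 4 x float> %negz.ins, <vscale x 4 x float> poison, <vscale x 4 x i32> zeroinitializer
  %pg = call <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32 31)
  %sel = select <vscale x 4 x i1> %mask, <vscale x 4 x float> %y, <vscale x 4 x float> %negz
  ; the select can be folded away by using %mask directly as the fadda predicate
  %res = call float @llvm.aarch64.sve.fadda.nxv4f32(<vscale x 4 x i1> %pg, float %init, <vscale x 4 x float> %sel)
  ret float %res
}
declare <vscale x 4 x i1> @llvm.aarch64.sve.ptrue.nxv4i1(i32)
declare float @llvm.aarch64.sve.fadda.nxv4f32(<vscale x 4 x i1>, float, <vscale x 4 x float>)
```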

Differential Revision: https://reviews.llvm.org/D129623
2022-07-19 08:31:51 +01:00
Matt Arsenault 8d0383eb69 CodeGen: Remove AliasAnalysis from regalloc
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for if a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.

Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.

Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
2022-07-18 17:23:41 -04:00
Simon Pilgrim dc681bc2e0 [AArch64] Regenerate arm64-vector-ldst.ll test checks 2022-07-16 15:27:47 +01:00
Simon Pilgrim 2d4c43d45f [AArch64] Regenerate arm64-neon-simd-ldst-one.ll test checks 2022-07-16 15:27:47 +01:00
Simon Pilgrim f7a9c5c61b [AArch64] Regenerate arm64-vmax.ll test checks 2022-07-16 15:27:47 +01:00
Simon Pilgrim ccc2a60bc8 [AArch64] Regenerate arm64-mul.ll test checks 2022-07-16 15:27:47 +01:00
Simon Pilgrim a5d0122f75 [DAG] Canonicalize non-inlane shuffle -> AND if all non-inlane referenced elements are known zero
As mentioned on D127115, this patch attempts to recognise shuffle masks that could be simplified to an AND mask - we already have a similar transform that will fold AND -> 'clear mask' shuffle, but this patch handles cases where the referenced elements are not from the same lane indices but are known to be zero.

Differential Revision: https://reviews.llvm.org/D129150
2022-07-16 11:38:24 +01:00
Simon Pilgrim 95440c39a0 [AArch64] Regenerate optimize-imm.ll test checks 2022-07-15 13:54:17 +01:00
Edd Barrett 2e62a26fd7
[stackmaps] Legalise patchpoint arguments.
This is similar to D125680, but for llvm.experimental.patchpoint
(instead of llvm.experimental.stackmap).

Differential Revision: https://reviews.llvm.org/D129268
2022-07-15 12:01:59 +01:00
Nikita Popov 2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change, though; I will follow up
with patches to remove limitations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Amara Emerson d4f84df0a0 [GlobalISel] Change widenScalar of G_FCONSTANT to mutate into G_CONSTANT.
Widening a G_FCONSTANT by extending and then generating G_FPTRUNC doesn't produce
the same result all the time. Instead, we can just transform it to a G_CONSTANT
of the same bit pattern and truncate using a plain G_TRUNC instead.

Fixes https://github.com/llvm/llvm-project/issues/56454

Differential Revision: https://reviews.llvm.org/D129743
2022-07-14 11:05:10 -07:00
Guozhi Wei 2f11b3a6d7 [MachineCombiner] Don't compute the latency of transient instructions
If an MI will not generate a target instruction, we should not compute its
latency. Then we can compute a more precise instruction sequence cost and get
better results.

Differential Revision: https://reviews.llvm.org/D129615
2022-07-14 17:08:14 +00:00
Cullen Rhodes a2fe6aa9eb [NFC][SVE] Add tests for zext(cmpeq(x, splat(0)))
In preparation for a follow-up patch folding the above to CNOT.

Reviewed By: paulwalker-arm, peterwaller-arm

Differential Revision: https://reviews.llvm.org/D129625
2022-07-14 09:32:20 +00:00
Amara Emerson cef349a3c8 [GlobalISel] Re-generate some checks. 2022-07-14 00:57:53 -07:00
Amara Emerson 2824bdd92f [GlobalISel] Fix and(load)->zextload combine crash.
We shouldn't use getOpcodeDef() if we need to guarantee the def has only one
user since under the hood it may look through copies and optimization hints,
which themselves may have multiple users.
2022-07-13 14:58:45 -07:00
Simon Pilgrim 04419a5f55 [AArch64] Regenerate arm64-vshuffle.ll test checks
Not quite ready to use the update script, but can clean it up slightly so the diffs aren't so great.
2022-07-13 13:52:15 +01:00
Simon Pilgrim 9bc77b7342 [AArch64] Regenerate arm64-vselect.ll test checks
The ushll -> sshll FIXME had been fixed long ago, but nobody noticed because the test wasn't checking for either.....
2022-07-13 13:52:15 +01:00
Kai Nacke 4ae254e488 Revert "[GISel] Unify use of getStackGuard"
This reverts commit e60b4fb2b7.
2022-07-12 17:00:43 -04:00
Kai Nacke e60b4fb2b7 [GISel] Unify use of getStackGuard
Some rework of getStackGuard() based on comments in
https://reviews.llvm.org/D129505.

- getStackGuard() now creates and returns the destination
  register, simplifying calls
- the pointer type is passed to getStackGuard() to avoid
  recomputation
- removed PtrMemTy in emitSPDescriptorParent(), because
  this type is only used here when loading the value but
  not when storing the value

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129576
2022-07-12 16:46:37 -04:00
David Green c5d68ca1c8 [AArch64] Fix subtarget features for tests. NFC
These tests were using instructions that require feature predicates that
were not enabled.
2022-07-12 11:03:40 +01:00
Rosie Sumpter e5edc1b5ee [AArch64][SVE] Ensure PTEST operands have type nxv16i1
Currently any legal predicate types will be pattern-matched when
creating a PTEST instruction. This could be a problem in future since
PTEST always uses the .B specifier for the operand, but it is not
always guaranteed that the extra lanes of unpacked types (e.g. nxv4i1)
are zero. This patch ensures the operands of PTEST are type nxv16i1,
where the undef lanes are set to zero.

Differential Revision: https://reviews.llvm.org/D129282/
2022-07-12 09:27:59 +01:00
David Green 74c9030a11 [AArch64] Move fp16 intrinsics tests to new file. NFC
The enabled features for the existing test do not always include FP16,
which is required for the intrinsics.
2022-07-11 20:36:46 +01:00
Sanjay Patel d0eec5f7e7 [SDAG] enhance sub->xor fold to ignore signbit
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).

https://alive2.llvm.org/ce/z/bbvqWv
2022-07-11 12:37:50 -04:00
Sanjay Patel 4670c1e55d [AArch64] add test for possible sub->xor enhancement; NFC 2022-07-11 12:37:35 -04:00
David Green 28b41237e6 [InterleaveAccessPass] Handle multi-use binop shuffles
D89489 added some logic to the interleaved access pass to attempt to
undo the folding of shuffles into binops, that instcombine performs. If
early-cse is run too, the binops may be commoned into a single operation
with multiple shuffle uses. It is still profitable to reverse the transform
though, so long as all the uses are shuffles.

Differential Revision: https://reviews.llvm.org/D129419
2022-07-10 17:24:37 +01:00
David Green 6ce63e267a [ARM][AArch64] Add additional test for multiuse vldn binop shuffles. NFC
For D129419, these are the same as the existing test, but run through
-early-cse.
2022-07-09 22:48:12 +01:00
Paul Osmialowski b17754bcaa [SimplifyLibCalls] refactor pow(x, n) expansion where n is a constant integer value
Since the backend's codegen is capable of expanding powi into fmuls, it
is no longer necessary to do so in the ::optimizePow() function of
SimplifyLibCalls.cpp. What is sufficient is to always turn pow(x, n)
into powi(x, n) for the cases where n is a constant integer value.

Dropping the current expansion code allowed relaxation of the folding
conditions and now this can also happen at optimization levels below
Ofast.

The added CodeGen/AArch64/powi.ll test case ensures that powi is
actually expanded into fmuls, confirming that this refactor did not
cause any performance degradation.

Following an idea proposed by David Sherwood <david.sherwood@arm.com>.

Differential Revision: https://reviews.llvm.org/D128591
2022-07-09 12:00:22 -04:00
Douglas Yung eba6d92f69 Replace hard coded number with regex so the test passes on downstream projects that may define additional opcodes. 2022-07-08 15:49:37 -07:00
Matt Arsenault 02769f2b3f AArch64/GlobalISel: Stop using legal s1 values
As far as I can tell treating s1 values as legal makes no sense. There
are no allocatable 1-bit registers. SelectionDAG legalizes the usual
set of boolean operations to 32-bits, and this should do the
same. This avoids some special case handling in the selector of s1
values, and some extra code to look through truncates.

This makes some code worse at -O0, since nothing cleans up the `and` with 1
that the artifact combiner inserts. We could probably add some
non-essential combines or teach the artifact combiner to elide
intermediates between boolean uses and defs.
2022-07-08 11:55:08 -04:00
Matt Arsenault 1ee6ce9bad GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD
SelectionDAG has a target hook, getExtendForAtomicOps, which it uses
in the computeKnownBits implementation for ATOMIC_LOAD. This is pretty
ugly (as is having a separate load opcode for atomics), so instead
allow making use of atomic zextload. Enable this for AArch64 since the
DAG path defaults to the zext behavior.

The tablegen changes are pretty ugly, but partially helps migrate
SelectionDAG from using ISD::ATOMIC_LOAD to regular ISD::LOAD with
atomic memory operands. For now the DAG emitter will emit matchers for
patterns which the DAG will not produce.

I'm still a bit confused by the intent of the isLoad/isStore/isAtomic
bits. The DAG implementation rejects trying to use any of these in
combination. For now I've opted to make the isLoad checks also check
isAtomic, although I think having isLoad and isAtomic set on these
makes most sense.
2022-07-08 11:55:08 -04:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Petar Avramovic 2483f43d47 [AArch64][GlobalISel] Fix call lowering for <3 x i32> vector arguments
Differential Revision: https://reviews.llvm.org/D129194
2022-07-08 10:25:45 +02:00
Sergei Barannikov 2247fdc84d [SelectionDAG] computeKnownBits / ComputeNumSignBits for the remaining overflow-aware nodes
Some overflow-aware nodes were missing from the switches in
computeKnownBits and ComputeNumSignBits.
2022-07-08 09:19:19 +01:00
Florian Hahn afdedd405e
[AArch64] Try to re-use extended operand for SETCC with vector ops.
Try to re-use an already extended operand for SetCC with vector operands
feeding an extended select. Doing so avoids requiring another full
extension of the SET_CC result when lowering the select.

This improves lowering for certain extend/cmp/select patterns.
For example, with v16i8, this replaces the 6 instructions for the extra extension
with 4 separate selects.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
        unsigned int v = *p;
        sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

On the AArch64 cores I have access to, the patch improves performance of
the vector loop by ~10%.

This could be generalized per follow-ups, but the initial version
targets one of the more important cases in combination with D96522.

Alive2 modeling:
* sext EQ https://alive2.llvm.org/ce/z/5upBvb
* sext NE https://alive2.llvm.org/ce/z/zbEcJp
* zext EQ https://alive2.llvm.org/ce/z/_xMwof
* zext NE https://alive2.llvm.org/ce/z/5FwKfc
* zext unsigned predicate: https://alive2.llvm.org/ce/z/iEwLU3
* sext signed predicate: https://alive2.llvm.org/ce/z/aMBega

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D120481
2022-07-07 16:50:00 -07:00
Florian Hahn bbf2725cf6
[AArch64] Add vector select tests with odd element types.
Additional tests for D120481.
2022-07-07 14:07:25 -07:00
Bradley Smith 60d6be5dd3 [LegalizeTypes] Replace vecreduce_xor/or/and with vecreduce_add/umax/umin if not legal
This is done during type legalization since the target representation of
these nodes may not be valid until after type legalization, and after
type legalization the fact that these are dealing with i1 types may be
lost.

Differential Revision: https://reviews.llvm.org/D128996
2022-07-07 09:33:54 +00:00
Sander de Smalen 6106a767b7 [AArch64][SME] Update load/store intrinsics to take predicate corresponding to element size.
Instead of using <vscale x 16 x i1> for all the loads/stores, we now use the appropriate
predicate type according to the element size, e.g.

  ld1b uses <vscale x 16 x i1>
  ld1w uses <vscale x 4 x i1>
  ld1q uses <vscale x 1 x i1>

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D129083
2022-07-07 07:39:27 +00:00
Sander de Smalen 15c3ba8a44 [AArch64] Legalisation of compares and truncates of nxv1i1 types.
Truncates and compares require some changes to generic legalisation functions
to use ElementCount instead of getNumElements.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D129082
2022-07-07 07:39:27 +00:00
Florian Hahn 9cb00e7133
[AArch64] Clean up vselect-ext.ll test, add tests with ne/eq preds. 2022-07-06 22:05:40 -07:00
Luo, Yuanke b45829dcdf [AArch64][GlobalISel] update the gisel test case 2022-07-07 11:43:42 +08:00
Luo, Yuanke 21007259dc [AArch64][GlobalISel] update the test case with update_mir_test_checks.py 2022-07-07 09:03:32 +08:00
Eli Friedman 696f53665d [AsmPrinter] Fix bit pattern for i1 vectors.
Vectors are defined to be tightly packed, regardless of the element
type.  The AsmPrinter didn't realize this, and was allocating extra
padding.

Fixes https://github.com/llvm/llvm-project/issues/49286
Fixes https://github.com/llvm/llvm-project/issues/53246
Fixes https://github.com/llvm/llvm-project/issues/55522

Differential Revision: https://reviews.llvm.org/D129164
2022-07-06 12:56:47 -07:00
Sander de Smalen 5d4f6ce229 [AArch64][SVE] Zero other lanes when doing OR reduction on unpacked predicate using ptest.
When the predicate vector is unpacked, we cannot assume anything about the
values in the other lanes. We have to make sure we use the correct
predicate where we know that the other lanes have been zeroed.

Reviewed By: RosieSumpter

Differential Revision: https://reviews.llvm.org/D129081
2022-07-06 16:12:44 +00:00
Sander de Smalen 95e08824fa [AArch64] Add support for various operations on nxv1i1 types.
The supported operations are:
* Logical operations (and, or, xor, bic)
* Logical reductions (and, or, xor, [us]min, [us]max)
* Conversions to/from svbool_t
* Predicate count (CNTP)

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D128835
2022-07-06 15:57:11 +00:00
Sander de Smalen e7db82d701 [AArch64] NFC: Fix name mangling in sve-insert-vector.ll 2022-07-06 15:57:11 +00:00
Shilei Tian 1023ddaf77 [LLVM] Add the support for fmax and fmin in atomicrmw instruction
This patch adds support for `fmax` and `fmin` operations in the `atomicrmw`
instruction. For now (at least in this patch), the instruction will be expanded
to a CAS loop. There are already a couple of targets supporting the feature; I'll
create follow-up patches to enable them accordingly.
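
For illustration, the new forms look roughly like this in IR (function and value names are made up):

```
define float @update_max(ptr %p, float %v) {
  ; returns the value stored at %p before the operation
  %old = atomicrmw fmax ptr %p, float %v seq_cst
  ret float %old
}

define float @update_min(ptr %p, float %v) {
  %old = atomicrmw fmin ptr %p, float %v seq_cst
  ret float %old
}
```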

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D127041
2022-07-06 10:57:53 -04:00
Amara Emerson b97013fd60 [AArch64][GlobalISel] Add support for sret demotion.
To do this, we need to implement a target hook and make a minor change to the
call lowering to demote arguments to sret if they can't be handled by the
calling conventions.

Fixes issue 56295

Differential Revision: https://reviews.llvm.org/D129098
2022-07-05 15:23:47 -07:00
Nikita Popov a4772cbaf0 Revert "[SimplifyCFG] Thread branches on same condition in more cases (PR54980)"
This reverts commit 4e545bdb35.

The newly added test is the third infinite combine loop caused by
this change. In this case, it's a combination of the branch to
common dest and jump threading folds that keeps peeling off loop
iterations.

The core problem here is that we ideally would not thread over
loop backedges, both because it is potentially non-profitable
(it may break canonical loop structure) and because it may result
in these kinds of loops. Unfortunately, due to the lack of a
dominator tree in SimplifyCFG, there is no good way to prevent
this. While we have LoopHeaders, this is an optional structure and
we don't do a good job of keeping it up to date. It would be fine
for a profitability check, but is not suitable for a correctness
check.

So for now I'm just giving up here, as I don't see a good way to
robustly prevent infinite combine loops.

Fixes https://github.com/llvm/llvm-project/issues/56203.
2022-07-05 16:57:46 +02:00
David Sherwood 77b13a57a9 [AArch64][SME] Add SME addha/va intrinsics
This patch adds the following new SME intrinsics:

  @llvm.aarch64.sme.addva
  @llvm.aarch64.sme.addha

Differential Revision: https://reviews.llvm.org/D127861
2022-07-05 09:47:17 +01:00
Sander de Smalen 5785717e18 [AArch64] Add support for insert/extract for nxv1i1 types.
This patch adds patterns and tests for subvector insert/extract
intrinsics to/from all legal predicate types.

Reviewed By: david-arm, kmclaughlin

Differential Revision: https://reviews.llvm.org/D128975
2022-07-04 15:54:03 +00:00
Florian Hahn f0089fae1d
[AArch64] Add additional tests for D120481. 2022-07-04 09:29:42 +01:00
David Green d100a30a54 [AArch64] Regenerate more tests. NFC
Also includes some adjustments for asm.py to handle updating more cases
successfully.
2022-07-03 15:49:16 +01:00
Sander de Smalen 690db16422 [AArch64] Make nxv1i1 types a legal type for SVE.
One motivation to add support for these types are the LD1Q/ST1Q
instructions in SME, for which we have defined a number of load/store
intrinsics which at the moment still take a `<vscale x 16 x i1>` predicate
regardless of their element type.

This patch adds basic support for the nxv1i1 type such that it can be passed/returned
from functions, as well as some basic support to support some existing tests that
result in a nxv1i1 type. It also adds support for splats.

Other operations (e.g. insert/extract subvector, logical ops, etc) will be
supported in follow-up patches.

Reviewed By: paulwalker-arm, efriedma

Differential Revision: https://reviews.llvm.org/D128665
2022-07-01 15:11:13 +00:00
Matt Devereau 5166345f50 [SVE][AArch64] Refine hasSVEArgsOrReturn
As described in aapcs64 (https://github.com/ARM-software/abi-aa/blob/2022Q1/aapcs64/aapcs64.rst#scalable-vector-registers)
AAVPCS is used only when registers z0-z7 take an SVE argument. This fixes the case where floats occupy the lower bits
of registers z0-z7 but SVE arguments in registers greater than z7 cause a function to use AAVPCS where it should use AAPCS.

Moving SVE function deduction from AArch64RegisterInfo::hasSVEArgsOrReturn to AArch64TargetLowering::LowerFormalArguments
where physical register lowering is more accurate fixes this.

Differential Revision: https://reviews.llvm.org/D127209
2022-07-01 13:24:55 +00:00
Paul Walker 43f8a6b749 [SVE] Use CPY to zero active lanes of a floating point vector.
Patterns exist for the integer case that are trivially expandable
to cover 0.0f.

Differential Revision: https://reviews.llvm.org/D128669
2022-07-01 00:59:00 +01:00
Paul Walker 2be4a7a209 [SVE] Extend "and(ipg,cmp(x,y))" patterns to cover the case when y is an immediate.
Differential Revision: https://reviews.llvm.org/D128479
2022-07-01 00:56:22 +01:00
Fangrui Song 45f3a5aae7 [AArch64] Add target feature "all"
This is used by disassemblers: `llvm-mc -disassemble -mattr=` and `llvm-objdump --mattr=`.
The main use case is for llvm-objdump to disassemble all known instructions
(D128030).

In user-facing tools, "all" is intentionally not supported in producers:
integrated assembler (`.arch_extension all`), clang -march (`-march=armv9.3a+all`).
Due to the code structure, `llvm-mc -mattr=+all` and `llc -mattr=+all` are not
rejected (they are internal tools). Add `llvm/test/CodeGen/AArch64/mattr-all.ll`
to catch behavior changes.

AArch64SysReg::SysReg::haveFeatures: check `FeatureAll` to print
`AArch64SysReg::SysReg::AltName` for some system registers (e.g. `ERRIDR_EL1, RNDR`).

AArch64.td: add `AssemblerPredicateWithAll` to additionally test `FeatureAll`.
Change all `AssemblerPredicate` (except `UseNegativeImmediates`) to `AssemblerPredicateWithAll`.

utils/TableGen/{DecoderEmitter,SubtargetFeatureInfo}.cpp: support arbitrarily
nested all_of, any_of, and not.

Note: A predicate supports all_of, any_of, and not. For a target (though
currently not for AArch64) an encoding may be disassembled differently with
different target features.
Note: AArch64MCCodeEmitter::computeAvailableFeatures is not available to
the disassembler.

Reviewed By: peter.smith, lenary

Differential Revision: https://reviews.llvm.org/D128029
2022-06-30 10:37:58 -07:00
Fangrui Song ab2e1c0804 [AArch64] Make FeatureFuseAdrpAdd a tune feature
Update D120104 to add FeatureFuseAdrpAdd to Processor#TuneFeatures
instead of Processor#Features, similar to FeatureFuseAES, and matching
Tune*.

This enables FeatureFuseAdrpAdd for `clang -mcpu=xxx -mtune=generic` even
if xxx does not set FeatureFuseAdrpAdd.

Reviewed By: alexander-shaposhnikov, peter.smith

Differential Revision: https://reviews.llvm.org/D128787
2022-06-30 10:32:38 -07:00
Bradley Smith 424b2ae9ab [AArch64][SVE] Match (add x (urshr/srshr y c)) -> ursra/srsra x y c
Differential Revision: https://reviews.llvm.org/D128447
2022-06-29 12:10:50 +00:00
Guozhi Wei 2fcc495549 [AArch64] Update test case.
Replace the newly generated virtual register number with a macro to avoid
name mismatches due to different compiler configurations.
2022-06-29 01:37:56 +00:00
Guozhi Wei ddc9e8861c [MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce latency
Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance
to evaluate which instruction sequence has lower latency.
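
The rewrite itself is matched on machine instructions; plain IR is used below only to illustrate the algebra (names are illustrative):

```
define i32 @f(i32 %a, i32 %b, i32 %c) {
  %t = add i32 %b, %c
  %r = sub i32 %a, %t        ; A - (B + C)
  ret i32 %r
}
; may instead be evaluated as the alternative sequence:
;   %t = sub i32 %a, %b      ; (A - B)
;   %r = sub i32 %t, %c      ; (A - B) - C
```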

Differential Revision: https://reviews.llvm.org/D124564
2022-06-28 21:42:51 +00:00
Egor Zhdan be4b40d5bc [MC] Allow annotating custom sections as zerofill
This is already possible for e.g. `cstring_literals`, but the entry for zerofill was unnamed.

rdar://90336380

Differential Revision: https://reviews.llvm.org/D128654
2022-06-28 15:08:47 +01:00
Sander de Smalen fbefc62a96 [AArch64][SME] Sink tile offset operands into the loop for load/store instructions.
This helps ISel decompose the generic offset for the tile into a base + offset.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D128508
2022-06-28 10:28:36 +01:00
David Sherwood 054faac9f9 [AArch64][SME] Add SVE2 psel, uclamp, sclamp and revd IR intrinsics
When the SME feature is enabled we also gain access to a few extra
SVE2 instructions. This patch adds LLVM IR intrinsics to make use
of these new instructions:

  @llvm.aarch64.sve.psel
  @llvm.aarch64.sve.revd
  @llvm.aarch64.sve.sclamp
  @llvm.aarch64.sve.uclamp

Differential Revision: https://reviews.llvm.org/D128332
2022-06-28 10:25:06 +01:00
Sander de Smalen 180cc74de9 [AArch64] Update SME load/store intrinsics to work on opaque pointers.
These intrinsics should be able to use opaque pointers, because the
load/store type is already encoded in their names and return/operand type.

Reviewed By: c-rhodes

Differential Revision: https://reviews.llvm.org/D128505
2022-06-28 09:50:11 +01:00
David Sherwood f916ee0fb1 [AArch64][SME] Add SME outer product intrinsics
This patch adds the following intrinsics to support the SME ACLE:

  * @llvm.aarch64.sme.mopa: Non-widening outer product + accumulate
  * @llvm.aarch64.sme.mops: Non-widening outer product + subtract
  * @llvm.aarch64.sme.mopa.wide: Widening outer product + accumulate
  * @llvm.aarch64.sme.mops.wide: Widening outer product + subtract
  * @llvm.aarch64.sme.smopa.wide: Widening signed sum of outer product + accumulate
  * @llvm.aarch64.sme.smops.wide: Widening signed sum of outer product + subtract
  * @llvm.aarch64.sme.umopa.wide: Widening unsigned sum of outer product + accumulate
  * @llvm.aarch64.sme.umops.wide: Widening unsigned sum of outer product + subtract
  * @llvm.aarch64.sme.sumopa.wide: Widening signed by unsigned sum of outer product + accumulate
  * @llvm.aarch64.sme.sumops.wide: Widening signed by unsigned sum of outer product + subtract
  * @llvm.aarch64.sme.usmopa.wide: Widening unsigned by signed sum of outer product + accumulate
  * @llvm.aarch64.sme.usmops.wide: Widening unsigned by signed sum of outer product + subtract

Differential Revision: https://reviews.llvm.org/D127956
2022-06-28 09:41:44 +01:00
Paul Walker 049e107139 [NFC][SVE] Add more tests of vector compares and selects taking an immediate operand.
Increases coverage of predicated compares (int and fp) along with
predicated zeroing of active floating point lanes.
2022-06-27 18:22:40 +01:00
Edd Barrett 94fbb147c8
[STACKMAPS] Document+test UINT64_MAX stack size.
When a function does a dynamic stack allocation, the function's stack
size (in the stack map) is reported as UINT64_MAX.

This change tests and documents this property.

Differential Revision: https://reviews.llvm.org/D128525
2022-06-27 11:57:07 +01:00
Bradley Smith a83aa33d1b [IR] Move vector.insert/vector.extract out of experimental namespace
These intrinsics are now fundamental for SVE code generation and have been
present for a year and a half, hence move them out of the experimental
namespace.
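
A small sketch of the renamed intrinsics; the concrete nxv4i32/v4i32 instantiation is only an example:

```
define <vscale x 4 x i32> @insert_lo(<vscale x 4 x i32> %vec, <4 x i32> %sub) {
  ; insert the fixed-width subvector at element offset 0 of the scalable vector
  %r = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> %vec, <4 x i32> %sub, i64 0)
  ret <vscale x 4 x i32> %r
}
declare <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64)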

Differential Revision: https://reviews.llvm.org/D127976
2022-06-27 10:48:45 +00:00
Paul Walker fadea4413e [NFC][SVE] Auto-generate CHECK lines for intrinsic codegen tests. 2022-06-27 00:07:00 +01:00
Matt Arsenault e7bc73739a GlobalISel: Make LoadStoreOpt preserve all
Avoids dropping CSE info analysis
2022-06-25 09:24:54 -04:00
David Green 03c65c0d32 [AArch64] Convert vector add(ext, ext) into ext(add(ext, ext))
Given a vector add or sub from extends that needs more than one 'step'
(i.e. i8 to i32 or i16 to i64), we can transform the sequence to
sext(add(ext, ext)), to allow the add(ext, ext) to become a single uaddl
and a larger extend, producing fewer instructions in total.
https://alive2.llvm.org/ce/z/S2T4k-
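
A rough before/after sketch in IR, with i8 -> i32 and an element count chosen only for illustration:

```
define <8 x i32> @addl(<8 x i8> %a, <8 x i8> %b) {
  %sa = sext <8 x i8> %a to <8 x i32>
  %sb = sext <8 x i8> %b to <8 x i32>
  %r = add <8 x i32> %sa, %sb
  ret <8 x i32> %r
}
; is treated as the equivalent two-step form, where the inner add maps to a single saddl
; (the i16 add cannot overflow for sign-extended i8 inputs):
;   %sa  = sext <8 x i8> %a to <8 x i16>
;   %sb  = sext <8 x i8> %b to <8 x i16>
;   %add = add <8 x i16> %sa, %sb
;   %r   = sext <8 x i16> %add to <8 x i32>
```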

Differential Revision: https://reviews.llvm.org/D128426
2022-06-24 10:04:28 +01:00
David Green ae7fbcd199 [AArch64] Add addition extend of add/sub neon tests. NFC 2022-06-24 09:41:10 +01:00
Matt Devereau cc3ef26f60 [AArch64][SVE] Add sve.dupq.lane(insert(constant vector, 0), 0) ld1rq tests 2022-06-24 07:40:31 +00:00
Lian Wang 770fe864fe [SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D128239
2022-06-24 02:32:53 +00:00
Bradley Smith 6f27df5084 [AArch64][SVE] Match (add x (lsr/asr y c)) -> usra/ssra x y c
Differential Revision: https://reviews.llvm.org/D128045
2022-06-23 14:56:21 +00:00
Florian Mayer 9320a32bb9 [MTE] [HWASan] Use LoopInfo for reachability queries.
The reachability queries default to "reachable" after exploring too many
basic blocks. LoopInfo helps it skip over the whole loop.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D127917
2022-06-22 15:28:49 -07:00
Florian Mayer 476ced4b89 [MTE] [HWASan] Support diamond lifetimes.
We were overly conservative and required a ret statement to be dominated
completely by a single lifetime.end marker. This is quite restrictive
and leads to two problems:

* limits coverage of use-after-scope, as we degenerate to
  use-after-return;
* increases stack usage in programs, as we have to remove all lifetime
  markers if we degenerate to use-after-return, which prevents
  reuse of stack slots by the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D127905
2022-06-22 11:16:34 -07:00
David Sherwood aa0a413df8 [AArch64][SME] Add some SME PSTATE setting/query intrinsics
This patch adds support for:

* Querying the PSTATE.SM state with @llvm.aarch64.sme.get.pstatesm
* Reading/writing the TPIDR2 register with new
@llvm.aarch64.sme.get.tpidr2 and @llvm.aarch64.sme.set.tpidr2
intrinsics.

Tests added here:

  CodeGen/AArch64/sme-get-pstatesm.ll
  CodeGen/AArch64/sme-read-write-tpidr2.ll

Differential Revision: https://reviews.llvm.org/D127957
2022-06-22 10:26:45 +01:00
Paul Walker 696169a35d [SVE] Add isel patterns that match "FpImm - A" to the immediate form of FSUBR.
Differential Revision: https://reviews.llvm.org/D128200
2022-06-22 00:11:24 +01:00
Paul Walker 7b285ae0e8 [SVE] Lower "unpredicated" sabd/uabd intrinsics to ISD::ABDS/U.
This enables an existing transformation that when combined with an
add will emit saba/uaba instructions.

Differential Revision: https://reviews.llvm.org/D128198
2022-06-22 00:02:51 +01:00
Martin Sebor b19194c032 [InstCombine] handle subobjects of constant aggregates
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code) and allow them to also fold calls involving subobjects of
constant aggregates such as member arrays.
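
A hypothetical example of the kind of call the folders could not handle before; the global and names are made up:

```
@g = constant { [4 x i8], i32 } { [4 x i8] c"ab\00\00", i32 1 }

define i64 @fold_me() {
  ; pointer to the member array inside the constant aggregate
  %p = getelementptr inbounds { [4 x i8], i32 }, ptr @g, i64 0, i32 0, i64 0
  %n = call i64 @strlen(ptr %p)      ; can now be folded to 2
  ret i64 %n
}
declare i64 @strlen(ptr)
```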
2022-06-21 11:55:14 -06:00
David Green 3f81841474 [AArch64] Add Extract(DUP(C)) as a canonical constant.
As a followup to D128144, this adds extract(DUP(C)) as a canonical
constant to prevent it being transformed back into a BUILD_VECTOR,
leading to an infinite loop.
2022-06-21 09:51:22 +01:00
Serguei Katkov 163c77b2e0 [AARCH64 folding] Do not fold any copy with NZCV
There is no instruction to fold NZCV, so just do not do it.

Without the fix the added test case crashes with an assert
"Mismatched register size in non subreg COPY"

Reviewed By: danilaml
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127294
2022-06-21 10:38:49 +07:00
Luo, Yuanke 44e8a205f4 [fastregalloc] Enhance the heuristics for liveout in self loop.
For the case below, the virtual register is defined twice in the self loop. We
don't need to spill %0 after the third instruction `%0 = def (tied %0)`,
because it is already defined by the second instruction `%0 = def`.

1 bb.1
2 %0 = def
3 %0 = def (tied %0)
4 ...
5 jmp bb.1

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D125079
2022-06-21 09:18:49 +08:00
David Green c0ecbfa4fd [AArch64] Known bits for AArch64ISD::DUP
An AArch64ISD::DUP is just a splat, where the known bits for each lane
are the same as the input. This teaches that to computeKnownBitsForTargetNode.

Problems arise for constants though, as a constant BUILD_VECTOR can be
lowered to an AArch64ISD::DUP, which SimplifyDemandedBits would then
turn back into a constant BUILD_VECTOR leading to an infinite cycle.
This has been prevented by adding a isTargetCanonicalConstantNode node
to prevent the conversion back into a BUILD_VECTOR.

Differential Revision: https://reviews.llvm.org/D128144
2022-06-20 19:11:57 +01:00
David Sherwood 013358632e [AArch64][SME] Add the zero intrinsic
The SME zero instruction takes a mask as an input declaring which
64-bit element tiles should be zeroed. There is a 1:1 mapping
between the zero intrinsic and the instruction, however we also
want to make the register allocator aware that some tile
registers are being written to.

We can actually just use the custom inserter for a pseudo instruction
to correctly mark all the appropriate registers in the mask as
implicitly defined by the operation.

 Differential Revision: https://reviews.llvm.org/D127843
2022-06-20 14:27:59 +01:00
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
Lian Wang ab25e263a9 [SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D127710
2022-06-20 06:30:26 +00:00
Simon Pilgrim 1ebe5cac46 [DAG] SimplifyDemandedBits - add DemandedElts handling to ISD::SIGN_EXTEND_INREG simplification 2022-06-19 15:35:29 +01:00
Sanjay Patel f126643862 [AArch64] add tests for masked subtract; NFC 2022-06-17 14:56:32 -04:00
Paul Walker 0e21f1d56a [SelectionDAG] Extend WidenVecOp_INSERT_SUBVECTOR to cover more cases.
WidenVecOp_INSERT_SUBVECTOR only supported cases where widening
effectively converts the insert into a copy.  However, when the
widened subvector is no bigger than the vector being inserted into
and we can be sure there's no loss of data, we can simply emit
another INSERT_SUBVECTOR.

Fixes: #54982

Differential Revision: https://reviews.llvm.org/D127508
2022-06-17 12:39:42 +00:00
Paul Walker fcd058acc9 [SVE][CodeGen] Restructure SVE fixed length tests to use update_llc_test_checks.
Most tests have been updated to make use of vscale_range to reduce
the number of RUN lines.  For the remaining RUN lines the check
prefixes have been updated to ensure the original expectation of
the manual CHECK lines is maintained after update_llc_test_checks
is run.
2022-06-17 00:30:56 +01:00
David Green 527b8ccde5 [AArch64] Regenerate 3 codegen test files. NFC 2022-06-16 18:23:05 +01:00
Adrian Tong 55311801f0 Allow bitwidth difference when checking for isOneOrOneSplat.
This helps handle a case where the BUILD_VECTOR has i16 element type
and i32 constant operands

t2: v8i16 = setcc t8, t17, setult:ch
t3: v8i16 = BUILD_VECTOR Constant:i32<1>, ...
   t4: v8i16 = and t2, t3
      t5: v8i16 = add t8, t4

This can be turned into t5: v8i16 = sub t8, t2, and allows us to remove
t3 and t4 from the DAG.

Differential Revision: https://reviews.llvm.org/D127354
2022-06-16 16:04:20 +00:00
David Sherwood 6f6fa5aa10 [AArch64][SME] Add SME cntsb/h/w/d intrinsics
These intrinsics return the number of elements in a streaming
vector, for example aarch64.sme.cntsw returns the number of
32-bit elements. When in streaming mode these are equivalent
to aarch64.sve.cntb/h/w/d with an input value of 1.

I have implemented these intrinsics using the rdsvl instruction
and added tests here:

  CodeGen/AArch64/SME/sme-intrinsics-rdsvl.ll

Differential Revision: https://reviews.llvm.org/D127853
2022-06-16 10:50:25 +01:00
Simon Pilgrim adfcdb0d0d [AArch64] Add test case from D127354 2022-06-15 12:21:00 +01:00
David Sherwood db7061e2ca [NFC] Move tests CodeGen/AArch64/SME/sme-* -> CodeGen/AArch64/sme-* 2022-06-15 11:10:29 +01:00
David Sherwood 5fa2416ea0 [AArch64][SME] Add SME read/write intrinsics that map to the mova instruction
This patch adds implementations for the read/write SME ACLE intrinsics:

  @llvm.aarch64.sme.read.horiz
  @llvm.aarch64.sme.read.vert
  @llvm.aarch64.sme.write.horiz
  @llvm.aarch64.sme.write.vert

These all map to the SME mova instruction.

Differential Revision: https://reviews.llvm.org/D127414
2022-06-15 10:31:07 +01:00
David Sherwood bd61664167 [AArch64][SME] Add ldr/str (fill/spill) intrinsics
This patch adds implementations for the fill/spill SME ACLE intrinsics:

    @llvm.aarch64.sme.ldr
    @llvm.aarch64.sme.str

Differential Revision: https://reviews.llvm.org/D127317
2022-06-14 13:58:22 +01:00
Florian Hahn e5c4308ba1
[InterleavedLoadComb] Rename uses when inserting new uses.
This fixes a crash due to uses needing to be renamed.
2022-06-14 13:15:23 +01:00
Rosie Sumpter 2c4e44752d [AArch64][SME] Add load/store intrinsics
This patch adds implementations for the load/store SME ACLE intrinsics:
  - @llvm.aarch64.sme.ld1*
  - @llvm.aarch64.sme.st1*

Differential Revision: https://reviews.llvm.org/D127210
2022-06-14 11:11:22 +01:00
Serguei Katkov 095bf6be28 [Greedy RegAlloc] Fix the handling of split register in last chance re-coloring.
This is a fix for https://github.com/llvm/llvm-project/issues/55827.

When the register we are trying to re-color is split, the original register (which we tried to recover)
has no uses after the split. However, in the rollback actions we assign the physical register back to it.
Later this causes various assertions; one of them is in the attached test.

This CL fixes this by avoiding assigning the physical register back to a register which has no uses
or whose live interval is now empty.

Reviewed By: arsenm, qcolombet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D127281
2022-06-14 12:04:17 +07:00
Amaury Séchet 9ecf423453 [AArch64] Autogenerate sve-fixed-length tests. NFC
As per title. This makes it easier to work on changes that require "shotgun diffs" over the codebase.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D127118
2022-06-13 12:50:07 +00:00
Simon Pilgrim 7d8fd4f5db [DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere
Another issue unearthed by D127115

We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with).

D127115 makes this particularly difficult as we're almost guaranteed to have lost the sequence before all possible insertions have been folded.

This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before it's too late.

Differential Revision: https://reviews.llvm.org/D127595
2022-06-13 11:48:18 +01:00
zhongyunde 3cefcdb8c6 [test] Add test for D126700 NFC 2022-06-13 18:37:29 +08:00
David Green dbac0e83d1 [AArch64] Mark smull and umull as commutative. 2022-06-13 09:24:15 +01:00
David Green 963c0a0147 [AArch64] Look through bitcast when looking for extract_high subvector
Since D61806, DAGCombiner has folded subvector_extract(bitcast(..)) to
bitcast(subvector_extract(..)), which would place a bitcast between a
subvector_extract and the operation that could be converted to a high
neon instruction (like smull2). This adds better matching for the
subvector_extract, through the tablegen extract_high PatFrags to
optionally skip the bitcast under little endian, still matching an
extract of the high half of the input vector.

I didn't update the extract_high-of-a-duplicate patterns, as the
ComplexPattern needs named operands. I did add an extract_high_dup_v8i16
PatFrag to abstract away the common code, which can be extended in a
future patch.

Differential Revision: https://reviews.llvm.org/D126782
2022-06-12 10:59:09 +01:00
Amaury Séchet 982f65a68e Autogenerate sve-fixed-length-frame-offests-crash.ll . NFC 2022-06-12 01:54:10 +00:00
Amaury Séchet d35da7f78a Autogenerate sve-fixed-length-bitselect.ll . NFC 2022-06-12 01:53:05 +00:00
Simon Pilgrim 44a0cd25df [DAG] visitINSERT_VECTOR_ELT - add <1 x ???> insert_vector_elt(v0,extract_vector_elt(v1,0),0) special case handling
Check if we're just replacing one v1x?? vector with another
2022-06-11 19:30:00 +01:00
David Green 338fd211e7 [AArch64] Generate FADDP from shuffled fadd
As a follow up to D126686, this does the same fold for floating point
add and shuffle. In this case it is limited to reassoc either x[0]+x[1]
or x[1]+x[0] for both result[0] and result[1].

Differential Revision: https://reviews.llvm.org/D127087
2022-06-11 14:16:37 +01:00
David Green 82fcd7397a [AArch64] Add extra faddp codegen tests. NFC 2022-06-11 12:57:48 +01:00
Paul Walker 10d55c4634 [SelectionDAG] Remove invalid TypeSize conversion from WidenVecOp_BITCAST.
Differential Revision: https://reviews.llvm.org/D127322
2022-06-11 10:41:13 +01:00
Eli Friedman 0ff51d5dde Fix interaction of CFI instructions with MachineOutliner.
1. When checking if a candidate contains a CFI instruction, actually
iterate over all of the instructions, instead of stopping halfway
through.
2. Make sure copied CFI directives refer to the correct instruction.

Fixes https://github.com/llvm/llvm-project/issues/55842

Differential Revision: https://reviews.llvm.org/D126930
2022-06-10 13:37:49 -07:00
Guillaume Chatelet 38637ee477 [clang] Add support for __builtin_memset_inline
In the same spirit as D73543 and in reply to https://reviews.llvm.org/D126768#3549920 this patch is adding support for `__builtin_memset_inline`.

The idea is to get support from the compiler to easily write efficient memory function implementations.

This patch could be split in two:
 - one for the LLVM part adding the `llvm.memset.inline.*` intrinsics.
 - and another one for the Clang part providing the intrinsic as a builtin.
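
For reference, a minimal sketch of how the LLVM-side intrinsic might be used; the length of 32 is arbitrary:

```
declare void @llvm.memset.inline.p0.i64(ptr, i8, i64, i1)

define void @zero32(ptr %dst) {
  ; the length must be a compile-time constant and the call is never turned into a libc memset call
  call void @llvm.memset.inline.p0.i64(ptr %dst, i8 0, i64 32, i1 false)
  ret void
}
```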

Differential Revision: https://reviews.llvm.org/D126903
2022-06-10 13:13:59 +00:00
Ahmed Bougacha c68b469e07 [AArch64][SVE] Don't crash on pre-legalizer types in extload combine.
This was assuming the vector types were MVTs, but they don't have to be.

Note that the concrete output of the test isn't very useful, since it's
dominated by nonsensical calling convention lowering for the weird types.

Differential Revision: https://reviews.llvm.org/D126505
2022-06-09 10:33:21 -07:00
Guillaume Chatelet dc3367970e [SelectionDAG] Handle bzero/memset libcalls globally instead of per target
Differential Revision: https://reviews.llvm.org/D127279
2022-06-09 08:34:55 +00:00
Florian Mayer 0593ce5f0b [MC] Add 'G' to augmentation string for MTE instrumented functions
This was agreed on in
https://lists.llvm.org/pipermail/llvm-dev/2020-May/141345.html

The thread proposed two options:
* add a character to the augmentation string and handle it in libunwind
* use a separate personality function.

It was determined that the former is the simpler and better option.

This is part of ARM's Aarch64 ABI:
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id22

The next step after this is teaching libunwind to untag when this
augmentation character is set.

Reviewed By: MaskRay, eugenis

Differential Revision: https://reviews.llvm.org/D127007
2022-06-08 12:36:32 -07:00
David Green a1aef4f374 [AArch64] Remove ToBeRemoved from AArch64MIPeepholeOpt
The ToBeRemoved is used to remove any MachineInstructions that are no
longer needed, making sure we don't invalidate the iterator that is
currently in use by erasing the instruction straight away. This causes
issues for keeping the code in SSA form though, where subsequent
transforms that require SSA form may have been broken by previous
peepholes.

If, instead, we use make_early_inc_range the iteration issue shouldn't
be present, so long as we do not remove the subsequent instruction in
the peephole optimizations. That way the code between transforms is kept
in SSA form, meaning hopefully fewer things can go wrong.

Differential Revision: https://reviews.llvm.org/D127296
2022-06-08 17:26:07 +01:00
David Green 33ead6e444 [AArch64] Add tests for bitcast high register extracts. NFC 2022-06-08 15:26:31 +01:00
Paul Walker d88354213c [SelectionDAG] Remove invalid TypeSize conversion from PromoteIntRes_BITCAST.
Extend the TypeWidenVector case of PromoteIntRes_BITCAST to work
with TypeSize directly rather than silently casting to unsigned.

To accomplish this I've extended TypeSize with an interface that
essentially allows TypeSize division when both operands have the
same number of dimensions.

There still exists combinations of scalable vector bitcasts that
cause compiler crashes. I call these out by adding "is missing"
entries to sve-bitcast.

Depends on D126957.
Fixes: #55114

Differential Revision: https://reviews.llvm.org/D127126
2022-06-08 10:30:07 +01:00
Paul Walker a1121c31d8 [SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?

Differential Revision: https://reviews.llvm.org/D126957
2022-06-08 10:30:07 +01:00
David Green bccbf5276e [AArch64] Remove isDef32
isDef32 would attempt to make a guess at which SelectionDag nodes were
32bit sources, and use the nature of 32bit AArch64 instructions
implicitly zeroing the upper register half to not emit zext that were
expected to already be zero. This was a bit fragile though, needing to
guess at the correct opcodes that do not become 32bit defs later in
ISel.

This patch removed isDef32, relying on the AArch64MIPeephole optimizer
to remove redundant SUBREG_TO_REG nodes. A part of
SelectArithExtendedRegister was left with the same logic as a heuristic
to prevent some regressions from it picking less optimal sequences.
The AArch64MIPeepholeOpt pass also needs to be taught that a COPY from a
FPR will become a FMOVSWr, which it lowers immediately to make sure that
remains true through register allocation.

Fixes #55833

Differential Revision: https://reviews.llvm.org/D127154
2022-06-07 18:57:59 +01:00
Matt Arsenault 56303223ac llvm-reduce: Don't assert on functions which don't track liveness
Use the query that doesn't assert if TracksLiveness isn't set, which
needs to always be available. We also need to start printing liveins
regardless of TracksLiveness.
2022-06-07 10:00:25 -04:00
David Green 6468feaeac [AArch64] Regenerate arm64-shifted-sext.ll and add a test from #55833. NFC 2022-06-07 13:55:53 +01:00
Michael Kitzan b7fcf6632f [GISel] Add new combines for G_ADD
Patch adds new GICombineRules for G_ADD:

G_ADD(x, G_SUB(y, x)) -> y
G_ADD(G_SUB(y, x), x) -> y
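
In IR terms (the combine itself operates on generic MachineIR), the folds correspond to:

```
define i32 @fold(i32 %x, i32 %y) {
  %s = sub i32 %y, %x
  %r = add i32 %x, %s     ; x + (y - x) simplifies to y
  ret i32 %r
}
```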

Patch additionally adds new combine tests for AArch64 target for
these new rules.

Reviewed by: paquette

Differential Revision: https://reviews.llvm.org/D87936
2022-06-06 11:19:45 -07:00
David Green 4ea1b43527 [AArch64] Generate ADDP from shuffled add
This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>), into
shuffle(addp(x), <0,0,1,1,2,2,..>. The ADDP instruction takes two
vectors and returns one, adding adjacent pairs. So we match x in a
custom combine as it is lowered from a v8i32. The original code
would be 2 rev64 and 2 add, with the new code being a single addp
with a zip1;zip2 shuffle, producing smaller code.
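
Roughly the kind of input being matched, shown at the IR level; the v8i32 type follows the description above and the names are illustrative:

```
define <8 x i32> @pairwise_add(<8 x i32> %x) {
  ; swap adjacent pairs, then add to the original vector
  %rev = shufflevector <8 x i32> %x, <8 x i32> poison, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 5, i32 4, i32 7, i32 6>
  %add = add <8 x i32> %x, %rev
  ret <8 x i32> %add
}
```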

Differential Revision: https://reviews.llvm.org/D126686
2022-06-06 11:39:51 +01:00
Paul Walker 2dde272db7 [SVE] Refactor sve-bitcast.ll to include all combinations for legal types.
Patch enables custom lowering for MVT::nxv4bf16 because otherwise
the refactored test file triggers a selection failure.

The reason for the refactoring is to highlight cases where the
generated code is wrong.
2022-06-03 12:09:19 +01:00
David Green 79e3b043e5 [AArch64] Add extra addp codegen tests. NFC 2022-06-03 11:36:40 +01:00
Serguei Katkov 24e16e4af2 [SSAUpdaterImpl] Do not generate phi node with all the same incoming values
If all the values available to a basic block are the same, do not build a new phi node and
just use that value.

Reviewed By: sameerds
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D126525
2022-06-03 12:24:33 +07:00
Serguei Katkov c4d955dd7f [MachineSSAUpdate] Add a test for redundant phi generation. 2022-06-03 11:27:14 +07:00
Paul Walker 48ea26a387 [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.
LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering
scalable vector floating point INSERT_SUBVECTOR. However, these
nodes only make sense for integer types and thus isel patterns do
not exist for floating point, which leads to isel failures.

This patch ensures floating point operands are cast to integer
before the core lowering takes place.

Fixes: #55037

Differential Revision: https://reviews.llvm.org/D126487
2022-06-02 14:51:04 +01:00
Nikita Popov 41d5033eb1 [IR] Enable opaque pointers by default
This enables opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.
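
For example, a load that previously spelled out the pointee type in the pointer type is now written with `ptr` (minimal illustrative snippet):

```
; typed-pointer form (only accepted with -opaque-pointers=0):
;   define i32 @load1(i32* %p) {
;     %v = load i32, i32* %p
;     ret i32 %v
;   }
; opaque-pointer form (the new default):
define i32 @load1(ptr %p) {
  %v = load i32, ptr %p
  ret i32 %v
}
```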

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Hendrik Greving a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c302.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving 430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Fangrui Song 873d2aff42 [AArch64][test] Replace -march with -mtriple for llc RUN lines
-march is error-prone: -march inherits the OS and environment from the default
target triple. Use -mtriple which is more common.
2022-05-31 22:39:43 -07:00
Alexander Shaposhnikov a72cc958a3 [CodeGen][AArch64] Add support for LDAPR
This diff adds support for LDAPR (RCPC extension)
(https://github.com/llvm/llvm-project/issues/55561).

Differential revision: https://reviews.llvm.org/D126250

Test plan: ninja check-all
2022-05-31 21:40:50 +00:00
Sander de Smalen 9c38fc111b [AArch64] Remove references to Streaming SVE from target features.
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as its higher level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977
2022-05-31 16:25:01 +02:00
David Green 5cb14dc5a3 [AArch64] Look through copy in MachineCombiner FMUL patterns.
This is a small addition to D99662, which added machine combiner
patterns for FMUL(DUP(..)). Due to the way these are generated from
ISel, they may also be FMUL(COPY(DUP(..))), which this patch now
ignores the no-op COPY in.

Differential Revision: https://reviews.llvm.org/D126632
2022-05-31 09:28:00 +01:00
Edd Barrett d245974e1a Test stackmap support for floating point types.
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).

One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
2022-05-30 10:49:32 +01:00
David Green 99b0078064 [AArch64] Tests for showing MachineCombiner COPY patterns. NFC 2022-05-30 10:47:44 +01:00
David Green 9a3144d078 [AArch64] Reuse larger DUP if available
If both a v2i32 DUP(x) and a v4i32 DUP(x) node exist, we can re-use the
larger node using a vector extract to obtain the smaller. This comes up
in the smull/smlal code, but needs a small fixup to allow the smull2
code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still
match smull2 extracts.

Differential Revision: https://reviews.llvm.org/D126449
2022-05-29 19:42:13 +01:00
Serge Pavlov bdd0093f4d [GlobalISel] Add G_IS_FPCLASS
Add a generic opcode to represent `llvm.is_fpclass` intrinsic.

Differential Revision: https://reviews.llvm.org/D121454
2022-05-27 13:49:47 +07:00
Rahman Lavaee 3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c5.
2022-05-26 18:45:40 -07:00
Rahman Lavaee 4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on the PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, with `-Wl,-keep-text-section-prefix=true`, Propeller cannot enforce a global section ordering, as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Adrian Tong 7c13ae6490 Give option to use isCopyInstr to determine which MI is
treated as Copy instruction in MCP.

This is then used in AArch64 to remove copy instructions after taildup
ran in machine block placement

Differential Revision: https://reviews.llvm.org/D125335
2022-05-26 18:43:16 +00:00
Chen Zheng d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf and fix module build
failure:
1: remove MachineCycleInfoWrapperPass from MachinePassRegistry.def.
   MachineCycleInfoWrapperPass is an analysis pass and should not be there.
2: move the definition of MachineCycleInfoPrinterPass to the cpp file.

Otherwise, there is a module conflict for MachineCycleInfoWrapperPass
between MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf.

MachineCycle can handle irreducible loops. Natural loop
analysis (MachineLoop) cannot return the correct loop depth if
the loop is irreducible, and MachineSink is sensitive
to the loop depth; see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loops better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Chen Zheng 80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf.
Cause build failure on lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Paul Walker 6f215ca680 [SelectionDAG] Add support to widen ISD::STEP_VECTOR operations.
Fixes: #55165

Differential Revision: https://reviews.llvm.org/D126168
2022-05-24 22:42:37 +01:00
Chen Zheng 62a9b36fcf [MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loops. Natural loop
analysis (MachineLoop) cannot return the correct loop depth if
the loop is irreducible, and MachineSink is sensitive
to the loop depth; see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loops better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-24 01:16:19 -04:00
Craig Topper 569d8945f3 [DAGCombiner][AArch64] Don't fold (smulo x, 2) -> (saddo x, x) if VT is i2.
If the VT is i2, then 2 is really -2.

The test has not been committed yet, but the diff shows the change.

Fixes PR55644.

Differential Revision: https://reviews.llvm.org/D126213
2022-05-23 11:13:57 -07:00
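A standalone C++ sketch (illustrative values, not the DAG code) of why the fold is wrong for i2: the constant bit pattern 0b10 is -2, so the multiply and the doubled add disagree on overflow:

```
#include <cstdio>

int main() {
  // i2 holds values in [-2, 1]; the constant 0b10 is -2, not 2.
  for (int x = -2; x <= 1; ++x) {
    int mul = x * -2;  // what (smulo x, 0b10) really computes
    int add = x + x;   // what (saddo x, x) computes
    bool mulOv = mul < -2 || mul > 1;
    bool addOv = add < -2 || add > 1;
    printf("x=%2d  smulo overflow=%d  saddo overflow=%d\n", x, mulOv, addOv);
  }
  return 0; // e.g. x=1: the multiply does not overflow but the add does
}
```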
Craig Topper 75eb0576de [AArch64] Add test case for pr55644. NFC 2022-05-23 11:13:57 -07:00
Edd Barrett c5e5cf1258 Test stackmap support for i128
This diff adds tests that check the currently-working stackmap cases for i128.
This will help ensure no regressions are later introduced by D125680 (when
ready).

Note that i128 stackmap support is currently incomplete, so we can't test all
i128 functionality:

    i128 constants >= 2^{63} crash LLVM
    non-constant i128s crash LLVM

So this change tests only constant i128 operands of value < 2^{63}.

A couple of incorrect comments are also fixed.
2022-05-23 11:56:24 +01:00
Simon Pilgrim dd231f02a3 [AArch64] Regenerate andandshift.ll test checks 2022-05-23 11:48:24 +01:00
Andre Vieira 572fc7d2fd [AArch64] Order STP Q's by ascending address
This patch adds an AArch64 specific PostRA MachineScheduler to try to schedule
STP Q's to the same base-address in ascending order of offsets. We have found
this to improve performance on Neoverse N1 and should not hurt other AArch64
cores.

Differential Revision: https://reviews.llvm.org/D125377
2022-05-23 09:50:44 +01:00
Florian Hahn 0cc981e021 [AArch64] implement isReassocProfitable, disable for (u|s)mlal.
Currently reassociating add expressions can lead to failing to select
(u|s)mlal. Implement isReassocProfitable to skip reassociating
expressions that can be lowered to (u|s)mlal.

The same issue exists for the *mlsl variants as well, but the DAG
combiner doesn't use the isReassocProfitable hook before reassociating.
To be fixed in a follow-up commit as this requires DAGCombiner changes
as well.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125895
2022-05-23 09:39:00 +01:00
David Green 6ef5e242f2 [AArch64] Fix assumptions on input type of tryCombineFixedPointConvert
It is possible for the input type to not be v2i64 or v4i32, so weaken
the assertion to a return, fixing the crash in the new test.

Fixes #55606
2022-05-23 08:55:54 +01:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Bill Wendling d497129f9b [AArch64] Use proper instruction mnemonics for FPRs
The FPR128 regs need MOVIv2d_ns and SVE regs need DUP_ZI_D.

Differential Revision: https://reviews.llvm.org/D126083
2022-05-20 12:02:26 -07:00
Rahul Anand R 534ea8bca5 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a target-specific optimization that replaces
the cmp and csel from cttz with an and mask.

Recommitted with a fix for truncated value sizes.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-20 13:41:32 +01:00
Bill Wendling 6e00a34cdb [AArch64] Add support for -fzero-call-used-regs
Support the "-fzero-call-used-regs" option on AArch64. This involves much less
specialized code than the X86 version. Most of the checks can be done with
TableGen.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D124836
2022-05-19 16:58:28 -07:00
David Green 602f81ec33 [AArch64] Fix zero element TBL indices
A TBL instruction will fill out-of-range values with 0's, something used
in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for
v16i8, but for v8i8 the input is still treated as a v16i8, so
out-of-range values (like a lane index of 8) would end up loading values
from the top half of the input register. Clean this up by detecting these
indices and making sure they really are out of range for the full register.
There is a fix for swapped indices of 64bit input vectors too, which
could be incorrectly adjusted if the zerovector was the first operand.

Fixes #55545

Differential Revision: https://reviews.llvm.org/D125865
2022-05-19 13:54:35 +01:00
David Green dd644ddf85 [AArch64] Extend zero vector TBL codegen tests. NFC 2022-05-19 13:01:55 +01:00
Jon Roelofs d699e54ca2 Fix an or+and miscompile w/ GlobalISel
Fixes #55284
2022-05-18 19:09:47 -07:00
Michael Kitzan 29bebb0237 [GISel] Add new combines for G_FMINNUM/MAXNUM and G_FMINIMUM/MAXIMUM
I noticed https://reviews.llvm.org/D87415 added SDAG combines to fold
FMIN/MAX instrs with NaNs.

The patch implements the same NaN combines for GISel GMIR FMIN/MAX opcodes:
G_FMINNUM(X, NaN) -> X
G_FMAXNUM(X, NaN) -> X
G_FMINIMUM(X, NaN) -> NaN
G_FMAXIMUM(X, NaN) -> NaN

The patch adds AArch64 tests for these combines as well.

Reviewed by: arsenm

Differential revision: https://reviews.llvm.org/D125819
2022-05-18 12:08:53 -07:00
Craig Topper 46eef76876 [DAGCombiner] Fix bug in MatchBSwapHWordLow.
This function tries to match (a >> 8) | (a << 8) as (bswap a) >> 16.

If the SRL isn't masked and the high bits aren't demanded, we still
need to ensure that bits 23:16 are zero. After the right shift they
will be in bits 15:8 which is where the important bits from the SHL
end up. It's only a bswap if the OR on bits 15:8 only takes the bits
from the SHL.

Fixes PR55484.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125641
2022-05-18 09:23:18 -07:00
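A plain C++ sanity sketch (not the DAG code; it uses the GCC/Clang __builtin_bswap32 builtin and illustrative values) of why bits 23:16 must be zero for the match to hold in the low 16 bits:

```
#include <cstdint>
#include <cstdio>

int main() {
  uint32_t ok  = 0x0000ABCDu;  // bits 23:16 are zero
  uint32_t bad = 0x00FFABCDu;  // bits 23:16 are not zero
  for (uint32_t a : {ok, bad}) {
    uint32_t orForm    = (a >> 8) | (a << 8);         // the matched pattern
    uint32_t bswapForm = __builtin_bswap32(a) >> 16;  // the claimed equivalent
    printf("a=%08x  or-form=%04x  bswap-form=%04x\n",
           a, orForm & 0xFFFF, bswapForm & 0xFFFF);   // low 16 bits differ for 'bad'
  }
  return 0;
}
```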
Florian Hahn a74e075908 [AArch64] Add tests showing reassoc breaks (s|u)ml(a|s)l selection. 2022-05-18 16:40:28 +01:00
Simon Pilgrim 939affc67d [AArch64] neon-vmull-high-p64.ll - fix name/check mismatch identified in D125604
Typos meant that we weren't actually checking the function name, which wasn't accounting for mangling
2022-05-18 13:24:28 +01:00
Simon Pilgrim 1584b2c74e [AArch64] fp16-v8-instructions.ll - remove some old defunct CHECKS identified in D125604
Typos meant that the update script never removed them
2022-05-18 12:49:05 +01:00
David Green 4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle
tables, slightly lowering the costs a little more. This is a rough
improvement in general, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane mov's already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
David Green 8311fb7512 [AArch64] Extra tests useful for D-lane shuffles. NFC 2022-05-17 11:15:55 +01:00
Martin Storsjö 64a3c63e01 [MC] [Win64EH] Check for matches between epilogs and the prolog on ARM64
This allows sharing opcodes between prolog and epilog even when there
is more than one epilog.

I didn't make any handcrafted special MC level testcases for this (yet
at least), but it does seem to have the expected effect on two existing
CodeGen level testcases.

Differential Revision: https://reviews.llvm.org/D125619
2022-05-17 00:41:39 +03:00
Martin Storsjö cabefea2ec [MC] [Win64EH] Try writing an ARM64 "packed epilog" even if the epilog doesn't share opcodes with the prolog
The "packed epilog" form only implies that the epilog is located
exactly at the end of the function (so the location of the epilog
is implicit from the epilog opcodes), but it doesn't have to share
opcodes with the prolog - as long as the total number of opcode
bytes and the offset to the epilog fit within the bitfields.

This avoids writing a 4 byte epilog scope in many cases. (I haven't
measured how much this shrinks actual xdata sections in practice
though.)

Differential Revision: https://reviews.llvm.org/D125536
2022-05-17 00:41:39 +03:00
Paul Walker ee8aa351e4 [AArch64] Use ADDV for boolean xor reductions.
NEON does not have native support for xor reductions. However, when
reducing predicate vectors the operation is synonymous with an add
reduction that is supported.

Differential Revision: https://reviews.llvm.org/D125605
2022-05-16 22:34:12 +01:00
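As a plain C++ sketch of the equivalence claimed above (illustrative lane values): an XOR reduction of i1 lanes is the parity of their sum, i.e. the low bit of an ADD reduction:

```
#include <cstdio>

int main() {
  bool lanes[8] = {1, 0, 1, 1, 0, 0, 1, 0};  // an illustrative predicate vector
  bool xorRed = false;
  int  addRed = 0;
  for (bool b : lanes) {
    xorRed ^= b;   // XOR reduction
    addRed += b;   // ADD reduction
  }
  printf("xor-reduce=%d  add-reduce&1=%d\n", xorRed, addRed & 1);  // equal
  return 0;
}
```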
David Green 5d29d75273 [AArch64] Predicate SSHLL;SCVTF patterns behind UseAlternateSExtLoadCVTF32
There have been some patterns in the AArch64 backend to optimize code of
the form:
  ldrsh w8, [x0]
  scvtf s0, w8
to:
  ldr h0, [x0]
  sshll v0.4s, v0.4h, #0
  scvtf s0, s0
The idea is to remove the GPR->FPR move, but in reality it makes code
larger and slower (or the same) on all the CPUs I tried.

This patch adds the UseAlternateSExtLoadCVTF32 predicate similar to
nearby related pattern.

Differential Revision: https://reviews.llvm.org/D125470
2022-05-16 18:00:30 +01:00
Craig Topper 74f6ded49d [AArch64][ARM][RISCV][X86] Add test cases for PR55484. NFC
This bug is in generic DAG combine and easily reproducible on many
targets.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D125640
2022-05-16 09:28:11 -07:00
David Green 7272a8c23c [AArch64] Update check lines in arm64-scvt.ll. NFC 2022-05-16 15:50:39 +01:00
Bradley Smith 7ff5148d64 [DAGCombine] Support splat_vector nodes in (and (extload)) dagcombine
Differential Revision: https://reviews.llvm.org/D125367
2022-05-16 11:25:20 +00:00
Tim Northover 1ddc6ab1a9 AArch64: support ISel for fence instructions
Only the most conservative of the DAG patterns matched, leaving GISel with "dmb
ish" everywhere, which is inefficient.
2022-05-16 12:01:18 +01:00
David Green 4c3e51ecfa [AArch64] Handle 64bit vectors in tryCombineFixedPointConvert
Under some situations we can visit 64bit vector extract elements in
tryCombineFixedPointConvert, where an assert fires as they are expected
to have been converted to 128bit. Turn the assert into an if statement,
bailing out and letting the extract be handled first.

Also invert some ifs, using early exits to reduce indentation.

Fixes #55417
2022-05-16 11:08:47 +01:00
Alex Richardson c8b44600c5 [AArch64] Avoid emitting MOVID when NEON is disabled
Previously, creating a zero floating-point constant used MOVID even when
NEON was disabled which resulted in the following fatal error:
`Attempting to emit MOVID instruction but the Feature_HasNEON predicate(s) are not met`

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125237
2022-05-14 14:40:51 +00:00
Alex Richardson 09551251e3 [AArch64] Add missing HasNEON predicates to int->float patterns
I was trying to compile code with -march=+nosimd and hit various
instruction predicate verification errors, this patch should address the
ones I saw in integer to floating-point conversions.

I noticed that for signed conversions, some non-NEON instruction sequences
are shorter. I don't know if the longer one is still faster on current
architectures (the patterns date back to the initial backend import)

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125308
2022-05-14 14:15:36 +00:00
Alex Richardson f8639133b5 [AArch64] Baseline test for D125307
Differential Revision: https://reviews.llvm.org/D125240
2022-05-14 14:15:36 +00:00
Eli Friedman 96c2a0c9ff [GlobalIsel] Fix fallback if stack protector isn't supported.
When GlobalISel fails, we need to report the error, and we need to set
the FailedISel property.  We skipped those steps if stack protector
insertion failed, which led to a very strange miscompile.

Differential Revision: https://reviews.llvm.org/D125584
2022-05-13 14:17:27 -07:00
Amara Emerson 41fef10449 [GlobalISel] Combine G_SHL, G_ASHR, G_SHL of undef shifts to undef.
Differential Revision: https://reviews.llvm.org/D125041
2022-05-13 12:20:34 -07:00
Sam Parker 6d53d35efd [TypePromotion] Avoid some unnecessary truncs
Recommit.

Check for legal zext 'sinks' before inserting a trunc.

Differential Revision: https://reviews.llvm.org/D115451
2022-05-13 09:45:20 +01:00
Sam Parker 84b5f7c38c [NFC][TypePromotion][AArch64] Tests
Simplify existing test and also add it as a codegen test for aarch64.
2022-05-13 09:27:42 +01:00
Karl Meakin 0298cce257 [AArch64] Add `foldADCToCINC` DAG combine.
Differential revision: https://reviews.llvm.org/D123781
2022-05-12 22:21:20 +01:00
Karl Meakin d29fc6e7d2 [AArch64] Replace `performANDSCombine` with `performFlagSettingCombine`.
`performFlagSettingCombine` is a generalised version of `performANDSCombine` which also works on  `ADCS` and `SBCS`.

Differential revision: https://reviews.llvm.org/D124464
2022-05-12 22:17:23 +01:00
Craig Topper cec249c60d [TypePromotion] Promote undef by converting to 0.
If we're promoting an undef I think that means that we expect the
upper bits are zero. undef doesn't guarantee that.

This patch replaces undef with 0 to ensure this. This matches how
a zext or sext of undef would be folded by InstCombine/InstSimplify.

I haven't found a failure from this; I was just thinking through the code.

Differential Revision: https://reviews.llvm.org/D123174
2022-05-12 09:09:24 -07:00
Nikita Popov 44d85259d0 [AArch64] Preserve chain when lowering fixed length load to SVE (PR55281)
When a fixed length load is lowered to an SVE masked load, the
result chain is currently set to the input chain of the old load,
rather than the result chain of the new load. This may cause stores
to be incorrectly reordered.

Fixes https://github.com/llvm/llvm-project/issues/55281.

Differential Revision: https://reviews.llvm.org/D125464
2022-05-12 16:03:32 +02:00
David Green 442c351b2b Revert "[AArch64] Generate AND in place of CSEL for predicated CTTZ"
This reverts commit 7dcd0ea683 due to
issues reported postcommit with the correctness of truncated cttzs.
2022-05-10 17:17:03 +01:00
Rosie Sumpter 1a2665902f [AArch64][SVE] Improve codegen when extracting first lane of active lane mask
When extracting the first lane of a predicate created using the
llvm.get.active.lane.mask intrinsic, it should give the same codegen as
when the predicate is created using the llvm.aarch64.sve.whilelo
intrinsic, since get.active.lane.mask is lowered to whilelo. This patch
ensures the codegen is the same by recognizing
llvm.get.active.lane.mask as a flag-setting operation in this case.

Differential Revision: https://reviews.llvm.org/D125215
2022-05-09 13:56:04 +01:00
Alban Bridonneau fef81131d9 [SVE] Optimize new cases for lowerConvertToSVBool
Converts to SVBool are already considered as a nop, if they
are converting an operand from a ptrue or a cmp, because
they zero the extra predicate lanes by construction.

This patch adds 2 similar cases:
- Wide cmps, which were not directly recognized by the test
for other forms of cmp
- Splats of 1, which will be generated as ptrue, and as such
will also zero the extra predicate lanes.

Reviewed By: paulwalker-arm, peterwaller-arm

Differential Revision: https://reviews.llvm.org/D124908
2022-05-09 10:17:57 +00:00
Rahul Anand R 7dcd0ea683 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a for a target specific optimization that replaces
the cmp and csel from cttz with an and mask.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-09 10:28:20 +01:00
David Green 830c18047b [AArch64] Add missing NVCAST patterns.
There were apparently some missing NVCAST patterns. This fills them in
using foreach, as opposed to having to specify them individually.

Fixes #55321
2022-05-07 21:08:14 +01:00
Amaury Séchet 06fad8bc05 [DAGCombine] Add node in the worklist in topological order in CombineTo
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.

This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review to start the discussion with people working on the backend so we can find a good way forward.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124743
2022-05-07 16:24:31 +00:00
Kazu Hirata 26ba347fbb [AArch64] Add llvm/test/CodeGen/AArch64/i256-math.ll
This patch adds a test case for i256 additions and subtractions.  I'm
leaving out multiplications for now, which would result in very long
sequences.

Differential Revision: https://reviews.llvm.org/D125125
2022-05-06 14:26:12 -07:00
Kazu Hirata fffb6e6afd [AArch64] Fix sub with carry
13403a70e4 introduced a bug where we
generate the outgoing carry inverted, which in turn breaks the
lowering of @llvm.usub.sat.i128, returning the normal difference on
saturation and zero otherwise.

Note that AArch64 has peculiar semantics where the subtraction
instructions generate the borrow inverted.  The problem is that we mix the
two forms of semantics -- the normal carry and inverted carry -- in
the area of extended precision subtractions.  Specifically, we have
three problems:

- lowerADDSUBCARRY takes the non-inverted incoming carry from a
  subtraction and feeds it to SBCS without inverting it first.

- lowerADDSUBCARRY makes available the outgoing carry from SBCS
  without inverting it.

- foldOverflowCheck folds:

  (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  in turn generates a borrow, *clearing* the carry flag.  Instead, we
  should fold:

  (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  does not generate a borrow, *setting* the carry flag.

IIUC, we should use the normal (that is, non-inverted) semantics for
carry everywhere.

This patch fixes the three problems above.

This patch does not add any new testcases because we have plenty of
them covering the instruction in question.  In particular,
@u128_saturating_sub is identical to the testcase in the motivating
issue.

Fixes: #55253

Differential Revision: https://reviews.llvm.org/D124976
2022-05-06 11:04:17 -07:00
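A scalar C++ sketch (not the lowering code; the struct and values are illustrative) of the inverted-borrow convention described above, i.e. SUBS setting C=1 when no borrow occurs and SBCS consuming !C for the high word:

```
#include <cstdint>
#include <cstdio>

struct U128 { uint64_t lo, hi; };  // illustrative 128-bit value

U128 sub128(U128 a, U128 b) {
  uint64_t lo = a.lo - b.lo;
  uint64_t c  = a.lo >= b.lo ? 1 : 0;  // AArch64 carry: 1 means "no borrow"
  uint64_t hi = a.hi - b.hi - (1 - c); // SBCS subtracts the inverted carry
  return {lo, hi};
}

int main() {
  U128 a{0, 1};          // 2^64
  U128 b{1, 0};          // 1
  U128 r = sub128(a, b); // expect 2^64 - 1, i.e. hi=0, lo=0xffffffffffffffff
  printf("hi=%016llx lo=%016llx\n",
         (unsigned long long)r.hi, (unsigned long long)r.lo);
  return 0;
}
```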
Craig Topper 76f90a9d71 [SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift.
Otherwise we have garbage in the upper bits that can affect the
results of the UREM.

Fixes PR55296.

Differential Revision: https://reviews.llvm.org/D125076
2022-05-06 09:26:30 -07:00
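A small C++ illustration of the general point (the i24 width and the values are assumptions for illustration, not taken from the PR): if the promoted shift amount carries garbage in its high bits, a UREM by the original bit width can see a different value:

```
#include <cstdint>
#include <cstdio>

int main() {
  const uint32_t N = 24;                  // e.g. an i24 funnel shift promoted to i32
  uint32_t amt      = 5;                  // the real shift amount
  uint32_t promoted = 0xAA000000u | amt;  // any-extended with garbage high bits
  printf("clean %% N = %u, dirty %% N = %u\n", amt % N, promoted % N);  // 5 vs 13
  return 0;
}
```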
David Green 115c188807 [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type,
we can convert shuffle(bitcast(X), Mask) into a shuffle in the bitcast
type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.

The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.

Differential Revision: https://reviews.llvm.org/D123801
2022-05-06 10:50:31 +01:00
Amara Emerson 586802eb72 [GlobalISel] Re-generate some tests. 2022-05-05 14:14:36 -07:00
Craig Topper 084f967370 [SelectionDAG] Constant fold (sext_inreg undef, VT) to 0 instead of undef.
The result of sign_extend_inreg needs to have as many sign bits
as requested by the VT argument. The easiest way to guarantee this
is to fold it to 0.

SystemZ test was modified to avoid using undef.

Fixes https://github.com/llvm/llvm-project/issues/55178

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124696
2022-05-05 09:45:35 -07:00
Amara Emerson 87e3646a1f [AArch64][GlobalISel] Add undef combines to postlegalizer combiner. 2022-05-05 09:22:08 -07:00
David Green c7a6b11b7e [ARM][AArch64] Add some extra shuffle conversion test coverage. NFC
This adds a big endian run line for the AArch64 TRN tests and
regenerated the check lines, along with adding an extra MVE VMOVN case
and regenerating vector-DAGCombine.ll for easier updating.
2022-05-05 15:27:44 +01:00
Bradley Smith 8f623f4ab0 [AArch64][SVE] Restore SP from FP when SVE CSRs and variable sized objects are present
Without SVE, after a dynamic stack allocation has modified the SP, it is
presumed that a frame pointer restoration will revert the SP back to
it's correct value prior to any caller stack being restored. However the
SVE frame is restored using the stack pointer directly, as it is located
after the frame pointer. This means that in the presence of a dynamic
stack allocation, any SVE callee state gets corrupted as SP has the
incorrect value when the SVE state is restored.

To address this issue, when variable sized objects and SVE CSRs are
present, treat the stack as having been realigned, hence restoring the
stack pointer from the frame pointer prior to restoring the SVE state.

Differential Revision: https://reviews.llvm.org/D124615
2022-05-04 12:57:03 +00:00
Alex Borcan afaa56df7a Implement support for __llvm_addrsig for MachO in llvm-mc
The __llvm_addrsig section is a section that the linker needs for safe ICF.
This was not yet implemented for MachO; this patch adds the implementation.
It has been tested with a safe deduplication implementation inside lld.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D123751
2022-05-03 18:19:18 -04:00
Jon Roelofs e1c808b36e Fix zero-width bitfield extracts to emit 0
Fixes #55129
2022-05-03 14:46:42 -07:00
Philipp Tomsich 64816e68f4 [AArch64] Support for Ampere1 core
Add support for the Ampere Computing Ampere1 core.
Ampere1 implements the AArch64 state and is compatible with ARMv8.6-A.

Differential Revision: https://reviews.llvm.org/D117112
2022-05-03 15:54:02 +01:00
Bradley Smith 96bbd359ed [AArch64][SVE] Only fold frame indexes referencing SVE objects into SVE loads/stores
Currently we always fold frame indexes into SVE load/store instructions,
however these instructions can only encode VL-scaled offsets. This means
that when we are accessing a fixed length stack object with these
instructions, the folded in frame index gets pulled back out during frame
lowering. This can cause issues when we have no spare registers and no
emergency spill slot.

Rather than causing issues like this, don't fold in frame indexes that
reference fixed length objects.

Fixes: #55041

Differential Revision: https://reviews.llvm.org/D124457
2022-05-03 09:48:13 +00:00
Sanjay Patel 747c6a0c73 [SDAG] fix miscompile when casting int->FP->int
This is the codegen equivalent of D124692.

As shown in https://github.com/llvm/llvm-project/issues/55150 -
the existing fold may be wrong when converting to a signed value.
This is a quick fix to avoid the miscompile.
https://alive2.llvm.org/ce/z/KtaDmd

Differential Revision: https://reviews.llvm.org/D124771
2022-05-02 14:57:27 -04:00
Sanjay Patel cb3fb08508 [AArch64] add tests for int->FP->int casts; NFC
Copied from x86 tests for multi-target coverage.
Also, provides coverage for target-specific asm
testing for Alive2 or its follow-ons.

See #55150 and D124692
2022-05-02 09:18:12 -04:00
Paul Walker f10a8f6752 [LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG
SIGN_EXTEND_INREG expansion can trigger a TypeSize error because
"VT.getSizeInBits() == 1" is used to detect for a boolean without
first verifying VT is a scalar.
2022-04-30 19:21:48 +01:00
Craig Topper 6affe87bda [DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)

We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.

I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.

Fixes PR55201.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124711
2022-04-30 11:02:30 -07:00
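As a sanity sketch of the first identity being matched (plain C++, one illustrative input; it does not model the looked-through AND case that needed the extra mask):

```
#include <cstdint>
#include <cstdio>

uint32_t rotl32(uint32_t x, unsigned c) { return (x << c) | (x >> (32 - c)); }

int main() {
  // With C1 + C2 == 32:
  //   (shl (X | Y), C1) | (srl X, C2)  ==  (rotl X, C1) | (shl Y, C1)
  uint32_t X = 0x12345678u, Y = 0x0F0F0F0Fu;
  unsigned C1 = 8, C2 = 24;
  uint32_t lhs = ((X | Y) << C1) | (X >> C2);
  uint32_t rhs = rotl32(X, C1) | (Y << C1);
  printf("%08x %08x %s\n", lhs, rhs, lhs == rhs ? "equal" : "DIFFER");
  return 0;
}
```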
Craig Topper 808c33ace5 [RISCV][AArch64] Pre-commit tests for D124711. NFC 2022-04-30 10:59:20 -07:00
Saleem Abdulrasool 24ba1302b3 AArch64: modify Swift async frame record storage on Windows
The frame layout on Windows differs from that on other platforms. It
will spill the registers in descending numeric order (i.e. x30, x29,
...). Furthermore, the x29, x30 pair is particularly important as it
is used for fast stack walking. As a result, we cannot simply
insert the Swift async frame record in between the stores. To preserve
the simple search mechanism, always spill the async frame record
prior to the spilled registers.

This was caught by the assertion failure in the frame lowering code when
building the runtime for Windows AArch64.

Fixes: #55058

Differential Revision: https://reviews.llvm.org/D124498
Reviewed By: mstorsjo
2022-04-30 09:01:33 -07:00
Craig Topper 65dbd8d793 [SelectionDAG] Pre-commit test for D124696. NFC 2022-04-29 17:24:13 -07:00
Paul Walker b481512485 [SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.
This is essentially a refactoring patch but allows more cases to
be caught, hence the output changes to some tests.

Differential Revision: https://reviews.llvm.org/D122994
2022-04-29 17:42:33 +01:00
Paul Walker 23c509754d [DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
2022-04-29 14:20:13 +01:00
Paul Walker 59588f0a3d [SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.
getGatherScatterIndexIsExtended currently looks through all
SIGN_EXTEND_INREG operations regardless of their input type.  This
patch restricts the code to only look through i32->i64 extensions,
which are the ones supported implicitly by SVE addressing modes.

Differential Revision: https://reviews.llvm.org/D123318
2022-04-29 14:20:13 +01:00
Paul Walker 7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
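A short arithmetic sketch in C++ (illustrative values) of why the rewrite above is unsafe when the index is implicitly scaled by the element size:

```
#include <cstdio>

int main() {
  //   base(0) + index(A + splat(B))  =>  base(B) + index(A)
  // only preserves the address when the index is unscaled:
  // scale*(A + B) != B + scale*A in general.
  long A = 3, B = 16, scale = 8;
  long before = 0 + scale * (A + B);  // original addressing: 152
  long after  = B + scale * A;        // rewritten addressing: 40
  printf("before=%ld after=%ld\n", before, after);
  return 0;
}
```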
Nikita Popov 4e545bdb35 [SimplifyCFG] Thread branches on same condition in more cases (PR54980)
SimplifyCFG implements basic jump threading, if a branch is
performed on a phi node with constant operands. However,
InstCombine canonicalizes such phis to the condition value of a
previous branch, if possible. SimplifyCFG does support this as
well, but only in the very limited case where the same condition
is used in a direct predecessor -- notably, this does not include
the common diamond pattern (i.e. two consecutive if/elses on the
same condition).

This patch extends the code to look back a limited number of
blocks to find a branch on the same value, rather than only
looking at the direct predecessor.

Fixes https://github.com/llvm/llvm-project/issues/54980.

Differential Revision: https://reviews.llvm.org/D124159
2022-04-29 09:44:05 +02:00
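A C++ sketch (hypothetical function, not from the patch) of the diamond pattern described above, i.e. two consecutive if/elses on the same condition, which the extended lookback can now thread:

```
// After the first if/else, 'c' is known on each incoming path, so the
// second branch on 'c' can be threaded instead of being re-tested.
int diamond(bool c, int a, int b) {
  int x;
  if (c)
    x = a + 1;
  else
    x = b + 2;
  if (c)          // same condition, reached through a diamond
    return x * 3;
  return x - 3;
}
```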
Paul Walker 3c382ed71f [AArch64][SVE] Remove BIC from logical operation DestructiveBinaryComm patterns
This reverts part of https://reviews.llvm.org/D124224 that causes
an assert because the register allocator triggers a pathological
situation where there's no safe way to insert a zeroing MOVPFRX
instruction.
2022-04-22 15:07:55 +01:00
zhongyunde e1afae0311 [AArch64][SVE] Add some logical operation DestructiveBinaryComm patterns
Add DestructiveBinaryComm* patterns for ORR, EOR, AND and BIC.
The above instructions require that the source and destination registers are
equal, so using movprfx should be beneficial to performance.
Note: BIC (i.e. A & ~B) is not a commutative operation.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D124224
2022-04-22 20:31:00 +08:00
Daniel Kiss de07cde67b [AArch64] Emit .cfi_negate_ra_state for PAC-auth instructions.
The autiasp and autibsp instructions are the counterparts of the paciasp/pacibsp
instructions, so let's emit .cfi_negate_ra_state for these too.
With the Armv8.3 instruction set, retaa/retab do the return and authentication
in one step; there we can't emit the .cfi_negate_ra_state because it would have
to be placed after the ret* instruction.

Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D111780
2022-04-22 13:25:57 +02:00
Karl Meakin 81904454f7 [AArch64] Add `foldOverflowCheck` DAG combine
Differential Revision: https://reviews.llvm.org//D123779
2022-04-21 14:56:38 +01:00
Karl Meakin 13403a70e4 [AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY
Differential Revision: https://reviews.llvm.org/D123322
2022-04-21 14:56:37 +01:00
Pengxuan Zheng 38612fbc89 Reland "[COFF, ARM64] Add __break intrinsic"
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

Reland after fixing the test failure. The failure was due to a conflict with a
change (D122983) which was merged right before this patch.

Reviewed By: rnk, mstorsjo

Differential Revision: https://reviews.llvm.org/D124032
2022-04-20 13:01:30 -07:00
Pengxuan Zheng bff8356b19 Revert "[COFF, ARM64] Add __break intrinsic"
This reverts commit 8a9b4fb4aa.
2022-04-20 11:57:49 -07:00
Pengxuan Zheng 8a9b4fb4aa [COFF, ARM64] Add __break intrinsic
https://docs.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170

Reviewed By: rnk, mstorsjo

Differential Revision: https://reviews.llvm.org/D124032
2022-04-20 11:20:26 -07:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
The same function can also be used to improve the cost model for the
shuffles in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
representation into account. We can build a more correct representation of
register shuffles and improve the number of recognised buildvector sequences.
The same function can also be used to improve the cost model for the
shuffles in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
Matt Arsenault d16945d31b AArch64/GlobalISel: Add -global-isel-abort=1 to select tests
Otherwise the legalizer verifier error isn't triggered since the
default is fallback.
2022-04-19 21:04:32 -04:00
David Green 73dc996428 [AArch64] Add lane moves to PerfectShuffle tables
This teaches the perfect shuffle tables about lane inserts, which can
help reduce the cost of many entries. Many of the shuffle masks are
one-away from being correct, and a simple lane move can be a lot simpler
than trying to use ext/zip/etc. Because they are not exactly like the
other masks handled in the perfect shuffle tables, they require special
casing to generate them, with a special InsOp Operator.

The lane to insert into is encoded as the RHSID, and the move from is
grabbed from the original mask. This helps reduce the maximum perfect
shuffle entry cost to 3, with many more shuffles being generatable in a
single instruction.

Differential Revision: https://reviews.llvm.org/D123386
2022-04-19 14:49:50 +01:00
David Green cc9495f679 [AArch64] Only mark cost 1 perfect shuffles as legal
The perfect shuffle tables encode a cost of either 0 (a nop-copy) or 1
(a single instruction) with a cost encoding of 0 in the upper 2 bits.
All perfect shuffles with any cost are then marked as legal shuffles
though (the maximum encoded cost is 3), which can confuse the DAG
combiner into thinking the shuffles are cheaper than they should be.

Limiting legal shuffles to single instructions seems to do better in
most cases, producing fewer instructions for complex shuffles. There are
some cases that now become tbl, which may be better or worse depending
on whether the instruction is in a loop and the tbl load can be hoisted
out.

Differential Revision: https://reviews.llvm.org/D123377
2022-04-19 12:58:55 +01:00
David Green 50af82701c [AArch64] Cost all perfect shuffles entries as cost 1
A brief introduction to perfect shuffles - AArch64 NEON has a number of
shuffle operations - dups, zips, exts, movs etc that can in some way
shuffle around the lanes of a vector. Given a shuffle of size 4 with 2
inputs, some shuffle masks can be easily codegen'd to a single
instruction. A <0,0,1,1> mask for example is a zip LHS, LHS. This is
great, but some masks are not so simple, like a <0,0,1,2>. It turns out
we can generate that from zip LHS, <0,2,0,2>, having generated
<0,2,0,2> from uzp LHS, LHS, producing the result in 2 instructions.

It is not obvious from a given mask how to get there though. So we have
a simple program (PerfectShuffle.cpp in the util folder) that can scan
through all combinations of 4-element vectors and generate the perfect
combination of results needed for each shuffle mask (for some definition
of perfect). This is run offline to generate a table that is queried for
generating shuffle instructions. (Because the table could get quite big,
it is limited to 4 element vectors).

In the perfect shuffle tables, zip, uzp and trn shuffles were being costed
as 2, which is higher than needed and skews the perfect shuffle tables
towards creating inefficient combinations. This sets them to 1 and regenerates
the tables. The codegen will usually be better and the costs should be
more precise (but it can get less second-order re-use of values from
multiple shuffles; these cases should be fixed up in subsequent patches).

Differential Revision: https://reviews.llvm.org/D123379
2022-04-19 12:05:05 +01:00
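The worked example above can be checked with a small index-only C++ sketch (scalar models of zip1/uzp1, illustrative only):

```
#include <array>
#include <cstdio>

using V4 = std::array<int, 4>;

// Scalar models of the NEON lane permutations, applied to index vectors.
V4 uzp1(V4 a, V4 b) { return {a[0], a[2], b[0], b[2]}; }
V4 zip1(V4 a, V4 b) { return {a[0], b[0], a[1], b[1]}; }

int main() {
  V4 lhs = {0, 1, 2, 3};       // identity lane indices of LHS
  V4 tmp = uzp1(lhs, lhs);     // <0,2,0,2>
  V4 res = zip1(lhs, tmp);     // <0,0,1,2> in two instructions
  printf("%d %d %d %d\n", res[0], res[1], res[2], res[3]);
  return 0;
}
```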
chenglin.bi 222adf338a [AArch64][SelectionDAG] Add target-specific implementation of srem
1. Expanding X%C to the equivalent X-X/C*C is not always the fastest path if no SDIV pair exists, so first check whether the target has a faster lowering for SREM alone.
2. Add an AArch64 fast path for the SREM-only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-19 02:49:42 +08:00
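A hedged C++ sketch of the two paths discussed above for X % C with C a power of two (signed, truncating semantics); it is illustrative and not the exact AArch64 instruction sequence:

```
#include <cstdio>

// Generic path: needs a division.
int srem_generic(int x, int c) { return x - (x / c) * c; }

// Pow2 path: mask out the magnitude bits and restore the sign of x.
int srem_pow2(int x, int c) {
  int m = x & (c - 1);
  return (x < 0 && m != 0) ? m - c : m;
}

int main() {
  for (int x : {37, -37, 64, -64})
    printf("x=%4d  generic=%3d  pow2=%3d\n", x, srem_generic(x, 8), srem_pow2(x, 8));
  return 0;
}
```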