llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	0cd8063960	[AArch64] Genereate CCMP from And CSel LLVM has a couple of ways of producing ccmp - either from chains in isel or from a later ifcvt style pass. This adds a simple DAG combine to capture more cases, converting and(csel(0, 1, cc0), csel(0, 1, cc1)) into a csel(ccmp(.., cc0)), depending on cc1 (a SUBS in this case). Differential Revision: https://reviews.llvm.org/D118327	2022-02-02 13:48:16 +00:00
David Sherwood	11cf807796	[AArch64][CodeGen] Always use SVE (when enabled) to lower integer divides This patch adds custom lowering support for ISD::SDIV and ISD::UDIV when SVE is enabled, regardless of the minimum SVE vector length. We do this because NEON simply does not have vector integer divide support, so we want to take advantage of these instructions in SVE. As part of this patch I've also simplified LowerToPredicatedOp to avoid re-asking the same question about whether we should be using SVE for fixed length vectors. Once we've made the decision to call LowerToPredicatedOp, then we should simply assert we should be using SVE. I've updated the 128-bit min SVE vector bits tests here: CodeGen/AArch64/sve-fixed-length-int-div.ll CodeGen/AArch64/sve-fixed-length-int-rem.ll Differential Revision: https://reviews.llvm.org/D117871	2022-02-02 09:46:02 +00:00
Cullen Rhodes	16d464a291	[AArch64][SVE] NFC: tidy up isel lowering Whilst adding legal types <-> register classes for Streaming SVE in D118561 I noticed the hasSVE predication block set operation actions for opcodes that may not be legal in Streaming SVE. Move these operations to the later hasSVE block which has loops over the same types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118560	2022-02-02 09:02:20 +00:00
Nikita Popov	a1dc6d4b83	[AArch64] Do not use ABI alignment for mops.memset.tag Pointer element types do not imply that the pointer is ABI aligned. We should be using either an explicit align attribute here, or fall back to an alignment of 1. This fixes a new element type access introduced in D117764. I don't think this makes any practical difference though, as the lowering does not depend on alignment. Differential Revision: https://reviews.llvm.org/D118681	2022-02-01 14:37:53 +01:00
Fangrui Song	51ed14d224	[AArch64] Temporarily use getPointerElementType to fix -Wdeprecated-declarations. NFC	2022-01-31 19:16:11 -08:00
tyb0807	5aa08bf708	[AArch64][SelectionDAG] CodeGen for Armv8.8/9.3 MOPS New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc. Each intrinsic is translated to one of these in SelectionDAGBuilder via EmitTargetCodeForMOPS. A custom lowering routine for INTRINSIC_W_CHAIN is added to handle llvm.aarch64.mops.memset.tag. This takes a separate path from the common intrinsics but ultimately ends up in the same EmitMOPS(). This is part 4/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu. Differential Revision: https://reviews.llvm.org/D117764	2022-01-31 20:56:27 +00:00
Florian Hahn	23091f7d50	[AArch64] Bail out for float operands in SetCC optimization. The optimization added in D118139 causes a crash on the added test case while trying to zero extend an vector of floats. Fix the crash by bailing out for floating point operands. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D118615	2022-01-31 18:20:47 +00:00
Paul Walker	30efee764d	[SVE] Remove AArch64ISD::PFALSE. AArch64ISD::PFALSE does not provide any value, in fact it can prevent common combines from firing. We only needed to lower to PFALSE until ISD::SPLAT_VECTOR became generally available. Differential Revision: https://reviews.llvm.org/D118469	2022-01-29 11:31:00 +00:00
Paul Walker	3bc876d0a3	[AArch64] Add isel for bitcasting between bfloat and half types. Differential Revision: https://reviews.llvm.org/D118420	2022-01-29 11:26:13 +00:00
Ahmed Bougacha	634ca7349d	[ObjCARC] Require the function argument in the clang.arc.attachedcall bundle. Currently, the clang.arc.attachedcall bundle takes an optional function argument. Depending on whether the argument is present, calls with this bundle have the following semantics: - on x86, with the argument present, the call is lowered to: call _target mov rax, rdi call _objc_retainAutoreleasedReturnValue - on AArch64, without the argument, the call is lowered to: bl _target mov x29, x29 and the objc runtime call is expected to be emitted separately. That's because, on x86, the objc runtime checks for both the mov and the call on x86, and treats the combination as the ARC autorelease elision marker. But on AArch64, it only checks for the dedicated NOP marker, as that's historically been sufficiently unique. Thanks to that, the runtime call wasn't required to be adjacent to the NOP marker, so it wasn't emitted as part of the bundle sequence. This patch unifies both architectures: on AArch64, we now emit all 3 instructions for the bundle. This guarantees that the runtime call is adjacent to the marker in the sequence, and that's information the runtime can use to further optimize this. This helps simplify some of the handling, in particular BundledRetainClaimRVs, which no longer needs to know whether the bundle is sufficient or not: it now always should be. Note that this does not include an AutoUpgrade for the nullary bundles, as they are only produced in ObjCContract as part of the obj/asm emission pipeline, and are not expected to be in bitcode. Differential Revision: https://reviews.llvm.org/D118214	2022-01-28 12:41:45 -08:00
David Truby	81bd67e18a	[AArch64][SVE][VLS] Move extends into arguments of comparisons When a comparison is extended and it would be free to extend the arguments to that comparison, we can propagate the extend into those arguments. This prevents extra instructions being generated to extend the result of the comparison, which is not free to extend. This is a resubmission of D116812 with fixes that need another review. Differential Revision: https://reviews.llvm.org/D118139	2022-01-28 14:16:08 +00:00
Sander de Smalen	af1c8f0d14	[AArch64][SVE] Folds VSELECT if the predicate is all active. This adds the following changes: * Fold: vselect(<all active predicate>, x, y) => x * Extend isAllActivePredicate to take vscale_range into account, e.g. isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true. isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true. Differential Revision: https://reviews.llvm.org/D118147	2022-01-27 15:58:56 +00:00
Sander de Smalen	417a75c6d0	[AArch64][SVE] Avoid using ptrue for ptest in VECREDUCE_OR. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118145	2022-01-27 11:44:49 +00:00
Sander de Smalen	c9da81d997	[AArch64][SVE] Implement missing lowering for extract_subvector for predicates. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D118057	2022-01-27 11:01:11 +00:00
Sander de Smalen	d58757e522	[AArch64][SVE] Implement PFALSE with explicit AArch64ISD node. The ISel patterns for PFALSE helps recognise the instructions as being free of side-effects, which helps MachineCSE remove redundant PFALSE instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118054	2022-01-27 10:30:13 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Maciej Gabka	c5263cd518	Restrict performPostLD1Combine to 64 and 128 bit vectors When wider vectors are used, for example fixed width SVE, there is no patterns to select AArch64ISD::LD1LANEpost nodes, so we should do an early exit. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D117674	2022-01-26 09:57:44 +00:00
Paul Walker	d95cf1f6cf	[SVE] Enable ISD::ABDS/U ISel for scalable vectors. NOTE: This patch also includes tests that highlight those cases where the existing DAG combine doesn't yet work well for SVE. Differential Revision: https://reviews.llvm.org/D117873	2022-01-25 12:14:53 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Peter Waller	d4a6bf4d1a	Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons" This reverts commit `db04d3e30b`, which causes a buildbot failure.	2022-01-20 12:01:23 +00:00
Adrian Tong	b6a7ae2c5d	Optimize shift and accumulate pattern in AArch64. AArch64 supports unsigned shift right and accumulate. In case we see a unsigned shift right followed by an OR. We could turn them into a USRA instruction, given the operands of the OR has no common bits. Differential Revision: https://reviews.llvm.org/D114405	2022-01-20 01:57:40 +00:00
David Truby	db04d3e30b	[AArch64][SVE][VLS] Move extends into arguments of comparisons When a comparison is extended and it would be free to extend the arguments to that comparison, we can propagate the extend into those arguments. This prevents extra instructions being generated to extend the result of the comparison, which is not free to extend. Differential Revision: https://reviews.llvm.org/D116812	2022-01-19 14:11:45 +00:00
Jim Lin	d6b0734837	[NFC] Use Register instead of unsigned	2022-01-19 20:17:04 +08:00
Akshay Kumar	6f61fe7de9	[Aarch64] Customer lowering of COPYSIGN to SIMD should check for NEON availability For the following test case, clang is crashing for ARM64 architecture $ cat crash.c double crash(double a, double b) { return __builtin_copysign(a, b); } $ clang -O2 -march=armv8-a+nosimd --target=arm64 -S crash.c -o /dev/null fatal error: error in backend: Cannot select: 0x7fae361bb4e8: v2i64 = AArch64ISD::BIT 0x7fae361bb210, 0x7fae361bb278, 0x7fae361bb480 Fix: PR51806 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116581	2022-01-18 00:25:15 +05:30
David Sherwood	3a272d1eaf	[SVE][CodeGen] Use splice instruction when lowering VECTOR_SPLICE For certain negative indices passed to the VECTOR_SPLICE operation we can actually directly use the SVE splice instruction by creating the appropriate predicate. The predicate needs to be constructed in such a way that all but the last -idx elements are false. We can do this efficiently using a combination of 'ptrue' (with the appropriate fixed pattern, e.g. vl1, vl2, etc.) and 'rev'. The advantage of using these instructions to generate the predicate is they do not set any flags, unlike the whilelo instruction. This is critical when the splice operation is in a loop, since we want MachineLICM to hoist the predicate generation out of the loop. Differential Revision: https://reviews.llvm.org/D115863	2022-01-11 11:58:17 +00:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Nicholas Guy	13992498cd	[AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops Differential Revision: https://reviews.llvm.org/D114879	2022-01-05 12:54:31 +00:00
Paul Walker	4325fd7402	[AArch64ISelLowering] Don't look through scalable extract_subvector when optimising DUPLANE. When constructDup is passed an extract_subvector it tries to use extract_subvector's operand directly when creating the DUPLANE. This is invalid when extracting from a scalable vector because the necessary DUPLANE ISel patterns do not exist. NOTE: This patch is an update to https://reviews.llvm.org/D110524 that originally fixed this but introduced a bug when the result VT is 64bits. I've restructured the code so the critial final else block is entered when necessary. Differential Revision: https://reviews.llvm.org/D116442	2022-01-05 11:56:59 +00:00
Andrew Wei	03dc2975d0	[AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2 Attempt to lower a shuffle as a permute instruction(zip/uzp/trn) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D113376	2021-12-21 18:39:09 +08:00
David Truby	7e44eb079d	[AArch64][SVE] Improve code generation for VLS i1 masks This patch partially resolves an issue for VLS code generation where a mask is generated from a smaller width integer comparison than the instruction using the mask requires. Instead of sign extending a p register by converting it to a z register, extending that, and converting back, we instead just do an unpack of the p register. A separate issue causes the code generation to still be poor when the mask generation would fit in a neon register, as we then use a neon comparison operation and have to convert that to a p register. This will be resolved in a separate patch. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D111221	2021-12-17 16:26:49 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Andrew Wei	dc7b672f96	[AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw Attempt to lower a shuffle as a permute instruction(rev/revb/revh/revw) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D114960	2021-12-15 21:53:00 +08:00
John Brawn	dc9f65be45	[AArch64][SVE] Fix handling of stack protection with SVE Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack: * Use TypeSize when determining if accesses to a variable are considered out-of-bounds so that the behaviour is correct for scalable vectors. * When stack protection is enabled move the stack protector location to the top of the SVE locals, so that any overflow in them (or the other locals which are below that) will be detected. Fixes: https://github.com/llvm/llvm-project/issues/51137 Differential Revision: https://reviews.llvm.org/D111631	2021-12-14 11:30:48 +00:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absense of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Matt Devereau	2e585dd91a	[AArch64][SVE] Lower vector.insert to predicated merged MOV Use predicated SEL for vector.insert instead of going through memory Differential Revision: https://reviews.llvm.org/D115259	2021-12-13 11:17:55 +00:00
Peter Waller	ed43aab98d	[AArch64][SVE] Fix fptrunc store for fixed len vector Restrict duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE DAG combines to only larger than NEON types, as these are the ones for which there is custom lowering. Update tests so that they go through memory to improve validation. Differential Revision: https://reviews.llvm.org/D115166	2021-12-07 12:22:07 +00:00
Peter Waller	a6f751c34e	[AArch64][SVE] Fix ICE extracting fixedvec from scalable load `f526c600c0` had a concern raised because of an invalid typesize request on a scalable vector, which this patch addresses. Prevent shouldReduceLoadWidth from attempting to query the bit size, and add a regression test in sve-extract-fixed-vector.ll. Differential Revision: https://reviews.llvm.org/D115156	2021-12-06 16:49:43 +00:00
Matt Devereau	4244f95cc6	[AArch64][SVE] Enable bf16 vector.insert Allow passthrough bf16 registers for vector.insert Differential revision: https://reviews.llvm.org/D114858	2021-12-02 12:59:19 +00:00
Bradley Smith	fd9069ffce	[AArch64][SVE] Duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE dag combines By duplicating these dag combines we can bypass the legality checks that they do, this allows us to perform these combines on larger than legal fixed types, which in turn allows us to bring the same benefits D114580 brought but to larger than legal fixed types. Depends on D114580 Differential Revision: https://reviews.llvm.org/D114628	2021-12-01 15:33:53 +00:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
David Sherwood	84364bdaab	[CodeGen][AArch64] Bail out in performConcatVectorsCombine for scalable vectors I tried to exercise the existing combine patterns in performConcatVectorsCombine for scalable vectors and at the moment it doesn't seem possible. Parts of the code currently assume we're dealing with fixed-width vectors with calls to getVectorNumElements(), therefore I've decided to simply bail out early for scalable vectors. Added a test here to show that we don't crash when attempting to combine truncate + concat: CodeGen/AArch64/concat_vector-truncate-combine.ll Differential Revision: https://reviews.llvm.org/D114600	2021-11-29 14:26:14 +00:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into a integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/store we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
David Sherwood	a31f4bdfe8	[CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask In most common cases the @llvm.get.active.lane.mask intrinsic maps directly to the SVE whilelo instruction, which already takes overflow into account. However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower this immediately to a generic sequence of instructions that explicitly take overflow into account. This makes it very difficult to then later transform back into a single whilelo instruction. Therefore, this patch introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if we should lower/expand this to a sequence of generic ISD nodes, or instead just leave it as an intrinsic for the target to lower. You can see the significant improvement in code quality for some of the tests in this file: CodeGen/AArch64/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D114542	2021-11-29 08:08:17 +00:00
Bradley Smith	eafbaca977	[AArch64][SVE] Generate ASRD instructions for power of 2 signed divides Differential Revision: https://reviews.llvm.org/D113281	2021-11-26 11:08:27 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
Bradley Smith	080ef0b6a6	[AArch64][SVE] Recognize all ones mask during fixed mask generation Differential Revision: https://reviews.llvm.org/D114431	2021-11-24 13:55:06 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00

1 2 3 4 5 ...

1262 Commits