llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	fdce239ae9	[AArch64] Attempt to emitConjunction from brcond We currently use emitConjunction to create CCMP conjunctions from the conditions of selects, helping turn and/ors into more optimal ccmp sequences that don't need to go through csels. This extends that to also be used whilst lowering brcond, giving more opportunity for better condition generation. Differential Revision: https://reviews.llvm.org/D118650	2022-02-08 11:27:10 +00:00
David Truby	be826cf4f7	[AArch64][NEON][SVE] Lower FCOPYSIGN using AArch64ISD::BSP This patch modifies the FCOPYSIGN lowering to go through the BSP pseudo-instruction. This allows the same lowering code for NEON, SVE and SVE2. As part of this, lowering for BSP for SVE and SVE2 is also added. For SVE and NEON this patch is NFC. Differential Revision: https://reviews.llvm.org/D118394	2022-02-07 14:35:26 +00:00
zhongyunde 00443407	b3b129f11f	[DAGCombiner][AArch64] Enhance to support for scalar CSINC Enhance to fold csel into csinc instruction. Fix https://github.com/llvm/llvm-project/issues/53071 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116915	2022-02-07 10:27:48 +08:00
Simon Pilgrim	d3f966c6f0	[AArch64] LowerVectorSRA_SRL_SHL - silence dead code warning Remove default case from switch and move llvm_unreachable to after the switch()	2022-02-06 16:29:38 +00:00
Paul Walker	20085df22a	[NFC][SVE] Change useSVEForFixedLengthVectorVT to allow unconditional SVE usage for NEON sized vectors. Previously useSVEForFixedLengthVectorVT only allowed SVE usage when the target SVE register length was known to be at least 256bit. This was true even for NEON sized vectors, which was an artificial restriction imposed during early SVE bring up. This now changes so that callers can opt to use SVE for NEON sized vectors regardless of the SVE register length. The patch is NFC because for all places where OverrideNEON is used we now explicitly also check that SVE code generation for larger than NEON vectors is enabled. The intent is that over time these extra checks will either be removed or the lowering disabled if the SVE usage proves not beneficial. Differential Revision: https://reviews.llvm.org/D118957	2022-02-04 14:34:35 +00:00
Caroline Concatto	961e954af5	[AArch64][SVE] Add more folds to make use of gather/scatter with 32-bit indices In AArch64ISelLowering.cpp this patch implements this fold: 1) GEP (%ptr, SHL ((stepvector(A) + splat(%offset))) << splat(B))) into GEP (%ptr + (%offset << B), step_vector (A << B)) The above transform simplifies the index operand so that it can be expressed as i32 elements. This allows using only one gather/scatter assembly instruction instead of two. Patch by Paul Walker (@paulwalker-arm). Depends on D117900 Differential Revision: https://reviews.llvm.org/D118345	2022-02-03 19:18:30 +00:00
Caroline Concatto	019f0221d5	[AArch64][SVE] Fold gather/scatter with 32bits when possible In AArch64ISelLowering.cpp this patch implements this fold: GEP (%ptr, (splat(%offset) + stepvector(A))) into GEP ((%ptr + %offset), stepvector(A)) The above transform simplifies the index operand so that it can be expressed as i32 elements. This allows using only one gather/scatter assembly instruction instead of two. Patch by Paul Walker (@paulwalker-arm). Depends on D118459 Differential Revision: https://reviews.llvm.org/D117900	2022-02-03 18:58:37 +00:00
Sunho Kim	44601f4956	[AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of aarch64_neon_pmull intrinsic. Differential Revision: https://reviews.llvm.org/D117944	2022-02-03 16:46:49 +00:00
David Green	31373fb88a	[AArch64] Reassociate integer extending reductions to pairwise addition. Given an (integer) vecreduce, we know the order of the inputs does not matter. We can convert UADDV(add(zext(extract_lo(x)), zext(extract_hi(x)))) into UADDV(UADDLP(x)). This can also happen through an extra add, where we transform UADDV(add(y, add(zext(extract_lo(x)), zext(extract_hi(x))))). This makes sure the same thing happens for signed cases too, which requires adding a new SADDLP node. Differential Revision: https://reviews.llvm.org/D118107	2022-02-03 11:05:48 +00:00
David Green	0cd8063960	[AArch64] Generate CCMP from And CSel LLVM has a couple of ways of producing ccmp - either from chains in isel or from a later ifcvt style pass. This adds a simple DAG combine to capture more cases, converting and(csel(0, 1, cc0), csel(0, 1, cc1)) into a csel(ccmp(.., cc0)), depending on cc1 (a SUBS in this case). Differential Revision: https://reviews.llvm.org/D118327	2022-02-02 13:48:16 +00:00
David Sherwood	11cf807796	[AArch64][CodeGen] Always use SVE (when enabled) to lower integer divides This patch adds custom lowering support for ISD::SDIV and ISD::UDIV when SVE is enabled, regardless of the minimum SVE vector length. We do this because NEON simply does not have vector integer divide support, so we want to take advantage of these instructions in SVE. As part of this patch I've also simplified LowerToPredicatedOp to avoid re-asking the same question about whether we should be using SVE for fixed length vectors. Once we've made the decision to call LowerToPredicatedOp, then we should simply assert we should be using SVE. I've updated the 128-bit min SVE vector bits tests here: CodeGen/AArch64/sve-fixed-length-int-div.ll CodeGen/AArch64/sve-fixed-length-int-rem.ll Differential Revision: https://reviews.llvm.org/D117871	2022-02-02 09:46:02 +00:00
Cullen Rhodes	16d464a291	[AArch64][SVE] NFC: tidy up isel lowering Whilst adding legal types <-> register classes for Streaming SVE in D118561 I noticed the hasSVE predication block set operation actions for opcodes that may not be legal in Streaming SVE. Move these operations to the later hasSVE block which has loops over the same types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118560	2022-02-02 09:02:20 +00:00
Nikita Popov	a1dc6d4b83	[AArch64] Do not use ABI alignment for mops.memset.tag Pointer element types do not imply that the pointer is ABI aligned. We should be using either an explicit align attribute here, or fall back to an alignment of 1. This fixes a new element type access introduced in D117764. I don't think this makes any practical difference though, as the lowering does not depend on alignment. Differential Revision: https://reviews.llvm.org/D118681	2022-02-01 14:37:53 +01:00
Fangrui Song	51ed14d224	[AArch64] Temporarily use getPointerElementType to fix -Wdeprecated-declarations. NFC	2022-01-31 19:16:11 -08:00
tyb0807	5aa08bf708	[AArch64][SelectionDAG] CodeGen for Armv8.8/9.3 MOPS New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc. Each intrinsic is translated to one of these in SelectionDAGBuilder via EmitTargetCodeForMOPS. A custom lowering routine for INTRINSIC_W_CHAIN is added to handle llvm.aarch64.mops.memset.tag. This takes a separate path from the common intrinsics but ultimately ends up in the same EmitMOPS(). This is part 4/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu. Differential Revision: https://reviews.llvm.org/D117764	2022-01-31 20:56:27 +00:00
Florian Hahn	23091f7d50	[AArch64] Bail out for float operands in SetCC optimization. The optimization added in D118139 causes a crash on the added test case while trying to zero extend a vector of floats. Fix the crash by bailing out for floating point operands. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D118615	2022-01-31 18:20:47 +00:00
Paul Walker	30efee764d	[SVE] Remove AArch64ISD::PFALSE. AArch64ISD::PFALSE does not provide any value, in fact it can prevent common combines from firing. We only needed to lower to PFALSE until ISD::SPLAT_VECTOR became generally available. Differential Revision: https://reviews.llvm.org/D118469	2022-01-29 11:31:00 +00:00
Paul Walker	3bc876d0a3	[AArch64] Add isel for bitcasting between bfloat and half types. Differential Revision: https://reviews.llvm.org/D118420	2022-01-29 11:26:13 +00:00
Ahmed Bougacha	634ca7349d	[ObjCARC] Require the function argument in the clang.arc.attachedcall bundle. Currently, the clang.arc.attachedcall bundle takes an optional function argument. Depending on whether the argument is present, calls with this bundle have the following semantics: - on x86, with the argument present, the call is lowered to: call _target mov rax, rdi call _objc_retainAutoreleasedReturnValue - on AArch64, without the argument, the call is lowered to: bl _target mov x29, x29 and the objc runtime call is expected to be emitted separately. That's because, on x86, the objc runtime checks for both the mov and the call on x86, and treats the combination as the ARC autorelease elision marker. But on AArch64, it only checks for the dedicated NOP marker, as that's historically been sufficiently unique. Thanks to that, the runtime call wasn't required to be adjacent to the NOP marker, so it wasn't emitted as part of the bundle sequence. This patch unifies both architectures: on AArch64, we now emit all 3 instructions for the bundle. This guarantees that the runtime call is adjacent to the marker in the sequence, and that's information the runtime can use to further optimize this. This helps simplify some of the handling, in particular BundledRetainClaimRVs, which no longer needs to know whether the bundle is sufficient or not: it now always should be. Note that this does not include an AutoUpgrade for the nullary bundles, as they are only produced in ObjCContract as part of the obj/asm emission pipeline, and are not expected to be in bitcode. Differential Revision: https://reviews.llvm.org/D118214	2022-01-28 12:41:45 -08:00
David Truby	81bd67e18a	[AArch64][SVE][VLS] Move extends into arguments of comparisons When a comparison is extended and it would be free to extend the arguments to that comparison, we can propagate the extend into those arguments. This prevents extra instructions being generated to extend the result of the comparison, which is not free to extend. This is a resubmission of D116812 with fixes that need another review. Differential Revision: https://reviews.llvm.org/D118139	2022-01-28 14:16:08 +00:00
Sander de Smalen	af1c8f0d14	[AArch64][SVE] Folds VSELECT if the predicate is all active. This adds the following changes: * Fold: vselect(<all active predicate>, x, y) => x * Extend isAllActivePredicate to take vscale_range into account, e.g. isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true. isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true. Differential Revision: https://reviews.llvm.org/D118147	2022-01-27 15:58:56 +00:00
Sander de Smalen	417a75c6d0	[AArch64][SVE] Avoid using ptrue for ptest in VECREDUCE_OR. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118145	2022-01-27 11:44:49 +00:00
Sander de Smalen	c9da81d997	[AArch64][SVE] Implement missing lowering for extract_subvector for predicates. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D118057	2022-01-27 11:01:11 +00:00
Sander de Smalen	d58757e522	[AArch64][SVE] Implement PFALSE with explicit AArch64ISD node. The ISel patterns for PFALSE helps recognise the instructions as being free of side-effects, which helps MachineCSE remove redundant PFALSE instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118054	2022-01-27 10:30:13 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a consequence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Maciej Gabka	c5263cd518	Restrict performPostLD1Combine to 64 and 128 bit vectors When wider vectors are used, for example fixed width SVE, there are no patterns to select AArch64ISD::LD1LANEpost nodes, so we should do an early exit. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D117674	2022-01-26 09:57:44 +00:00
Paul Walker	d95cf1f6cf	[SVE] Enable ISD::ABDS/U ISel for scalable vectors. NOTE: This patch also includes tests that highlight those cases where the existing DAG combine doesn't yet work well for SVE. Differential Revision: https://reviews.llvm.org/D117873	2022-01-25 12:14:53 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Peter Waller	d4a6bf4d1a	Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons" This reverts commit `db04d3e30b`, which causes a buildbot failure.	2022-01-20 12:01:23 +00:00
Adrian Tong	b6a7ae2c5d	Optimize shift and accumulate pattern in AArch64. AArch64 supports unsigned shift right and accumulate. When we see an unsigned shift right followed by an OR, we can turn them into a USRA instruction, given that the operands of the OR have no common bits. Differential Revision: https://reviews.llvm.org/D114405	2022-01-20 01:57:40 +00:00
David Truby	db04d3e30b	[AArch64][SVE][VLS] Move extends into arguments of comparisons When a comparison is extended and it would be free to extend the arguments to that comparison, we can propagate the extend into those arguments. This prevents extra instructions being generated to extend the result of the comparison, which is not free to extend. Differential Revision: https://reviews.llvm.org/D116812	2022-01-19 14:11:45 +00:00
Jim Lin	d6b0734837	[NFC] Use Register instead of unsigned	2022-01-19 20:17:04 +08:00
Akshay Kumar	6f61fe7de9	[Aarch64] Custom lowering of COPYSIGN to SIMD should check for NEON availability For the following test case, clang is crashing for ARM64 architecture $ cat crash.c double crash(double a, double b) { return __builtin_copysign(a, b); } $ clang -O2 -march=armv8-a+nosimd --target=arm64 -S crash.c -o /dev/null fatal error: error in backend: Cannot select: 0x7fae361bb4e8: v2i64 = AArch64ISD::BIT 0x7fae361bb210, 0x7fae361bb278, 0x7fae361bb480 Fix: PR51806 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116581	2022-01-18 00:25:15 +05:30
David Sherwood	3a272d1eaf	[SVE][CodeGen] Use splice instruction when lowering VECTOR_SPLICE For certain negative indices passed to the VECTOR_SPLICE operation we can actually directly use the SVE splice instruction by creating the appropriate predicate. The predicate needs to be constructed in such a way that all but the last -idx elements are false. We can do this efficiently using a combination of 'ptrue' (with the appropriate fixed pattern, e.g. vl1, vl2, etc.) and 'rev'. The advantage of using these instructions to generate the predicate is they do not set any flags, unlike the whilelo instruction. This is critical when the splice operation is in a loop, since we want MachineLICM to hoist the predicate generation out of the loop. Differential Revision: https://reviews.llvm.org/D115863	2022-01-11 11:58:17 +00:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Nicholas Guy	13992498cd	[AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops Differential Revision: https://reviews.llvm.org/D114879	2022-01-05 12:54:31 +00:00
Paul Walker	4325fd7402	[AArch64ISelLowering] Don't look through scalable extract_subvector when optimising DUPLANE. When constructDup is passed an extract_subvector it tries to use extract_subvector's operand directly when creating the DUPLANE. This is invalid when extracting from a scalable vector because the necessary DUPLANE ISel patterns do not exist. NOTE: This patch is an update to https://reviews.llvm.org/D110524 that originally fixed this but introduced a bug when the result VT is 64bits. I've restructured the code so the critical final else block is entered when necessary. Differential Revision: https://reviews.llvm.org/D116442	2022-01-05 11:56:59 +00:00
Andrew Wei	03dc2975d0	[AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2 Attempt to lower a shuffle as a permute instruction(zip/uzp/trn) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D113376	2021-12-21 18:39:09 +08:00
David Truby	7e44eb079d	[AArch64][SVE] Improve code generation for VLS i1 masks This patch partially resolves an issue for VLS code generation where a mask is generated from a smaller width integer comparison than the instruction using the mask requires. Instead of sign extending a p register by converting it to a z register, extending that, and converting back, we instead just do an unpack of the p register. A separate issue causes the code generation to still be poor when the mask generation would fit in a neon register, as we then use a neon comparison operation and have to convert that to a p register. This will be resolved in a separate patch. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D111221	2021-12-17 16:26:49 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Andrew Wei	dc7b672f96	[AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw Attempt to lower a shuffle as a permute instruction(rev/revb/revh/revw) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D114960	2021-12-15 21:53:00 +08:00
John Brawn	dc9f65be45	[AArch64][SVE] Fix handling of stack protection with SVE Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack: * Use TypeSize when determining if accesses to a variable are considered out-of-bounds so that the behaviour is correct for scalable vectors. * When stack protection is enabled move the stack protector location to the top of the SVE locals, so that any overflow in them (or the other locals which are below that) will be detected. Fixes: https://github.com/llvm/llvm-project/issues/51137 Differential Revision: https://reviews.llvm.org/D111631	2021-12-14 11:30:48 +00:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absence of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Matt Devereau	2e585dd91a	[AArch64][SVE] Lower vector.insert to predicated merged MOV Use predicated SEL for vector.insert instead of going through memory Differential Revision: https://reviews.llvm.org/D115259	2021-12-13 11:17:55 +00:00
Peter Waller	ed43aab98d	[AArch64][SVE] Fix fptrunc store for fixed len vector Restrict duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE DAG combines to only larger than NEON types, as these are the ones for which there is custom lowering. Update tests so that they go through memory to improve validation. Differential Revision: https://reviews.llvm.org/D115166	2021-12-07 12:22:07 +00:00
Peter Waller	a6f751c34e	[AArch64][SVE] Fix ICE extracting fixedvec from scalable load `f526c600c0` had a concern raised because of an invalid typesize request on a scalable vector, which this patch addresses. Prevent shouldReduceLoadWidth from attempting to query the bit size, and add a regression test in sve-extract-fixed-vector.ll. Differential Revision: https://reviews.llvm.org/D115156	2021-12-06 16:49:43 +00:00
Matt Devereau	4244f95cc6	[AArch64][SVE] Enable bf16 vector.insert Allow passthrough bf16 registers for vector.insert Differential revision: https://reviews.llvm.org/D114858	2021-12-02 12:59:19 +00:00
Bradley Smith	fd9069ffce	[AArch64][SVE] Duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE dag combines By duplicating these dag combines we can bypass the legality checks that they do, this allows us to perform these combines on larger than legal fixed types, which in turn allows us to bring the same benefits D114580 brought but to larger than legal fixed types. Depends on D114580 Differential Revision: https://reviews.llvm.org/D114628	2021-12-01 15:33:53 +00:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
David Sherwood	84364bdaab	[CodeGen][AArch64] Bail out in performConcatVectorsCombine for scalable vectors I tried to exercise the existing combine patterns in performConcatVectorsCombine for scalable vectors and at the moment it doesn't seem possible. Parts of the code currently assume we're dealing with fixed-width vectors with calls to getVectorNumElements(), therefore I've decided to simply bail out early for scalable vectors. Added a test here to show that we don't crash when attempting to combine truncate + concat: CodeGen/AArch64/concat_vector-truncate-combine.ll Differential Revision: https://reviews.llvm.org/D114600	2021-11-29 14:26:14 +00:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into a integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/store we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
David Sherwood	a31f4bdfe8	[CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask In most common cases the @llvm.get.active.lane.mask intrinsic maps directly to the SVE whilelo instruction, which already takes overflow into account. However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower this immediately to a generic sequence of instructions that explicitly take overflow into account. This makes it very difficult to then later transform back into a single whilelo instruction. Therefore, this patch introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if we should lower/expand this to a sequence of generic ISD nodes, or instead just leave it as an intrinsic for the target to lower. You can see the significant improvement in code quality for some of the tests in this file: CodeGen/AArch64/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D114542	2021-11-29 08:08:17 +00:00
Bradley Smith	eafbaca977	[AArch64][SVE] Generate ASRD instructions for power of 2 signed divides Differential Revision: https://reviews.llvm.org/D113281	2021-11-26 11:08:27 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
Bradley Smith	080ef0b6a6	[AArch64][SVE] Recognize all ones mask during fixed mask generation Differential Revision: https://reviews.llvm.org/D114431	2021-11-24 13:55:06 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00
Zarko Todorovski	5b8bbbecfa	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded removed code comments that contain `sanity check` and `sanity test`.	2021-11-17 21:59:00 -05:00
Serguei Katkov	0ecb12a27f	[STATEPOINT] Force implicit-def for lr register. STATEPOINT instruction behavior is similar to call instruction. On AArch64 the BL instruction implicitly defines the lr register, so the STATEPOINT instruction should do the same. However STATEPOINT is a general pseudo instruction and I could not find a way to override the list of implicit defs for a specific target. So this patch post-processes the inserted STATEPOINT instruction by adding an implicit dead def for lr. Reviewers: reames, loicottet, ostannard Reviewed By: reames Subscribers: danilaml, hiraditya, kristof.beyls, llvm-commits, yrouban Differential Revision: https://reviews.llvm.org/D111114	2021-11-16 12:52:00 +07:00
Kazu Hirata	efa896e5f7	[Target] Use SDNode::uses (NFC)	2021-11-12 21:23:04 -08:00
Florian Hahn	c2ed9fd054	[AArch64] Use custom lowering for {U,S}INT_TO_FP with i8. With fullfp16, it is cheaper to cast the {U,S}INT_TO_FP operand to i16 first, rather than promoting it to i32. The custom lowering for {U,S}INT_TO_FP already supports that, it just needs to be used. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D113601	2021-11-11 08:47:15 +00:00
David Green	703ded8dda	[AArch64] Allow FP16 vector fixed point converts This extends performFpToIntCombine to work on FP16 vectors as well as the f32 and f64 vectors it already supported. Differential Revision: https://reviews.llvm.org/D113297	2021-11-11 07:32:52 +00:00
David Green	509b397dd5	[AArch64] Combine vector fptoi.sat(fmul) to fixed point fcvtz Similar to D113199 but dealing with the vector size, this extends the fptosi+fmul to fixed point fold to handle fptosi.sat nodes that are equally viable, so long as the saturation width matches the output width. Differential Revision: https://reviews.llvm.org/D113200	2021-11-10 16:12:48 +00:00
Mindong Chen	495e258fd7	[AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list This adds FP type support to the SVE Container type list as a supplement to D112303. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113333	2021-11-08 22:29:08 +08:00
David Sherwood	8d38c24fb6	[SVE][CodeGen] Improve codegen for some FP insert_subvector cases When inserting an unpacked FP subvector into a packed vector we can simply cast the unpacked value into a packed value, since both types are legal for SVE. We can then use this as the input for the UZP instruction. This avoids us expanding the operation by going through the stack. Differential Revision: https://reviews.llvm.org/D113270	2021-11-08 13:45:55 +00:00
Andrew Wei	bf3784b882	[AArch64] Canonicalize X(Y+1) or X(1-Y) to madd/msub Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern Reviewed By: dmgreen, sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D111862	2021-11-08 16:49:31 +08:00
David Sherwood	657a1dcd0d	[AArch64] Add target DAG combine for UUNPKHI/LO When creating a UUNPKLO/HI node with an undef input then the output should also be undef. I've added a target DAG combine function to ensure we avoid creating an unnecessary uunpklo/hi instruction. Differential Revision: https://reviews.llvm.org/D113266	2021-11-05 13:50:59 +00:00
Sander de Smalen	1ea4296208	[NFC] Remove from UnivariateLinearPolyBase::getValue(). This interface should not have existed in the first place, let alone be a public member. It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous. The interfaces to be used are either getFixedValue() or getKnownMinValue().	2021-11-04 14:32:08 +00:00
Cameron McInally	702fd3d323	[SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive. For NEON, FMA matching is done in the MachineCombiner, and not the DAGCombiner. That causes problems with VLS lowering, since the vectors are fixed width at the DAGCombiner, but are scalable in the MachineCombiner. This patch corrects it by matching FMAs for VLS vectors in the DAGCombiner. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112557	2021-11-01 10:43:52 -07:00
Bradley Smith	bf72a469ba	[AArch64][SVE] Fix build failure introduced in `13faa5f440`	2021-10-29 11:57:02 +00:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Kerry McLaughlin	1f49b71fe5	[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt This patch enables the use of reciprocal estimates for SVE when both the -Ofast and -mrecip flags are used. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D111657	2021-10-25 11:30:44 +01:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide an isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos. Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Andrew Wei	f5056c8c16	[AArch64] Improve shuffle vector by using wider types Try to widen element type to get a new mask value for a better permutation sequence, so that we can use NEON shuffle instructions, such as zip1/2, UZP1/2, TRN1/2, REV, INS, etc. For example: shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3> is equivalent to: shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1> Finally, we can get: mov v0.d[0], v1.d[1] Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D111619	2021-10-18 21:24:45 +08:00
David Green	fa1a68285e	[AArch64] Improve fptosi.sat vector lowering Similar to D111236, this improves the lowering of vector fptosi.sat and fptoui.sat, using legal converts and further saturating from there with min/max. f64 are excluded for the moment due to producing worse code in places compared to the unrolling. Differential Revision: https://reviews.llvm.org/D111787	2021-10-15 11:37:53 +01:00
David Green	a4f42a33be	[AArch64] Improve fptosi.sat lowering Improve the lowering of scalar fptosi.sat and fptoui.sat for saturating widths smaller than legal types by using the fact that the legal type will saturate under aarch64, and saturating the result further using min/max. Differential Revision: https://reviews.llvm.org/D111236	2021-10-15 11:12:23 +01:00
guopeilin	b092dc0bb9	[AArch64ISelLowering] Avoid duplane in some cases when sve enabled Reviewed By: david-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D110524	2021-10-15 15:33:24 +08:00
Bradley Smith	2eb42e3d2a	[AArch64][SVE] Add fixed type lowering for EXTRACT_SUBVECTOR Depends on D111135 Differential Revision: https://reviews.llvm.org/D111165	2021-10-12 14:56:15 +00:00
Bradley Smith	03065ecd85	[AArch64][SVE] Ensure LowerEXTRACT_SUBVECTOR is not called for illegal types The lowering for EXTRACT_SUBVECTOR should not be called during type legalization, only as part of lowering, hence return SDValue() when called on illegal types. This also adds missing tests for extracting fixed types from illegal scalable types. Differential Revision: https://reviews.llvm.org/D111412	2021-10-11 11:20:50 +00:00
Andrew Savonichev	7ae8f392a1	[AArch64] Emit AssertZExt for i1 arguments AAPCS requires an i1 argument to be zero-extended to 8 bits by the caller. Emit a new AArch64ISD::ASSERT_ZEXT_BOOL hint (or AssertZExt for GlobalISel) to enable some optimization opportunities. In particular, when the argument is forwarded to the callee, we can avoid zero-extension and use it as-is. Differential Revision: https://reviews.llvm.org/D107160	2021-10-11 11:55:11 +03:00
Bradley Smith	5be266db7a	[AArch64][SVE] Improve VECTOR_SPLICE codegen for VL > 128-bit Differential Revision: https://reviews.llvm.org/D111135	2021-10-07 15:28:55 +00:00
Jay Foad	a9bceb2b05	[APInt] Stop using soft-deprecated constructors and methods in llvm. NFC. Stop using APInt constructors and methods that were soft-deprecated in D109483. This fixes all the uses I found in llvm, except for the APInt unit tests which should still test the deprecated methods. Differential Revision: https://reviews.llvm.org/D110807	2021-10-04 08:57:44 +01:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
Tim Northover	3a00e58c2f	AArch64: use indivisible cmpxchg for 128-bit atomic loads at O0 Like normal atomicrmw operations, at -O0 the simple register-allocator can insert spills into the LL/SC loop if it's expanded and visible when regalloc runs. This can cause the operation to never succeed by repeatedly clearing the monitor. Instead expand to a cmpxchg, which has a pseudo-instruction for -O0.	2021-09-22 14:20:43 +01:00
Tim Northover	13aa102e07	AArch64: use ldp/stp for 128-bit atomic load/store in v8.4 onwards v8.4 says that normal loads/stores of 128 bits are single-copy atomic if they're properly aligned (which all LLVM atomics are) so we no longer need to do a full RMW operation to guarantee we got a clean read.	2021-09-20 09:50:11 +01:00
Cullen Rhodes	17f1ccc759	[AArch64][SVE] NFC: Remove unnecessary if	2021-09-16 11:26:46 +00:00
David Truby	915e9e76bf	[llvm][sve] Lowering for VLS masked extending loads This extends the custom lowering for extending loads on fixed length vectors in SVE to support masked extending loads. The existing tests for correct behaviour of masked extending loads exhibit bad code generation due to the legalisation of i1 vectors. They have been left as-is and new tests have been added that do not exhibit this behaviour. Differential Revision: https://reviews.llvm.org/D108200	2021-09-13 11:13:25 +01:00
Huihui Zhang	da4a2fd832	[AArch64ISelLowering] Fix null pointer access in performSVEAndCombine. When combining 'and' of an unsigned unpack and shuffle instruction, bail early if shuffle is not constructed from a constant integer. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D109556	2021-09-10 10:36:43 -07:00
Craig Topper	9af8f1b18e	[SelectionDAG] Add isZero/isAllOnes methods to ConstantSDNode. Soft deprecate isNullValue/isAllOnesValue and update in tree callers. This matches the changes to the APInt interface from D109483. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D109535	2021-09-09 13:28:30 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Bradley Smith	8089f9ed5a	[AArch64][SVE] Add missing patterns for unpredicated subr intrinsics Differential Revision: https://reviews.llvm.org/D109369	2021-09-09 10:28:37 +00:00
Ben Shi	f0460fa4eb	[AArch64] Improve target hook function to decide folding (mul (add x, c1), c2) Prevent the folding if it leads to worse code. Reviewed By: dmgreen, kda Differential Revision: https://reviews.llvm.org/D108871	2021-09-08 01:51:26 +00:00
Sander de Smalen	448d47f743	[AArch64][SVE] Implement all-inactive predicate with PFALSE. Instead of using a WHILE XZR, XZR instruction, just emit a PFALSE. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D109311	2021-09-07 14:29:02 +01:00
David Truby	b297531ece	[AArch64][sve] Prevent incorrect function call on fixed width vector The isEssentiallyExtractHighSubvector function currently calls getVectorNumElements on a type that in specific cases might be scalable. Since this function only has correct behaviour at the moment on scalable types anyway, the function can just return false when given a fixed type. Differential Revision: https://reviews.llvm.org/D109163	2021-09-06 14:25:03 +01:00
Kevin Athey	c7f50a445e	Revert "[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)" This reverts commit `095bea23d0`. Broke buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11411	2021-09-03 18:08:58 -07:00
Ben Shi	095bea23d0	[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2) Prevent the folding if it leads to worse code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D108871	2021-09-04 07:24:23 +08:00
Cullen Rhodes	1dcd900d1d	[AArch64][ISel] NFC: DAG.getMachineFunction() -> MF Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D109135	2021-09-03 07:59:17 +00:00
Bradley Smith	14e1a4a6ee	[AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter When lowering a fixed length gather/scatter the index type is assumed to be the same as the memory type; this is incorrect in cases where the extension of the index has been folded into the addressing mode. For now add a temporary workaround to fix the codegen faults caused by this by preventing the removal of this extension. At a later date the lowering for SVE gather/scatters will be redesigned to improve the way addressing modes are handled. As a short term side effect of this change, the addressing modes generated for fixed length gather/scatters will not be optimal. Differential Revision: https://reviews.llvm.org/D109145	2021-09-02 15:07:24 +00:00
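
The mask rewrite described in commit f5056c8c16 ("Improve shuffle vector by using wider types") can be illustrated with a small sketch. This is not the LLVM implementation: the function name `widen_shuffle_mask` and its `scale` parameter are hypothetical, and undef mask lanes are simply rejected here for brevity. It only shows why the mask <6, 7, 2, 3> over <4 x i32> corresponds to <3, 1> over <2 x i64>.

```python
from typing import List, Optional


def widen_shuffle_mask(mask: List[int], scale: int = 2) -> Optional[List[int]]:
    """Re-express a shuffle mask over narrow elements as a mask over
    elements `scale` times wider. Returns None when that is not possible.

    Hypothetical helper for illustration only; not LLVM's code.
    """
    if scale <= 0 or len(mask) % scale != 0:
        return None
    widened = []
    for i in range(0, len(mask), scale):
        group = mask[i:i + scale]
        first = group[0]
        # The group must start on a wide-element boundary
        # (negative, i.e. undef, indices are rejected in this sketch).
        if first < 0 or first % scale != 0:
            return None
        # The group must select consecutive narrow elements.
        if any(idx != first + k for k, idx in enumerate(group)):
            return None
        widened.append(first // scale)
    return widened


# The example from the commit message: <6, 7, 2, 3> on i32 lanes.
print(widen_shuffle_mask([6, 7, 2, 3]))
```

A mask such as <1, 2, 5, 6> cannot be widened this way, because its pairs straddle the 64-bit element boundaries; the sketch returns None for it.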

1321 Commits