llvm-project

Commit Graph

Author	SHA1	Message	Date
chenglin.bi	222adf338a	[Arch64][SelectionDAG] Add target-specific implementation of srem 1. X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. 2. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-19 02:49:42 +08:00
chenglin.bi	acfc025a72	Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem" This reverts commit `9d9eddd3dd`.	2022-04-18 10:35:09 +08:00
chenglin.bi	9d9eddd3dd	[Arch64][SelectionDAG] Add target-specific implementation of srem X%C to the equivalent of X-X/C*C is not always fastest path if there is no SDIV pair exist. So check target have faster for srem only first. Add AArch64 faster path for SREM only pow2 case. Fix https://github.com/llvm/llvm-project/issues/54649 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122968	2022-04-16 12:29:11 +08:00
zhongyunde	49cb4fef02	[AArch64][SelectionDAG] Refactor to support more scalable vector extending stores Similar to D122281, we should firstly exclude all scalable vector extending stores and then selectively enable those which we directly support. Also merge integer and float scalable vector into scalable_vector_valuetypes. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D123449	2022-04-15 19:11:40 +08:00
Paul Walker	a5a258e208	[SVE] Refactor MGATHER lowering for unsupported passthru values. Handle unsupported passthru values before lowering the gather to target specific nodes. This is a simplification that's on the road to moving more of MGATHER lowering into td based isel. Differential Revision: https://reviews.llvm.org/D123683	2022-04-14 17:26:43 +01:00
John Brawn	12c1022679	[AArch64] Lowering and legalization of strict FP16 For strict FP16 to work correctly needs some changes in lowering and legalization: * SelectionDAGLegalize::PromoteNode was missing handling for some strict fp opcodes. * Some of the custom lowering of strict fp operations needed to be adjusted to work with FP16. * Custom lowering needed to be added for round-to-int operations. With this, and the previous patches for the rest of the strict fp isel, we can set IsStrictFPEnabled = true. Differential Revision: https://reviews.llvm.org/D115620	2022-04-14 16:51:22 +01:00
David Green	1ba8f4f67d	[AArch64] Move v4i8 concat load lowering to a combine. The existing code was not updating the uses of loads that it recreated, leading to incorrect chains which could break the ordering between nodes. This moves the code to a combine instead, and makes sure we update the chain references. This does mean it happens earlier - potentially before the concats are simplified. This can lead to inefficiencies in the codegen, which will be fixed in followups.	2022-04-14 15:19:33 +01:00
Paul Walker	0c44115e51	[SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER. The lowering code did not use the scale operand of MGATHER/MSCATTER nodes, but instead assumed scaled indices were always scaled based on the element type of the memory type. This patch adds the missing support by rewritting the nodes as unscaled variants. Differential Revision: https://reviews.llvm.org/D123670	2022-04-14 11:54:46 +01:00
David Sherwood	44271e7c55	[AArch64][SVE] Fix lowering of "fcmp ueq/one" when using SVE We were previously lowering to the incorrect instructions for the setcc DAG node when using the SETUEQ and SETONE floating point condition codes. I have fixed this by marking the SETONE code as Expand and letting the SETUNE code be legal. I have also fixed up the patterns for FCMNE_PPzZZ and FCMNE_PPzZ0 to use the correct opcode. Differential Revision: https://reviews.llvm.org/D121905	2022-04-13 10:24:03 +01:00
David Green	a93607c479	[AArch64] Remove always true Perfect cost check. NFC Perfect shuffle costs are always encoded less than 4, and shouldn't really have a cost more than 3, so it makes no sense to check it when generating shuffles. The perfect shuffle is likely always better than a tbl too (although that may depend on whether it is in a loop).	2022-04-08 12:16:34 +01:00
Matt Arsenault	c4ea925f50	AtomicExpand: Change return type for shouldExpandAtomicStoreInIR Use the same enum as the other atomic instructions for consistency, in preparation for addition of another strategy. Introduce a new "Expand" option, since the store expansion does not use cmpxchg. Alternatively, the existing CmpXChg strategy could be renamed to Expand.	2022-04-06 22:34:04 -04:00
Martin Storsjö	8d7a17b7c8	[AArch64] Fix the upper limit for folded address offsets for COFF In COFF, the immediates in IMAGE_REL_ARM64_PAGEBASE_REL21 relocations are limited to 21 bit signed, i.e. the offset has to be less than (1 << 20). The previous limit did intend to cover for this case, but had missed that the 21 bit field was signed. This fixes issue https://github.com/llvm/llvm-project/issues/54753. Differential Revision: https://reviews.llvm.org/D123160	2022-04-06 22:54:13 +03:00
zhongyunde	9a2d5cc1da	[SVE][AArch64] Enable first active true vector combine for INTRINSIC_WO_CHAIN WHILELO/LS insn is used very important for SVE loop, and itself is a flag-setting operation, so add it. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D122796	2022-04-06 21:01:37 +08:00
zhongyunde	19e5235147	[AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD Accord the discussion in D122281, we missing an ISD::AND combine for MLOAD because it relies on BuildVectorSDNode is fails for scalable vectors. This patch is intend to handle that, so we can circle back the type MVT::nxv2i32 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122703	2022-04-06 20:50:42 +08:00
zhongyunde	251637690a	[AArch64] Enhance last active true vector combine Last active extracting will output LASTB + WHILELS, and the WHILELS itself is a flag-setting operation, so perform it preferly. Reviewed By: paulwalker-arm, sdesmalen Differential Revision: https://reviews.llvm.org/D122551	2022-04-06 09:54:28 +08:00
David Green	3b9833597e	[AArch64] Alter mull buildvectors(ext(..)) combine to work on shuffles D120018 altered this combine to work on buildvectors as opposed to shuffle dup's. This works well for dups and other things that are expanded into buildvectors. Some shuffles are legal though, and stay as vector_shuffle through lowering. This expands the transform to also handle shuffles, so that we can turn mul(shuffle(sext into mul(sext(shuffle and more readily make smull/umull instructions. This can come up from the SLP vectorizer adding shuffles that are costed from extends. Differential Revision: https://reviews.llvm.org/D123012	2022-04-04 23:07:47 +01:00
David Green	60f57b3658	[AArch64] Ensure fixed point fptoi_sat has correct saturation width D113200 introduced an error where it was converting FP_TO_SI_SAT with multiply to a fixed point floating point convert. The saturation bitwidth needs to be equal to the floating point width, or else the routine would truncate the result as opposed to saturating it. Fixes #54601	2022-03-29 10:12:44 +01:00
Shao-Ce SUN	662b9fa02c	[NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122557	2022-03-29 09:53:24 +08:00
zhongyunde	c3fe025bd4	[AArch64][SelectionDAG] Refactor to support more scalable vector extending loads Accord the discussion in D120953, we should firstly exclude all scalable vector extending loads and then selectively enable those which we directly support. This patch is intend to refactor for above (truncating stores is not touched),and more scalable vector types will try to reduce the number of masked loads in favour of more unpklo/hi instructions. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D122281	2022-03-27 21:18:01 +08:00
David Green	693d3b7e76	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TLB3 and TLB4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-26 21:10:43 +00:00
Simon Pilgrim	e699b5da44	[AArch64] isProfitableToHoist - remove nullptr test User is dereferenced on the main codepath so the null test is likely superfluous	2022-03-25 10:27:16 +00:00
David Green	3d8d60e147	Revert "[AArch64] Lower 3 and 4 sources buildvectors to TBL" This reverts commit `ec93b28909` as problems with it have been reported.	2022-03-25 10:03:10 +00:00
David Green	ec93b28909	[AArch64] Lower 3 and 4 sources buildvectors to TBL The default expansion for buildvectors is to extract each element and insert them into a new vector. That involves a lot of copying to/from the GPR registers. TLB3 and TLB4 can be relatively slow instructions with the mask needing to be loaded from a constant pool, but they should always be better than all the moves to/from GPRs. Differential Revision: https://reviews.llvm.org/D121137	2022-03-24 10:02:33 +00:00
David Green	54bc9ad2e8	[AArch64] Make some methods static. NFC	2022-03-24 08:55:27 +00:00
David Spickett	c3b98194df	Reland "[llvm][AArch64] Insert "bti j" after call to setjmp" This reverts commit `edb7ba714a`. This changes BLR_BTI to take variable_ops meaning that we can accept a register or a label. The pattern still expects one argument so we'll never get more than one. Then later we can check the type of the operand to choose BL or BLR to emit. (this is what BLR_RVMARKER does but I missed this detail of it first time around) Also require NoSLSBLRMitigation which I missed in the first version.	2022-03-23 11:43:43 +00:00
David Spickett	edb7ba714a	Revert "[llvm][AArch64] Insert "bti j" after call to setjmp" This reverts commit `eb5ecbbcbb` due to failures on buildbots with expensive checks enabled.	2022-03-23 10:43:20 +00:00
David Spickett	eb5ecbbcbb	[llvm][AArch64] Insert "bti j" after call to setjmp Some implementations of setjmp will end with a br instead of a ret. This means that the next instruction after a call to setjmp must be a "bti j" (j for jump) to make this work when branch target identification is enabled. The BTI extension was added in armv8.5-a but the bti instruction is in the hint space. This means we can emit it for any architecture version as long as branch target enforcement flags are passed. The starting point for the hint number is 32 then call adds 2, jump adds 4. Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see at the start of functions). The existing Arm command line option -mno-bti-at-return-twice has been applied to AArch64 as well. Support is added to SelectionDAG Isel and GlobalIsel. FastIsel will defer to SelectionDAG. Based on the change done for M profile Arm in https://reviews.llvm.org/D112427 Fixes #48888 Reviewed By: danielkiss Differential Revision: https://reviews.llvm.org/D121707	2022-03-23 09:51:02 +00:00
zhongyunde	828b89bc0b	[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Trying to reduce the number of masked loads in favour of more unpklo/hi instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions from legal types. Both of normal and masked loads test cases added to guard compile crash. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D120953	2022-03-21 23:47:33 +08:00
chenglin.bi	dd3b90e4d7	[AArch64] Combine ISD::SETCC into AArch64ISD::ANDS When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false). ANd if SetCC input use this number, DAG combiner will generate one more SRL instruction. So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison Fix https://github.com/llvm/llvm-project/issues/54283 Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121449	2022-03-19 13:04:16 +00:00
Paul Walker	f46fe36d59	[AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands. When inverting the compare predicate trySwapVSelectOperands is incorrectly using the type of the select's cond operand rather than the type of cond's operands. This means we're treating all inversions as if they're integer. Differential Revision: https://reviews.llvm.org/D121968	2022-03-19 12:36:14 +00:00
David Green	fe6057a293	[AArch64] Custom lower concat(v4i8 load, ...) We already have custom lowering for v4i8 load, which loads as a f32, converts to a vector and bitcasts and extends the result to a v4i16. This adds some custom lowering of concat(v4i8 load, ...) to keep the result as an f32 and create a buildvector of the resulting f32 loads. This helps not create all the extends and bitcasts, which are often difficult to fully clean up. Differential Revision: https://reviews.llvm.org/D121400	2022-03-18 11:58:02 +00:00
David Green	0b6df40c52	[AArch64] Combine ISD::AND into AArch64ISD::ANDS If we already have a AArch64ISD::ANDS node with identical operands, we can merge any ISD::AND into it, reducing the instruction count by calculating the value and the flags in a single operation. This code is taken from the X86 backend, and could also handle AArch64ISD::ADDS and AArch64ISD::SUBS, but I couldn't find any test cases where it came up. Differential Revision: https://reviews.llvm.org/D118584	2022-03-17 09:44:11 +00:00
zhongyunde	3568333815	[AArch64] Perform last active true vector combine Test bit of lane EC-1 can use P register directly, eg: Materialize : Idx = (add (mul vscale, NumEls), -1) i1 = extract_vector_elt t37, Constant:i64<Idx> ... into: "ptrue p, all" + PTEST Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D121180	2022-03-15 01:25:03 +08:00
Arthur Eubanks	250620f76e	[OpaquePtr][AArch64] Use elementtype on ldxr/stxr Includes verifier changes checking the elementtype, clang codegen changes to emit the elementtype, and ISel changes using the elementtype. Reviewed By: #opaque-pointers, nikic Differential Revision: https://reviews.llvm.org/D120527	2022-03-14 10:09:59 -07:00
David Sherwood	aeeb1199b4	[AArch64][SVE] Change the asserts in LowerToPredicatedOp to check for legal types When building the LLVM test suite with SVE I discovered a crash when compiling some Halide tests, which occurs because we try to use SVE to lower 64-bit vector multiplies and there is no vscale_range attribute on the function. In this case the min SVE vector bits was 0, which caused an assert in LowerToPredicatedOp to fire. I have amended the asserts in this function to check that the fixed-width type is legal. If the fixed-width type is larger than NEON and is legal then it must be because we've set the min SVE vector bits to something > 128. Or if the min SVE bits is 0, then the only legal types allowed are 128 bit types - for any other types the assert will fire. Tests added here: CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll Differential Revision: https://reviews.llvm.org/D121297	2022-03-11 09:57:58 +00:00
Philippe Valembois	26cd258420	[AArch64] Use correct calling convention for each vararg While checking is tail call optimization is possible, the calling convention applied to fixed arguments is not the correct one. This implies for DarwinPCS that all arguments of a vararg function will go to the stack although fixed ones can go in registers. This prevents non-virtual thunks to be tail optimized although they are marked as musttail. Differential Revision: https://reviews.llvm.org/D120622	2022-03-10 15:07:25 -08:00
David Green	4899e2cab4	[AArch64] Fix type in comment. NFC	2022-03-10 15:03:27 +00:00
David Green	21a97a2ac1	[AArch64] TBL uses zero for out of range elements. A TBL instruction will use zero for any out of range values. We can use this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need to materialise the zero. Differential Revision: https://reviews.llvm.org/D121139	2022-03-10 14:45:13 +00:00
zhongyunde	c22c8b151b	[AArch64] Perform first active true vector combine Materialize : i1 = extract_vector_elt t37, Constant:i64<0> ... into: "ptrue p, all" + PTEST Test bit of lane 0 can use P register directly, and the instruction “pture all” is loop invariant, which will beneficial to SVE after hoisting out the loop. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D120891	2022-03-08 01:10:21 +08:00
David Green	d9633d1490	[AArch64] Turn truncating buildvectors into truncates When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is split several times, creating an illegal v4i8 concat that gets expanded into a BUILD_VECTOR. After some combining and other legalisation, it ends up the a buildvector that extracts from 4 vectors, looking like BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is really an v16i32->v16i8 truncate in disguise. This adds a ReconstructTruncateFromBuildVector method to detect the pattern, converting it back into the legal "concat(trunc(concat(trunc(a), trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted nodes could also be v4i16, in which case the truncates are not needed. All those truncates and concats then become uzip1's, which is much better than expanding by moving vector lanes around. Differential Revision: https://reviews.llvm.org/D119469	2022-03-07 09:42:54 +00:00
Benjamin Kramer	fbce4a7803	Drop some more global std::maps. NFCI.	2022-03-06 13:28:29 +01:00
Karl Meakin	43a0016f3d	Extend `performANDCSELCombine` to `performANDORCSELCombine` Differential Revision: https://reviews.llvm.org/D120422	2022-03-04 15:09:59 +00:00
Paul Walker	42b4a6227e	[DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation. When triggered during operation legalisation the affected combine generates a splat_vector that when custom lowered for SVE fixed length code generation, results in the original precombine sequence and thus we enter a legalisation/combine hang. NOTE: The patch contains no tests because I observed this issue only when combined with other work that might never become public. The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific test was not possible so I'm hoping the DAGCombiner fix can be seen as obvious. The AArch64ISelLowering change is requirted to maintain existing code quality. Differential Revision: https://reviews.llvm.org/D120735	2022-03-04 11:54:03 +00:00
David Green	e348b09bb5	[AArch64] Turn UZP1 with undef operand into truncate This turns upz1(x, undef) to concat(truncate(x), undef), as the truncate is simpler and can often be optimized away, and it helps some of the insert-subvector tests optimize more cleanly. Differential Revision: https://reviews.llvm.org/D120879	2022-03-04 11:12:26 +00:00
Cullen Rhodes	616586794b	[AArch64] Add legal types for Streaming SVE The compiler currently crashes for scalable types when compiling with +sme, e.g. define <vscale x 4 x i32> @foo(<vscale x 4 x i32> %a) { ret <vscale x 4 x i32> %a } since it doesn't know how to legalize the types. SME implies a subset of SVE (+streaming-sve), the hasSVE predication in the backend needs extending to consider types/operations that are legal in Streaming SVE. This is the first patch adding legal types <-> register classes. Before making the change +sve(2) was temporarily replaced with +sme in all the intrinsics tests to see what failed, and again after making the change. For all the tests that passed after adding the legal types another RUN line has been added for +streaming-sve. More patches to follow. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118561	2022-03-03 09:51:14 +00:00
Sander de Smalen	eac2638ec1	[AArch64][SVE] Fold away SETCC if original input was predicate vector. This adds the following two folds: Fold 1: setcc_merge_zero( all_active, extend(nxvNi1 ...), != splat(0)) -> nxvNi1 ... Fold 2: setcc_merge_zero( pred, extend(nxvNi1 ...), != splat(0)) -> nxvNi1 and(pred, ...) Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D119334	2022-02-28 14:12:43 +00:00
Sander de Smalen	201e3686ab	[AArch64][SVE] Handle more cases in findMoreOptimalIndexType. This patch addresses @paulwalker-arm's comment on D117900 to only update/write the by-ref operands iff the function returns true. It also handles a few more cases where a series of added offsets can be folded into the base pointer, rather than just looking at a single offset. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D119728	2022-02-28 12:13:52 +00:00
Paul Walker	16ee102964	[SVE] Add missing splat patterns for bfloat vectors. Differential Revision: https://reviews.llvm.org/D120496	2022-02-25 16:53:39 +00:00
Paul Walker	8ca5be93cc	[SVE] Don't custom lower constant predicate ISD:SPLAT_VECTOR operations. Differential Revision: https://reviews.llvm.org/D120340	2022-02-25 11:32:37 +00:00
Arthur Eubanks	6aa285eb85	[OpaquePtr][AArch64] Use load/store value type instead of pointer type for ldnt1/stnt1 alignment	2022-02-24 16:56:13 -08:00
Craig Topper	c7d6448d03	[DAGCombiner][TargetLowering] Pass SDValue by value to isMulAddWithConstProfitable. Internally to DAGCombiner the SDValues were passed by non-const reference despite not being modified. They were then passed by const reference to TLI. This patch passes them by value which is consistent with the vast majority of code. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D120420	2022-02-23 12:40:45 -08:00
David Green	774b571546	[AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors We have a combine for converting mul(dup(ext(..)), ...) into mul(ext(dup(..)), ..), for allowing more uses of smull and umull instructions. Currently it looks for vector insert and shuffle vectors to detect the element that we can convert to a vector extend. Not all cases will have a shufflevector/insert element though. This started by extending the recognition to buildvectors (with elements that may be individually extended). The new method seems to cover all the cases that the old method captured though, as the shuffle will eventually be lowered to buildvectors, so the old method has been removed to keep the code a little simpler. The new code detects legal build_vector(ext(a), ext(b), ..), converting them to ext(build_vector(a, b, ..)) providing all the extends/types match up. Differential Revision: https://reviews.llvm.org/D120018	2022-02-22 23:37:22 +00:00
Reid Kleckner	ecb27004ec	Revert "[AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors" This reverts commit `9fc1a0dcb7`. We have bisected a compiler crash to this revision and will provide a test case soon.	2022-02-22 10:31:09 -08:00
Sunho Kim	d6a9eec238	[AARCH64][DAGCombine] Add combine for negation of CSEL absolute value pattern. This folds a negation through a csel, which can come up during the lowering of negative abs. Fixes https://github.com/llvm/llvm-project/issues/51558. Differential Revision: https://reviews.llvm.org/D112204	2022-02-22 09:59:36 +00:00
David Green	9fc1a0dcb7	[AArch64] Alter mull shuffle(ext(..)) combine to work on buildvectors We have a combine for converting mul(dup(ext(..)), ...) into mul(ext(dup(..)), ..), for allowing more uses of smull and umull instructions. Currently it looks for vector insert and shuffle vectors to detect the element that we can convert to a vector extend. Not all cases will have a shufflevector/insert element though. This started by extending the recognition to buildvectors (with elements that may be individually extended). The new method seems to cover all the cases that the old method captured though, as the shuffle will eventually be lowered to buildvectors, so the old method has been removed to keep the code a little simpler. The new code detects legal build_vector(ext(a), ext(b), ..), converting them to ext(build_vector(a, b, ..)) providing all the extends/types match up. Differential Revision: https://reviews.llvm.org/D120018	2022-02-21 15:44:30 +00:00
Serguei Katkov	7f2293ba25	[STATEPOINT] Mark LR is early-clobber implicit def. LR is modified at the moment of the call and before any use is read. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D120114	2022-02-21 10:37:43 +07:00
David Green	b8801ba050	[AArch64] Common patterns between UMULL and int_aarch64_neon_umull We have some duplicate patterns between the AArch64ISD::UMULL (/SMULL) and the int_aarch64_neon_umull (/smull) intrinsics. They did not replicate all the patterns though, leaving some gaps on instructions like umlal2 from codegen. This commons all the patterns by converting all int_aarch64_neon_umull intrinsics to UMULL nodes and removing the duplicate for umull/smull intrinsics, so that all instructions go through the same tablegen pattern. This improves some of the longer-than-legal mla patterns, helping them replace ext with umlal2. Differential Revision: https://reviews.llvm.org/D119887	2022-02-19 14:38:57 +00:00
Simon Pilgrim	f60d101b00	Fix Wdocumentation unknown parameter warning	2022-02-19 13:00:59 +00:00
John Brawn	bbd7eac800	[AArch64] Remove an unused variable in my previous patch	2022-02-17 16:39:44 +00:00
John Brawn	8e17c9613f	[AArch64] Add some missing strict FP vector lowering Also add a test for the codegen of strict FP vector operations so these changes get tested. Differential Revision: https://reviews.llvm.org/D117795	2022-02-17 16:10:31 +00:00
Matt Devereau	2f2dcb4fb1	[AArch64][SVE] Invert VSelect operand order and condition for predicated arithmetic operations (vselect (setcc ( condcode) (_) (_)) (a) (op (a) (b))) => (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a)) As a follow up to D117689, invert the operand order and condition in order to fold vselects into predicated instructions. Differential Revision: https://reviews.llvm.org/D119424	2022-02-17 16:01:17 +00:00
John Brawn	d916856bee	[AArch64] Allow strict opcodes in faddp patterns This also requires adjustment to code in AArch64ISelLowering so that vector_extract is distributed over strict_fadd. Differential Revision: https://reviews.llvm.org/D118489	2022-02-17 13:11:55 +00:00
John Brawn	d4342efb69	[AArch64] Add instruction selection for strict FP This consists of marking the various strict opcodes as legal, and adjusting instruction selection patterns so that 'op' is 'any_op'. FP16 and vector instructions additionally require some extra work in lowering and legalization, so we can't set IsStrictFPEnabled just yet. Also more work needs to be done for full strict fp support (marking instructions that can raise exceptions as such, and modelling FPCR use for controlling rounding). Differential Revision: https://reviews.llvm.org/D114946	2022-02-17 13:11:54 +00:00
David Green	f3bc7fd546	[AArch64] Cleanup for performCommonVectorExtendCombine. NFC This is some NFC (hopefully!) cleanup for performCommonVectorExtendCombine and related methods, removing conditions that cannot occur and otherwise cleaning up the code a little.	2022-02-17 10:03:28 +00:00
zhongyunde	064b2a6dc6	[DAGCombiner][AArch64] Enhance to fold CSNEG into CSINC instruction Perform the scalar expression combine in the form of: CSNEG(1, c, cc) + b => cc ? b+1 : b-c => CSINC(b-c, b, !cc) CSNEG(c, -1, cc) + b => cc ? b+c : b+1 => CSINC(b+c, b, cc) Fix https://github.com/llvm/llvm-project/issues/53071 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119105	2022-02-16 09:39:38 +08:00
David Green	4072e362c0	[ISel] Port AArch64 HADD and RHADD to ISel This ports the aarch64 combines for HADD and RHADD over to DAG combine, so that they can be used in more architectures (notably MVE in a followup patch). They are renamed to AVGFLOOR and AVGCEIL in the process, to avoid confusion with instructions such as X86 hadd. The code was also rewritten slightly to remove the AArch64 idiosyncrasies. The general pattern for a AVGFLOORS is %xe = sext i8 %x to i32 %ye = sext i8 %y to i32 %a = add i32 %xe, %ye %r = lshr i32 %a, 1 %t = trunc i32 %r to i8 An AVGFLOORU is equivalent with zext. Because of the truncate lshr==ashr, as the top bits are not demanded. An AVGCEIL also includes an extra rounding, so includes an extra add of 1. Differential Revision: https://reviews.llvm.org/D106237	2022-02-11 18:28:56 +00:00
Bradley Smith	c53ad72aa9	[AArch64][SVE] Fix selection failure caused by fp/int convert using non-Neon types Fixes: #53679 Differential Revision: https://reviews.llvm.org/D119428	2022-02-11 11:46:59 +00:00
Paul Walker	02e0d50eb1	[SVE] Remove AArch64ISD::ADD_PRED and AArch64ISD::SUB_PRED. These nodes provide an indirection that is not necessary because SVE has unpredicated add/sub instructions and there's no downside to using them for partial register operations. In fact, the test changes show that unifying how fixed-length and scalable vector add/sub are lowered enables better use of existing isel patterns. Differential Revision: https://reviews.llvm.org/D119355	2022-02-10 17:22:01 +00:00
David Sherwood	8b58494cea	[AArch64] Improve codegen for get.active.lane.mask when SVE is available When lowering the get.active.lane.mask intrinsic with a fixed-width predicate vector result, we can actually make use of the SVE whilelo instruction when SVE is enabled. We do this by carefully choosing a sensible VT for the whilelo instruction, then promoting it to an integer vector, i.e. nxv16i1 -> nx16i8. We can then extract a v16i8 subvector and truncate back to the original return type, i.e. v16i1. This leads to a significant improvement in code quality. Differential Revision: https://reviews.llvm.org/D116664	2022-02-10 16:02:44 +00:00
Paul Walker	9cc7eb0ec9	[SVE] Remove redundant hasBF16 calls from lowering code. The are several places where hasBF16 is used to protect code that has no requirement for the +bf16 feature. The lowering code uses stock SVE instructions for things like loads and stores and so is safe even when +bf16 is not available. NOTE: Currently the nxvbf16 type is not legal unless the +bf16 feature is available, but that isn't an issue because the affected code is post type legalisation. NOTE: This patch mirrors previous work that removed the same redundant protection from isel patterns where the resulting selection emitted stock SVE instructions. Differential Revision: https://reviews.llvm.org/D119328	2022-02-10 14:47:23 +00:00
Paul Walker	c58be85720	[SVE] Prefer zero-extending loads when lowering ISD::EXTLOAD. The decision is perhaps arbitrary but I figure zeroing has no dependency on the value being loaded. Differential Revision: https://reviews.llvm.org/D119327	2022-02-10 14:30:28 +00:00
David Sherwood	a57a7f3de5	[SVE][CodeGen] Bail out for scalable vectors in AArch64TargetLowering::ReconstructShuffle Previously the code in AArch64TargetLowering::ReconstructShuffle assumed the input vectors were always fixed-width, however this is not always the case since you can extract elements from scalable vectors and insert into fixed-width ones. We were hitting crashes here for two different cases: 1. When lowering a fixed-length vector extract from a scalable vector with i1 element types. This happens due to the fact the i1 elements get promoted to larger integer types for fixed-width vectors and leads to sequences of INSERT_VECTOR_ELT and EXTRACT_VECTOR_ELT nodes. In this case AArch64TargetLowering::ReconstructShuffle will still fail to make a transformation, but at least it no longer crashes. 2. When lowering a sequence of extractelement/insertelement operations on mixed fixed-width/scalable vectors. For now, I've just changed AArch64TargetLowering::ReconstructShuffle to bail out if it finds a scalable vector. Tests for both instances described above have been added here: (1) CodeGen/AArch64/sve-extract-fixed-vector.ll (2) CodeGen/AArch64/sve-fixed-length-reshuffle.ll Differential Revision: https://reviews.llvm.org/D116602	2022-02-10 14:18:49 +00:00
Bradley Smith	98936aee7d	[AArch64][SVE] Fix selection failure during lowering of shuffle_vector The lowering code for shuffle_vector has a code path that looks through extract_subvector, this code path did not properly account for the potential presense of larger than Neon vector types and could produce unselectable DAG nodes. Differential Revision: https://reviews.llvm.org/D119252	2022-02-10 12:07:51 +00:00
David Sherwood	eabae1b017	[AArch64][CodeGen] Always use SVE (when enabled) to lower 64-bit vector multiplies This patch adds custom lowering support for ISD::MUL with v1i64 and v2i64 types when SVE is enabled, regardless of the minimum SVE vector length. We do this because NEON simply does not have 64-bit vector multiplies, so we want to take advantage of these instructions in SVE. I've updated the 128-bit min SVE vector bits tests here: CodeGen/AArch64/sve-fixed-length-int-arith.ll CodeGen/AArch64/sve-fixed-length-int-mulh.ll CodeGen/AArch64/sve-fixed-length-int-rem.ll Differential Revision: https://reviews.llvm.org/D118802	2022-02-08 15:37:52 +00:00
David Green	fdce239ae9	[AArch64] Attempt to emitConjunction from brcond We currently use emitConjunction to create CCMP conjunctions from the conditions of selects, helping turning and/ors into more optimal ccmp sequences that don't need to go through csels. This extends that to also be used whilst lowering brcond, giving more opportunity for better condition generation. Differential Revision: https://reviews.llvm.org/D118650	2022-02-08 11:27:10 +00:00
David Truby	be826cf4f7	[AArch64][NEON][SVE] Lower FCOPYSIGN using AArch64ISD::BSP This patch modifies the FCOPYSIGN lowering to go through the BSP pseudo-instruction. This allows the same lowering code for NEON, SVE and SVE2. As part of this, lowering for BSP for SVE and SVE2 is also added. For SVE and NEON this patch is NFC. Differential Revision: https://reviews.llvm.org/D118394	2022-02-07 14:35:26 +00:00
zhongyunde 00443407	b3b129f11f	[DAGCombiner][AArch64] Enhance to support for scalar CSINC Enhance to fold csel into csinc instruction. Fix https://github.com/llvm/llvm-project/issues/53071 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116915	2022-02-07 10:27:48 +08:00
Simon Pilgrim	d3f966c6f0	[AArch64] LowerVectorSRA_SRL_SHL - silence dead code warning Remove default case from switch and move llvm_unreachable to after the switch()	2022-02-06 16:29:38 +00:00
Paul Walker	20085df22a	[NFC][SVE] Change useSVEForFixedLengthVectorVT to allow unconditional SVE usage for NEON sized vectors. Previously useSVEForFixedLengthVectorVT only allowed SVE usage when the target SVE register length was known to be at least 256bit. This was true even for NEON sized vectors, which was an artificial restriction imposed during early SVE bring up. This now changes so that callers can opt to use SVE for NEON sized vectors regardless of the SVE register length. The patch is NFC because for all places where OverrideNEON is used we now explicitly also check that SVE code generation for larger than NEON vectors is enabled. The intent is that over time these extra checks will either be removed or the lowering disabled if the SVE usage proves not beneficial. Differential Revision: https://reviews.llvm.org/D118957	2022-02-04 14:34:35 +00:00
Caroline Concatto	961e954af5	[AArch64][SVE] Add more folds to make use of gather/scatter with 32-bit indices In AArch64ISelLowering.cpp this patch implements this fold: 1) GEP (%ptr, SHL ((stepvector(A) + splat(%offset))) << splat(B))) into GEP (%ptr + (%offset << B), step_vector (A << B)) The above transform simplifies the index operand so that it can be expressed as i32 elements. This allows using only one gather/scatter assembly instruction instead of two. Patch by Paul Walker (@paulwalker-arm). Depends on D117900 Differential Revision: https://reviews.llvm.org/D118345	2022-02-03 19:18:30 +00:00
Caroline Concatto	019f0221d5	[AArch64][SVE] Fold gather/scatter with 32bits when possible In AArch64ISelLowering.cpp this patch implements this fold: GEP (%ptr, (splat(%offset) + stepvector(A))) into GEP ((%ptr + %offset), stepvector(A)) The above transform simplifies the index operand so that it can be expressed as i32 elements. This allows using only one gather/scatter assembly instruction instead of two. Patch by Paul Walker (@paulwalker-arm). Depends on D118459 Differential Revision: https://reviews.llvm.org/D117900	2022-02-03 18:58:37 +00:00
Sunho Kim	44601f4956	[AARCH64][NEON] Allow to sink operands for aarch64_neon_pmull This teaches AArch64TargetLowering::shouldSinkOperands to sink the operands of aarch64_neon_pmull intrinsic. Differential Revision: https://reviews.llvm.org/D117944	2022-02-03 16:46:49 +00:00
David Green	31373fb88a	[AArch64] Reassociate integer extending reductions to pairwise addition. Given an (integer) vecreduce, we know the order of the inputs does not matter. We can convert UADDV(add(zext(extract_lo(x)), zext(extract_hi(x)))) into UADDV(UADDLP(x)). This can also happen through an extra add, where we transform UADDV(add(y, add(zext(extract_lo(x)), zext(extract_hi(x))))). This makes sure the same thing happens signed cases too, which requires adding a new SADDLP node. Differential Revision: https://reviews.llvm.org/D118107	2022-02-03 11:05:48 +00:00
David Green	0cd8063960	[AArch64] Genereate CCMP from And CSel LLVM has a couple of ways of producing ccmp - either from chains in isel or from a later ifcvt style pass. This adds a simple DAG combine to capture more cases, converting and(csel(0, 1, cc0), csel(0, 1, cc1)) into a csel(ccmp(.., cc0)), depending on cc1 (a SUBS in this case). Differential Revision: https://reviews.llvm.org/D118327	2022-02-02 13:48:16 +00:00
David Sherwood	11cf807796	[AArch64][CodeGen] Always use SVE (when enabled) to lower integer divides This patch adds custom lowering support for ISD::SDIV and ISD::UDIV when SVE is enabled, regardless of the minimum SVE vector length. We do this because NEON simply does not have vector integer divide support, so we want to take advantage of these instructions in SVE. As part of this patch I've also simplified LowerToPredicatedOp to avoid re-asking the same question about whether we should be using SVE for fixed length vectors. Once we've made the decision to call LowerToPredicatedOp, then we should simply assert we should be using SVE. I've updated the 128-bit min SVE vector bits tests here: CodeGen/AArch64/sve-fixed-length-int-div.ll CodeGen/AArch64/sve-fixed-length-int-rem.ll Differential Revision: https://reviews.llvm.org/D117871	2022-02-02 09:46:02 +00:00
Cullen Rhodes	16d464a291	[AArch64][SVE] NFC: tidy up isel lowering Whilst adding legal types <-> register classes for Streaming SVE in D118561 I noticed the hasSVE predication block set operation actions for opcodes that may not be legal in Streaming SVE. Move these operations to the later hasSVE block which has loops over the same types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D118560	2022-02-02 09:02:20 +00:00
Nikita Popov	a1dc6d4b83	[AArch64] Do not use ABI alignment for mops.memset.tag Pointer element types do not imply that the pointer is ABI aligned. We should be using either an explicit align attribute here, or fall back to an alignment of 1. This fixes a new element type access introduced in D117764. I don't think this makes any practical difference though, as the lowering does not depend on alignment. Differential Revision: https://reviews.llvm.org/D118681	2022-02-01 14:37:53 +01:00
Fangrui Song	51ed14d224	[AArch64] Temporarily use getPointerElementType to fix -Wdeprecated-declarations. NFC	2022-01-31 19:16:11 -08:00
tyb0807	5aa08bf708	[AArch64][SelectionDAG] CodeGen for Armv8.8/9.3 MOPS New target SDNodes are added: AArch64ISD::MOPS_MEMSET, etc. Each intrinsic is translated to one of these in SelectionDAGBuilder via EmitTargetCodeForMOPS. A custom lowering routine for INTRINSIC_W_CHAIN is added to handle llvm.aarch64.mops.memset.tag. This takes a separate path from the common intrinsics but ultimately ends up in the same EmitMOPS(). This is part 4/4 of a series of patches split from https://reviews.llvm.org/D117405 to facilitate reviewing. Patch by Tomas Matheson, Lucas Prates and Son Tuan Vu. Differential Revision: https://reviews.llvm.org/D117764	2022-01-31 20:56:27 +00:00
Florian Hahn	23091f7d50	[AArch64] Bail out for float operands in SetCC optimization. The optimization added in D118139 causes a crash on the added test case while trying to zero extend an vector of floats. Fix the crash by bailing out for floating point operands. Reviewed By: DavidTruby Differential Revision: https://reviews.llvm.org/D118615	2022-01-31 18:20:47 +00:00
Paul Walker	30efee764d	[SVE] Remove AArch64ISD::PFALSE. AArch64ISD::PFALSE does not provide any value, in fact it can prevent common combines from firing. We only needed to lower to PFALSE until ISD::SPLAT_VECTOR became generally available. Differential Revision: https://reviews.llvm.org/D118469	2022-01-29 11:31:00 +00:00
Paul Walker	3bc876d0a3	[AArch64] Add isel for bitcasting between bfloat and half types. Differential Revision: https://reviews.llvm.org/D118420	2022-01-29 11:26:13 +00:00
Ahmed Bougacha	634ca7349d	[ObjCARC] Require the function argument in the clang.arc.attachedcall bundle. Currently, the clang.arc.attachedcall bundle takes an optional function argument. Depending on whether the argument is present, calls with this bundle have the following semantics: - on x86, with the argument present, the call is lowered to: call _target mov rax, rdi call _objc_retainAutoreleasedReturnValue - on AArch64, without the argument, the call is lowered to: bl _target mov x29, x29 and the objc runtime call is expected to be emitted separately. That's because, on x86, the objc runtime checks for both the mov and the call on x86, and treats the combination as the ARC autorelease elision marker. But on AArch64, it only checks for the dedicated NOP marker, as that's historically been sufficiently unique. Thanks to that, the runtime call wasn't required to be adjacent to the NOP marker, so it wasn't emitted as part of the bundle sequence. This patch unifies both architectures: on AArch64, we now emit all 3 instructions for the bundle. This guarantees that the runtime call is adjacent to the marker in the sequence, and that's information the runtime can use to further optimize this. This helps simplify some of the handling, in particular BundledRetainClaimRVs, which no longer needs to know whether the bundle is sufficient or not: it now always should be. Note that this does not include an AutoUpgrade for the nullary bundles, as they are only produced in ObjCContract as part of the obj/asm emission pipeline, and are not expected to be in bitcode. Differential Revision: https://reviews.llvm.org/D118214	2022-01-28 12:41:45 -08:00
David Truby	81bd67e18a	[AArch64][SVE][VLS] Move extends into arguments of comparisons When a comparison is extended and it would be free to extend the arguments to that comparison, we can propagate the extend into those arguments. This prevents extra instructions being generated to extend the result of the comparison, which is not free to extend. This is a resubmission of D116812 with fixes that need another review. Differential Revision: https://reviews.llvm.org/D118139	2022-01-28 14:16:08 +00:00
Sander de Smalen	af1c8f0d14	[AArch64][SVE] Folds VSELECT if the predicate is all active. This adds the following changes: * Fold: vselect(<all active predicate>, x, y) => x * Extend isAllActivePredicate to take vscale_range into account, e.g. isAllActivePredicate(vl16) for nxv16i1 and vscale == 1 => true. isAllActivePredicate(vl32) for nxv16i1 and vscale == 2 => true. Differential Revision: https://reviews.llvm.org/D118147	2022-01-27 15:58:56 +00:00
Sander de Smalen	417a75c6d0	[AArch64][SVE] Avoid using ptrue for ptest in VECREDUCE_OR. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118145	2022-01-27 11:44:49 +00:00
Sander de Smalen	c9da81d997	[AArch64][SVE] Implement missing lowering for extract_subvector for predicates. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D118057	2022-01-27 11:01:11 +00:00
Sander de Smalen	d58757e522	[AArch64][SVE] Implement PFALSE with explicit AArch64ISD node. The ISel patterns for PFALSE helps recognise the instructions as being free of side-effects, which helps MachineCSE remove redundant PFALSE instructions. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D118054	2022-01-27 10:30:13 +00:00
Benjamin Kramer	f15014ff54	Revert "Rename llvm::array_lengthof into llvm::size to match std::size from C++17" This reverts commit `ef82063207`. - It conflicts with the existing llvm::size in STLExtras, which will now never be called. - Calling it without llvm:: breaks C++17 compat	2022-01-26 16:55:53 +01:00
serge-sans-paille	ef82063207	Rename llvm::array_lengthof into llvm::size to match std::size from C++17 As a conquence move llvm::array_lengthof from STLExtras.h to STLForwardCompat.h (which is included by STLExtras.h so no build breakage expected).	2022-01-26 16:17:45 +01:00
Maciej Gabka	c5263cd518	Restrict performPostLD1Combine to 64 and 128 bit vectors When wider vectors are used, for example fixed width SVE, there is no patterns to select AArch64ISD::LD1LANEpost nodes, so we should do an early exit. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D117674	2022-01-26 09:57:44 +00:00
Paul Walker	d95cf1f6cf	[SVE] Enable ISD::ABDS/U ISel for scalable vectors. NOTE: This patch also includes tests that highlight those cases where the existing DAG combine doesn't yet work well for SVE. Differential Revision: https://reviews.llvm.org/D117873	2022-01-25 12:14:53 +00:00
Nikita Popov	aa97bc116d	[NFC] Remove uses of PointerType::getElementType() Instead use either Type::getPointerElementType() or Type::getNonOpaquePointerElementType(). This is part of D117885, in preparation for deprecating the API.	2022-01-25 09:44:52 +01:00
Peter Waller	d4a6bf4d1a	Revert "[AArch64][SVE][VLS] Move extends into arguments of comparisons" This reverts commit `db04d3e30b`, which causes a buildbot failure.	2022-01-20 12:01:23 +00:00
Adrian Tong	b6a7ae2c5d	Optimize shift and accumulate pattern in AArch64. AArch64 supports unsigned shift right and accumulate. In case we see a unsigned shift right followed by an OR. We could turn them into a USRA instruction, given the operands of the OR has no common bits. Differential Revision: https://reviews.llvm.org/D114405	2022-01-20 01:57:40 +00:00
David Truby	db04d3e30b	[AArch64][SVE][VLS] Move extends into arguments of comparisons When a comparison is extended and it would be free to extend the arguments to that comparison, we can propagate the extend into those arguments. This prevents extra instructions being generated to extend the result of the comparison, which is not free to extend. Differential Revision: https://reviews.llvm.org/D116812	2022-01-19 14:11:45 +00:00
Jim Lin	d6b0734837	[NFC] Use Register instead of unsigned	2022-01-19 20:17:04 +08:00
Akshay Kumar	6f61fe7de9	[Aarch64] Customer lowering of COPYSIGN to SIMD should check for NEON availability For the following test case, clang is crashing for ARM64 architecture $ cat crash.c double crash(double a, double b) { return __builtin_copysign(a, b); } $ clang -O2 -march=armv8-a+nosimd --target=arm64 -S crash.c -o /dev/null fatal error: error in backend: Cannot select: 0x7fae361bb4e8: v2i64 = AArch64ISD::BIT 0x7fae361bb210, 0x7fae361bb278, 0x7fae361bb480 Fix: PR51806 Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D116581	2022-01-18 00:25:15 +05:30
David Sherwood	3a272d1eaf	[SVE][CodeGen] Use splice instruction when lowering VECTOR_SPLICE For certain negative indices passed to the VECTOR_SPLICE operation we can actually directly use the SVE splice instruction by creating the appropriate predicate. The predicate needs to be constructed in such a way that all but the last -idx elements are false. We can do this efficiently using a combination of 'ptrue' (with the appropriate fixed pattern, e.g. vl1, vl2, etc.) and 'rev'. The advantage of using these instructions to generate the predicate is they do not set any flags, unlike the whilelo instruction. This is critical when the splice operation is in a loop, since we want MachineLICM to hoist the predicate generation out of the loop. Differential Revision: https://reviews.llvm.org/D115863	2022-01-11 11:58:17 +00:00
Jay Foad	3f3fe4a5cf	[GlobalISel] Fix typo Extact to Extract in function name. NFC.	2022-01-07 11:13:35 +00:00
Nicholas Guy	13992498cd	[AArch64][CodeGen] Emit alignment "Max Skip" operand for AArch64 loops Differential Revision: https://reviews.llvm.org/D114879	2022-01-05 12:54:31 +00:00
Paul Walker	4325fd7402	[AArch64ISelLowering] Don't look through scalable extract_subvector when optimising DUPLANE. When constructDup is passed an extract_subvector it tries to use extract_subvector's operand directly when creating the DUPLANE. This is invalid when extracting from a scalable vector because the necessary DUPLANE ISel patterns do not exist. NOTE: This patch is an update to https://reviews.llvm.org/D110524 that originally fixed this but introduced a bug when the result VT is 64bits. I've restructured the code so the critial final else block is entered when necessary. Differential Revision: https://reviews.llvm.org/D116442	2022-01-05 11:56:59 +00:00
Andrew Wei	03dc2975d0	[AArch64][SVE] Lower shuffles to permute instructions: zip1/2, uzp1/2, trn1/2 Attempt to lower a shuffle as a permute instruction(zip/uzp/trn) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D113376	2021-12-21 18:39:09 +08:00
David Truby	7e44eb079d	[AArch64][SVE] Improve code generation for VLS i1 masks This patch partially resolves an issue for VLS code generation where a mask is generated from a smaller width integer comparison than the instruction using the mask requires. Instead of sign extending a p register by converting it to a z register, extending that, and converting back, we instead just do an unpack of the p register. A separate issue causes the code generation to still be poor when the mask generation would fit in a neon register, as we then use a neon comparison operation and have to convert that to a p register. This will be resolved in a separate patch. Reviewed By: peterwaller-arm Differential Revision: https://reviews.llvm.org/D111221	2021-12-17 16:26:49 +00:00
David Truby	5c9684704d	[DAG][sve] Lowering for VLS masked truncating stores This extends the custom lowering for truncating stores on fixed length vectors in SVE to support masked truncating stores. It also adds a DAG combine for truncates followed by masked stores. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D108115	2021-12-17 15:04:45 +00:00
Andrew Wei	dc7b672f96	[AArch64][SVE] Lower shuffles to permute instructions: rev/revb/revh/revw Attempt to lower a shuffle as a permute instruction(rev/revb/revh/revw) for fixed length SVE. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D114960	2021-12-15 21:53:00 +08:00
John Brawn	dc9f65be45	[AArch64][SVE] Fix handling of stack protection with SVE Fix a couple of things that were causing stack protection to not work correctly in functions that have scalable vectors on the stack: * Use TypeSize when determining if accesses to a variable are considered out-of-bounds so that the behaviour is correct for scalable vectors. * When stack protection is enabled move the stack protector location to the top of the SVE locals, so that any overflow in them (or the other locals which are below that) will be detected. Fixes: https://github.com/llvm/llvm-project/issues/51137 Differential Revision: https://reviews.llvm.org/D111631	2021-12-14 11:30:48 +00:00
Peter Waller	921e89c59a	[SVE] Only combine (fneg (fma)) => FNMLA with nsz -(Za + Zm * Zn) != (-Za + Zm * (-Zn)) when the FMA produces a zero output (e.g. all zero inputs can produce -0 output) Add a PatFrag to check presence of nsz on the fneg, add tests which ensure the combine does not fire in the absense of nsz. See https://reviews.llvm.org/D90901 for a similar discussion on X86. Differential Revision: https://reviews.llvm.org/D109525	2021-12-13 11:33:07 +00:00
Matt Devereau	2e585dd91a	[AArch64][SVE] Lower vector.insert to predicated merged MOV Use predicated SEL for vector.insert instead of going through memory Differential Revision: https://reviews.llvm.org/D115259	2021-12-13 11:17:55 +00:00
Peter Waller	ed43aab98d	[AArch64][SVE] Fix fptrunc store for fixed len vector Restrict duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE DAG combines to only larger than NEON types, as these are the ones for which there is custom lowering. Update tests so that they go through memory to improve validation. Differential Revision: https://reviews.llvm.org/D115166	2021-12-07 12:22:07 +00:00
Peter Waller	a6f751c34e	[AArch64][SVE] Fix ICE extracting fixedvec from scalable load `f526c600c0` had a concern raised because of an invalid typesize request on a scalable vector, which this patch addresses. Prevent shouldReduceLoadWidth from attempting to query the bit size, and add a regression test in sve-extract-fixed-vector.ll. Differential Revision: https://reviews.llvm.org/D115156	2021-12-06 16:49:43 +00:00
Matt Devereau	4244f95cc6	[AArch64][SVE] Enable bf16 vector.insert Allow passthrough bf16 registers for vector.insert Differential revision: https://reviews.llvm.org/D114858	2021-12-02 12:59:19 +00:00
Bradley Smith	fd9069ffce	[AArch64][SVE] Duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE dag combines By duplicating these dag combines we can bypass the legality checks that they do, this allows us to perform these combines on larger than legal fixed types, which in turn allows us to bring the same benefits D114580 brought but to larger than legal fixed types. Depends on D114580 Differential Revision: https://reviews.llvm.org/D114628	2021-12-01 15:33:53 +00:00
David Green	9e8a71caf0	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 15:29:14 +00:00
Hans Wennborg	a87782c34d	Revert "[DAG] Create fptosi.sat from clamped fptosi" It causes builds to fail with this assert: llvm/include/llvm/ADT/APInt.h:990: bool llvm::APInt::operator==(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed. See comment on the code review. > This adds a fold in DAGCombine to create fptosi_sat from sequences for > smin(smax(fptosi(x))) nodes, where the min/max saturate the output of > the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because > it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, > ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need > to be handled similarly. > > A shouldConvertFpToSat method was added to control when converting may > be profitable. The original fptosi will have a less strict semantics > than the fptosisat, with less values that need to produce defined > behaviour. > > This especially helps on ARM/AArch64 where the vcvt instructions > naturally saturate the result. > > Differential Revision: https://reviews.llvm.org/D111976 This reverts commit `52ff3b0093`.	2021-11-30 15:36:56 +01:00
David Green	52ff3b0093	[DAG] Create fptosi.sat from clamped fptosi This adds a fold in DAGCombine to create fptosi_sat from sequences for smin(smax(fptosi(x))) nodes, where the min/max saturate the output of the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN, ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need to be handled similarly. A shouldConvertFpToSat method was added to control when converting may be profitable. The original fptosi will have a less strict semantics than the fptosisat, with less values that need to produce defined behaviour. This especially helps on ARM/AArch64 where the vcvt instructions naturally saturate the result. Differential Revision: https://reviews.llvm.org/D111976	2021-11-30 11:05:32 +00:00
David Sherwood	84364bdaab	[CodeGen][AArch64] Bail out in performConcatVectorsCombine for scalable vectors I tried to exercise the existing combine patterns in performConcatVectorsCombine for scalable vectors and at the moment it doesn't seem possible. Parts of the code currently assume we're dealing with fixed-width vectors with calls to getVectorNumElements(), therefore I've decided to simply bail out early for scalable vectors. Added a test here to show that we don't crash when attempting to combine truncate + concat: CodeGen/AArch64/concat_vector-truncate-combine.ll Differential Revision: https://reviews.llvm.org/D114600	2021-11-29 14:26:14 +00:00
Bradley Smith	6180806632	[AArch64][SVE] Mark fixed-type FP extending/truncating loads/stores as custom This allows the generic DAG combine to fold fp_extend/fp_trunc into loads/stores which we can then lower into a integer extending load/truncating store plus an FP_EXTEND/FP_ROUND. The nuance here is that fixed-type FP_EXTEND/FP_ROUND require unpacked types hence lowering them introduces an unpack/zip. By allowing these nodes to be combined with loads/store we make it much easier to have this unpack/zip combined into the load/store by our custom lowering. Differential Revision: https://reviews.llvm.org/D114580	2021-11-29 11:56:07 +00:00
David Sherwood	a31f4bdfe8	[CodeGen][SVE] Use whilelo instruction when lowering @llvm.get.active.lane.mask In most common cases the @llvm.get.active.lane.mask intrinsic maps directly to the SVE whilelo instruction, which already takes overflow into account. However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower this immediately to a generic sequence of instructions that explicitly take overflow into account. This makes it very difficult to then later transform back into a single whilelo instruction. Therefore, this patch introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if we should lower/expand this to a sequence of generic ISD nodes, or instead just leave it as an intrinsic for the target to lower. You can see the significant improvement in code quality for some of the tests in this file: CodeGen/AArch64/active_lane_mask.ll Differential Revision: https://reviews.llvm.org/D114542	2021-11-29 08:08:17 +00:00
Bradley Smith	eafbaca977	[AArch64][SVE] Generate ASRD instructions for power of 2 signed divides Differential Revision: https://reviews.llvm.org/D113281	2021-11-26 11:08:27 +00:00
Simon Pilgrim	63b1e58f07	[DAG] SimplifyDemandedBits - simplify rotl/rotr to shl/srl (REAPPLIED) If we only demand bits from one half of a rotation pattern, see if we can simplify to a logical shift. For the ARM/AArch64 rev16/32 patterns, I had to drop a fold to prevent srl(bswap()) -> rotr(bswap) -> srl(bswap) infinite loops. I've replaced this with an isel PatFrag which should do the same task. Reapplied with fix for AArch64 rev patterns to matching the ARM fix. https://alive2.llvm.org/ce/z/iroxki (rol -> shl by amt iff demanded bits has at least as many trailing zeros as the shift amount) https://alive2.llvm.org/ce/z/4ez_U- (ror -> shl by revamt iff demanded bits has at least as many trailing zeros as the reverse shift amount) https://alive2.llvm.org/ce/z/cD7dR- (ror -> lshr by amt iff demanded bits has at least as many leading zeros as the shift amount) https://alive2.llvm.org/ce/z/_XGHtQ (rol -> lshr by revamt iff demanded bits has at least as many leading zeros as the reverse shift amount) Differential Revision: https://reviews.llvm.org/D114354	2021-11-25 11:14:15 +00:00
Bradley Smith	080ef0b6a6	[AArch64][SVE] Recognize all ones mask during fixed mask generation Differential Revision: https://reviews.llvm.org/D114431	2021-11-24 13:55:06 +00:00
David Green	760d4d03d5	[AArch64] Sink splat shuffles to lane index intrinsics This teaches AArch64TargetLowering::shouldSinkOperands to sink splat shuffles to certain neon intrinsics, so that they can make use of the lane variants of the instructions that are available. Differential Revision: https://reviews.llvm.org/D112994	2021-11-22 08:11:35 +00:00
Zarko Todorovski	5b8bbbecfa	[NFC][llvm] Inclusive language: reword and remove uses of sanity in llvm/lib/Target Reworded removed code comments that contain `sanity check` and `sanity test`.	2021-11-17 21:59:00 -05:00
Serguei Katkov	0ecb12a27f	[STATEPOINT] Force implicit-def for lr register. STATEPOINT instruction behavior is similar to call instruction. In aarch64 BL instruction implicitly define lr register, so STATEPOINT instruction should do the same. However STATEPOINT is a general pseudo instruction and I could not find a way to override list of implicit defs for specific target. So this patch post processes inserting STATEPOINT instruction by adding implisit dead def for lr. Reviewers: reames, loicottet, ostannard Reviewed By: reames Subscribers: danilaml, hiraditya, kristof.beyls, llvm-commits, yrouban Differential Revision: https://reviews.llvm.org/D111114	2021-11-16 12:52:00 +07:00
Kazu Hirata	efa896e5f7	[Target] Use SDNode::uses (NFC)	2021-11-12 21:23:04 -08:00
Florian Hahn	c2ed9fd054	[AArch64] Use custom lowering for {U,S}INT_TO_FP with i8. With fullfp16, it is cheaper to cast the {U,S}INT_TO_FP operand to i16 first, rather than promoting it to i32. The custom lowering for {U,S}INT_TO_FP already supports that, it just needs to be used. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D113601	2021-11-11 08:47:15 +00:00
David Green	703ded8dda	[AArch64] Allow FP16 vector fixed point converts This extends performFpToIntCombine to work on FP16 vectors as well as the f32 and f64 vectors it already supported. Differential Revision: https://reviews.llvm.org/D113297	2021-11-11 07:32:52 +00:00
David Green	509b397dd5	[AArch64] Combine vector fptoi.sat(fmul) to fixed point fcvtz Similar to D113199 but dealing with the vector size, this extends the fptosi+fmul to fixed point fold to handle fptosi.sat nodes that are equally viable, so long as the saturation width matches the output width. Differential Revision: https://reviews.llvm.org/D113200	2021-11-10 16:12:48 +00:00
Mindong Chen	495e258fd7	[AArch64][SVE] Add FP types to the supported SVE structure load/stores vector type list This adds FP type support to the SVE Container type list as a supplement to D112303. Reviewed By: peterwaller-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113333	2021-11-08 22:29:08 +08:00
David Sherwood	8d38c24fb6	[SVE][CodeGen] Improve codegen for some FP insert_subvector cases When inserting an unpacked FP subvector into a packed vector we can simply cast the unpacked value into a packed value, since both types are legal for SVE. We can then use this as the input for the UZP instruction. This avoids us expanding the operation by going through the stack. Differential Revision: https://reviews.llvm.org/D113270	2021-11-08 13:45:55 +00:00
Andrew Wei	bf3784b882	[AArch64] Canonicalize X(Y+1) or X(1-Y) to madd/msub Performing the rearrangement for add/sub and mul instructions to match the madd/msub pattern Reviewed By: dmgreen, sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D111862	2021-11-08 16:49:31 +08:00
David Sherwood	657a1dcd0d	[AArch64] Add target DAG combine for UUNPKHI/LO When created a UUNPKLO/HI node with an undef input then the output should also be undef. I've added a target DAG combine function to ensure we avoid creating an unnecessary uunpklo/hi instruction. Differential Revision: https://reviews.llvm.org/D113266	2021-11-05 13:50:59 +00:00
Sander de Smalen	1ea4296208	[NFC] Remove from UnivariateLinearPolyBase::getValue(). This interface should not have existed in the first place, let alone be a public member. It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous. The interfaces to be used are either getFixedValue() or getKnownMinValue().	2021-11-04 14:32:08 +00:00
Cameron McInally	702fd3d323	[SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive. For NEON, FMA matching is done in the MachineCombiner, and not the DAGCombiner. That causes problems with VLS lowering, since the vectors are fixed width at the DAGCombiner, but are scalable in the MachineCombiner. This patch corrects it by matching FMAs for VLS vectors in the DAGCombiner. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D112557	2021-11-01 10:43:52 -07:00
Bradley Smith	bf72a469ba	[AArch64][SVE] Fix build failure introduced in `13faa5f440`	2021-10-29 11:57:02 +00:00
Bradley Smith	13faa5f440	[AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types This adds support for SVE structured loads/stores to the relevant target hooks, such that we can support these instructions in the InterleavedAccess pass. Depends on D112078 Differential Revision: https://reviews.llvm.org/D112303	2021-10-29 09:35:57 +00:00
Kerry McLaughlin	1f49b71fe5	[SVE][CodeGen] Enable reciprocal estimates for scalable fdiv/fsqrt This patch enables the use of reciprocal estimates for SVE when both the -Ofast and -mrecip flags are used. Reviewed By: david-arm, paulwalker-arm Differential Revision: https://reviews.llvm.org/D111657	2021-10-25 11:30:44 +01:00
Simon Pilgrim	71e39e3f18	[ADT] Add APInt::isNegatedPowerOf2() helper Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos..... Differential Revision: https://reviews.llvm.org/D111998	2021-10-19 14:38:21 +01:00
Andrew Wei	f5056c8c16	[AArch64] Improve shuffle vector by using wider types Try to widen element type to get a new mask value for a better permutation sequence, so that we can use NEON shuffle instructions, such as zip1/2, UZP1/2, TRN1/2, REV, INS, etc. For example: shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3> is equivalent to: shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1> Finally, we can get: mov v0.d[0], v1.d[1] Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D111619	2021-10-18 21:24:45 +08:00

1 2 3 4 5 ...

1445 Commits