D113200 introduced an error when converting FP_TO_SI_SAT with a
multiply to a fixed-point floating-point convert. The saturation
bitwidth needs to be equal to the floating-point width, or else the
routine would truncate the result as opposed to saturating it.
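For illustration, a reduced sketch of the kind of input affected
(function name and values hypothetical, not taken from the patch's tests):
; saturating to i8 from f32: the saturation width is narrower than the
; floating-point width, so a fixed-point convert would truncate
define i8 @sat_mul(float %x) {
  %m = fmul float %x, 4.0
  %s = call i8 @llvm.fptosi.sat.i8.f32(float %m)
  ret i8 %s
}
declare i8 @llvm.fptosi.sat.i8.f32(float)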
Fixes #54601
According to the discussion in D120953, we should first exclude all scalable
vector extending loads and then selectively enable those which we directly
support. This patch is intended as that refactoring (truncating stores are
not touched), and for more scalable vector types it will try to reduce the
number of masked loads in favour of more unpklo/hi instructions.
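For example, an extending load of the sort being selectively enabled
(a hypothetical reduction):
define <vscale x 4 x i32> @sext_load(ptr %p) {
  ; extends from one legal scalable type to another
  %v = load <vscale x 4 x i16>, ptr %p
  %e = sext <vscale x 4 x i16> %v to <vscale x 4 x i32>
  ret <vscale x 4 x i32> %e
}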
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D122281
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.
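As a sketch of the kind of buildvector this applies to (hypothetical;
lane choices arbitrary), picking lanes from three sources cannot be
expressed as a two-input shufflevector, so it reaches codegen as a
BUILD_VECTOR of extracts:
define <4 x i8> @pick_lanes(<8 x i8> %a, <8 x i8> %b, <8 x i8> %c) {
  %a0 = extractelement <8 x i8> %a, i64 0
  %b3 = extractelement <8 x i8> %b, i64 3
  %c5 = extractelement <8 x i8> %c, i64 5
  %c7 = extractelement <8 x i8> %c, i64 7
  %v0 = insertelement <4 x i8> undef, i8 %a0, i64 0
  %v1 = insertelement <4 x i8> %v0, i8 %b3, i64 1
  %v2 = insertelement <4 x i8> %v1, i8 %c5, i64 2
  %v3 = insertelement <4 x i8> %v2, i8 %c7, i64 3
  ret <4 x i8> %v3
}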
Differential Revision: https://reviews.llvm.org/D121137
This reverts commit edb7ba714a.
This changes BLR_BTI to take variable_ops meaning that we can accept
a register or a label. The pattern still expects one argument so we'll
never get more than one. Then later we can check the type of the operand
to choose BL or BLR to emit.
(this is what BLR_RVMARKER does but I missed this detail of it first time around)
Also require NoSLSBLRMitigation which I missed in the first version.
Some implementations of setjmp will end with a br instead of a ret.
This means that the next instruction after a call to setjmp must be
a "bti j" (j for jump) to make this work when branch target identification
is enabled.
The BTI extension was added in armv8.5-a but the bti instruction is in the
hint space. This means we can emit it for any architecture version as long
as branch target enforcement flags are passed.
The starting point for the hint number is 32, then call adds 2 and jump adds 4.
Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see
at the start of functions).
The existing Arm command line option -mno-bti-at-return-twice has been
applied to AArch64 as well.
Support is added to SelectionDAG Isel and GlobalIsel. FastIsel will
defer to SelectionDAG.
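For illustration, a call of the affected shape (reduced sketch):
declare i32 @setjmp(ptr) returns_twice
define i32 @f(ptr %env) {
  ; with branch target enforcement enabled, codegen now places
  ; "bti j" (hint #36) after this call, since some setjmp
  ; implementations return with br rather than ret
  %r = call i32 @setjmp(ptr %env) returns_twice
  ret i32 %r
}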
Based on the change done for M profile Arm in https://reviews.llvm.org/D112427
Fixes #48888
Reviewed By: danielkiss
Differential Revision: https://reviews.llvm.org/D121707
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for
extensions from legal types.
Test cases for both normal and masked loads have been added to guard
against compile crashes.
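For example, an extending masked load of the kind affected (reduced
sketch; whether it becomes one load plus unpklo/hi depends on the types
involved):
define <vscale x 16 x i16> @masked_zext(ptr %p, <vscale x 16 x i1> %mask) {
  %v = call <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr %p, i32 1, <vscale x 16 x i1> %mask, <vscale x 16 x i8> zeroinitializer)
  %e = zext <vscale x 16 x i8> %v to <vscale x 16 x i16>
  ret <vscale x 16 x i16> %e
}
declare <vscale x 16 x i8> @llvm.masked.load.nxv16i8.p0(ptr, i32, <vscale x 16 x i1>, <vscale x 16 x i8>)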
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D120953
When N > 12, (2^N - 1) is not a legal add immediate (isLegalAddImmediate
returns false), and if the SetCC input uses such a number, the DAG combiner
will generate one more SRL instruction. So combine [setcc (srl x, imm), 0, ne]
into [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in
emitComparison.
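Concretely, with imm = 16 (a hypothetical reduction):
define i1 @cmp_srl(i64 %x) {
  ; (x >> 16) != 0 is the same as (x & 0xffffffffffff0000) != 0, and the
  ; mask is a legal logical immediate, so emitComparison can use a tst
  %s = lshr i64 %x, 16
  %c = icmp ne i64 %s, 0
  ret i1 %c
}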
Fixes https://github.com/llvm/llvm-project/issues/54283
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D121449
When inverting the compare predicate trySwapVSelectOperands is
incorrectly using the type of the select's cond operand rather
than the type of cond's operands. This means we're treating all
inversions as if they're integer.
Differential Revision: https://reviews.llvm.org/D121968
We already have custom lowering for a v4i8 load, which loads it as an f32,
converts that to a vector, and bitcasts and extends the result to a v4i16.
This adds some custom lowering of concat(v4i8 load, ...) to keep the
result as an f32 and create a buildvector of the resulting f32 loads.
This helps not create all the extends and bitcasts, which are often
difficult to fully clean up.
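For example (a hypothetical reduction):
define <8 x i8> @concat_v4i8(ptr %p, ptr %q) {
  ; each v4i8 load stays an f32 load, and the two f32 results feed a
  ; buildvector, instead of going through extends and bitcasts
  %a = load <4 x i8>, ptr %p, align 4
  %b = load <4 x i8>, ptr %q, align 4
  %c = shufflevector <4 x i8> %a, <4 x i8> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
  ret <8 x i8> %c
}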
Differential Revision: https://reviews.llvm.org/D121400
If we already have an AArch64ISD::ANDS node with identical operands, we
can merge any ISD::AND into it, reducing the instruction count by
calculating the value and the flags in a single operation. This code is
taken from the X86 backend, and could also handle AArch64ISD::ADDS and
AArch64ISD::SUBS, but I couldn't find any test cases where it came up.
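A sketch of the kind of input where this can fire (hypothetical):
define i32 @ands_reuse(i32 %x, i32 %y) {
  ; the icmp wants the flags of (x & y) and the select wants its value;
  ; a single ANDS can now provide both
  %a = and i32 %x, %y
  %c = icmp eq i32 %a, 0
  %r = select i1 %c, i32 %x, i32 %a
  ret i32 %r
}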
Differential Revision: https://reviews.llvm.org/D118584
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.
Reviewed By: #opaque-pointers, nikic
Differential Revision: https://reviews.llvm.org/D120527
When building the LLVM test suite with SVE I discovered a crash
when compiling some Halide tests, which occurs because we try to
use SVE to lower 64-bit vector multiplies and there is no
vscale_range attribute on the function. In this case the min SVE
vector bits was 0, which caused an assert in LowerToPredicatedOp
to fire. I have amended the asserts in this function to check that the
fixed-width type is legal. If the fixed-width type is larger than NEON
and is legal then it must be because we've set the min SVE vector
bits to something > 128. Or if the min SVE bits is 0, then the only
legal types allowed are 128 bit types - for any other types the assert
will fire.
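A reduced sketch of the kind of function that hit the assert (not the
Halide test itself):
; no vscale_range attribute, so the min SVE vector width is unknown
define <2 x i64> @mul_v2i64(<2 x i64> %a, <2 x i64> %b) #0 {
  ; 64-bit element multiplies have no NEON instruction, so with +sve
  ; they are lowered via predicated SVE operations
  %m = mul <2 x i64> %a, %b
  ret <2 x i64> %m
}
attributes #0 = { "target-features"="+sve" }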
Tests added here:
CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll
Differential Revision: https://reviews.llvm.org/D121297
While checking whether tail call optimization is possible, the calling
convention applied to fixed arguments is not the correct one.
This implies for DarwinPCS that all arguments of a vararg function will
go to the stack although fixed ones can go in registers.
This prevents non-virtual thunks from being tail-call optimized even
though they are marked as musttail.
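For example, a forwarding thunk of the affected shape (hypothetical):
declare i32 @callee(ptr, ...)
define i32 @thunk(ptr %this, ...) {
  ; under DarwinPCS the fixed argument %this can be passed in a register;
  ; treating it as stack-bound blocked the tail call
  %r = musttail call i32 (ptr, ...) @callee(ptr %this, ...)
  ret i32 %r
}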
Differential Revision: https://reviews.llvm.org/D120622
A TBL instruction will use zero for any out of range values. We can use
this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need
to materialise the zero.
Differential Revision: https://reviews.llvm.org/D121139
Materialize : i1 = extract_vector_elt t37, Constant:i64<0>
... into: "ptrue p, all" + PTEST
The test bit of lane 0 can use the P register directly, and the instruction
"ptrue all" is loop invariant, which will be beneficial to SVE after hoisting
out of the loop.
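In IR terms the pattern is (reduced sketch):
define i1 @lane0(<vscale x 16 x i1> %p) {
  ; lane 0 of a predicate can be tested with "ptrue p, all" + PTEST
  ; instead of moving the lane out to a general purpose register
  %b = extractelement <vscale x 16 x i1> %p, i64 0
  ret i1 %b
}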
Reviewed By: david-arm, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D120891
When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is
split several times, creating an illegal v4i8 concat that gets expanded
into a BUILD_VECTOR. After some combining and other legalisation, it
ends up as a buildvector that extracts from 4 vectors, looking like
BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is
really a v16i32->v16i8 truncate in disguise.
This adds a ReconstructTruncateFromBuildVector method to detect the
pattern, converting it back into the legal "concat(trunc(concat(trunc(a),
trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted
nodes could also be v4i16, in which case the truncates are not needed.
All those truncates and concats then become uzip1's, which is much
better than expanding by moving vector lanes around.
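The originating case, reduced (assuming the llvm.fptosi.sat intrinsic form):
define <16 x i8> @fptosi_sat(<16 x float> %x) {
  ; previously expanded via a BUILD_VECTOR of extracts; now rebuilt as a
  ; tree of truncates and concats that select to uzp1 instructions
  %s = call <16 x i8> @llvm.fptosi.sat.v16i8.v16f32(<16 x float> %x)
  ret <16 x i8> %s
}
declare <16 x i8> @llvm.fptosi.sat.v16i8.v16f32(<16 x float>)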
Differential Revision: https://reviews.llvm.org/D119469
When triggered during operation legalisation the affected combine
generates a splat_vector that, when custom lowered for SVE fixed
length code generation, results in the original precombine sequence,
and thus we enter a legalisation/combine hang.
NOTE: The patch contains no tests because I observed this issue
only when combined with other work that might never become public.
The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific
test was not possible so I'm hoping the DAGCombiner fix can be seen
as obvious. The AArch64ISelLowering change is required to maintain
existing code quality.
Differential Revision: https://reviews.llvm.org/D120735
This turns uzp1(x, undef) into concat(truncate(x), undef), as the truncate
is simpler and can often be optimized away, and it helps some of the
insert-subvector tests optimize more cleanly.
Differential Revision: https://reviews.llvm.org/D120879
The compiler currently crashes for scalable types when compiling with
+sme, e.g.
define <vscale x 4 x i32> @foo(<vscale x 4 x i32> %a) {
ret <vscale x 4 x i32> %a
}
since it doesn't know how to legalize the types. SME implies a subset of
SVE (+streaming-sve), so the hasSVE predication in the backend needs
extending to consider types/operations that are legal in Streaming SVE.
This is the first patch adding legal types <-> register classes. Before
making the change +sve(2) was temporarily replaced with +sme in all the
intrinsics tests to see what failed, and again after making the change.
For all the tests that passed after adding the legal types another RUN
line has been added for +streaming-sve. More patches to follow.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D118561
This patch addresses @paulwalker-arm's comment on D117900 to
only update/write the by-ref operands iff the function returns
true. It also handles a few more cases where a series of added
offsets can be folded into the base pointer, rather than just looking
at a single offset.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D119728
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.
This patch passes them by value which is consistent with the vast
majority of code.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120420
We have a combine for converting mul(dup(ext(..)), ...) into
mul(ext(dup(..)), ..), for allowing more uses of smull and umull
instructions. Currently it looks for vector insert and shuffle vectors
to detect the element that we can convert to a vector extend. Not all
cases will have a shufflevector/insert element though.
This started by extending the recognition to buildvectors (with elements
that may be individually extended). The new method seems to cover all
the cases that the old method captured though, as the shuffle will
eventually be lowered to buildvectors, so the old method has been
removed to keep the code a little simpler. The new code detects legal
build_vector(ext(a), ext(b), ..), converting them to ext(build_vector(a,
b, ..)) providing all the extends/types match up.
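For instance, a mul fed by individually extended elements (hypothetical
reduction):
define <4 x i32> @smull_bv(<4 x i16> %a, i16 %b0, i16 %b1) {
  %ae = sext <4 x i16> %a to <4 x i32>
  %b0e = sext i16 %b0 to i32
  %b1e = sext i16 %b1 to i32
  %v0 = insertelement <4 x i32> undef, i32 %b0e, i64 0
  %v1 = insertelement <4 x i32> %v0, i32 %b1e, i64 1
  %v2 = insertelement <4 x i32> %v1, i32 %b0e, i64 2
  %v3 = insertelement <4 x i32> %v2, i32 %b1e, i64 3
  ; the second operand is build_vector(sext, sext, ..), now treated as
  ; sext(build_vector(..)) so the mul can select to smull
  %m = mul <4 x i32> %ae, %v3
  ret <4 x i32> %m
}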
Differential Revision: https://reviews.llvm.org/D120018
LR is modified at the moment of the call and before any use is read.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D120114
We have some duplicate patterns between the AArch64ISD::UMULL (/SMULL)
and the int_aarch64_neon_umull (/smull) intrinsics. They did not
replicate all the patterns though, leaving some gaps on instructions
like umlal2 from codegen. This commons all the patterns by converting
all int_aarch64_neon_umull intrinsics to UMULL nodes and removing the
duplicate for umull/smull intrinsics, so that all instructions go
through the same tablegen pattern.
This improves some of the longer-than-legal mla patterns, helping them
replace ext with umlal2.
Differential Revision: https://reviews.llvm.org/D119887
(vselect (setcc (condcode) (_) (_)) (a) (op (a) (b)))
=> (vselect (setcc (!condcode) (_) (_)) (op (a) (b)) (a))
As a follow-up to D117689, invert the operand order and condition
in order to fold vselects into predicated instructions.
Differential Revision: https://reviews.llvm.org/D119424
This also requires adjustment to code in AArch64ISelLowering so that
vector_extract is distributed over strict_fadd.
Differential Revision: https://reviews.llvm.org/D118489
This consists of marking the various strict opcodes as legal, and
adjusting instruction selection patterns so that 'op' is 'any_op'.
FP16 and vector instructions additionally require some extra work in
lowering and legalization, so we can't set IsStrictFPEnabled just yet.
Also more work needs to be done for full strict fp support (marking
instructions that can raise exceptions as such, and modelling FPCR use
for controlling rounding).
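For example, a constrained operation that can now be selected directly
(sketch):
define float @strict_fadd(float %a, float %b) strictfp {
  %r = call strictfp float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
  ret float %r
}
declare float @llvm.experimental.constrained.fadd.f32(float, float, metadata, metadata)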
Differential Revision: https://reviews.llvm.org/D114946
This is some NFC (hopefully!) cleanup for performCommonVectorExtendCombine
and related methods, removing conditions that cannot occur and otherwise
cleaning up the code a little.
This ports the aarch64 combines for HADD and RHADD over to DAG combine,
so that they can be used in more architectures (notably MVE in a
followup patch). They are renamed to AVGFLOOR and AVGCEIL in the
process, to avoid confusion with instructions such as X86 hadd. The code
was also rewritten slightly to remove the AArch64 idiosyncrasies.
The general pattern for an AVGFLOORS is
%xe = sext i8 %x to i32
%ye = sext i8 %y to i32
%a = add i32 %xe, %ye
%r = lshr i32 %a, 1
%t = trunc i32 %r to i8
An AVGFLOORU is equivalent with zext. Because of the truncate,
lshr is equivalent to ashr, as the top bits are not demanded. An AVGCEIL
also includes an extra rounding, so it includes an extra add of 1.
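For completeness, the rounding unsigned form matched as AVGCEILU, in the
same style as the pattern above:
%xe = zext i8 %x to i32
%ye = zext i8 %y to i32
%a = add i32 %xe, %ye
%a1 = add i32 %a, 1 ; the extra rounding add
%r = lshr i32 %a1, 1
%t = trunc i32 %r to i8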
Differential Revision: https://reviews.llvm.org/D106237