Commit Graph

1445 Commits

Author SHA1 Message Date
Simon Pilgrim e4a124dda5 [DAG] Fold (srl (shl x, c1), c2) -> and(shl/srl(x, c3), m)
Similar to the existing (shl (srl x, c1), c2) fold
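
As a rough sketch (the constants are illustrative, not taken from the
patch), with c1=8 and c2=4 on i32 the matched pattern is:

  define i32 @src(i32 %x) {
    ; (srl (shl x, 8), 4)
    %s = shl i32 %x, 8
    %r = lshr i32 %s, 4
    ret i32 %r
  }

which can instead be a single shift by c3 = c1-c2 = 4 plus a mask,
i.e. and(shl(x, 4), 0x0FFFFFF0).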

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125836
2022-06-20 08:37:38 +01:00
David Sherwood 6f6fa5aa10 [AArch64][SME] Add SME cntsb/h/w/d intrinsics
These intrinsics return the number of elements in a streaming
vector, for example aarch64.sme.cntsw returns the number of
32-bit elements. When in streaming mode these are equivalent
to aarch64.sve.cntb/h/w/d with an input value of 1.

I have implemented these intrinsics using the rdsvl instruction
and added tests here:

  CodeGen/AArch64/SME/sme-intrinsics-rdsvl.ll
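
A minimal IR sketch of how one of these intrinsics might be called
(the function name is hypothetical):

  declare i64 @llvm.aarch64.sme.cntsw()

  define i64 @num_words() {
    ; number of 32-bit elements in a streaming vector
    %n = call i64 @llvm.aarch64.sme.cntsw()
    ret i64 %n
  }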

Differential Revision: https://reviews.llvm.org/D127853
2022-06-16 10:50:25 +01:00
David Sherwood 5fa2416ea0 [AArch64][SME] Add SME read/write intrinsics that map to the mova instruction
This patch adds implementations for the read/write SME ACLE intrinsics:

  @llvm.aarch64.sme.read.horiz
  @llvm.aarch64.sme.read.vert
  @llvm.aarch64.sme.write.horiz
  @llvm.aarch64.sme.write.vert

These all map to the SME mova instruction.

Differential Revision: https://reviews.llvm.org/D127414
2022-06-15 10:31:07 +01:00
David Sherwood bd61664167 [AArch64][SME] Add ldr/str (fill/spill) intrinsics
This patch adds implementations for the fill/spill SME ACLE intrinsics:

    @llvm.aarch64.sme.ldr
    @llvm.aarch64.sme.str

Differential Revision: https://reviews.llvm.org/D127317
2022-06-14 13:58:22 +01:00
Rosie Sumpter 2c4e44752d [AArch64][SME] Add load/store intrinsics
This patch adds implementations for the load/store SME ACLE intrinsics:
  - @llvm.aarch64.sme.ld1*
  - @llvm.aarch64.sme.st1*

Differential Revision: https://reviews.llvm.org/D127210
2022-06-14 11:11:22 +01:00
Guillaume Chatelet d1a27d0b9c [NFC][Alignment] Use proper version of getAlign 2022-06-13 12:59:38 +00:00
David Green 338fd211e7 [AArch64] Generate FADDP from shuffled fadd
As a follow up to D126686, this does the same fold for floating point
add and shuffle. In this case it is limited to reassoc either x[0]+x[1]
or x[1]+x[0] for both result[0] and result[1].
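
A hedged sketch of an input that should now become faddp (the
function name is illustrative):

  define <2 x float> @faddp_like(<2 x float> %x) {
    ; swap the two lanes, then fadd with reassoc: each result lane
    ; computes x[0]+x[1] (or x[1]+x[0])
    %rev = shufflevector <2 x float> %x, <2 x float> poison, <2 x i32> <i32 1, i32 0>
    %add = fadd reassoc <2 x float> %x, %rev
    ret <2 x float> %add
  }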

Differential Revision: https://reviews.llvm.org/D127087
2022-06-11 14:16:37 +01:00
Ahmed Bougacha c68b469e07 [AArch64][SVE] Don't crash on pre-legalizer types in extload combine.
This was assuming the vector types were MVTs, but they don't have to be.

Note that the concrete output of the test isn't very useful, since it's
dominated by nonsensical calling convention lowering for the weird types.

Differential Revision: https://reviews.llvm.org/D126505
2022-06-09 10:33:21 -07:00
Paul Walker a1121c31d8 [SVE] Fix incorrect code generation for bitcasts of unpacked vector types.
Bitcasting between unpacked scalable vector types of different
element counts is not a NOP because the live elements are laid out
differently.
               01234567
e.g. nxv2i32 = XX??XX??
     nxv4f16 = X?X?X?X?
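
A bitcast of the kind this patch fixes might look like the sketch
below; both types occupy the same number of bits so the IR is valid,
but the in-register layouts above differ:

  define <vscale x 4 x half> @cast(<vscale x 2 x i32> %v) {
    %r = bitcast <vscale x 2 x i32> %v to <vscale x 4 x half>
    ret <vscale x 4 x half> %r
  }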

Differential Revision: https://reviews.llvm.org/D126957
2022-06-08 10:30:07 +01:00
Guillaume Chatelet 0788186182 [Alignment][NFC] Remove usage of MemSDNode::getAlignment
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.

Differential Revision: https://reviews.llvm.org/D126910
2022-06-07 13:52:20 +00:00
David Green 4ea1b43527 [AArch64] Generate ADDP from shuffled add
This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>)) into
shuffle(addp(x), <0,0,1,1,2,2,...>). The ADDP instruction takes two
vectors and returns one, adding adjacent pairs. So we match x in a
custom combine as it is lowered from a v8i32. The original code
would be 2 rev64 and 2 add, with the new code being a single addp
with a zip1;zip2 shuffle, producing smaller code.
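
A minimal sketch of the matched shape (v4i32 shown for brevity; the
combine described above matches it as lowered from a v8i32):

  define <4 x i32> @addp_like(<4 x i32> %x) {
    ; swap adjacent lanes, then add: each pair of lanes sums x[i]+x[i^1]
    %s = shufflevector <4 x i32> %x, <4 x i32> poison, <4 x i32> <i32 1, i32 0, i32 3, i32 2>
    %a = add <4 x i32> %x, %s
    ret <4 x i32> %a
  }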

Differential Revision: https://reviews.llvm.org/D126686
2022-06-06 11:39:51 +01:00
Paul Walker 2dde272db7 [SVE] Refactor sve-bitcast.ll to include all combinations for legal types.
Patch enables custom lowering for MVT::nxv4bf16 because otherwise
the refactored test file triggers a selection failure.

The reason for the refactoring is to highlight cases where the
generated code is wrong.
2022-06-03 12:09:19 +01:00
Paul Walker 48ea26a387 [SVE] Fixed custom lowering of ISD::INSERT_SUBVECTOR.
LowerINSERT_SUBVECTOR emits AArch64ISD::UUNPK## when lowering
scalable vector floating point INSERT_SUBVECTOR. However, these
nodes only make sense for integer types and thus isel patterns do
not exist for floating point, which leads to isel failures.

This patch ensures floating point operands are cast to integer
before the core lowering takes place.

Fixes: #55037

Differential Revision: https://reviews.llvm.org/D126487
2022-06-02 14:51:04 +01:00
Paul Walker 1fe4953d89 [SVE] Remove custom lowering of scalable vector MGATHER & MSCATTER operations.
Differential Revision: https://reviews.llvm.org/D126255
2022-06-02 11:19:52 +01:00
Sander de Smalen 9c38fc111b [AArch64] Remove references to Streaming SVE from target features.
Following discussion on D120261 and D121208 it seems better to remove the
concept of Streaming SVE from the subtarget/assembler predicates and
instead reason about 'SVE' and 'SME' as the higher-level features, rather
than trying to model this runtime mode through explicit feature flags.

This patch is largely NFC.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D125977
2022-05-31 16:25:01 +02:00
David Green 9a3144d078 [AArch64] Reuse larger DUP if available
If both a v2i32 DUP(x) and a v4i32 DUP(x) node exists, we can re-use the
larger node using a vector extract to obtain the smaller. This comes up
in the smull/smlal code, but needs a small fixup to allow the smull2
code in tryExtendDUPToExtractHigh/performAddSubLongCombine to still
match smull2 extracts.

Differential Revision: https://reviews.llvm.org/D126449
2022-05-29 19:42:13 +01:00
Florian Hahn 786c687810 [AArch64] Add support for FMA intrinsics to shouldSinkOperands.
If the fma operates on a legal vector type, the indexed variants can be
used if the second operand is a splat of a valid lane index.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D126234
2022-05-27 10:37:03 +01:00
Zongwei Lan ad73ce318e [Target] use getSubtarget<> instead of static_cast<>(getSubtarget())
Differential Revision: https://reviews.llvm.org/D125391
2022-05-26 11:22:41 -07:00
Florian Hahn 0cc981e021 [AArch64] implement isReassocProfitable, disable for (u|s)mlal.
Currently reassociating add expressions can lead to failing to select
(u|s)mlal. Implement isReassocProfitable to skip reassociating
expressions that can be lowered to (u|s)mlal.

The same issue exists for the *mlsl variants as well, but the DAG
combiner doesn't use the isReassocProfitable hook before reassociating.
To be fixed in a follow-up commit as this requires DAGCombiner changes
as well.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D125895
2022-05-23 09:39:00 +01:00
David Green 6ef5e242f2 [AArch64] Fix assumptions on input type of tryCombineFixedPointConvert
It is possible for the input type to not be v2i64 or v4i32, so weaken
the assertion to a return, fixing the crash in the new test.

Fixes #55606
2022-05-23 08:55:54 +01:00
Paul Walker 258dac43d6 [SVE] Enable use of 32bit gather/scatter indices for fixed length vectors
Differential Revision: https://reviews.llvm.org/D125193
2022-05-22 12:32:30 +01:00
Paul Walker 216f546c84 [SVE] Refactor lowering for fixed length MGATHER/MSCATTER.
Lower fixed length MGATHER/MSCATTER operations to scalable vector
equivalents, which are then lowered to SVE specific nodes. This
two stage process is in preparation for making scalable vector
MGATHER/MSCATTER operations legal.

Differential Revision: https://reviews.llvm.org/D125192
2022-05-21 10:14:45 +01:00
Rahul Anand R 534ea8bca5 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a target-specific optimization that replaces
the cmp and csel from cttz with an and mask.

Recommitted with a fix for truncated value sizes.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-20 13:41:32 +01:00
David Green 602f81ec33 [AArch64] Fix zero element TBL indices
A TBL instruction will fill out-of-range values with 0's, something used
in D121139 to turn tbl2 with a zero input into tbl1s. This works OK for
v16i8, but for v8i8 the input is still treated as a v16i8, so
out-of-range values (like a lane index of 8) would end up loading values
from the top half of the input register. Clean this up by detecting the
out-of-range values and remapping them to indices that really are out of range.
There is a fix for swapped indices of 64bit input vectors too, which
could be incorrectly adjusted if the zerovector was the first operand.
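
For illustration, a hypothetical v8i8 shuffle of this shape selects
from x and a zero vector; in shuffle terms index 8 means "lane 0 of
the zero input", but as a raw TBL index on a 16-byte register it
would read the top half of x:

  define <8 x i8> @tbl_zero(<8 x i8> %x) {
    %s = shufflevector <8 x i8> %x, <8 x i8> zeroinitializer,
           <8 x i32> <i32 0, i32 8, i32 2, i32 8, i32 4, i32 8, i32 6, i32 8>
    ret <8 x i8> %s
  }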

Fixes #55545

Differential Revision: https://reviews.llvm.org/D125865
2022-05-19 13:54:35 +01:00
Jay Foad 6bec3e9303 [APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.

The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.

Differential Revision: https://reviews.llvm.org/D125557
2022-05-19 11:23:13 +01:00
David Green 4c6a070a2c [AArch64] Teach perfect shuffles tables about D-lane movs
Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle
tables, lowering the costs a little more. This is generally an
improvement, especially if you ignore mov v0.16b, v2.16b type
moves that are often artefacts of the calling convention.

The D register movs are encoded as (0x4 | LaneIdx), and to generate a D
register move we are required to bitcast into a higher type, but it is
otherwise very similar to the S-lane mov's already supported.

Differential Revision: https://reviews.llvm.org/D125477
2022-05-17 18:16:45 +01:00
Simon Pilgrim d40b7f0d5a [DAG] Fold (shl (srl x, c), c) -> and(x, m) even if srl has other uses
If we're using shift pairs to mask, then relax the one use limit if the shift amounts are equal - we'll only be generating a single AND node.

AArch64 has a couple of regressions due to this, so I've enforced the existing one use limit inside an AArch64TargetLowering::shouldFoldConstantShiftPairToMask callback.

Part of the work to fix the regressions in D77804

Differential Revision: https://reviews.llvm.org/D125607
2022-05-17 13:40:11 +01:00
Paul Walker ee8aa351e4 [AArch64] Use ADDV for boolean xor reductions.
NEON does not have native support for xor reductions. However, when
reducing predicate vectors the operation is synonymous with an add
reduction that is supported.

Differential Revision: https://reviews.llvm.org/D125605
2022-05-16 22:34:12 +01:00
Paul Walker 7dd05ba9ed [SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes.
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:

* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)

Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.

I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used as I
think being explicit looks better.

Depends On D123347

Differential Revision: https://reviews.llvm.org/D123381
2022-05-16 20:47:52 +01:00
David Green 4c3e51ecfa [AArch64] Handle 64bit vectors in tryCombineFixedPointConvert
Under some situations we can visit 64bit vector extract elements in
tryCombineFixedPointConvert, where an assert fires as they are expected
to have been converted to 128bit. Turn the assert into an if statement,
bailing out and letting the extract be handled first.

Also invert some ifs, using early exits to reduce indentation.

Fixes #55417
2022-05-16 11:08:47 +01:00
Karl Meakin 0298cce257 [AArch64] Add `foldADCToCINC` DAG combine.
Differential revision: https://reviews.llvm.org/D123781
2022-05-12 22:21:20 +01:00
Karl Meakin d29fc6e7d2 [AArch64] Replace `performANDSCombine` with `performFlagSettingCombine`.
`performFlagSettingCombine` is a generalised version of `performANDSCombine` which also works on `ADCS` and `SBCS`.

Differential revision: https://reviews.llvm.org/D124464
2022-05-12 22:17:23 +01:00
Nikita Popov 44d85259d0 [AArch64] Preserve chain when lowering fixed length load to SVE (PR55281)
When a fixed length load is lowered to an SVE masked load, the
result chain is currently set to the input chain of the old load,
rather than the result chain of the new load. This may cause stores
to be incorrectly reordered.

Fixes https://github.com/llvm/llvm-project/issues/55281.

Differential Revision: https://reviews.llvm.org/D125464
2022-05-12 16:03:32 +02:00
Alban Bridonneau e6635377e5 [NFC] Change comment number in aarch64 isel 2022-05-11 15:41:33 +00:00
David Green 442c351b2b Revert "[AArch64] Generate AND in place of CSEL for predicated CTTZ"
This reverts commit 7dcd0ea683 due to
issues reported postcommit with the correctness of truncated cttzs.
2022-05-10 17:17:03 +01:00
Amaury Séchet 59fea9380d [AArch64] Remove ADDC, ADDE, SUBC, SUBE support, use the CARRY ops instead
This cleans up tech debt. Similar to D33390.

Reviewed By: Kmeakin

Differential Revision: https://reviews.llvm.org/D125150
2022-05-10 00:23:15 +00:00
Rosie Sumpter 1a2665902f [AArch64][SVE] Improve codegen when extracting first lane of active lane mask
When extracting the first lane of a predicate created using the
llvm.get.active.lane.mask intrinsic, it should give the same codegen as
when the predicate is created using the llvm.aarch64.sve.whilelo
intrinsic, since get.active.lane.mask is lowered to whilelo. This patch
ensures the codegen is the same by recognizing
llvm.get.active.lane.mask as a flag-setting operation in this case.
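
A sketch of the motivating pattern (types chosen for illustration):

  declare <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64, i64)

  define i1 @first_lane(i64 %i, i64 %n) {
    ; the mask lowers to whilelo; extracting lane 0 can now reuse the flags
    %m = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 %i, i64 %n)
    %b = extractelement <vscale x 4 x i1> %m, i64 0
    ret i1 %b
  }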

Differential Revision: https://reviews.llvm.org/D125215
2022-05-09 13:56:04 +01:00
Alban Bridonneau fef81131d9 [SVE] Optimize new cases for lowerConvertToSVBool
Conversions to SVBool are already considered a nop if they are
converting an operand from a ptrue or a cmp, because these zero
the extra predicate lanes by construction.

This patch adds 2 similar cases:
- Wide cmps, which were not directly recognized by the test
  for other forms of cmp
- Splats of 1, which will be generated as ptrue, and as such
  will also zero the extra predicate lanes.

Reviewed By: paulwalker-arm, peterwaller-arm

Differential Revision: https://reviews.llvm.org/D124908
2022-05-09 10:17:57 +00:00
Rahul Anand R 7dcd0ea683 [AArch64] Generate AND in place of CSEL for predicated CTTZ
This patch implements a target-specific optimization that replaces
the cmp and csel from cttz with an and mask.

Differential Revision: https://reviews.llvm.org/D123782
2022-05-09 10:28:20 +01:00
Paul Walker 702c4ade22 [ISD::IndexType] Helper functions for common queries.
Add helper functions to query the signed and scaled properties
of ISD::IndexType along with functions to change them.

Remove setIndexType from MaskedGatherSDNode because it only has
one usage and typically should only be changed alongside its
index operand.

Minimise the direct use of the enum values to lay the groundwork
for more refactoring.

Differential Revision: https://reviews.llvm.org/D123347
2022-05-07 11:23:42 +01:00
Kazu Hirata fffb6e6afd [AArch64] Fix sub with carry
13403a70e4 introduced a bug where we
generate the outgoing carry inverted, which in turn breaks the
lowering of @llvm.usub.sat.i128, returning the normal difference on
saturation and zero otherwise.

Note that AArch64 has peculiar semantics where the subtraction
instructions generate the borrow inverted.  The problem is that we mix the
two forms of semantics -- the normal carry and inverted carry -- in
the area of extended precision subtractions.  Specifically, we have
three problems:

- lowerADDSUBCARRY takes the non-inverted incoming carry from a
  subtraction and feeds it to SBCS without inverting it first.

- lowerADDSUBCARRY makes available the outgoing carry from SBCS
  without inverting it.

- foldOverflowCheck folds:

  (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  in turn generates a borrow, *clearing* the carry flag.  Instead, we
  should fold:

  (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  does not generate a borrow, *setting* the carry flag.

IIUC, we should use the normal (that is, non-inverted) semantics for
carry everywhere.

This patch fixes the three problems above.

This patch does not add any new testcases because we have plenty of
them covering the instruction in question.  In particular,
@u128_saturating_sub is identical to the testcase in the motivating
issue.
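
A sketch of the motivating case (per the message above, the in-tree
@u128_saturating_sub test has the same shape):

  declare i128 @llvm.usub.sat.i128(i128, i128)

  define i128 @u128_saturating_sub(i128 %a, i128 %b) {
    ; the i128 subtract splits into SUBS + SBCS, where the carry
    ; semantics described above matter
    %r = call i128 @llvm.usub.sat.i128(i128 %a, i128 %b)
    ret i128 %r
  }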

Fixes: #55253

Differential Revision: https://reviews.llvm.org/D124976
2022-05-06 11:04:17 -07:00
Paul Walker b481512485 [SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.
This is essentially a refactoring patch but allows more cases to
be caught, hence the output changes to some tests.

Differential Revision: https://reviews.llvm.org/D122994
2022-04-29 17:42:33 +01:00
Paul Walker 59588f0a3d [SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.
getGatherScatterIndexIsExtended currently looks through all
SIGN_EXTEND_INREG operations regardless of their input type.  This
patch restricts the code to only look through i32->i64 extensions,
which are the ones supported implicitly by SVE addressing modes.

Differential Revision: https://reviews.llvm.org/D123318
2022-04-29 14:20:13 +01:00
Paul Walker 7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
David Green d6327050e0 [AArch64] Use PerfectShuffle costs in AArch64TTIImpl::getShuffleCost
Given a shuffle with 4 elements of size 16 or 32, we can use the costs
directly from the PerfectShuffle tables to get a slightly more accurate
cost for the resulting shuffle.

Differential Revision: https://reviews.llvm.org/D123409
2022-04-27 12:09:01 +01:00
David Green 9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Karl Meakin 81904454f7 [AArch64] Add `foldOverflowCheck` DAG combine
Differential Revision: https://reviews.llvm.org/D123779
2022-04-21 14:56:38 +01:00
Karl Meakin 13403a70e4 [AArch64] Add lowerings for {ADD,SUB}CARRY and S{ADD,SUB}O_CARRY
Differential Revision: https://reviews.llvm.org/D123322
2022-04-21 14:56:37 +01:00
David Green 73dc996428 [AArch64] Add lane moves to PerfectShuffle tables
This teaches the perfect shuffle tables about lane inserts, that can
help reduce the cost of many entries. Many of the shuffle masks are
one-away from being correct, and a simple lane move can be a lot simpler
than trying to use ext/zip/etc. Because they are not exactly like the
other masks handled in the perfect shuffle tables, they require special
casing to generate them, with a special InsOp Operator.

The lane to insert into is encoded as the RHSID, and the move from is
grabbed from the original mask. This helps reduce the maximum perfect
shuffle entry cost to 3, with many more shuffles being generatable in a
single instruction.

Differential Revision: https://reviews.llvm.org/D123386
2022-04-19 14:49:50 +01:00
David Green cc9495f679 [AArch64] Only mark cost 1 perfect shuffles as legal
The perfect shuffle tables encode a cost of either 0 (a nop-copy) or 1
(a single instruction) with a cost encoding of 0 in the upper 2 bits.
All perfect shuffles with any cost are then marked as legal shuffles
though (the maximum encoded cost is 3), which can confuse the DAG
combiner into thinking the shuffles are cheaper than they should be.

Limiting legal shuffles to single instructions seems to do better in
most cases, producing fewer instructions for complex shuffles. There are
some cases that now become tbl, which may be better or worse depending
on whether the instruction is in a loop and the tbl load can be hoisted
out.

Differential Revision: https://reviews.llvm.org/D123377
2022-04-19 12:58:55 +01:00
chenglin.bi 222adf338a [Arch64][SelectionDAG] Add target-specific implementation of srem
1. Expanding X%C to the equivalent of X-X/C*C is not always the fastest path if no SDIV pair exists, so first check whether the target has a faster lowering for SREM alone.
2. Add an AArch64 fast path for the SREM-only pow2 case (sketched below).
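
A sketch of the kind of case the pow2 fast path targets:

  define i32 @rem8(i32 %x) {
    ; srem by a power of two can avoid the full X-X/C*C expansion
    %r = srem i32 %x, 8
    ret i32 %r
  }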

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-19 02:49:42 +08:00
chenglin.bi acfc025a72 Revert "[Arch64][SelectionDAG] Add target-specific implementation of srem"
This reverts commit 9d9eddd3dd.
2022-04-18 10:35:09 +08:00
chenglin.bi 9d9eddd3dd [Arch64][SelectionDAG] Add target-specific implementation of srem
Expanding X%C to the equivalent of X-X/C*C is not always the fastest path if no SDIV pair exists, so first check whether the target has a faster lowering for SREM alone. Add an AArch64 fast path for the SREM-only pow2 case.

Fix https://github.com/llvm/llvm-project/issues/54649

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122968
2022-04-16 12:29:11 +08:00
zhongyunde 49cb4fef02 [AArch64][SelectionDAG] Refactor to support more scalable vector extending stores
Similar to D122281, we should first exclude all scalable vector extending
stores and then selectively enable those which we directly support.

Also merge the integer and floating-point scalable vector types into
scalable_vector_valuetypes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D123449
2022-04-15 19:11:40 +08:00
Paul Walker a5a258e208 [SVE] Refactor MGATHER lowering for unsupported passthru values.
Handle unsupported passthru values before lowering the gather to
target specific nodes. This is a simplification that's on the road
to moving more of MGATHER lowering into td based isel.

Differential Revision: https://reviews.llvm.org/D123683
2022-04-14 17:26:43 +01:00
John Brawn 12c1022679 [AArch64] Lowering and legalization of strict FP16
For strict FP16 to work correctly, some changes are needed in lowering
and legalization:
 * SelectionDAGLegalize::PromoteNode was missing handling for some
   strict fp opcodes.
 * Some of the custom lowering of strict fp operations needed to be
   adjusted to work with FP16.
 * Custom lowering needed to be added for round-to-int operations.

With this, and the previous patches for the rest of the strict fp
isel, we can set IsStrictFPEnabled = true.

Differential Revision: https://reviews.llvm.org/D115620
2022-04-14 16:51:22 +01:00
David Green 1ba8f4f67d [AArch64] Move v4i8 concat load lowering to a combine.
The existing code was not updating the uses of loads that it recreated,
leading to incorrect chains which could break the ordering between
nodes. This moves the code to a combine instead, and makes sure we
update the chain references. This does mean it happens earlier -
potentially before the concats are simplified. This can lead to
inefficiencies in the codegen, which will be fixed in followups.
2022-04-14 15:19:33 +01:00
Paul Walker 0c44115e51 [SVE] Add support for non-element-type sized scaling when lowering MGATHER/MSCATTER.
The lowering code did not use the scale operand of MGATHER/MSCATTER
nodes, but instead assumed scaled indices were always scaled based
on the element type of the memory type. This patch adds the missing
support by rewriting the nodes as unscaled variants.

Differential Revision: https://reviews.llvm.org/D123670
2022-04-14 11:54:46 +01:00
David Sherwood 44271e7c55 [AArch64][SVE] Fix lowering of "fcmp ueq/one" when using SVE
We were previously lowering to the incorrect instructions for the
setcc DAG node when using the SETUEQ and SETONE floating point
condition codes. I have fixed this by marking the SETONE code
as Expand and letting the SETUNE code be legal. I have also
fixed up the patterns for FCMNE_PPzZZ and FCMNE_PPzZ0 to use
the correct opcode.

Differential Revision: https://reviews.llvm.org/D121905
2022-04-13 10:24:03 +01:00
David Green a93607c479 [AArch64] Remove always true Perfect cost check. NFC
Perfect shuffle costs are always encoded less than 4, and shouldn't
really have a cost more than 3, so it makes no sense to check it when
generating shuffles. The perfect shuffle is likely always better than a
tbl too (although that may depend on whether it is in a loop).
2022-04-08 12:16:34 +01:00
Matt Arsenault c4ea925f50 AtomicExpand: Change return type for shouldExpandAtomicStoreInIR
Use the same enum as the other atomic instructions for consistency, in
preparation for addition of another strategy.

Introduce a new "Expand" option, since the store expansion does not
use cmpxchg. Alternatively, the existing CmpXChg strategy could be
renamed to Expand.
2022-04-06 22:34:04 -04:00
Martin Storsjö 8d7a17b7c8 [AArch64] Fix the upper limit for folded address offsets for COFF
In COFF, the immediates in IMAGE_REL_ARM64_PAGEBASE_REL21 relocations
are limited to 21 bit signed, i.e. the offset has to be less than
(1 << 20). The previous limit did intend to cover for this case, but
had missed that the 21 bit field was signed.

This fixes issue https://github.com/llvm/llvm-project/issues/54753.

Differential Revision: https://reviews.llvm.org/D123160
2022-04-06 22:54:13 +03:00
zhongyunde 9a2d5cc1da [SVE][AArch64] Enable first active true vector combine for INTRINSIC_WO_CHAIN
The WHILELO/WHILELS instructions are very important for SVE loops, and
are themselves flag-setting operations, so add them.

Reviewed By: paulwalker-arm, david-arm

Differential Revision: https://reviews.llvm.org/D122796
2022-04-06 21:01:37 +08:00
zhongyunde 19e5235147 [AArch64][InstCombine] Fold MLOAD and zero extensions into MLOAD
Following the discussion in D122281, we are missing an ISD::AND combine for MLOAD
because it relies on BuildVectorSDNode, which fails for scalable vectors.
This patch handles that case, so we can circle back to the type MVT::nxv2i32.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122703
2022-04-06 20:50:42 +08:00
zhongyunde 251637690a [AArch64] Enhance last active true vector combine
Extracting the last active lane will output LASTB + WHILELS, and WHILELS
is itself a flag-setting operation, so prefer this form.

Reviewed By: paulwalker-arm, sdesmalen

Differential Revision: https://reviews.llvm.org/D122551
2022-04-06 09:54:28 +08:00
David Green 3b9833597e [AArch64] Alter mull buildvectors(ext(..)) combine to work on shuffles
D120018 altered this combine to work on buildvectors as opposed to
shuffle dup's. This works well for dups and other things that are
expanded into buildvectors. Some shuffles are legal though, and stay as
vector_shuffle through lowering. This expands the transform to also
handle shuffles, so that we can turn mul(shuffle(sext(..))) into
mul(sext(shuffle(..))) and more readily make smull/umull instructions. This
can come up from the SLP vectorizer adding shuffles that are costed from
extends.
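
A hedged sketch of the shape now handled (types and the splat mask
are illustrative):

  define <8 x i16> @mull_shuffle(<8 x i8> %a, <8 x i8> %b) {
    ; shuffle of a sext: pushing the shuffle below the extend keeps
    ; the operand narrow, so the multiply can become smull
    %ea = sext <8 x i8> %a to <8 x i16>
    %sa = shufflevector <8 x i16> %ea, <8 x i16> poison, <8 x i32> zeroinitializer
    %eb = sext <8 x i8> %b to <8 x i16>
    %m = mul <8 x i16> %sa, %eb
    ret <8 x i16> %m
  }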

Differential Revision: https://reviews.llvm.org/D123012
2022-04-04 23:07:47 +01:00
David Green 60f57b3658 [AArch64] Ensure fixed point fptoi_sat has correct saturation width
D113200 introduced an error where it was converting FP_TO_SI_SAT with
multiply to a fixed point floating point convert. The saturation
bitwidth needs to be equal to the floating point width, or else the
routine would truncate the result as opposed to saturating it.

Fixes #54601
2022-03-29 10:12:44 +01:00
Shao-Ce SUN 662b9fa02c [NFC][CodeGen] Add a setTargetDAGCombine use ArrayRef
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D122557
2022-03-29 09:53:24 +08:00
zhongyunde c3fe025bd4 [AArch64][SelectionDAG] Refactor to support more scalable vector extending loads
Following the discussion in D120953, we should first exclude all scalable vector
extending loads and then selectively enable those which we directly support.

This patch refactors for the above (truncating stores are not touched), and
more scalable vector types will try to reduce the number of masked loads in favour
of more unpklo/hi instructions.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D122281
2022-03-27 21:18:01 +08:00
David Green 693d3b7e76 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.

Differential Revision: https://reviews.llvm.org/D121137
2022-03-26 21:10:43 +00:00
Simon Pilgrim e699b5da44 [AArch64] isProfitableToHoist - remove nullptr test
User is dereferenced on the main codepath so the null test is likely superfluous
2022-03-25 10:27:16 +00:00
David Green 3d8d60e147 Revert "[AArch64] Lower 3 and 4 sources buildvectors to TBL"
This reverts commit ec93b28909 as problems
with it have been reported.
2022-03-25 10:03:10 +00:00
David Green ec93b28909 [AArch64] Lower 3 and 4 sources buildvectors to TBL
The default expansion for buildvectors is to extract each element and
insert them into a new vector. That involves a lot of copying to/from
the GPR registers. TBL3 and TBL4 can be relatively slow instructions
with the mask needing to be loaded from a constant pool, but they should
always be better than all the moves to/from GPRs.

Differential Revision: https://reviews.llvm.org/D121137
2022-03-24 10:02:33 +00:00
David Green 54bc9ad2e8 [AArch64] Make some methods static. NFC 2022-03-24 08:55:27 +00:00
David Spickett c3b98194df Reland "[llvm][AArch64] Insert "bti j" after call to setjmp"
This reverts commit edb7ba714a.

This changes BLR_BTI to take variable_ops meaning that we can accept
a register or a label. The pattern still expects one argument so we'll
never get more than one. Then later we can check the type of the operand
to choose BL or BLR to emit.

(this is what BLR_RVMARKER does but I missed this detail of it first time around)

Also require NoSLSBLRMitigation which I missed in the first version.
2022-03-23 11:43:43 +00:00
David Spickett edb7ba714a Revert "[llvm][AArch64] Insert "bti j" after call to setjmp"
This reverts commit eb5ecbbcbb
due to failures on buildbots with expensive checks enabled.
2022-03-23 10:43:20 +00:00
David Spickett eb5ecbbcbb [llvm][AArch64] Insert "bti j" after call to setjmp
Some implementations of setjmp will end with a br instead of a ret.
This means that the next instruction after a call to setjmp must be
a "bti j" (j for jump) to make this work when branch target identification
is enabled.

The BTI extension was added in armv8.5-a but the bti instruction is in the
hint space. This means we can emit it for any architecture version as long
as branch target enforcement flags are passed.

The starting point for the hint number is 32; then call adds 2 and jump adds 4.
Hence "hint #36" for a "bti j" (and "hint #34" for the "bti c" you see
at the start of functions).

The existing Arm command line option -mno-bti-at-return-twice has been
applied to AArch64 as well.

Support is added to SelectionDAG Isel and GlobalIsel. FastIsel will
defer to SelectionDAG.
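
A sketch of IR that should now get a "bti j" after the call (the
attribute spelling follows the AArch64 tests; details may vary):

  declare i32 @setjmp(ptr) returns_twice

  define i32 @calls_setjmp(ptr %env) "branch-target-enforcement"="true" {
    %r = call i32 @setjmp(ptr %env)
    ret i32 %r
  }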

Based on the change done for M profile Arm in https://reviews.llvm.org/D112427

Fixes #48888

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D121707
2022-03-23 09:51:02 +00:00
zhongyunde 828b89bc0b [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported for
extensions from legal types.

Test cases for both normal and masked loads are added to guard against
compile crashes.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120953
2022-03-21 23:47:33 +08:00
chenglin.bi dd3b90e4d7 [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
When N > 12, (2^N - 1) is not a legal add immediate (isLegalAddImmediate will return false).
And if the SetCC input uses this number, the DAG combiner will generate one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison.
Fix https://github.com/llvm/llvm-project/issues/54283
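
A sketch of the pattern, with imm = 13 so that 2^13 - 1 is not a
legal add immediate:

  define i1 @cmp_srl(i64 %x) {
    ; [setcc (srl x, 13), 0, ne] -> [setcc (and x, (-1 << 13)), 0, ne]
    %s = lshr i64 %x, 13
    %c = icmp ne i64 %s, 0
    ret i1 %c
  }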

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121449
2022-03-19 13:04:16 +00:00
Paul Walker f46fe36d59 [AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands.
When inverting the compare predicate trySwapVSelectOperands is
incorrectly using the type of the select's cond operand rather
than the type of cond's operands. This means we're treating all
inversions as if they're integer.

Differential Revision: https://reviews.llvm.org/D121968
2022-03-19 12:36:14 +00:00
David Green fe6057a293 [AArch64] Custom lower concat(v4i8 load, ...)
We already have custom lowering for v4i8 load, which loads as a f32,
converts to a vector and bitcasts and extends the result to a v4i16.
This adds some custom lowering of concat(v4i8 load, ...) to keep the
result as an f32 and create a buildvector of the resulting f32 loads.
This helps not create all the extends and bitcasts, which are often
difficult to fully clean up.
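
The custom-lowered shape is roughly this sketch (a concat of two
v4i8 loads; names are illustrative):

  define <8 x i8> @concat_v4i8(ptr %a, ptr %b) {
    %la = load <4 x i8>, ptr %a, align 4
    %lb = load <4 x i8>, ptr %b, align 4
    ; concat(v4i8 load, v4i8 load) is kept as f32 loads + buildvector
    %c = shufflevector <4 x i8> %la, <4 x i8> %lb,
           <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
    ret <8 x i8> %c
  }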

Differential Revision: https://reviews.llvm.org/D121400
2022-03-18 11:58:02 +00:00
David Green 0b6df40c52 [AArch64] Combine ISD::AND into AArch64ISD::ANDS
If we already have a AArch64ISD::ANDS node with identical operands, we
can merge any ISD::AND into it, reducing the instruction count by
calculating the value and the flags in a single operation. This code is
taken from the X86 backend, and could also handle AArch64ISD::ADDS and
AArch64ISD::SUBS, but I couldn't find any test cases where it came up.

Differential Revision: https://reviews.llvm.org/D118584
2022-03-17 09:44:11 +00:00
zhongyunde 3568333815 [AArch64] Perform last active true vector combine
The test bit of lane EC-1 can use the P register directly, eg:
Materialize : Idx = (add (mul vscale, NumEls), -1)
               i1 = extract_vector_elt t37, Constant:i64<Idx>
    ... into: "ptrue p, all" + PTEST

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121180
2022-03-15 01:25:03 +08:00
Arthur Eubanks 250620f76e [OpaquePtr][AArch64] Use elementtype on ldxr/stxr
Includes verifier changes checking the elementtype, clang codegen
changes to emit the elementtype, and ISel changes using the elementtype.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D120527
2022-03-14 10:09:59 -07:00
David Sherwood aeeb1199b4 [AArch64][SVE] Change the asserts in LowerToPredicatedOp to check for legal types
When building the LLVM test suite with SVE I discovered a crash
when compiling some Halide tests, which occurs because we try to
use SVE to lower 64-bit vector multiplies and there is no
vscale_range attribute on the function. In this case the min SVE
vector bits was 0, which caused an assert in LowerToPredicatedOp
to fire. I have amended the asserts in this function to check that the
fixed-width type is legal. If the fixed-width type is larger than NEON
and is legal then it must be because we've set the min SVE vector
bits to something > 128. Or if the min SVE bits is 0, then the only
legal types allowed are 128 bit types - for any other types the assert
will fire.

Tests added here:

  CodeGen/AArch64/sve-fixed-length-no-vscale-range.ll

Differential Revision: https://reviews.llvm.org/D121297
2022-03-11 09:57:58 +00:00
Philippe Valembois 26cd258420 [AArch64] Use correct calling convention for each vararg
While checking if tail call optimization is possible, the calling
convention applied to fixed arguments is not the correct one.
This implies for DarwinPCS that all arguments of a vararg function will
go to the stack although fixed ones can go in registers.

This prevents non-virtual thunks from being tail optimized although they are
marked as musttail.

Differential Revision: https://reviews.llvm.org/D120622
2022-03-10 15:07:25 -08:00
David Green 4899e2cab4 [AArch64] Fix type in comment. NFC 2022-03-10 15:03:27 +00:00
David Green 21a97a2ac1 [AArch64] TBL uses zero for out of range elements.
A TBL instruction will use zero for any out of range values. We can use
this in GenerateTBL to help turn a TBL2 into a TBL1, avoiding the need
to materialise the zero.

Differential Revision: https://reviews.llvm.org/D121139
2022-03-10 14:45:13 +00:00
zhongyunde c22c8b151b [AArch64] Perform first active true vector combine
Materialize : i1 = extract_vector_elt t37, Constant:i64<0>
   ... into: "ptrue p, all" + PTEST
The test bit of lane 0 can use the P register directly, and the instruction
“ptrue all” is loop invariant, which will be beneficial to SVE after hoisting
it out of the loop.

Reviewed By: david-arm, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D120891
2022-03-08 01:10:21 +08:00
David Green d9633d1490 [AArch64] Turn truncating buildvectors into truncates
When lowering large v16f32->v16i8 fp_to_si_sat, the fp_to_si_sat node is
split several times, creating an illegal v4i8 concat that gets expanded
into a BUILD_VECTOR. After some combining and other legalisation, it
ends up as a buildvector that extracts from 4 vectors, looking like
BUILDVECTOR(a0,a1,a2,a3,b0,b1,b2,b3,c0,c1,c2,c3,d0,d1,d2,d3). That is
really a v16i32->v16i8 truncate in disguise.

This adds a ReconstructTruncateFromBuildVector method to detect the
pattern, converting it back into the legal "concat(trunc(concat(trunc(a),
trunc(b))), trunc(concat(trunc(c), trunc(d))))" tree. The extracted
nodes could also be v4i16, in which case the truncates are not needed.
All those truncates and concats then become uzip1's, which is much
better than expanding by moving vector lanes around.

Differential Revision: https://reviews.llvm.org/D119469
2022-03-07 09:42:54 +00:00
Benjamin Kramer fbce4a7803 Drop some more global std::maps. NFCI. 2022-03-06 13:28:29 +01:00
Karl Meakin 43a0016f3d Extend `performANDCSELCombine` to `performANDORCSELCombine`
Differential Revision: https://reviews.llvm.org/D120422
2022-03-04 15:09:59 +00:00
Paul Walker 42b4a6227e [DAGCombine] Prevent illegal ISD::SPLAT_VECTOR operations post legalisation.
When triggered during operation legalisation, the affected combine
generates a splat_vector that, when custom lowered for SVE fixed
length code generation, results in the original pre-combine sequence
and thus we enter a legalisation/combine hang.

NOTE: The patch contains no tests because I observed this issue
only when combined with other work that might never become public.
The current way AArch64 lowers ISD::SPLAT_VECTOR meant a specific
test was not possible so I'm hoping the DAGCombiner fix can be seen
as obvious. The AArch64ISelLowering change is required to maintain
existing code quality.

Differential Revision: https://reviews.llvm.org/D120735
2022-03-04 11:54:03 +00:00
David Green e348b09bb5 [AArch64] Turn UZP1 with undef operand into truncate
This turns uzp1(x, undef) into concat(truncate(x), undef), as the truncate
is simpler and can often be optimized away, and it helps some of the
insert-subvector tests optimize more cleanly.

Differential Revision: https://reviews.llvm.org/D120879
2022-03-04 11:12:26 +00:00
Cullen Rhodes 616586794b [AArch64] Add legal types for Streaming SVE
The compiler currently crashes for scalable types when compiling with
+sme, e.g.

  define <vscale x 4 x i32> @foo(<vscale x 4 x i32> %a) {
    ret <vscale x 4 x i32> %a
  }

since it doesn't know how to legalize the types. SME implies a subset of
SVE (+streaming-sve), so the hasSVE predication in the backend needs
extending to consider types/operations that are legal in Streaming SVE.

This is the first patch adding legal types <-> register classes. Before
making the change +sve(2) was temporarily replaced with +sme in all the
intrinsics tests to see what failed, and again after making the change.
For all the tests that passed after adding the legal types another RUN
line has been added for +streaming-sve. More patches to follow.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D118561
2022-03-03 09:51:14 +00:00
Sander de Smalen eac2638ec1 [AArch64][SVE] Fold away SETCC if original input was predicate vector.
This adds the following two folds:

Fold 1:
   setcc_merge_zero(
       all_active, extend(nxvNi1 ...), != splat(0))
  -> nxvNi1 ...

Fold 2:
   setcc_merge_zero(
       pred, extend(nxvNi1 ...), != splat(0))
  -> nxvNi1 and(pred, ...)

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D119334
2022-02-28 14:12:43 +00:00
Sander de Smalen 201e3686ab [AArch64][SVE] Handle more cases in findMoreOptimalIndexType.
This patch addresses @paulwalker-arm's comment on D117900 to
only update/write the by-ref operands iff the function returns
true. It also handles a few more cases where a series of added
offsets can be folded into the base pointer, rather than just looking
at a single offset.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D119728
2022-02-28 12:13:52 +00:00
Paul Walker 16ee102964 [SVE] Add missing splat patterns for bfloat vectors.
Differential Revision: https://reviews.llvm.org/D120496
2022-02-25 16:53:39 +00:00
Paul Walker 8ca5be93cc [SVE] Don't custom lower constant predicate ISD:SPLAT_VECTOR operations.
Differential Revision: https://reviews.llvm.org/D120340
2022-02-25 11:32:37 +00:00
Arthur Eubanks 6aa285eb85 [OpaquePtr][AArch64] Use load/store value type instead of pointer type for ldnt1/stnt1 alignment 2022-02-24 16:56:13 -08:00