llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	c9a41fe60a	[RISCV] Prefer vnsrl.wi v8, v8, 0 over vnsrl.wx v8, v8, x0. I have a couple data points that some microarchitectures prefer the immediate 0 over x0. Does anyone know of microarchitectures where the opposite is true? Unfortunately, this is different than the vncvt.x.x.w alias from the spec. Perhaps the alias was poorly chosen if x0 isn't as optimal as immediate 0 on all microarchitectures. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132041	2022-08-19 08:40:17 -07:00
Archibald Elliott	3a729069e4	[IR] Update llvm.prefetch to match docs The current llvm.prefetch intrinsic docs state "The rw, locality and cache type arguments must be constant integers." This change: - Makes arg 3 (cache type) an ImmArg - Improves the verifier error messages to reference the incorrect argument. - Fixes two tests which contradict the docs. This is needed as the lowering to GlobalISel is different for ImmArgs compared to other constants. The non-ImmArgs create a G_CONSTANT MIR instruction, the for ImmArgs the constant is put directly on the intrinsic's MIR instruction as an immediate. Differential Revision: https://reviews.llvm.org/D132042	2022-08-19 09:11:17 +01:00
Craig Topper	ba1f4cab44	[RISCV] Copy SDNodeFlags in lowerToScalableOp. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D132177	2022-08-18 20:42:59 -07:00
Craig Topper	5349aa2354	[RISCV] Copy SDNodeFlags in doPeepholeMaskedRVV and doPeepholeMergeVVMFold Especially the NoFPExcept flag for FP. Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D132173	2022-08-18 20:42:46 -07:00
Craig Topper	550fab53e1	[RISCV] Fold (sub C, (xor (setcc), 1)) -> (add (setcc), C-1). Extracted from D131729 where we handled C==0. It's now generalized to more constants. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D132000	2022-08-17 09:50:08 -07:00
Alex Bradbury	ce38128194	[RISCV] Avoid redundant branch-to-branch when expanding cmpxchg If the success value of a cmpxchg is used in a branch, the expanded cmpxchg sequence ends up with a redundant branch-to-branch (as the backend atomics expansion happens as late as possible, passes to optimise such cases have already run). This patch identifies this case and avoid it when expanding the cmpxchg. Note that a similar optimisation is possible for a BEQ on the cmpxchg success value. As it's hard to imagine a case where real-world code may do that, this patch doens't handle that case. Differential Revision: https://reviews.llvm.org/D130192	2022-08-17 13:49:15 +01:00
Craig Topper	39707c1a9a	[RISCV] Add test coverage for (select (icmp X, Y), float, float). NFC We fold integer setcc into SELECT_CC during DAG combine even if the SELECT_CC has FP result type, but we had no test coverage.	2022-08-16 21:28:26 -07:00
Craig Topper	d8cdd78b6c	[RISCV] Add test cases to show missed opportunity to fold (sub C, (xor (setcc), 1)). NFC (sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1. This would avoid the xori and in some cases remove an instruction neede to materialize the constant.	2022-08-16 16:40:53 -07:00
Craig Topper	53ce22e429	Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This time using N1 instead of N0 since N1 points to the original setcc. This now affects scheduling as I expected. Original commit message: We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:51:07 -07:00
Craig Topper	b5a18de651	[RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is still preferrable. (addi X, -1) can be compressed, sub with x0 on the LHS is never compressible.	2022-08-16 14:49:52 -07:00
Craig Topper	de6fd16971	[RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12. We still need to materialize the constant in a register and we may not be removing all uses of the original constant so it may increase code size.	2022-08-16 14:11:31 -07:00
Craig Topper	1180ed41ee	[RISCV] Add more test cases for (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). NFC In these test cases we do the transform, but the immediate is too large to form an ADDI so it didn't save any instructions. If the constant is opaque or has additional users we shouldn't do the transform if it doesn't form an ADDI.	2022-08-16 14:08:42 -07:00
Craig Topper	4854fa217f	[RISCV] Move test from setcc-logic.ll to select-const.ll. NFC Also add setne version of the test. Add some common prefixes to reduce number of identical CHECK lines.	2022-08-16 14:08:42 -07:00
Craig Topper	4184edc691	[RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc. This introduce an xori in some cases. I don't believe it was the intention of the original patch. This was an accident because nonan FP equality compares also use SETEQ/SETNE. Also pass the correct type to getSetCCInverse.	2022-08-16 13:00:36 -07:00
Craig Topper	87e7837293	[RISCV] Add test cases to show where we inverted a fp setcc and introduced an extra xori. In these tests we had (sub C, (seteq X, Y)) which we converted to the (add (setne X, Y), C-1). We don't have a FNE compare instruction so this created an XORI to invert an FEQ instruction. This might be a good idea since it can save a constant materialization, but does not appear to be the intention of the original patch.	2022-08-16 12:59:16 -07:00
Craig Topper	7a73ab5818	[RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64. We have a good selection of W instructions, so promoting a truncated value back to i64 is often free. This appears to be a net code size reduction on SPECINT2006. This has been split from D130397 as one of the patches needed to complete that. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131819	2022-08-15 08:32:51 -07:00
Simon Pilgrim	3a73133217	[DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support Guaranteed not to create undef/poison	2022-08-15 12:18:59 +01:00
Simon Pilgrim	7e294e676e	[DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison	2022-08-15 11:13:43 +01:00
Craig Topper	b8c5420d74	[X86][RISCV] Pre-commit tests for D130862. NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131442	2022-08-14 16:31:15 -07:00
LiaoChunyu	99ef0ddea3	[RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq)) (setcc x, y, eq/neq) are seqz, snez that set rd = 0/1. addi is used to process immediate, which can save instructions for load immediate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131471	2022-08-13 20:37:57 +08:00
Craig Topper	e493944f5f	[RISCV] Use SLTIU X, -1 for (setne X, -1). Since -1 is the maximum unsigned value, all values less than it are not equal to it.	2022-08-11 15:36:04 -07:00
Craig Topper	2c79801a0e	[RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u). Including patterns to select addiw if only the lower 32 bits are used. I'm not excited about adding this many patterns. I'm looking at whether we can create the xori during lowering and move the ineg patterns to DAGCombiner.	2022-08-11 14:24:09 -07:00
Yeting Kuo	875694089d	[RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics. The patch uses peephole method to fold merge.vvm and unmasked intrinsics to masked intrinsics. Using peephole intead of tablegen patterns is to avoid large auto gnerated code. Note: The patch ignores segment loads since I don't know how to test them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130442	2022-08-11 17:58:11 +08:00
Yeting Kuo	7050f2102e	[RISCV][test] Precommitted test for optimization for vmerge and unmasked intrinsics. Precommitted test cases for D130442. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130753	2022-08-11 13:23:08 +08:00
Alex Bradbury	47b1f8362a	[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls Prior to this patch, libcalls inserted by the SelectionDAG legalizer could never be tailcalled. The eligibility of libcalls for tail calling is is partly determined by checking TargetLowering::isInTailCallPosition and comparing the return type of the libcall and the calleer. isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly (which always returns false if not implemented by the target). This patch provides a minimal implementation of TargetLowering::isUsedByReturnOnly - enough to support tail calling libcalls on hard float ABIs. Soft-float ABIs are left for a follow on patch. libcall-tail-calls.ll also shows missed opportunities to tail call integer libcalls, but this is due to issues outside of the isUsedByReturnOnly hook. Differential Revision: https://reviews.llvm.org/D131087	2022-08-10 10:50:29 +01:00
Philip Reames	46fe24e2fb	[RISCV] Split check lines for fpclamptosat_vec test This is currently exercising scalarization code path; with vectors enabled, we hit a different code path. Explicitly exercise both so that both configurations have testing.	2022-08-09 15:03:29 -07:00
Philip Reames	8800b1103a	[RISCV] Pin a test to scalar lowering to preserve test intent [nfc] In an upcoming change to enable fixed length vector lowering via vector registers, the codepath exercised would change. Pin this to the old lowering.	2022-08-09 10:12:18 -07:00
Nikita Popov	f5ed0cb217	[RISCV] Add target feature to force-enable atomics This adds a +forced-atomics target feature with the same semantics as +atomics-32 on ARM (D130480). For RISCV targets without the +a extension, this forces LLVM to assume that lock-free atomics (up to 32/64 bits for riscv32/64 respectively) are available. This means that atomic load/store are lowered to a simple load/store (and fence as necessary), as these are guaranteed to be atomic (as long as they're aligned). Atomic RMW/CAS are lowered to __sync (rather than __atomic) libcalls. Responsibility for providing the __sync libcalls lies with the user (for privileged single-core code they can be implemented by disabling interrupts). Code using +forced-atomics and -forced-atomics are not ABI compatible if atomic variables cross the ABI boundary. For context, the difference between __sync and __atomic is that the former are required to be lock-free, while the latter requires a shared global lock provided by a shared object library. See https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed discussion on the topic. This target feature will be used by Rust's riscv32i target family to support the use of atomic load/store without atomic RMW/CAS. Differential Revision: https://reviews.llvm.org/D130621	2022-08-09 16:04:46 +02:00
Shubham Narlawar	ab4fc87a9d	[DAG] Emit table lookup from TargetLowering::expandCTTZ() This patch emits table lookup in expandCTTZ. Context - https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ(). Differential Revision: https://reviews.llvm.org/D128911	2022-08-08 12:08:05 +01:00
Simon Pilgrim	e5e93b6130	[DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks). This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero. The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead? Differential Revision: https://reviews.llvm.org/D130839	2022-08-08 11:53:56 +01:00
Craig Topper	75c64c7c4e	[RISCV] Don't use li+sh3add for constants that can use lui+add. If we're adding a constant that can't use addi we try a few tricks, one of which is using li+sh3add. We should not do this if lui+add would work. For example adding 8192. Using sh3add prevents folding a sext.w to form addw, thus increasing instruction count.	2022-08-05 12:47:03 -07:00
Philip Reames	9a9848f4b9	[RISCVInsertVSETVLI] Remove an unsound optimization This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem: vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo vsetivli a2, a1, e32, m4, ta, mu With the unsound result being: vsetivli a1, a0, e32, m1, ta, mu vsetivli a2, a0, e32, m4, ta, mu Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively. Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16 After the unsound optimization: a1 = 16 and a2 = 33 This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound. This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own. Differential Revision: https://reviews.llvm.org/D131264	2022-08-05 12:13:08 -07:00
Craig Topper	12a1ca9c42	[RISCV] Relax another one use restriction in performSRACombine. When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C) it's possible that the add is used by multiple sras. We should allow the combine if all the SRAs will eventually be updated. After transforming all of the sras, the shls will share a single (sext_inreg (add X, C1), i32). This pattern occurs if an sra with 32 is used as index in multiple GEPs with different scales. The shl from the GEPs will be combined with the sra before we get a chance to match the sra pattern.	2022-08-04 14:32:31 -07:00
Craig Topper	a2de12c987	[RISCV] Relax a one use restriction performSRACombine When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) ignore the use count on the (shl X, 32). The sext_inreg after the transform is free. So we're only making 2 new instructions, the add and the shl. So we only need to be concerned with replacing the original sra+add. The original shl can have other uses. This helps if there are multiple different constants being added to the same shl.	2022-08-04 11:25:08 -07:00
Lorenzo Albano	74940d2668	[VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D121114	2022-08-04 16:12:01 +02:00
wanglian	b6b0690355	[LegalizeTypes][VP] Add split operand support for VP float and integer casting Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D130685	2022-08-04 15:41:50 +08:00
Craig Topper	53d560b22f	[RISCV] Prevent infinite loop after D129980. D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop. To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation. Fixes PR56905. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131113	2022-08-03 15:19:07 -07:00
Alex Bradbury	9e966dd298	[RISCV][test] Add test for ability to tailcall libcalls Although there's good coverage of the libcalls within llvm/test/CodeGen, it's useful to have tests for all ABI and hard/soft-float combinations in order to properly test the logic that enables libcall tail calls (which will be implemented in a follow-up patch).	2022-08-03 19:27:59 +01:00
Alex Bradbury	28f12a09ae	[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics An unnecessary sext.w is generated when masking the result of the riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed. Although this isn't a particularly important optimisation, removing the sext.w simplifies implementation of an additional cmpxchg-related optimisation in D130192. Although I can't produce a test with different codegen for the other atomics intrinsics, these are added as well for completeness. Differential Revision: https://reviews.llvm.org/D130191	2022-08-03 13:41:58 +01:00
Craig Topper	c2d0685286	[RISCV] Simplify test case from D130931. NFC	2022-08-01 16:50:56 -07:00
Craig Topper	da5b1bf5bb	[RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic. It's possible we have: lui a0, %hi(sym) addi a0, %lo(sym) addi a0, <offset1> lw a0, <offset2>(a0) We want to arrive at lui a0, %hi(sym+offset1+offset2) lw a0, %lo(sym+offset1+offset2) We currently fail to do this because we only consider loads/stores if we didn't find any arithmetic. This patch splits arithmetic folding and load/store folding into two separate phases. The load/store folding can no longer assume the offset in hi/lo is 0 so we must combine the offsets. I've applied the same simm32 limit that we applied in the arithmetic folding. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D130931	2022-08-01 15:33:21 -07:00
Craig Topper	e07a8155f5	[RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc. addMachineSSAOptimization is skipped for -O0, but this pass is required for -O0.	2022-08-01 13:44:43 -07:00
Craig Topper	f9b05e6dad	[RISCV] Pre-commit tests for D130931. NFC	2022-08-01 12:57:09 -07:00
Craig Topper	450edb0b37	[RISCV] Explicitly select second operand of branch condition to X0. At least based on the lit tests, the coalescer sometimes fails to propagate the copy from X0 into the branch instruction. This patch does it manually during isel. The majority of the changes are from the select patterns. Some of the changes are just register allocation changes. Only the Select change affects the whether a b*z instruction is generated in the tests. I changed the branch pattern for consistency. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130809	2022-08-01 11:16:48 -07:00
Lorenzo Albano	71b7c03fd6	[RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121113	2022-08-01 09:23:45 -07:00
Luís Marques	0bc177b6f5	[RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI Builds upon D123264, adding support for merging the low part of the LLA address into the load/store instruction offsets. Differential Revision: https://reviews.llvm.org/D123265	2022-08-01 11:30:02 +02:00
Luís Marques	260a641068	[RISCV] Pre-RA expand pseudos pass Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up patches to fold the addi of PseudoLLA instructions into the immediate operand of load/store instructions. Differential Revision: https://reviews.llvm.org/D123264	2022-07-31 23:19:00 +02:00
Craig Topper	d21b315360	[RISCV] Remove vmerges from vector ceil, floor, trunc lowering. Use masked operations to suppress spurious exception bits being set in fflags. Unfortunately, doing this adds extra copies.	2022-07-30 10:58:41 -07:00
Luís Marques	383bc7210e	[RISCV] Precommit test for D123265 Differential Revision: https://reviews.llvm.org/D128562	2022-07-30 00:13:42 +02:00
Craig Topper	e637feee80	[RISCV] Add isel pattern for (setne/eq GPR, -2048) For constants in the range [-2047, 2048] we use addi. If the constant is -2048 we can use xori. If we don't match this explicitly, we'll emit an LI for the -2048 followed by an XOR.	2022-07-29 14:07:38 -07:00
Craig Topper	2750873dfe	[RISCV] Update lowerFROUND to use masked instructions. This avoids a vmerge at the end and avoids spurious fflags updates. This isn't used for constrained intrinsic so we technically don't have to worry about fflags, but it doesn't cost much to support it. To support I've extend our FCOPYSIGN_VL node to support a passthru operand. Similar to what was done for VRGATHER*_VL nodes. I plan to do a similar update for trunc, floor, and ceil. Reviewed By: reames, frasercrmck Differential Revision: https://reviews.llvm.org/D130659	2022-07-28 10:05:19 -07:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Craig Topper	a304d70ee9	[RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI. InstCombine and DAGCombine prefer to keep shl before binops. This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The idea was taken from X86's isel code. There's a special case implemented for a sext_inreg between the shift and the binop. Differential Revision: https://reviews.llvm.org/D130610	2022-07-27 17:35:26 -07:00
Craig Topper	8d87f71e54	[RISCV] Pre-commit tests for D130610. NFC	2022-07-27 17:35:17 -07:00
Craig Topper	32622d6de4	[RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba. We can use slli.uw by C followed by sh1add. Similar can be done for multiples of 5 and 9. We need to make sure that C is less than 32 to stay in bounds of the 5-bit immediate for slli.uw. We have existing patterns for (mul X, 3<<C) that use sh1add followed by slli. That order doesn't allow the and to be folded. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130146	2022-07-27 09:41:59 -07:00
LiaoChunyu	bf4f9a468a	[RISCV]Enable isIntDivCheap when attribute is minsize Don't expand divisions by constants when attribute is minsize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130543	2022-07-27 18:22:51 +08:00
Simon Pilgrim	529bd4f352	[DAG] SimplifyDemandedBits - don't early-out for multiple use values SimplifyDemandedBits currently early-outs for multi-use values beyond the root node (just returning the knownbits), which is missing a number of optimizations as there are plenty of cases where we can still simplify when initially demanding all elements/bits. @lenary has confirmed that the test cases in aea-erratum-fix.ll need refactoring and the current increase codegen is not a major concern. Differential Revision: https://reviews.llvm.org/D129765	2022-07-27 10:54:06 +01:00
Craig Topper	3928e89c31	[RISCV] Pre-commit tests for D130146. NFC	2022-07-26 14:22:08 -07:00
Philip Reames	75b15a7e63	[RISCV] Add codegen coverage for ceil/floor/trunc/round/roundeven within FPR Currently, all of these go to libcalls. A change to improve lowering is upcoming.	2022-07-26 08:48:46 -07:00
wangpc	1a7078d106	[DAGCombine] Mask doesn't have to be (EltSize - 1) exactly when combining rotation I think what we need is the least Log2(EltSize) significant bits are known to be ones. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130251	2022-07-26 21:14:45 +08:00
wangpc	10a7c5b798	[RISCV] Precommit test for D130251 Added tests won't modify the least Log2(EltSize) significant bits. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D130252	2022-07-26 21:09:11 +08:00
jacquesguan	cb370cf413	[DAGCombiner] Teach scalarizeExtractedBinop to support scalable splat. This patch supports the scalable splat part for scalarizeExtractedBinop. Differential Revision: https://reviews.llvm.org/D129725	2022-07-26 09:31:45 +08:00
Craig Topper	f04ae43752	[RISCV] Add more test cases for select with (setge X, C) condition. InstCombine and SelectionDAG will tend to canonicalize these conditions to (setgt X, C-1). C-1 might be more costly to materialize than C would have been.	2022-07-25 11:55:21 -07:00
Craig Topper	1db6d6dcd8	[RISCV] Teach RISCVCodeGenPrepare to optimize (zext (abs(i32 X, i1 1))). (abs(i32 X, i1 1) always produces a positive result. The 'i1 1' means INT_MIN input produces poison. If the result is sign extended, InstCombine will convert it to zext. This does not produce ideal code for RISCV. This patch reverses the zext back to sext which can be folded into a subw or negw. Ideally we'd do this in SelectionDAG, but we lose the INT_MIN poison flag when llvm.abs becomes ISD::ABS. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130412	2022-07-25 09:36:41 -07:00
jacquesguan	d8800ead62	[RISCV] Scalarize binop followed by extractelement. This patch adds shouldScalarizeBinop to RISCV target in order to convert an extract element of a vector binary operation into an extract element followed by a scalar binary operation. Differential Revision: https://reviews.llvm.org/D129545	2022-07-25 17:23:31 +08:00
Craig Topper	ab2348a6fa	[RISCV] Add sext.b/h and zext.b/h/w to RISCVInstrInfo::foldMemoryOperandImpl. We can always fold zext.b since it is just andi. The others require Zba/Zbb. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130302	2022-07-21 14:54:58 -07:00
jacquesguan	e60eb7053d	recommit "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." With fix for AArch64 and Hexgon test cases.	2022-07-21 17:34:34 +08:00
David Green	23d6186be0	[SelectionDAG] Fix fptoi.sat scalable vector lowering Vector fptosi_sat and fptoui_sat were being expanded by unrolling the vector operation. This doesn't work for scalable vector, so this patch adds a call to TLI.expandFP_TO_INT_SAT if the vector is scalable. Scalable tests are added for AArch64 and RISCV. Some of the AArch64 fptoi_sat operations should be legal, but that will be handled in another patch. Differential Revision: https://reviews.llvm.org/D130028	2022-07-21 08:00:22 +01:00
Craig Topper	add17fc8e4	[RISCV] Combine (select_cc (srl (and X, 1<<C), C), 0, eq/ne, true, fale) (srl (and X, 1<<C), C) is the form we receive for testing bit C. An earlier combine removed the setcc so it wasn't there to match when we created the SELECT_CC. This doesn't happen for BR_CC because generic DAG combine rebuilds the setcc if it is used by BRCOND. We can shift X left by XLen-1-C to put the bit to be tested in the MSB, and use a signed compare with 0 to test the MSB.	2022-07-20 22:32:11 -07:00
Craig Topper	8983db15a3	[RISCV] Optimize (brcond (seteq (and X, 1 << C), 0)) If C > 10, this will require a constant to be materialized for the And. To avoid this, we can shift X left by XLen-1-C bits to put the tested bit in the MSB, then we can do a signed compare with 0 to determine if the MSB is 0 or 1. Thanks to @reames for the suggestion. I've implemented this inside of translateSetCCForBranch which is called when setcc+brcond or setcc+select is converted to br_cc or select_cc during lowering. It doesn't make sense to do this for general setcc since we lack a sgez instruction. I've tested bit 10, 11, 31, 32, 63 and a couple bits betwen 11 and 31 and between 32 and 63 for both i32 and i64 where applicable. Select has some deficiencies where we receive (and (srl X, C), 1) instead. This doesn't happen for br_cc due to the call to rebuildSetCC in the generic DAGCombiner for brcond. I'll explore improving select in a future patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130203	2022-07-20 18:40:49 -07:00
Craig Topper	31b8939ded	[RISCV] Recognize bexti from (srl (and X, 1<<C), C). This is the form we get for (zext (setne (and X 1<<C))). We only had bexti patterns for the alternative form (and (srl X, C), 1).	2022-07-20 15:03:52 -07:00
Craig Topper	6746b2349c	[RISCV] Add test cases for failure to use bexti for (setne (and X, 1<<C)) This will get converted to (srl (and X, 1<<C), C) which we need to isel to bexti.	2022-07-20 15:03:52 -07:00
Alex Bradbury	c30c461dde	[RISCV][test] Add tests for atomic compare exchange + branch on result Due to the late expansion of the compare exchange sequences, there's scope for improving codegen by folding the branches into the cmpxchg loop (avoiding a branch-to-branch).	2022-07-20 17:49:33 +01:00
Alex Bradbury	b1578bf377	[RISCV][test] Add tests showing signext behaviour of cmpxchg	2022-07-20 17:10:16 +01:00
Simon Pilgrim	0f6b0461b0	[DAG] SimplifyDemandedBits - relax "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" to match only demanded bits The "xor (X >> ShiftC), XorC --> (not X) >> ShiftC" fold is currently limited to the XOR mask being a shifted all-bits mask, but we can relax this to only need to match under the demanded bits. This helps expose more bit extraction/clearing patterns and fixes the PowerPC testCompares*.ll regressions from D127115 Alive2: https://alive2.llvm.org/ce/z/fl7T7K Differential Revision: https://reviews.llvm.org/D129933	2022-07-19 10:59:07 +01:00
Max Kazantsev	69b284aaf6	Revert "[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat." This reverts commit `58dfaaaace`. Massive AARCH test failures in buildbot.	2022-07-19 13:41:52 +07:00
jacquesguan	58dfaaaace	[DAGCombiner] Teach scalarizeBinOpOfSplats handle scalable splat. This revision supports to scalarize a binary operation of two scalable splat vectors. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D122791	2022-07-19 11:20:51 +08:00
jacquesguan	3fcaea176c	[RISCV][test] Precommit test for D122791. Differential Revision: https://reviews.llvm.org/D123362	2022-07-19 10:56:02 +08:00
ksyx	3198364e6e	[RISCV][Clang] Add support for Zmmul extension This patch implements recently ratified extension Zmmul, a subextension of M (Integer Multiplication and Division) consisting only multiplication part of it. Differential Revision: https://reviews.llvm.org/D103313 Reviewed By: craig.topper, jrtc27, asb	2022-07-18 20:26:08 -04:00
Craig Topper	0b02752899	[RISCV] Optimize (seteq (i64 (and X, 0xffffffff)), C1) (and X, 0xffffffff) requires 2 shifts in the base ISA. Since we know the result is being used by a compare, we can use a sext_inreg instead of an AND if we also modify C1 to have 33 sign bits instead of 32 leading zeros. This can also improve the generated code for materializing C1. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D129980	2022-07-18 10:54:45 -07:00
Craig Topper	464b3a9d8a	[RISCV] Pre-commit tests for D129980. NFC Differential Revision: https://reviews.llvm.org/D129981	2022-07-18 10:54:45 -07:00
Craig Topper	7c0b9b379b	[RISCV] Add isel patterns for ineg+setge/le/uge/ule. setge/le/uge/ule selected by themselves require an xori with 1. If we're negating the setcc, we can fold the xori with the neg to create an addi with -1. This works because xori X, 1 is equivalent to 1 - X if X is either 0 or 1. So we're doing -(1 - X) which is X-1 or X+-1. This improves the code for selecting between 0 and -1 based on a condition for some conditions. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129957	2022-07-18 09:55:01 -07:00
Craig Topper	d7f2a63371	[RISCV] Fold stack reload into sext.w by using lw instead of ld. We can use lw to load 4 bytes from the stack and sign extend them instead of loading all 8 bytes. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129948	2022-07-18 09:09:17 -07:00
jacquesguan	bd228a1772	[RISCV] Extend use of SHXADD instructions in RVV spill/reload code. This patch extends D124824. It uses SHXADD+SLLI to emit 3, 5, or 9 multiplied by a power 2. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D129179	2022-07-18 10:53:19 +08:00
jacquesguan	557a471ab3	[RISCV][test] Precommit test for D129179. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D129463	2022-07-18 10:42:08 +08:00
Craig Topper	decf385c27	[RISCV] Teach targetShrinkDemandedConstant to handle OR and XOR. We were only handling AND before, but SimplifyDemandedBits can also call it for OR and XOR.	2022-07-17 12:36:33 -07:00
Craig Topper	8cc483099a	[RISCV] Teach RISCVCodeGenPrepare to optimize (i64 (and (zext/sext (i32 X), C1))) If X is known positive by a dominating condition, we can fill in ones into the upper bits of C1 if that would allow it to become an simm12 allowing the use of ANDI. This pattern often occurs in unrolled loops where the induction variable has been widened. To get the best benefit from this, I had to move the pass above ConstantHoisting which is in addIRPasses. Otherwise the AND constant is often hoisted away from the AND. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129888	2022-07-17 11:00:56 -07:00
Craig Topper	ee6267c443	[RISCV] Remove Gather/Scatter Opt from the O0 pipeline.	2022-07-17 10:58:33 -07:00
Philip Reames	6ab686eb86	[LSR] Allow already invariant operand for ICmpZero matching [try 2] Changes since initial commit: * Wrapping a pointer in an SCEV unknown hides the base, and SCEV is only able to compute a subtraction when the bases are known to be equal. This results in a SCEVCouldNotCompute flowing forward and triggering asserts. Test case added in `d767b392`. * isLoopInvariant returns true for instructions outside the loop, but not necessarily above the loop. Since this code is allowed to visit uses of an IV outside of a loop, we have to make sure the operands of the compare are both invariant and dominating the header. Test case added in `2aed3cdb`. Original commit message follows... The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 13:29:43 -07:00
Philip Reames	6fe766beba	Revert "[LSR] Allow already invariant operand for ICmpZero matching" This reverts commit `9153515a7b`. Builtbot crash was reported in the commit thread, reverting while investigating.	2022-07-15 10:47:57 -07:00
Philip Reames	9153515a7b	[LSR] Allow already invariant operand for ICmpZero matching The ICmpZero matching is checking to see if the expression is loop invariant per SCEV and expandable. This allows expressions inside the loop which can be made loop invariant to be seamlessly expanded, but is overly conservative for expressions which already are loop invariant. As a simple justification for why this is correct, consider a loop invariant urem as RHS vs an alternate function with that same urem wrapped inside a helper call. Why would it be legal to match the later, but not the former? Differential Revision: https://reviews.llvm.org/D129793	2022-07-15 09:51:00 -07:00
Craig Topper	79016f6eef	[RISCV] Refine the heuristics for our custom (mul (and X, C2), C1) isel. Prefer to use SLLI instead of zext.w/zext.h in more cases. SLLI might be better for compression.	2022-07-14 18:24:10 -07:00
Craig Topper	dcfc1fd26f	[SelectionDAG][RISCV][AMDGPU][ARM] Improve SimplifyDemandedBits for SHL with variable shift amount. If we have a variable shift amount and the demanded mask has leading zeros, we can propagate those leading zeros to not demand those bits from operand 0. This can allow zero_extend/sign_extend to become any_extend. This pattern can occur due to C integer promotion rules. This transform is already done by InstCombineSimplifyDemanded.cpp where sign_extend can be turned into zero_extend for example. Reviewed By: spatel, foad Differential Revision: https://reviews.llvm.org/D121833	2022-07-14 16:10:14 -07:00
Craig Topper	450f0bd17b	[RISCV] Add additional tests for D121833. NFC	2022-07-14 16:10:14 -07:00
Craig Topper	1a8468ba61	[RISCV] Add a RISCV specific CodeGenPrepare pass. Initial optimization is to convert (i64 (zext (i32 X))) to (i64 (sext (i32 X))) if the dominating condition for the basic block guaranteed the sign bit of X is zero. This frequently occurs in loop preheaders where a signed induction variable that can never be negative has been widened. There will be a dominating check that the 32-bit trip count isn't negative or zero. The check here is not restricted to that specific case though. A i32->i64 sext is cheaper than zext on RV64 without the Zba extension. Later optimizations can often remove the sext from the preheader basic block because the dominating block also needs a sext to evaluate the greater than 0 check. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D129732	2022-07-14 10:20:59 -07:00
Fraser Cormack	d1a5669f5e	[RISCV] Disable subregister liveness by default We previously enabled subregister liveness by default when compiling with RVV. This has been shown to cause miscompilations where RVV register operand constraints are not met. A test was added for this in D129639 which explains the issue in more detail. Until this issue is fixed in some way, we should not be enabling subregister liveness unless the user asks for it. Reviewed By: craig.topper, rogfer01, kito-cheng Differential Revision: https://reviews.llvm.org/D129646	2022-07-14 17:04:10 +01:00
Fraser Cormack	3b334978d5	[RISCV] Add a test showing a miscompilation with subreg liveness This patch adds a test which shows that we may incorrectly register allocate for RVV instructions which have no-overlap constraints on source/dest registers of different LMUL groups. The particular case shows that a vrgatherei16 instruction writes to a LMUL=1 register group v11 and reads from an EMUL=2 register group v10/v11. This breaks the overlap constraints of the vrgatherei16 instruction. The test also shows that disabling subregister liveness fixes the test. We use `early-clobber` on the `VR` dest and the `VRM2` source to enforce the constraint but with subregister liveness this constraint is not met. It's unclear to me at this point whether this is per-design of early-clobber in conjunction with subregisters (meaning we should find another way of expressing this constraint) or whether it's a bug in the register allocator somewhere. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D129639	2022-07-14 10:34:38 +01:00
Craig Topper	257755530a	[RISCV] Fold (sra (sext_inreg (shl X, C1), i32), C2) -> (sra (shl X, C1+32), C2+32). The former pattern will select as slliw+sraiw while the latter will select as slli+srai. This can enable the slli+srai to be compressed. Differential Revision: https://reviews.llvm.org/D129688	2022-07-13 14:34:17 -07:00
Craig Topper	e32864b605	[RISCV] Add test case show missed opportunity to turn slliw+sraiw into slli+srai. slliw and sraiw have no compressed encodings. slli and srai do have compressed encodings. Pre-commit for D129688	2022-07-13 12:55:12 -07:00
Alex Bradbury	2ce0a5c8c3	[RISCV][test][NFC] Regenerate RISC-V tests with update_llc_test_checks.py -u If a change alters more than a couple of tests it's really handy to be able to regenerate any that were created by update_llc_test_checks.py with something like `update_llc_test_checks.py -u llvm/test/CodeGen/RISCV`. I noticed this causes some extraneous changes (perhaps due to hand editing). This commit addresses that by updating any fails that are modified by update_llc_test_checks.py -u.	2022-07-13 19:37:34 +01:00
Philip Reames	dde2a7fb6d	[RISCV] Exploit fact that vscale is always power of two to replace urem sequence When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale. vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.) We know from the RISCV V extension specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, the must be a power of two numbers of blocks. (For everything other than VLEN<=32, but that's already broken.) It is worth noting that AArch64 SVE specification explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic. Differential Revision: https://reviews.llvm.org/D129609	2022-07-13 10:54:47 -07:00
jacquesguan	9049c46b9d	[RISCV][test] Add test of binop followed by extractelement. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D129544	2022-07-13 14:33:56 +08:00
Philip Reames	1ce3f94570	{RISCV] Test coverage for improved lowering assuming vscale is pow-of-two	2022-07-12 15:40:30 -07:00
Craig Topper	8eaf00e04d	[TargetLowering][RISCV] Make expandCTLZ work for non-power of 2 types. To convert CTLZ to popcount we do x = x \| (x >> 1); x = x \| (x >> 2); ... x = x \| (x >>16); x = x \| (x >>32); // for 64-bit input return popcount(~x); This smears the most significant set bit across all of the bits below it then inverts the remaining 0s and does a population count. To support non-power of 2 types, the last shift amount must be more than half of the size of the type. For i15, the last shift was previously a shift by 4, with this patch we add another shift of 8. Fixes PR56457. Differential Revision: https://reviews.llvm.org/D129431	2022-07-12 11:36:37 -07:00
Craig Topper	866be0aa8a	[RISCV] Pre-commit test for PR56457. NFC	2022-07-12 11:36:37 -07:00
Craig Topper	ca13555e0c	[RISCV] Pre-commit tests for D121833. NFC	2022-07-11 12:14:29 -07:00
LiaoChunyu	3f68f0f816	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-07-11 14:10:27 +08:00
Craig Topper	35ec8a423d	[RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools. I think it only makes sense to return true here if we aren't going to turn around and create a constant pool for the immmediate. I left out the check for useConstantPoolForLargeInts() thinking that even if you don't want the commpiler to create a constant pool you might still want to avoid materializing an integer that is already available in a global variable. Test file was copied from AArch64/ARM and has not been commited yet. Will post separate review for that. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D129402	2022-07-10 14:10:17 -07:00
Craig Topper	60450f91c8	[RISCV] Add test cases for inline memcpy expansion Test file was taken directly from AArch64/ARM. I've added RUN lines for aligned and unaligned since many of the test cases are strings that aren't aligned and have an odd size. Some of these test cases are modified by D129402. Differential Revision: https://reviews.llvm.org/D129403	2022-07-10 14:09:02 -07:00
Craig Topper	5f7641a3be	[RISCV] Modify the custom isel for (add X, imm) used by load/stores. We have custom isel that tries to select the Lo12 bits using a separate ADDI that can later folded into the load/store address by the post-isel peephole. This patch disables this if the load/store already had a non-zero offset. A non-zero offset implies that CodeGenPrepare split several large offsets used by different loads and stores into a common large offset and multiple small offsets that could be folded. Folding more of the lo12 bits changes this common offset by increasing the small offsets. While this can save an instruction to materialize the common offset, it can also prevent the small offsets from fitting in a compressed load/store instruction. Removing this also simplifies the last piece needed to fold the custom isel for add into SelectAddrRegImm and remove the post-isel peephole.	2022-07-09 22:47:27 -07:00
Craig Topper	92f1794d41	[RISCV] Mark fminnum_vl and fmaxnum_vl as commutable.	2022-07-08 10:19:09 -07:00
Craig Topper	069ba96660	[RISCV] Add commuted fixed vector vfmax.vf and vfmin.vf tests. NFC The ISD opcodes aren't marked commutable so we don't match these properly.	2022-07-08 10:19:09 -07:00
Philip Reames	264018d764	[RISCV] Mark vsadd(u)_vl as commutable This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector). Differential Revision: https://reviews.llvm.org/D129302	2022-07-08 10:18:21 -07:00
Craig Topper	a246eb6814	[RISCV] Mark (s/u)min_vl and (s/u)max_vl as commutable.	2022-07-08 09:59:42 -07:00
Craig Topper	cd783bf997	[RISCV] Add fixed vector vmin(u).vx and vmax(u).vx tests. NFC	2022-07-08 09:59:41 -07:00
Sanjay Patel	8b75671314	[SDAG] try to replace subtract-from-constant with xor This is almost the same as the abandoned D48529, but it allows splat vector constants too. This replaces the x86-specific code that was added with the alternate patch D48557 with the original generic combine. This transform is a less restricted form of an existing InstCombine and the proposed SDAG equivalent for that in D128080: https://alive2.llvm.org/ce/z/OUm6N_ Differential Revision: https://reviews.llvm.org/D128123	2022-07-08 08:14:24 -04:00
Kito Cheng	5c45ae4108	[RISCV] Fix wrong register rename for store value during make-compressible optimization Current implementation will rename both register in store instructions if we store base address into memory with same base register, it's OK if the offset is 0, however that is wrong transform if offset isn't 0, give a smalle example here: sd a0, 808(a0) We should not transform into: addi a2, a0, 768 sd a2, 40(a2) That should just rename base address like this: addi a2, a0, 768 sd a0, 40(a2) Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128876	2022-07-08 18:07:17 +08:00
Kito Cheng	7b9a3b9d6d	[RISCV] Precommit testcase to show wrong result of make-compressible optimization Use following example to demo what happened now: li a1, 1 sd a1, 800(a0) sd a0, 808(a0) # Store base address into base + offset li a1, 2 sd a1, 816(a0) Current will optimizate into: li a1, 1 addi a2, a0, 768 sd a1, 32(a2) sd a2, 40(a2) # Wrong replacement for the source register. li a1, 2 sd a1, 48(a2) Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128875	2022-07-08 17:01:22 +08:00
Lian Wang	9cfb28d672	[RISCV] Change VECTOR_SPLICE mask operation from expand to promote Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128717	2022-07-08 06:20:22 +00:00
Lian Wang	99da3115d1	[RISCV] Recommit test for D128717 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128778	2022-07-08 02:21:33 +00:00
Lian Wang	ab9e8a3a6f	Revert "[RISCV] Precommit test for D128717" This reverts commit `b3b37f3ecf`.	2022-07-08 01:56:29 +00:00
Lian Wang	b3b37f3ecf	[RISCV] Precommit test for D128717 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128778	2022-07-08 01:47:01 +00:00
Diego Caballero	bf1758c3dc	Revert "[RISCV] Optimize 2x SELECT for floating-point types" This reverts commit `1178992c72`.	2022-07-07 22:54:00 +00:00
Philip Reames	439783da01	[RISCV] Adjust fixed vector coverage for get.active.lane.mask Make sure we include at least one case where the vsadd/vmsltu lowering requires only LMUL1. We should be able to generate all of the fixed vector variants from scalar to vector idioms, but this is probably not very important right now given the fixed length variants we'd actually use when vectorizing with LMUL=1 are reasonable.	2022-07-07 13:28:29 -07:00
Philip Reames	fa3783c907	[RISCV] Test coverage for missing commute of vsadd(u) For some reason, this appears to only happen with fixed length vectors. Scalable ones commute just fine in all the cases I've seen.	2022-07-07 09:11:44 -07:00
Philip Reames	6f4773f064	[RISCV] Add codegen coverage for get.active.lane.mask	2022-07-06 14:50:41 -07:00
Craig Topper	088bb8a328	[RISCV] Add more SHXADD patterns where the input is (and (shl/shr X, C2), C1) It might be possible to rewrite (and (shl/shr X, C2), C1) in a way that gives us a left shift that can be folded into a SHXADD.	2022-07-05 16:21:47 -07:00
Craig Topper	ac3e26bcff	[RISCV] Add more SHXADD tests. NFC	2022-07-05 13:41:58 -07:00
luxufan	c06d0b4d02	[RISCV] Add ADDI instr for computing FrameIndex address RVV doesn't have immediate field for memory addressing. Currently we build MachineInstructions in PEI to computing stack offset for RVV load store instructions. These instructions were added too late to can be optimized by CSE, LICM... passes. This patch makes FrameIndex SDNodes can't be matched in RVV Load Store instruction selection patterns. So that the FrameIndex SDNodes would be selected as `ADDI GPR, targetframeindex`. There are 2 advantages for such change: 1. Stack objects address computing can be optimized by machine function passes. 2. Since the ADDI instruction's destination register can be used as a temp register, we can save an emergency spill slot. Differential Revision: https://reviews.llvm.org/D128187	2022-07-04 22:13:35 +08:00
Craig Topper	d36e09cfe5	[RISCV] Add more SHXADD patterns. This handles the code we get for this. int foo(unsigned x, int *y) { return y[x >> 3]; } The srl and shl implied by the array index will be combined to form (srl (and X, C2), C1). We need to reverse this get to back the shl to fold into SHXADD.	2022-07-03 21:57:05 -07:00
luxufan	0f45eaf0da	[RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset Computing scalable offset needs up to two scrach registers. We add scavenge spill slots according to the result of `RISCV::isRVVSpill` and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`, PEI doesn't add scavenge spill slots for scrach registers when using ADDI to get scalable stack offsets. The ADDI instruction has a destination register which can be used as a scrach register. So one scavenge spil slot is sufficient for computing scalable stack offsets. Differential Revision: https://reviews.llvm.org/D128188	2022-07-03 20:18:13 +08:00
Craig Topper	7e4ab9d5b8	[RISCV] Add more SHXADD isel patterns. This handles the code we get for int foo(int* x, unsigned y) { return x[y >> 1]; } The shift right and the shl will get DAG combined into (shl (and X, 0xfffffffe), 1). We have custom isel to match the shl+and, but with Zba the (add (shl X, 1), Y) part will get matched and leave the and to be iseled by itself. This commit adds a larger pattern that includes the and.	2022-07-02 23:11:22 -07:00
Craig Topper	b2e9684fe4	[RISCV] isel (shl (and X, C2), C) -> (slli (srliw X, C3), C3+C). where C2 has 32 leading zeros and C3 trailing zeros. When the shl is used by an add C is 1,2 or 3, we end up matching (add (shl X, C), Y) first. This leaves an and with a constant that is harder to materialize.	2022-07-02 01:04:44 -07:00
Craig Topper	9ac548e118	[RISCV] isel (add (and X, 0xFFFFFFFE), Y) as (SH1ADD (SRLIW X, 1), Y). Similar for SH2ADD and SH3ADD. This is what we get from int foo(int* x, unsigned y) { return x[y >> 1]; } This allows us to avoid materializing 0xFFFFFFFE into a register.	2022-07-01 23:52:29 -07:00
Yeting Kuo	5744b9cb79	[RISCV] Restore "Enable shrink wrap by default" This reverts commit `7af3d4ab3d`. RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed by D123679, the commit re-enable it. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D128965	2022-07-02 11:13:13 +08:00
Craig Topper	354e04554a	[RISCV] Make custom isel for (add X, imm) used by load/stores more selective. Only handle immediates that would produce an ADDI or ADDIW of Lo12 as the final instruction in their materialization. As the test change show this removes immediates that materialize with lui+addiw that is not the same as lui+addi.	2022-06-30 14:20:11 -07:00
Craig Topper	51d672946e	[RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) Similar for a subtract with a constant left hand side. (sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine for (sext (add (trunc X to i32), 32) to i32). For RISCV, we should lower this as addiw which means turning it into (sext_inreg (add X, C1)). There is an existing DAG combine to convert back to (sext (add (trunc X to i32), 32) to i32), but it requires isTruncateFree to return true and for i32 to be a legal type as it used sign_extend and truncate nodes. So that doesn't work for RISCV. If the outer sra happens be used by a shl by constant, it will be folded and the shift amount of the sra will be changed before we can do our own DAG combine. This requires us to match the more general pattern and restore the shl. I had wanted to do this as a separate (add (shl X, 32), C1<<32) -> (shl (add X, C1), 32) combine, but that hit an infinite loop for some values of C1. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128869	2022-06-30 09:01:24 -07:00
Craig Topper	9ace5af049	[RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C). The sext_inreg can often be folded into an earlier instruction by using a W instruction. The sext_inreg also works better with our ABI. This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128843	2022-06-30 09:01:24 -07:00
Craig Topper	781e3d7ad8	[RISCV] Pre-commit tests for D128869. NFC	2022-06-30 09:01:24 -07:00
Fraser Cormack	d5213c83ff	[RISCV] Add a test covering a (reverted) codegen issue This test checks one of problematic cases outlined in D128006, leading to the patch's reversal. I thought it best to add a test just in case this sort of optimization is attempted again in the future in some fashion.	2022-06-30 09:27:52 +01:00
Craig Topper	75095e6281	[RISCV] Pre-commit tests for D128843. NFC	2022-06-29 12:12:42 -07:00
Philip Reames	dd48d3ad0e	Revert "[RISCV] Avoid changing etype for splat of 0 or -1" This reverts commit `755c84c62c`. A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong. It needs to be checking for VLInBytes, not MaxVL. These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.	2022-06-29 10:27:02 -07:00
Craig Topper	7cbfb4eb7a	[RISCV] Select (srl (and X, C2) as (slli (srliw X, C3), C3-C). If C2 has 32 leading zeros and C3 trailing zeros.	2022-06-29 09:15:09 -07:00
Philip Reames	860c62f53c	[RISCV] Refine known bits for READ_VLENB This implements known bits for READ_VALUE using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two. The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too. Differential Revision: https://reviews.llvm.org/D128758	2022-06-28 15:42:14 -07:00
Craig Topper	02c8453e64	[RISCV] Teach RISCVMergeBaseOffset to handle read-modify-write of a global. The pass was previously limited to LUI+ADDI being used by a single instruction. This patch allows the pass to optimize multiple memory operations that use the same offset. Each of them will receive a separate %lo relocation. My main motivation is to handle a read-modify-write where we have a load and store to the same address, but I didn't restrict it to that case. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128599	2022-06-28 11:46:24 -07:00
Philip Reames	c755bf658f	[RISCV] Add test coverage for high known bits for vscale	2022-06-28 10:04:05 -07:00
Alex Bradbury	7bcfcabbd1	[RISCV] Implement support for the Zicbop extension Implements the ratified RISC-V Base Cache Management Operation ISA Extension: Zicbop, as described in https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf. This is implemented in a separate patch to Zicbom and Zicboz due to it requiring a new ASM operand type to be defined. Differential Revision: https://reviews.llvm.org/D117433	2022-06-28 12:43:26 +01:00
Alex Bradbury	4f40ca53ce	[RISCV] Implement support for the Zicbom and Zicboz extensions Implements the ratified RISC-V Base Cache Management Operation ISA Extensions: Zicbom and Zicboz, as described in https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf. Zicbop is implemented in a separate patch due to it requiring a new ASM operand type to be defined. As discussed in the relevant issue in the upstream spec https://github.com/riscv/riscv-CMOs/issues/47, the cbo.* instructions use the format (rs1) or 0(rs1) for their operand, similar to the AMOs. Differential Revision: https://reviews.llvm.org/D117432	2022-06-28 12:43:25 +01:00
Lian Wang	96ab083622	[RISCV] Support VECTOR_REVERSE mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128627	2022-06-28 07:48:51 +00:00
LiaoChunyu	1178992c72	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-06-28 12:02:05 +08:00

1 2 3 4 5 ...

1997 Commits