llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	39707c1a9a	[RISCV] Add test coverage for (select (icmp X, Y), float, float). NFC We fold integer setcc into SELECT_CC during DAG combine even if the SELECT_CC has FP result type, but we had no test coverage.	2022-08-16 21:28:26 -07:00
Craig Topper	d8cdd78b6c	[RISCV] Add test cases to show missed opportunity to fold (sub C, (xor (setcc), 1)). NFC (sub C, (xori X, 1)) can be folded to (add X, C-1) if X is 0 or 1. This would avoid the xori and in some cases remove an instruction neede to materialize the constant.	2022-08-16 16:40:53 -07:00
Craig Topper	53ce22e429	Recommit "[RISCV] Use setcc's original SDLoc when inverting it in performSUBCombine." This time using N1 instead of N0 since N1 points to the original setcc. This now affects scheduling as I expected. Original commit message: We change seteq<->setne but it doesn't change the semantics of the setcc. We should keep original debug location. This is consistent with visitXor in the generic DAGCombiner.	2022-08-16 15:51:07 -07:00
Craig Topper	b5a18de651	[RISCV] Remove C!=0 restriction from (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). While (sub 0, X) can use x0 for the 0, I believe (add X, -1) is still preferrable. (addi X, -1) can be compressed, sub with x0 on the LHS is never compressible.	2022-08-16 14:49:52 -07:00
Craig Topper	de6fd16971	[RISCV] Don't fold (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) if C-1 isn't simm12. We still need to materialize the constant in a register and we may not be removing all uses of the original constant so it may increase code size.	2022-08-16 14:11:31 -07:00
Craig Topper	1180ed41ee	[RISCV] Add more test cases for (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)). NFC In these test cases we do the transform, but the immediate is too large to form an ADDI so it didn't save any instructions. If the constant is opaque or has additional users we shouldn't do the transform if it doesn't form an ADDI.	2022-08-16 14:08:42 -07:00
Craig Topper	4854fa217f	[RISCV] Move test from setcc-logic.ll to select-const.ll. NFC Also add setne version of the test. Add some common prefixes to reduce number of identical CHECK lines.	2022-08-16 14:08:42 -07:00
Craig Topper	4184edc691	[RISCV] (sub C, (setcc x, y, eq/neq)) -> (add C-1, (setcc x, y, neq/eq)) fold for FP setcc. This introduce an xori in some cases. I don't believe it was the intention of the original patch. This was an accident because nonan FP equality compares also use SETEQ/SETNE. Also pass the correct type to getSetCCInverse.	2022-08-16 13:00:36 -07:00
Craig Topper	87e7837293	[RISCV] Add test cases to show where we inverted a fp setcc and introduced an extra xori. In these tests we had (sub C, (seteq X, Y)) which we converted to the (add (setne X, Y), C-1). We don't have a FNE compare instruction so this created an XORI to invert an FEQ instruction. This might be a good idea since it can save a constant materialization, but does not appear to be the intention of the original patch.	2022-08-16 12:59:16 -07:00
Craig Topper	7a73ab5818	[RISCV] Enable isTruncateFree in SDAG for i64->i32 on rv64. We have a good selection of W instructions, so promoting a truncated value back to i64 is often free. This appears to be a net code size reduction on SPECINT2006. This has been split from D130397 as one of the patches needed to complete that. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131819	2022-08-15 08:32:51 -07:00
Simon Pilgrim	3a73133217	[DAG] canCreateUndefOrPoison - add freeze(sign_extend_inreg(x,vt)) -> sign_extend_inreg(freeze(x),vt) support Guaranteed not to create undef/poison	2022-08-15 12:18:59 +01:00
Simon Pilgrim	7e294e676e	[DAG] canCreateUndefOrPoison - add freeze(assertsext/zext(x,bt)) -> assertsext/zext(freeze(x),vt) support These are guaranteed not to create undef/poison (although they may pass through) - the associated ISD::VALUETYPE node is also guaranteed never to generate poison	2022-08-15 11:13:43 +01:00
Craig Topper	b8c5420d74	[X86][RISCV] Pre-commit tests for D130862. NFC Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D131442	2022-08-14 16:31:15 -07:00
LiaoChunyu	99ef0ddea3	[RISCV] Fold (sub constant, (setcc x, y, eq/neq)) -> (add constant - 1, (setcc x, y, neq/eq)) (setcc x, y, eq/neq) are seqz, snez that set rd = 0/1. addi is used to process immediate, which can save instructions for load immediate. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D131471	2022-08-13 20:37:57 +08:00
Craig Topper	e493944f5f	[RISCV] Use SLTIU X, -1 for (setne X, -1). Since -1 is the maximum unsigned value, all values less than it are not equal to it.	2022-08-11 15:36:04 -07:00
Craig Topper	2c79801a0e	[RISCV] Add more ineg+setcc isel patterns to avoid creating neg+xori+slti(u). Including patterns to select addiw if only the lower 32 bits are used. I'm not excited about adding this many patterns. I'm looking at whether we can create the xori during lowering and move the ineg patterns to DAGCombiner.	2022-08-11 14:24:09 -07:00
Yeting Kuo	875694089d	[RISCV] Peephole optimization to fold merge.vvm and unmasked intrinsics. The patch uses peephole method to fold merge.vvm and unmasked intrinsics to masked intrinsics. Using peephole intead of tablegen patterns is to avoid large auto gnerated code. Note: The patch ignores segment loads since I don't know how to test them. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130442	2022-08-11 17:58:11 +08:00
Yeting Kuo	7050f2102e	[RISCV][test] Precommitted test for optimization for vmerge and unmasked intrinsics. Precommitted test cases for D130442. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130753	2022-08-11 13:23:08 +08:00
Alex Bradbury	47b1f8362a	[RISCV] Implement isUsedByReturnOnly TargetLowering hook in order to tailcall more libcalls Prior to this patch, libcalls inserted by the SelectionDAG legalizer could never be tailcalled. The eligibility of libcalls for tail calling is is partly determined by checking TargetLowering::isInTailCallPosition and comparing the return type of the libcall and the calleer. isInTailCallPosition in turn calls TargetLowering::isUsedByReturnOnly (which always returns false if not implemented by the target). This patch provides a minimal implementation of TargetLowering::isUsedByReturnOnly - enough to support tail calling libcalls on hard float ABIs. Soft-float ABIs are left for a follow on patch. libcall-tail-calls.ll also shows missed opportunities to tail call integer libcalls, but this is due to issues outside of the isUsedByReturnOnly hook. Differential Revision: https://reviews.llvm.org/D131087	2022-08-10 10:50:29 +01:00
Philip Reames	46fe24e2fb	[RISCV] Split check lines for fpclamptosat_vec test This is currently exercising scalarization code path; with vectors enabled, we hit a different code path. Explicitly exercise both so that both configurations have testing.	2022-08-09 15:03:29 -07:00
Philip Reames	8800b1103a	[RISCV] Pin a test to scalar lowering to preserve test intent [nfc] In an upcoming change to enable fixed length vector lowering via vector registers, the codepath exercised would change. Pin this to the old lowering.	2022-08-09 10:12:18 -07:00
Nikita Popov	f5ed0cb217	[RISCV] Add target feature to force-enable atomics This adds a +forced-atomics target feature with the same semantics as +atomics-32 on ARM (D130480). For RISCV targets without the +a extension, this forces LLVM to assume that lock-free atomics (up to 32/64 bits for riscv32/64 respectively) are available. This means that atomic load/store are lowered to a simple load/store (and fence as necessary), as these are guaranteed to be atomic (as long as they're aligned). Atomic RMW/CAS are lowered to __sync (rather than __atomic) libcalls. Responsibility for providing the __sync libcalls lies with the user (for privileged single-core code they can be implemented by disabling interrupts). Code using +forced-atomics and -forced-atomics are not ABI compatible if atomic variables cross the ABI boundary. For context, the difference between __sync and __atomic is that the former are required to be lock-free, while the latter requires a shared global lock provided by a shared object library. See https://llvm.org/docs/Atomics.html#libcalls-atomic for a detailed discussion on the topic. This target feature will be used by Rust's riscv32i target family to support the use of atomic load/store without atomic RMW/CAS. Differential Revision: https://reviews.llvm.org/D130621	2022-08-09 16:04:46 +02:00
Shubham Narlawar	ab4fc87a9d	[DAG] Emit table lookup from TargetLowering::expandCTTZ() This patch emits table lookup in expandCTTZ. Context - https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ(). Differential Revision: https://reviews.llvm.org/D128911	2022-08-08 12:08:05 +01:00
Simon Pilgrim	e5e93b6130	[DAG] FoldConstantArithmetic - add initial support for undef elements in bitcasted binop constant folding FoldConstantArithmetic can fold constant vectors hidden behind bitcasts (e.g. vXi64 -> v2Xi32 on 32-bit platforms), but currently bails if either vector contains undef elements. These undefs can often occur due to SimplifyDemandedBits/VectorElts calls recognising that the upper bits are often unnecessary (e.g. funnel-shift/rotate implicit-modulo and AND masks). This patch adds a basic 'FoldValueWithUndef' handler that will attempt to constant fold if one or both of the ops are undef - so far this just handles the AND and MUL cases where we always fold to zero. The RISCV codegen increase is interesting - it looks like the BUILD_VECTOR lowering was loading a constant pool entry but now (with all elements defined constant) it can materialize the constant instead? Differential Revision: https://reviews.llvm.org/D130839	2022-08-08 11:53:56 +01:00
Craig Topper	75c64c7c4e	[RISCV] Don't use li+sh3add for constants that can use lui+add. If we're adding a constant that can't use addi we try a few tricks, one of which is using li+sh3add. We should not do this if lui+add would work. For example adding 8192. Using sh3add prevents folding a sext.w to form addw, thus increasing instruction count.	2022-08-05 12:47:03 -07:00
Philip Reames	9a9848f4b9	[RISCVInsertVSETVLI] Remove an unsound optimization This fixes a bug reported privately by @craig.topper. Here's an example which illustrates the problem: vsetivli a1, a0, e32, m1, ta, mu # both DefInfo and PrevInfo vsetivli a2, a1, e32, m4, ta, mu With the unsound result being: vsetivli a1, a0, e32, m1, ta, mu vsetivli a2, a0, e32, m4, ta, mu Consider the case where this is running on a machine with VLEN=512,. For this case, the VLMAXs are 16 and 64 respectively. Consider for a0 = 33. The correct result is: a1 = 16, and a2 = 16 After the unsound optimization: a1 = 16 and a2 = 33 This particular example used VLMAXs which differed by more than a power of two. With a difference of only one power of two, there's another form of this bug which involves the AVL < 2 x VLMAX special case, but that ones more complicated to construct as many examples turn out accidentally sound. This patch takes the approach of simply removing the unsound optimization, but there are multiple sound sub-cases of it. I plan to return to at least a couple of them, but figured it was cleaner to remove the unsound optimization (for ease of backporting), and then review the new optimizations on their own. Differential Revision: https://reviews.llvm.org/D131264	2022-08-05 12:13:08 -07:00
Craig Topper	12a1ca9c42	[RISCV] Relax another one use restriction in performSRACombine. When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), i32), C) it's possible that the add is used by multiple sras. We should allow the combine if all the SRAs will eventually be updated. After transforming all of the sras, the shls will share a single (sext_inreg (add X, C1), i32). This pattern occurs if an sra with 32 is used as index in multiple GEPs with different scales. The shl from the GEPs will be combined with the sra before we get a chance to match the sra pattern.	2022-08-04 14:32:31 -07:00
Craig Topper	a2de12c987	[RISCV] Relax a one use restriction performSRACombine When folding (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) ignore the use count on the (shl X, 32). The sext_inreg after the transform is free. So we're only making 2 new instructions, the add and the shl. So we only need to be concerned with replacing the original sra+add. The original shl can have other uses. This helps if there are multiple different constants being added to the same shl.	2022-08-04 11:25:08 -07:00
Lorenzo Albano	74940d2668	[VP] Add widening for VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D121114	2022-08-04 16:12:01 +02:00
wanglian	b6b0690355	[LegalizeTypes][VP] Add split operand support for VP float and integer casting Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D130685	2022-08-04 15:41:50 +08:00
Craig Topper	53d560b22f	[RISCV] Prevent infinite loop after D129980. D129980 converts (seteq (i64 (and X, 0xffffffff)), C1) into (seteq (i64 (sext_inreg X, i32)), C1). If bit 31 of X is 0, it will be turned back into an 'and' by SimplifyDemandedBits which can cause an infinite loop. To prevent this, check if bit 31 is 0 with computeKnownBits before doing the transformation. Fixes PR56905. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D131113	2022-08-03 15:19:07 -07:00
Alex Bradbury	9e966dd298	[RISCV][test] Add test for ability to tailcall libcalls Although there's good coverage of the libcalls within llvm/test/CodeGen, it's useful to have tests for all ABI and hard/soft-float combinations in order to properly test the logic that enables libcall tail calls (which will be implemented in a follow-up patch).	2022-08-03 19:27:59 +01:00
Alex Bradbury	28f12a09ae	[RISCV] Teach ComputeNumSignBitsForTargetNode about masked atomic intrinsics An unnecessary sext.w is generated when masking the result of the riscv_masked_cmpxchg_i64 intrinsic. Implementing handling of the intrinsic in ComputeNumSignBitsForTargetNode allows it to be removed. Although this isn't a particularly important optimisation, removing the sext.w simplifies implementation of an additional cmpxchg-related optimisation in D130192. Although I can't produce a test with different codegen for the other atomics intrinsics, these are added as well for completeness. Differential Revision: https://reviews.llvm.org/D130191	2022-08-03 13:41:58 +01:00
Craig Topper	c2d0685286	[RISCV] Simplify test case from D130931. NFC	2022-08-01 16:50:56 -07:00
Craig Topper	da5b1bf5bb	[RISCV] Teach RISCVMergeBaseOffset to merge %lo/%pcrel_lo into load/store after folding arithmetic. It's possible we have: lui a0, %hi(sym) addi a0, %lo(sym) addi a0, <offset1> lw a0, <offset2>(a0) We want to arrive at lui a0, %hi(sym+offset1+offset2) lw a0, %lo(sym+offset1+offset2) We currently fail to do this because we only consider loads/stores if we didn't find any arithmetic. This patch splits arithmetic folding and load/store folding into two separate phases. The load/store folding can no longer assume the offset in hi/lo is 0 so we must combine the offsets. I've applied the same simm32 limit that we applied in the arithmetic folding. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D130931	2022-08-01 15:33:21 -07:00
Craig Topper	e07a8155f5	[RISCV] Move Pre-RA pseudo expansion from addMachineSSAOptimization to addPreRegAlloc. addMachineSSAOptimization is skipped for -O0, but this pass is required for -O0.	2022-08-01 13:44:43 -07:00
Craig Topper	f9b05e6dad	[RISCV] Pre-commit tests for D130931. NFC	2022-08-01 12:57:09 -07:00
Craig Topper	450edb0b37	[RISCV] Explicitly select second operand of branch condition to X0. At least based on the lit tests, the coalescer sometimes fails to propagate the copy from X0 into the branch instruction. This patch does it manually during isel. The majority of the changes are from the select patterns. Some of the changes are just register allocation changes. Only the Select change affects the whether a b*z instruction is generated in the tests. I changed the branch pattern for consistency. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D130809	2022-08-01 11:16:48 -07:00
Lorenzo Albano	71b7c03fd6	[RISCV][VP] Custom lower VP_STRIDED_LOAD and VP_STRIDED_STORE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D121113	2022-08-01 09:23:45 -07:00
Luís Marques	0bc177b6f5	[RISCV] Extend the Merge Base Offset pass to handle AUIPC+ADDI Builds upon D123264, adding support for merging the low part of the LLA address into the load/store instruction offsets. Differential Revision: https://reviews.llvm.org/D123265	2022-08-01 11:30:02 +02:00
Luís Marques	260a641068	[RISCV] Pre-RA expand pseudos pass Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up patches to fold the addi of PseudoLLA instructions into the immediate operand of load/store instructions. Differential Revision: https://reviews.llvm.org/D123264	2022-07-31 23:19:00 +02:00
Craig Topper	d21b315360	[RISCV] Remove vmerges from vector ceil, floor, trunc lowering. Use masked operations to suppress spurious exception bits being set in fflags. Unfortunately, doing this adds extra copies.	2022-07-30 10:58:41 -07:00
Luís Marques	383bc7210e	[RISCV] Precommit test for D123265 Differential Revision: https://reviews.llvm.org/D128562	2022-07-30 00:13:42 +02:00
Craig Topper	e637feee80	[RISCV] Add isel pattern for (setne/eq GPR, -2048) For constants in the range [-2047, 2048] we use addi. If the constant is -2048 we can use xori. If we don't match this explicitly, we'll emit an LI for the -2048 followed by an XOR.	2022-07-29 14:07:38 -07:00
Craig Topper	2750873dfe	[RISCV] Update lowerFROUND to use masked instructions. This avoids a vmerge at the end and avoids spurious fflags updates. This isn't used for constrained intrinsic so we technically don't have to worry about fflags, but it doesn't cost much to support it. To support I've extend our FCOPYSIGN_VL node to support a passthru operand. Similar to what was done for VRGATHER*_VL nodes. I plan to do a similar update for trunc, floor, and ceil. Reviewed By: reames, frasercrmck Differential Revision: https://reviews.llvm.org/D130659	2022-07-28 10:05:19 -07:00
Simon Pilgrim	69d5a038b9	[DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits. There a few cases where we end up with extra register moves which I think we can accept in exchange for the increased ILP. Differential Revision: https://reviews.llvm.org/D77804	2022-07-28 14:10:44 +01:00
Craig Topper	a304d70ee9	[RISCV] Reorder (and/or/xor (shl X, C1), C2) if we can form ANDI/ORI/XORI. InstCombine and DAGCombine prefer to keep shl before binops. This patch teaches isel to convert to (shl (and/or/xor X, C1 >> C2), C2) if (C1 >> C2) is a simm12. The idea was taken from X86's isel code. There's a special case implemented for a sext_inreg between the shift and the binop. Differential Revision: https://reviews.llvm.org/D130610	2022-07-27 17:35:26 -07:00
Craig Topper	8d87f71e54	[RISCV] Pre-commit tests for D130610. NFC	2022-07-27 17:35:17 -07:00
Craig Topper	32622d6de4	[RISCV] Add isel pattern for (mul (and X, 0xffffffff), 3<<C) with Zba. We can use slli.uw by C followed by sh1add. Similar can be done for multiples of 5 and 9. We need to make sure that C is less than 32 to stay in bounds of the 5-bit immediate for slli.uw. We have existing patterns for (mul X, 3<<C) that use sh1add followed by slli. That order doesn't allow the and to be folded. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D130146	2022-07-27 09:41:59 -07:00
LiaoChunyu	bf4f9a468a	[RISCV]Enable isIntDivCheap when attribute is minsize Don't expand divisions by constants when attribute is minsize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D130543	2022-07-27 18:22:51 +08:00

1 2 3 4 5 ...

1891 Commits