llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	a21c557955	[RISCV] Remove Zbproposedc extension This consists of 3 compressed instructions, c.not, c.neg, and c.zext.w. I believe these have been picked up by the Zce effort using different encodings. I don't think it makes sense to keep them in bitmanip. It will eventually cause a conflict if/when Zce is implemented in llvm. Differential Revision: https://reviews.llvm.org/D110871	2021-09-30 14:23:05 -07:00
Sander de Smalen	6709b193ea	[SelectionDAG] Make WidenVecRes_EXTRACT_SUBVECTOR work for scalable vectors. The legalizer handles this by breaking up an EXTRACT_SUBVECTOR into smaller parts, and combines those together, padding the result with UNDEF vectors, e.g. nxv6i64 extract_subvector(nxv12i64, 6) <-> nxv8i64 concat( nxv2i64 extract_subvector(nxv16i64, 6) nxv2i64 extract_subvector(nxv16i64, 8) nxv2i64 extract_subvector(nxv16i64, 10) nxv2i64 undef) Reviewed By: frasercrmck, david-arm Differential Revision: https://reviews.llvm.org/D110253	2021-09-29 11:33:45 +01:00
Craig Topper	a2a07e8db3	[RISCV] Fold store of vmv.x.s to a vse with VL=1. This can avoid a loss of decoupling with the scalar unit on cores with decoupled scalar and vector units. We should support FP too, but those use extract_element and not a custom ISD node so it is a little different. I also left a FIXME in the test for i64 extract and store on RV32. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109482	2021-09-27 09:54:46 -07:00
Craig Topper	933182e948	[RISCV] Improve support for forming widening multiplies when one input is a scalar splat. If one input of a fixed vector multiply is a sign/zero extend and the other operand is a splat of a scalar, we can use a widening multiply if the scalar value has sufficient sign/zero bits. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110028	2021-09-27 09:37:07 -07:00
Fraser Cormack	e2b46e336b	[DAGCombiner][VP] Fold zero-length or false-masked VP ops This patch adds a generic DAGCombine for vector-predicated (VP) nodes. Those for which we can determine that no vector element is active can be replaced by either undef or, for reductions, the start value. This is tested rather trivially at the IR level, where it's possible that we want to teach instcombine to perform this optimization. However, we can also see the zero-evl case arise during SelectionDAG legalization, when wide VP operations can be split into two and the upper operation emerges as trivially false. It's possible that we could perform this optimization "proactively" (both on legal vectors and before splitting) and reduce the width of an operation and insert it into a larger undef vector: ``` v8i32 vp_add x, y, mask, 4 -> v8i32 insert_subvector (v8i32 undef), (v4i32 vp_add xsub, ysub, mask, 4), i32 0 ``` This is somewhat analogous to similar vector narrow/widening optimizations, but it's unclear at this point whether that's beneficial to do this for VP ops for any/all targets. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D109148	2021-09-27 11:30:09 +01:00
Craig Topper	715cf6ffb9	[RISCV] Add another isel optimization for (and (shl X, c2), c1). Where c1 is a shifted mask with 32-c2 leading zeros and c3 trailing zeros and c3>c2. We can select it as (slli (srliw X, c3-c2), c3).	2021-09-24 15:10:25 -07:00
Stanislav Mekhanoshin	08d7eec06e	Revert "Allow rematerialization of virtual reg uses" Reverted due to two distinct performance regression reports. This reverts commit `92c1fd19ab`.	2021-09-24 10:26:11 -07:00
Hsiangkai Wang	7d39a8a921	[RISCV] (1/2) Add the tail policy argument to builtins/intrinsics. Add the tail policy argument to LLVM IR intrinsics. There are two policies for tail elements. Tail agnostic means users do not care about the values in the tail elements and tail undisturbed means the values in the tail elements need to be kept after the operation. In order to let users control the tail policy, we add an additional argument at the end of the argument list. For unmasked operations, we have no maskedoff and the tail policy is always tail agnostic. If users want to keep tail elements under unmasked operations, they could use all one mask in the masked operations to do it. So, we only add the additional argument for masked operations for most cases. There are exceptions listed below. In this patch, we do not handle the following cases to reduce the complexity of the patch. There could be two separate patches for them. * Use dest argument to control tail policy vmerge.vvm/vmerge.vxm/vmerge.vim (add _t builtins with additional dest argument) vfmerge.vfm (add _t builtins with additional dest argument) vmv.v.v (add _t builtins with additional dest argument) vmv.v.x (add _t builtins with additional dest argument) vmv.v.i (add _t builtins with additional dest argument) vfmv.v.f (add _t builtins with additional dest argument) vadc.vvm/vadc.vxm/vadc.vim (add _t builtins with additional dest argument) vsbc.vvm/vsbc.vxm (add _t builtins with additional dest argument) * Always has tail argument for masked/unmasked intrinsics Vector Single-Width Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Integer Multiply-Add Instructions (add _t and _mt builtins) Vector Single-Width Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Widening Floating-Point Fused Multiply-Add Instructions (add _t and _mt builtins) Vector Reduction Operations (add _t and _mt builtins) Vector Slideup Instructions (add _t and _mt builtins) Vector Slidedown Instructions (add _t and _mt builtins) Discussion: https://github.com/riscv/rvv-intrinsic-doc/pull/101 Differential Revision: https://reviews.llvm.org/D105092	2021-09-24 17:09:50 +08:00
Craig Topper	40b230f685	[RISCV] Limit transformAddImmMulImm to prevent an infinite loop. This fixes an issue reported in D108607.	2021-09-23 15:53:11 -07:00
Craig Topper	70f50114f3	[RISCV] Add another isel optimization for (and (shl x, c2), c1) Turn (and (shl x, c2), c1) -> (slli (srli x, c3-c2), c3) if c1 is a shifted mask with no leading zeros and c3 trailing zeros where c3 is greater than c2.	2021-09-23 14:18:07 -07:00
Craig Topper	8811227a0c	[RISCV] Add more tests for (and (shl x, C2), C1) that can be improved by using a pair of shifts. NFC These tests have C1 as a shifted mask having no leading zeros and C3 trailing zeros. If C3 is more than C2, we can select this as (slli (srli x, C3-C2), C3).	2021-09-23 14:18:07 -07:00
Craig Topper	4a69551d66	[RISCV] Add more isel optimizations for (and (shr x, c2), c1). Turn (and (shr x, c2), c1) -> (slli (srli x, c2+c3), c3) if c1 is a shifted mask with c2 leading zeros and c3 trailing zeros. When the leading zeros is C2+32 we can use SRLIW in place of SRLI.	2021-09-23 11:29:04 -07:00
Craig Topper	19734ae6f0	[RISCV] Add more tests for (and (srl x, C2), C1) that can be improved by using a pair of shifts. NFC These tests have C1 as a shifted mask having C2 leading zeros and some number of trailing zeros, C3. We can select this as (slli (srli x, C2+C3), C3) or (slli (srliw x, C2+C3), C3).	2021-09-23 11:29:04 -07:00
Fraser Cormack	e7c879a69d	[RISCV][VP] Add support for VP_REDUCE_* operations This patch adds codegen support for lowering the vector-predicated reduction intrinsics to RVV instructions. The process is similar to that of the other reduction intrinsics, save for the fact that every VP reduction has a start value. We reuse the existing custom "VL" nodes, adding extra patterns where required to handle non-true masks. To support these nodes, the `RISCVISD::VECREDUCE_*_VL` nodes have been given an explicit "merge" operand. This is to facilitate the VP reductions, where we must be careful to ensure that even if no operation is performed (when VL=0) we still produce the start value. The RVV reductions don't update the destination register under these conditions, so we tie the splatted start value to the output register. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107657	2021-09-23 11:11:05 +01:00
Hsiangkai Wang	ebc5feb4ed	[RISCV] Update mir tests.	2021-09-23 09:42:16 +08:00
Craig Topper	16ba77d19c	[RISCV] Remove stale FIXMEs from float-convert.ll and double-convert.ll. NFC	2021-09-22 14:25:40 -07:00
Craig Topper	f0a422f935	[RISCV] Add fcvt.s.w(u)/fcvt.d.w(u)/fcvt.h.w(u) to hasAllNBitUsers These instructions only read the lower 32 bits of their input.	2021-09-22 14:24:26 -07:00
Craig Topper	c7e78150f7	[RISCV] Add test cases showing failure to use ADDIW before fcvt.s.w/fcvt.d.w/fcvt.h.w. NFC By not using ADDIW we can cause both an ADDIW and ADDI to be emitted when the add has multiple users. These instructions need to be added to the list of instructions that only use the lower 32 bits of input. I've also added tests for the wu versions, but I'm having trouble showing bad codegen from it.	2021-09-22 14:24:26 -07:00
Craig Topper	b33a1cc05b	[RISCV] Optimize vp.store with an all ones mask to avoid a vmset. We can use riscv_vse intrinsic instead of riscv_vse_mask. The code here is based on similar code for handling masked.scatter and vp.scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110206	2021-09-22 09:12:47 -07:00
Craig Topper	aeb63d464f	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for and/or/xor. This requires a minor change to CodeGenPrepare to ensure that shouldSinkOperands will be called for And. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D110106	2021-09-21 10:07:29 -07:00
Ben Shi	b3052013b4	[RISCV] Optimize (add (mul x, c0), c1) Optimize (add (mul x, c0), c1) -> (ADDI (MUL (ADDI, c1/c0), c0), c1%c0), if c1/c0 and c1%c0 are simm12, while c1 is not. Optimize (add (mul x, c0), c1) -> (MUL (ADDI, c1/c0), c0), if c1%c0 is zero, and c1/c0 is simm12 while c1 is not. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108607	2021-09-21 14:13:14 +00:00
Craig Topper	c6e52b1e85	[RISCV] Add test cases for missed opportunities to use vand/vor/vxor.vx. NFC These are cases where the splat is in another basic block. CGP needs to sink it to expose the opportunity to SelectionDAG.	2021-09-20 13:45:40 -07:00
Craig Topper	a95ba81073	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FMA. If either of the multiplicands is a splat, we can sink it to use vfmacc.vf or similar.	2021-09-20 11:49:50 -07:00
Craig Topper	792101fff7	[RISCV] Add test cases for missed opportunity to use vfmacc.vf. NFC This is another case of a splat being in another basic block preventing SelectionDAG from optimizing it.	2021-09-20 11:49:50 -07:00
Craig Topper	04ab6c85ef	[RISCV] Teach RISCVTargetLowering::shouldSinkOperands to sink splats for FAdd/FSub/FMul/FDiv.	2021-09-20 10:25:46 -07:00
Craig Topper	890027b314	[RISCV] Add test cases showing failure to use .vf vector operations when splat is in another basic block. NFC We should have CGP copy the splats into the same basic block as the FP operation so that SelectionDAG can fold them.	2021-09-20 10:25:38 -07:00
Craig Topper	d85e347a28	[RISCV] Add a pass to recognize VLS strided loads/store from gather/scatter. For strided accesses the loop vectorizer seems to prefer creating a vector induction variable with a start value of the form <i32 0, i32 1, i32 2, ...>. This value will be incremented each loop iteration by a splat constant equal to the length of the vector. Within the loop, arithmetic using splat values will be done on this vector induction variable to produce indices for a vector GEP. This pass attempts to dig through the arithmetic back to the phi to create a new scalar induction variable and a stride. We push all of the arithmetic out of the loop by folding it into the start, step, and stride values. Then we create a scalar GEP to use as the base pointer for a strided load or store using the computed stride. Loop strength reduce will run after this pass and can do some cleanups to the scalar GEP and induction variable. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107790	2021-09-20 09:39:44 -07:00
Ben Shi	dee5a8ca32	[RISCV] Optimize (add (shl x, c0), (shl y, c1)) with SHADD Optimize (add (shl x, c0), (shl y, c1)) -> (SLLI (SHADD x, y), c1), if c0-c1 == 1/2/3. Reviewed By: craig.topper, luismarques Differential Revision: https://reviews.llvm.org/D108916	2021-09-19 16:35:12 +08:00
Craig Topper	73e5b9ea90	[RISCV] Select (srl (sext_inreg X, i32), uimm5) to SRAIW if only lower 32 bits are used. SimplifyDemandedBits can turn srl into sra if the bits being shifted in aren't demanded. This patch can recover the original sra in some cases. I've renamed the tablegen class for detecting W users since the "overflowing operator" term I originally borrowed from Operator.h does not include srl. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D109162	2021-09-16 11:03:35 -07:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Craig Topper	1b736bda3b	[RISCV] Enable CGP to sink splat operands of Add/Sub/Mul/Shl/LShr/AShr LICM may have pulled out a splat, but with .vx instructions we can fold it into an operation. This patch enables CGP to reverse the LICM transform and move the splat back into the loop. I've started with the commutable integer operations and shifts, but we can extend this with more operations in future patches. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109394	2021-09-10 09:04:01 -07:00
Craig Topper	6c7cadb8c1	[RISCV] Teach vsetvli insertion that stores don't use the policy bits in vtype. This can avoid a vsetvl after a tail undisturbed operation. Differential Revision: https://reviews.llvm.org/D109549	2021-09-10 09:03:20 -07:00
Craig Topper	8c4803dc93	[RISCV] Add test cases showing failure to fold splatted shift amounts across basic blocks. We should have CGP copy the splats into the same basic block as the shift so that SelectionDAG can fold them.	2021-09-09 12:45:30 -07:00
Yvan Roux	261cbe98c3	[RISCV] Fix Machine Outliner jump table handling. Don't outline machine instructions which are using jump table indexes since they are materialized as local labels (like the already handled case of constant pools). Reviewed By: paquette Differential Revision: https://reviews.llvm.org/D109436	2021-09-09 07:32:30 +02:00
Craig Topper	a574f0e0c3	[RISCV] Disable use of i128 shift libcalls on RV32. Since i128 isn't a legal C type on RV32, I don't believe libgcc implements these functions for RV32. compiler-rt does implement them because i128 support is enabled in order to handle long double. This is consistent with 32-bit X86 and ARM. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D109383	2021-09-08 14:26:07 -07:00
Craig Topper	c00cb52854	[RISCV] Pre-commit tests for D109394. NFC	2021-09-08 10:25:38 -07:00
Craig Topper	b04c09c07c	[RISCV] Use V0 instead of VMV0: for mask vectors in isel patterns. This is consistent with the RVV intrinsic patterns. This has been shown to prevent some "ran out of registers" errors in our internal testing. Unfortunately, there are some regressions on LMUL=8 tests in here. I think the lack of registers with LMUL=8 just makes it very hard to schedule correctly. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109245	2021-09-08 09:46:21 -07:00
Craig Topper	1f16191906	[RISCV] Add a GPR def to the Zvlseg SPILL/RELOAD pseudos The expansion of these pseudos creates ADD instructions. Those ADDs modify a GPR so that it no longer contains the same value as the input base pointer. Therefore, I believe we should have a GPR as a Def on these instructions and expansion should get the destination register for the ADDs from that operand. At least in our tests here this works out so that register scavenging picks the same register as the base pointer. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109405	2021-09-08 09:23:33 -07:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Fraser Cormack	2c5568a6a9	[LegalizeTypes][VP] Add promotion support for binary VP ops This patch extends the preliminary support for vector-predicated (VP) operation legalization to include promotion of illegal integer vector types. Integer promotion of binary VP operations is relatively simple and piggy-backs on the non-VP logic, but passing the two extra mask and VP operands through to the promoted operation. Tests have been added to the RISC-V target to cover the basic scenarios for integer promotion for both fixed- and scalable-vector types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D108288	2021-09-08 10:22:57 +01:00
Fraser Cormack	a823bdf3ab	[RISCV][VP] Custom lower VP_STORE and VP_LOAD This patch adds support for the vector-predicated `VP_STORE` and `VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and `MLOAD`: to regular load/store instructions via intrinsics. One necessary change was made to `SelectionDAGLegalize` so that `VP_STORE` nodes' operation actions are taken from the stored "value" operands, in the same vein as `STORE` or `MSTORE`. Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108999	2021-09-07 10:53:25 +01:00
Fraser Cormack	f4dee8cb82	[RISCV][VP] Custom lower VP_SCATTER and VP_GATHER This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by lowering them to RVV's `vsox`/`vlux` instructions, respectively. This process is almost identical to the existing `MSCATTER`/`MGATHER` support. One extra change was made to `SelectionDAGLegalize` so that `VP_SCATTER`'s operation action is derived from its stored "value" operand rather than its return type (which is always the chain). Reviewed By: craig.topper, rogfer01 Differential Revision: https://reviews.llvm.org/D108987	2021-09-07 10:43:07 +01:00
David Green	8523fb96a6	[DAG] Fold select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Given a select_cc producing a constant and an inversion of the constant for a comparison more than zero, we can produce an xor with ashr instead, which produces smaller code. The ashr either sets all bits or clears all bits depending on if the value is negative. This is then xor'd with the constant to optionally negate the value. https://alive2.llvm.org/ce/z/DTFaBZ This includes a OneUseCheck on the Cmp, which seems to make things a little worse and will be removed in a followup. Differential Revision: https://reviews.llvm.org/D109149	2021-09-05 16:04:01 +01:00
David Green	79845ed6df	[DAG] Fold setcc eq with ashr to compare to zero. Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 -> set_cc setlt X, 0 to prevent some regressions later on when folding select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C Differential Revision: https://reviews.llvm.org/D109214	2021-09-05 14:06:47 +01:00
David Green	7801d7963d	[DAG] Add tests for select_cc and setcc with constant patterns.	2021-09-05 10:17:21 +01:00
Craig Topper	75620fadf5	[RISCV] Change how we encode AVL operands in vector pseudoinstructions to use GPRNoX0. This patch changes the register class to avoid accidentally setting the AVL operand to X0 through MachineIR optimizations. There are cases where we really want to use X0, but we can't get that past the MachineVerifier with the register class as GPRNoX0. So I've used a 64-bit -1 as a sentinel for X0. All other immediate values should be uimm5. I convert it to X0 at the earliest possible point in the VSETVLI insertion pass to avoid touching the rest of the algorithm. In SelectionDAG lowering I'm using a -1 TargetConstant to hide it from instruction selection and treat it differently than if the user used -1. A user -1 should be selected to a register since it doesn't fit in uimm5. This is the rest of the changes started in D109110. As mentioned there, I don't have a failing test from MachineIR optimizations anymore. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D109116	2021-09-03 09:19:25 -07:00
Evandro Menezes	cd6064bb9e	[RISCV] Improve shrink wrap test (NFC) Restore test for shrink wrapping disabled.	2021-09-02 12:14:04 -05:00
Craig Topper	498e8ae412	[RISCV] Add Zba command line to rv64i-exhaustive-w-insts.ll Zba adds a zext.w pseudoinstruction using ADDUW. This can simplify the generated code for many of these tests. There are at least 2 suboptimal cases in this config that I've marked with TODOs.	2021-09-02 08:36:27 -07:00
Craig Topper	eaa560582a	[RISCV] Remove stale TODOs from test. NFC These were fixed by D106230.	2021-09-02 08:36:27 -07:00
Craig Topper	b5fd6b46f5	[RISCV] Teach instruction selection to elide sext.w in some cases. If a sext_inreg is up for isel, and all its users are W instructions, we can skip emitting the sext_inreg. This is helpful if the producing instruction can't become a W instruction. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D108966	2021-09-02 07:54:34 -07:00
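
Several of the isel commits listed above (715cf6ffb9, 70f50114f3, 4a69551d66) replace an AND with a shifted-mask constant by a pair of shifts. The following is a minimal sketch, not code from LLVM, that models the RV64 shift instructions on Python integers and checks the three rewrites against random inputs. The helper names (`shifted_mask`, `fold_shl_and`, `fold_shl_and_w`, `fold_srl_and`, `check`) are hypothetical and exist only for this illustration; the preconditions in the comments are paraphrased from the commit messages.

```python
import random

MASK64 = (1 << 64) - 1  # model a 64-bit GPR


def shifted_mask(leading_zeros, trailing_zeros):
    """64-bit run of contiguous ones with the given zero counts on each side."""
    width = 64 - leading_zeros - trailing_zeros
    assert width > 0
    return ((1 << width) - 1) << trailing_zeros


def slli(x, sh):
    """Logical left shift, truncated to 64 bits."""
    return (x << sh) & MASK64


def srli(x, sh):
    """Logical right shift of the full 64-bit value."""
    return (x & MASK64) >> sh


def srliw(x, sh):
    """Logical right shift of the low 32 bits, sign-extended from bit 31."""
    r = (x & 0xFFFFFFFF) >> sh
    if r & 0x80000000:
        r |= 0xFFFFFFFF00000000
    return r


def fold_shl_and(x, c2, c3):
    # (and (shl x, c2), c1), c1 has no leading zeros and c3 trailing zeros,
    # c3 > c2  ->  (slli (srli x, c3-c2), c3)
    return slli(srli(x, c3 - c2), c3)


def fold_shl_and_w(x, c2, c3):
    # Same shape, but c1 has 32-c2 leading zeros  ->  (slli (srliw x, c3-c2), c3)
    return slli(srliw(x, c3 - c2), c3)


def fold_srl_and(x, c2, c3):
    # (and (srl x, c2), c1), c1 has c2 leading zeros and c3 trailing zeros
    # ->  (slli (srli x, c2+c3), c3)
    return slli(srli(x, c2 + c3), c3)


def check(trials=2000, seed=0):
    """Compare each shift pair with the original shift-and-mask form."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(64)
        c2 = rng.randrange(1, 31)        # 1..30
        c3 = rng.randrange(c2 + 1, 32)   # c2+1..31, so c3 > c2
        assert slli(x, c2) & shifted_mask(0, c3) == fold_shl_and(x, c2, c3)
        assert slli(x, c2) & shifted_mask(32 - c2, c3) == fold_shl_and_w(x, c2, c3)
        assert srli(x, c2) & shifted_mask(c2, c3) == fold_srl_and(x, c2, c3)
    return True
```

Calling `check()` exercises all three rewrites over the sampled ranges of `c2` and `c3`. It is a randomized sanity check of the arithmetic only and says nothing about when the backend considers the rewrite profitable.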


1083 Commits