llvm-project

Commit Graph

Author	SHA1	Message	Date
LiaoChunyu	3f68f0f816	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-07-11 14:10:27 +08:00
Craig Topper	35ec8a423d	[RISCV] Teach shouldConvertConstantLoadToIntImm that constant materialization can use constant pools. I think it only makes sense to return true here if we aren't going to turn around and create a constant pool for the immmediate. I left out the check for useConstantPoolForLargeInts() thinking that even if you don't want the commpiler to create a constant pool you might still want to avoid materializing an integer that is already available in a global variable. Test file was copied from AArch64/ARM and has not been commited yet. Will post separate review for that. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D129402	2022-07-10 14:10:17 -07:00
Craig Topper	60450f91c8	[RISCV] Add test cases for inline memcpy expansion Test file was taken directly from AArch64/ARM. I've added RUN lines for aligned and unaligned since many of the test cases are strings that aren't aligned and have an odd size. Some of these test cases are modified by D129402. Differential Revision: https://reviews.llvm.org/D129403	2022-07-10 14:09:02 -07:00
Craig Topper	5f7641a3be	[RISCV] Modify the custom isel for (add X, imm) used by load/stores. We have custom isel that tries to select the Lo12 bits using a separate ADDI that can later folded into the load/store address by the post-isel peephole. This patch disables this if the load/store already had a non-zero offset. A non-zero offset implies that CodeGenPrepare split several large offsets used by different loads and stores into a common large offset and multiple small offsets that could be folded. Folding more of the lo12 bits changes this common offset by increasing the small offsets. While this can save an instruction to materialize the common offset, it can also prevent the small offsets from fitting in a compressed load/store instruction. Removing this also simplifies the last piece needed to fold the custom isel for add into SelectAddrRegImm and remove the post-isel peephole.	2022-07-09 22:47:27 -07:00
Craig Topper	92f1794d41	[RISCV] Mark fminnum_vl and fmaxnum_vl as commutable.	2022-07-08 10:19:09 -07:00
Craig Topper	069ba96660	[RISCV] Add commuted fixed vector vfmax.vf and vfmin.vf tests. NFC The ISD opcodes aren't marked commutable so we don't match these properly.	2022-07-08 10:19:09 -07:00
Philip Reames	264018d764	[RISCV] Mark vsadd(u)_vl as commutable This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector). Differential Revision: https://reviews.llvm.org/D129302	2022-07-08 10:18:21 -07:00
Craig Topper	a246eb6814	[RISCV] Mark (s/u)min_vl and (s/u)max_vl as commutable.	2022-07-08 09:59:42 -07:00
Craig Topper	cd783bf997	[RISCV] Add fixed vector vmin(u).vx and vmax(u).vx tests. NFC	2022-07-08 09:59:41 -07:00
Sanjay Patel	8b75671314	[SDAG] try to replace subtract-from-constant with xor This is almost the same as the abandoned D48529, but it allows splat vector constants too. This replaces the x86-specific code that was added with the alternate patch D48557 with the original generic combine. This transform is a less restricted form of an existing InstCombine and the proposed SDAG equivalent for that in D128080: https://alive2.llvm.org/ce/z/OUm6N_ Differential Revision: https://reviews.llvm.org/D128123	2022-07-08 08:14:24 -04:00
Kito Cheng	5c45ae4108	[RISCV] Fix wrong register rename for store value during make-compressible optimization Current implementation will rename both register in store instructions if we store base address into memory with same base register, it's OK if the offset is 0, however that is wrong transform if offset isn't 0, give a smalle example here: sd a0, 808(a0) We should not transform into: addi a2, a0, 768 sd a2, 40(a2) That should just rename base address like this: addi a2, a0, 768 sd a0, 40(a2) Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128876	2022-07-08 18:07:17 +08:00
Kito Cheng	7b9a3b9d6d	[RISCV] Precommit testcase to show wrong result of make-compressible optimization Use following example to demo what happened now: li a1, 1 sd a1, 800(a0) sd a0, 808(a0) # Store base address into base + offset li a1, 2 sd a1, 816(a0) Current will optimizate into: li a1, 1 addi a2, a0, 768 sd a1, 32(a2) sd a2, 40(a2) # Wrong replacement for the source register. li a1, 2 sd a1, 48(a2) Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128875	2022-07-08 17:01:22 +08:00
Lian Wang	9cfb28d672	[RISCV] Change VECTOR_SPLICE mask operation from expand to promote Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128717	2022-07-08 06:20:22 +00:00
Lian Wang	99da3115d1	[RISCV] Recommit test for D128717 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128778	2022-07-08 02:21:33 +00:00
Lian Wang	ab9e8a3a6f	Revert "[RISCV] Precommit test for D128717" This reverts commit `b3b37f3ecf`.	2022-07-08 01:56:29 +00:00
Lian Wang	b3b37f3ecf	[RISCV] Precommit test for D128717 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128778	2022-07-08 01:47:01 +00:00
Diego Caballero	bf1758c3dc	Revert "[RISCV] Optimize 2x SELECT for floating-point types" This reverts commit `1178992c72`.	2022-07-07 22:54:00 +00:00
Philip Reames	439783da01	[RISCV] Adjust fixed vector coverage for get.active.lane.mask Make sure we include at least one case where the vsadd/vmsltu lowering requires only LMUL1. We should be able to generate all of the fixed vector variants from scalar to vector idioms, but this is probably not very important right now given the fixed length variants we'd actually use when vectorizing with LMUL=1 are reasonable.	2022-07-07 13:28:29 -07:00
Philip Reames	fa3783c907	[RISCV] Test coverage for missing commute of vsadd(u) For some reason, this appears to only happen with fixed length vectors. Scalable ones commute just fine in all the cases I've seen.	2022-07-07 09:11:44 -07:00
Philip Reames	6f4773f064	[RISCV] Add codegen coverage for get.active.lane.mask	2022-07-06 14:50:41 -07:00
Craig Topper	088bb8a328	[RISCV] Add more SHXADD patterns where the input is (and (shl/shr X, C2), C1) It might be possible to rewrite (and (shl/shr X, C2), C1) in a way that gives us a left shift that can be folded into a SHXADD.	2022-07-05 16:21:47 -07:00
Craig Topper	ac3e26bcff	[RISCV] Add more SHXADD tests. NFC	2022-07-05 13:41:58 -07:00
luxufan	c06d0b4d02	[RISCV] Add ADDI instr for computing FrameIndex address RVV doesn't have immediate field for memory addressing. Currently we build MachineInstructions in PEI to computing stack offset for RVV load store instructions. These instructions were added too late to can be optimized by CSE, LICM... passes. This patch makes FrameIndex SDNodes can't be matched in RVV Load Store instruction selection patterns. So that the FrameIndex SDNodes would be selected as `ADDI GPR, targetframeindex`. There are 2 advantages for such change: 1. Stack objects address computing can be optimized by machine function passes. 2. Since the ADDI instruction's destination register can be used as a temp register, we can save an emergency spill slot. Differential Revision: https://reviews.llvm.org/D128187	2022-07-04 22:13:35 +08:00
Craig Topper	d36e09cfe5	[RISCV] Add more SHXADD patterns. This handles the code we get for this. int foo(unsigned x, int *y) { return y[x >> 3]; } The srl and shl implied by the array index will be combined to form (srl (and X, C2), C1). We need to reverse this get to back the shl to fold into SHXADD.	2022-07-03 21:57:05 -07:00
luxufan	0f45eaf0da	[RISCV] Add a scavenge spill slot when use ADDI to compute scalable stack offset Computing scalable offset needs up to two scrach registers. We add scavenge spill slots according to the result of `RISCV::isRVVSpill` and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`, PEI doesn't add scavenge spill slots for scrach registers when using ADDI to get scalable stack offsets. The ADDI instruction has a destination register which can be used as a scrach register. So one scavenge spil slot is sufficient for computing scalable stack offsets. Differential Revision: https://reviews.llvm.org/D128188	2022-07-03 20:18:13 +08:00
Craig Topper	7e4ab9d5b8	[RISCV] Add more SHXADD isel patterns. This handles the code we get for int foo(int* x, unsigned y) { return x[y >> 1]; } The shift right and the shl will get DAG combined into (shl (and X, 0xfffffffe), 1). We have custom isel to match the shl+and, but with Zba the (add (shl X, 1), Y) part will get matched and leave the and to be iseled by itself. This commit adds a larger pattern that includes the and.	2022-07-02 23:11:22 -07:00
Craig Topper	b2e9684fe4	[RISCV] isel (shl (and X, C2), C) -> (slli (srliw X, C3), C3+C). where C2 has 32 leading zeros and C3 trailing zeros. When the shl is used by an add C is 1,2 or 3, we end up matching (add (shl X, C), Y) first. This leaves an and with a constant that is harder to materialize.	2022-07-02 01:04:44 -07:00
Craig Topper	9ac548e118	[RISCV] isel (add (and X, 0xFFFFFFFE), Y) as (SH1ADD (SRLIW X, 1), Y). Similar for SH2ADD and SH3ADD. This is what we get from int foo(int* x, unsigned y) { return x[y >> 1]; } This allows us to avoid materializing 0xFFFFFFFE into a register.	2022-07-01 23:52:29 -07:00
Yeting Kuo	5744b9cb79	[RISCV] Restore "Enable shrink wrap by default" This reverts commit `7af3d4ab3d`. RISC-V reverted the shrink wrap patch for bug 53662. Since the bug is fixed by D123679, the commit re-enable it. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D128965	2022-07-02 11:13:13 +08:00
Craig Topper	354e04554a	[RISCV] Make custom isel for (add X, imm) used by load/stores more selective. Only handle immediates that would produce an ADDI or ADDIW of Lo12 as the final instruction in their materialization. As the test change show this removes immediates that materialize with lui+addiw that is not the same as lui+addi.	2022-06-30 14:20:11 -07:00
Craig Topper	51d672946e	[RISCV] Fold (sra (add (shl X, 32), C1), 32 - C) -> (shl (sext_inreg (add X, C1), C) Similar for a subtract with a constant left hand side. (sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine for (sext (add (trunc X to i32), 32) to i32). For RISCV, we should lower this as addiw which means turning it into (sext_inreg (add X, C1)). There is an existing DAG combine to convert back to (sext (add (trunc X to i32), 32) to i32), but it requires isTruncateFree to return true and for i32 to be a legal type as it used sign_extend and truncate nodes. So that doesn't work for RISCV. If the outer sra happens be used by a shl by constant, it will be folded and the shift amount of the sra will be changed before we can do our own DAG combine. This requires us to match the more general pattern and restore the shl. I had wanted to do this as a separate (add (shl X, 32), C1<<32) -> (shl (add X, C1), 32) combine, but that hit an infinite loop for some values of C1. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128869	2022-06-30 09:01:24 -07:00
Craig Topper	9ace5af049	[RISCV] DAG combine (sra (shl X, 32), 32 - C) -> (shl (sext_inreg X, i32), C). The sext_inreg can often be folded into an earlier instruction by using a W instruction. The sext_inreg also works better with our ABI. This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128843	2022-06-30 09:01:24 -07:00
Craig Topper	781e3d7ad8	[RISCV] Pre-commit tests for D128869. NFC	2022-06-30 09:01:24 -07:00
Fraser Cormack	d5213c83ff	[RISCV] Add a test covering a (reverted) codegen issue This test checks one of problematic cases outlined in D128006, leading to the patch's reversal. I thought it best to add a test just in case this sort of optimization is attempted again in the future in some fashion.	2022-06-30 09:27:52 +01:00
Craig Topper	75095e6281	[RISCV] Pre-commit tests for D128843. NFC	2022-06-29 12:12:42 -07:00
Philip Reames	dd48d3ad0e	Revert "[RISCV] Avoid changing etype for splat of 0 or -1" This reverts commit `755c84c62c`. A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong. It needs to be checking for VLInBytes, not MaxVL. These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.	2022-06-29 10:27:02 -07:00
Craig Topper	7cbfb4eb7a	[RISCV] Select (srl (and X, C2) as (slli (srliw X, C3), C3-C). If C2 has 32 leading zeros and C3 trailing zeros.	2022-06-29 09:15:09 -07:00
Philip Reames	860c62f53c	[RISCV] Refine known bits for READ_VLENB This implements known bits for READ_VALUE using any information known about minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two. The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too. Differential Revision: https://reviews.llvm.org/D128758	2022-06-28 15:42:14 -07:00
Craig Topper	02c8453e64	[RISCV] Teach RISCVMergeBaseOffset to handle read-modify-write of a global. The pass was previously limited to LUI+ADDI being used by a single instruction. This patch allows the pass to optimize multiple memory operations that use the same offset. Each of them will receive a separate %lo relocation. My main motivation is to handle a read-modify-write where we have a load and store to the same address, but I didn't restrict it to that case. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D128599	2022-06-28 11:46:24 -07:00
Philip Reames	c755bf658f	[RISCV] Add test coverage for high known bits for vscale	2022-06-28 10:04:05 -07:00
Alex Bradbury	7bcfcabbd1	[RISCV] Implement support for the Zicbop extension Implements the ratified RISC-V Base Cache Management Operation ISA Extension: Zicbop, as described in https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf. This is implemented in a separate patch to Zicbom and Zicboz due to it requiring a new ASM operand type to be defined. Differential Revision: https://reviews.llvm.org/D117433	2022-06-28 12:43:26 +01:00
Alex Bradbury	4f40ca53ce	[RISCV] Implement support for the Zicbom and Zicboz extensions Implements the ratified RISC-V Base Cache Management Operation ISA Extensions: Zicbom and Zicboz, as described in https://github.com/riscv/riscv-CMOs/blob/master/specifications/cmobase-v1.0.pdf. Zicbop is implemented in a separate patch due to it requiring a new ASM operand type to be defined. As discussed in the relevant issue in the upstream spec https://github.com/riscv/riscv-CMOs/issues/47, the cbo.* instructions use the format (rs1) or 0(rs1) for their operand, similar to the AMOs. Differential Revision: https://reviews.llvm.org/D117432	2022-06-28 12:43:25 +01:00
Lian Wang	96ab083622	[RISCV] Support VECTOR_REVERSE mask operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128627	2022-06-28 07:48:51 +00:00
LiaoChunyu	1178992c72	[RISCV] Optimize 2x SELECT for floating-point types Including the following opcode: Select_FPR16_Using_CC_GPR Select_FPR32_Using_CC_GPR Select_FPR64_Using_CC_GPR Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127871	2022-06-28 12:02:05 +08:00
Craig Topper	9afa5b8da2	[RISCV] Add tests for (load (add X, [2048,4094])). NFC Offsets in the range [-4095,-2049] or [2048, 4094] are split into two ADDIs. One of the ADDIs will be folded into the load/store immediate through an post-isel peephole.	2022-06-27 13:42:57 -07:00
Bradley Smith	a83aa33d1b	[IR] Move vector.insert/vector.extract out of experimental namespace These intrinsics are now fundemental for SVE code generation and have been present for a year and a half, hence move them out of the experimental namespace. Differential Revision: https://reviews.llvm.org/D127976	2022-06-27 10:48:45 +00:00
Craig Topper	3d37e785c7	[RISCV] Merge more rv32/rv64 vector intrinsic tests that contain the same content.	2022-06-25 13:21:44 -07:00
Craig Topper	78a31bb969	[RISCV] Change how we isel (add X, [-4096, -2049]) or (add X, [2048,4095]). We currently split the immediate almost equally between two addis. If the immediate is odd, it won't be split exactly equal. This patch instead gives one addi an immediate of 2047 or -2048 and the other getsthe remainder. If the original immediate is near -2049 or 2048, this might allow the use of c.addi for the addi that receives the smaller immediate. Reviewed By: asb, luismarques Differential Revision: https://reviews.llvm.org/D128500	2022-06-24 08:31:52 -07:00
Craig Topper	c579ab53bd	[RISCV] Move vfma_vl+fneg_vl matching to DAG combine. This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with different portions negated. It also adds a DAG combine to peek through FNEG_VL to create these new opcodes. This is modeled after similar code from X86. This makes the isel patterns more regular and reduces the size of the isel table by ~37K. The test changes look like regressions, but they point to a bug that was already there. We aren't able to commute a masked FMA instruction to improve register allocation because we always use a mask undisturbed policy. Prior to this patch we matched two multiply operands in a different order and hid this issue for these test cases, but a different test still could have encountered it. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D128310	2022-06-24 00:00:37 -07:00
Lian Wang	770fe864fe	[SelectionDAG] Enable WidenVecOp_VECREDUCE for scalable vector Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D128239	2022-06-24 02:32:53 +00:00
Philip Reames	1cc9792281	[RISCV] Fix a crash in InsertVSETVLI where we hadn't properly guarded for a SEWLMULRatioOnly abstract state A forward abstract state can be in the special SEWLMULRatioOnly state which means we're not allowed to inspect its fields. The scalar to vector move case was mising a guard, and we'd crash on an assert. Test cases included.	2022-06-23 10:25:16 -07:00
Craig Topper	8b10ffabae	[RISCV] Disable <vscale x 1 x > types with Zve32x or Zve32f. According to the vector spec, mf8 is not supported for i8 if ELEN is 32. Similarily mf4 is not suported for i16/f16 or mf2 for i32/f32. Since RVVBitsPerBlock is 64 and LMUL is calculated as ((MinNumElements ElementSize) / RVVBitsPerBlock) this means we need to disable any type with MinNumElements==1. For generic IR, these types will now be widened in type legalization. For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan to work on disabling the intrinsics in the riscv_vector.h header. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D128286	2022-06-23 08:49:18 -07:00
Craig Topper	4045b62d4c	[RISCV] Add macrofusion infrastructure and one example usage. This adds the macrofusion plumbing and support fusing LUI+ADDI(W). This is similar to D73643, but handles a different case. Other cases can be added in the future. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D128393	2022-06-23 08:38:39 -07:00
Craig Topper	59cde2133d	Recommit "[RISCV] Enable subregister liveness tracking for RVV." The failure that caused the previous revert has been fixed by https://reviews.llvm.org/D126048 Original commit message: RVV makes heavy use of subregisters due to LMUL>1 and segment load/store tuples. Enabling subregister liveness tracking improves the quality of the register allocation. I've added a command line that can be used to turn it off if it causes compile time or functional issues. I used the command line to keep the old behavior for one interesting test case that was testing register allocation. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D128016	2022-06-20 20:46:06 -07:00
Lian Wang	ab25e263a9	[SelectionDAG] Enable WidenVecOp_VECREDUCE_SEQ for scalable vector Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D127710	2022-06-20 06:30:26 +00:00
Craig Topper	314dbde12c	[DAGCombiner][ARM][RISCV] Teach ShrinkLoadReplaceStoreWithStore to use truncstore. The VT we want to shrink to may not be legal especially after type legalization. Fixes PR56110. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D128135	2022-06-19 15:50:15 -07:00
Craig Topper	545a71c0d6	[RISCV] Pre-promote v1i1/v2i1/v4i1->i1/i2/i4 bitcasts before type legalization Type legalization will convert the bitcast into a vector store and scalar load. Instead this patch widens the vector to v8i1 with undef, and bitcasts it to i8. v8i1->i8 has custom handling for type legalization already to bitcast to a v1i8 vector and use an extract_element. The code here was lifted from X86's avx512 support. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D128099	2022-06-18 11:06:45 -07:00
Han-Kuan Chen	e29133629b	[MachineCopyPropagation][RISCV] Fix D125335 accidentally change control flow. D125335 makes regsOverlap skip following control flow, which is not entended in the original code. Differential Revision: https://reviews.llvm.org/D128039	2022-06-17 21:40:08 -07:00
Han-Kuan Chen	dbfb00a930	[MachineCopyPropagation][RISCV] Add test case showing failure for MachineCopyPropagation. NFC This is a pre-commit test cases for D128039. Differential Revision: https://reviews.llvm.org/D128040	2022-06-17 21:39:53 -07:00
Philip Reames	4d245f1bc2	[RISCV] Move store policy and mask reg ops into demanded handling in InsertVSETVLI Doing so let's the post-mutation pass leverage the demanded info to rewrite vsetvlis before a store/mask-op to eliminate later vsetvlis. Sorry for the lack of store test change; all of my attempts to write something reasonable have been handled through existing logic.	2022-06-17 12:09:50 -07:00
Philip Reames	755c84c62c	[RISCV] Avoid changing etype for splat of 0 or -1 A splat of the values 0 and -1 as sign extended 12 bit immediates are always the same bit pattern regardless of the etype used to perform the operation. As a result, we can sometimes avoid introducing a vsetvli just for the purposes of a splat. Looking at the diffs, we don't get a huge amount of immediate value out of this. We mostly push the vsetvli one instruction down, usually in front of a vmerge. We also don't get the corresponding fixed length vector cases because VL typically is changed despite the actual bits written being the same. Both of these are areas I plan to explore in future patches. Interestingly, this makes a great example of why we need the forward and backward implementation to be consistent. Before we merged the demanded field handling, if we implement only the forward direction, we lost the ability to mutate a prior vsetvli and eliminate a later one entirely. This resulted in practical regressions instead of improvements. It's always nice when practice matches theory. :) Differential Revision: https://reviews.llvm.org/D128006	2022-06-17 08:10:14 -07:00
Lian Wang	16215eb979	[LegalizeTypes][RISCV][NFC] Modify assert in PromoteIntRes_STEP_VECTOR and add some tests for RISCV Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D127939	2022-06-17 02:26:09 +00:00
Lian Wang	5e16a781ba	[RISCV][NFC][test] Correct a wrong test in vreductions-fp-vp.ll Reviewed By: victor-eds, frasercrmck Differential Revision: https://reviews.llvm.org/D127946	2022-06-17 02:22:59 +00:00
Craig Topper	9d7b01dc95	[RISCV] Implement RISCVTargetLowering::getTargetConstantFromLoad. This allows computeKnownBits to see the constant being loaded. This recovers the rv64zbp test case changes from D127520. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127679	2022-06-16 15:11:18 -07:00
Craig Topper	e6c7a3a54f	[SelectionDAG] Don't apply MinRCSize constraint in InstrEmitter::AddRegisterOperand for IMPLICIT_DEF sources. MinRCSize is 4 and prevents constrainRegClass from changing the register class if the new class has size less than 4. IMPLICIT_DEF gets a unique vreg for each use and will be removed by the ProcessImplicitDef pass before register allocation. I don't think there is any reason to prevent constraining the virtual register to whatever register class the use needs. The attached test case was previously creating a copy of IMPLICIT_DEF because vrm8nov0 has 3 registers in it. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D128005	2022-06-16 14:55:14 -07:00
Philip Reames	89a11ebd8e	[RISCV] Avoid reducing etype just to initialize lane 0 of an undef vector If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written. Differential Revision: https://reviews.llvm.org/D127880	2022-06-16 11:14:21 -07:00
Craig Topper	912a5172f8	[RISCV] Use TAIL_AGNOSTIC in riscv_fma_vl patterns. We may eventually need tail undisturbed patterns, but we will need a policy operand on the ISD node to communicate it.	2022-06-16 09:09:36 -07:00
Philip Reames	4a3e46115a	[RISCV] Extend demanded field transform in InsertVSETVLI to VTYPE subfeilds The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio. As an example, consider: define void @test1(ptr %in, ptr %out) { entry: %0 = load <8 x i16>, ptr %in, align 2 %1 = sext <8 x i16> %0 to <8 x i32> store <8 x i32> %1, ptr %out, align 4 ret void } Without this patch, we get: vsetivli zero, 8, e16, mf4, ta, mu vle16.v v8, (a0) vsetvli zero, zero, e32, mf2, ta, mu vsext.vf2 v9, v8 vse32.v v9, (a1) ret Whereas with the patch we get: vsetivli zero, 8, e32, mf2, ta, mu vle16.v v8, (a0) vsext.vf2 v9, v8 vse32.v v9, (a1) ret We have rewritten the first vsetvli and thus removed the second one. As is strongly hinted by the code structure and todos, I am planning on communing this with all (or most all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of following changes - some NFC reworks, and some reviewed optimization extensions. Differential Revision: https://reviews.llvm.org/D127780	2022-06-16 08:01:27 -07:00
Kito Cheng	e9f7263b38	Reland "[SplitKit] Handle early clobber + tied to def correctly" This reverts commit `7207373e1e`. We found another RISC-V bug when landing D126048, and it has been fixed by D127642 now. Differential Revision: https://reviews.llvm.org/D126048	2022-06-16 17:13:09 +08:00
Kito Cheng	8e16c4db57	Reland "[RISCV] Testcase to show wrong register allocation result of subreg liveness" This reverts commit `6a6f632b93`. This commit will failed when EXPENSIVE_CHECKS enabled, fixed by D126048. Differential Revision: https://reviews.llvm.org/D126047	2022-06-16 17:13:09 +08:00
Kito Cheng	687e56614f	[RISCV] Fixing undefined physical register issue when subreg liveness tracking enabled. RISC-V expand register tuple spilling into series of register spilling after register allocation phase by the pseudo instruction expansion, however part of register tuple might be still undefined during spilling, machine verifier will complain the spill instruction is using an undefined physical register. Optimal solution should be doing liveness analysis and do not emit spill and reload for those undefined parts, but accurate liveness info at that point is not so easy to get. So the suboptimal solution is still spill and reload those undefined parts, but adding implicit-use of super register to spill function, then machine verifier will only report report using undefined physical register if the when whole super register is undefined, and this behavior are also documented in MachineVerifier::checkLiveness[1]. Example for demo what happend: ``` v10m2 = xxx # v12m2 not define yet PseudoVSPILL2_M2 v10m2_v12m2 ... ``` After expansion: ``` v10m2 = xxx # v12m2 not define yet # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r VS2R_V v10m2 VS2R_V v12m2 # Use undef reg! ``` What this patch did: ``` v10m2 = xxx # v12m2 not define yet # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r VS2R_V v10m2 implicit v10m2_v12m2 # Use undef reg (v12m2), but v10m2_v12m2 ins't totally undef, so # that's OK. VS2R_V v12m2 implicit v10m2_v12m2 ``` [1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127642	2022-06-15 16:23:39 +08:00
Yeting Kuo	9096a52566	[RISCV] Teach vsetvli insertion to not insert redundant vsetvli right after VLEFF/VLSEGFF. VSETVLIInfos right after VLEFF/VLSEGFF are currently unknown since they modify VL. Unknown VSETVLIInfos make next vector operations needed to be inserted VSET(I)VLI. Actually the next vector operation of VLEFF/VLSEGFF may not need to be inserted VSET(I)VLI if it uses same VTYPE and the resulted vl of VLEFF/VLSEGFF. Take the below C code as an example, vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl); vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl); vsetvli insertion adds a redundant vsetvli for that, Assembly result: vsetvli a2,a2,e8,m4,ta,mu vle8ff.v v28,(a0) csrr a3,vl ; redundant vsetvli zero,a3,e8,m4,ta,mu ; redundant vmseq.vi v25,v28,0 After D126794, VLEFF/VLSEGFF has a define having value of VL. The patch consider there is a ghost vsetvli right after VLEFF/VLSEGFF. The ghost VSET(I)LIs use the vl output of the VLEFF/VLSEGFF as its AVL and same VTYPE of the VLEFF/VLSEGFF. The ghost vsetvli must be redundant, and we could use it to get the VSETVLIInfo right after VLEFF/VLSEGFF. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127576	2022-06-15 13:58:40 +08:00
Ping Deng	c06f77ec0d	[SelectionDAG] fold 'Op0 - (X * MulC)' to 'Op0 + (X << log2(-MulC))' Reviewed By: craig.topper, spatel Differential Revision: https://reviews.llvm.org/D127474	2022-06-15 05:50:18 +00:00
Ping Deng	a0af049614	[RISCV][NFC] Add more tests for instruction selection of 'mul' precommit tests for D127474 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127475	2022-06-15 03:48:17 +00:00
Craig Topper	e4062522d3	[RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes. We were incorrectly creating a VRGATHER node with i1 vector type. We could support this by promoting the mask to i8 and truncating it, but for now I want to prevent the crash. Fixes PR56007. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127681	2022-06-13 13:41:39 -07:00
Philip Reames	6bab045d08	[RISCV] Add basic fshr/fshl cost and codegen coverage	2022-06-13 11:49:53 -07:00
Craig Topper	bb1a52aa8b	Recommit "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates." With fix for sanitizer build bot failure.	2022-06-13 11:35:44 -07:00
Mitch Phillips	9d99870590	Revert "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates." This reverts commit `8bbcb98848`. Broke the UBSan bot. More details in https://reviews.llvm.org/D127376.	2022-06-13 10:16:28 -07:00
Craig Topper	cef03e3dcd	[RISCV] Move creation of constant pools from isel to lowering. This simplifies the isel code by removing the manual load creation. It also improves our ability to use 0 strided loads for vector splats. There is an assumption here that Mask and ShiftedMask constants are cheap enough that they don't become constant pool loads so that our isel optimizations involving And still work. I believe those constants are 3 instructions in the worst case. The rv64zbp-intrinsic.ll changes is a regression caused by intrinsics being expanded to RISCVISD also occuring during lowering. So the optimizations were only happening during the last DAGCombine, which can't see through the load. I believe we can fix this test by implementing TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not a ratified extension, I don't view these as blocking this patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127520	2022-06-13 09:07:57 -07:00
Simon Pilgrim	7d8fd4f5db	[DAG] visitINSERT_VECTOR_ELT - attempt to reconstruct BUILD_VECTOR before other fold interfere Another issue unearthed by D127115 We take a long time to canonicalize an insert_vector_elt chain before being able to convert it into a build_vector - even if they are already in ascending insertion order, we fold the nodes one at a time into the build_vector 'seed', leaving plenty of time for other folds to alter it (in particular recognising when they come from extract_vector_elt resulting in a shuffle_vector that is much harder to fold with). D127115 makes this particularly difficult as we're almost guaranteed to have the lost the sequence before all possible insertions have been folded. This patch proposes to begin at the last insertion and attempt to collect all the (oneuse) insertions right away and create the build_vector before its too late. Differential Revision: https://reviews.llvm.org/D127595	2022-06-13 11:48:18 +01:00
Craig Topper	08ea27bf13	[RISCV] Don't require loop simplify form in RISCVGatherScatterLowering. We need a preheader and a single latch, but we don't need a dedicated exit. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127513	2022-06-10 13:00:20 -07:00
Craig Topper	a639e1fceb	[RISCV] Add test case showing failure to convert gather/scatter to strided load/store. NFC Our optimization pass checks for loop simplify form, before doing the transform. The loops here aren't in loop simplify form because the exit block has two predecessors. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127451	2022-06-10 13:00:20 -07:00
Nikita Popov	c10921fa1a	[CGP] Also freeze ctlz/cttz operand when despeculating D125887 changed the ctlz/cttz despeculation transform to insert a freeze for the introduced branch on zero. While this does fix the "branch on poison" issue, we may still get in trouble if we pick a different value for the branch and for the ctz argument (i.e. non-zero for the branch, but zero for the ctz). To avoid this, we should use the same frozen value in both positions. This does cause a regression in RISCV codegen by introducing an additional sext. The DAG looks like this: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %3 t4: i64 = AssertSext t2, ValueType:ch:i32 t23: i64 = freeze t4 t9: ch = CopyToReg t0, Register:i64 %0, t23 t16: ch = CopyToReg t0, Register:i64 %4, Constant:i64<32> t18: ch = TokenFactor t9, t16 t25: i64 = sign_extend_inreg t23, ValueType:ch:i32 t24: i64 = setcc t25, Constant:i64<0>, seteq:ch t28: i64 = and t24, Constant:i64<1> t19: ch = brcond t18, t28, BasicBlock:ch<cond.end 0x8311f68> t21: ch = br t19, BasicBlock:ch<cond.false 0x8311e80> I don't see a really obvious way to improve this, as we can't push the freeze past the AssertSext (which may produce poison). Differential Revision: https://reviews.llvm.org/D126638	2022-06-10 09:46:10 +02:00
Yeting Kuo	f68cad9087	[RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs. The patch is a replacement of D125199. PseudoReadVL with vtype has worry for computing same vtypes of VLEFF/VLSEGFF in two different places, DAGToDAG and InsertVSETVLI. VLEFF/VLSEGFF MI with VL output still could provide the vtype of VLEFF/VLSEGFF to the users of its VL. The patch names the new pseudo as original VLEFF/VLSEGFF name suffixed "_VL" and expand them in RISCVInsertVSETVLI pass. This patch also reverts commit `4537aae0d5`, "[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.". Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126794	2022-06-10 13:57:10 +08:00
Craig Topper	8bbcb98848	[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates. For an addition with simm14 and simm15 immediates with 2 or 3 trailing bits, we can use a shXadd instruction and an addi to do the addition. This patch teaches RISCVMergeBaseOffset to see through this pattern. I don't think the sh1add case occurs because we use two addis for that, but I implemented it for completeness. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127376	2022-06-09 16:07:35 -07:00
Kito Cheng	cfa463fdc6	[RISCV][NFC] Update testcase for D126861	2022-06-10 00:18:02 +08:00
Kito Cheng	4b11f90903	[RISCV] Fix missing stack pointer recover In order to make sure the stack point is right through the EH region, we also need to restore stack pointer from the frame pointer if we don't preserve stack space within prologue/epilogue for outgoing variables, normally it's just checking the variable sized object is present or not is enough, but we also don't preserve that at prologue/epilogue when have vector objects in stack. Example to show what happened: ``` try { sp adjust for outgoing args. // 1. Sp changed. func_call // 2. Exception raised sp restore // Oh, not restored } catch { // 3. And now we are here. } // 4. Prepare to return!, restore return address from stack, but...sp is wrong. // 5. Screw up! ``` Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D126861	2022-06-09 23:38:50 +08:00
Kito Cheng	8b3426569e	[RISCV] Pre-commit testcase for PR55442 The testcase show the stack pointer isn't recovered when we got exception from `_Z3fooiiiiiiiiiiPi`, and then we screw up due to restore return address from wrong stack pointer. NOTE: Trigger conditions: 1. Frame pointer is required. 2. Stack has out-going argument 3. Vector extension is enabled. Another run-able testcase: $ clang++ -target riscv64-unknown-linux-gnu -march=rv64gcv test.cpp ``` void __attribute__((noinline)) foo(int, int, int, int, int, int, int, int, int, int, int ){ throw int(0); } int main(int argc, char *argv) { int exception_value = 1; try { foo(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0); } catch (int i) { exception_value = i; } return exception_value; } ``` Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D126860	2022-06-09 23:35:38 +08:00
Lian Wang	91e31fd205	[RISCV][VP] Add fp test of widen and split for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D127079	2022-06-09 08:14:12 +00:00
Lian Wang	362a02dabe	[RISCV][test] Add widen STEP_VECTOR tests. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127371	2022-06-09 07:47:04 +00:00
Craig Topper	4bcfc41846	[SelectionDAG] Teach computeKnownBits that a nsw self multiply produce a positive value. This matches what we do in IR. For the RISC-V test case, this allows us to use -8 for the AND mask instead of materializing a constant in a register. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127335	2022-06-08 14:55:58 -07:00
Craig Topper	e4ba24c17d	[RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset. Add with immediates in the range [-4096, -2049] or [2048, 4095] get convert to two ADDIs. Teach RISCVMergeBaseOffset to recognize this pattern as well. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126843	2022-06-08 08:20:37 -07:00
Craig Topper	33f4da2455	[RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset. LUI+ADDIW always produces a simm32. This allows us to always fold it into a global offset. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126729	2022-06-08 08:19:21 -07:00
Shao-Ce SUN	862f30a428	[RISCV] Add ISD::EH_DWARF_CFA Based on D24038. LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D126181	2022-06-08 22:03:30 +08:00
Kito Cheng	6a6f632b93	Revert "[RISCV] Testcase to show wrong register allocation result of subreg liveness" Revert due to failed on LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit `cbe22c7943`.	2022-06-08 21:19:27 +08:00
Kito Cheng	7207373e1e	Revert "[SplitKit] Handle early clobber + tied to def correctly" Revert due to failed on LLVM_ENABLE_EXPENSIVE_CHECKS. This reverts commit `e14d04909d`.	2022-06-08 13:05:35 +08:00
Kito Cheng	e14d04909d	[SplitKit] Handle early clobber + tied to def correctly Spliter will try to extend a live range into `r` slot for a use operand, that's works on most situaion, however that not work correctly when the operand has tied to def, and the def operand is early clobber. Give an example to demo what's wrong: 0 %0 = ... 16 early-clobber %0 = Op %0 (tied-def 0), ... 32 ... = Op %0 Before extend: %0 = [0r, 0d) [16e, 32d) The point we want to extend is 0d to 16e not 16r in this case, but if we use 16r here we will extend nothing because that already contained in [16e, 32d). This patch add check for detect such case and adjust the extend point. Detailed explanation for testcase: https://reviews.llvm.org/D126047 Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D126048	2022-06-08 11:33:05 +08:00
Kito Cheng	cbe22c7943	[RISCV] Testcase to show wrong register allocation result of subreg liveness This testcase show the live range isn't construct correctly when subreg liveness is enabled. In the testcase `early-clobber-tied-def-subreg-liveness.ll`, first operand of `vsext.vf2 v8, v16, v0.t` is both def and use, and the use is come from the memory location of `.L__const._Z3foov.var_49`, it's load and spilled into stack, and then...v8 is overwrite by another instructions. ``` lui a0, %hi(.L__const._Z3foov.var_49) addi a0, a0, %lo(.L__const._Z3foov.var_49) ... vle16.v v8, (a0) # Load value from var_49 ... addi a0, sp, 16 ... vs2r.v v8, (a0) # Spill ... vl2r.v v8, (a1) # Reload ... lui a0, %hi(.L__const._Z3foov.var_40) addi a0, a0, %lo(.L__const._Z3foov.var_40) vle16.v v8, (a0) # Load value...into v8??? vmsbc.vx v0, v8, a0 # And use that. ... vsext.vf2 v8, v16, v0.t # But v8 is here...which is expect value from the reload ``` The `early-clobber-tied-def-subreg-liveness.mir` has more detailed infomation for that, `%25.sub_vrm2_0` is defined in 64, and used in 464, and defined again in 464, and we has used an inline asm to clobber all vector register for trigger spliter. ``` 0B bb.0.entry: 16B %0:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_49 32B %1:gpr = ADDI %0:gpr, target-flags(riscv-lo) @__const._Z3foov.var_49 48B dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype 64B undef %25.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype 80B %3:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_48 96B %4:gpr = ADDI %3:gpr, target-flags(riscv-lo) @__const._Z3foov.var_48 112B %5:vr = PseudoVLE8_V_M1 %4:gpr, 2, 3, implicit $vl, implicit $vtype 128B %6:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_46 144B %7:gpr = ADDI %6:gpr, target-flags(riscv-lo) @__const._Z3foov.var_46 160B %25.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype 176B %9:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_45 192B %10:gpr = ADDI %9:gpr, target-flags(riscv-lo) @__const._Z3foov.var_45 208B %25.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype 224B INLINEASM &"" [sideeffect] [attdialect], $0:[clobber], ... 240B %12:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_44 256B %13:gpr = ADDI %12:gpr, target-flags(riscv-lo) @__const._Z3foov.var_44 272B dead $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype 288B %25.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype 304B $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype 320B %16:gpr = LUI target-flags(riscv-hi) @__const._Z3foov.var_40 336B %17:gpr = ADDI %16:gpr, target-flags(riscv-lo) @__const._Z3foov.var_40 352B %18:vrm2 = PseudoVLE16_V_M2 %17:gpr, 2, 4, implicit $vl, implicit $vtype 368B $x0 = PseudoVSETIVLI 2, 73, implicit-def $vl, implicit-def $vtype 384B %20:gpr = LUI 1048572 400B %21:gpr = ADDIW %20:gpr, 928 416B early-clobber %22:vr = PseudoVMSBC_VX_M2 %18:vrm2, %21:gpr, 2, 4, implicit $vl, implicit $vtype 432B $x0 = PseudoVSETIVLI 2, 9, implicit-def $vl, implicit-def $vtype 448B $v0 = COPY %22:vr 464B early-clobber %25.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, killed $v0, 2, 4, 0, implicit $vl, implicit $vtype 480B %26:gpr = LUI target-flags(riscv-hi) @var_47 496B %27:gpr = ADDI %26:gpr, target-flags(riscv-lo) @var_47 512B PseudoVSSEG4E16_V_M2 %25:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype 528B PseudoRET ``` When spliter will try to split %25: ``` selectOrSplit VRN4M2NoV0:%25 [64r,160r:4)[160r,208r:0)[208r,288r:1)[288r,464e:2)[464e,512r:3) 0@160r 1@208r 2@288r 3@464e 4@64r L0000000000000030 [160r,512r:0) 0@160r L00000000000000C0 [208r,512r:0) 0@208r L0000000000000300 [288r,512r:0) 0@288r L000000000000000C [64r,464e:1)[464e,512r:0) 0@464e 1@64r weight:1.179245e-02 w=1.179245e-02 ``` ``` Best local split range: 64r-208r, 6.999861e-03, 3 instrs enterIntvBefore 64r: not live leaveIntvAfter 208r: valno 1 useIntv [64B;216r): [64B;216r):1 blit [64r,160r:4): [64r;160r)=1(%29)(recalc) blit [160r,208r:0): [160r;208r)=1(%29)(recalc) blit [208r,288r:1): [208r;216r)=1(%29)(recalc) [216r;288r)=0(%28)(recalc) blit [288r,464e:2): [288r;464e)=0(%28)(recalc) blit [464e,512r:3): [464e;512r)=0(%28)(recalc) rewr %bb.0 464e:0 early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %25.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype rewr %bb.0 288r:0 %28.sub_vrm2_3:vrn4m2nov0 = PseudoVLE16_V_M2 %13:gpr, 2, 4, implicit $vl, implicit $vtype rewr %bb.0 208r:1 %29.sub_vrm2_2:vrn4m2nov0 = PseudoVLE16_V_M2 %10:gpr, 2, 4, implicit $vl, implicit $vtype rewr %bb.0 160r:1 %29.sub_vrm2_1:vrn4m2nov0 = PseudoVLE16_V_M2 %7:gpr, 2, 4, implicit $vl, implicit $vtype rewr %bb.0 64r:1 undef %29.sub_vrm2_0:vrn4m2nov0 = PseudoVLE16_V_M2 %1:gpr, 2, 4, implicit $vl, implicit $vtype rewr %bb.0 464B:0 early-clobber %28.sub_vrm2_0:vrn4m2nov0 = PseudoVSEXT_VF2_M2_MASK %28.sub_vrm2_0:vrn4m2nov0(tied-def 0), %5:vr, $v0, 2, 4, 0, implicit $vl, implicit $vtype rewr %bb.0 512B:0 PseudoVSSEG4E16_V_M2 %28:vrn4m2nov0, %27:gpr, 2, 4, implicit $vl, implicit $vtype rewr %bb.0 216B:1 undef %28.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0 = COPY %29.sub_vrm1_0_sub_vrm1_1_sub_vrm1_2_sub_vrm1_3_sub_vrm1_4_sub_vrm1_5:vrn4m2nov0 queuing new interval: %28 [216r,288r:0)[288r,464e:1)[464e,512r:2) 0@216r 1@288r 2@464e L000000000000000C [216r,216d:0)[464e,512r:1) 0@216r 1@464e L0000000000000300 [288r,512r:0) 0@288r L00000000000000C0 [216r,512r:0) 0@216r L0000000000000030 [216r,512r:0) 0@216r weight:8.706897e-03 Enqueuing %28 queuing new interval: %29 [64r,160r:0)[160r,208r:1)[208r,216r:2) 0@64r 1@160r 2@208r L000000000000000C [64r,216r:0) 0@64r L00000000000000C0 [208r,216r:0) 0@208r L0000000000000030 [160r,216r:0) 0@160r weight:1.097826e-02 Enqueuing %29 ``` The live range of first part subreg of %25 is become [216r,216d:0)[464e,512r:1), however first live range should live until 464e rather than just live and [216r,216d:0). And then the register allocator allocated wrong result accroding the live range info. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126047	2022-06-08 11:27:24 +08:00
Craig Topper	0c66deb498	[RISCV] Scalarize gather/scatter on RV64 with Zve32* extension. i64 indices aren't supported on Zve32*. Scalarize gathers to prevent generating illegal instructions. Since InstCombine will aggressively canonicalize GEP indices to pointer size, we're pretty much always going to have an i64 index. Trying to predict when SelectionDAG will find a smaller index from the TTI hook used by the ScalarizeMaskedMemIntrinPass seems fragile. To optimize this we probably need an IR pass to rewrite it earlier. Test RUN lines have also been added to make sure the strided load/store optimization still works. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127179	2022-06-07 08:07:50 -07:00
Matt Arsenault	56303223ac	llvm-reduce: Don't assert on functions which don't track liveness Use the query that doesn't assert if TracksLiveness isn't set, which needs to always be available. We also need to start printing liveins regardless of TracksLiveness.	2022-06-07 10:00:25 -04:00
Craig Topper	be398100ea	[SelectionDAG] Further improve computeKnownBits for (smax X, C) where C is non-negative. Move the code that was added for D126896 after the normal recursive calls to computeKnownBits. This allows us to calculate trailing zeros. Previously we would break out of the switch before the recursive calls.	2022-06-06 09:59:23 -07:00
Shao-Ce SUN	84bacb18c6	[RISCV] Use check-prefixes to reduce check lines Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125083	2022-06-06 15:59:15 +08:00
Lian Wang	20cf77f776	[LegalizeTypes][VP] Add widen and split support for vp.fptrunc and vp.fpext Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126439	2022-06-06 02:28:01 +00:00
LiaoChunyu	f14d18c7a9	[RISCV] Add more patterns for FNMADD D54205 handles fnmadd: -rs1 * rs2 - rs3 This patch add fnmadd: -(rs1 * rs2 + rs3) (the nsz flag on the FMA) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126852	2022-06-04 12:31:45 +08:00
Craig Topper	cc3bd43533	[RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI. This fixes an inconsistency between RV32 and RV64. Still considering trying to do this peephole during isel, but wanted to fix the inconsistency first. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126986	2022-06-03 18:06:56 -07:00
Craig Topper	8da5d5dbdc	[RISCV] Pre-commit test cases for D126986. NFC	2022-06-03 13:31:45 -07:00
Craig Topper	170c550ca8	[RISCV] Use SelectionDAG::isBaseWithConstantOffset in scalar load/store address matching. Test changes are because isBaseWithConstantOffset uses computeKnownBits and that is able to see that an earlier AND instruction guaranteed alignment so that we can treat an OR as an ADD. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126970	2022-06-03 10:55:28 -07:00
Craig Topper	dbead2388b	[RISCV] Add custom isel for (add X, imm) used by load/stores. If the imm is out of range for an ADDI, we will materialize it in a register using multiple instructions. If the ADD is used by a load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from the constant materialization into the load/store offset. This only works if the ADD has a single use, otherwise the peephole would have to rebuild multiple nodes. This patch instead tries to solve the problem when the add is selected. We check that the add is only used by loads/stores and if it is we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI used as a pointer. As a result we can remove the more complicated peephole from doPeepholeLoadStoreADDI. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126576	2022-06-02 13:45:32 -07:00
Craig Topper	fa20bf1636	[DAGCombiner][RISCV] Improve computeKnownBits for (smax X, C) where C is non-negative. If C is non-negative, the result of the smax must also be non-negative, so all sign bits of the result are 0. This allows DAGCombiner to remove a zext_inreg in the modified test. This zext_inreg started as a sext that became zext before type legalization then was promoted to a zext_inreg. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126896	2022-06-02 12:34:24 -07:00
Craig Topper	01ba470826	[RISCV] Add test case showing unnecessary extend after i32 smax on rv64. NFC One of the operands of the smax is a positive value so computeKnownBits determines the result of the smax must always be positive. This allows DAG combiner to convert the sign extend to zero extend before type legalization. After type legalization the smax is promoted to i64 by sign extending its inputs and the zero extend becomes an AND instruction. We are unable to remove the AND at this point and it becomes a pair of shifts or a zext.w. The result of smax has as many sign bits as the minimum of its inputs. Had we kept the sign extend instead of turning it into a zero extend it would be removed by DAG combiner after type legalization.	2022-06-02 09:58:11 -07:00
Philip Reames	dcdb0bf25b	[RISCV] Fix an inconsistency with compatible load/store handling Once we've computed the incoming predecessor state, we should use the same compatibility check with knowledge of MI as we did in phase 2 in order to be consistent across all phases. Differential Revision: https://reviews.llvm.org/D126574	2022-06-02 08:03:51 -07:00
jacquesguan	5482ae6328	[LegalizeTypes][VP] Add widen and split support for VP FP integer casting op. This patch adds widen and split support for VP_FPTOSI, VP_FPTOUI, VP_SITOFP and VP_UITOFP. Differential Revision: https://reviews.llvm.org/D126847	2022-06-02 09:05:27 +00:00
jacquesguan	058791d8f2	[LegalizeTypes][VP] Add widen and split support for VP_SIGN_EXTEND and VP_ZERO_EXTEND. Differential Revision: https://reviews.llvm.org/D126442	2022-06-02 02:21:22 +00:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Philip Reames	f15add7d93	[RISCV] Split fixed-vector-strided-load-store.ll so it can be autogened I've gotten tired of updating register allocation changes by hand, let's just autogen this even if we have to duplicate it.	2022-06-01 16:12:35 -07:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit `430ac5c302`. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Craig Topper	aeb27f133a	[RISCV] Fix i64<->f64 and i32<->f32 bitcasts with VLS vectors enabled. We enable a custom handler to optimize conversions between scalars and fixed vectors. Unfortunately, the custom handler picks up scalar to scalar conversions as well. If the scalar types are both legal, we wouldn't match any of the fixed vector cases and would return SDValue() causing the LegalizeDAG to expand the bitcast through memory. This patch fixes this by checking if it's a scalar to scalar conversion and returns `Op` if both types are legal. Differential Revision: https://reviews.llvm.org/D126739	2022-06-01 08:13:49 -07:00
wangpc	57203af167	[RISCV] Set target-abi explicitly to reduce codegen results As mentioned in D125947, we can reduce codegen results by adding an explicit hard single-float ABI. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126640	2022-06-01 13:49:23 +08:00
Philip Reames	33b1be5916	[riscv] add test coverage for fractional lmul w/fixed length vectorization	2022-05-31 10:25:37 -07:00
Craig Topper	1b2de79ff4	[RISCV] Use two ADDIs to do some stack pointer adjustments. If the adjustment doesn't fit in 12 bits, try to break it into two 12 bit values before falling back to movImm+add/sub. This is based on a similar idea from isel. Reviewed By: luismarques, reames Differential Revision: https://reviews.llvm.org/D126392	2022-05-31 10:25:28 -07:00
Craig Topper	80c4cf6369	[RISCV] Fix a few corner case bugs in RISCVMergeBaseOffsetOpt::matchLargeOffset The immediate for LUI is stored as 20-bit unsigned value. We need to sign extend if after shifting by 12 to match the instruction behavior. If we find an LUI+ADDI on RV64, it means the constant isn't a simm32. If it was, we would have emitted LUI+ADDIW from constant materialization. Make sure the constant is a simm32 before folding. This appears to match gcc. A future patch will add support for LUI+ADDIW on RV64.	2022-05-31 09:50:54 -07:00
Craig Topper	3b5456d5f0	[RISCV] Pre-commit tests for D126635. NFC	2022-05-31 09:49:46 -07:00
eopXD	2cadf84fc8	[RISCV] Pass OptLevel to `RISCVDAGToDAGISel` correctly Originally, `OptLevel` isn't passed into the `MachineFunctionPass`. This lets the default parameter of `SelectionDAGISel`, which is `CodeGenOpt::Default`, be passed in. OptLevelChanger captures the optimization level with the parameter, and rather not the value within `TargetMachine`. This lets the optimization be unintentionally overwriten if other value than `CodeGenOpt::Default` passed. This patch fixes this by passing the optimization level rather than using the default value. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126641	2022-05-30 17:22:50 -07:00
eopXD	51002bdb5e	[RISCV] Precommit test case to show bug in RISCVISelDagToDag The optimization level should not be restored into O2. This is a pre-commit test case to show fix in D126641. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126677	2022-05-30 15:59:20 -07:00
Ping Deng	88af539c0e	[RISCV] Support VP_REDUCE_MUL mask operation Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126520	2022-05-30 03:05:39 +00:00
Ping Deng	083798e270	[LegalizeTypes][VP] Add integer promotion support for vp.fptosi/vp.fptoui Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125760	2022-05-30 03:05:39 +00:00
Craig Topper	6a6cf2e28d	[RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y) This pattern is what we get after DAG combine for C code like this. short ptr1, ptr2, *ptr3; unsigned diff = ptr1 - ptr2; return ptr3[diff]; Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126588	2022-05-29 18:24:07 -07:00
Craig Topper	e642d0ea21	[RISCV] Add test cases showing missed opportunity to use shXadd.uw. NFC The tests here show the codegen for something like this C code. unsigned diff = ptr1 - ptr2; return ptr3[diff]; The pointer difference is truncated to 32-bits before being used again as an index. In SelectionDAG this appears as an AND between a SRL and a SHL. DAGCombiner will remove the shifts leaving only an AND. The Mask now has 1,2, or 3 trailing zeros and 31, 30, or 29 leading zeros. We end up falling back to constant materialization to create this mask. We could instead use srli followed by slli.uw. Or since we have an add, we can use srli followed by shXadd.uw. Differential Revision: https://reviews.llvm.org/D126589	2022-05-29 18:22:55 -07:00
Philip Reames	85b4470035	[RISCV] Allow PRE of vsetvli involving non-1 LMUL This is a follow up to address a review comment from D124869. When deciding whether to PRE a vsetvli, we can allow non-LMUL1 vsetvlis. Differential Revision: https://reviews.llvm.org/D126563	2022-05-27 15:49:41 -07:00
Craig Topper	542a83c362	[RISCV] Correct load/store alignments in sink-splat-operands.ll. NFC These should be aligned to the natural alignment of the element. Probably copy/paste mistake from the i32 tests. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126567	2022-05-27 14:39:31 -07:00
Philip Reames	d4905a7b20	[RISCV] Add a vsetvli PRE test involving non-1 LMUL	2022-05-27 13:16:05 -07:00
Craig Topper	aaad507546	[RISCV] Return false from isOffsetFoldingLegal instead of reversing the fold in lowering. When lowering GlobalAddressNodes, we were removing a non-zero offset and creating a separate ADD. It already comes out of SelectionDAGBuilder with a separate ADD. The ADD was being removed by DAGCombiner. This patch disables the DAG combine so we don't have to reverse it. Test changes all look to be instruction order changes. Probably due to different DAG node ordering. Differential Revision: https://reviews.llvm.org/D126558	2022-05-27 11:05:18 -07:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Philip Reames	8a3b6ba756	[RISCV] Add a subtarget feature to enable unaligned scalar loads and stores A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature. There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to actually been standardized, and b) isn't quite what we want here anyway due to the perf comment. Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them. Differential Revision: https://reviews.llvm.org/D126085	2022-05-26 15:25:47 -07:00
Craig Topper	460781feef	[LegalizeTypes] Fix bug in expensive checks verification With a fix for an expensive checks build failure exposed by new RISC-V tests. Something about expanding two rotates in type legalization caused a change in the remapping tables that the expensive checks verifying wasn't expecting. See comment in the code for how it was fixed. Tests came from this commit that exposed the bug [RISCV] Add test cases showing failure to remove mask on rotate amounts. If the masking AND has multiple users we fail to remove it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D126036	2022-05-26 13:13:32 -07:00
Philip Reames	afe49934a6	[RISCV] Allow compatible VTYPE in AVL Reg Forward cases During insertion of VSETVLI, we have two related bits of code which decide whether we can reuse a previous vsetvli result. As was pointed out in the original review, these cases can allow any prior state for which we know that VL is the same for any value of AVL. This was originally separated out of a desire for separate tests and review. As it turns out, finding a test case for this has been quite challenging. Most of the cases I tried, we manage to already get through other chains of logic. We do have one correct test change, but that only exercises one of the two changes. Differential Revision: https://reviews.llvm.org/D126400	2022-05-26 08:50:35 -07:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle reapply `62a9b36fcf` and fix module build failue: 1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def MachineCycleInfoWrapperPass is a anylysis pass, should not be there. 2: move the definition for MachineCycleInfoPrinterPass to cpp file. Otherwise, there are module conflicit for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loop. Natural loop analysis (MachineLoop) can not return correct loop depth if the loop is irreducible loop. And MachineSink is sensitive to the loop depth, see MachineSinking::isProfitableToSinkTo(). This patch tries to use MachineCycle so that we can handle irreducible loop better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Lian Wang	8aa6b05deb	[LegalizeTypes][VP] Add widen and split support for VP_TRUNCATE Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125950	2022-05-26 02:03:27 +00:00
Haocong.Lu	085acede57	[RISCV][NFC] Remove solved TODO for combining constant shifts Reviewed By: benshi001, asb Differential Revision: https://reviews.llvm.org/D126185	2022-05-26 09:55:19 +08:00
Philip Reames	be2cb824d0	[riscv] Remove mutation of prior vsetvli from insertion dataflow This moves mutation entirely out of the main algorithm. The immediate trigger is that we hit another case of the same issue I thought we'd fixed in `72925d9`. It turns out we hadn't considered the cross block case. As a brief summary, the issue being fixed is that if we mutate a previous vsetvli in phase 3, there's a possibility that some later use of that vsetvli changes "compatibility". In the cross_block_mutate test, this later vsetvli occurs in another block (and is thus visit order dependent too!). This causes us to fail strict asserts. (To be explicit, the current on by default workaround should compensate. It's only when we turn that off that we have problems.) Now, I want to explicitly call out an alternate workaround. We could leave the mutation in phase 3, and simplify restrict it to the case where the previous vsetvli's GPR result is unused. That covers the case we've actually seen. (I'll note that codegen regressions with a simple form of this were significant. We might have to check specifically for the use outside block case to keep them reasonable, which complicates the workaround slightly.) Personally, I'm at the point where I want the mutation pulled out just for robustness sake. I'm worried there's yet one more form of this bug we haven't thought about. The other motivation for this change is that it does give us a couple of minor codegen wins. None appear to be hugely significant, but improvements never hurt right? Differential Revision: https://reviews.llvm.org/D125270	2022-05-25 10:51:14 -07:00
Craig Topper	172149e98c	[RISCV] Preserve fast math flags in lowerVPOp. Update test to check MIR after finalize-isel instead of debug output. This is of course not the only place we should preserve FMF, but it's the most obvious one. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126306	2022-05-25 09:16:07 -07:00
Philip Reames	2a3b6f2cba	[RISCV] Hoist VSETVLI vlmax, vtype out of scalable loops This is a straight forward extension of the PRE transform introduced in D124869 to handle the VLMAX case. The test changes here look quite positive. This surprised me until I realized that all the tests are using @llvm.vscale to figure out the VLMAX, not the llvm.riscv.vsetvlmax intrinsic. If they'd used the later, these would have been full redundancy cases and fully handled by the data flow. I'm not really sure if use of vscale here is representative or not. If it is, we should probably look at using VSETVLI to lower vscale rather than a raw read of vlenb and some math. Differential Revision: https://reviews.llvm.org/D126338	2022-05-25 08:00:27 -07:00
Philip Reames	a4a438f05a	[riscv] Add coverage for fixed length vector loops using LMUL	2022-05-25 07:42:21 -07:00
Lewis Revill	29a5a7c6d4	[RISCV] Add pre-emit pass to make more instructions compressible When optimizing for size, this pass searches for instructions that are prevented from being compressed by one of the following: 1. The use of a single uncompressed register. 2. A base register + offset where the offset is too large to be compressed and the base register may or may not already be compressed. In the first case, if there is a compressed register available, then the uncompressed register is copied to the compressed register and its uses replaced. This is only done if there are enough uses that code size would be improved. In the second case, if a compressed register is available, then the original base register is copied and adjusted such that: new_base_register = base_register + adjustment base_register + large_offset = new_base_register + small_offset and the uses of the base register are replaced with the new base register. Again this is only done if there are enough uses for code size to be improved. This pass was authored by Lewis Revill, with large offset optimization added by Craig Blackmore. Differential Revision: https://reviews.llvm.org/D92105	2022-05-25 09:25:02 +01:00
Craig Topper	66db5312bd	[RISCV] Fix vnsrl/vnsra isel patterns that are dropping VL. We were incorrectly using VLMax instead of the passed VL. Reviewed By: khchen, reames Differential Revision: https://reviews.llvm.org/D126319	2022-05-24 21:38:59 -07:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Philip Reames	948d931323	[RISCV] Ensure the forwarded AVL register is alive When the AVL value does not fit in 5 bits, the register in which this value is stored may be dead when we want to forward it. This patch ensure the kill flags on the register are cleared before forwarding. Patch by: loralb Differential Revision: https://reviews.llvm.org/D125971	2022-05-24 15:07:42 -07:00
Philip Reames	a95ecb20bc	[RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination. The motivating example comes from the fixed length vectorization of a simple loop such as: for (unsigned i = 0; i < a_len; i++) a[i] += b; Without this change, the core vector loop and preheader is as follows: .LBB0_3: # %vector.ph andi a1, a6, -8 addi a4, a0, 16 mv a5, a1 .LBB0_4: # %vector.body # =>This Inner Loop Header: Depth=1 addi a3, a4, -16 vsetivli zero, 4, e32, m1, ta, mu vle32.v v8, (a3) vle32.v v9, (a4) vadd.vx v8, v8, a2 vadd.vx v9, v9, a2 vse32.v v8, (a3) vse32.v v9, (a4) addi a5, a5, -8 addi a4, a4, 32 bnez a5, .LBB0_4 The key thing to note here is that, the execution of the vsetivli only needs to happen once. Since there's no tail folding happening here, the value of the vector configuration registers are invariant through the loop. After this patch, we hoist the configuration into the preheader and perform it once. .LBB0_3: # %vector.ph andi a1, a6, -8 vsetivli zero, 4, e32, m1, ta, mu addi a4, a0, 16 mv a5, a1 .LBB0_4: # %vector.body # =>This Inner Loop Header: Depth=1 addi a3, a4, -16 vle32.v v8, (a3) vle32.v v9, (a4) vadd.vx v8, v8, a2 vadd.vx v9, v9, a2 vse32.v v8, (a3) vse32.v v9, (a4) addi a5, a5, -8 addi a4, a4, 32 bnez a5, .LBB0_4 Differential Revision: https://reviews.llvm.org/D124869	2022-05-24 14:56:01 -07:00

1 2 3 4 5 ...

1891 Commits