Only one caller didn't already have an MVT and that was easy to
fix. Since the return type is MVT and it uses MVT::getVectorVT,
taking an MVT as input makes the most sense.
This custom isel was used to split the lo12 bits of the imm so that
they could be folded into load/store addresses via a post-isel
peephole.
This patch instead splits the immediate during isel and folds the
lo12, removing the need for the post-isel peephole to do anything.
After this we'll be able to remove the post-isel peephole.
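As a rough sketch of the split (illustrative standalone code, not the patch's; it assumes an arithmetic right shift for the sign extension):
```
#include <assert.h>
#include <stdint.h>

/* Split an offset into a sign-extended low 12-bit piece (foldable
   into a load/store address) and a remainder with zero low bits. */
int main(void) {
  int32_t imm = 0x12345;
  int32_t lo12 = (int32_t)((uint32_t)imm << 20) >> 20;
  int32_t hi = imm - lo12; /* low 12 bits are now zero */
  assert(hi + lo12 == imm);
  assert(lo12 >= -2048 && lo12 <= 2047);
  assert((hi & 0xFFF) == 0);
  return 0;
}
```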
Reviewed By: asb, luismarques
Differential Revision: https://reviews.llvm.org/D129450
This restores the old behavior before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.
To fix this correctly, we need to consider alignment of the load
we'd be replacing, but that's not possible in the current interface.
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.
This patch adds support for using the active lane mask for control
flow by:
1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using it for the
conditional branch.
I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.
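A rough scalar model of the resulting control flow (illustrative names, not the vectorizer's actual output):
```
#include <stdbool.h>
#include <stdint.h>

/* Lane 0 of get.active.lane.mask(base, tc) is just base < tc. */
static bool first_lane_active(uint64_t base, uint64_t trip_count) {
  return base < trip_count;
}

/* The latch branches on the first lane of the mask computed for the
   next iteration instead of comparing the IV with the trip count. */
void vector_loop(uint64_t trip_count, uint64_t vf) {
  uint64_t iv = 0;
  bool more = first_lane_active(iv, trip_count);
  while (more) {
    /* ... masked vector body predicated on the current lane mask ... */
    iv += vf;
    more = first_lane_active(iv, trip_count); /* mask for next iteration */
  }
}
```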
Differential Revision: https://reviews.llvm.org/D125301
Including the following opcodes:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871
Somehow some tests failed in our downstream because the VFMV+FSD
pattern was matched first. Both the FSD and VSE patterns have the same
complexity, while FSD is matched before VSE in the generated
matcher table.
This problem only occurs in our downstream (so, sorry, I can't
provide a test here), and increasing the value of `AddedComplexity`
fixes it.
Reviewed By: StephenFan, craig.topper
Differential Revision: https://reviews.llvm.org/D129360
I think it only makes sense to return true here if we aren't going
to turn around and create a constant pool for the immediate.
I left out the check for useConstantPoolForLargeInts() thinking
that even if you don't want the compiler to create a constant pool,
you might still want to avoid materializing an integer that is
already available in a global variable.
Test file was copied from AArch64/ARM and has not been committed yet.
Will post separate review for that.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D129402
We have custom isel that tries to select the Lo12 bits using a
separate ADDI that can later be folded into the load/store address
by the post-isel peephole.
This patch disables this if the load/store already had a non-zero
offset. A non-zero offset implies that CodeGenPrepare split several
large offsets used by different loads and stores into a common large
offset and multiple small offsets that could be folded. Folding more
of the lo12 bits changes this common offset by increasing the small
offsets. While this can save an instruction to materialize the common
offset, it can also prevent the small offsets from fitting in a
compressed load/store instruction.
Removing this also simplifies the last piece needed to fold the custom
isel for add into SelectAddrRegImm and remove the post-isel peephole.
The motivation here is to a) bring us closer into alignment with AArch64 under the assumption that codepath is better tested, and b) simplify pattern matching in an upcoming change.
The immediate impact is a significant IR reduction but a fairly minimal change in the generated assembly. Due to a difference in expansion behavior we get a saturating add vs a non-saturating one for the old code, but that's about it. This difference comes down to different handling of overflow, which doesn't seem to be possible here anyway, so the assembly codegen is arguably a minor regression. I don't expect that to matter in practice.
Differential Revision: https://reviews.llvm.org/D129221
This allows fixed length vectors involving splats on the LHS to commute into the _vx form of the instruction. Oddly, the generic canonicalization rules appear to catch the scalable vector cases. I haven't fully dug in to understand why, but I suspect it's because of a difference in how we represent splats (splat_vector vs build_vector).
Differential Revision: https://reviews.llvm.org/D129302
The current implementation renames both registers in a store instruction
if we store the base address into memory via the same base register. That
is OK if the offset is 0; however, the transform is wrong if the offset
isn't 0. A small example:
sd a0, 808(a0)
We should not transform this into:
addi a2, a0, 768
sd a2, 40(a2)
Instead, we should rename only the base address:
addi a2, a0, 768
sd a0, 40(a2)
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128876
RVV doesn't have an immediate field for memory addressing. Currently
we build MachineInstrs in PEI to compute stack offsets for RVV
load/store instructions. These instructions are added too late to be
optimized by passes such as CSE and LICM.
This patch prevents FrameIndex SDNodes from being matched in RVV load/store
instruction selection patterns, so the FrameIndex SDNodes are instead
selected as `ADDI GPR, targetframeindex`.
This change has two advantages:
1. Stack object address computation can be optimized by machine function
passes.
2. Since the ADDI instruction's destination register can be used as a
temp register, we can save an emergency spill slot.
Differential Revision: https://reviews.llvm.org/D128187
This handles the code we get for:
int foo(unsigned x, int *y) {
  return y[x >> 3];
}
The srl and shl implied by the array index will be combined to
form (srl (and X, C2), C1). We need to reverse this to get back
the shl so it can be folded into SHXADD.
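A quick check of the equivalence being reversed; for y[x >> 3] with 4-byte int elements, C2 is 0xFFFFFFF8 and C1 is 1 (an illustrative standalone check, not compiler code):
```
#include <assert.h>
#include <stdint.h>

/* (x >> 3) << 2 is the byte offset for y[x >> 3]; the DAG turns it
   into ((x & 0xFFFFFFF8) >> 1), which this patch converts back. */
int main(void) {
  for (uint32_t x = 0; x < 100000; ++x)
    assert(((x >> 3) << 2) == ((x & 0xFFFFFFF8u) >> 1));
  return 0;
}
```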
Some more complex cases require checking the relationship of
operands on different nodes of the match. They also require
additional instructions to be created. Using a ComplexPattern
gives us that flexibility.
I'll be adding another pattern in a future patch.
Computing a scalable offset needs up to two scratch registers. We add
scavenging spill slots according to the result of `RISCV::isRVVSpill`
and `RVVStackSize`. Since ADDI is not included in `RISCV::isRVVSpill`,
PEI doesn't add scavenging spill slots for the scratch registers when
ADDI is used to compute scalable stack offsets.
The ADDI instruction has a destination register which can be used as
a scratch register, so one scavenging spill slot is sufficient for
computing scalable stack offsets.
Differential Revision: https://reviews.llvm.org/D128188
This handles the code we get for:
int foo(int* x, unsigned y) {
  return x[y >> 1];
}
The shift right and the shl will get DAG combined into
(shl (and X, 0xfffffffe), 1). We have custom isel to match the
shl+and, but with Zba the (add (shl X, 1), Y) part will get
matched and leave the and to be iseled by itself. This commit
adds a larger pattern that includes the and.
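A quick check of the equivalence for this example; with 4-byte int elements the byte offset is (y >> 1) << 2 (illustrative standalone check):
```
#include <assert.h>
#include <stdint.h>

/* The DAG combines ((y >> 1) << 2) into ((y & 0xFFFFFFFE) << 1);
   the larger pattern now matches the whole thing, including the and. */
int main(void) {
  for (uint32_t y = 0; y < 100000; ++y)
    assert(((y >> 1) << 2) == ((y & 0xFFFFFFFEu) << 1));
  return 0;
}
```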
This allows us to fold global and constant pool addresses into
load/store during isel instead of in the post-isel peephole. I
did not copy the alignment check for ConstantPoolSDNode because it
wasn't tested.
This is a step towards being able to remove the post-isel
peephole.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128738
where C2 has 32 leading zeros and C3 trailing zeros.
When the shl is used by an add and C is 1, 2, or 3, we end up matching
(add (shl X, C), Y) first. This leaves an and with a constant that
is harder to materialize.
Similar for SH2ADD and SH3ADD.
This is what we get from:
int foo(int* x, unsigned y) {
  return x[y >> 1];
}
This allows us to avoid materializing 0xFFFFFFFE into a register.
This reverts commit 7af3d4ab3d.
RISC-V reverted the shrink wrap patch because of bug 53662. Since the bug
was fixed by D123679, this commit re-enables it.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128965
getPointerAlignment and ConstantPoolSDNode::getAlign only consider
the alignment of the object. If we already have a non-zero offset
into the object, that offset may have reduced the effective alignment.
Since the base pointer will become an LUI with the old offset, we
need to be sure the new offset fits in the alignment of the address
that will be used to create the LUI immediate.
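A hedged sketch of the underlying arithmetic (the helper name is illustrative, not the actual API): the usable alignment of base + offset is capped by the lowest set bit of the offset.
```
#include <assert.h>
#include <stdint.h>

/* The object's alignment alone is not enough: an existing non-zero
   offset caps the effective alignment at its lowest set bit. */
static uint64_t effective_align(uint64_t obj_align, uint64_t offset) {
  if (offset == 0)
    return obj_align;
  uint64_t off_align = offset & (0 - offset); /* lowest set bit */
  return off_align < obj_align ? off_align : obj_align;
}

int main(void) {
  assert(effective_align(16, 0) == 16);
  assert(effective_align(16, 4) == 4); /* offset 4 reduces it to 4 */
  return 0;
}
```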
I'm not sure it is possible to have a non-zero offset in the
GlobalAddressSDNode or ConstantPoolSDNode at this point today so this
may only be a theoretical bug.
Differential Revision: https://reviews.llvm.org/D129006
This fixes test/DebugInfo/Generic/accel-table-hash-collisions.ll and
cross-cu-inlining.ll when the default triple is riscv. llvm-dwarfdump
--apple-names does not resolve R_RISCV_{ADD,SUB}32 in .apple_names and
.apple_types, and having ADD/SUB causes the decoding failure `Atom[0]:
Error extracting the value`.
We know which instruction we're emitting, so it's OK to directly
encode X0 into the instruction. We only need to create a copy when
a constant 0 is selected without context of which instruction uses it.
Only handle immediates that would produce an ADDI or ADDIW of Lo12
as the final instruction in their materialization.
As the test changes show, this excludes immediates that materialize
with lui+addiw, which is not the same as lui+addi.
These should contain the same thing, but we aren't consistent about
which we use.
Since we call ReplaceNode, it seems more correct to use the initial VT.
Similar for a subtract with a constant left hand side.
(sra (add (shl X, 32), C1<<32), 32) is the canonical IR from InstCombine
for (sext (add (trunc X to i32), C1) to i64).
For RISC-V, we should lower this as addiw, which means turning it into
(sext_inreg (add X, C1)).
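A small check of the identity, written with unsigned arithmetic to keep the C well-defined (illustrative):
```
#include <assert.h>
#include <stdint.h>

/* (sra (add (shl X, 32), C1<<32), 32) equals the sign-extended
   32-bit sum of X and C1, which is exactly what addiw computes. */
int main(void) {
  int64_t x = 0x123456789LL, c1 = 1000;
  int64_t lhs = (int64_t)(((uint64_t)x + (uint64_t)c1) << 32) >> 32;
  int64_t rhs = (int32_t)((uint32_t)x + (uint32_t)c1); /* addiw */
  assert(lhs == rhs);
  return 0;
}
```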
There is an existing DAG combine to convert back to (sext (add (trunc X
to i32), C1) to i64), but it requires isTruncateFree to return true
and for i32 to be a legal type, as it uses sign_extend and truncate
nodes. So that doesn't work for RISC-V.
If the outer sra happens to be used by a shl by constant, it will be
folded and the shift amount of the sra will be changed before we
can do our own DAG combine. This requires us to match the more
general pattern and restore the shl.
I had wanted to do this as a separate (add (shl X, 32), C1<<32) ->
(shl (add X, C1), 32) combine, but that hit an infinite loop for some
values of C1.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128869
The sext_inreg can often be folded into an earlier instruction by
using a W instruction. The sext_inreg also works better with our ABI.
This is one of the steps to improving the generated code for this https://godbolt.org/z/hssn6sPco
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128843
This reverts commit 755c84c62c. A bug was reported on the original review thread (https://reviews.llvm.org/D128006), and on inspection this patch is simply wrong. It needs to be checking for VLInBytes, not MaxVL. These happen to be the same when using AVL=VLMAX (which is quite common), but this does not fold when AVL != VLMAX.
Previously we iseled this to a pair of ADDIs and relied on a post-isel
peephole to fold one of the ADDIs into the load/store. Now
we split the immediate into two parts the same way isel does and fold
one of the pieces. If the add has a non-memory use, it will be
selected separately and the larger piece will CSE with the ADDI we
created for the memory use.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128741
This implements known bits for READ_VLENB using any information known about the minimum and maximum VLEN. There's an additional assumption that VLEN is a power of two.
The motivation here is mostly to remove the last use of getMinVLen, but while I was here, I decided to also fix the bug for VLEN < 128 and handle max from command line generically too.
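A toy model of the resulting known bits (illustrative, not the backend's KnownBits code): since vlenb is a power of two within its bounds, every bit below the minimum's bit and above the maximum's bit is known zero.
```
#include <assert.h>
#include <stdint.h>

/* vlenb = VLEN/8 is a power of two with minv <= vlenb <= maxv. */
static uint64_t known_zero(uint64_t minv, uint64_t maxv) {
  return (minv - 1) | ~(2 * maxv - 1);
}

int main(void) {
  /* e.g. VLEN in [64, 512] gives vlenb in {8, 16, 32, 64} */
  uint64_t kz = known_zero(8, 64);
  for (uint64_t v = 8; v <= 64; v <<= 1)
    assert((v & kz) == 0); /* no possible value hits a known-zero bit */
  return 0;
}
```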
Differential Revision: https://reviews.llvm.org/D128758
The pass was previously limited to LUI+ADDI being used by a single
instruction.
This patch allows the pass to optimize multiple memory operations
that use the same offset. Each of them will receive a separate %lo
relocation. My main motivation is to handle a read-modify-write
where we have a load and store to the same address, but I didn't
restrict it to that case.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128599
Including the following opcodes:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871
This extends the existing cost model for reductions for scalable vectors.
The existing cost model assumes that reductions are roughly logarithmic in cost for unordered variants and linear for ordered ones. This change keeps that same basic model, and extends it out to the maximum number of elements a scalable vector could possibly have.
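A sketch of that cost shape (illustrative names and constants, not the actual TTI code):
```
#include <stdint.h>

/* Unordered reductions take ~log2(N) combining steps; ordered ones
   take ~N steps. For scalable vectors, N is the maximum element
   count the type could have at the largest possible VLEN. */
static unsigned log2_ceil(uint64_t n) {
  unsigned r = 0;
  while ((1ULL << r) < n)
    ++r;
  return r;
}

uint64_t unordered_reduce_cost(uint64_t max_elts) {
  return log2_ceil(max_elts);
}

uint64_t ordered_reduce_cost(uint64_t max_elts) {
  return max_elts; /* linear in VL, hence strongly biased against */
}
```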
This results in costs which aren't terribly high for unordered reductions, but are for ordered ones. This seems about right; we want to strongly bias away from using scalable ordered reductions if the cost might be linear in VL.
Differential Revision: https://reviews.llvm.org/D127447
SelectBaseAddr was a minor convenience to use since it already
existed for vector load/store. D128187 is going to remove the other
uses of SelectBaseAddr, so it has less reason to exist.
This patch removes the dependency on SelectBaseAddr and adds a new
SelectAddrFrameIndex to share some code with SelectFrameAddrRegImm.
LoopVectorizer uses getVScaleForTuning for deciding how to discount the cost of a potential vector factor by the amount of work performed. Without the callback implemented, the vectorizer was defaulting to an estimated vscale of 1. This results in fixed vectorization looking falsely profitable (since it used the command line VLEN).
The test change is pretty limited since a) we don't have much coverage of the vectorizer with scalable vectors at all, and b) what little coverage we have mostly uses i64 element types. There's a separate issue with <vscale x 1 x i64> which prevents us from getting to this stage of costing, and thus only the one test explicitly written to avoid that is visible in the diff. However, this is actually a very wide impact change as it changes the practical vectorization result when both fixed and scalable is enabled to scalable.
As an aside, I think the vectorizer is a little too strongly biased towards scalable when both are legal, but we can explore that separately. For now, let's just get the cost model working the way it was intended.
Differential Revision: https://reviews.llvm.org/D128547
These are now only used in the implementation of getRealMinVLen, getRealMaxVLen, and useRVVForFixedLengthVectors; make them protected to discourage new users.
The comments in the existing code appear to predate the standardization of the +v extension. In particular, the specification *does* provide a bound on the maximum value VLEN can take. From what I can tell, the LMUL comment was simply a misunderstanding of what this API returns.
This API returns the maximum value that vscale can take at runtime. This is used in the vectorizer to bound the largest scalable VF (e.g. LMUL in RISCV terms) which can be used without violating memory dependence.
Differential Revision: https://reviews.llvm.org/D128538
getRealMaxVLen returns an upper bound on the value of VLEN. We can use this upper bound (which, unless explicitly set on the command line, results in an e8 MaxVLMax much greater than 256) instead of explicitly handling the unknown case separately from the case of a bound greater than 256.
Note as well that this code already implicitly depends on a capped value for VLEN. If infinite VLEN were possible, then 16-bit indices wouldn't be enough.
This doesn't change behavior, it just makes it slightly more obvious what's
going on. Note that getRealMinVLen is always >= getMinRVVVectorSizeInBits.
The first case is a bit tricky, as you have to know that
getMinRVVVectorSizeInBits returns 0 when not set, and thus is equivalent
to the else value clause. The new code structure makes it more obvious we
return 0 unless using RVV for fixed length vectors.
We currently split the immediate almost equally between two addis.
If the immediate is odd, it won't be split exactly equal.
This patch instead gives one addi an immediate of 2047 or -2048 and the
other gets the remainder. If the original immediate is near -2049 or 2048,
this might allow the use of c.addi for the addi that receives the
smaller immediate.
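A sketch of the new split (illustrative):
```
#include <assert.h>
#include <stdint.h>

/* Give the first addi 2047 (or -2048) and the second the remainder;
   near the edge of the simm12 range the remainder fits c.addi. */
int main(void) {
  int32_t imm = 2052; /* just above the simm12 range */
  int32_t first = imm > 0 ? 2047 : -2048;
  int32_t second = imm - first; /* 5: fits c.addi's simm6 */
  assert(first + second == imm);
  assert(second >= -32 && second <= 31);
  return 0;
}
```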
Reviewed By: asb, luismarques
Differential Revision: https://reviews.llvm.org/D128500
This patch adds 3 new _VL RISCVISD opcodes to represent VFMA_VL with
different portions negated. It also adds a DAG combine to peek
through FNEG_VL to create these new opcodes.
This is modeled after similar code from X86.
This makes the isel patterns more regular and reduces the size of
the isel table by ~37K.
The test changes look like regressions, but they point to a bug that
was already there. We aren't able to commute a masked FMA instruction
to improve register allocation because we always use a mask undisturbed
policy. Prior to this patch we matched two multiply operands in a
different order and hid this issue for these test cases, but a different
test still could have encountered it.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D128310
A forward abstract state can be in the special SEWLMULRatioOnly state, which means we're not allowed to inspect its fields. The scalar-to-vector move case was missing a guard, and we'd crash on an assert. Test cases included.
According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarly, mf4 is not supported for i16/f16, nor mf2 for i32/f32.
Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock), this means we
need to disable any type with MinNumElements==1.
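A worked instance of that formula (illustrative):
```
#include <assert.h>

/* For <vscale x 1 x i8>: LMUL = (1 * 8) / 64 = 1/8, i.e. mf8, which
   the spec disallows when ELEN is 32. Every MinNumElements==1 type
   with an element size <= 32 bits lands in a fractional LMUL the
   same way. */
int main(void) {
  int rvv_bits_per_block = 64;
  int min_num_elements = 1, element_size = 8;
  assert(rvv_bits_per_block / (min_num_elements * element_size) == 8);
  return 0;
}
```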
For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D128286
This adds the macrofusion plumbing and support for fusing LUI+ADDI(W).
This is similar to D73643, but handles a different case. Other cases
can be added in the future.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128393
This adds RISCVISD opcodes for LA, LA_TLS_IE, and LA_TLS_GD to
remove the creation of MachineSDNodes from get*Addr. This makes the
code consistent with the previous patches that added RISCVISD::HI,
ADD_LO, LLA, and TPREL_ADD.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D128325
Put it before the VL instead of as the first operand. I want to add
passthru to more operands, but the commutable ones like VADD_VL
require the commutable operands to be operand 0 and 1. So we can't
have the passthru as operand 0 for those.
The failure that caused the previous revert has been fixed
by https://reviews.llvm.org/D126048
Original commit message:
RVV makes heavy use of subregisters due to LMUL>1 and segment
load/store tuples. Enabling subregister liveness tracking improves the quality
of the register allocation.
I've added a command line option that can be used to turn it off if it
causes compile time or functional issues. I used that option to keep the
old behavior for one interesting test case that was testing register
allocation.
Reviewed By: kito-cheng
Differential Revision: https://reviews.llvm.org/D128016
Use it in place of VSELECT_VL+VRGATHER*_VL.
This simplifies the isel patterns.
Overall, I think trying to match select+op to create masked instructions
in isel doesn't scale. We either need to do it in DAG combine, a pre-isel
peephole, or a post-isel peephole. I don't yet know which is the right
answer, but for this case it seemed best to be able to request the
masked form directly from lowering.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D128023
This was a bug introduced in d764aa. A pointer type is not a primitive type, and thus we were ending up dividing by zero when computing VLMax.
Differential Revision: https://reviews.llvm.org/D128219
The code being removed is technically correct; if we end up with two VL=0 instructions next to each other, we can avoid a state transition if the second is a scalar move. However, since both ops are also nops, we should simply delete them instead. As such, this compatibility rule simply complicates the code for no purpose.
When working through correctness issues in this pass, I moved a number of transforms which were phrased as mutating prior vsetvli instructions out of the main data flow because mutating prior instructions can invalidate the running dataflow results in subtle ways. We ended up creating both a prepass and a post-pass.
After consideration, I believe the prepass to be redundant, and this change removes it by folding it back into the data flow via a key conceptual change. Instead of phrasing the mutations on instructions, we can phrase them on abstract states. This avoids the dataflow inconsistency problem mentioned above by simply propagating the potential change forward, and thus reflecting its results in the dataflow. Critically, we do so without modifying existing VSETVLI instructions; some of the data flow steps include non-local IR analysis.
Compile time wise, this removes a linear pass, but has the potential to increase the number of iterations for the data flow to converge. That's not an algorithmic complexity change; the needVSETVLI mechanism has the same effect. In practice, I don't see this triggering more iterations, so I think it's likely to be a net win overall. (I didn't do any careful analysis here; just an impression from glancing at a couple tests.)
This has the potential to produce better results, so this isn't strictly speaking NFC.
Differential Revision: https://reviews.llvm.org/D127870
In D127983, I had flipped from using the computed EEW to using the SEW value pulled from the VSETVLI when checking compatibility. This wasn't intentional, though thankfully it appears to be a non-functional difference. The new code does make an unchecked assumption that the initial SEW operand on the load/store is the EEW. This patch clarifies the assumption and adds an assert to make sure this remains true.
Differential Revision: https://reviews.llvm.org/D128085
Type legalization will convert the bitcast into a vector store and
scalar load.
Instead this patch widens the vector to v8i1 with undef, and bitcasts
it to i8. v8i1->i8 has custom handling for type legalization already to
bitcast to a v1i8 vector and use an extract_element.
The code here was lifted from X86's avx512 support.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D128099
Doing so lets the post-mutation pass leverage the demanded info to rewrite vsetvlis before a store/mask-op to eliminate later vsetvlis.
Sorry for the lack of a store test change; all of my attempts to write something reasonable were already handled by existing logic.
This code should be dead. A simple whole-register copy of an IMPLICIT_DEF is simply an IMPLICIT_DEF of its own. (This would not be true for freeze, but is for copy.) If we find a case which gets here with a vector operand copy of an IMPLICIT_DEF, we most likely have an earlier missed optimization anyway. (The most recent case of this was e6c7a3a, found by Craig during review of this patch.) There might be others, and if so, we'll revisit them individually as regressions are reported.
Differential Revision: https://reviews.llvm.org/D127996
A splat of the values 0 and -1 as sign extended 12 bit immediates are always the same bit pattern regardless of the etype used to perform the operation. As a result, we can sometimes avoid introducing a vsetvli just for the purposes of a splat.
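A quick byte-level check of the claim (illustrative):
```
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A splat of -1 writes the same bytes whether done at e8, e16, or
   e32; the same holds trivially for 0. */
int main(void) {
  uint8_t a[8], b[8], c[8];
  int8_t s8 = -1;
  int16_t s16 = -1;
  int32_t s32 = -1;
  for (int i = 0; i < 8; ++i) memcpy(&a[i], &s8, 1);
  for (int i = 0; i < 4; ++i) memcpy(&b[2 * i], &s16, 2);
  for (int i = 0; i < 2; ++i) memcpy(&c[4 * i], &s32, 4);
  assert(memcmp(a, b, 8) == 0 && memcmp(a, c, 8) == 0);
  return 0;
}
```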
Looking at the diffs, we don't get a huge amount of immediate value out of this. We mostly push the vsetvli one instruction down, usually in front of a vmerge. We also don't get the corresponding fixed length vector cases because VL typically is changed despite the actual bits written being the same. Both of these are areas I plan to explore in future patches.
Interestingly, this makes a great example of why we need the forward and backward implementations to be consistent. Before we merged the demanded field handling, if we implemented only the forward direction, we lost the ability to mutate a prior vsetvli and eliminate a later one entirely. This resulted in practical regressions instead of improvements. It's always nice when practice matches theory. :)
Differential Revision: https://reviews.llvm.org/D128006
This allows computeKnownBits to see the constant being loaded.
This recovers the rv64zbp test case changes from D127520.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127679
Rather than emitting a MachineSDNode from lowering, let isel match it.
This is consistent with the RISCVISD::HI and ADD_LO nodes that were
also added. Having them handled the same way will make D127679 consistent.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127714
Instead add RISCVISD opcodes that will be selected to LUI/ADDI
during isel.
I'm looking into maybe moving doPeepholeLoadStoreADDI into isel.
Having the ADDI as a RISCVISD node will make it visible to isel.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127713
This change merges the logic for reasoning about demanded portions of the VTYPE register between the main dataflow algorithm and the backwards mutation post pass. In the process, we get to delete a bunch of now redundant code.
This should be entirely NFC. I included a slight hack (see TODO) to avoid changing behavior in the post pass while being able to use the generalized logic in the prepass. I will fix the TODO in a separate change once this lands.
Differential Revision: https://reviews.llvm.org/D127983
The costing we use for fixed length vector gather and scatter is to simply count up the memory ops, and multiply by a fixed memory op cost. For scalable vectors, we don't actually know how many lanes are active. Instead, we have to end up making a worst case assumption on how many lanes could be active. In the generic +V case, this results in very high costs, but we can do better when we know an upper bound on the VLEN.
There's some obvious ways to improve this - e.g. using information about VL and mask bits from the instruction to reduce the upper bound - but this seems like a reasonable starting point.
The resulting costs do bias us pretty strongly away from generating scatter/gather for generic +V. Without this, we'd be returning an invalid cost and thus definitely not vectorizing, so no major change in practical behavior expected.
Differential Revision: https://reviews.llvm.org/D127541
If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written.
Differential Revision: https://reviews.llvm.org/D127880
This change just moves some code around, and extracts out a helper function expected to be useful when reusing the demanded field logic in the forward dataflow.
If the merge operand isn't undef, we need to be using tail undisturbed.
It turns out all of our uses of riscv_slidedown_vl use undef, so this
doesn't affect any tests.
The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio.
As an example, consider:
define void @test1(ptr %in, ptr %out) {
entry:
  %0 = load <8 x i16>, ptr %in, align 2
  %1 = sext <8 x i16> %0 to <8 x i32>
  store <8 x i32> %1, ptr %out, align 4
  ret void
}
Without this patch, we get:
  vsetivli zero, 8, e16, mf4, ta, mu
  vle16.v v8, (a0)
  vsetvli zero, zero, e32, mf2, ta, mu
  vsext.vf2 v9, v8
  vse32.v v9, (a1)
  ret
Whereas with the patch we get:
  vsetivli zero, 8, e32, mf2, ta, mu
  vle16.v v8, (a0)
  vsext.vf2 v9, v8
  vse32.v v9, (a1)
  ret
We have rewritten the first vsetvli and thus removed the second one.
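A toy check of the arithmetic that makes this legal: both operations demand the same SEW/LMUL ratio, so VLMAX is unchanged.
```
#include <assert.h>

/* e16 at mf4 and e32 at mf2 both give a SEW/LMUL ratio of 64. */
int main(void) {
  assert(16 * 4 == 32 * 2);
  return 0;
}
```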
As is strongly hinted by the code structure and todos, I am planning on commoning this with all (or most all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of following changes - some NFC reworks, and some reviewed optimization extensions.
Differential Revision: https://reviews.llvm.org/D127780
RISC-V expands register tuple spills into a series of individual register
spills after the register allocation phase via pseudo instruction expansion.
However, part of a register tuple might still be undefined at the point of
the spill, and the machine verifier will complain that the spill instruction
uses an undefined physical register.
The optimal solution would be to do liveness analysis and not emit spills
and reloads for the undefined parts, but accurate liveness info at that
point is not easy to get.
So the suboptimal solution is to still spill and reload the undefined parts,
but add an implicit-use of the super register to the spill instruction. The
machine verifier then only reports a use of an undefined physical register
when the whole super register is undefined, and this behavior is
documented in MachineVerifier::checkLiveness [1].
An example to demonstrate what happens:
```
v10m2 = xxx
# v12m2 not defined yet
PseudoVSPILL2_M2 v10m2_v12m2
...
```
After expansion:
```
v10m2 = xxx
# v12m2 not defined yet
# Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
VS2R_V v10m2
VS2R_V v12m2 # Use undef reg!
```
What this patch did:
```
v10m2 = xxx
# v12m2 not defined yet
# Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r
VS2R_V v10m2 implicit v10m2_v12m2
# Use undef reg (v12m2), but v10m2_v12m2 isn't totally undef, so
# that's OK.
VS2R_V v12m2 implicit v10m2_v12m2
```
[1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127642
The VSETVLIInfo right after a VLEFF/VLSEGFF is currently unknown since
those instructions modify VL. An unknown VSETVLIInfo forces a VSET(I)VLI
to be inserted before the next vector operation. But the next vector
operation after a VLEFF/VLSEGFF may not need a VSET(I)VLI if it uses the
same VTYPE and the resulting vl of the VLEFF/VLSEGFF.
Take the below C code as an example:
  vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
  vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);
The vsetvli insertion pass adds a redundant vsetvli for this.
Assembly result:
  vsetvli a2,a2,e8,m4,ta,mu
  vle8ff.v v28,(a0)
  csrr a3,vl ; redundant
  vsetvli zero,a3,e8,m4,ta,mu ; redundant
  vmseq.vi v25,v28,0
After D126794, VLEFF/VLSEGFF have a def of the resulting VL. The patch
considers there to be a ghost vsetvli right after a VLEFF/VLSEGFF. The
ghost VSET(I)VLI uses the vl output of the VLEFF/VLSEGFF as its AVL and
the same VTYPE as the VLEFF/VLSEGFF. The ghost vsetvli must be redundant,
and we can use it to get the VSETVLIInfo right after VLEFF/VLSEGFF.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127576
Since almost all pseudos have the same form of BaseInstr, we
can just set it as the default value to reduce some lines.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127632
We would previously fail to handle 64-bit PC-relative relocations on
RISCV. This was exposed by trying to build with
`-fprofile-instr-generate`.
The original changes restricted the relocation handling to the text
segment as the paired relocations are undesirable in at least the debug
and .eh_frame sections. We now make this explicit to handle the general
case for the data relocations as well.
It would be preferable to use `R_RISCV_n_PCREL` when available to avoid
an extra relocation.
Differential Revision: https://reviews.llvm.org/D127549
Reviewed By: luismarques, MaskRay
Fixes: #55971
In an effort to make this code easier to read and extend, this splits out helper functions for the transfer function of the data flow. Due to the other results computed during the phases, we can't completely abstract away everything, but we can abstract the actual state transitions.
The motivation here is the following upcoming changes:
* The fault first load patch - already approved, this will be rebased over - adds another case into the transferAfter path.
* An upcoming patch to fold the local prepass back into the main algorithm greatly complicates the transferBefore logic.
Differential Revision: https://reviews.llvm.org/D127761
This is possibly somewhat subjective, but having an explicitly named flag to track the property required and code structure that more closely matches phase 1/2 of the dataflow seems much easier to read.
Differential Revision: https://reviews.llvm.org/D126893
We were incorrectly creating a VRGATHER node with i1 vector type. We
could support this by promoting the mask to i8 and truncating it, but
for now I want to prevent the crash.
Fixes PR56007.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127681
If we defer the mutation of the instruction, we can add the assert discussed in D126921. Once we do that, the API becomes subject to revision - but let's do that in a separate change.
This simplifies the isel code by removing the manual load creation.
It also improves our ability to use 0 strided loads for vector splats.
There is an assumption here that Mask and ShiftedMask constants are
cheap enough that they don't become constant pool loads so that our
isel optimizations involving And still work. I believe those constants
are 3 instructions in the worst case.
The rv64zbp-intrinsic.ll changes are a regression caused by the intrinsic
expansion to RISCVISD nodes also occurring during lowering. So the
optimizations were only happening during the last DAGCombine, which can't
see through the load. I believe we can fix this test by implementing
load. I believe we can fix this test by implementing
TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic
to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not
a ratified extension, I don't view these as blocking this patch.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127520
These methods don't access any state from RISCVInstrInfo. Make them
free functions in the RISCV namespace.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D127583
Our actual lowering for i1 reductions uses ctpop combined with possibly a vector negate and possibly a logic op afterwards. I believe ctpop to be low cost on all reasonable hardware.
The default costing implementation here was returning quite inconsistent costs. and/or reductions were returning very high costs (because we seem to think moving into scalar registers is very expensive?) and others were returning lower, but still too high, costs (because of the assumed tree-reduction strategy). While we should probably improve the generic costing strategy for i1 vectors, let's start by fixing the immediate problem.
Differential Revision: https://reviews.llvm.org/D127511
This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer.
Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bailout on uniform stores it can't use a scatter for, but it doesn't, so lets take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either.
Differential Revision: https://reviews.llvm.org/D127514
We need a preheader and a single latch, but we don't need a dedicated
exit.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127513
This prevents them from being assumed legal by the cost model.
This matches what is done for AArch64 SVE.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D123799
The patch is a replacement for D125199. Giving PseudoReadVL the vtype
raised the worry of computing the same vtype of the VLEFF/VLSEGFF in two
different places, DAGToDAG and InsertVSETVLI. A VLEFF/VLSEGFF MI with a VL
output can still provide the vtype of the VLEFF/VLSEGFF to the users of
its VL.
The patch names the new pseudos with the original VLEFF/VLSEGFF names
suffixed by "_VL" and expands them in the RISCVInsertVSETVLI pass.
This patch also reverts commit 4537aae0d5,
"[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.".
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D126794
For an addition with simm14 or simm15 immediates with 2 or 3 trailing
zero bits respectively, we can use a shXadd instruction and an addi to
do the addition.
This patch teaches RISCVMergeBaseOffset to see through this pattern.
I don't think the sh1add case occurs because we use two addis for that,
but I implemented it for completeness.
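A worked instance for the sh3add case (illustrative values):
```
#include <assert.h>
#include <stdint.h>

/* A simm15 immediate with 3 trailing zeros can be added as
     addi   t0, zero, imm >> 3   ; fits in simm12
     sh3add rd, t0, rs           ; rs + (t0 << 3)              */
int main(void) {
  int32_t imm = 12288; /* simm15 with 3 trailing zeros */
  int32_t lo = imm >> 3;
  assert(lo >= -2048 && lo <= 2047); /* valid addi immediate */
  int64_t rs = 0x1000;
  assert(rs + ((int64_t)lo << 3) == rs + imm);
  return 0;
}
```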
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D127376
In order to make sure the stack pointer is correct through the EH region,
we also need to restore the stack pointer from the frame pointer if we
don't preserve stack space within the prologue/epilogue for outgoing
variables. Normally, just checking whether a variable-sized object is
present is enough, but we also don't preserve that space in the
prologue/epilogue when there are vector objects on the stack.
An example to show what happens:
```
try {
sp adjust for outgoing args. // 1. Sp changed.
func_call // 2. Exception raised
sp restore // Oh, not restored
} catch {
// 3. And now we are here.
}
// 4. Prepare to return: restore the return address from the stack, but... sp is wrong.
// 5. Screw up!
```
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D126861
Adds with immediates in the range [-4096, -2049] or [2048, 4095] get
converted to two ADDIs. Teach RISCVMergeBaseOffset to recognize this
pattern as well.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D126843
LUI+ADDIW always produces a simm32. This allows us to always
fold it into a global offset.
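A small model of why this holds (illustrative, not compiler code):
```
#include <assert.h>
#include <stdint.h>

/* ADDIW sign-extends the low 32 bits of its sum, so any LUI+ADDIW
   result fits in a signed 32-bit immediate. */
static int64_t lui(uint32_t imm20) {
  return (int64_t)(int32_t)(imm20 << 12);
}

static int64_t addiw(int64_t rs1, int32_t simm12) {
  return (int64_t)(int32_t)((uint32_t)rs1 + (uint32_t)simm12);
}

int main(void) {
  for (uint32_t hi = 0; hi < (1u << 20); hi += 4093) {
    int64_t v = addiw(lui(hi), -1365);
    assert(v >= INT32_MIN && v <= INT32_MAX);
  }
  return 0;
}
```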
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D126729
The abstract state used in the data flow should not know anything about the instructions which produced the abstract states. Instead, when comparing two states, we can simply use information about the machine instr at that time.
In the old design, basically any use of the instruction flags on the current state (as opposed to a "Require", aka upcoming, state) would be a bug. We don't seem to actually have any such bugs, but we can make this much more obvious with code structure.
Differential Revision: https://reviews.llvm.org/D126921