llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	5afdceb82b	[RISCV] Add RISCVISD opcode for PseudoLLA. Rather than emitting a MachineSDNode from lowering. Let isel match it. This is consistent with the RISCVISD::HI and ADD_LO nodes that were also added. Having them both the same will make D127679 consistent. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127714	2022-06-16 15:11:03 -07:00
Craig Topper	4191de262f	[RISCV] Don't emit LUI/ADDI MachineSDNodes from getAddr Instead add RISCVISD opcodes that will be selected to LUI/ADDI during isel. I'm looking into maybe moving doPeepholeLoadStoreADDI into isel. Having the ADDI as a RISCVISD node will make it visible to isel. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127713	2022-06-16 14:56:07 -07:00
Philip Reames	2fa2cee6a8	[RISCV] Start merging demanded reasoning - starting with load/stores [nfc] This change merges the logic for reasoning about demanded portions of the VTYPE register between the main dataflow algorithm and the backwards mutation post pass. In the process, we get to delete a bunch of now redundant code. This should be entirely NFC. I included a slight hack (see TODO) to avoid changing behavior in the post pass while being able to use the generalized logic in the prepass. I will fix the TODO in a separate change once this lands. Differential Revision: https://reviews.llvm.org/D127983	2022-06-16 14:34:53 -07:00
Philip Reames	d764aa7fc6	[RISCV] Add cost model for scalable scatter and gather The costing we use for fixed length vector gather and scatter is to simply count up the memory ops, and multiply by a fixed memory op cost. For scalable vectors, we don't actually know how many lanes are active. Instead, we have to end up making a worst case assumption on how many lanes could be active. In the generic +V case, this results in very high costs, but we can do better when we know an upper bound on the VLEN. There's some obvious ways to improve this - e.g. using information about VL and mask bits from the instruction to reduce the upper bound - but this seems like a reasonable starting point. The resulting costs do bias us pretty strongly away from generating scatter/gather for generic +V. Without this, we'd be returning an invalid cost and thus definitely not vectorizing, so no major change in practical behavior expected. Differential Revision: https://reviews.llvm.org/D127541	2022-06-16 14:22:31 -07:00
Philip Reames	89a11ebd8e	[RISCV] Avoid reducing etype just to initialize lane 0 of an undef vector If we're writing to an undef vector (i.e. implicit_def), we can change the value of bits outside the requested write without consequence. This allows us to avoid a VSETVLI just for narrowing the value written. Differential Revision: https://reviews.llvm.org/D127880	2022-06-16 11:14:21 -07:00
Craig Topper	6716195cd7	[RISCV] Merge TIED_TU and TIED instructions for VWADD_W/VWSUB_W by using policy operand. This removes one of the uses of ForceTailUndisturbed.	2022-06-16 10:06:11 -07:00
Philip Reames	6ed81ec164	[RISCV] Reorder function definitions to reduce upcoming diff [nfc]	2022-06-16 09:25:27 -07:00
Craig Topper	912a5172f8	[RISCV] Use TAIL_AGNOSTIC in riscv_fma_vl patterns. We may eventually need tail undisturbed patterns, but we will need a policy operand on the ISD node to communicate it.	2022-06-16 09:09:36 -07:00
Philip Reames	27c61d033f	[RISCV] Split DemandedField logic in advance of reuse in dataflow [nfc] This change just moves some code around, and extracts out a helper function expected to be useful when reusing the demanded field logic in the forward dataflow.	2022-06-16 08:49:41 -07:00
Philip Reames	37fa5850f1	[RISCV] Move getSEWLMULRatio out of VSETVLIInfo [nfc]	2022-06-16 08:40:20 -07:00
Craig Topper	b34e3f40e7	[RISCV] Use TAIL_UNDISTURBED_MASK_UNDISTURBED for riscv_slidedown_vl unless the merge op is undef. If the merge operand isn't undef we need to be using tail undisturbed. Turns out all of our uses of riscv_slidedown_vl use undef so this doesn't affect any tests.	2022-06-16 08:35:27 -07:00
Philip Reames	4a3e46115a	[RISCV] Extend demanded field transform in InsertVSETVLI to VTYPE subfeilds The motivating case, and the only one actually enabled by this patch, is a load or store followed by another op with the same SEW/LMUL ratio. As an example, consider: define void @test1(ptr %in, ptr %out) { entry: %0 = load <8 x i16>, ptr %in, align 2 %1 = sext <8 x i16> %0 to <8 x i32> store <8 x i32> %1, ptr %out, align 4 ret void } Without this patch, we get: vsetivli zero, 8, e16, mf4, ta, mu vle16.v v8, (a0) vsetvli zero, zero, e32, mf2, ta, mu vsext.vf2 v9, v8 vse32.v v9, (a1) ret Whereas with the patch we get: vsetivli zero, 8, e32, mf2, ta, mu vle16.v v8, (a0) vsext.vf2 v9, v8 vse32.v v9, (a1) ret We have rewritten the first vsetvli and thus removed the second one. As is strongly hinted by the code structure and todos, I am planning on communing this with all (or most all?) of the cases from isCompatible used in the forward data flow. This will be done in a series of following changes - some NFC reworks, and some reviewed optimization extensions. Differential Revision: https://reviews.llvm.org/D127780	2022-06-16 08:01:27 -07:00
Guillaume Chatelet	412c788ab0	[NFC][Alignment] Use Align in MCAlignFragment	2022-06-15 12:31:00 +00:00
Kito Cheng	687e56614f	[RISCV] Fixing undefined physical register issue when subreg liveness tracking enabled. RISC-V expand register tuple spilling into series of register spilling after register allocation phase by the pseudo instruction expansion, however part of register tuple might be still undefined during spilling, machine verifier will complain the spill instruction is using an undefined physical register. Optimal solution should be doing liveness analysis and do not emit spill and reload for those undefined parts, but accurate liveness info at that point is not so easy to get. So the suboptimal solution is still spill and reload those undefined parts, but adding implicit-use of super register to spill function, then machine verifier will only report report using undefined physical register if the when whole super register is undefined, and this behavior are also documented in MachineVerifier::checkLiveness[1]. Example for demo what happend: ``` v10m2 = xxx # v12m2 not define yet PseudoVSPILL2_M2 v10m2_v12m2 ... ``` After expansion: ``` v10m2 = xxx # v12m2 not define yet # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r VS2R_V v10m2 VS2R_V v12m2 # Use undef reg! ``` What this patch did: ``` v10m2 = xxx # v12m2 not define yet # Expand PseudoVSPILL2_M2 v10m2_v12m2 to 2 vs2r VS2R_V v10m2 implicit v10m2_v12m2 # Use undef reg (v12m2), but v10m2_v12m2 ins't totally undef, so # that's OK. VS2R_V v12m2 implicit v10m2_v12m2 ``` [1] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/MachineVerifier.cpp#L2016-L2019 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127642	2022-06-15 16:23:39 +08:00
Yeting Kuo	9096a52566	[RISCV] Teach vsetvli insertion to not insert redundant vsetvli right after VLEFF/VLSEGFF. VSETVLIInfos right after VLEFF/VLSEGFF are currently unknown since they modify VL. Unknown VSETVLIInfos make next vector operations needed to be inserted VSET(I)VLI. Actually the next vector operation of VLEFF/VLSEGFF may not need to be inserted VSET(I)VLI if it uses same VTYPE and the resulted vl of VLEFF/VLSEGFF. Take the below C code as an example, vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl); vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl); vsetvli insertion adds a redundant vsetvli for that, Assembly result: vsetvli a2,a2,e8,m4,ta,mu vle8ff.v v28,(a0) csrr a3,vl ; redundant vsetvli zero,a3,e8,m4,ta,mu ; redundant vmseq.vi v25,v28,0 After D126794, VLEFF/VLSEGFF has a define having value of VL. The patch consider there is a ghost vsetvli right after VLEFF/VLSEGFF. The ghost VSET(I)LIs use the vl output of the VLEFF/VLSEGFF as its AVL and same VTYPE of the VLEFF/VLSEGFF. The ghost vsetvli must be redundant, and we could use it to get the VSETVLIInfo right after VLEFF/VLSEGFF. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127576	2022-06-15 13:58:40 +08:00
wangpc	8910349e43	[RISCV][NFC] Set default value for BaseInstr in RISCVVPseudo Since almost all pseudos have the same form of BaseInstr, we can just set it as default value to reduce some lines. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D127632	2022-06-15 10:59:45 +08:00
Craig Topper	5ae3f65cfa	[RISCV] Replace uses of VLOpFrag in VLMax patterns with srcvalue. These are on inner nodes and we're dropping the captured $vl anyway.	2022-06-14 19:19:35 -07:00
Philip Reames	facb96584e	[RISCV] Minor code/comment improvement in prepass of InsertVSETVLI [nfc]	2022-06-14 16:18:11 -07:00
Saleem Abdulrasool	1582bcd003	RISCV: handle 64-bit PCREL data relocations We would previously fail to handle 64-bit PC-relative relocations on RISCV. This was exposed by trying to build with `-fprofile-instr-generate`. The original changes restricted the relocation handling to the text segment as the paired relocations are undesirable in at least the debug and .eh_frame sections. We now make this explicit to handle the general case for the data relocations as well. It would be preferable to use `R_RISCV_n_PCREL` when available to avoid an extra relocation. Differential Revision: https://reviews.llvm.org/D127549 Reviewed By: luismarques, MaskRay Fixes: #55971	2022-06-14 21:39:16 +00:00
Philip Reames	c67c4133ac	[RISCV] Split out transfer function explicitly in VSETVLI insertion dataflow [nfc] In an effort to make this code easier to read and extend, this splits out helper functions for the transfer function of the data flow. Due to the other results computed during the phases, we can't completely abstract away everything, but we can abstract the actual state transitions. The motivation here is the following upcoming changes: * The fault first load patch - already approved, this will be rebased over - adds another case into the transferAfter path. * An upcoming patch to fold the local prepass back into the main algorithm greatly complicates the transferBefore logic. Differential Revision: https://reviews.llvm.org/D127761	2022-06-14 14:07:15 -07:00
Philip Reames	44a0a558dc	[RISCV] Split out subfields in InsertVSETVLI's demanded fields analysis [nfc] At the moment, this just gets the infrastructure in place. Following changes will start using this in non-trivial ways.	2022-06-14 11:35:24 -07:00
Philip Reames	52b166c0de	[RISCV] Split out getEEWForLoadStore [nfc] Mostly about allowing reuse in an upcoming patch, but also makes the code slightly easier to follow.	2022-06-14 10:10:43 -07:00
Philip Reames	7659dc6cdd	[RISCV] simplify emitVSETVLIs handling of vsetvli xN, phi(), vtype case [NFC] This is possibly somewhat subjective, but having an explicitly named flag to track the property required and code structure that more closely matches phase 1/2 of the dataflow seems much easier to read. Differential Revision: https://reviews.llvm.org/D126893	2022-06-14 08:00:24 -07:00
Craig Topper	17457be1c3	[RISCV] Fix use of texternalsym in output pattern where input was tglobaladdr. NFC I don't think the name used in the output pattern is used to control anything about the isel table emission, but it should match the input.	2022-06-13 15:42:42 -07:00
Craig Topper	e4062522d3	[RISCV] Disable matchSplatAsGather for i1 vectors to prevent creating illegal nodes. We were incorrectly creating a VRGATHER node with i1 vector type. We could support this by promoting the mask to i8 and truncating it, but for now I want to prevent the crash. Fixes PR56007. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127681	2022-06-13 13:41:39 -07:00
Craig Topper	bb1a52aa8b	Recommit "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates." With fix for sanitizer build bot failure.	2022-06-13 11:35:44 -07:00
Mitch Phillips	9d99870590	Revert "[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates." This reverts commit `8bbcb98848`. Broke the UBSan bot. More details in https://reviews.llvm.org/D127376.	2022-06-13 10:16:28 -07:00
Philip Reames	aaeb958ced	[RISCV] Mutate instruction after computing transfer rule in InsertVSETVLI [nfc] If we defer the mutation of the instruction, we can add the assert discussed in D126921. Once we do that, the API becomes subject to revision - but let's do that in a separate change.	2022-06-13 09:08:25 -07:00
Craig Topper	cef03e3dcd	[RISCV] Move creation of constant pools from isel to lowering. This simplifies the isel code by removing the manual load creation. It also improves our ability to use 0 strided loads for vector splats. There is an assumption here that Mask and ShiftedMask constants are cheap enough that they don't become constant pool loads so that our isel optimizations involving And still work. I believe those constants are 3 instructions in the worst case. The rv64zbp-intrinsic.ll changes is a regression caused by intrinsics being expanded to RISCVISD also occuring during lowering. So the optimizations were only happening during the last DAGCombine, which can't see through the load. I believe we can fix this test by implementing TargetLowering::getTargetConstantFromLoad for RISC-V or by adding the intrinsic to computeKnownBitsForTargetNode to enable earlier DAG combine. Since Zbp is not a ratified extension, I don't view these as blocking this patch. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127520	2022-06-13 09:07:57 -07:00
Craig Topper	052536b923	[RISCV] Use isShiftedInt to improve readability. NFC	2022-06-12 21:04:45 -07:00
Hubert Tong	775a22e32a	[NFC] Remove unused variable `MF` https://reviews.llvm.org/D127583 removed the only use of this variable and broke builds with warnings-as-errors.	2022-06-12 16:31:55 -04:00
Craig Topper	d63b66840f	[RISCV] Move some methods out of RISCVInstrInfo and into RISCV namespace. These methods don't access any state from RISCVInstrInfo. Make them free functions in the RISCV namespace. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D127583	2022-06-12 10:47:21 -07:00
Fangrui Song	adf4142f76	[MC] De-capitalize SwitchSection. NFC Add SwitchSection to return switchSection. The API will be removed soon.	2022-06-10 22:50:55 -07:00
Philip Reames	536095a27c	[RISCV] Refine costs for i1 reductions Our actual lowering for i1 reductions uses ctpop combined with possibly a vector negate and possibly a logic op afterwards. I believe ctpop to be low cost on all reasonable hardware. The default costing implementation here was returning quite inconsistent costs. and/or were returning very high costs (because we seem to think moving into scalar registers is very expensive?) and others were returning lower but still too high (because of the assumed tree reduce strategy). While we should probably improve the generic costing strategy for i1 vectors, let's start by fixing the immediate problem. Differential Revision: https://reviews.llvm.org/D127511	2022-06-10 13:21:52 -07:00
Philip Reames	f7bb691d61	[RISCV] Implement isElementTypeLegalForScalableVector TTI hook This brings us into alignment with AArch64, and in the process fixes a compiler crash bug in uniform store handling in the vectorizer. Before the recent invalid cost bailout work, this would have also avoided crashes on invalid costs in some cases. I honestly think the vectorizer should gracefully bailout on uniform stores it can't use a scatter for, but it doesn't, so lets take the path of least resistance here. It's also possible that there are other vectorizer bugs AArch64 isn't seeing because of this hook; we don't want to be finding them either. Differential Revision: https://reviews.llvm.org/D127514	2022-06-10 13:20:58 -07:00
Craig Topper	08ea27bf13	[RISCV] Don't require loop simplify form in RISCVGatherScatterLowering. We need a preheader and a single latch, but we don't need a dedicated exit. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127513	2022-06-10 13:00:20 -07:00
Shao-Ce SUN	117e10304b	[RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo` Fix build errors in D126794 ``` ld.lld: error: undefined symbol: llvm::MachineInstr::getNumExplicitDefs() const >>> referenced by RISCVBaseInfo.cpp >>> RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a ld.lld: error: undefined symbol: llvm::MachineInstr::findRegisterDefOperandIdx(llvm::Register, bool, bool, llvm::TargetRegisterInfo const*) const >>> referenced by RISCVBaseInfo.cpp >>> RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a clang-15: error: linker command failed with exit code 1 (use -v to see invocation) ``` Reviewed By: fakepaper56, craig.topper Differential Revision: https://reviews.llvm.org/D127477	2022-06-11 00:27:53 +08:00
Shao-Ce SUN	93116374e7	Revert "[RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo`" This reverts commit `e018e493c1`. There are some problems with this commit, related revision: https://reviews.llvm.org/D127477	2022-06-11 00:03:04 +08:00
Craig Topper	e91051184c	[RISCV] Mark FSIN and other math functions as Expand for scalable vectors. This prevents them from being assumed legal by the cost model. This matches what is done for AArch64 SVE. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D123799	2022-06-10 08:40:07 -07:00
Shao-Ce SUN	e018e493c1	[RISCV] move `isFaultFirstLoad` into `RISCVInstrInfo` Fix build errors in D126794 ``` ld.lld: error: undefined symbol: llvm::MachineInstr::getNumExplicitDefs() const >>> referenced by RISCVBaseInfo.cpp >>> RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a ld.lld: error: undefined symbol: llvm::MachineInstr::findRegisterDefOperandIdx(llvm::Register, bool, bool, llvm::TargetRegisterInfo const*) const >>> referenced by RISCVBaseInfo.cpp >>> RISCVBaseInfo.cpp.o:(llvm::isFaultFirstLoad(llvm::MachineInstr const&)) in archive lib/libLLVMRISCVDesc.a clang-15: error: linker command failed with exit code 1 (use -v to see invocation) ``` Reviewed By: fakepaper56 Differential Revision: https://reviews.llvm.org/D127477	2022-06-10 21:03:47 +08:00
Yeting Kuo	f68cad9087	[RISCV] Lower VLEFF/VLSEGFF SDNodes to MachineInstrs with VL outputs. The patch is a replacement of D125199. PseudoReadVL with vtype has worry for computing same vtypes of VLEFF/VLSEGFF in two different places, DAGToDAG and InsertVSETVLI. VLEFF/VLSEGFF MI with VL output still could provide the vtype of VLEFF/VLSEGFF to the users of its VL. The patch names the new pseudo as original VLEFF/VLSEGFF name suffixed "_VL" and expand them in RISCVInsertVSETVLI pass. This patch also reverts commit `4537aae0d5`, "[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF.". Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126794	2022-06-10 13:57:10 +08:00
Philip Reames	28be4b7454	[RISCV] Simplify InstrInfo access in doPeepholeMaskedRVV [nfc]	2022-06-09 17:02:40 -07:00
Craig Topper	8bbcb98848	[RISCV] Teach RISCVMergeBaseOffset about cases where we use SHXADD to add some immediates. For an addition with simm14 and simm15 immediates with 2 or 3 trailing bits, we can use a shXadd instruction and an addi to do the addition. This patch teaches RISCVMergeBaseOffset to see through this pattern. I don't think the sh1add case occurs because we use two addis for that, but I implemented it for completeness. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127376	2022-06-09 16:07:35 -07:00
Kito Cheng	4b11f90903	[RISCV] Fix missing stack pointer recover In order to make sure the stack point is right through the EH region, we also need to restore stack pointer from the frame pointer if we don't preserve stack space within prologue/epilogue for outgoing variables, normally it's just checking the variable sized object is present or not is enough, but we also don't preserve that at prologue/epilogue when have vector objects in stack. Example to show what happened: ``` try { sp adjust for outgoing args. // 1. Sp changed. func_call // 2. Exception raised sp restore // Oh, not restored } catch { // 3. And now we are here. } // 4. Prepare to return!, restore return address from stack, but...sp is wrong. // 5. Screw up! ``` Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D126861	2022-06-09 23:38:50 +08:00
Philip Reames	0e29a80fdc	[RISCV] Add cost model for reverse shuffle The majority of the cost appears to be forming the indices vector. Differential Revision: https://reviews.llvm.org/D127141	2022-06-09 07:21:40 -07:00
Craig Topper	c739088af5	[RISCV] Fix 80 column violations in RISCVInsertVSETVLI.cpp. NFC I think these were likely introduced in the recent work done to this pass.	2022-06-08 18:58:48 -07:00
Craig Topper	209c07d486	[RISCV] Add debug message that should have been in D126843. For consistency with the other messages in this file.	2022-06-08 16:46:22 -07:00
Craig Topper	e4ba24c17d	[RISCV] Support (addi (addi globaladdr, C1), C2) in RISCVMergeBaseOffset. Add with immediates in the range [-4096, -2049] or [2048, 4095] get convert to two ADDIs. Teach RISCVMergeBaseOffset to recognize this pattern as well. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126843	2022-06-08 08:20:37 -07:00
Craig Topper	33f4da2455	[RISCV] Support LUI+ADDIW in RISCVMergeBaseOffsetOpt::matchLargeOffset. LUI+ADDIW always produces a simm32. This allows us to always fold it into a global offset. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D126729	2022-06-08 08:19:21 -07:00
Philip Reames	1ea99328b4	[RISCV] Untangle instruction properties from VSETVLIInfo [NFC] The abstract state used in the data flow should not know anything about the instructions which produced the abstract states. Instead, when comparing two states, we can simply use information about the machine instr at that time. In the old design, basically any use of the instruction flags on the current (as opposed to a "Require" - aka upcoming state) would be a bug. We don't seem to actually have any such bugs, but we can make this much more obvious with code structure. Differential Revision: https://reviews.llvm.org/D126921	2022-06-08 08:09:59 -07:00
Shao-Ce SUN	862f30a428	[RISCV] Add ISD::EH_DWARF_CFA Based on D24038. LLVM has an @llvm.eh.dwarf.cfa intrinsic, used to lower the GCC-compatible __builtin_dwarf_cfa() builtin. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D126181	2022-06-08 22:03:30 +08:00
Craig Topper	0c66deb498	[RISCV] Scalarize gather/scatter on RV64 with Zve32* extension. i64 indices aren't supported on Zve32*. Scalarize gathers to prevent generating illegal instructions. Since InstCombine will aggressively canonicalize GEP indices to pointer size, we're pretty much always going to have an i64 index. Trying to predict when SelectionDAG will find a smaller index from the TTI hook used by the ScalarizeMaskedMemIntrinPass seems fragile. To optimize this we probably need an IR pass to rewrite it earlier. Test RUN lines have also been added to make sure the strided load/store optimization still works. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127179	2022-06-07 08:07:50 -07:00
Matt Arsenault	cc5a1b3dd9	llvm-reduce: Add cloning of target MachineFunctionInfo MIR support is totally unusable for AMDGPU without this, since the set of reserved registers is set from fields here. Add a clone method to MachineFunctionInfo. This is a subtle variant of the copy constructor that is required if there are any MIR constructs that use pointers. Specifically, at minimum fields that reference MachineBasicBlocks or the MachineFunction need to be adjusted to the values in the new function.	2022-06-07 10:14:48 -04:00
Philip Reames	3fa5876216	[RISCV] Reorganize getShuffleCost to make it more clear what's going on [nfc]	2022-06-06 10:11:58 -07:00
yanming	bc93d51d36	[NFC][RISCV][format] Blank line between functions, remove unnecessary semicolon.	2022-06-06 15:38:14 +08:00
yanming	8d9d8f866a	[RISCV] Define risc-v's own register class to model FP Register. The default RegisterClass is not enough to model RISCV Register. We define risc-v's own register class to model FP Register. This helps to better estimate the register pressure in the loop-vectorize. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D126854	2022-06-06 14:43:52 +08:00
Fangrui Song	77e300ffdf	[MC] Change EndOfStatement "unexpected tokens in .xxx directive " to "expected newline"	2022-06-05 15:11:01 -07:00
LiaoChunyu	f14d18c7a9	[RISCV] Add more patterns for FNMADD D54205 handles fnmadd: -rs1 * rs2 - rs3 This patch add fnmadd: -(rs1 * rs2 + rs3) (the nsz flag on the FMA) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126852	2022-06-04 12:31:45 +08:00
Craig Topper	cc3bd43533	[RISCV] Support LUI+ADDIW in doPeepholeLoadStoreADDI. This fixes an inconsistency between RV32 and RV64. Still considering trying to do this peephole during isel, but wanted to fix the inconsistency first. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126986	2022-06-03 18:06:56 -07:00
Craig Topper	170c550ca8	[RISCV] Use SelectionDAG::isBaseWithConstantOffset in scalar load/store address matching. Test changes are because isBaseWithConstantOffset uses computeKnownBits and that is able to see that an earlier AND instruction guaranteed alignment so that we can treat an OR as an ADD. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126970	2022-06-03 10:55:28 -07:00
Craig Topper	4402852002	[RISCV] Reduce scalar load/store isel patterns to a single ComplexPattern. NFCI Previously we had 3 different isel patterns for every scalar load store instruction. This reduces them to a single ComplexPattern that returns the Base and Offset. Or an offset of 0 if there was no offset identified I've done a similar thing for the 2 isel patterns that match add/or with FrameIndex and immediate. Using the offset of 0, I was also able to remove the custom handler for FrameIndex. Happy to split that to another patch. We might be able to enhance in the future to remove the post-isel peephole or the special handling for ADD with constant added by D126576. A nice side effect is that this removes nearly 3000 bytes from the isel table. Differential Revision: https://reviews.llvm.org/D126932	2022-06-03 09:00:17 -07:00
Craig Topper	1d67adbfbf	[RISCV] Give CSImm12MulBy4 PatLeaf priority over CSImm12MulBy8. NFC The immediate range check for CSImm12MulBy8 included some values covered by CSImm12MulBy4. I assume CSImm12MulBy4 had priority due to pattern order in the td file, but this makes the priority explicit in the predicate.	2022-06-02 20:51:14 -07:00
Craig Topper	dbead2388b	[RISCV] Add custom isel for (add X, imm) used by load/stores. If the imm is out of range for an ADDI, we will materialize it in a register using multiple instructions. If the ADD is used by a load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from the constant materialization into the load/store offset. This only works if the ADD has a single use, otherwise the peephole would have to rebuild multiple nodes. This patch instead tries to solve the problem when the add is selected. We check that the add is only used by loads/stores and if it is we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI used as a pointer. As a result we can remove the more complicated peephole from doPeepholeLoadStoreADDI. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126576	2022-06-02 13:45:32 -07:00
Philip Reames	76ac916d63	[RISCV] Inline one copy of needVSETVLI into the other [NFC] Calling the non-MI version directly was unsound (as fixed in `dcdb0bf2`), so remove that version to decrease likelyhood of future mistakes.	2022-06-02 13:06:18 -07:00
Philip Reames	dcdb0bf25b	[RISCV] Fix an inconsistency with compatible load/store handling Once we've computed the incoming predecessor state, we should use the same compatibility check with knowledge of MI as we did in phase 2 in order to be consistent across all phases. Differential Revision: https://reviews.llvm.org/D126574	2022-06-02 08:03:51 -07:00
Craig Topper	909a78b3a4	[RISCV] Use MachineRegisterInfo::use_instr_begin instead of use_begin+getParent. NFCI	2022-06-01 15:37:48 -07:00
Craig Topper	aeb27f133a	[RISCV] Fix i64<->f64 and i32<->f32 bitcasts with VLS vectors enabled. We enable a custom handler to optimize conversions between scalars and fixed vectors. Unfortunately, the custom handler picks up scalar to scalar conversions as well. If the scalar types are both legal, we wouldn't match any of the fixed vector cases and would return SDValue() causing the LegalizeDAG to expand the bitcast through memory. This patch fixes this by checking if it's a scalar to scalar conversion and returns `Op` if both types are legal. Differential Revision: https://reviews.llvm.org/D126739	2022-06-01 08:13:49 -07:00
Craig Topper	1b2de79ff4	[RISCV] Use two ADDIs to do some stack pointer adjustments. If the adjustment doesn't fit in 12 bits, try to break it into two 12 bit values before falling back to movImm+add/sub. This is based on a similar idea from isel. Reviewed By: luismarques, reames Differential Revision: https://reviews.llvm.org/D126392	2022-05-31 10:25:28 -07:00
Craig Topper	80c4cf6369	[RISCV] Fix a few corner case bugs in RISCVMergeBaseOffsetOpt::matchLargeOffset The immediate for LUI is stored as 20-bit unsigned value. We need to sign extend if after shifting by 12 to match the instruction behavior. If we find an LUI+ADDI on RV64, it means the constant isn't a simm32. If it was, we would have emitted LUI+ADDIW from constant materialization. Make sure the constant is a simm32 before folding. This appears to match gcc. A future patch will add support for LUI+ADDIW on RV64.	2022-05-31 09:50:54 -07:00
Fraser Cormack	5a2e640eb7	[RISCV][NFC] Adjust some comments in RISCVInsertVSETVLI Capitalize the first letter of comments like the others, and a few other tweaks.	2022-05-31 10:13:15 +01:00
eopXD	2cadf84fc8	[RISCV] Pass OptLevel to `RISCVDAGToDAGISel` correctly Originally, `OptLevel` isn't passed into the `MachineFunctionPass`. This lets the default parameter of `SelectionDAGISel`, which is `CodeGenOpt::Default`, be passed in. OptLevelChanger captures the optimization level with the parameter, and rather not the value within `TargetMachine`. This lets the optimization be unintentionally overwriten if other value than `CodeGenOpt::Default` passed. This patch fixes this by passing the optimization level rather than using the default value. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D126641	2022-05-30 17:22:50 -07:00
Craig Topper	6a6cf2e28d	[RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y) This pattern is what we get after DAG combine for C code like this. short ptr1, ptr2, *ptr3; unsigned diff = ptr1 - ptr2; return ptr3[diff]; Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126588	2022-05-29 18:24:07 -07:00
Philip Reames	85b4470035	[RISCV] Allow PRE of vsetvli involving non-1 LMUL This is a follow up to address a review comment from D124869. When deciding whether to PRE a vsetvli, we can allow non-LMUL1 vsetvlis. Differential Revision: https://reviews.llvm.org/D126563	2022-05-27 15:49:41 -07:00
Craig Topper	b09e54541a	[RISCV] Use template version of SignExtend64 for constant extends. NFC We were inconsistent about which one we used.	2022-05-27 13:11:15 -07:00
Craig Topper	d0f65eaa85	[RISCV] Remove unused variables. NFC	2022-05-27 12:13:45 -07:00
Craig Topper	aaad507546	[RISCV] Return false from isOffsetFoldingLegal instead of reversing the fold in lowering. When lowering GlobalAddressNodes, we were removing a non-zero offset and creating a separate ADD. It already comes out of SelectionDAGBuilder with a separate ADD. The ADD was being removed by DAGCombiner. This patch disables the DAG combine so we don't have to reverse it. Test changes all look to be instruction order changes. Probably due to different DAG node ordering. Differential Revision: https://reviews.llvm.org/D126558	2022-05-27 11:05:18 -07:00
Fraser Cormack	3e450d9cbb	[RISCV][NFC] Unify compatibility checks under one function Split off from D125021. We were duplicating logic across different phases. Since we want to ensure a consistency of logic across phases for correctness, this patch combines our multiple compatibility checks into one function to better convey this. Several methods were made const too. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126472	2022-05-27 11:21:54 +01:00
Fangrui Song	cef377d75d	[RISCV] Simplify code after D125905	2022-05-26 18:13:38 -07:00
Philip Reames	8a3b6ba756	[RISCV] Add a subtarget feature to enable unaligned scalar loads and stores A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature. There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to actually been standardized, and b) isn't quite what we want here anyway due to the perf comment. Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them. Differential Revision: https://reviews.llvm.org/D126085	2022-05-26 15:25:47 -07:00
Craig Topper	e9ac99b609	[RISCV] Simplfy creation of IndexVT in lowerMaskedGather/lowerMaskedScatter. NFC The scalar element width is not a factor in how ContainerVT is determined. We don't need to check the relative size of VT and IndexVT.	2022-05-26 13:13:32 -07:00
Philip Reames	d58cc0839e	[RISCV] reorganize getFrameIndexReference to reduce code duplication [nfc] This change reorganizes the majority of frame index resolution into a two strep process. Step 1 - Select which base register we're going to use. Step 2 - Compute the offset from that base register. The key point is that this allows us to share the step 2 logic for the SP case. This reduces the code duplication, and (I think) makes the code much easier to follow. I also went ahead and added assertions into phase 2 to catch errors where we select an illegal base pointer. In general, we can't index from a base register to a stack location if that requires crossing a variable and unknown region. In practice, we have two such cases: dynamic stack realign and var sized objects. Note that crossing the scalable region is fine since while variable, it's a known variability which can be expressed in the offset. Differential Revision: https://reviews.llvm.org/D126403	2022-05-26 09:44:58 -07:00
Philip Reames	afe49934a6	[RISCV] Allow compatible VTYPE in AVL Reg Forward cases During insertion of VSETVLI, we have two related bits of code which decide whether we can reuse a previous vsetvli result. As was pointed out in the original review, these cases can allow any prior state for which we know that VL is the same for any value of AVL. This was originally separated out of a desire for separate tests and review. As it turns out, finding a test case for this has been quite challenging. Most of the cases I tried, we manage to already get through other chains of logic. We do have one correct test change, but that only exercises one of the two changes. Differential Revision: https://reviews.llvm.org/D126400	2022-05-26 08:50:35 -07:00
Fraser Cormack	2c9983f530	[RISCV][NFC] Add braces to 'else' to match braced 'if'	2022-05-26 10:00:33 +01:00
Kito Cheng	e45087fd53	[RISCV] Fix state persistence bugs (PR55548) We didn't implement RISCVELFStreamer::reset and cause some very strange section output for attribute section...just reference D15950 to see how ARM implement that. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D125905	2022-05-26 16:09:00 +08:00
jacquesguan	b271488e8b	[RISCV] Replace ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op. This patch tries to solve the incoordination between the direct and intermediate cast caused by D123975. This patch replaces ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op in the lowering of FP scalable vector direct cast to unify with the intermediate cast. And it also changes the FP widenning pattern with the VL op. Differential Revision: https://reviews.llvm.org/D125364	2022-05-26 02:17:31 +00:00
Philip Reames	1f06398e96	Reapply "[RISCV] Enable strict assertions in InsertVSETVLI data flow" be2cb8 fixes the case which triggered the revert. Reapply, and let's see if anything else falls out. Original commit message: These asserts are believed to hold after several recent miscompiles have been fixed. If you see an assertion failure on this change, please toggle the default back and make sure you file a bug with a reproducer. We may have as yet uncaught miscompiles lurking in this code. Differential Revision: https://reviews.llvm.org/D125271	2022-05-25 11:18:55 -07:00
Philip Reames	be2cb824d0	[riscv] Remove mutation of prior vsetvli from insertion dataflow This moves mutation entirely out of the main algorithm. The immediate trigger is that we hit another case of the same issue I thought we'd fixed in `72925d9`. It turns out we hadn't considered the cross block case. As a brief summary, the issue being fixed is that if we mutate a previous vsetvli in phase 3, there's a possibility that some later use of that vsetvli changes "compatibility". In the cross_block_mutate test, this later vsetvli occurs in another block (and is thus visit order dependent too!). This causes us to fail strict asserts. (To be explicit, the current on by default workaround should compensate. It's only when we turn that off that we have problems.) Now, I want to explicitly call out an alternate workaround. We could leave the mutation in phase 3, and simplify restrict it to the case where the previous vsetvli's GPR result is unused. That covers the case we've actually seen. (I'll note that codegen regressions with a simple form of this were significant. We might have to check specifically for the use outside block case to keep them reasonable, which complicates the workaround slightly.) Personally, I'm at the point where I want the mutation pulled out just for robustness sake. I'm worried there's yet one more form of this bug we haven't thought about. The other motivation for this change is that it does give us a couple of minor codegen wins. None appear to be hugely significant, but improvements never hurt right? Differential Revision: https://reviews.llvm.org/D125270	2022-05-25 10:51:14 -07:00
Craig Topper	172149e98c	[RISCV] Preserve fast math flags in lowerVPOp. Update test to check MIR after finalize-isel instead of debug output. This is of course not the only place we should preserve FMF, but it's the most obvious one. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D126306	2022-05-25 09:16:07 -07:00
Philip Reames	2a3b6f2cba	[RISCV] Hoist VSETVLI vlmax, vtype out of scalable loops This is a straight forward extension of the PRE transform introduced in D124869 to handle the VLMAX case. The test changes here look quite positive. This surprised me until I realized that all the tests are using @llvm.vscale to figure out the VLMAX, not the llvm.riscv.vsetvlmax intrinsic. If they'd used the later, these would have been full redundancy cases and fully handled by the data flow. I'm not really sure if use of vscale here is representative or not. If it is, we should probably look at using VSETVLI to lower vscale rather than a raw read of vlenb and some math. Differential Revision: https://reviews.llvm.org/D126338	2022-05-25 08:00:27 -07:00
Philip Reames	dd336b6891	[RISCV] Restructure comment and add clarifying assert to getFrameIndexReference [NFC] Differential Revision: https://reviews.llvm.org/D126088	2022-05-25 07:59:27 -07:00
Lewis Revill	29a5a7c6d4	[RISCV] Add pre-emit pass to make more instructions compressible When optimizing for size, this pass searches for instructions that are prevented from being compressed by one of the following: 1. The use of a single uncompressed register. 2. A base register + offset where the offset is too large to be compressed and the base register may or may not already be compressed. In the first case, if there is a compressed register available, then the uncompressed register is copied to the compressed register and its uses replaced. This is only done if there are enough uses that code size would be improved. In the second case, if a compressed register is available, then the original base register is copied and adjusted such that: new_base_register = base_register + adjustment base_register + large_offset = new_base_register + small_offset and the uses of the base register are replaced with the new base register. Again this is only done if there are enough uses for code size to be improved. This pass was authored by Lewis Revill, with large offset optimization added by Craig Blackmore. Differential Revision: https://reviews.llvm.org/D92105	2022-05-25 09:25:02 +01:00
Craig Topper	66db5312bd	[RISCV] Fix vnsrl/vnsra isel patterns that are dropping VL. We were incorrectly using VLMax instead of the passed VL. Reviewed By: khchen, reames Differential Revision: https://reviews.llvm.org/D126319	2022-05-24 21:38:59 -07:00
Fraser Cormack	fd93736657	[RISCV] Replace untested code with assert We found untested code where negative frame indices were ostensibly handled despite it being in a block guarded by !MFI.isFixedObjectIndex. While the implementation of MachineFrameInfo::isFixedObjectIndex suggests this is possible (i.e., if a frame index was more negative - less than the number of fixed objects), I couldn't find any test in tree -- for any target -- where a negative frame index wasn't also a fixed object offset. I couldn't find a way of creating such a object with the public MachineFrameInfo creation APIs. Even MachineFrameInfo::getObjectIndexBegin starts counting at the negative number of fixed objects, so such frame indices wouldn't be covered by loops using the provided begin/end methods. Given all this, an assert that any object encountered in the block is non-negative seems reasonable. Reviewed By: StephenFan, kito-cheng Differential Revision: https://reviews.llvm.org/D126278	2022-05-25 05:03:53 +01:00
Philip Reames	948d931323	[RISCV] Ensure the forwarded AVL register is alive When the AVL value does not fit in 5 bits, the register in which this value is stored may be dead when we want to forward it. This patch ensure the kill flags on the register are cleared before forwarding. Patch by: loralb Differential Revision: https://reviews.llvm.org/D125971	2022-05-24 15:07:42 -07:00
Craig Topper	d2ee2c9c8d	[RISCV] Add an operand kind to the opcode/imm returned from RISCVMatInt. Instead of matching opcodes to know the format to emit, use an enum value that we can get from the RISCVMatInt::Inst class. Change the consumers to use fully covered switches so that we get a compiler warning if a new kind is added. With the opcode checks it was easier to forget to update one of the 3 consumers. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D126317	2022-05-24 14:56:29 -07:00
Philip Reames	fb948572e0	[riscv] Use getFirstInstrTerminator [nfc]	2022-05-24 14:56:01 -07:00
Philip Reames	a95ecb20bc	[RISCV] Hoist VSETVLI out of idiomatic fixed length vector loops This patch teaches the VSETVLI insertion pass to perform a very limited form of partial redundancy elimination. The motivating example comes from the fixed length vectorization of a simple loop such as: for (unsigned i = 0; i < a_len; i++) a[i] += b; Without this change, the core vector loop and preheader is as follows: .LBB0_3: # %vector.ph andi a1, a6, -8 addi a4, a0, 16 mv a5, a1 .LBB0_4: # %vector.body # =>This Inner Loop Header: Depth=1 addi a3, a4, -16 vsetivli zero, 4, e32, m1, ta, mu vle32.v v8, (a3) vle32.v v9, (a4) vadd.vx v8, v8, a2 vadd.vx v9, v9, a2 vse32.v v8, (a3) vse32.v v9, (a4) addi a5, a5, -8 addi a4, a4, 32 bnez a5, .LBB0_4 The key thing to note here is that, the execution of the vsetivli only needs to happen once. Since there's no tail folding happening here, the value of the vector configuration registers are invariant through the loop. After this patch, we hoist the configuration into the preheader and perform it once. .LBB0_3: # %vector.ph andi a1, a6, -8 vsetivli zero, 4, e32, m1, ta, mu addi a4, a0, 16 mv a5, a1 .LBB0_4: # %vector.body # =>This Inner Loop Header: Depth=1 addi a3, a4, -16 vle32.v v8, (a3) vle32.v v9, (a4) vadd.vx v8, v8, a2 vadd.vx v9, v9, a2 vse32.v v8, (a3) vse32.v v9, (a4) addi a5, a5, -8 addi a4, a4, 32 bnez a5, .LBB0_4 Differential Revision: https://reviews.llvm.org/D124869	2022-05-24 14:56:01 -07:00
Craig Topper	415b9f595d	Recommit "[RISCV] Use selectShiftMaskXLen ComplexPattern for isel of rotates." This reverts commit `dfe513ae1b`. Tests have been changed to avoid the type legalization bug being fixed in D126036. Original commit message: This will remove masks on the shift amount. We usually get this with SimplifyDemandedBits in DAGCombine, but that's restricted to cases where the AND has a single use. selectShiftMaskXLen does not have that restriction.	2022-05-24 09:41:04 -07:00
Fraser Cormack	08c9fb8447	[RISCV] Ensure the entire stack is aligned to the RVV stack alignment This patch fixes another bug in the RVV frame lowering. While some frame objects with non-default stack IDs (such scalable-vector alloca instructions) are considered in the target-independent max alignment calculations, others (for example, during calling-convention lowering) are not. This means we'd occasionally align the base of the stack to only 16 bytes, with no way to ensure that the RVV section contained within that is aligned to anything higher. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D125973	2022-05-24 06:58:51 +01:00
Fraser Cormack	cb8681a2b3	[RISCV] Fix RVV stack frame alignment bugs This patch addresses several alignment issues in the stack frame when RVV objects are taken into account. One bug is that the RVV stack was never guaranteed to keep the alignment of the stack as a whole. We must maintain a 16-byte aligned stack at all times, especially when calling other functions. With the standard V extension, this is conveniently happening since VLEN is at least 128 and always 16-byte aligned. However, we support Zvl64b which does not guarantee this. To fix this, the RVV stack size is rounded up to be aligned to 16 bytes. This in practice generally makes us allocate a stack sized at least 2VLEN in size, and a multiple of 2. \|------------------------------\| -- <-- FP \| 8-byte callee-save \| \| \| \|------------------------------\| \| \| \| one VLENB-sized RVV object \| \| \| \|------------------------------\| \| \| \| 8-byte local variable \| \| \| \|------------------------------\| -- <-- SP (must be aligned to 16) In the example above, with Zvl64b we are decrementing SP by 12 bytes which does not leave SP correctly aligned. We therefore introduce an extra VLENB-sized amount used for alignment. This would therefore ensure the total stack size was 16 bytes (48 for Zvl128b, 80 for Zvl256b, etc): \|------------------------------\| -- <-- FP \| 8-byte callee-save \| \| \| \|------------------------------\| \| \| \| one VLENB-sized padding obj \| \| \| \| one VLENB-sized RVV object \| \| \| \|------------------------------\| \| \| \| 8-byte local variable \| \| \| \|------------------------------\| -- <-- SP A new RVV invariant has been introduced in this patch, which is that the base of the RVV stack itself is now always aligned to 16 bytes, not 8 as before. This keeps us more in line with the scalar stack and should be easier to reason about. The calculation of the RVV padding has thus changed to be the amount required to align the scalar local variable section to the RVV section's alignment. This amount is further rounded up when setting up the initial stack to keep everything aligned: \|------------------------------\| -- <-- FP \| 8-byte callee-save \| \|------------------------------\| \| \| \| RVV objects \| \| (aligned to at least 16) \| \| \| \|------------------------------\| \| RVV padding of 8 bytes \| \|------------------------------\| \| 8-byte local variable \| \|------------------------------\| -- <-- SP In the example above, it's clear that we need 8 bytes of padding to keep the RVV section aligned to 16 when using SP. But to keep SP itself* aligned to 16 we can't decrement the initial stack pointer by 24 - we have to round up to 32. With the RVV section correctly aligned, the second bug fixed by this patch is that RVV objects themselves are now correctly aligned. We were previously only guaranteeing an alignment of 8 bytes, even if they required a higher alignment. This is relatively simple and in practice we see more rounding up of VLEN amounts to account for alignment in between objects: \|------------------------------\| \| RVV object (aligned to 16) \| \|------------------------------\| \| no padding necessary \| \|------------------------------\| \| 2VLENB RVV object (align 16)\| \|------------------------------\| \| VLENB alignment padding \| \|------------------------------\| \| RVV object (align 32) \| \|------------------------------\| \| 3VLENB alignment padding \| \|------------------------------\| \| VLENB RVV object (align 32) \| \|------------------------------\| -- <-- base of RVV section Note that a lot of the regressions in codegen owing to the new alignment rules are correct but actually only strictly necessary for Zvl64b (and Zvl32b but that's not really supported). I plan a follow-up patch to take the known VLEN into account when padding for alignment. Reviewed By: StephenFan Differential Revision: https://reviews.llvm.org/D125787	2022-05-24 06:53:51 +01:00
Peter Waller	ade47bdc31	[LV] Improve register pressure estimate at high VFs Previously, `getRegUsageForType` was implemented using `getTypeLegalizationCost`. `getRegUsageForType` is used by the loop vectorizer to estimate the register pressure caused by using a vector type. However, `getTypeLegalizationCost` currently only appears to understand splitting and not scalarization, so significantly underestimates the register requirements. Instead, use `getNumRegisters`, which understands when scalarization can occur (via computeRegisterProperties). This was discovered while investigating D118979 (Set maximum VF with shouldMaximizeVectorBandwidth), where under fixed-length 512-bit SVE the loop vectorizer previously ends up costing an v128i1 as 2 v64i* registers where it actually occupies 128 i32 registers. I'm sending this patch early for comment, I'm still doing some sanity checking with LNT. I note that getRegisterClassForType appears to return VectorRC even though the type in question (large vNi1 types) end up occupying scalar registers. That might be worth fixing too. Differential Revision: https://reviews.llvm.org/D125918	2022-05-23 07:57:45 +00:00
Paul Walker	258dac43d6	[SVE] Enable use of 32bit gather/scatter indices for fixed length vectors Differential Revision: https://reviews.llvm.org/D125193	2022-05-22 12:32:30 +01:00
Fraser Cormack	d60ae47f9d	[RISCV] Fix logic for determining RVV stack padding We must add padding when using SP or BP to access stack objects. Checking whether we're missing FP is not sufficient as stack realignment uses SP too. The test in D125962 explains the specific issue in more detail. Split from D125787. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D125964	2022-05-20 13:18:52 +01:00
jacquesguan	8fc4fcecb8	[RISCV] Add VL patterns for vector widening floating-point fused multiply-add instructions. This patch adds VL patterns for vector widening floating-point fused multiply-add instructions to support fixed length vector type. Differential Revision: https://reviews.llvm.org/D124505	2022-05-20 06:56:48 +00:00
Craig Topper	dfe513ae1b	Revert "[RISCV] Use selectShiftMaskXLen ComplexPattern for isel of rotates." This reverts commit `86f7d7074a`. The test cases added for this exposed an pre-existing bug that is failing the expensive checks bot. Reverting so I can revert that patch.	2022-05-19 14:39:38 -07:00
Jay Foad	6bec3e9303	[APInt] Remove all uses of zextOrSelf, sextOrSelf and truncOrSelf Most clients only used these methods because they wanted to be able to extend or truncate to the same bit width (which is a no-op). Now that the standard zext, sext and trunc allow this, there is no reason to use the OrSelf versions. The OrSelf versions additionally have the strange behaviour of allowing extending to a smaller width, or truncating to a larger width, which are also treated as no-ops. A small amount of client code relied on this (ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and needed rewriting. Differential Revision: https://reviews.llvm.org/D125557	2022-05-19 11:23:13 +01:00
Zi Xuan Wu (Zeson)	861489af1b	[NFC][RISCV] Enable TuneNoDefaultUnroll feature to control targets which use default unroll preference In RISCVTargetTransformInfo, enumerating the processor family is not a good way to predict. Because it needs to enumerate many subtarget family and is hard to update if add new subtarget. Instead, create a feature to distinguish whether targets want to use default unroll preference or not. Keep TuneSiFive7 because it's flag to indicate subtarget family, which may used in other place. Differential Revision: https://reviews.llvm.org/D125741	2022-05-19 12:21:49 +08:00
Craig Topper	86f7d7074a	[RISCV] Use selectShiftMaskXLen ComplexPattern for isel of rotates. This will remove masks on the shift amount. We usually get this with SimplifyDemandedBits in DAGCombine, but that's restricted to cases where the AND has a single use. selectShiftMaskXLen does not have that restriction.	2022-05-18 10:23:29 -07:00
Philip Reames	d4545e6fa0	Revert "[RISCV] Enable strict assertions in InsertVSETVLI data flow" This reverts commit `79a66ec97b`. The stronger asserts served their purpose; I stumbled across another bug. Will reapply once this one is also fixed. The bug appears to be a variant of a previous one: * We mutate an instruction in one block. * That mutation changes the phase3 results of another block. This is very similiar to a previous issue, except cross block instead of within a single block.	2022-05-17 15:53:13 -07:00
Philip Reames	118c5d1c97	[RISCV] Minor reorganization of VSETVLIInfo::operator== for readability [NFC]	2022-05-17 12:05:17 -07:00
Philip Reames	11a7e77c95	[RISCV] Canonicalize AVL=setvli to AVL=Imm or AVL=VLMAX This patch adds a transform to the local prepass in InsertVSETVLI which canonicalizes an AVL of a register from another vsetvli into immediate or VLMAX when VTYPE is the same. In this patch, I chose to be conservative and avoid arbitrary vreg forwarding due to profitability concerns about possibility overlapping live ranges. This has the effect of eliminating vsetvli instructions in loops which are walking either VLMAX or a constant number of lanes per iteration. Differential Revision: https://reviews.llvm.org/D125812	2022-05-17 11:46:22 -07:00
Philip Reames	79a66ec97b	[RISCV] Enable strict assertions in InsertVSETVLI data flow These asserts are believed to hold after several recent miscompiles have been fixed. If you see an assertion failure on this change, please toggle the default back and make sure you file a bug with a reproducer. We may have as yet uncaught miscompiles lurking in this code. Differential Revision: https://reviews.llvm.org/D125271	2022-05-17 11:12:31 -07:00
Fraser Cormack	8430b82741	[RISCV] Drop notion of "strict" vsetvli compatibility With recent fixes to the dataflow in place, we now never pass Strict=true to isCompatible, so remove the parameter completely. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D125748	2022-05-17 15:24:23 +01:00
Fraser Cormack	f00f894d5d	[RISCV][NFC] Reword split SP adjustment comments	2022-05-17 10:03:21 +01:00
Fraser Cormack	05ad4d4f38	[RISCV][NFC] Fix comment typos in split SP adjustment	2022-05-17 09:56:54 +01:00
Philip Reames	1474880353	[RISCV] Use classic dataflow for VSETVLI insertion Our current implementation of the InsertVSETVLI dataflow allows phase 3 to arrive at a different block end state than the data flow in phase 1/2 computed. This arises because a block which contains instructions (e.g. load or stores) which don't consume all the incoming bits of the VL/VTYPE can be compatible with multiple incoming states. The algorithm effectively changes the SEW on such instructions, and propagates the prior state forward. As phase 3 uses the block input state for this propagation, but phase 1/2 doesn't, this can result in different block end states. If we don't correct for it, this discrepancy can result in miscompiles. This was the source of multiple recent bugs. However, by now we have fixes for all known correctness issues. The basic strategy we use is to insert a compensation vsetvli to bring the block state leaving the block back into consistency with the one computed. This is correct, but results in extra vsetvlis being placed at the end of blocks. This change adjusts the phase 1/2 algorithm to propagate the incoming block state through the block, allowing the compatibility rules to modify the end state. The algorithm may need to run slightly more iterations, but the end result is consistent with what phase 3 does. The benefit of doing this is two fold. First, we reverse some of the code quality introductions introduced in the functional fixes. Second, we simplify the invariants, and allow the strict assertions to be enabled. Several humans, myself included, have found it quite surprising that invariant didn't hold already, and arguably that confusion is the cause of several of our recent miscompiles in this code. The downside to this patch is that the dataflow may require additional iterations to stabilize. In the worse case, we go from O(Edges) to O(E + UniquePaths) as the incoming state (and thus the outgoing one) can now change once for each path from the entry block. Differential Revision: https://reviews.llvm.org/D125232	2022-05-16 17:06:27 -07:00
Philip Reames	3d17c91709	[RISCV] Fix missing vsetvli in transparent block case We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks which only contain load/stores compatible with their incoming state. When I went to rebase and simplify D125232, it turned out that not all of the correctness issues had been fixed yet after all. This is the correctness fix accidentally embedded in the original more complicated version. Note that the test changes here are mostly regressions. It's worth noting that the simplified version of D125232 exactly reverses all the non-functional diffs in the test caused here. D125232 should be the immediate following commit. Differential Revision: https://reviews.llvm.org/D125703	2022-05-16 17:06:27 -07:00
Philip Reames	7dbf2e7b57	Teach PeepholeOpt to eliminate redundant copy from constant physreg (e.g VLENB on RISCV) The existing redundant copy elimination required a virtual register source, but the same logic works for any physreg where we don't have to worry about clobbers. On RISCV, this helps eliminate redundant CSR reads from VLENB. Differential Revision: https://reviews.llvm.org/D125564	2022-05-16 16:38:30 -07:00
Paul Walker	7dd05ba9ed	[SelectionDAG] Remove duplicate "is scaled" information from gather/scatter SDNodes. During early gather/scatter enablement two different approaches were taken to represent scaled indices: * A Scale operand whereby byte_offsets = Index * Scale * An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType) Having multiple representations is bad as shown by this patch which fixes instances where the two are out of sync. The dedicated scale operand is more flexible and pervasive so this patch removes the UNSCALED values from IndexType. This means all indices are scaled but the scale can be one, hence unscaled. SDNodes now use the scale operand to answer the "isScaledIndex" question. I toyed with the idea of keeping the UNSCALED enums and helper functions but because they will have no uses and force SDNodes to validate the set of supported values I figured it's best to remove them. We can re-add them if there's a real need. For similar reasons I've kept the IndexType enum when a bool could be used as I think being explicitly looks better. Depends On D123347 Differential Revision: https://reviews.llvm.org/D123381	2022-05-16 20:47:52 +01:00
Philip Reames	e2df48bb23	[RISCV] Add further trace output to InsertVSETLVI	2022-05-16 09:15:32 -07:00
Liqin.Weng	d95513ae3a	[RISCV] remove useless code When legality check for vectoring reduction， hasVInstructions() check be unneeded. RISCV can only loop vectorization with hasVInstructions() Reviewed By: kito-cheng, craig.topper Differential Revision: https://reviews.llvm.org/D125460	2022-05-16 12:54:03 +00:00
jacquesguan	a8426ada49	[RISCV][NFC] Replace for-each with array argument call. This patch replaces some for-each set with the new arrayref argument API, since it already used an array in defination, I think this change won't cause any ambiguity. Differential Revision: https://reviews.llvm.org/D125455	2022-05-16 02:12:48 +00:00
Zakk Chen	1878f240c9	[RISCV] Fix incorrect use of tail agnostic vslideup. We need to use tail undisturbed for vslideup to implement vector insert operation correctly. Ideally, we cound use the tail agnostic when insert subvector or element at the end of the vector. This will be in follow-up patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125545	2022-05-15 18:32:21 -07:00
Sheng	c644488a8b	Rename `MCFixedLenDisassembler.h` as `MCDecoderOps.h` The name `MCFixedLenDisassembler.h` is out of date after D120958. Rename it as `MCDecoderOps.h` to reflect the change. Reviewed By: myhsu Differential Revision: https://reviews.llvm.org/D124987	2022-05-15 08:44:58 +08:00
Craig Topper	5a19fbad83	[RISCV] Remove unneeded check for ISD::VSCALE operand being a constant. NFC ISD::VSCALE only allows constant operands.	2022-05-14 13:45:03 -07:00
Roger Ferrer Ibanez	189ca6958e	[RISCV] Use the new chain when converting a fixed RVV load When building the final merged node, we were using the original chain rather than the output chain of the new operation. After some collapsing of the chain this could cause the loads be incorrectly scheduled respect to later stores. This was uncovered by SingleSource/Regression/C/gcc-c-torture/execute/pr36038.c of the llvm testsuite. https://reviews.llvm.org/D125560	2022-05-13 22:21:08 +00:00
Craig Topper	a2918976cd	Revert "[RISCV] Enable subregister liveness tracking for RVV." This reverts most of `ed242b54c9` I'm seeing failures in our intrinsic testing on qemu that seem related to this. Reverting while I investigate. I've left the command line option in place for directed testing. It defaults to off.	2022-05-13 10:59:58 -07:00
Philip Reames	853fa8ee22	[RISCV] Address post-commit feedback from `af5e09b`	2022-05-13 09:51:23 -07:00
Philip Reames	af5e09b7d9	[RISCV] Add llvm.read.register support for vlenb This patch adds minimal support for lowering an read.register intrinsic with vlenb as the argument. Note that vlenb is an implementation constant, so it is never allocatable. This was split off a patch to eventually replace PseudoReadVLENB with a COPY MI because doing so revealed a couple of optimization opportunities which really seemed to warrant individual patches and tests. To write those patches, I need a way to write the tests involving vlenb, and read.register seemed like the right testing hook. Differential Revision: https://reviews.llvm.org/D125552	2022-05-13 09:12:02 -07:00
Zakk Chen	7dfc56c107	[RISCV] Add the passthru operand for RVV unmasked segment load IR intrinsics. The goal is support tail and mask policy in RVV builtins. We focus on IR part first. If the passthru operand is undef, we use tail agnostic, otherwise use tail undisturbed. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125323	2022-05-13 02:16:40 -07:00
Philip Reames	52b5f1f7d4	[RISCV] Extend dataflow workaround from D119518 to fallthrough blocks We've got a lurking problem with our data flow implementation where different phases disagree, resulting in possible miscompiles. D119518 introduced a workaround, but failed to consider blocks without terminators (e.g. fallthroughs). I have a deeper rework of the algorithm in flight over in D125232, but this patch is specifically a minimal fix for an active miscompile. That change can be reworked over this once landed. Differential Revision: https://reviews.llvm.org/D125408	2022-05-12 10:45:59 -07:00
Craig Topper	40e9654511	[RISCV] Use tail agnostic policy when selecting riscv_fma_vl to instructions riscv_fma_vl doesn't have a tail, so use the tail_agnostic policy. We were already doing this for some patterns. I think the patterns with fneg and mask were added later and I copied the tail policy from the unmasked patterns. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D125424	2022-05-12 09:09:24 -07:00
Craig Topper	ed242b54c9	[RISCV] Enable subregister liveness tracking for RVV. RVV makes heavy use of subregisters due to LMUL>1 and segment load/store tuples. Enabling subregister liveness tracking improves the quality of the register allocation. I've added a command line that can be used to turn it off if it causes compile time or functional issues. I used the command line to keep the old behavior for one interesting test case that was testing register allocation. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125108	2022-05-11 12:49:03 -07:00
Craig Topper	5c7ec998a9	[RISCV] Fold addiw from (add X, (addiw (lui C1, C2))) into load/store address This is a followup to D124231. We can fold the ADDIW in this pattern if we can prove that LUI+ADDI would have produced the same result as LUI+ADDIW. This pattern occurs because constant materialization prefers LUI+ADDIW for all simm32 immediates. Only immediates in the range 0x7ffff800-0x7fffffff require an ADDIW. Other simm32 immediates work with LUI+ADDI. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D124693	2022-05-11 12:47:13 -07:00
Craig Topper	f499ec6b3d	[RISCV] Add caching to the gather/scatter to strided load/store conversion. If we have multiple gather/scatter instructions using the same the same strided address we would scalarize it multiple times. I guess a later pass cleans this up, but I don't know if that's guaranteed. This patch adds a cache to remember the scalarization we already created for a previous gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D125326	2022-05-11 11:47:27 -07:00
Craig Topper	09f48c6b80	[RISCV] Move implementation of getVLOpNum and getSEWOpNum from RISCVInsertVSETVLI to RISCVBaseInfo.h. NFC We should consolidate the operand counting and ordering into RISCVBaseInfo.h and stop spreading it around. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D125344	2022-05-11 11:14:58 -07:00
Craig Topper	0ebb02b90a	[RISCV] Override TargetLowering::shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd. This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y)) ==/!= 0 into ((X <</l>> Y) & C) ==/!= 0. Where C is a constant and X might be a constant. The default implementation favors doing the transform if X is not a constant. Otherwise the code is left alone. There is a provision that if the target supports a bit test instruction then the transform will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable bit test operation. RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1. Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since we can fold use ANDI instead of putting a 1 in a register for SLL. This patch overrides this hook to favor bit extract patterns and otherwise falls back to the "do the transform if X is not a constant" heuristic. I've added tests where both C and X are constants with both the shl form and lshr form. I've also added a test for a switch statement that lowers to a bit test. That was my original motivation for looking at this. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D124639	2022-05-11 11:13:17 -07:00
Craig Topper	0781742785	[RISCV] Add a DAG combine to pre-promote (i32 (and (srl X, Y), 1)) with Zbs on RV64. Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW, which will prevent us from using a BEXT instruction. I don't think there is any precedent for type promotion checking users to decide how to promote. Instead, I've added this DAG combine to do it before type legalization. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D124109	2022-05-11 10:49:16 -07:00
Philip Reames	72925d98bf	[riscv] Canonicalize vsetvli (vsetvli avl, vtype1) vtype2 transitionsas reviewed This patch is an alternative to a piece of D125270. If we have one vsetvli which is using as AVL the output of another, and the prior AVL can be proven to produce the same VL value as that defining one, we can use the AVL from the prior instruction. This has the effect of removing a state transition on AVL, and will let us use the cheaper 'vsetvli x0, x0, vtype1' form or possible even skip emitting it entirely. This builds on the same infrastructure as D125337, and does the analogous extension to working on abstract states instead of only prior explicit vsetvli instructions. This is where the (relatively minor) code improvements come from. More importantly, this fixes the last case where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries. Doing this transform inside the dataflow can cause the compatibility of a later store to change with regards to the current state. test15 in the diff illustrates this case well. What we have is a vsetvli which is mutated by one following vector op, but whose GPR result is used by another. The compatibility logic walks back to the def in this case, and checks to see if it matches the immediate prior state. In phase 1 and 2, it doesn't, and in phase 3 (after mutation) it does because we remove a transition which caused it to differ. Differential Revision: https://reviews.llvm.org/D125392	2022-05-11 10:45:29 -07:00
Philip Reames	cc0283a635	[riscv] Prefer to use previous VL for scalar move instructionsK This patch is an alternative to a piece of D125270. Its direct motivation is to fix a wrong code bug (described below), but somewhat unexpectedly, it also results in a significant code quality improvement for idiomatic fixed length vector patterns. The existing transform is simply wrong in its current location. We are correct about the fact that the scalar move itself can use the previous vsetvli, but we loose track of the fact that later instructions might depend on the state change represented. That is, the actual value of VL in the register is different than the abstract state thinks it is. Not simply due to precision of modeling, but e.g. the VL register could contain 3 when the abstract state says it is 1. This is annoying hard to demonstrate in practice due to differences in policy flags on the intrinsics, but this is at least a latent wrong code bug. The code quality benefit comes from the fact we don't need to tie this to explicit vsetvli instructions at all. We can propagate the abstract state, and reduce a) the number of transitions, or b) the cost of those transitions. It turns out we have a bunch of cases - in tests at least - where fixed length AVLs are known non-zero, and we can leave VL unchanged while changing VTYPE. Differential Revision: https://reviews.llvm.org/D125337	2022-05-11 07:37:50 -07:00
Fraser Cormack	27c7e922fe	[RISCV][NFC] Rename variable to appease code style	2022-05-11 12:41:25 +01:00
Fraser Cormack	874b802a6d	[RISCV][NFC] Move variable down closer to its first use	2022-05-11 12:33:01 +01:00
Fraser Cormack	c1d48b35d8	[SelectionDAG][VP] Rename VP sext/zext/trunc ISD opcodes Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP counterparts. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125298	2022-05-11 10:25:51 +01:00
Yeting Kuo	4537aae0d5	[RISCV] Make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF. The patch make PseudoReadVL have the vtypes of the corresponding VLEFF/VLSEGFF. It's useful to get the vtypes of locations of PseudoReadVL without finding the corresponding VLEFF/VLSEGFF. It could simplify optimizations in RISCVInsertVSETVLI like D123581. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125199	2022-05-11 14:07:58 +08:00
jacquesguan	2509dcd58a	[RISCV] Add rvv codegen support for vp.fpext. This patch adds rvv codegen support for vp.fpext. The lowering of fp_round, vp.fptrunc, fp_extend and vp.fpext share most code so use a common lowering function to handle these four. And this patch changes the intermediate cast from ISD::FP_EXTEND/ISD::FP_ROUND to the RVV VL version op RISCVISD::FP_EXTEND_VL and RISCVISD::FP_ROUND_VL for scalable vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123975	2022-05-11 03:28:25 +00:00
Philip Reames	7731935ffc	[riscv] Consolidate logic for SEW/VL operand offset calculations [nfc]	2022-05-10 15:06:26 -07:00
Philip Reames	413052310a	[riscv] Minor style cleanup so that code more obviously matches comments [nfc]	2022-05-10 14:20:26 -07:00
Fraser Cormack	0b2e7a7c72	[RISCV][NFC] Remove else after continue	2022-05-10 11:15:50 +01:00
Fraser Cormack	3b9a231d25	[RISCV] Remove two unmasked RVV patterns These can be selected to unmasked from masked instructions by the post-process DAG step. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D125239	2022-05-09 16:54:24 +01:00
Philip Reames	70ad96ca5e	[riscv, InsertVSETVLI] Rename InstrInfo to Require to more clearly indicate purpose [nfc]	2022-05-09 06:40:33 -07:00
Philip Reames	7ed16e7c51	[riscv] Fix state tracking bug on vsetvli (phi of vsetvli) peephole This fixes the first of several cases where the state computed in phase 1 and 2 of the algorithm differs from the state computed during phase 3. Note that such differences can cause miscompiles by creating disagreements about contents of the VL and VTYPE registers at block boundaries. In this particular case, we recognize that for the first vsetvli in a block, that if the AVL is a phi of GPR results from previous vsetvlis and the VTYPE field matches, we can avoid emitting a vsetvli as the register contents don't change. Unfortunately, the abstract state does change and that update was lost. As noted in the test change, this can actually improve results by preserving information until later state transitions in the block. However, this minor codegen improvement is not the motivation for the patch. The motivation is to avoid cases a case where we break a key internal correctness invariant. Differential Revision: https://reviews.llvm.org/D125133	2022-05-09 06:21:45 -07:00
Philip Reames	c7c3f58544	[riscv] Use early return to reduce nesting for InsertVSETVLI [nfc]	2022-05-06 13:10:05 -07:00
Philip Reames	99a41005fe	[riscv] Add early return to InsertVSETLI fixed point step [nfc] If the income state hasn't changed, and the step function is fixed by assumption, then the output state can't have changed. In the current algorithm, this is a very minor win and mostly allows adding tracing output without being horrible verbose.	2022-05-06 13:08:11 -07:00
Philip Reames	dee9b01d83	[riscv] Add some minimal tracing output to InsertVSETVLI Only available with -debug. Main purpose is simplifying an upcoming change, and providing tools for debugging problems.	2022-05-06 13:08:11 -07:00
Philip Reames	f486119ce9	[riscv] Add strict asserts for VSETVLI insertion algorithm to help catch bugs This assertion should hold for any reasonable data flow algorithm, but is known not to in several cases today. I'd like to go ahead and land this off-by-default, so that we can collaborate on fixes and have a common definition of success. Differential: https://reviews.llvm.org/D125035	2022-05-06 10:28:22 -07:00
wangpc	4ff5e8184c	[RISCV] Enable MachineOutliner by default under -Oz for RISCV Enable default outlining when the function has the minsize attribute. `addr-label.ll` crashed after enabling this, so a barrier is added before instruction selection as a workaround. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122213	2022-05-06 17:37:45 +08:00
Philip Reames	042a7a5f0d	[riscv] Use X0 for destination of VSETVLI instruction if result unused If the GPR destination register of a VSETVLI instruction is unused, we can replace it with X0. This discards the result, and thus reduces register pressure. Since after the core insertion/lowering algorithm has run, many user written VSETVLIs will have their GPR result unused (as VTYPE/VLEN is now explicitly read instead), this kicks in for most tests which involve a vsetvli intrinsic for fixed length vectorization. (vscale vectorization generally uses the GPR result to know how far to e.g. advance pointers in a loop and these uses are not removed.) When inserting VSETVLIs to lower psuedos, we prefer the X0 form anyways. Differential Revision: https://reviews.llvm.org/D124961	2022-05-05 07:39:45 -07:00
Lian Wang	8bb10436ab	[RISCV][NFC] Use true_mask replace riscv_vmset_vl in defined patterns. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124660	2022-05-05 03:05:52 +00:00
Craig Topper	60cb489685	[RISCV] Use movImm went multiplying by simm12 in getVLENFactoredAmount. No reason to special case simm12, movImm handles all immediates. This also fixe a bug that we weren't passing the frame-setup/destroy flag to movImm when we were calling it.	2022-05-04 17:23:22 -07:00
Philip Reames	18ed2ee80c	[RISCV] Add a version of insertVSETVLI which uses an iterator [NFC] This is to simplify the final version of D124869.	2022-05-04 14:48:31 -07:00
Craig Topper	411bb42eed	[RISCV] Add a special case to treat riscv-v-vector-bits-min=-1 as meaning use Zvlb value. riscv-v-vector-bits-min is primarily used to opt-in to the autovectorizer. The vector width can be determined from Zvlb. This patch adds support treating -1 as meaning use Zvlb so we can still opt-in to autovectorization without needing to repeat a vector width already given by Zvlb or -mcpu. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124960	2022-05-04 14:26:45 -07:00
Craig Topper	1d6430b9e2	[RISCV] Update isLegalAddressingMode for RVV. RVV instructions only support base register addressing. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124820	2022-05-03 19:49:11 -07:00
Craig Topper	9cce9a126c	[RISCV] Make use of SHXADD instructions in RVV spill/reload code. We can use SH1ADD, SH2ADD, SH3ADD to multipy by 3, 5, and 9 respectively. We could extend this to 3, 5, or 9 multiplied by a power 2 by also emitting a SLLI. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124824	2022-05-03 19:35:21 -07:00
Craig Topper	0971819740	[RISCV] Don't lookup TII in RISCVInstrInfo::getVLENFactoredAmount. NFCI We're already inside of our implementation of TII.	2022-05-03 19:35:21 -07:00
Weverything	5afd20806d	[riscv] Mark function as used to avoid unused warning.	2022-05-03 18:51:23 -07:00
Philip Reames	2982d0032b	Fix a buildbot warning [nfc]	2022-05-03 14:40:27 -07:00
Philip Reames	be50b8c185	[riscv] Add debug printing support for VSETVLIInfo class [nfc]	2022-05-03 14:00:17 -07:00
Hsiangkai Wang	eaaa31ff2c	[RISCV][TargetLowering] Special case overflow expansion for (uaddo X, C). Follow-up to D122933. Differential Revision: https://reviews.llvm.org/D124374	2022-05-03 03:51:36 +00:00
Craig Topper	72a66358f6	[RISCV] Add isCommutable to FADD/FMUL/FMIN/FMAX/FEQ. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D123972	2022-05-02 20:21:16 -07:00
Zakk Chen	5807e59a0a	[RISCV] Fix incorrect codegen for masked vmsge{u}.vx with mask agnostic. The result was totally wrong. We could use mask undisturbed result to emulate the mask agnostic result. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124684	2022-05-02 17:57:29 -07:00
Fangrui Song	2019c9b1c8	[RISCV] Lower case the first letter of LowerRISCVMachineOperandToMCOperand. NFC	2022-05-01 14:13:55 -07:00
luxufan	e098281c27	[RISCV] Don't getDebugLoc for the end node of MBB iterator Because of shrink wrapping, the block to insert epilog may don't have instructions (Only debug instructions). And the position to insert may point to MBB.end() that don't have a DebugLoc. This patch fix this problem. The test program was copied from the issue:https://github.com/llvm/llvm-project/issues/53662 Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D123679	2022-04-30 16:00:20 +08:00
Yeting Kuo	c069e37019	[RISCV] Add DAGCombine to fold base operation and reduction. Transform (<bop> x, (reduce.<bop> vec, splat(neutral_element))) to (reduce.<bop> vec, splat (x)). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122563	2022-04-30 14:07:05 +08:00
Craig Topper	f91690f7db	[RISCV] Don't merge addi into load/store address if addi has a FrameIndex operand. This fixes a crash from D124231. We can't fold (load (add base, (addi src, off1)), off2) -> (load (add base, src), off1+off2) if the src is a FrameIndex. FrameIndex cannot be the operand of an add. There was an immediate==0 check that I think was trying to catch the common case of FrameIndex addis where the immediate is 0, but they can also appear in non-zero form. Instead explicitly check for a FrameIndex operand.	2022-04-29 18:22:20 -07:00
Craig Topper	5aa1a7b307	[RISCV] Remove 'frameindex' from list for ComplexPattern. NFC Putting a node in this list allows the node to be used as the root of an isel pattern that would then call the ComplexPattern. The usual case is to use the ComplexPattern as the operand of another operator. AddrFI is never used as a root operation. frameindex is handled directly with custom code in RISCVISelDAGToDAG::Select. So adding frameindex to the list here serves no purpose.	2022-04-29 17:41:07 -07:00
Philip Reames	3ea191ed03	[RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc]	2022-04-29 10:00:57 -07:00
Philip Reames	f927be0df8	[RISCV] Extract getAllOnesMask helper [nfc]	2022-04-29 09:30:18 -07:00
Craig Topper	5c38373125	[RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW. It's possible that we have a constant that isn't simm32 so we can't use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign extended constant, it's possible that after subtracting it out, we end up with a simm32 that maps to LUI. This patch detects this case after removing Lo12 and before shifting the value for SLLI. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124222	2022-04-29 08:58:32 -07:00
LiaoChunyu	03a3654203	[RISCV] Add cost model for SK_Broadcast Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost with scalable vector. The cost model might not the best. For scalable vector, BasicTTIImpl::getShuffleCost return invalid cost, so this patch relies on the existing cost model in BasicTTIImpl. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124101	2022-04-29 13:28:02 +08:00
Hsiangkai Wang	c62b014db9	[RISCV] Merge addi into load/store as there is a ADD between them This patch adds peephole optimizations for the following patterns: (load (add base, (addi src, off1)), off2) -> (load (add base, src), off1+off2) (store val, (add base, (addi src, off1)), off2) -> (store val, (add base, src), off1+off2) Differential Revision: https://reviews.llvm.org/D124231	2022-04-29 04:33:05 +00:00
Craig Topper	ec11fbb1d6	[RISCV] Use default promotion for (i32 (shl 1, X)) on RV64 when Zbs is enabled. This improves opportunities to use bset/bclr/binv. Unfortunately, there are no W versions of these instrcutions so this isn't always a clear win. If we use SLLW we get free sign extend and shift masking, but need to put a 1 in a register and can't remove an or/xor. If we use bset/bclr/binv we remove the immediate materializationg and logic op, but might need a mask on the shift amount and sext.w. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D124096	2022-04-28 09:58:30 -07:00
Craig Topper	8631a5e712	[RISCV] Fix alias printing for vmnot.m By clearing the HasDummyMask flag from mask register binary operations and mask load/store. HasDummyMask was causing an extra operand to get appended when converting from MachineInstr to MCInst. This extra operand doesn't appear in the assembly string so was mostly ignored, but it prevented the alias instruction printing from working correctly. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D124424	2022-04-28 08:33:52 -07:00
Lian Wang	dc0ae8ce18	[RISCV] Support VP_SETCC mask operations Support VP_SETCC mask operations, turn it to logical operation. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124438	2022-04-28 08:52:29 +00:00
Craig Topper	c2614b31d9	[RISCV] Add isCommutable to scalar FMA instructions. The default implementation of findCommutedOpIndices picks the first two source operands. That's exactly what we want for the scalar FMA instructions. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D124463	2022-04-27 11:07:18 -07:00
Jim Lin	9de7b93bc0	[RISCV][NFC] Update and add missing closed curly bracket comment in RISCVInstrInfoZb.td	2022-04-27 15:08:51 +08:00
ShihPo Hung	6b55f133fb	[RISCV][RVV] Select unmasked TU RVV pseudos in a DAG post-process Following D118810 that reduced the size of ISel table, this patch optimizes allone-masked RVV pseudos with TU policy and swap them out to their unmasked TU pseudos. Since the UNDEF merge operand is not preserved, we turn it into TA pseudo regardless of the policy operand. Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D121881	2022-04-26 20:14:54 -07:00
Vasileios Porpodas	fa8a9fea47	Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`" This reverts commit `6a9bbd9f20`. Code review: https://reviews.llvm.org/D124202	2022-04-26 14:02:40 -07:00
Shao-Ce SUN	c59473aacc	[NFC][RISCV][CodeGen] Use ArrayRef in TargetLowering functions Based on D123467. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D123653	2022-04-26 23:53:00 +08:00
Craig Topper	40f1af4760	[RISCV] Add isCommutable to ADD/ADDW/MUL/AND/OR/XOR/MIN/MAX/CLMUL Reviewed By: reames Differential Revision: https://reviews.llvm.org/D123970	2022-04-25 10:53:41 -07:00
Zakk Chen	ffe03ff75c	[RISCV] Fix incorrect policy implement for unmasked vslidedown and vslideup. vslideup works by leaving elements 0<i<OFFSET undisturbed. so it need the destination operand as input for correctness regardless of policy. Add a operand to indicate policy. We also add policy operand for unmaksed vslidedown to keep the interface consistent with vslideup because vslidedown have only undisturbed at 0<i<vstart but user have no way to control of vstart. Reviewed By: rogfer01, craig.topper Differential Revision: https://reviews.llvm.org/D124186	2022-04-25 09:18:41 -07:00
wangpc	7a21a0525a	[RISCV] Add sched to pseudo function call instructions To fix llvm-mca's error of 'found an unsupported instruction in the input assembly sequence.' caused by the lack of scheduling info. Pseudo function call instructions will be expanded to `auipc` and `jalr`, so their scheduling info are the combination of two. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123578	2022-04-24 14:58:18 +08:00
Mohammed Nurul Hoque	5dd99f71aa	[RISCV] transform MI to W variant to remove sext.w Backwards search The sext.w removal pass (before the new patch) checks if the input to sext.w is already in sign-extended form, so it can eliminate it. It does that by checking every definition/source that reaches the sext.w is an instruction that produces a sign-extended value, either by definition (e.g. ADDW), or it propagates sign-extension (e.g. OR) so we check its sources recursively. Forward search Sometimes, one of the sources is an instruction that doesn't always produce a sign-extended value, but it has a W-version that does (e.g. ADD / ADDW). If we transform the ADD to ADDW, the sext.w can be removed (assuming other def paths are satisfied), but this transformation is sound only if every use of this ADD/W only reqruires the lower 32-bits either directly (like sll %x, 32) or they propagate dependency (lower word of output only depends on lower word of input) so we check its uses recursively. When searching backwards, if an instruction that can be replaced with W-variant is encountered, this pass runs the forward search to verify it can be replaced, then adds it to a list of fixable instructions. After verifying all paths, it replaces the instruction and removes the sext.w. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D119928	2022-04-22 10:59:26 -07:00
Fraser Cormack	98db7ea262	[RISCV][NFC] Adjust some formatting in VL patterns	2022-04-22 17:19:27 +01:00
Fraser Cormack	2b0fedc2dd	[RISCV] Print human-readable VTYPE/SEW/LMUL in MIR This patch adds custom MIR operand comments to VTYPE immediate operands in VSETVLI instructions and SEW/LMUL operands in vector codegen pseudo instructions. The result is intended to be more human-readable and hopefully maintainable when working with MIR, particularly when writing or reading test cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124187	2022-04-22 17:13:18 +01:00
wangpc	5c3ea07848	[RISCV] Do not outline CFI instructions when they are needed in EH We saw a failure caused by unwinding with incomplete CFIs, so we can't outline CFI instructions when they are needed in EH. This is a recommit of `0d40688`, which was reverted in `ce83883` as related precommit test `360d44e` caused some errors. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122634	2022-04-22 12:28:19 +08:00
Ping Deng	7493d9ffb6	[RISCV][NFC] Use defvar to simplify pattern definations. Reviewed By: jacquesguan, frasercrmck Differential Revision: https://reviews.llvm.org/D123839	2022-04-22 02:45:14 +00:00
Craig Topper	9534811aa8	[RISCV] Teach generateInstSeqImpl to generate BSETI for single bit cases. If the immediate has one bit set, but isn't a simm32 we can try the BSETI instruction from Zbs.	2022-04-21 12:08:34 -07:00
Craig Topper	98b866892d	[RISCV] Add special case to constant materialization to remove trailing zeros first. If there are fewer than 12 trailing zeros, we'll try to use an ADDI at the end of the sequence. If we strip trailing zeros and end the sequence with a SLLI we might find a shorter sequence. Differential Revision: https://reviews.llvm.org/D124148	2022-04-21 09:43:32 -07:00
wangpc	ce83883691	Revert "[RISCV] Do not outline CFI instructions when they are needed in EH" This reverts commit `0d40688925`.	2022-04-21 16:23:10 +08:00
wangpc	0d40688925	[RISCV] Do not outline CFI instructions when they are needed in EH We saw a failure caused by unwinding with incomplete CFIs, so we can't outline CFI instructions when they are needed in EH. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122634	2022-04-21 16:13:22 +08:00
Fraser Cormack	3e678cb772	[RISCV] Don't emit fractional VIDs with negative steps We can't shift-right negative numbers to divide them, so avoid emitting such sequences. Use negative numerators as a proxy for this situation, since the indices are always non-negative. An alternative strategy could be to add a compiler flag to emit division instructions, which would at least allow us to test the VID sequence matching itself. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123796	2022-04-21 07:00:34 +01:00
Craig Topper	186d5c8af5	[RISCV] Make getInstSeqCost handle other Zb* instructions. We haven't been updating this as Zb* instructions have been used for immediate materialization. They will hit the default case and trigger an llvm_unreachable. Instead of trying to list them all, assume instructions that aren't explicitly listed aren't compressible. Spotted while looking at integer materialization for other reasons. I haven't seen a crash from this yet.	2022-04-20 22:08:04 -07:00
Craig Topper	6db0afb44e	[RISCV] Fold (xor (sllw 1, x), -1) -> (rolw ~1, x). There's an existing generic combine that does this for legal types. This patch adds a RISCV specific combine for W instructions. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D123983	2022-04-19 15:03:43 -07:00
Fraser Cormack	c5cac48549	[RISCV] Fix lowering of BUILD_VECTORs as VID sequences This patch fixes a bug when lowering BUILD_VECTOR via VID sequences. After adding support for fractional steps in D106533, elements with zero steps may be skipped if no step has yet been computed. This allowed certain sequences to slip through the cracks, being identified as VID sequences when in fact they are not. The fix for this is to perform a second loop over the BUILD_VECTOR to validate the entire sequence once the step has been computed. This isn't the most efficient, but on balance the code is more readable and maintainable than doing back-validation during the first loop. Fixes the tests introduced in D123785. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123786	2022-04-19 07:43:38 +01:00
jacquesguan	25445b94db	[RISCV] Add rvv codegen support for vp.fptrunc. This patch adds rvv codegen support for vp.fptrunc. The lowering of fp_round and vp.fptrunc share most code so use a common lowering function to handle those two, similar to vp.trunc. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123841	2022-04-19 01:56:18 +00:00
Lian Wang	545d353b3c	[RISCV][NFC] Refactor VL patterns for vnsrl and vnsra Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123274	2022-04-15 07:42:59 +00:00
jacquesguan	1aa4f0bb6c	[RISCV][VP] Add RVV codegen for vp.trunc. Differential Revision: https://reviews.llvm.org/D123579	2022-04-15 02:29:53 +00:00
Lian Wang	3100893f63	[RISCV] Remove sext_inreg+riscv_grev/riscv_gorc isel patterns Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123565	2022-04-14 08:16:32 +00:00
Lian Wang	38706dd940	[RISCV][NFC] Refactor patterns for Multiply Add instructions Reviewed By: craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D123355	2022-04-14 08:00:00 +00:00
wangpc	d0828c5af9	[RISCV][NFC] Use addExpr() instead of createExpr() It seems to be neater. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D123675	2022-04-14 10:48:25 +08:00
Liqin Weng	8265679018	[RISCV][NFC] Refactor the type promotion of fsl/fsr/becompress/bdecompress/bfp Reviewed By: asb, jrtc27, craig.topper, frasercrmck Differential Revision: https://reviews.llvm.org/D123181	2022-04-13 08:52:04 +00:00
Craig Topper	057c063c9b	[RISCV] Add a encodeLMUL function to RISCVVType. NFC This moves the encoding handling out of the assembly parser. Reviewed By: khchen, frasercrmck Differential Revision: https://reviews.llvm.org/D123553	2022-04-12 13:39:47 -07:00
Craig Topper	2ce2562876	[RISCV][SelectionDAG] Add a hook to sign extend i32 ConstantInt operands of phis on RV64. Materializing constants on RISCV is simpler if the constant is sign extended from i32. By default i32 constant operands of phis are zero extended. This patch adds a hook to allow RISCV to override this for i32. We have an existing isSExtCheaperThanZExt, but it operates on EVT which we don't have at these places in the code. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122951	2022-04-11 14:38:39 -07:00
Craig Topper	76192182d0	[RISCV] Remove riscv-v-fixed-length-vector-elen-max command line option. This was added before Zve extensions were defined. I think users should use Zve32x or Zve32f now. Though we will lose support for limiting ELEN to 16 or 8, but I hope no one was using that. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123418	2022-04-11 10:14:48 -07:00
Craig Topper	c266e50430	[RISCV] Remove ExtZvl enum from RISCVSubtarget. NFC Having an enum with names that contain the string representation of their value doesn't add any value. We can just use the numbers. Reviewed By: kito-cheng, frasercrmck Differential Revision: https://reviews.llvm.org/D123417	2022-04-11 10:01:17 -07:00
LiaoChunyu	505fce5a9e	[RISCV] Add basic code modeling for llvm.experimental.stepvector intrinsic Scalable vectors llvm.experimental.stepvector intrinsic will crash due to an invalid cost when run the code through the loopunroll. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D122782	2022-04-11 10:19:23 +08:00
Craig Topper	4e561a581f	[RISCV] Remove unnecessary cast to i8* when converting gather/scatter to strided load/store. Not sure why I thought this necessary at the time.	2022-04-09 20:05:03 -07:00
Craig Topper	70046438d0	[RISCV] Only try LUI+SHADD+ADDI for int materialization if LUI+ADDI+SHADD failed. There's an assert in LUI+SHADD+ADDI materialization that makes sure the lower 12 bits aren't zero since that case should have been handled as LUI+ADDI+SHADD. But nothing prevented the LUI+SH*ADD+ADDI checks from running after the earlier code handled it. The sequence would be the same length or longer so it wouldn't replace the earlier sequence, but the assert happened before that was checked. The vector holding the sequence also wasn't reset before the second check so that guaranteed the sequence would never be found to be shorter. This patch fixes this by only trying the second expansion when the earlier fails. Fixes PR54812. Reviewed By: benshi001 Differential Revision: https://reviews.llvm.org/D123406	2022-04-09 08:52:15 -07:00
Fraser Cormack	34e1b4774a	[RISCV] Select unmasked FP setcc insts via ISel post-process Similar to D123217 but for the floating-point patterns. No change in generated output, while reducing the generated table size. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D123291	2022-04-08 17:13:43 +01:00
Craig Topper	1903b99154	[RISCV] Always select (and (srl X, C), Mask) as (srli (slli X, C2), C3). SLLI is always compressible to C.SLLI as long as the source and dest register is the same. ANDI and SRLI are only compressible if the register is x8-x15. By using SLLI we have a better chance of generating shorter code. I had to exclude one exclusion for the BEXTI case so that it's pattern match could still fire. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D123336	2022-04-08 09:04:04 -07:00
Kito Cheng	9c5aedfbf5	[RISCV] Fixing stack offset for RVV object with vararg in stack. We found LLVM generate wrong stack offset for RVV object when stack having variable argument, that cause by we didn't count vaarg part during calculate RVV stack objects. Also update the stack layout diagram for including vaarg in the diagram. Stack layout ref: https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L3941 Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D123180	2022-04-08 12:01:16 +08:00
Kito Cheng	690085c9b7	[RISCV] Store/restore RISCVMachineFunctionInfo into MIR YAML file RISCVMachineFunctionInfo has some fields like VarArgsFrameIndex and VarArgsSaveSize are calculated at ISel lowering stage, those info are not contained in MIR files, that cause test cases rely on those field can't not reproduce correctly by MIR dump files. This patch adding the MIR read/write for those fields. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123178	2022-04-08 11:55:48 +08:00
jacquesguan	a55c19c44b	[RISCV][NFC] Use defvar to simplify pattern definations. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D123292	2022-04-08 02:51:30 +00:00
Craig Topper	d98bea87ef	[RISCV] Add more .vx patterns for VLMax integer setccs. This patch synchronizes the structure of the templates with those in RISCVInstrInfoVVLPatterns.td so that we get patterns with .vx on the left hand side. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D123255	2022-04-07 09:17:43 -07:00
Craig Topper	82662b753d	[RISCV] Add swapped patterns to VPatIntegerSetCCVL_VIPlus1. This matches VPatIntegerSetCCVL_VI_Swappable. But as noted in the FIXME this may only be needed due to lack of canonicalization on VP_SETCC. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D123239	2022-04-07 09:17:08 -07:00
Luís Marques	d09d297c5d	[RISCV] Fix crash for section alignment with .option norvc The existing code wasn't getting the subtarget info from the fragment, so the current status of RVC would be ignored. This would cause a crash for the new test case when the target then reported it couldn't write the requested number of code alignment bytes. Differential Revision: https://reviews.llvm.org/D122236	2022-04-07 12:02:27 +01:00
Fraser Cormack	8ebc9b1560	[RISCV] Select unmasked integer setcc insts via ISel post-process This patch has no effect on the generated code, whilst mitigating the increase in ISel table size caused by the recent addition of masked patterns. I aim to do the same for floating-point patterns once D123051 lands, giving us a reason to use masked floating-point patterns. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D123217	2022-04-07 09:30:19 +01:00
Fraser Cormack	8216255c9f	[RISCV][VP] Add basic RVV codegen for vp.fcmp This patch adds the necessary infrastructure to lower vp.fcmp via ISD::VP_SETCC to RVV instructions. Most notably this patch adds cond-code legalization for VP_SETCC, reusing the existing TargetLowering::LegalizeSetCCCondCode by passing in additional SDValue parameters for the Mask and EVL. This method then uses VP operations to legalize the condcode. There is still a general lack of canonicalization on VP_SETCC as opposed to SETCC which results in worse code than is theoretically possible. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D123051	2022-04-07 09:16:07 +01:00
Liqin Weng	f891123556	[RISCV] Add CMOV isel pattern for (select (setgt X, Imm), Y, Z) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122644	2022-04-07 05:55:53 +00:00
Lian Wang	1b547799c5	[RISCV] Supplement patterns for vnsrl.wx/vnsra.wx when splat shift is sext or zext Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122786	2022-04-07 02:21:41 +00:00
Craig Topper	e13a44b460	[RISCV] Add lowering for vp.sext and vp.zext. Including mask vector inputs. Reviewed By: frasercrmck, rogfer01 Differential Revision: https://reviews.llvm.org/D123150	2022-04-06 09:59:49 -07:00
Fraser Cormack	6be5e875be	[RISCV][VP] Add basic RVV codegen for vp.icmp This patch adds the minimum required to successfully lower vp.icmp via the new ISD::VP_SETCC node to RVV instructions. Regular ISD::SETCC goes through a lot of canonicalization which targets may rely on which has not hereto been ported to VP_SETCC. It also supports expansion of individual condition codes and a non-boolean return type. Support for all of that will follow in later patches. In the case of RVV this largely isn't a problem as the vector integer comparison instructions are plentiful enough that it can lower all VP_SETCC nodes on legal integer vectors except for boolean vectors, which regular SETCC folds away immediately into logical operations. Floating-point VP_SETCC operations aren't as well supported in RVV and the backend relies on condition code expansion, so support for those operations will come in later patches. Portions of this code were taken from the VP reference patches. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122743	2022-04-06 16:51:22 +01:00
Craig Topper	3c831c9b28	[RISCV] Add support for vp.fptosi where the result is a mask type. We can do this conversion by converting the same sized integer type, then compare the result with 0. The conversion is undefined if the converted FP value doesn't fit in an i1. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D122678	2022-04-05 09:48:04 -07:00
Craig Topper	d970e96c53	[RISCV] Add lowering for vp.fptoui and vp.uitofp. This is a straightforward extension of D122512 to unsigned integers.	2022-04-01 18:28:46 -07:00
Craig Topper	fa630e7594	[RISCV][AMDGPU][TargetLowering] Special case overflow expansion for (uaddo X, 1). If we expand (uaddo X, 1) we previously expanded the overflow calculation as (X + 1) <u X. This potentially increases the live range of X and can prevent X+1 from reusing the register that previously held X. Since we're adding 1, overflow only occurs if X was UINT_MAX in which case (X+1) would be 0. So this patch adds a special case to expand the overflow calculation to (X+1) == 0. This seems to help with uaddo intrinsics that get introduced by CodeGenPrepare after LSR. Alternatively, we could block the uaddo transform in CodeGenPrepare for this case. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D122933	2022-04-01 13:14:10 -07:00
Lian Wang	62dd3674bc	[RISCV] Supplement SDNode patterns for vfwmul/vfwadd/vfwsub Reviewed By: jacquesguan Differential Revision: https://reviews.llvm.org/D122720	2022-04-01 03:09:50 +00:00
Fraser Cormack	ee51aefba0	[RISCV][NFC] Minor formatting fix	2022-03-31 16:15:22 +01:00
Fraser Cormack	a276d1f44b	[RISCV][NFC] Fix formatting on one line	2022-03-31 13:17:37 +01:00
ShihPo Hung	2f1261abe4	[RISCV][RVV] Add Uses = [FRM] and mayRaiseFPException = true to RVV instructions This patch adds Uses = [FRM] and mayRaiseFPException = true to following instructions: VFADD, VFSUB, VFRSUB, VFMUL, VFDIV, VFRDIV VFWADD, VFWSUB, VFWMUL VFMADD, VFMACC, VFMSAC, VFMSUB VFNMADD, VFNMACC, VFNMSAC, VVFNMSUB VFWMACC, VFWMSAC, VFWNMACC, VFWNMSAC VFSQRT, VFREC7 VFREDOSUM, VFREDUSUM, VFWREDOSUM, VFWREDUSUM and only adds mayRaiseFPException = true to following instructions: VFRSQRT7, VFMIN, VFMAX, VFREDMIN, VFREDMAX VMFEQ, VMFNE, VMFLT,VMFLE, VMFGT, VMFGE Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D121087	2022-03-31 01:33:17 -07:00
Fraser Cormack	893d63fbdc	[RISCV][NFC] Fix comment to refer to correct file	2022-03-31 08:59:10 +01:00
Lian Wang	b3851e9931	[RISCV] Add VL patterns for vfwmul/vfwadd/vfwsub Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D122369	2022-03-31 07:08:58 +00:00
Craig Topper	4477500533	[RISCV] ISel (and (shift X, C1), C2)) to shift pair in more cases Previously, these isel optimizations were disabled if the AND could be selected as a ANDI instruction. This patch disables the optimizations only if the immediate is valid for C.ANDI. If we can't use C.ANDI, we might be able to compress the shift instructions instead. I'm not checking the C extension since we have relatively poor test coverage of the C extension. Without C extension the code size should be equal. My only concern would be if the shift+andi had better latency/throughput on a particular CPU. I did have to add a peephole to match SRLIW if the input is zexti32 to prevent a regression in rv64zbp.ll. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D122701	2022-03-30 11:46:42 -07:00
Craig Topper	7417eb29ce	[RISCV] Use getSplatBuildVector instead of getSplatVector for fixed vectors. The splat_vector will be legalized to build_vector eventually anyway. This patch makes it take fewer steps. Unfortunately, this results in some codegen changes. It looks like it comes down to how the nodes were ordered in the topological sort for isel. Because the build_vector is created earlier we end up with a different ordering of nodes. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D122185	2022-03-30 11:36:34 -07:00
Liqin Weng	4cb85da811	[RISCV] Add CMIX isel pattern for (xor (and (xor rs1, rs3), rs2), rs3) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122702	2022-03-30 16:51:09 +08:00
Fraser Cormack	75047577d6	[RISCV] Trim RVV isel pats matchable via DAG post-process In D122512, several masked patterns were added to support lowering of vector-predicated float-to-int and int-to-float conversions. With the introduction of these patterns, all of the old "unmasked" patterns are matchable via the DAG post-process introduced in D118810, once the relevant opcode entries are set up in the helper table. Locally this reduces the generated isel table by 4%. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D122637	2022-03-30 08:56:38 +01:00
Liqin Weng	7f81765898	[RISCV][NFC] Add immediate tests for the icmp instruction Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122651	2022-03-30 02:51:26 +00:00
Zakk Chen	b578330754	[RISCV] Use maskedoff to decide mask policy for masked compare and vmsbf/vmsif/vmsof. masked compare and vmsbf/vmsif/vmsof are always tail agnostic, we could check maskedoff value to decide mask policy rather than have a addtional policy operand. Reviewed By: craig.topper, arcbbb Differential Revision: https://reviews.llvm.org/D122456	2022-03-29 18:05:33 -07:00
Zakk Chen	10b2760da0	Revert "[RISCV] Add policy operand for masked compare and vmsbf/vmsif/vmsof IR" This reverts commit `10fd2822b7`. I have a better implementation for those operations without the additional policy operand. masked compare and vmsbf/vmsif/vmsof are always tail agnostic so we could assume undef maskedoff is mask agnostic. Differential Revision: https://reviews.llvm.org/D122455	2022-03-29 18:05:33 -07:00
Liqin Weng	d660c0d793	[RISCV] Optimize LI+SLT to SLTI+XORI for immediates in specific range This transform will reduce one GPR. Reviewed By: craig.topper, benshi001 Differential Revision: https://reviews.llvm.org/D122051	2022-03-29 14:46:49 +08:00
Craig Topper	45e85feba6	[RISCV] Pull APInt/computeKnonwbits specifics out of computeGREVOrGORC. NFC This function now takes a uint64_t instead of an APInt. The caller is responsible for masking the shift amount, extracting and inserting into the KnownBits APInts, and inverting to compute zeros. This is less code and cleaner division of responsibilities.	2022-03-28 20:53:54 -07:00

... 3 4 5 6 7 ...

2266 Commits