llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	010f0f000f	Revert "[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions." I thought this might help with another optimization I was thinking about, but I don't think it will. So it just wastes compile time calling computeKnownBits for no benefit. This reverts commit `81b2f95971`.	2021-06-27 10:33:43 -07:00
Craig Topper	81b2f95971	[RISCV] Use zexti32/sexti32 in srliw/sraiw isel patterns to improve usage of those instructions.	2021-06-26 11:57:26 -07:00
Jim Lin	779d2b0a42	[RISCV][NFC] Combine the control flow for different RetOp of interrupt function Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104838	2021-06-26 17:28:03 +08:00
Craig Topper	d4f4a1ba62	[RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X))) with sign_extend. If type legalization is going to insert a sign_extend for other users of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is better to replace the ANY_EXTEND so we don't end up with a separate ADD/MUL/SUB instruction for the users of the ANY_EXTEND. I'm only handling setcc uses right now, but there are other instructions that force sign_extends like ashr. There are probably other *W instructions we could use in addition to ADDW/SUBW/MULW. My motivating case was a loop terminating compare and a phi use as seen in the new test file. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D104581	2021-06-25 23:16:37 -07:00
Fraser Cormack	ab1bd25593	[RISCV] Permit larger RVV stacks and stack offsets This patch teaches the compiler to generate code to handle larger RVV stack sizes and stack offsets which resolve an amount larger than 2047 vector registers in size. The previous behaviour was asserting on such large values as it was only able to materialize the constant by feeding it to the 12-bit immediate of an `ADDI` instruction. The compiler can now materialize this amount into a temporary register before continuing with the computation. A test case for this scenario is included which also checks that the temporary register used to materialize the amount doesn't require an additional spill slot over what we're already reserving for RVV code. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D104727	2021-06-25 07:17:33 +01:00
Fraser Cormack	a4729f7f88	[RISCV] Lower RVV vector SELECTs to VSELECTs This patch optimizes the code generation of vector-type SELECTs (LLVM select instructions with scalar conditions) by custom-lowering to VSELECTs (LLVM select instructions with vector conditions) by splatting the condition to a vector. This avoids the default expansion path which would either introduce control flow or fully scalarize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104772	2021-06-24 10:12:51 +01:00
Craig Topper	a37cf17834	[RISCV] Add explicit copy to V0 in the masked vmsge(u).vx intrinsic handling. This is consistent with our other masked vector instructions. Previously we found cases where not doing this broke fast reg alloc.	2021-06-23 08:04:42 -07:00
Craig Topper	c2e01ee4a5	[RISCV] Remove extra character from a comment. NFC	2021-06-21 12:52:02 -07:00
Craig Topper	9080659ac7	[RISCV] Add isel patterns to match vmacc/vmadd/vnmsub/vnmsac from add/sub and mul. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104163	2021-06-21 11:27:44 -07:00
Craig Topper	b663f30fa4	[RISCV] Prevent formation of shXadd(.uw) and add.uw if it prevents the use of addi. If the outer add has an simm12 immediate operand we should prefer it instead of materializing it in a register. This would guarantee an extra instruction and temporary register. Since we don't check one use on the shl or zext we might generate more instructions if there is an additional user.	2021-06-19 12:10:42 -07:00
Ben Shi	d934b72809	[RISCV] Optimize add-mul in the zba extension with SHADD This patch does the following optimization. Rx + Ry * 18 => (SH1ADD (SH3ADD Rx, Rx), Ry) Rx + Ry * 20 => (SH2ADD (SH2ADD Rx, Rx), Ry) Rx + Ry * 24 => (SH3ADD (SH1ADD Rx, Rx), Ry) Rx + Ry * 36 => (SH2ADD (SH3ADD Rx, Rx), Ry) Rx + Ry * 40 => (SH3ADD (SH2ADD Rx, Rx), Ry) Rx + Ry * 72 => (SH3ADD (SH3ADD Rx, Rx), Ry) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104588	2021-06-19 14:33:27 +08:00
Craig Topper	ac87133f1d	[RISCV] Teach vsetvli insertion to remember when predecessors have same AVL and SEW/LMUL ratio if their VTYPEs otherwise mismatch. Previously we went directly to unknown state on VTYPE mismatch. If we instead remember the partial match, we can use this to still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL ratio match. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104069	2021-06-18 12:16:07 -07:00
Luke	c2e97ba85e	[RISCV] Don't enable Interleaved Access Vectorization The patch https://reviews.llvm.org/D101469 is intended to enable loop unrolling, not interleaved access vectorization. The method bool enableInterleavedAccessVectorization() should not be implemented.	2021-06-18 12:32:30 +08:00
Saleem Abdulrasool	116841c623	RISCV: clean up target expression handling The target specific expression handling was slightly regressed by `bbea64250f`. This restores the proper sub-expression evaluation to allow for constant folding within the expression. We explicitly discard the layout and assembler when evaluating the expression to avoid any symbolic computation, instead using `evaluateAsRelocatable` to canonicalise and constant fold only. We can also simplify the expression handling - none of the target variants support symbolic difference. This simplifies the logic for that and adds additional tests to ensure that we do not accidentally regress here in the future. Reviewed By: maskray Differential Revision: https://reviews.llvm.org/D104473	2021-06-17 13:35:32 -07:00
Haojian Wu	53f5f14136	fix an -Wunused-variable warning in release build, NFC	2021-06-17 18:48:47 +02:00
Saleem Abdulrasool	bbea64250f	RISCV: adjust handling of relocation emission for RISCV This re-architects the RISCV relocation handling to bring the implementation closer in line with the implementation in binutils. We would previously aggressively resolve the relocation. With this restructuring, we always will emit a paired relocation for any symbolic difference of the type of S±T[±C] where S and T are labels and C is a constant. GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE` which indicates that a fixup may be expanded into multiple relocations. This is used by the RISCV backend to always emit a paired relocation - either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] + SUB[WIDTH] for a debug info relocation. Irrespective of whether linker relaxation support is enabled, symbolic difference is always emitted as a paired relocation. This change also sinks the target specific behaviour down into the target specific area rather than exposing it to the shared relocation handling. In the process, we also sink the "special" handling for debug information down into the RISCV target. Although this improves the path for the other targets, this is not necessarily entirely ideal either. The changes in the debug info emission could be done through another type of hook as this functionality would be required by any other target which wishes to do linker relaxation. However, as there are no other targets in LLVM which currently do this, this is a reasonable thing to do until such time as the code needs to be shared. Improve the handling of the relocation (and add a reduced test case from the Linux kernel) to ensure that we handle complex expressions for symbolic difference. This ensures that we correctly relocate symbols with the addends normalized and associated with the addition portion of the paired relocation. This change also addresses some review comments from Alex Bradbury about the relocations meant for use in the DWARF CFA being named incorrectly (using ADD6 instead of SET6) in the original change which introduced the relocation type. This resolves the issues with the symbolic difference emission sufficiently to enable building the Linux kernel with clang+IAS+lld (without linker relaxation). Resolves PR50153, PR50156! Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143 Reviewed By: nickdesaulniers, maskray Differential Revision: https://reviews.llvm.org/D103539	2021-06-17 08:20:02 -07:00
Fraser Cormack	fed1503e85	[RISCV][VP] Lower FP VP ISD nodes to RVV instructions With the exception of `frem`, this patch supports the current set of VP floating-point binary intrinsics by lowering them to RVV instructions. It does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. The `frem` node is unsupported due to a lack of available instructions. For fixed-length vectors we could scalarize but that option is not (currently) available for scalable-vector types. The support is intentionally left out so it is equivalent for both vector types. The matching of vector/scalar forms is currently lacking, as scalable vector types do not lower to the custom `VFMV_V_F_VL` node. We could either make floating-point scalable vector splats lower to this node, or support the matching of multiple kinds of splat via a `ComplexPattern`, much like we do for integer types. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D104237	2021-06-17 10:04:00 +01:00
Fangrui Song	1a76bff626	RISCVFixupKinds.h: Don’t duplicate function or class name at the beginning of the comment && fix some comments	2021-06-16 10:42:43 -07:00
Fraser Cormack	c75e454cb9	[RISCV] Transform unaligned RVV vector loads/stores to aligned ones This patch adds support for loading and storing unaligned vectors via an equivalently-sized i8 vector type, which has support in the RVV specification for byte-aligned access. This offers a more optimal path for handling of unaligned fixed-length vector accesses, which are currently scalarized. It also prevents crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store operation. Future work could be to investigate loading/storing via the largest vector element type for the given alignment, in case that would be more optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load could be loaded as nxv4i32 instead of as nxv16i8. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104032	2021-06-14 18:12:18 +01:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Hsiangkai Wang	643b6407fa	[RISCV] Avoid scalar outgoing arguments overwriting vector frame objects. When using FP to access stack objects, the scalable stack objects will be put at the lower end of the frame. It looks like ``` |-------------------| <-- FP | callee-saved regs | |-------------------| | scalar local vars | |-------------------| | RVV local vars | |-------------------| <-- SP ``` If there are scalar arguments that need to pass through memory and there are vector objects on the stack accessed via FP, the outgoing scalar arguments will overwrite the vector objects. It looks like ``` |-------------------| <-- FP | callee-saved regs | |-------------------| | scalar local vars | |-------------------| |-------------------| | RVV local vars | | outgoing args | <- outgoing arguments |-------------------| <-- SP |-------------------| overwrite from here. ``` In this patch, we reserve the stack for the outgoing arguments before function calls if using FP to access and there are scalable vector frame objects. It looks like ``` |-------------------| <-- FP | callee-saved regs | |-------------------| | scalar local vars | |-------------------| | RVV local vars | |-------------------| | outgoing args | |-------------------| <-- SP ``` Differential Revision: https://reviews.llvm.org/D103622	2021-06-11 12:26:29 +08:00
Craig Topper	420bd5ee8e	[RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32. This helps us select W instructions in more cases. Most of the affected tests have had the sign_extend_inreg or AND folded into sextload/zextload. Differential Revision: https://reviews.llvm.org/D104079	2021-06-10 19:06:45 -07:00
Craig Topper	8dfd0810f2	[RISCV] Remove unused method from RISCVInsertVSETVLI. NFC If this becomes needed it's trivial to add it back.	2021-06-09 15:35:26 -07:00
Fraser Cormack	502edebd9d	[ValueTypes][RISCV] Cap RVV fixed-length vectors by size This patch changes RVV's policy for its supported list of fixed-length vector types by capping by vector size rather than element count. Now all 1024-byte vectors (of supported element types) are supported, rather than all 256-element vectors. This is a more natural fit for the architecture, and allows us to, for example, improve the support for vector bitcasts. This change necessitated the adding of some new simple types to avoid "regressing" on the number of currently-supported vectors. We round out the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and `v512f16`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103884	2021-06-09 12:15:37 +01:00
Fraser Cormack	e8f1f89103	[RISCV] Support CONCAT_VECTORS on scalable masks This patch is a simple fix which registers CONCAT_VECTORS as custom-lowered for scalable mask vectors. This follows the pattern of all other scalable-vector types, as the default expansion of CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go through the stack and generate worse code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103896	2021-06-09 09:07:44 +01:00
Jim Lin	242ddd5089	[RISCV][NFC] Add a single space after comma for VType In most cases, it has a single space after comma in assembly operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103790	2021-06-09 11:18:22 +08:00
Craig Topper	8b4c80d380	Further improve register allocation for vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv. The first source has the same EEW as the destination, but we're using earlyclobber which prevents them from ever being the same register. This patch attempts to work around this. -For unmasked .wv, add a special TIED pseudo that pretends like the first operand and the destination must be the same register. This disables the earlyclobber for that source. Mark the instruction as convertible to 3 address form which will switch it to the original untied pseudo when the TwoAddressInstructionPass decides that keeping them tied would require an extra copy. This uses code in RISCVInstrInfo.cpp to do the conversion to the untied opcode. The untie test case shows that we can generate the untied version. Not sure it was profitable to do it in this case, but they have really simple IR. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103552	2021-06-08 09:43:43 -07:00
Craig Topper	c57bce9cc5	[RISCV] Remove ForceTailAgnostic flag from vmv.s.x, vfmv.s.f and reductions. In 0.9 these were defined to leave elements other than 0 in the destination unmodified. They were changed to use the tail policy in 0.10. I missed that update. I assume no one has noticed because in-order cores treat tail agnostic the same as tail undisturbed. I believe Spike and QEMU do the same. Reviewed By: arcbbb, frasercrmck Differential Revision: https://reviews.llvm.org/D103736	2021-06-08 09:22:40 -07:00
Craig Topper	7c4e9a6826	[RISCV] Use 0 for Log2SEW for vle1/vse1 intrinsics to enable vsetvli optimization. Missed in D103299.	2021-06-07 22:41:14 -07:00
Craig Topper	ae3ab4f0ec	[RISCV] Masked compares should use a tail agnostic policy. Writes of a mask result are always tail agnostic. Unfortunately, this seems to have made codegen worse. I can only think this must be because the vsetvli was acting as some sort of barrier that prevented some code movement in the scheduler. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103331	2021-06-07 21:43:44 -07:00
Craig Topper	7a105b5768	[RISCV] Use AVL Operand instead of GPR for tied mask pseudo for vwadd.wv and similar. I mistakenly copied this from an older version of our internal repo.	2021-06-07 21:16:50 -07:00
Craig Topper	0aa941654f	[RISCV] Use bitfields to shrink the size of the vector load/store intrinsics to pseudo instruction lookup tables.	2021-06-07 17:57:51 -07:00
Ben Shi	c705b7b04d	[RISCV] Optimize bitwise and with constant for the Zbs extension This patch optimizes (and r i) to (BCLRI (BCLRI r, i0), i1) in which i = ~((1<<i0) | (1<<i1)), or (BCLRI (ANDI r, i0), i1) in which i = i0 & ~(1<<i1). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103743	2021-06-08 07:26:00 +08:00
Craig Topper	9b92ae01ee	[RISCV] Store Log2 of EEW in the vector load/store intrinsic to pseudo lookup tables. NFCI This uses 3 bits of data instead of 7. I'm wondering if we can use bitfields for the lookup table key where this would matter. I also name the shift_amount template to log2 since it is used with more than just an srl now.	2021-06-07 15:47:45 -07:00
Craig Topper	f30f8b4f12	[RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp. Include known bits support so we know we don't need to zext the output if the input was already zero extended. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D103757	2021-06-07 10:31:51 -07:00
Craig Topper	8c6bd6c22f	[RISCV] Don't enable loop vectorizer interleaving if the V extension isn't enabled. This can cause the vectorizer to generate interleaved scalar code which might be ok for some CPUs, but definitely not all. Disable it to restore the previous scalar behavior. Differential Revision: https://reviews.llvm.org/D103787	2021-06-07 10:20:59 -07:00
Craig Topper	8bde5f06a1	[RISCV] Replace && with ||. Spotted by coverity. We should be exiting when the shift amount is greater than the bit width regardless of whether it is a power of 2. Reported by Simon Pilgrim here https://reviews.llvm.org/D96661 This requires getting a shift amount that is out of bounds that wasn't already optimized by SelectionDAG. This would be pretty tricky to construct a test for. Or it would require a non-power of 2 shift amount and a mask that has runs of ones and zeros of the next lowest power of 2 from that shift amount. I tried a little to produce a test for this, but didn't get it to work.	2021-06-06 13:09:51 -07:00
Nikita Popov	1ffa6499ea	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
Nikita Popov	9914200393	[CodeGen] Add missing includes (NFC) These currently rely on the IRBuilder.h include in TargetLowering.h. Make them explicit.	2021-06-06 15:48:27 +02:00
Simon Pilgrim	be51737f59	Fix "not all control paths return a value" MSVC warning. NFCI.	2021-06-05 19:42:00 +01:00
Jim Lin	170b70b74b	[RISCV] Replace (XLenVT (VLOp GPR:$vl)) with VLOpFrag This is for D100288 to reduce the changes. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103682	2021-06-05 12:49:31 +08:00
Craig Topper	c653711fd3	[RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL. All that really matters is that the VLMAX of the preceding instructions is the same as the VLMAX required by the mask operation. Also update the vmsge(u) handling to use the SEW/LMUL we use for other mask register operations. We were matching it to the compare before. Some cases will be improved if we fix masked compares to use tail agnostic policy. I think they ignore the tail policy anyway. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D103299	2021-06-04 09:17:46 -07:00
Craig Topper	e9313fa33a	[RISCV] Simplify some code in RISCVInsertVSETVLI by calling an existing function that does the same thing. NFCI	2021-06-03 17:31:54 -07:00
Fraser Cormack	8790e85255	[RISCV] Reserve an emergency spill slot for any RVV spills This patch addresses an issue in which fixed-length (VLS) vector RVV code could fail to reserve an emergency spill slot for their frame index elimination. This is because we were previously only reserving a spill slot when there were `scalable-vector` frame indices being used. However, fixed-length codegen uses regular-type frame indices if it needs to spill. This patch does the fairly brute-force method of checking ahead of time whether the function contains any RVV spill instructions, in which case it reserves one slot. Note that the second RVV slot is still only reserved for `scalable-vector` frame indices. This unfortunately causes quite a bit of churn in existing tests, where we chop and change stack offsets for spill slots. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103269	2021-06-03 10:44:34 +01:00
Fraser Cormack	3b0a33d0ad	[RISCV] Expand unaligned fixed-length vector memory accesses RVV vectors must be aligned to their element types, so anything less is unaligned. For regular loads and stores, our custom-lowering of fixed-length vectors meant that we opted out of LegalizeDAG's built-in unaligned expansion. This patch adds that logic in to our custom lower function. For masked intrinsics, we declare that anything unaligned is not legal, leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us. Note that neither of these methods can handle the expansion of scalable-vector memory ops, so those cases are left alone by this patch. Scalable loads and stores already go through expansion by default but hit an assertion, and scalable masked intrinsics will silently generate incorrect code. It may be prudent to return an error in both of these cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102493	2021-06-02 09:27:44 +01:00
Craig Topper	41ff1e0e29	[RISCV] Improve register allocation for masked vwadd(u).wv, vwsub(u).wv, vfwadd.wv, and vfwsub.wv. The first source has the same EEW as the destination, but we're using earlyclobber which prevents them from ever being the same register. To work around this, add a special TIED pseudo to use whenever the first source and merge operand are the same value. This allows us to use a single operand for the merge operand and first source which we can then tie to the destination. A tied source disables earlyclobber for that operand. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D103211	2021-06-01 18:59:00 -07:00
Daniel Sanders	aaac268285	[globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one It's still in use in a few places so we can't delete it yet but there's not many at this point. Differential Revision: https://reviews.llvm.org/D103352	2021-06-01 13:23:48 -07:00
Craig Topper	896f9bc350	[RISCV] Remove earlyclobber from vnsrl/vnsra/vnclip(u) when the source and dest are a single vector register. This guarantees they meet this overlap exception: "The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group" Being a single register guarantees the overlap is always in the lowest-numbered part of the group. Reviewed By: frasercrmck, khchen Differential Revision: https://reviews.llvm.org/D103351	2021-06-01 09:17:52 -07:00
Craig Topper	5a5219a0f9	[RISCV] Remove earlyclobber from compares with LMUL<=1. Compares are considered a narrowing operation for register overlap. I believe for LMUL<=1 they meet this exception to allow overlap "The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group" Both the result and the sources will occupy a single register for LMUL<=1 so the overlap would always be in the "lowest-numbered part". Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103336	2021-06-01 09:08:11 -07:00
Fraser Cormack	4f500c402b	[RISCV] Support vector types in combination with fastcc This patch extends the RISC-V lowering of the 'fastcc' calling convention to vector types, both fixed-length and scalable. Without this patch, any function passing or returning vector types by value would throw a compiler error. Vectors are handled in 'fastcc' much as they are in the default calling convention, the noticeable difference being the extended set of scalar GPR registers that can be used to pass vectors indirectly. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102505	2021-06-01 10:31:18 +01:00
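
Note on d934b72809: a minimal sketch of the arithmetic behind the six nested shift-add rewrites listed in that commit, assuming the Zba definition shNadd(rs1, rs2) = (rs1 << N) + rs2 on a 64-bit register. Under that definition, the operand passed twice to the inner instruction is the one that ends up multiplied by the constant. The names `shadd`, `mul_add`, and `DECOMPOSITIONS` are hypothetical helpers for illustration only, not LLVM code.

```python
MASK64 = (1 << 64) - 1  # model 64-bit register wraparound (RV64 is an assumption)


def shadd(n, rs1, rs2):
    """Zba shNadd semantics: (rs1 << n) + rs2, truncated to 64 bits."""
    return ((rs1 << n) + rs2) & MASK64


# constant -> (outer shift, inner shift), transcribed from the six rewrites
# in the commit message, e.g. 18 => (SH1ADD (SH3ADD x, x), y).
DECOMPOSITIONS = {
    18: (1, 3),
    20: (2, 2),
    24: (3, 1),
    36: (2, 3),
    40: (3, 2),
    72: (3, 3),
}


def mul_add(c, m, a):
    """Compute c * m + a with two shift-add steps instead of a multiply."""
    outer, inner = DECOMPOSITIONS[c]
    # Inner step yields (2**inner + 1) * m; outer step scales it and adds a.
    return shadd(outer, shadd(inner, m, m), a)
```

For example, `mul_add(18, 7, 5)` evaluates `shadd(1, shadd(3, 7, 7), 5)`, which is (63 << 1) + 5 = 131 = 18 * 7 + 5.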

1265 Commits