llvm-project

Commit Graph

Author	SHA1	Message	Date
Fraser Cormack	8c73a31c11	[RISCV] Allow passing fixed-length vectors via the stack The vector calling convention dictates that when the vector argument registers are exhausted, GPRs are used to pass the address via the stack. When the GPRs themselves are exhausted, at best we would previously crash with an assertion, and at worst we'd generate incorrect code. This patch addresses this issue by passing fixed-length vectors via the stack with their full fixed-length size and aligned to their element type size. Since the calling convention lowering can't yet handle scalable vector types, this patch adds a fatal error to make it clear that we are lacking in this regard. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102422	2021-05-27 14:14:07 +01:00
Fraser Cormack	772b58a641	[SelectionDAG][RISCV] Don't unroll 0/1-type bool VSELECTs This patch extends the cases in which the legalizer is able to express VSELECT in terms of XOR/AND/OR. When dealing with a VSELECT between boolean vector types, the mask itself is an all-ones or all-zeros value of the operand type, so a 0/1 boolean type behaves identically to a 0/-1 type. This greatly helps RISC-V which relies on expansion for these nodes. It also allows scalable-vector bool VSELECTs to use the default expansion, where before it would crash in SelectionDAG::UnrollVectorOp. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103147	2021-05-27 10:08:57 +01:00
Craig Topper	fdf10e6197	[RISCV] Use X0 as destination of inserted vsetvli when possible. We aren't going to connect the result to anything so we might as well avoid allocating a register. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D102031	2021-05-26 13:08:51 -07:00
Craig Topper	9065118b64	[RISCV] Optimize SEW=64 shifts by splat on RV32. SEW=64 shifts only use the log2(64) bits of shift amount. If we're splatting a 64 bit value in 2 parts, we can avoid splatting the upper bits and just let the low bits be sign extended. They won't be read anyway. For the purposes of SelectionDAG semantics of the generic ISD opcodes, if hi was non-zero or bit 31 of the low is 1, the shift was already undefined so it should be ok to replace high with sign extend of low. In order to be able to find the split i64 value before it becomes a stack operation, I added a new ISD opcode that will be expanded to the stack spill in PreprocessISelDAG. This new node is conceptually similar to BuildPairF64, but I expanded earlier so that we could go through regular isel to get the right VLSE opcode for the LMUL. BuildPairF64 is expanded in a CustomInserter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102521	2021-05-26 10:23:32 -07:00
Jessica Clarke	d63d662d3c	[RISCV] Remove --riscv-no-aliases from RVV tests This serves no useful purpose other than to clutter things up. Diff summary as the real diff is extremely unwieldy: 24844 -; CHECK-NEXT: jalr zero, 0(ra) 24844 +; CHECK-NEXT: ret 8 -; CHECK-NEXT: vl4re8.v v28, (a0) 8 +; CHECK-NEXT: vl4r.v v28, (a0) 64 -; CHECK-NEXT: vl8re8.v v24, (a0) 64 +; CHECK-NEXT: vl8r.v v24, (a0) 392 -; RUN: --riscv-no-aliases < %s | FileCheck %s 392 +; RUN: < %s | FileCheck %s 1 -; RUN: -verify-machineinstrs --riscv-no-aliases < %s \ 1 +; RUN: -verify-machineinstrs < %s \ As discussed in D103004.	2021-05-26 17:59:38 +01:00
Craig Topper	b2c7ac874f	[RISCV] Don't propagate VL/VTYPE across inline assembly in the Insert VSETVLI pass. It's conceivable someone could put a vsetvli in inline assembly so it's safer to consider them as barriers. The alternative would be to trust that the user marks VL and VTYPE registers as clobbers of the inline assembly if they do that, but that seems error prone. I'm assuming inline assembly in vector code is going to be rare. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D103126	2021-05-26 09:56:20 -07:00
Craig Topper	1b47a3de48	[RISCV] Enable cross basic block aware vsetvli insertion This patch extends D102737 to allow VL/VTYPE changes to be taken into account before adding an explicit vsetvli. We do this by using a data flow analysis to propagate VL/VTYPE information from predecessors until we've determined a value for every value in the function. We use this information to determine if a vsetvli needs to be inserted before the first vector instruction in the block. Differential Revision: https://reviews.llvm.org/D102739	2021-05-26 09:25:42 -07:00
Fraser Cormack	7e27e4273d	[RISCV] Pre-commit fixed-length mask vselect tests These are default-expanded but later unrolled due to RISC-V's vector boolean content policy. A patch to improve this codegen will follow shortly.	2021-05-26 10:44:45 +01:00
Ben Shi	bf77317049	[RISCV] Optimize xor/or with immediate in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102893	2021-05-25 14:14:09 +08:00
Craig Topper	b510e4cf1b	[RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks. This is a replacement for D101938 for inserting vsetvli instructions where needed. This new version changes how we track the information in such a way that we can extend it to be aware of VL/VTYPE changes in other blocks. Given how much it changes the previous patch, I've decided to abandon the previous patch and post this from scratch. For now the pass consists of a single phase that assumes the incoming state from other basic blocks is unknown. A follow up patch will extend this with a phase to collect information about how VL/VTYPE change in each block and a second phase to propagate this information to the entire function. This will be used by a third phase to do the vsetvli insertion. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102737	2021-05-24 11:47:27 -07:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attribute is equivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
luxufan	d70e9195a3	[RISCV] Optimize getVLENFactoredAmount function. If the local variable `NumOfVReg` satisfies isPowerOf2_32(NumOfVReg - 1) or isPowerOf2_32(NumOfVReg + 1), the ADDI and MUL instructions can be replaced with SLLI and ADD (or SUB) instructions. Based on original patch by StephenFan. Reviewed By: frasercrmck, StephenFan Differential Revision: https://reviews.llvm.org/D100577	2021-05-24 10:04:37 -07:00
Fraser Cormack	7a211ed110	[RISCV] Prevent store combining from infinitely looping RVV code generation does not successfully custom-lower BUILD_VECTOR in all cases. When it resorts to default expansion it may, on occasion, be expanded to scalar stores through the stack. Unfortunately these stores may then be picked up by the post-legalization DAGCombiner which merges them again. The merged store uses a BUILD_VECTOR which is then expanded, and so on. This patch addresses the issue by overriding the `mergeStoresAfterLegalization` hook. A lack of granularity in this method (being passed the scalar type) means we opt out in almost all cases when RVV fixed-length vector support is enabled. The only exceptions to this rule are mask vectors, which are always either custom-lowered or are expanded to a load from a constant pool. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102913	2021-05-24 10:19:32 +01:00
Jessica Clarke	e10958c807	[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics Unlike normal loads these don't have an extension field, but we know from TargetLowering whether these are sign-extending or zero-extending, and so can optimise away unnecessary extensions. This was noticed on RISC-V, where sign extensions in the calling convention would result in unnecessary explicit extension instructions, but this also fixes some Mips inefficiencies. PowerPC sees churn in the tests as all the zero extensions are only for promoting 32-bit to 64-bit, but these zero extensions are still not optimised away as they should be, likely due to i32 being a legal type. This also simplifies the WebAssembly code somewhat, which currently works around the lack of target-independent combines with some ugly patterns that break once they're optimised away. Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits, where zero-extending atomics were incorrectly returning 0 rather than the (slightly confusing) required return value of 1. Re-landed again after D102819 fixed PowerPC to correctly zero-extend all of its atomics as it claimed to do, since the combination of that bug and this optimisation caused buildbot regressions. Reviewed By: RKSimon, atanasyan Differential Revision: https://reviews.llvm.org/D101342	2021-05-20 20:34:23 +01:00
Fraser Cormack	c74ab891fc	[RISCV] Ensure small mask BUILD_VECTORs aren't expanded The default expansion for BUILD_VECTORs -- save for going through shuffles -- is to go through the stack. This method only works when the type is at least byte-sized, so for v2i1 and v4i1 we would crash. This patch ensures that small mask-type BUILD_VECTORs are always handled without crashing. We lower to a SETCC of the equivalent i8 type. This also exposes some pre-existing issues where the lowering when optimizing for size results in larger code than without. Those will be tackled in future patches. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102767	2021-05-20 19:12:29 +01:00
Fraser Cormack	26bd2250c1	[RISCV] Ensure shuffle splat operands are type-legal The use of `SelectionDAG::getSplatValue` isn't guaranteed to return a type-legal splat value as it may implicitly extract a vector element from another shuffle. It is not permitted to introduce an illegal type when lowering shuffles. This patch addresses the crash by adding a boolean flag to `getSplatValue`, defaulting to false, which when set will ensure a type-legal return value. If it is unable to do that it will fail to return a splat value. I've been through the existing uses of `getSplatValue` in other targets and was unable to find a need or test cases showing a need to update their uses. In some cases, the call is made during `LegalizeVectorOps` which may still produce illegal scalar types. In other situations, the illegally-typed splat value may be quickly patched up to a legal type (such as any-extending the returned `extract_vector_elt` up to a legal type) before `LegalizeDAG` notices. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102687	2021-05-20 18:00:03 +01:00
Fraser Cormack	ca2c245ba4	[RISCV] Support INSERT_VECTOR_ELT into i1 vectors Like the element extraction of these vectors, we choose to promote up to an i8 vector type and perform the insertion there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102697	2021-05-19 09:41:50 +01:00
Fraser Cormack	175bdf127d	[RISCV] Fix operand order in fixed-length VM(OR|AND)NOT patterns Where the RVV specification writes `vs2, vs1`, our TableGen patterns use `rs1, rs2`. These differences can easily cause confusion. The VMANDNOT instruction performs `LHS && !RHS`, and similarly for VMORNOT. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102606	2021-05-18 09:21:25 +01:00
Ben Shi	3cf7983cbe	[RISCV][test] Add new tests of or/xor in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102625	2021-05-18 07:10:17 +08:00
Fraser Cormack	85e31eddf2	[DAGCombiner] Relax an assertion to an early return The select-of-constants transform was asserting that its constant vector inputs did not implicitly truncate their input without that as an explicit precondition to the function. This patch relaxes that assertion into an early return to skip the optimization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D102393	2021-05-17 09:15:55 +01:00
Ben Shi	7746e818a5	[RISCV] Optimize or/xor with immediate in the zbs extension Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102398	2021-05-17 10:59:52 +08:00
Ben Shi	1dfd7d5041	[RISCV][test] Add new tests of or/xor in the zbs extension Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D102396	2021-05-17 09:47:23 +08:00
Hsiangkai Wang	b41e1306b8	[RISCV] Add the DebugLoc parameter to getVLENFactoredAmount(). The MachineBasicBlock::iterator is continuously changing while the frame handling instructions are generated. We should use the DebugLoc from the caller, instead of getting it from the changing iterator. If the prologue instructions are located in a basic block without any other instructions after them, the iterator will be updated to the boundary of the basic block and it is invalid to use the iterator to access DebugLoc. This patch also fixes the crash when accessing DebugLoc using the iterator. Differential Revision: https://reviews.llvm.org/D102386	2021-05-14 21:31:06 +08:00
Fraser Cormack	797e580db9	[RISCV][NFC] Simplify test run lines Several tests had -verify-machineinstrs twice, and several tests were explicitly specifying the default FileCheck prefix of CHECK.	2021-05-13 12:41:00 +01:00
Fraser Cormack	c5ec00e62b	[TargetLowering] Improve legalization of scalable vector types This patch extends the vector type-conversion and legalization capabilities of scalable vector types. Firstly, `vscale x 1` types now behave more like the corresponding `vscale x 2+` types. This enables the integer promotion legalization of extended scalable types, such as the promotion of `<vscale x 1 x i5>` to `<vscale x 1 x i8>`. These `vscale x 1` types are also now better handled by `getVectorTypeBreakdown`, where what looks like older handling for 1-element fixed-length vector types was spuriously updated to include scalable types. Widening of scalable types is now better supported, by using `INSERT_SUBVECTOR` to insert the smaller scalable vector "value" type into the wider scalable vector "part" type. This allows AArch64 to pass and return `vscale x 1` types by value by widening. There are still cases where we are unable to legalize `vscale x 1` types, such as where expansion would require splitting the vector in two. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102073	2021-05-12 16:33:07 +01:00
Stefan Pintilie	8d37411e48	Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics" This reverts commit `6c80361b84`. Breaks PowerPC Big Endian buildbots.	2021-05-12 09:46:18 -05:00
Craig Topper	d092dd56ae	[RISCV] Regenerate stepvector.ll. NFC It looks like the RV32 and RV64 prefixes were removed from the RUN lines while another patch was in review that added check lines that used them.	2021-05-11 13:04:57 -07:00
Fangrui Song	ec27c5f170	[RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local Similar to X86 D73230 and AArch64 D101872 With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode, for default visibility external linkage non-ifunc-non-COMDAT definitions. For such dso_local definitions, variable access/taking the address of a function/calling a function will go through a local alias to avoid GOT/PLT. Reviewed By: jrtc27, luismarques Differential Revision: https://reviews.llvm.org/D101875	2021-05-11 11:29:45 -07:00
Craig Topper	ce6e4f27dd	[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min. My thought process is that if v2i64 is an LMUL=1 type then v2i32 should be an LMUL=1/2 type. We limit the fractional LMUL so that SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This ensures there's always a fractional LMUL available to truncate a type. This does reduce the number of vsetvlis in some cases. Some tests increase vsetvlis because the best container type for a mask type is dependent on the LMUL+SEW that the mask was produced from, but you can't tell that from the type. I think this is something we need to solve in the machine IR when optimizing vsetvlis. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D101215	2021-05-11 09:42:48 -07:00
Craig Topper	dc00cbb505	[RISCV] Match trunc_vector_vl+sra_vl/srl_vl with splat shift amount to vnsra/vnsrl. Limited to splats because we would need to truncate the shift amount vector otherwise. I tried to do this with new ISD nodes and a DAG combine to avoid such a large pattern, but we don't form the splat until LegalizeDAG and need DAG combine to remove a scalable->fixed->scalable cast before it becomes visible to the shift node. By the time that happens we've already visited the truncate node and won't revisit it. I think I have an idea how to improve i64 on RV32 that I'll save for a follow up. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D102019	2021-05-11 09:29:31 -07:00
Hsiangkai Wang	d8ec2b183e	[RISCV] Fix the calculation of the offset of Zvlsseg spilling. For Zvlsseg spilling, we need to convert the pseudo instructions into multiple vector load/store instructions with appropriate offsets. For example, for PseudoVSPILL3_M2, we need to convert it to VS2R %v2, %base ADDI %base, %base, (vlenb x 2) VS2R %v4, %base ADDI %base, %base, (vlenb x 2) VS2R %v6, %base We need to keep the size of the offset in the pseudo spilling instructions. In this case, it is (vlenb x 2). In the original implementation, we used the size of the frame object divided by the number of vectors in the Zvlsseg type. The size of a frame object is not necessarily exactly the same as the size of the spilled data; it may be larger. So, we change it to (VLENB x LMUL) in this patch. The calculation is more direct and easier to understand. Differential Revision: https://reviews.llvm.org/D101869	2021-05-11 10:13:18 +08:00
Craig Topper	80b9510806	[RISCV] Correct VL for fixed length masked scatter. We were incorrectly calling getVectorNumElements on a scalable vector type. This shouldn't be allowed. This gives a warning on EVT, but not MVT.	2021-05-10 09:50:08 -07:00
Fraser Cormack	6db0cedd23	[LegalizeVectorOps][RISCV] Add scalable-vector SELECT expansion This patch extends VectorLegalizer::ExpandSELECT to permit expansion also for scalable vector types. The only real change is conditionally checking for BUILD_VECTOR or SPLAT_VECTOR legality depending on the vector type. We can use this to fix "cannot select" errors for scalable vector selects on the RISCV target. Note that in future patches RISCV will possibly custom-lower vector SELECTs to VSELECTs for branchless codegen. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102063	2021-05-10 08:22:35 +01:00
Jessica Clarke	6c80361b84	[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics Unlike normal loads these don't have an extension field, but we know from TargetLowering whether these are sign-extending or zero-extending, and so can optimise away unnecessary extensions. This was noticed on RISC-V, where sign extensions in the calling convention would result in unnecessary explicit extension instructions, but this also fixes some Mips inefficiencies. PowerPC sees churn in the tests as all the zero extensions are only for promoting 32-bit to 64-bit, but these zero extensions are still not optimised away as they should be, likely due to i32 being a legal type. This also simplifies the WebAssembly code somewhat, which currently works around the lack of target-independent combines with some ugly patterns that break once they're optimised away. Re-landed with correct handling in ComputeNumSignBits for Tmp == VTBits, where zero-extending atomics were incorrectly returning 0 rather than the (slightly confusing) required return value of 1. Reviewed By: RKSimon, atanasyan Differential Revision: https://reviews.llvm.org/D101342	2021-05-06 04:01:20 +01:00
Jessica Clarke	897d7bceb9	Revert "[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics" This seems to have broken sanitizers, giving lots of Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed. failures across multiple targets (currently X86 and PowerPC). Reverting until I have a chance to reproduce and debug. This reverts commit `6e876f9ded`.	2021-05-05 17:02:05 +01:00
Jessica Clarke	6e876f9ded	[SelectionDAG][Mips][PowerPC][RISCV][WebAssembly] Teach computeKnownBits/ComputeNumSignBits about atomics Unlike normal loads these don't have an extension field, but we know from TargetLowering whether these are sign-extending or zero-extending, and so can optimise away unnecessary extensions. This was noticed on RISC-V, where sign extensions in the calling convention would result in unnecessary explicit extension instructions, but this also fixes some Mips inefficiencies. PowerPC sees churn in the tests as all the zero extensions are only for promoting 32-bit to 64-bit, but these zero extensions are still not optimised away as they should be, likely due to i32 being a legal type. This also simplifies the WebAssembly code somewhat, which currently works around the lack of target-independent combines with some ugly patterns that break once they're optimised away. Reviewed By: RKSimon, atanasyan Differential Revision: https://reviews.llvm.org/D101342	2021-05-05 16:34:45 +01:00
Fraser Cormack	61a46375a2	[RISCV][VP][NFC] Add tests for VP_SREM and VP_UREM As agreed in D101826, these are follow-up tests for the RISC-V VP support.	2021-05-05 13:13:34 +01:00
Fraser Cormack	437468f319	[RISCV][VP][NFC] Add tests for VP_MUL and VP_[US]DIV As agreed in D101826, these are follow-up tests for the RISC-V VP support.	2021-05-05 13:08:57 +01:00
Fraser Cormack	491a3d1359	[RISCV][VP][NFC] Add tests for VP_SHL and VP_LSHR As agreed in D101826, these are follow-up tests for the RISC-V VP support. Tests for VP_ASHR were landed as part of D101826.	2021-05-05 13:01:04 +01:00
Fraser Cormack	3fbcf07a99	[RISCV][VP][NFC] Add tests for VP_AND, VP_XOR, VP_OR As agreed in D101826, these are follow-up tests for the RISC-V VP support.	2021-05-05 12:58:08 +01:00
Fraser Cormack	6f17613bfb	[RISCV][VP] Lower VP ISD nodes to RVV instructions This patch supports all of the current set of VP integer binary intrinsics by lowering them to RVV instructions. It does so by using the existing RISCVISD *_VL custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. One notable change to the existing vector codegen strategy is that scalable all-ones and all-zeros mask SPLAT_VECTORs are now lowered to RISCVISD VMSET_VL and VMCLR_VL nodes to match their fixed-length BUILD_VECTOR counterparts. This allows them to reuse the existing "all-ones" VL patterns. To reduce the size of the phabricator diff, some tests are intentionally left out and will be added later if the patch is accepted. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101826	2021-05-05 12:32:24 +01:00
Fraser Cormack	cd6a52fede	[RISCV] Cap legal fixed-length vectors to 256-element types Previously, RISC-V would make legal all fixed-length vector types whose sizes are less than or equal to some function of the minimum value of VLEN and the maximum-permissible LMUL grouping. Due to vector legalization issues, this patch instead caps the legal fixed-length vector types to those with 256 elements. This value was chosen because it is the longest vector length which has corresponding MVTs across all supported element types. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101839	2021-05-05 09:51:08 +01:00
Fraser Cormack	6523ff6d47	[ValueTypes] Add MVTs for v256i16 and v256f16 This patch adds the two MVTs to fix a legalizer crash when using vector shuffles of <256 x i16> and <128 x i16> on RISC-V. The legalizer can't promote the operand of `v256i32 = any_extend_vector_inreg v128i16`. Reviewed By: craig.topper, RKSimon Differential Revision: https://reviews.llvm.org/D101769	2021-05-04 18:06:13 +01:00
Jessica Clarke	fb92cf9208	[RISCV] Pre-commit tests for D101342 These tests show inefficient sign extension for AMOs on RISC-V. The normal CodeGen tests use anyext return values, but if marked signext then we end up generating unnecessary sign extension instructions. This can be seen when compiling C that returns an i32 (signed or unsigned), where the calling convention results in a signext return value.	2021-05-04 11:12:43 +01:00
Fraser Cormack	46fa214a6f	[RISCV] Lower splats of non-constant i1s as SETCCs This patch adds support for splatting i1 types to fixed-length or scalable vector types. It does so by lowering the operation to a SETCC of the equivalent i8 type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101465	2021-05-04 09:14:05 +01:00
Fraser Cormack	d23e4f6872	[RISCV] Add support for fmin/fmax vector reductions Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101518	2021-05-03 10:33:51 +01:00
Craig Topper	ba63cdb8f2	[RISCV] Store SEW in RISCV vector pseudo instructions in log2 form. This shrinks the immediate that isel table needs to emit for these instructions. Hoping this allows me to change OPC_EmitInteger to use a better variable length encoding for representing negative numbers. Similar to what was done a few months ago for OPC_CheckInteger. The alternative encoding uses fewer bytes for negative numbers, but increases the number of bytes needed to encode 64 which was a very common number in the RISCV table due to SEW=64. By using Log2 this becomes 6 and is no longer a problem.	2021-05-02 12:09:20 -07:00
Fraser Cormack	1d85b24762	[RISCV][NFC] Merge RV32/RV64 test checks with a common prefix	2021-04-30 09:43:48 +01:00
Fraser Cormack	791766e6d2	[RISCV] Support STEP_VECTOR with a step greater than one DAGCombiner was recently taught how to combine STEP_VECTOR nodes, meaning the step value is no longer guaranteed to be one by the time it reaches the backend for lowering. This patch supports such cases on RISC-V by lowering other step values to a multiply following the vid.v instruction. It includes a small optimization for common cases where the multiply can be expressed as a shift left. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D100856	2021-04-30 09:36:18 +01:00
luxufan	5603ed60ad	[RISCV] Fix StackOffset calculation when using sp to access a fixed stack object while RVV vector objects exist When RVV vector objects exist, an sp-relative access to a fixed stack object has to skip over the RVV vector objects area, so the StackOffset needs to add a scalable offset equal to the size of that area. Differential Revision: https://reviews.llvm.org/D100286	2021-04-30 11:02:38 +08:00
luxufan	325b454ed8	[RISCV] Precommit a test case that tests accessing a fixed object when RVV vector objects exist Differential Revision: https://reviews.llvm.org/D100284	2021-04-30 10:35:03 +08:00
Craig Topper	dcdda2bdf2	[RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c)) Similar for or/xor with 0 in place of -1. This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in a basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction. The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier. I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select. The select-binop-identity.ll test has not been committed yet, but I made the diff show the changes to it. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D101485	2021-04-29 09:43:51 -07:00
Craig Topper	60216adef1	[RISCV] Add test cases for D101485. NFC	2021-04-29 09:43:51 -07:00
Craig Topper	0c330afdfa	[RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32. This replaces D98479. This allows type legalization to form SPLAT_VECTOR_PARTS so we don't lose the splattedness when the scalar type is split. I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so we can continue using non-VL nodes for scalable vectors. I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR with other operations. Especially interesting is a splat BUILD_VECTOR of the extract_vector_elt which can become a splat shuffle, but won't if we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR or add visitSPLAT_VECTOR. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100803	2021-04-29 08:20:09 -07:00
Craig Topper	25391cec3a	[RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31. This seems like a reasonable upper bound on VL. WG discussions for the V spec would probably allow us to use 2^16 as an upper bound on VLEN, but this is good enough for now. This allows us to remove sext and zext if user happens to assign the size_t result into an int and then uses it as a VL intrinsic argument which is size_t. Reviewed By: frasercrmck, rogfer01, arcbbb Differential Revision: https://reviews.llvm.org/D101472	2021-04-29 08:07:59 -07:00
Fraser Cormack	f6c54a61da	[RISCV][NFC] Combine identical RV32 and RV64 test checks	2021-04-29 11:38:10 +01:00
Fraser Cormack	43ad058a01	[RISCV] Fix stack slot for argument types (Bug 49500) This is a complementary/alternative fix for D99068. It takes a slightly different approach by explicitly summing up all of the required split part type sizes and ensuring we allocate enough space for them. It also takes the maximum alignment of each part. Compared with D99068 there are fewer changes to the stack objects in existing tests. However, @luismarques has shown in that patch that there are opportunities to reduce our stack usage in the future. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D99087	2021-04-29 09:10:48 +01:00
Craig Topper	ce09dd54e6	[RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter. This adds a special operand type that is allowed to be either an immediate or register. By giving it a unique operand type the machine verifier will ignore it. This perturbs a lot of tests but mostly it is just slightly different instruction orders. Something bad did happen to some min/max reduction tests. We're spilling vector registers when we weren't before. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101246	2021-04-27 14:38:16 -07:00
Craig Topper	262a72f50f	[RISCV] Use stack slot to handle SPLAT_VECTOR_PARTS on RV32. Reduces the amount of vector ALU operations and reduces vector register pressure.	2021-04-26 15:43:02 -07:00
Craig Topper	e2cd92cb9b	[RISCV] Match splatted load to scalar load + splat. Form strided load during isel. This modifies my previous patch to push the strided load formation to isel. This gives us opportunity to fold the splat into a .vx operation first. Using a scalar register and a .vx operation reduces vector register pressure which can be important for larger LMULs. If we can't fold the splat into a .vx operation, then it can make sense to use a strided load to free up the vector arithmetic ALU to do actual arithmetic rather than tying it up with vmv.v.x. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D101138	2021-04-26 13:32:03 -07:00
Ben Shi	60ed86d350	[RISCV] Optimize addition with immediate Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D101244	2021-04-26 13:26:17 +08:00
Craig Topper	8f5cd49405	[RISCV] Teach DAG combine what bits Zbp instructions demanded from their inputs. This teaches DAG combine that shift amount operands for grev, gorc, shfl, unshfl only read a few bits. This also teaches DAG combine that grevw, gorcw, shflw, unshflw, bcompressw, bdecompressw only consume the lower 32 bits of their inputs. In the future we can teach SimplifyDemandedBits to also propagate demanded bits of the output to the inputs in some cases.	2021-04-25 21:54:06 -07:00
Levy Hsu	8cf54c7ff5	[RISCV] [1/2] Add IR intrinsic for Zbe extension RV32/64: bcompress bdecompress RV64 ONLY: bcompressw bdecompressw Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101143	2021-04-25 19:14:34 -07:00
Craig Topper	3064a63b2b	[RISCV] Remove GetVRegNoV0 from the output register class of masked compare pseudo instructions. These instructions are allowed to write v0 when they are masked. We'll still never use v0 because of the earlyclobber constraint so this doesn't really help anything. It just makes the definitions correct. While I was there remove an unused multiclass I noticed. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D101118	2021-04-23 09:33:29 -07:00
Fraser Cormack	83b8f8da82	[RISCV] Custom lower vector F(MIN|MAX)NUM to vf(min|max) This patch adds support for both scalable- and fixed-length vector code lowering of the llvm.minnum and llvm.maxnum intrinsics to the equivalent RVV instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D101035	2021-04-23 12:22:15 +01:00
Levy Hsu	b49337bbb9	[RISCV] [1/2] Add IR intrinsic for Zbp extension RV32/64: grev grevi gorc gorci shfl shfli unshfl unshfli RV64 ONLY: grevw greviw gorcw gorciw shflw shfli (For non-existing shfliw) unshfli (For non-existing unshfliw) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100830	2021-04-22 16:34:51 -07:00
Craig Topper	5185b52988	[RISCV] Fix crash with fptosi.sat/fptoui.sat intrinsics on RV64. Add test cases. Add PromoteIntOp_FP_TO_XINT_SAT to type legalize the bit width operand from i32 to i64 for RV64. Add test cases for the saturating intrinsics for half/float/double and i32/i64. CodeGen is definitely not optimal. We can probably make use of the native behavior of fcvt instructions in many cases. Fixes PR50083	2021-04-22 15:18:15 -07:00
Craig Topper	e01c419ecd	[RISCV] Add IR intrinsics for vmsge(u).vv/vx/vi. These instructions don't really exist, but we have ways we can emulate them. .vv will swap operands and use vmsle(u).vv. .vi will adjust the immediate and use vmsgt(u).vi when possible. For .vx we need to use some of the multiple instruction sequences from the V extension spec. For unmasked vmsge(u).vx we use: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd For cases where mask and maskedoff are the same value then we have vmsge{u}.vx v0, va, x, v0.t which is the vd==v0 case that requires a temporary so we use: vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt For other masked cases we use this sequence: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0 We trust that register allocation will prevent vd in vmslt{u}.vx from being v0 since v0 is still needed by the vmxor. Differential Revision: https://reviews.llvm.org/D100925	2021-04-22 10:44:38 -07:00
Craig Topper	d77d56acfd	[RISCV] Add missing tests for vector type for second operand of vmsgt and vmsgtu IR intrinsics. Refactor to use new multiclass instead of individual patterns. We already supported this due to SEW=64 on RV32, but we didn't have test cases for all the types we supported. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	9524a0553d	[RISCV] Support vector type for second operand of vmfge and vmfgt IR intrinsics. We don't have instructions for these, but can swap the operands to use vmfle/vmflt. This makes the IR interface more consistent and simplifies the frontend implementation. Part of D100925	2021-04-22 10:44:38 -07:00
Craig Topper	70254ccb69	[RISCV] Turn splat shuffles of vector loads into strided load with stride of x0. Implementations are allowed to optimize an x0 stride to perform fewer memory accesses. This is the case in SiFive cores. No idea if this is the case in other implementations. We might need a tuning flag for this. Reviewed By: frasercrmck, arcbbb Differential Revision: https://reviews.llvm.org/D100815	2021-04-22 10:02:57 -07:00
Craig Topper	77f14c96e5	[RISCV] Use stack temporary to splat two GPRs into SEW=64 vector on RV32. Rather than splatting each separately and doing bit manipulation to merge them in the vector domain, copy the data to the stack and splat it using a strided load with x0 stride. At least on some implementations this vector load is optimized to not do a load for each element. This is equivalent to how we move i64 to f64 on RV32. I've only implemented this for the intrinsic fallbacks in this patch. I think we do similar splatting/shifting/oring in other places. If this is approved, I'll refactor the others to share the code. Differential Revision: https://reviews.llvm.org/D101002	2021-04-22 09:50:07 -07:00
Jun Ma	978eb3f168	[DAGCombiner] Allow operand of step_vector to be negative. It is proper to relax non-negative limitation of step_vector. Also this patch adds more combines for step_vector: (sub X, step_vector(C)) -> (add X, step_vector(-C)) Differential Revision: https://reviews.llvm.org/D100812	2021-04-22 20:58:03 +08:00
Serge Pavlov	740962e5d0	[RISCV] Custom lowering of SET_ROUNDING Differential Revision: https://reviews.llvm.org/D91242	2021-04-22 15:04:55 +07:00
Serge Pavlov	6e63dfdae2	[RISCV] Custom lowering of FLT_ROUNDS_ Differential Revision: https://reviews.llvm.org/D90854	2021-04-22 11:39:15 +07:00
Craig Topper	f6d8cf7798	[RISCV] Teach lowerSPLAT_VECTOR_PARTS to detect cases where Hi is sign extended from Lo. This recognizes the case when Hi is (sra Lo, 31). We can use SPLAT_VECTOR_I64 rather than splatting the high bits and combining them in the vector register.	2021-04-21 20:24:23 -07:00
Fraser Cormack	c141bd3cf9	[DAGCombiner] Support all-ones/all-zeros SPLAT_VECTOR in more combines This patch adds incrementally-better support for SPLAT_VECTOR in a handful of vector combines by changing a few more isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent isConstantSplatVectorAllOnes/Zeros calls. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D100851	2021-04-21 11:05:37 +01:00
Fraser Cormack	3f02d26943	[RISCV] Further fixes for RVV stack offset computation This patch fixes a case missed out by D100574, in which RVV scalable stack offset computations may require three live registers in the case where the offset's fixed component is 12 bits or larger and has a scalable component. Instead of adding an additional emergency spill slot, this patch further optimizes the scalable stack offset computation sequences to reduce register usage. By emitting the sequence to compute the scalable component before the fixed component, we can free up one scratch register to be reallocated by the sequence for the fixed component. Doing this saves one register and thus one additional emergency spill slot. Compare: $x5 = LUI 1 $x1 = ADDIW killed $x5, -1896 $x1 = ADD $x2, killed $x1 $x5 = PseudoReadVLENB $x6 = ADDI $x0, 50 $x5 = MUL killed $x5, killed $x6 $x1 = ADD killed $x1, killed $x5 versus: $x5 = PseudoReadVLENB $x1 = ADDI $x0, 50 $x5 = MUL killed $x5, killed $x1 $x1 = LUI 1 $x1 = ADDIW killed $x1, -1896 $x1 = ADD $x2, killed $x1 $x1 = ADD killed $x1, killed $x5 Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100847	2021-04-21 10:51:07 +01:00
Craig Topper	78abad569c	[RISCV] Add missing SEW=64 tests to vmslt-rv32.ll. NFC	2021-04-20 18:31:36 -07:00
Fraser Cormack	60622b82a7	[RISCV][NFC] Add tests for scalable-vector DAGCombiner improvements These will all be improved by future patches.	2021-04-20 14:26:26 +01:00
Fraser Cormack	b4a358a7ba	[RISCV] Fix missing emergency slots for scalable stack offsets This patch adds an additional emergency spill slot to RVV code. This is required as RVV stack offsets may require an additional register to compute. This patch includes an optimization by @HsiangKai <kai.wang@sifive.com> to reduce the number of registers required for the computation of stack offsets from 3 to 2. Otherwise we'd need two additional emergency spill slots. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D100574	2021-04-20 09:59:41 +01:00
Jun Ma	1ef5699d1a	[DAGCombiner] Support fold zero scalar vector. This patch changes ISD::isBuildVectorAllZeros to ISD::isConstantSplatVectorAllZeros which handles zero scalar vector. TestPlan: check-llvm Differential Revision: https://reviews.llvm.org/D100813	2021-04-20 16:28:43 +08:00
Fraser Cormack	457da7f298	[SelectionDAG] Relax constraints on STEP_VECTOR step operand This patch relaxes the requirement that the STEP_VECTOR step constant must be of a type at least as large as the vector element type. This does not permit its use on targets which have legal vector element types larger than the largest legal scalar type, such as i64 vectors on RV32. As such, the requirement has been loosened so that the step operand must be any scalar type so long as the constant immediate is non-negative and the value fits inside the vector element type. This limits combining optimizations in certain circumstances but in practice it's unlikely to be a hindrance. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D100660	2021-04-20 08:41:42 +01:00
Ben Shi	b7249bf3b5	[RISCV][test] Add a new test of addition Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D100767	2021-04-20 12:11:56 +08:00
Craig Topper	7ed01a420a	[RISCV] Pad v4i1/v2i1/v1i1 stores with 0s to make a full byte. As noted in the FIXME there's a sort of agreement that any extra bits stored will be 0. The generated code is pretty terrible. I was really hoping we could use a tail undisturbed trick, but tail undisturbed no longer applies to masked destinations in the current draft spec. Fingers crossed that it isn't common to do this. I doubt IR from clang or the vectorizer would ever create this kind of store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100618	2021-04-19 11:05:18 -07:00
Fraser Cormack	c9a93c3e01	[RISCV] Lower vector shuffles to vrgather operations This patch extends the lowering of RVV fixed-length vector shuffles to avoid the default stack expansion and instead lower to vrgather instructions. For "permute"-style shuffles where one vector is swizzled, we can lower to one vrgather. For shuffles involving two vector operands, we lower to one unmasked vrgather (or splat, where appropriate) followed by a masked vrgather which blends in the second half. On occasion, when it's not possible to create a legal BUILD_VECTOR for the indices, we use vrgatherei16 instructions with 16-bit index types. For 8-bit element vectors where we may have indices over 255, we have a fairly blunt fallback to the stack expansion to avoid custom-splitting of the vector types. To enable the selection of masked vrgather instructions, this patch extends the various RISCVISD::VRGATHER nodes to take a passthru operand. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100549	2021-04-19 11:13:13 +01:00
David Sherwood	83f5fa519e	[CodeGen] Improve code generation for clamping of constant indices with scalable vectors When trying to clamp a constant index into a scalable vector we can test if the index is less than the minimum number of elements in the vector. If so, we can simply return the index because we know it is guaranteed to fit inside the vector. Differential Revision: https://reviews.llvm.org/D100639	2021-04-19 08:34:17 +01:00
Fraser Cormack	ec0f7c6923	[RISCV] Rerun stack test through update_llc_test_checks.py Adjusts formatting of comments only. Just to reduce diffs in future patches.	2021-04-16 11:08:58 +01:00
Jim Lin	2893570e86	[RISCV] Don't emit save-restore call if function is an interrupt handler It has to save all caller-saved registers before a call in the handler. So don't emit a call that saves/restores registers. Reviewed By: simoncook, luismarques, asb Differential Revision: https://reviews.llvm.org/D100532	2021-04-16 12:54:47 +08:00
Fraser Cormack	eae0ac3a1f	[RISCV] Pre-commit vector shuffle test cases This codegen will be improved by future patches.	2021-04-15 10:31:13 +01:00
ShihPo Hung	d5e962f1f2	[RISCV] Implement COPY for Zvlsseg registers When copying Zvlsseg register tuples, we split the COPY to NF whole register moves as below: $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2 => $v10m2 = PseudoVMV2R_V $v4m2 $v12m2 = PseudoVMV2R_V $v6m2 This patch copies forwardCopyWillClobberTuple from AArch64 to check register overlapping. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D100280	2021-04-13 18:55:51 -07:00
Fraser Cormack	d737c47137	[RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates This patch adds more optimized codegen for the above SETCC forms, by matching the '.vi' vector forms when the immediate is a 5-bit signed immediate plus 1. The immediate can be decremented and the corresponding SET[U]LE or SET[U]GT forms can be matched. This work was left as a TODO from D94168. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100096	2021-04-12 18:36:45 +01:00
Craig Topper	ff902080a9	[RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64. We don't need the sign extending behavior here and SLLI/SRLI are able to compress to C.SLLI/C.SRLI.	2021-04-11 13:59:51 -07:00
Craig Topper	3ae71226ef	[RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf. The first source has the same EEW as the destination and the other source is a scalar so the overlap constraints don't apply to the unmasked version. For the masked version we have a constraint that the destination can't be V0 so that covers the only overlap issue there. Reviewed By: khchen Differential Revision: https://reviews.llvm.org/D100217	2021-04-11 10:19:45 -07:00
Craig Topper	bc0e052730	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported. Similar to what we do for zext.w. Disable the (srl (and X, 0xffff), C) custom isel when zext.h is available.	2021-04-11 10:03:35 -07:00
Craig Topper	48d69edade	[RISCV] Add i8 and i16 srli and srai tests to Zbb/Zbp test files. NFC These require the input to be zero or sign extended. If we have sext.b, sext.h or zext.h instructions we can use them. Otherwise we need to use a pair of shifts to accomplish the zero/sign extend and the final shift. We don't currently use zext.h when it is available.	2021-04-11 10:00:38 -07:00
Fraser Cormack	a5693445ca	[RISCV] Support OR/XOR/AND reductions on vector masks This patch adds RVV codegen support for OR/XOR/AND reductions for both scalable- and fixed-length vector types. There are a few possible codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be used to some extent -- but the vpopc.m instruction was chosen since it produces the scalar result in one instruction, after which scalar instructions can finish off the computation. The reductions are lowered identically for both scalable- and fixed-length vectors, although some alternate strategies may be more optimal on fixed-length vectors since it's cheaper to get the length of those types. Other reduction types were not deemed to be relevant for mask vectors. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D100030	2021-04-08 09:46:38 +01:00
Hsiangkai Wang	ba72bdef32	[RISCV] Add scalable offset under very large stack size. If the stack size is larger than 12 bits, we have to use a scratch register to store the stack size. Before we introduce the scalable stack offset, we could simplify %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %scratch %0 = ADD %fp, %scratch However, if the offset contains scalable part, we need to consider it. %0 = ADDI %stack.0, 0 => %scratch = ... # sequence of instructions to move the offset into %scratch %scratch = ADD %fp, %scratch %scalable_offset = ... # sequence of instructions for vscaled-offset. %0 = ADD/SUB %scratch, %scalable_offset Differential Revision: https://reviews.llvm.org/D100035	2021-04-08 14:46:05 +08:00
Hsiangkai Wang	b8cd668115	[NFC][RISCV] Add test for scalable offset under large stack size. This test case shows that we access wrong stack slots when the frame object has scalable offset under large stack size. Differential Revision: https://reviews.llvm.org/D100084	2021-04-08 14:46:05 +08:00
Craig Topper	56ea2e2fdd	[RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition. If the constants have a difference of 1 we can convert one to the other by adding or subtracting the condition. We have a DAG combine for this, but it only runs before type legalization. If the select is introduced later during type legalization or op legalization we will miss it. We don't need a specific condition, but some conditions are harder to materialize than others on RISCV. I know that SETLT will be a single instruction and it is what is used by the motivating pattern from signed saturating add/sub. Differential Revision: https://reviews.llvm.org/D99021	2021-04-07 13:47:17 -07:00
Craig Topper	f087d7544a	[RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32. This can't use our normal strategy of splatting the scalar and using a .vv operation instead of .vx. Instead this patch bitcasts the vector to the equivalent SEW=32 vector and inserts the scalar parts using two vslide1up/down. We do that unmasked and apply the mask separately at the end with a vmerge. For vslide1up there may be some other options here like getting i64 into element 0 and using vslideup.vi with this vector as vd and the original source as vs1. Masking would still need to be done afterwards. That idea doesn't work for vslide1down. We need to slidedown and then insert a single scalar at vl-1 which we could do with a vslideup, but that assumes vl > 0 which I don't think we can assume. The i32 double slide1down implemented here is the best I could come up with and I just made vslide1up consistent. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99910	2021-04-07 10:44:53 -07:00
Craig Topper	67953311e2	[SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR This allows FoldConstantArithmetic to handle SPLAT_VECTOR in addition to BUILD_VECTOR. This allows it to support scalable vectors. I'm also allowing fixed length SPLAT_VECTOR which is used by some targets, but I'm not familiar enough to write tests for those targets. I had to block this function from running on CONCAT_VECTORS to avoid calling getNode for a CONCAT_VECTORS of 2 scalars. This can happen because the 2 operand getNode calls this function for any opcode. Previously we were protected because CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR before that call. But it's not always possible to fold a CONCAT_VECTORS of SPLAT_VECTORs, and we don't even try. This fixes PR49781 where DAG combine thought constant folding should be possible, but FoldConstantArithmetic couldn't do it. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D99682	2021-04-07 10:03:33 -07:00
Craig Topper	cb1028a0b9	[RISCV] When custom iseling masked stores, copy the mask into V0 instead of virtual register. I missed a few intrinsics in `3dd4aa7d09` when I did this for masked loads and masked segment loads/stores. Found while trying to share more code between these custom isel functions.	2021-04-05 21:28:32 -07:00
Craig Topper	391514436d	[RISCV] Add more RV32 vslide1up intrinsic test cases. NFC For some reason we only had 1 test case. This synchronizes the test with vslide1down so we have the same number of tests for both.	2021-04-05 17:03:52 -07:00
Fraser Cormack	af3a839c70	[RISCV] Add support for bitcasts between scalars and fixed-length vectors This patch supports bitcasts from scalar types to fixed-length vectors and vice versa. It custom-lowers and custom-legalizes them to EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using single-element vectors to hold the scalar where appropriate. Previously, some of these would fail to select, others would be expanded through stack loads and stores. Effort was made to ensure the codegen avoids the stack for both legal and illegal scalar types. Some of the codegen could be improved, but on first glance it looks like a general optimization of EXTRACT_VECTOR_ELT when extracting an i64 element on RV32. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99667	2021-04-05 17:21:55 +01:00
Fraser Cormack	3f0df4d7b0	[RISCV] Expand scalable-vector truncstores and extloads Caught in internal testing, these operations are assumed legal by default, even for scalable vector types. Expand them back into separate truncations and stores, or loads and extensions. Also add explicit fixed-length vector tests for these operations, even though they should have been correct already. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99654	2021-04-05 17:03:45 +01:00
Fraser Cormack	0d0514dd9b	[RISCV] Add a test showing incorrect codegen This patch adds a test which shows how the compiler incorrectly sets the size and alignment of a stack object used to indirectly pass vector types to functions. In the particular example, the test passes a <4 x i8> vector type to a function and creates a stack object of size and alignment equal to 4 bytes. However, the code generated to set up that parameter has been scalarized and stores each element as individual XLEN-sized values. Thus on RV32 this stores 16 bytes and on RV64 32 bytes, both of which clobber the stack. Similarly, the alignment is set up as the alignment of the vector type, which is not necessarily the natural alignment of XLEN. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D95025	2021-04-05 11:51:03 +01:00
Craig Topper	4708a05da0	[RISCV] Use gorciw for i32 orc.b intrinsic when Zbp is enabled. The W version of orc.b does not exist in Zbp so we need to use gorci encoding. If we have Zbp, we can use gorciw which can avoid a sext.w in some cases.	2021-04-04 17:14:28 -07:00
Craig Topper	a0e611cf72	[RISCV] Add signext attribute to i32 orc.b test for RV64 to match other Zbb tests. Shows the sext.w at the end that would show up in C code. I'm thinking orc.b would preserve sign bits from its input, but I'm not sure.	2021-04-02 16:49:53 -07:00
Levy Hsu	f78d932cf2	[RISCV] Add IR intrinsics for Zbc extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: clmul clmulh clmulr Differential Revision: https://reviews.llvm.org/D99711	2021-04-02 12:09:13 -07:00
Levy Hsu	944adbf285	Recommit "[RISCV] Add IR intrinsic for Zbb extension" Forgot to amend the Author. Original commit message: Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b Differential Revision: https://reviews.llvm.org/D99320	2021-04-02 11:50:19 -07:00
Craig Topper	1f0b309f24	Revert "[RISCV] Add IR intrinsic for Zbb extension" This reverts commit `1808194590`. I forgot to change the author.	2021-04-02 11:47:02 -07:00
Craig Topper	1808194590	[RISCV] Add IR intrinsic for Zbb extension Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: orc.b	2021-04-02 11:23:57 -07:00
Levy Hsu	b001d574d7	[RISCV] Add IR intrinsic for Zbr extension Implementation for RISC-V Zbr extension intrinsic. Header files are included in a separate patch in case the name needs to be changed. RV32 / 64: crc32b crc32h crc32w crc32cb crc32ch crc32cw RV64 Only: crc32d crc32cd Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99009	2021-04-02 10:58:45 -07:00
Craig Topper	d7ffa82a8e	[RISCV] Improve 64-bit integer constant materialization for more cases. For positive constants we try shifting left to remove leading zeros and fill the bottom bits with 1s. We then materialize that constant and shift it right. This patch adds a new strategy to try filling the bottom bits with zeros instead. This catches some additional cases.	2021-04-02 10:18:08 -07:00
Fraser Cormack	411673e769	[RISCV] Test llvm.experimental.vector.insert intrinsics on RV32 RV32 is able to use the llvm.experimental.vector.insert intrinsics too. This patch ensures they're tested. Reviewed By: khchen, asb Differential Revision: https://reviews.llvm.org/D99655	2021-04-02 11:49:54 +01:00
Fraser Cormack	3b48d849d4	[RISCV] Optimize more redundant VSETVLIs D99717 introduced some test cases which showed that the output of one vsetvli into another would not be picked up by the RISCVCleanupVSETVLI pass. This patch teaches the optimization about such a pattern. The pattern is quite common when using the RVV vsetvli intrinsic to pass the VL onto other intrinsics. The second test case introduced by D99717 is left unoptimized by this patch. It is a rarer case and will require us to rewire any uses of the redundant vset[i]vli's output to the previous one's. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99730	2021-04-02 10:04:07 +01:00
Fraser Cormack	a4ac847c8e	[RISCV] Add some tests showing vsetvli cleanup opportunities Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99717	2021-04-02 09:43:04 +01:00
Craig Topper	438b6dd3e5	[RISCV] Add missing nxvXf64 intrinsics tests cases for floating-point compare for RV32.	2021-04-01 20:57:13 -07:00
Craig Topper	5a9a8c7cd4	[RISCV] Add more nxvi64 vector intrinsic tests for RV32. NFC This confirms we handle most instructions gracefully. We do currently fail for vslide1up and vslide1down though.	2021-04-01 20:34:28 -07:00
Craig Topper	0187c3a45c	[RISCV] Add nxvXi64 test cases to the RV32 Zvamo intrinsic test files. NFC	2021-04-01 17:08:20 -07:00
Craig Topper	766d27dc85	[RISCV] Add isel patterns to handle vrsub intrinsic with 2 vector operands. This occurs when we type legalize an i64 scalar input on RV32. We need to manually splat, which requires a vector input. Rather than special case this in lowering just pattern match it.	2021-04-01 14:10:21 -07:00
Craig Topper	dbbc95e3e5	[RISCV] Use softPromoteHalf legalization for fp16 without Zfh rather than PromoteFloat. The default legalization strategy is PromoteFloat which keeps half in single precision format through multiple floating point operations. Conversion to/from float is done at loads, stores, bitcasts, and other places that care about the exact size being 16 bits. This patch switches to the alternative method softPromoteHalf. This aims to keep the type in 16-bit format between every operation. So we promote to float and immediately round for any arithmetic operation. This should be closer to the IR semantics since we are rounding after each operation and not accumulating extra precision across multiple operations. X86 is the only other target that enables this today. See https://reviews.llvm.org/D73749 I had to update getRegisterTypeForCallingConv to force f16 to use f32 when the F extension is enabled. This way we can still pass it in the lower bits of an FPR for ilp32f and lp64f ABIs. The softPromoteHalf would otherwise always give i16 as the argument type. Reviewed By: asb, frasercrmck Differential Revision: https://reviews.llvm.org/D99148	2021-04-01 12:41:57 -07:00
Craig Topper	d157e3f387	[RISCV] Fix handling of nxvXi64 vmsgt(u).vx intrinsics on RV32. We need to splat the scalar separately and use .vv, but there is no vmsgt(u).vv. So add isel patterns to select vmslt(u).vv with swapped operands. We also need to get VT to use for the splat from an operand rather than the result since the result VT is nxvXi1. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D99704	2021-04-01 10:38:05 -07:00
Craig Topper	b7c2e577cc	[RISCV] Add custom type legalization to form MULHSU when possible. There's no target independent ISD opcode for MULHSU, so custom legalize 2*XLen multiplies ourselves. We have to be a little careful to prefer MULHU or MULHSU. I thought about doing this in isel by pattern matching the (add (mul X, (srai Y, XLen-1)), (mulhu X, Y)) pattern. I decided against this because the add might become part of a chain of adds. I don't trust DAG combine not to reassociate with other adds making it difficult to find both pieces again. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D99479	2021-04-01 10:15:55 -07:00
Craig Topper	dadcd940f0	[RISCV] Add MULHU and MULHS tests with a constant operand.	2021-04-01 10:15:55 -07:00
Craig Topper	d61b40ed27	[RISCV] Improve 64-bit integer materialization for some cases. This adds a new integer materialization strategy mainly targeted at 64-bit constants like 0xffffffff where there are 32 or more trailing ones with leading zeros. We can materialize these by using an addi -1 and srli to restore the leading zeros. This matches what gcc does. I haven't limited to just these cases though. The implementation here takes the constant, shifts out all the leading zeros and shifts ones into the LSBs, creates the new sequence, adds an srli, and checks if this is shorter than our original strategy. I've separated the recursive portion into a standalone function so I could append the new strategy outside of the recursion. Since external users are no longer using the recursive function, I've cleaned up the external interface to return the sequence instead of taking a vector by reference. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D98821	2021-04-01 09:12:52 -07:00
Simonas Kazlauskas	777a58e05b	Support {S,U}REMEqFold before legalization This allows these optimisations to apply to e.g. `urem i16` directly before `urem` is promoted to i32 on architectures where i16 operations are not intrinsically legal (such as on Aarch64). The legalization then later can happen more directly and generated code gets a chance to avoid wasting time on computing results in types wider than necessary, in the end. Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D88785	2021-04-01 01:35:41 +03:00
Craig Topper	2a8b7cab6a	[RISCV] Add RISCVISD opcodes for CLZW and CTZW. Our CLZW isel pattern is quite easily broken by surrounding code preventing it from matching sometimes. This usually results in failing to remove the and X, 0xffffffff inserted by type legalization. The add with -32 that type legalization also inserts will often get combined into other add/sub nodes. That doesn't usually result in extra code when we don't use clzw. CTTZ seems to be less fragile, but I wanted to keep it consistent with CTLZ. Reviewed By: asb, HsiangKai Differential Revision: https://reviews.llvm.org/D99317	2021-03-31 09:40:07 -07:00
Craig Topper	04f10ab367	[RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens to be INT64_MIN. I don't think the compiler would optimize based on that in this usage, but it could fail UBSan or -ftrapv. Reviewed By: HsiangKai, frasercrmck Differential Revision: https://reviews.llvm.org/D99637	2021-03-31 09:26:41 -07:00
Fraser Cormack	10fc6e4358	[RISCV] Add support for the stepvector intrinsic This adds almost everything required for supporting the new stepvector intrinsic on RVV. It is lowered to the existing VID_VL SDNode. The only exception is a limitation that RV32 cannot yet lower the intrinsic on i64 vectors. This is because the step operand is (currently) required to be at least as large as the vector element type. I will look into patching that out and loosening the requirement to only an integer pointer type. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99594	2021-03-31 11:41:17 +01:00
Craig Topper	a33fcafaf0	[RISCV] Pass 'half' in the lower 16 bits of an f32 value when F extension is enabled, but Zfh is not. Without Zfh the half type isn't legal, but it could still be used as an argument/return in IR. Clang will not generate this today. Previously we promoted the half value to float for arguments and returns if the F extension is enabled but Zfh isn't. Then depending on which ABI is enabled we would pass it in either an FPR or a GPR in float format. If the F extension isn't enabled, it would get passed in the lower 16 bits of a GPR in half format. With this patch the value will always be in half format and will be in the lower bits of a GPR or FPR. This should be consistent with where the bits are located when Zfh is enabled. I've based this implementation off of how this is done on ARM. I've manually nan-boxed the value to 32 bits using integer ops. It looks like flw, fsw, fmv.s, fmv.w.x, fmv.x.w won't canonicalize nans so should leave the value alone. I think those are the instructions that could get used on this value. Reviewed By: kito-cheng Differential Revision: https://reviews.llvm.org/D98670	2021-03-30 09:47:54 -07:00
Craig Topper	3dd4aa7d09	[RISCV] When custom iseling masked loads/stores, copy the mask into V0 instead of virtual register. This matches what we do in our isel patterns. In our internal testing we've found this is needed to make the fast register allocator happy at -O0. Otherwise it may assign V0 to an earlier operand and find itself with no registers left when it reaches the mask operand. By using V0 explicitly, the fast register allocator will see it when it checks for phys register usages before it starts allocating vregs. I'll try to update this with a test case. Unfortunately, this does appear to prevent some instruction reordering by the pre-RA scheduler which leads to the increased spills seen in some tests. I suspect that problem could already occur for other instructions that already used V0 directly. There's a lot of repeated code here that could do with some wrapper functions. Not sure if that should be at the level of the new code that deals with V0. That would require multiple output parameters to pass the glue, chain and register back. Maybe it should be at a higher level over the entire set of push_backs. Reviewed By: frasercrmck, HsiangKai Differential Revision: https://reviews.llvm.org/D99367	2021-03-29 10:20:43 -07:00
Roger Ferrer Ibanez	ef76a333fa	[RISCV] Fix offset computation for RVV In D97111 we changed the RVV frame layout when using sp or bp to address the stack slots so we could address the emergency stack slot. The idea is to put the RVV objects as far as possible (in offset terms) from the frame reference register (sp / fp / bp). When using fp this happens naturally because the RVV objects are already the top of the stack and due to the constraints of RVV (VLENB being a power of two >= 128) the stack remains aligned. The rest of this summary does not apply to this case. When using sp / bp we need to skip the non-RVV stack slots. The size of the non-RVV objects is computed by subtracting the callee saved register size (whose computation is added in D97111 itself) from the total size of the stack (which does not account for RVV stack slots). However, when doing so we round to 16 bytes when computing that size and we end up emitting a smaller offset that may belong to a scalar stack slot (see D98801). So this change removes that rounding. Also, because we want the RVV objects to be between the non-RVV stack slots and the callee-saved register slots, we need to make sure the RVV objects are properly aligned to 8 bytes. Adding a padding of 8 would render the stack unaligned. So when allocating space for RVV (only when we don't use fp) we need to have extra padding that preserves the stack alignment. This way we can round to 8 bytes the offset that skips the non-RVV objects and we do not misalign the whole stack in the way. In some circumstances this means that the RVV objects may have padding before (=lower offsets from sp/bp) and after (before the CSR stack slots). Differential Revision: https://reviews.llvm.org/D98802	2021-03-29 17:03:49 +00:00
Roger Ferrer Ibanez	3abd0bacc2	[NFC][RISCV] Add test showing wrong stack slot for GPR and RVV spilled registers This testcase shows that we attempt to assign the same offset sp + 16 to two different stack objects. The fix will come in a later change. Differential Revision: https://reviews.llvm.org/D98801	2021-03-29 17:03:18 +00:00
Roger Ferrer Ibanez	96d14ff505	[NFC][RISCV] Pass file through update_llc_tests to fix whitespace issues While addressing RVV frame layout issues I found this file had whitespace differences that made diffs noisier than they should be. Differential Revision: https://reviews.llvm.org/D98800	2021-03-29 17:02:47 +00:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Craig Topper	5a79909a14	[RISCV] Add a RV64 mulhsu test case. NFC	2021-03-28 15:54:44 -07:00
Craig Topper	7b35932b51	[RISCV] Add test case for mulhsu. We don't yet use mulhsu, but we should.	2021-03-28 11:03:39 -07:00
Craig Topper	5692fc38e0	[RISCV] Add a pattern for (sext_inreg (mul (and X, 0xffffffff), (and Y, 0xffffffff)), i32) to suppress MULW formation We have a special pattern for (mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the ANDs to shift. But if a sext_inreg comes first, we'll form a MULW and limit the effectiveness of the special match. So this patch adds a larger pattern to suppress the MULW formation by emitting a sext.w and then the same output we use for the (mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all get CSEd. This is the issue I was trying to fix with D99029, but that affected many more tests.	2021-03-27 15:37:18 -07:00
Zakk Chen	9049cf77e3	[RISCV] Add constraint for RVV indexed loads. Add the constraint when the destination EEW does not equal the source EEW, for correctness. The RVV spec has three register overlap rules and I implement the first stricter constraint because the others are difficult to enforce. Reviewed By: frasercrmck, craig.topper Differential Revision: https://reviews.llvm.org/D98920	2021-03-26 07:23:24 -07:00
Craig Topper	8f62a80328	[RISCV] Optimize (and (shl GPR:, uimm5:), 0xffffffff) to use 2 shifts instead of 3. The and would normally become SLLI+SRLI, giving us 2 SLLI+SRLI. We can detect this and combine the 2 SLLIs into 1.	2021-03-25 23:31:01 -07:00
Craig Topper	9b3c0f9a54	[RISCV] Add Zbb+Zbt command lines to the signed saturating add/sub tests. This will enable cmov to be used for select. I improve the codegen of select_cc in D99021, but that patch doesn't work for cmov.	2021-03-25 17:25:36 -07:00
Craig Topper	c40cea6f08	[RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffffffff). We look for this pattern frequently in isel patterns so it's a good idea to try to preserve it. This also lets us remove our special isel handling for srliw and use a direct pattern match of (srl (and X, 0xffffffff), C) since no bits will be removed from the and mask. Differential Revision: https://reviews.llvm.org/D99042	2021-03-25 09:03:25 -07:00
Fraser Cormack	99211352c1	[RISCV] Optimize select-like vector shuffles This patch adds a small optimization for vector shuffle lowering, detecting shuffles which can be re-expressed as vector selects. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99270	2021-03-25 11:39:57 +00:00
Fraser Cormack	1e56e8717f	[RISCV] Pre-commit shuffle test cases for D99270	2021-03-25 10:41:40 +00:00
Fraser Cormack	321a71a772	[RISCV] Optimize BUILD_VECTOR sequences that reveal hidden splats This patch adds further optimization techniques to RVV BUILD_VECTOR lowering. It teaches the compiler to find splats of larger vector element types "hidden" in smaller ones. For example, a v4i8 build_vector (0x1, 0x2, 0x1, 0x2) could be splat as v2i16 0x0201. This is generally more optimal than the dominant-element BUILD_VECTORs and so takes priority. This optimization is currently limited to all-constant-or-undef BUILD_VECTORs as those were found to be the most common. There's no reason this couldn't be extended to other BUILD_VECTORs, but the additional bit-manipulation instructions may require more sophisticated heuristics. There are some cases where the materialization of the larger constant takes more scalar instructions than it does to build the vector with vector instructions. We could add heuristics to try and catch this. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D99195	2021-03-25 10:35:31 +00:00
Craig Topper	32f6a15dfd	[RISCV] Add more tests that can be improved by D99042.	2021-03-25 00:02:42 -07:00
Craig Topper	c8cf8bc7ec	[RISCV] Add some 32-bit ctlz and cttz idiom tests to rv64zbb.ll. NFC This implements various idioms using ctlz/cttz like Log2, Log2_Ceil, findFirstSetBit, etc. Some of these demonstrate that we fail to use clzw because the idiom breaks the isel patterns we use. The isel pattern we use is (add (ctlz (and X, 0xffffffff)), -32). Some of the idioms cause the constant on the add to be different.	2021-03-24 21:52:48 -07:00
Fraser Cormack	feff66a082	[RISCV] Further optimize BUILD_VECTORs with repeated elements This patch builds upon the initial BUILD_VECTOR work introduced in D98700. It further optimizes the lowering of BUILD_VECTOR by using VSELECT operations to effectively insert repeated elements into the vector with relatively few instructions. This allows us to optimize more BUILD_VECTORs without significantly increasing the size of the generated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98969	2021-03-23 14:14:48 +00:00
Fraser Cormack	5bfbd9d938	[RISCV] Optimize all-constant mask BUILD_VECTORs This patch adds an optimization for mask-vector BUILD_VECTOR nodes whose elements are all constants or undef. It lowers such operations by building up the vector via a series of integer operations, in which multiple mask elements are inserted into a vector at a time via i8/i16/i32/i64 element types. The final result is then bitcast from that integer vector. We restrict this optimization in certain circumstances when optimizing for size. If we are required to use more than one integer insert operation, then it will likely increase code size compared with using a load from a constant pool. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98860	2021-03-23 10:11:19 +00:00
Craig Topper	728cd5dde7	[RISCV] Rename Zb* extension tests to use lower case 'Z' in file names. As discussed in D99009	2021-03-22 19:17:04 -07:00
Craig Topper	294efcd6f7	[RISCV] Add support for fixed vector masked gather/scatter. I've split the gather/scatter custom handler to avoid complicating it with even more differences between gather/scatter. Tests are the scalable vector tests with the vscale removed and dropped the tests that used vector.insert. We're probably not as thorough on the splitting cases since we use 128 for VLEN here but scalable vector use a known min size of 64. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98991	2021-03-22 10:17:30 -07:00
Luís Marques	20f845d7c9	[RISCV][NFC] Add test of stack slot sizes of large split arguments Illustrates bug 49500 <https://bugs.llvm.org/show_bug.cgi?id=49500>.	2021-03-22 13:41:11 +00:00
luxufan	02ffbac844	[RISCV] remove redundant instruction when eliminating frame index The mv a0, a0 instruction is generated when the stack object offset is larger than int<12>. To deal with this situation, the eliminateFrameIndex function creates a virtual register, which the register scavenger then needs to scavenge. If the machine instruction that contains the stack object has the opcode ADDI (the ADDI was generated from a FrameIndex node) and its destination register is the same as the register chosen by the register scavenger, then mv a0, a0 is generated. To eliminate this instruction, eliminateFrameIndex does not create the virtual register when the instruction opcode is ADDI. Differential Revision: https://reviews.llvm.org/D92479	2021-03-21 18:54:00 +08:00
Craig Topper	27bc30c39d	[RISCV] Add test case to show a case where (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization does not improve code. If the mul had two users, one of which was a sext.w, the mul would also be selected to a MULW before our pattern runs. This causes the ANDs to now be used by the already selected MULW and the mul we still need to select. They are unneeded on the MULW since MULW only reads the lower bits. So they get selected to SLLI+SRLI for the MULW use. The use for the (mul (and X, 0xffffffff), (and Y, 0xffffffff)) manages to reuse the SLLI. The end result is increased register pressure and no improvement to how soon we can start the MULW.	2021-03-20 17:54:28 -07:00
Craig Topper	07ed62b7d5	[RISCV] Disable (mul (and X, 0xffffffff), (and Y, 0xffffffff)) optimization when Zba is enabled. This optimization is trying to save SRLI instructions needed to implement the ANDs. If we have zext.w we won't save anything. Because we don't check that the multiply is the only user of the AND we might even increase instruction count.	2021-03-20 15:31:45 -07:00
Craig Topper	0874281d60	[RISCV] Add Zba command lines to xaluo.ll. NFC Some of the patterns end up with 32 to 64 bit zero extends on RV64 which can be handled by zext.w.	2021-03-20 15:31:45 -07:00
Craig Topper	b0d8823a8a	[RISCV] Add isel pattern to optimize (mul (and X, 0xffffffff), (and Y, 0xffffffff)) on RV64 This pattern computes the full 64 bit product of a 32x32 unsigned multiply. This requires two pairs of SLLI+SRLI to zero the upper 32 bits of the inputs. We can do better than this by using two SLLI to move the lower bits to the upper bits then use MULHU to compute the product. This is the high half of a full 64x64 product. Since we put 32 0s in the lower bits of the inputs we know the 128-bit product will have zeros in the lower 64 bits. So the upper 64 bits, which MULHU computes, will contain the original 64 bit product we were after. The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32)) using MULHS, but sext_inreg is sext.w which is already one instruction so we wouldn't save anything. Differential Revision: https://reviews.llvm.org/D99026	2021-03-20 14:55:46 -07:00
Fraser Cormack	d399b82e2a	[RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs I'm not sure how I failed to notice this before, but when optimizing dominant-element BUILD_VECTORs we would lower via the scalable container type, which lost us the information about the fixed length of the vector types. By lowering via the fixed-length type we can preserve that information and eliminate redundant vsetvli instructions. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98938	2021-03-19 17:21:06 +00:00
Fraser Cormack	3bffa2c2aa	[RISCV] Add missing CHECKs to vector test Since the "LMUL-MAX=2" output for some test functions differed between RV32 and RV64, the update_llc_test_checks script failed to emit a unified LMULMAX2 check for them. I'm not sure why it didn't warn about this. This patch also takes the opportunity to add unified RV32/RV64 checks to help shorten the test file when the output for LMULMAX1 and LMULMAX2 is identical but differs between the two ISAs. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98944	2021-03-19 16:52:16 +00:00
Fraser Cormack	550292ecb1	[RISCV] Fix missing scalable->fixed-length vector conversion Returning the scalable-vector container type would present problems when the fixed-length INSERT_VECTOR_ELT was used by later operations. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98776	2021-03-19 16:49:47 +00:00
Hsiangkai Wang	aa8d33a6d6	[RISCV] Spilling for Zvlsseg registers. For Zvlsseg, we create several tuple register classes. When spilling for these tuple register classes, we need to iterate NF times to load/store these tuple registers. Differential Revision: https://reviews.llvm.org/D98629	2021-03-19 07:46:16 +08:00
Craig Topper	182b831aeb	[DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR. Previously only all zeros BUILD_VECTOR was recognized.	2021-03-18 15:34:14 -07:00
Fraser Cormack	3495031a39	[RISCV] Support scalable-vector masked scatter operations This patch adds support for masked scatter intrinsics on scalable vector types. It is mostly an extension of the earlier masked gather support introduced in D96263, since the addressing mode legalization is the same. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96486	2021-03-18 10:17:50 +00:00
Fraser Cormack	0331399dc9	[RISCV] Support scalable-vector masked gather operations This patch supports the masked gather intrinsics in RVV. The RVV indexed load/store instructions only support the "unsigned unscaled" addressing mode; indices are implicitly zero-extended or truncated to XLEN and are treated as byte offsets. This ISA supports the intrinsics directly, but not the majority of various forms of the MGATHER SDNode that LLVM combines to. Any signed or scaled indexing is extended to the XLEN value type and scaled accordingly. This is done during DAG combining as widening the index types to XLEN may produce illegal vectors that require splitting, e.g. nxv16i8->nxv16i64. Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the stack when lowering split legalized index operands. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D96263	2021-03-18 09:26:18 +00:00
Fraser Cormack	c2b4600ec8	[RISCV] Support bitcasts of fixed-length mask vectors Without this patch, bitcasts of fixed-length mask vectors would go through the stack. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98779	2021-03-18 08:52:42 +00:00
ShihPo Hung	fca5d63aa8	[RISCV] Fix isel pattern of masked vmslt[u] This patch changes the operand order of masked vmslt[u] from (mask, rs1, scalar, maskedoff, vl) to (maskedoff, rs1, scalar, mask, vl). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98839	2021-03-17 20:18:11 -07:00
Zakk Chen	9998b00c2e	[RISCV] Update RVV shift intrinsic tests to use XLEN bit as shift amount. Fix the unexpected use of op1's element type as the shift amount type. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98501	2021-03-17 10:47:49 -07:00
Craig Topper	696ddef569	[RISCV] Support masked load/store for fixed vectors. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98561	2021-03-17 10:26:15 -07:00
Fraser Cormack	70251759a2	[RISCV] Optimize "dominant element" BUILD_VECTORs This patch adds an optimization path for BUILD_VECTOR nodes where the majority of the elements are identical. These can be splatted, with the remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can be tweaked as required - it is currently conservative. Undef elements are disregarded when judging the dominance of a particular element. This allows them to be covered by the splat value. In addition, vectors of 2 elements are always optimized to a splat (for the upper element) and an insert at element zero. This optimization is disabled when optimizing for size. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98700	2021-03-17 10:09:04 +00:00
Fangrui Song	6ab8927931	[RISCV] Support clang -fpatchable-function-entry && GNU function attribute 'patchable_function_entry' Similar to D72215 (AArch64) and D72220 (x86). ``` % clang -target riscv64 -march=rv64g -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o ... 0000000000000000 <main>: 0: 13 00 00 00 nop 4: 13 00 00 00 nop % clang -target riscv32 -march=rv32gc -c -fpatchable-function-entry=2 a.c && llvm-objdump -dr a.o ... 00000002 <main>: 2: 01 00 nop 4: 01 00 nop ``` Recently the mainline kernel started to use -fpatchable-function-entry=8 for riscv (https://git.kernel.org/linus/afc76b8b80112189b6f11e67e19cf58301944814). Differential Revision: https://reviews.llvm.org/D98610	2021-03-16 10:02:35 -07:00
Craig Topper	229eeb187d	[RISCV] Look through copies when trying to find an implicit def in addVSetVL. The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF before connecting it to the vector instruction. This occurs when constrainRegClass reduces to a class with less than 4 registers. I believe LMUL8 on masked instructions triggers this since the result can only use the v8, v16, or v24 register group as the mask is using v0. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98567	2021-03-16 07:59:09 -07:00
Craig Topper	a33ce06cf5	[RISCV] Improve i32 UADDSAT/USUBSAT on RV64. The default promotion uses zero extends that become shifts. We can use sign extend instead which is better for RISCV. I've used two different implementations based on whether we have minu/maxu instructions. Differential Revision: https://reviews.llvm.org/D98683	2021-03-16 07:44:06 -07:00
Craig Topper	41759c3d92	[RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC. This allows me to introduce similar combines for branches as we have recently added for SELECT_CC. Some of them are less useful for standalone setccs and only help branch instructions. By having a BR_CC node it's easier to only affect branches. I'm using CondCodeSDNode to make isel patterns easier to write so we can refer to the codes by name. SELECT_CC uses a constant instead. I've translated the condition code just like SELECT_CC so we need fewer patterns for the swapped conditions. This includes special cases for X < 1 and X > -1 that get translated to blez and bgez by using a 0 constant. computeKnownBitsForTargetNode support for SELECT_CC is added to allow MaskedValueIsZero to work for cases where the true and false values of the SELECT_CC are setccs and the result of the SELECT_CC is used by a BR_CC. This was needed to avoid regressions in some of the overflow tests. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98159	2021-03-15 11:54:01 -07:00
Philipp Tomsich	018e96f71f	[RISCV] Add isel-patterns to optimize (a < 1) into blez (a <= 0) The following code-sequence showed up in a testcase (isolated from SPEC2017) for if-conversion and vectorization when searching for the maximum in an array: addi a2, zero, 1 blt a1, a2, .LBB0_5 which can be expressed as `bge zero,a1,.LBB0_5`/`blez a1,.LBB0_5`. More generally, we want to express (a < 1) as (a <= 0). This adds the required isel-pattern and updates the testcases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98449	2021-03-15 11:32:43 -07:00
Fraser Cormack	0035decae7	[CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs This patch addresses a few issues when dealing with scalable-vector INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes. When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we store the low and high halves to the stack separately. The offset for the high half was calculated incorrectly. Additionally, we can optimize this process when we can detect that the subvector is contained entirely within the low/high split vector type. While this optimization is valid on scalable vectors, when performing the 'high' optimization, the subvector must also be a scalable vector. Note that the 'low' optimization is still conservative: it may be possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32, but we can't guarantee it. It is always possible to insert v2i32 into nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1. Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value type being a scalable vector, forgetting that we can also extract a fixed-length vector from a scalable one. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98495	2021-03-15 17:04:21 +00:00
Craig Topper	3dc5b533e0	[RISCV] Improve legalization of i32 UADDO/USUBO on RV64. The default legalization uses zero extends that require a pair of shifts on RISCV. Instead we can take advantage of the fact that unsigned compares work equally well on sign extended inputs. This allows us to use addw/subw and sext.w. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D98233	2021-03-15 09:30:23 -07:00
Fraser Cormack	0c5b789c73	[RISCV] Support fixed-length vectors in the calling convention This patch adds fixed-length vector support to the calling convention when RVV is used to lower fixed-length vectors. The scheme follows the regular vector calling convention for the argument/return registers, but uses scalable vector container types as the LocVTs, and converts to/from the fixed-length vector value types as required. Fixed-length vector types may be split when the combination of minimum VLEN and the maximum allowable LMUL is not large enough to fully contain the vector. In this case the behaviour differs between fixed-length vectors passed as parameters and as return values: 1. For return values, vectors must be passed entirely via registers or via the stack. 2. For parameters, unlike scalar values, split vectors continue to be passed by value, and are split across multiple registers until there are no remaining registers. Thus vector parameters may be found partly in registers and partly on the stack. As with scalable vectors, the first fixed-length mask vector is passed via v0. Split mask fixed-length vectors are passed first via v0 and then via the next available vector register: v8,v9,etc. The handling of vector return values uses all available argument registers v8-v23 which does not adhere to the calling convention we're supposedly implementing, but since this issue affects both fixed-length and scalable-vector values, it was left as-is. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97954	2021-03-15 10:43:51 +00:00
Hsiangkai Wang	a81dff1e58	[RISCV] Support inline asm for vector instructions. Types of fractional LMUL and LMUL=1 are all using VR register class. When using inline asm, it will use the first type in the register class as the type for the register. It is not necessarily the same as the value type. We need to use INSERT_SUBVECTOR/EXTRACT_SUBVECTOR/BITCAST to make it legal to put the value in the corresponding register class. Differential Revision: https://reviews.llvm.org/D97480	2021-03-15 11:02:18 +08:00
luxufan	a9b9c64fd4	change rvv frame layout This patch changes the RVV frame layout that was proposed in D94465. In patch D94465, eliminating an RVV frame index in the eliminateFrameIndex function requires creating a temporary virtual register. This virtual register should be scavenged by class RegisterScavenger. If the machine function has other unused registers, there is no problem. But if there are no unused registers, we need an emergency spill slot. Because the emergency spill slot belongs to the scalar local variables field, accessing it requires a temporary virtual register again. This makes the compiler report the "Incomplete scavenging after 2nd pass" error. So I change the RVV frame layout as follows: ``` |--------------------------------------| | arguments passed on the stack | |--------------------------------------|<--- fp | callee saved registers | |--------------------------------------| | rvv vector objects(local variables | | and outgoing arguments | |--------------------------------------| | realignment field | |--------------------------------------| | scalar local variable(also contains| | emergency spill slot) | |--------------------------------------|<--- bp | variable-sized local variables | |--------------------------------------|<--- sp ``` Differential Revision: https://reviews.llvm.org/D97111	2021-03-13 16:05:55 +08:00
luxufan	5ddbd1fdbb	[RISCV] Remove redundancy -mattr=+d in test file Differential Revision: https://reviews.llvm.org/D97177	2021-03-13 15:17:51 +08:00
Craig Topper	2ea7014089	[DAGCombiner] Use isConstantSplatVectorAllZeros/Ones instead of isBuildVectorAllZeros/Ones in visitMSTORE and visitMLOAD. This allows us to optimize when the mask is a splat_vector in addition to build_vector.	2021-03-12 12:14:56 -08:00
Craig Topper	02da5e21ce	[RISCV] Add test cases for masked load/store with all ones/zeros mask. NFC These should be removed for all zeros mask or optimized to unmasked for all ones.	2021-03-12 12:14:56 -08:00
Craig Topper	51151828ac	[RISCV] Teach normaliseSetCC to canonicalize X > -1 to X >= 0 and X < 1 to 0 >= X. This allows the use of BGE with X0 instead of putting -1/1 in a register. Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D98542	2021-03-12 11:50:10 -08:00
Craig Topper	d701e37b42	[RISCV] Add test cases for failure to optimize select_cc with X < 1 or X > -1. NFC We can use BGE with X0 to implement these, but we currently put 1 or -1 into a register.	2021-03-12 11:19:04 -08:00
Craig Topper	45d3ed0304	[RISCV] Add support for scalable vector masked load/store. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98460	2021-03-12 10:32:33 -08:00
Simonas Kazlauskas	a2eca31da2	Test cases for rem-seteq fold with illegal types This also briefly tests a larger set of architectures than the more exhaustive functionality tests for AArch64 and x86. As requested in D88785 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98339	2021-03-12 16:28:04 +02:00
Fraser Cormack	641f5700f9	[RISCV] Optimize INSERT_VECTOR_ELT sequences This patch optimizes the codegen for INSERT_VECTOR_ELT in various ways. Primarily, it removes the use of vslidedown during lowering, and the vector element is inserted entirely using vslideup with a custom VL and slide index. Additionally, lowering of i64-element vectors on RV32 has been optimized in several ways. When the 64-bit value to insert is the same as the sign-extension of the lower 32-bits, the codegen can follow the regular path. When this is not possible, a new sequence of two i32 vslide1up instructions is used to get the vector element into a vector. This sequence was suggested by @craig.topper. From there, the value is slid into the final position for more consistent lowering across RV32 and RV64. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D98250	2021-03-12 09:13:38 +00:00
Craig Topper	1d26bbcf9b	[RISCV] Return false from isShuffleMaskLegal except for splats. We don't support any other shuffles currently. This changes the bswap/bitreverse tests that check for this in their expansion code. Previously we expanded a byte swapping shuffle through memory. Now we're scalarizing and doing bit operations on scalars to swap bytes. In the future we can probably use vrgather.vx to do a byte swap shuffle.	2021-03-11 20:02:49 -08:00
Craig Topper	2ac7a3cff1	[RISCV] Add test cases for fixed vector bitreverse, bswap, ctlz, cttz, and ctpop. Codegen needs to be improved, but I wanted to check for crashes.	2021-03-11 15:56:32 -08:00
Craig Topper	c82f442954	[RISCV] Support fixed vector copysign. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98394	2021-03-11 09:57:24 -08:00
Craig Topper	0dff8a9627	[RISCV] Handle vmv.x.s intrinsic for i64 vectors on RV32. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98372	2021-03-11 09:39:50 -08:00
Craig Topper	9c841cb8e8	[RISCV] Support extract_vector_elt for fixed and scalable masked registers. This uses a really simple approach of converting to an i8 vector and extracting. This is probably not the best approach especially if you know the index is constant. Other ideas: -Store to stack temporary using vse1, load as scalar and shift. -Sort of bitcast the vector to a vector of i8, slide down the appropriate 8 bit element, copy to scalar, shift down the correct bit within the 8 bits we extracted. Not exactly sure how to describe such a bitcast from i1 vector to i8 vector within the type system for elements less than 8. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98310	2021-03-11 09:26:44 -08:00
Craig Topper	e9426dfbae	[ValueTypes][RISCV] Add MVT for v1f16. RISCV makes all fixed vector MVTs with size less than or equal to a command line option legal. This didn't include v1f16 because it was missing but did include v1f32 and v1f64. One test is affected where we did test this type, but it is a horizontal reduction so it is non-sensical. Perhaps we should canonicalize that away somewhere. I'm not sure if we should be making v1 types legal, but this will at least make RISCV consistent across all types. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98365	2021-03-11 09:23:18 -08:00
Craig Topper	47c7a6cfed	[RISCV] Merge fixed-vectors-int-splat-rv32.ll and fixed-vectors-int-splat-rv64.ll. The vXi64 test cases no longer crash on rv32.	2021-03-10 20:15:26 -08:00
Craig Topper	85ae96d8b2	[RISCV] Add v2i64 _vi_ and _iv_ test cases to fixed-vectors-int.ll since we no longer crash. I think we were missing some build_vector or other support and skipped these test cases. They work now but don't generate optimal code.	2021-03-10 19:19:47 -08:00
Craig Topper	0c73a506e8	[RISCV] Starting fixing issues that prevent us from testing vXi64 intrinsics on RV32. Currently we crash in type legalization any time an intrinsic uses a scalar i64 on RV32. This patch adds support for type legalizing this to prevent crashing. I don't promise that it uses the best possible codegen, just that it is functional. This first version handles 3 cases: the vmv.v.x intrinsic, the vmv.s.x intrinsic, and intrinsics that take a scalar input, splat it, and then do some operation. For vmv.v.x we'll either rely on hardware sign extension for constants or we'll convert it to multiple splats and bit manipulation. For vmv.s.x we use a very suboptimal sequence inspired by what we do for an INSERT_VECTOR_ELT. For the third case we'll either try to use the .vi form for constants or convert to a complicated splat and bitmanip and use the .vv form of the operation. I've renamed the ExtendOperand field to SplatOperand and now use it specifically for the third case. The first two cases are handled by custom lowering specifically for those intrinsics. I haven't updated all tests yet, but I tried to cover a subset that includes single-width, widening, and narrowing. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D97895	2021-03-10 09:45:38 -08:00
Craig Topper	1e39118638	[RISCV] Manually split vector operands to VECREDUCE when handling vXi64 vectors on RV32. The type legalizer will visit the result before the operands. To avoid creating an illegal target specific node or falling back to scalarization, we need to manually split vector operands. This still doesn't handle the case of non-power of 2 operands which need to be widened. I'm not sure the type legalizer is ready for it. I think we would need to insert an INSERT_SUBVECTOR with the power of 2 type we want, with an undef first operand, and the non-power of 2 original operand as the vector to insert. Then fill the neutral elements into the padded elements. Alternatively, we could INSERT_SUBVECTOR into a neutral vector. From there we carry on splitting if needed to get to a legal type, then do the target specific code. The problem with this is the type legalizer doesn't know how to widen an insert_subvector yet. We would need to add that, including the handling for a non-undef first vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D98292	2021-03-10 09:27:38 -08:00

1015 Commits