llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	36d8316cc8	[RISCV] Reduce duplicate code for calling SimplifyDemandedBits. This encapsulates the APInt creation and worklist management into a helper function. To keep one common interface I've use Log2_32 in places that previously created a mask by subtracting 1 from a power of 2. Differential Revision: https://reviews.llvm.org/D108324	2021-08-19 07:09:38 -07:00
Craig Topper	6d7ea597ef	[RISCV] Insert sext_inreg when type legalizing add/sub/mul with constant LHS. We already do this for non-constants RHS. This just removes the special case. I believe the special case may have been needed because the ANY_EXTEND of a constant used to create zero extended constants, but we recently changed that to produce sign extended constants. D107658 is needed to prevent some regressions. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107697	2021-08-18 10:44:25 -07:00
Craig Topper	d63f117210	[RISCV] Support RISCVISD::SELECT_CC in ComputeNumSignBitsForTargetNode.	2021-08-13 18:00:09 -07:00
Craig Topper	6f5edc3487	[RISCV] Fold (add (select lhs, rhs, cc, 0, y), x) -> (select lhs, rhs, cc, x, (add x, y)) Similar for sub except sub isn't commutative. Modify the existing and/or/xor folds to also work on ISD::SELECT and not just RISCVISD::SELECT_CC. This is needed to make sure we do this transform before type legalization turns i32 add/sub into add/sub+sign_extend_inreg on RV64. If we don't do this before that, the sign_extend_inreg will still be after the select. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107603	2021-08-10 09:02:56 -07:00
Fraser Cormack	2b4a1d4b86	[RISCV] Improve codegen for shuffles with LHS/RHS splats Shuffles which are broken into separate halves reveal splats in which a half is accessed via one index; such operations can be optimized to use "vrgather.vi". This optimization could be achieved by adding extra patterns to match `vrgather_vv_vl` which uses a splat as an index operand, but this patch instead identifies splat earlier. This way, future optimizations can build on top of the data gathered here, e.g., to splat-gather dominant indices and insert any leftovers. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107449	2021-08-09 10:31:40 +01:00
Craig Topper	2f3b738960	[RISCV] Add optimizations for FMV_X_ANYEXTH similar to FMV_X_ANYEXTW_RV64. This enables the fneg and fabs combines we have for FMV_X_ANYEXTW_RV64.	2021-08-08 18:30:48 -07:00
Craig Topper	88bc29f5f2	[RISCV] Introduce a RISCV CondCode enum instead of using ISD:SET* in MIR. NFC Previously we converted ISD condition codes to integers and stored them directly in our MIR instructions. The ISD enum kind of belongs to SelectionDAG so that seems like incorrect layering. This patch instead uses a CondCode node on RISCV::SELECT_CC until isel and then converts it from ISD encoding to a RISCV specific value. This value can be converted to/from the RISCV branch opcodes in the RISCV namespace. My larger motivation is to possibly support a microarchitectural feature of some CPUs where a short forward branch over a single instruction can be predicated internally. This will require a new pseudo instruction for select that needs to carry a branch condition and live probably until RISCVExpandPseudos. At that point it can be expanded to control flow without other instructions ending up in the predicated basic block. Using an ISD encoding in RISCVExpandPseudos doesn't seem like correct layering. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107400	2021-08-08 17:25:37 -07:00
Craig Topper	d4ee84ceee	[RISCV] Support FP_TO_S/UINT_SAT for i32 and i64. The fcvt fp to integer instructions saturate if their input is infinity or out of range, but the instructions produce a maximum integer for nan instead of 0 required for the ISD opcodes. This means we can use the instructions to do the saturating conversion, but we'll need to fix up the nan case at the end. We can probably improve the i8 and i16 default codegen as well, but I'll leave that for a follow up. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D107230	2021-08-07 16:06:00 -07:00
Fraser Cormack	cba6aab971	[RISCV] Support simple fractional steps in matching VID sequences This patch extends the optimization of VID-sequence BUILD_VECTORs introduced in D104921 to include simple fractional steps composed of a separated integer numerator and denominator. A notable limitation in this sequence detection is that only sequences with steps N/1 or 1/D are found, meaning that the step between elements and the frequency with which it changes is consistent across the whole sequence. Fractional steps such as 2/3 won't be matched as those would involve more complex tracking of state or some level of backtracking. As is stands, however, this patch is sufficient to match common interleave-type shuffle indices, for example matching `<0,0,1,1>` (or commonly `<0,u,1,u>` or `<u,0,u,1>`) to an index sequence divided by 2. While the optimization is relatively `undef`-tolerant, due to greedy pattern-matching there even are some simple patterns which confuse the sequence detection into identifying either a suboptimal sequence or no sequence at all. Currently only fractional-step sequences identified as having a power-of-two denominator are actually lowered to RVV instructions. This is to avoid introducing divisions into the generated code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106533	2021-08-03 10:38:24 +01:00
Hsiangkai Wang	8b33839f01	[RISCV] Rename vector inline constraint from 'v' to 'vr' and 'vm' in IR. Differential Revision: https://reviews.llvm.org/D107139	2021-08-01 05:58:17 +08:00
Craig Topper	593059b328	[RISCV] Rename RISCVISD::FCVT_W_RV64 to FCVT_W_RTZ_RV64. NFC fcvt.w(u) supports multiple rounding modes, but the ISD node doesn't encode that. So name it to match the rounding mode it uses.	2021-07-31 11:14:59 -07:00
Fraser Cormack	02dd4b59bc	[RISCV] Optimize floating-point "dominant value" BUILD_VECTORs This patch aims to improve the performance of BUILD_VECTORs which are identified as containing a dominant element. Given that most floating-point constants themselves require a load from the constant pool, it was possible for the optimization to actually increase the number of individual loads on small vectors. The exception is the zero constant -- +0.0 -- which can be materialized efficiently. While this optimization could do with a proper cost model to weigh the benfits of a single vector load vs. the manipulation of individual elements -- even for integer vectors which often require several instructions to materialize -- without a concrete RVV implementation to work with any heuristic is likely to be both more obtuse and inaccurate. Until then, this patch fixes at least one known obvious deficiency. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106963	2021-07-29 09:22:34 +01:00
Ben Shi	264b8e2a20	[RISCV] Optimize mul in the zba extension with SH*ADD This patch makes the following optimization, if the immediate multiplier is not a simm12. (mul x, (power_of_2 + 2)) => (SH1ADD x, (SLLI x, bits)) (mul x, (power_of_2 + 4)) => (SH2ADD x, (SLLI x, bits)) (mul x, (power_of_2 + 8)) => (SH3ADD x, (SLLI x, bits)) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106648	2021-07-29 09:46:41 +08:00
Craig Topper	3106f85945	[RISCV] Fix grammar in a comment. NFC	2021-07-28 09:09:26 -07:00
Craig Topper	54588bcc05	[RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop. The sign_extend we insert here can get turned into a zero_extend if the sign bit is known zero. This can enable a setcc combine that shrinks compares with zero_extend. This reduces the use count of the zero_extend allowing other combines to turn it back into an any_extend. This restricts the combine to only cases where the result is used by a CopyToReg. This works for my original motivating case. I hope the CopyToReg use will prevent any converted extends from turning back into an any_extend. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106754	2021-07-28 09:05:45 -07:00
Fraser Cormack	172487fe4c	[RISCV] Add support for vector saturating add/sub operations This patch adds support for lowering the saturating vector add/sub intrinsics to RVV instructions, for both fixed-length and scalable-vector forms alike. Note that some of the DAG combines are still not triggering for the scalable-vector tests. These require a bit more work in the DAGCombiner itself. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D106651	2021-07-27 10:04:14 +01:00
Craig Topper	c63dbd8501	[RISCV] Custom lower (i32 (fptoui/fptosi X)). I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32) isel pattern can cause an fcvt.wu and fcvt.lu to be emitted if the assertzexti32 has an additional user. If we add a one use check it would just cause a fcvt.lu followed by a sext.w when only need a fcvt.wu to satisfy both users. To mitigate this I've added custom isel and new ISD opcodes for fcvt.wu. This allows us to keep know it started life as a conversion to i32 without needing to match multiple nodes. ComputeNumSignBits has been taught that this new nodes produces 33 sign bits. To prevent regressions when we need to zero extend the result of an (i32 (fptoui X)), I've added a DAG combine to convert it to an (i64 (fptoui X)) before type legalization. In most cases this would happen in InstCombine, but a zero_extend can be created for function returns or arguments. To keep everything consistent I've added new nodes for fptosi as well. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D106346	2021-07-24 10:50:43 -07:00
Fraser Cormack	b115c038d2	[RISCV] Fix a crash when lowering split float arguments Lowering certain float vectors without legal vector types could cause a crash due to a bad interaction between passing floats via GPRs and argument splitting. Split vector floats appear just like scalar floats. Under certain situations we choose to pass these float arguments via GPRs and use an XLenVT location and set the 'BCvt' info to track how they must be converted back to floating-point values. However, later logic for handling split arguments may take over, in which case we lose the previous information and set the 'Indirect' info, thus incorrectly lowering to integer types. I don't believe that we would have come across the notion of split floating-point arguments before. This patch addresses the issue by updating the lowering so that split arguments are only passed indirectly when they are scalar integer types. This has some change to how we lower some larger illegal float vectors, as can be seen in 'fastcc-float.ll' where the vector is now passed partly in registers and partly on the stack. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D102852	2021-07-22 09:55:26 +01:00
Fraser Cormack	7b3a69bc16	[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID This relands `a6ca88e908` which was originally reverted due to overflow bugs in `e3fa2b1eab`. This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must either be monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in presence of general `undef` values is more complicated, however: `<0,u,u,1,>` could match either `<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes is inevitable. Sticking to monotonic sequences keeps the logic simpler as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921	2021-07-22 09:36:12 +01:00
Eli Friedman	0ca46a1757	[SelectionDAG] Fix the representation of ISD::STEP_VECTOR. The existing rule about the operand type is strange. Instead, just say the operand is a TargetConstant with the right width. (Legalization ignores TargetConstants, so it doesn't matter if that width is legal.) Highlights: 1. I had to substantially rewrite the AArch64 isel patterns to expect a TargetConstant. Nothing too exotic, but maybe a little hairy. Maybe worth considering a target-specific node with some dagcombines instead of this complicated nest of isel patterns. 2. Our behavior on RV32 for vectors of i64 has changed slightly. In particular, we correctly preserve the width of the arithmetic through legalization. This changes the DAG a bit. Maybe room for improvement here. 3. I explicitly defined the behavior around overflow. This is necessary to make the DAGCombine transforms legal, and I don't think it causes any practical issues. Differential Revision: https://reviews.llvm.org/D105673	2021-07-21 10:58:40 -07:00
Craig Topper	81efb82570	[RISCV] Teach RISCVMatInt about cases where it can use LUI+SLLI to replace LUI+ADDI+SLLI for large constants. If we need to shift left anyway we might be able to take advantage of LUI implicitly shifting its immediate left by 12 to cover part of the shift. This allows us to use more bits of the LUI immediate to avoid an ADDI. isDesirableToCommuteWithShift now considers compressed instruction opportunities when deciding if commuting should be allowed. I believe this is the same or similar to one of the optimizations from D79492. Reviewed By: luismarques, arcbbb Differential Revision: https://reviews.llvm.org/D105417	2021-07-20 09:22:06 -07:00
Craig Topper	84877a098a	[RISCV] Use unordered indexed loads for MGATHER. I don't think the semantics of the llvm masked gather intrinsic care about the order the elements are loaded. For example, type legalization by splitting will chain them in parallel. This is different than scatter which we do chain in order. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D106025	2021-07-20 08:46:02 -07:00
Craig Topper	50302feb1d	[SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend. RISCV would prefer a sign extended constant since that works better with our constant materialization. We have an existing TLI hook we use to control sign extension of setcc operands in type legalization. That hook happens to do the right check we need here, but might be straying from its original purpose. With only RISCV defining this hook in tree, I wasn't sure if it was worth adding another hook with identical behavior. This is an alternative to D105785 where I tried to handle this in the RISCV backend by not creating ANY_EXTENDs in some places. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D105918	2021-07-19 09:25:28 -07:00
Craig Topper	d0f8047d37	[RISCV] Teach computeKnownBitsForTargetNode that VLENB will never be more than 65536/8.	2021-07-17 11:24:20 -07:00
Craig Topper	173332d175	[RISCV] Manually emit the best shift for VSCALE lowering to improve codegen. We assume VLENB is a multiple of 8 and previously relied on shift pairs being optimized to an AND+SHL/SHR and computeKnownBits removing the AND. This doesn't happen if (vlenb >> 3) gets CSEd to have multiple uses. This patch manually emits the best shift to workaround this.	2021-07-17 00:52:07 -07:00
Craig Topper	4dbb788068	[RISCV] Teach constant materialization that it can use zext.w at the end with Zba to reduce number of instructions. If the upper 32 bits are zero and bit 31 is set, we might be able to use zext.w to fill in the zeros after using an lui and/or addi. Most of this patch is plumbing the subtarget features into the constant materialization. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D105509	2021-07-16 09:35:56 -07:00
Craig Topper	0ce13f92b7	[RISCV] Add curly braces around a case body that declares variables. NFC This is at the end of the switch so doesn't cause any issues now, but if a new case is added it will break.	2021-07-16 09:35:56 -07:00
Fraser Cormack	e3fa2b1eab	Revert "[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID" This reverts commit `a6ca88e908`. More caution is required to avoid overflow/underflow. Thanks to the santizers for catching this.	2021-07-16 15:00:20 +01:00
Fraser Cormack	a6ca88e908	[RISCV] Lower more BUILD_VECTOR sequences to RVV's VID This patch teaches the compiler to identify a wider variety of `BUILD_VECTOR`s which form integer arithmetic sequences, and to lower them to `vid.v` with modifications for non-unit steps and non-zero addends. The sequences handled by this optimization must either be monotonically increasing or decreasing. Consecutive elements holding the same value indicate a fractional step which, while simple mathematically, becomes more complex to handle both in the realm of lossy integer division and in the presence of `undef`s. For example, a common "interleaving" shuffle index will be lowered by LLVM to both `<0,u,1,u,2,...>` and `<u,0,u,1,u,...>` `BUILD_VECTOR` nodes. Either of these would ideally be lowered to `vid.v` shifted right by 1. Detection of this sequence in presence of general `undef` values is more complicated, however: `<0,u,u,1,>` could match either `<0,0,0,1,>` or `<0,0,1,1,>` depending on later values in the sequence. Both are possible, so backtracking or multiple passes is inevitable. Sticking to monotonic sequences keeps the logic simpler as it can be done in one pass. Fractional steps will likely be a separate optimization in a future patch. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104921	2021-07-16 10:35:13 +01:00
Fraser Cormack	03a4702c88	[RISCV] Fix the neutral element in vector 'fadd' reductions Using positive zero as the neutral element in 'fadd' reductions, while it generates better code, is incorrect. The correct neutral element is negative zero: 0.0 + -0.0 = 0.0, whereas -0.0 + -0.0 = -0.0. There are perhaps more optimal lowerings of negative zero avoiding constant-pool loads which could be left as future work. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D105902	2021-07-14 10:18:38 +01:00
Craig Topper	1e670dc7d7	[RISCV] Use DIVUW/REMUW/DIVW instructions for i8/i16/i32 udiv/urem/sdiv when LHS is constant. We don't really have optimizations for division with a constant LHS. If we don't use a W instruction we end up needing to sign or zero extend the RHS to use the 64-bit instruction. I had to sign_extend i32 constants on the LHS instead of using any_extend which becomes zero_extend. If we don't do this, constants that were originally negative become harder to materialize. I think this problem exists for more of our W instruction cases. For example (i32 (shl -1, X)), but we don't have lit tests. I'll work on that as a follow up. I also left a FIXME for enabling W instruction for RHS constants under -Oz. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D105769	2021-07-13 10:33:57 -07:00
Fangrui Song	3d89fb4d13	[RISCV] Support machine constraint "S" Similar to D46745, "S" represents an absolute symbolic operand, which can be used to specify the access models, e.g. extern int var; void addr_via_asm() { void ret; asm("lui %0, %%hi(%1)\naddi %0,%0,%%lo(%1)" : "=r"(ret) : "S"(&var)); return ret; } 'S' is documented in trunk GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101275 Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D105254	2021-07-13 09:30:09 -07:00
Fraser Cormack	d991b7212b	[RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR Often when lowering vector shuffles, we split the shuffle into two LHS/RHS shuffles which are then blended together. To do so we split the original indices into two, indexed into each respective vector. These two index vectors are then separately lowered as BUILD_VECTORs. This patch forwards on any undef indices to the BUILD_VECTOR, rather than having the VECTOR_SHUFFLE lowering decide on an optimal concrete index. The motiviation for ths change is so that we don't duplicate optimization logic between the two lowering methods and let BUILD_VECTOR do what it does best. Propagating undef in this way allows us, for example, to generate `vid.v` to produce the LHS indices of commonly-used interleave-type shuffles. I have designs on further optimizing interleave-type and other common shuffle patterns in the near future. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104789	2021-07-13 10:41:54 +01:00
Craig Topper	12d51f95fe	[RISCV] Implement lround/llround/lrint/llrint with fcvt instruction with -fno-math-errno These are fp->int conversions using either RMM or dynamic rounding modes. The lround and lrint opcodes have a return type of either i32 or i64 depending on sizeof(long) in the frontend which should follow xlen. llround/llrint should always return i64 so we'll need a libcall for those on rv32. The frontend will only emit the intrinsics if -fno-math-errno is in effect otherwise a libcall will be emitted which will not use these ISD opcodes. gcc also does this optimization. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D105206	2021-07-06 11:43:22 -07:00
Craig Topper	2b5e53111a	[RISCV] Add support for matching vwmul(u) and vwmacc(u) from fixed vectors. This adds a DAG combine to detect sext/zext inputs and emit a new ISD opcode. The extends will either be removed or replaced with narrower extends. Isel patterns are used to match add and widening mul to vwmacc similar to the recently added vmacc patterns. There's still some work to be to match vmulsu. We should also rewrite splats that were extended as scalars and then splatted. Reviewed By: arcbbb Differential Revision: https://reviews.llvm.org/D104802	2021-07-06 10:24:31 -07:00
Craig Topper	3b6dfa381e	[RISCV] Protect the SHL/SRA/SRL handlers in LowerOperation against being called for an illegal i32 shift amount. It seems it is possible for DAG combine to create a shl with an i64 result type and an i32 shift amount. This is ok before type legalization since the type don't need to match in SelectionDAG. This results in type legalization calling LowerOperation to legalize just the amount. We weren't expecting this so we asserted for not finding a fixed vector shift. To fix this, I've added a check for the fixed vector case and returned SDValue() to get the default type legalizer. I've factored all shifts together and added a fixed vector specific handler to avoid repeating similar code for each in LowerOperation. The particular case I found was exposed by D104581, but the bad shift is created after that patch triggers.	2021-06-29 09:45:13 -07:00
Jim Lin	779d2b0a42	[RISCV][NFC] Combine the control flow for different RetOp of interrupt function Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D104838	2021-06-26 17:28:03 +08:00
Craig Topper	d4f4a1ba62	[RISCV] Add DAG combine to detect opportunities to replace (i64 (any_extend (i32 X)) with sign_extend. If type legalization is going to insert a sign_extend for other users of X and we can fold the sign_extend into ADDW/MULW/SUBW, it is better to replace the ANY_EXTEND so we don't end up with a separate ADD/MUL/SUB instruction for the users of the ANY_EXTEND. I'm only handling setcc uses right now, but there are other instructions that force sign_extends like ashr. There are probably other *W instructions we could use in addition to ADDW/SUBW/MULW. My motivating case was a loop terminating compare and a phi use as seen in the new test file. Reviewed By: asb Differential Revision: https://reviews.llvm.org/D104581	2021-06-25 23:16:37 -07:00
Fraser Cormack	a4729f7f88	[RISCV] Lower RVV vector SELECTs to VSELECTs This patch optimizes the code generation of vector-type SELECTs (LLVM select instructions with scalar conditions) by custom-lowering to VSELECTs (LLVM select instructions with vector conditions) by splatting the condition to a vector. This avoids the default expansion path which would either introduce control flow or fully scalarize. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104772	2021-06-24 10:12:51 +01:00
Fraser Cormack	fed1503e85	[RISCV][VP] Lower FP VP ISD nodes to RVV instructions With the exception of `frem`, this patch supports the current set of VP floating-point binary intrinsics by lowering them to to RVV instructions. It does so by using the existing `RISCVISD *_VL` custom nodes as an intermediate layer. Both scalable and fixed-length vectors are supported by using this method. The `frem` node is unsupported due to a lack of available instructions. For fixed-length vectors we could scalarize but that option is not (currently) available for scalable-vector types. The support is intentionally left out so it equivalent for both vector types. The matching of vector/scalar forms is currently lacking, as scalable vector types do not lower to the custom `VFMV_V_F_VL` node. We could either make floating-point scalable vector splats lower to this node, or support the matching of multiple kinds of splat via a `ComplexPattern`, much like we do for integer types. Reviewed By: rogfer01 Differential Revision: https://reviews.llvm.org/D104237	2021-06-17 10:04:00 +01:00
Fraser Cormack	c75e454cb9	[RISCV] Transform unaligned RVV vector loads/stores to aligned ones This patch adds support for loading and storing unaligned vectors via an equivalently-sized i8 vector type, which has support in the RVV specification for byte-aligned access. This offers a more optimal path for handling of unaligned fixed-length vector accesses, which are currently scalarized. It also prevents crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store operation. Future work could be to investigate loading/storing via the largest vector element type for the given alignment, in case that would be more optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load could loaded as nxv4i32 instead of as nxv16i8. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104032	2021-06-14 18:12:18 +01:00
Fraser Cormack	502edebd9d	[ValueTypes][RISCV] Cap RVV fixed-length vectors by size This patch changes RVV's policy for its supported list of fixed-length vector types by capping by vector size rather than element count. Now all 1024-byte vectors (of supported element types) are supported, rather than all 256-element vectors. This is a more natural fit for the architecture, and allows us to, for example, improve the support for vector bitcasts. This change necessitated the adding of some new simple types to avoid "regressing" on the number of currently-supported vectors. We round out the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and `v512f16`. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103884	2021-06-09 12:15:37 +01:00
Fraser Cormack	e8f1f89103	[RISCV] Support CONCAT_VECTORS on scalable masks This patch is a simple fix which registers CONCAT_VECTORS as custom-lowered for scalable mask vectors. This follows the pattern of all other scalable-vector types, as the default expansion of CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go through the stack and generate worse code. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D103896	2021-06-09 09:07:44 +01:00
Craig Topper	f30f8b4f12	[RISCV] Lower i8/i16 bswap/bitreverse to grevi/greviw with Zbp. Include known bits support so we know we don't need to zext the output if the input was already zero extended. Reviewed By: luismarques Differential Revision: https://reviews.llvm.org/D103757	2021-06-07 10:31:51 -07:00
Craig Topper	8bde5f06a1	[RISCV] Replace && with \|\|. Spotted by coverity. We should be exiting when the shift amount is greater than the bit width regardless of whether it is a power of 2. Reported by Simon Pilgrim here https://reviews.llvm.org/D96661 This requires getting a shift amount that is out of bounds that wasn't already optimized by SelectionDAG. This would be pretty trick to construct a test for. Or it would require a non-power of 2 shift amount and a mask that has runs of ones and zeros of the next lowest power of 2 from that shift amount. I tried a little to produce a test for this, but didn't get it to work.	2021-06-06 13:09:51 -07:00
Nikita Popov	1ffa6499ea	[TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC) Don't require a specific kind of IRBuilder for TargetLowering hooks. This allows us to drop the IRBuilder.h include from TargetLowering.h. Differential Revision: https://reviews.llvm.org/D103759	2021-06-06 16:29:50 +02:00
Nikita Popov	9914200393	[CodeGen] Add missing includes (NFC) These currently rely on the IRBuilder.h include in TargetLowering.h. Make them explicit.	2021-06-06 15:48:27 +02:00
Fraser Cormack	3b0a33d0ad	[RISCV] Expand unaligned fixed-length vector memory accesses RVV vectors must be aligned to their element types, so anything less is unaligned. For regular loads and stores, our custom-lowering of fixed-length vectors meant that we opted out of LegalizeDAG's built-in unaligned expansion. This patch adds that logic in to our custom lower function. For masked intrinsics, we declare that anything unaligned is not legal, leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us. Note that neither of these methods can handle the expansion of scalable-vector memory ops, so those cases are left alone by this patch. Scalable loads and stores already go through expansion by default but hit an assertion, and scalable masked intrinsics will silently generate incorrect code. It may be prudent to return an error in both of these cases. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D102493	2021-06-02 09:27:44 +01:00
Fraser Cormack	4f500c402b	[RISCV] Support vector types in combination with fastcc This patch extends the RISC-V lowering of the 'fastcc' calling convention to vector types, both fixed-length and scalable. Without this patch, any function passing or returning vector types by value would throw a compiler error. Vectors are handled in 'fastcc' much as they are in the default calling convention, the noticeable difference being the extended set of scalar GPR registers that can be used to pass vectors indirectly. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D102505	2021-06-01 10:31:18 +01:00
Fraser Cormack	2b37c405cc	[RISCV] Scale scalably-typed split argument offsets by VSCALE This patch fixes a bug in lowering scalable-vector types in RISC-V's main calling convention. When scalable-vector types are split and passed indirectly, the target is responsible for scaling the offset -- initially set to the known-minimum store size -- by the scalable factor. Before this we were issuing overlapping loads or stores to the different parts, leading to incorrect codegen. Credit to @HsiangKai for spotting this. Reviewed By: HsiangKai Differential Revision: https://reviews.llvm.org/D103262	2021-05-31 10:43:13 +01:00

1 2 3 4 5 ...

438 Commits