This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
At the IR level, we generally assume that constants are free to materialize. However, on RISC-V, due to some quirks of the ISA, materializing arbitrary constants can be rather expensive, and we frequently fall back to constant pool loads.
We've been slowly moving in the direction of modeling the cost of the remat as part of the instruction cost. This has the effect of disincentivizing vectorization - mostly SLP - when we'd have to materialize an expensive constant.
We need better modeling of which constants are expensive and which are not, but for the moment let's be consistent with how we model arithmetic and memory instructions. The difference between the two is that arithmetic can sometimes fold a splat operation, which stores cannot.
Differential Revision: https://reviews.llvm.org/D138941
This patch reduces the number of unpredictable branches by expanding selects on sign comparisons into branchless sequences:
(select (x < 0), y, z) -> x >> (XLEN - 1) & (y - z) + z
(select (x >= 0), y, z) -> x >> (XLEN - 1) & (z - y) + y
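For illustration, here is a minimal C++ sketch of the first transform on a 64-bit target (XLEN = 64); the function name and the use of unsigned arithmetic to sidestep signed-overflow UB are my own:

```cpp
#include <cstdint>

// Branchless (x < 0) ? y : z. The arithmetic right shift smears the sign
// bit of x into an all-ones (x < 0) or all-zeros mask, selecting between
// (y - z) + z == y and 0 + z == z without a branch.
int64_t selectNegative(int64_t x, int64_t y, int64_t z) {
  uint64_t mask = static_cast<uint64_t>(x >> 63); // all ones if x < 0
  uint64_t diff = static_cast<uint64_t>(y) - static_cast<uint64_t>(z);
  return static_cast<int64_t>((mask & diff) + static_cast<uint64_t>(z));
}
```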
Reviewed By: craig.topper, reames
Differential Revision: https://reviews.llvm.org/D137949
This gives us the opportunity to fold the splat into a .vx instruction, as D101138 did. If that fails, we can select the zero-stride vector load again.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D138101
This patch adds support for the RISC-V Zca extension.
`Zca` is a subset of the C extension instructions that is compatible with the Zc extension.
So this patch implements Zca code generation with reference to the C extension, and sets 2-byte alignment for the Zca extension, just as the C extension does.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D130483
Currently, lowerVECTOR_SHUFFLEAsVSlidedown only checks whether the inputs are EXTRACT_SUBVECTOR nodes with the same source. This commit makes the function look through the inputs and their sources until they are no longer EXTRACT_SUBVECTOR.
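A rough sketch of the look-through idea, using a mock node type rather than the real SelectionDAG API:

```cpp
// Mock node standing in for an SDValue (hypothetical, for illustration).
struct Node {
  bool isExtractSubvector = false;
  Node *source = nullptr;  // operand 0 of EXTRACT_SUBVECTOR
  unsigned startIndex = 0; // constant subvector start index
};

// Peel chained EXTRACT_SUBVECTORs, accumulating the start index, so two
// shuffle inputs can be recognized as slices of one ultimate source.
Node *lookThroughExtractSubvector(Node *n, unsigned &offset) {
  while (n && n->isExtractSubvector) {
    offset += n->startIndex;
    n = n->source;
  }
  return n;
}
```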
Differential Revision: https://reviews.llvm.org/D138025
A target can return whether a misaligned access is 'fast', as defined by the target, or not. In reality there can be different levels of 'fast' and 'slow'. This patch changes the boolean 'Fast' argument of the allowsMisalignedMemoryAccesses family of functions to an unsigned representing its speed.
A target can still define it however it wants, and the direct translation of the current code uses 0 and 1 for the current false and true. This makes the change an NFC.
A subsequent patch will start using an actual speed value in the load/store vectorizer to check whether a vectorized access will be not just fast, but no slower than before.
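As an illustration of the interface change, a compilable mock (not the actual LLVM hook signature):

```cpp
// Hypothetical mock of the target hook, for illustration only.
struct MockTargetLowering {
  // Before this patch the out-parameter was `bool *Fast`; now it is an
  // unsigned speed, where the direct translation of existing targets
  // reports 0 for the old `false` and 1 for the old `true`.
  bool allowsMisalignedMemoryAccesses(unsigned SizeInBits,
                                      unsigned *Fast = nullptr) const {
    if (Fast)
      *Fast = (SizeInBits <= 64) ? 1 : 0; // placeholder speed policy
    return true;
  }
};
```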
Differential Revision: https://reviews.llvm.org/D124217
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.
This is similar to what we do for (i32 (and (srl X, Y), 1)).
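For reference, the pattern in question is the single-bit extract that the `bext` instruction (Zbs) implements directly; in C++ terms (function name mine):

```cpp
#include <cstdint>

// Single-bit extract: bit y of x moved to bit 0. On RV64 with Zbs this is
// a single bext, provided the shift isn't legalized into a W-form first.
uint32_t extractBit(uint32_t x, uint32_t y) {
  return (x >> y) & 1;
}
```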
nearbyint has the property that it executes without raising exceptions.
To avoid modifying fflags, the patch adds a new machine opcode, PseudoVFROUND_NOEXCEPT_V, which expands to vfcvt.x.f.v and vfcvt.f.x.v between a pair of frflags and fsflags instructions.
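A scalar C++ model of the same save/convert/restore idea, using the standard &lt;cfenv&gt; flag accessors; this sketches the semantics only, not the actual pseudo expansion:

```cpp
#include <cfenv>
#include <cmath>

#pragma STDC FENV_ACCESS ON

// Round to integer in floating point without leaving fflags modified:
// save the accrued exception flags (frflags), do the conversion (which
// may raise inexact), then restore the flags (fsflags).
double nearbyintModel(double x) {
  std::fexcept_t saved;
  std::fegetexceptflag(&saved, FE_ALL_EXCEPT); // frflags
  double r = std::rint(x);                     // may set FE_INEXACT
  std::fesetexceptflag(&saved, FE_ALL_EXCEPT); // fsflags
  return r;
}
```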
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
The patch also adds the function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
We may form a zero-stride vector load when lowering a gather to a strided load. As D137699 did, we use `load+splat` for this form when there is no optimized implementation.
We currently restrict this to unmasked loads, in consideration of the complexity of handling all-false masks.
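Conceptually, the unmasked zero-stride case reads the same address into every lane, so it reduces to one scalar load plus a splat; a scalar C++ model (the template helper is illustrative only):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of the unmasked zero-stride gather: one load, then splat.
template <std::size_t N>
std::array<int32_t, N> zeroStrideGather(const int32_t *base) {
  std::array<int32_t, N> result;
  int32_t scalar = *base; // single scalar load
  result.fill(scalar);    // splat into every lane
  return result;
}
```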
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D137931
Currently, getDefaultVLOps can only deduce VL from an MVT. However, callers sometimes already know the VL value. This commit provides a uniform interface to get VL instead of calling DAG.getConstant directly.
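A minimal sketch of the shape of such an interface, with mock types rather than the real SelectionDAG API:

```cpp
#include <cstdint>
#include <optional>

struct MockVLOperand { uint64_t Value; }; // stand-in for the VL SDValue

// One entry point for both cases: use the caller's VL when it is already
// known, otherwise deduce it from the element count of the type, so call
// sites no longer reach for DAG.getConstant themselves.
MockVLOperand getVLOperand(unsigned NumElts,
                           std::optional<uint64_t> KnownVL = std::nullopt) {
  return {KnownVL ? *KnownVL : NumElts};
}
```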
Differential Revision: https://reviews.llvm.org/D138003
This adds a RISCVISD::ABSW to remember that we started with an i32
abs. Previously we used a DAG combine of (sext_inreg (abs)) to
delay emitting a freeze from type legalization in order to make
ComputeNumSignBits optimizations work on other promoted nodes.
This new approach always uses negw+max even if the result doesn't
need to be sign extended. This helps the RISCVSExtWRemoval pass
if the sext.w is in another basic block.
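The negw+max sequence in scalar form; the function name is mine, and unsigned negation is used to avoid signed-overflow UB on INT32_MIN:

```cpp
#include <cstdint>

// i32 abs as emitted on RV64 with Zbb: negate (negw), then take the
// maximum of the value and its negation (max).
int32_t abswModel(int32_t x) {
  int32_t neg = static_cast<int32_t>(0u - static_cast<uint32_t>(x)); // negw
  return x > neg ? x : neg;                                          // max
}
```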
No ISD::BSWAP nodes are custom lowered on RISC-V now, so the patch removes the dead code for ISD::BSWAP in LowerOperation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137907
The static chain parameter is a special parameter that is not passed in the usual argument registers or stack space. For example, in the x86-64 System V ABI it is always passed in R10. Although the RISC-V ABI does not assign a register for this purpose, GCC has supported it on RISC-V for a long time: it is exposed via the `__builtin_call_with_static_chain` built-in and assigns t2 for static chain parameters. This patch also chooses t2, for compatibility.
In LLVM, static chain parameters are handled by the `nest` attribute of an argument to a function ([D6332](https://reviews.llvm.org/D6332)), so tests are added to ensure `nest` arguments are handled correctly.
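For reference, the caller side looks like this with the GCC/Clang built-in; the callee and environment type here are made up for illustration:

```cpp
struct Env { int captured; };

extern "C" int callee(int x); // expects its environment pointer in t2

int callWithChain(Env *env, int x) {
  // Evaluates callee(x) while passing `env` in the static chain register
  // (t2 on RISC-V) instead of a normal argument register.
  return __builtin_call_with_static_chain(callee(x), env);
}
```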
Reviewed By: kito-cheng, MaskRay
Differential Revision: https://reviews.llvm.org/D129106
We can narrow one of the extends and keep the other original by
using a vwaddu.wv or vwadd.wv.
We were previously forgetting to keep the original operand and
instead took the source of its extend. This resulted in a type
mismatch that later failed with an impossible physical register copy.
To fix this, I've refactored some code to maintain the information about whether the source needs to be extended at all for longer, so that we can still use it when we materialize the operands.
Differential Revision: https://reviews.llvm.org/D137106
We can use tail undisturbed vslide1down to insert into the vector.
This should make D136640 unneeded.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136738
This avoids the call overhead as well as the save/restore of fflags and the sNaN handling in the libm function.
The save/restore of fflags and the sNaN handling are needed for correctness with -ftrapping-math. I think we can ignore them in the default environment.
The inline sequence will generate an invalid exception for NaN and an inexact exception if fractional bits are discarded.
I've used a custom inserter to explicitly create the control flow
around the float->int->float conversion.
We can probably avoid the final fsgnj after the conversion for
no signed zeros FMF, but I'll leave that for future work.
Note the comparison constant is slightly different from the one glibc uses. They use 1<<53 for double; I'm using 1<<52. I believe either is valid. Numbers >= 1<<52 can't have any fractional bits. It's OK to do the float->int->float conversion on numbers between 1<<52 and 1<<53 since they all fit in an i64. We only have a problem if the double can't fit in an i64.
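A scalar C++ model of the guarded round trip, shown for trunc in the default FP environment; the function name is mine, and the real lowering builds this control flow with the custom inserter:

```cpp
#include <cmath>
#include <cstdint>

double truncModel(double x) {
  constexpr double Bound = 4503599627370496.0; // 1 << 52
  // |x| >= 2^52 means there are no fractional bits to remove (NaN also
  // fails the comparison and is returned unchanged by this model).
  if (!(std::fabs(x) < Bound))
    return x;
  double r = static_cast<double>(static_cast<int64_t>(x)); // fcvt round trip
  return std::copysign(r, x); // final fsgnj: preserve the sign of -0.0
}
```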
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136508
Instead of using vslide1up, use vslide1down and build in the other direction. This avoids the early-clobber overlap constraint of vslide1up.
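A scalar model of one vslide1down step (mock helper, element type fixed for illustration): each step shifts every lane down by one and writes the new scalar into the last lane, so feeding elements first-to-last builds the vector:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
std::array<int32_t, N> slide1down(std::array<int32_t, N> v, int32_t x) {
  for (std::size_t i = 0; i + 1 < N; ++i)
    v[i] = v[i + 1]; // shift each lane down
  v[N - 1] = x;      // insert the new scalar at the top
  return v;
}
```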
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D136735