There is an in-progress proposal for the following pseudo-instructions
in the assembler, to complement the existing `sext.w` rv64i instruction:
- sext.b
- sext.h
- zext.b
- zext.h
- zext.w
The `.b` and `.h` variants are available with rv32i and rv64i, and `zext.w` is
only available with `rv64i`.
These are implemented primarily as pseudo-instructions, as these instructions
expand to multiple real instructions. In the case of `zext.b`, this expands to a
single rv32/64i instruction, so it is implemented with an InstAlias (like
`sext.w` is on rv64i).
The proposal is available here: https://github.com/riscv/riscv-asm-manual/pull/61
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D92793
We can use these instructions for single bit immediates that are too large for ANDI/ORI/CLRI.
The _10 test cases are to make sure that we still use ANDI/ORI/CLRI for small immediates.
Differential Revision: https://reviews.llvm.org/D92262
On the surface this would be slightly less optimal for the isel
table, but due to a tablegen issue with HW mode this ends up
generating a smaller isel table.
This enables bswap/bitreverse to combine with other GREVI patterns or each other without needing to add more special cases to the DAG combine or new DAG combines.
I've also enabled the existing GREVI combine for GREVIW so that it can pick up the i32 bswap/bitreverse on RV64 after they've been type legalized to GREVIW.
Differential Revision: https://reviews.llvm.org/D92253
Not sure why bswap was treated specially. This also applies to bitreverse
or generic grevi. We can improve this in future patches.
For now I just wanted to get the consistency and the test coverage
as I plan to make some other changes around bswap.
This is the logically correct thing to do. But it generates worse
code for i32 umin/umax on the rv64 due to type legalize requesting
zext even though the arguments are sext. Maybe we can teach type
legalizer to use sext for umin/umax for RISCV.
It's also producing possibly worse code on i64 on RV32 since we
still end up with selects that become branches. But this seems
like something we could improve in type legalization or DAG combine.
Hopefully this makes D92095 work for RISCV with Zbb.
This adds custom opcodes for FSLW/FSRW so we can type legalize
fshl/fshr without needing to match a sign_extend_inreg.
I've used the operand order from fshl/fshr to make the isel
pattern similar to the non-W form. It was also hard to decide
another order since the register instruction has the shift amount
as the second operand, but the immediate instruction has it as
the third operand.
Differential Revision: https://reviews.llvm.org/D91479
Previously we required a sra to pattern match these properly in isel. If the consumer didn't need the result sign extended we'll have an srl instead of sra and fail to match.
This patch switches to custom legalizing to GREVIW using portions of D91259.
Differential Revision: https://reviews.llvm.org/D91457
This should result in better utilization of RORIW since we
don't need to look for a SIGN_EXTEND_INREG that may not exist.
Also remove rotl/rotr isel matching to GREVI and just prefer RORI.
This is to keep consistency so we don't have to match ROLW/RORW
to GREVIW as well. I imagine RORI/RORIW performance will be the
same or better than GREVI.
Differential Revision: https://reviews.llvm.org/D91449
This moves the recognition of GREVI and GORCI from TableGen patterns
into a DAGCombine. This is done primarily to match "deeper" patterns in
the future, like (grevi (grevi x, 1) 2) -> (grevi x, 3).
TableGen is not best suited to matching patterns such as these as the compile
time of the DAG matchers quickly gets out of hand due to the expansion of
commutative permutations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91259
The fshl and fshr intrinsics are defined to modulo their shift amount by the bitwidth of one of their inputs. The FSR/FSL instructions read one extra bit from the shift amount. If that bit is set the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.
We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but wanted to start with correctness.
Differential Revision: https://reviews.llvm.org/D90905
This uses the shiftop PatFrags to handle the masked shift amount
and unmasked shift amount cases. That also checks XLen as part
of the masked amount check so we don't need separate RV32 and RV64
patterns.
Differential Revision: https://reviews.llvm.org/D91016
There is no FSLI instruction, but we can emulate it using FSRI by swapping operands and subtracting the immediate from the bitwidth.
Differential Revision: https://reviews.llvm.org/D90826
The operations in these patterns shouldn't be effected by sign
bits. And the pattern is starting from a sign_extend_inreg so
we aren't expecting sign bits to be passed through either.
Differential Revision: https://reviews.llvm.org/D90739
fsl/fsr take their shift amount in $rs2 or an immediate. The
sources are $rs1 and $rs3.
fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and
operand 1 in the lower bits. fshl returns the high bits after
shifting and fshr returns the low bits. So a shift amount of 0
returns operand 0 for fshl and operand 1 for fshr.
fsl/fsr concatenate their operands in different orders such that
$rs1 will be returned for a shift amount of 0. So $rs1 needs to
come from operand 0 of fshl and operand 1 of fshr.
Differential Revision: https://reviews.llvm.org/D90735
riscv_sllw/srlw only reads the lower 32 bits of the first operand.
And the lower 5 bits of the second operands. Whether the upper
32 bits of the input are sign bits or not doesn't matter.
Also use ineg and not to shorten the patterns.
Differential Revision: https://reviews.llvm.org/D90668
We don't need custom matching, we just a need a predicate to check
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.
I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.
Differential Revision: https://reviews.llvm.org/D90546
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.
Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.
There is at least one other grev pattern that probably needs a
another rotr pattern, but we need more test coverage first.
Differential Revision: https://reviews.llvm.org/D90575
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
Differential Revision: https://reviews.llvm.org/D77152
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the ternary subset (zbt subextension) of the
experimental B extension of RISC-V.
It adds also the correspondent codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79875
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the single-bit subset (zbs subextension) of
the experimental B extension of RISC-V.
It adds also the correspondent codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79874
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions belonging to both the permutation and the base
subsets of the experimental B extension of RISC-V.
It adds also the correspondent codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79873
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the permutation subset (zbp subextension) of
the experimental B extension of RISC-V.
It adds also the correspondent codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79871
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the base subset (zbb subextension) of the
experimental B extension of RISC-V.
It adds also the correspondent codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79870
This adds the instruction encoding and mnenomics for the proposed
RISC-V Bit Manipulation extension (version 0.92). It is implemented with
each category of instruction as its own target feature, with the 'b'
extension feature enabling all options. Since this extension is not yet
ratified, all target features are prefixed with 'experimental-' to note
their status.
Differential Revision: https://reviews.llvm.org/D65649