This avoids depending on SimplifyDemandedBits having cleared
those bits.
It could make sense to teach SimplifyDemandedBits to keep the
lower bits of an AND mask all ones when possible. Such a mask could
then be implemented with slli+srli in the general case rather than
by materializing the constant.
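To illustrate the idea, a minimal standalone sketch (a reference model
assuming XLEN=64, not the backend code itself):

  #include <cstdint>

  // AND with a mask of N trailing ones is equivalent to shifting left
  // and then logically right by 64 - N, i.e. slli + srli on RV64.
  uint64_t andTrailingOnes(uint64_t X, unsigned N) {
    unsigned Shift = 64 - N; // assumes 0 < N < 64
    return (X << Shift) >> Shift;
  }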
For Zvlsseg, we need contiguous vector registers for the values, so we
define new register classes for the different combinations of (number
of fields, LMUL). For example,
when the number of fields (NF) is 3 and LMUL is 2, the values will be
assigned to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...
We define the vlseg intrinsics with multiple outputs. There is no way to
describe codegen patterns with multiple outputs in the tablegen
files, so we do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the output values.
The multiple scalable vector values are put into a struct. This
patch depends on the support for scalable vector structs.
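Roughly, the extraction looks like this sketch (simplified; the
subregister index name is illustrative of the indices the patch defines):

  // Node is the intrinsic node with NF vector results; Load is the
  // selected vlseg pseudo whose single result is the wide tuple register.
  SDValue SuperReg = SDValue(Load, 0);
  for (unsigned I = 0; I < NF; ++I)
    ReplaceUses(SDValue(Node, I),
                CurDAG->getTargetExtractSubreg(RISCV::sub_vrm2_0 + I, DL,
                                               Node->getSimpleValueType(I),
                                               SuperReg));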
Differential Revision: https://reviews.llvm.org/D94229
MCTargetDesc includes headers from Utils and Utils includes headers
from MCTargetDesc. So from a library layering perspective it makes sense
for them to be in the same library. I guess the other option might be to
move the tablegen includes from RISCVMCTargetDesc.h to RISCVBaseInfo.h
so that RISCVBaseInfo.h didn't need to include RISCVMCTargetDesc.h.
Everything else that depends on Utils also depends on MCTargetDesc so
having one library seemed simpler.
Differential Revision: https://reviews.llvm.org/D93168
ComplexPatterns are kind of weird: they don't call any of the predicates on their operands, and the "complexity" used to order them in the matcher table is hand-specified.
This started as an attempt to use sext_inreg + SLOIPat to implement SLOIW, just to have one less Select function. The matching for the or+shl is the same as long as you know the immediate is less than 32 for SLOIW. But that didn't work out: applying uimm5 to SLOIPat did nothing, because SLOIPat is a ComplexPattern.
I realized I could just use a PatFrag with the opcodes I wanted to match, and an immediate predicate would then evaluate correctly. This also computes the complexity just like any other pattern does. Then I just needed to check the constraints on the immediates in the predicate. Conveniently, the predicate is evaluated after the fragment has been matched, so the structure has already been checked and we just need to find the constants.
I'll note that this is unusual; I didn't find any other targets looking through operands in a PatFrag predicate. There is a PredicateCodeUsesOperands feature that can be used to collect the operands into an array that is used by AMDGPU/VOP3Instructions.td. I believe that feature exists to handle commuted matching, but since the nodes here use constants, they are never commuted.
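To sketch the idea for the SLOI shape (or (shl X, $shamt), $mask); the
function name is hypothetical, error handling is trimmed, and
maskTrailingOnes is the real MathExtras helper:

  bool isSLOIPattern(SDNode *N) {
    // The fragment has already matched structurally; N is the OR node.
    auto *Shamt = dyn_cast<ConstantSDNode>(N->getOperand(0).getOperand(1));
    auto *Mask = dyn_cast<ConstantSDNode>(N->getOperand(1));
    if (!Shamt || !Mask)
      return false;
    // SLOI shifts ones in from the bottom, so the OR mask must be
    // exactly shamt trailing ones.
    return Mask->getZExtValue() ==
           maskTrailingOnes<uint64_t>(Shamt->getZExtValue());
  }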
Differential Revision: https://reviews.llvm.org/D91901
This patch extends the pattern-matching capability of vector-splatted
constants. When illegally-typed constants are legalized they are
canonically sign-extended to XLenVT. This preserves the sign and allows
us to match simm5. If they were zero-extended for whatever reason we'd
lose that ability: e.g. `(i8 -1) -> (XLenVT 255)` would not be matched
under the current logic.
To address this we first manually sign-extend the splatted constant from
the vector element type to int64_t. This preserves the semantics while
removing any implicitly-truncated bits.
The corresponding logic for uimm5 was not updated; the rationale is
that neither sign- nor zero-extending a legal uimm5 immediate should
change its value (unless we expect actual "garbage" in the upper bits).
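The sign-extension step amounts to something like this sketch
(SignExtend64 and isInt<> are real LLVM helpers; SplatC is an assumed
name for the splatted constant, and the surrounding matching is omitted):

  // Sign-extend the splatted constant from the element type to int64_t,
  // e.g. an i8 splat of 255 becomes -1 here and matches simm5.
  int64_t SplatImm =
      SignExtend64(SplatC->getZExtValue(), VT.getScalarSizeInBits());
  return isInt<5>(SplatImm);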
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93837
This patch extends the SDNode ISel support for RVV from only the
vector/vector instructions to include the vector/scalar and
vector/immediate forms.
It uses splat_vector to carry the scalar in each case, except when
XLEN < SEW (RV32 with SEW=64), where a custom node `SPLAT_VECTOR_I64` is
used for type legalization and to encode the fact that the value is
sign-extended to SEW. When the scalar is a full 64-bit value we use a
sequence to materialize the constant into the vector register.
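In DAG terms, the splat construction is roughly (a sketch; the custom
node name is the one this patch introduces, IsRV32SEW64 is a
hypothetical flag, and legalization details are omitted):

  // On RV32 with SEW=64 the custom node records that Scalar is
  // sign-extended to SEW; elsewhere a plain SPLAT_VECTOR suffices.
  SDValue Splat =
      IsRV32SEW64
          ? DAG.getNode(RISCVISD::SPLAT_VECTOR_I64, DL, VecVT, Scalar)
          : DAG.getNode(ISD::SPLAT_VECTOR, DL, VecVT, Scalar);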
The non-intrinsic ISel patterns have also been split into their own
file.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93312
This patch adds two IR intrinsics for the vsetvli instruction: one to set the vector length to a user-specified value and one to set it to vlmax. The vlmax form uses the X0 source register encoding.
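During selection the vlmax flavour can simply pass X0 as the AVL
operand, roughly (a sketch; RISCV::X0 is the real register enumerator,
the operand index and flag are assumed):

  SDValue AVL =
      IsVLMax ? CurDAG->getRegister(RISCV::X0, Subtarget->getXLenVT())
              : Node->getOperand(2); // operand index assumed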
Clang builtins will follow in a separate patch.
Differential Revision: https://reviews.llvm.org/D92973
This node returns 2 results and uses a chain. As long as we use a DAG as part of the pseudo instruction definition, where we can use the "set" operator, it looks like tablegen can handle using a pattern for this without a problem. I believe the original implementation was copied from PowerPC.
This also fixes the pseudo instruction so that it is marked as having side effects to match the definition of CSRRS and the RV64 instruction. And we don't need to explicitly clear mayLoad/mayStore since those can be inferred now.
Differential Revision: https://reviews.llvm.org/D92786
This should result in better utilization of RORIW since we
don't need to look for a SIGN_EXTEND_INREG that may not exist.
Also remove rotl/rotr isel matching to GREVI and just prefer RORI.
This keeps things consistent so we don't have to match ROLW/RORW
to GREVIW as well. I imagine RORI/RORIW performance will be the
same as or better than GREVI.
Differential Revision: https://reviews.llvm.org/D91449
We need to make sure the upper 32 bits are all ones to ensure the result is properly sign extended. Previously we only checked the lower 32 bits of the mask. I've also added a check that the shift amount is less than 32. Without that the original code asserts inside maskLeadingOnes if the SROI check is removed or the SROIW pattern is checked first. I've refactored the code to use early outs to reduce nesting.
I've also updated SLOIW matching with the same changes, but I couldn't find a broken test case with the existing code.
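The shape of the refactored checks, as a sketch (Mask and Shamt are
assumed names for the extracted constants; maskLeadingOnes is the real
MathExtras helper):

  if (Shamt >= 32)
    return false; // otherwise maskLeadingOnes below would assert
  // Ones shifted in from bit 31 plus all-ones in the upper half, so
  // the result remains sign-extended: 32 + Shamt leading ones total.
  if (Mask != maskLeadingOnes<uint64_t>(32 + Shamt))
    return false;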
Differential Revision: https://reviews.llvm.org/D90961
We need to ensure the upper 32 bits of the mask are zero, so that the
srl shifts zeroes into the lower 32 bits.
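As a one-line sketch of the check (isUInt is the real MathExtras
helper; Mask is an assumed name for the AND constant):

  if (!isUInt<32>(Mask))
    return false; // upper 32 bits of the mask must be zero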
Differential Revision: https://reviews.llvm.org/D90585
We don't need custom matching; we just need a predicate to check that
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.
I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.
Differential Revision: https://reviews.llvm.org/D90546
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.
Remove the custom matcher for rotl to RORI and just use an SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 forms of grevi over rori, since a pattern with a specific
immediate is considered more precise than one that accepts any
immediate. I also added rotr patterns for rev32/rev16, and removed the
(or (shl), (shr)) patterns that should be combined to rotl by DAG combine.
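The XForm body is essentially this sketch (simplified; N is the
ConstantSDNode being transformed, and the real code returns a target
constant the same way):

  // A left-rotate by C is a right-rotate by XLen - C.
  return CurDAG->getTargetConstant(Subtarget->getXLen() - N->getZExtValue(),
                                   SDLoc(N), N->getValueType(0));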
There is at least one other grev pattern that probably needs
another rotr pattern, but we need more test coverage first.
Differential Revision: https://reviews.llvm.org/D90575
The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))).
We need to ensure X and Y are the same.
Differential Revision: https://reviews.llvm.org/D90580
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
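For reference, the semantics being preserved, as a standalone model of
@llvm.fshl.i32 (not LLVM code):

  #include <cstdint>

  // Concatenate A:B into 64 bits, shift left by C modulo 32, and take
  // the high word; C == 0 returns A unchanged.
  uint32_t fshl32(uint32_t A, uint32_t B, uint32_t C) {
    C &= 31;
    return C ? (A << C) | (B >> (32 - C)) : A;
  }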
Differential Revision: https://reviews.llvm.org/D77152
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the ternary subset (zbt subextension) of the
experimental B extension of RISC-V.
It also adds the corresponding codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79875
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions belonging to both the permutation and the base
subsets of the experimental B extension of RISC-V.
It also adds the corresponding codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79873
This patch provides optimization of bit manipulation operations by
enabling the +experimental-b target feature.
It adds matching of single block patterns of instructions to specific
bit-manip instructions from the base subset (zbb subextension) of the
experimental B extension of RISC-V.
It also adds the corresponding codegen tests.
This patch is based on Claire Wolf's proposal for the bit manipulation
extension of RISCV:
https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf
Differential Revision: https://reviews.llvm.org/D79870
For an addition with an immediate in specific ranges, an addi-addi
pair can be generated instead of the ordinary lui-addi-add sequence.
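A sketch of the split for the positive range (the bounds follow from
the 12-bit signed ADDI immediate; the negative range mirrors it):

  // For 2048 <= Imm <= 4094, both halves are legal ADDI immediates.
  int64_t FirstHalf = 2047;
  int64_t SecondHalf = Imm - FirstHalf; // in [1, 2047]
  // emit: ADDI rd, rs1, FirstHalf ; ADDI rd, rd, SecondHalf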
Reviewed By: MaskRay, luismarques
Differential Revision: https://reviews.llvm.org/D82262
We can often fold an ADDI into the offset of load/store instructions:
(load (addi base, off1), off2) -> (load base, off1+off2)
(store val, (addi base, off1), off2) -> (store val, base, off1+off2)
This is possible when off1+off2 still fits in the 12-bit immediate.
We remove the previous restriction where we would never fold the ADDIs
if the load/stores had nonzero offsets. We now do the fold if the
resulting constant still fits in a 12-bit immediate, or if off1 is a
variable's address and we know, based on that variable's alignment,
that off1+off2 won't overflow.
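The legality check on the constant path is just (a sketch; isInt is
the real MathExtras helper, Off1/Off2 are assumed names):

  int64_t Combined = Off1 + Off2;
  if (!isInt<12>(Combined))
    return false; // must still fit the 12-bit signed load/store immediate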
Differential Revision: https://reviews.llvm.org/D79690
Summary:
RISC-V uses a post-select peephole pass to optimise
`(load/store (ADDI $reg, %lo(addr)), 0)` into `(load/store $reg, %lo(addr))`.
This peephole wasn't firing for accesses to constant pools, which is how we
materialise most floating point constants.
This adds support for the constantpool case, which improves code generation for
lots of small FP loading examples. I have not added any tests because this
structure is well-covered by the `fp-imm.ll` testcases, as well as almost
all other uses of floating point constants in the RISC-V backend tests.
Reviewed By: luismarques, asb
Differential Revision: https://reviews.llvm.org/D79523
For downstream RISC-V maintenance, it is easier to inherit from
RISCVISelDAGToDAG by including the header and only overriding the
methods that need to be customized for a downstream non-standard ISA
extension, without touching RISCVISelDAGToDAG.cpp, which may cause
conflicts when upgrading the downstream LLVM version.
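The downstream pattern this enables looks roughly like the following
(RISCVDAGToDAGISel is the real class; the vendor hook is hypothetical):

  class VendorDAGToDAGISel : public RISCVDAGToDAGISel {
    using RISCVDAGToDAGISel::RISCVDAGToDAGISel;
    void Select(SDNode *Node) override {
      if (trySelectVendorOp(Node)) // hypothetical downstream matching
        return;
      RISCVDAGToDAGISel::Select(Node); // defer to upstream selection
    }
    bool trySelectVendorOp(SDNode *Node); // hypothetical
  };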
Differential Revision: https://reviews.llvm.org/D77117
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.
They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
This allows arguments with the constraint A, which denotes a memory
address held in a register, to be lowered to input nodes for RISC-V.
This patch adds the minimal amount of code required to get operands with
the right constraints to compile.
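A usage example of the constraint from C++ (RV64; the 'A' operand is
printed as an address, here wrapping an ordinary atomic swap):

  // Atomically swap *Addr with NewVal and return the previous value.
  long amo_swap(long *Addr, long NewVal) {
    long Prev;
    asm volatile("amoswap.d %0, %2, %1"
                 : "=r"(Prev), "+A"(*Addr)
                 : "r"(NewVal)
                 : "memory");
    return Prev;
  }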
https://reviews.llvm.org/D54296
llvm-svn: 369095
On RISC-V, the `cycle` CSR holds a 64-bit count of the number of clock
cycles executed by the core, from an arbitrary point in the past. This
matches the intended semantics of `@llvm.readcyclecounter()`, which we
currently leave to the default lowering (to the constant 0).
With this patch, we will now correctly lower this intrinsic to the
intended semantics, using the user-space instruction `rdcycle`. On
64-bit targets, we can directly lower to this instruction.
On 32-bit targets, we need to do more, as `rdcycle` only returns the low
32-bits of the `cycle` CSR. In this case, we perform a custom lowering,
based on the PowerPC lowering, using `rdcycleh` to obtain the high
32-bits of the `cycle` CSR. This custom lowering inserts a new basic
block which detects overflow in the high 32-bits of the `cycle` CSR
during reading (because multiple instructions are required to read). The
emitted assembly matches the suggested assembly in the RISC-V
specification.
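The inserted control flow behaves like this reference model (the
CSR-read helpers are hypothetical stand-ins for rdcycle/rdcycleh):

  uint64_t readCycle64() {
    uint32_t Hi, Lo, Check;
    do {
      Hi = rdcycleh();   // hypothetical wrapper around the CSR read
      Lo = rdcycle();    // hypothetical
      Check = rdcycleh();
    } while (Hi != Check); // high half rolled over mid-read; retry
    return (uint64_t(Hi) << 32) | Lo;
  }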
Differential Revision: https://reviews.llvm.org/D64125
llvm-svn: 365201
The break isn't strictly needed yet, as there is no subsequent entry in the
case. But it's added to prevent mistakes further down the road.
llvm-svn: 351785
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
As discussed in the RFC
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit
RISC-V has i64 as the only legal integer type. This patch introduces patterns
to support codegen of the new instructions
introduced in RV64I: addiw, addw, subw, sllw, slliw, srlw, srliw, sraw,
sraiw, ld, sd.
Custom selection code is needed for srliw as SimplifyDemandedBits will remove
lower bits from the mask, meaning the obvious pattern won't work:
def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
(SRLIW GPR:$rs1, uimm5:$shamt)>;
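Concretely, for $shamt = 4 SimplifyDemandedBits rewrites the 0xffffffff
mask to 0xfffffff0, so the custom code has to accept the 0xffffffff
mask with its low shamt bits cleared, roughly (an illustrative check
only, with assumed variable names):

  uint64_t Mask = C->getZExtValue();
  if (Mask != (UINT64_C(0xFFFFFFFF) & ~maskTrailingOnes<uint64_t>(Shamt)))
    return false;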
This is sufficient to compile and execute all of the GCC torture suite for
RV64I other than those files using frameaddr or returnaddr intrinsics
(LegalizeDAG doesn't know how to promote the operands - a future patch
addresses this).
When promoting i32 sltu/sltiu operands, it would be more efficient to use
sign-extension rather than zero-extension for RV64. A future patch adds a hook
to allow this.
Differential Revision: https://reviews.llvm.org/D52977
llvm-svn: 347973
This commit introduces support for materialising 64-bit constants for RV64I,
making use of the RISCVMatInt::generateInstSeq helper in order to share logic
for immediate materialisation with the MC layer (where it's used for the li
pseudoinstruction).
test/CodeGen/RISCV/imm.ll is updated to test RV64, and gains new 64-bit
constant tests. It would be preferable if anyext constant returns were sign
rather than zero extended (see PR39092). This patch simply adds an explicit
signext to the returns in imm.ll.
Further optimisations for constant materialisation are possible, most notably
for mask-like values, which can be generated by loading -1 and shifting right.
A future patch will standardise on the C++ codepath for immediate selection on
RV32 as well as RV64, and then add further such optimisations to
RISCVMatInt::generateInstSeq in order to benefit both RV32 and RV64 for
codegen and li expansion.
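Usage is roughly (a sketch; the helper's exact signature at the time
may differ slightly from this):

  RISCVMatInt::InstSeq Seq;
  RISCVMatInt::generateInstSeq(Imm, /*IsRV64=*/true, Seq);
  for (const RISCVMatInt::Inst &Step : Seq) {
    // Materialize each LUI/ADDI/ADDIW/SLLI step as a machine node,
    // chaining the result of the previous step as the source.
  }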
Differential Revision: https://reviews.llvm.org/D52962
llvm-svn: 347042
r343712 performed this optimisation during instruction selection. As Eli
Friedman pointed out in post-commit review, implementing this as a DAGCombine
might allow opportunities for further optimisations.
llvm-svn: 343741
Although we can't write a tablegen pattern to remove redundant
splitf64+buildf64 pairs due to the multiple return values, we can handle it
with some C++ selection code. This is simpler than removing them after
instruction selection through RISCVDAGToDAGISel::PostprocessISelDAG, as was
done previously.
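The C++ selection amounts to this sketch (the RISCVISD node names are
real; handling is simplified):

  if (N->getOpcode() == RISCVISD::SplitF64) {
    SDValue Op = N->getOperand(0);
    if (Op.getOpcode() == RISCVISD::BuildPairF64) {
      // Forward the original GPR halves and let both nodes die.
      ReplaceUses(SDValue(N, 0), Op.getOperand(0));
      ReplaceUses(SDValue(N, 1), Op.getOperand(1));
      return;
    }
  }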
llvm-svn: 343712
Summary:
In r333455 we added a peephole to fix the corner cases that result
from separating base + offset lowering of global addresses. The
peephole didn't handle some of the cases because it only had a basic-
block view instead of a function-level view.
This patch replaces that logic with a machine function pass. In
addition to handling the original cases, it handles uses of the global
address across blocks in the function and folds an offset from LW/SW
instructions. This pass won't run for OptNone compilation, so there
will be a negative impact overall vs the old approach at O0.
Reviewers: asb, apazos, mgrang
Reviewed By: asb
Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones
Differential Revision: https://reviews.llvm.org/D47857
llvm-svn: 335786
Summary:
Base and offset are always separated when a GlobalAddress node is lowered
(rL332641) as an optimization to reduce instruction count. However, this
optimization is not profitable if the GlobalAddress ends up being used in
only one instruction.
This patch adds peephole optimizations that merge an offset of
an address calculation into the LUI %hi and ADDI %lo of the lowering sequence.
The peephole handles three patterns:
1) ADDI (ADDI (LUI %hi(global)) %lo(global)), offset
   --->
   ADDI (LUI %hi(global + offset)) %lo(global + offset)
This generates:
   lui a0, %hi(global + offset)
   addi a0, a0, %lo(global + offset)
Instead of:
   lui a0, %hi(global)
   addi a0, a0, %lo(global)
   addi a0, a0, offset
This pattern is for cases when the offset is small enough to fit in the
immediate field of ADDI (less than 12 bits).
2) ADD ((ADDI (LUI %hi(global)) %lo(global)), (LUI hi_offset))
   --->
   offset = hi_offset << 12
   ADDI (LUI %hi(global + offset)) %lo(global + offset)
Which generates the ASM:
   lui a0, %hi(global + offset)
   addi a0, a0, %lo(global + offset)
Instead of:
   lui a0, %hi(global)
   addi a0, a0, %lo(global)
   lui a1, hi_offset
   add a0, a0, a1
This pattern is for cases when the offset doesn't fit in an immediate field
of ADDI but the lower 12 bits are all zeros.
3) ADD ((ADDI (LUI %hi(global)) %lo(global)), (ADDI lo_offset, (LUI hi_offset)))
   --->
   offset = (hi_offset << 12) + lo_offset
   ADDI (LUI %hi(global + offset)) %lo(global + offset)
Which generates the ASM:
   lui a1, %hi(global + offset)
   addi a1, a1, %lo(global + offset)
Instead of:
   lui a0, %hi(global)
   addi a0, a0, %lo(global)
   lui a1, hi_offset
   addi a1, a1, lo_offset
   add a0, a0, a1
This pattern is for cases when the offset doesn't fit in an immediate field
of ADDI and both the lower 12 bits and the high 20 bits are non-zero.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, jordy.potman.lists, apazos,
niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang
llvm-svn: 333455
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually changed the docs, as the regex doesn't match them.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
Inspired by r331508, I did a grep and found these.
Mostly just changed dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool; those I changed to isa.
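The two shapes of change, illustratively (types chosen arbitrarily):

  // V is already known to be a ConstantInt at this point:
  auto *C = cast<ConstantInt>(V);        // was: dyn_cast<ConstantInt>(V)
  // Only the check matters, not the pointer:
  if (isa<ConstantInt>(V)) { /* ... */ } // was: if (dyn_cast<ConstantInt>(V))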
llvm-svn: 331577
Reverts rL330224, while issues with the C extension and missed common
subexpression elimination opportunities are addressed. Neither of these issues
is visible in current RISC-V backend unit tests, which clearly need
expanding.
llvm-svn: 330281
The implementation follows the MIPS backend and expands the
pseudo instruction directly during asm parsing. As a result, only
real MC instructions are emitted to the MCStreamer. Additionally,
PseudoLI instructions are emitted during codegen. The actual
expansion to real instructions is performed during MI to MC lowering
and is similar to the expansion performed by the GNU Assembler.
Differential Revision: https://reviews.llvm.org/D41949
Patch by Mario Werner.
llvm-svn: 330224