Some FP compares expand to a sequence ending with (xor X, 1) to invert the result. If
the consumer is a select_cc we can likely get rid of this xor by fixing
up the select_cc condition.
This patch combines (select_cc (xor X, 1), 0, setne, trueV, falseV) ->
(select_cc X, 0, seteq, trueV, falseV) if we can prove X is 0/1.
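The equivalence behind this combine can be checked exhaustively over the two values X may take. This is a minimal Python sketch, not the DAG combine itself; `select_cc` and the helper names are hypothetical stand-ins for the nodes named above.

```python
def select_cc(lhs, rhs, cond, true_v, false_v):
    """Toy model of a select_cc node: compare lhs with rhs, then pick a value."""
    if cond == "seteq":
        taken = lhs == rhs
    elif cond == "setne":
        taken = lhs != rhs
    else:
        raise ValueError(f"unsupported condition: {cond}")
    return true_v if taken else false_v


def before_combine(x, true_v, false_v):
    # (select_cc (xor X, 1), 0, setne, trueV, falseV)
    return select_cc(x ^ 1, 0, "setne", true_v, false_v)


def after_combine(x, true_v, false_v):
    # (select_cc X, 0, seteq, trueV, falseV)
    return select_cc(x, 0, "seteq", true_v, false_v)


def combine_is_sound(values):
    """True if both forms select the same operand for every value of X given."""
    return all(
        before_combine(x, "T", "F") == after_combine(x, "T", "F") for x in values
    )
```

The two forms agree for X in {0, 1} but diverge for other values such as 2, which is why the combine is only valid once X is known to be 0/1.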
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D94546
MCTargetDesc includes headers from Utils and Utils includes headers
from MCTargetDesc. So from a library layering perspective it makes sense
for them to be in the same library. I guess the other option might be to
move the tablegen includes from RISCVMCTargetDesc.h to RISCVBaseInfo.h
so that RISCVBaseInfo.h didn't need to include RISCVMCTargetDesc.h.
Everything else that depends on Utils also depends on MCTargetDesc so
having one library seemed simpler.
Differential Revision: https://reviews.llvm.org/D93168
This patch custom lowers ISD::VSCALE into a csrr vlenb followed
by a shift right by 3 followed by a multiply by the scale amount.
I've added computeKnownBits support to indicate that the csrr vlenb
always produces 3 trailing zero bits so the shift right is "exact".
This allows the shift and multiply sequence to be nicely optimized
into a single shift or removed completely when the scale amount is
a power of 2.
The non-power-of-2 case of multiplying by 24 still produces
suboptimal code. We could remove the right shift and use a
multiply by 3. Hopefully we can improve DAG combine to fix that
since it's not unique to this sequence.
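The arithmetic behind the folding can be sketched as follows. This is a Python model under the assumption stated above (vlenb always has 3 trailing zero bits); the function names are hypothetical and do not correspond to anything in the backend.

```python
def vscale_times(vlenb, scale):
    """Reference sequence: csrr vlenb, shift right by 3, multiply by scale."""
    return (vlenb >> 3) * scale


def vscale_times_folded(vlenb, scale):
    """Folded form for a power-of-2 scale: one shift, or nothing at all."""
    assert vlenb % 8 == 0, "sketch assumes vlenb has 3 trailing zero bits"
    assert scale > 0 and scale & (scale - 1) == 0, "scale must be a power of 2"
    shift = scale.bit_length() - 1 - 3  # log2(scale) minus the exact right shift
    if shift > 0:
        return vlenb << shift
    if shift < 0:
        return vlenb >> -shift
    return vlenb  # scale == 8: the shift and the multiply cancel


def vscale_times_24_improved(vlenb):
    """The better form suggested for 24: drop the right shift, multiply by 3."""
    assert vlenb % 8 == 0
    return vlenb * 3
```

Because the right shift discards only known-zero bits, each folded form computes the same value as the reference sequence.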
This replaces D94144.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94249
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.
Reviewed By: lenary, craig.topper
Differential Revision: https://reviews.llvm.org/D93767
We can use a 0 immediate to avoid needing to materialize 0 into
an FPR first.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94459
Define the `vfclass` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D94356
Original patch by @rogfer01.
This patch adds ISel patterns for the above operations to the
corresponding vector/vector and vector/scalar RVV instructions, as well
as extra patterns to match operand-swapped scalar/vector vfrsub and
vfrdiv.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94408
Original patch by @rogfer01.
All ordered comparisons except ONE are supported natively, and all
unordered comparisons except UNE are expanded into sequences involving
explicit NaN checks and mask arithmetic.
Additionally, we expand GT, OGT, GE, OGE to their swapped-operand versions, and
pattern-match those back to the "original", swapping operands once more. This
way we catch both operations and both "vf" and "fv" forms with fewer patterns.
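A per-element model of the two ideas (operand swapping and explicit NaN checks) might look like the sketch below. It is written in Python on scalar floats, it is not the instruction sequence the backend emits, and all function names are hypothetical.

```python
import math


def olt(a, b):
    # Ordered less-than: Python's < is already False when either side is NaN.
    return a < b


def ole(a, b):
    return a <= b


def ogt(a, b):
    # GT expressed as LT with the operands swapped.
    return olt(b, a)


def oge(a, b):
    # GE expressed as LE with the operands swapped.
    return ole(b, a)


def is_unordered(a, b):
    # Explicit NaN check: true if either operand is NaN.
    return math.isnan(a) or math.isnan(b)


def ult(a, b):
    # Unordered less-than: the NaN check OR-ed with the ordered compare.
    return is_unordered(a, b) or olt(a, b)


def ueq(a, b):
    # Unordered equal: the NaN check OR-ed with the ordered compare.
    return is_unordered(a, b) or a == b
```

On vectors the same combinations would be formed on mask registers rather than on Python booleans.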
Also add support for floating-point splat_vector, with an optimization for
splatting fpimm0.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94242
The Pseudo class sets isCodeGenOnly=1 which causes the asm strings
in the pseudos to be ignored. I think this is why the aliases are
needed at all.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94024
This patch moves all but the BaseInstr to bits in TSFlags.
For the index fields, we can just use a bit to indicate their presence.
The locations of the operands are well defined.
This reduces the llc binary by about 32K on my build. It also
removes the binary search of the table from the custom inserter.
Instead we just check that the SEW op is present.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D94375
This makes the mask align with the position of the bits in TSFlags
which is a little more logical.
I might be adding more fields to TSFlags and some might be single
bits where just ANDing with mask to test the bit would make sense.
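To illustrate what an in-position mask buys, here is a small Python sketch. The bit layout is entirely hypothetical (a 3-bit field at bits 7:5 and a single-bit flag at bit 8) and is not the real TSFlags layout.

```python
# Hypothetical layout, for illustration only.
FIELD_SHIFT = 5
FIELD_MASK = 0b111 << FIELD_SHIFT  # mask sits at the field's position
FLAG_MASK = 1 << 8                 # single-bit field


def encode(field, flag):
    """Pack a 3-bit field value and a boolean flag into one flags word."""
    assert 0 <= field < 8
    return (field << FIELD_SHIFT) | (FLAG_MASK if flag else 0)


def get_field(ts_flags):
    # Multi-bit field: AND with the in-position mask, then shift down.
    return (ts_flags & FIELD_MASK) >> FIELD_SHIFT


def has_flag(ts_flags):
    # Single-bit field: the AND alone answers the question, no shift needed.
    return (ts_flags & FLAG_MASK) != 0
```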
While there rename TargetFlags in validateInstruction to reflect
that it's just the constraint bits.
We currently have about 7000 opcodes in the RISCVGenInstrInfo.inc
enum. We can use uint16_t to store these values. We would need to
grow by nearly 9x before we run out of space so this should last
for a little while.
This reduces the llc binary by 32K.
Original patch by @rogfer01.
The RVV integer comparison instructions are defined in such a way that
many LLVM operations are defined by using the "opposite" comparison
instruction and swapping the operands. This is done in this patch in
most cases, except for the mappings where the immediate range must be
adjusted to accommodate:
va < i --> vmsle{u}.vi vd, va, i-1, vm
va >= i --> vmsgt{u}.vi vd, va, i-1, vm
That is left for future optimization; this patch supports all operations
but in the case of the missing mappings the immediate will be moved to
a scalar register first.
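The two immediate-adjusted mappings rest on a simple integer identity, sketched here in Python for the signed case only. The helper names are hypothetical, and the sketch ignores the immediate encoding range and unsigned wrap-around.

```python
def vmsle_vi(va, imm):
    """Model of a signed less-than-or-equal compare against an immediate."""
    return va <= imm


def vmsgt_vi(va, imm):
    """Model of a signed greater-than compare against an immediate."""
    return va > imm


def lt_via_vmsle(va, i):
    # va < i  -->  vmsle.vi vd, va, i-1
    return vmsle_vi(va, i - 1)


def ge_via_vmsgt(va, i):
    # va >= i  -->  vmsgt.vi vd, va, i-1
    return vmsgt_vi(va, i - 1)
```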
Since there are so many condition codes and operand cases to check, it
was decided to reduce the test burden by only testing the "vscale x 8"
vector types.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94168
The TableGen immAllOnesV and immAllZerosV helpers implicitly wrapped the
ISD::isBuildVectorAll(Ones|Zeros) helper functions. This was inhibiting
their use for targets such as RISC-V which use ISD::SPLAT_VECTOR. In
particular, RISC-V had to define its own 'vnot' fragment.
In order to extend the scope of these nodes to include support for
ISD::SPLAT_VECTOR, two new ISD predicate functions have been introduced:
ISD::isConstantSplatVectorAll(Ones|Zeros). These effectively supersede
the older "isBuildVector" predicates, which are now simple wrappers for
the new functions. They pass a defaulted boolean toggle which preserves
the old behaviour. It is hoped that in time all call-sites can be ported
to the "isConstantSplatVector" functions.
While the use of ISD::isBuildVectorAll(Ones|Zeros) has not changed, the
behaviour of the TableGen immAll(Ones|Zeros)V **has**. To test the new
functionality, the custom RISC-V TableGen fragment has been removed and
replaced with the built-in 'vnot'. To test their use as pattern-roots, two
splat patterns have been updated accordingly.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94223
This is a first change needed to fix a crash in which the emergency
spill slot ends up being out of reach. This happens when we run the
register scavenger after we have eliminated the frame indexes. The fix
for the actual crash will come in a later change.
This change removes an extra stack size increase we do in
RISCVFrameLowering::determineFrameLayout.
We don't have to change the size of the stack here as
PEI::calculateFrameObjectOffsets is already doing this with the right
size, accounting for the extra alignment.
Differential Revision: https://reviews.llvm.org/D89237
1. Break MUL with specific constant to a SLLI and an ADD/SUB on riscv32
with the M extension.
2. Break MUL with specific constant to two SLLI and an ADD/SUB, if the
constant needs a pair of LUI/ADDI to construct.
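The two decompositions can be sketched in Python as below. This is an illustration of the arithmetic only: which constants the backend actually accepts is not stated here, and the function names are hypothetical.

```python
def is_pow2(n):
    return n > 0 and n & (n - 1) == 0


def decompose_one_shift(c):
    """Return (shift, op) with x*c == (x << shift) op x, or None."""
    if is_pow2(c - 1):
        return (c - 1).bit_length() - 1, "add"
    if is_pow2(c + 1):
        return (c + 1).bit_length() - 1, "sub"
    return None


def decompose_two_shifts(c):
    """Return (hi, lo, op) with x*c == (x << hi) op (x << lo), or None."""
    for lo in range(1, c.bit_length()):
        low = 1 << lo
        if is_pow2(c - low):
            return (c - low).bit_length() - 1, lo, "add"
        if is_pow2(c + low):
            return (c + low).bit_length() - 1, lo, "sub"
    return None


def mul_via_shifts(x, c):
    """Multiply x by c with shifts and one add/sub, modulo 2**32; None if no form fits."""
    mask = 0xFFFFFFFF
    one = decompose_one_shift(c)
    if one is not None:
        shift, op = one
        result = (x << shift) + x if op == "add" else (x << shift) - x
        return result & mask
    two = decompose_two_shifts(c)
    if two is not None:
        hi, lo, op = two
        result = (x << hi) + (x << lo) if op == "add" else (x << hi) - (x << lo)
        return result & mask
    return None
```

For example, 9 decomposes as (x << 3) + x, while 4104 decomposes as (x << 12) + (x << 3).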
Reviewed by: craig.topper
Differential Revision: https://reviews.llvm.org/D93619
Define the `vfsqrt` IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D93745
The patterns that want to use 'vnot' use a custom PatFrag. This is
because 'vnot' uses immAllOnesV which implicitly uses BUILD_VECTOR
rather than SPLAT_VECTOR.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94078
nxvXi1 types are legal with the V extension and that's the result
type the vmseq/vmsne/vmslt/etc. instructions return.
No test cases yet because the setcc isel patterns aren't in
and we'll need more than basic tests to observe this. I locally
tested that this plus D947078, D94168, D94142, and D94149
was enough to be able to handle the overflow result from
llvm.sadd.overflow.
There is no test coverage for the mulhs or mulhu patterns as I can't get
the DAGCombiner to generate them for scalable vectors. There are a few
places that still need updating for that to work. I left the patterns
in regardless as they are correct.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94073
If the return values can't be lowered to registers
SelectionDAG performs the sret demotion. This patch
contains the basic implementation for the same in
the GlobalISel pipeline.
Furthermore, targets should bring relevant changes
during lowerFormalArguments, lowerReturn and
lowerCall to make use of this feature.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D92953
ComplexPatterns are kind of weird, they don't call any of the predicates on their operands. And their "complexity" used for tablegen ordering purposes in the matcher table is hand specified.
This started as an attempt to just use sext_inreg + SLOIPat to implement SLOIW just to have one less Select function. The matching for the or+shl is the same as long as you know the immediate is less than 32 for SLOIW. But that didn't work out because using uimm5 with SLOIPat didn't do anything if it was a ComplexPattern.
I realized I could just use a PatFrag with the opcodes I wanted to match and an immediate predicate would then evaluate correctly. This also computes the complexity just like any other pattern does. Then I just needed to check the constraints on the immediates in the predicate. Conveniently the predicate is evaluated after the fragment has been matched. So the structure has already been checked, we just need to find the constants.
I'll note that this is unusual; I didn't find any other targets looking through operands in a PatFrag predicate. There is a PredicateCodeUsesOperands feature that can be used to collect the operands into an array, and it is used by AMDGPU/VOP3Instructions.td. I believe that feature exists to handle commuted matching, but since the nodes here use constants, they aren't ever commuted.
Differential Revision: https://reviews.llvm.org/D91901
vmsltu.vi v0, v1, 0 is always false since there is no unsigned number
less than 0. vmsleu.vi v0, v1, -1 on the other hand is always true
since -1 will be considered unsigned max and all numbers are <=
unsigned max.
A similar problem exists for vmsgeu.vi v0, v1, 0 which is always true,
but becomes vmsgtu.vi v0, v1, -1 which is always false.
To match the GNU assembler we'll emit vmsne.vv and vmseq.vv with
the same register for these cases instead.
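The boundary cases can be checked with a small per-element model. This Python sketch assumes an 8-bit element width purely for illustration; the function names mirror the mnemonics above but are otherwise hypothetical.

```python
SEW = 8  # hypothetical element width in bits
UMAX = (1 << SEW) - 1


def as_unsigned(imm):
    """Reinterpret a sign-extended immediate as an unsigned SEW-bit value."""
    return imm & UMAX


def vmsltu(va, imm):
    return va < as_unsigned(imm)


def vmsleu(va, imm):
    return va <= as_unsigned(imm)


def vmsgeu(va, imm):
    return va >= as_unsigned(imm)


def vmsgtu(va, imm):
    return va > as_unsigned(imm)


def vmsne_vv(a, b):
    return a != b


def vmseq_vv(a, b):
    return a == b
```

Decrementing the immediate flips the result in both cases, whereas comparing a register with itself reproduces the intended all-false or all-true mask.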
I'm using AsmParserOnly pseudo instructions here because we can't
match an explicit immediate in an InstAlias. And we can't use an
AsmOperand for the zero because the output we want doesn't use an
immediate so there's nowhere to name the AsmOperand we want to use.
To keep the implementations similar I'm also handling signed with
pseudo instructions even though they don't have this issue. This
way we can avoid the special renderMethod that decremented by 1 so
the immediate we see for the pseudo instruction in processInstruction
is 0 and not -1. Another option might have been to have a different
simm5_plus1 operand for the unsigned case or just live with the
immediate being pre-decremented. I felt this way was clearer, but I'm
open to other opinions.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94035
This alias for andi x, 255 was recently added to the spec. If we
print it, code we output can't be compiled with -fno-integrated-as
unless the GNU assembler is also a version that supports the alias.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D93826
There are vmsle(u).vx and vmsle(u).vi instructions, but there is
only vmslt(u).vx and no vmslt(u).vi. vmslt(u).vi can be emulated
for some immediates by decrementing the immediate and using vmsle(u).vi.
To avoid the user needing to know about this, this patch does this
conversion.
The assembler does the same thing for vmslt(u).vi and vmsge(u).vi
pseudoinstructions. There is no vmsge(u).vx intrinsic or
instruction so this patch is limited to vmslt(u).
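Why this works only "for some immediates" comes down to the immediate range. The Python sketch below covers the signed case and assumes the usual 5-bit signed immediate range of -16 to 15. Falling back to the .vx form for other values is an assumption made for illustration, and `lower_vmslt_imm` is a hypothetical name.

```python
SIMM5_MIN, SIMM5_MAX = -16, 15


def lower_vmslt_imm(imm):
    """Pick a form for a signed `va < imm` comparison."""
    if SIMM5_MIN <= imm - 1 <= SIMM5_MAX:
        # The decremented immediate still fits, so vmsle.vi can be used.
        return ("vmsle.vi", imm - 1)
    # Assumed fallback: keep the value in a scalar register and use the .vx form.
    return ("vmslt.vx", imm)
```

Under this model the convertible immediates are -15 through 16.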
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94070
With the i32 these patterns will only fire on RV32, but they
don't look RV32 specific.
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D93843
We could expand vmsge{u}.vx pseudo instructions in RISCVAsmParser.
It is more appropriate to expand it before encoding.
Differential Revision: https://reviews.llvm.org/D93968
Define intrinsics:
1. vfcvt.xu.f.v/vfcvt.x.f.v
2. vfcvt.rtz.xu.f.v/vfcvt.rtz.x.f.v
3. vfcvt.f.xu.v/vfcvt.f.x.v
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93933
Define intrinsics:
1. vfncvt.xu.f.w/vfncvt.x.f.w
2. vfncvt.rtz.xu.f.w/vfncvt.rtz.x.f.w
3. vfncvt.f.xu.w/vfncvt.f.x.w
4. vfncvt.f.f.w/vfncvt.rod.f.f.w
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93932
Define intrinsics:
1. vfwcvt.xu.f.v/vfwcvt.x.f.v
2. vfwcvt.rtz.xu.f.v/vfwcvt.rtz.x.f.v
3. vfwcvt.f.xu.v/vfwcvt.f.x.v
4. vfwcvt.f.f.v
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Monk Chiang <monk.chiang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93855
This patch defines vcompress intrinsics and lowers them to V instructions.
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Differential revision: https://reviews.llvm.org/D93809
Define vsext/vzext intrinsics and lower them to V instructions.
Define new fraction register class fields in LMULInfo and a
NoReg to represent invalid LMUL register classes.
Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93893
This complements the existing RVV ISel patterns for arithmetic, bitwise
and shifts with the remaining operations in those categories: sub, and,
xor, sra.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93852
If the destination is tied, then the user has some control of the
register used for input. They would have the ability to control
the value of any tail elements. By using tail agnostic we take
this option away from them.
It's not clear that the intrinsics are defined such that this isn't
supposed to work. And undisturbed is a valid implementation for agnostic
so code wouldn't even fail to work on all systems if we always used
agnostic.
The vcompress intrinsic is defined to require tail undisturbed so
at minimum we need this for that instruction or need to redefine
the intrinsic.
I've made an exception here for vmv.s.x/vfmv.s.f and reduction
instructions which only write to element 0 regardless of the tail
policy. This allows us to keep the agnostic policy on those which
should allow better redundant vsetvli removal.
An enhancement would be to check for undef input and keep the
agnostic policy, but we don't have good test coverage for that yet.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D93878
The spec for these instructions includes this note. "The destination register
cannot overlap either the source register or the mask register ('v0') if the
instruction is masked." So we need earlyclobber to enforce this constraint.
I've regenerated the tests with update_llc_test_checks.py to show the
effects of the earlyclobber.
Reviewed By: khchen, frasercrmck
Differential Revision: https://reviews.llvm.org/D93867
Define vmclr.m/vmset.m intrinsics and lower to vmxor.mm/vmxnor.mm.
Ideally all RVV pseudo instructions could be implemented in a C header,
but these two instructions don't take an input, so codegen cannot guarantee
that the source register is the same as the destination.
We expand the pseudo-v-inst into the corresponding v-inst in the
RISCVExpandPseudoInsts pass.
Reviewed By: craig.topper, frasercrmck
Differential Revision: https://reviews.llvm.org/D93849
Define those intrinsics and lower to V instructions.
Use update_llc_test_checks.py for viota.m tests to check
earlyclobber is applied correctly.
The masked viota.m tests use the same argument as input and mask to
avoid a dependency on D93364.
We worked with @rogfer01 from BSC to come up with this patch.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D93823
This patch extends the pattern-matching capability of vector-splatted
constants. When illegally-typed constants are legalized they are
canonically sign-extended to XLenVT. This preserves the sign and allows
us to match simm5. If they were zero-extended for whatever reason we'd
lose that ability: e.g. `(i8 -1) -> (XLenVT 255)` would not be matched
under the current logic.
To address this we first manually sign-extend the splatted constant from
the vector element type to int64_t. This preserves the semantics while
removing any implicitly-truncated bits.
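The effect of the manual sign extension can be seen in a short Python sketch. The helper names are hypothetical, and simm5 is taken to be the 5-bit signed range of -16 to 15.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def is_simm5(value):
    return -16 <= value <= 15


def splat_matches_simm5(constant, element_bits):
    # Sign-extend from the element type first, then test the range.
    return is_simm5(sign_extend(constant, element_bits))
```

With this, an i8 splat of -1 that arrives zero-extended as 255 is recognised again, since sign-extending 255 from 8 bits yields -1.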
The corresponding logic for uimm5 was not updated, the rationale being
that neither sign- nor zero-extending a legal uimm5 immediate should
change that (unless we expect actual "garbage" upper bits).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93837