In most of cases, it has a single space after comma in assembly operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103790
Writes of a mask result are always tail agnostic.
Unfortunately, this seems to have made codegen worse. I can only
think this must be because the vsetvli was acting as some sort
of barrier that prevented some code movement in the scheduler.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D103331
This can help avoid needing a virtual register for the vsetvl output
when the AVL is X0. For other register AVLs it can shorter the live
range of the AVL register if it isn't needed later.
There's probably no advantage when AVL is a 5 bit immediate that
can use vsetivli. But do it anyway for consistency.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D103215
We aren't going to connect the result to anything so we might
as well avoid allocating a register.
Reviewed By: frasercrmck, HsiangKai
Differential Revision: https://reviews.llvm.org/D102031
Theses instructions are allowed to write v0 when they are masked.
We'll still never use v0 because of the earlyclobber constraint so
this doesn't really help anything. It just makes the definitions
correct.
While I was there remove an unused multiclass I noticed.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D101118
Rather than doing splatting each separately and doing bit manipulation
to merge them in the vector domain, copy the data to the stack
and splat it using a strided load with x0 stride. At least on
some implementations this vector load is optimized to not do
a load for each element.
This is equivalent to how we move i64 to f64 on RV32.
I've only implemented this for the intrinsic fallbacks in this
patch. I think we do similar splatting/shifting/oring in other
places. If this is approved, I'll refactor the others to share
the code.
Differential Revision: https://reviews.llvm.org/D101002
This patch changes the operand order of masked vmslt[u]
from (mask, rs1, scalar, maskedoff, vl)
to (maskedoff, rs1, scalar, mask, vl).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98839
There are vmsle(u).vx and vmsle(u).vi instructions, but there is
only vmslt(u).vx and no vmslt(u).vi. vmslt(u).vi can be emulated
for some immediates by decrementing the immediate and using vmsle(u).vi.
To avoid the user needing to know about this, this patch does this
conversion.
The assembler does the same thing for vmslt(u).vi and vmsge(u).vi
pseudoinstructions. There is no vmsge(u).vx intrinsic or
instruction so this patch is limited to vmslt(u).
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94070
If the destination is tied, then user has some control of the
register used for input. They would have the ability to control
the value of any tail elements. By using tail agnostic we take
this option away from them.
Its not clear that the intrinsics are defined such that this isn't
supposed to work. And undisturbed is a valid implementation for agnostic
so code wouldn't even fail to work on all systems if we always used
agnostic.
The vcompress intrinsic is defined to require tail undisturbed so
at minimum we need this for that instruction or need to redefine
the intrinsic.
I've made an exception here for vmv.s.x/fmv.s.f and reduction
instructions which only write to element 0 regardless of the tail
policy. This allows us to keep the agnostic policy on those which
should allow better redundant vsetvli removal.
An enhancement would be to check for undef input and keep the
agnostic policy, but we don't have good test coverage for that yet.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D93878
Define vector compare intrinsics and lower them to V instructions.
We work with @rogfer01 from BSC to come out this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Differential Revision: https://reviews.llvm.org/D93368