When lowering GlobalAddress nodes, we were removing a non-zero offset and
creating a separate ADD.
It already comes out of SelectionDAGBuilder with a separate ADD. The
ADD was being removed by DAGCombiner.
This patch disables the DAG combine so we don't have to reverse it.
Test changes all look to be instruction order changes, probably due
to different DAG node ordering.
Differential Revision: https://reviews.llvm.org/D126558
A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.
There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to have actually been standardized, and b) it isn't quite what we want here anyway due to the perf comment.
Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.
Differential Revision: https://reviews.llvm.org/D126085
This patch tries to solve the inconsistency between the direct and intermediate casts caused by D123975.
This patch replaces ISD::FP_EXTEND and ISD::FP_ROUND with RVV VL op in the lowering of FP scalable vector direct cast to unify with the intermediate cast.
It also changes the FP widening patterns to use the VL op.
Differential Revision: https://reviews.llvm.org/D125364
Update test to check MIR after finalize-isel instead of debug output.
This is of course not the only place we should preserve FMF, but
it's the most obvious one.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D126306
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
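A minimal sketch of the no-op case that the plain methods now accept
(assuming the current APInt API):

  #include "llvm/ADT/APInt.h"
  using namespace llvm;
  // Extending (or truncating) to the same bit width is now a no-op with the
  // plain methods, so zextOrSelf/sextOrSelf/truncOrSelf are unnecessary.
  APInt sameWidth(const APInt &V) {
    return V.zext(V.getBitWidth()); // previously required zextOrSelf
  }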
Differential Revision: https://reviews.llvm.org/D125557
During early gather/scatter enablement two different approaches
were taken to represent scaled indices:
* A Scale operand whereby byte_offsets = Index * Scale
* An IndexType whereby byte_offsets = Index * sizeof(MemVT.ElementType)
Having multiple representations is bad as shown by this patch which
fixes instances where the two are out of sync. The dedicated scale
operand is more flexible and pervasive so this patch removes the
UNSCALED values from IndexType. This means all indices are scaled
but the scale can be one, hence unscaled. SDNodes now use the scale
operand to answer the "isScaledIndex" question.
I toyed with the idea of keeping the UNSCALED enums and helper
functions but because they will have no uses and force SDNodes to
validate the set of supported values I figured it's best to remove
them. We can re-add them if there's a real need. For similar
reasons I've kept the IndexType enum when a bool could be used, as I
think being explicit looks better.
Depends On D123347
Differential Revision: https://reviews.llvm.org/D123381
This patch replaces some for-each set calls with the new ArrayRef argument API. Since it already used an array in the definition, I think this change won't cause any ambiguity.
Differential Revision: https://reviews.llvm.org/D125455
When building the final merged node, we were using the original chain
rather than the output chain of the new operation. After some collapsing
of the chain this could cause the loads to be incorrectly scheduled with
respect to later stores.
This was uncovered by SingleSource/Regression/C/gcc-c-torture/execute/pr36038.c
from the LLVM test suite.
https://reviews.llvm.org/D125560
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125323
This hook determines if SimplifySetcc transforms (X & (C l>>/<< Y))
==/!= 0 into ((X <</l>> Y) & C) ==/!= 0. Where C is a constant and
X might be a constant.
The default implementation favors doing the transform if X is not
a constant. Otherwise the code is left alone. There is a provision
that if the target supports a bit test instruction then the transform
will favor ((1 << Y) & X) ==/!= 0. RISCV does not say it has a variable
bit test operation.
RISCV with Zbs does have a BEXT instruction that performs (X >> Y) & 1.
Without Zbs, (X >> Y) & 1 still looks preferable to ((1 << Y) & X) since
we can use an ANDI instead of putting a 1 in a register for the SLL.
This patch overrides this hook to favor bit extract patterns and
otherwise falls back to the "do the transform if X is not a constant"
heuristic.
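As a small illustration of the two equivalent forms the hook chooses
between (a sketch with plain 64-bit unsigned values, assuming Y < 64):

  #include <cstdint>
  // Original setcc form: test X against a shifted mask.
  bool maskShiftedForm(uint64_t X, uint64_t C, unsigned Y) {
    return (X & (C << Y)) != 0;
  }
  // Transformed form: shift the value instead. When C == 1, the AND here is
  // the (X >> Y) & 1 that BEXT computes.
  bool valueShiftedForm(uint64_t X, uint64_t C, unsigned Y) {
    return ((X >> Y) & C) != 0;
  }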
I've added tests where both C and X are constants with both the shl form
and lshr form. I've also added a test for a switch statement that lowers
to a bit test. That was my original motivation for looking at this.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124639
Type legalization will want to turn (srl X, Y) into RISCVISD::SRLW,
which will prevent us from using a BEXT instruction.
I don't think there is any precedent for type promotion checking
users to decide how to promote. Instead, I've added this DAG combine to
do it before type legalization.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D124109
Rather than VP_SEXT/VP_ZEXT/VP_TRUNC, having
VP_SIGN_EXTEND/VP_ZERO_EXTEND/VP_TRUNCATE better matches their non-VP
counterparts.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D125298
This patch adds rvv codegen support for vp.fpext. The lowering of fp_round, vp.fptrunc, fp_extend and vp.fpext share most code so use a common lowering function to handle these four.
And this patch changes the intermediate cast from ISD::FP_EXTEND/ISD::FP_ROUND to the RVV VL version op RISCVISD::FP_EXTEND_VL and RISCVISD::FP_ROUND_VL for scalable vectors.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123975
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instructions, so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but we need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materialization and
logic op, but might need a mask on the shift amount and a sext.w.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D124096
We can't shift-right negative numbers to divide them, so avoid emitting
such sequences. Use negative numerators as a proxy for this situation, since
the indices are always non-negative.
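A minimal illustration of why the sequence is avoided (assuming the usual
arithmetic right shift for negative signed values):

  #include <cassert>
  int main() {
    // The shift rounds toward negative infinity, division toward zero,
    // so a right shift is not a correct divide for negative numerators.
    assert((-3 >> 1) == -2);
    assert((-3 / 2) == -1);
  }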
An alternative strategy could be to add a compiler flag to emit division
instructions, which would at least allow us to test the VID sequence
matching itself.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123796
There's an existing generic combine that does this for legal types.
This patch adds a RISCV specific combine for W instructions.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D123983
This patch fixes a bug when lowering BUILD_VECTOR via VID sequences.
After adding support for fractional steps in D106533, elements with zero
steps may be skipped if no step has yet been computed. This allowed
certain sequences to slip through the cracks, being identified as VID
sequences when in fact they are not.
The fix for this is to perform a second loop over the BUILD_VECTOR to
validate the entire sequence once the step has been computed. This isn't
the most efficient, but on balance the code is more readable and
maintainable than doing back-validation during the first loop.
Fixes the tests introduced in D123785.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123786
This patch adds rvv codegen support for vp.fptrunc. The lowering of fp_round and vp.fptrunc share most code so use a common lowering function to handle those two, similar to vp.trunc.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D123841
Materializing constants on RISCV is simpler if the constant is sign
extended from i32. By default i32 constant operands of phis are
zero extended.
This patch adds a hook to allow RISCV to override this for i32. We
have an existing isSExtCheaperThanZExt, but it operates on EVT which
we don't have at these places in the code.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122951
This was added before Zve extensions were defined. I think users
should use Zve32x or Zve32f now. We will lose support for limiting
ELEN to 16 or 8, but I hope no one was using that.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D123418
This patch adds the minimum required to successfully lower vp.icmp via
the new ISD::VP_SETCC node to RVV instructions.
Regular ISD::SETCC goes through a lot of canonicalization, which targets
may rely on and which has not yet been ported to VP_SETCC. It also
supports expansion of individual condition codes and a non-boolean
return type. Support for all of that will follow in later patches.
In the case of RVV this largely isn't a problem as the vector integer
comparison instructions are plentiful enough that it can lower all
VP_SETCC nodes on legal integer vectors except for boolean vectors,
which regular SETCC folds away immediately into logical operations.
Floating-point VP_SETCC operations aren't as well supported in RVV and
the backend relies on condition code expansion, so support for those
operations will come in later patches.
Portions of this code were taken from the VP reference patches.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122743
We can do this conversion by converting to the same-sized integer type, then comparing the result with 0. The conversion is undefined if the converted FP value doesn't fit in an i1.
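A rough scalar model of the idea (hypothetical helper; assumes the value
fits in an i1, since the conversion is undefined otherwise):

  #include <cstdint>
  bool fpToI1(double X) {
    int64_t I = static_cast<int64_t>(X); // convert via the same-sized integer type
    return I != 0;                       // then compare the result with 0
  }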
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D122678
When expanding (uaddo X, 1), we previously expanded the overflow calculation
as (X + 1) <u X. This potentially increases the live range of X and
can prevent X+1 from reusing the register that previously held X.
Since we're adding 1, overflow only occurs if X was UINT_MAX in which
case (X+1) would be 0. So this patch adds a special case to expand
the overflow calculation to (X+1) == 0.
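The two overflow checks are equivalent when the addend is 1, as a quick
sketch shows:

  #include <cstdint>
  // The sum wraps to 0 exactly when X was UINT_MAX; the second form does
  // not keep X live past the add.
  bool overflowOld(uint32_t X) { return X + 1 < X; }  // (X + 1) <u X
  bool overflowNew(uint32_t X) { return X + 1 == 0; } // (X + 1) == 0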
This seems to help with uaddo intrinsics that get introduced by
CodeGenPrepare after LSR. Alternatively, we could block the uaddo
transform in CodeGenPrepare for this case.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D122933
The splat_vector will be legalized to build_vector eventually
anyway. This patch makes it take fewer steps.
Unfortunately, this results in some codegen changes. It looks
like it comes down to how the nodes were ordered in the topological
sort for isel. Because the build_vector is created earlier we end up
with a different ordering of nodes.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D122185
This function now takes a uint64_t instead of an APInt. The caller
is responsible for masking the shift amount, extracting and inserting
into the KnownBits APInts, and inverting to compute zeros.
This is less code and a cleaner division of responsibilities.
Modified DAGCombiner to pass the bit test input and the shift amount
to hasBitTest. This matches the other call to hasBitTest in TargetLowering.h.
This is an alternative to D122454.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D122458
Don't call EltVT.getSizeInBits() or SrcEltVT.getSizeInBits() a second
time. They are already stored in the EltSize and SrcEltSize variables.
Refactor some comparisons to use multiply instead of division.
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.
We already special cased sign extended constants; this extends it
to any sign extended value.
I've only added tests for one case of vadd. Most intrinsics go
through the same check.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D122186
On RV32, we need to type legalize i64 scalar arguments to intrinsics.
We usually do this by splatting the value into a vector separately.
If the scalar happens to be sign extended, we can continue using a .vx
intrinsic.
We already special cased sign extended constants; this extends it
to any sign extended value.
I've only added tests for one case of vadd. Most intrinsics go
through the same check. I can add more tests if we're concerned.
Differential Revision: https://reviews.llvm.org/D122186
RISCVISelDAGToDAG's selectImm uses RISCVTargetLowering::getAddr
(specifically the ConstantPoolSDNode) as of 41454ab256 ("[RISCV] Use
constant pool for large integers"), but nothing explicitly instantiates
any of the templates, the only reason they exist is because of the
various lowering methods in RISCVISelLowering.cpp that themselves use
the methods. However, with inlining, those can end up not existing as
real functions and thus not be exported, leading to link errors. Up
until now this hasn't happened, but for whatever reason D121654 has
triggered this on the sanitizer-ppc64be-linux buildbot, giving:
../../../../lib/libLLVMRISCVCodeGen.a(RISCVISelDAGToDAG.cpp.o): In function `selectImm(llvm::SelectionDAG*, llvm::SDLoc const&, llvm::MVT, long, llvm::RISCVSubtarget const&)':
RISCVISelDAGToDAG.cpp:(.text._ZL9selectImmPN4llvm12SelectionDAGERKNS_5SDLocENS_3MVTElRKNS_14RISCVSubtargetE+0x3d8): undefined reference to `llvm::SDValue llvm::RISCVTargetLowering::getAddr<llvm::ConstantPoolSDNode>(llvm::ConstantPoolSDNode*, llvm::SelectionDAG&, bool) const'
collect2: error: ld returned 1 exit status
Fix this by explicitly instantiating getAddr in its four different forms
so separate translation units can reliably use it.
Fixes: 41454ab256 ("[RISCV] Use constant pool for large integers")
Since we have SPLAT_VECTOR_PARTS these days, I don't think we need
to go through extra lengths to avoid introducing an illegal scalar type.
We can just call getConstant using the scalable vector type and let
it create either a SPLAT_VECTOR or a SPLAT_VECTOR_PARTS.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D121645
Since we mark the pseudos as mayLoad but do not provide any MMOs,
isSafeToMove conservatively returns false, stopping MachineLICM from
hoisting the instructions. PseudoLA_TLS_GD does not actually expand to a
load, so stop marking that as mayLoad to allow it to be hoisted, and for
the others make sure to add MMOs during lowering to indicate they're GOT
loads and thus can be freely moved.
Fixes https://github.com/llvm/llvm-project/issues/54372
Reviewed By: MaskRay, arichardson
Differential Revision: https://reviews.llvm.org/D121654
This code handles fixed vector SPLAT_VECTOR, but is never called in
any tests.
We only form fixed vector splat vectors for vXi64 on RV32 as part
of DAGCombine. This will be type legalized to SPLAT_VECTOR_PARTS.
So the Custom handling for SPLAT_VECTOR is never needed.
This patch makes SPLAT_VECTOR for vXi64 'Legal' on RV32 so that
DAGCombine will create it, but there's no need for a Custom handler.
It will still be type legalized to SPLAT_VECTOR_PARTS.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D121673
If the type is less than XLenVT, type legalization will turn this
into (srl (bitreverse (bswap (srl (bswap X), C))), C). We can't
completely recover from these shifts. They introduce zeros into
the upper bits of the result and we can't easily tell if they are
needed. By doing a DAG combine early, we avoid introducing these
shifts.
Type legalize narrow RISCVISD::GREV/GORC with constant to a larger
type without switching to W. Detect sext_inreg+gorci/grevi with a
uimm5 immediate during isel to emit GREVIW/GORCIW.
This allows us to better propagate known bits information through
extended bits after type legalization. It will also simplify a
change I'm considering for BREV8 with Zbkb.
A future patch will add computeKnownBits support for GORC.
A further improvement here would be to use hasAllWUsers and
doPeepholeSExtW like we do for SLLIW, but I don't think we have
the test coverage for that yet.
We know the shift amount is a constant with bit 31 clear. An anyext
of a constant will be either zext or sext, which will produce the
same result here, but we really shouldn't rely on that. It would
be valid to put a random number in the upper bits. Our isel patterns
expect the upper bits to be 0 so we should ask for it explicitly.
This doesn't appear to be needed any more. I did some inspection
of the gcc torture suite and SPEC2006 with this removed and didn't
find any meaningful changes.
I think we're more aggressive about forming ADDIW now, using
sign_extend_inreg during type legalization and hasAllWUsers in isel,
which probably catches the cases this code helped with before.
Similar to what we do for other loads/stores, use the intrinsic
version that we already have custom isel for.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D121166
vslide1up/down have this flag set, but the value isn't a splat.
Rename for clarity.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D121037
With Zbb, abs is expanded to (max X, neg) by default. If X has 33 or
more sign bits, we can expand it a little early using negw instead of
neg to save a sext_inreg. If X started as a 32 bit value, type
legalization would have inserted a sext before the abs so X having
33 sign bits should always be true.
Note: I've used ISD::FREEZE here since we increase the number of uses.
Our default expansion for ABS doesn't do that, but I think that's a bug.
We can't do this with custom type legalization because ISD::FREEZE
doesn't propagate sign bits, so a later DAG combine won't be able to
see and optimize it.
Alive2: https://alive2.llvm.org/ce/z/Gx3RNe
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D120597
Until Zfinx is supported in CodeGen we need to convert all Zfinx
register classes to GPR.
Remove the zfinx-types.ll test which didn't test anything meaningful
since -mattr=zfinx isn't implemented completely in llc.
Follow up to D93298.
This miscompile was introduced in D119527.
This was a special pattern for rotate+bswap on RV32. It doesn't
work for RV64 since the rotate needs to be half the bitwidth. The
equivalent pattern for RV64 is ROTR ((GREV x, 56), 32) so match
that instead.
This could be generalized further as noted in the new FIXME.
Reviewed By: Chenbing.Zheng
Differential Revision: https://reviews.llvm.org/D120686
This patch added the MC layer support of Zfinx extension.
Authored-by: StephenFan
Co-Authored-by: Shao-Ce Sun
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D93298
This lowers VECTOR_SPLICE of scalable vectors to a slidedown followed by a slideup.
Fixed vectors are encouraged to use the shufflevector instruction. The equivalent patch
for fixed vectors is D119039.
I've used a tail agnostic slidedown and limited the VL to only the
elements that will not be overwritten by the slideup. The slideup
uses VLMax for its VL. It unfortunately uses tail undisturbed policy
but it isn't required as there is no tail. We just need the merge
operand to carry the bits for the lower portion of the result.
Care was taken to ensure that either the slideup or slidedown will
be able to use a .vi instruction when the immediate is small; which
one uses the immediate depends on the sign of the immediate.
Reviewed By: frasercrmck, ABataev
Differential Revision: https://reviews.llvm.org/D119303
Default type legalization will create sext_inreg+abs, but we may
not be able to remove the sext_inreg.
Instead this patch expands abs during type legalization to
Y = sraiw X, 31; (subw (xor X, Y), Y), which doesn't require the input
to be sign extended.
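A scalar model of the expansion (ignoring the usual INT_MIN caveat that
applies to any abs):

  #include <cstdint>
  int32_t expandedAbs(int32_t X) {
    int32_t Y = X >> 31;   // sraiw X, 31
    return (X ^ Y) - Y;    // subw (xor X, Y), Y
  }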
This gives a big improvement for some neg-abs tests where the
abs is used more than the neg. Previously the abs was expanded
a different way before and after type legalization. Now they are
expanded in a similar way enabling more CSE.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D120636
vcpop and vfirst are still useful when VL=0.
vcpop is equivalent to li 0 and vfirst is equivalent to li -1,
since no mask elements are active.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120302
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.
For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D118974
Internally to DAGCombiner the SDValues were passed by non-const
reference despite not being modified. They were then passed by
const reference to TLI.
This patch passes them by value which is consistent with the vast
majority of code.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D120420
See https://github.com/llvm/llvm-project/issues/53831 for a full discussion.
The basic issue is that DAGCombiner::visitMUL and
RISCVISelLowering::transformAddImmMulImm get stuck in a loop, as the
current checks in transformAddImmMulImm aren't sufficient to avoid all
cases where DAGCombiner::isMulAddWithConstProfitable might trigger a
transformation. This patch makes transformAddImmMulImm bail out if C0
(the constant used for multiplication) has more than one use.
Differential Revision: https://reviews.llvm.org/D120332
This generalizes isElementRotate to work when there's only a single
slide needed. I've removed matchShuffleAsSlideDown which is now
redundant.
Reviewed By: frasercrmck, khchen
Differential Revision: https://reviews.llvm.org/D119759
Add the passthru operand for
VMV_V_X_VL, VFMV_V_F_VL and SPLAT_VECTOR_SPLIT_I64_VL also.
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D119688
Part of the shift lowering creates a (sub XLEN-1, ShAmt). When this
value is used we know that ShAmt is [0..XLEN-1]. Since XLEN is a power
of 2 we can replace the sub with an xor. This allows us to use XORI
instead of LI+SUB.
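A quick check of the identity this relies on (using XLEN = 64 here):

  #include <cassert>
  int main() {
    // XLEN-1 is all ones in the low bits, so subtracting a shift amount in
    // [0, XLEN-1] never borrows and is the same as an XOR.
    const unsigned XLEN = 64;
    for (unsigned ShAmt = 0; ShAmt < XLEN; ++ShAmt)
      assert((XLEN - 1) - ShAmt == ((XLEN - 1) ^ ShAmt));
  }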
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D119411
The goal is to support tail and mask policy in RVV builtins.
We focus on IR part first.
If the passthru operand is undef, we use tail agnostic, otherwise
use tail undisturbed.
Add passthru operand for VSLIDE1UP_VL and VSLIDE1DOWN_VL to support
i64 scalar in rv32.
The masked VSLIDE1 will only emit the mask-undisturbed policy, even if a
mask-agnostic policy is given, until InsertVSETVLI supports mask agnostic.
Reviewed By: craig.topper, rogfer01
Differential Revision: https://reviews.llvm.org/D117989
This is a more generic version of D119110 that uses MaskedValueIsZero
to do the matching and SimplifyDemandedBits to remove any unneeded
AND instructions.
Tests were taken from D119110.
Reviewed By: Chenbing.Zheng
Differential Revision: https://reviews.llvm.org/D119622
While matching widening multiply, if we matched an extend from i8->i32,
i16->i64 or i8->i64, we need to reintroduce a narrower extend. If we're
matching a vwmulsu we need to use a sext for op0 and a zext for op1.
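A hypothetical per-element model of the vwmulsu case for an i32->i64
widening, showing the sext/zext pairing the reintroduced narrower extends
must preserve:

  #include <cstdint>
  int64_t vwmulsuElement(int32_t Op0, uint32_t Op1) {
    return static_cast<int64_t>(Op0) *                       // sext op0
           static_cast<int64_t>(static_cast<uint64_t>(Op1)); // zext op1
  }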
This bug exists in LLVM 14 and will need to be backported.
Differential Revision: https://reviews.llvm.org/D119618
Move some combine patterns to DAG combine, and deal with the FIXME
left in RISCVInstrInfoZb.td.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D119527
We can lower a vector splice to a vslidedown and a vslideup.
The majority of the matching code here came from X86's code for matching
PALIGNR and VPALIGND/Q.
The slidedown and slideup lowering don't really require it to be concatenation,
but it happened to be an interesting pattern with existing analysis code I
could use.
This helps with cases where the scalar loop optimizer forwarded a load
result from a previous loop iteration. For example, this happens if the
loop uses x[i] and x[i+1] on the same iteration. The scalar optimizer
will forward the x[i+1] load from the previous iteration to satisfy x[i] on this
iteration. When this gets vectorized, it results in one element of a vector
being forwarded from the previous loop to be concatenated with elements
loaded on this iteration.
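A hypothetical loop of the kind described above:

  // x[i] and x[i+1] are both used on the same iteration, so the scalar
  // optimizer can reuse last iteration's x[i+1] load to satisfy x[i].
  void sumAdjacent(const int *x, int *y, int n) {
    for (int i = 0; i < n; ++i)
      y[i] = x[i] + x[i + 1];
  }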
Whether that's more efficient than doing a shifted load or reloading
the single scalar and using vslide1up is an interesting question.
But that's not something the backend can help with.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D119039
The VLMaxSentinel is represented as TargetConstant, but that's included
in isa<ConstantSDNode>. To keep constant VLs and VLMax separate as long
as possible, use the X0 register during lowering and only convert to
VLMaxSentinel during isel.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118845
We already had an FMA_VL node, but we didn't have masked patterns.
I have not added the fneg variations. I'll do those after I add
llvm.vp.fneg.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D119196
This patch adds an optimization to splat-like operations where the
splatted value is extracted from an identically-sized vector. On RVV we
can splat that via vrgather.vx/vrgather.vi without dropping to scalar
beforehand.
We do have a similar VECTOR_SHUFFLE-specific optimization but that only
works on fixed-length vector types and for those with a constant splat
lane. This patch extends this optimization to make it work on
scalable-vector types and on unknown extract indices.
It is performed during fixed-vector BUILD_VECTOR lowering and during a
new DAGCombine on SPLAT_VECTOR for scalable vectors.
Reviewed By: craig.topper, khchen
Differential Revision: https://reviews.llvm.org/D118456
Add a new ISD opcode to represent the sign extending behavior of
vmv.x.h. Keep the previous anyext opcode to allow the existing
(fmv_x_anyexth (fmv_h_x X)) combine to keep working without needing
to generate a sign extend.
For fmv.x.w we are able to match the sext_inreg in an isel pattern,
but a 16-bit sext_inreg is lowered to a shift pair before isel. This
seemed like a larger match than we should do in isel.
Differential Revision: https://reviews.llvm.org/D118974
Add the vslidedown and interleave patterns that I recently implemented.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118952
This avoids a crash for scalable vectors and/or scalarization for
fixed vectors.
The algorithm is different enough that I don't think it makes sense
to merge with ceil/floor/trunc. The algorithm is adapted from gcc's X86
SSE2 output.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D117247
SPLAT_VECTOR_I64 has the same semantics as RISCVISD::VMV_V_X_VL, it
just assumed VLMax instead of carrying a VL operand.
The include order of RISCVInstrInfoVSDPatterns.td and RISCVInstrInfoVVLPatterns.td
has been swapped to avoid moving riscv_vmv_v_x_vl into
RISCVInstrInfoVSDPatterns.td and to allow moving other "_vl" SDNodes back to
RISCVInstrInfoVVLPatterns.td.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D118841