A target can report whether a misaligned access is 'fast', as defined
by the target, or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.
A target can still define it however it wants, and the direct translation
of the current code uses 0 and 1 for the current false and true. This
makes the change an NFC.
A subsequent patch will start using an actual speed value in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but no slower than before.
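For illustration only, a minimal sketch of a target override after this
change; `MyTargetLowering` is a hypothetical target, and the hook's shape
follows the description above:

```cpp
bool MyTargetLowering::allowsMisalignedMemoryAccesses(
    EVT VT, unsigned AddrSpace, Align Alignment,
    MachineMemOperand::Flags Flags, unsigned *Fast) const {
  // 'Fast' is now a speed rather than a boolean; 1 is the direct
  // translation of the old 'true'.
  if (Fast)
    *Fast = 1;
  return true; // misaligned accesses of this type are allowed
}
```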
Differential Revision: https://reviews.llvm.org/D124217
Change combineSelectCC to use VEISD::CMPI/CMPU/CMPF/CMPQ and VEISD::CMOV
for better optimization. Also support VEISD::CMPI/CMPU in combineTRUNCATE
to optimize truncate. Remove obsolete lowering patterns from
VEInstrInfo.td. Update regression tests as well.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D136049
Change combineSelect to use VEISD::CMOV for better optimization.
Also support VEISD::CMOV in combineTRUNCATE to optimize truncate.
Merge the functions that handle condition codes into VE.h, and add basic
CMOV patterns to VEInstrInfo.td. Update regression tests as well.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D135878
Support smax/smin in VEInstrInfo.td. Remove obsolete patterns for
smax/smin. Add regression tests for smax/smin/umax/umin.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D134583
Add maxnum and minnum for float and double. Lowering is already
implemented, so this patch marks them legal and adds regression
tests.
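As a rough sketch, assuming the usual `setOperationAction` setup in the
VE lowering constructor:

```cpp
// Lowering already exists, so simply stop expanding these nodes.
for (MVT VT : {MVT::f32, MVT::f64}) {
  setOperationAction(ISD::FMAXNUM, VT, Legal);
  setOperationAction(ISD::FMINNUM, VT, Legal);
}
```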
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D134108
VE has fused multiply-add instructions only for vector calculations. This
patch forces scalar FMA to be expanded into multiply and add instructions.
It also adds a regression test.
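A minimal sketch of the forced expansion, assuming the standard hook:

```cpp
// No scalar FMA instruction on VE: legalization splits it into FMUL+FADD.
setOperationAction(ISD::FMA, MVT::f32, Expand);
setOperationAction(ISD::FMA, MVT::f64, Expand);
```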
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D134107
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. It
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
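A hedged before/after sketch of a call site; `NumBytes` and `Glue` are
illustrative names:

```cpp
// Before: callers built the pointer-sized constants by hand.
Chain = DAG.getCALLSEQ_END(Chain,
                           DAG.getIntPtrConstant(NumBytes, DL, true),
                           DAG.getIntPtrConstant(0, DL, true), Glue, DL);

// After: the overload takes the sizes as plain integers.
Chain = DAG.getCALLSEQ_END(Chain, NumBytes, 0, Glue, DL);
```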
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code. Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.
Mixing these together causes a problem: if target-specific passes are
marking arbitrary blocks "address-taken" and we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.
So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.
Discovered while trying to sort out related stuff on D102817.
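A hedged sketch of how the split reads at use sites; the accessor names
here are assumed from the description:

```cpp
// A BlockAddress must map to one specific machine block:
MBB->setAddressTakenIRBlock(BB);     // IR-level bit, records the block
// Target-specific control flow only needs the machine-level bit:
MBB->setMachineBlockAddressTaken();
// Queries no longer conflate the two:
if (MBB->isIRBlockAddressTaken())
  BB = MBB->getAddressTakenIRBlock();
```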
Differential Revision: https://reviews.llvm.org/D124697
Support load/store of vm registers to memory locations as a first step.
As a next step, support load/store of vm registers to stack locations.
This patch also adds several regression tests, covering not only
load/store of vm registers but also missing load/store of vr registers.
Reviewed By: efocht
Differential Revision: https://reviews.llvm.org/D128610
ISel for experimental.vp.strided.load|store for v256.32 types via
lowering to vvp_load|store SDNodes.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D121616
Add `vvp_load|store` nodes. Lower to `vld`, `vst` where possible. Use
`vgt` for masked loads for now.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D120413
Split v512.32 binary ops into two v256.32 ops using packing support
opcodes (vec_unpack_lo|hi, vec_pack).
Depends on D120053 for packing opcodes.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D120146
Packed vector and mask registers (v512) are composed of two v256
subregisters that occupy the even and odd element positions. We add
packing support SDNodes (vec_unpack_lo|hi and vec_pack) and splitting of
v512i1 mask arithmetic ops with those.
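A hedged sketch of the split itself; opcode names are per the
description, operand order is assumed:

```cpp
// Unpack each packed operand into its even/odd v256 halves, apply the
// v256 op twice, then re-pack the results.
SDValue LoA = DAG.getNode(VEISD::VEC_UNPACK_LO, DL, SplitVT, A, AVL);
SDValue HiA = DAG.getNode(VEISD::VEC_UNPACK_HI, DL, SplitVT, A, AVL);
SDValue LoB = DAG.getNode(VEISD::VEC_UNPACK_LO, DL, SplitVT, B, AVL);
SDValue HiB = DAG.getNode(VEISD::VEC_UNPACK_HI, DL, SplitVT, B, AVL);
SDValue Lo = DAG.getNode(Opcode, DL, SplitVT, LoA, LoB);
SDValue Hi = DAG.getNode(Opcode, DL, SplitVT, HiA, HiB);
SDValue Res = DAG.getNode(VEISD::VEC_PACK, DL, PackedVT, Lo, Hi, AVL);
```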
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D120053
The new LEGALAVL node annotates that the AVL refers to packs of 64 bits.
We use a two-stage lowering approach with LEGALAVL:
First, standard SDNodes are translated into illegal VVP layer nodes.
Regardless of source (VP or standard), all VVP nodes have a mask and AVL
parameter. The AVL parameter refers to the element position (just as in
VP intrinsics).
Second, we legalize the AVL usage in VVP layer nodes. If the element
size is < 64 bits, the EVL parameter has to be adjusted to refer to packs
of 64 bits. We wrap the legalized AVL in a LEGALAVL node to track this.
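A hedged sketch of the second stage for 32-bit elements, where two lanes
share one 64-bit pack, so the AVL is halved with round-up and wrapped:

```cpp
// AVL counts elements; a pack holds two 32-bit lanes: packs = (AVL+1)/2.
SDValue One = DAG.getConstant(1, DL, MVT::i32);
SDValue PackedAVL =
    DAG.getNode(ISD::SRL, DL, MVT::i32,
                DAG.getNode(ISD::ADD, DL, MVT::i32, AVL, One), One);
// Wrap it so later passes know this AVL is already legalized.
AVL = DAG.getNode(VEISD::LEGALAVL, DL, MVT::i32, PackedAVL);
```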
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D118321
Packed-mode broadcast of f32/i32 requires the subregister to be
replicated to the full I64 register first. Add repl_i32 and repl_f32 to
facilitate this.
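Sketched use, with operand shapes assumed:

```cpp
// Duplicate the 32-bit value into both halves of an i64 register so the
// packed broadcast replicates it to every element position.
SDValue Repl = DAG.getNode(VEISD::REPL_I32, DL, MVT::i64, ScalarV);
```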
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D117878
VECustomDAG's functions simplify emitting VE custom ISD nodes. The class
is just a stub now. We add more functions, in particular for the
VP->VVP->VE lowering, to VECustomDAG as we build up vector isel.
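A minimal sketch of the helper's shape; the members shown are
assumptions based on the description:

```cpp
class VECustomDAG {
  SelectionDAG &DAG;
  SDLoc DL;

public:
  VECustomDAG(SelectionDAG &DAG, SDValue WhereOp) : DAG(DAG), DL(WhereOp) {}

  // Callers emit VE custom nodes without repeating the debug location.
  SDValue getNode(unsigned OC, EVT ResVT, ArrayRef<SDValue> OpV) const {
    return DAG.getNode(OC, DL, ResVT, OpV);
  }
};
```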
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D116103
Use `VMRG` for all three operations for now. `vp_select` will be
used in passthru patterns.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D117206
This implements vp_add, vp_and for the VE target by lowering them to the
VVP_* layer. We also add helper functions for VP SDNodes (isVPSDNode,
getVPMaskIdx, getVPExplicitVectorLengthIdx).
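A hedged usage sketch; the helper signatures are assumed here:

```cpp
// Pull the mask and explicit vector length off a VP node before
// translating it to its VVP_* counterpart.
if (ISD::isVPSDNode(Op.getNode())) {
  if (auto MaskIdx = ISD::getVPMaskIdx(Op.getOpcode()))
    Mask = Op.getOperand(*MaskIdx);
  if (auto EVLIdx = ISD::getVPExplicitVectorLengthIdx(Op.getOpcode()))
    AVL = Op.getOperand(*EVLIdx);
}
```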
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D93766
We do this mostly to be able to test the insert_vector_elt isel
patterns. Until we do, most single-element insertions show up as
`BUILD_VECTOR` in the backend.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D93759
Support EH_SJLJ_LONGJMP, EH_SJLJ_SETJMP, and EH_SJLJ_SETUP_DISPATCH
for SjLj exception handling. NC++ uses SjLj exception handling, so we
implement it first. Also add regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D94071
In order to support SjLj exception handling, implement llvm.eh.sjlj.lsda
first. Also add a regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93811
Remove the VA.needsCustom checks, which were copied from the Sparc
implementation at the very beginning of the VE implementation. Also add
an assert to sanity-check the VA.needsCustom flag.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93847
Support atomic exchange and atomic compare-and-exchange instructions.
Change the CAS and TS1AM instructions for ISel patterns, and add a
selectADDRzi pattern for them. Also add a TS1AM pseudo instruction for
better ISel. Add a shouldExpandAtomicRMWInIR() function to expand all
atomicrmw instructions except atomicrmw xchg, and custom-lower i8/i16
atomicrmw xchg. Modify replaceFI to support CAS/TS1AM instructions,
which use "reg+disp" operands instead of "reg+imm+disp" operands.
Also add several regression tests to check correctness.
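A hedged sketch of the hook, condensing the policy described above:

```cpp
TargetLowering::AtomicExpansionKind
VETargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *AI) const {
  // Only atomicrmw xchg is handled natively (via TS1AM/CAS);
  // everything else is expanded to a compare-and-swap loop.
  if (AI->getOperation() == AtomicRMWInst::Xchg)
    return AtomicExpansionKind::None;
  return AtomicExpansionKind::CmpXChg;
}
```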
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D93161
Optimize prologue/epilogue instructions by eliminating FP if a given
function uses the GOT but does not call other functions. Previously, we
had incorrect implementations taken from other architectures. Update
regression tests as well.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92313
VE Vector Predicated (VVP) SDNodes form an intermediate layer between VE
vector instructions and the initial SDNodes.
We introduce 'vvp_add' with isel and tests as the first of these VVP
nodes. VVP nodes have a mask and explicit vector length operand, which
we will make proper use of later.
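A hedged sketch of the translation, with operand order assumed: the
standard node gains a mask and an AVL operand:

```cpp
// Translate a plain vector ADD into the VVP layer: same operands plus
// an all-true mask and the full vector length.
SDValue Mask = DAG.getConstant(1, DL, MaskVT);    // splat of i1 true
SDValue AVL = DAG.getConstant(256, DL, MVT::i32); // whole register
SDValue Res = DAG.getNode(VEISD::VVP_ADD, DL, Op.getValueType(),
                          Op.getOperand(0), Op.getOperand(1), Mask, AVL);
```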
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D91802