We were missing some instruction costs when converting vectors of
floating point half types into integers, so I've added those here.
I also manually generated assembly code for each FP->int case and
looked at the number of instructions generated, which meant
adjusting some of the existing costs too.
I've updated an existing test to reflect the new costs:
Analysis/CostModel/AArch64/sve-fptoi.ll
Differential Revision: https://reviews.llvm.org/D99935
New registers FRM, FFLAGS and FCSR was defined. They represent
corresponding system registers. The new registers are necessary to
properly order floating point instructions in non-default modes.
Differential Revision: https://reviews.llvm.org/D99083
This is just a cleanup of the very high level stuff. I'm sure there is
more to update here but I'll leave that to others and/or a followup.
Differential Revision: https://reviews.llvm.org/D100888
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst a IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.
Differential Revision: https://reviews.llvm.org/D100871
af7925b4dd added a custom DAG combine for recognizing fp-to-ints of
extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u}
instructions. This commit extends the combines to recognize equivalent
extract_subvectors of fp-to-ints as well.
Differential Revision: https://reviews.llvm.org/D100790
In GFX10 VOP3 can have a literal, which opens up the possibility of two
operands using the same literal value, which is allowed and only counts
as one use of the constant bus.
AMDGPUAsmParser::validateConstantBusLimitations already knew about this
but SIInstrInfo::verifyInstruction did not.
Differential Revision: https://reviews.llvm.org/D100770
apple-m1 has the same level of ISA support as apple-a14,
so this is a straightforward mechanical change. However, that
also means this inherits apple-a14's v8.5a+nobti quirkiness.
rdar://68287159
The generic SoftFloatVectorExtract.ll test was failing when run on arm
machines, as it tries to create a f64 under soft float. Limit the
transform to when f64 is legal.
Also add a missing override, as reported in D100244.
Mark MULHS/MULHU nodes as legal for both scalable and fixed SVE types,
and lower them to the appropriate SVE instructions.
Additionally now that the MULH nodes are legal, integer divides can be
expanded into a more performant code sequence.
Differential Revision: https://reviews.llvm.org/D100487
This adds a combine for extract(x, n); extract(x, n+1) ->
VMOVRRD(extract x, n/2). This allows two vector lanes to be moved at the
same time in a single instruction, and thanks to the other VMOVRRD folds
we have added recently can help reduce the amount of executed
instructions. Floating point types are very similar, but will include a
bitcast to an integer type.
This also adds a shouldRewriteCopySrc, to prevent copy propagation from
DPR to SPR, which can break as not all DPR regs can be extracted from
directly. Otherwise the machine verifier is unhappy.
Differential Revision: https://reviews.llvm.org/D100244
Instructions on the transcendental unit are executed in parallel to the
normal VALU, so add this as an extra resource.
This doesn't seem to have any effect, but it should be more correct.
Differential Revision: https://reviews.llvm.org/D100123
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.
Differential revision: https://reviews.llvm.org/D100478
when the predicate used by last{a,b} specifies a known vector length.
For example:
aarch64_sve_lasta(VL1, D) -> extractelement(D, #1)
aarch64_sve_lastb(VL1, D) -> extractelement(D, #0)
Co-authored-by: Paul Walker <paul.walker@arm.com>
Differential Revision: https://reviews.llvm.org/D100476
This patch adds an additional emergency spill slot to RVV code. This is
required as RVV stack offsets may require an additional register to compute.
This patch includes an optimization by @HsiangKai <kai.wang@sifive.com>
to reduce the number of registers required for the computation of stack
offsets from 3 to 2. Otherwise we'd need two additional emergency spill
slots.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D100574
This patch exploits mtvsrdd instruction (available in ISA3.0+) to save
two callee-saved GPR registers into a single VSR, making it more
efficient.
Reviewed By: jsji, nemanjai
Differential Revision: https://reviews.llvm.org/D62565
Don't shrink VOP3 instructions if there are any uses of a carry-out
operand, because the shrunken form of the instruction would write the
carry-out to vcc instead of to a virtual register.
Differential Revision: https://reviews.llvm.org/D100760
This patch is the last one in backend to support fp128 type in
pre-POWER9 subtargets with VSX, removing temporary option and updating
remaining tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92374
This patch adds basic CSKY branch instructions and symbol address series instructions.
Those two kinds of instruction have relationship between each other, and it involves much work about Fixups.
For now, basic instructions are enabled except for disassembler support.
We would support to generate basic codegen asm firstly and delay disassembler work later.
Differential Revision: https://reviews.llvm.org/D95029
This patch adds basic CSKY integer instructions except for branch series such as bsr, br.
It mainly includes basic ALU, load & store, compare and data move instructions.
Branch series instructions need handle complex symbol operand as following patch later.
Differential Revision: https://reviews.llvm.org/D94007
This basic parser will handle basic instructions with register or immediate operands.
With the addition of CSKYInstPrinter, we can now make use of lit tests.
Differential Revision: https://reviews.llvm.org/D93798
It's necessary to calculate correct instruction size because
PseudoVRELOAD and PseudoSPILL will be expanded into multiple
instructions.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100702
M68kDisassembler should put M68kDesc as its direct library dependency
since it uses logics releated to code beads Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem though)
This also includes PC-relative addresses since they are still
referenced as absolute addresses in assembly and converted to
relative addresses by the assembler.
This changes, for example:
- `bra #-2` -> `bra $100`
- `jsr #16` -> `jsr $10`
Differential Revision: https://reviews.llvm.org/D100697
Used to model structural hazards on FP issue, where some
instructions take up 2 issue slots and others one as well
as similar structural hazards on load issue, where some
instructions take up two load lanes and others one.
Differential Revision: https://reviews.llvm.org/D98977
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.
Differential Revision: https://reviews.llvm.org/D100716
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100633
Comparisons to zero or one after cset instructions can be safely
removed in examples like:
cset w9, eq cset w9, eq
cmp w9, #1 ---> <removed>
b.ne .L1 b.ne .L1
cset w9, eq cset w9, eq
cmp w9, #0 ---> <removed>
b.ne .L1 b.eq .L1
Peephole optimization to detect suitable cases and get rid of that
comparisons added.
Differential Revision: https://reviews.llvm.org/D98564
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.
The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.
Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100618
This is a partial port of AArch64TargetLowering::LowerCTPOP.
This custom lowering tries to uses NEON instructions to give a more efficient
CTPOP lowering when possible.
In the non-NEON/noimplicitfloat case, this should use the generic lowering
(see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after
implementing the widening code for s16/s8 though.
Differential Revision: https://reviews.llvm.org/D100399
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D100416
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.
This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.
For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.
Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.
Differential Revision: https://reviews.llvm.org/D100398