This new f16 shuffle under Neon would hit an assert in
GeneratePerfectShuffle as it would try to treat a f16 vector as an i8.
Add f16 handling, treating them like an i16.
Differential Revision: https://reviews.llvm.org/D95446
This adds some simple fp16 scalar_to_vector patterns, preventing a
selection failure if this came up.
Differential Revision: https://reviews.llvm.org/D95427
Similar to the recent patch for fpext, this adds vcvtb and vcvtt with
insert into vector instruction selection patterns for fptruncs. This
helps clear up a lot of register shuffling that we would otherwise do.
Differential Revision: https://reviews.llvm.org/D81637
We current extract and convert from a top lane of a f16 vector using a
VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The
pattern is mostly copied from a vector extract pattern, but produces a
VCVTTHS f32 directly.
This had to move some code around so that ARMInstrVFP had access to the
required pattern frags that were previously part of ARMInstrNEON.
Differential Revision: https://reviews.llvm.org/D81556
This patch adds codegen for the following BFloat
operations to the ARM backend:
* concatenation of bf16 vectors
* bf16 vector element extraction
* bf16 vector element insertion
* duplication of a bf16 value into each lane of a vector
* duplication of a bf16 vector lane into each lane
Differential Revision: https://reviews.llvm.org/D81411
This change adds two FP16 extraction and two insertion patterns
(one per possible vector length).
Extractions are handled by copying a Q/D register into one of VFP2
class registers, where single FP32 sub-registers can be accessed. Then
the extraction of even lanes are simple sub-register extractions
(because we don't care about the top parts of registers for FP16
operations). Odd lanes need an additional VMOVX instruction.
Unfortunately, insertions cannot be handled in the same way, because:
* There is no instruction to insert FP16 into an even lane (VINS only
works with odd lanes)
* The patterns for odd lanes will have a form of a DAG (not a tree),
and will not be implementable in pure tablegen
Because of this insertions are handled in the same way as 16-bit
integer insertions (with conversions between FP registers and GPRs
using VMOVHR instructions).
Without these patterns the ARM backend would sometimes fail during
instruction selection.
This patch also adds patterns which combine:
* an FP16 element extraction and a store into a single VST1
instruction
* an FP16 load and insertion into a single VLD1 instruction
Differential Revision: https://reviews.llvm.org/D62651
llvm-svn: 362482