The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.
Differential revision: https://reviews.llvm.org/D100478
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.
Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.
The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.
All this also applies to the BotHeightReduce heuristic in the bottom
zone.
Differential Revision: https://reviews.llvm.org/D72392
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.
This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
(since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
anyone interested in improving this further can get the info for their code
Differential revision: https://reviews.llvm.org/D77448
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
When switched to the MI scheduler for P9, the hardware is modeled as out of order.
However, inside the MI Scheduler algorithm, we still use the in-order scheduling model
as the MicroOpBufferSize isn't set. The MI scheduler take it as the hw cannot buffer
the op. So, only when all the available instructions issued, the pending instruction
could be scheduled. That is not true for our P9 hw in fact.
This patch is trying to enable the Out-of-Order scheduling model. The buffer size 44 is
picked from the P9 hw spec, and the perf test indicate that, its value won't hurt the cpu2017.
With this patch, there are 3 specs improved over 3% and 1 spec deg over 3%. The detail is as follows:
x264_r: +6.95%
cactuBSSN_r: +6.94%
lbm_r: +4.11%
xz_r: -3.85%
And the GEOMEAN for all the C/C++ spec in spec2017 is about 0.18% improved.
Reviewer: Nemanjai
Differential Revision: https://reviews.llvm.org/D55810
llvm-svn: 350285
This patch aims to improve the codegen for vector loads involving the
scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used
for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X)
to utilize:
LXSD and LXSDX for i64 and f64
LXSIWAX for i32 (sign extension to i64)
LXSIWZX for i32 and f64
Committing on behalf of Amy Kwan.
Differential Revision: https://reviews.llvm.org/D48950
llvm-svn: 339260