This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass.
- Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer).
- Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension).
To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D40554
This patch updates two patterns involving `scalar_to_vector` and
`SCALAR_TO_VECTOR_PERMUTED` nodes to be safe for both 64-bit and 32-bit by
pulling the patterns out of the 64-bit specific guard. These patterns are
matched on POWER8 and above.
Differential Revision: https://reviews.llvm.org/D125389
This patch implements the following floating point negative absolute value
builtins that required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit :
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
Differential Revision: https://reviews.llvm.org/D125506
Currently the regsiter operand definitions are found in three separate files.
This patch moves all of the definitions into PPCRegisterInfo.td.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D123543
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.
This patch simply adds the missed patterns.
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
A big-endian version of vpermxor, named vpermxor_be, is added to LLVM
and Clang. vpermxor_be can be called directly on both the little-endian
and the big-endian platforms.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114540
Currently, the floating point instructions that depend on
rounding mode are correctly marked in the PPC back end with
an implicit use of the RM register. Similarly, instructions
that explicitly define the register are marked with an
implicit def of the same register. So for the most part,
RM-using code won't be moved across RM-setting instructions.
However, calls are not marked as RM-setting instructions so
code can be moved across calls. This is generally desired,
but so is the ability to turn off this behaviour with an
appropriate option - and -frounding-math really should be
that option.
This patch provides a set of call instructions (for direct
and indirect calls) that are marked with an implicit def of
the RM register. These will be used for calls that are marked
with the strictfp attribute.
Differential revision: https://reviews.llvm.org/D111433
Add comments to explain why XXPERMDIs and XXPERMDI have different input register
classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI.
This addresses the comments in abandoned patch D113178, we keep using `f0` instead
of using `vs0` for XXPERMDIs on purpose.
This patch removes the uneccessary mf/mtvsr generated in conjunction
with xscvdpsxws/xscvdpuxws.
Differential revision: https://reviews.llvm.org/D109902
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations
Reviewed By: #powerpc, nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D103986
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.
Differential revision: https://reviews.llvm.org/D105946
Implement a subset of builtins required for compatiblilty with AIX XL compiler.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105930
This patch includes the following updates to the load/store refactoring effort introduced in D93370:
- Update various VSX patterns that use to "force" an XForm, to instead just XForm.
This allows the ability for the patterns to compute the most optimal addressing
mode (and to produce a DForm instruction when possible)
- Update pattern and test case for the LXVD2X/STXVD2X intrinsics
- Update LIT test cases that use to use the XForm instruction to use the DForm instruction
Differential Revision: https://reviews.llvm.org/D95115
The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782
These patterns are missing even though the underlying instruction
doesn't really care about the type. Added these patterns to resolve
https://bugs.llvm.org/show_bug.cgi?id=50084
There are two reasons this shouldn't be restricted to Power8 and up:
1. For XL compatibility
2. Because clang will expand comparison operators to these intrinsics*
*Without this patch, the following causes a selection error:
int test(vector signed long a, vector signed long b) {
return a < b;
}
This patch provides the handling for the intrinsics in the back
end and removes the Power8 guards from the predicate functions
(vec_{all|any}_{eq|ne|gt|ge|lt|le}).
When an integer is converted into floating point in subword vector extract,
it can be done in 2 instructions instead of the 3+ instructions it generates
right now. This patch removes the uncessary generation.
Differential: https://reviews.llvm.org/D100604
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D101605
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.
The primary motivation is that the current implementation of selecting load/stores
is dependent on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and stores
instructions (such as when the immediates that fit within 34-bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating dependency of pattern declaration in TableGen.
The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.
This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.
Differential Revision: https://reviews.llvm.org/D93370
These are added for compatibility with XLC. They are similar to
vec_cts and vec_ctu except that the result is a doubleword vector
regardless of the parameter type.
We currently do not utilize instructions that convert single
precision vectors to doubleword integer vectors. These conversions
come up in code occasionally and this improvement allows us to
open code some functions that need to be added to altivec.h.
Extend shuffle canonicalization and conversion of shuffles fed by vectorized
scalars to big endian subtargets. For big endian subtargets, loads and direct
moves of scalars into vector registers put the data in the correct element for
SCALAR_TO_VECTOR if the data type is 8 bytes wide. However, if the data type is
narrower, the value still ends up in the wrong place - althouth a different
wrong place than on little endian targets.
This patch extends the combine that keeps values where they are if they feed a
shuffle to big endian targets.
Differential revision: https://reviews.llvm.org/D100478
The VSX tablegen file has some rather eggregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
This pull request implements patterns to exploit the load rightmost vector
element instructions for loading element 0 on little endian PowerPC subtargets
into v8i16 and v16i8 vector registers for i16 and i8 data types.
Differential Revision: https://reviews.llvm.org/D94816#inline-921403
This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx,
vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10.
Differential Revision: https://reviews.llvm.org/D94454
This intrinsic is supposed to have the permute control vector complemented on
little endian systems (as the ABI specifies and GCC implements). With the
current code gen, the result vector is byte-reversed.
Differential revision: https://reviews.llvm.org/D95004
A bug in the system assembler can assemble the xxspltd extended
menemonic into the wrong instruction (extracting the wrong element).
Emit the full xxpermdi with all operands to work around the problem.
Differential Revision: https://reviews.llvm.org/D94419
We will emit these permuted nodes on all VSX little endian subtargets
but don't have the patterns available to match them on subtargets
that don't have direct moves.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47916
Some of the pattern matching in PPCInstrVSX.td and node lowering involving vectors assumes 64bit mode. This patch disables some of the unsafe pattern matching and lowering of BUILD_VECTOR in 32bit mode.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D92789
PowerPC ISA support the input test for vector type v4f32 and v2f64.
Replace the software compare with hw test will improve the perf.
Reviewed By: ChenZheng
Differential Revision: https://reviews.llvm.org/D90914
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D80974
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.
Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D80706
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
According to POWER ISA, floating point instructions altering exception
bits in FPSCR should be 'may raise FP exception'. (excluding those
read or write the whole FPSCR directly, like mffs/mtfsf) We need to
model FPSCR well in future patches to handle the special case properly.
Instructions added mayRaiseFPException:
- fre(s)/frsqrte(s)
- fmadd(s)/fmsub(s)/fnmadd(s)/fnmsub(s)
- xscmpoqp/xscmpuqp/xscmpeqdp/xscmpgedp/xscmpgtdp
- xscvdphp/xscvhpdp/xvcvhpsp/xvcvsphp/xsrqpxp
- xsmaxcdp/xsincdp/xsmaxjdp/xsminjdp
Instructions removed mayRaiseFPException:
- xstdivdp/xvtdiv(d|s)p/xstsqrtdp/xvtsqrt(d|s)p
- xsabsdp/xsnabsdp/xvabs(d|s)p/xvnabs(d|s)p
- xsnegdp/xscpsgndp/xvneg(d|s)p/xvcpsgn(d|s)p
- xvcvsxwdp/xvcvuxwdp
- xscvdpspn/xscvspdpn
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D87738
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This patch makes these operations legal, and add necessary codegen
patterns.
There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83654
This patch adds support for constrained scalar int to fp operations on
PowerPC. Besides, this also fixes the FP exception bit of FCFID*
instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81669