This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.
Differential Revision: https://reviews.llvm.org/D98206
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
Starting with Power 10 the instruction paddi is available to use.
The instruction allows for immediates that are 34 bits.
This patch adds exploitation of the paddi instruction to allow us
to materialize constants.
Reviewed By: lei, amyk
Differential Revision: https://reviews.llvm.org/D93300
4c973ae implemented reduction of vector swap for lane-insensitive
operations. This commit fixes it for checking number of uses of the
vector operation.
If we encounter a degenerate select node where both operands are
the same, then we can continue negating the condition while swapping
operands, resulting in an infinite loop. Avoid this by bailing out
if both operands are the same.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49509.
Differential Revision: https://reviews.llvm.org/D98340
This patch simplifies pattern (xxswap (vec-op (xxswap a) (xxswap b)))
into (vec-op a b) if vec-op is lane-insensitive. The motivating case
is ScalarToVector-VecOp-ExtractElement sequence on LE, but the
peephole itself is not related to endianness, so BE may also benefit
from this.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D97658
This pull request implements patterns to exploit the load rightmost vector
element instructions for loading element 0 on little endian PowerPC subtargets
into v8i16 and v16i8 vector registers for i16 and i8 data types.
Differential Revision: https://reviews.llvm.org/D94816#inline-921403
Add support for the TLS general dynamic access model to assembly
files on AIX 64-bit.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D98078
Since P8 is the oldest machine supported by MASSV pass,
_massv place holder is removed and the oldest version of
MASSV functions is assumed. If the P9 vector specific is
detected in the compilation process, the P8 prefix will
be updated to P9.
Differential Revision: https://reviews.llvm.org/D98064
Adds support for the TLS general dynamic access model to
assembly files on AIX 32-bit.
To generate the correct code sequence when accessing a TLS variable
`v`, we first create two TOC entry nodes, one for the variable offset, one
for the region handle. These nodes are followed by a `PPCISD::TLSGD_AIX`
node (new node introduced by this patch).
The `PPCISD::TLSGD_AIX` node (`TLSGDAIX` pseudo instruction) is
expanded to 2 copies (to put the variable offset and region handle in
the right registers) and a call to `__tls_get_addr`.
This patch also changes the way TC entries are generated in asm files.
If the generated TC entry is for the region handle of a TLS variable,
we add the `@m` relocation and the `.` prefix to the entry name.
For example:
```
L..C0:
.tc .v[TC],v[TL]@m -> region handle
L..C1:
.tc v[TC],v[TL] -> variable offset
```
Reviewed By: nemanjai, sfertile
Differential Revision: https://reviews.llvm.org/D97948
This changes the target data layout to make stack align to 16 bytes
on Power10. Before this change, stack was being aligned to 32 bytes.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D96265
Patch adds support for passing vector call operands to variadic
functions. Arguments which are fixed shadow GPRs and stack space even
when they are passed in vector registers, while arguments passed through
ellipses are passed in properly aligned GPRs if available and on the
stack once all GPR arguments registers are consumed.
Differential Revision: https://reviews.llvm.org/D97956
This patch adds support for the default AltiVec ABI for AIX.
Vector registers 20 through 31 are marked as reserved and cannot
be used in the default ABI. This patch adds handling for this case
and also remove the default AltiVec ABI errors.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D96351
Copy-paste P9 insns were added back in 2016,
however, looks like the opcodes has changed in ISA3.1.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D97416
Patch adds support for passing vector arguments to variadic functions.
Arguments which are fixed shadow GPRs and stack space even when they are
passed in vector registers, while arguments passed through ellipses are
passed in(properly aligned GPRs if available and on the stack once all
GPR arguments registers are consumed.
Differential Revision: https://reviews.llvm.org/D97485
This patch allows generating TLS variables in assembly files on AIX.
Initialized and external uninitialized variables are generated with the
.csect pseudo-op and local uninitialized variables are generated with
the .comm/.lcomm pseudo-ops. The patch also adds a check to
explicitly say that TLS is not yet supported on AIX.
Reviewed by: daltenty, jasonliu, lei, nemanjai, sfertile
Originally patched by: bsaleil
Commandeered by: NeHuang
Differential Revision: https://reviews.llvm.org/D96184
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.
Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.
Differential Revision: https://reviews.llvm.org/D97732
This patch provides two major changes:
1. Add getRelocationInfo to check if a constant will have static, dynamic, or
no relocations. (Also rename the original needsRelocation to needsDynamicRelocation.)
2. Only allow a constant with no relocations (static or dynamic) to be placed
in a mergeable section.
This will allow unused symbols that contain static relocations and happen to
fit in mergeable constant sections (.rodata.cstN) to instead be placed in
unique-named sections if -fdata-sections is used and subsequently garbage collected
by --gc-sections.
See https://lists.llvm.org/pipermail/llvm-dev/2021-February/148281.html.
Differential Revision: https://reviews.llvm.org/D95960
This patch generates the vinsw, vinsd, vinsblx, vinshlx, vinswlx, vinsdlx,
vinsbrx, vinshrx, vinswrx and vinsdrx instructions for vector insertion on P10.
Differential Revision: https://reviews.llvm.org/D94454
Added -mrop-protection for Power PC to turn on codegen that provides some
protection from ROP attacks.
The option is off by default and can be turned on for Power 8, Power 9 and
Power 10.
This patch is for the option only. The feature will be implemented by a later
patch.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D96512
We are going to support debug sections for XCOFF. So the csect
properties are not necessary. This patch makes these properties
optional.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D95931
Summary:
Currently Shrinkwrap is not enabled on AIX.
This patch enables shrink wrap on 32 and 64 bit AIX, and 64 bit ELF.
Reviewed By: sfertile, nemanjai
Differential Revision: https://reviews.llvm.org/D95094
Do not defer to the base class when the register constraint is a
physical fpr. The base class will select SPILLTOVSRRC as the register
class and register allocation will fail on subtargets without VSX
registers.
Differential Revision: https://reviews.llvm.org/D91629
We currently represent TOC entries by an MCSymbol. This is not enough in some situations.
For example, when accessing an initialized TLS variable v on AIX using the general dynamic
model, we need to generate the two following entries for v:
.tc .v[TC],v@m
.tc v[TC],v
One is for the region handle (with the @m relocation), the other is for the variable offset.
This refactoring allows storing several entries for the same symbol with different VariantKind
in the TOC. If the VariantKind is not specified, we default to VK_None.
The AIX TLS implementation using this refactoring to generate the two entries will be posted
in a subsequent patch.
Patched By: bsaleil
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D96346
As of commit 284f2bffc9, the DAG Combiner gets rid of the masking of the
input to this node if the mask only keeps the bottom 16 bits. This is because
the underlying library function does not use the high order bits. However, on
PowerPC's ELFv2 ABI, it is the caller that is responsible for clearing the bits
from the register. Therefore, the library implementation of __gnu_h2f_ieee will
return an incorrect result if the bits aren't cleared.
This combine is desired for ARM (and possibly other targets) so this patch adds
a query to Target Lowering to check if this zeroing needs to be kept.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=49092
Differential revision: https://reviews.llvm.org/D96283
NOTE: This patch was originally written by Anil Mahmud. His code has been
rebased but otherwise left mostly unchanged.
A new instructon on Power 10 allows for the materialization of 34 bit
immediate values. This patch allows the compiler to take advantage of
the new instruction in this situation.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D92879
Some cases may be transformed into 32 bit splats before hitting the boolean statement, which may cause incorrect behaviour and provide XXSPLTI32DX with the incorrect values of splat. The condition was reversed so that the shortcut prevents this problem.
Differential Revision: https://reviews.llvm.org/D95634
If the APInt returned by BuildVectorSDNode::isConstantSplat() is narrower than
64 bits, the result produced by XXSPLTI32DX is incorrect. The result returned
by the function appears to be incorrect and we'll investigate/fix it in a
follow-up commit. However, since this causes miscompiles, we must
temporarily disable emitting this instruction for such values.
This intrinsic is supposed to have the permute control vector complemented on
little endian systems (as the ABI specifies and GCC implements). With the
current code gen, the result vector is byte-reversed.
Differential revision: https://reviews.llvm.org/D95004
For now, we correct the result for sqrt if iteration > 0. This doesn't make
sense as they are not strict relative.
Reviewed By: dmgreen, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D94480