An assertion of the following can occur because Altivec and VSX splats use a different operand number for the immediate:
```
int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
```
This patch updates PPCMIPeephole.cpp assign the correct splat immediate.
Differential Revision: https://reviews.llvm.org/D105790
The lowering for v2i64 is now guarded with hasDirectMove,
however, the current lowering can handle the pattern correctly,
only lowering it when there is efficient patterns and corresponding
instructions.
The original guard was added in D21135, and was for Legal action.
The code has evloved now, this guard is not necessary anymore.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D105596
This patch implements trap and FP to and from double conversions. The builtins
generate code that mirror what is generated from the XL compiler. Intrinsics
are named conventionally with builtin_ppc, but are aliased to provide the same
builtin names as the XL compiler.
Differential Revision: https://reviews.llvm.org/D103668
Summary:
The bit order of the has_vec and longtbtable bits in the traceback table generated by the XL compiler flipped at some point after v12.1. This is different from the definition is the AIX header debug.h. The change in the XL compiler that caused the deviation from the OS header definition was unintentional. Since both orderings are extant and the XL compiler runtime also expects the ordering defined by the OS, we will correct the output from LLVM to match the defined ordering given by the OS (which is also consistent with the Assembler Language Reference). Mitigation for traceback tables encoded with the wrong ordering is required for either ordering.
Reviewers: XingXue, HubertTong
Differential Revision: https://reviews.llvm.org/D105487
When the instruction has imm form and fed by LI, we can remove the redundat LI instruction.
Below is an example:
```
renamable $x5 = LI8 2
renamable $x4 = exact SRD killed renamable $x4, killed renamable $r5, implicit $x5
```
will be converted to:
```
renamable $x5 = LI8 2
renamable $x4 = exact RLDICL killed renamable $x4, 62, 2, implicit killed $x5
```
But when we do this optimization, we forget to remove implicit killed $x5
This bug has caused a lnt case error. This patch is to fix above bug.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D85288
SelectionDAG's equivalents in ISD::InputArg/OutputArg track the
original argument index. Mips relies on this, and its currently
reinventing its own parallel CallLowering infrastructure which tracks
these indexes on the side. Add this to help move towards deleting the
custom mips handling.
Lowering for scalar to vector would skip if current subtarget is big
endian and the scalar is larger or equal than 64 bits. However there's
some issue in implementation that SToVRHS may refer to SToVLHS's scalar
size if SToVLHS is present, which leads to some crash.o
Reviewed By: nemanjai, shchenz
Differential Revision: https://reviews.llvm.org/D105094
There are some patterns involving the permuted scalar to vector node
for which we don't have patterns without direct moves on little endian
subtargets. This causes selection errors. While we can of course add
the missing patterns, any additional effort to make this work is not
useful since there is no support for any CPU that can run in
little endian mode and does not support direct moves.
Adding usage of VSSRC and VSFRC when adding the live in registers on AIX.
This matches the behaviour of the rest of PPC Subtargets.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D104396
The combine was disabled in 4e22c7265d as it caused failures in
the ppc64be-multistage (bootstrap) bot.
It turns out that the combine did not correctly update the MMO for
the high load which caused aliased stores to be reported as unaliased.
This patch fixes that problem and re-enables the combine.
This patch implaments the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parody with
xlC on AIX.
Differential revision: https://reviews.llvm.org/D105236
Allocate non-volatile registers in order to be compatible with ABI, regarding gpr_save.
Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf page55,
> The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers
> in descending order, starting with GPR31.
This patch is based on @jsji 's initial draft.
Tested on test-suite and SPEC, found no degradation.
Reviewed By: jsji, ZarkoCA, xingxue
Differential Revision: https://reviews.llvm.org/D100167
This patch changes return type of tryCandidate from void to bool:
1. Methods in some targets already follow this convention.
2. This would help if some target wants to re-use generic code.
3. It looks more intuitive if these try-method returns the same type.
We may need to change return type of them from bool to some enum
further, to make it less confusing.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103951
Summary:
in the patch https://reviews.llvm.org/D103651 [AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table.
when generate eh_info, it switch to other section, when it done, it need to switch back to text section again.
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/105195
On PowerPC, VSRpRC represents the pairs of even and odd VSX register,
and VRRC corresponds to higher 32 VSX registers. In some cases, extra
copies are produced when handling incoming VRRC arguments with VSRpRC.
This patch changes allocation order of VSRpRC to eliminate this kind of
copy.
Stack frame sizes may increase if allocating non-volatile registers, and
some other vector copies happen. They need fix in future changes.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D104855
Commit 0464586ac5 added a combine
for a 64-bit load feeding a bswap but the implementation is only
correct for little endian systems.
This fixes it for big endian systems.
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
When targeting CPUs that don't have LDBRX, we end up producing code that is
very inefficient and large for this common idiom. This patch just
optimizes it two 32-bit LWBRX instructions along with a merge.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49610
Differential revision: https://reviews.llvm.org/D104836
Summary:
generate eh_info when vector registers are saved according to the traceback table.
struct eh_info_t {
unsigned version; /* EH info version 0 */
#if defined(64BIT)
char _pad[4]; /* padding */
#endif
unsigned long lsda; /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};
the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)
This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
We have added STXVP/LXVP for spilling and restoring the registers
but we neglected to add FI elimination code for these. The result
is that we end up producing impossible MachineInstr's that have
register operands in place of immediates.
Pointee types are going away soon.
For this, we mostly just care about store/load types, which are already
available without the pointee types. The other intrinsics always use
i8*.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103719
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since these instructions require even-odd register pair.
Reviewed By: nemanjai, jsji, #powerpc
Differential Revision: https://reviews.llvm.org/D103010
GCC documentation for the `wa` constraint states that:
```
wa
A VSX register (VSR), vs0…vs63. This is either an FPR (vs0…vs31 are f0…f31)
or a VR (vs32…vs63 are v0…v31).
```
This technically means that we could accept floating point parameters. In fact,
gcc itself does. The following testcase compiles and runs on all PPC platforms with GCC,
whereas clang/llc will assert:
```
#include <stdio.h>
double foo ( vector double a ) {
double b, c;
asm("xvabsdp %x0, %x2 \n"
"xxsldwi %x1, %x0, %x0, 2 \n"
: "+wa" (b),
"=wa" (c)
: "wa" (a)
);
return b+c;
}
int main(void) {
vector double a = {-3., -4.};
double t = foo( a );
printf("%g\n", t);
}
```
This patch allows clang/llc to build and run this testcase.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D103409
Relaxing superclass constraint for VSX register classes helps reducing
32-byte spills and copies when register pressure is high.
In test case affected, some of them introduces more copies due to new
allocation order. However, this patch should not be the root cause, and
we may be able to fix it in other places of register allocation.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D104006
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103202
When `-fstack-clash-protection` is enabled and stack has to be realigned, some parts of redzone is written prior the probe, so probe might overwrite content already written in redzone. To avoid it, we have to make sure the first probe is at full probe size or is the last probe so that we can skip redzone.
It also fixes violation of ABI under PPC where `r1` isn't updated atomically.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49903.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D100290
According to ELF V2 ABI, `0` should be the dwarf number of `r0`. Currently MMA's register also uses `0` as its dwarf number, this confuses `RegisterInfoEmitter` and generates wrong dwarf -> llvm mapping.
```
extern const MCRegisterInfo::DwarfLLVMRegPair PPCDwarfFlavour1Dwarf2L[] = {
{ 0U, PPC::VSRp31 },
```
This leads to wrong cfi output in https://reviews.llvm.org/D100290.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D103761
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.
Differential Revision: https://reviews.llvm.org/D103759
It's still in use in a few places so we can't delete it yet but there's not
many at this point.
Differential Revision: https://reviews.llvm.org/D103352
The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782
When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE
definition defined around the #includes in files include it could
result in redefinition warnings even compile errors.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102594
AIX use `__ssp_canary_word` instead of `__stack_chk_guard`.
This patch update the target hook to use correct symbol,
so that the basic stackprotect feature can work.
The traceback will be handled in follow up patch.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103100
This is the first in a series of patches to provide builtins for
compatibility with the XL compiler. Most of the builtins already had
intrinsics and only needed to be implemented in the front end.
Intrinsics were created for the three iospace builtins, eieio, and icbt.
Pseudo instructions were created for eieio and iospace_eieio to
ensure that nops were inserted before the eieio instruction.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D102443
We are using TOCEntry symbols like `LC..0` in TOC loads,
this is hard to read , at least requiring an additional step to figure
out the loaded symbols.
We should print out the name in comments.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D102949
Partword atomic binaries are not zero extended as they should be.
This patch fixes them to ensure that they are zero extended.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D102819
These patterns are missing even though the underlying instruction
doesn't really care about the type. Added these patterns to resolve
https://bugs.llvm.org/show_bug.cgi?id=50084