On PowerPC, VSRpRC represents the pairs of even and odd VSX register,
and VRRC corresponds to higher 32 VSX registers. In some cases, extra
copies are produced when handling incoming VRRC arguments with VSRpRC.
This patch changes allocation order of VSRpRC to eliminate this kind of
copy.
Stack frame sizes may increase if allocating non-volatile registers, and
some other vector copies happen. They need fix in future changes.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D104855
Commit 0464586ac5 added a combine
for a 64-bit load feeding a bswap but the implementation is only
correct for little endian systems.
This fixes it for big endian systems.
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
When targeting CPUs that don't have LDBRX, we end up producing code that is
very inefficient and large for this common idiom. This patch just
optimizes it two 32-bit LWBRX instructions along with a merge.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49610
Differential revision: https://reviews.llvm.org/D104836
Summary:
generate eh_info when vector registers are saved according to the traceback table.
struct eh_info_t {
unsigned version; /* EH info version 0 */
#if defined(64BIT)
char _pad[4]; /* padding */
#endif
unsigned long lsda; /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};
the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)
This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
We have added STXVP/LXVP for spilling and restoring the registers
but we neglected to add FI elimination code for these. The result
is that we end up producing impossible MachineInstr's that have
register operands in place of immediates.
Pointee types are going away soon.
For this, we mostly just care about store/load types, which are already
available without the pointee types. The other intrinsics always use
i8*.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103719
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since these instructions require even-odd register pair.
Reviewed By: nemanjai, jsji, #powerpc
Differential Revision: https://reviews.llvm.org/D103010
GCC documentation for the `wa` constraint states that:
```
wa
A VSX register (VSR), vs0…vs63. This is either an FPR (vs0…vs31 are f0…f31)
or a VR (vs32…vs63 are v0…v31).
```
This technically means that we could accept floating point parameters. In fact,
gcc itself does. The following testcase compiles and runs on all PPC platforms with GCC,
whereas clang/llc will assert:
```
#include <stdio.h>
double foo ( vector double a ) {
double b, c;
asm("xvabsdp %x0, %x2 \n"
"xxsldwi %x1, %x0, %x0, 2 \n"
: "+wa" (b),
"=wa" (c)
: "wa" (a)
);
return b+c;
}
int main(void) {
vector double a = {-3., -4.};
double t = foo( a );
printf("%g\n", t);
}
```
This patch allows clang/llc to build and run this testcase.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D103409
Relaxing superclass constraint for VSX register classes helps reducing
32-byte spills and copies when register pressure is high.
In test case affected, some of them introduces more copies due to new
allocation order. However, this patch should not be the root cause, and
we may be able to fix it in other places of register allocation.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D104006
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103202
When `-fstack-clash-protection` is enabled and stack has to be realigned, some parts of redzone is written prior the probe, so probe might overwrite content already written in redzone. To avoid it, we have to make sure the first probe is at full probe size or is the last probe so that we can skip redzone.
It also fixes violation of ABI under PPC where `r1` isn't updated atomically.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49903.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D100290
According to ELF V2 ABI, `0` should be the dwarf number of `r0`. Currently MMA's register also uses `0` as its dwarf number, this confuses `RegisterInfoEmitter` and generates wrong dwarf -> llvm mapping.
```
extern const MCRegisterInfo::DwarfLLVMRegPair PPCDwarfFlavour1Dwarf2L[] = {
{ 0U, PPC::VSRp31 },
```
This leads to wrong cfi output in https://reviews.llvm.org/D100290.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D103761
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.
Differential Revision: https://reviews.llvm.org/D103759
It's still in use in a few places so we can't delete it yet but there's not
many at this point.
Differential Revision: https://reviews.llvm.org/D103352
The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782
When you try to define a new DEBUG_TYPE in a header file, DEBUG_TYPE
definition defined around the #includes in files include it could
result in redefinition warnings even compile errors.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102594
AIX use `__ssp_canary_word` instead of `__stack_chk_guard`.
This patch update the target hook to use correct symbol,
so that the basic stackprotect feature can work.
The traceback will be handled in follow up patch.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103100
This is the first in a series of patches to provide builtins for
compatibility with the XL compiler. Most of the builtins already had
intrinsics and only needed to be implemented in the front end.
Intrinsics were created for the three iospace builtins, eieio, and icbt.
Pseudo instructions were created for eieio and iospace_eieio to
ensure that nops were inserted before the eieio instruction.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D102443
We are using TOCEntry symbols like `LC..0` in TOC loads,
this is hard to read , at least requiring an additional step to figure
out the loaded symbols.
We should print out the name in comments.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D102949
Partword atomic binaries are not zero extended as they should be.
This patch fixes them to ensure that they are zero extended.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D102819
These patterns are missing even though the underlying instruction
doesn't really care about the type. Added these patterns to resolve
https://bugs.llvm.org/show_bug.cgi?id=50084
This instruction is a nop on all server cores (certainly on all
cores that AIX supports) so it is fine to emit a nop instead of it.
In fact, that is exactly what XL emits. So we emit a nop on AIX
and we leave the codegen as is on other platforms since there may
indeed be cores out there for which this actually does some prefetching.
The default AsmPrinter print GV in comments,
AIX should do so too.
This also fix LLVM :: CodeGen/Generic/inline-asm-mem-clobber.ll.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D102534
Added hashst to the prologue and hashchk to the epilogue.
The hash for the prologue and epilogue must always be stored as the first
element in the local variable space on the stack.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D99377
There are two reasons this shouldn't be restricted to Power8 and up:
1. For XL compatibility
2. Because clang will expand comparison operators to these intrinsics*
*Without this patch, the following causes a selection error:
int test(vector signed long a, vector signed long b) {
return a < b;
}
This patch provides the handling for the intrinsics in the back
end and removes the Power8 guards from the predicate functions
(vec_{all|any}_{eq|ne|gt|ge|lt|le}).
getVectorNumElements() returns a value for scalable vectors
without any warning so it is effectively getVectorMinNumElements().
By renaming it and making getVectorNumElements() forward to
it, we can insert a check for scalable vectors into getVectorNumElements()
similar to EVT. I didn't do that in this patch because there are still more
fixes needed, but I was able to temporarily do it and passed the RISCV
lit tests with these changes.
The changes to isPow2VectorType and getPow2VectorType are copied from EVT.
The change to TypeInfer::EnforceSameNumElts reduces the size of AArch64's isel table.
We're now considering SameNumElts to require the scalable property to match which
removes some unneeded type checks.
This was motivated by the bug I fixed yesterday in 80b9510806
Reviewed By: frasercrmck, sdesmalen
Differential Revision: https://reviews.llvm.org/D102262
When an integer is converted into floating point in subword vector extract,
it can be done in 2 instructions instead of the 3+ instructions it generates
right now. This patch removes the uncessary generation.
Differential: https://reviews.llvm.org/D100604
The stack frame update code does not take into consideration spilling
to registers for callee saved registers. The option -ppc-enable-pe-vector-spills
turns on spilling to registers for callee saved registers and may expose a bug
in the code that moves a stack frame pointer update instruction.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D101366
If spills are to registers instead of to the stack then a copy will be used
and frame index scavenging is not required.
This patch adds debug info to frame index scavenging and makes sure that
spilling to registers does not cause frame index scavenging.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D101360
This is a simple fix on LE. On BE, vector shuffles are categorized into
different ops. We may need more work to eliminate these in
tablegen/pre-isel.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D101605
- Add branch absolute reloction R_RBA, R_TLS relocation for the variable offset
for the tlsgd model and R_TLSM for the region handle for the tlsgd model
- Properly set the relocation fixed values for R_TLS and R_TLSM
- Emit the TCEntry with the variant kind in the XCOFFStreamer
Reviewed by: sfertile, nemanjai, DiggerLin
Differential Revision: https://reviews.llvm.org/D100214
This patch updates the scalar atomic patterns to use the refactored load/store
implementation introduced in D93370.
All existing test cases pass with when the refactored patterns are utilized.
Differential Revision: https://reviews.llvm.org/D94498