Weak functions can be replaced by other functions at link time. Previously it
was assumed that no matter what the weak callee function was replaced with it
would still share the same TOC as the caller. This is no longer true as a weak
callee with a TOC setup can be replaced by another function that was compiled
with PC Relative and does not have a TOC at all.
This patch makes sure that all calls to functions defined as weak from a caller
that has a valid TOC have a nop after the call to allow a place for the linker
to restore the TOC.
Reviewed By: NeHuang
Differential Revision: https://reviews.llvm.org/D91983
A simple SELECT is used for converting i1 to floating types on ppc32,
but in constrained cases, the chain is not handled properly. This patch
will fix that.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92365
PowerPC ISA support the input test for vector type v4f32 and v2f64.
Replace the software compare with hw test will improve the perf.
Reviewed By: ChenZheng
Differential Revision: https://reviews.llvm.org/D90914
In lowering of FLT_ROUNDS_, FPSCR content will be moved into FP register
and then GPR, and then truncated into word.
For subtargets without direct move support, it will store and then load.
The load address needs adjustment (+4) only on big-endian targets. This
patch fixes it on using generic opcodes on little-endian and subtargets
with direct-move.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D91845
i1 is the native type for PowerPC if crbits is enabled. However, we need
to promote the i1 to i64 as we didn't have the pattern for i1.
Reviewed By: Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D92067
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D80974
This patch enables passing non variadic vector type parameters on the caller and callee side and vector return on AIX that are passed in vector registers only.
So far, support is enabled for only the AIX extended Altivec ABI Calling convention.
Reviewed By: sfertile, DiggerLin
Differential Revision: https://reviews.llvm.org/D86476
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.
This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering).
Alive2: https://alive2.llvm.org/ce/z/xRk3cD
Differential Revision: https://reviews.llvm.org/D92095
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.
Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang
Differential Revision: https://reviews.llvm.org/D80706
For now, we are using the GPR to pass the arguments/return value for fp128 on Power8,
which is incorrect. It should be VSR. The reason why we do it this way is that,
we are setting the fp128 as illegal which make LLVM try to emulate it with i128 on
Power8. So, we need to correct it as legal.
Reviewed By: Nemanjai
Differential Revision: https://reviews.llvm.org/D91527
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.
Reviewed By: Xiangling_L, DiggerLin
Differential Revision: https://reviews.llvm.org/D89684
When the operand to an (s/u)int_to_fp node is an illegally typed load we
cannot reuse the load address since we can not build a proper dependancy
chain. The legalized loads will use a different chain output then the
illegal load. If we reuse the load address then we will build a
conversion node that uses the chain of the illegal load and operations
which modify the memory address in the other dependancy chain can be
scheduled before the floating point load which feeds the conversion.
Differential Revision: https://reviews.llvm.org/D91265
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.
Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.
Differential Revision: https://reviews.llvm.org/D91846
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
Turn on TLS support for PCRel by default and update the test cases.
Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.
Differential Revision: https://reviews.llvm.org/D80485
In most of lib/Target we know that we are not dealing with scalable
types so it's perfectly fine to replace TypeSize comparison operators
with their fixed width equivalents, making use of getFixedSize()
and so on.
Differential Revision: https://reviews.llvm.org/D89101
This patch adds support for assemble disassemble intrinsics
for MMA.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D88739
Summary: This patch is derived from D87384.
In this patch we expand the existing decomposition of mul-by-constant to be more general by implementing 2 patterns:
```
mul x, (2^N + 2^M) --> (add (shl x, N), (shl x, M))
mul x, (2^N - 2^M) --> (sub (shl x, N), (shl x, M))
```
The conversion will be trigged if the multiplier is a big constant that the target can't use a single multiplication instruction to handle. This is controlled by the hook `decomposeMulByConstant`.
More over, the conversion benefits from an ILP improvement since the instructions are independent. A case with the sequence like following also gets benefit since a shift instruction is saved.
```
*res1 = a * 0x8800;
*res2 = a * 0x8080;
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88201
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.
Differential Revision: https://reviews.llvm.org/D88063
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.
This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node. Adding a check for whether we are in PPC64 before that fixes the issue.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D87046
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.
It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.
This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.
Differential Revision: https://reviews.llvm.org/D84968
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
Differential Revision: https://reviews.llvm.org/D87910
This patch is the initial support for the Local Dynamic Thread Local Storage
model to produce code sequence and relocation correct to the ABI for the model
when using PC relative memory operations.
Differential Revision: https://reviews.llvm.org/D87721
This is a follow-up of D86605. For strict DAG FP node, if its FP
exception behavior metadata is ignore, it should have nofpexcept flag.
But during custom lowering, this flag isn't passed down.
This is also seen on X86 target.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87390
llc would crash for (store (fptosi-f128-i32)) when -mcpu=pwr8, we should
not generate FP_TO_(S|U)INT_IN_VSR for f128 types at this time. This
patch fixes it.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D86686
This patch is the initial support for the Local Exec Thread Local
Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Patch by: Kamau Bridgeman
Differential Revision: https://reviews.llvm.org/D83404
22a0edd0 introduced a config IsStrictFPEnabled, which controls the
strict floating point mutation (transforming some strict-fp operations
into non-strict in ISel). This patch disables the mutation by default
since we've finished PowerPC strict-fp enablement in backend.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87222
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding a SDNodeFlags to SelectionDAG::getSetCC.
Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
Libcall __gcc_qtou is not available, which breaks some tests needing
it. On PowerPC, we have code to manually expand the operation, this
patch applies it to constrained conversion. To keep it strict-safe,
it's using the algorithm similar to expandFP_TO_UINT.
For constrained operations marking FP exception behavior as 'ignore',
we should set the NoFPExcept flag. However, in some custom lowering
the flag is missed. This should be fixed by future patches.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D86605
Quite a while ago, we legalized these nodes as we added custom
handling for reciprocal estimates in the back end. We have since
moved to target-independent combines but neglected to turn off
legalization. As a result, we can now get selection failures on
non-VSX subtargets as evidenced in the listed PR.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47373
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D82502#inline-797941
Current custom lowering of truncate vector handles a source of up to 128 bits, but that only uses one of the two shuffle vector operands. Extend it to use both operands to handle 256 bit sources.
Differential Revision: https://reviews.llvm.org/D68035
This patch makes these operations legal, and add necessary codegen
patterns.
There's still some issue similar to D77033 for conversion from v1i128
type. But normal type tests synced in vector-constrained-fp-intrinsic
are passed successfully.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D83654
This patch adds support for constrained scalar int to fp operations on
PowerPC. Besides, this also fixes the FP exception bit of FCFID*
instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81669
This patch is the initial support for the Intial Exec Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D81947
This patch is the initial support for the General Dynamic Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Patch by: NeHuang
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D82315
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81537
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
SUMMARY:
1. in the patch , remove setting storageclass in function .getXCOFFSection and construct function of class MCSectionXCOFF
there are
XCOFF::StorageMappingClass MappingClass;
XCOFF::SymbolType Type;
XCOFF::StorageClass StorageClass;
in the MCSectionXCOFF class,
these attribute only used in the XCOFFObjectWriter, (asm path do not need the StorageClass)
we need get the value of StorageClass, Type,MappingClass before we invoke the getXCOFFSection every time.
actually , we can get the StorageClass of the MCSectionXCOFF from it's delegated symbol.
2. we also change the oprand of branch instruction from symbol name to qualify symbol name.
for example change
bl .foo
extern .foo
to
bl .foo[PR]
extern .foo[PR]
3. and if there is reference indirect call a function bar.
we also add
extern .bar[PR]
Reviewers: Jason liu, Xiangling Liao
Differential Revision: https://reviews.llvm.org/D84765
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85220
This patch introduces two intrinsics: llvm.ppc.setflm and
llvm.ppc.readflm. They read from or write to FPSCR register
(floating-point status & control) which contains rounding mode and
exception status.
To ensure correctness of program, we need to prevent FP operations from
being moved across these intrinsics (mffs/mtfsf instruction), so here I
set them as scheduling boundaries. We can relax such restriction if
FPSCR is modeled well in the future.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D84914
Introduce a fatal error if any thread local storage code is compiled
using pc relative memory operations as well as a hidden override
option `-enable-ppc-pcrel-tls` so that this support can be incrementally
added if possible.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D85448
The custom lowering saves an instruction over the generic expansion, by
taking advantage of the fact that PowerPC shift instructions are well
defined in the shift-by-bitwidth case.
Differential Revision: https://reviews.llvm.org/D83948
For FP_TO_INT and INT_TO_FP lowering, we have direct-move and
non-direct-move methods. But they share some conversion logic, so we can
reduce redundant code by introducing new methods.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81818
SPE doesn't have a fsel instruction, so don't try to lower to it.
This fixes a "Cannot select: tN: f64 = PPCISD::FSEL tX, tY, tZ" error.
Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D77773
The patterns were incorrect copies from the FPU code, and are
unnecessary, since there's no extended load for SPE. Just let LLVM
itself do the work by marking it expand.
Reviewed By: #powerpc, lkail
Differential Revision: https://reviews.llvm.org/D78670
Summary:
Some instructions have set the wrong [RM] flag, this patch is to fix it.
Instructions x(v|s)r(d|s)pi[zmp]? and fri[npzm] use fixed rounding
directions without referencing current rounding mode.
Also, the SETRNDi, SETRND, BCLRn, MTFSFI, MTFSB0, MTFSB1, MTFSFb,
MTFSFI, MTFSFI_rec, MTFSF, MTFSF_rec should also fix the RM flag.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D81360
Summary:
PPC only supports the instruction selection for v16i8, v8i16, v4i32,
v2i64, v4f32 and v2f64 for ISD::SETCC, don't support the v1i128, so
v1i128 for ISD::SETCC will crash.
This patch is to set v1i128 to expand to avoid crash.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D84238
I mixed up the precedence of operators in the assert and thought I
had it right since there was no compiler warning. This just
adds the parentheses in the expression as needed.
Unfortunately this is another regression from my canonicalization patch
(1fed131660). The patch contained two implicit assumptions:
1. That we would have a permuted load only if we are loading a partial vector
2. That a partial vector load would necessarily be as wide as the splat
However, assumption 2 is not correct since it is possible to do a wider
load and only splat a half of it. This patch corrects this assumption by
simply checking if the load is permuted and adjusting the offset if it is.
This patch aims to implement the low order vector multiply, divide and modulo
instructions available on Power10.
The patch involves legalizing the ISD nodes MUL, UDIV, SDIV, UREM and SREM for
v2i64 and v4i32 vector types in order to utilize the following instructions:
- Vector Multiply Low Doubleword: vmulld
- Vector Modulus Word/Doubleword: vmodsw, vmoduw, vmodsd, vmodud
- Vector Divide Word/Doubleword: vdivsw, vdivsd, vdivuw, vdivud
Differential Revision: https://reviews.llvm.org/D82510
Summary:
AIX assembly's .set directive is not usable for aliasing purpose.
We need to use extra-label-at-defintion strategy to generate symbol
aliasing on AIX.
Reviewed By: DiggerLin, Xiangling_L
Differential Revision: https://reviews.llvm.org/D83252
Current powerpc backend generates wrong code sequence if stack pointer
has to realign if `-fstack-clash-protection` enabled. When probing
dynamic stack allocation, current `PREPARE_PROBED_ALLOCA` takes
`NegSizeReg` as input and returns
`FinalStackPtr`. `FinalStackPtr=StackPtr+ActualNegSize` is calculated
correctly, however code following `PREPARE_PROBED_ALLOCA` still uses
value of `NegSizeReg`, which does not contain `ActualNegSize` if
`MaxAlign > TargetAlign`, to calculate loop trip count and residual
number of bytes.
This patch is part of fix of
https://bugs.llvm.org/show_bug.cgi?id=46759.
Differential Revision: https://reviews.llvm.org/D84152
store (load float*) can be optimized to store(load i32*) in InstCombine pass.
Add store (load float*) to isProfitableToHoist to make sure we don't break
the opt in InstCombine pass.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D82341
SUMMARY:
when we call memset, memcopy,memmove etc(this are llvm intrinsic function) in the c source code. the llvm will generate IR
like call call void @llvm.memset.p0i8.i32(i8* align 4 bitcast (%struct.S* @s to i8*), i8 %1, i32 %2, i1 false)
for c source code
bash> cat test_memset.call
struct S{
int a;
int b;
};
extern struct S s;
void bar() {
memset(&s, s.b, s.b);
}
like
%struct.S = type { i32, i32 }
@s = external global %struct.S, align 4
; Function Attrs: noinline nounwind optnone
define void @bar() #0 {
entry:
%0 = load i32, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4
%1 = trunc i32 %0 to i8
%2 = load i32, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4
call void @llvm.memset.p0i8.i32(i8* align 4 bitcast (%struct.S* @s to i8*), i8 %1, i32 %2, i1 false)
ret void
}
declare void @llvm.memset.p0i8.i32(i8* nocapture writeonly, i8, i32, i1 immarg) #1
If we want to let the aix as assembly compile pass without -u
it need to has following assembly code.
.extern .memset
(we do not output extern linkage for llvm instrinsic function.
even if we output the extern linkage for llvm intrinsic function, we should not out .extern llvm.memset.p0i8.i32,
instead of we should emit .extern memset)
for other llvm buildin function floatdidf . even if we do not call these function floatdidf in the c source code(the generated IR also do not the call __floatdidf . the function call
was generated in the LLVM optimized.
the function is not in the functions list of Module, but we still need to emit extern .__floatdidf
The solution for it as :
We record all the lllvm intrinsic extern symbol when transformCallee(), and emit all these symbol in the AsmPrinter::doFinalization(Module &M)
Reviewers: jasonliu, Sean Fertile, hubert.reinterpretcast,
Differential Revision: https://reviews.llvm.org/D78929
This patch adds support for constrained int/fp conversion between
signed/unsigned i32 and f32/f64.
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D82747
On PPC64, for a variadic function, if va_start is not called, it won't
access any variadic argument on stack, thus we can save stores of
registers used to pass arguments.
Differential Revision: https://reviews.llvm.org/D82361
When legalizing shuffles, we make an attempt to combine it into
a PPC specific canonical form that avoids a need for a swap. If the
combine is successful, we RAUW the node and the custom legalization
replaces the now dead node instead of the one it should replace.
Remove that erroneous call to RAUW.
This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs.
We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if:
- Element size is 4 bytes
- The RHS is a constant vector (and constant splat of 4-bytes)
- The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks:
<0, 4-7, 2, 4-7>
<4-7, 1, 4-7, 3>
Differential Revision: https://reviews.llvm.org/D83245
Summary:
When a desired symbol name contains invalid character that the
system assembler could not process, we need to emit .rename
directive in assembly path in order for that desired symbol name
to appear in the symbol table.
Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L
Differential Revision: https://reviews.llvm.org/D82481
Summary: As Bugzilla-35090 reported, the rationale for using custom lowering SREM/UREM should no longer be true. At the IR level, the div-rem-pairs pass performs the transformation where the remainder is computed from the result of the division when both a required. We should now be able to lower these directly on P9. And the pass also fixed the problem that divide is in a different block than the remainder. This is a patch to remove redundant code and make SREM/UREM legal directly on P9.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D82145
This patch is part of supporting `-fstack-clash-protection`. Mainly do
such things compared to existing `lowerDynamicAlloc`
- Added a new pseudo instruction PPC::PREPARE_PROBED_ALLOC to get
actual frame pointer and final stack pointer.
- Synthesize a loop to probe by blocks.
- Use DYNAREAOFFSET to get MaxCallFrameSize which is calculated in
prologepilog.
Differential Revision: https://reviews.llvm.org/D81358
As of 1fed131660, we have code that
changes shuffle masks so that we can put the shuffle in a canonical
form that can be matched to a single instruction. However, it
does not properly account for undef elements in the BUILD_VECTOR
that is the RHS splat so we can end up with undefs where they
shouldn't be. This patch converts the splat input with undefs to
one without.
The situation where the caller uses a TOC and the callee does not
but is marked as clobbers the TOC (st_other=1) was not being compiled
correctly if both functions where in the same object file.
The call site where we had `callee` was missing a nop after the call.
This is because it was assumed that since the two functions where in
the same DSO they would be sharing a TOC. This is not the case if the
callee uses PC Relative because in that case it may clobber the TOC.
This patch makes sure that we add the cnop correctly so that the
linker has a place to restore the TOC.
Reviewers: sfertile, NeHuang, saghir
Differential Revision: https://reviews.llvm.org/D81126
Commit 1fed131660 assumed that shuffle vector canonicalization will
always ensure that the shuffle mask will be ordered so that element
zero comes from the LHS vector. However there is code out there for
which this is not the case. This patch simply removes that unsafe
assumption and makes the code work regardless of the source of the
first element.
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.
This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
(since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
anyone interested in improving this further can get the info for their code
Differential revision: https://reviews.llvm.org/D77448
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81076
This patch adds handling of constrained FP intrinsics about round,
truncate and extend for PowerPC target, with necessary tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D64193
On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on
v4f32 type. Default pattern for this instruction never works since we
don't have legal fneg for v4f32 when VSX disabled.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80617
On PowerPC, FNMSUB (both VSX and non-VSX version) means -(a*b-c). But
the backend used to generate these instructions regardless whether nsz
flag exists or not. If a*b-c==0, such transformation changes sign of
zero.
This patch introduces PPC specific FNMSUB ISD opcode, which may help
improving combined FMA code sequence.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D76585
These two nodes were added by 69caef2b78 in 2005
and they are not used by PowerPC backend anymore. And the ISD::FMA is a prefer
way for VMADDFP if we really want to create that node. For VNMSUBFP, we will
also add a more generic node FNMSUB in D76585 if we really want it.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D80429
Summary: Exploit vabsd* for for absolute difference of vectors on P9,
for example:
void foo (char *restrict p, char *restrict q, char *restrict t)
{
for (int i = 0; i < 16; i++)
t[i] = abs (p[i] - q[i]);
}
this case should be matched to the HW instruction vabsdub.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80271
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given
Differential Revision: https://reviews.llvm.org/D79537
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
As reported in PR45186, we could be in a situation where we don't
want to handle unaligned memory accesses for FP scalars but still
have VSX (which allows unaligned access for vectors). Change the
default to only apply to scalars.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186
As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an
infinite loop in legalization since we set the legalization action for
ISD::SELECT_CC for all fixed length vector types to Promote. Without some
different legalization action for the type being promoted to, the legalizer
simply loops. Since we don't have patterns to match the node, the right
legalization action should be Expand.
Differential revision: https://reviews.llvm.org/D79854
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.
Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.
This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.
Differential Revision: https://reviews.llvm.org/D78271
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776
Differential revision: https://reviews.llvm.org/D79283
SplitCSR was only suppored for functions with CXX_FAST_TLS calling
convention. Clang only emits that calling convention for Darwin which is
no longer supported by the PowerPC backend. Another IR producer could
use the calling convention, but considering the calling convention is
meant to be an optimization and the codegen for SplitCSR can be
attrocious on Power (see the modifed lit test) it is best to remove it
and codegen CXX_FAST_TLS same as the C calling convention.
Differential Revision: https://reviews.llvm.org/D79018
Legalizer should respect both command-line options or SDNode-level
fast-math flags.
Also, this patch propagates other flags during custom simplifying.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79074
This patch adds strict-fp intrinsics support for fma, fsqrt, fmaxnum and
fminnum on PowerPC.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D72749
Summary:
This patch will set the variable PredictableSelectIsExpensive to do the
select to if based on BranchProbability in CodeGenPrepare.
When the BranchProbability more than MinPercentageForPredictableBranch,
PPC will convert SELECT to branch.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D71883
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.
Removing getAlignment() will be done as a follow up.
Differential Revision: https://reviews.llvm.org/D79436
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.
Differential Revision: https://reviews.llvm.org/D78515
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.
Differential Revision: https://reviews.llvm.org/D77788
1. Use Subtarget.isUsingPCRelativeCalls() in LowerConstantPool to
check if using PCRelative addressing.
2. Change MO_GOT_FLAG = 32 to MO_GOT_FLAG = 8 in PPC.h to use
consecutive bits.
Differential Revision: https://reviews.llvm.org/D78406
Currently an indirect call produces the following sequence on PCRelative mode:
extern void function( );
extern void (*ptrfunc) ( );
void g() {
ptrfunc=function;
}
void f() {
(*ptrfunc) ( );
}
Producing
paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)
Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch is added to remove these redundant save and
restores for indirect calls.
Differential Revision: https://reviews.llvm.org/D77749
Add initial support for PC Relative addressing to get jump table base
address instead of using TOC.
Differential Revision: https://reviews.llvm.org/D75931
This is an optimization that applies to global addresses and
allows for the following transformation:
Convert this:
paddi r3, 0, symbol@PCREL, 1
ld r4, 8(r3)
To this:
pld r4, symbol@PCREL+8(0), 1
An instruction is saved and the linker can do the addition when
the symbol is resolved.
Differential Revision: https://reviews.llvm.org/D76160
Adds support for passing a ByVal formal argument completely on the stack
(ie after all argument registers are exhausted).
Differential Revision: https://reviews.llvm.org/D78263
Add initial support for PC Relative addressing for global values that
require GOT indirect addressing. This patch adds PCRelative support for
global addresses that may not be known at link time and may require
access through the GOT.
Differential Revision: https://reviews.llvm.org/D76064
This patch adds PC Relative support for global values that are known at link
time. If a global value requires access through the global offset table (GOT)
it is not covered in this patch.
Differential Revision: https://reviews.llvm.org/D75280
In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is
set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is
not updated for following loads.
It's related to failed alignment check reported in
https://bugs.llvm.org/show_bug.cgi?id=45297
Differential Revision: https://reviews.llvm.org/D77624
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that use and consume correct type of
va_list (char *va_list) for AIX.
Authored by: ZarkoCA
Reviewers: cebowleratibm, sfertile, jasonliu
Reviewed by: jasonliu
Differential Revision: https://reviews.llvm.org/D76130
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.
Differential Revision: https://reviews.llvm.org/D74486
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (ie: is not split between registers and stack).
Differential Revision: https://reviews.llvm.org/D76902
On PowerPC most functions require a valid TOC pointer.
This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.
This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.
Differential Revision: https://reviews.llvm.org/D73664
Summary:
For current architect, we always require setContainingCsect to be
called on every MCSymbol got used in XCOFF context.
This is very hard to achieve because symbols gets created everywhere
and other MCSymbol types(ELF, COFF) do not have similar rules.
It's very easy to miss setting the containing csect, and we would
need to add a lot of XCOFF specialized code around some common code area.
This patch intendeds to do
1. Rely on getFragment().getParent() to get csect from labels.
2. Only use get/setRepresentedCsect (was get/setContainingCsect)
if symbol itself represents a csect.
Reviewers: DiggerLin, hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D77080
Summary:
In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting
instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generate the
`PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in IR is PPC's
combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch
should resolve PR45297.
Differential Revision: https://reviews.llvm.org/D76773
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0)
if altivec enabled. Now, it is set as custom and expand it later, which is not
the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C)
Reviewed By: Nemanjai
Differential Revision: https://reviews.llvm.org/D76751
On Powerpc fma is faster than fadd + fmul for some types,
(PPCTargetLowering::isFMAFasterThanFMulAndFAdd). we should implement target
hook isProfitableToHoist to prevent simplifyCFGpass from breaking fma
pattern by hoisting fmul to predecessor block.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76207
This is the first of a series of patches that adds caller support for
by-value arguments. This patch add support for arguments that are passed in a
single GPR.
There are 3 limitation cases:
-The by-value argument is larger than a single register.
-There are no remaining GPRs even though the by-value argument would
otherwise fit in a single GPR.
-The by-value argument requires alignment greater than register width.
Future patches will be required to add support for these cases as well
as for the callee handling (in LowerFormalArguments_AIX) that
corresponds to this work.
Differential Revision: https://reviews.llvm.org/D75863
The PPCISD::SExtVElems was added by commit https://reviews.llvm.org/D34009. However,
we have another ISD node ISD::SIGN_EXTEND_INREG that perfectly match the semantics
of SExtVElems. And the DAGCombiner has some combine rules for SIGN_EXTEND_INREG
that produce better code.
Differential Revision: https://reviews.llvm.org/D70771
Summary:
On 32-bit PPC target[AIX and BE], when we convert an `i64` to `f32`, a `setcc` operand expansion is needed. The expansion will set the result type of expanded `setcc` operation based on if the subtarget use CRBits or not. If the subtarget does use the CRBits, like AIX and BE, then it will set the result type to `i1`, leading to an inconsistency with original `setcc` result type[i32].
And the reason why it crashed underneath is because we don't set result type of setcc consistent in those two places.
This patch fixes this problem by setting original setcc opnode result type also with `getSetCCResultType` interface.
Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75702
This is a follow up to the previous patch: [AIX] Implement caller
arguments passed in stack memory.
This corrects a defect in AIX 64-bit where an i32 is written to the
stack with stw (4 bytes) rather than the expected std (8 bytes.) Integer
arguments pass on the stack as images of their register representation.
I also took the opportunity to tidy up some of the calling convention
AIX tests I added in my last commit. This patch adds the missed assembly
expected output for the stack arg int case, which would have caught this
problem.
Differential Revision: https://reviews.llvm.org/D75126
Allow all ExternalSymbolSDNode on AIX, and rely on the linker error to find
symbols which we don't have definitions from any library/compiler-rt.
Differential Revision: https://reviews.llvm.org/D75075
Summary:
The patch D62993 : `[PowerPC] Emit scalar min/max instructions with unsafe fp math`
has modified the functionality when `Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs)`,
this modification is not expected.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D74701
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.
Differential Revision: https://reviews.llvm.org/D75132
Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using current rounding mode.
According to C standard library, rint may raise INEXACT exception while
nearbyint won't.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D72685
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
This patch implements the caller side of placing function call arguments
in stack memory. This removes the current limitation where LLVM on AIX
will report fatal error when arguments can't be contained in registers.
There is a particular oddity that a float argument that passes in a
register and also in stack memory requires that the caller initialize
both. From what AIX "ABI" documentation I have it's not clear that this
needs to be done, however, it is necessary for compatibility with the
AIX XL compiler so I think it's best to implement it the same way.
Note a later patch will follow to address the callee side.
Differential Revision: https://reviews.llvm.org/D73209
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
Collect the calling convention and a number of boolean arguments into a
structure to slightly reduces the number of arguments passed around between
LowerCall_<Subtarget>, FinishCall and a few of the helpers. Also
calulates if a call is indirect once using the exisitng helper and caches the
result replacing several instances where we duplicated the logic determining if
a call is indirect.
These intrinsics and the corresponding ISD nodes were recently added. PPC has
instructions that do this for vectors. Legalize them and add patterns to emit
the satuarting instructions.
Differential revision: https://reviews.llvm.org/D71940
For memcpy/memset/memmove etc., replace ExternalSymbolSDNode with a
MCSymbolSDNode, which have a prefix dot before function name as entry
point symbol.
Differential Revision: https://reviews.llvm.org/D70718
Summary:
This patch pushes the AIX vararg unimplemented error diagnostic later
and allows vararg calls so long as all the arguments can be passed in register.
This patch extends the AIX calling convention implementation to initialize
GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated
for floating point arguments. The GPR(s) are only initialized for vararg calls,
otherwise the callee is expected to retrieve the float argument in the FPR.
f64 in AIX PPC32 requires special handling in order to allocated and
initialize 2 GPRs. This is performed with bitcast, SRL, truncation to
initialize one GPR for the MSW and bitcast, truncations to initialize
the other GPR for the LSW.
A future patch will follow to add support for arguments passed on the stack.
Patch provided by: cebowleratibm
Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D71013
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
We use o suffix to indicate record form instuctions,
(as it is similar to dot '.' in mne?)
This was fine before, as we did not support XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.
It becomes confusing now to still use 'o' for record form,
and it is weird to have something like 'Oo' .
This patch rename all 'o' instructions to use '_rec' instead.
Also rename `isDot` to `isRecordForm`.
Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail
Differential Revision: https://reviews.llvm.org/D70758
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
Commit 0f0330a787 legalized these nodes on PPC without consideration of
unsafe math which means that we get inexact exceptions raised for nearbyint.
Since this doesn't conform to the standard, switch this legalization to depend
on unsafe fp math.
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.
Differential revision: https://reviews.llvm.org/D69949
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
Some CPU's trap to the kernel on unaligned floating point access and there are
kernels that do not handle the interrupt. The program then fails with a SIGBUS
according to the PR. This just switches the default for unaligned access to only
allow it on recent server CPUs that are known to allow this.
Differential revision: https://reviews.llvm.org/D71954
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.
Differential Revision: https://reviews.llvm.org/D70657
Summary:
Currently, we set legalization action of `ISD::ROTL` vectors as
`Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)`
to lower `ISD::ROTL` directly.
Differential Revision: https://reviews.llvm.org/D71324
static void *ifunc(void) __attribute__((ifunc("resolver")));
void foo() { ifunc(); }
The relocation produced by the ifunc() call:
1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000
3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
4. clang -msecure-plt -fPIE => R_PPC_REL24
4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this
relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a
function, both GNU ld and lld (after D71621 fix) may produce a wrong
result.
This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC.
Both GNU ld and lld (after D71621) will be happy.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D71649
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.
Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D70570
Refactor the splatting of a constant to a vector so that common code is used
both for Power9 and Power8.
Patch by: Anil Mahmud
Differential Revision: https://reviews.llvm.org/D71481
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.
Differential revision: https://reviews.llvm.org/D70436
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.
In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:
struct FunctionDescriptor {
void *EntryPoint;
void *TOCAnchor;
void *EnvironmentPointer;
};
An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:
1) Save the caller's TOC pointer into the TOC save slot in the linkage
area, and then load the callee's TOC pointer into the TOC register
(GPR 2 on AIX).
2) Load the function descriptor's entry point into the count register.
3) Load the environment pointer into the environment pointer register
(GPR 11 on AIX).
4) Perform the call by branching on count register.
5) Restore the caller's TOC pointer after returning from the indirect call.
A couple important caveats to the above:
- There is no way to directly load a value from memory into the count register.
Instead we populate the count register by loading the entry point address into
a gpr and then moving the gpr to the count register.
- The TOC restore has to come immediately after the branch on count register
instruction (i.e., the 1st instruction executed after we return from the
call). This is an implementation limitation. We could, in theory, schedule
the restore elsewhere as long as no uses of the TOC pointer fall in between
the call and the restore; however, to keep it simple, we insert a pseudo
instruction that represents both the indirect branch instruction and the
load instruction that restores the caller's TOC from the linkage area. As
they flow through the compiler as a single pseudo instruction, nothing can be
inserted between them and the caller's TOC is then valid at any use.
Differtential Revision: https://reviews.llvm.org/D70724
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.
This patch rename them to be consistent with others.
Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg
Reviewed By: jhibbits
Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70928