This patch adds support for constrained int/fp conversion between
signed/unsigned i32 and f32/f64.
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D82747
On PPC64, for a variadic function, if va_start is not called, it won't
access any variadic argument on stack, thus we can save stores of
registers used to pass arguments.
Differential Revision: https://reviews.llvm.org/D82361
When legalizing shuffles, we make an attempt to combine it into
a PPC specific canonical form that avoids a need for a swap. If the
combine is successful, we RAUW the node and the custom legalization
replaces the now dead node instead of the one it should replace.
Remove that erroneous call to RAUW.
This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs.
We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if:
- Element size is 4 bytes
- The RHS is a constant vector (and constant splat of 4-bytes)
- The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks:
<0, 4-7, 2, 4-7>
<4-7, 1, 4-7, 3>
Differential Revision: https://reviews.llvm.org/D83245
Summary:
When a desired symbol name contains invalid character that the
system assembler could not process, we need to emit .rename
directive in assembly path in order for that desired symbol name
to appear in the symbol table.
Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L
Differential Revision: https://reviews.llvm.org/D82481
Summary: As Bugzilla-35090 reported, the rationale for using custom lowering SREM/UREM should no longer be true. At the IR level, the div-rem-pairs pass performs the transformation where the remainder is computed from the result of the division when both a required. We should now be able to lower these directly on P9. And the pass also fixed the problem that divide is in a different block than the remainder. This is a patch to remove redundant code and make SREM/UREM legal directly on P9.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D82145
This patch is part of supporting `-fstack-clash-protection`. Mainly do
such things compared to existing `lowerDynamicAlloc`
- Added a new pseudo instruction PPC::PREPARE_PROBED_ALLOC to get
actual frame pointer and final stack pointer.
- Synthesize a loop to probe by blocks.
- Use DYNAREAOFFSET to get MaxCallFrameSize which is calculated in
prologepilog.
Differential Revision: https://reviews.llvm.org/D81358
As of 1fed131660, we have code that
changes shuffle masks so that we can put the shuffle in a canonical
form that can be matched to a single instruction. However, it
does not properly account for undef elements in the BUILD_VECTOR
that is the RHS splat so we can end up with undefs where they
shouldn't be. This patch converts the splat input with undefs to
one without.
The situation where the caller uses a TOC and the callee does not
but is marked as clobbers the TOC (st_other=1) was not being compiled
correctly if both functions where in the same object file.
The call site where we had `callee` was missing a nop after the call.
This is because it was assumed that since the two functions where in
the same DSO they would be sharing a TOC. This is not the case if the
callee uses PC Relative because in that case it may clobber the TOC.
This patch makes sure that we add the cnop correctly so that the
linker has a place to restore the TOC.
Reviewers: sfertile, NeHuang, saghir
Differential Revision: https://reviews.llvm.org/D81126
Commit 1fed131660 assumed that shuffle vector canonicalization will
always ensure that the shuffle mask will be ordered so that element
zero comes from the LHS vector. However there is code out there for
which this is not the case. This patch simply removes that unsafe
assumption and makes the code work regardless of the source of the
first element.
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.
This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
(since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
anyone interested in improving this further can get the info for their code
Differential revision: https://reviews.llvm.org/D77448
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81076
This patch adds handling of constrained FP intrinsics about round,
truncate and extend for PowerPC target, with necessary tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D64193
On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on
v4f32 type. Default pattern for this instruction never works since we
don't have legal fneg for v4f32 when VSX disabled.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80617
On PowerPC, FNMSUB (both VSX and non-VSX version) means -(a*b-c). But
the backend used to generate these instructions regardless whether nsz
flag exists or not. If a*b-c==0, such transformation changes sign of
zero.
This patch introduces PPC specific FNMSUB ISD opcode, which may help
improving combined FMA code sequence.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D76585
These two nodes were added by 69caef2b78 in 2005
and they are not used by PowerPC backend anymore. And the ISD::FMA is a prefer
way for VMADDFP if we really want to create that node. For VNMSUBFP, we will
also add a more generic node FNMSUB in D76585 if we really want it.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D80429
Summary: Exploit vabsd* for for absolute difference of vectors on P9,
for example:
void foo (char *restrict p, char *restrict q, char *restrict t)
{
for (int i = 0; i < 16; i++)
t[i] = abs (p[i] - q[i]);
}
this case should be matched to the HW instruction vabsdub.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80271
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given
Differential Revision: https://reviews.llvm.org/D79537
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
As reported in PR45186, we could be in a situation where we don't
want to handle unaligned memory accesses for FP scalars but still
have VSX (which allows unaligned access for vectors). Change the
default to only apply to scalars.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186
As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an
infinite loop in legalization since we set the legalization action for
ISD::SELECT_CC for all fixed length vector types to Promote. Without some
different legalization action for the type being promoted to, the legalizer
simply loops. Since we don't have patterns to match the node, the right
legalization action should be Expand.
Differential revision: https://reviews.llvm.org/D79854
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.
Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.
This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.
Differential Revision: https://reviews.llvm.org/D78271
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776
Differential revision: https://reviews.llvm.org/D79283
SplitCSR was only suppored for functions with CXX_FAST_TLS calling
convention. Clang only emits that calling convention for Darwin which is
no longer supported by the PowerPC backend. Another IR producer could
use the calling convention, but considering the calling convention is
meant to be an optimization and the codegen for SplitCSR can be
attrocious on Power (see the modifed lit test) it is best to remove it
and codegen CXX_FAST_TLS same as the C calling convention.
Differential Revision: https://reviews.llvm.org/D79018
Legalizer should respect both command-line options or SDNode-level
fast-math flags.
Also, this patch propagates other flags during custom simplifying.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79074
This patch adds strict-fp intrinsics support for fma, fsqrt, fmaxnum and
fminnum on PowerPC.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D72749
Summary:
This patch will set the variable PredictableSelectIsExpensive to do the
select to if based on BranchProbability in CodeGenPrepare.
When the BranchProbability more than MinPercentageForPredictableBranch,
PPC will convert SELECT to branch.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D71883
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.
Removing getAlignment() will be done as a follow up.
Differential Revision: https://reviews.llvm.org/D79436
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.
Differential Revision: https://reviews.llvm.org/D78515
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.
Differential Revision: https://reviews.llvm.org/D77788
1. Use Subtarget.isUsingPCRelativeCalls() in LowerConstantPool to
check if using PCRelative addressing.
2. Change MO_GOT_FLAG = 32 to MO_GOT_FLAG = 8 in PPC.h to use
consecutive bits.
Differential Revision: https://reviews.llvm.org/D78406
Currently an indirect call produces the following sequence on PCRelative mode:
extern void function( );
extern void (*ptrfunc) ( );
void g() {
ptrfunc=function;
}
void f() {
(*ptrfunc) ( );
}
Producing
paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)
Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch is added to remove these redundant save and
restores for indirect calls.
Differential Revision: https://reviews.llvm.org/D77749
Add initial support for PC Relative addressing to get jump table base
address instead of using TOC.
Differential Revision: https://reviews.llvm.org/D75931
This is an optimization that applies to global addresses and
allows for the following transformation:
Convert this:
paddi r3, 0, symbol@PCREL, 1
ld r4, 8(r3)
To this:
pld r4, symbol@PCREL+8(0), 1
An instruction is saved and the linker can do the addition when
the symbol is resolved.
Differential Revision: https://reviews.llvm.org/D76160
Adds support for passing a ByVal formal argument completely on the stack
(ie after all argument registers are exhausted).
Differential Revision: https://reviews.llvm.org/D78263
Add initial support for PC Relative addressing for global values that
require GOT indirect addressing. This patch adds PCRelative support for
global addresses that may not be known at link time and may require
access through the GOT.
Differential Revision: https://reviews.llvm.org/D76064
This patch adds PC Relative support for global values that are known at link
time. If a global value requires access through the global offset table (GOT)
it is not covered in this patch.
Differential Revision: https://reviews.llvm.org/D75280
In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is
set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is
not updated for following loads.
It's related to failed alignment check reported in
https://bugs.llvm.org/show_bug.cgi?id=45297
Differential Revision: https://reviews.llvm.org/D77624
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that use and consume correct type of
va_list (char *va_list) for AIX.
Authored by: ZarkoCA
Reviewers: cebowleratibm, sfertile, jasonliu
Reviewed by: jasonliu
Differential Revision: https://reviews.llvm.org/D76130
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.
Differential Revision: https://reviews.llvm.org/D74486
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (ie: is not split between registers and stack).
Differential Revision: https://reviews.llvm.org/D76902
On PowerPC most functions require a valid TOC pointer.
This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.
This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.
Differential Revision: https://reviews.llvm.org/D73664
Summary:
For current architect, we always require setContainingCsect to be
called on every MCSymbol got used in XCOFF context.
This is very hard to achieve because symbols gets created everywhere
and other MCSymbol types(ELF, COFF) do not have similar rules.
It's very easy to miss setting the containing csect, and we would
need to add a lot of XCOFF specialized code around some common code area.
This patch intendeds to do
1. Rely on getFragment().getParent() to get csect from labels.
2. Only use get/setRepresentedCsect (was get/setContainingCsect)
if symbol itself represents a csect.
Reviewers: DiggerLin, hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D77080
Summary:
In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting
instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generate the
`PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in IR is PPC's
combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch
should resolve PR45297.
Differential Revision: https://reviews.llvm.org/D76773
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0)
if altivec enabled. Now, it is set as custom and expand it later, which is not
the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C)
Reviewed By: Nemanjai
Differential Revision: https://reviews.llvm.org/D76751
On Powerpc fma is faster than fadd + fmul for some types,
(PPCTargetLowering::isFMAFasterThanFMulAndFAdd). we should implement target
hook isProfitableToHoist to prevent simplifyCFGpass from breaking fma
pattern by hoisting fmul to predecessor block.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76207
This is the first of a series of patches that adds caller support for
by-value arguments. This patch add support for arguments that are passed in a
single GPR.
There are 3 limitation cases:
-The by-value argument is larger than a single register.
-There are no remaining GPRs even though the by-value argument would
otherwise fit in a single GPR.
-The by-value argument requires alignment greater than register width.
Future patches will be required to add support for these cases as well
as for the callee handling (in LowerFormalArguments_AIX) that
corresponds to this work.
Differential Revision: https://reviews.llvm.org/D75863
The PPCISD::SExtVElems was added by commit https://reviews.llvm.org/D34009. However,
we have another ISD node ISD::SIGN_EXTEND_INREG that perfectly match the semantics
of SExtVElems. And the DAGCombiner has some combine rules for SIGN_EXTEND_INREG
that produce better code.
Differential Revision: https://reviews.llvm.org/D70771
Summary:
On 32-bit PPC target[AIX and BE], when we convert an `i64` to `f32`, a `setcc` operand expansion is needed. The expansion will set the result type of expanded `setcc` operation based on if the subtarget use CRBits or not. If the subtarget does use the CRBits, like AIX and BE, then it will set the result type to `i1`, leading to an inconsistency with original `setcc` result type[i32].
And the reason why it crashed underneath is because we don't set result type of setcc consistent in those two places.
This patch fixes this problem by setting original setcc opnode result type also with `getSetCCResultType` interface.
Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75702
This is a follow up to the previous patch: [AIX] Implement caller
arguments passed in stack memory.
This corrects a defect in AIX 64-bit where an i32 is written to the
stack with stw (4 bytes) rather than the expected std (8 bytes.) Integer
arguments pass on the stack as images of their register representation.
I also took the opportunity to tidy up some of the calling convention
AIX tests I added in my last commit. This patch adds the missed assembly
expected output for the stack arg int case, which would have caught this
problem.
Differential Revision: https://reviews.llvm.org/D75126
Allow all ExternalSymbolSDNode on AIX, and rely on the linker error to find
symbols which we don't have definitions from any library/compiler-rt.
Differential Revision: https://reviews.llvm.org/D75075
Summary:
The patch D62993 : `[PowerPC] Emit scalar min/max instructions with unsafe fp math`
has modified the functionality when `Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs)`,
this modification is not expected.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D74701
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.
Differential Revision: https://reviews.llvm.org/D75132
Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using current rounding mode.
According to C standard library, rint may raise INEXACT exception while
nearbyint won't.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D72685
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
This patch implements the caller side of placing function call arguments
in stack memory. This removes the current limitation where LLVM on AIX
will report fatal error when arguments can't be contained in registers.
There is a particular oddity that a float argument that passes in a
register and also in stack memory requires that the caller initialize
both. From what AIX "ABI" documentation I have it's not clear that this
needs to be done, however, it is necessary for compatibility with the
AIX XL compiler so I think it's best to implement it the same way.
Note a later patch will follow to address the callee side.
Differential Revision: https://reviews.llvm.org/D73209
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
Collect the calling convention and a number of boolean arguments into a
structure to slightly reduces the number of arguments passed around between
LowerCall_<Subtarget>, FinishCall and a few of the helpers. Also
calulates if a call is indirect once using the exisitng helper and caches the
result replacing several instances where we duplicated the logic determining if
a call is indirect.
These intrinsics and the corresponding ISD nodes were recently added. PPC has
instructions that do this for vectors. Legalize them and add patterns to emit
the satuarting instructions.
Differential revision: https://reviews.llvm.org/D71940
For memcpy/memset/memmove etc., replace ExternalSymbolSDNode with a
MCSymbolSDNode, which have a prefix dot before function name as entry
point symbol.
Differential Revision: https://reviews.llvm.org/D70718
Summary:
This patch pushes the AIX vararg unimplemented error diagnostic later
and allows vararg calls so long as all the arguments can be passed in register.
This patch extends the AIX calling convention implementation to initialize
GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated
for floating point arguments. The GPR(s) are only initialized for vararg calls,
otherwise the callee is expected to retrieve the float argument in the FPR.
f64 in AIX PPC32 requires special handling in order to allocated and
initialize 2 GPRs. This is performed with bitcast, SRL, truncation to
initialize one GPR for the MSW and bitcast, truncations to initialize
the other GPR for the LSW.
A future patch will follow to add support for arguments passed on the stack.
Patch provided by: cebowleratibm
Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D71013
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
We use o suffix to indicate record form instuctions,
(as it is similar to dot '.' in mne?)
This was fine before, as we did not support XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.
It becomes confusing now to still use 'o' for record form,
and it is weird to have something like 'Oo' .
This patch rename all 'o' instructions to use '_rec' instead.
Also rename `isDot` to `isRecordForm`.
Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail
Differential Revision: https://reviews.llvm.org/D70758
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
Commit 0f0330a787 legalized these nodes on PPC without consideration of
unsafe math which means that we get inexact exceptions raised for nearbyint.
Since this doesn't conform to the standard, switch this legalization to depend
on unsafe fp math.
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.
Differential revision: https://reviews.llvm.org/D69949
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
Some CPU's trap to the kernel on unaligned floating point access and there are
kernels that do not handle the interrupt. The program then fails with a SIGBUS
according to the PR. This just switches the default for unaligned access to only
allow it on recent server CPUs that are known to allow this.
Differential revision: https://reviews.llvm.org/D71954
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.
Differential Revision: https://reviews.llvm.org/D70657
Summary:
Currently, we set legalization action of `ISD::ROTL` vectors as
`Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)`
to lower `ISD::ROTL` directly.
Differential Revision: https://reviews.llvm.org/D71324
static void *ifunc(void) __attribute__((ifunc("resolver")));
void foo() { ifunc(); }
The relocation produced by the ifunc() call:
1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000
3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
4. clang -msecure-plt -fPIE => R_PPC_REL24
4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this
relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a
function, both GNU ld and lld (after D71621 fix) may produce a wrong
result.
This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC.
Both GNU ld and lld (after D71621) will be happy.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D71649
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.
Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D70570
Refactor the splatting of a constant to a vector so that common code is used
both for Power9 and Power8.
Patch by: Anil Mahmud
Differential Revision: https://reviews.llvm.org/D71481
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.
Differential revision: https://reviews.llvm.org/D70436
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.
In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:
struct FunctionDescriptor {
void *EntryPoint;
void *TOCAnchor;
void *EnvironmentPointer;
};
An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:
1) Save the caller's TOC pointer into the TOC save slot in the linkage
area, and then load the callee's TOC pointer into the TOC register
(GPR 2 on AIX).
2) Load the function descriptor's entry point into the count register.
3) Load the environment pointer into the environment pointer register
(GPR 11 on AIX).
4) Perform the call by branching on count register.
5) Restore the caller's TOC pointer after returning from the indirect call.
A couple important caveats to the above:
- There is no way to directly load a value from memory into the count register.
Instead we populate the count register by loading the entry point address into
a gpr and then moving the gpr to the count register.
- The TOC restore has to come immediately after the branch on count register
instruction (i.e., the 1st instruction executed after we return from the
call). This is an implementation limitation. We could, in theory, schedule
the restore elsewhere as long as no uses of the TOC pointer fall in between
the call and the restore; however, to keep it simple, we insert a pseudo
instruction that represents both the indirect branch instruction and the
load instruction that restores the caller's TOC from the linkage area. As
they flow through the compiler as a single pseudo instruction, nothing can be
inserted between them and the caller's TOC is then valid at any use.
Differtential Revision: https://reviews.llvm.org/D70724
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.
This patch rename them to be consistent with others.
Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg
Reviewed By: jhibbits
Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70928
Refactor FinishCall to be more easily understandable as a precursor to
implementing indirect calls for AIX. The refactor tries to group similar
code together at the cost of some code duplication. The high level
overview of the refactor:
- Adds a number of helper functions for things like:
* Determining if a call is indirect.
* What the Opcode for a call is.
* Transforming the callee for a direct function call.
* Extracting the Chain operand from a CallSeqStart node.
* Building the operands of the call.
- Adds helpers for building the indirect call DAG nodes
(excluding the call instruction itself which is created in
`FinishCall`).
- Removes PrepareCall, which has been subsumed by the
helpers.
- Rename 'InFlag' to 'Glue'.
- FinishCall has been refactored to:
1) Set TOC pointer usage on the DAG for the TOC based
subtargets.
2) Calculate if a call is indirect.
3) Determine the Opcode to use for the call
instruction.
4) Transform the Callee for direct calls, or build
the DAG nodes for indirect calls.
5) Buildup the call operands.
6) Emit the call instruction.
7) If needed, emit the callSeqEnd Node and
finish lowering by calling `LowerCallResult`
Differential Revision: https://reviews.llvm.org/D70126
This patch adds LowerFormalArguments_AIX, support is added for lowering
int, float, and double formal arguments into general purpose and
floating point registers only.
The aix calling convention testcase have been redone to test for caller
and callee functionality in the same lit test.
Patch by Zarko Todorovski!
Differential Revision: https://reviews.llvm.org/D69578
This is a continuation of D70262
The previous patch as listed above added the future CPU in clang. This patch
adds the future CPU in the PowerPC backend. At this point the patch simply
assumes that a future CPU will have the same characteristics as pwr9. Those
characteristics may change with later patches.
Differential Revision: https://reviews.llvm.org/D70333
Summary:
This patch renames the DarwinDirective (used to identify which CPU was defined)
to CPUDirective. It also adds the getCPUDirective() method and replaces all uses
of getDarwinDirective() with getCPUDirective().
Once this patch lands and downstream users of the getDarwinDirective() method
have switched to the getCPUDirective() method, the old getDarwinDirective()
method will be removed.
Reviewers: nemanjai, hfinkel, power-llvm-team, jsji, echristo, #powerpc, jhibbits
Reviewed By: hfinkel, jsji, jhibbits
Subscribers: hiraditya, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70352
If an inline asm statement clobbers a VSX register that overlaps with a
callee-saved Altivec register or FPR, we will not record the clobber and will
therefore violate the ABI. This is clearly a bug so this patch fixes it.
Differential revision: https://reviews.llvm.org/D68576
Summary:
This patch sets up the infrastructure for
1. Associate MCSymbolXCOFF with an MCSectionXCOFF when it could not
get implicitly associated.
2. Generate undefined symbols. The patch itself generates undefined symbol
for external function call only. Generate undefined symbol for external
global variable and external function descriptors will be handled in
separate patch(s) after this is land.
Differential Revision: https://reviews.llvm.org/D70443
Power9 has instructions to implement the semantics of SIGN_EXTEND_INREG for vector type.
Mark it as legal and add the match pattern.
Differential Revision: https://reviews.llvm.org/D69601
This patch lowering jump table, constant pool and block address in assembly.
1. On AIX, jump table index is always relative;
2. Put CPI and JTI into ReadOnlySection until we support unique data sections;
3. Create the temp symbol for block address symbol;
4. Update MIR testcases and add related assembly part;
Differential Revision: https://reviews.llvm.org/D70243
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
This option allows the user to specify the use of absolute jumptables instead
of relative which is the default on most PPC subtargets.
Patch by Kamauu Bridgeman
Differential revision: https://reviews.llvm.org/D69108
VSX provides floating point minimum and maximum instructions that conform
to IEEE semantics. This legalizes the respective nodes and emits VSX code
for them. Furthermore, on Power9 cores we have xsmaxcdp and xsmincdp
instructions that conform to language semantics for the conditional operator
even in the presence of NaNs.
Differential revision: https://reviews.llvm.org/D62993
This patch reworks the AIX call lowering to use CCState. Some defensive errors
are added in this patch to protect from emitting bad code for calling convention
logic that has not been implemented by design. The use of CCState follows the
precedent of other targets and enables the reuse of calling convention logic in
LowerFormalArguments, which will be rewritten to also use CCState in a late
patch.
Patch by Chris Bowler.
Differential Revision: https://reviews.llvm.org/D69101
Replace with the MachineFunction. X86 is the only user, and only uses
it for the function. This removes one obstacle from using this in
GlobalISel. The other is the more tolerable EVT argument.
The X86 use of the function seems questionable to me. It checks hasFP,
before frame lowering.
llvm-svn: 373292
We always(and only) check the NLP flag after calling
classifyGlobalReference to see whether it is accessed
indirectly.
Refactor to code to use isGVIndirectSym instead.
llvm-svn: 372417
We currently produce a load, followed by (possibly a move for integers and) a
splat as separate instructions. VSX has always had a splatting load for
doublewords, but as of Power9, we have it for words as well. This patch just
exploits these instructions.
Differential revision: https://reviews.llvm.org/D63624
llvm-svn: 372139
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
Add the missing piece of r372029.
Somehow when the patch for review D61961 was committed, only the test case
went in and the code didn't. This of course caused all kinds of build bot
breaks.
This patch just adds the code for that patch.
Author: Lei Huang
Differential revision: https://reviews.llvm.org/D61961
llvm-svn: 372043
Summary:
Since the SPE4RC register class contains an identical set of registers
and an identical spill size to the GPRC class its slightly confusing
the tablegen emitter. It's preventing the GPRC_and_GPRC_NOR0 synthesized
register class from inheriting VTs and AltOrders from GPRC or GPRC_NOR0.
This is because SPE4C is found first in the super register class list
when inheriting these properties and it doesn't set the VTs or
AltOrders the same way as GPRC or GPRC_NOR0.
This patch replaces all uses of GPE4RC with GPRC and allows GPRC and
GPRC_NOR0 to contain f32.
The test changes here are because the AltOrders are being inherited
to GPRC_NOR0 now.
Found while trying to determine if getCommonSubClass needs to take
a VT argument. It was originally added to support fp128 on x86-64,
I've changed some things about that so that it might be needed
anymore. But a PowerPC test crashed without it and I think its
due to this subclass issue.
Reviewers: jhibbits, nemanjai, kbarton, hfinkel
Subscribers: wuzish, nemanjai, mehdi_amini, hiraditya, kbarton, MaskRay, dexonsmith, jsji, shchenz, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67513
llvm-svn: 371779
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: nemanjai, javed.absar, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, ychen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67267
llvm-svn: 371212
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67229
llvm-svn: 371200
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
The smin opcode and friends for v1i128 are incorrectly marked as legal for PPC.
Change them to expand.
Differential Revision: https://reviews.llvm.org/D64960
llvm-svn: 369797
A lot of places in the code combine checks for both ABI (SVR4/Darwin/AIX) and
addressing mode (64-bit vs 32-bit). In an attempt to make some of the code more
readable I've added a couple functions that combine checking for the ELF abi and
64-bit/32-bit code at once. As we add more AIX support I intend to add similar
functions for the AIX ABI.
Differential Revision: https://reviews.llvm.org/D65814
llvm-svn: 369658
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
This patch implements global address lowering for 32/64 bit with small/large code models.
1.For 32bit large code model on AIX, there are newly added pseudo opcode LWZtocL & ADDIStocHA32, the support of which on MC layer will be
provided by future patches.
2.The default code model on AIX should be small code model.
3.Since AIX does not have medium code model, "report_fatal_error" when users specify it.
Differential Revision: https://reviews.llvm.org/D63547
llvm-svn: 368744
The legalizer would hit an assertion on PowerPC platform when truncating
a vector whose size is not power of 2. This patch is to add a check to
prevent vectors with such odd-size elements from being custom lowered.
Reviewed By: Hal Finkel
Differential Revision: https://reviews.llvm.org/D65261
llvm-svn: 368654
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, jfb, jakehehrlich
Reviewed By: jfb
Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65514
llvm-svn: 367828
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target.
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.
Differential Revision: https://reviews.llvm.org/D65063
llvm-svn: 367516
In PowerPC, there is instruction to load vector in big endian element order when it's in little endian target.
So we can combine vector load + reverse into big endian load to eliminate the swap instruction.
Also combine vector reverse + store into big endian store.
llvm-svn: 367382
Summary:
Since we are planning to add ADDIStocHA for 32bit in later patch, we decided
to change 64bit one first to follow naming convention with 8 behind opcode.
Patch by: Xiangling_L
Differential Revision: https://reviews.llvm.org/D64814
llvm-svn: 366731
Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset. However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fix into a
5-bit offset, which ranges from 0-31 (indexed in double-words).
The update to the register spill test is taken partially from the test
case shown in D49754.
Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted. This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.
Part of the patch contributed by Kei Thomsen.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54409
llvm-svn: 366318
Summary: llvm/IR/GlobalValue.h can't be included in MC, that creates a circular dependency between MC and IR libraries. This circular dependency is causing an issue for build system that enforce layering.
Author: Xiangling_L
Reviewers: sfertile, jasonliu, hubert.reinterpretcast, gribozavr
Reviewed By: gribozavr
Subscribers: wuzish, nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64445
llvm-svn: 365701
Summary:
"ww" and "ws" are both constraint codes for VSX vector registers that
hold scalar double data. "ww" is preferred for float while "ws" is
preferred for double.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D64119
llvm-svn: 365106
Summary:
This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 | PR42457 ]].
In middle-end, we'd want to prefer the form with two adds - D63992,
but as this diff shows, not every target will prefer that pattern.
Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars,
but only X86 prefer that same pattern for vectors.
Here i'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars.
Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
Reviewed By: efriedma
Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64090
llvm-svn: 365010
This was reported in https://bugs.llvm.org/show_bug.cgi?id=41751
llvm-mc aborted when disassembling tabortdc.
This patch try to clean up TM related DAGs.
* Fixes the problem by remove explicit output of cr0, and put it as implicit def.
* Update int_ppc_tbegin pattern to accommodate the implicit def of cr0.
* Update the TCHECK operand and int_ppc_tcheck accordingly.
* Add some builtin test and disassembly tests.
* Remove unused CRRC0/crrc0
Differential Revision: https://reviews.llvm.org/D61935
llvm-svn: 364544
This was just an omission in the back end. We have had the instructions for both
single and double precision for a few HW generations, but never got around to
legalizing these.
Differential revision: https://reviews.llvm.org/D63634
llvm-svn: 364373
Avoids using a plain unsigned for registers throughoug codegen.
Doesn't attempt to change every register use, just something a little
more than the set needed to build after changing the return type of
MachineOperand::getReg().
llvm-svn: 364191
Summary:
SPE passes doubles the same as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, as it
pushes to the stack and loads from, on either side.
For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:
evstdd 5, X(1)
lwz 3, X(1)
lwz 4, X+4(1)
Likewise, to form a double into r5 from args in r3 and r4:
stw 3, X(1)
stw 4, X+4(1)
evldd 5, X(1)
This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:
mr 4, 5
evmergehi 3, 5, 5
And to form a double into r5 from args in r3 and r4:
evmergelo 5, 3, 4
This is comparable to the way that gcc generates the double splits.
This also fixes a bug with expanding builtins to libcalls, where the
LowerCallTo() code path was generating intermediate illegal type nodes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54583
llvm-svn: 363526
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
the loop will depend on the hotness check and other logic in alignBlocks.
The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
to 32 bytes not only for the hot loop whose size is 16~32 bytes.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D61228
llvm-svn: 363495
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
Patch which introduces a target-independent framework for generating
hardware loops at the IR level. Most of the code has been taken from
PowerPC CTRLoops and PowerPC has been ported over to use this generic
pass. The target dependent parts have been moved into
TargetTransformInfo, via isHardwareLoopProfitable, with
HardwareLoopInfo introduced to transfer information from the backend.
Three generic intrinsics have been introduced:
- void @llvm.set_loop_iterations
Takes as a single operand, the number of iterations to be executed.
- i1 @llvm.loop_decrement(anyint)
Takes the maximum number of elements processed in an iteration of
the loop body and subtracts this from the total count. Returns
false when the loop should exit.
- anyint @llvm.loop_decrement_reg(anyint, anyint)
Takes the number of elements remaining to be processed as well as
the maximum numbe of elements processed in an iteration of the loop
body. Returns the updated number of elements remaining.
llvm-svn: 362774
Use the PPC vector min/max instructions for computing the corresponding
operation as these should be faster than the compare/select sequences
we currently emit.
Differential revision: https://reviews.llvm.org/D47332
llvm-svn: 362759
Summary:
(1) Function descriptor on AIX
On AIX, a called routine may have 2 distinct symbols associated with it:
* A function descriptor (Name)
* A function entry point (.Name)
The descriptor structure on AIX is the same as those in the ELF V1 ABI:
* The address of the entry point of the function.
* The TOC base address for the function.
* The environment pointer.
The descriptor symbol uses the same name as the source level function in C.
The function entry point is analogous to the symbol we would generate for a
function in a non-descriptor-based ABI, except that it is renamed by
prepending a ".".
Which symbol gets referenced depends on the context:
* Taking the address of the function references the descriptor symbol.
* Calling the function references the entry point symbol.
(2) Speaking of implementation on AIX, for direct function call target, we
create proper MCSymbol SDNode(e.g . ".foo") while constructing SDAG to
replace original TargetGlobalAddress SDNode. Then down the path, we can
take advantage of this MCSymbol.
Patch by: Xiangling_L
Reviewed by: sfertile, hubert.reinterpretcast, jasonliu, syzaara
Differential Revision: https://reviews.llvm.org/D62532
llvm-svn: 362735
Summary:
This patch implements SDAG call lowering on AIX for functions
which only have parameters that could fit into GPRs.
Reviewers: hubert.reinterpretcast, syzaara
Differential Revision: https://reviews.llvm.org/D62823
llvm-svn: 362708
Summary:dd
This patch implements call lowering for calls without parameters
on AIX as initial support.
Reviewers: sfertile, hubert.reinterpretcast, aheejin, efriedma
Differential Revision: https://reviews.llvm.org/D61948
llvm-svn: 361669
The single-constant algorithm produces infinities on a lot of denormal values.
The precision of the two-constant algorithm is actually sufficient across the
range of denormals. We will switch to that algorithm for now to avoid the
infinities on denormals. In the future, we will re-evaluate the algorithm to
find the optimal one for PowerPC.
Differential revision: https://reviews.llvm.org/D60037
llvm-svn: 360144
A condition for exiting the legalization of v4i32 conversion to v2f64 through
extract/convert/build erroneously checks for the extract having type i32.
This is not adequate as smaller extracts are actually legalized to i32 as well.
Furthermore, an early exit is missing which means that we only check that
both extracts are from the same vector if that check fails.
As a result, both cases in the included test case fail - the first gets a
select error and the second generates incorrect code.
The culprit commit is r274535.
llvm-svn: 360043
Summary:
Based on the Eli Friedman's comments in https://reviews.llvm.org/D60811 , we'd better return early if the element type is not byte-sized in `combineBVOfConsecutiveLoads`.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61076
llvm-svn: 359764
The MachineFunction wasn't used in getOptimalMemOpType, but more importantly,
this allows reuse of findOptimalMemOpLowering that is calling getOptimalMemOpType.
This is the groundwork for the changes in D59766 and D59787, that allows
implementation of TTI::getMemcpyCost.
Differential Revision: https://reviews.llvm.org/D59785
llvm-svn: 359537
Change the PPCISelLowering.cpp function that decides to avoid update form in
favor of partial vector loads to know about newer load types and to not be
confused by the chain operand.
Differential Revision: https://reviews.llvm.org/D60102
llvm-svn: 359504
Using initial-exec TLS variables is a reasonable performance
optimisation for system libraries. Use the correct PIC mechanism to get
hold of the GOT to avoid text relocations.
Differential Revision: https://reviews.llvm.org/D61026
llvm-svn: 359146
Summary:
This issue from the bugzilla: https://bugs.llvm.org/show_bug.cgi?id=41177
When the two operands for BUILD_VECTOR are same, we will get assert error.
llvm::SDValue combineBVOfConsecutiveLoads(llvm::SDNode*, llvm::SelectionDAG&):
Assertion `!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
"The loads cannot be both consecutive and reverse consecutive."' failed.
This error caused by the wrong ElemSIze when calling isConsecutiveLS(). We
should use `getScalarType().getStoreSize();` to get the ElemSize instread of
`getScalarSizeInBits() / 8`.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D60811
llvm-svn: 358644
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D59405
llvm-svn: 357241
A shift and add/sub sequence combination is faster in place of a multiply by constant.
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.
```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation.
Also data type is considered for different cycles or latency to do multiply.
Differential Revision: https://reviews.llvm.org/D58950
llvm-svn: 357233
These changes are related to PR37743 and include:
SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
Add promoting the integer ABS node in the LegalizeIntegerType.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Add some integer abs testcases for different typesizes for Thumb arch
Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA, Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)
The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).
Change integer abs testcases for codegen with the ABS node support for AArch64.
Indicate that the ABS is legal for the i64 type when the NEON is supported.
Change the integer abs testcases to show changing of codegen.
Add combine and legalization of ABS nodes for Thumb arch.
Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743
Patch by: @ikulagin (Ivan Kulagin)
Differential Revision: https://reviews.llvm.org/D49837
llvm-svn: 356468
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.
Differential Revision: https://reviews.llvm.org/D56507
llvm-svn: 353724
Move the CC analysis implementation to its own .cpp file instead of
duplicating it and artificually using functions in PPCISelLowering.cpp
and PPCFastISel.cpp. Follow-up to the same change done for X86, ARM, and
AArch64.
llvm-svn: 352444
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
For SDAG, we pretend patchpoints aren't special at all until we emit the code for the pseudo.
Then the verifier runs and it seems like we have a use of an undefined register (the register will
be reserved later, but the verifier doesn't know that).
So this patch call setUsesTOCBasePtr before emit the code for the pseudo, so verifier can know
X2 is a reserved register.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D56148
llvm-svn: 350165