When performing dynamic stack allocation, calculation of frame pointer
and actual negsize can be separated. This patch refactors
`lowerDynamicAlloc` in preparation of supporting
`-fstack-clash-protection` which also has to calculate actual frame
pointer and negsize.
Differential Revision: https://reviews.llvm.org/D81354
As of 1fed131660, we have code that
changes shuffle masks so that we can put the shuffle in a canonical
form that can be matched to a single instruction. However, it
does not properly account for undef elements in the BUILD_VECTOR
that is the RHS splat so we can end up with undefs where they
shouldn't be. This patch converts the splat input with undefs to
one without.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.
Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.
Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.
Reviewed By: nickdesaulniers, void
Differential Revision: https://reviews.llvm.org/D79794
The situation where the caller uses a TOC and the callee does not
but is marked as clobbers the TOC (st_other=1) was not being compiled
correctly if both functions where in the same object file.
The call site where we had `callee` was missing a nop after the call.
This is because it was assumed that since the two functions where in
the same DSO they would be sharing a TOC. This is not the case if the
callee uses PC Relative because in that case it may clobber the TOC.
This patch makes sure that we add the cnop correctly so that the
linker has a place to restore the TOC.
Reviewers: sfertile, NeHuang, saghir
Differential Revision: https://reviews.llvm.org/D81126
There are two uses of TM (instance of TargetMachine) when checking options.
These will not work once we enable GlobalISel. This patch replaces those uses of
TM with Subtarget->getTargetMachine().
This patch adds the td definitions and asm/disasm tests for the
following instructions:
XXSPLTIW
XXSPLTIDP
XXSPLTI32DX
XXPERMX
XXBLENDVB
XXBLENDVH
XXBLENDVW
XXBLENDVD
VSLDBI
VSRDBI
Differential Revision: https://reviews.llvm.org/D82896
Commit 1fed131660 assumed that shuffle vector canonicalization will
always ensure that the shuffle mask will be ordered so that element
zero comes from the LHS vector. However there is code out there for
which this is not the case. This patch simply removes that unsafe
assumption and makes the code work regardless of the source of the
first element.
This patch adds LLVM intrinsics for the dcbt (Data Cache Block Touch),
dcbtst (Data Cache Block Touch for Store) and isync (Instruction
Synchronize) instructions.
The intrinsic for dcbt and dcbst in this patch are named llvm.ppc.dcbt.with.hint
and llvm.ppc.dcbtst.with.hint respectively as there already exists an intrinsic
for llvm.ppc.dcbt and llvm.ppc.dcbtst. However, the original variants of the
intrinsics do not accept the TH immediate field, whereas these variants do.
Differential Revision: https://reviews.llvm.org/D79633
Summary:
In preparation for GlobalISel, PPCSubTarget needs to be renamed to Subtarget as there places in GlobalISel that assume the presence of the variable Subtarget.
This patch introduces the variable Subtarget, and replaces all existing uses of PPCSubTarget with Subtarget. A subsequent patch will remove the definiton of
PPCSubTarget, once any downstream users have the opportunity to rename any uses they have.
Reviewers: hfinkel, nemanjai, jhibbits, #powerpc, echristo, lkail
Reviewed By: #powerpc, echristo, lkail
Subscribers: echristo, lkail, wuzish, nemanjai, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81623
This patch implements builtins for the following prototypes:
unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
Differential Revision: https://reviews.llvm.org/D80941
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
Differential Revision: https://reviews.llvm.org/D80368
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:
vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);
Differential Revision: https://reviews.llvm.org/D81774
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```
Differential Revision: https://reviews.llvm.org/D81707
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.
This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
(since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
anyone interested in improving this further can get the info for their code
Differential revision: https://reviews.llvm.org/D77448
This patch implements builtins for the following prototypes:
vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);
Revision Depends on D80758
Differential Revision: https://reviews.llvm.org/D80935
Summary:
For PPC BinaryOperator of fp128 will become libcall, we shouldn't
convert loop to CTR loop if the loop contain libCall.
But currently, in the PPCTTIImpl::mightUseCTR() function, we only deal
with BinaryOperator for ppc_fp128, don't deal with the fp128.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D81353
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81076
Summary:
In the patch D62907 the PPC CTRLoops pass has been replaced by Generic
Hardware Loop pass, and it has imported some new intrinsic for Generic
Hardware Loop.
The old intrinsic used in PPC CTRLoops int_ppc_mtctr and
int_ppc_is_decremented_ctr_nonzero is been replaced by
int_set_loop_iterations and loop_decrement.
This patch is to remove above unused two instrinsic.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D81539
Summary:
L is meant to support the second word used by 32b calling conventions for 64b arguments.
This is required for build 32b PowerPC Linux kernels after upstream
commit 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'")
Thanks for the report from @nathanchance, and reference to GCC's
implementation from @segher.
Fixes: pr/46186
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1044
Reviewers: echristo, hfinkel, MaskRay
Reviewed By: MaskRay
Subscribers: MaskRay, wuzish, nemanjai, hiraditya, kbarton, steven.zhang, llvm-commits, segher, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81767
Change isa<> to a variadic function template, so that it can be used to test against one of multiple types as follows:
isa<Type0, Type1, Type2>(Val)
Differential Revision: https://reviews.llvm.org/D81045
We should not be adding the relocation addend to the instruction encoding.
This patch removes that and sets those bits to zero.
Differential Revision: https://reviews.llvm.org/D81082
Have BasicTTI call the base implementation so that both agree on the
default behaviour, which the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.
The cost model test changes now reflect that ret instructions are not
generally free.
Differential Revision: https://reviews.llvm.org/D79164
This patch tries to reassociate two patterns related to FMA to expose
more ILP on PowerPC.
// Pattern 1:
// A = FADD X, Y (Leaf)
// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)
// -->
// A = FMA X, M21, M22
// B = FMA Y, M31, M32
// C = FADD A, B
// Pattern 2:
// A = FMA X, M11, M12 (Leaf)
// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)
// -->
// A = FMUL M11, M12
// B = FMA X, M21, M22
// D = FMA A, M31, M32
// C = FADD B, D
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D80175
Summary:
We have defined MTSPR/MFSPR and MTSPR8/MFSPR8, but we only defined
mtspr/mfspr InstAlias for some MTSPR/MFSPR.
This patch is to add the InstAlias definitions for MTSPR8/MFSPR8,
and add the some new mtspr/mfspr InstAlias we may use.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77531
This patch adds handling of constrained FP intrinsics about round,
truncate and extend for PowerPC target, with necessary tests.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D64193
On PowerPC, we have vnmsubfp Altivec instruction for fnmsub operation on
v4f32 type. Default pattern for this instruction never works since we
don't have legal fneg for v4f32 when VSX disabled.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80617
Here, I am proposing to add an special case for massv powf4/powd2 function (SIMD counterpart of powf/pow function in MASSV library) in MASSV pass to get later optimizations like conversion from pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's in the DAGCombiner in vector float case. My reason for doing this is: the optimized pow(x,0.75) and pow(x,0.25) for double and single precision to sequence of sqrt's is faster than powf4/powd2 on P8 and P9.
In case MASSV functions is called, and if the exponent of pow is 0.75 or 0.25, we will get the sequence of sqrt's and if exponent is not 0.75 or 0.25 we will get the appropriate MASSV function.
Reviewed By: steven.zhang
Tags: #LLVM #PowerPC
Differential Revision: https://reviews.llvm.org/D80744
This is a NFC patch to make convertToImmediateForm a light wrapper
for converting xform and imm form instructions on PowerPC.
Reviewed By: Steven.zhang
Differential Revision: https://reviews.llvm.org/D80907
SUMMARY:
Since we deal with aix emitLinkage in the PPCAIXAsmPrinter::emitLinkage() in the patch https://reviews.llvm.org/D75866. It do not go to AsmPrinter::emitLinkage() any more, we clean up some aix related code in the AsmPrinter::emitLinkage()
Reviewers: Jason liu
Differential Revision: https://reviews.llvm.org/D81613
Add the remaining arithmetic opcodes into the generic implementation
of getUserCost and then call this from getInstructionThroughput. Most
of the backends have been modified to return the base implementation
for cost kinds other RecipThroughput. The outlier here is AMDGPU
which already uses getArithmeticInstrCost for all the cost kinds.
This change means that most of the opcodes can be removed from that
backends implementation of getUserCost.
Differential Revision: https://reviews.llvm.org/D80992
SUMMARY:
in the aix assembly , it do not have .hidden and .protected directive.
in current llvm. if a function or a variable which has visibility attribute, it will generate something like the .hidden or .protected , it can not recognize by aix as.
in aix assembly, the visibility attribute are support in the pseudo-op like
.extern Name [ , Visibility ]
.globl Name [, Visibility ]
.weak Name [, Visibility ]
in this patch, we implement the visibility attribute for the global variable, function or extern function .
for example.
extern __attribute__ ((visibility ("hidden"))) int
bar(int* ip);
__attribute__ ((visibility ("hidden"))) int b = 0;
__attribute__ ((visibility ("hidden"))) int
foo(int* ip){
return (*ip)++;
}
the visibility of .comm linkage do not support , we will have a separate patch for it.
we have the unsupported cases ("default" and "internal") , we will implement them in a a separate patch for it.
Reviewers: Jason Liu ,hubert.reinterpretcast,James Henderson
Differential Revision: https://reviews.llvm.org/D75866
Add cases for icmp, fcmp and select into the switch statement of the
generic getUserCost implementation with getInstructionThroughput then
calling into it. The BasicTTI and backend implementations have be set
to return a default value (1) when a cost other than throughput is
being queried.
Differential Revision: https://reviews.llvm.org/D80550
Summary:
We have handle the InstAlias for OR instructions, but we handle it
agagin in PPCInstPrinter.cpp.
This patch is to Remove the redundant InstAlias for OR instruction.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80502
The previous implementation used "asm parser only" pseudo-instructions in their
output patterns. Those are not meant to emit code and will caused crashes when
built with -filetype=obj.
Differential Revision: https://reviews.llvm.org/D80151
The function emitRLDICWhenLoweringJumpTables in PPCMIPeephole.cpp
was supposed to convert a pair of RLDICL and RLDICR to a single RLDIC,
but it was leaving out the RLDICL instruction. This PR fixes the bug.
Differential Revision: https://reviews.llvm.org/D78063
Fix the incorrect PC Relative relocations for Big Endian for 34 bit offsets.
The offset should be zero for both BE and LE in this situation.
Differential Revision: https://reviews.llvm.org/D81033
Attempt to handle unsupported types, such as structs, in
getMemoryOpCost. The backend now checks for a supported type and
calls into BasicTTI as a fallback. BasicTTI will now also perform
the same check and will default to an expensive cost of 4 for 'Other'
MVTs.
Differential Revision: https://reviews.llvm.org/D80984
After pseudo-expansion, we may end up with ADDI (add immediate)
instructions where the operand is not an immediate but a relocation.
For such instructions, attempts to get the immediate result in
assertion failures for obvious reasons.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45432
The instruction addi is usually used to post increase the loop indvar, which looks like this:
label_X:
load x, base(i)
...
y = op x
...
i = addi i, 1
goto label_X
However, for PowerPC, if there are too many vsx instructions that between y = op x and i = addi i, 1,
it will use all the hw resource that block the execution of i = addi, i, 1, which result in the stall
of the load instruction in next iteration. So, a heuristic is added to move the addi as early as possible
to have the load hide the latency of vsx instructions, if other heuristic didn't apply to avoid the starve.
Reviewed By: jji
Differential Revision: https://reviews.llvm.org/D80269
Calls that are marked as @notoc do not require the extra nop after the call
for the TOC restore.
Differential Revision: https://reviews.llvm.org/D81081
Use getMemoryOpCost from the generic implementation of getUserCost
and have getInstructionThroughput return the result of that for loads
and stores.
This also means that the X86 implementation of getUserCost can be
removed with the functionality folded into its getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D80984
On PowerPC, FNMSUB (both VSX and non-VSX version) means -(a*b-c). But
the backend used to generate these instructions regardless whether nsz
flag exists or not. If a*b-c==0, such transformation changes sign of
zero.
This patch introduces PPC specific FNMSUB ISD opcode, which may help
improving combined FMA code sequence.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D76585
Summary:
The standard data emission directives (e.g. .short, .long) in the AIX assembler
have the unintended consequence of aligning their output to the natural byte
boundary. This cause problems because we aren't expecting behavior from the
Data*bitsDirectives, so the final alignment of data isn't correct in some cases
on AIX.
This patch updated the Data*bitsDirectives to use .vbyte pseudo-ops instead to emit the
data, since we will emit the .align directives as needed. We update the existing
testcases and add a test for emission of struct data.
Reviewers: hubert.reinterpretcast, Xiangling_L, jasonliu
Reviewed By: hubert.reinterpretcast, jasonliu
Subscribers: wuzish, nemanjai, hiraditya, kbarton, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80934
These two nodes were added by 69caef2b78 in 2005
and they are not used by PowerPC backend anymore. And the ISD::FMA is a prefer
way for VMADDFP if we really want to create that node. For VNMSUBFP, we will
also add a more generic node FNMSUB in D76585 if we really want it.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D80429
Summary: Exploit vabsd* for for absolute difference of vectors on P9,
for example:
void foo (char *restrict p, char *restrict q, char *restrict t)
{
for (int i = 0; i < 16; i++)
t[i] = abs (p[i] - q[i]);
}
this case should be matched to the HW instruction vabsdub.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80271
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given
Differential Revision: https://reviews.llvm.org/D79537
Since on AIX, our strategy is to not use -u to suppress any undefined
symbols, we need to emit .extern for the symbols with AvailableExternally
linkage.
Differential Revision: https://reviews.llvm.org/D80642
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
As reported in PR45186, we could be in a situation where we don't
want to handle unaligned memory accesses for FP scalars but still
have VSX (which allows unaligned access for vectors). Change the
default to only apply to scalars.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45186
Add the remaining cast instruction opcodes to the base implementation
of getUserCost and directly return the result. This allows
getInstructionThroughput to return getUserCost for the casts. This
has required changes to PPC and SystemZ because they implement
getUserCost and/or getCastInstrCost with adjustments for vector
operations. Adjusts have also been made in the remaining backends
that implement the method so that they still produce a cost of zero
or one for cost kinds other than throughput.
Differential Revision: https://reviews.llvm.org/D79848
As reported in https://bugs.llvm.org/show_bug.cgi?id=45709 we can hit an
infinite loop in legalization since we set the legalization action for
ISD::SELECT_CC for all fixed length vector types to Promote. Without some
different legalization action for the type being promoted to, the legalizer
simply loops. Since we don't have patterns to match the node, the right
legalization action should be Expand.
Differential revision: https://reviews.llvm.org/D79854
This patch introduces a TargetLowering query, isMulhCheaperThanMulShift.
Currently in DAG Combine, it will transform mulhs/mulhu into a
wider multiply and a shift if the wide multiply is legal.
This TLI function is implemented on 64-bit PowerPC, as it is more desirable to
have multiply-high over multiply + shift for words and doublewords. Having
multiply-high can also aid in further transformations that can be done.
Differential Revision: https://reviews.llvm.org/D78271
If the caller needs to reponsible for making sure the MaybeAlign
has a value, then we should just make the caller convert it to an Align
with operator*.
I explicitly deleted the relational comparison operators that
were being inherited from Optional. It's unclear what the meaning
of two MaybeAligns were one is defined and the other isn't
should be. So make the caller reponsible for defining the behavior.
I left the ==/!= operators from Optional. But now that exposed a
weird quirk that ==/!= between Align and MaybeAlign required the
MaybeAlign to be defined. But now we use the operator== from
Optional that takes an Optional and the Value.
Differential Revision: https://reviews.llvm.org/D80455
This patch adds support for Vector Multiply-Sum Unsigned Doubleword Modulo
instruction; vmsumudm.
Differential Revision: https://reviews.llvm.org/D80294
The fix for PR39865 took care of some of the handling for half precision
but it missed a number of issues that still exist. This patch fixes the
remaining issues that cause crashes in the PPC back end.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45776
Differential revision: https://reviews.llvm.org/D79283
This has not been implemented by any backends which appear to cover
the functionality through getCastInstrCost. Sink what there is in the
default implementation into BasicTTI.
Differential Revision: https://reviews.llvm.org/D78922
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.
Differential Revision: https://reviews.llvm.org/D79941
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.
The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
Summary:
For PowerPC, there are 3 passes has disabled the machine verification.
```
PPCTargetMachine.cpp: addPass(&LiveVariablesID, false);
PPCTargetMachine.cpp: addPass(createPPCEarlyReturnPass(), false);
PPCTargetMachine.cpp: addPass(createPPCBranchSelectionPass(), false);
```
This patch is to enable machine verification for above three passes.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79840
Replace with forward declarations and move includes down to source files where required.
I also needed to move the TargetLoweringObjectFile::SectionForGlobal wrapper implementation down into TargetLoweringObjectFile.cpp
Summary: This patch adds the intrinsic llvm.ppc.popcntb for the HW
instruction POPCNTB
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79703
SplitCSR was only suppored for functions with CXX_FAST_TLS calling
convention. Clang only emits that calling convention for Darwin which is
no longer supported by the PowerPC backend. Another IR producer could
use the calling convention, but considering the calling convention is
meant to be an optimization and the codegen for SplitCSR can be
attrocious on Power (see the modifed lit test) it is best to remove it
and codegen CXX_FAST_TLS same as the C calling convention.
Differential Revision: https://reviews.llvm.org/D79018
xsnegdp, xsabsdp and xsnabsdp can be used to operate on f32 operand.
This patch adds the missing patterns since we prefer VSX instructions
when available.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D75344
Legalizer should respect both command-line options or SDNode-level
fast-math flags.
Also, this patch propagates other flags during custom simplifying.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D79074
Summary:
The ppc-early-ret pass use the addReg() to add operand to the new
instruction, it can't reserve the flag of old operand. This has caused
machine verfications failed.
This patch use add() to instead of addReg().
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77997
Summary:
The SPE doesn't have a 'fma' instruction, so the intrinsic becomes a
libcall. It really should become an expansion to two instructions, but
for some reason the compiler doesn't think that's as optimal as a
branch. Since this lowering is done after CTR is allocated for loops,
tell the optimizer that CTR may be used in this case. This prevents a
"Invalid PPC CTR loop!" assertion in the case that a fma() function call
is used in a C/C++ file, and clang converts it into an intrinsic.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D78668
This patch folds redundant load immediates into a zero for instructions
which recognise this as the value zero and not the register. If the load
immediate is no longer in use it is then deleted.
This is already done in earlier passes but the ppc-mi-peephole allows for
a more general implementation.
Differential Revision: https://reviews.llvm.org/D69168
This patch adds strict-fp intrinsics support for fma, fsqrt, fmaxnum and
fminnum on PowerPC.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D72749
Summary:
This patch tries to emit the correct alignment result for both
object file generation path and assembly path.
Reviewed by: hubert.reinterpretcast, DiggerLin, daltenty
Differential Revision: https://reviews.llvm.org/D79127
Summary:
This patch will set the variable PredictableSelectIsExpensive to do the
select to if based on BranchProbability in CodeGenPrepare.
When the BranchProbability more than MinPercentageForPredictableBranch,
PPC will convert SELECT to branch.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D71883
This patch stores the alignment for ConstantPoolSDNode as an
Align and updates the getConstantPool interface to take a MaybeAlign.
Removing getAlignment() will be done as a follow up.
Differential Revision: https://reviews.llvm.org/D79436
The code to prevent using `PPCXCOFFMCAsmInfo` with little-endian targets
used an incorrect check. Also, there does not appear to be sufficient
earlier checking to prevent failing this check, so the check here is
upgraded to be a `report_fatal_error`.
`PPCAIXAsmPrinter` was also missing a check against use with
little-endian targets. This patch adds such a check in.
Summary:
`AsmPrinter::emitGlobalIndirectSymbol` is dependent on
`MCStreamer::emitAssignment` to produce `.set` directives for alias
symbols; however, the `.set` pseudo-op on AIX is documented as not
usable with external relocatable terms or expressions, which limits its
applicability in generating alias symbols.
Disable generating aliases on AIX until a different implementation
strategy is available.
Reviewers: cebowleratibm, jasonliu, sfertile, daltenty, DiggerLin
Reviewed By: jasonliu
Differential Revision: https://reviews.llvm.org/D79044
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
Over time, we have made many additions to this file and it has frankly become a
bit of a mess. This has led to at least one issue - we have a number of
instructions where the side effects flag should be set to false and we neglected
to do this. This patch suggests a refactoring that should make the file much
more maintainable. The file is split up into major sections and the nesting
level is reduced, predicate blocks merged, etc.
Sections:
- Custom PPCISD node definitions
- Predicate definitions
- Instruction formats
- Instruction definitions
- Helper DAG definitions
- Anonymous patterns
- Instruction aliases
Differential revision: https://reviews.llvm.org/D78132
The setting of `MCAsmInfo` properties for XCOFF got split between
`MCAsmInfoXCOFF` and `PPCXCOFFMCAsmInfo`. Except for the properties that
are dependent on the target information being passed via the
constructor, the properties being set in `PPCXCOFFMCAsmInfo` had no
fundamental reason for being treated as specific for XCOFF on PowerPC.
Indeed, the property that might be considered more specific to PowerPC,
`NeedsFunctionDescriptors`, was set in `MCAsmInfoXCOFF`.
XCOFF being specific to PowerPC anyway, this patch consolidates the
setting of the properties into `MCAsmInfoXCOFF` except for the cases
that are dependent on the information provided via the
`PPCXCOFFMCAsmInfo` constructor.
This patch also reorders the assignments to the fields to match the
declaration order in `MCAsmInfo`.
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.
Differential Revision: https://reviews.llvm.org/D78515
Summary:
They all match the base implementation in
TargetInstrInfo::isUnpredicatedTerminator.
Follow up to D62749.
Reviewers: echristo, MaskRay, hfinkel
Reviewed By: echristo
Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78976
getTargetStreamer() might return null (e.g. when running inlined-strings.ll test),
downcasting to a reference will be wrong. This is detectable with -fsanitize=null.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D78686
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
Currently, on PowerPC target, it uses function scope UnsafeFPMath
option to drive Machine Combiner pass.
This is not accurate in two ways:
1: the scope is not accurate. Machine Combiner pass only requires
instruction-level flags instead of the function scope.
2: the float point flag is not accurate. Machine Combiner pass
only requires float point flags reassoc and nsz.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D78183
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
In the ppc-expand-isel pass, we use stepForward() to update the
liveins, this function is not recommended, because it needs the
accurate kill info.
This patch uses the function computeAndAddLiveIns() to update the
liveins, it's the recommended method and can fix the liveins bug for
ppc-expand-isel pass..
Reviewed By: efriedma, lkail
Differential Revision: https://reviews.llvm.org/D78657
"unskipableSimplifyCode()" was added to handle unsafe BL8_NOTOC instruction
when TOC was not completely removed. The function is not needed after confirming
TOC pointer is not used in a function that uses PC-Relative addressing.
Differential Revision: https://reviews.llvm.org/D78517
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.
Differential Revision: https://reviews.llvm.org/D77788
Follow-up of D78082 (x86-64).
This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.
MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.
Tested on AArch64 Linux and ppc64le Linux.
Reviewed By: ianlevesque
Differential Revision: https://reviews.llvm.org/D78590
1. Use Subtarget.isUsingPCRelativeCalls() in LowerConstantPool to
check if using PCRelative addressing.
2. Change MO_GOT_FLAG = 32 to MO_GOT_FLAG = 8 in PPC.h to use
consecutive bits.
Differential Revision: https://reviews.llvm.org/D78406
Currently an indirect call produces the following sequence on PCRelative mode:
extern void function( );
extern void (*ptrfunc) ( );
void g() {
ptrfunc=function;
}
void f() {
(*ptrfunc) ( );
}
Producing
paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)
Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch is added to remove these redundant save and
restores for indirect calls.
Differential Revision: https://reviews.llvm.org/D77749
Add initial support for PC Relative addressing to get jump table base
address instead of using TOC.
Differential Revision: https://reviews.llvm.org/D75931
This patch exploits rldimi instruction for patterns like
`or %a, 0b000011110000`, which saves number of instructions when the
operand has only one use, compared with `li-ori-sldi-or`.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D77850
This is an optimization that applies to global addresses and
allows for the following transformation:
Convert this:
paddi r3, 0, symbol@PCREL, 1
ld r4, 8(r3)
To this:
pld r4, symbol@PCREL+8(0), 1
An instruction is saved and the linker can do the addition when
the symbol is resolved.
Differential Revision: https://reviews.llvm.org/D76160
Summary:
Before this patch, `relaxInstruction` takes three arguments, the first
argument refers to the instruction before relaxation and the third
argument is the output instruction after relaxation. There are two quite
strange things:
1) The first argument's type is `const MCInst &`, the third
argument's type is `MCInst &`, but they may be aliased to the same
variable
2) The backends of ARM, AMDGPU, RISC-V, Hexagon assume that the third
argument is a fresh uninitialized `MCInst` even if `relaxInstruction`
may be called like `relaxInstruction(Relaxed, STI, Relaxed)` in a
loop.
In this patch, we drop the thrid argument, and let `relaxInstruction`
directly modify the given instruction. Also, this patch fixes the bug https://bugs.llvm.org/show_bug.cgi?id=45580, which is introduced by D77851, and
breaks the assumption of ARM, AMDGPU, RISC-V, Hexagon.
Reviewers: Razer6, MaskRay, jyknight, asb, luismarques, enderby, rtaylor, colinl, bcain
Reviewed By: Razer6, MaskRay, bcain
Subscribers: bcain, nickdesaulniers, nathanchance, wuzish, annita.zhang, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, tpr, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78364
Adds support for passing a ByVal formal argument completely on the stack
(ie after all argument registers are exhausted).
Differential Revision: https://reviews.llvm.org/D78263
Add initial support for PC Relative addressing for global values that
require GOT indirect addressing. This patch adds PCRelative support for
global addresses that may not be known at link time and may require
access through the GOT.
Differential Revision: https://reviews.llvm.org/D76064
Summary:
AIX symbol have qualname and unqualified name. The stock getSymbol
could only return unqualified name, which leads us to patch many
caller side(lowerConstant, getMCSymbolForTOCPseudoMO).
So we should try to address this problem in the callee
side(getSymbol) and clean up the caller side instead.
Note: this is a "mostly" NFC patch, with a fix for the original
lowerConstant behavior.
Differential Revision: https://reviews.llvm.org/D78045
If we are and the constant like 0xFFFFFFC00000, for now, we are using several
instructions to generate this 48bit constant and final an "and". However, we
could exploit it with two rotate instructions.
MB ME MB+63-ME
+----------------------+ +----------------------+
|0000001111111111111000| -> |0000000001111111111111|
+----------------------+ +----------------------+
0 63 0 63
Rotate left ME + 1 bit first, and then, mask it with (MB + 63 - ME, 63),
finally, rotate back. Notice that, we need to round it with 64 bit for the
wrapping case.
Reviewed by: ChenZheng, Nemanjai
Differential Revision: https://reviews.llvm.org/D71831
This patch adds PC Relative support for global values that are known at link
time. If a global value requires access through the global offset table (GOT)
it is not covered in this patch.
Differential Revision: https://reviews.llvm.org/D75280
Summary:
When doing the conversion: MachineInst -> MCInst, we should ignore the
implicit operands, it will expose more opportunity for InstiAlias.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77118
We have added code to correct the .localentry values on assignments. However, we
never clear the set so presumably it will still contain the (now dangling)
MCSymbol pointers across a call to finish() and reset() in the streamer.
This is based on my speculation that it is the reason we are getting
segmentation faults mentioned in https://bugs.llvm.org/show_bug.cgi?id=45366
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45366
Differential revision: https://reviews.llvm.org/D78196
There was another issue introduced by this commit that the OP
initially missed. Namely, for functions that are free to use
R2 as a callee-saved register, we emit a TOC expression based
on the address of the GEP label without emitting the GEP label.
Since we only emit such expressions for the large code model, this
issue only surfaced there.
I have confirmed that with this fix, the kernel build is successful
with target "all".
In LowerFP_TO_INTForReuse, when emitting `stfiwx`, alignment of 4 is
set for the `MachineMemOperand`, but RLI(ReuseLoadInfo)'s alignment is
not updated for following loads.
It's related to failed alignment check reported in
https://bugs.llvm.org/show_bug.cgi?id=45297
Differential Revision: https://reviews.llvm.org/D77624
The transformation currently does not differentiate between explicit
and implicit kills. However, it is not valid to later simply clear
an implicit kill flag since the kill could be due to a call or return.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45374
This is a fix for the previous patch 6c4b40def7.
In some cases it may be possible to have the compiler produce st_other=1 without
the compiler using mcpu=future which should not be the case. This patch adds a
guard to make sure that if we are using st_other=1 then we are also compiling
for future CPU.
When we try to select a SELECT_CC on Power9, we check if it can be matched to a
SETB instruction. In that function, we assert that the output type is i32/i64.
This is unnecessary as it is perfectly reasonable to have an i1 SELECT_CC.
Change that from an assert to an early exit condition.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=45448
Summary:
This patch adds support for handling of variadic functions for AIX.
This includes ensuring that use and consume correct type of
va_list (char *va_list) for AIX.
Authored by: ZarkoCA
Reviewers: cebowleratibm, sfertile, jasonliu
Reviewed by: jasonliu
Differential Revision: https://reviews.llvm.org/D76130
Add initial support for PC Relative addressing for constant pool loads.
This includes adding a new relocation for @pcrel and adding a new PowerPC flag
to identify PC relative addressing.
Differential Revision: https://reviews.llvm.org/D74486
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: hfinkel, efriedma, sdesmalen
Reviewed By: efriedma
Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77266
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (ie: is not split between registers and stack).
Differential Revision: https://reviews.llvm.org/D76902
On PowerPC most functions require a valid TOC pointer.
This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.
This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.
Differential Revision: https://reviews.llvm.org/D73664
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unncessary copies.
Summary:
- Remove the no longer used Darwin CalleeSavedRegs
- Combine the SVR464 callee saved regs and AIX64 since the two are (and should be) identical into PPC64
- Update tests for 64-bit CSR change
Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu, #powerpc
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77235
Summary:
For current architect, we always require setContainingCsect to be
called on every MCSymbol got used in XCOFF context.
This is very hard to achieve because symbols gets created everywhere
and other MCSymbol types(ELF, COFF) do not have similar rules.
It's very easy to miss setting the containing csect, and we would
need to add a lot of XCOFF specialized code around some common code area.
This patch intendeds to do
1. Rely on getFragment().getParent() to get csect from labels.
2. Only use get/setRepresentedCsect (was get/setContainingCsect)
if symbol itself represents a csect.
Reviewers: DiggerLin, hubert.reinterpretcast, daltenty
Differential Revision: https://reviews.llvm.org/D77080
MI peephole will remove unnecessary FRSP instructions. This patch
removes such unnecessary XSRSP.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D77208
It was added by D76591 for migration purposes (not all
printBranchOperand users have migrated to the overload with `uint64_t Address`).
Now that all have been migrated, the parameter can go away.
Summary:
In https://bugs.llvm.org/show_bug.cgi?id=45297, it fails selecting
instructions for `PPCISD::ST_VSR_SCAL_INT`. The reason it generate the
`PPCISD::ST_VSR_SCAL_INT` with `-power8-vector` in IR is PPC's
combiner checks `hasP8Altivec` rather than `hasP8Vector`. This patch
should resolve PR45297.
Differential Revision: https://reviews.llvm.org/D76773
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77059
SUMMARY:
Address clang format issue:
"clang format this block, I don't think the spaces are aligned correctly."
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D76162
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76925
SUMMARY:
SUMMARY
for a source file "test.c"
void foo() {};
llc will generate assembly code as (assembly patch)
.globl foo
.globl .foo
.csect foo[DS]
foo:
.long .foo
.long TOC[TC0]
.long 0
and symbol table as (xcoff object file)
[4] m 0x00000004 .data 1 unamex foo
[5] a4 0x0000000c 0 0 SD DS 0 0
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 LD DS 0 0
After first patch, the assembly will be as
.globl foo[DS] # -- Begin function foo
.globl .foo
.align 2
.csect foo[DS]
.long .foo
.long TOC[TC0]
.long 0
and symbol table will as
[6] m 0x00000004 .data 1 extern foo
[7] a4 0x00000004 0 0 DS DS 0 0
Change the code for the assembly path and xcoff objectfile patch for llc.
Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D76162
Summary:
The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against
R_PPC_TLS, if it sees fit (initial exec to local exec). If r0 is used,
this can generate execution-invalid code (converts to 'addi %rX, %r0,
FOO, which translates in PPC-lingo to li %rX, FOO). Forbid this
instead.
This fixes static binaries using locales on FreeBSD/powerpc
(tested on FreeBSD/powerpcspe).
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76662
```
// llvm-objdump -d output (before)
0: bl .-4
4: bl .+0
8: bl .+4
// llvm-objdump -d output (after) ; GNU objdump -d
0: bl 0xfffffffc / bl 0xfffffffffffffffc
4: bl 0x4
8: bl 0xc
```
Many Operand's are not annotated as OPERAND_PCREL.
They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches.
Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test
address space wraparound for powerpc32 and powerpc64.
Reviewed By: sfertile, jhenderson
Differential Revision: https://reviews.llvm.org/D76591
Follow-up of D72172 and D72180
This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.
Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.
```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc: 20000: bl .+4
x86: 20000: callq 0
// Ideal output
aarch64: 20000: bl 0x20000
ppc: 20000: bl 0x20004
x86: 20000: callq 0x20005
// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc: 20000: bl 0x20004
x86: 20000: callq 20005
```
In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):
```
case 12:
// CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
- printPCRelImm(MI, 0, O);
+ printPCRelImm(MI, Address, 0, O);
return;
```
Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D76574
Summary:
Below InstAlias have been redefined, this patch is to remove the repeated
definition.
mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr
Reviewed By: nemanjai, steven.zhang
Differential Revision: https://reviews.llvm.org/D75821
We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0)
if altivec enabled. Now, it is set as custom and expand it later, which is not
the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C)
Reviewed By: Nemanjai
Differential Revision: https://reviews.llvm.org/D76751
An analysis of real world code turned up a number of patterns with BUILD_VECTOR
of nodes resulting from operations on extracted vector elements for which we
produce poor code. This addresses those cases. No attempt is made for
completeness as that would entail a large amount of work for something that
there is no evidence of in real code.
Differential revision: https://reviews.llvm.org/D72660
The e500 core has a silicon bug that triggers an illegal instruction
program trap on any sync other than msync. Other cores will typically
ignore illegal sync types, and the documentation even implies that the
'illegal' bits are ignored.
Address this hardware deficiency by only using msync, like the PPC440.
Differential Revision: https://reviews.llvm.org/D76614
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76551
-fuse-init-array is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.
clang -target i386 -c a.c
clang -target x86_64 -c a.c
This cleanup fixes this as a bonus.
X86SpeculativeLoadHardeningPass::tracePredStateThroughCall can call
MCContext::createTempSymbol before TargetLoweringObjectFileELF::Initialize().
We need to call TargetLoweringObjectFileELF::Initialize() ealier.
test/CodeGen/X86/speculative-load-hardening-indirect.ll
Differential Revision: https://reviews.llvm.org/D71360
UseInitArray is now the CC1 default but TargetLoweringObjectFileELF::UseInitArray still defaults to false.
The following two unknown OS target triples continue using .ctors/.dtors because InitializeELF is not called.
clang -target i386 -c a.c
clang -target x86_64 -c a.c
This cleanup fixes this as a bonus.
Differential Revision: https://reviews.llvm.org/D71360
On Powerpc fma is faster than fadd + fmul for some types,
(PPCTargetLowering::isFMAFasterThanFMulAndFAdd). we should implement target
hook isProfitableToHoist to prevent simplifyCFGpass from breaking fma
pattern by hoisting fmul to predecessor block.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76207
This is the first of a series of patches that adds caller support for
by-value arguments. This patch add support for arguments that are passed in a
single GPR.
There are 3 limitation cases:
-The by-value argument is larger than a single register.
-There are no remaining GPRs even though the by-value argument would
otherwise fit in a single GPR.
-The by-value argument requires alignment greater than register width.
Future patches will be required to add support for these cases as well
as for the callee handling (in LowerFormalArguments_AIX) that
corresponds to this work.
Differential Revision: https://reviews.llvm.org/D75863
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
The PPCISD::SExtVElems was added by commit https://reviews.llvm.org/D34009. However,
we have another ISD node ISD::SIGN_EXTEND_INREG that perfectly match the semantics
of SExtVElems. And the DAGCombiner has some combine rules for SIGN_EXTEND_INREG
that produce better code.
Differential Revision: https://reviews.llvm.org/D70771
This patch renames some of the instruction formats within PPCInstrPrefix.td to
adopt a more uniform naming convention. It also adds the naming convention
extension, `_MEM` to indicate instruction formats for memory ops.
Differential Revision: https://reviews.llvm.org/D75819
Summary:
On 32-bit PPC target[AIX and BE], when we convert an `i64` to `f32`, a `setcc` operand expansion is needed. The expansion will set the result type of expanded `setcc` operation based on if the subtarget use CRBits or not. If the subtarget does use the CRBits, like AIX and BE, then it will set the result type to `i1`, leading to an inconsistency with original `setcc` result type[i32].
And the reason why it crashed underneath is because we don't set result type of setcc consistent in those two places.
This patch fixes this problem by setting original setcc opnode result type also with `getSetCCResultType` interface.
Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75702
This patch is intend to implement the missing P8 MacroFusion for LLVM
according to Power8 User's Manual Section 10.1.12 Instruction Fusion
Differential Revision: https://reviews.llvm.org/D70651
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
Differential Revision: https://reviews.llvm.org/D75525
Handle LinkOnceODRLinkage;
Handle AppendingLinkage type for llvm.global_ctors/dtors static init global arrays;
Differential Revision: https://reviews.llvm.org/D75305
This is a follow up to the previous patch: [AIX] Implement caller
arguments passed in stack memory.
This corrects a defect in AIX 64-bit where an i32 is written to the
stack with stw (4 bytes) rather than the expected std (8 bytes.) Integer
arguments pass on the stack as images of their register representation.
I also took the opportunity to tidy up some of the calling convention
AIX tests I added in my last commit. This patch adds the missed assembly
expected output for the stack arg int case, which would have caught this
problem.
Differential Revision: https://reviews.llvm.org/D75126
Allow all ExternalSymbolSDNode on AIX, and rely on the linker error to find
symbols which we don't have definitions from any library/compiler-rt.
Differential Revision: https://reviews.llvm.org/D75075
Summary:
The patch D62993 : `[PowerPC] Emit scalar min/max instructions with unsafe fp math`
has modified the functionality when `Subtarget.hasP9Vector() && (!HasNoInfs || !HasNoNaNs)`,
this modification is not expected.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D74701
This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order.
This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code.
Differential Revision: https://reviews.llvm.org/D75132
Summary:
Function descriptor csect on AIX should be 4 byte align instead of 1 byte align.
Reviewer: daltenty
Differential Revision: https://reviews.llvm.org/D74974
Extends the existing support for spilling and restoring the condition
register to the linkage area for 32-bit targets, and enables for AIX.
Differential Revision: https://reviews.llvm.org/D74349
This moves all the logic of converting LLVM Triples to
MachO::CPU_(SUB_)TYPE from the specific target (Target)AsmBackend to
more convenient functions in lib/BinaryFormat.
This also gets rid of the separate two X86AsmBackend classes.
The previous attempt was to add it to libObject, but that adds an
unnecessary dependency to libObject from all the targets.
Differential Revision: https://reviews.llvm.org/D74808
Remove some cumbersome Darwin specific logic for updating the frame
offsets of the condition-register spill slots. The containing function has an
early return if the subtarget is not ELF based which makes the Darwin logic
dead.
The code at https://reviews.llvm.org/D74808 has broken builds that are
configured with -DBUILD_SHARED_LIBS=On.
This patch adds the correct library dependencies.
This moves all the logic of converting LLVM Triples to
MachO::CPU_(SUB_)TYPE from the specific target (Target)AsmBackend to
more convenient functions in libObject.
This also gets rid of the separate two X86AsmBackend classes.
Differential Revision: https://reviews.llvm.org/D74808
Create preprocessor defines for callee saved floating-point register spill
slots, vector register spill slots, and both 32-bit and 64-bit general
purpose register spill slots. This is an NFC refactor to prepare for
adding ABI compliant callee saves and restores for AIX.
We have the InstAlias rules for 32-bit rotate but missing the 64-bit one.
Rotate left immediate rotlwi ra,rs,n rlwinm ra,rs,n,0,31
Rotate left rotlw ra,rs,rb rlwnm ra,rs,rb,0,31
Differential Revision: https://reviews.llvm.org/D72676
On Powerpc, set instruction count as lsr first priority of lsr by default.
Add an option ppc-lsr-no-insns-cost to return back to default lsr cost model.
Reviewed By: steven.zhang, jsji
Differential Revision: https://reviews.llvm.org/D72683
Skip the loop over the CalleSavedInfos in 'restoreCalleeSavedRegisters' when
the register is a CR field and we are not targeting 32-bit ELF. This is safe
because:
1) The helper function 'restoreCRs' returns if the target is not 32-bit ELF,
making all the code in the loop related to CR fields dead for every other
subtarget. This code is only called on ELF right now, but the patch
to extend it for AIX also needs to skip 'restoreCRs'.
2) The loop will not otherwise modify the iterator, so the iterator
manipulations at the bottom of the loop end up setting 'I' to its
current value.
This simplifciation allows us to remove one argument from 'restoreCRs'.
Also add a helper function to determine if a register is one of the
callee saved condition register fields.
Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does
rounding using current rounding mode.
According to C standard library, rint may raise INEXACT exception while
nearbyint won't.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D72685
An option is added for PowerPC to disable use of non-volatile CR
register fields and avoid CR spilling in the prologue.
Differential Revision: https://reviews.llvm.org/D69835
Added support for the intrinsic llvm.ppc.dcbfl and llvm.ppc.dcbflp.
These will be used for emitting cache control instructions dcbfl and dcbflp
which are actually mnemonics for using dcbf instruction with different
immediate arguments.
dcbfl ra, rb -> dcbf ra, rb, 1
dcbflp, ra, rb -> dcbf ra, rb, 3
Differential Revision: https://reviews.llvm.org/D68411
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.
MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.
Reviewers: thakis
Reviewed By: thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73486
This patch:
- enable frame pointer for AIX;
- update some of red zone comments;
- add/update testcases;
Differential Revision: https://reviews.llvm.org/D72454
SUMMARY:
The patch is enable to support Mergeable2ByteCString and Mergeable4ByteCString
Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D74164
Remove code from LegalizeTypes that allowed this to work.
We were already using BUILD_PAIR for this in some places so this
standardizes on a single way to do this.
On little endian targets prior to Power9, we spill vector registers using a
swapping store (i.e. stdxvd2x saves the vector with the two doublewords in
big endian order regardless of endianness). This is generally not a problem
since we restore them using the corresponding swapping load (lxvd2x). However
if the restore is done by the unwinder, the vector register contains data in
the incorrect order.
This patch fixes that by using Altivec loads/stores for vector saves and
restores in PEI (which keep the order correct) under those specific conditions:
- EH aware function
- Subtarget requires swaps for VSX memops (Little Endian prior to Power9)
Differential revision: https://reviews.llvm.org/D73692
hasReservedSpillSlot returns a dummy frame index of '0' on PPC64 for the
non-volatile condition registers, which leads to the CalleSavedInfo
either referencing an unrelated stack object, or an invalid object if
there are no stack objects. The latter case causes the mir-printer to
crash due to assertions that checks if the frame index referenced by a
CalleeSavedInfo is valid.
To fix the problem create an immutable FixedStack object at the correct offset
in the linkage area of the previous stack frame (ie SP + positive offset).
Differential Revision: https://reviews.llvm.org/D73709
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
This patch implements the caller side of placing function call arguments
in stack memory. This removes the current limitation where LLVM on AIX
will report fatal error when arguments can't be contained in registers.
There is a particular oddity that a float argument that passes in a
register and also in stack memory requires that the caller initialize
both. From what AIX "ABI" documentation I have it's not clear that this
needs to be done, however, it is necessary for compatibility with the
AIX XL compiler so I think it's best to implement it the same way.
Note a later patch will follow to address the callee side.
Differential Revision: https://reviews.llvm.org/D73209
Summary:
The AIX assembler .space directive can't take a second non-zero argument to fill
with. But LLVM emitFill currently assumes it can. We add a flag to the AsmInfo
to check if non-zero fill is supported, and if we can't zerofill non-zero values
we just splat the .byte directives.
Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L
Reviewed By: jasonliu
Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73554
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code.
Reviewers: courbet
Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73785
Summary:
This patch intends to support three most common relocation type
on AIX: R_POS, R_TOC, R_RBR.
These three relocation type will be needed for object file generation
on AIX for small code model.
We will have follow up patches to bring relocation support for
large code model on AIX.
Reviewers: hubert.reinterpretcast, daltenty, DiggerLin
Differential Revision: https://reviews.llvm.org/D72027
By adding the prefixed instructions the branch distances are no longer
computed correctly. Since prefixed instructions cannot cross a 64 byte
boundary we have to assume that a prefixed instruction may have a nop
prepended to it. This patch tries to take that nop into consideration
when computing the size of basic blocks.
Differential Revision: https://reviews.llvm.org/D72572
A known limitation for Future CPU is that the new prefixed instructions may
not cross 64 Byte boundaries.
All instructions are already 4 byte aligned so the only situation where this
can occur is when the prefix is in one 64 byte block and the instruction that
is prefixed is at the top of the next 64 byte block. To fix this case
PPCELFStreamer was added to intercept EmitInstruction. When a prefixed
instruction is emitted we try to align it to 64 Bytes by adding a maximum of
4 bytes. If the prefixed instruction crosses the 64 Byte boundary then the
alignment would trigger and a 4 byte nop would be added to push the
instruction into the next 64 byte block.
Differential Revision: https://reviews.llvm.org/D72570
A previous patch should have added pld and pstd and any support code in
the backend that is required for prefixed load and store type operations.
This patch adds a number of additional prefixed load and store type
instructions for the future CPU.
Differential Revision: https://reviews.llvm.org/D72577
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Add the prefixed instructions pld and pstd to future CPU. These are load and
store instructions that require new operand types that are 34 bits. This patch
adds the two instructions as well as the operand types required.
Note that this patch also makes a minor change to tablegen to account for the
fact that some instructions are going to require shifts greater than 31 bits
for the new 34 bit instructions.
Differential Revision: https://reviews.llvm.org/D72574
Updated FoldConstantArithmetic method signature to match that of
FoldConstantVectorArithmetic in preparation for merging the two
functions together
https://bugs.llvm.org/show_bug.cgi?id=36544
This is the first step in combining the various
FoldConstantVectorArithmetic and FoldConstantVectorArithmetic
functions into one FoldConstantArithmetic function.
Differential Revision: https://reviews.llvm.org/D72870
Future CPU will include support for prefixed instructions.
These prefixed instructions are formed by a 4 byte prefix
immediately followed by a 4 byte instruction effectively
making an 8 byte instruction. The new instruction paddi
is a prefixed form of addi.
This patch adds paddi and all of the support required
for that instruction. The majority of the patch deals with
supporting the new prefixed instructions. The addition of
paddi is mainly to allow for testing.
Differential Revision: https://reviews.llvm.org/D72569
Summary:
This is a follow up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here that `Align(1)` relies on the compiler understanding of `Log2_64` implementation to produce good code. One could use `Align()` as a replacement but I believe it is less clear that the alignment is one in that case.
Reviewers: xbolva00, courbet, bollu
Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73099
Collect the calling convention and a number of boolean arguments into a
structure to slightly reduces the number of arguments passed around between
LowerCall_<Subtarget>, FinishCall and a few of the helpers. Also
calulates if a call is indirect once using the exisitng helper and caches the
result replacing several instances where we duplicated the logic determining if
a call is indirect.
Summary:
We create a number of standard types of control sections in multiple places for
things like the function descriptors, external references and the TOC anchor
among others, so it is possible for their properties to be defined
inconsistently in different places. This refactor moves their creation and
properties into functions in the TargetLoweringObjectFile class hierarchy, where
functions for retrieving various special types of sections typically seem
to reside.
Note: There is one case in PPCISelLowering which is specific to function entry
points which we don't address since we don't have access to the TLOF there.
Reviewers: DiggerLin, jasonliu, hubert.reinterpretcast
Reviewed By: jasonliu, hubert.reinterpretcast
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72347
In GlobalISel we may in some unfortunate circumstances generate PHIs with
operands that are on separate banks. If-conversion doesn't currently check for
that case and ends up generating a CSEL on AArch64 with incorrect register
operands.
Differential Revision: https://reviews.llvm.org/D72961
We removed UseVSXReg flag in https://reviews.llvm.org/D58685
But we did not reclain the bit 6 it was assigned,
this will become confusing and a hole later..
We should reclaim it as early as possible before new bits.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D72649
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
Summary:
The `llc` tool currently defaults to Static relocation model and generates non-relocatable code for 32-bit Power.
This is not desirable on AIX where we always generate Position Independent Code (PIC). This patch makes PIC the default relocation model for AIX.
Reviewers: daltenty, hubert.reinterpretcast, DiggerLin, Xiangling_L, sfertile
Reviewed By: hubert.reinterpretcast
Subscribers: mgorny, wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72479
These intrinsics and the corresponding ISD nodes were recently added. PPC has
instructions that do this for vectors. Legalize them and add patterns to emit
the satuarting instructions.
Differential revision: https://reviews.llvm.org/D71940
Summary:
As currently written, -target powerpcspe will enable SPE regardless of
disabling the feature later on in the command line. Instead, change
this to just set a default CPU to 'e500' instead of a generic CPU.
As part of this, add FeatureSPE to the e500 definition.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D72673
Summary:
For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF
this change makes all symbols in the target specific libraries hidden
by default.
A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these
libraries public, which is mainly needed for the definitions of the
LLVMInitialize* functions.
This patch reduces the number of public symbols in libLLVM.so by about
25%. This should improve load times for the dynamic library and also
make abi checker tools, like abidiff require less memory when analyzing
libLLVM.so
One side-effect of this change is that for builds with
LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that
access symbols that are no longer public will need to be statically linked.
Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1):
nm before/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
36221
nm after/libLLVM-9svn.so | grep ' [A-Zuvw] ' | wc -l
26278
Reviewers: chandlerc, beanz, mgorny, rnk, hans
Reviewed By: rnk, hans
Subscribers: merge_guards_bot, luismarques, smeenai, ldionne, lenary, s.egerton, pzheng, sameer.abuasal, MaskRay, wuzish, echristo, Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D54439
SUMMARY:
In this patch we put the global variable in a Csect which's SectionKind is "ReadOnlyWithRel" into Data Section.
Reviewers: hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D72461
For memcpy/memset/memmove etc., replace ExternalSymbolSDNode with a
MCSymbolSDNode, which have a prefix dot before function name as entry
point symbol.
Differential Revision: https://reviews.llvm.org/D70718
The argument is llvm::null() everywhere except llvm::errs() in
llvm-objdump in -DLLVM_ENABLE_ASSERTIONS=On builds. It is used by no
target but X86 in -DLLVM_ENABLE_ASSERTIONS=On builds.
If we ever have the needs to add verbose log to disassemblers, we can
record log with a member function, instead of passing it around as an
argument.
Summary:
This patch pushes the AIX vararg unimplemented error diagnostic later
and allows vararg calls so long as all the arguments can be passed in register.
This patch extends the AIX calling convention implementation to initialize
GPR(s) for vararg float arguments. On AIX, both GPR(s) and FPR are allocated
for floating point arguments. The GPR(s) are only initialized for vararg calls,
otherwise the callee is expected to retrieve the float argument in the FPR.
f64 in AIX PPC32 requires special handling in order to allocated and
initialize 2 GPRs. This is performed with bitcast, SRL, truncation to
initialize one GPR for the MSW and bitcast, truncations to initialize
the other GPR for the LSW.
A future patch will follow to add support for arguments passed on the stack.
Patch provided by: cebowleratibm
Reviewers: sfertile, ZarkoCA, hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D71013
PowerPC uses a dedicated method to check if the machine instr is
predicable by opcode. However, there's a bit `isPredicable` in instr
definition. This patch removes the method and set the bit only to
opcodes referenced in it.
Differential Revision: https://reviews.llvm.org/D71921
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
Fix a conditional that guarded code for execution only on 32-bit ELF by
checking that the Subtarget was not 64-bit and not-Darwin. By adding a new
target ABI (AIX), the condition is no longer correct. This code is dead for
AIX, due to a 'report_fatal_error' for thread local storage usage earlier in the
pipeline, but needs to be modifed as part of Darwins removal from the
PowerPC backend.
Summary:
This allows the use of '-target powerpcspe-unknown-linux-gnu' or
'powerpcspe-unknown-freebsd' to be used, instead of
'-target powerpc-unknown-linux-gnu -mspe'.
Reviewed By: dim
Differential Revision: https://reviews.llvm.org/D72014
Summary:
Every powerpc64le platform uses elfv2.
For powerpc64, the environments "elfv1" and "elfv2" were added for
FreeBSD ELFv1->ELFv2 migration in D61950. FreeBSD developers have
decided to use OS versions to select ABI, and no one is relying on the
environments.
Also use elfv2 on powerpc64-linux-musl.
Users can always use -mabi=elfv1 and -mabi=elfv2 to override the default
ABI.
Reviewed By: adalava
Differential Revision: https://reviews.llvm.org/D72352
SUMMARY:
In this patch, we map mergeable const objects to the read-only section in the same manner as const objects that are not mergeable.
Reviewers: hubert.reinterpretcast,jasonliu
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D71551
printInst prints a branch/call instruction as `b offset` (there are many
variants on various targets) instead of `b address`.
It is a convention to use address instead of offset in most external
symbolizers/disassemblers. This difference makes `llvm-objdump -d`
output unsatisfactory.
Add `uint64_t Address` to printInst(), so that it can pass the argument to
printInstruction(). `raw_ostream &OS` is moved to the last to be
consistent with other print* methods.
The next step is to pass `Address` to printInstruction() (generated by
tablegen from the instruction set description). We can gradually migrate
targets to print addresses instead of offsets.
In any case, downstream projects which don't know `Address` can pass 0 as
the argument.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D72172
We use o suffix to indicate record form instuctions,
(as it is similar to dot '.' in mne?)
This was fine before, as we did not support XO-form.
However, with https://reviews.llvm.org/D66902,
we now have XO-form support.
It becomes confusing now to still use 'o' for record form,
and it is weird to have something like 'Oo' .
This patch rename all 'o' instructions to use '_rec' instead.
Also rename `isDot` to `isRecordForm`.
Reviewed By: #powerpc, hfinkel, nemanjai, steven.zhang, lkail
Differential Revision: https://reviews.llvm.org/D70758
In https://reviews.llvm.org/D67148, we use isFloatTy to test floating
point type, otherwise we return GPRRC.
So 'double' will be classified as GPRRC, which is not accurate.
This patch covers other floating point types.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D71946
SUMMARY:
We currently emit a reference for function address constants as labels;
for example:
foo_ptr:
.long foo
however, there may be no such label in the case where the function is
undefined. Although the label exists when the function is defined, we
will (to be consistent) also use a csect reference in that case.
Address one comment
https://reviews.llvm.org/D71144#inline-653255
Reviewers: daltenty,hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: cebowleratibm, wuzish, nemanjai
Differential Revision: https://reviews.llvm.org/D71144
SUMMARY:
We currently emit a reference for function address constants as labels;
for example:
foo_ptr:
.long foo
however, there may be no such label in the case where the function is
undefined. Although the label exists when the function is defined, we
will (to be consistent) also use a csect reference in that case.
Reviewers: daltenty,hubert.reinterpretcast,jasonliu,Xiangling_L
Subscribers: cebowleratibm, wuzish, nemanjai
Differential Revision: https://reviews.llvm.org/D71144
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.
Reviewers: sanjoy.google, efriedma, reames
Reviewed By: sanjoy.google
Differential Revision: https://reviews.llvm.org/D71537
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
Commit 0f0330a787 legalized these nodes on PPC without consideration of
unsafe math which means that we get inexact exceptions raised for nearbyint.
Since this doesn't conform to the standard, switch this legalization to depend
on unsafe fp math.
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.
Differential revision: https://reviews.llvm.org/D69949
For now, PowerPC will using several instructions to get the constant and "and" it with the following case:
define i32 @test1(i32 %a) {
%and = and i32 %a, -2
ret i32 %and
}
However, we could exploit it with the rotate mask instructions.
MB ME
+----------------------+
|xxxxxxxxxxx00011111000|
+----------------------+
0 32 64
Notice that, we can only do it if the MB is larger than 32 and MB <= ME as
RLWINM will replace the content of [0 - 32) with [32 - 64) even we didn't rotate it.
Differential Revision: https://reviews.llvm.org/D71829
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.
They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
Some CPU's trap to the kernel on unaligned floating point access and there are
kernels that do not handle the interrupt. The program then fails with a SIGBUS
according to the PR. This just switches the default for unaligned access to only
allow it on recent server CPUs that are known to allow this.
Differential revision: https://reviews.llvm.org/D71954
Summary:
If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
Below 6 instructions don't set the hasSideEffects flag and don't have match
pattern, so their hasSideEffects flag will be set true by llvm-tblgen.
But in fact below instructions don't modify any special register and don't have
other SideEffects, they shouldn't have SideEffects.
This patch is to modify the hasSideEffects of below instructions from 1 to 0.
```
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
VSPLTBs
VSPLTHs
```
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D71391
that 'and' with constant
More patches will be committed later to exploit more about 'and' with
constant.
Differential Revision: https://reviews.llvm.org/D71693
Summary:
If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
The instructions `MTLR` and `MFLR` don't set the hasSideEffects flag and don't
have match pattern, so their hasSideEffects flag will be set true by
`llvm-tblgen`.
But in fact, we can use `[LR]` to model the two instructions, so they should not
have SideEffects.
This patch is to modify the hasSideEffects of MTLR and MFLR from 1 to 0.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D71390
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.
Differential Revision: https://reviews.llvm.org/D70657
Summary:
Currently, we set legalization action of `ISD::ROTL` vectors as
`Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)`
to lower `ISD::ROTL` directly.
Differential Revision: https://reviews.llvm.org/D71324
static void *ifunc(void) __attribute__((ifunc("resolver")));
void foo() { ifunc(); }
The relocation produced by the ifunc() call:
1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000
3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
4. clang -msecure-plt -fPIE => R_PPC_REL24
4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this
relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a
function, both GNU ld and lld (after D71621 fix) may produce a wrong
result.
This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC.
Both GNU ld and lld (after D71621) will be happy.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D71649
Recommit after making the same API change in non-x86 targets. This has been build for all targets, and tested for effected ones. Why the difference? Because my disk filled up when I tried make check for all.
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated. This just rearranges the code to make the upcoming change more obvious.
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC. LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.
Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.
Reviewed by: sfertile
Differential Revision: https://reviews.llvm.org/D70570
Refactor the splatting of a constant to a vector so that common code is used
both for Power9 and Power8.
Patch by: Anil Mahmud
Differential Revision: https://reviews.llvm.org/D71481
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.
Differential revision: https://reviews.llvm.org/D70436
Summary:
If a function is defined after it appears in a TOC expression, we may
try to access an unset containing csect when returning a symbol for the
expression.
Reviewers: Xiangling_L, DiggerLin, jasonliu, hubert.reinterpretcast
Reviewed By: hubert.reinterpretcast
Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71125
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.
Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.
The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.
This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.
Differential Revision: https://reviews.llvm.org/D71359
Currently -fuse-init-array option is not effective when target triple
does not specify os, on x86,x86_64.
i.e.
// -fuse-init-array is not honored.
$ clang -target i386 -fuse-init-array test.c -S
// -fuse-init-array is honored.
$ clang -target i386-linux -fuse-init-array test.c -S
This patch fixes first case.
And does cleanup.
Reviewers: rnk, craig.topper, fhahn, echristo
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71360
Summary:
r372285 changed LLVM to use a `TargetConstant` for parameters of intrinsics that are required to be immediates.
Since that commit, use of `%llvm.ppc.altivec.vc{fsx,fux,tsxs,tuxs}` intrinsics has not worked, and resulted in a `LLVM ERROR: Cannot select: intrinsic %llvm.ppc.altivec.vc*` error. The intrinsics' TableGen definitions matched on `imm` instead of `timm`.
This commit updates those definitions to use `timm`.
Fixes: https://llvm.org/PR44239
Reviewers: hfinkel, nemanjai, #powerpc, Jim
Reviewed By: Jim
Subscribers: qiucf, wuzish, Jim, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Patched by vddvss (Colin Samples).
Differential Revision: https://reviews.llvm.org/D71138
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.
In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:
struct FunctionDescriptor {
void *EntryPoint;
void *TOCAnchor;
void *EnvironmentPointer;
};
An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:
1) Save the caller's TOC pointer into the TOC save slot in the linkage
area, and then load the callee's TOC pointer into the TOC register
(GPR 2 on AIX).
2) Load the function descriptor's entry point into the count register.
3) Load the environment pointer into the environment pointer register
(GPR 11 on AIX).
4) Perform the call by branching on count register.
5) Restore the caller's TOC pointer after returning from the indirect call.
A couple important caveats to the above:
- There is no way to directly load a value from memory into the count register.
Instead we populate the count register by loading the entry point address into
a gpr and then moving the gpr to the count register.
- The TOC restore has to come immediately after the branch on count register
instruction (i.e., the 1st instruction executed after we return from the
call). This is an implementation limitation. We could, in theory, schedule
the restore elsewhere as long as no uses of the TOC pointer fall in between
the call and the restore; however, to keep it simple, we insert a pseudo
instruction that represents both the indirect branch instruction and the
load instruction that restores the caller's TOC from the linkage area. As
they flow through the compiler as a single pseudo instruction, nothing can be
inserted between them and the caller's TOC is then valid at any use.
Differtential Revision: https://reviews.llvm.org/D70724
Fix PR44284. This is probably not valid assembly but we should not crash.
Reviewed By: luporl, #powerpc, steven.zhang
Differential Revision: https://reviews.llvm.org/D71443
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).
In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.
Committing this change will significantly reduce our merge conflicts
for each upstream merge.
Reviewers: spatel, bogner
Reviewed By: bogner
Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70917
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
object file size.
- Incremental step towards decoupling target intrinsics.
The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.
Part of PR34259
Reviewers: efriedma, echristo, MaskRay
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D71320
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.
Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.
Split off from D71320
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D71381
PowerPC has instruction to do the semantics of this piece of code:
vector int foo(vector int m, vector int n) {
return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.
Differential Revision: https://reviews.llvm.org/D71002
Summary:
Following on from rG884351547da2, this patch cleans up the logic for `xxpermdi`
peephole optimizations by converting two layers of nested `if`s to early breaks
and simplifying the logic.
Reviewers: hfinkel, nemanjai, jsji, lkail, #powerpc, steven.zhang
Reviewed By: #powerpc, steven.zhang
Subscribers: wuzish, steven.zhang, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71170
Patch by vddvss (Colin Samples).
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.
This patch rename them to be consistent with others.
Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg
Reviewed By: jhibbits
Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70928
Refactor FinishCall to be more easily understandable as a precursor to
implementing indirect calls for AIX. The refactor tries to group similar
code together at the cost of some code duplication. The high level
overview of the refactor:
- Adds a number of helper functions for things like:
* Determining if a call is indirect.
* What the Opcode for a call is.
* Transforming the callee for a direct function call.
* Extracting the Chain operand from a CallSeqStart node.
* Building the operands of the call.
- Adds helpers for building the indirect call DAG nodes
(excluding the call instruction itself which is created in
`FinishCall`).
- Removes PrepareCall, which has been subsumed by the
helpers.
- Rename 'InFlag' to 'Glue'.
- FinishCall has been refactored to:
1) Set TOC pointer usage on the DAG for the TOC based
subtargets.
2) Calculate if a call is indirect.
3) Determine the Opcode to use for the call
instruction.
4) Transform the Callee for direct calls, or build
the DAG nodes for indirect calls.
5) Buildup the call operands.
6) Emit the call instruction.
7) If needed, emit the callSeqEnd Node and
finish lowering by calling `LowerCallResult`
Differential Revision: https://reviews.llvm.org/D70126
This attempts to teach the cost model in Arm that code such as:
%s = shl i32 %a, 3
%a = and i32 %s, %b
Can under Arm or Thumb2 become:
and r0, r1, r2, lsl #3
So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.
We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.
Differential Revision: https://reviews.llvm.org/D70966
Summary:
This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap.
Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered.
This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register.
Here is an example in pseudo-MIR of what happens in the test cased added in this patch:
Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1
$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64
%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1
%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```
After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```
As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat.
Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497
Summary: Previously we only handled the case where the csect hadn't been set up yet, so we'd hit an assert later on.
Reviewers: jasonliu, DiggerLin, stevewan
Reviewed By: jasonliu
Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71032
Summary:
Implement emitTCEntry for PPCTargetXCOFFStreamer.
Add TC csects to TOCCsects for object file writing.
Note:
1. I did not include any raw data testing for this object file generation
because TC entries raw data will all be 0 without relocation implemented.
I will add raw data testing as part of relocation testing later.
2. I removed "Symbol->setFragment(F);" for common symbols because we
don't need it, and if we have it then we would hit assertions below:
Assertion `(SymbolContents == SymContentsUnset ||
SymbolContents == SymContentsOffset) &&
"Cannot get offset for a common/variable symbol"' failed.
3.Fixed incorrect TOC-base alignment.
Differential Revision: https://reviews.llvm.org/D70798
The old processor design assume that, all the old processor's feature must be
inherited into future processor. That is not true as instruction fusion or some
implementation defined features are not inheritable.
What this patch did:
* Rename the old "specific features" to "additional features" that keep the new added inheritable features.
* Use the "specific features" to keep those features only for specific processor.
* Add the "inheritable features" to keep all the features that inherited from early processor.
Differential Revision: https://reviews.llvm.org/D70768
When converting reg+reg shifts to reg+imm rotates, we neglect to consider the
CodeGenOnly versions of the 32-bit shift mnemonics. This means we produce a
rotate with missing operands which causes a crash.
Committing this fix without review since it is non-controversial that the list
of mnemonics to consider should include the 64-bit aliases for the exact
mnemonics.
Fixes PR44183.
This patch adds LowerFormalArguments_AIX, support is added for lowering
int, float, and double formal arguments into general purpose and
floating point registers only.
The aix calling convention testcase have been redone to test for caller
and callee functionality in the same lit test.
Patch by Zarko Todorovski!
Differential Revision: https://reviews.llvm.org/D69578
Summary:
Emit the correct .toc psuedo op when we change to the TOC and emit
TC entries. Make sure TOC psuedos get the right symbols via overriding
getMCSymbolForTOCPseudoMO on AIX. Add a test for TOC assembly writing
and update tests to include TOC entries.
Also make sure external globals have a csect set and handle external function descriptor (originally authored by Jason Liu) so we can emit TOC entries for them.
Reviewers: DiggerLin, sfertile, Xiangling_L, jasonliu, hubert.reinterpretcast
Reviewed By: jasonliu
Subscribers: arphaman, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70461
The Power 9 CPU has some features that are unlikely to be passed on to future
versions of the CPU. This patch separates this out so that future CPU does not
inherit them.
Differential Revision: https://reviews.llvm.org/D70466
This is a continuation of D70262
The previous patch as listed above added the future CPU in clang. This patch
adds the future CPU in the PowerPC backend. At this point the patch simply
assumes that a future CPU will have the same characteristics as pwr9. Those
characteristics may change with later patches.
Differential Revision: https://reviews.llvm.org/D70333
Summary:
This is NFC code clean work after D67088. In that patch, we extend loop instructions prep for ds/dq form.
This patch only changes the file name PPCLoopPreIncPrep.cpp to PPCLoopInstrFormPrep.cpp for better reviewing of the content change of file PPCLoopInstrFormPrep.cpp.
Reviewers: #powerpc, nemanjai, steven.zhang, shchenz
Reviewed By: #powerpc, shchenz
Subscribers: wuzish, mgorny, hiraditya, kbarton, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70716
Summary:
This patch renames the DarwinDirective (used to identify which CPU was defined)
to CPUDirective. It also adds the getCPUDirective() method and replaces all uses
of getDarwinDirective() with getCPUDirective().
Once this patch lands and downstream users of the getDarwinDirective() method
have switched to the getCPUDirective() method, the old getDarwinDirective()
method will be removed.
Reviewers: nemanjai, hfinkel, power-llvm-team, jsji, echristo, #powerpc, jhibbits
Reviewed By: hfinkel, jsji, jhibbits
Subscribers: hiraditya, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70352
If an inline asm statement clobbers a VSX register that overlaps with a
callee-saved Altivec register or FPR, we will not record the clobber and will
therefore violate the ABI. This is clearly a bug so this patch fixes it.
Differential revision: https://reviews.llvm.org/D68576
Summary:
This patch sets up the infrastructure for
1. Associate MCSymbolXCOFF with an MCSectionXCOFF when it could not
get implicitly associated.
2. Generate undefined symbols. The patch itself generates undefined symbol
for external function call only. Generate undefined symbol for external
global variable and external function descriptors will be handled in
separate patch(s) after this is land.
Differential Revision: https://reviews.llvm.org/D70443
This patch aims to spill CR[0-7]LT bits on POWER9 using the setb instruction.
The sequence on P9 to spill these bits will be:
setb %reg, %CRREG
stw %reg, $FI
Instead of the typical sequence:
mfocrf %reg, %CRREG
rlwinm %reg1, %reg, $SH, 0, 0
stw %reg1, $FI
Differential Revision: https://reviews.llvm.org/D68443
Power9 has instructions to implement the semantics of SIGN_EXTEND_INREG for vector type.
Mark it as legal and add the match pattern.
Differential Revision: https://reviews.llvm.org/D69601
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
This patch lowering jump table, constant pool and block address in assembly.
1. On AIX, jump table index is always relative;
2. Put CPI and JTI into ReadOnlySection until we support unique data sections;
3. Create the temp symbol for block address symbol;
4. Update MIR testcases and add related assembly part;
Differential Revision: https://reviews.llvm.org/D70243
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.