When doing the float to int conversion the strict conversion also needs to
retun a chain. This patch fixes that.
Reviewed By: nemanjai, #powerpc, qiucf
Differential Revision: https://reviews.llvm.org/D117464
FAST-ISEL should fall back to DAG-ISEL when a global variable has the
toc-data attribute. A number of the checks were duplicated in the lit
test becuase of
1) Slightly different output between -O0 and -O2 due to FAST-ISEL vs
DAG-ISEL codegen.
2) In preperation of a peephole optimization that will run when
optimizations are enabled.
Differential Revision: https://reviews.llvm.org/D115373
User errors should use reportError. reportError allows us to continue parsing
the file and collect more diagnostics.
While here, make the diagnostic follow convention, merge tests, and test
line/column numbers.
As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we
don't want to sink the save point past an INLINEASM_BR, otherwise
prologepilog may incorrectly sink a prolog past the MBB containing an
INLINEASM_BR and into the wrong MBB.
ShrinkWrap is getting this wrong because LR is not in the list of callee
saved registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls
RegisterClassInfo::getLastCalleeSavedAlias which reads
CalleeSavedAliases which was populated by
RegisterClassInfo::runOnMachineFunction by iterating the list of
MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs.
Because PPC's LR is non-allocatable, it's NOT considered callee saved.
Add an interface to TargetRegisterInfo for such a case and use it in
Shrinkwrap to ensure we don't sink a prolog past an INLINEASM or
INLINEASM_BR that clobbers LR.
Reviewed By: jyknight, efriedma, nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D116424
Add support for Return Oriented Programming (ROP) protection for 32 bit.
This patch also adds a testing for AIX on both 64 and 32 bit.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D111362
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.
This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.
This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).
Differential Revision: https://reviews.llvm.org/D116031
The current code makes the assumption that equality
comparison can be performed with a word comparison
instruction. While this is true if the entire 64-bit
results are used, it does not generally work. It is
possible that the low order words and high order
words produce different results and a user of only
one will get the wrong result.
This patch adds an and of the result words so that
each word has the result of the comparison of the
entire doubleword that contains it.
Differential revision: https://reviews.llvm.org/D115678
Commit 150681f increases
cost of producing MMA types (vector pair and quad).
However, it increases the cost for getUserCost() which is
used in unrolling. As a result, loops that contain these
types already (from the user code) cannot be unrolled
(even with the user's unroll pragma). This was an unintended
sideeffect. Reverting that portion of the commit to allow
unrolling such loops.
Differential revision: https://reviews.llvm.org/D115424
Summary: When disassembling, symbolize a branch target operand
to print a label instead of a real address.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D114492
This mnemonic has been supported by GAS for years and
it was added to the PowerPC ISA as of ISA 3.1. We will
support the mnemonic to be compatible with GAS.
There are two signatures of setSpecialOperandAttr in TargetInstrInfo.
One of them is only called from PPCInstrInfo which has an override
of it.
Remove it from TargetInstrInfo and make it a non-virtual method in
PPCInstrInfo.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D115404
Now we won't copy the byval parameter (bigger than 8 bytes) to
caller's parameter save area. Instead, we will only copy the
byval parameter when it can not be passed entirely in registers
which means we have to use parameter save area according to the
64 bit SVR4 ABI.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D111485
Implement 'back-to-back' FX fusion according to Power10 User Manual
'19.1.5.4 Fusion', not enabled by default.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114345
The Power ISA defined l[bhwdq]arx as both base and
extended mnemonics. The base mnemonic takes the EH
bit as an operand and the extended mnemonic omits
it, making it implicitly zero. The existing
implementation only handles the base mnemonic when
EH is 1 and internally produces a different
instruction. There are historical reasons for this.
This patch simply removes the limitation introduced
by this implementation that disallows the base
mnemonic with EH = 0 in the ASM parser.
This resolves an issue that prevented some files
in the Linux kernel from being built with
-fintegrated-as.
Also fix a crash if the value is not an integer immediate.
The load/store infrastructure previously made an incorrect assumption that
whenever it is used with a load/store intrinsic on Power10 - those intrinsics
would automatically be the lxvp/stxvp intrinsics introduced in Power10.
However, this is obviously not the case as there are multiple instances of
pre-P10 intrinsics that use the refactored load/store implementation.
This patch corrects this assumption, and produces the expected intrinsic on pre-P10.
Differential Revision: https://reviews.llvm.org/D114978
The patch expands the existing 32-bit toc-data attribute support to 64-bit.
In both 32-bit and 64-bit it is supported for small code model only.
Differential Revision: https://reviews.llvm.org/D114654
A big-endian version of vpermxor, named vpermxor_be, is added to LLVM
and Clang. vpermxor_be can be called directly on both the little-endian
and the big-endian platforms.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D114540
This patch prevents the optimizer from producing wide vectors in the IR,
specifically the MMA types (v256i1, v512i1). The idea is that on Power, we only
want to be producing these types only if the vector_pair and vector_quad types
are used specifically.
To prevent the optimizer from producing these types in the IR,
vectorCostAdjustmentFactor() is updated to return an instruction cost factor or
an invalid instruction cost if the current type is that of an MMA type. An
invalid instruction cost returned by this function signifies to other cost
computing functions to return the maximum instruction cost to inform the
optimizer that producing these types within the IR is expensive, and should not
be produced in the first place.
This issue was first seen in the test case included within this patch.
Differential Revision: https://reviews.llvm.org/D113900
The XL implementation of vec_round for vector double uses
"round-to-nearest, ties to even" just as the vector float
`version does. However clang and gcc use "round-to-nearest-away"
for vector double and "round-to-nearest, ties to even"
for vector float.
The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__
macro similarly to other instances of incompatibility.
Differential revision: https://reviews.llvm.org/D113642
Similarly to what GCC does, we should allow scalars with
the "v" constraint rather than introducing unnecessary
new constraints for scalars in Altivec registers.
Differential revision: https://reviews.llvm.org/D113635
Support for builtins that use bcdadd./bcdsub. to add/subtract
Binary Coded Decimal values as well as to determine validity
and compare BCD values.
Differential revision: https://reviews.llvm.org/D114088
This implements the rest of Power10 instruction fusion pairs, according
to user manual, including 'wide immediate', 'load compare', 'zero move'
and 'SHA3 assist'.
Only 'SHA3 assist' is enabled by default.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D112912
`asm` always has AT&T-style input (`asm inteldialect` has Intel-style asm
input), so EmitGCCInlineAsmStr() always has to pick the same variant since it
cares about the input asm string, not the output asm string.
For PowerPC, that default variant is 1. For other targets, it's 0.
Without this, the included test case errors out with
error: unknown use of instruction mnemonic without a size suffix
mov rax, rbx
since it picks the intel branch and then tries to interpret it as AT&T
when selecting intel-style output with `-x86-asm-syntax=intel`.
Differential Revision: https://reviews.llvm.org/D113894
This patch adds PPC back end optimization to analyze the arguments of a
conditional trap instruction to execute one of the following:
1. Delete it if never trap
2. Replace it if always trap
3. Otherwise keep it
Reviewed By: nemanjai, amyk, PowerPC
Differential revision: https://reviews.llvm.org/D111434
The platform independent ISD::INSERT_VECTOR_ELT take a element index,
but vins* instructions take a byte index. Update 32bit td patterns for
vector insert to handle the element index accordingly.
Since vector insert for non constant index are supported in
ISA3.1, there is no need to use platform specific ISD node,
PPCISD::VECINSERT. Update td pattern to directly use
ISD::INSERT_VECTOR_ELT instead.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D113802
This patch adds the backend optimization to match XL behavior for the two
builtins __tdw and __tw that when the second input argument is an immediate,
emitting tdi/twi instructions instead of td/tw.
Reviewed By: nemanjai, amyk, PowerPC
Differential revision: https://reviews.llvm.org/D112285
Currently, the floating point instructions that depend on
rounding mode are correctly marked in the PPC back end with
an implicit use of the RM register. Similarly, instructions
that explicitly define the register are marked with an
implicit def of the same register. So for the most part,
RM-using code won't be moved across RM-setting instructions.
However, calls are not marked as RM-setting instructions so
code can be moved across calls. This is generally desired,
but so is the ability to turn off this behaviour with an
appropriate option - and -frounding-math really should be
that option.
This patch provides a set of call instructions (for direct
and indirect calls) that are marked with an implicit def of
the RM register. These will be used for calls that are marked
with the strictfp attribute.
Differential revision: https://reviews.llvm.org/D111433
Add comments to explain why XXPERMDIs and XXPERMDI have different input register
classes, vsfrc for XXPERMDIs and vsrc for XXPERMDI.
This addresses the comments in abandoned patch D113178, we keep using `f0` instead
of using `vs0` for XXPERMDIs on purpose.
ppc_fp128 and fp128 are both 128-bit floating point types. However, we
can't do conversion between them now, since trunc/ext are not allowed
for same-size fp types.
This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and
llvm.convert.ppcf128.to.f128, to support such conversion.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D109421
Currently, FPSCR is not modeled, so in some early passes (such as
early-cse), the read/set intrinsics to FPSCR may get incorrect
simplification.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112380
Implement two builtins to pack/unpack IBM extended long double float,
according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D112055
Removed references to `sanity check` in `PPCBranchCoalescing.cpp` code comments.
No word substitution made in this case, as the comments and code following illustrated are
sufficient IMO.
Reviewed By: quinnp
Differential Revision: https://reviews.llvm.org/D112452
Add a new preparation pattern in PPCLoopInstFormPrep pass to reduce register
pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D108750
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....
Differential Revision: https://reviews.llvm.org/D111998
On AIX, the system assembler does not support the extended mnemonics
dcbtt and dcbtstt. This patch stops them from being emitted on
AIX and emits the base mnemonics instead, dcbt X, X, 16 and
dcbtstt X, X, 16 respectively.
Differential revision: https://reviews.llvm.org/D111258
This moves the registry higher in the LLVM library dependency stack.
Every client of the target registry needs to link against MC anyway to
actually use the target, so we might as well move this out of Support.
This allows us to ensure that Support doesn't have includes from MC/*.
Differential Revision: https://reviews.llvm.org/D111454
To better reflect the meaning of the now-disambiguated {GlobalValue,
GlobalAlias}::getBaseObject after breaking off GlobalIFunc::getResolverFunction
(D109792), the function is renamed to getAliaseeObject.
Lowering of byval parameters with sizes that are not represented by a single
store require multiple stores to properly address the correct size of the
parameter.
Sizes that cannot be done with a single store are 3 bytes, 5 bytes, 6 bytes,
7 bytes. It is not correct to simply perform an 8 byte store and for these
elements because then the store would be larger than the element and alias
analysis would assume that this is undefined behaivour and return NoAlias
for them.
This patch adds the correct stores so that the size of the store is not larger
than the size of the element.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D108795
The delayed stack protector feature which is currently used for SDAG (and thus
allows for more commonly generating tail calls) depends on being able to extract
the tail call into a separate return block. To do this it also has to extract
the vreg->physreg copies that set up the call's arguments, since if it doesn't
then the call inst ends up using undefined physregs in it's new spliced block.
SelectionDAG implementations can do this because they delay emitting register
copies until *after* the stack arguments are set up. GISel however just
processes and emits the arguments in IR order, so stack arguments always end up
last, and thus this breaks the code that looks for any register arg copies that
precede the call instruction.
This patch adds a thunk argument to the assignValueToReg() and custom assignment
hooks. For outgoing arguments, register assignments use this return param to
return a thunk that does the actual generating of the copies. We collect these
until all the outgoing stack assignments have been done and then execute them,
so that the copies (and perhaps some artifacts like G_SEXTs) are placed after
any stores.
Differential Revision: https://reviews.llvm.org/D110610
This patch fixes the return value of the builtin __builtin_ppc_load2r to
correctly return short instead of int.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110771
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.
Differential Revision: https://reviews.llvm.org/D110807
This patch removes the uneccessary mf/mtvsr generated in conjunction
with xscvdpsxws/xscvdpuxws.
Differential revision: https://reviews.llvm.org/D109902
This patch makes sure that the builtins __builtin_ppc_load8r and
__ builtin_ppc_store8r are only available for Power 7 and up.
Currently the builtins seem to produce incorrect code if used for
Power 6 or before.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110653
The instruction has similar semantics to vbpermq but for doublewords.
It was added in Power9 and the ABI documents the builtin.
Differential revision: https://reviews.llvm.org/D107899
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
This patch fixes the pattern for the P10 instructions Vector Shift Left
Double by Bit Immediate VN-form and Vector Shift Right Double by Bit
Immediate VN-form. The third argument should be a target constant (`timm`)
instead of an `i32` because an immediate is expected.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D109920
This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as
rematerializable to prevent MachineLICM from moving them out of loops.
Reviewed By: lei, amy
Differential revision: https://reviews.llvm.org/D108823
Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible.
This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up.
Differential Revision: https://reviews.llvm.org/D110242
This is a follow-up of D105872. Now we are able to prepare for update
form with non-const increment.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D106032
This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we
are dealing with a value with an offset that fits into a 34-bit signed immediate.
A reduced test case is also added to patch that tests the pattern, in which the
pattern is tested in the big endian CHECKs of the newly added test.
Differential Revision: https://reviews.llvm.org/D109887
This patch exploits the prefixed load and store instructions utilizing the
refactored load/store implementation introduced in D93370.
Prefixed load and store instructions are emitted whenever we are loading or
storing a value with an offset that fits into a 34-bit signed immediate.
Patterns for the prefixed load and stores are added in this patch, as well as
the implementation that detects when we are loading and storing a value with an
offset that fits in 34-bits.
Differential Revision: https://reviews.llvm.org/D96075
PPCLoopInstrFormPrep pass now can prepare for load store instructions
in a loop whose increment is not a constant integer.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D105872
This patch updates the PC-Relative load and store patterns to utilize the
refactored load/store implementation introduced in D93370.
PC-Relative implementation has been added to PPCISelLowering.cpp, and also the
patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity.
All existing test cases pass with this update.
Differential Revision: https://reviews.llvm.org/D95116
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D109535
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This patch adds a fix to do early if conversion to select when
conditional branch not using physical register to prevent the crash when
expanding ISEL instruction.
Reviewed By: lei, kamaub, PowerPC
Differential revision: https://reviews.llvm.org/D108302
This is exposed by enabling FastIsel on 64bit AIX.
We are generating XSRSP regardless of the arch,
which may be wrong when -mcpu=pwr7.
The fix is to guard the generation in P8 only.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D109365
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.
On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.
For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.
This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.
I've attempted to take into account the in tree experimental backends.
Differential Revision: https://reviews.llvm.org/D45962
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.
There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.
For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.
This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.
Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.
Differential Revision: https://reviews.llvm.org/D45961
This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compiling time improvement in a few LLVM
test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.)
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D98844
Similar to D108842, D108844, and D108926.
__has_builtin(builtin_mul_overflow) returns true for 32b PPC targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ppc44x_defconfig + CONFIG_BLK_DEV_NBD=y builds of the Linux
kernel that are using builtin_mul_overflow with these types for these
targets.
If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.
This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.
Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D108936
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.
This is a followup to D108076.
Differential Revision: https://reviews.llvm.org/D108875
PowerPC can model these instructions, so we don't need this flag set.
Reviewed By: shchenz, jsji
Differential Revision: https://reviews.llvm.org/D71983
After c063946476 usage of R31 doesn't necessarily mean
that alloca is used. The `TracebackTable::IsAllocaUsedMask` flag should be set only
when R31 is used as a frame pointer.
On AIX the `function calls alloca' bit seems to be set whenever R31 is
set up as a frame pointer, even when there is no alloca call.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D108141
This is the first step to enable PPC64 support huge frame size(>2G). Also fix an assertion error for frame size, i.e.,`int x; !isInt<32>(x);` should be always evaluated false, so the guard code for frame size is impossible to hit.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D107435
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D108010
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.
Differential Revision: https://reviews.llvm.org/D107914
Add builtin and intrinsic for `__addex`.
This patch is part of a series of patches to provide builtins for
compatibility with the XL compiler.
Reviewed By: stefanp, nemanjai, NeHuang
Differential Revision: https://reviews.llvm.org/D107002
When depth > 0, callee frame address is used to compute the return address of
callee producing improper return address. This patch adds the fix to use caller
frame address to compute the return address of callee.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D107646
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
Regsier hints when copying to a UACC register do not always produce VSRp
registers. This patch makes sure that we do not produce hints in cases
where the subregsiter of the UACC is not a VSRp.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D107101
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
All floating point values in registers are in double precision
representation. In order to materialize the correct single precision
value, we need to convert the APFloat that represents the value
to double precision first.
Reviewed By: amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D106812
Before MASSV only supported P8 and P9 on AIX ans Linux . This patch proposes
MASSV to add support of P7 and P10 only on AIX too.
Differential: https://reviews.llvm.org/D106678
Add td definitions and asm/disasm tests for the addex instruction introduced in
ISA 3.0.
Reviewed By: nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D106666
This is a followup patch for D105930 to add implicit-def of RM for
mtfsb[01] instructions as per review comments.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D106603
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtin and intrinsic for "__stbcx".
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D106484
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations
Reviewed By: #powerpc, nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D103986
Implemented builtins for mtmsr, mfspr, mtspr on PowerPC;
the patch is intended for XL Compatibility.
Differential revision: https://reviews.llvm.org/D106130
This patch implements store, load, move from and to registers related
builtins, as well as the builtin for stfiw. The patch aims to provide
feature parady with xlC on AIX.
Differential revision: https://reviews.llvm.org/D105946
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D105194
ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.
This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.
This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105854
This provides intrinsics for emitting instructions that set the FPSCR (`mtfsf/mtfsfi`).
The patch also conservatively marks the rounding mode as an implicit def for both since they both may set the rounding mode depending on the operands.
Reviewed By: #powerpc, qiucf
Differential Revision: https://reviews.llvm.org/D105957
Implement a subset of builtins required for compatiblilty with AIX XL compiler.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D105930
This patch includes the following updates to the load/store refactoring effort introduced in D93370:
- Update various VSX patterns that use to "force" an XForm, to instead just XForm.
This allows the ability for the patterns to compute the most optimal addressing
mode (and to produce a DForm instruction when possible)
- Update pattern and test case for the LXVD2X/STXVD2X intrinsics
- Update LIT test cases that use to use the XForm instruction to use the DForm instruction
Differential Revision: https://reviews.llvm.org/D95115
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and instrisics for population
count, reversed load and store related operations.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D106021
This patch implements the `__popcntb` XL compatibility builtin for 32bit in the frontend and backend. This patch also updates tests for `__popcntb` and other XL Compat sync related builtins.
Reviewed By: #powerpc, nemanjai, amyk
Differential Revision: https://reviews.llvm.org/D105360
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expand atomic operations post RA to avoid spilling that might prevent LL/SC progress.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D103614
$ is used as PC for PowerPC inlineasm, ELF use it,
enable it for AIX XCOFF as well.
Reviewed By: #powerpc, amyk, nemanjai
Differential Revision: https://reviews.llvm.org/D105956
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtins and instrisics for compare
and multiply related operations.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D102875
[NFC] This patch adds features for pwr7, pwr8, and pwr9 that can be
used for semachecking builtin functions that are only valid for certain
versions of ppc.
Reviewed By: nemanjai, #powerpc
Authored By: Quinn Pham <Quinn.Pham@ibm.com>
Differential revision: https://reviews.llvm.org/D105501
This patch adds a function that checks whether or not the frame index
is aligned when the computed addressing mode is an aligned D-Form (DS, or DQ-Form).
If the frame index appears to be unaligned, within these two modes, reset
the mode to X-Form in order to fall back to selection X-Form loads.
A test case is added to ensure that the test emits X-Form loads and not DQ-Form
loads since the frame index is not aligned within the test case.
Differential Revision: https://reviews.llvm.org/D105661
LDARX and LWARX sometimes gets optimized out by the compiler
when it is critical to the correctness of the code. This inline asm generation
ensures that it preserved.
Differential Revision: https://reviews.llvm.org/D105754
[NFC] This patch adds features for pwr7, pwr8, and pwr9 that can be
used for semachecking builtin functions that are only valid for certain
versions of ppc.
Reviewed By: nemanjai, #powerpc
Authored By: Quinn Pham <Quinn.Pham@ibm.com>
Differential revision: https://reviews.llvm.org/D105501
An assertion of the following can occur because Altivec and VSX splats use a different operand number for the immediate:
```
int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
```
This patch updates PPCMIPeephole.cpp assign the correct splat immediate.
Differential Revision: https://reviews.llvm.org/D105790
The lowering for v2i64 is now guarded with hasDirectMove,
however, the current lowering can handle the pattern correctly,
only lowering it when there is efficient patterns and corresponding
instructions.
The original guard was added in D21135, and was for Legal action.
The code has evloved now, this guard is not necessary anymore.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D105596
This patch implements trap and FP to and from double conversions. The builtins
generate code that mirror what is generated from the XL compiler. Intrinsics
are named conventionally with builtin_ppc, but are aliased to provide the same
builtin names as the XL compiler.
Differential Revision: https://reviews.llvm.org/D103668
Summary:
The bit order of the has_vec and longtbtable bits in the traceback table generated by the XL compiler flipped at some point after v12.1. This is different from the definition is the AIX header debug.h. The change in the XL compiler that caused the deviation from the OS header definition was unintentional. Since both orderings are extant and the XL compiler runtime also expects the ordering defined by the OS, we will correct the output from LLVM to match the defined ordering given by the OS (which is also consistent with the Assembler Language Reference). Mitigation for traceback tables encoded with the wrong ordering is required for either ordering.
Reviewers: XingXue, HubertTong
Differential Revision: https://reviews.llvm.org/D105487
When the instruction has imm form and fed by LI, we can remove the redundat LI instruction.
Below is an example:
```
renamable $x5 = LI8 2
renamable $x4 = exact SRD killed renamable $x4, killed renamable $r5, implicit $x5
```
will be converted to:
```
renamable $x5 = LI8 2
renamable $x4 = exact RLDICL killed renamable $x4, 62, 2, implicit killed $x5
```
But when we do this optimization, we forget to remove implicit killed $x5
This bug has caused a lnt case error. This patch is to fix above bug.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D85288
SelectionDAG's equivalents in ISD::InputArg/OutputArg track the
original argument index. Mips relies on this, and its currently
reinventing its own parallel CallLowering infrastructure which tracks
these indexes on the side. Add this to help move towards deleting the
custom mips handling.
Lowering for scalar to vector would skip if current subtarget is big
endian and the scalar is larger or equal than 64 bits. However there's
some issue in implementation that SToVRHS may refer to SToVLHS's scalar
size if SToVLHS is present, which leads to some crash.o
Reviewed By: nemanjai, shchenz
Differential Revision: https://reviews.llvm.org/D105094
There are some patterns involving the permuted scalar to vector node
for which we don't have patterns without direct moves on little endian
subtargets. This causes selection errors. While we can of course add
the missing patterns, any additional effort to make this work is not
useful since there is no support for any CPU that can run in
little endian mode and does not support direct moves.
Adding usage of VSSRC and VSFRC when adding the live in registers on AIX.
This matches the behaviour of the rest of PPC Subtargets.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D104396
The combine was disabled in 4e22c7265d as it caused failures in
the ppc64be-multistage (bootstrap) bot.
It turns out that the combine did not correctly update the MMO for
the high load which caused aliased stores to be reported as unaliased.
This patch fixes that problem and re-enables the combine.
This patch implaments the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parody with
xlC on AIX.
Differential revision: https://reviews.llvm.org/D105236
Allocate non-volatile registers in order to be compatible with ABI, regarding gpr_save.
Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf page55,
> The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers
> in descending order, starting with GPR31.
This patch is based on @jsji 's initial draft.
Tested on test-suite and SPEC, found no degradation.
Reviewed By: jsji, ZarkoCA, xingxue
Differential Revision: https://reviews.llvm.org/D100167
This patch changes return type of tryCandidate from void to bool:
1. Methods in some targets already follow this convention.
2. This would help if some target wants to re-use generic code.
3. It looks more intuitive if these try-method returns the same type.
We may need to change return type of them from bool to some enum
further, to make it less confusing.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D103951
Summary:
in the patch https://reviews.llvm.org/D103651 [AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table.
when generate eh_info, it switch to other section, when it done, it need to switch back to text section again.
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/105195
On PowerPC, VSRpRC represents the pairs of even and odd VSX register,
and VRRC corresponds to higher 32 VSX registers. In some cases, extra
copies are produced when handling incoming VRRC arguments with VSRpRC.
This patch changes allocation order of VSRpRC to eliminate this kind of
copy.
Stack frame sizes may increase if allocating non-volatile registers, and
some other vector copies happen. They need fix in future changes.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D104855
Commit 0464586ac5 added a combine
for a 64-bit load feeding a bswap but the implementation is only
correct for little endian systems.
This fixes it for big endian systems.
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
When targeting CPUs that don't have LDBRX, we end up producing code that is
very inefficient and large for this common idiom. This patch just
optimizes it two 32-bit LWBRX instructions along with a merge.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49610
Differential revision: https://reviews.llvm.org/D104836
Summary:
generate eh_info when vector registers are saved according to the traceback table.
struct eh_info_t {
unsigned version; /* EH info version 0 */
#if defined(64BIT)
char _pad[4]; /* padding */
#endif
unsigned long lsda; /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};
the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function
Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)
This causes some confusion when you arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
We have added STXVP/LXVP for spilling and restoring the registers
but we neglected to add FI elimination code for these. The result
is that we end up producing impossible MachineInstr's that have
register operands in place of immediates.
Pointee types are going away soon.
For this, we mostly just care about store/load types, which are already
available without the pointee types. The other intrinsics always use
i8*.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103719
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since these instructions require even-odd register pair.
Reviewed By: nemanjai, jsji, #powerpc
Differential Revision: https://reviews.llvm.org/D103010
GCC documentation for the `wa` constraint states that:
```
wa
A VSX register (VSR), vs0…vs63. This is either an FPR (vs0…vs31 are f0…f31)
or a VR (vs32…vs63 are v0…v31).
```
This technically means that we could accept floating point parameters. In fact,
gcc itself does. The following testcase compiles and runs on all PPC platforms with GCC,
whereas clang/llc will assert:
```
#include <stdio.h>
double foo ( vector double a ) {
double b, c;
asm("xvabsdp %x0, %x2 \n"
"xxsldwi %x1, %x0, %x0, 2 \n"
: "+wa" (b),
"=wa" (c)
: "wa" (a)
);
return b+c;
}
int main(void) {
vector double a = {-3., -4.};
double t = foo( a );
printf("%g\n", t);
}
```
This patch allows clang/llc to build and run this testcase.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D103409
Relaxing superclass constraint for VSX register classes helps reducing
32-byte spills and copies when register pressure is high.
In test case affected, some of them introduces more copies due to new
allocation order. However, this patch should not be the root cause, and
we may be able to fix it in other places of register allocation.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D104006
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D103202
When `-fstack-clash-protection` is enabled and stack has to be realigned, some parts of redzone is written prior the probe, so probe might overwrite content already written in redzone. To avoid it, we have to make sure the first probe is at full probe size or is the last probe so that we can skip redzone.
It also fixes violation of ABI under PPC where `r1` isn't updated atomically.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49903.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D100290
According to ELF V2 ABI, `0` should be the dwarf number of `r0`. Currently MMA's register also uses `0` as its dwarf number, this confuses `RegisterInfoEmitter` and generates wrong dwarf -> llvm mapping.
```
extern const MCRegisterInfo::DwarfLLVMRegPair PPCDwarfFlavour1Dwarf2L[] = {
{ 0U, PPC::VSRp31 },
```
This leads to wrong cfi output in https://reviews.llvm.org/D100290.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D103761
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.
Differential Revision: https://reviews.llvm.org/D103759
It's still in use in a few places so we can't delete it yet but there's not
many at this point.
Differential Revision: https://reviews.llvm.org/D103352
The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782