This patch removes the uneccessary mf/mtvsr generated in conjunction
with xscvdpsxws/xscvdpuxws.
Differential revision: https://reviews.llvm.org/D109902
This patch makes sure that the builtins __builtin_ppc_load8r and
__ builtin_ppc_store8r are only available for Power 7 and up.
Currently the builtins seem to produce incorrect code if used for
Power 6 or before.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D110653
The instruction has similar semantics to vbpermq but for doublewords.
It was added in Power9 and the ABI documents the builtin.
Differential revision: https://reviews.llvm.org/D107899
This patch is in a series of patches to provide builtins for
compatability with the XL compiler. This patch adds builtins for compare
exponent and test data class operations on floating point values.
Reviewed By: #powerpc, lei
Differential Revision: https://reviews.llvm.org/D109437
This patch fixes the pattern for the P10 instructions Vector Shift Left
Double by Bit Immediate VN-form and Vector Shift Right Double by Bit
Immediate VN-form. The third argument should be a target constant (`timm`)
instead of an `i32` because an immediate is expected.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D109920
This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as
rematerializable to prevent MachineLICM from moving them out of loops.
Reviewed By: lei, amy
Differential revision: https://reviews.llvm.org/D108823
Based off a discussion on D110100, we should be avoiding default CostKinds whenever possible.
This initial patch removes them from the 'inner' target implementation callbacks - these should only be used by the main TTI calls, so this should guarantee that we don't cause changes in CostKind by missing it in an inner call. This exposed a few missing arguments in getGEPCost and reduction cost calls that I've cleaned up.
Differential Revision: https://reviews.llvm.org/D110242
This is a follow-up of D105872. Now we are able to prepare for update
form with non-const increment.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D106032
This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we
are dealing with a value with an offset that fits into a 34-bit signed immediate.
A reduced test case is also added to patch that tests the pattern, in which the
pattern is tested in the big endian CHECKs of the newly added test.
Differential Revision: https://reviews.llvm.org/D109887
This patch exploits the prefixed load and store instructions utilizing the
refactored load/store implementation introduced in D93370.
Prefixed load and store instructions are emitted whenever we are loading or
storing a value with an offset that fits into a 34-bit signed immediate.
Patterns for the prefixed load and stores are added in this patch, as well as
the implementation that detects when we are loading and storing a value with an
offset that fits in 34-bits.
Differential Revision: https://reviews.llvm.org/D96075
PPCLoopInstrFormPrep pass now can prepare for load store instructions
in a loop whose increment is not a constant integer.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D105872
This patch updates the PC-Relative load and store patterns to utilize the
refactored load/store implementation introduced in D93370.
PC-Relative implementation has been added to PPCISelLowering.cpp, and also the
patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity.
All existing test cases pass with this update.
Differential Revision: https://reviews.llvm.org/D95116
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D109535
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This patch adds a fix to do early if conversion to select when
conditional branch not using physical register to prevent the crash when
expanding ISEL instruction.
Reviewed By: lei, kamaub, PowerPC
Differential revision: https://reviews.llvm.org/D108302
This is exposed by enabling FastIsel on 64bit AIX.
We are generating XSRSP regardless of the arch,
which may be wrong when -mcpu=pwr7.
The fix is to guard the generation in P8 only.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D109365
On some architectures such as Arm and X86 the encoding for a nop may
change depending on the subtarget in operation at the time of
encoding. This change replaces the per module MCSubtargetInfo retained
by the targets AsmBackend in favour of passing through the local
MCSubtargetInfo in operation at the time.
On Arm using the architectural NOP instruction can have a performance
benefit on some implementations.
For Arm I've deleted the copy of the AsmBackend's MCSubtargetInfo to
limit the chances of this causing problems in the future. I've not
done this for other targets such as X86 as there is more frequent use
of the MCSubtargetInfo and it looks to be for stable properties that
we would not expect to vary per function.
This change required threading STI through MCNopsFragment and
MCBoundaryAlignFragment.
I've attempted to take into account the in tree experimental backends.
Differential Revision: https://reviews.llvm.org/D45962
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.
There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.
For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.
This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.
Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.
Differential Revision: https://reviews.llvm.org/D45961
This patch basically enables fast-isel for AIX 64-bit subtarget
(previously enabled only for ELF 64). The initial motivation is to
introduce branch folding to AIX generated code for correct debug
behavior. I also saw some compiling time improvement in a few LLVM
test-suite benchmarks. (toast, dbms, cjpeg, burg, etc.)
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D98844
Similar to D108842, D108844, and D108926.
__has_builtin(builtin_mul_overflow) returns true for 32b PPC targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ppc44x_defconfig + CONFIG_BLK_DEV_NBD=y builds of the Linux
kernel that are using builtin_mul_overflow with these types for these
targets.
If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.
This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.
Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D108936
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.
This is a followup to D108076.
Differential Revision: https://reviews.llvm.org/D108875
PowerPC can model these instructions, so we don't need this flag set.
Reviewed By: shchenz, jsji
Differential Revision: https://reviews.llvm.org/D71983
After c063946476 usage of R31 doesn't necessarily mean
that alloca is used. The `TracebackTable::IsAllocaUsedMask` flag should be set only
when R31 is used as a frame pointer.
On AIX the `function calls alloca' bit seems to be set whenever R31 is
set up as a frame pointer, even when there is no alloca call.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D108141
This is the first step to enable PPC64 support huge frame size(>2G). Also fix an assertion error for frame size, i.e.,`int x; !isInt<32>(x);` should be always evaluated false, so the guard code for frame size is impossible to hit.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D107435
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That
should be enabled for all targets.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D108010
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.
Differential Revision: https://reviews.llvm.org/D107914
Add builtin and intrinsic for `__addex`.
This patch is part of a series of patches to provide builtins for
compatibility with the XL compiler.
Reviewed By: stefanp, nemanjai, NeHuang
Differential Revision: https://reviews.llvm.org/D107002
When depth > 0, callee frame address is used to compute the return address of
callee producing improper return address. This patch adds the fix to use caller
frame address to compute the return address of callee.
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D107646
Some files still contained the old University of Illinois Open Source
Licence header. This patch replaces that with the Apache 2 with LLVM
Exception licence.
Differential Revision: https://reviews.llvm.org/D107528
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D107271
Regsier hints when copying to a UACC register do not always produce VSRp
registers. This patch makes sure that we do not produce hints in cases
where the subregsiter of the UACC is not a VSRp.
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D107101
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
All floating point values in registers are in double precision
representation. In order to materialize the correct single precision
value, we need to convert the APFloat that represents the value
to double precision first.
Reviewed By: amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D106812
Before MASSV only supported P8 and P9 on AIX ans Linux . This patch proposes
MASSV to add support of P7 and P10 only on AIX too.
Differential: https://reviews.llvm.org/D106678
Add td definitions and asm/disasm tests for the addex instruction introduced in
ISA 3.0.
Reviewed By: nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D106666
This is a followup patch for D105930 to add implicit-def of RM for
mtfsb[01] instructions as per review comments.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D106603
This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch adds the builtin and intrinsic for "__stbcx".
Reviewed By: nemanjai, #powerpc
Differential revision: https://reviews.llvm.org/D106484
This patch is in a series of patches to provide
builtins for compatibility with the XL compiler.
This patch adds builtins related to floating point
operations
Reviewed By: #powerpc, nemanjai, amyk, NeHuang
Differential Revision: https://reviews.llvm.org/D103986