This patch updates two patterns involving `scalar_to_vector` and
`SCALAR_TO_VECTOR_PERMUTED` nodes to be safe for both 64-bit and 32-bit by
pulling the patterns out of the 64-bit specific guard. These patterns are
matched on POWER8 and above.
Differential Revision: https://reviews.llvm.org/D125389
On Power PC we have ISA3.0 for Power 9, ISA3.1 for Power 10.
This patchs adds an ISA for mcpu=future. The idea is to have a placeholder ISA
for work that is experimental and may not be supported by existing ISAs.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D126075
This commit resolves a Linux kernel build failure that was revealed by
c35ca3a1c7. The patch introduces two new
intrinsics, which ultimately changes the intrinsic numbering of other PPC
intrinsics. This causes an issue introduced by
ff40fb07ad, as the patch checks for intrinsics
with particular values, but the addition of the fnabs/fnabss intrinsics updates
the original sqrt/sdiv intrinsic values.
This patch implements the following floating point negative absolute value
builtins that required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit :
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
Differential Revision: https://reviews.llvm.org/D125506
This fixes bug 55463, similar to D78668. This is a temporary fix since
we will switch to post-isel CTR loop determination in the future.
Reviewed By: dim, shchenz
Differential Revision: https://reviews.llvm.org/D125746
The name `MCFixedLenDisassembler.h` is out of date after D120958.
Rename it as `MCDecoderOps.h` to reflect the change.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D124987
If the mask is made up of elements that form a mask in the higher type
we can convert shuffle(bitcast into the bitcast type, simplifying the
instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.
The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.
Differential Revision: https://reviews.llvm.org/D123801
This patch turns on support for CR bit accesses for Power8 and above. The reason
why CR bits are turned on as the default for Power8 and above is that because
later architectures make use of builtins and instructions that require CR bit
accesses (such as the use of setbc in the vector string isolate predicate
and bcd builtins on Power10).
This patch also adds the clang portion to allow for turning on CR bits in the
front end if the user so desires to.
Differential Revision: https://reviews.llvm.org/D124060
Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode
does not need a TOC restore after the call. All call opcodes should be listed in
this function. A default unreachable in this function should force future call
opcodes to also be added.
This is a follow up patch to D122012
Reviewed By: jsji, shchenz
Differential Revision: https://reviews.llvm.org/D124415
For the AIX linker, under default options, global or weak symbols which
have no visibility bits set to zero (i.e. no visibility, similar to ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.
This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.
Reviewed By: DiggerLin
Differential Revision: https://reviews.llvm.org/D123951
Currently the regsiter operand definitions are found in three separate files.
This patch moves all of the definitions into PPCRegisterInfo.td.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D123543
Renamed the two classes 8LS_DForm_R_SI34_RTA5 and 8LS_DForm_R_SI34_XT6_RA5 to
8LS_DForm_R_SI34_RTA5_MEM and 8LS_DForm_R_SI34_XT6_RA5_MEM because the
instructions that use the classes use memory reads/writes.
Moved the instruction defs up closer to the classes.
Removed unnecessary whitespace.
This fixes CVE-2019-15847, preventing random number generation from
being merged.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122783
AtomicExpandPass uses this variable to determine emitting libcalls or not. The default value is 1024 and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_*` libcalls for those target unable to inline atomic ops and finally the backend emits `__sync_*` libcalls. Thanks @efriedma for pointing it out.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122868
Make 16-byte atomic type aligned to 16-byte on PPC64, thus consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D122377
Add support for builtin_[max|min] which has below prototype:
A builtin_max (A1, A2, A3, ...)
All arguments must have the same type; they must all be float, double, or long double.
Internally use SelectCC to get the result.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D122478
To store a byval parameter the existing code would store as many 8 byte elements
as was required to store the full size of the byval parameter.
For example, a paramter of size 16 would store two element of 8 bytes.
A paramter of size 12 would also store two elements of 8 bytes.
This would sometimes store too many bytes as the size of the paramter is not
always a factor of 8.
This patch fixes that issue and now byval paramters are stored with the correct
number of bytes.
Reviewed By: nemanjai, #powerpc, quinnp, amyk
Differential Revision: https://reviews.llvm.org/D121430
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.
Original patch by: Muhammad Usman
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117013
All LLVM backends use MCDisassembler as a base class for their
instruction decoders. Use "const MCDisassembler *" for the decoder
instead of "const void *". Remove unnecessary static casts.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D122245
The BL8_NOTOC_RM instruction was incorrectly producing a relocation that reqired
a TOC restore after the call. This patch fixes that issue and the notoc
relocation is now used.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D122012
Rename file for PPCCTRLoopsVerify pass from PPCCTRLoops.cpp
to PPCCTRLoopsVerify.cpp.
There will be a new file PPCCTRLoops.cpp for PPC CTR loops
generation later.
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.
Because libgcc doesn't have __muloti4, we block that as well.
Fixes#54460.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122090
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.
Part of an original patch by: Lei Huang
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117225
We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:
%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.
The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D121082
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.
This patch simply adds the missed patterns.
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.
But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.
This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
This wraps up from D119053. The 2 headers are moved as described,
fixed file headers and include guards, updated all files where the old
paths were detected (simple grep through the repo), and `clang-format`-ed it all.
Differential Revision: https://reviews.llvm.org/D119876
There are two MMA patterns that have been added twice. This patch just removes
one set of petterns. Should not change the way MMA behaves.
Reviewed By: lei, #powerpc
Differential Revision: https://reviews.llvm.org/D120680
Currently all of the MMA instructions as well as the MMA related register info
is bundled with the Power 10 instructions. This patch just splits them out.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D120515
Added the license info as well as description about how classes should be named
based on existing documentation.
Reviewed By: lei, #powerpc
Differential Revision: https://reviews.llvm.org/D120530
Add the Power 10 instruction LXVKQ.
This patch was taken from an original patch by: Yi-Hong Lyu
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D117507
The Linux kernel build uses absolute expressions suffixed with @lo/@ha
relocations. This currently doesn't work for DS/DQ form instructions and
there is no reason for it not to. It also works with GAS.
This patch allows this as long as the value is a multiple of 4/16
for DS/DQ form.
Differential revision: https://reviews.llvm.org/D115419
Perfect shuffle was introduced into PowerPC backend years ago, and only
available in big-endian subtargets. This optimization has good effects
in simple cases, but brings serious negative impact in large programs
with many shuffle instructions sharing the same mask.
Here introduces a temporary backend hidden option to control it until we
implemented better way to fix the gap in vectorshuffle decomposition.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120072
This patch updates the handling of vectors in getPreferredVectorAction():
For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.
The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.
```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```
Differential Revision: https://reviews.llvm.org/D119521
The LDMX instruction was to be potentially added in P9 but it was never added
in either ISA 3.0 or ISA 3.1. This patch removes that instruction as it is
currently still an invalid instruction.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D118074
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.
Reviewed By: nemanjai, shchenz, amyk
Differential Revision: https://reviews.llvm.org/D117006
There's a few relevant forward declarations in there that may require downstream
adding explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer include llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
Which is significant and backs up the change in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
For PGO on AIX, when we switch to the linux-style PGO variable access
(via _start and _stop labels), we need the compiler to generate a .ref
assembly for each of the three csects:
- __llvm_prf_data[RW]
- __llvm_prf_names[RO]
- __llvm_prf_vnds[RW]
We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it's
live then the 3 csects are live.
For example, for a testcase with at least one function definition, when
compiled with -fprofile-generate we should generate:
.csect __llvm_prf_cnts[RW],3
.ref __llvm_prf_data[RW] <<============ needs to be inserted
.ref __llvm_prf_names[RO] <<===========
the __llvm_prf_vnds is not always present, so we reference it only when
it's present.
Reviewed By: sfertile, daltenty
Differential Revision: https://reviews.llvm.org/D116607
This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.
Differential: https://reviews.llvm.org/D101759
Reviewer: bmahjour
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.
Furthermore, custom legalization is added for v4f32 on Power9 and above to not
only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.
Differential Revision: https://reviews.llvm.org/D115691
This patch updates how splat loads handled and is an extension of D106555.
Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only
non-extending loads. For v8i16/v16i8 types, they are updated to handle extending
loads only if the memory VT is the same vector element VT type.
A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. In this test, it depicts the following f64
extending load used in a v2f64 build vector, but the extending load is actually
used in more places other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
t0: ch = EntryToken
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t6: i64,ch = CopyFromReg t0, Register:i64 %2
t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
t16: f64 = fadd t31, t37
t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
t36: ch = TokenFactor t34, t37:1
t27: v2f64 = BUILD_VECTOR t37, t37
t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
t12: f64 = fadd t11, t37
t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```
Differential Revision: https://reviews.llvm.org/D117803
This reverts commit ef82063207.
- It conflicts with the existing llvm::size in STLExtras, which will now
never be called.
- Calling it without llvm:: breaks C++17 compat
In commit 1674d9b6b2, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely for eq, we need both words of the doubleword to equal so it
is an AND. OTOH for ne, we need either word to be unequal so it
is an OR.
According to GNU as documentation, PowerPC supports some .gnu_attribute
tags to represent the vector and float ABI type in the object file.
Some linkers like GNU ld respects the attribute and will prevent objects
with conflicting ABIs being linked.
This patch emits gnu_attribute value in assembly printer according to
the float-abi metadata. More attributes for soft-fp, hard single/double
and even vector ABI need to be supported in the future.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D117193
Instead use either Type::getPointerElementType() or
Type::getNonOpaquePointerElementType().
This is part of D117885, in preparation for deprecating the API.
This patch emits a warning when the stack pointer register (`R1`) is found in
the clobber list of an inline asm statement. Clobbering the stack pointer is
not supported.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D112073
This patch is the first step to enable support of GNU attribute in LLVM
PowerPC, enabling it for PowerPC targets, otherwise llvm-mc raises error
when seeing the attribute section.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D115854
When doing the float to int conversion the strict conversion also needs to
retun a chain. This patch fixes that.
Reviewed By: nemanjai, #powerpc, qiucf
Differential Revision: https://reviews.llvm.org/D117464
FAST-ISEL should fall back to DAG-ISEL when a global variable has the
toc-data attribute. A number of the checks were duplicated in the lit
test becuase of
1) Slightly different output between -O0 and -O2 due to FAST-ISEL vs
DAG-ISEL codegen.
2) In preperation of a peephole optimization that will run when
optimizations are enabled.
Differential Revision: https://reviews.llvm.org/D115373
User errors should use reportError. reportError allows us to continue parsing
the file and collect more diagnostics.
While here, make the diagnostic follow convention, merge tests, and test
line/column numbers.
As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we
don't want to sink the save point past an INLINEASM_BR, otherwise
prologepilog may incorrectly sink a prolog past the MBB containing an
INLINEASM_BR and into the wrong MBB.
ShrinkWrap is getting this wrong because LR is not in the list of callee
saved registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls
RegisterClassInfo::getLastCalleeSavedAlias which reads
CalleeSavedAliases which was populated by
RegisterClassInfo::runOnMachineFunction by iterating the list of
MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs.
Because PPC's LR is non-allocatable, it's NOT considered callee saved.
Add an interface to TargetRegisterInfo for such a case and use it in
Shrinkwrap to ensure we don't sink a prolog past an INLINEASM or
INLINEASM_BR that clobbers LR.
Reviewed By: jyknight, efriedma, nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D116424
Add support for Return Oriented Programming (ROP) protection for 32 bit.
This patch also adds a testing for AIX on both 64 and 32 bit.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D111362
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.
This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.
This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).
Differential Revision: https://reviews.llvm.org/D116031
The current code makes the assumption that equality
comparison can be performed with a word comparison
instruction. While this is true if the entire 64-bit
results are used, it does not generally work. It is
possible that the low order words and high order
words produce different results and a user of only
one will get the wrong result.
This patch adds an and of the result words so that
each word has the result of the comparison of the
entire doubleword that contains it.
Differential revision: https://reviews.llvm.org/D115678
Commit 150681f increases
cost of producing MMA types (vector pair and quad).
However, it increases the cost for getUserCost() which is
used in unrolling. As a result, loops that contain these
types already (from the user code) cannot be unrolled
(even with the user's unroll pragma). This was an unintended
sideeffect. Reverting that portion of the commit to allow
unrolling such loops.
Differential revision: https://reviews.llvm.org/D115424
Summary: When disassembling, symbolize a branch target operand
to print a label instead of a real address.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D114492
This mnemonic has been supported by GAS for years and
it was added to the PowerPC ISA as of ISA 3.1. We will
support the mnemonic to be compatible with GAS.