In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
We can't guarantee that `long` is always 64 bits, e.g., on Windows or other
LLP64 data models (rare, but we should consider them).
So use int64_t from inttypes.h, which is safe in this case.
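A minimal, hypothetical illustration of the pitfall (not code from the patch;
the value and names are illustrative):
```
#include <cinttypes>
#include <cstdio>

int main() {
  // On LLP64 hosts (e.g. 64-bit Windows), long is only 32 bits wide, so a
  // 64-bit immediate stored in a long would be silently truncated;
  // int64_t is guaranteed to be 64 bits on every host.
  int64_t Imm = INT64_C(0x123456789);
  long Truncated = static_cast<long>(Imm); // loses the high bits on LLP64
  std::printf("imm = %" PRIx64 ", as long = %lx\n",
              static_cast<uint64_t>(Imm),
              static_cast<unsigned long>(Truncated));
  return 0;
}
```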
Fixes https://github.com/llvm/llvm-project/issues/55911.
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for whether a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
We added the following ISel pattern for i64 immediates in D87384; this patch does the same for i32:
`mul with (2^N * int16_imm) -> MULLI + RLWINM`
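A hypothetical C++ snippet that can exercise the pattern (the constant, the
function name, and the commented instruction sequence are illustrative; the
actual selection depends on the subtarget):
```
// 40000 = 2^5 * 1250, and 1250 fits in a signed 16-bit immediate, so the
// multiply can be selected as roughly:
//   mulli  r3, r3, 1250      # multiply by the 16-bit factor
//   rlwinm r3, r3, 5, 0, 26  # shift left by 5 for the 2^N factor
// instead of materializing 40000 into a register first.
int mulByConst(int X) { return X * 40000; }
```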
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D129708
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.
Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.
Fixes https://github.com/llvm/llvm-project/issues/50506.
Differential Revision: https://reviews.llvm.org/D129630
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warnings
than I saw locally. Reverting to fix those issues.
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Differential Revision: https://reviews.llvm.org/D129506
Proposing to move the check for scalar MASS conversion from the constructor
of PPCTargetLowering to the lowerLibCallBase function, which decides
about the lowering.
The target machine option Options.PPCGenScalarMASSEntries is set in
PPCTargetMachine.cpp, but an object of the class PPCTargetLowering
is created in one of the included header files, so the constructor runs
before PPCGenScalarMASSEntries is set to its correct value. Therefore, we
cannot check this option in the constructor.
Differential: https://reviews.llvm.org/D128653
Reviewer: @bmahjour
According to D127731, PPCTLSDynamicCall does not preserve
LiveIntervals, so stop claiming that it does and remove the code
that tried to repair them. NFCI.
Differential Revision: https://reviews.llvm.org/D128421
This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for
aliasing purposes, so using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels (meaning they are not aligned),
XCOFF will automatically calculate the offset for them.
This patch implements:
1) Emit the label of the alias just before emitting
the value of the sub-element that the alias refers to.
2) A set of aliases that refer to the same offset
should be aligned.
3) We didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias, which caused a failure.
This patch fixes the bug by not emitting linkage
for an alias that has no label.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D124654
There are straightforward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve this by looking through bitcasts.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D128703
This patch implements a new way to generate CTR loops. Now the
intrinsics inserted by the hardware loop pass will be mapped to pseudo
instructions, and these pseudo instructions will be expanded to a CTR
loop or a normal compare+branch loop in this post-ISEL pass.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122125
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid falling
into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).
Reviewed By: lei, amyk, saghir
Differential Revision: https://reviews.llvm.org/D127218
This patch adds the instructions `MTVSCR` and `MFVSCR` as not swappable to the
PPCVSXSwapRemoval pass because they are not lane-insensitive. This will prevent
the compiler from optimizing out required swaps when using `lxvd2x` and
`stxvd2x`.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D128062
In some passes we need a valid cache line size to do analysis or
transformation, e.g., loop cache analysis and loop data prefetching. However,
for some backend targets, `TTIImpl->getCacheLineSize()` is not implemented
and hence `TTI.getCacheLineSize()` would just return 0, which may eventually
produce invalid results.
In this patch we add a user-specified opt/llc option for the cache line size.
If the option is specified by the user we use the supplied value, otherwise we
fall back to the default value obtained from `TTIImpl->getCacheLineSize()`.
The PowerPC target already has such an option; this patch generalizes
it in TargetTransformInfo.cpp.
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D127342
This patch fixes the load and store quadword instructions on
PowerPC to use the correct offset and base address.
Reviewed By: #powerpc, nemanjai, lkail
Differential Revision: https://reviews.llvm.org/D126807
Currently in `combineVectorShuffle()`, we update the shuffle mask if either
input vector comes from a scalar_to_vector, and we keep the respective input
vectors in their permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED.
However, it is possible that we end up in a situation where both input vectors
to the vector_shuffle are scalar_to_vector, and are different vector types.
In situations like this, the shuffle mask is updated incorrectly as the current
code assumes both scalar_to_vector inputs are the same vector type.
This patch skips the combines for vector_shuffle if both input vectors are
scalar_to_vector and they are of different vector types. A follow-up patch
will fix this issue properly, correctly updating the shuffle mask.
Differential Revision: https://reviews.llvm.org/D127818
This patch changes the PowerPC backend to generate VSX load/store instructions
for all vector loads/stores on Power8 and earlier (LE) instead of VMX
load/store instructions. The reason for this change is because VMX instructions
require the vector to be 16-byte aligned. So, a vector load/store will fail with
VMX instructions if the vector is misaligned. Also, `gcc` generates VSX
instructions in this situation which allow for unaligned access but require a
swap instruction after loading/before storing. This is not an issue for BE
because we already emit VSX instructions since no swap is required. And this is
not an issue on Power9 and up since we have access to `lxv[x]`/`stxv[x]` which
allow for unaligned access and do not require swaps.
This patch also delays the VSX load/store for LE combines until after
LegalizeOps to prioritize other load/store combines.
Reviewed By: #powerpc, stefanp
Differential Revision: https://reviews.llvm.org/D127309
The combine step for shufflevector will sometimes replace undef in the mask
with a defined value. This can cause an infinite loop in some cases as another
combine will then put the undef back in the mask.
This patch fixes the issue so that undefs are not replaced when doing a combine.
Reviewed By: ZarkoCA, amyk, quinnp, saghir
Differential Revision: https://reviews.llvm.org/D127439
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
Support allocation of huge stack frames (>2 GB) on PPC64.
For the ELFv2 ABI on Linux, quoting from the spec, 2.2.3.1 General Stack Frame Requirements:
> There is no maximum stack frame size defined.
On AIX, XL allows such huge frames.
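A hypothetical C++ sketch of such a frame (names and sizes are illustrative;
actually running it requires a very large stack limit):
```
// A local array larger than 2 GB produces a stack frame whose offsets no
// longer fit in a signed 32-bit displacement, exercising the huge-frame
// support.
long long useHugeFrame(long long Idx) {
  char Buf[0x90000000ULL]; // roughly 2.25 GB of stack
  Buf[0] = 1;
  Buf[Idx] = 2; // keep the array alive
  return Buf[0] + Buf[Idx];
}
```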
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D107886
This patch updates two patterns involving `scalar_to_vector` and
`SCALAR_TO_VECTOR_PERMUTED` nodes to be safe for both 64-bit and 32-bit by
pulling the patterns out of the 64-bit specific guard. These patterns are
matched on POWER8 and above.
Differential Revision: https://reviews.llvm.org/D125389
On PowerPC we have ISA 3.0 for Power 9 and ISA 3.1 for Power 10.
This patch adds an ISA for mcpu=future. The idea is to have a placeholder ISA
for work that is experimental and may not be supported by existing ISAs.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D126075
This commit resolves a Linux kernel build failure that was revealed by
c35ca3a1c7. The patch introduces two new
intrinsics, which ultimately changes the intrinsic numbering of other PPC
intrinsics. This causes an issue introduced by
ff40fb07ad, as the patch checks for intrinsics
with particular values, but the addition of the fnabs/fnabss intrinsics updates
the original sqrt/sdiv intrinsic values.
This patch implements the following floating point negative absolute value
builtins that are required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```
These builtins will emit:
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.
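A small usage sketch based on the prototypes above (only compiles with a Clang
targeting PowerPC that provides these XL-compatibility builtins; function names
are illustrative):
```
// Both return the negative absolute value (-|X|) of their argument; the
// instruction selected (fnabs vs. xsnabsdp) follows the rules listed above.
double neg_abs(double X) { return __fnabs(X); }
float neg_absf(float X) { return __fnabss(X); }
```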
Differential Revision: https://reviews.llvm.org/D125506
This fixes bug 55463, similar to D78668. This is a temporary fix since
we will switch to post-isel CTR loop determination in the future.
Reviewed By: dim, shchenz
Differential Revision: https://reviews.llvm.org/D125746
The name `MCFixedLenDisassembler.h` is out of date after D120958.
Rename it as `MCDecoderOps.h` to reflect the change.
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D124987
If the mask is made up of elements that form a mask in the higher type,
we can convert the shuffle of bitcasts into a shuffle in the bitcast type,
simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be
treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.
The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.
Differential Revision: https://reviews.llvm.org/D123801
This patch turns on support for CR bit accesses for Power8 and above. The reason
why CR bits are turned on as the default for Power8 and above is that
later architectures make use of builtins and instructions that require CR bit
accesses (such as the use of setbc in the vector string isolate predicate
and bcd builtins on Power10).
This patch also adds the clang portion to allow for turning on CR bits in the
front end if the user so desires.
Differential Revision: https://reviews.llvm.org/D124060
Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode
does not need a TOC restore after the call. All call opcodes should be listed in
this function. A default unreachable in this function should force future call
opcodes to also be added.
This is a follow-up patch to D122012.
Reviewed By: jsji, shchenz
Differential Revision: https://reviews.llvm.org/D124415
For the AIX linker, under default options, global or weak symbols which
have their visibility bits set to zero (i.e. no visibility, similar to the ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.
This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.
Reviewed By: DiggerLin
Differential Revision: https://reviews.llvm.org/D123951
Currently the register operand definitions are found in three separate files.
This patch moves all of the definitions into PPCRegisterInfo.td.
Reviewed By: amyk
Differential Revision: https://reviews.llvm.org/D123543
Renamed the two classes 8LS_DForm_R_SI34_RTA5 and 8LS_DForm_R_SI34_XT6_RA5 to
8LS_DForm_R_SI34_RTA5_MEM and 8LS_DForm_R_SI34_XT6_RA5_MEM because the
instructions that use these classes perform memory reads/writes.
Moved the instruction defs up closer to the classes.
Removed unnecessary whitespace.
This fixes CVE-2019-15847, preventing random number generation from
being merged.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122783
AtomicExpandPass uses this variable to determine whether to emit libcalls. The default value is 1024, and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_*` libcalls for those targets unable to inline atomic ops, and the backend ends up emitting `__sync_*` libcalls instead. Thanks @efriedma for pointing it out.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D122868
Make 16-byte atomic types 16-byte aligned on PPC64, consistent with GCC. Also enable inlining of 16-byte atomics on non-AIX PPC64 targets.
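A hedged C++ illustration (assumes a compiler targeting PPC64 after this
change; std::atomic<__int128> and the exact lowering are assumptions that
depend on the subtarget and OS):
```
#include <atomic>

// A 16-byte atomic object. With this change its storage is 16-byte aligned
// on PPC64 (matching GCC), and on non-AIX PPC64 targets operations on it can
// be inlined rather than always calling libatomic (e.g. __atomic_load_16).
std::atomic<__int128> Wide;

__int128 loadWide() { return Wide.load(); }
```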
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D122377
Add support for builtin_[max|min], which have the prototype below:
A builtin_max (A1, A2, A3, ...)
All arguments must have the same type; they must all be float, double, or long double.
Internally use SelectCC to get the result.
Reviewed By: qiucf
Differential Revision: https://reviews.llvm.org/D122478
To store a byval parameter, the existing code would store as many 8-byte
elements as were required to cover the full size of the byval parameter.
For example, a parameter of size 16 would store two 8-byte elements;
a parameter of size 12 would also store two 8-byte elements.
This would sometimes store too many bytes, as the size of the parameter is not
always a multiple of 8.
This patch fixes that issue, and byval parameters are now stored with the
correct number of bytes.
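A hypothetical example of the problem (type and function names are
illustrative): a 12-byte aggregate passed byval previously had 16 bytes stored
for it; with this fix only its 12 bytes are stored.
```
struct Twelve {
  char Bytes[12]; // sizeof == 12, not a multiple of 8
};

// On targets that pass this aggregate byval, storing the argument used to
// write two full 8-byte elements (16 bytes); after the fix only the 12 bytes
// of the parameter are written.
int sumEnds(Twelve T) { return T.Bytes[0] + T.Bytes[11]; }

int caller() {
  Twelve T = {};
  return sumEnds(T);
}
```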
Reviewed By: nemanjai, #powerpc, quinnp, amyk
Differential Revision: https://reviews.llvm.org/D121430
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.
Original patch by: Muhammad Usman
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117013
All LLVM backends use MCDisassembler as a base class for their
instruction decoders. Use "const MCDisassembler *" for the decoder
instead of "const void *". Remove unnecessary static casts.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D122245
The BL8_NOTOC_RM instruction was incorrectly producing a relocation that required
a TOC restore after the call. This patch fixes that issue, and the notoc
relocation is now used.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D122012
Rename the file for the PPCCTRLoopsVerify pass from PPCCTRLoops.cpp
to PPCCTRLoopsVerify.cpp.
There will be a new file PPCCTRLoops.cpp for PPC CTR loop
generation later.
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.
Because libgcc doesn't have __muloti4, we block that as well.
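A hedged example of source that produces @llvm.smul.with.overflow.i64 when
compiled for 32-bit PowerPC (the function name is illustrative):
```
#include <cstdint>

// Built for 32-bit PowerPC, this emits @llvm.smul.with.overflow.i64; with the
// libcall blocked, the overflow check is expanded inline instead of calling
// __mulodi4/__multi3.
bool mulOverflows64(int64_t A, int64_t B, int64_t *Out) {
  return __builtin_mul_overflow(A, B, Out);
}
```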
Fixes #54460.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D122090
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.
Part of an original patch by: Lei Huang
Reviewed By: nemanjai, #powerpc
Differential Revision: https://reviews.llvm.org/D117225
We are going to remove the old 'perfect shuffle' optimization since it
brings a performance penalty in hot loops over vectors. For example, in
the following loop, shuffles share the same mask:
%v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
%v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes a 20%+ performance
penalty.
The original attempt to resolve this was to pre-record the masks of every
shufflevector operation in the DAG, but that is somewhat complex and adds
unnecessary computation (scanning all nodes) to the optimization. Here we
disable it by default. There are indeed some cases that become worse after
this, which will be fixed in a more careful way in future patches.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D121082
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.
This patch simply adds the missed patterns.
Currently in Clang, we have two types of builtins for the fnmsub operation:
one for float/double vectors, which are transformed into IR operations, and
one for float/double scalars, which generate the corresponding intrinsics.
But for the vector version of the builtin, the three-operation chain may be
recognized as expensive by some passes (like early CSE). We need some way to
keep the fnmsub form until code generation.
This patch introduces the ppc.fnmsub.* intrinsic to unify the four fnmsub
intrinsics.
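A hedged sketch of the vector builtin side (requires a Clang targeting PowerPC
with VSX enabled; vec_nmsub comes from altivec.h and the function name is
illustrative):
```
#include <altivec.h>

// vec_nmsub(A, B, C) computes -(A * B - C). Lowering it through the unified
// ppc.fnmsub.* intrinsic keeps the negate/multiply/subtract chain intact
// until code generation, so passes such as early CSE do not split it up.
vector double negMulSub(vector double A, vector double B, vector double C) {
  return vec_nmsub(A, B, C);
}
```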
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D116015
This wraps up D119053. The two headers are moved as described, their file
headers and include guards are fixed, all files where the old paths were
detected are updated (via a simple grep through the repo), and it is all
`clang-format`-ed.
Differential Revision: https://reviews.llvm.org/D119876
There are two MMA patterns that have been added twice. This patch just removes
one set of the patterns. It should not change the way MMA behaves.
Reviewed By: lei, #powerpc
Differential Revision: https://reviews.llvm.org/D120680
Currently all of the MMA instructions, as well as the MMA-related register info,
are bundled with the Power 10 instructions. This patch just splits them out.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D120515
Added the license info as well as a description of how classes should be named,
based on existing documentation.
Reviewed By: lei, #powerpc
Differential Revision: https://reviews.llvm.org/D120530
Add the Power 10 instruction LXVKQ.
This patch was taken from an original patch by: Yi-Hong Lyu
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D117507
The Linux kernel build uses absolute expressions suffixed with @lo/@ha
relocations. This currently doesn't work for DS/DQ form instructions and
there is no reason for it not to. It also works with GAS.
This patch allows this as long as the value is a multiple of 4/16
for DS/DQ form.
Differential revision: https://reviews.llvm.org/D115419
Perfect shuffle was introduced into the PowerPC backend years ago, and is only
available on big-endian subtargets. This optimization has good effects
in simple cases, but has a serious negative impact in large programs
with many shuffle instructions sharing the same mask.
This patch introduces a temporary hidden backend option to control it until we
implement a better way to fix the gap in vector shuffle decomposition.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120072
This patch updates the handling of vectors in getPreferredVectorAction():
For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.
The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.
```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```
Differential Revision: https://reviews.llvm.org/D119521
The LDMX instruction was to be potentially added in P9 but it was never added
in either ISA 3.0 or ISA 3.1. This patch removes that instruction as it is
currently still an invalid instruction.
Reviewed By: lei
Differential Revision: https://reviews.llvm.org/D118074
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.
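A hedged example of the select_cc ordered-gt case (assumes a compiler and
subtarget with IEEE quad-precision __float128 support enabled; the function
name is illustrative):
```
// An ordered greater-than select on quad precision. With ISA 3.1 this can be
// selected to xsmaxcqp instead of a compare followed by a select.
__float128 max_quad(__float128 A, __float128 B) { return A > B ? A : B; }
```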
Reviewed By: nemanjai, shchenz, amyk
Differential Revision: https://reviews.llvm.org/D117006
There are a few relevant forward declarations in there that may require downstream
users to add explicit includes:
llvm/MC/MCContext.h no longer includes llvm/BinaryFormat/ELF.h, llvm/MC/MCSubtargetInfo.h, llvm/MC/MCTargetOptions.h
llvm/MC/MCObjectStreamer.h no longer includes llvm/MC/MCAssembler.h
llvm/MC/MCAssembler.h no longer includes llvm/MC/MCFixup.h, llvm/MC/MCFragment.h
Counting preprocessed lines required to rebuild llvm-project on my setup:
before: 1052436830
after: 1049293745
This is significant and backs up the change, in addition to the usual benefits of
decreasing coupling between headers and compilation units.
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D119244
For PGO on AIX, when we switch to the Linux-style PGO variable access
(via _start and _stop labels), we need the compiler to generate a .ref
assembly directive for each of the three csects:
- __llvm_prf_data[RW]
- __llvm_prf_names[RO]
- __llvm_prf_vnds[RW]
We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it is
live, then the three csects are live as well.
For example, for a testcase with at least one function definition, when
compiled with -fprofile-generate we should generate:
.csect __llvm_prf_cnts[RW],3
.ref __llvm_prf_data[RW] <<============ needs to be inserted
.ref __llvm_prf_names[RO] <<===========
The __llvm_prf_vnds csect is not always present, so we reference it only
when it is.
Reviewed By: sfertile, daltenty
Differential Revision: https://reviews.llvm.org/D116607