These changes address issue
https://github.com/llvm/llvm-project/issues/55857.
Since R30/S30 is used as the pointer (32 bits) to the GOT table in the ppc32 ABI,
remove it from the SPE callee-saved registers when PIC is enabled.
This prevents emitting the SPE load and store for the S30 and S31 registers.
Differential revision: https://reviews.llvm.org/D127495
SPE doesn't have an fmadd instruction, so don't bother hoisting a
multiply-and-add sequence into one, as it would just become a library call.
Hoisting happens too late for the CTR usability test to veto using the
CTR in a loop, and results in an assert "Invalid PPC CTR loop!".
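A minimal source-level illustration (hypothetical example function, not from the patch):

```cpp
// Hypothetical example: on SPE there is no fmadd, so fusing the multiply and
// add below would become a library call rather than two cheap instructions,
// and the late hoist can also break CTR loop generation.
double mul_add(double a, double b, double c) {
  return a * b + c; // keep as fmul + fadd on SPE; do not form an fma
}
```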
Map hardware loop intrinsics loop_decrement and set_loop_iteration
to the new PowerPC pseudo instructions, so that the hardware loop
intrinsics will be expanded to normal cmp+branch form or ctrloop
form based on the CTR register usage on MIR level.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D123366
1) The overloaded (instruction-based) method is a wrapper around the current (opcode-based) method; see the sketch after this list.
2) This patch also changes a few callsites (VectorCombine.cpp,
SLPVectorizer.cpp, CodeGenPrepare.cpp) to call the overloaded method.
3) This is a split of D128302.
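A rough sketch of the shape of the wrapper, with hypothetical class and method names rather than the actual TTI interface:

```cpp
#include <cassert>

// Hypothetical cost-model interface, for illustration only.
struct Instruction {
  unsigned Opcode;
  unsigned getOpcode() const { return Opcode; }
};

struct CostModel {
  // Existing opcode-based method.
  int getOperationCost(unsigned Opcode) const {
    return Opcode == 0 ? 0 : 1; // placeholder cost logic
  }
  // New instruction-based overload: a thin wrapper that forwards the opcode,
  // and can later use the instruction itself for extra context.
  int getOperationCost(const Instruction &I) const {
    return getOperationCost(I.getOpcode());
  }
};

int main() {
  CostModel CM;
  Instruction Add{13};
  assert(CM.getOperationCost(Add) == CM.getOperationCost(Add.getOpcode()));
  return 0;
}
```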
Differential Revision: https://reviews.llvm.org/D131114
This patch ensures consistency in the construction of FP_ROUND nodes
such that they always use ISD::TargetConstant instead of ISD::Constant.
This additionally fixes a bug in the AArch64 SVE backend where patterns
were matching against TargetConstant nodes and sometimes failing when
passed a Constant node.
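For illustration, a lowering that builds an FP_ROUND node now spells the truncation flag with getTargetConstant; this sketch assumes a surrounding SelectionDAG lowering context:

```cpp
#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch only (assumes DAG, DL and an f64 operand Op from an enclosing
// lowering): the second operand of FP_ROUND is now always an
// ISD::TargetConstant, never an ISD::Constant.
static SDValue lowerToF32(SelectionDAG &DAG, const SDLoc &DL, SDValue Op) {
  SDValue TruncFlag = DAG.getTargetConstant(0, DL, MVT::i32); // not getConstant
  return DAG.getNode(ISD::FP_ROUND, DL, MVT::f32, Op, TruncFlag);
}
```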
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D130370
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
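A sketch of the intended pattern, assuming the helpers from llvm/IR/ProfDataUtils.h (e.g. extractBranchWeights); treat the exact names and signatures as assumptions:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/ProfDataUtils.h"
using namespace llvm;

// Sketch: replace manual walks over !prof operands with the shared helper.
static uint64_t totalBranchWeight(const Instruction &TI) {
  SmallVector<uint32_t, 2> Weights;
  if (!extractBranchWeights(TI, Weights)) // was: getMetadata(MD_prof) + loop
    return 0;
  uint64_t Total = 0;
  for (uint32_t W : Weights)
    Total += W;
  return Total;
}
```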
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
We can't guarantee that `long` is always 64 bits, e.g. on Windows or under
other LLP64 data models (rare, but worth considering).
So use int64_t from inttypes.h, which is safe in this case.
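A minimal illustration of why int64_t is the safe choice here:

```cpp
#include <cinttypes> // int64_t, INT64_C and the PRId64 format macro
#include <cstdio>

// Illustrative only: 'long' is 32 bits under LLP64 (e.g. 64-bit Windows), so
// a large immediate stored in a 'long' would be truncated there. int64_t is
// 64 bits on every host.
int main() {
  int64_t Imm = INT64_C(0x123456789); // does not fit in a 32-bit long
  std::printf("%" PRId64 "\n", Imm);
  return 0;
}
```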
Fixes https://github.com/llvm/llvm-project/issues/55911 .
This was stored in LiveIntervals, but not actually used for anything
related to LiveIntervals. It was only used in one check for whether a load
instruction is rematerializable. I also don't think this was entirely
correct, since it was implicitly assuming constant loads are also
dereferenceable.
Remove this and rely only on the invariant+dereferenceable flags in
the memory operand. Set the flag based on the AA query upfront. This
should have the same net benefit, but has the possible disadvantage of
making this AA query nonlazy.
Preserve the behavior of assuming pointsToConstantMemory implying
dereferenceable for now, but maybe this should be changed.
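A rough sketch of computing the flags upfront from an AA query (not the exact patch code; API details are assumptions):

```cpp
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/CodeGen/MachineMemOperand.h"
using namespace llvm;

// Sketch: derive the flags from an AA query when the memory operand is built,
// instead of asking AA lazily from LiveIntervals later. pointsToConstantMemory
// is still treated as implying dereferenceable, matching the preserved
// behavior described above.
static MachineMemOperand::Flags computeLoadFlags(AAResults &AA,
                                                 const MachineMemOperand &MMO) {
  MachineMemOperand::Flags F = MMO.getFlags();
  if (MMO.getValue() &&
      AA.pointsToConstantMemory(
          MemoryLocation(MMO.getValue(), MMO.getSize(), MMO.getAAInfo())))
    F |= MachineMemOperand::MOInvariant | MachineMemOperand::MODereferenceable;
  return F;
}
```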
D87384 added the following ISel pattern for i64 immediates; this patch adds it for i32.
`mul with (2^N * int16_imm) -> MULLI + RLWINM`
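An illustrative i32 example of the pattern (expected lowering, not verified output):

```cpp
// 40 == 2^3 * 5, so the multiply can be emitted as MULLI by 5 followed by a
// left shift by 3, which PPC encodes with RLWINM.
int mul40(int x) { return x * 40; }
```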
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D129708
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.
Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.
Fixes https://github.com/llvm/llvm-project/issues/50506.
Differential Revision: https://reviews.llvm.org/D129630
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
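A sketch of the new call site; "MyTarget" is a placeholder and the generated helper's exact signature is an assumption:

```cpp
// Called from the AsmPrinter so it fires for both assembly and object output.
void MyTargetAsmPrinter::emitInstruction(const MachineInstr *MI) {
  // Verify the subtarget features cover this instruction's predicates.
  MyTarget_MC::verifyInstructionPredicates(MI->getOpcode(),
                                           getSubtargetInfo().getFeatureBits());
  // ... the usual MCInst lowering and emission follows ...
}
```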
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Recommitted with some fixes for the leftover MCII variables in release
builds.
Differential Revision: https://reviews.llvm.org/D129506
This reverts commit e2fb8c0f4b as it does
not build for Release builds, and some buildbots are giving more warnings
than I saw locally. Reverting to fix those issues.
D25618 added a method to verify the instruction predicates for an
emitted instruction, through verifyInstructionPredicates added into
<Target>MCCodeEmitter::encodeInstruction. This is a very useful idea,
but the implementation inside MCCodeEmitter made it only fire for object
files, not assembly which most of the llvm test suite uses.
This patch moves the code into the <Target>_MC::verifyInstructionPredicates
method, inside the InstrInfo. This allows it to be called from other
places, such as in this patch where it is called from the
<Target>AsmPrinter::emitInstruction methods which should trigger for
both assembly and object files. It can also be called from other places
such as verifyInstruction, but that is not done here (it tends to catch
errors earlier, but in reality just shows all the mir tests that have
incorrect feature predicates). The interface was also simplified
slightly, moving computeAvailableFeatures into the function so that it
does not need to be called externally.
The ARM, AMDGPU (but not R600), AVR, Mips and X86 backends all currently
show errors in the test-suite, so have been disabled with FIXME
comments.
Differential Revision: https://reviews.llvm.org/D129506
This moves the check for scalar MASS conversion from the constructor of
PPCTargetLowering to the lowerLibCallBase function, which decides about the
lowering.
The target machine option Options.PPCGenScalarMASSEntries is set in
PPCTargetMachine.cpp, but an object of the PPCTargetLowering class is created
in one of the included header files, so the constructor runs before
PPCGenScalarMASSEntries is set to its correct value. Therefore we cannot
check this option in the constructor.
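A simplified sketch of where the check now lives (parameter list shortened; treat the details as assumptions):

```cpp
// Checking the option here is safe: lowering runs long after PPCTargetMachine
// has set Options.PPCGenScalarMASSEntries, whereas the PPCTargetLowering
// constructor runs before the option has its final value.
SDValue PPCTargetLowering::lowerLibCallBase(const char *LibCallDoubleName,
                                            const char *LibCallFloatName,
                                            SDValue Op,
                                            SelectionDAG &DAG) const {
  if (!getTargetMachine().Options.PPCGenScalarMASSEntries)
    return SDValue(); // not converting: keep the ordinary libm lowering
  // Otherwise pick the MASS entry point for Op's type and emit the call
  // (omitted in this sketch).
  return SDValue();
}
```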
Differential: https://reviews.llvm.org/D128653
Reviewer: @bmahjour
According to D127731, PPCTLSDynamicCall does not preserve
LiveIntervals, so stop claiming that it does and remove the code
that tried to repair them. NFCI.
Differential Revision: https://reviews.llvm.org/D128421
This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for aliasing purposes, so
different labels are used to let AIX emulate symbol aliases. If a value is
emitted between any two labels, meaning they are not aligned, XCOFF will
automatically calculate the offset for them.
This patch implements the following:
1) Emit the label of an alias just before emitting the value of the
sub-element that the alias refers to.
2) Align a set of aliases that refer to the same offset.
3) We didn't emit aliasing labels for common and zero-initialized local
symbols in PPCAIXAsmPrinter::emitGlobalVariableHelper, but emitted linkage
for them in AsmPrinter::emitGlobalAlias, which caused a failure. This patch
fixes the bug by not emitting linkage for an alias that has no label.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D124654
There are straightforward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve this by looking through bitcasts.
Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D128703
This patch implements a new way to generate the CTR loops. Now the
intrinsics inserted in hardware loop pass will be mapped to pseudo
instructions and these pseudo instructions will be expanded to CTR
loop or normal compare+branch loop in this post ISEL pass.
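An illustrative input only, showing the kind of counted loop involved:

```cpp
// A simple counted loop. The hardware loops pass inserts its loop intrinsics
// around such loops; after this change those intrinsics are selected to
// PowerPC pseudo instructions, which the post-ISel pass expands to either a
// CTR loop (mtctr/bdnz) or a plain compare+branch loop.
void scale(float *A, float K, int N) {
  for (int I = 0; I < N; ++I)
    A[I] *= K;
}
```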
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D122125
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid falling
into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).
Reviewed By: lei, amyk, saghir
Differential Revision: https://reviews.llvm.org/D127218
This patch teaches the PPCVSXSwapRemoval pass that the instructions `MTVSCR` and
`MFVSCR` are not swappable because they are not lane-insensitive. This will prevent
the compiler from optimizing out required swaps when using `lxvd2x` and
`stxvd2x`.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D128062
In some passes we need a valid cache line size to do analysis or
transformation, e.g., loop cache analysis and loop data prefetch. However,
for some backend targets, `TTIImpl->getCacheLineSize()` is not implemented,
and hence `TTI.getCacheLineSize()` would just return 0, which eventually might
produce invalid results.
In this patch we add a user-specified opt/llc option for the cache line size.
If the option is specified by users we use the value supplied, otherwise we
fall back to the default value obtained from `TTIImpl->getCacheLineSize()`.
The PowerPC target already has such an option; this patch generalizes it in
TargetTransformInfo.cpp.
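A sketch of the fallback logic; the flag spelling and wiring are assumptions:

```cpp
#include "llvm/Support/CommandLine.h"
using namespace llvm;

// Illustrative only; the real definition lives in TargetTransformInfo.cpp.
static cl::opt<unsigned> CacheLineSize(
    "cache-line-size", cl::init(0), cl::Hidden,
    cl::desc("Override the target cache line size when specified"));

static unsigned getEffectiveCacheLineSize(unsigned TargetValue) {
  // Prefer a user-supplied value; otherwise fall back to what the target's
  // TTI implementation reports (which may be 0 if unimplemented).
  return CacheLineSize.getNumOccurrences() > 0 ? CacheLineSize : TargetValue;
}
```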
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D127342
This patch fixes the load and store quadword instructions on
PowerPC to use the correct offset and base address.
Reviewed By: #powerpc, nemanjai, lkail
Differential Revision: https://reviews.llvm.org/D126807
Currently in `combineVectorShuffle()`, we update the shuffle mask if either
input vector comes from a scalar_to_vector, and we keep the respective input
vectors in their permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED.
However, it is possible that we end up in a situation where both input vectors
to the vector_shuffle are scalar_to_vector, and are different vector types.
In situations like this, the shuffle mask is updated incorrectly as the current
code assumes both scalar_to_vector inputs are the same vector type.
This patch skips the combines for vector_shuffle if both input vectors are
scalar_to_vector and they are of different vector types. A follow-up patch
will fix this case properly so that the shuffle mask is updated correctly.
Differential Revision: https://reviews.llvm.org/D127818
This patch changes the PowerPC backend to generate VSX load/store instructions
for all vector loads/stores on Power8 and earlier (LE) instead of VMX
load/store instructions. The reason for this change is because VMX instructions
require the vector to be 16-byte aligned. So, a vector load/store will fail with
VMX instructions if the vector is misaligned. Also, `gcc` generates VSX
instructions in this situation which allow for unaligned access but require a
swap instruction after loading/before storing. This is not an issue for BE
because we already emit VSX instructions since no swap is required. And this is
not an issue on Power9 and up since we have access to `lxv[x]`/`stxv[x]` which
allow for unaligned access and do not require swaps.
This patch also delays the VSX load/store for LE combines until after
LegalizeOps to prioritize other load/store combines.
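An illustrative example of the kind of access involved (not from the patch):

```cpp
#include <cstdint>
#include <cstring>

// A 16-byte access through a pointer that is not 16-byte aligned. VMX
// lvx/stvx ignore the low four address bits, so they cannot be used directly
// here, whereas VSX lxvd2x/stxvd2x tolerate the misalignment at the cost of a
// swap on little-endian Power8.
void copy16(uint8_t *Dst, const uint8_t *Src) {
  std::memcpy(Dst, Src + 1, 16); // Src + 1 is generally misaligned
}
```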
Reviewed By: #powerpc, stefanp
Differential Revision: https://reviews.llvm.org/D127309
The combine step for shufflevector will sometimes replace undef in the mask
with a defined value. This can cause an infinite loop in some cases as another
combine will then put the undef back in the mask.
This patch fixes the issue so that undefs are not replaced when doing a combine.
Reviewed By: ZarkoCA, amyk, quinnp, saghir
Differential Revision: https://reviews.llvm.org/D127439
MIR support is totally unusable for AMDGPU without this, since the set
of reserved registers is set from fields here.
Add a clone method to MachineFunctionInfo. This is a subtle variant of
the copy constructor that is required if there are any MIR constructs
that use pointers. Specifically, at minimum fields that reference
MachineBasicBlocks or the MachineFunction need to be adjusted to the
values in the new function.
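A sketch of what a target override might look like; the info type and its field are hypothetical, and the hook's signature is an assumption:

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/Support/Allocator.h"
#include <new>
using namespace llvm;

struct MyTargetFunctionInfo : public MachineFunctionInfo {
  MachineBasicBlock *RestorePoint = nullptr; // example pointer field

  MachineFunctionInfo *
  clone(BumpPtrAllocator &Allocator, MachineFunction &DestMF,
        const DenseMap<MachineBasicBlock *, MachineBasicBlock *> &Src2DstMBB)
      const override {
    auto *Cloned = new (Allocator.Allocate<MyTargetFunctionInfo>())
        MyTargetFunctionInfo(*this);
    // A plain copy would still point into the source function; remap any
    // MachineBasicBlock pointers to their counterparts in the clone.
    if (RestorePoint)
      Cloned->RestorePoint = Src2DstMBB.lookup(RestorePoint);
    return Cloned;
  }
};
```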
I can't remove the function just yet as it is used in the generated .inc files.
I would also like to provide a way to compare alignment with TypeSize since it came up a few times.
Differential Revision: https://reviews.llvm.org/D126910
Support allocation of huge stack frames (>2GB) on PPC64.
For the ELFv2 ABI on Linux, quoting from section 2.2.3.1 General Stack Frame
Requirements of the spec:
> There is no maximum stack frame size defined.
On AIX, XL allows such huge frames.
Reviewed By: #powerpc, nemanjai
Differential Revision: https://reviews.llvm.org/D107886