Commit Graph

3389 Commits

Author SHA1 Message Date
Amy Kwan a5bef98c75 [PowerPC][NFC] Add additional vector_shuffle tests involving scalar_to_vector.
This patch adds additional test cases involving vector_shuffles where the left
input, the right input, or both are scalar_to_vector nodes. These test cases involve
v16i8, v2i64, v4i32 and v8i16 vector shuffles, and were generated in preparation
for D130487.

Differential Revision: https://reviews.llvm.org/D130485
2022-08-15 12:30:58 -05:00
Filipp Zhinkin 1626ee6a95 [DAGCombine] Hoist shifts out of a logic operations tree.
Hoist and combine shift operations out of a logic operations tree:
logic (logic (SH x0, s), y), (logic (SH x1, s), z)  --> logic (SH (logic x0, x1), s), (logic y, z)

The transformation improves code generated for some cases related to the issue https://github.com/llvm/llvm-project/issues/49541.
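The rewrite above is a plain bitwise identity. The sketch below checks it on 32-bit values. It assumes that every "logic" node uses the same opcode (AND, OR or XOR) and that both shifts are left shifts by the same amount `s` (with `s < 32`). `LogicOp`, `applyLogic`, `beforeHoist` and `afterHoist` are hypothetical names for illustration, not DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical model: "logic" is AND, OR or XOR (same opcode at every node)
// and "SH" is a left shift by the same amount s on both sides (s < 32).
enum class LogicOp { And, Or, Xor };

uint32_t applyLogic(LogicOp op, uint32_t a, uint32_t b) {
  switch (op) {
  case LogicOp::And:
    return a & b;
  case LogicOp::Or:
    return a | b;
  case LogicOp::Xor:
    return a ^ b;
  }
  return 0; // not reached for valid LogicOp values
}

// logic (logic (SH x0, s), y), (logic (SH x1, s), z): two shifts.
uint32_t beforeHoist(LogicOp op, uint32_t x0, uint32_t x1, uint32_t y,
                     uint32_t z, unsigned s) {
  return applyLogic(op, applyLogic(op, x0 << s, y),
                    applyLogic(op, x1 << s, z));
}

// logic (SH (logic x0, x1), s), (logic y, z): a single shift.
uint32_t afterHoist(LogicOp op, uint32_t x0, uint32_t x1, uint32_t y,
                    uint32_t z, unsigned s) {
  return applyLogic(op, applyLogic(op, x0, x1) << s, applyLogic(op, y, z));
}
```

The identity holds because the logic op is associative and commutative, and a left shift distributes over it: `(x0 << s) op (x1 << s) == (x0 op x1) << s`.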

Correctness:
https://alive2.llvm.org/ce/z/pVqVgY
https://alive2.llvm.org/ce/z/YVvT-q
https://alive2.llvm.org/ce/z/W5zTBq
https://alive2.llvm.org/ce/z/YfJsvJ
https://alive2.llvm.org/ce/z/3YSyDM
https://alive2.llvm.org/ce/z/Bs2kzk
https://alive2.llvm.org/ce/z/EoQpzU
https://alive2.llvm.org/ce/z/Jnc_5H
https://alive2.llvm.org/ce/z/_LP6k_
https://alive2.llvm.org/ce/z/KvZNC9

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D131189
2022-08-12 12:42:16 +03:00
Ting Wang 13c1e7a8aa [PowerPC] Fix test case changed by "Add XXEVAL TD pattern" [NFC] 2022-08-12 02:56:54 -04:00
Ting Wang 12e1936f64 [PowerPC] Add XXEVAL TD pattern
Add XXEVAL TD patterns for P10 covering eqv, nor, or and xor.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D131654
2022-08-12 01:27:24 -04:00
Chen Zheng 8d19cfb72e [PowerPC] omit location attribute for TLS variable on AIX
Debug info for TLS variables is not yet supported on AIX.
The location generated in no-integrated-as mode is wrong, and
in integrated-as mode it causes an AIX linker error.

Reviewed By: Esme

Differential Revision: https://reviews.llvm.org/D130245
2022-08-12 00:54:48 -04:00
esmeyi 7a70e6e224 [XCOFF] ignore the cold attribute.
Summary: AIX XCOFF doesn't support the cold attribute,
    but encountering the attribute in XCOFF should not cause an error.
    As with the behavior of other formats, we just ignore the attribute for now.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D131473
2022-08-11 01:13:05 -04:00
Umesh Kalappa 9757f4f2dd [PowerPC] Don't use the S30 and S31 regs for the pic code
These changes address issue
https://github.com/llvm/llvm-project/issues/55857.

Since R30/S30 is used as the 32-bit GOT pointer in the ppc32 ABI,
remove it from the SPE callee-saved registers when PIC is enabled.

This prevents emitting the SPE load and store for S30 and S31 regs.

Differential revision: https://reviews.llvm.org/D127495
2022-08-10 10:31:27 -05:00
Justin Hibbits f43b228581 PowerPC: Don't hoist float multiply + add to fused operation on SPE
SPE doesn't have a fmadd instruction, so don't bother hoisting a
multiply and add sequence to this, as it'd become just a library call.
Hoisting happens too late for the CTR usability test to veto using the
CTR in a loop, and results in an assert "Invalid PPC CTR loop!".
2022-08-10 11:04:27 -04:00
Edd Barrett fa250250b2
Migrate llvm.experimental.patchpoint() to ptr.
This intrinsic used a typed pointer for a call target operand. This
change updates the operand to be an opaque pointer and updates all
pointers in all test files that use the intrinsic.

Differential revision: https://reviews.llvm.org/D131261
2022-08-10 13:18:02 +01:00
Yuta Mukai 5357dd2f43 [MachinePipeliner] Fix Phi generation failure for large stages
The previous code overwrote VRMap for prologue stages during Phi
generation if a register spanned many stages.
As a result, the wrong register was used as the one coming from
the prologue in Phis at later stages. (A process exists to correct
this, but it does not work in all cases.)
In addition, VRMap for the prologue must be preserved until addBranches().

This patch fixes these issues by moving the map for Phis into a separate
variable (VRMapPhi).

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D127840
2022-08-09 13:14:26 +09:00
Chen Zheng d9004dfbab [PowerPC] map hardware loop intrinsics to PowerPC pseudos
Map hardware loop intrinsics loop_decrement and set_loop_iteration
to the new PowerPC pseudo instructions, so that the hardware loop
intrinsics will be expanded to normal cmp+branch form or ctrloop
form based on the CTR register usage on MIR level.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D123366
2022-08-08 21:34:20 -04:00
Roland Froese d6bd3d373e [DAGCombiner] Add some BE store forwarding tests; NFC
Add tests before D130115. NFC.
2022-08-08 16:33:01 -04:00
Chen Zheng 13016f1f1b [NFC] add test cases for D123366 2022-08-06 08:56:42 -04:00
Chen Zheng ef60e44fe8 [PowerPC] fix stack size allocated for floating-point argument
This is for https://github.com/llvm/llvm-project/issues/56469

Allocate 4 bytes for float arguments on PPC32.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D129558
2022-08-06 08:38:52 -04:00
Chen Zheng e99ffe6ae8 [NFC] add test case for D129558 2022-08-05 23:15:48 -04:00
Amaury Séchet 1e15e24a76 [NFC] Autogenerate CodeGen/PowerPC/pzero-fp-xored.ll 2022-07-28 16:18:43 +00:00
Simon Pilgrim 69d5a038b9 [DAG] Enable ISD::SRL SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the ISD::SRL source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This is another step towards removing SelectionDAG::GetDemandedBits and just using TargetLowering::SimplifyMultipleUseDemandedBits.

There are a few cases where we end up with extra register moves, which I think we can accept in exchange for the increased ILP.
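The demanded-bits idea behind this change can be shown on plain integers. For a logical shift right `x >> amt`, a user that reads only the bits `D` of the result depends only on the bits `D << amt` of `x`. Any other bits of `x` can be ignored, which is what lets the combine look through the shifted value without rewriting it for its other uses. The helper names are hypothetical, and this is not the actual `SimplifyDemandedBits` implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: for r = x >> amt (logical, amt < 32), if only the bits
// in demandedResult are used, then only these bits of x can affect them.
uint32_t demandedSrcBitsForSrl(uint32_t demandedResult, unsigned amt) {
  return demandedResult << amt;
}

// Clearing every undemanded source bit must not change the demanded result
// bits; returns true when that holds for the given inputs.
bool undemandedBitsAreIrrelevant(uint32_t x, uint32_t demandedResult,
                                 unsigned amt) {
  uint32_t srcMask = demandedSrcBitsForSrl(demandedResult, amt);
  uint32_t original = (x >> amt) & demandedResult;
  uint32_t simplified = ((x & srcMask) >> amt) & demandedResult;
  return original == simplified;
}
```

For example, if only the low byte of `x >> 8` is demanded, only bits 8 to 15 of `x` matter (`demandedSrcBitsForSrl(0xFF, 8) == 0xFF00`).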

Differential Revision: https://reviews.llvm.org/D77804
2022-07-28 14:10:44 +01:00
Umesh Kalappa f38ea84a9f [PowerPC] Change long to int64_t (which is always 64 bits, i.e. 8 bytes)
We can't guarantee that long is always 64 bits; for example, it is 32 bits
under the LLP64 data model used by Windows (rare here, but we should consider it).

So use int64_t from inttypes.h, which is safe in this case.
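The portability concern can be illustrated with a small sketch. `packHalves` and `formatInt64` are hypothetical helpers, not code from the patch. Shifting a 32-bit `long` by 32 would be undefined behavior on an LLP64 target, while the fixed-width types behave the same everywhere. The `PRId64` macro from `<cinttypes>` prints them portably.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// int64_t is exactly 64 bits wherever it is provided; long is 64 bits under
// LP64 (e.g. 64-bit Linux and AIX) but only 32 bits under LLP64 (64-bit Windows).
static_assert(sizeof(int64_t) == 8, "int64_t must be 8 bytes");

// Hypothetical helper: pack two 32-bit halves into one 64-bit value. Doing
// this with a 32-bit long would shift by the full type width (undefined).
int64_t packHalves(uint32_t hi, uint32_t lo) {
  uint64_t packed = (static_cast<uint64_t>(hi) << 32) | lo;
  return static_cast<int64_t>(packed);
}

// Format an int64_t portably using the PRId64 format macro.
std::string formatInt64(int64_t value) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%" PRId64, value);
  return buf;
}
```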

Fixes https://github.com/llvm/llvm-project/issues/55911 .
2022-07-27 09:34:45 -07:00
Fangrui Song 7225213c0a [LegacyPM] Remove {,PostInline}EntryExitInstrumenterPass
Following recent changes removing non-core features of the legacy
PM/optimization pipeline.
2022-07-23 15:30:15 -07:00
Stefan Pintilie 475a39fbc3 [PowerPC][NFC] Convert the MMA test cases to use opaque pointers.
This patch modifies only test cases.
Converted the MMA test cases to use opaque pointers.

Reviewed By: lei, amyk

Differential Revision: https://reviews.llvm.org/D130090
2022-07-22 11:43:40 -05:00
Chen Zheng bc5c637376 enable P10 vector builtins test on AIX 64 bit; NFC
Verify that P10 vector builtins with type `vector signed __int128`
and `vector unsigned __int128` work well on AIX 64 bit.
2022-07-21 04:23:02 -04:00
esmeyi 339392ecf2 [AIX] follow-up of D124654.
Emit the remaining aliases instead of reporting
an error, to avoid SPEC2017 PEAK failures,
and mark this as a TODO.
2022-07-21 01:10:09 -04:00
esmeyi b1847ff068 [XCOFF] write the aux header when the visibility is specified in XCOFF32.
The n_type field in the symbol table entry has two interpretations in XCOFF32, and a single interpretation in XCOFF64.
The new interpretation is used in XCOFF32 if the value of the o_vstamp field in the auxiliary header is 2.
In XCOFF64 and the new XCOFF32 interpretation, the n_type field is used for the symbol type and visibility.
The patch writes the aux header with an o_vstamp field value of 2 when the visibility is specified in XCOFF32, so that the new XCOFF32 interpretation is used.

Reviewed By: DiggerLin, jhenderson

Differential Revision: https://reviews.llvm.org/D128148
2022-07-20 07:09:34 -04:00
Simon Pilgrim 9fc347aa4e [DAG] PromoteIntRes_BUILD_VECTOR - extend constant boolean vectors according to target BooleanContents
PromoteIntRes_BUILD_VECTOR currently always ANY_EXTENDs build vector operands, but if this is a constant boolean vector we're losing the useful ability to keep the vector matching the BooleanContents mode used by the target.

This patch extends constant boolean vectors according to target BooleanContents, allowing a number of additional all-bits folds (notably XOR -> NOT conversions) to occur.
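A minimal sketch of why the extension kind matters, assuming a promotion from i1 to i8 lanes. Under a "zero or negative one" encoding, a true lane becomes 0xFF, so XOR with an all-true constant is a NOT. Under a "zero or one" encoding it becomes 0x01, and the fold does not apply. With ANY_EXTEND, the upper bits would be left unspecified and neither fact could be relied on. `BoolContents`, `promoteBoolVector` and `xorWithConstantIsNot` are hypothetical stand-ins, not LLVM's TargetLowering types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of two target boolean encodings: "true" is either 1 or
// all ones in the promoted element type.
enum class BoolContents { ZeroOrOne, ZeroOrNegativeOne };

// Promote a constant vector of i1 lanes to i8 lanes, extending each lane
// according to the target encoding rather than leaving upper bits unspecified.
std::vector<uint8_t> promoteBoolVector(const std::vector<bool> &lanes,
                                       BoolContents contents) {
  const uint8_t trueValue =
      contents == BoolContents::ZeroOrNegativeOne ? 0xFF : 0x01;
  std::vector<uint8_t> result;
  result.reserve(lanes.size());
  for (bool lane : lanes)
    result.push_back(lane ? trueValue : 0);
  return result;
}

// XOR with a constant equals NOT only if every lane of the constant is all ones.
bool xorWithConstantIsNot(const std::vector<uint8_t> &constant) {
  for (uint8_t lane : constant)
    if (lane != 0xFF)
      return false;
  return true;
}
```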

Differential Revision: https://reviews.llvm.org/D129641
2022-07-20 10:49:31 +01:00
esmeyi 28b1ba1c07 [PowerPC] Add an ISEL pattern for i32 MULLI.
D87384 added the following ISEL pattern for i64 immediates; this patch adds it for i32:
`mul with (2^N * int16_imm) -> MULLI + RLWINM`
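The arithmetic behind the pattern can be sketched as follows. If the constant `C` equals `imm16 * 2^N` with `imm16` a signed 16-bit immediate, then `x * C` (mod 2^32) equals `(x * imm16) << N`. That is one MULLI followed by a left shift, which RLWINM performs in its `slwi` form. The decomposition strategy below (stripping all trailing zero bits) and the helper names are assumptions for illustration, not the actual TableGen pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// c == imm * 2^shift, with imm a signed 16-bit immediate (as MULLI takes).
struct MulliShift {
  int16_t imm;
  unsigned shift;
};

// Hypothetical helper: strip trailing zero bits from c and check whether the
// remaining odd factor fits in a signed 16-bit immediate.
std::optional<MulliShift> decomposeMulConstant(int32_t c) {
  if (c == 0)
    return std::nullopt;
  int64_t v = c;
  unsigned shift = 0;
  while (v % 2 == 0) { // exact division, also correct for negative values
    v /= 2;
    ++shift;
  }
  if (v < INT16_MIN || v > INT16_MAX)
    return std::nullopt;
  return MulliShift{static_cast<int16_t>(v), shift};
}

// Emulate MULLI (low 32 bits of x * imm) followed by a left shift by `shift`.
uint32_t mulViaMulliShift(uint32_t x, MulliShift d) {
  uint32_t product = x * static_cast<uint32_t>(static_cast<int32_t>(d.imm));
  return product << d.shift;
}
```

For instance, 100000 decomposes as 3125 * 2^5, so `x * 100000` becomes `mulli` by 3125 followed by a shift left of 5.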

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D129708
2022-07-18 04:40:51 -04:00
Nikita Popov 2a721374ae [IR] Don't use blockaddresses as callbr arguments
Following some recent discussions, this changes the representation
of callbrs in IR. The current blockaddress arguments are replaced
with `!` label constraints that refer directly to callbr indirect
destinations:

    ; Before:
    %res = callbr i8* asm "", "=r,r,i"(i8* %x, i8* blockaddress(@test8, %foo))
    to label %asm.fallthrough [label %foo]
    ; After:
    %res = callbr i8* asm "", "=r,r,!i"(i8* %x)
    to label %asm.fallthrough [label %foo]

The benefit of this is that we can easily update the successors of
a callbr, without having to worry about also updating blockaddress
references. This should allow us to remove some limitations:

* Allow unrolling/peeling/rotation of callbr, or any other
  clone-based optimizations
  (https://github.com/llvm/llvm-project/issues/41834)
* Allow duplicate successors
  (https://github.com/llvm/llvm-project/issues/45248)

This is just the IR representation change, though; I will follow up
with patches to remove limitations in various transformation passes
that are no longer needed.

Differential Revision: https://reviews.llvm.org/D129288
2022-07-15 10:18:17 +02:00
Simon Pilgrim 64ffcba1f8 [PowerPC] Regenerate pr35402.ll test checks 2022-07-13 11:01:44 +01:00
esmeyi 100319cdb4 [AIX] follow-up of D124654.
Report an error when not all alias symbols are emitted.
2022-07-13 03:39:08 -04:00
Kai Nacke 42f7364fcb [GISel] Check useLoadStackGuardNode() before generating LOAD_STACK_GUARD
When lowering the llvm.stackprotector intrinsic, the SDAG implementation
checks useLoadStackGuardNode() to either create a LOAD_STACK_GUARD or use
the first argument of the intrinsic. This check is not present in the
IRTranslator, which results in always generating a LOAD_STACK_GUARD even
if the target does not support it.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D129505
2022-07-12 11:44:42 -04:00
Nikita Popov 4bb7b6fae3 [IR] Remove support for float binop constant expressions
As part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
this removes support for the floating-point binop constant expressions
fadd, fsub, fmul, fdiv and frem.

As part of this change, the C APIs LLVMConstFAdd, LLVMConstFSub,
LLVMConstFMul, LLVMConstFDiv and LLVMConstFRem are removed.
The LLVMBuild APIs should be used instead.

Differential Revision: https://reviews.llvm.org/D129478
2022-07-12 09:40:49 +02:00
Sanjay Patel 8b75671314 [SDAG] try to replace subtract-from-constant with xor
This is almost the same as the abandoned D48529, but it
allows splat vector constants too.

This replaces the x86-specific code that was added with
the alternate patch D48557 with the original generic
combine.

This transform is a less restricted form of an existing
InstCombine and the proposed SDAG equivalent for that
in D128080:
https://alive2.llvm.org/ce/z/OUm6N_
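The condition that makes this transform sound can be stated on plain integers. If every bit that might be set in `X` is also set in the constant `C`, then `C - X` never borrows, so it equals `C ^ X`. The sketch assumes `maybeOne` is the set of bits not known to be zero in `X`. The helper name is hypothetical, and the exact known-bits query used by the SDAG combine may differ.

```cpp
#include <cassert>
#include <cstdint>

// If every possibly-set bit of x (maybeOne) is also set in c, subtracting x
// from c clears exactly those bits without borrowing, i.e. c - x == c ^ x.
bool subFromConstantIsXor(uint32_t c, uint32_t maybeOne) {
  return (maybeOne & ~c) == 0;
}
```

The constant need not be a low-bit mask. With `C = 0b1010` and `X` restricted to bits 1 and 3, `10 - 8 == (10 ^ 8) == 2`.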

Differential Revision: https://reviews.llvm.org/D128123
2022-07-08 08:14:24 -04:00
Nikita Popov 9936d732cd [PowerPC] Simplify test for PR33636 (NFC)
There was a lot of unnecessary code here. Add the -O0 flag to
avoid using constant expressions, otherwise it may get folded
away during EarlyCSE.

Verified that this test fails prior to the fixing commit.
2022-07-06 09:47:42 +02:00
Jay Foad 3ff319c690 [PowerPC] PPCTLSDynamicCall does not preserve LiveIntervals
According to D127731, PPCTLSDynamicCall does not preserve
LiveIntervals, so stop claiming that it does and remove the code
that tried to repair them. NFCI.

Differential Revision: https://reviews.llvm.org/D128421
2022-07-05 20:09:42 +01:00
esmeyi d2a35e4d39 [AIX] Handle the label alignment of a global variable with multiple aliases.

This patch handles the case where a variable has
multiple aliases.
AIX's assembly directive .set is not usable for the
aliasing purpose, and using different labels allows
AIX to emulate symbol aliases. If a value is emitted
between any two labels, meaning they are not aligned,
XCOFF will automatically calculate the offset for them.

This patch implements:
1) Emit the label of the alias just before emitting
the value of the sub-element that the alias refers to.
2) Align a set of aliases that refer to the same
offset.
3) We didn't emit aliasing labels for common and
zero-initialized local symbols in
PPCAIXAsmPrinter::emitGlobalVariableHelper, but
emitted linkage for them in
AsmPrinter::emitGlobalAlias, which caused a FAILURE.
This patch fixes the bug by blocking emitting linkage
for the alias without a label.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D124654
2022-07-03 23:16:16 -04:00
Chen Zheng 370127b7d5 [XCOFF] change default program code csect alignment to 32
This matches the commercial XLC compiler on AIX.

Reviewed By: Esme

Differential Revision: https://reviews.llvm.org/D114419
2022-06-29 04:16:01 +00:00
Ting Wang 88b6d22791 [PowerPC] Improve getNormalLoadInput to reach more splat load opportunities

There are straightforward splat load opportunities blocked by
getNormalLoadInput(), since those cases involve consecutive bitcasts.
Improve by looking through bitcasts.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D128703
2022-06-28 08:02:49 -04:00
Ting Wang 22b8f3511a [PowerPC] Add base test case for load splat opportunity
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D128718
2022-06-28 06:55:23 -04:00
Nikita Popov 217e85761c [ArgPromotion] Remove legacy PM support
Support for the legacy pass manager in ArgPromotion causes
complications in D125485. As the legacy pass manager for middle-end
optimizations is unsupported, drop ArgPromotion from the legacy
pipeline, rather than introducing additional complexity to deal
with it.

Differential Revision: https://reviews.llvm.org/D128536
2022-06-27 09:42:17 +02:00
Kai Luo 106657df4c [PowerPC][AIX] Fix assertion message on AIX. NFC.
Fixes build https://lab.llvm.org/buildbot/#/builders/214/builds/1980.
2022-06-24 12:03:57 +08:00
Kai Luo 6710b21d46 [PowerPC] Allow llvm.ppc.cfence to accept pointer types
In the context of atomic loads, integer, pointer and floating-point types are allowed, so llvm.ppc.cfence should accept any of these types.

Fixes https://github.com/llvm/llvm-project/issues/55983.

Reviewed By: shchenz, vchuravy

Differential Revision: https://reviews.llvm.org/D127554
2022-06-24 10:55:32 +08:00
esmeyi d29e986ed5 [XCOFF] write the real source file name in C_FILE symbol.
The symbol table starts with all the C_FILE symbols.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D126623
2022-06-22 06:23:36 -04:00
Martin Sebor b19194c032 [InstCombine] handle subobjects of constant aggregates
Remove the known limitation of the library function call folders to only
work with top-level arrays of characters (as per the TODO comment in
the code), and allow them to also fold calls involving subobjects of
constant aggregates such as member arrays.
2022-06-21 11:55:14 -06:00
Chen Zheng 9cfbe7bbfe [PowerPC][ctrloop] handles calls in preheader before MTCTRloop 2022-06-21 01:22:39 -04:00
Chen Zheng a71fe49bb5 [PowerPC] add a new pass to expand ctr loop pseudos
This patch implements a new way to generate the CTR loops. Now the
intrinsics inserted in hardware loop pass will be mapped to pseudo
instructions and these pseudo instructions will be expanded to CTR
loop or normal compare+branch loop in this post ISEL pass.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D122125
2022-06-20 22:57:24 -04:00
Nemanja Ivanovic e09f6ff3c1 [PowerPC] Disable automatic generation of STXVP
There are instances where using paired vector stores leads to significant
performance degradation due to issues with store forwarding. To avoid falling
into this trap with compiler-generated code, we will not emit these
instructions unless the user requests them explicitly (with a builtin or by
specifying the option).

Reviewed By: lei, amyk, saghir

Differential Revision: https://reviews.llvm.org/D127218
2022-06-20 14:30:29 -05:00
Quinn Pham deb7655209 [PowerPC] Fix PPCVSXSwapRemoval pass to include MTVSCR and MFVSCR as not swappable.
This patch adds the instructions `MTVSCR` and `MFVSCR` as not swappable to the
PPCVSXSwapRemoval pass because they are not lane-insensitive. This will prevent
the compiler from optimizing out required swaps when using `lxvd2x` and
`stxvd2x`.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D128062
2022-06-17 10:14:24 -05:00
Congzhe Cao a9dccb0072 [TargetTransformInfo] Added an opt/llc option for cache line size
In some passes we need a valid cache line size to do analysis or
transformation, e.g., loop cache analysis and loop data prefetch. However,
for some backend targets, `TTIImpl->getCacheLineSize()` is not implemented,
so `TTI.getCacheLineSize()` simply returns 0, which might eventually
produce invalid results.

In this patch we add a user-specified opt/llc option for the cache line size.
If the option is specified by the user, we use the supplied value; otherwise we
fall back to the default value obtained from `TTIImpl->getCacheLineSize()`.
The PowerPC target already has such an option; this patch generalizes
it to TargetTransformInfo.cpp.
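The override-then-fallback rule described above amounts to a one-line decision. The sketch models it with `std::optional` standing in for "the option was given on the command line". `TargetCacheInfo` and `effectiveCacheLineSize` are hypothetical names, not the actual TargetTransformInfo code, and the option's command-line spelling is not shown here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a target's TTI implementation; a value of 0 means
// the target does not report a cache line size.
struct TargetCacheInfo {
  unsigned cacheLineSize;
};

// A user-specified value wins; otherwise fall back to the target's value
// (which may still be 0 if the target does not implement the hook).
unsigned effectiveCacheLineSize(std::optional<unsigned> userValue,
                                const TargetCacheInfo &target) {
  return userValue ? *userValue : target.cacheLineSize;
}
```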

Reviewed By: bmahjour, #loopoptwg

Differential Revision: https://reviews.llvm.org/D127342
2022-06-16 15:57:51 -04:00
Ahsan Saghir 3d259a82da [PowerPC] Fix LQ-STQ instructions to use correct offset and base
This patch fixes the load and store quadword instructions on
PowerPC to use correct offset and base address.

Reviewed By: #powerpc, nemanjai, lkail

Differential Revision: https://reviews.llvm.org/D126807
2022-06-16 10:47:38 -05:00
Amy Kwan 34033a84b8 [PowerPC] Skip combine for vector_shuffles when two scalar_to_vector nodes are different vector types.
Currently in `combineVectorShuffle()`, we update the shuffle mask if either
input vector comes from a scalar_to_vector, and we keep the respective input
vectors in their permuted form by producing PPCISD::SCALAR_TO_VECTOR_PERMUTED.
However, it is possible that we end up in a situation where both input vectors
to the vector_shuffle are scalar_to_vector nodes of different vector types.
In situations like this, the shuffle mask is updated incorrectly, as the current
code assumes both scalar_to_vector inputs are the same vector type.

This patch skips the combines for vector_shuffle if both input vectors are
scalar_to_vector nodes of different vector types. A follow-up patch
will fix this issue by correctly updating the shuffle mask.

Differential Revision: https://reviews.llvm.org/D127818
2022-06-15 14:12:18 -05:00
Quinn Pham 335e8bf100 [PowerPC] emit VSX instructions instead of VMX instructions for vector loads and stores
This patch changes the PowerPC backend to generate VSX load/store instructions
for all vector loads/stores on Power8 and earlier (LE) instead of VMX
load/store instructions. This change is needed because VMX instructions
require the vector to be 16-byte aligned, so a vector load/store will fail with
VMX instructions if the vector is misaligned. Also, `gcc` generates VSX
instructions in this situation, which allow unaligned access but require a
swap instruction after loading/before storing. This is not an issue for BE,
because we already emit VSX instructions there since no swap is required. Nor is it
an issue on Power9 and up, since we have access to `lxv[x]`/`stxv[x]`, which
allow unaligned access and do not require swaps.
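The swap requirement on little-endian targets can be modeled at the level of doubleword lane order. This sketch assumes that `lxvd2x`/`stxvd2x` move the two doublewords in big-endian element order, so on LE they appear swapped. An `xxswapd` after the load (and before the store) restores the natural order. The functions are hypothetical models of lane order only, not an instruction simulator.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two 64-bit lanes, listed in little-endian element order (lane 0 first).
using V2i64 = std::array<uint64_t, 2>;

// Hypothetical model: on LE, lxvd2x leaves the doublewords in swapped order.
V2i64 modelLxvd2xLE(const uint64_t mem[2]) { return {mem[1], mem[0]}; }

// Hypothetical model of stxvd2x on LE: the inverse of modelLxvd2xLE.
void modelStxvd2xLE(const V2i64 &v, uint64_t mem[2]) {
  mem[0] = v[1];
  mem[1] = v[0];
}

// Model of xxswapd: exchange the two doublewords.
V2i64 xxswapd(const V2i64 &v) { return {v[1], v[0]}; }

// LE vector load: VSX load followed by a swap.
V2i64 loadVectorLE(const uint64_t mem[2]) {
  return xxswapd(modelLxvd2xLE(mem));
}

// LE vector store: swap first, then the VSX store; returns the memory contents.
V2i64 memoryAfterStoreLE(const V2i64 &v) {
  uint64_t mem[2];
  modelStxvd2xLE(xxswapd(v), mem);
  return {mem[0], mem[1]};
}
```

On Power9 and later, `lxv`/`stxv` make the extra `xxswapd` unnecessary, which is why the commit limits this change to Power8 and earlier.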

This patch also delays the VSX load/store for LE combines until after
LegalizeOps to prioritize other load/store combines.

Reviewed By: #powerpc, stefanp

Differential Revision: https://reviews.llvm.org/D127309
2022-06-15 12:06:04 -05:00