Commit Graph

3389 Commits

Author SHA1 Message Date
Stefan Pintilie 263f1b2f5d [PowerPC] Fix combine step for shufflevector.
The combine step for shufflevector will sometimes replace undef in the mask
with a defined value. This can cause an infinite loop in some cases as another
combine will then put the undef back in the mask.

This patch fixes the issue so that undefs are not replaced when doing a combine.

Reviewed By: ZarkoCA, amyk, quinnp, saghir

Differential Revision: https://reviews.llvm.org/D127439
2022-06-14 11:31:24 -05:00
Kai Luo 7735653e16 [PowerPC] Update cfence tests to avoid using undef. NFC. 2022-06-14 12:45:46 +08:00
Simon Pilgrim 1cf9b24da3 [DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits
This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts.

This helps with several of the regressions from D125836
2022-06-12 19:25:20 +01:00
Simon Pilgrim cf5c63d187 [DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats
Addresses a number of regressions identified in D127115
2022-06-11 21:06:42 +01:00
Simon Pilgrim a71ad6a3c8 [DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector()
Allow scalar_to_vector nodes to be used for the start of a build_vector creation
2022-06-11 15:29:22 +01:00
Simon Pilgrim 599aa617e3 [PowerPC] Regenerate pre-inc-disable.ll checks 2022-06-11 15:12:49 +01:00
Kai Luo e06faedf1d [PowerPC] Add tests to reflect cfence on float point types. NFC. 2022-06-11 12:30:15 +08:00
Kai Luo 5018a5dcbe [PowerPC] Support huge frame size for PPC64
Support allocation of huge stack frame(>2g) on PPC64.

For ELFv2 ABI on Linux, quoted from the spec 2.2.3.1 General Stack Frame Requirements
> There is no maximum stack frame size defined.

On AIX, XL allows such huge frame.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D107886
2022-06-06 09:08:28 +00:00
Nikita Popov 41d5033eb1 [IR] Enable opaque pointers by default
This enabled opaque pointers by default in LLVM. The effect of this
is twofold:

* If IR that contains *neither* explicit ptr nor %T* types is passed
  to tools, we will now use opaque pointer mode, unless
  -opaque-pointers=0 has been explicitly passed.
* Users of LLVM as a library will now default to opaque pointers.
  It is possible to opt-out by calling setOpaquePointers(false) on
  LLVMContext.

A cmake option to toggle this default will not be provided. Frontends
or other tools that want to (temporarily) keep using typed pointers
should disable opaque pointers via LLVMContext.

Differential Revision: https://reviews.llvm.org/D126689
2022-06-02 09:40:56 +02:00
Hendrik Greving a92ed167f2 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-02 00:49:11 +00:00
Hendrik Greving e9d05cc7d8 Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4."
This reverts commit 430ac5c302.

Due to failures in Clang tests.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 13:27:49 -07:00
Hendrik Greving 430ac5c302 [ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4.
Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4.

Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be
removed once targets set this explicitly.

Adjusts 11 lit tests to reflect slightly different behavior during
DAG combine.

Differential Revision: https://reviews.llvm.org/D125247
2022-06-01 12:48:01 -07:00
Nuno Lopes 80b3dcc045 [Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace
There are a few places where we use report_fatal_error when the input is broken.
Currently, this function always crashes LLVM with an abort signal, which
then triggers the backtrace printing code.
I think this is excessive, as wrong input shouldn't give a link to
LLVM's github issue URL and tell users to file a bug report.
We shouldn't print a stack trace either.

This patch changes report_fatal_error so it uses exit() rather than
abort() when its argument GenCrashDiag=false.

Reviewed by: nikic, MaskRay, RKSimon

Differential Revision: https://reviews.llvm.org/D126550
2022-05-30 19:19:23 +01:00
Edd Barrett d245974e1a Test stackmap support for floating point types.
It appears that float support is complete, or at least, the stackmap records
emitted are not inconceivable (I must admit that I don't know about many of the
architectures under test here).

One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect)
quirk of the stackmap format: in the case of a Register record, the Offset or
SmallConstant field can encode a sub-register index! I've only ever seen this
field zero for Register entries up until now.
2022-05-30 10:49:32 +01:00
Amy Kwan af430944b3 [PowerPC][AIX] Allow VSX patterns to be 32-bit and 64-bit safe on P8+.
This patch updates two patterns involving `scalar_to_vector` and
`SCALAR_TO_VECTOR_PERMUTED` nodes to be safe for both 64-bit and 32-bit by
pulling the patterns out of the 64-bit specific guard. These patterns are
matched on POWER8 and above.

Differential Revision: https://reviews.llvm.org/D125389
2022-05-27 10:34:17 -05:00
Rahman Lavaee 3aa249329f Revert "[Propeller] Promote functions with propeller profiles to .text.hot."
This reverts commit 4d8d2580c5.
2022-05-26 18:45:40 -07:00
Rahman Lavaee 4d8d2580c5 [Propeller] Promote functions with propeller profiles to .text.hot.
Today, text section prefixes (none, .unlikely, .hot, and .unkown) are determined based on PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely).

This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`.

The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the functions text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`.

Differential Revision: https://reviews.llvm.org/D122930
2022-05-26 16:23:21 -07:00
Chen Zheng d79275238f [MachineSink] replace MachineLoop with MachineCycle
reapply 62a9b36fcf and fix module build
failue:
1: remove MachineCycleInfoWrapperPass in MachinePassRegistry.def
   MachineCycleInfoWrapperPass is a anylysis pass, should not be there.
2: move the definition for MachineCycleInfoPrinterPass to cpp file.

Otherwise, there are module conflicit for MachineCycleInfoWrapperPass
in MachinePassRegistry.def and MachineCycleAnalysis.h after
62a9b36fcf.

MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-26 06:45:23 -04:00
Chen Zheng 80c4910f3d Revert "[MachineSink] replace MachineLoop with MachineCycle"
This reverts commit 62a9b36fcf.
Cause build failure on lldb incremental buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes
2022-05-24 22:43:37 -04:00
Chen Zheng 62a9b36fcf [MachineSink] replace MachineLoop with MachineCycle
MachineCycle can handle irreducible loop. Natural loop
analysis (MachineLoop) can not return correct loop depth if
the loop is irreducible loop. And MachineSink is sensitive
to the loop depth, see MachineSinking::isProfitableToSinkTo().

This patch tries to use MachineCycle so that we can handle
irreducible loop better.

Reviewed By: sameerds, MatzeB

Differential Revision: https://reviews.llvm.org/D123995
2022-05-24 01:16:19 -04:00
Edd Barrett c5e5cf1258 Test stackmap support for i128
This diff adds tests that check the currently-working stackmap cases for i128.
This will help ensure no regressions are later introduced by D125680 (when
ready).

Note that i128 stackmap support is currently incomplete, so we cant test all
i128 functionality:

    i128 constants >= 2^{63} crash LLVM
    non-constant i128s crash LLVM

So this change tests only constant i128 operands of value < 2^{63}.

A couple of incorrect comments are also fixed.
2022-05-23 11:56:24 +01:00
Amy Kwan c35ca3a1c7 [PowerPC] Implement XL compat __fnabs and __fnabss builtins.
This patch implements the following floating point negative absolute value
builtins that required for compatibility with the XL compiler:
```
double __fnabs(double);
float __fnabss(float);
```

These builtins will emit :
- fnabs on PWR6 and below, or if VSX is disabled.
- xsnabsdp on PWR7 and above, if VSX is enabled.

Differential Revision: https://reviews.llvm.org/D125506
2022-05-19 11:28:40 -05:00
Qiu Chaofan d9d15af787 [PowerPC] Treat llvm.fmuladd intrinsic as using CTR
This fixes bug 55463, similar to D78668. This is a temporary fix since
we will switch to post-isel CTR loop determination in the future.

Reviewed By: dim, shchenz

Differential Revision: https://reviews.llvm.org/D125746
2022-05-18 15:57:55 +08:00
esmeyi 8d6e2c3e3d [XCOFF] support writing sections, relocations and symbols for XCOFF64.
This is the second patch to enable the XCOFF64 object writer.

Reviewed By: jhenderson, shchenz

Differential Revision: https://reviews.llvm.org/D122287
2022-05-17 04:27:47 -04:00
Craig Topper 1c4880a2d3 [TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift.
If we use multiply it would be with 0x0101 which is 1 more than a power
of 2. On some targets we would expand this to shl+add. By avoiding the
multiply earlier, we can generate better code.

Note, PowerPC doesn't do the shl+add expansion of multiply so one of
the tests increased in instruction count.

Limiting to scalars because it almost always increased the number of
instructions in vector tests.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D125638
2022-05-16 09:27:44 -07:00
Ting Wang 289236d597 [PowerPC] Fix PPCISD::STBRX selection issue on A2
Enable FeatureISA2_06 on Power A2 target

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D125203
2022-05-10 20:47:51 -04:00
Amy Kwan 0c1000cbd6 [NFC][PowerPC] Add 32-bit AIX RUN lines to test cases.
This patch adds 32-bit AIX RUN lines to several test cases, along with the
addition of one new test case, to prepare for future codegen changes involving
the PPCISD::SCALAR_TO_VECTOR_PERMUTED node on 32-bit mode.
2022-05-10 09:20:10 -05:00
Amaury Séchet 06fad8bc05 [DAGCombine] Add node in the worklist in topological order in CombineTo
This is part of an ongoing effort toward making DAGCombine process the nodes in topological order.

This is able to discover a couple of new optimizations, but also causes a couple of regression. I nevertheless chose to submit this patch for review as to start the discussion with people working on the backend so we can find a good way forward.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124743
2022-05-07 16:24:31 +00:00
Amaury Séchet f4183441d4 Automatically generate aix32-cc-abi-vaarg.ll . NFC 2022-05-07 13:22:40 +00:00
David Green 5930691ee1 Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific"
This reverts commit 891c3cf99e as it turns
out that the error was not caused by this commit, the error caming
from D124526 instead.
2022-05-06 21:03:22 +01:00
David Green 891c3cf99e [DAGCombine] Make combineShuffleOfBitcast LittleEndian specific
Something is going wrong with the BigEndian PowerPC bot. It is hard to
tell what is wrong from here, but attempt to fix it by disabling the
combineShuffleOfBitcast combine for bigendian.
2022-05-06 18:42:44 +01:00
Craig Topper 76f90a9d71 [SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift.
Otherwise we have garbage in the upper bits that can affect the
results of the UREM.

Fixes PR55296.

Differential Revision: https://reviews.llvm.org/D125076
2022-05-06 09:26:30 -07:00
David Green 115c188807 [DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask'))
If the mask is made up of elements that form a mask in the higher type
we can convert shuffle(bitcast into the bitcast type, simplifying the
instruction sequence. A v4i32 2,3,0,1 for example can be treated as a
1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load
combines, along with helping simplify a number of other tests.

The PowerPC combine for v16i8 splat vector loads needed some fixes to
keep it working for v16i8 vectors. This improves the handling of v2i64
shuffles to match too, hopefully improving them in general.

Differential Revision: https://reviews.llvm.org/D123801
2022-05-06 10:50:31 +01:00
David Green 1f37d94838 [PowerPC] Add extra v2i64 splat load tests. NFC
In service of D123801, this add some tests targetting a v2i64 splat of a
load, and regenerates vsx_shuffle_le.ll for easier updating.
2022-05-05 15:56:55 +01:00
Xing Xue e5926906eb [XCOFF][AIX] Use unique section names for LSDA and EH info sections with -ffunction-sections
Summary:
When -ffunction-sections is on, this patch makes the compiler to generate unique LSDA and EH info sections for functions on AIX by appending the function name to the section name as a suffix. This will allow the AIX linker to garbage-collect unused function.

Reviewed by: MaskRay, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D124855
2022-05-05 09:01:36 -04:00
Craig Topper ef849f5048 [PowerPC] Re-run update_mir_test_checks.py on nofpexcept.ll. NFC
This test was previously generated by the script, but the script
now uses CHECK-NEXT instead of CHECK.

This is preparation for a strictfp related patch I'm working on.
2022-05-04 16:17:14 -07:00
Simon Pilgrim 731f0e27ec [PowerPC] Regenerate urem-seteq-illegal-types.ll
Remove superfluous whitespace
2022-05-03 15:57:45 +01:00
Amy Kwan 2534dc120a [PowerPC] Enable CR bits support for Power8 and above.
This patch turns on support for CR bit accesses for Power8 and above. The reason
why CR bits are turned on as the default for Power8 and above is that because
later architectures make use of builtins and instructions that require CR bit
accesses (such as the use of setbc in the vector string isolate predicate
and bcd builtins on Power10).

This patch also adds the clang portion to allow for turning on CR bits in the
front end if the user so desires to.

Differential Revision: https://reviews.llvm.org/D124060
2022-05-02 12:06:15 -05:00
Nikita Popov aae5f8115a [Local] Consider atomic loads from constant global as dead
Per the guidance in
https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization,
an atomic load from a constant global can be dropped, as there can
be no stores to synchronize with. Any write to the constant global
would be UB.

IPSCCP will already drop such loads, but the main helper in Local
doesn't recognize this currently. This is motivated by D118387.

Differential Revision: https://reviews.llvm.org/D124241
2022-05-02 10:52:58 +02:00
Serge Pavlov 9fc58f1820 [PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass
PowerPC supports `ppc_fp128`, which is not an IEEE floating point
type. The generic lowering of llvm.is_fpclass could not handle it
properly. This change extends the generic lowering code to
support `ppc_fp128`.

The change was tested on emulator using runtime tests from
https://reviews.llvm.org/D112933 and the patch for clang
https://reviews.llvm.org/D112932.

Differential Revision: https://reviews.llvm.org/D113908
2022-04-29 11:10:47 +07:00
David Tenty 8042699a30 [LLVM] Add exported visibility style for XCOFF
For the AIX linker, under default options, global or weak symbols which
have no visibility bits set to zero (i.e. no visibility, similar to ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.

This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D123951
2022-04-28 14:56:00 -04:00
David Tenty f6d209b3ec [AIX][XCOFF] error on emit symbol visibility for XCOFF object file
This is a follow on to the revert of D84265 to add an error if we'd need
to write a non-zero visibility type in the xcoff object file. We can't
currently do that because we lack the auxilary header to interpret the
bits in XCOFF32. This is important because visibility is being enabled
in the assembly writing path, and without this error the visibility
could be silently ignored.

Differential Revision: https://reviews.llvm.org/D124392
2022-04-26 19:22:44 -04:00
Alexey Bataev 2cca53c815 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 09:37:16 -07:00
Alexey Bataev 5f7ac15912 Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer."
This reverts commit 2f49163b33 to fix
a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284
2022-04-20 06:35:55 -07:00
Alexey Bataev 2f49163b33 [DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer.
We can process the long shuffles (working across several actual
vector registers) in the best way if we take the actual register
represantion into account. We can build more correct representation of
register shuffles, improve number of recognised buildvector sequences.
Also, same function can be used to improve the cost model for the
shuffles. in future patches.

Part of D100486

Differential Revision: https://reviews.llvm.org/D115653
2022-04-20 05:32:56 -07:00
Qiu Chaofan 1e23175df6 [PowerPC] Mark side effects of Power9 darn instruction
This fixes CVE-2019-15847, preventing random number generation from
being merged.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D122783
2022-04-18 13:21:40 +08:00
Kai Luo 7c5d5edec8 [PowerPC] Generate tests for 16-byte atomic load/store. NFC. 2022-04-09 16:36:57 +08:00
Kai Luo 18679ac0d7 [PowerPC] Adjust `MaxAtomicSizeInBitsSupported` on PPC64
AtomicExpandPass uses this variable to determine emitting libcalls or not. The default value is 1024 and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_*` libcalls for those target unable to inline atomic ops and finally the backend emits `__sync_*` libcalls. Thanks @efriedma for pointing it out.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D122868
2022-04-09 00:03:09 +00:00
Kai Luo 549e118e93 [PowerPC] Support 16-byte lock free atomics on pwr8 and up
Make 16-byte atomic type aligned to 16-byte on PPC64, thus consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D122377
2022-04-08 23:25:56 +00:00
Daniil Kovalev 62a983ebc5 Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if"
This reverts commit 83a798d4b0.

As discussed in D120714 with @thakis, the patch added unneeded complexity
without noticeable benefits.
2022-04-06 20:32:53 +03:00
Daniil Kovalev 83a798d4b0 [CodeGen] Place SDNode debug ID declaration under appropriate #if
Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to
reduce memory usage when it is not needed.

Differential Revision: https://reviews.llvm.org/D120714
2022-04-06 14:09:32 +03:00
Ting Wang b389354b28 [Clang][PowerPC] Add max/min intrinsics to Clang and PPC backend
Add support for builtin_[max|min] which has below prototype:
A builtin_max (A1, A2, A3, ...)
All arguments must have the same type; they must all be float, double, or long double.
Internally use SelectCC to get the result.

Reviewed By: qiucf

Differential Revision: https://reviews.llvm.org/D122478
2022-04-05 22:43:48 -04:00
Dávid Bolvanský fb65aaf0be [NFCI] Fixed missing colon in CHECK directives - part 2 2022-04-03 14:42:59 +02:00
Stefan Pintilie 585c85abe5 [PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes.
To store a byval parameter the existing code would store as many 8 byte elements
as was required to store the full size of the byval parameter.
For example, a paramter of size 16 would store two element of 8 bytes.
A paramter of size 12 would also store two elements of 8 bytes.
This would sometimes store too many bytes as the size of the paramter is not
always a factor of 8.

This patch fixes that issue and now byval paramters are stored with the correct
number of bytes.

Reviewed By: nemanjai, #powerpc, quinnp, amyk

Differential Revision: https://reviews.llvm.org/D121430
2022-03-31 15:12:46 -05:00
Stefan Pintilie 2e55bc9f3c [PowerPC] Set the special DSCR with a compiler option.
Add a compiler option and the instructions required to set the
special Data Stream Control Register (DSCR). The special register will
not be set by default.

Original patch by: Muhammad Usman

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D117013
2022-03-31 14:06:30 -05:00
Kai Luo a2c0c4abff [PowerPC] Add test for failing lowering llvm.ppc.cfence on i128. NFC. 2022-03-25 17:56:11 +08:00
Stefan Pintilie 2c25c65cdc [PowerPC] The BL8_NOTOC_RM instruction needs to produce a notoc relocation.
The BL8_NOTOC_RM instruction was incorrectly producing a relocation that reqired
a TOC restore after the call. This patch fixes that issue and the notoc
relocation is now used.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D122012
2022-03-23 19:01:05 -05:00
Stefan Pintilie 4275d7e65a [PowerPC][NFC] Add test case for byval argument passing
Add a test case for byval argument passing where the argument size is more than
8 bytes and is not a factor of 8 bytes.
2022-03-21 15:14:28 -05:00
Aaron Puchert c1a31ee65b [PPCISelLowering] Avoid emitting calls to __multi3, __muloti4
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.

Because libgcc doesn't have __muloti4, we block that as well.

Fixes #54460.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122090
2022-03-20 20:59:30 +01:00
Chen Zheng 973b02b6f1 [PowerPC][NFC] use right hardware loop intrinsics in test case 2022-03-20 10:00:57 -04:00
esmeyi de20a3b677 [XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF.
This is the first patch to enable the XCOFF64 object writer.
Currently only fileHeader and sectionHeaders are supported.

Reviewed By: jhenderson, DiggerLin

Differential Revision: https://reviews.llvm.org/D120861
2022-03-20 09:31:29 -04:00
Kai Luo 31906a6090 [AtomicExpand][PowerPC] Fix all-one mask value
When generating a all-one mask value whose bitwidth is larger than 64, signed extension should be used rather then zero extension.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D120865
2022-03-18 13:35:54 +08:00
Stefan Pintilie 78406ac898 [PowerPC][P10] Add Vector pair calling convention
Add the calling convention for the vector pair registers.
These registers overlap with the vector registers.

Part of an original patch by: Lei Huang

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D117225
2022-03-15 14:08:42 -05:00
Qiu Chaofan 300e1293de [PowerPC] Disable perfect shuffle by default
We are going to remove the old 'perfect shuffle' optimization since it
brings performance penalty in hot loop around vectors. For example, in
following loop sharing the same mask:

  %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>
  %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27>

The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead
of `vperm-vperm`. In some large loop cases, this causes 20%+ performance
penalty.

The original attempt to resolve this is to pre-record masks of every
shufflevector operation in DAG, but that is somewhat complex and brings
unnecessary computation (to scan all nodes) in optimization. Here we
disable it by default. There're indeed some cases becoming worse after
this, which will be fixed in a more careful way in future patches.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D121082
2022-03-15 15:52:24 +08:00
Nemanja Ivanovic 766ca2c59e [PowerPC] Add missed VSX shuffles instead of Altivec ones
VSX introduced some permute instructions that are direct
replacements for Altivec ones except they can target all
the VSX registers. We have added code generation for most
of these but somehow missed the low/hi word merges (XXMRG[LH]W).
This caused some additional spills on some large
computationally intensive code.

This patch simply adds the missed patterns.
2022-03-14 10:11:54 -05:00
Xiang1 Zhang c31014322c TLS loads opimization (hoist)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120000
2022-03-10 09:29:06 +08:00
Masoud Ataei 30f30e1c12 [PowerPC] Fix the none tail call in scalar MASS conversion
This patch is proposing a fix for patch https://reviews.llvm.org/D101759
on none tail call math function conversion to MASS call.

Differential: https://reviews.llvm.org/D121016

reviewer: @nemanjai
2022-03-08 08:59:17 -08:00
Qiu Chaofan b2497e5435 [PowerPC] Add generic fnmsub intrinsic
Currently in Clang, we have two types of builtins for fnmsub operation:
one for float/double vector, they'll be transformed into IR operations;
one for float/double scalar, they'll generate corresponding intrinsics.

But for the vector version of builtin, the 3 op chain may be recognized
as expensive by some passes (like early cse). We need some way to keep
the fnmsub form until code generation.

This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub
intrinsics.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D116015
2022-03-07 13:00:06 +08:00
David Green 4388f4f776 [DAG] Don't convert undef to 0 when creating buildvector
When inserting undef into buildvectors created from shuffles of
buildvectors, we convert elements to the largest needed type. This had
the effect of converting undef into 0, which isn't needed as the
buildvector implicitly truncates and trunc(zext(undef)) == undef.

Differential Revision: https://reviews.llvm.org/D121002
2022-03-06 18:35:34 +00:00
Kai Luo 1cfcbf197c [PowerPC][atomics] Precommit test cases for i128 cmpxchg. NFC. 2022-03-03 10:47:52 +08:00
Xiang1 Zhang 65588a0776 Revert "TLS loads opimization (hoist)"
Revert for more reviews

This reverts commit 30e612ebdf.
2022-03-02 14:10:11 +08:00
Xiang1 Zhang 30e612ebdf TLS loads opimization (hoist)
Reviewed By: Wang Pheobe, Topper Craig

Differential Revision: https://reviews.llvm.org/D120000
2022-03-02 10:37:24 +08:00
Jay Foad 719bac55df [MIRParser] Diagnose too large align values in MachineMemOperands
When parsing MachineMemOperands, MIRParser treated the "align" keyword
the same as "basealign". Really "basealign" should specify the
alignment of the MachinePointerInfo base value, and "align" should
specify the alignment of that base value plus the offset.

This worked OK when the specified alignment was no larger than the
alignment of the offset, but in cases like this it just caused
confusion:

    STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8)

MIRPrinter would never have printed this, with an offset of 4 but an
align of 8, so it must have been written by hand. MIRParser would
interpret "align 8" as "basealign 8", but I think it is better to give
an error and force the user to write "basealign 8" if that is what they
really meant.

Differential Revision: https://reviews.llvm.org/D120400

Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8
2022-02-24 15:32:08 +00:00
Stefan Pintilie b3e63ee2e5 [NFC][PowerPC] Fix the check-cpu.ll test case.
This test doesn't work because the CHECK-NOT line is actually checking
something that only exists on stderr and not stdout.
Changed the test so that we now check both stderr and stdout.
Changed the test so that we check pwr9, pwr10, and future. The cpu names of
power9 or power10 are not supported in the llc backend.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D120349
2022-02-23 14:09:34 -06:00
Craig Topper 440c4b705a [SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y).
Previous we used sra (X, size(X)-1); xor (add (X, Y), Y).

By placing sub at the end, we allow RISCV to combine sign_extend_inreg
with it to form subw.

Some X86 tests for Z - abs(X) seem to have improved as well.

Other targets look to be a wash.

I had to modify ARM's abs matching code to match from sub instead of
xor. Maybe instead ISD::ABS should be made legal. I'll try that in
parallel to this patch.

This is an alternative to D119099 which was focused on RISCV only.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D119171
2022-02-20 21:11:23 -08:00
esmeyi 7b67d2e398 Reland [XCOFF][llvm-objdump] change the priority of symbols with the same address by symbol types.
Fix the Buildbot failure #19373.

Differential Revision: https://reviews.llvm.org/D117642
2022-02-20 21:51:10 -05:00
esmeyi 0bf3fec4cd Revert "[XCOFF][llvm-objdump] change the priority of symbols with"
This reverts commit 2ad662172c.

Buildbot failure #19373
2022-02-18 04:12:32 -05:00
esmeyi 2ad662172c [XCOFF][llvm-objdump] change the priority of symbols with
the same address by symbol types.

Summary: In XCOFF, each section comes with a default symbol
         with the same name as the section. It doesn't bind
         to code locations and it may cause incorrect display
         of symbol names under `llvm-objdump -d`.
         This patch changes the priority of symbols with the
         same address by symbol type.

Reviewed By: jhenderson, shchenz

Differential Revision: https://reviews.llvm.org/D117642
2022-02-18 00:29:10 -05:00
Amy Kwan 5dc0a1657b [PowerPC] Fix __builtin_pdepd and __builtin_pextd to be 64-bit and P10 only.
The `__builtin_pdepd` and `__builtin_pextd` are P10 builtins that are meant to
be used under 64-bit only. For instance, when the builtins are compiled under
32-bit mode:
```
$ cat t.c
unsigned long long foo(unsigned long long a, unsigned long long b) {
  return __builtin_pextd(a,b);
}

$ clang -c t.c -mcpu=pwr10 -m32
ExpandIntegerResult #0: t31: i64 = llvm.ppc.pextd TargetConstant:i32<6928>, t28, t29

fatal error: error in backend: Do not know how to expand the result of this operator!
```
This patch adds sema checking for these builtins to compile under 64-bit
mode only and on P10. The builtins will emit a diagnostic when they are compiled on
non-P10 compilations and on 32-bit mode.

Differential Revision: https://reviews.llvm.org/D118753
2022-02-15 12:30:50 -06:00
Amy Kwan ac5a5a9cfe [PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors.
This patch updates the handling of vectors in getPreferredVectorAction():

For single-element and scalable vectors, fall back to default vector legalization
handling. For vNi1 vectors, add handling to either split or promote them in
order to prevent the production of wide v256i1/v512i1 types.

The following assertion is fixed by this patch, as we ended up producing the
wide vector types (that are used for MMA) in the backend prior to this fix.

```
Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() &&
"Cannot BITCAST between types of different sizes!"
```

Differential Revision: https://reviews.llvm.org/D119521
2022-02-15 08:44:08 -06:00
Roman Lebedev 9ff087598e
[NFC][CodeGen][PPC] Autogenerate checklines in a test to simplify further updates 2022-02-11 01:21:45 +03:00
Ting Wang 097a95f2df [PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp
Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection,
and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128,
llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt.

Reviewed By: nemanjai, shchenz, amyk

Differential Revision: https://reviews.llvm.org/D117006
2022-02-09 21:48:28 -05:00
Wael Yehia addd073325 [AIX][PowerPC][PGO] Generate .ref for some PGO sections
For PGO on AIX, when we switch to the linux-style PGO variable access
(via _start and _stop labels), we need the compiler to generate a .ref
assembly for each of the three csects:

 -   __llvm_prf_data[RW]
 -   __llvm_prf_names[RO]
 -   __llvm_prf_vnds[RW]

We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it's
live then the 3 csects are live.

For example, for a testcase with at least one function definition, when
compiled with -fprofile-generate we should generate:

        .csect __llvm_prf_cnts[RW],3
        .ref __llvm_prf_data[RW]   <<============ needs to be inserted
        .ref __llvm_prf_names[RO]  <<===========

the __llvm_prf_vnds is not always present, so we reference it only when
it's present.

Reviewed By: sfertile, daltenty

Differential Revision: https://reviews.llvm.org/D116607
2022-02-05 06:34:20 -05:00
Masoud Ataei 8ce13bc93b [PowerPC] Option controling scalar MASS convertion
differential: https://reviews.llvm.org/D119035

reviewer: bmahjour
2022-02-04 13:24:22 -08:00
Masoud Ataei 256d253332 [PowerPC] Scalar IBM MASS library conversion pass
This patch introduces the conversions from math function calls
to MASS library calls. To resolves calls generated with these conversions, one
need to link libxlopt.a library. This patch is tested on PowerPC Linux and AIX.

Differential: https://reviews.llvm.org/D101759

Reviewer: bmahjour
2022-02-02 07:54:19 -08:00
Amy Kwan 0d6e64755a [PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert.
This patch updates the P10 patterns with a load feeding into an insertelt to
utilize the refactored load and store infrastructure, as well as updating any
tests that exhibit any codegen changes.

Furthermore, custom legalization is added for v4f32 on Power9 and above to not
only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.

Differential Revision: https://reviews.llvm.org/D115691
2022-02-01 08:48:37 -06:00
Amy Kwan 9cc5b064f1 [PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non-extending loads.
This patch updates how splat loads handled and is an extension of D106555.

Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only
non-extending loads. For v8i16/v16i8 types, they are updated to handle extending
loads only if the memory VT is the same vector element VT type.

A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT
node should not be produced. In this test, it depicts the following f64
extending load used in a v2f64 build vector, but the extending load is actually
used in more places other than the build vector (such as in t12 and t16).
```
Type-legalized selection DAG: %bb.0 'test:entry'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
  t4: i64,ch = CopyFromReg t0, Register:i64 %1
  t6: i64,ch = CopyFromReg t0, Register:i64 %2
  t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64
        t16: f64 = fadd t31, t37
      t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64
    t36: ch = TokenFactor t34, t37:1
    t27: v2f64 = BUILD_VECTOR t37, t37
  t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27
      t12: f64 = fadd t11, t37
    t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64
  t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64
  t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1
```

Differential Revision: https://reviews.llvm.org/D117803
2022-01-28 08:23:01 -06:00
Yousuf Ali dad2b6e797 [PowerPC][AIX] Support toc-data attribute for read-only globals.
The patch handles the addition of constant global variables to the table
of contents.

Differential Revision: https://reviews.llvm.org/D116181
2022-01-27 10:47:22 -05:00
Nemanja Ivanovic 0c56bc92e4 [PowerPC] Fix eq/ne comparison of v2i64 pre-Power8
In commit 1674d9b6b2, I fixed the bug where we didn't consider
both words of the result of the comparison. However, the logic
needs to be different for eq and ne.
Namely for eq, we need both words of the doubleword to equal so it
is an AND. OTOH for ne, we need either word to be unequal so it
is an OR.
2022-01-26 08:59:08 -06:00
Qiu Chaofan ad0345aed1 [PowerPC] Emit gnu_attribute according to float-abi metadata
According to GNU as documentation, PowerPC supports some .gnu_attribute
tags to represent the vector and float ABI type in the object file.
Some linkers like GNU ld respects the attribute and will prevent objects
with conflicting ABIs being linked.

This patch emits gnu_attribute value in assembly printer according to
the float-abi metadata. More attributes for soft-fp, hard single/double
and even vector ABI need to be supported in the future.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D117193
2022-01-26 13:28:50 +08:00
Sean Fertile a2505bd063 [PowerPC][AIX] Override markFunctionEnd()
During fast-isel calling 'markFunctionEnd' in the base class will call
tidyLandingPads. This can cause an issue where we have determined that
we need ehinfo and emitted a traceback table with the bits set to
indicate that we will be emitting the ehinfo, but the tidying deletes
all landing pads. In this case we end up emitting a reference to
__ehinfo.N symbol, but not emitting a definition to said symbol and the
resulting file fails to assemble.

Differential Revision: https://reviews.llvm.org/D117040
2022-01-25 10:08:53 -05:00
Bjorn Pettersson 109cc5adcc [DAGCombine] Fold SRA of a load into a narrower sign-extending load
An sra is basically sign-extending a narrower value. Fold away the
shift by doing a sextload of a narrower value, when it is legal to
reduce the load width accordingly.

Differential Revision: https://reviews.llvm.org/D116930
2022-01-25 12:14:48 +01:00
Quinn Pham 6a028296fe [PowerPC] Emit warning when SP is clobbered by asm
This patch emits a warning when the stack pointer register (`R1`) is found in
the clobber list of an inline asm statement. Clobbering the stack pointer is
not supported.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D112073
2022-01-24 15:12:23 -06:00
Sander de Smalen 4f8fdf7827 [ISEL] Canonicalise constant splats to RHS.
SelectionDAG::getNode() canonicalises constants to the RHS if the
operation is commutative, but it doesn't do so for constant splat
vectors. Doing this early helps making certain folds on vector types,
simplifying the code required for target DAGCombines that are enabled
before Type legalization.

Somewhat to my surprise, DAGCombine doesn't seem to traverse the
DAG in a post-order DFS, so at the time of doing some custom fold where
the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the
constant splat to the RHS.

This patch leads to a few improvements, but also a few  minor regressions,
which I traced down to D46492. When I tried reverting this change to see
if the changes were still necessary, I ran into some segfaults. Not sure
if there is some latent bug there.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D117794
2022-01-24 09:38:36 +00:00
Qiu Chaofan 8dedf9b58b [PowerPC] Change CTR clobber estimation for 128-bit floating types
Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D117459
2022-01-22 23:20:14 +08:00
Fangrui Song e6cdef187e [XRay][test] Clean up llc RUN lines 2022-01-21 17:00:03 -08:00
Mircea Trofin e67430cca4 [MLGO] ML Regalloc Eviction Advisor
The bulk of the implementation is common between 'release' mode (==AOT-ed
model) and 'development' mode (for training), the main difference is
that in development mode, we may also log features (for training logs),
inject scoring information (currently after the Virtual Register
Rewriter) and then produce the log file.

This patch also introduces the score injection pass, 'Register
Allocation Pass Scoring', which is trivially just logging the score in
development mode.

Differential Revision: https://reviews.llvm.org/D117147
2022-01-19 11:00:32 -08:00
Stefan Pintilie 1324bb29f7 [PowerPC] Fix issue with strict float to int conversion.
When doing the float to int conversion the strict conversion also needs to
retun a chain. This patch fixes that.

Reviewed By: nemanjai, #powerpc, qiucf

Differential Revision: https://reviews.llvm.org/D117464
2022-01-19 10:57:22 -06:00
Sean Fertile 10d3bf9518 [PowerPC][AIX] Fallback to DAG-ISEL if global has toc-data attribute.
FAST-ISEL should fall back to DAG-ISEL when a global variable has the
toc-data attribute. A number of the checks were duplicated in the lit
test becuase of
1) Slightly different output between -O0 and -O2 due to FAST-ISEL vs
   DAG-ISEL codegen.
2) In preperation of a peephole optimization that will run when
   optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D115373
2022-01-17 16:21:38 -05:00
Sanjay Patel fe17ce0fa6 [PowerPC] add RUN lines for both endians to test; NFC
The load narrowing transform works for both targets,
so we might as well test both with simple examples
like this.
2022-01-13 10:49:23 -05:00