llvm-project

Commit Graph

Author	SHA1	Message	Date
Stefan Pintilie	263f1b2f5d	[PowerPC] Fix combine step for shufflevector. The combine step for shufflevector will sometimes replace undef in the mask with a defined value. This can cause an infinite loop in some cases as another combine will then put the undef back in the mask. This patch fixes the issue so that undefs are not replaced when doing a combine. Reviewed By: ZarkoCA, amyk, quinnp, saghir Differential Revision: https://reviews.llvm.org/D127439	2022-06-14 11:31:24 -05:00
Kai Luo	7735653e16	[PowerPC] Update cfence tests to avoid using undef. NFC.	2022-06-14 12:45:46 +08:00
Simon Pilgrim	1cf9b24da3	[DAG] Enable ISD::FSHL/R SimplifyMultipleUseDemandedBits handling inside SimplifyDemandedBits This patch allows SimplifyDemandedBits to call SimplifyMultipleUseDemandedBits in cases where the source operand has other uses, enabling us to peek through the shifted value if we don't demand all the bits/elts. This helps with several of the regressions from D125836	2022-06-12 19:25:20 +01:00
Simon Pilgrim	cf5c63d187	[DAG] visitVECTOR_SHUFFLE - fold splat(insert_vector_elt()) and splat(scalar_to_vector()) to build_vector splats Addresses a number of regressions identified in D127115	2022-06-11 21:06:42 +01:00
Simon Pilgrim	a71ad6a3c8	[DAG] visitINSERT_VECTOR_ELT - fold insert_vector_elt(scalar_to_vector(x),v,i) -> build_vector() Allow scalar_to_vector nodes to be used for the start of a build_vector creation	2022-06-11 15:29:22 +01:00
Simon Pilgrim	599aa617e3	[PowerPC] Regenerate pre-inc-disable.ll checks	2022-06-11 15:12:49 +01:00
Kai Luo	e06faedf1d	[PowerPC] Add tests to reflect cfence on floating-point types. NFC.	2022-06-11 12:30:15 +08:00
Kai Luo	5018a5dcbe	[PowerPC] Support huge frame size for PPC64 Support allocation of huge stack frame(>2g) on PPC64. For ELFv2 ABI on Linux, quoted from the spec 2.2.3.1 General Stack Frame Requirements > There is no maximum stack frame size defined. On AIX, XL allows such huge frame. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D107886	2022-06-06 09:08:28 +00:00
Nikita Popov	41d5033eb1	[IR] Enable opaque pointers by default This enables opaque pointers by default in LLVM. The effect of this is twofold: * If IR that contains neither explicit ptr nor %T* types is passed to tools, we will now use opaque pointer mode, unless -opaque-pointers=0 has been explicitly passed. * Users of LLVM as a library will now default to opaque pointers. It is possible to opt-out by calling setOpaquePointers(false) on LLVMContext. A cmake option to toggle this default will not be provided. Frontends or other tools that want to (temporarily) keep using typed pointers should disable opaque pointers via LLVMContext. Differential Revision: https://reviews.llvm.org/D126689	2022-06-02 09:40:56 +02:00
Hendrik Greving	a92ed167f2	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as expand, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-02 00:49:11 +00:00
Hendrik Greving	e9d05cc7d8	Revert "[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4." This reverts commit `430ac5c302`. Due to failures in Clang tests. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 13:27:49 -07:00
Hendrik Greving	430ac5c302	[ValueTypes] Define MVTs for v128i2/v64i4 as well as i2 and i4. Adds MVT::v128i2, MVT::v64i4, and implied MVT::i2, MVT::i4. Keeps MVT::i2, MVT::i4 lowering actions as `expand`, which should be removed once targets set this explicitly. Adjusts 11 lit tests to reflect slightly different behavior during DAG combine. Differential Revision: https://reviews.llvm.org/D125247	2022-06-01 12:48:01 -07:00
Nuno Lopes	80b3dcc045	[Support] Make report_fatal_error respect its GenCrashDiag argument so it doesn't generate a backtrace There are a few places where we use report_fatal_error when the input is broken. Currently, this function always crashes LLVM with an abort signal, which then triggers the backtrace printing code. I think this is excessive, as wrong input shouldn't give a link to LLVM's github issue URL and tell users to file a bug report. We shouldn't print a stack trace either. This patch changes report_fatal_error so it uses exit() rather than abort() when its argument GenCrashDiag=false. Reviewed by: nikic, MaskRay, RKSimon Differential Revision: https://reviews.llvm.org/D126550	2022-05-30 19:19:23 +01:00
Edd Barrett	d245974e1a	Test stackmap support for floating point types. It appears that float support is complete, or at least, the stackmap records emitted are not inconceivable (I must admit that I don't know about many of the architectures under test here). One curiosity, the SystemZ tests highlight an undocumented (or maybe incorrect) quirk of the stackmap format: in the case of a Register record, the Offset or SmallConstant field can encode a sub-register index! I've only ever seen this field zero for Register entries up until now.	2022-05-30 10:49:32 +01:00
Amy Kwan	af430944b3	[PowerPC][AIX] Allow VSX patterns to be 32-bit and 64-bit safe on P8+. This patch updates two patterns involving `scalar_to_vector` and `SCALAR_TO_VECTOR_PERMUTED` nodes to be safe for both 64-bit and 32-bit by pulling the patterns out of the 64-bit specific guard. These patterns are matched on POWER8 and above. Differential Revision: https://reviews.llvm.org/D125389	2022-05-27 10:34:17 -05:00
Rahman Lavaee	3aa249329f	Revert "[Propeller] Promote functions with propeller profiles to .text.hot." This reverts commit `4d8d2580c5`.	2022-05-26 18:45:40 -07:00
Rahman Lavaee	4d8d2580c5	[Propeller] Promote functions with propeller profiles to .text.hot. Today, text section prefixes (none, .unlikely, .hot, and .unknown) are determined based on the PGO profile. However, Propeller may deem a function hot when PGO doesn't. Besides, when `-Wl,-keep-text-section-prefix=true` Propeller cannot enforce a global section ordering as the linker can only reorder sections within each output section (.text, .text.hot, .text.unlikely). This patch promotes all functions with Propeller profiles (functions listed in the basic-block-sections profile) to .text.hot. The feature is hidden behind the flag `--bbsections-guided-section-prefix` which defaults to `true`. The new implementation refactors the parsing of basic block sections profile into a new `BasicBlockSectionsProfileReader` analysis pass. This allows us to use the information earlier in `CodeGenPrepare` in order to set the function's text prefix. `BasicBlockSectionsProfileReader` will be used both by `BasicBlockSections` pass and `CodeGenPrepare`. Differential Revision: https://reviews.llvm.org/D122930	2022-05-26 16:23:21 -07:00
Chen Zheng	d79275238f	[MachineSink] replace MachineLoop with MachineCycle Reapply `62a9b36fcf` and fix the module build failure: 1: remove MachineCycleInfoWrapperPass from MachinePassRegistry.def; MachineCycleInfoWrapperPass is an analysis pass and should not be there. 2: move the definition of MachineCycleInfoPrinterPass to the cpp file. Otherwise, there are module conflicts for MachineCycleInfoWrapperPass in MachinePassRegistry.def and MachineCycleAnalysis.h after `62a9b36fcf`. MachineCycle can handle irreducible loops. Natural loop analysis (MachineLoop) cannot return the correct loop depth if the loop is irreducible, and MachineSink is sensitive to the loop depth (see MachineSinking::isProfitableToSinkTo()). This patch uses MachineCycle so that irreducible loops are handled better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-26 06:45:23 -04:00
Chen Zheng	80c4910f3d	Revert "[MachineSink] replace MachineLoop with MachineCycle" This reverts commit `62a9b36fcf`. Cause build failure on lldb incremental buildbot: https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/43994/changes	2022-05-24 22:43:37 -04:00
Chen Zheng	62a9b36fcf	[MachineSink] replace MachineLoop with MachineCycle MachineCycle can handle irreducible loops. Natural loop analysis (MachineLoop) cannot return the correct loop depth if the loop is irreducible, and MachineSink is sensitive to the loop depth (see MachineSinking::isProfitableToSinkTo()). This patch uses MachineCycle so that irreducible loops are handled better. Reviewed By: sameerds, MatzeB Differential Revision: https://reviews.llvm.org/D123995	2022-05-24 01:16:19 -04:00
Edd Barrett	c5e5cf1258	Test stackmap support for i128 This diff adds tests that check the currently-working stackmap cases for i128. This will help ensure no regressions are later introduced by D125680 (when ready). Note that i128 stackmap support is currently incomplete, so we can't test all i128 functionality: i128 constants >= 2^{63} crash LLVM, and non-constant i128s crash LLVM. So this change tests only constant i128 operands of value < 2^{63}. A couple of incorrect comments are also fixed.	2022-05-23 11:56:24 +01:00
Amy Kwan	c35ca3a1c7	[PowerPC] Implement XL compat __fnabs and __fnabss builtins. This patch implements the following floating point negative absolute value builtins that are required for compatibility with the XL compiler: ``` double __fnabs(double); float __fnabss(float); ``` These builtins will emit: - fnabs on PWR6 and below, or if VSX is disabled. - xsnabsdp on PWR7 and above, if VSX is enabled. Differential Revision: https://reviews.llvm.org/D125506	2022-05-19 11:28:40 -05:00
Qiu Chaofan	d9d15af787	[PowerPC] Treat llvm.fmuladd intrinsic as using CTR This fixes bug 55463, similar to D78668. This is a temporary fix since we will switch to post-isel CTR loop determination in the future. Reviewed By: dim, shchenz Differential Revision: https://reviews.llvm.org/D125746	2022-05-18 15:57:55 +08:00
esmeyi	8d6e2c3e3d	[XCOFF] support writing sections, relocations and symbols for XCOFF64. This is the second patch to enable the XCOFF64 object writer. Reviewed By: jhenderson, shchenz Differential Revision: https://reviews.llvm.org/D122287	2022-05-17 04:27:47 -04:00
Craig Topper	1c4880a2d3	[TargetLowering] Expand the last stage of i16 popcnt using shift+add+and instead of mul+shift. If we use multiply it would be with 0x0101 which is 1 more than a power of 2. On some targets we would expand this to shl+add. By avoiding the multiply earlier, we can generate better code. Note, PowerPC doesn't do the shl+add expansion of multiply so one of the tests increased in instruction count. Limiting to scalars because it almost always increased the number of instructions in vector tests. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D125638	2022-05-16 09:27:44 -07:00
Ting Wang	289236d597	[PowerPC] Fix PPCISD::STBRX selection issue on A2 Enable FeatureISA2_06 on Power A2 target Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D125203	2022-05-10 20:47:51 -04:00
Amy Kwan	0c1000cbd6	[NFC][PowerPC] Add 32-bit AIX RUN lines to test cases. This patch adds 32-bit AIX RUN lines to several test cases, along with the addition of one new test case, to prepare for future codegen changes involving the PPCISD::SCALAR_TO_VECTOR_PERMUTED node on 32-bit mode.	2022-05-10 09:20:10 -05:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regressions. I nevertheless chose to submit this patch for review so as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Amaury Séchet	f4183441d4	Automatically generate aix32-cc-abi-vaarg.ll . NFC	2022-05-07 13:22:40 +00:00
David Green	5930691ee1	Revert "[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific" This reverts commit `891c3cf99e` as it turns out that the error was not caused by this commit; the error came from D124526 instead.	2022-05-06 21:03:22 +01:00
David Green	891c3cf99e	[DAGCombine] Make combineShuffleOfBitcast LittleEndian specific Something is going wrong with the BigEndian PowerPC bot. It is hard to tell what is wrong from here, but attempt to fix it by disabling the combineShuffleOfBitcast combine for bigendian.	2022-05-06 18:42:44 +01:00
Craig Topper	76f90a9d71	[SelectionDAG] Clear promoted bits before UREM on shift amount in PromoteIntRes_FunnelShift. Otherwise we have garbage in the upper bits that can affect the results of the UREM. Fixes PR55296. Differential Revision: https://reviews.llvm.org/D125076	2022-05-06 09:26:30 -07:00
David Green	115c188807	[DAG][PowerPC] Combine shuffle(bitcast(X), Mask) to bitcast(shuffle(X, Mask')) If the mask is made up of elements that form a mask in the higher type we can convert shuffle(bitcast into the bitcast type, simplifying the instruction sequence. A v4i32 2,3,0,1 for example can be treated as a 1,0 v2i64 shuffle. This helps clean up some of the AArch64 concat load combines, along with helping simplify a number of other tests. The PowerPC combine for v16i8 splat vector loads needed some fixes to keep it working for v16i8 vectors. This improves the handling of v2i64 shuffles to match too, hopefully improving them in general. Differential Revision: https://reviews.llvm.org/D123801	2022-05-06 10:50:31 +01:00
David Green	1f37d94838	[PowerPC] Add extra v2i64 splat load tests. NFC In service of D123801, this adds some tests targeting a v2i64 splat of a load, and regenerates vsx_shuffle_le.ll for easier updating.	2022-05-05 15:56:55 +01:00
Xing Xue	e5926906eb	[XCOFF][AIX] Use unique section names for LSDA and EH info sections with -ffunction-sections Summary: When -ffunction-sections is on, this patch makes the compiler generate unique LSDA and EH info sections for functions on AIX by appending the function name to the section name as a suffix. This will allow the AIX linker to garbage-collect unused functions. Reviewed by: MaskRay, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D124855	2022-05-05 09:01:36 -04:00
Craig Topper	ef849f5048	[PowerPC] Re-run update_mir_test_checks.py on nofpexcept.ll. NFC This test was previously generated by the script, but the script now uses CHECK-NEXT instead of CHECK. This is preparation for a strictfp related patch I'm working on.	2022-05-04 16:17:14 -07:00
Simon Pilgrim	731f0e27ec	[PowerPC] Regenerate urem-seteq-illegal-types.ll Remove superfluous whitespace	2022-05-03 15:57:45 +01:00
Amy Kwan	2534dc120a	[PowerPC] Enable CR bits support for Power8 and above. This patch turns on support for CR bit accesses for Power8 and above. CR bits are turned on by default for Power8 and above because later architectures make use of builtins and instructions that require CR bit accesses (such as the use of setbc in the vector string isolate predicate and bcd builtins on Power10). This patch also adds the clang portion to allow for turning on CR bits in the front end if the user so desires. Differential Revision: https://reviews.llvm.org/D124060	2022-05-02 12:06:15 -05:00
Nikita Popov	aae5f8115a	[Local] Consider atomic loads from constant global as dead Per the guidance in https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization, an atomic load from a constant global can be dropped, as there can be no stores to synchronize with. Any write to the constant global would be UB. IPSCCP will already drop such loads, but the main helper in Local doesn't recognize this currently. This is motivated by D118387. Differential Revision: https://reviews.llvm.org/D124241	2022-05-02 10:52:58 +02:00
Serge Pavlov	9fc58f1820	[PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass PowerPC supports `ppc_fp128`, which is not an IEEE floating point type. The generic lowering of llvm.is_fpclass could not handle it properly. This change extends the generic lowering code to support `ppc_fp128`. The change was tested on emulator using runtime tests from https://reviews.llvm.org/D112933 and the patch for clang https://reviews.llvm.org/D112932. Differential Revision: https://reviews.llvm.org/D113908	2022-04-29 11:10:47 +07:00
David Tenty	8042699a30	[LLVM] Add exported visibility style for XCOFF For the AIX linker, under default options, global or weak symbols whose visibility bits are set to zero (i.e. no visibility, similar to the ELF default) are only exported if specified on an export list provided to the linker. So AIX has an additional visibility style called "exported" which indicates to the linker that the symbol should be explicitly globally exported. This change maps "dllexport" in the LLVM IR to correspond to XCOFF exported as we feel this best models the intended semantic (discussion on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853) and allows us to enable writing this visibility for the AIX target in the assembly path. Reviewed By: DiggerLin Differential Revision: https://reviews.llvm.org/D123951	2022-04-28 14:56:00 -04:00
David Tenty	f6d209b3ec	[AIX][XCOFF] error on emit symbol visibility for XCOFF object file This is a follow on to the revert of D84265 to add an error if we'd need to write a non-zero visibility type in the xcoff object file. We can't currently do that because we lack the auxiliary header to interpret the bits in XCOFF32. This is important because visibility is being enabled in the assembly writing path, and without this error the visibility could be silently ignored. Differential Revision: https://reviews.llvm.org/D124392	2022-04-26 19:22:44 -04:00
Alexey Bataev	2cca53c815	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build a more correct representation of register shuffles and improve the number of recognised buildvector sequences. Also, the same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 09:37:16 -07:00
Alexey Bataev	5f7ac15912	Revert "[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer." This reverts commit `2f49163b33` to fix a buildbot failure. Reported in https://lab.llvm.org/buildbot#builders/105/builds/24284	2022-04-20 06:35:55 -07:00
Alexey Bataev	2f49163b33	[DAG]Introduce llvm::processShuffleMasks and use it for shuffles in DAG Type Legalizer. We can process the long shuffles (working across several actual vector registers) in the best way if we take the actual register representation into account. We can build a more correct representation of register shuffles and improve the number of recognised buildvector sequences. Also, the same function can be used to improve the cost model for the shuffles in future patches. Part of D100486 Differential Revision: https://reviews.llvm.org/D115653	2022-04-20 05:32:56 -07:00
Qiu Chaofan	1e23175df6	[PowerPC] Mark side effects of Power9 darn instruction This fixes CVE-2019-15847, preventing random number generation from being merged. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D122783	2022-04-18 13:21:40 +08:00
Kai Luo	7c5d5edec8	[PowerPC] Generate tests for 16-byte atomic load/store. NFC.	2022-04-09 16:36:57 +08:00
Kai Luo	18679ac0d7	[PowerPC] Adjust `MaxAtomicSizeInBitsSupported` on PPC64 AtomicExpandPass uses this variable to determine whether or not to emit libcalls. The default value is 1024 and if we don't specify it for PPC64 explicitly, AtomicExpandPass won't emit `__atomic_` libcalls for those targets unable to inline atomic ops and finally the backend emits `__sync_` libcalls. Thanks @efriedma for pointing it out. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D122868	2022-04-09 00:03:09 +00:00
Kai Luo	549e118e93	[PowerPC] Support 16-byte lock free atomics on pwr8 and up Make 16-byte atomic type aligned to 16-byte on PPC64, thus consistent with GCC. Also enable inlining 16-byte atomics on non-AIX targets on PPC64. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D122377	2022-04-08 23:25:56 +00:00
Daniil Kovalev	62a983ebc5	Revert "[CodeGen] Place SDNode debug ID declaration under appropriate #if" This reverts commit `83a798d4b0`. As discussed in D120714 with @thakis, the patch added unneeded complexity without noticeable benefits.	2022-04-06 20:32:53 +03:00
Daniil Kovalev	83a798d4b0	[CodeGen] Place SDNode debug ID declaration under appropriate #if Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed. Differential Revision: https://reviews.llvm.org/D120714	2022-04-06 14:09:32 +03:00
Ting Wang	b389354b28	[Clang][PowerPC] Add max/min intrinsics to Clang and PPC backend Add support for builtin_[max|min] which has the following prototype: A builtin_max (A1, A2, A3, ...) All arguments must have the same type; they must all be float, double, or long double. Internally use SelectCC to get the result. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D122478	2022-04-05 22:43:48 -04:00
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Stefan Pintilie	585c85abe5	[PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes. To store a byval parameter the existing code would store as many 8 byte elements as were required to store the full size of the byval parameter. For example, a parameter of size 16 would store two elements of 8 bytes. A parameter of size 12 would also store two elements of 8 bytes. This would sometimes store too many bytes as the size of the parameter is not always a multiple of 8. This patch fixes that issue and now byval parameters are stored with the correct number of bytes. Reviewed By: nemanjai, #powerpc, quinnp, amyk Differential Revision: https://reviews.llvm.org/D121430	2022-03-31 15:12:46 -05:00
Stefan Pintilie	2e55bc9f3c	[PowerPC] Set the special DSCR with a compiler option. Add a compiler option and the instructions required to set the special Data Stream Control Register (DSCR). The special register will not be set by default. Original patch by: Muhammad Usman Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D117013	2022-03-31 14:06:30 -05:00
Kai Luo	a2c0c4abff	[PowerPC] Add test for failing lowering llvm.ppc.cfence on i128. NFC.	2022-03-25 17:56:11 +08:00
Stefan Pintilie	2c25c65cdc	[PowerPC] The BL8_NOTOC_RM instruction needs to produce a notoc relocation. The BL8_NOTOC_RM instruction was incorrectly producing a relocation that required a TOC restore after the call. This patch fixes that issue and the notoc relocation is now used. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D122012	2022-03-23 19:01:05 -05:00
Stefan Pintilie	4275d7e65a	[PowerPC][NFC] Add test case for byval argument passing Add a test case for byval argument passing where the argument size is more than 8 bytes and is not a multiple of 8 bytes.	2022-03-21 15:14:28 -05:00
Aaron Puchert	c1a31ee65b	[PPCISelLowering] Avoid emitting calls to __multi3, __muloti4 After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3 instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not even with compiler-rt. Block it as well so that we get inline code. Because libgcc doesn't have __muloti4, we block that as well. Fixes #54460. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122090	2022-03-20 20:59:30 +01:00
Chen Zheng	973b02b6f1	[PowerPC][NFC] use right hardware loop intrinsics in test case	2022-03-20 10:00:57 -04:00
esmeyi	de20a3b677	[XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF. This is the first patch to enable the XCOFF64 object writer. Currently only fileHeader and sectionHeaders are supported. Reviewed By: jhenderson, DiggerLin Differential Revision: https://reviews.llvm.org/D120861	2022-03-20 09:31:29 -04:00
Kai Luo	31906a6090	[AtomicExpand][PowerPC] Fix all-one mask value When generating an all-ones mask value whose bit width is larger than 64, sign extension should be used rather than zero extension. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D120865	2022-03-18 13:35:54 +08:00
Stefan Pintilie	78406ac898	[PowerPC][P10] Add Vector pair calling convention Add the calling convention for the vector pair registers. These registers overlap with the vector registers. Part of an original patch by: Lei Huang Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D117225	2022-03-15 14:08:42 -05:00
Qiu Chaofan	300e1293de	[PowerPC] Disable perfect shuffle by default We are going to remove the old 'perfect shuffle' optimization since it brings performance penalty in hot loop around vectors. For example, in following loop sharing the same mask: %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead of `vperm-vperm`. In some large loop cases, this causes 20%+ performance penalty. The original attempt to resolve this is to pre-record masks of every shufflevector operation in DAG, but that is somewhat complex and brings unnecessary computation (to scan all nodes) in optimization. Here we disable it by default. There're indeed some cases becoming worse after this, which will be fixed in a more careful way in future patches. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D121082	2022-03-15 15:52:24 +08:00
Nemanja Ivanovic	766ca2c59e	[PowerPC] Add missed VSX shuffles instead of Altivec ones VSX introduced some permute instructions that are direct replacements for Altivec ones except they can target all the VSX registers. We have added code generation for most of these but somehow missed the low/hi word merges (XXMRG[LH]W). This caused some additional spills on some large computationally intensive code. This patch simply adds the missed patterns.	2022-03-14 10:11:54 -05:00
Xiang1 Zhang	c31014322c	TLS loads optimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Masoud Ataei	30f30e1c12	[PowerPC] Fix the non-tail call in scalar MASS conversion This patch proposes a fix for patch https://reviews.llvm.org/D101759 on non-tail call math function conversion to MASS calls. Differential: https://reviews.llvm.org/D121016 reviewer: @nemanjai	2022-03-08 08:59:17 -08:00
Qiu Chaofan	b2497e5435	[PowerPC] Add generic fnmsub intrinsic Currently in Clang, we have two types of builtins for fnmsub operation: one for float/double vector, they'll be transformed into IR operations; one for float/double scalar, they'll generate corresponding intrinsics. But for the vector version of builtin, the 3 op chain may be recognized as expensive by some passes (like early cse). We need some way to keep the fnmsub form until code generation. This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub intrinsics. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D116015	2022-03-07 13:00:06 +08:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
Kai Luo	1cfcbf197c	[PowerPC][atomics] Precommit test cases for i128 cmpxchg. NFC.	2022-03-03 10:47:52 +08:00
Xiang1 Zhang	65588a0776	Revert "TLS loads optimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Xiang1 Zhang	30e612ebdf	TLS loads optimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Jay Foad	719bac55df	[MIRParser] Diagnose too large align values in MachineMemOperands When parsing MachineMemOperands, MIRParser treated the "align" keyword the same as "basealign". Really "basealign" should specify the alignment of the MachinePointerInfo base value, and "align" should specify the alignment of that base value plus the offset. This worked OK when the specified alignment was no larger than the alignment of the offset, but in cases like this it just caused confusion: STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8) MIRPrinter would never have printed this, with an offset of 4 but an align of 8, so it must have been written by hand. MIRParser would interpret "align 8" as "basealign 8", but I think it is better to give an error and force the user to write "basealign 8" if that is what they really meant. Differential Revision: https://reviews.llvm.org/D120400 Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8	2022-02-24 15:32:08 +00:00
Stefan Pintilie	b3e63ee2e5	[NFC][PowerPC] Fix the check-cpu.ll test case. This test doesn't work because the CHECK-NOT line is actually checking something that only exists on stderr and not stdout. Changed the test so that we now check both stderr and stdout. Changed the test so that we check pwr9, pwr10, and future. The cpu names of power9 or power10 are not supported in the llc backend. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D120349	2022-02-23 14:09:34 -06:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previously we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
esmeyi	7b67d2e398	Reland [XCOFF][llvm-objdump] change the priority of symbols with the same address by symbol types. Fix the Buildbot failure #19373. Differential Revision: https://reviews.llvm.org/D117642	2022-02-20 21:51:10 -05:00
esmeyi	0bf3fec4cd	Revert "[XCOFF][llvm-objdump] change the priority of symbols with" This reverts commit `2ad662172c`. Buildbot failure #19373	2022-02-18 04:12:32 -05:00
esmeyi	2ad662172c	[XCOFF][llvm-objdump] change the priority of symbols with the same address by symbol types. Summary: In XCOFF, each section comes with a default symbol with the same name as the section. It doesn't bind to code locations and it may cause incorrect display of symbol names under `llvm-objdump -d`. This patch changes the priority of symbols with the same address by symbol type. Reviewed By: jhenderson, shchenz Differential Revision: https://reviews.llvm.org/D117642	2022-02-18 00:29:10 -05:00
Amy Kwan	5dc0a1657b	[PowerPC] Fix __builtin_pdepd and __builtin_pextd to be 64-bit and P10 only. The `__builtin_pdepd` and `__builtin_pextd` are P10 builtins that are meant to be used under 64-bit only. For instance, when the builtins are compiled under 32-bit mode: ``` $ cat t.c unsigned long long foo(unsigned long long a, unsigned long long b) { return __builtin_pextd(a,b); } $ clang -c t.c -mcpu=pwr10 -m32 ExpandIntegerResult #0: t31: i64 = llvm.ppc.pextd TargetConstant:i32<6928>, t28, t29 fatal error: error in backend: Do not know how to expand the result of this operator! ``` This patch adds sema checking for these builtins to compile under 64-bit mode only and on P10. The builtins will emit a diagnostic when they are compiled on non-P10 compilations and on 32-bit mode. Differential Revision: https://reviews.llvm.org/D118753	2022-02-15 12:30:50 -06:00
Amy Kwan	ac5a5a9cfe	[PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors. This patch updates the handling of vectors in getPreferredVectorAction(): For single-element and scalable vectors, fall back to default vector legalization handling. For vNi1 vectors, add handling to either split or promote them in order to prevent the production of wide v256i1/v512i1 types. The following assertion is fixed by this patch, as we ended up producing the wide vector types (that are used for MMA) in the backend prior to this fix. ``` Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() && "Cannot BITCAST between types of different sizes!" ``` Differential Revision: https://reviews.llvm.org/D119521	2022-02-15 08:44:08 -06:00
Roman Lebedev	9ff087598e	[NFC][CodeGen][PPC] Autogenerate checklines in a test to simplify further updates	2022-02-11 01:21:45 +03:00
Ting Wang	097a95f2df	[PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection, and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128, llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt. Reviewed By: nemanjai, shchenz, amyk Differential Revision: https://reviews.llvm.org/D117006	2022-02-09 21:48:28 -05:00
Wael Yehia	addd073325	[AIX][PowerPC][PGO] Generate .ref for some PGO sections For PGO on AIX, when we switch to the linux-style PGO variable access (via _start and _stop labels), we need the compiler to generate a .ref assembly for each of the three csects: - __llvm_prf_data[RW] - __llvm_prf_names[RO] - __llvm_prf_vnds[RW] We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it's live then the 3 csects are live. For example, for a testcase with at least one function definition, when compiled with -fprofile-generate we should generate: .csect __llvm_prf_cnts[RW],3 .ref __llvm_prf_data[RW] <<============ needs to be inserted .ref __llvm_prf_names[RO] <<=========== the __llvm_prf_vnds is not always present, so we reference it only when it's present. Reviewed By: sfertile, daltenty Differential Revision: https://reviews.llvm.org/D116607	2022-02-05 06:34:20 -05:00
Masoud Ataei	8ce13bc93b	[PowerPC] Option controlling scalar MASS conversion differential: https://reviews.llvm.org/D119035 reviewer: bmahjour	2022-02-04 13:24:22 -08:00
Masoud Ataei	256d253332	[PowerPC] Scalar IBM MASS library conversion pass This patch introduces the conversions from math function calls to MASS library calls. To resolve calls generated by these conversions, one needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX. Differential: https://reviews.llvm.org/D101759 Reviewer: bmahjour	2022-02-02 07:54:19 -08:00
Amy Kwan	0d6e64755a	[PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert. This patch updates the P10 patterns with a load feeding into an insertelt to utilize the refactored load and store infrastructure, as well as updating any tests that exhibit any codegen changes. Furthermore, custom legalization is added for v4f32 on Power9 and above to not only assist with adjusting the refactored load/stores for P10 vector insert, but also it enables the utilization of direct moves. Differential Revision: https://reviews.llvm.org/D115691	2022-02-01 08:48:37 -06:00
Amy Kwan	9cc5b064f1	[PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non-extending loads. This patch updates how splat loads are handled and is an extension of D106555. Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only non-extending loads. For v8i16/v16i8 types, they are updated to handle extending loads only if the memory VT is the same vector element VT type. A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT node should not be produced. In this test, it depicts the following f64 extending load used in a v2f64 build vector, but the extending load is actually used in more places other than the build vector (such as in t12 and t16). ``` Type-legalized selection DAG: %bb.0 'test:entry' SelectionDAG has 20 nodes: t0: ch = EntryToken t4: i64,ch = CopyFromReg t0, Register:i64 %1 t6: i64,ch = CopyFromReg t0, Register:i64 %2 t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64 t16: f64 = fadd t31, t37 t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64 t36: ch = TokenFactor t34, t37:1 t27: v2f64 = BUILD_VECTOR t37, t37 t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27 t12: f64 = fadd t11, t37 t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64 t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64 t2: i64,ch = CopyFromReg t0, Register:i64 %0 t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64 t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1 ``` Differential Revision: https://reviews.llvm.org/D117803	2022-01-28 08:23:01 -06:00
Yousuf Ali	dad2b6e797	[PowerPC][AIX] Support toc-data attribute for read-only globals. The patch handles the addition of constant global variables to the table of contents. Differential Revision: https://reviews.llvm.org/D116181	2022-01-27 10:47:22 -05:00
Nemanja Ivanovic	0c56bc92e4	[PowerPC] Fix eq/ne comparison of v2i64 pre-Power8 In commit `1674d9b6b2`, I fixed the bug where we didn't consider both words of the result of the comparison. However, the logic needs to be different for eq and ne. Namely, for eq, we need both words of the doubleword to be equal, so it is an AND. OTOH, for ne, we need either word to be unequal, so it is an OR.	2022-01-26 08:59:08 -06:00
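Here is a scalar C model of the eq/ne logic above for a single v2i64 lane. As the commit title implies for pre-Power8 targets, it assumes that only 32-bit word compares are available. The helper names are hypothetical, and the sketch does not show the actual vector instruction sequence. It only shows why eq combines the per-word results with AND while ne uses OR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one 64-bit lane compared as two 32-bit words. */
static bool lane_eq64(uint64_t a, uint64_t b) {
    bool hi_eq = (uint32_t)(a >> 32) == (uint32_t)(b >> 32);
    bool lo_eq = (uint32_t)a == (uint32_t)b;
    return hi_eq && lo_eq;   /* eq: both words must match -> AND */
}

static bool lane_ne64(uint64_t a, uint64_t b) {
    bool hi_ne = (uint32_t)(a >> 32) != (uint32_t)(b >> 32);
    bool lo_ne = (uint32_t)a != (uint32_t)b;
    return hi_ne || lo_ne;   /* ne: either word differing suffices -> OR */
}
```

Take `a = 0x100000000` and `b = 0`, whose low words are equal and whose high words differ. eq must be false and ne must be true. Reusing the AND for ne would wrongly report these values as not unequal.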
Qiu Chaofan	ad0345aed1	[PowerPC] Emit gnu_attribute according to float-abi metadata According to GNU as documentation, PowerPC supports some .gnu_attribute tags to represent the vector and float ABI type in the object file. Some linkers like GNU ld respect the attribute and will prevent objects with conflicting ABIs from being linked. This patch emits the gnu_attribute value in the assembly printer according to the float-abi metadata. More attributes for soft-fp, hard single/double and even vector ABI need to be supported in the future. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D117193	2022-01-26 13:28:50 +08:00
Sean Fertile	a2505bd063	[PowerPC][AIX] Override markFunctionEnd() During fast-isel calling 'markFunctionEnd' in the base class will call tidyLandingPads. This can cause an issue where we have determined that we need ehinfo and emitted a traceback table with the bits set to indicate that we will be emitting the ehinfo, but the tidying deletes all landing pads. In this case we end up emitting a reference to __ehinfo.N symbol, but not emitting a definition to said symbol and the resulting file fails to assemble. Differential Revision: https://reviews.llvm.org/D117040	2022-01-25 10:08:53 -05:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
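This C sketch illustrates why the fold above is sound. An arithmetic shift right by 16 of a loaded 32-bit value equals a sign-extending load of just the upper 16 bits. The function names are hypothetical. The narrower load's byte offset depends on byte order (byte 2 on little-endian, byte 0 on big-endian), so the sketch detects endianness at run time rather than assuming one.

```c
#include <stdint.h>
#include <string.h>

/* Original pattern: load an i32, then sra by 16. */
static int32_t sra_of_load(const unsigned char *p) {
    int32_t v;
    memcpy(&v, p, sizeof v);
    return v >> 16;            /* assumes arithmetic shift for negatives */
}

/* Folded pattern: sign-extending load of the i16 holding the high half. */
static int32_t narrow_sextload(const unsigned char *p) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    size_t off = first ? 2 : 0;  /* little-endian: high half at byte 2 */
    int16_t h;
    memcpy(&h, p + off, sizeof h);
    return (int32_t)h;         /* sign extension replaces the shift */
}
```

The commit notes that the combine applies only when narrowing the load width is legal for the target. This sketch ignores legality and alignment.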
Quinn Pham	6a028296fe	[PowerPC] Emit warning when SP is clobbered by asm This patch emits a warning when the stack pointer register (`R1`) is found in the clobber list of an inline asm statement. Clobbering the stack pointer is not supported. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D112073	2022-01-24 15:12:23 -06:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps make certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Qiu Chaofan	8dedf9b58b	[PowerPC] Change CTR clobber estimation for 128-bit floating types Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D117459	2022-01-22 23:20:14 +08:00
Fangrui Song	e6cdef187e	[XRay][test] Clean up llc RUN lines	2022-01-21 17:00:03 -08:00
Mircea Trofin	e67430cca4	[MLGO] ML Regalloc Eviction Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information (currently after the Virtual Register Rewriter) and then produce the log file. This patch also introduces the score injection pass, 'Register Allocation Pass Scoring', which is trivially just logging the score in development mode. Differential Revision: https://reviews.llvm.org/D117147	2022-01-19 11:00:32 -08:00
Stefan Pintilie	1324bb29f7	[PowerPC] Fix issue with strict float to int conversion. When doing the float to int conversion, the strict conversion also needs to return a chain. This patch fixes that. Reviewed By: nemanjai, #powerpc, qiucf Differential Revision: https://reviews.llvm.org/D117464	2022-01-19 10:57:22 -06:00
Sean Fertile	10d3bf9518	[PowerPC][AIX] Fallback to DAG-ISEL if global has toc-data attribute. FAST-ISEL should fall back to DAG-ISEL when a global variable has the toc-data attribute. A number of the checks were duplicated in the lit test because of: 1) Slightly different output between -O0 and -O2 due to FAST-ISEL vs DAG-ISEL codegen. 2) In preparation for a peephole optimization that will run when optimizations are enabled. Differential Revision: https://reviews.llvm.org/D115373	2022-01-17 16:21:38 -05:00
Sanjay Patel	fe17ce0fa6	[PowerPC] add RUN lines for both endians to test; NFC The load narrowing transform works for both targets, so we might as well test both with simple examples like this.	2022-01-13 10:49:23 -05:00

3389 Commits