llvm-project

Commit Graph

Author	SHA1	Message	Date
Daniil Kovalev	83a798d4b0	[CodeGen] Place SDNode debug ID declaration under appropriate #if Place PersistentId declaration under #if LLVM_ENABLE_ABI_BREAKING_CHECKS to reduce memory usage when it is not needed. Differential Revision: https://reviews.llvm.org/D120714	2022-04-06 14:09:32 +03:00
Ting Wang	b389354b28	[Clang][PowerPC] Add max/min intrinsics to Clang and PPC backend Add support for builtin_[max|min] which has below prototype: A builtin_max (A1, A2, A3, ...) All arguments must have the same type; they must all be float, double, or long double. Internally use SelectCC to get the result. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D122478	2022-04-05 22:43:48 -04:00
Dávid Bolvanský	fb65aaf0be	[NFCI] Fixed missing colon in CHECK directives - part 2	2022-04-03 14:42:59 +02:00
Stefan Pintilie	585c85abe5	[PowerPC] Fix lowering of byval parameters for sizes greater than 8 bytes. To store a byval parameter the existing code would store as many 8 byte elements as was required to store the full size of the byval parameter. For example, a parameter of size 16 would store two elements of 8 bytes. A parameter of size 12 would also store two elements of 8 bytes. This would sometimes store too many bytes as the size of the parameter is not always a multiple of 8. This patch fixes that issue and now byval parameters are stored with the correct number of bytes. Reviewed By: nemanjai, #powerpc, quinnp, amyk Differential Revision: https://reviews.llvm.org/D121430	2022-03-31 15:12:46 -05:00
Stefan Pintilie	2e55bc9f3c	[PowerPC] Set the special DSCR with a compiler option. Add a compiler option and the instructions required to set the special Data Stream Control Register (DSCR). The special register will not be set by default. Original patch by: Muhammad Usman Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D117013	2022-03-31 14:06:30 -05:00
Kai Luo	a2c0c4abff	[PowerPC] Add test for failing lowering llvm.ppc.cfence on i128. NFC.	2022-03-25 17:56:11 +08:00
Stefan Pintilie	2c25c65cdc	[PowerPC] The BL8_NOTOC_RM instruction needs to produce a notoc relocation. The BL8_NOTOC_RM instruction was incorrectly producing a relocation that required a TOC restore after the call. This patch fixes that issue and the notoc relocation is now used. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D122012	2022-03-23 19:01:05 -05:00
Stefan Pintilie	4275d7e65a	[PowerPC][NFC] Add test case for byval argument passing Add a test case for byval argument passing where the argument size is more than 8 bytes and is not a multiple of 8 bytes.	2022-03-21 15:14:28 -05:00
Aaron Puchert	c1a31ee65b	[PPCISelLowering] Avoid emitting calls to __multi3, __muloti4 After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3 instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not even with compiler-rt. Block it as well so that we get inline code. Because libgcc doesn't have __muloti4, we block that as well. Fixes #54460. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D122090	2022-03-20 20:59:30 +01:00
Chen Zheng	973b02b6f1	[PowerPC][NFC] use right hardware loop intrinsics in test case	2022-03-20 10:00:57 -04:00
esmeyi	de20a3b677	[XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF. This is the first patch to enable the XCOFF64 object writer. Currently only fileHeader and sectionHeaders are supported. Reviewed By: jhenderson, DiggerLin Differential Revision: https://reviews.llvm.org/D120861	2022-03-20 09:31:29 -04:00
Kai Luo	31906a6090	[AtomicExpand][PowerPC] Fix all-one mask value When generating an all-one mask value whose bitwidth is larger than 64, sign extension should be used rather than zero extension. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D120865	2022-03-18 13:35:54 +08:00
Stefan Pintilie	78406ac898	[PowerPC][P10] Add Vector pair calling convention Add the calling convention for the vector pair registers. These registers overlap with the vector registers. Part of an original patch by: Lei Huang Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D117225	2022-03-15 14:08:42 -05:00
Qiu Chaofan	300e1293de	[PowerPC] Disable perfect shuffle by default We are going to remove the old 'perfect shuffle' optimization since it brings performance penalty in hot loop around vectors. For example, in following loop sharing the same mask: %v.1 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> %v.2 = shufflevector ... <0,1,2,3,8,9,10,11,16,17,18,19,24,25,26,27> The generated instructions will be `vmrglw-vmrghw-vmrglw-vmrghw` instead of `vperm-vperm`. In some large loop cases, this causes 20%+ performance penalty. The original attempt to resolve this is to pre-record masks of every shufflevector operation in DAG, but that is somewhat complex and brings unnecessary computation (to scan all nodes) in optimization. Here we disable it by default. There're indeed some cases becoming worse after this, which will be fixed in a more careful way in future patches. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D121082	2022-03-15 15:52:24 +08:00
Nemanja Ivanovic	766ca2c59e	[PowerPC] Add missed VSX shuffles instead of Altivec ones VSX introduced some permute instructions that are direct replacements for Altivec ones except they can target all the VSX registers. We have added code generation for most of these but somehow missed the low/hi word merges (XXMRG[LH]W). This caused some additional spills on some large computationally intensive code. This patch simply adds the missed patterns.	2022-03-14 10:11:54 -05:00
Xiang1 Zhang	c31014322c	TLS loads opimization (hoist) Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D120000	2022-03-10 09:29:06 +08:00
Masoud Ataei	30f30e1c12	[PowerPC] Fix the none tail call in scalar MASS conversion This patch is proposing a fix for patch https://reviews.llvm.org/D101759 on none tail call math function conversion to MASS call. Differential: https://reviews.llvm.org/D121016 reviewer: @nemanjai	2022-03-08 08:59:17 -08:00
Qiu Chaofan	b2497e5435	[PowerPC] Add generic fnmsub intrinsic Currently in Clang, we have two types of builtins for fnmsub operation: one for float/double vector, they'll be transformed into IR operations; one for float/double scalar, they'll generate corresponding intrinsics. But for the vector version of builtin, the 3 op chain may be recognized as expensive by some passes (like early cse). We need some way to keep the fnmsub form until code generation. This patch introduces ppc.fnmsub.* intrinsic to unify four fnmsub intrinsics. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D116015	2022-03-07 13:00:06 +08:00
David Green	4388f4f776	[DAG] Don't convert undef to 0 when creating buildvector When inserting undef into buildvectors created from shuffles of buildvectors, we convert elements to the largest needed type. This had the effect of converting undef into 0, which isn't needed as the buildvector implicitly truncates and trunc(zext(undef)) == undef. Differential Revision: https://reviews.llvm.org/D121002	2022-03-06 18:35:34 +00:00
Kai Luo	1cfcbf197c	[PowerPC][atomics] Precommit test cases for i128 cmpxchg. NFC.	2022-03-03 10:47:52 +08:00
Xiang1 Zhang	65588a0776	Revert "TLS loads opimization (hoist)" Revert for more reviews This reverts commit `30e612ebdf`.	2022-03-02 14:10:11 +08:00
Xiang1 Zhang	30e612ebdf	TLS loads opimization (hoist) Reviewed By: Wang Pheobe, Topper Craig Differential Revision: https://reviews.llvm.org/D120000	2022-03-02 10:37:24 +08:00
Jay Foad	719bac55df	[MIRParser] Diagnose too large align values in MachineMemOperands When parsing MachineMemOperands, MIRParser treated the "align" keyword the same as "basealign". Really "basealign" should specify the alignment of the MachinePointerInfo base value, and "align" should specify the alignment of that base value plus the offset. This worked OK when the specified alignment was no larger than the alignment of the offset, but in cases like this it just caused confusion: STW killed %18, 4, %stack.1.ap2.i.i :: (store (s32) into %stack.1.ap2.i.i + 4, align 8) MIRPrinter would never have printed this, with an offset of 4 but an align of 8, so it must have been written by hand. MIRParser would interpret "align 8" as "basealign 8", but I think it is better to give an error and force the user to write "basealign 8" if that is what they really meant. Differential Revision: https://reviews.llvm.org/D120400 Change-Id: I7eeeefc55c2df3554ba8d89f8809a2f45ada32d8	2022-02-24 15:32:08 +00:00
Stefan Pintilie	b3e63ee2e5	[NFC][PowerPC] Fix the check-cpu.ll test case. This test doesn't work because the CHECK-NOT line is actually checking something that only exists on stderr and not stdout. Changed the test so that we now check both stderr and stdout. Changed the test so that we check pwr9, pwr10, and future. The cpu names of power9 or power10 are not supported in the llc backend. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D120349	2022-02-23 14:09:34 -06:00
Craig Topper	440c4b705a	[SelectionDAG][RISCV][ARM][PowerPC][X86][WebAssembly] Change default abs expansion to use sra (X, size(X)-1); sub (xor (X, Y), Y). Previously we used sra (X, size(X)-1); xor (add (X, Y), Y). By placing sub at the end, we allow RISCV to combine sign_extend_inreg with it to form subw. Some X86 tests for Z - abs(X) seem to have improved as well. Other targets look to be a wash. I had to modify ARM's abs matching code to match from sub instead of xor. Maybe instead ISD::ABS should be made legal. I'll try that in parallel to this patch. This is an alternative to D119099 which was focused on RISCV only. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D119171	2022-02-20 21:11:23 -08:00
esmeyi	7b67d2e398	Reland [XCOFF][llvm-objdump] change the priority of symbols with the same address by symbol types. Fix the Buildbot failure #19373. Differential Revision: https://reviews.llvm.org/D117642	2022-02-20 21:51:10 -05:00
esmeyi	0bf3fec4cd	Revert "[XCOFF][llvm-objdump] change the priority of symbols with" This reverts commit `2ad662172c`. Buildbot failure #19373	2022-02-18 04:12:32 -05:00
esmeyi	2ad662172c	[XCOFF][llvm-objdump] change the priority of symbols with the same address by symbol types. Summary: In XCOFF, each section comes with a default symbol with the same name as the section. It doesn't bind to code locations and it may cause incorrect display of symbol names under `llvm-objdump -d`. This patch changes the priority of symbols with the same address by symbol type. Reviewed By: jhenderson, shchenz Differential Revision: https://reviews.llvm.org/D117642	2022-02-18 00:29:10 -05:00
Amy Kwan	5dc0a1657b	[PowerPC] Fix __builtin_pdepd and __builtin_pextd to be 64-bit and P10 only. The `__builtin_pdepd` and `__builtin_pextd` are P10 builtins that are meant to be used under 64-bit only. For instance, when the builtins are compiled under 32-bit mode: ``` $ cat t.c unsigned long long foo(unsigned long long a, unsigned long long b) { return __builtin_pextd(a,b); } $ clang -c t.c -mcpu=pwr10 -m32 ExpandIntegerResult #0: t31: i64 = llvm.ppc.pextd TargetConstant:i32<6928>, t28, t29 fatal error: error in backend: Do not know how to expand the result of this operator! ``` This patch adds sema checking for these builtins to compile under 64-bit mode only and on P10. The builtins will emit a diagnostic when they are compiled on non-P10 compilations and on 32-bit mode. Differential Revision: https://reviews.llvm.org/D118753	2022-02-15 12:30:50 -06:00
Amy Kwan	ac5a5a9cfe	[PowerPC] Add default handling for single element vectors, and split/promote vNi1 vectors. This patch updates the handling of vectors in getPreferredVectorAction(): For single-element and scalable vectors, fall back to default vector legalization handling. For vNi1 vectors, add handling to either split or promote them in order to prevent the production of wide v256i1/v512i1 types. The following assertion is fixed by this patch, as we ended up producing the wide vector types (that are used for MMA) in the backend prior to this fix. ``` Assertion failed: VT.getSizeInBits() == Operand.getValueSizeInBits() && "Cannot BITCAST between types of different sizes!" ``` Differential Revision: https://reviews.llvm.org/D119521	2022-02-15 08:44:08 -06:00
Roman Lebedev	9ff087598e	[NFC][CodeGen][PPC] Autogenerate checklines in a test to simplify further updates	2022-02-11 01:21:45 +03:00
Ting Wang	097a95f2df	[PowerPC] Add custom lowering for SELECT_CC fp128 using xsmaxcqp Power ISA 3.1 adds xsmaxcqp/xsmincqp for quad-precision type-c max/min selection, and this opens the opportunity to improve instruction selection on: llvm.maxnum.f128, llvm.minnum.f128, and select_cc ordered gt/lt and (don't care) gt/lt. Reviewed By: nemanjai, shchenz, amyk Differential Revision: https://reviews.llvm.org/D117006	2022-02-09 21:48:28 -05:00
Wael Yehia	addd073325	[AIX][PowerPC][PGO] Generate .ref for some PGO sections For PGO on AIX, when we switch to the linux-style PGO variable access (via _start and _stop labels), we need the compiler to generate a .ref assembly for each of the three csects: - __llvm_prf_data[RW] - __llvm_prf_names[RO] - __llvm_prf_vnds[RW] We insert the .ref inside the __llvm_prf_cnts[RW] csect so that if it's live then the 3 csects are live. For example, for a testcase with at least one function definition, when compiled with -fprofile-generate we should generate: .csect __llvm_prf_cnts[RW],3 .ref __llvm_prf_data[RW] <<============ needs to be inserted .ref __llvm_prf_names[RO] <<=========== the __llvm_prf_vnds is not always present, so we reference it only when it's present. Reviewed By: sfertile, daltenty Differential Revision: https://reviews.llvm.org/D116607	2022-02-05 06:34:20 -05:00
Masoud Ataei	8ce13bc93b	[PowerPC] Option controling scalar MASS convertion differential: https://reviews.llvm.org/D119035 reviewer: bmahjour	2022-02-04 13:24:22 -08:00
Masoud Ataei	256d253332	[PowerPC] Scalar IBM MASS library conversion pass This patch introduces the conversions from math function calls to MASS library calls. To resolve calls generated with these conversions, one needs to link the libxlopt.a library. This patch is tested on PowerPC Linux and AIX. Differential: https://reviews.llvm.org/D101759 Reviewer: bmahjour	2022-02-02 07:54:19 -08:00
Amy Kwan	0d6e64755a	[PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert. This patch updates the P10 patterns with a load feeding into an insertelt to utilize the refactored load and store infrastructure, as well as updating any tests that exhibit any codegen changes. Furthermore, custom legalization is added for v4f32 on Power9 and above to not only assist with adjusting the refactored load/stores for P10 vector insert, but also it enables the utilization of direct moves. Differential Revision: https://reviews.llvm.org/D115691	2022-02-01 08:48:37 -06:00
Amy Kwan	9cc5b064f1	[PowerPC] Update handling of splat loads for v4i32/v4f32/v2i64 to require non-extending loads. This patch updates how splat loads are handled and is an extension of D106555. Particularly, for v2i64/v4f32/v4i32 types, they are updated to handle only non-extending loads. For v8i16/v16i8 types, they are updated to handle extending loads only if the memory VT is the same vector element VT type. A test case has been added to illustrate a scenario where a PPCISD::LD_SPLAT node should not be produced. In this test, it depicts the following f64 extending load used in a v2f64 build vector, but the extending load is actually used in more places other than the build vector (such as in t12 and t16). ``` Type-legalized selection DAG: %bb.0 'test:entry' SelectionDAG has 20 nodes: t0: ch = EntryToken t4: i64,ch = CopyFromReg t0, Register:i64 %1 t6: i64,ch = CopyFromReg t0, Register:i64 %2 t11: f64,ch = load<(load (s64) from %ir.b, !tbaa !7)> t0, t4, undef:i64 t16: f64 = fadd t31, t37 t34: ch = store<(store (s64) into %ir.c, !tbaa !7)> t31:1, t16, t6, undef:i64 t36: ch = TokenFactor t34, t37:1 t27: v2f64 = BUILD_VECTOR t37, t37 t22: ch,glue = CopyToReg t36, Register:v2f64 $v2, t27 t12: f64 = fadd t11, t37 t28: ch = store<(store (s64) into %ir.b, !tbaa !7)> t11:1, t12, t4, undef:i64 t31: f64,ch = load<(load (s64) from %ir.c, !tbaa !7)> t28, t6, undef:i64 t2: i64,ch = CopyFromReg t0, Register:i64 %0 t37: f64,ch = load<(load (s32) from %ir.a, !tbaa !3), anyext from f32> t0, t2, undef:i64 t23: ch = PPCISD::RET_FLAG t22, Register:v2f64 $v2, t22:1 ``` Differential Revision: https://reviews.llvm.org/D117803	2022-01-28 08:23:01 -06:00
Yousuf Ali	dad2b6e797	[PowerPC][AIX] Support toc-data attribute for read-only globals. The patch handles the addition of constant global variables to the table of contents. Differential Revision: https://reviews.llvm.org/D116181	2022-01-27 10:47:22 -05:00
Nemanja Ivanovic	0c56bc92e4	[PowerPC] Fix eq/ne comparison of v2i64 pre-Power8 In commit `1674d9b6b2`, I fixed the bug where we didn't consider both words of the result of the comparison. However, the logic needs to be different for eq and ne. Namely for eq, we need both words of the doubleword to equal so it is an AND. OTOH for ne, we need either word to be unequal so it is an OR.	2022-01-26 08:59:08 -06:00
Qiu Chaofan	ad0345aed1	[PowerPC] Emit gnu_attribute according to float-abi metadata According to GNU as documentation, PowerPC supports some .gnu_attribute tags to represent the vector and float ABI type in the object file. Some linkers like GNU ld respect the attribute and will prevent objects with conflicting ABIs being linked. This patch emits gnu_attribute value in assembly printer according to the float-abi metadata. More attributes for soft-fp, hard single/double and even vector ABI need to be supported in the future. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D117193	2022-01-26 13:28:50 +08:00
Sean Fertile	a2505bd063	[PowerPC][AIX] Override markFunctionEnd() During fast-isel calling 'markFunctionEnd' in the base class will call tidyLandingPads. This can cause an issue where we have determined that we need ehinfo and emitted a traceback table with the bits set to indicate that we will be emitting the ehinfo, but the tidying deletes all landing pads. In this case we end up emitting a reference to __ehinfo.N symbol, but not emitting a definition to said symbol and the resulting file fails to assemble. Differential Revision: https://reviews.llvm.org/D117040	2022-01-25 10:08:53 -05:00
Bjorn Pettersson	109cc5adcc	[DAGCombine] Fold SRA of a load into a narrower sign-extending load An sra is basically sign-extending a narrower value. Fold away the shift by doing a sextload of a narrower value, when it is legal to reduce the load width accordingly. Differential Revision: https://reviews.llvm.org/D116930	2022-01-25 12:14:48 +01:00
Quinn Pham	6a028296fe	[PowerPC] Emit warning when SP is clobbered by asm This patch emits a warning when the stack pointer register (`R1`) is found in the clobber list of an inline asm statement. Clobbering the stack pointer is not supported. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D112073	2022-01-24 15:12:23 -06:00
Sander de Smalen	4f8fdf7827	[ISEL] Canonicalise constant splats to RHS. SelectionDAG::getNode() canonicalises constants to the RHS if the operation is commutative, but it doesn't do so for constant splat vectors. Doing this early helps making certain folds on vector types, simplifying the code required for target DAGCombines that are enabled before Type legalization. Somewhat to my surprise, DAGCombine doesn't seem to traverse the DAG in a post-order DFS, so at the time of doing some custom fold where the input is a MUL, DAGCombiner::visitMUL hasn't yet reordered the constant splat to the RHS. This patch leads to a few improvements, but also a few minor regressions, which I traced down to D46492. When I tried reverting this change to see if the changes were still necessary, I ran into some segfaults. Not sure if there is some latent bug there. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D117794	2022-01-24 09:38:36 +00:00
Qiu Chaofan	8dedf9b58b	[PowerPC] Change CTR clobber estimation for 128-bit floating types Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D117459	2022-01-22 23:20:14 +08:00
Fangrui Song	e6cdef187e	[XRay][test] Clean up llc RUN lines	2022-01-21 17:00:03 -08:00
Mircea Trofin	e67430cca4	[MLGO] ML Regalloc Eviction Advisor The bulk of the implementation is common between 'release' mode (==AOT-ed model) and 'development' mode (for training), the main difference is that in development mode, we may also log features (for training logs), inject scoring information (currently after the Virtual Register Rewriter) and then produce the log file. This patch also introduces the score injection pass, 'Register Allocation Pass Scoring', which is trivially just logging the score in development mode. Differential Revision: https://reviews.llvm.org/D117147	2022-01-19 11:00:32 -08:00
Stefan Pintilie	1324bb29f7	[PowerPC] Fix issue with strict float to int conversion. When doing the float to int conversion the strict conversion also needs to return a chain. This patch fixes that. Reviewed By: nemanjai, #powerpc, qiucf Differential Revision: https://reviews.llvm.org/D117464	2022-01-19 10:57:22 -06:00
Sean Fertile	10d3bf9518	[PowerPC][AIX] Fallback to DAG-ISEL if global has toc-data attribute. FAST-ISEL should fall back to DAG-ISEL when a global variable has the toc-data attribute. A number of the checks were duplicated in the lit test because of 1) Slightly different output between -O0 and -O2 due to FAST-ISEL vs DAG-ISEL codegen. 2) In preparation of a peephole optimization that will run when optimizations are enabled. Differential Revision: https://reviews.llvm.org/D115373	2022-01-17 16:21:38 -05:00
Sanjay Patel	fe17ce0fa6	[PowerPC] add RUN lines for both endians to test; NFC The load narrowing transform works for both targets, so we might as well test both with simple examples like this.	2022-01-13 10:49:23 -05:00
Nick Desaulniers	79ebc3b0dd	[llvm][test] rewrite callbr to use i rather than X constraint NFC In D115311, we're looking to modify clang to emit i constraints rather than X constraints for callbr's indirect destinations. Prior to doing so, update all of the existing tests in llvm/ to match. Reviewed By: void, jyknight Differential Revision: https://reviews.llvm.org/D115410	2022-01-11 11:31:08 -08:00
Nick Desaulniers	9c4b49db19	[ShrinkWrap] check for PPC's non-callee-saved LR As pointed out in https://reviews.llvm.org/D115688#inline-1108193, we don't want to sink the save point past an INLINEASM_BR, otherwise prologepilog may incorrectly sink a prolog past the MBB containing an INLINEASM_BR and into the wrong MBB. ShrinkWrap is getting this wrong because LR is not in the list of callee saved registers. Specifically, ShrinkWrap::useOrDefCSROrFI calls RegisterClassInfo::getLastCalleeSavedAlias which reads CalleeSavedAliases which was populated by RegisterClassInfo::runOnMachineFunction by iterating the list of MCPhysReg returned from MachineRegisterInfo::getCalleeSavedRegs. Because PPC's LR is non-allocatable, it's NOT considered callee saved. Add an interface to TargetRegisterInfo for such a case and use it in Shrinkwrap to ensure we don't sink a prolog past an INLINEASM or INLINEASM_BR that clobbers LR. Reviewed By: jyknight, efriedma, nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D116424	2022-01-11 10:01:34 -08:00
Nadav Rotem	e2cc091a7d	Fix a missed opportunity to merge stores. This commit fixes a missed opportunity in merging consecutive stores. The code that searches for stores skipped the case of stores that directly connect to the root. The comment above the implementation lists this case but the code did not handle it. I found this pattern when looking into the shared_ptr destructor. GCC generates the right sequence. Here is a small repro: int foo(int* buff) { buff[0] = 0; int x = buff[1]; buff[1] = 0; return x; } Differential Revision: https://reviews.llvm.org/D116895	2022-01-10 13:49:02 -08:00
Chen Zheng	2c46ca96e2	[PowerPC] fast isel can lower intrinsics call on AIX. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D114778	2022-01-10 02:30:05 +00:00
Qiu Chaofan	c9e8a516df	[NFC] Pre-commit case for PowerPC perfect shuffle	2022-01-07 18:07:26 +08:00
Nikita Popov	f430c1eb64	[Tests] Add elementtype attribute to indirect inline asm operands (NFC) This updates LLVM tests for D116531 by adding elementtype attributes to operands that correspond to indirect asm constraints.	2022-01-06 14:23:51 +01:00
Stefan Pintilie	04496201e0	[PowerPC] Add support for ROP protection for 32 bit. Add support for Return Oriented Programming (ROP) protection for 32 bit. This patch also adds testing for AIX on both 64 and 32 bit. Reviewed By: amyk Differential Revision: https://reviews.llvm.org/D111362	2022-01-05 15:15:53 -06:00
Philip Reames	b061d86c69	[SCEV] Compute exit count from overflow check expressed w/ x.with.overflow intrinsics This ports the logic we generate in instcombine for a single use x.with.overflow check for use in SCEV's analysis. The result is that we can prove trip counts for many checks, and (through existing logic) often discharge them. Motivation comes from compiling a simple example with -ftrapv. Differential Revision: https://reviews.llvm.org/D116499	2022-01-04 09:44:23 -08:00
Nemanja Ivanovic	de4e0195ae	[PowerPC] Add missed test case updates In commit `1674d9b6b2`, I missed adding the updates to existing test cases. This should bring the bots back to green.	2021-12-21 14:55:19 -06:00
Nemanja Ivanovic	1674d9b6b2	[PowerPC] Fix vector equality comparison for v2i64 pre-Power8 The current code makes the assumption that equality comparison can be performed with a word comparison instruction. While this is true if the entire 64-bit results are used, it does not generally work. It is possible that the low order words and high order words produce different results and a user of only one will get the wrong result. This patch adds an and of the result words so that each word has the result of the comparison of the entire doubleword that contains it. Differential revision: https://reviews.llvm.org/D115678	2021-12-21 14:28:41 -06:00
Mircea Trofin	09103807e7	[NFC][regalloc] Introduce the RegAllocEvictionAdvisorAnalysis This patch introduces the eviction analysis and the eviction advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Differential Revision: https://reviews.llvm.org/D115707	2021-12-16 17:56:46 -08:00
Florian Hahn	59a85a7a52	[PPC] Update test after `f5f421e0ee`.	2021-12-16 11:28:54 +00:00
Chen Zheng	d0022a7250	[PowerPC] copy byval parameter to caller's stack when needed Now we won't copy the byval parameter (bigger than 8 bytes) to caller's parameter save area. Instead, we will only copy the byval parameter when it can not be passed entirely in registers which means we have to use parameter save area according to the 64 bit SVR4 ABI. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111485	2021-12-09 01:00:47 +00:00
Chen Zheng	c16c99ab03	[Powerpc] testcases for D111485; nfc	2021-12-08 02:22:00 +00:00
Chen Zheng	63cd1842a7	[PowerPC] use lvx + splat directly for aligned splat load Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D114062	2021-12-08 02:02:18 +00:00
Chen Zheng	d0a8f86667	[PowerPC][NFC] add cases for D114062	2021-12-07 01:12:01 +00:00
Qiu Chaofan	e3c2694da9	[PowerPC] Implement general back2back fusion Implement 'back-to-back' FX fusion according to Power10 User Manual '19.1.5.4 Fusion', not enabled by default. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D114345	2021-12-06 10:15:05 +08:00
Nemanja Ivanovic	d6c0ef7887	[PowerPC] Handle base load with reservation mnemonic The Power ISA defined l[bhwdq]arx as both base and extended mnemonics. The base mnemonic takes the EH bit as an operand and the extended mnemonic omits it, making it implicitly zero. The existing implementation only handles the base mnemonic when EH is 1 and internally produces a different instruction. There are historical reasons for this. This patch simply removes the limitation introduced by this implementation that disallows the base mnemonic with EH = 0 in the ASM parser. This resolves an issue that prevented some files in the Linux kernel from being built with -fintegrated-as. Also fix a crash if the value is not an integer immediate.	2021-12-03 09:13:02 -06:00
Simon Pilgrim	e85667a2fb	[PowerPC] Add non-constant fcopysign f128 test coverage As discussed on D114589 as the constant case gets affected by SimplifyDemandedBits a lot - the non-constant case currently falls back to copysignl libcalls	2021-12-03 12:04:06 +00:00
Amy Kwan	c27734c183	[PowerPC] Fix load/store selection infrastructure when load/store intrinsics are used on P10. The load/store infrastructure previously made an incorrect assumption that whenever it is used with a load/store intrinsic on Power10 - those intrinsics would automatically be the lxvp/stxvp intrinsics introduced in Power10. However, this is obviously not the case as there are multiple instances of pre-P10 intrinsics that use the refactored load/store implementation. This patch corrects this assumption, and produces the expected intrinsic on pre-P10. Differential Revision: https://reviews.llvm.org/D114978	2021-12-02 15:59:29 -06:00
Simon Pilgrim	6803d08c38	[DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case. Differential Revision: https://reviews.llvm.org/D114676	2021-12-02 11:47:53 +00:00
Yousuf Ali	415e821a50	[PowerPC][AIX] Add toc-data support for 64-bit AIX small code model. The patch expands the existing 32-bit toc-data attribute support to 64-bit. In both 32-bit and 64-bit it is supported for small code model only. Differential Revision: https://reviews.llvm.org/D114654	2021-12-01 10:56:21 -05:00
Qiu Chaofan	15826eb437	[Legalizer] Avoid expansion to BR_CC if illegal Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D110616	2021-12-01 12:22:21 +08:00
Tarique Islam	0850655da6	Big-endian version of vpermxor A big-endian version of vpermxor, named vpermxor_be, is added to LLVM and Clang. vpermxor_be can be called directly on both the little-endian and the big-endian platforms. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D114540	2021-11-30 22:49:55 +00:00
Philip Reames	8906a0fe64	[SCEVExpander] Drop poison generating flags when reusing instructions The basic problem we have is that we're trying to reuse an instruction which is mapped to some SCEV. Since we can have multiple such instructions (potentially with different flags), this is analogous to our need to drop flags when performing CSE. A trivial implementation would simply drop flags on any instruction we decided to reuse, and that would be correct. This patch is almost that trivial patch except that we preserve flags on the reused instruction when existing users would imply UB on overflow already. Adding new users can, at most, refine this program to one which doesn't execute UB which is valid. In practice, this fixes two conceptual problems with the previous code: 1) a binop could have been canonicalized into a form with different opcode or operands, or 2) the inbounds GEP case which was simply unhandled. On the test changes, most are pretty straightforward. We lose some flags (in some cases, they'd have been dropped on the next CSE pass anyway). The one that took me the longest to understand was the ashr-expansion test. What's happening there is that we're considering reuse of the mul, previously we disallowed it entirely, now we allow it with no flags. The surrounding diffs are all effects of generating the same mul with a different operand order, and then doing simple DCE. The loss of the inbounds is unfortunate, but even there, we can recover most of those once we actually treat branch-on-poison as immediate UB. Differential Revision: https://reviews.llvm.org/D112734	2021-11-29 15:23:34 -08:00
Simon Pilgrim	7ba64ab05a	[PowerPC] Regenerate ppc64-P9-vabsd.ll tests	2021-11-27 16:43:50 +00:00
Nikita Popov	2b160e95c8	Reland [SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this introduces an additional flag on forgetMemoizedResults() to not remove SCEVUnknown phis from the value map. The invalidation after BECount calculation wants to leave these alone and skips them in its own use-def walk, but we can still end up invalidating them via forgetMemoizedResults() if there is another IR value with the same SCEV. This is intended as a temporary workaround only, and the need for this should go away once the getBackedgeTakenInfo() invalidation is refactored in the spirit of D114263. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-27 12:37:15 +01:00
Nikita Popov	719354a571	Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency" This reverts commit `bee8dcda1f`. Some sanitizer buildbots fail with: > Attempt to use a SCEVCouldNotCompute object! For example: https://lab.llvm.org/buildbot/#/builders/85/builds/7020/steps/9/logs/stdio	2021-11-26 22:18:23 +01:00
Nikita Popov	bee8dcda1f	[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency Relative to the previous landing attempt, this makes insertValueToMap() resilient against the value already being present in the map -- previously I only checked this for the createSimpleAffineAddRec() case, but the same issue can also occur for the general createNodeForPHI(). In both cases, the addrec may be constructed and added to the map in a recursive query trying to create said addrec. In this case, this happens due to the invalidation when the BE count is computed, which ends up clearing out the symbolic name as well. ----- This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-26 20:57:47 +01:00
Simon Pilgrim	a25e08dd3c	[PowerPC] Regenerate fp128-bitcast-after-operation test checks	2021-11-25 13:39:57 +00:00
Nemanja Ivanovic	b7bf937bbe	[PowerPC] Provide XL-compatible vec_round implementation The XL implementation of vec_round for vector double uses "round-to-nearest, ties to even" just as the vector float version does. However clang and gcc use "round-to-nearest-away" for vector double and "round-to-nearest, ties to even" for vector float. The XL behaviour is implemented under the __XL_COMPAT_ALTIVEC__ macro similarly to other instances of incompatibility. Differential revision: https://reviews.llvm.org/D113642	2021-11-24 06:43:56 -06:00
Nemanja Ivanovic	c9cb8edc51	[PowerPC] Allow scalars for asm constraint "v" with VSX Similarly to what GCC does, we should allow scalars with the "v" constraint rather than introducing unnecessary new constraints for scalars in Altivec registers. Differential revision: https://reviews.llvm.org/D113635	2021-11-23 17:03:04 -06:00
Nemanja Ivanovic	c933c2eb33	[PowerPC] Add BCD add/sub/cmp builtins Support for builtins that use bcdadd./bcdsub. to add/subtract Binary Coded Decimal values as well as to determine validity and compare BCD values. Differential revision: https://reviews.llvm.org/D114088	2021-11-23 11:42:36 -06:00
Qiu Chaofan	59f4b3d308	[PowerPC] Implement more fusion types for Power10 This implements the rest of Power10 instruction fusion pairs, according to user manual, including 'wide immediate', 'load compare', 'zero move' and 'SHA3 assist'. Only 'SHA3 assist' is enabled by default. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D112912	2021-11-23 17:21:17 +08:00
Nikita Popov	62e9acad0a	Revert "[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency" This reverts commit `d633db8f9d`. Causes bootstrap assertion failures: https://lab.llvm.org/buildbot/#/builders/168/builds/3459/steps/9/logs/stdio	2021-11-22 15:47:33 +01:00
Nikita Popov	d633db8f9d	[SCEV] Fix and validate ValueExprMap/ExprValueMap consistency This adds validation for consistency of ValueExprMap and ExprValueMap, and fixes identified issues: * Addrec construction directly wrote to ValueExprMap in a few places, without updating ExprValueMap. Add a helper to ensure they stay consistent. The adjustment in forgetSymbolicName() explicitly drops the old value from the map, so that we don't rely on it being overwritten. * forgetMemoizedResultsImpl() was dropping the SCEV from ExprValueMap, but not dropping the corresponding entries from ValueExprMap. Differential Revision: https://reviews.llvm.org/D113349	2021-11-22 15:27:25 +01:00
Simon Pilgrim	357d636289	[PowerPC] Regenerate rlwinm2.ll test	2021-11-21 18:33:28 +00:00
Stefan Pintilie	e9d12c2480	[PowerPC][NFC] Add a series of codegen tests for vector reductions. This patch only adds tests for PowerPC. The purpose of these tests is to track what code is generated for various vector reductions. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D113801	2021-11-19 15:03:01 -06:00
Victor Huang	86e77cdb08	[PowerPC] Add a flag for conditional trap optimization This patch adds a flag to enable/disable conditional trap optimization. Optimization disabled by default. Peer reviewed by: nemanjai	2021-11-19 10:24:54 -06:00
Simon Pilgrim	812e64ef0c	[DAG] MatchRotate - support rotate-by-constant of illegal types Patch to fix some of the regressions in D77804. By folding to rotate/funnel-shift by constant amounts for illegal types, we prevent SimplifyDemandedBits from destroying the patterns prematurely, allowing us to use the rotate/funnel-shift legalization that was added in D112443. Differential Revision: https://reviews.llvm.org/D113192	2021-11-19 11:12:04 +00:00
Victor Huang	40c65655af	[PowerPC] Remove the redundant terminator instruction when optimizing conditional trap This patch is a follow-up patch for `ae27ca9a67` to remove the redundant terminator when optimizing conditional trap. Peer reviewed by: nemanjai	2021-11-18 17:52:26 -06:00
Victor Huang	ae27ca9a67	[PowerPC] PPC backend optimization on conditional trap instructions This patch adds PPC back end optimization to analyze the arguments of a conditional trap instruction to execute one of the following: 1. Delete it if never trap 2. Replace it if always trap 3. Otherwise keep it Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D111434	2021-11-16 13:11:57 -06:00
Kai Luo	c0da8a4e40	[CGP][PowerPC] Pre-commit test case for D113872. NFC.	2021-11-16 09:18:49 +00:00
Lei Huang	f50c6c1718	[PowerPC] Fix 32bit vector insert instructions for ISA3.1 The platform independent ISD::INSERT_VECTOR_ELT takes an element index, but vins* instructions take a byte index. Update 32bit td patterns for vector insert to handle the element index accordingly. Since vector insert for non constant index is supported in ISA3.1, there is no need to use platform specific ISD node, PPCISD::VECINSERT. Update td pattern to directly use ISD::INSERT_VECTOR_ELT instead. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D113802	2021-11-15 14:36:39 -06:00
Chen Zheng	eec9ca622c	[PowerPC] guard update form prepare with non-const increment with option Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113471	2021-11-15 02:16:46 +00:00
Victor Huang	18fe0a0d9e	[PowerPC] PPC backend optimization to lower int_ppc_tdw/int_ppc_tw intrinsics to TDI/TWI machine instructions This patch adds the backend optimization to match XL behavior for the two builtins __tdw and __tw that when the second input argument is an immediate, emitting tdi/twi instructions instead of td/tw. Reviewed By: nemanjai, amyk, PowerPC Differential revision: https://reviews.llvm.org/D112285	2021-11-11 09:52:00 -06:00
Qiu Chaofan	5e9021c606	[NFC] Clean-up typos in PowerPC CodeGen tests	2021-11-11 15:42:08 +08:00
Qiu Chaofan	bc39ce9fa5	[NFC] Remove unnecessary check prefix of AIX test `9e9b0f4` introduced support for asm-full-reg-names on AIX. Now we can merge the test check prefix.	2021-11-11 13:27:42 +08:00
Nemanja Ivanovic	5840f7197d	[PowerPC] Respect rounding mode in the back end Currently, the floating point instructions that depend on rounding mode are correctly marked in the PPC back end with an implicit use of the RM register. Similarly, instructions that explicitly define the register are marked with an implicit def of the same register. So for the most part, RM-using code won't be moved across RM-setting instructions. However, calls are not marked as RM-setting instructions so code can be moved across calls. This is generally desired, but so is the ability to turn off this behaviour with an appropriate option - and -frounding-math really should be that option. This patch provides a set of call instructions (for direct and indirect calls) that are marked with an implicit def of the RM register. These will be used for calls that are marked with the strictfp attribute. Differential revision: https://reviews.llvm.org/D111433	2021-11-10 08:19:58 -06:00
Qiu Chaofan	9b5e2b5261	[PowerPC] Implement basic macro fusion in Power10 Including basic fusion types around arithmetic and logical instructions. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D111693	2021-11-08 17:23:56 +08:00
Chen Zheng	50acbbe3cd	[AsmPrinter][ORE] use correct opcode name Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D113173	2021-11-08 01:51:24 +00:00
Chen Zheng	c7d27f90e7	[ORE][AsmPrinter] add testcase for D113173; NFC	2021-11-08 01:47:22 +00:00
Alfredo Dal'Ava Junior	1cb9f37a17	[FreeBSD] Do not mark __stack_chk_guard as dso_local This symbol is defined in libc.so so it is definitely not DSO-Local. Marking it as such causes problems on some platforms (such as PowerPC). Differential revision: https://reviews.llvm.org/D109090	2021-11-05 07:29:50 -05:00
Chen Zheng	fed2889f07	[PowerPC] use correct selection for v16i8/v8i16 splat load Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D113236	2021-11-05 10:04:03 +00:00
Qiu Chaofan	5fd406e254	[PowerPC] Add intrinsic to convert between ppc_fp128 and fp128 ppc_fp128 and fp128 are both 128-bit floating point types. However, we can't do conversion between them now, since trunc/ext are not allowed for same-size fp types. This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and llvm.ppc.convert.ppcf128.to.f128, to support such conversion. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D109421	2021-11-05 16:58:38 +08:00
Qiu Chaofan	a84118756c	[PowerPC] Enforce side effects to FPSCR read/set intrinsics Currently, FPSCR is not modeled, so in some early passes (such as early-cse), the read/set intrinsics to FPSCR may get incorrect simplification. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112380	2021-11-04 11:45:32 +08:00
Qiu Chaofan	741aeda97d	[PowerPC] Implement longdouble pack/unpack builtins Implement two builtins to pack/unpack IBM extended long double float, according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D112055	2021-11-03 17:57:25 +08:00
Chen Zheng	5a8b196340	[PowerPC] handle more splat loads without stack operation This mostly improves splat loads code generation on Power7 Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106555	2021-11-03 05:17:41 +00:00
Simon Pilgrim	325031786e	[SelectionDAG] Optimize expansion for rotates/funnel shifts If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering. Also use the optimized funnel shift lowering for rotates. Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept (Branched from D108058 as getting this completed should help unlock some other WIP patches). Original Patch: @efriedma (Eli Friedman) Differential Revision: https://reviews.llvm.org/D112443	2021-11-02 11:38:25 +00:00
Jinsong Ji	bd932f7499	[NFC][PowerPC] Update testcases using script For D106555.	2021-11-01 15:37:23 +00:00
Chen Zheng	eeed1545b2	[PowerPC] turn off chain commoning by default.	2021-11-01 04:11:10 +00:00
Itay Bookstein	848812a55e	[Verifier] Add verification logic for GlobalIFuncs Verify that the resolver exists, that it is a defined Function, and that its return type matches the ifunc's type. Add corresponding check to BitcodeReader, change clang to emit the correct type, and fix tests to comply. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D112349	2021-10-31 20:00:57 -07:00
Victor Huang	40cad47fd8	[PowerPC][NFC] Update builtins-ppc-xlcompat-trap-64bit-only.ll and builtins-ppc-xlcompat-trap.ll to show full reg names	2021-10-28 11:59:27 -05:00
Chen Zheng	631f44f338	[PowerPC] use right extend type for SCEV Fix an issue caused by D108750 Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D112502	2021-10-26 13:32:03 +00:00
Chen Zheng	80e6aff6bb	[PowerPC] common chains to reuse offsets to reduce register pressure. Add a new preparation pattern in PPCLoopInstFormPrep pass to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D108750	2021-10-25 03:27:16 +00:00
Jinsong Ji	7ea1fbe86d	[AIX] Add i128 arg split tests Address comments in D111078. Reviewed By: hubert.reinterpretcast, lkail Differential Revision: https://reviews.llvm.org/D112272	2021-10-25 02:41:05 +00:00
Craig Topper	0766aef3f3	[LegalizeTypes][RISCV][PowerPC] Expand CTLZ/CTTZ/CTPOP instead of promoting if they'll be expanded later. Expanding these requires multiple constants. If we promote during type legalization when they'll end up getting expanded in LegalizeDAG, we'll use larger constants. These constants may be harder to materialize. For example, 64-bit constants on 64-bit RISCV are very expensive. This is similar to what has already been done to BSWAP and BITREVERSE. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112268	2021-10-22 09:10:01 -07:00
Bjorn Pettersson	a413663d8f	[NewPM][test] Avoid using -enable-new-pm=1 since -passes implies new PM	2021-10-20 15:16:17 +02:00
Qiu Chaofan	67c64d8337	[PowerPC] Implement scheduling model for Power10 Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D110855	2021-10-18 15:27:49 +08:00
Jinsong Ji	42eea2b69b	[AIX] Enable int128 in 64 bit mode This patch removes the override in AIX target, so the int128 is enabled in 64 bit mode or with ForceEnableInt128. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D111078	2021-10-15 16:23:04 +00:00
Qiu Chaofan	9e9b0f4621	[PowerPC] Support ppc-asm-full-reg-names for AIX Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D94282	2021-10-15 12:22:44 +08:00
Albion Fung	b4b9f9b4b3	[PowerPC] Emit dcbt and dcbtst in place of their extended mnemonics on AIX On AIX, the system assembler does not support the extended mnemonics dcbtt and dcbtstt. This patch stops them from being emitted on AIX and emits the base mnemonics instead, dcbt X, X, 16 and dcbtst X, X, 16 respectively. Differential revision: https://reviews.llvm.org/D111258	2021-10-12 15:47:57 -05:00
Roland Froese	28e648b29e	[PowerPC] Simplify PPC codegen test pre-inc-disable.ll Simplify the test case to make it easier to look at. Change from auto-generated checks to targeted manual checks to reduce sensitivity to register allocation and scheduling changes. Differential Revision: https://reviews.llvm.org/D111333	2021-10-12 20:12:31 +00:00
Qiu Chaofan	1f253e4fd6	Pre-commit pre-inc-disable.ll to avoid dead code The case was added in `728e139`, testing it outputs lxsibzx instead of lbzux. Here we need some minimal update to avoid DCE in future patches.	2021-10-12 16:03:17 +08:00
Chen Zheng	4ead32d1cf	[PowerPC] update test case using the scripts; nfc	2021-10-10 14:39:20 +00:00
Qiu Chaofan	573531fb1f	Fix typo of colon to semicolon in lit tests	2021-10-09 10:03:50 +08:00
Chen Zheng	5f4c91583e	[XCOFF] support DWARF for 32-bit XCOFF for object output Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D97184	2021-10-08 02:35:11 +00:00
Stefan Pintilie	740086596c	[PowerPC] Fix issue with lowering byval parameters. Lowering of byval parameters with sizes that are not represented by a single store requires multiple stores to properly address the correct size of the parameter. Sizes that cannot be done with a single store are 3 bytes, 5 bytes, 6 bytes, 7 bytes. It is not correct to simply perform an 8 byte store for these elements because then the store would be larger than the element and alias analysis would assume that this is undefined behaviour and return NoAlias for them. This patch adds the correct stores so that the size of the store is not larger than the size of the element. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D108795	2021-10-06 13:19:15 -05:00
Kamau Bridgeman	8737c74fab	[PowerPC][MMA] Allow MMA builtin types in pre-P10 compilation units This patch allows the use of __vector_quad and __vector_pair, PPC MMA builtin types, on all PowerPC 64-bit compilation units. When these types are made available the builtins that use them automatically become available so semantic checking for mma and pair vector memop __builtins is also expanded to ensure these builtin function calls are only allowed on Power10 and newer architectures. All related test cases are updated to ensure test coverage. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D109599	2021-10-05 07:59:32 -05:00
Jinsong Ji	933e2469a2	[PowerPC][NFC] Remove reg name option in int128 test The test is generated by script, so we don't really need the regname to be meaningful here. AIX doesn't support the reg name option, removing it for now so that we can reuse the CHECKs for AIX triple as well.	2021-10-04 15:31:25 +00:00
Stefan Pintilie	4fc2f4979c	[PowerPC] Fix __builtin_ppc_load2r to return short instead of int. This patch fixes the return value of the builtin __builtin_ppc_load2r to correctly return short instead of int. Reviewed By: nemanjai, #powerpc Differential Revision: https://reviews.llvm.org/D110771	2021-10-04 06:17:02 -05:00
Philip Reames	f39978b84f	[SCEV] Correctly propagate nowrap flags across scopes when folding invariant add through addrec This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This is an alternate fix to D106852. The basic problem being fixed is that we infer a set of flags which is valid at some inner scope S1 (usually by correctly propagating them from IR), and then (incorrectly) extend them to a SCEV in scope S2 where S1 != S2. This is not in general safe per the wrap flags semantics recently defined. In this patch, I include a simple inference step to handle the case where we can prove that S2 is the preheader of the loop S1, and that entry into S2 implies execution of S1. See the code for a more detailed explanation. One worry I have with this patch is that I might be over-fitting what shows up in tests - and thus hiding negative impact we'd see in the real world. My best defense is that the rule used here very closely follows the one used to propagate the flags from IR to the inner add to start with, and thus if one is reasonable, so probably is the other. Curious what others think about that piece. The test diffs are roughly as expected. Mostly analysis only, with two transform changes. Oddly, the result looks better in the loop-idiom test, and I don't understand the PPC output enough to tell. Nothing terrible looking though. (For context, without the scope inference peephole, the test delta includes a couple of vectorization tests. Again, not super concerning, but slightly more so.) Differential Revision: https://reviews.llvm.org/D109845	2021-10-03 15:19:33 -07:00
Stefan Pintilie	40f382ad10	[NFC][PowerPC] Add test case for byval store. Added a test case for situations where a struct of size 1-7 bytes is passed by value.	2021-10-01 16:54:29 -05:00
Albion Fung	4195ed9959	[PowerPC] Improved codegen related to xscvdpsxws/xscvdpuxws This patch removes the unnecessary mf/mtvsr generated in conjunction with xscvdpsxws/xscvdpuxws. Differential revision: https://reviews.llvm.org/D109902	2021-09-30 14:31:00 -05:00
Quinn Pham	67a3d1e275	[PowerPC] swdiv builtins for XL compatibility This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch implements the software divide builtin as wrappers for a floating point divide. XL provided these builtins because it didn't produce software estimates by default at `-Ofast`. When compiled with `-Ofast` these builtins will produce the software estimate for divide. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D106959	2021-09-29 11:31:07 -05:00
Nemanja Ivanovic	09b67aa1c3	[PowerPC] Implement builtin for vbpermd The instruction has similar semantics to vbpermq but for doublewords. It was added in Power9 and the ABI documents the builtin. Differential revision: https://reviews.llvm.org/D107899	2021-09-29 06:34:31 -05:00
Quinn Pham	70391b3468	[PowerPC] FP compare and test XL compat builtins. This patch is in a series of patches to provide builtins for compatibility with the XL compiler. This patch adds builtins for compare exponent and test data class operations on floating point values. Reviewed By: #powerpc, lei Differential Revision: https://reviews.llvm.org/D109437	2021-09-28 11:01:51 -05:00
Albion Fung	3678df5ae6	[PowerPC][NFC] Add test case in preparation for codegen change This test case tests doubles inserted into vector ints, and helps make apparent the optimizations a future patch will make.	2021-09-24 12:17:50 -05:00
Victor Huang	6e1aaf18af	[PowerPC] Mark splat immediate instructions as rematerializable This patch marks splat immediate instructions XXSPLTIW and XXSPLTIDP as rematerializable to prevent MachineLICM from moving them out of loops. Reviewed By: lei, amy Differential revision: https://reviews.llvm.org/D108823	2021-09-24 12:03:34 -05:00
Chen Zheng	957514eb9e	[PowerPC] add testcase for chain commoning; nfc	2021-09-22 05:08:00 +00:00
Chen Zheng	ffa9fa9ed2	[PowerPC] prepare for update form with non-const increment. This is a follow-up of D105872. Now we are able to prepare for update form with non-const increment. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D106032	2021-09-22 02:54:28 +00:00
Amy Kwan	2af57b6099	[PowerPC] Add prefix load pattern for fpext to v2f64 This patch adds a prefixed load pattern involving v2f32 fpext v2f64, where we are dealing with a value with an offset that fits into a 34-bit signed immediate. A reduced test case is also added to patch that tests the pattern, in which the pattern is tested in the big endian CHECKs of the newly added test. Differential Revision: https://reviews.llvm.org/D109887	2021-09-21 12:45:24 -05:00
Chen Zheng	80584f0056	Revert "[PowerPC][ELF] make sure local variable space does not overlap with parameter save area" This causes mix-compile issues on PowerPC Linux. This reverts commit `324bd467a2`.	2021-09-17 08:07:18 +00:00
Matt Arsenault	54d755a034	DAG: Fix incorrect folding of fmul -1 to fneg The fmul is a canonicalizing operation, and fneg is not so this would break denormals that need flushing and also would not quiet signaling nans. Fold to fsub instead, which is also canonicalizing.	2021-09-14 21:25:02 -04:00
Matt Arsenault	4a36e96c3f	RegAllocGreedy: Account for reserved registers in num regs heuristic This simple heuristic uses the estimated live range length combined with the number of registers in the class to switch which heuristic to use. This was taking the raw number of registers in the class, even though not all of them may be available. AMDGPU heavily relies on dynamically reserved numbers of registers based on user attributes to satisfy occupancy constraints, so the raw number is highly misleading. There are still a few problems here. In the original testcase that made me notice this, the live range size is incorrect after the scheduler rearranges instructions, since the instructions don't have the original InstrDist offsets. Additionally, I think it would be more appropriate to use the number of disjointly allocatable registers in the class. For the AMDGPU register tuples, there are a large number of registers in each tuple class, but only a small fraction can actually be allocated at the same time since they all overlap with each other. It seems we do not have a query that corresponds to the number of independently allocatable registers. Relatedly, I'm still debugging some allocation failures where overlapping tuples seem to not be handled correctly. The test changes are mostly noise. There are a handful of x86 tests that look like regressions with an additional spill, and a handful that now avoid a spill. The worst looking regression is likely test/Thumb2/mve-vld4.ll which introduces a few additional spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll shows a massive improvement by completely eliminating a large number of spills inside a loop.	2021-09-14 21:00:29 -04:00
Amy Kwan	5041a485b9	[PowerPC] Exploit Prefixed Load/Stores using the refactored Load/Store Implementation This patch exploits the prefixed load and store instructions utilizing the refactored load/store implementation introduced in D93370. Prefixed load and store instructions are emitted whenever we are loading or storing a value with an offset that fits into a 34-bit signed immediate. Patterns for the prefixed load and stores are added in this patch, as well as the implementation that detects when we are loading and storing a value with an offset that fits in 34-bits. Differential Revision: https://reviews.llvm.org/D96075	2021-09-14 08:39:49 -05:00
Chen Zheng	946e69d253	[PowerPC] prepare more loop load/store instructions PPCLoopInstrFormPrep pass now can prepare for load store instructions in a loop whose increment is not a constant integer. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D105872	2021-09-14 05:00:48 +00:00
Amy Kwan	351a0d8a90	[PowerPC] Update PC-Relative Load/Store Patterns to use the refactored Load/Store Implementation This patch updates the PC-Relative load and store patterns to utilize the refactored load/store implementation introduced in D93370. PC-Relative implementation has been added to PPCISelLowering.cpp, and also the patterns in PPCInstrPrefix.td have been updated and no longer require AddedComplexity. All existing test cases pass with this update. Differential Revision: https://reviews.llvm.org/D95116	2021-09-09 15:38:42 -05:00
David Green	d8d24c64fe	[DAG] Fix GT -> GE condition when creating SetCC `79845ed6df` folded some setcc(ashr) conditions to setcc, but got the condition for NE incorrect, using GT where it should be using GE.	2021-09-08 12:41:51 +01:00
Victor Huang	4a226529e2	[PowerPC] Fixed the crash due to early if conversion with fixed CR fields This patch adds a fix to do early if conversion to select when conditional branch not using physical register to prevent the crash when expanding ISEL instruction. Reviewed By: lei, kamaub, PowerPC Differential revision: https://reviews.llvm.org/D108302	2021-09-07 10:51:03 -05:00

3389 Commits