llvm-project

Commit Graph

Author	SHA1	Message	Date
Chen Zheng	df9d60af1f	[PowerPC] handle more than two predecessors loop header in ctrloop pass After ISEL, the "valid" loop header which has two predecessors (one is preheader and the other one is latch) may be transformed to have more than two predecessors by some optimizations, like tail duplicator, if the old header's successor(will be changed to new header) is a sub loop. The predecessors of the new loop header are preheader, loop latch and the loop latch(es) of the sub loop(old header's successor). Before the patch, ctrloop pass assumes two predecessors for candidate loop header. This patch fixes this case. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D135846	2022-10-19 01:11:58 +00:00
Stefan Pintilie	b107ff4856	[NFC][PowerPC] Add a test to check power 10 features. This patch only adds a single test for Power 10 features.	2022-10-18 09:05:24 -05:00
Peter Rong	c2e7c9cb33	[CodeGen] Using ZExt for extractelement indices. In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`. This is because IRTranslator uses SExt for indices. In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt. This change includes both documentation, SelectionDAG and IRTranslator. We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86 This patch fixes issue #57452. Differential Revision: https://reviews.llvm.org/D132978	2022-10-15 15:45:35 -07:00
Amy Kwan	22e4203df8	[PowerPC][NFC] Pre-commit case for lowering vector shuffles to xxsplti32dx (64 bit) This patch adds a test case for lowering vector shuffles to xxsplti32dx in preparation for D135024. The test case added in this patch only adds the 64-bit CHECKs, as the 32-bit CHECKs cannot be generated (which D135024 aims to fix).	2022-10-14 10:15:34 -05:00
Nemanja Ivanovic	0d253bbd33	[PowerPC] Change CRNOT to a code gen single operand instruction Inputs to crnor can come from operands with chains so if it is being used simply to negate such an operand, the repeated input cannot be CSE'd. This patch just adds a code-gen only instruction for this that takes a single input and duplicates it in the encoding of the underlying crnor. Differential revision: https://reviews.llvm.org/D133577	2022-10-13 20:09:44 -05:00
Nemanja Ivanovic	a77a70fa3c	[PowerPC] Stash GPR to VSR if emergency spill slot is not reachable When removing frame indices on PowerPC, we need to scavenge a GPR to materialize a large constant if the stack offset for the spill/reload cannot be reached by a D-Form instruction. However, in a perfect storm of conditions, we may not have GPR's available to scavenge, thereby requiring an emergency spill. If such an emergency spill also needs to be spilled to a location with a large offset, it would itself require register scavenging thereby creating an infinite loop. This patch detects when the scavenger cannot scavenge a register and the spill/reload is to a location with a large offset. It then stashes a GPR into a VSR so that it can use the GPR to materialize the constant (rather than scavenging a GPR). Fixes: https://github.com/llvm/llvm-project/issues/52894 Differential revision: https://reviews.llvm.org/D124841	2022-10-13 09:06:37 -05:00
Peter Rong	c7dd7f20b0	[PowerPC] Pre-commit unit test change for D132978	2022-10-12 11:26:57 -07:00
Chen Zheng	5f4927da77	[PowerPC][NFC] refactor some test cases.	2022-10-12 12:19:52 +00:00
Craig Topper	ac9209751a	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit `0148df8157`. Getting lit test failures on AMDGPU but I can't reproduce them so far. Reverting to investigate.	2022-10-11 16:30:40 -07:00
Craig Topper	0148df8157	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-11 16:20:55 -07:00
Kai Nacke	5403c59c60	[PPC] Opaque pointer migration, part 2. The LIT test cases were migrated with the script provided by Nikita Popov. Due to the size of the change it is split into several parts. Reviewed By: nemanja, nikic Differential Revision: https://reviews.llvm.org/D135474	2022-10-11 17:24:06 +00:00
Kai Nacke	427fb35192	[PPC] Opaque pointer migration, part 1. The LIT test cases were migrated with the script provided by Nikita Popov. Due to the size of the change it is split into several parts. Reviewed By: nemanja, amyk, nikic, PowerPC Differential Revision: https://reviews.llvm.org/D135470	2022-10-11 17:24:06 +00:00
Ting Wang	bc5e969ca1	[PowerPC] Add vector pair calling convention for AIX This is AIX part of update after https://reviews.llvm.org/D117225 Fixed the issue that AIX64 with vector pair enabled saw redundant spill/reload of callee saved vector registers. Based on original patch by: Kai Luo Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D133466	2022-10-09 01:23:18 -04:00
Arthur Eubanks	c384b20b55	[opt] Remove temporary legacy pass name translations And update corresponding tests.	2022-10-07 11:09:46 -07:00
Stefan Pintilie	30d639180f	[PowerPC] Fix the register allocation hints for ACC registers. The allocation hints for copies of ACC registers assumed that we would only be copying between VSRp and UACC registers. In reality it is also possible to copy between UACC and ACC registers. This patch adds a new case for the ACC copy to fix that issue. Note that the test case added with this patch will hit an assert without the fix. Reviewed By: lei, amyk Differential Revision: https://reviews.llvm.org/D134501	2022-10-04 20:30:16 -05:00
Sanjay Patel	0a1210e482	[InstSimplify] try harder to fold fmul with 0.0 operand https://alive2.llvm.org/ce/z/oShzr3 This was noted as a missing fold in D134876 (with additional examples based on issue #58046). I'm assuming that fmul with a zero operand is rare enough that the use of ValueTracking will not noticeably increase compile-time. This adjusts a PowerPC codegen test that was added with D88388 because it would get folded away and no longer provide coverage for the bug fix.	2022-10-04 11:20:01 -04:00
Nemanja Ivanovic	4ea121c904	[PowerPC] Fix a number of inefficiencies and issues with atomic code gen There are a few issues with the code we generate for atomic operations and the way we generate it: - Hard coded CR0 for compares - Order of operands for compares not conducive to emitting compare-immediate or for CSE of compares - Missing MachineMemOperand for st[bhwd]cx intrinsics - Missing intrinsic properties for the same - Unnecessary blocks with store conditional instructions to clear reservation (which ends up hindering performance) - Move from CR instructions just to compare the result of a store conditional with zero (even though it is a record-form) This patch aims to resolve all of those issues. Differential revision: https://reviews.llvm.org/D134783	2022-10-03 19:55:29 -05:00
Nemanja Ivanovic	f0ec83eb0a	[PowerPC][NFC] Pre-commit test case for an upcoming atomics patch Just a new test case with auto generated checks.	2022-09-27 18:52:41 -05:00
Amaury Séchet	d1baed7c9c	[DAG] select Cond, -1, C --> or (sext Cond), C if Cond is MVT::i1 This seems to be beneficial overall, except for midpoint-int.ll. The X86 backend seems to generate zeroing that is not necessary. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D131260	2022-09-27 12:54:52 +00:00
Paul Scoropan	ce004fb4f2	[PowerPC] XCOFF exception section support on the direct assembler path This feature implements support for making entries in the exception section on XCOFF on the direct assembly path using the ".except" pseudo-op. It also provides functionality to lower entries (comprised of language and reason codes) into the exception section through the use of annotation metadata attached to llvm.ppc.trap/trapd/tw/tdw intrinsics. Integrated assembler support will be provided in another review. https://reviews.llvm.org/D133030 needs to merge first for LIT tests Reviewed By: shchenz, RKSimon Differential Revision: https://reviews.llvm.org/D132146	2022-09-26 22:24:20 -04:00
Ting Wang	514ac16b51	[PowerPC][NFC] Add virtual call to show redundant spill of vector registers Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D133921	2022-09-20 21:21:06 -04:00
Yuta Mukai	116838b151	[MachinePipeliner] Fix the interpretation of the scheduling model The method of counting resource consumption is modified to be based on "Cycles" value when DFA is not used. The calculation of ResMII is modified to total "Cycles" and divide it by the number of units for each resource. Previously, ResMII was excessive because it was assumed that resources were consumed for the cycles of "Latency" value. The method of resource reservation is modified similarly. When a value of "Cycles" is larger than 1, the resource is considered to be consumed by 1 for cycles of its length from the scheduled cycle. To realize this, ResourceManager maintains a resource table for all slots. Previously, resource consumption was always 1 for 1 cycle regardless of the value of "Cycles" or "Latency". In addition, the number of micro operations per cycle is modified to be constrained by "IssueWidth". To disable the constraint, --pipeliner-force-issue-width=100 can be used. For the case of using DFA, the scheduling results are unchanged. Reviewed By: dpenry Differential Revision: https://reviews.llvm.org/D133572	2022-09-16 09:51:48 +09:00
esmeyi	6e0e926c2f	[PowerPC] Convert to comparison against zero even when the optimization doesn't happen in the peephole optimizer. Summary: Converting a comparison against 1 or -1 into a comparison against 0 can exploit record-form instructions for comparison optimization. The conversion will happen only when a record-form instruction can be used to replace the comparison during the peephole optimizer (see function optimizeCompareInstr). In post-RA, we also want to optimize the comparison by using the record form (see D131873) and it requires additional dataflow analysis to reliably find uses of the CR register set. It's reasonable to share the conversion between the peephole optimizer and the post-RA optimizer. Converting to comparison against zero even when the optimization doesn't happen in the peephole optimizer may create additional opportunities for the post-RA optimization. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D131374	2022-09-15 06:06:25 -04:00
Roland Froese	207228c1d6	[DAGCombiner] More load-store forwarding for big-endian Get some load-store forwarding cases for big-endian where a larger store covers a smaller load, and the offset would be 0 and handled on little-endian but on big-endian the offset is adjusted to be non-zero. The idea is just to shift the data to make it look like the offset 0 case. Differential Revision: https://reviews.llvm.org/D130115	2022-09-14 15:36:35 -04:00
Ting Wang	12e78d96f2	[PowerPC][NFC] Add base test case to show redundant spill of vector registers Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D133543	2022-09-13 00:47:47 -04:00
Eric Wang	d8a2d3f7d4	[NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis This patch introduces the priority analysis and the priority advisor, the default implementation, and the scaffolding for introducing the other implementations of the advisor. Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D132835	2022-09-08 07:50:03 -07:00
Fangrui Song	c89b78a22b	[test] Remove PowerPC/aix-xcoff-exported-nondefault.ll This is not asserted by IR verifier.	2022-09-06 16:48:16 -07:00
Matthias Gehre	af3758d678	Fix remaining test failures for "[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64"	2022-09-06 16:38:43 +01:00
Fangrui Song	2417618d5c	[Verifier] Reject dllexport with non-default visibility Add a visibility check for dllimport and dllexport. Note: dllimport with a non-default visibility (implicit dso_local) is already rejected, but with a less clear dso_local error. The MC level visibility `MCSA_Exported` (D123951) is mapped from IR level default visibility when dllexport is specified. The D123951 error is now very difficult to trigger (needs to disable the IR verifier). Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D133267	2022-09-05 10:53:41 -07:00
Kai Luo	ad2f7fd286	[AtomicExpand] Make floating point conversion happen before fence insertion IIUC, the conversion part is not part of atomic operations and fences should be put around converted atomic operations. This also fixes atomic load of floating point values, which requires a fence on PowerPC. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D127609	2022-08-31 09:54:58 +08:00
Ting Wang	710923cdc8	[PowerPC] CTRLoop pseudo instructions should not be duplicated Add isNotDuplicable to CTRLoop pseudo instructions, to prevent other passes such as early-tailduplication from breaking the loop structure by duplicating pseudo instructions. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D132738	2022-08-30 04:32:29 -04:00
Ting Wang	f908cbc36f	[NFC][PowerPC] Add test case to show ctrloop mi shall not be duplicated Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D132899	2022-08-30 01:57:22 -04:00
Maryam Moghadas	a09140f34f	Revert "[PowerPC] Remove extra swap for extract+vperm on LE" This reverts commit `f7294ac809`.	2022-08-25 12:34:43 -05:00
Matthias Braun	5364f49407	Fix CSR update check D132080 introduced a bug leading to `RegisterClassInfo` caches not getting invalidated when there was exactly one more CSR register added. Differential Revision: https://reviews.llvm.org/D132606	2022-08-24 18:09:49 -07:00
esmeyi	dfe55cc1cd	[AIX] use the original name as the input to create the new symbol for TLS symbol. Summary: Currently, an error is reported when a thread-local symbol has an invalid name. D100956 creates a new symbol to prefix the TLS symbol name with a dot. When the symbol is renamed, the error occurs. This patch uses the original symbol name (name in the symbol table) as the input for the symbol for TOC entry. Reviewed By: shchenz, lkail Differential Revision: https://reviews.llvm.org/D132348	2022-08-24 01:36:40 -04:00
Amaury Séchet	7e24354c8b	[PowerPC] Autogenerate crbits.ll . NFC	2022-08-23 13:50:25 +00:00
Matthias Braun	b2542c40b9	RegisterClassInfo: Fix CSR cache invalidation `RegisterClassInfo` caches information like allocation orders and reuses it for multiple machine functions where possible. However the `MCPhysReg *CalleeSavedRegs` field used to test whether the set of callee saved registers changed did not work: After D28566 `MachineRegisterInfo::getCalleeSavedRegs()` can return dynamically computed CSR sets that are only valid while the `MachineRegisterInfo` object of the current function exists. This changes the code to make a copy of the CSR list instead of keeping a possibly invalid pointer around. Differential Revision: https://reviews.llvm.org/D132080	2022-08-22 09:28:26 -07:00
Stefan Pintilie	1492c88f49	[PowerPC] Fix bugs in sign-/zero-extension elimination This patch fixes the following two bugs in `PPCInstrInfo::isSignOrZeroExtended` helper, which is used from sign-/zero-extension elimination in PPCMIPeephole pass. - Registers defined by load with update (e.g. LBZU) were identified as already sign or zero-extended. But it is true only for the first def (loaded value) and not for the second def (i.e. updated pointer). - Registers defined by ORIS/XORIS were identified as already sign-extended. But, it is not true for sign extension depending on the immediate (while it is ok for zero extension). To handle the first case, the parameter for the helpers is changed from `MachineInstr` to a register number to distinguish first and second defs. Also, this patch moves the initialization of PPCMIPeepholePass to allow mir test case. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D40554	2022-08-19 07:05:40 -05:00
Nick Desaulniers	6b0e2fa6f0	[SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses As part of re-architecting callbr to no longer use blockaddresses (https://reviews.llvm.org/D129288), we don't really need them in MIR. They make comparing MachineBasicBlocks of indirect targets during MachineVerifier a PITA. Suggested by @efriedma from the discussion: https://reviews.llvm.org/D130290#3669531 Reviewed By: efriedma, void Differential Revision: https://reviews.llvm.org/D130316	2022-08-17 09:34:31 -07:00
Amy Kwan	a5bef98c75	[PowerPC][NFC] Add additional vector_shuffle tests involving scalar_to_vector. This patch adds additional test cases involving vector_shuffles where either its left, right or both inputs are scalar_to_vector nodes. These test cases involve v16i8, v2i64, v4i32 and v8i16 vector shuffles, and were generated in preparation for D130487. Differential Revision: https://reviews.llvm.org/D130485	2022-08-15 12:30:58 -05:00
Filipp Zhinkin	1626ee6a95	[DAGCombine] Hoist shifts out of a logic operations tree. Hoist and combine shift operations from logic operations tree: logic (logic (SH x0, s), y), (logic (SH x1, s), z) --> logic (SH (logic x0, x1), s), (logic y, z) The transformation improves code generated for some cases related to the issue https://github.com/llvm/llvm-project/issues/49541. Correctness: https://alive2.llvm.org/ce/z/pVqVgY https://alive2.llvm.org/ce/z/YVvT-q https://alive2.llvm.org/ce/z/W5zTBq https://alive2.llvm.org/ce/z/YfJsvJ https://alive2.llvm.org/ce/z/3YSyDM https://alive2.llvm.org/ce/z/Bs2kzk https://alive2.llvm.org/ce/z/EoQpzU https://alive2.llvm.org/ce/z/Jnc_5H https://alive2.llvm.org/ce/z/_LP6k_ https://alive2.llvm.org/ce/z/KvZNC9 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D131189	2022-08-12 12:42:16 +03:00
Ting Wang	13c1e7a8aa	[PowerPC] Fix test case changed by "Add XXEVAL TD pattern" [NFC]	2022-08-12 02:56:54 -04:00
Ting Wang	12e1936f64	[PowerPC] Add XXEVAL TD pattern Add xxeval TD pattern for P10 on: eqv, nor, or, xor. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D131654	2022-08-12 01:27:24 -04:00
Chen Zheng	8d19cfb72e	[PowerPC] omit location attribute for TLS variable on AIX TLS debug on AIX is not ready for now. The location generated in no-integrated-as mode is wrong and in integrated-as mode causes AIX linker error. Reviewed By: Esme Differential Revision: https://reviews.llvm.org/D130245	2022-08-12 00:54:48 -04:00
esmeyi	7a70e6e224	[XCOFF] ignore the cold attribute. Summary: AIX XCOFF doesn't support the cold feature. However, it shouldn't be a functional error when XCOFF encounters the cold attribute. As with the behavior of other formats, we just ignore the attribute for now. Reviewed By: DiggerLin Differential Revision: https://reviews.llvm.org/D131473	2022-08-11 01:13:05 -04:00
Umesh Kalappa	9757f4f2dd	[PowerPC] Don't use the S30 and S31 regs for the pic code These changes address issue https://github.com/llvm/llvm-project/issues/55857. Since R30/S30 is used as a pointer (32 bits) to the GOT in the ppc32 ABI, remove it from the SPE callee-saved registers when PIC is enabled. This prevents emitting the SPE load and store for S30 and S31 regs. Differential revision: https://reviews.llvm.org/D127495	2022-08-10 10:31:27 -05:00
Justin Hibbits	f43b228581	PowerPC: Don't hoist float multiply + add to fused operation on SPE SPE doesn't have a fmadd instruction, so don't bother hoisting a multiply and add sequence to this, as it'd become just a library call. Hoisting happens too late for the CTR usability test to veto using the CTR in a loop, and results in an assert "Invalid PPC CTR loop!".	2022-08-10 11:04:27 -04:00
Edd Barrett	fa250250b2	Migrate llvm.experimental.patchpoint() to ptr. This intrinsic used a typed pointer for a call target operand. This change updates the operand to be an opaque pointer and updates all pointers in all test files that use the intrinsic. Differential revision: https://reviews.llvm.org/D131261	2022-08-10 13:18:02 +01:00
Yuta Mukai	5357dd2f43	[MachinePipeliner] Fix Phi generation failure for large stages The previous code overwrites VRMap for prologue stages during Phi generation if a register spans many stages. As a result, the wrong register is used as the one coming from the prologue in Phis at later stages. (A process exists to correct this, but it does not work in all cases.) In addition, VRMap for prologue must be preserved until addBranches(). This patch fixes them by separating the map for Phis into a different variable (VRMapPhi). Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D127840	2022-08-09 13:14:26 +09:00
Chen Zheng	d9004dfbab	[PowerPC] mapping hardware loop intrinsics to powerpc pseudo Map hardware loop intrinsics loop_decrement and set_loop_iteration to the new PowerPC pseudo instructions, so that the hardware loop intrinsics will be expanded to normal cmp+branch form or ctrloop form based on the CTR register usage on MIR level. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D123366	2022-08-08 21:34:20 -04:00
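
The fold described in commit 0148df8157 (later reverted by ac9209751a) rests on the observation that (sra X, BW-1) is either 0 or -1, so multiplying by it conditionally negates Y. The sketch below is not LLVM code; it only models the arithmetic in Python, assuming a 32-bit width, and the helper names (sra, mul_form, neg_and_form) are hypothetical names chosen for illustration.

```python
# Illustrative model only: checks the identity
#   (mul (sra X, BW-1), Y) == (neg (and (sra X, BW-1), Y))
# on BW-bit two's-complement values. BW = 32 is an assumption.
BW = 32
MASK = (1 << BW) - 1


def to_signed(v):
    """Interpret the low BW bits of v as a signed two's-complement integer."""
    v &= MASK
    return v - (1 << BW) if v >> (BW - 1) else v


def sra(x, amount):
    """Arithmetic shift right of a BW-bit value, returned as an unsigned bit pattern."""
    return (to_signed(x) >> amount) & MASK


def mul_form(x, y):
    """Left-hand side: multiply Y by the sign mask of X."""
    return (sra(x, BW - 1) * y) & MASK


def neg_and_form(x, y):
    """Right-hand side: AND Y with the sign mask of X, then negate."""
    return (-(sra(x, BW - 1) & y)) & MASK


samples = [0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF]
all_equal = all(
    mul_form(x, y) == neg_and_form(x, y) for x in samples for y in samples
)
```

For a non-negative X both forms yield 0; for a negative X both yield the two's-complement negation of Y.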


3428 Commits