llvm-project

Commit Graph

Author	SHA1	Message	Date
Chen Zheng	0a9b1c59f0	[PowerPC][GISel]support for float point and integer conversion Add support for fptosi,fptoui,sitofp,uitofp For now only handle 64 bit integer so that it does not depend on any other patches. 32 bit integer needs handling for G_SEXT/G_ZEXT. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D139174	2022-12-04 22:21:57 -05:00
Chen Zheng	b5e1fc19da	[PowerPC] don't check CTR clobber in hardware loop insertion pass We added a new post-isel CTRLoop pass in D122125. That pass will expand the hardware loop related intrinsic to CTR loop or normal loop based on the loop context. So we don't need to conservatively check the CTR clobber now on the IR level. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D135847	2022-12-04 20:53:49 -05:00
Jonas Paulsson	122efef8ee	Revert "Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions."" This reverts commit `17db0de330`. Some more bots got broken - need to investigate.	2022-12-05 00:52:00 +01:00
Jonas Paulsson	17db0de330	Reapply "[CodeGen] Add new pass for late cleanup of redundant definitions." Init captures added in processBlock() to avoid capturing structured bindings, which caused the build problems (with clang). RISCV has this disabled for now until problems relating to post RA pseudo expansions are resolved.	2022-12-03 14:15:15 -06:00
Chen Zheng	b61ff0ca76	[PowerPC] move ctrloop pass before tail duplication Tail duplication may modify the loop to a "non-canonical" form that CTR Loop pass can not recognize. We fixed one issue in D135846. And we found in some other case, the loop is changed to irreducible form. It is hard to fix this case in CTR loop pass, instead we reorder the CTR loop pass before tail duplication pass and just after finalize-isel pass to avoid any unexpected change to the loop form. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D138265	2022-12-02 00:31:00 -05:00
Chen Zheng	dff8227189	Revert "[PowerPC] handle more than two predecessors loop header in ctrloop pass" This reverts commit `df9d60af1f`. The CTRLoops pass is reordered to front of tail duplication pass in D138265.	2022-12-02 00:30:56 -05:00
Chen Zheng	4dfa12addd	[PowerPC] [NFC] add test for O0 pipeline This is to address comments https://reviews.llvm.org/D138265#3950197 This should be helpful for detecting optimization passes added to O0 pipeline by mistake. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D138973	2022-12-01 22:16:54 -05:00
Jonas Paulsson	8ef4632681	Revert "[CodeGen] Add new pass for late cleanup of redundant definitions." Temporarily revert and fix buildbot failure. This reverts commit `6d12599fd4`.	2022-12-01 13:29:24 -05:00
Jonas Paulsson	6d12599fd4	[CodeGen] Add new pass for late cleanup of redundant definitions. A new pass MachineLateInstrsCleanup is added to be run after PEI. This is a simple pass that removes redundant and identical instructions whenever found by scanning the MF once while keeping track of register definitions in a map. These instructions are typically immediate loads resulting from rematerialization, and address loads emitted by target in eliminateFrameIndex(). This is enabled by default, but a target could easily disable it by means of 'disablePass(&MachineLateInstrsCleanupID);'. This late cleanup is naturally not "optimal" in removing instructions as it is done by looking at phys-regs, but still quite effective. It would be desirable to improve other parts of CodeGen and avoid these redundant instructions in the first place, but there are no ideas for this yet. Differential Revision: https://reviews.llvm.org/D123394 Reviewed By: RKSimon, foad, craig.topper, arsenm, asb	2022-12-01 13:21:35 -05:00
Roman Lebedev	7850ab2112	[NFC] Port an assortment of tests that invoke SROA to new pass manager	2022-12-01 21:17:18 +03:00
Freddy Ye	89f36dd8f3	[X86] Add ExpandLargeFpConvert Pass and enable for X86 As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem, which expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expanded nodes are derived from the IR generated by `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc. Corner cases: 1. For fp16: as there are no related builtins added in compiler-rt, I mainly utilized the fp32 <-> fp16 lib calls to implement. 2. For fp80: as this pass is soft fp emulation and no fp80 instructions can help in this problem. I recommend users to deprecate this usage. For now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/ext at top/end of the function. 3. For bf16: as clang FE currently doesn't support bf16 algorithm operations (convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for now. 4. For unsigned FPToI: since both default hardware behaviors and libgcc are ignoring "returns 0 for negative input" spec. This pass follows this old way to ignore unsigned FPToI. See this example: https://gcc.godbolt.org/z/bnv3jqW1M The end-to-end tests are uploaded at https://reviews.llvm.org/D138261 Reviewed By: LuoYuanke, mgehre-amd Differential Revision: https://reviews.llvm.org/D137241	2022-12-01 13:47:43 +08:00
Marco Elver	b95646fe70	Revert "Use-after-return sanitizer binary metadata" This reverts commit `d3c851d3fc`. Some bots broke: - https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8796062278266465473/overview - https://lab.llvm.org/buildbot/#/builders/124/builds/5759/steps/7/logs/stdio	2022-11-30 23:35:50 +01:00
Dmitry Vyukov	d3c851d3fc	Use-after-return sanitizer binary metadata Currently per-function metadata consists of: (start-pc, size, features) This adds a new UAR feature and if it's set an additional element: (start-pc, size, features, stack-args-size) Reviewed By: melver Differential Revision: https://reviews.llvm.org/D136078	2022-11-30 14:50:22 +01:00
Maryam Moghadas	7614ba0a5d	[PowerPC] Fix vperm codegen Commit rG934d5fa2b8672695c335deed0e19d0e777c98403 changed the vperm codegen for cases that vperm is not replaced by xxperm, this patch is to revert that. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D138736	2022-11-29 15:47:32 -06:00
Qiu Chaofan	b9be5a6823	Pre-commit PowerPC case for zero/inf fpclassify	2022-11-25 17:37:41 +08:00
Maryam Moghadas	934d5fa2b8	[PowerPC] Exploit xxperm, check for dead vectors and substitute vperm with xxperm vperm instruction requires the data to be in the Altivec registers, if one of the vector operands is not used after this vperm instruction then it can be substituted by xxperm which doubles the number of available registers. Reviewed By: stefanp Differential Revision: https://reviews.llvm.org/D133700	2022-11-23 13:28:12 -06:00
Stefan Pintilie	1ac6956b52	[PowerPC] Add handling for WACC register spilling. This patch adds spilling for the new WACC registers. In order to get the spilling test to work the MMA instructions from Power 10 are now supported for Future CPU except that they are all using the new WACC registers instead of the ACC registers from Power 10. Reviewed By: amyk, saghir Differential Revision: https://reviews.llvm.org/D136728	2022-11-22 09:37:52 -06:00
esmeyi	c7c7ef8bda	[XCOFF] set fragment for XMC_PR csects. Summary: -xcoff-traceback-table is a default option on AIX regardless of optimization and debug levels. An error of relocation for paired relocatable term is not yet supported in XCOFFObjectWriter::recordRelocation occurred when both of the -xcoff-traceback-table and -function-sections are enabled. The root cause is that we missed to calculate the symbols difference as absolute value before adding fixups when symbol_A without the fragment set is the csect itself and symbol_B is in it. This patch only sets the fragment for XMC_PR csects because we don't have other cases that hit this problem yet. Reviewed By: DiggerLin, hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D137230	2022-11-22 07:17:44 -05:00
Chen Zheng	d9143ce3fd	[PowerPC][GISel]add support for float point arithmetic operations Add global isel support for G_FADD, G_FSUB, G_FMUL, G_FDIV. Reviewed By: Kai, nemanjai, arsenm, amyk Differential Revision: https://reviews.llvm.org/D132942	2022-11-22 03:00:27 -05:00
Chen Zheng	375323fb85	[PowerPC] store the LR before stack update for big offsets. For case that LROffset + FrameSize can not be encoded to the LR store instruction, we have to store the LR before the stack update.	2022-11-22 07:25:28 +00:00
Chen Zheng	2aa8a1a3bd	[PowerPC][NFC] add test case for mflr store fix	2022-11-22 07:25:23 +00:00
Kai Nacke	2b1e895afb	[PowerPC] Add support for G_ADD and G_SUB. Extends the global isel implementation to support G_ADD and G_SUB. Reviewed By: arsenm, amyk Differential Revision: https://reviews.llvm.org/D128106	2022-11-21 23:35:17 +00:00
Kai Nacke	be4a1dfbf9	[PowerPC] Extend GlobalISel implementation to emit and/or/xor. Adds some more code to GlobalISel to enable instruction selection for and/or/xor. - Makes G_IMPLICIT_DEF, G_CONSTANT, G_AND, G_OR, G_XOR legal for 64bit register size. - Implement lowerReturn in CallLowering - Provides mapping of the operands to register banks. - Adds register info to G_COPY operands. The utility functions are all only implemented so far to support this use case. Especially the functions in PPCGenRegisterBankInfo.def are too simple for general use. Reviewed By: nemanjai, shchenz, amyk Differential Revision: https://reviews.llvm.org/D127530	2022-11-21 20:08:20 +00:00
Paul Scoropan	2234098291	[PowerPC] XCOFF exception section support on the integrated assembler path Continuation of https://reviews.llvm.org/D132146 (direct assembly path support, needs to merge first). Adds support to the integrated assembler path for emitting XCOFF exception sections. Both features need https://reviews.llvm.org/D133030 to merge first Reviewed By: shchenz, DiggerLin Differential Revision: https://reviews.llvm.org/D134195	2022-11-21 01:16:31 -05:00
Chen Zheng	f034c98af0	[PowerPC] mark dead def for ctr be clobber. TLS pseudo ADDIStlsgdHA will have such def. This dead def should also prevent PPC from generating CTR loops.	2022-11-18 06:55:42 +00:00
Qiu Chaofan	5d19fea81f	[PowerPC] Fix strict load-conversion recognition Direct-move instructions are usually more efficient than load then store for conversion. But direct moves are not needed when the source register was just loaded from some address. The pattern has already been recognized, but the source value of strict nodes are not the first (that's the chain), but the second. Reviewed By: shchenz Differential Revision: https://reviews.llvm.org/D138011	2022-11-16 10:02:10 +08:00
Qiu Chaofan	a853c42a6a	Pre-commit load/store cases for PowerPC direct-move	2022-11-15 17:35:49 +08:00
Chen Zheng	eb7d16ea25	[PowerPC] make expensive mflr be away from its user in the function prologue mflr is kind of expensive on Power version smaller than 10, so we should schedule the store for the mflr's def away from mflr. In epilogue, the expensive mtlr has no user for its def, so it doesn't matter that the load and the mtlr are back-to-back. Reviewed By: RolandF Differential Revision: https://reviews.llvm.org/D137423	2022-11-14 21:14:20 -05:00
Chen Zheng	6ceb607b30	[PowerPC][NFC] remove the rop-protect attribute in LIT cases. This flag will cause LLC warning: "'-rop-protection' is not a recognized feature for this target (ignoring feature)" Remove this unused feature first. We may also need to check why llc emits this warning as we declare '-rop-protection' not '+rop-protection'.	2022-11-08 05:51:02 +00:00
Chen Zheng	eb421c0c0e	[PowerPC][NFC] fix the LIT regressions This is to fix the wrong checking introduced in D64195. `std {{[0-9]+}}, 16(1)` is the store for the lr register. It breaks previous testing point before D64195.	2022-11-07 04:17:14 -05:00
Chen Zheng	6e557e28ec	[PowerPC][NFC] use script to generate check lines	2022-11-07 02:04:34 -05:00
Zequan Wu	a7fa5febaa	[Test] Fix CHECK typo. Differential Revision: https://reviews.llvm.org/D137287	2022-11-04 10:18:04 -07:00
Stefan Pintilie	9df924a634	[PowerPC] Add new DMR register classes to Future CPU. A new register class as well as a number of related subregisters are being added to Future CPU. These registers are Dense Math Registers (DMR) and are 1024 bits long. These registers can also be used in consecutive pairs which leads to a register that is 2048 bits. This patch also adds 7 new instructions that use these registers. More instructions will be added in future patches. Reviewed By: amyk, saghir Differential Revision: https://reviews.llvm.org/D136366	2022-11-03 08:29:55 -05:00
John Brawn	88ac25b357	[MachineCSE] Allow PRE of instructions that read physical registers Currently MachineCSE forbids PRE when the instruction reads a physical register. Relax this so that it's allowed when the value being read is the same as what would be read in the place the instruction would be hoisted to. This is being done in preparation for adding FPCR handling to the AArch64 backend, in order to prevent it from worsening the generated code, but for targets that already have a similar register it should improve things. This patch affects code generation in several tests. The new code looks better except for in Thumb2/LowOverheadLoops/memcall.ll where we perform PRE but the LowOverheadLoops transformation then undoes it. Also in AMDGPU/selectcc-opt.ll the CHECK makes things look worse, but actually the function as a whole is better (as a MOV is PRE'd). Differential Revision: https://reviews.llvm.org/D136675	2022-11-02 13:53:12 +00:00
Paul Robinson	4f0a1201a4	[lit][REQUIRES] Fix some tests with incorrect REQUIRES clauses These weren't running anywhere because of bad specifications. One test has bit-rotted and had to be XFAILed, the rest are okay. Differential Revision: https://reviews.llvm.org/D136612	2022-11-01 13:49:23 -07:00
esmeyi	d1115c2b84	[PowerPC] Optimize compare by using record form in post-RA. Summary: We currently optimize the comparison only in SSA, therefore we will miss some optimization opportunities where the input of comparison is lowered from COPY in post-RA. Ie. ExpandPostRA::LowerCopy is called after PPCInstrInfo::optimizeCompareInstr. This patch optimizes the comparison in post-RA and only the cases that compare against zero can be handled. D131374 converts the comparison and its user to a compare against zero with the appropriate predicate on the branch, which creates additional opportunities for this patch. Reviewed By: shchenz, lkail Differential Revision: https://reviews.llvm.org/D131873	2022-10-31 01:33:50 -04:00
Daniel Thornburgh	75cdab6dc2	[llvm-objdump] Add --no-print-imm-hex to tests depending on it. This prepares for an upcoming change to make --print-imm-hex the default behavior of llvm-objdump. These tests were updated in a semi-automatic fashion. See D136972 for details.	2022-10-29 15:40:26 -07:00
John Brawn	7a7b36e96b	Revert "[MachineCSE] Allow PRE of instructions that read physical registers" This reverts commit `628467e53f`. This is causing a miscompile in ffmpeg when compiled for armv7.	2022-10-28 14:39:56 +01:00
John Brawn	628467e53f	[MachineCSE] Allow PRE of instructions that read physical registers Currently MachineCSE forbids PRE when the instruction reads a physical register. Relax this so that it's allowed when the value being read is the same as what would be read in the place the instruction would be hoisted to. This is being done in preparation for adding FPCR handling to the AArch64 backend, in order to prevent it from worsening the generated code, but for targets that already have a similar register it should improve things. This patch affects code generation in several tests. The new code looks better except for in Thumb2/LowOverheadLoops/memcall.ll where we perform PRE but the LowOverheadLoops transformation then undoes it. Also in AMDGPU/selectcc-opt.ll the CHECK makes things look worse, but actually the function as a whole is better (as a MOV is PRE'd). Differential Revision: https://reviews.llvm.org/D136675	2022-10-27 14:14:57 +01:00
Amy Kwan	715301056e	[PowerPC] Fix invalid cast for vector shuffles when lowering to the xxsplti32dx instruction. When lowering vector shuffles into the xxsplti32dx instruction on Power10, we canonicalize the right operand to be a BUILD_VECTOR and as a result, get the commuted vector shuffle node. However, a vector shuffle will not always be returned as the result for a commuted vector shuffle. In such a scenario, this patch updates the original cast of a shuffle into a dyn_cast<> and checks if the shuffle is a valid vector shuffle node prior to obtaining the commuted shuffle mask. This patch also adds a new test case that demonstrates this scenario (primarily seen on 32-bit), and was originally a crash prior to this fix. Differential Revision: https://reviews.llvm.org/D135024	2022-10-24 09:56:54 -05:00
Craig Topper	db25f51e37	Revert "[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y))" This reverts commit `e8b3ffa532`. The AMDGPU/mad_64_32.ll seems to fail on some of the build bots but passes locally. I'm really confused.	2022-10-22 22:50:43 -07:00
Craig Topper	e8b3ffa532	[DAGCombiner] Fold (mul (sra X, BW-1), Y) -> (neg (and (sra X, BW-1), Y)) (sra X, BW-1) is either 0 or -1. So the multiply is a conditional negate of Y. This pattern shows up when type legalizing wide multiplies involving a sign extended value. Fixes PR57549. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D133399	2022-10-22 21:51:45 -07:00
Chen Zheng	df9d60af1f	[PowerPC] handle more than two predecessors loop header in ctrloop pass After ISEL, the "valid" loop header which has two predecessors (one is preheader and the other one is latch) may be transformed to have more than two predecessors by some optimizations, like tail duplicator, if the old header's successor(will be changed to new header) is a sub loop. The predecessors of the new loop header are preheader, loop latch and the loop latch(es) of the sub loop(old header's successor). Before the patch, ctrloop pass assumes two predecessors for candidate loop header. This patch fixes this case. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D135846	2022-10-19 01:11:58 +00:00
Stefan Pintilie	b107ff4856	[NFC][PowerPC] Add a test to check power 10 features. This patch only adds a single test for Power 10 features.	2022-10-18 09:05:24 -05:00
Peter Rong	c2e7c9cb33	[CodeGen] Using ZExt for extractelement indices. In https://github.com/llvm/llvm-project/issues/57452, we found that IRTranslator is translating `i1 true` into `i32 -1`. This is because IRTranslator uses SExt for indices. In this fix, we change the expected behavior of extractelement's index, moving from SExt to ZExt. This change includes both documentation, SelectionDAG and IRTranslator. We also included a test for AMDGPU, updated tests for AArch64, Mips, PowerPC, RISCV, VE, WebAssembly and X86 This patch fixes issue #57452. Differential Revision: https://reviews.llvm.org/D132978	2022-10-15 15:45:35 -07:00
Amy Kwan	22e4203df8	[PowerPC][NFC] Pre-commit case for lowering vector shuffles to xxsplti32dx (64 bit) This patch adds a test case for lowering vector shuffles to xxsplti32dx in preparation for D135024. The test case added in this patch only adds the 64-bit CHECKs, as the 32-bit CHECKs cannot be generated (in which D135024 aims to fix).	2022-10-14 10:15:34 -05:00
Nemanja Ivanovic	0d253bbd33	[PowerPC] Change CRNOT to a code gen single operand instruction Inputs to crnor can come from operands with chains so if it is being used simply to negate such an operand, the repeated input cannot be CSE'd. This patch just adds a code-gen only instruction for this that takes a single input and duplicates it in the encoding of the underlying crnor. Differential revision: https://reviews.llvm.org/D133577	2022-10-13 20:09:44 -05:00
Nemanja Ivanovic	a77a70fa3c	[PowerPC] Stash GPR to VSR if emergency spill slot is not reachable When removing frame indices on PowerPC, we need to scavenge a GPR to materialize a large constant if the stack offset for the spill/reload cannot be reached by a D-Form instruction. However, in a perfect storm of conditions, we may not have GPR's available to scavenge, thereby requiring an emergency spill. If such an emergency spill also needs to be spilled to a location with a large offset, it would itself require register scavenging thereby creating an infinite loop. This patch detects when the scavenger cannot scavenge a register and the spill/reload is to a location with a large offset. It then stashes a GPR into a VSR so that it can use the GPR to materialize the constant (rather than scavenging a GPR). Fixes: https://github.com/llvm/llvm-project/issues/52894 Differential revision: https://reviews.llvm.org/D124841	2022-10-13 09:06:37 -05:00
Peter Rong	c7dd7f20b0	[PowerPC] Pre-commit unit test change for D132978	2022-10-12 11:26:57 -07:00
Chen Zheng	5f4927da77	[PowerPC][NFC] refactor some test cases.	2022-10-12 12:19:52 +00:00


3470 Commits