llvm-project

Commit Graph

Author	SHA1	Message	Date
Jessica Paquette	8aedb435db	[GlobalISel] Combine abs(undef) -> 0 SDAG does this, GISel doesn't. See https://gcc.godbolt.org/z/sqjMx3Tfv More context: https://github.com/llvm/llvm-project/issues/57256 Differential Revision: https://reviews.llvm.org/D135021	2022-10-01 15:16:32 -07:00
Jessica Paquette	424c69190c	Update missing test after `24553df57d`	2022-10-01 14:00:01 -07:00
Jessica Paquette	24553df57d	[GlobalISel] Combine `undef / X -> 0` and `undef % X -> 0` This fixes the `urem_undef_lhs` case in the following: https://gcc.godbolt.org/z/Wo9x7o679 Also see https://github.com/llvm/llvm-project/issues/57256 for more related bugs. This is equivalent to the undef bits in `simplifyDivRem` in the DAGCombiner. Differential Revision: https://reviews.llvm.org/D135020	2022-10-01 13:48:55 -07:00
zhongyunde	4a549be9c3	[AArch64] Lower multiplication by a negative constant to shl+sub+shl Change the costmodel to lower a = b * C where C = -(2^n - 2^m) to lsl w8, w0, m sub w0, w8, w0, lsl n Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134934	2022-10-01 21:27:42 +08:00
Nilanjana Basu	85714ff0bd	[AArch64] Unit test for trunc lowering from i64 to i8 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134840	2022-09-30 13:03:12 -07:00
Peter Collingbourne	0caa9d4b1e	AArch64: Don't use RETA[AB] when ShadowCallStack is enabled. When returning from a function with both SCS and PAC-RET enabled, we need to authenticate the return address from the stack and then load from the SCS, but this was happening in the reverse order when RETA[AB] were being used. Fix it by disabling the use of RETA[AB] when SCS is enabled. Fixes pr58072. Differential Revision: https://reviews.llvm.org/D134931	2022-09-30 12:33:23 -07:00
Zain Jaffal	137faa4a84	[AArch64] Add additional tests for SMULL instruction selection Add tests where the operands are switched and where the top bit of the operand is set to 1 Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134867	2022-09-30 12:32:26 +01:00
Zain Jaffal	661403b85c	[AArch64] Add support for 128-bit non temporal loads. Adding to the work done in `D131773` here we add support to 128-bit loads. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D132559	2022-09-30 11:04:04 +01:00
Amara Emerson	7653586d88	[AArch64][GlobalISel] Implement another combine for shufflevector->AArch64 G_EXT. This is a port of an existing optimization in AArch64 ISelLowering, handling a case when the same input vector can be used for both ext inputs. Differential Revision: https://reviews.llvm.org/D134891	2022-09-29 22:53:24 +01:00
zhongyunde	62a51c357c	[AArch64] Lower multiplication by a constant int to shl+sub+shl Decompose the const 14 can be separated from D132322 Change the costmodel to lower a = b * C where C = 2^n - 2^m to lsl w8, w0, n sub w0, w8, w0, lsl m Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134706	2022-09-30 01:31:06 +08:00
Amara Emerson	a1aa0390cb	[AArch64][GlobalISel] Update shuffle->ext test before patch.	2022-09-29 17:20:34 +01:00
Jessica Paquette	1eb49bbab6	[GlobalISel][CallLowering] Use hasRetAttr for return flags on CallBases Given something like this: ``` declare signext i16 @signext_callee() define i32 @caller() { %res = call i16 @signext_callee() ... } ``` CallLowering would miss that signext_callee's return value is sign extended, because it isn't on the call. Use hasRetAttr on the CallBase to allow us to catch this. (This now inserts G_ASSERT_SEXT/G_ASSERT_ZEXT like in the original review.) Differential Revision: https://reviews.llvm.org/D86228	2022-09-28 19:38:24 -07:00
Jessica Paquette	95dabac7a5	[AArch64][GlobalISel] Make G_PTRTOINT only legal for s64 + p0 A few issues: 1. There was no legalizer test for G_PTRTOINT 2. Same clamping issue as in many other opcodes 3. AArch64 pointers can only be 64b, so in reality we always have to trunc or extend with any size other than p0 anyway. This seems to actually produce more correct selection for narrow types as well. Differential Revision: https://reviews.llvm.org/D107588	2022-09-28 16:20:24 -07:00
Jessica Paquette	a7aaafde2e	[AArch64][GlobalISel] Implement custom legalization for s32/s64 G_FCOPYSIGN This is intended to be equivalent to the s32 + s64 cases in AArch64TargetLowering::LowerFCOPYSIGN. Widen everything and then use G_BIT + a mask to handle the actual copysign operation. Then, narrow back down to s32/s64. I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.) If there's a better way to do this then I'm happy to change it. We also have a couple codegen deficiencies with how we emit vector constants right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff) Differential Revision: https://reviews.llvm.org/D108725	2022-09-28 16:03:22 -07:00
Florian Mayer	0401dc2913	[MTE] [HWASan] unify isInterestingAlloca Reviewed By: eugenis Differential Revision: https://reviews.llvm.org/D134779	2022-09-28 15:52:34 -07:00
Jessica Paquette	4957ee6529	[AArch64][GlobalISel] Add a target-specific G_BIT opcode. This is necessary for custom-legalizing G_FCOPYSIGN. This is equivalent to the BIT instruction (bitwise insert if true). Add selection testcases for imported patterns. Differential Revision: https://reviews.llvm.org/D108714	2022-09-28 15:48:35 -07:00
Matt Devereau	0a4771a7e8	[AArch64][SVE] Expand gather index to 32 bits instead of 64 bits For gathers which load in 8 and 16 bit data then use that data as an index, the index can be extended to 32 bits instead of 64 bits Differential Revision: https://reviews.llvm.org/D130692	2022-09-28 14:42:12 +00:00
Florian Hahn	2d3c260362	[AArch64] break non-temporal loads over 256 into 256-loads and a smaller load Currently over 256 non-temporal loads are broken inefficently. For example, `v17i32` gets broken into 2 128-bit loads. It is better if we can use 256-bit loads instead. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D133421	2022-09-28 15:20:26 +01:00
Cullen Rhodes	3918ef07c4	[AArch64][SVE] Remove redundant ptest after match/nmatch These instructions are flag setting so the ptest is redundant, the TableGen class wasn't setting the element size for the predicate causing the checks in AArch64InstrInfo::optimizePTestInstr to fail.	2022-09-28 08:23:23 +00:00
Cullen Rhodes	e36ffdf42e	[AArch64][SVE] Precommit tests for redundant ptest after match/nmatch	2022-09-28 08:23:23 +00:00
Zain Jaffal	4fbbfd2530	[AArch64] Add tests for selecting SMULL instruction where the operand is zero extended and the top bit value is 0 This covers the case where we can convert a zext instruction to a sext and then select smull Differential Revision: https://reviews.llvm.org/D134643	2022-09-27 19:43:43 +01:00
David Green	401481daac	[AArch64] Remove incorrect zero element insert-bitcast patterns These two patterns are not working as intended, as shown in D134022. They need to insert the value into the new register, not override it.	2022-09-27 17:08:17 +01:00
David Sherwood	fbb119412f	[AArch64] Add Neoverse V2 CPU support Adds support for the Neoverse V2 CPU to the AArch64 backend. Differential Revision: https://reviews.llvm.org/D134352	2022-09-27 07:56:08 +00:00
David Green	bebc96956b	[AArch64] Enable FeatureFuseAdrpAdd for all Arm cpus The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic, allowing the linker to relax adrp;add pairs where possible. D132075 extended that to neoverse-n1, this patch extends it to all other cortex and neoverse cpus for the same reasons. Differential Revision: https://reviews.llvm.org/D134521	2022-09-26 09:55:10 +01:00
Amaury Séchet	dc947ba3f5	Autogenerate a couple of test in the AMDGPU backend. NFC	2022-09-25 14:12:49 +00:00
Caroline Concatto	5431bf27bd	[AArch64]Remove svget/svset/svcreate from llvm This patch removes the aarch64 instrinsic svget/svset/svcreate from llvm. It also implements the InstCombine for vector.extract that used to be in svget. Depends on: D131547 Differential Revision: https://reviews.llvm.org/D131548	2022-09-23 10:48:43 +01:00
Philip Reames	b9c4733079	[DAG] Move one-use add of splat to base of scatter/gather This extends the uniform base transform used with scatter/gather to support one-use vector adds-of-splats with a non-zero base. This has the effect of essentially reassociating an add from vector to scalar domain. The motivation is to improve the lowering of scatter/gather operations fed by complex geps. Differential Revision: https://reviews.llvm.org/D134472	2022-09-22 18:45:12 -07:00
Sanjay Patel	ef7d61d67c	[AArch64] add tests for vector load combining; NFC More coverage for D133584	2022-09-22 11:43:37 -04:00
Florian Hahn	ac434afed8	[AArch64] Try to fold shuffle (tbl2, tbl2) to tbl4. shuffle (tbl2, tbl2) can be folded into a single tbl4 if the mask for the selected elements is constant. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133491	2022-09-21 19:15:56 +01:00
Sanjay Patel	ddd27a3d39	[AArch64] add tests for fadd -> fma combines; NFC The transform to create a final fma was added with: D132837 / `c98a46fee6` These tests are intended to show the minimal fast-math-flags necessary to enable the fold: currently only the final fadd needs to have "reassoc".	2022-09-21 09:00:11 -04:00
Alex Richardson	b84be9f2f1	Add all constant physical registers to callee preserved masks This allows MachineCopyPropagation to eliminate copies of constant registers such as zero registers. They were previously not being eliminated as the check for MO.clobbersPhysReg(AvailSrc) would return true for constant registers such as MIPS $zero. To avoid having to manually add the zero registers to all CalleeSavedRegs instantiations in tablegen, I instead added a new isConstant bit to the Register and set this for MIPS, RISC-V, and AArch64 zero registers. RegisterInfoEmitter.cpp looks at this flag and adds all constant registers to the preserved register mask. This may also benefit other passes but so far I have only seen differences in MachineCopyPropagation. In the future it might make sense to generate `isConstantPhysReg()` from this information. Original source: `8588d8b814` Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D131958	2022-09-21 12:50:12 +00:00
Alex Richardson	963287acbf	Add a baseline test for D131958 This test shows that the save of MIPS $zero to a callee-saved register is not elided by the machine-cp pass. Differential Revision: https://reviews.llvm.org/D131957	2022-09-21 12:50:12 +00:00
David Green	4f78e022ee	[AArch64] Lower scalar sqxtn intrinsics to use fp registers The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return integer types, but operate on fp registers. This can create some inefficiencies in their lowering, where the registers are converted to fp a little too late. This patch adds lowering for the intrinsics, creating bitcasts to/from fp types to allow nicer folding later when the instructions are selected, especially around insert/extracts. Differential Revision: https://reviews.llvm.org/D134024	2022-09-21 10:46:43 +01:00
David Green	9a20596f48	[AArch64] Insert/Extract of bitcast patterns This adds some quick tablegen patterns for vector_insert(bitcast(..)) and bitcast(vector_extract(..)), allowing us to avoid a round-trip through GPRs. Differential Revision: https://reviews.llvm.org/D134022	2022-09-21 09:54:17 +01:00
David Green	cb375e8c1f	[AArch64] Enable LSLFast for modern OoO cpus This patch enables the LSLFast feature for Cortex-A76, Cortex-A77, Cortex-A78, Cortex-A78C, Cortex-A710, Cortex-X1, Cortex-X2, Neoverse N1, Neoverse N2, Neoverse V1 and the Neoverse 512TB pseudo-cpu, in-line with the software optimization guides for those CPUs. Differntial revision: https://reviews.llvm.org/D134273	2022-09-20 17:09:14 +01:00
Amara Emerson	78833a43e8	[GlobalISel][Legalizer] Fix lowerSelect() not sign-extending the mask value. I'm not sure why the SEXT_INREG was gated on a bitwidth check of the mask vs element size. This fixes a miscompile in chromium's skia library. Differential Revision: https://reviews.llvm.org/D134236	2022-09-20 16:40:34 +01:00
Caroline Concatto	d32b8fdbdb	[LLVM][AArch64] Replace aarch64.sve.ld by aarch64.sve.ldN.sret This patch removes the intrinsic aarch64.sve.ldN from tablegen in favour of using arch64.sve.ldN.sret. Depends on: D133023 Differential Revision: https://reviews.llvm.org/D133025	2022-09-20 13:15:07 +01:00
Caroline Concatto	2aad9093ac	[AArch64][NFC] Correctly rename mangling name for ldN.sret Remove from the function name the predicate type and pointer type, because: The predicate type in the name(nxvNi1) can be deduced from the overloaded element count(nxvNEltTy). The pointer type(p0EltTy) can be deduced from the overloaded element type. Differential Revision: https://reviews.llvm.org/D133023	2022-09-20 09:56:44 +01:00
David Green	908b3b6ccb	[AArch64] Use fast-math-flags in isAssociativeAndCommutative Previously only using the UnsafeFPMath option, this now looks for the fast moth flags on the instructions, using the same flag flags as other backends.	2022-09-19 11:34:00 +01:00
Sander de Smalen	bed214cf0f	[AArch64][SME] Add intrinsics for enabling/disabling ZA. This adds the intrinsics: * void @llvm.aarch64.sme.za.enable() -> smstart za * void @llvm.aarch64.sme.za.disable() -> smstop za Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D133894	2022-09-17 16:41:42 +00:00
Sander de Smalen	5fae000f36	[AArch64][SME] Disable tail-call optimization when streaming mode change or lazy-save may be required. When a streaming mode change is (or may be) required for a call, it will need to restore the original mode after the call, which prevents the use of tail-call optimization. The same holds true for a call that requires the lazy-save mechanism to be set up before the call, and possibly restored after. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131579	2022-09-17 16:15:07 +00:00
Vladislav Dzhidzhoev	6cf11f4462	[GlobalISel][DebugInfo] Salvage trivially dead instructions Use salvageDebugInfo for instructions erased as trivially dead in GlobalISel. It would be helpful to implement support of G_PTR_ADD and G_FRAME_INDEX in salvageDebugInfo in future in order to preserve more variable location. Reviewed by: arsenm Differential Revision: https://reviews.llvm.org/D133986	2022-09-17 03:54:55 +03:00
Sander de Smalen	bd4935c175	[AArch64][SME] Implement ABI for calls from streaming-compatible functions. When a function is streaming-compatible and calls a function with a normal or streaming interface, it may need to enable/disable stremaing mode before the call, and needs to restore PSTATE.SM after the call. This patch implements this with a Pseudo node that gets expanded to a conditional branch and smstart/smstop node. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131578	2022-09-16 14:48:37 +00:00
Liqiang Tao	2e37557fde	StackProtector: ensure stack checks are inserted before the tail call The IR stack protector pass should insert stack checks before the tail calls not only the musttail calls. So that the attributes `ssqreq` and `tail call`, which are emited by llvm-opt, could be both enabled by llvm-llc. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D133860	2022-09-16 22:24:46 +08:00
Sander de Smalen	b00c36c295	[AArch64][SME] Implement ABI for calls to/from streaming functions. This patch implements the ABI for calls from: Normal -> Streaming Normal -> Streaming-compatible Streaming -> Normal Streaming -> Streaming-compatible Streaming -> Streaming The compiler inserts SMSTART/SMSTOP instructions before and after the call, depending on the required transition. More details about the SME attributes and design can be found in D131562. Reviewed By: aemerson Differential Revision: https://reviews.llvm.org/D131576	2022-09-16 14:07:47 +00:00
Florian Hahn	6b86b481e3	[AArch64] Use tbl for truncating vector FPtoUI conversions. On AArch64, doing the vector truncate separately after the fptoui conversion can be lowered more efficiently using tbl.4, building on D133495. https://alive2.llvm.org/ce/z/T538CC Depends on D133495 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133496	2022-09-16 14:57:43 +01:00
Florian Hahn	8491d01cc3	[AArch64] Lower vector trunc using tbl. Similar to using tbl to lower vector ZExts, tbl4 can be used to lower vector truncates. The initial version support i32->i8 conversions. Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133495	2022-09-16 12:42:49 +01:00
Nikita Popov	b4309800e9	[CodeGen] Don't zero callee-save registers with zero-call-used-regs (PR57692) Callee save registers must be preserved, so -fzero-call-used-regs should not be zeroing them. The previous implementation only did not zero callee save registers that were saved&restored inside the function, but we need preserve all of them. Fixes https://github.com/llvm/llvm-project/issues/57692. Differential Revision: https://reviews.llvm.org/D133946	2022-09-16 11:52:29 +02:00
Florian Hahn	9f2c39418b	[AArch64] Add tests with 2 x tbl2 for v8i8 and nonconst masks. Extra tests for D133491.	2022-09-16 10:25:27 +01:00
Florian Hahn	5871f18827	[AArch64] Lower extending uitofp using tbl. On AArch64, doing the zero-extend separately first can be lowered more efficiently using tbl, building on D120571. https://alive2.llvm.org/ce/z/8Je595 Depends on D120571 Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D133494	2022-09-16 10:20:25 +01:00

1 2 3 4 5 ...

5955 Commits