Commit Graph

5459 Commits

Author SHA1 Message Date
Victor Campos 79f9c79aaf [AArch64][MC] Merge FeaturePMU into FeaturePerfMon
FeaturePMU was created in AArch64 to accommodate one missing system
register, PMMIR_EL1, in commit ffcd7698ae.

However, the Performance Monitors extension already had a target
feature, which is called FeaturePerfMon. Therefore, FeaturePMU is
redundant.

This patch removes FeaturePMU and merges its contents into
FeaturePerfMon.

Reviewed By: dnsampaio

Differential Revision: https://reviews.llvm.org/D109246
2021-09-06 14:56:49 +01:00
David Truby b297531ece [AArch64][sve] Prevent incorrect function call on fixed width vector
The isEssentiallyExtractHighSubvector function currently calls
getVectorNumElements on a type that in specific cases might be scalable.
Since this function only has correct behaviour at the moment on scalable
types anyway, the function can just return false when given a fixed type.

Differential Revision: https://reviews.llvm.org/D109163
2021-09-06 14:25:03 +01:00
Fangrui Song 0e03450ae4 [AArch64] Remove an unneeded !NeedsWinCFI check. NFC 2021-09-05 21:02:56 -07:00
guopeilin 5f48c144c5 [AArch64][GlobalISel] Use ZExtValue for zext(xor) when invert tb(n)z
Currently, we use SExtValue to decide whether to invert tbz or tbnz.
However, for the case zext (xor x, c), we should use ZExt rather than
SExt; otherwise we will generate exactly the opposite branches.
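
To illustrate the difference, here is a standalone sketch (not the backend code): sign- and zero-extending the same narrow constant disagree on the upper bits, so a bit test derived from one view can be the exact inverse of the other.

```
#include <cstdint>
#include <cstdio>

int main() {
  // An 8-bit xor constant with the top bit set.
  uint8_t c = 0x80;

  int64_t sext = (int64_t)(int8_t)c;  // 0xFFFFFFFFFFFFFF80: upper bits set
  uint64_t zext = (uint64_t)c;        // 0x0000000000000080: upper bits clear

  // A test of bit 32 disagrees between the two views, which is the kind of
  // discrepancy that flips a tbz into a tbnz (or vice versa).
  printf("sext bit 32 = %d\n", (int)(((uint64_t)sext >> 32) & 1));  // 1
  printf("zext bit 32 = %d\n", (int)((zext >> 32) & 1));            // 0
  return 0;
}
```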

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D108755
2021-09-06 11:12:07 +08:00
Kevin Athey c7f50a445e Revert "[AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)"
This reverts commit 095bea23d0.

Broke buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/11411
2021-09-03 18:08:58 -07:00
Ben Shi 095bea23d0 [AArch64] Implement target hook function to decide folding (mul (add x, c1), c2)
Prevent the folding if it leads to worse code.
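
Presumably the fold in question is the algebraic rewrite (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2). A standalone C++ sketch, with constants chosen purely for illustration, of why it can be a loss:

```
#include <cstdint>

// Both functions compute (x + c1) * c2. Distributing the multiply turns the
// add constant into c1*c2; once that product no longer fits a cheap
// immediate, keeping the original add+mul form avoids an extra constant
// materialization, which is the kind of case a target hook can reject.
uint64_t small_consts(uint64_t x) { return (x + 3) * 5; }        // c1*c2 = 15
uint64_t large_consts(uint64_t x) { return (x + 1000) * 12345; } // c1*c2 = 12345000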

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108871
2021-09-04 07:24:23 +08:00
Florian Mayer abf8ed8a82 [hwasan] Support more complicated lifetimes.
This is important because, with exceptions enabled, non-POD allocas often
have two lifetime ends: the exception handler and the normal one.
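
A minimal standalone example (names are illustrative) of a non-POD local with both a normal and an exceptional lifetime end:

```
#include <cstdio>
#include <stdexcept>
#include <string>

// The destructor of `s` must run both when the function returns normally and
// when the exception unwinds past it, so its stack slot gets two lifetime
// ends for hwasan to track.
int process(bool fail) {
  std::string s = "tagged local";
  if (fail)
    throw std::runtime_error("bail");  // lifetime of s ends on the unwind path
  return (int)s.size();                // ...and on the normal return path
}

int main() {
  try {
    printf("%d\n", process(false));
    process(true);
  } catch (const std::exception &e) {
    printf("caught: %s\n", e.what());
  }
  return 0;
}
```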

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D108365
2021-09-03 10:29:50 +01:00
Cullen Rhodes dc5dd77ac7 [AArch64][SME] Support NEON vector to GPR integer moves in streaming mode
A small subset of the NEON instruction set is legal in streaming mode.
This patch adds support for the following vector to integer move
instructions:

  0x00 1110 0000 0001 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.B[0]
  0x00 1110 0000 0010 0010 11xx xxxx xxxx # SMOV W|Xd,Vn.H[0]
  0100 1110 0000 0100 0010 11xx xxxx xxxx # SMOV Xd,Vn.S[0]
  0000 1110 0000 0001 0011 11xx xxxx xxxx # UMOV Wd,Vn.B[0]
  0000 1110 0000 0010 0011 11xx xxxx xxxx # UMOV Wd,Vn.H[0]
  0000 1110 0000 0100 0011 11xx xxxx xxxx # UMOV Wd,Vn.S[0]
  0100 1110 0000 1000 0011 11xx xxxx xxxx # UMOV Xd,Vn.D[0]

Only the zero-index variants are legal; all other indexes are illegal.
To support this, new instructions are defined specifically for index
zero, which is hardcoded, along with an implicit 'VectorIndex0' operand.
Since the index operand is implicit and takes no bits in the encoding,
custom decoding is required to add the operand.

I'm not sure if this is the best approach, but a predicate constraint on
a subset of an operand is unusual. I'd be interested to hear some
alternatives.

The instructions are predicated on 'HasNEONorStreamingSVE', i.e. they're
enabled by either +neon or +streaming-sve. This follows on from the work
in D106272 to support the subset of SVE(2) instructions that are legal
in streaming mode.

Depends on D107902.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D107903
2021-09-03 07:59:17 +00:00
Cullen Rhodes 1dcd900d1d [AArch64][ISel] NFC: DAG.getMachineFunction() -> MF
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109135
2021-09-03 07:59:17 +00:00
Amara Emerson 6d9505b8e0 [AArch64][GlobalISel] Support for folding G_ROTR as shifted operands.
This allows selection like: eor w0, w1, w2, ror #8

Saves 500 bytes on ClamAV -Os, which is 0.1%.
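
A standalone C++ sketch of source code that exposes the pattern; the rotate-right idiom below is the form compilers recognise as a rotate:

```
#include <cstdint>

// XOR with a rotated operand; after this change the rotate can be folded
// into the shifted-operand form, e.g. "eor w0, w1, w2, ror #8", instead of
// a separate ror followed by an eor.
uint32_t xor_ror8(uint32_t a, uint32_t b) {
  uint32_t rotated = (b >> 8) | (b << 24);  // rotate right by 8
  return a ^ rotated;
}
```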

Differential Revision: https://reviews.llvm.org/D109206
2021-09-02 21:37:24 -07:00
Bradley Smith 14e1a4a6ee [AArch64][SVE] Workaround incorrect types when lowering fixed length gather/scatter
When lowering a fixed-length gather/scatter, the index type is assumed to
be the same as the memory type; this is incorrect in cases where the
extension of the index has been folded into the addressing mode.

For now, add a temporary workaround that fixes the codegen faults by
preventing the removal of this extension. At a later date the
lowering for SVE gather/scatters will be redesigned to improve the way
addressing modes are handled.

As a short term side effect of this change, the addressing modes
generated for fixed length gather/scatters will not be optimal.

Differential Revision: https://reviews.llvm.org/D109145
2021-09-02 15:07:24 +00:00
Roman Lebedev 3f1f08f0ed
Revert @llvm.isnan intrinsic patchset.
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)

TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.

While the end result of discussion may lead back to the current design,
it may also not lead to the current design.

Therefore I take it upon myself to revert the tree back to the last
known good state.

This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
2021-09-02 13:53:56 +03:00
David Sherwood d581d94385 [SVE] Fix the FP arithmetic instruction costs for SVE
Several FP instructions (fadd, fsub, etc.) were incorrectly assigned
a higher cost for SVE because they have custom lowering; however, we
know they are legal. This patch explicitly assigns a cost of 2 to
these opcodes.

Tests added here:

  Analysis/CostModel/AArch64/arith-fp-sve.ll

Differential Revision: https://reviews.llvm.org/D108993
2021-09-02 09:55:13 +01:00
Jon Roelofs 9237eda304 Revert "[AArch64][GlobalISel] Legalize bswap <2 x i16>"
This reverts commit 5cd63e9ec2.

https://bugs.llvm.org/show_bug.cgi?id=51707

The sequence feeding in/out of the rev32/ushr isn't quite right:

 _swap:
         ldr     h0, [x0]
         ldr     h1, [x0, #2]
-        mov     v0.h[1], v1.h[0]
+        mov     v0.s[1], v1.s[0]
         rev32   v0.8b, v0.8b
         ushr    v0.2s, v0.2s, #16
-        mov     h1, v0.h[1]
+        mov     s1, v0.s[1]
         str     h0, [x0]
         str     h1, [x0, #2]
         ret
2021-09-01 16:49:20 -07:00
Amara Emerson a86bbe1e31 [AArch64][GlobalISel] Handle any-extending FPR loads in manual selection code.
When we have an any-extending FPR bank load, none of the tablegen patterns
match and we fall back to the C++ selector. Like with the truncating stores
that were fixed recently, the C++ wasn't able to handle it and ended up
generating invalid copies between different size regclasses.

This change adds handling for this case, splitting the load into a regular
load and a SUBREG_TO_REG to extend it into the original wide destination reg.
2021-09-01 10:19:22 -07:00
Nikita Popov c1b7540645 [TTI] Sink IVDescriptors.h include (NFC)
Forward declare RecurrenceDescriptor and include IVDescriptors.h
only in the implementation code that actually needs it.
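
A generic sketch of the technique (the struct and method names below are illustrative, not the actual TTI interface):

```
// In the header: a forward declaration suffices when the type only appears
// by reference or pointer in the interface...
class RecurrenceDescriptor;

struct TargetCostInfo {
  bool isLegalReduction(const RecurrenceDescriptor &RD) const;
};

// ...so the full definition (and its transitive includes) is pulled in only
// by the implementation files that actually need it:
// #include "llvm/Analysis/IVDescriptors.h"
```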
2021-08-30 22:41:58 +02:00
Owen Anderson db9de22f2b Teach the AArch64 backend patterns to generate the EOR3 instruction.
Adds patterns to match the EOR3 instruction.
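
A standalone source-level example that exposes the three-way XOR; the exact target feature spelling (e.g. +sha3) is an assumption here:

```
#include <cstdint>

// Three-way XOR over arrays. When vectorized for an AArch64 target with the
// SHA3 extension, each a^b^c can be matched to a single EOR3 instruction
// instead of two EORs.
void xor3(uint64_t *dst, const uint64_t *a, const uint64_t *b,
          const uint64_t *c, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = a[i] ^ b[i] ^ c[i];
}
```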

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108793
2021-08-30 20:01:08 +00:00
Nikita Popov 0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger than 32 bits. In particular, we
can avoid the bug-prone value normalization hack in the AArch64
target.
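
A tiny standalone illustration of the truncation hazard being removed:

```
#include <cstdint>
#include <cstdio>

int main() {
  // MachineOperand::getImm() returns an int64_t; funnelling it through a
  // 32-bit field silently mangles immediates that need more than 32 bits.
  int64_t imm = INT64_C(0x100000000);  // 2^32
  int32_t truncated = (int32_t)imm;    // becomes 0
  printf("full: %lld, truncated: %d\n", (long long)imm, truncated);
  return 0;
}
```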

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Jun Ma 15b2a8e7fa [AArch64][SVE] Optimize ptrue predicate pattern with known sve register width.
For vectors that are exactly equal to getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D108706
2021-08-27 20:03:48 +08:00
Jun Ma 8c47103491 [AArch64][SVE] Add API for conversion between SVE predicate pattern and element number. NFC
This patch solely moves the conversion between SVE predicate patterns
and element numbers into two small functions. It's a pre-commit patch for
optimizing ptrue with a known SVE register width.
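
A standalone sketch of what such a helper looks like (the enumerator and function names are illustrative, not the LLVM ones):

```
// SVE ptrue patterns name a handful of exact element counts; any other count
// has no direct pattern and needs a fallback.
enum class PredPattern {
  VL1, VL2, VL3, VL4, VL5, VL6, VL7, VL8,
  VL16, VL32, VL64, VL128, VL256, None
};

PredPattern patternForElementCount(unsigned NumElts) {
  switch (NumElts) {
  case 1: return PredPattern::VL1;
  case 2: return PredPattern::VL2;
  case 3: return PredPattern::VL3;
  case 4: return PredPattern::VL4;
  case 5: return PredPattern::VL5;
  case 6: return PredPattern::VL6;
  case 7: return PredPattern::VL7;
  case 8: return PredPattern::VL8;
  case 16: return PredPattern::VL16;
  case 32: return PredPattern::VL32;
  case 64: return PredPattern::VL64;
  case 128: return PredPattern::VL128;
  case 256: return PredPattern::VL256;
  default: return PredPattern::None;
  }
}
```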

Differential Revision: https://reviews.llvm.org/D108705
2021-08-27 20:03:48 +08:00
Jun Ma 3f919dfe0d [AArch64][SVE] Use getPTrue uniformly. NFC. 2021-08-27 20:03:48 +08:00
Jessica Paquette 2363a20001 [AArch64][GlobalISel] Optimize G_BUILD_VECTOR of undef + 1 elt -> SUBREG_TO_REG
This pattern

```
%elt = ... something ...
%undef = G_IMPLICIT_DEF
%vec = G_BUILD_VECTOR %elt, %undef, %undef, ... %undef
```

Can be selected to a SUBREG_TO_REG, assuming `%elt` and `%vec` have the same
register bank. We don't care about any of the bits in `%vec` aside from those
in `%elt`, which just happens to be the 0th element.

This is preferable to emitting `mov` instructions for every index.

This gives minor code size improvements on the test suite at -Os.

Differential Revision: https://reviews.llvm.org/D108773
2021-08-26 11:45:11 -07:00
Jacob Bramley 05f3219b38 [AArch64] Lower fpto*i.sat intrinsics for NEON.
Following on from D102353, extend the fpto*i.sat intrinsics to use NEON
fcvt* instructions.
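
A scalar C++ model of the saturating semantics the intrinsic provides (out-of-range values clamp to the integer limits, NaN becomes 0), which is what makes a direct fcvt* lowering attractive:

```
#include <cmath>
#include <cstdint>
#include <cstdio>

// Reference model of llvm.fptosi.sat.i32.f32 semantics.
int32_t fptosi_sat_i32(float x) {
  if (std::isnan(x)) return 0;
  if (x <= (float)INT32_MIN) return INT32_MIN;
  if (x >= (float)INT32_MAX) return INT32_MAX;  // INT32_MAX rounds up to 2^31 as a float
  return (int32_t)x;
}

int main() {
  printf("%d %d %d\n", fptosi_sat_i32(1e20f), fptosi_sat_i32(-1e20f),
         fptosi_sat_i32(nanf("")));  // 2147483647 -2147483648 0
  return 0;
}
```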

Differential Revision: https://reviews.llvm.org/D108460
2021-08-26 15:37:00 +01:00
Matthew Devereau 9b830c798e [AArch64][SVE] Teach cost model masked gathers/scatters are cheap
Tell the cost model to use the scalable calculation for non-NEON fixed
vectors. This results in a cheaper cost for fixed-length SVE masked
gathers/scatters, allowing the vectorizer to emit them more frequently.
2021-08-26 11:17:47 +01:00
David Green 6ffc6951a3 [AArch64] Remove unpredictable from narrowing instructions.
Like other similar instructions the xtn2 family do not have side
effects, and explicitly marking them as such can help improve scheduling
freedom.
2021-08-26 09:43:44 +01:00
Nicholas Guy 36fcf47fc8 [AArch64] Generate SMOV in place of sext(fmov(...))
A single smov instruction can move from a vector register and perform the
sign-extend as part of the same move, rather than performing each step
with a separate instruction.
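
A standalone NEON intrinsics example (assuming an AArch64 toolchain with arm_neon.h) where the lane extract and the sign-extend can become one smov:

```
#include <arm_neon.h>
#include <cstdint>

// Extract a 16-bit lane and sign-extend it to 32 bits; a single
// "smov w0, v0.h[0]" can perform both the move and the extension.
int32_t lane0_sext(int16x8_t v) {
  return (int32_t)vgetq_lane_s16(v, 0);
}
```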

Differential Revision: https://reviews.llvm.org/D108633
2021-08-25 15:23:22 +01:00
Amara Emerson 2ed8053d46 Revert "[AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores."
This reverts commit 67bf3ac744.

The reason is that this change is now superseded by 04fb9b729a which fixes the
underlying problem in the selector. Now it's fine to generate truncating FP stores
since the selector code will just generate subreg copies to handle them.
2021-08-24 16:26:56 -07:00
Amara Emerson 04fb9b729a [AArch64][GlobalISel] Fix incorrect handling of fp truncating stores.
When the tablegen patterns fail to select a truncating scalar FPR store,
our manual selection code also failed to handle it silently, trying to
generate an invalid copy. Fix this by adding support in the manual code
to generate a proper subreg copy before selecting a non-truncating store.
2021-08-24 16:07:00 -07:00
Jessica Paquette ef8707574b [AArch64][GlobalISel] Legalize narrow scalar FP arithmetic
Widen narrow fp arithmetic ops (e.g. G_FADD). When we don't have full FP16
support, widen to s32. Otherwise widen to s16.

https://godbolt.org/z/TbT9Pqa7e
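
A small example (assuming a toolchain and target where _Float16 is available) showing the kind of op being legalized:

```
// Half-precision add. With full FP16 support this can stay an s16 fadd;
// without it, the operation is widened to s32, performed in single
// precision, and truncated back to half.
_Float16 add_half(_Float16 a, _Float16 b) {
  return a + b;
}
```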

Differential Revision: https://reviews.llvm.org/D108660
2021-08-24 13:54:28 -07:00
Jessica Paquette db232de193 [AArch64][GlobalISel] Legalize + select v2p0 -> v2s64 G_PTRTOINT
1) Just mark this case as legal because it can just be a copy.

2) Ensure the copy in the existing code actually gets selected. Without doing
this, we'll crash because the destination won't have a register class.

This fell back 35 times in a build of clang with GISel for AArch64.

Differential Revision: https://reviews.llvm.org/D108610
2021-08-24 11:02:01 -07:00
Jessica Paquette 67d4dd5c07 [AArch64][GlobalISel] Select @llvm.aarch64.neon.ld4.*
Reuse the selection code from the ld2 case. This is similar to how SDAG handles
things in AArch64ISelDAGToDAG. (See SelectLoad)
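
A NEON intrinsics example (assuming arm_neon.h on an AArch64 target) that produces one of these ld4 intrinsic calls:

```
#include <arm_neon.h>
#include <cstdint>

// vld4q_s32 de-interleaves four int32x4 vectors from memory; it is lowered
// to an @llvm.aarch64.neon.ld4.* intrinsic, which GISel can now select.
int32x4x4_t load4(const int32_t *p) {
  return vld4q_s32(p);
}
```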

This fell back ~100 times while building clang with GISel enabled for AArch64.

Factoring out the gross subreg copy part ought to make selecting the rest of
this family fairly easy.

Differential Revision: https://reviews.llvm.org/D108600
2021-08-24 09:03:49 -07:00
Cullen Rhodes e9c8973f1c [AArch64][SME] Fix v8.6a bf16 NEON instruction predication
In streaming mode on SME targets, only the scalar BFCVT armv8.6-a
instruction is legal; predicate the illegal instructions on NEON to
disable them in streaming mode (see D107902). BFCVT is predicated on
HasNEONorStreamingSVE.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108279
2021-08-24 08:13:57 +00:00
Jessica Paquette 2ec2b25fba [AArch64][GlobalISel] Select @llvm.aarch64.neon.ld2.*
This is pretty similar to the ST2 selection code in
`AArch64InstructionSelector::selectIntrinsicWithSideEffects`.

This is a GISel equivalent of the ld2 case in `AArch64DAGToDAGISel::Select`.
There's some weirdness there that appears here too (e.g. using ld1 for scalar
cases, which are 1-element vectors in SDAG.)

It's a little gross that we have to create the copy and then select it right
after, but I think we'd need to refactor the existing copy selection code
quite a bit to do better.

This was falling back while building llvm-project with GISel for AArch64.

Differential Revision: https://reviews.llvm.org/D108590
2021-08-23 17:15:53 -07:00
David Green 50f4ae58eb [AArch64] Correct store ReadAdrBase operand
It appears that the Read operand for stores was being placed on the
first operand (the stored value), not the address base. This adds a
ReadST for the stored value operand, allowing the ReadAdrBase to
correctly act upon the address.

Differential Revision: https://reviews.llvm.org/D108287
2021-08-23 21:07:55 +01:00
Jessica Paquette a2c8e17658 [AArch64][GlobalISel] Add regbankselect support for G_LLROUND
Same as G_LROUND: the destination should always be a GPR and the source
should always be an FPR.
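
A standalone example of where G_LLROUND comes from (an integer result computed from a floating-point source):

```
#include <cmath>
#include <cstdio>

int main() {
  // std::llround takes an FP value (FPR) and produces a long long (GPR),
  // which is the G_LLROUND generic op in GlobalISel.
  printf("%lld\n", std::llroundf(2.5f));  // 3 (rounds half away from zero)
  printf("%lld\n", std::llround(-2.5));   // -3
  return 0;
}
```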

Differential Revision: https://reviews.llvm.org/D108566
2021-08-23 10:32:20 -07:00
Jessica Paquette fe51f9098b [AArch64][GlobalISel] Legalize G_LLROUND for s64 + s32
Same as G_LROUND.

Also add a TODO for full fp16 legalization.

Differential Revision: https://reviews.llvm.org/D108564
2021-08-23 09:45:23 -07:00
Florian Hahn d024a01511
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts the revert ab9296f13b.

The issue causing the revert should be fixed in 9baed023b4.
2021-08-23 11:25:27 +01:00
Cullen Rhodes fb82b836b7 [AArch64][SME] Support NEON scalar FP instructions in streaming mode
The following scalar FP instructions are legal in streaming mode:

  0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar)
  0101 1110 x10x xxxx 00x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar, FP16)
  01x1 1110 1x10 0001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar)
  01x1 1110 1111 1001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar, FP16)

Predicate them on `HasNEONorStreamingSVE`. Full list of affected
instructions:

  FMULX16, FMULX32, FMULX64, FRECPS16, FRECPS32, FRECPS64, FRSQRTS16,
  FRSQRTS32, FRSQRTS64, FRECPEv1f16, FRECPEv1i32, FRECPEv1i64, FRECPXv1f16,
  FRECPXv1i32, FRECPXv1i64, FRSQRTEv1f16, FRSQRTEv1i32, FRSQRTEv1i64

Depends on D107902.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions

Execution of NEON instructions that are illegal in streaming mode will
cause a trap or exception. Using FMULX [1] as an example, this check is
at the top of the pseudocode:

  if elements == 1 then
      CheckFPEnabled64();
  else
      CheckFPAdvSIMDEnabled64();

For the legal scalar variants it calls `CheckFPEnabled64`, whereas for the
illegal vector variants it calls `CheckFPAdvSIMDEnabled64` which traps.

This is useful for observing which instructions are/aren't legal
in streaming mode.

[1] https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D108039
2021-08-23 08:48:34 +00:00
Cullen Rhodes cf3c6cca9f [AArch64][SME] Add predicate for NEON support in streaming mode
Split out from D107903 to remove dependency for D108039 and D108279.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108293
2021-08-23 08:48:33 +00:00
Fangrui Song 0473e9f41a [AArch64] Replace unneeded CCAssignToRegWithShadow with CCAssignToReg
CCState::AllocateReg handles aliased registers.
2021-08-21 16:33:29 -07:00
Amara Emerson 3187a4f3f1 [AArch64][GlobalISel] Add legalizer support for the @llvm.get.dynamic.area.offset intrinsic.
This is just 0 on AArch64.
2021-08-20 17:13:34 -07:00
Amara Emerson 67bf3ac744 [AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores.
Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.

Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.
2021-08-20 16:36:23 -07:00
Jessica Paquette 9e9d70591e [AArch64][GlobalISel] Legalize non-register-sized scalar G_BITREVERSE
Clamp types to [s32, s64] and make them a power of 2.

This matches SDAG's behaviour.

https://godbolt.org/z/vTeGqf4vT

Differential Revision: https://reviews.llvm.org/D108344
2021-08-20 14:44:03 -07:00
Jessica Paquette 7e91c59844 [AArch64][GlobalISel] Legalize 32-bit + narrow G_SMULO + G_UMULO
SDAG lowers 32-bit and 64-bit G_SMULO + G_UMULO. We were missing the 32-bit
case.

For other sizes, make the 0th type a power of 2 and clamp it to either 32 bits
or 64 bits.

For now, this will allow us to handle narrow types (e.g. s4, s24, etc.).
The LegalizerHelper doesn't yet support narrowing G_SMULO or G_UMULO.
We want clamping behaviour either way, so we might as well include it now
to be explicit.
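
A standalone example (using the __builtin_mul_overflow builtin available in Clang and GCC) that produces a signed multiply-with-overflow op:

```
#include <cstdint>
#include <cstdio>

int main() {
  // Signed 32-bit multiply with overflow detection; in GlobalISel this is
  // the G_SMULO generic op (G_UMULO for the unsigned variant).
  int32_t a = 200000, b = 300000, result;
  bool overflowed = __builtin_mul_overflow(a, b, &result);
  printf("overflowed=%d result=%d\n", (int)overflowed, result);  // overflowed=1
  return 0;
}
```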

Differential Revision: https://reviews.llvm.org/D108240
2021-08-20 14:37:46 -07:00
Jessica Paquette 16caf6321c [AArch64][GlobalISel] Clamp vectors of p0 when legalizing G_LOAD/G_STORE
We had a rule for <n x s64> but not one for <n x p0>. As a result, we'd
fall back on types like <5 x p0>.

Differential Revision: https://reviews.llvm.org/D108484
2021-08-20 14:34:49 -07:00
Jessica Paquette 470c74f181 [AArch64][GlobalISel] Add regbankselect support for G_LROUND
The destination is always a GPR, since the result is always an integer.

The source is always an FPR, since the input is always floating point.

Differential Revision: https://reviews.llvm.org/D108419
2021-08-20 14:31:14 -07:00
Jessica Paquette 44bf0dc625 [AArch64][GlobalISel] Mark G_LROUND as legal for s64 dst + s32/s64 src.
Matches SDAG's behaviour for these types.

Differential Revision: https://reviews.llvm.org/D108420
2021-08-20 14:22:58 -07:00
Florian Hahn ab9296f13b
Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
This reverts commit f4122398e7 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }
2021-08-20 21:24:28 +01:00
Tim Northover 3d41ef68e7 AArch64: don't form indexed paired ops if base reg overlaps operands.
The registers involved might not be identical, but can still overlap (e.g.
"str w0, [x0, #4]!").
2021-08-20 11:39:38 +01:00
Jingu Kang 94c4952951 [AArch64] Enable Upper bound unrolling universally
Differential Revision: https://reviews.llvm.org/D105996
2021-08-20 11:25:38 +01:00