Change the cost model to lower a = b * C where C = -(2^n - 2^m) to
  lsl w8, w0, m
  sub w0, w8, w0, lsl n
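As a worked example (a sketch I'm adding, not part of the patch), take n = 4, m = 1, so C = -(16 - 2) = -14; relying on unsigned wraparound, the lowering computes b*2 - b*16:

  #include <stdint.h>
  // b * -14, where -14 = -(2^4 - 2^1): n = 4, m = 1
  uint32_t mul_minus14(uint32_t b) {
    return (b << 1) - (b << 4); // lsl w8, w0, 1; sub w0, w8, w0, lsl 4
  }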
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134934
When returning from a function with both SCS and PAC-RET enabled, we need to
authenticate the return address from the stack and then load from the SCS,
but this was happening in the reverse order when RETA[AB] were being used.
Fix it by disabling the use of RETA[AB] when SCS is enabled.
Fixes PR58072.
Differential Revision: https://reviews.llvm.org/D134931
This is a port of an existing optimization in AArch64 ISelLowering, handling
a case when the same input vector can be used for both ext inputs.
Differential Revision: https://reviews.llvm.org/D134891
Decomposing the constant 14 can be separated out from D132322.
Change the cost model to lower a = b * C where C = 2^n - 2^m to
  lsl w8, w0, n
  sub w0, w8, w0, lsl m
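For example (a sketch under the same scheme), the constant 14 from D132322 is 2^4 - 2^1, so n = 4, m = 1:

  #include <stdint.h>
  // b * 14, where 14 = 2^4 - 2^1: n = 4, m = 1
  uint32_t mul14(uint32_t b) {
    return (b << 4) - (b << 1); // lsl w8, w0, 4; sub w0, w8, w0, lsl 1
  }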
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134706
A few issues:
1. There was no legalizer test for G_PTRTOINT
2. Same clamping issue as in many other opcodes
3. AArch64 pointers can only be 64-bit, so in reality we always have to trunc
or extend with any size other than p0 anyway.
This seems to actually produce more correct selection for narrow types as well.
Differential Revision: https://reviews.llvm.org/D107588
This is intended to be equivalent to the s32 + s64 cases in
AArch64TargetLowering::LowerFCOPYSIGN.
Widen everything and then use G_BIT + a mask to handle the actual copysign
operation. Then, narrow back down to s32/s64.
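For reference, the scalar bit trick being widened here is the usual mask-based copysign; a minimal C sketch of the s32 case (my illustration, not code from the patch):

  #include <stdint.h>
  #include <string.h>
  // keep the magnitude bits of x, insert the sign bit of y; the vector
  // form does the same select-by-mask that G_BIT/BIT performs
  float copysign_sketch(float x, float y) {
    uint32_t xi, yi;
    memcpy(&xi, &x, sizeof xi);
    memcpy(&yi, &y, sizeof yi);
    xi = (xi & 0x7fffffffu) | (yi & 0x80000000u);
    memcpy(&x, &xi, sizeof x);
    return x;
  }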
I wasn't sure about what the best/most canonical INSERT_SUBREG-selectable
pattern is. I chose G_INSERT_VECTOR_ELT + an undef vector because it produces
reasonably okay codegen. (It doesn't produce INSERT_SUBREG right now though.)
If there's a better way to do this then I'm happy to change it.
We also have a couple of codegen deficiencies with how we emit vector constants
right now. (We need a GISel equivalent to the tryAdvSIMDModImm64 stuff)
Differential Revision: https://reviews.llvm.org/D108725
This is necessary for custom-legalizing G_FCOPYSIGN.
This is equivalent to the BIT instruction (bitwise insert if true).
Add selection testcases for imported patterns.
Differential Revision: https://reviews.llvm.org/D108714
For gathers which load 8- and 16-bit data and then use that data
as an index, the index can be extended to 32 bits instead of
64 bits.
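A hedged C sketch of the kind of loop this targets (names are illustrative):

  #include <stdint.h>
  // the loaded 8-bit values are reused as gather indices, so extending
  // them to 32 bits instead of 64 is sufficient
  void gather_i8(float *dst, const float *base, const uint8_t *idx, int n) {
    for (int i = 0; i < n; i++)
      dst[i] = base[idx[i]];
  }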
Differential Revision: https://reviews.llvm.org/D130692
This reverts commit 1c62af3e23.
The commit causes the test below to fail. Revert for now to get the bots
back to green.
Failing test:
llvm/test/Transforms/LoopVectorize/AArch64/masked-op-cost.ll
Currently, non-temporal loads over 256 bits are broken up inefficiently. For
example, `v17i32` gets broken into 2 128-bit loads. It is better if we can use
256-bit loads instead.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133421
These instructions are flag-setting, so the ptest is redundant, but the
TableGen class wasn't setting the element size for the predicate, causing
the checks in AArch64InstrInfo::optimizePTestInstr to fail.
The commit D120104 enabled FeatureFuseAdrpAdd for -mcpu=generic,
allowing the linker to relax adrp;add pairs where possible. D132075
extended that to neoverse-n1; this patch extends it to all other Cortex
and Neoverse CPUs for the same reasons.
Differential Revision: https://reviews.llvm.org/D134521
These names can then be matched by name against 'bits' fields in a
record, to populate an instruction's encoding.
This does _not_ yet change DecoderEmitter to allow by-name matching of
sub-operands. Unlike the encoder, the decoder already defaulted to not
supporting positional matching, and backends had workarounds in place
for the missing decoding support.
Additionally, use this new capability to allow the ARM and AArch64
backends not to require any positional operand matching.
Differential Revision: https://reviews.llvm.org/D131003
This patch removes the AArch64 intrinsics svget/svset/svcreate from LLVM.
It also implements the InstCombine for vector.extract that used to be in svget.
Depends on: D131547
Differential Revision: https://reviews.llvm.org/D131548
They're roughly ARMv8.6. This works in the .td file, but in
AArch64TargetParser.def, marking them v8.6 brings in support for the SM4
cryptographic extension, which we don't actually have. So on the TargetParser
side they're marked as v8.5, with the extra features (BF16 and I8MM) added
manually.
Finally, A16 supports the HCX extension in addition to v8.6. This has no
TargetParser implications.
shuffle (tbl2, tbl2) can be folded into a single tbl4 if the mask for
the selected elements is constant.
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D133491
The llvm.aarch64.neon.scalar.sqxtn.i32.i64 intrinsics take and return
integer types, but operate on fp registers. This can create some
inefficiencies in their lowering, where the registers are converted to
fp a little too late. This patch adds lowering for the intrinsics,
creating bitcasts to/from fp types to allow nicer folding later when the
instructions are selected, especially around insert/extracts.
Differential Revision: https://reviews.llvm.org/D134024
This adds some quick tablegen patterns for vector_insert(bitcast(..))
and bitcast(vector_extract(..)), allowing us to avoid a round-trip
through GPRs.
Differential Revision: https://reviews.llvm.org/D134022
Inlining must be disabled when the call-site needs to toggle PSTATE.SM or
when the callee's function body is executed in a different streaming mode than
its caller. This is needed because function calls are the boundaries for
streaming mode changes.
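For illustration only (the attribute spelling here is an assumption based on the clang-side work in D131562):

  // callee has a streaming interface; caller is a normal function. Inlining
  // callee into caller would erase the call boundary where the PSTATE.SM
  // switch happens, so the inliner must refuse this candidate.
  void callee(void) __attribute__((arm_streaming));
  void caller(void) { callee(); }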
More details about the SME attributes and design can be found
in D131562.
Differential Revision: https://reviews.llvm.org/D131581
This patch enables the LSLFast feature for Cortex-A76, Cortex-A77,
Cortex-A78, Cortex-A78C, Cortex-A710, Cortex-X1, Cortex-X2, Neoverse N1,
Neoverse N2, Neoverse V1 and the Neoverse-512TVB pseudo-CPU, in line with
the software optimization guides for those CPUs.
Differential Revision: https://reviews.llvm.org/D134273
This patch removes the intrinsic aarch64.sve.ldN from TableGen in favour of
using aarch64.sve.ldN.sret.
Depends on: D133023
Differential Revision: https://reviews.llvm.org/D133025
Previously this only used the UnsafeFPMath option; it now looks for the
fast math flags on the instructions, using the same flags as other
backends.
When a streaming mode change is (or may be) required for a call, it will
need to restore the original mode after the call, which prevents the use of
tail-call optimization. The same holds true for a call that requires the lazy-save
mechanism to be set up before the call, and possibly restored after.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131579
This is a partial port of the code used by the SelectionDAGBuilder to
translate selects.
In particular, see matchSelectPattern in ValueTracking.cpp. This is a
GISel-equivalent of the portion which handles fminnum/fmaxnum/fminimum/fmaximum.
I tried to set it up so it'd be easy to add the non-FP cases. Those are simpler.
On the AArch64-end, it seems like the FP cases are more important for perf
right now, so I bit the bullet and went at the more complicated problem. :)
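As a rough illustration (my sketch, not from the patch), this is the kind of source-level select that, with suitable fast-math flags, ends up matched to fminnum:

  // with nnan/nsz-style semantics, this compare+select can become fminnum
  float select_min(float x, float y) {
    return x < y ? x : y;
  }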
I elected to do this as a post-legalize combine rather than in the
IRTranslator because:
- Deciding which fmax/fmin to use can depend on legalization rules
- Philosophically speaking (TM), putting it in a combine just feels cleaner
- Being able to enable/disable the combine is handy
Another option would be to use the ValueTracking code in the IRTranslator and
match what SelectionDAGBuilder::visitSelect does. I think that may be somewhat
annoying since we'd need to write lowerings back into the selects in the
legalizer. I'm not strongly opposed to the approach.
We'd also want to be careful with vector selects once that's implemented,
since that code explicitly checks if a vector select is legal on the
target. That'd probably need a hook.
From what I can tell, doing this as a combine is probably a cleaner option
long-term.
Differential Revision: https://reviews.llvm.org/D116702
When a function is streaming-compatible and calls a function with a normal or streaming
interface, it may need to enable/disable streaming mode before the call, and
needs to restore PSTATE.SM after the call.
This patch implements this with a Pseudo node that gets expanded to a
conditional branch and smstart/smstop node.
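Conceptually, the expansion behaves like the following C sketch (not the actual MIR; current_pstate_sm is a hypothetical stand-in for the runtime mode query):

  extern int current_pstate_sm(void); // hypothetical PSTATE.SM query
  void callee(void);                  // callee with a streaming interface
  void streaming_compatible_caller(void) {
    int was_streaming = current_pstate_sm();
    if (!was_streaming)
      asm volatile("smstart sm");     // enter streaming mode for the call
    callee();
    if (!was_streaming)
      asm volatile("smstop sm");      // restore the original mode
  }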
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131578
This patch implements the ABI for calls from:
Normal -> Streaming
Normal -> Streaming-compatible
Streaming -> Normal
Streaming -> Streaming-compatible
Streaming -> Streaming
The compiler inserts SMSTART/SMSTOP instructions before and after the call,
depending on the required transition.
More details about the SME attributes and design can be found
in D131562.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131576
On AArch64, doing the vector truncate separately after the fptoui
conversion can be lowered more efficiently using tbl.4, building on
D133495.
https://alive2.llvm.org/ce/z/T538CC
Depends on D133495
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D133496
Similar to using tbl to lower vector ZExts, tbl4 can be used to lower
vector truncates.
The initial version supports i32->i8 conversions.
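For example (a sketch of mine), once vectorized, the truncates in a loop like this can be lowered via tbl4:

  #include <stdint.h>
  // each i32 -> i8 truncate maps onto the tbl4-based lowering
  void trunc_i32_to_i8(uint8_t *dst, const uint32_t *src, int n) {
    for (int i = 0; i < n; i++)
      dst[i] = (uint8_t)src[i];
  }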
Depends on D120571
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D133495
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle creating a v64i8 vector, selecting groups of 3
zero elements and an element from the input.
This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.
This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no other convenient way.
This improves the generated code for loops like the one below in
combination with D96522.
  int foo(uint8_t *p, int N) {
    unsigned long long sum = 0;
    for (int i = 0; i < N; i++, p++) {
      unsigned int v = *p;
      sum += (v < 127) ? v : 256 - v;
    }
    return sum;
  }
https://clang.godbolt.org/z/Wco866MjY
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D120571
All in-tree targets pass pointer-sized ConstantSDNodes to the
method. This overload reduces the amount of boilerplate code a bit. It
also makes getCALLSEQ_END consistent with getCALLSEQ_START, which
already takes uint64_ts.
A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:
c2bb09c4d4
This means that the value of PSTATE.SM is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to __arm_sme_state() otherwise.
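A minimal C-level sketch of the runtime case (simplified signature; treating bit 0 as PSTATE.SM is my reading of the ABI document referenced above):

  #include <stdint.h>
  uint64_t __arm_sme_state(void); // SME ABI support routine (simplified)
  static inline int in_streaming_mode(void) {
    return (int)(__arm_sme_state() & 1); // assumption: bit 0 reflects PSTATE.SM
  }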
This patch also adds a calling convention for calls to SME support routines.
At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D131571
Add fix for propagation of !pcsections metadata for expanded atomics,
together with more tests for interesting atomic instructions (based on
llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll).
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D133710
SLH will fall back to a different technique if X16 is being used,
so there is no need to warn for inline asm use. Only prevent other codegen
from using it.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D133766
The current code for generating non-temporal loads outputs the wrong assembly for big-endian architectures.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D133789