Commit Graph

5955 Commits

Author SHA1 Message Date
David Green d869feb406 [AArch64] Add some vector lowering tests and regenerate a couple of files. NFC 2022-09-15 21:52:55 +01:00
Florian Hahn 81a11da762
[CGP,AArch64] Replace zexts with shuffle that can be lowered using tbl.
This patch extends CodeGenPrepare to lower zext v16i8 -> v16i32 in loops
using a wide shuffle that creates a v64i8 vector, selecting groups of 3
zero elements and an element from the input.

This is profitable on AArch64 where such shuffles can be lowered to tbl
instructions, but only in loops, because it requires materializing 4
masks, which can be done in the loop preheader.

This is the only reason the transform is part of CGP. If there's a
better alternative I missed, please let me know. The same goes for the
shouldReplaceZExtWithShuffle hook which guards this. I am not sure if
this transform will be beneficial on other targets, but it seems like
there is no other convenient way.

This improves the generated code for loops like the one below in
combination with D96522.

    int foo(uint8_t *p, int N) {
      unsigned long long sum = 0;
      for (int i = 0; i < N ; i++, p++) {
	unsigned int v = *p;
	sum += (v < 127) ? v : 256 - v;
      }
      return sum;
    }

https://clang.godbolt.org/z/Wco866MjY

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D120571
2022-09-15 19:18:13 +01:00
Sander de Smalen 45d28779c5 [AArch64][SME] Fix lowering of llvm.aarch64.get.pstatesm()
A thread may not have access to SME or TPIDR2_EL0, so in order to
safely query PSTATE.SM in a streaming-compatible function, the
code should call `__arm_sme_state()`, as described in the ABI:

  c2bb09c4d4

This means that the value of pstate.sm is:
* 0 if the function is non-streaming.
* 1 if the function has `arm_streaming` or `arm_locally_streaming`.
* evaluated at runtime by a call to __arm_sme_state() otherwise.

This patch also adds a calling convention for calls to SME support routines.

At some point we can remove the need for the llvm.aarch64.get.pstatesm() intrinsic
and use function calls (with the corresponding cc) directly instead.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D131571
2022-09-15 15:14:13 +00:00
Florian Hahn 791a7ae1ba
[AArch64] Add big-endian tests for trunc-to-tbl.ll
Extra tests for D133495.
2022-09-15 15:12:34 +01:00
Florian Hahn 8f19de848b
[AArch64] Add big-endian tests for zext-to-tbl.ll
Extra tests for D120571.
2022-09-15 14:01:27 +01:00
Marco Elver 72e7575ffe [GlobalISel][AArch64] Fix pcsections for expanded atomics and add more tests
Add fix for propagation of !pcsections metadata for expanded atomics,
together with more tests for interesting atomic instructions (based on
llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D133710
2022-09-15 10:36:11 +02:00
Roland Froese 207228c1d6 [DAGCombiner] More load-store forwarding for big-endian
Handle some load-store forwarding cases for big-endian where a larger store covers
a smaller load: the offset would be 0 (and already handled) on little-endian, but on
big-endian the offset is adjusted to be non-zero. The idea is just to shift the
data to make it look like the offset-0 case.
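
A minimal C sketch of the kind of pattern this targets (names are illustrative, not taken from the actual tests):

```c
#include <string.h>

/* The 8-byte store fully covers the later 4-byte load, so the load can be
 * forwarded from the stored value. On big-endian the 4 bytes at the load
 * address sit at a non-zero bit offset inside the stored 64-bit value, so
 * the forwarded data has to be shifted first. */
unsigned int store_then_load(unsigned long long *p, unsigned long long v) {
  *p = v;
  unsigned int lo;
  memcpy(&lo, p, sizeof lo); /* 4-byte load covered by the 8-byte store */
  return lo;
}
```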

Differential Revision: https://reviews.llvm.org/D130115
2022-09-14 15:36:35 -04:00
David Spickett 3acaf04033 [LLVM][AArch64] Don't warn about clobbering X16 when Speculative Load Hardening is used
SLH will fall back to a different technique if X16 is being used,
so there is no need to warn for inline asm use. Only prevent other codegen
from using it.

Reviewed By: kristof.beyls

Differential Revision: https://reviews.llvm.org/D133766
2022-09-14 15:19:53 +00:00
Zain Jaffal d1dec04d76
[AArch64] Disable nontemporal load for Big Endian
The current code for generating nontemporal loads outputs the wrong assembly for big-endian targets.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133789
2022-09-14 14:49:55 +01:00
Zain Jaffal 244a6a76d9
[AArch64] Add nontemporal load tests for big endian.
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133765
2022-09-14 13:51:58 +01:00
David Green 993b203b6a [AArch64] Sink splat(s/zext(..)) to uses
If the Shuffle is a splat and the operand is a zext/sext, sinking the
operand and the s/zext can help create indexed s/umull. This is
especially useful to prevent an i64 mul from being scalarized.
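
A hedged C sketch of the kind of loop this helps (an assumed example, not taken from the patch's tests): the sign-extended scalar is splatted once, and sinking the splat/extend next to the multiply lets ISel form an indexed smull instead of scalarizing the i64 multiply.

```c
void mul_by_splat(long long *dst, const int *src, int s, int n) {
  /* The vectorizer splats (long long)s once; sinking splat(sext(s)) to the
   * multiply lets the backend select an indexed smull. */
  for (int i = 0; i < n; i++)
    dst[i] = (long long)src[i] * (long long)s;
}
```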

Differential Revision: https://reviews.llvm.org/D133355
2022-09-13 15:47:41 +01:00
David Spickett 0b8a44388e [llvm][AArch64] Explain why certain registers are reserved on Arm64EC
This extends 4658366d95 to add a note
explaining why the register is reserved.

note: x13 is clobbered by asynchronous signals when using Arm64EC.

I've added testing for w/x registers and v/q/s/d and h floating point
registers.

llvm will accept, but silently do nothing with, b registers. So they
are not tested here (clang rejects them so at least for C you're safe anyway).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D133701
2022-09-13 10:13:06 +00:00
David Green f124e59b2e [TypePromotionPass] Don't treat phi's as ToPromote
This attempts to stop the type promotion pass transforming where it is
not profitable, by not marking PhiNodes as ToPromote and being more
aggressive about pulling extends out of loops.

Differential Revision: https://reviews.llvm.org/D133203
2022-09-13 08:57:15 +01:00
Amara Emerson f24f469223 [GlobalISel] Fix crash when lowering G_SELECT of pointer vectors.
The bit masking lowering only works for vectors of scalars, so for pointer
element types we need to add some casting.

Differential Revision: https://reviews.llvm.org/D133672
2022-09-13 00:01:37 +01:00
David Green f072263e9f [AArch64] Add some extra typepromotion cost tests. NFC 2022-09-12 11:13:23 +01:00
David Spickett 29bb6497da [llvm][AArch64] Test warning for clobbering w19 with base frame pointer
The test added in 739b69e655 only checked
that X19 triggers the explanation, also check W19.
2022-09-12 09:57:53 +00:00
David Spickett 739b69e655 [LLVM][AArch64] Explain that X19 is used as the frame base pointer register
Fixes #50098

LLVM uses X19 as the frame base pointer, if it needs to, meaning you
can get warnings if you clobber it with inline asm.

However, it doesn't explain why. The frame base register is not part
of the ABI so it's pretty confusing why you get that warning out of the blue.

This adds a method to explain why a register is reserved, with X19 as the first one covered.
The logic is the same as getReservedRegs.

I could have added a return parameter to isASMClobberable and friends,
but found that there are a lot of things that call isReservedReg in various
ways.

So while one more method on the pile isn't great design, it is simpler
right now to do it this way and only pay the cost if you are actually using
a reserved register.
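
A minimal illustration (an assumed snippet, not from the patch) of inline asm that clobbers x19 and can therefore trigger the reserved-register warning:

```c
void clobber_frame_base(void) {
  /* Listing x19 as clobbered may warn, since LLVM can reserve it as the
   * frame base pointer. */
  __asm__ volatile("mov x19, xzr" ::: "x19");
}
```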

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D133213
2022-09-12 09:18:09 +00:00
zhongyunde 7a81782585 [AArch64][CodeGen] Fold the mov and lsl into ubfiz
Fix the issue exposed by D132322; depends on D132939.
Reviewed By: efriedma, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D132325
2022-09-09 23:50:29 +08:00
zhongyunde b6655333c2 [Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero
This is restricted to the 32-bit form of integer instructions, as otherwise too many
test cases would be clobbered by the updated register numbers.
    From %reg = INSERT_SUBREG %reg, %subreg, subidx
    To   %reg:subidx = SUBREG_TO_REG 0, %subreg, subidx
This aims to remove the redundant mov instruction seen in D132325, since SUBREG_TO_REG should not generate code.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D132939
2022-09-09 09:00:54 +08:00
Zain Jaffal 04548e82ed
[AArch64] Add test for vscale nontemporal loads larger than 256.
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D133498
2022-09-08 17:03:44 +01:00
Eric Wang d8a2d3f7d4 [NFC][Regalloc] Introduce the RegAllocPriorityAdvisorAnalysis
This patch introduces the priority analysis and the priority advisor,
the default implementation, and the scaffolding for introducing the
other implementations of the advisor.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D132835
2022-09-08 07:50:03 -07:00
Florian Hahn 39fcb4a268
[AArch64] Add tests for lowering trunc to i8 using tbl. 2022-09-08 15:45:32 +01:00
Florian Hahn 7d4ee32662
[AArch64] Add tests for shuffle (tbl2, tbl2) -> tbl4 fold.
Add extra tests where shuffle (tbl2, tbl2) can be folded to tbl4.
Regenerate check lines automatically as well.
2022-09-08 14:01:16 +01:00
Matt Devereau 3864643dea [AArch64][SVE] Add out of range SVE arg CC test
Add calling convention test for callee functions that have SVE
parameters outside of the z0-z7 range
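
A hedged sketch of such a callee (an assumed example, not the actual test): with more SVE vector arguments than the z0-z7 registers can hold, the extra argument falls outside the register range.

```c
#include <arm_sve.h>

/* Nine SVE vector parameters: z0-z7 take the first eight, so 'extra'
 * is outside the register range and is passed via memory. */
svfloat32_t callee(svfloat32_t a0, svfloat32_t a1, svfloat32_t a2,
                   svfloat32_t a3, svfloat32_t a4, svfloat32_t a5,
                   svfloat32_t a6, svfloat32_t a7, svfloat32_t extra) {
  return svadd_f32_z(svptrue_b32(), a0, extra);
}
```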
2022-09-08 11:41:49 +00:00
liqinweng 723245bfac [AARCH64][COST] Improve cost of reverse shuffles for AArch64
Update the comments for reverse shuffles and add tests

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D132730
2022-09-08 18:55:49 +08:00
chenglin.bi 5b5f6e7547 [AArch64] add i56 load store pair test case; NFC 2022-09-08 16:52:25 +08:00
Nikita Popov 96cb7c2273 [ConstantExpr] Remove fneg expression
As part of https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179,
this removes the fneg constant expression (which is, incidentally,
the only unary operator expression).

Differential Revision: https://reviews.llvm.org/D133418
2022-09-08 10:24:55 +02:00
Florian Hahn 8105f555af
[AArch64] Add tests for using tbl for fp conversions. 2022-09-07 18:35:30 +01:00
Florian Hahn faf9e05065
[AArch64] Add additional tests for tbl expansion.
Add test coverage to make sure tbl expansion isn't used when optimizing
for size or when the zext is only executed conditionally in a loop.
2022-09-07 17:47:59 +01:00
Marco Elver 343700358f [AsmPrinter] Emit PCs into requested PCSections
Interpret MD_pcsections in AsmPrinter, emitting the requested metadata to
the associated sections. Functions and normal instructions are handled.

Differential Revision: https://reviews.llvm.org/D130879
2022-09-07 11:36:02 +02:00
chenglin.bi 5fa13212d1 [AArch64] add tests for non-power2 int types; NFC 2022-09-07 16:07:26 +08:00
Xiang1 Zhang 16743c9534 [CodeGen] Limit building time in CodeGenPrepare for huge function
Details:

Currently CodeGenPrepare is very time consuming when handling big functions.

Old algorithm:
It iterates over each BB in the function and then handles every instruction in the BB.
Because some instruction optimizations may affect the dominator tree of the BBs,
the old logic re-iterates and tries to optimize each BB again.

Suppose we have a big function with 20000 BBs: if handling the last BB changes
the dominator tree, we have to completely re-iterate and try to optimize all
20000 BBs from the beginning.

The complexity is near N!

And we really do encounter some big tests (> 20000 BBs) that cost more than 30
minutes in this pass (a debug-build compiler takes 2 hours here).

What does this patch do for huge functions?
It mainly changes the way the optimization iterates.

1 We do optimizeBlock for each BB (the same as the old way).
At the same time, if a BB is changed/updated by the optimization, it is
put into FreshBBs (to try optimizeBlock on it again).
BBs newly created in the previous iteration are also put into FreshBBs.

2 BBs that were not updated in the previous iteration are skipped directly.
Strictly speaking, this may miss some opportunities, but the probability is very
small.

3 For the instructions within a single BB, we do optimizeInst for each instruction.
If optimizeInst changes the instruction dominance inside this BB, rather than breaking
out and going back to optimize the first BB (the old way), we directly iterate over the
instructions (doing optimizeInst) of this updated BB again (the new way).

What does this patch do for small/normal (not huge) functions?
It behaves the same as the old algorithm. (NFC)
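
A minimal C sketch of the new iteration scheme (pseudocode-style, assuming a hypothetical optimizeBlock callback; newly created blocks, which the real pass also adds to FreshBBs, are omitted, and this is not the actual CodeGenPrepare implementation):

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Block Block;
/* Hypothetical callback: returns true if the block was modified. */
extern bool optimizeBlock(Block *bb);

void runOnFunction(Block **blocks, size_t numBlocks) {
  /* FreshBBs: blocks changed during the last sweep. */
  Block **fresh = malloc(numBlocks * sizeof *fresh);
  size_t numFresh = 0;

  /* First sweep: visit every block once. */
  for (size_t i = 0; i < numBlocks; i++)
    if (optimizeBlock(blocks[i]))
      fresh[numFresh++] = blocks[i];

  /* Later sweeps: only revisit blocks that changed last time,
   * instead of restarting from the first block of the function. */
  while (numFresh > 0) {
    size_t n = numFresh;
    numFresh = 0;
    for (size_t i = 0; i < n; i++)
      if (optimizeBlock(fresh[i]))
        fresh[numFresh++] = fresh[i];
  }
  free(fresh);
}
```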

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D129352
2022-09-07 10:05:40 +08:00
Alexander Shaposhnikov 6a2442e9be [AArch64] Increase AddedComplexity of BIC
This diff adjusts AddedComplexity of BIC to bump its position
in the list of patterns to make LLVM pick it instead of MVN + AND.
MVN + AND requires 2 cycles, so does e.g. MOV + BIC, but the latter
outperforms the former if the instructions producing the operands of
BIC can be issued in parallel.

One may consider the following example:

ldur x15, [x0, #2] # 4 cycles
mvn x10, x15 # 1 cycle (depends on ldur)
and x9, x10, #0x8080808080808080

vs.

ldur x15, [x0, #2] # 4 cycles
mov x9, #0x8080808080808080 # 1 cycle (can be executed in parallel with ldur)
bic x9, x9, x15 # 1 cycle

Test plan: ninja check-all

Differential revision: https://reviews.llvm.org/D133345
2022-09-06 20:31:24 +00:00
Guozhi Wei 3cf4ab5447 [AArch64] Add an option to reserve physical registers from RA
This patch adds an option --reserve-regs-for-regalloc, so we can reserve a list
of physical registers. These registers will not be used by the register allocator,
but can still be used where the ABI requires it, such as for passing arguments to a
function call.

Its main purpose is simulating high register pressure by reserving many physical
registers, which makes it much easier to test and debug register allocation
changes.

Differential Revision: https://reviews.llvm.org/D132717
2022-09-06 17:18:01 +00:00
David Green db818219c5 [AArch64] Additional tests for sinking splats to muls. NFC 2022-09-06 16:04:28 +01:00
Matthias Gehre 2090e85fee [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
This adds the ExpandLargeDivRem pass to the default pass pipeline.
The limit at which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion).
The X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.
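
A hedged illustration of the kind of operation the pass expands (this assumes the frontend accepts integer widths above 128 bits on the target; it is not taken from the patch's tests):

```c
/* A 256-bit unsigned division: wider than 128 bits, so it is expanded
 * into a loop-based sequence by ExpandLargeDivRem instead of requiring a
 * runtime library call. */
unsigned _BitInt(256) udiv256(unsigned _BitInt(256) a, unsigned _BitInt(256) b) {
  return a / b;
}
```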

Differential Revision: https://reviews.llvm.org/D130076
2022-09-06 15:32:04 +01:00
Amara Emerson 3dd861818a [GlobalISel] Combine G_INSERT/EXTRACT_VECTOR_ELT with out of bounds indices to undef.
Differential Revision: https://reviews.llvm.org/D133309
2022-09-06 13:45:04 +01:00
Amara Emerson fda691e18d [AArch64] Update checks in call lowering test for signext in prep for bug fix. 2022-09-05 21:39:57 +01:00
Eli Friedman 2b9cec6244 [ARM64EC 5/?] Fix names of __chkstk and __security_check_cookie.
Part of initial Arm64EC patchset.

Arm64EC code needs to use functions with a different name, to avoid
using the x64 versions.

Differential Revision: https://reviews.llvm.org/D125417
2022-09-05 13:19:54 -07:00
Eli Friedman 5637ec0983 [ARM64EC 4/?] Add LLVM support for varargs calling convention.
Part of patchset to add initial support for ARM64EC.

The ARM64EC calling convention is the same as ARM64 for non-varargs
functions, but for varargs, the convention is significantly different.
Basically, only x0-x3 registers are used for passing arguments, and x4
and x5 describe the address/size of the arguments passed in memory. (See
https://docs.microsoft.com/en-us/windows/uwp/porting/arm64ec-abi for
more details; see
https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention for
the x64 calling convention rules, which this convention needs to match.)

Note that this currently doesn't handle i128 arguments correctly; as
noted in review, that's sort of complicated to handle, so I'm leaving it
for a followup.

Differential Revision: https://reviews.llvm.org/D125415
2022-09-05 13:05:48 -07:00
Eli Friedman 4658366d95 [ARM64EC 3/?] Mark reserved registers specific to ARM64EC ABI.
Part of patchset to add initial support for ARM64EC.

I'm not completely sure I understand the reason for this restriction,
but Microsoft documentation says that asynchronous signals clobber these
registers, so we can't ever use them.

As far as I know, none of these registers have any hardcoded meaning, so
reserving them shouldn't have any significant side-effects.

Differential Revision: https://reviews.llvm.org/D125413
2022-09-05 12:59:39 -07:00
David Green 3c6edc0b2f [AArch64][GlobalISel] Recognise some CCMPri
This is a simple addition to emitConditionalComparison, to match CCMP
with immediates using getIConstantVRegValWithLookThrough, letting it
select the CCMPri variants of the instructions.

Differential Revision: https://reviews.llvm.org/D131073
2022-09-05 19:43:23 +01:00
Amara Emerson 511f2169a8 [GlobalISel] Update combine-build-vector.mir test checks before patch. 2022-09-05 16:06:05 +01:00
Amara Emerson 22b6a4fcac [GlobalISel] Update test checks before a patch. 2022-09-05 15:24:07 +01:00
David Sherwood ffa6267300 [CodeGen] Support extracting fixed-length vectors from illegal scalable vectors
For some indices we can simply extract the fixed-length subvector from the
low half of the scalable vector, for example when the index is less than the
minimum number of elements in the low half. For all other cases we can
expand the operation through the stack by storing out the vector and
reloading the fixed-length part we need.

Fixes https://github.com/llvm/llvm-project/issues/55412

Tests added here:

  CodeGen/AArch64/sve-extract-fixed-from-scalable-vector.ll

Differential Revision: https://reviews.llvm.org/D117499
2022-09-05 15:05:14 +01:00
Amara Emerson fb60e50c78 [GlobalISel] Fix a combine crash due to a negative G_INSERT_VECTOR_ELT idx.
These should really be folded away to undef but we shouldn't crash in any case.
2022-09-05 12:10:17 +01:00
Matt Devereau b9062ceffc [AArch64][SVE] Add floating-point repeated complex pattern llc tests 2022-09-01 15:04:59 +00:00
Amara Emerson 4cf3db41da [GlobalISel] Add sdiv exact (X, constant) -> mul combine.
This port of the SDAG optimization covers only the exact sdiv case.
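
For intuition, a small C example of the underlying arithmetic (illustrative only; the combine itself operates on the exact sdiv): an exact division by an odd constant equals multiplication by the constant's multiplicative inverse modulo 2^N.

```c
/* 5 * 0xCCCCCCCD == 1 (mod 2^32), so for x that is an exact multiple of 5,
 * x / 5 == x * 0xCCCCCCCD (mod 2^32). */
unsigned exact_div5(unsigned x /* assumed to be a multiple of 5 */) {
  return x * 0xCCCCCCCDu;
}
```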

Differential Revision: https://reviews.llvm.org/D130517
2022-09-01 13:34:00 +01:00
Nikita Popov ab6876a40d reland: [Local] Allow creating callbr with duplicate successors
Since D129288, callbr is allowed to have duplicate successors. This patch removes a limitation which prevents optimizations from actually producing such callbrs.

This is probably the riskiest of all the recent callbr changes, because code with incorrect assumptions might be lurking somewhere. I fixed the one case I encountered ahead of time in 8201e3ef5c.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D129997

Originally landed as
commit 08860f525a ("[Local] Allow creating callbr with duplicate successors")

Reverted in
commit 1cf6b93df1 ("Revert "[Local] Allow creating callbr with duplicate successors"")
2022-08-31 13:23:00 -07:00
Nick Desaulniers d7474bef77 [llvm][TailDuplicator] don't taildup isInlineAsmBrIndirectTargets
This fixes a crash observed after
https://reviews.llvm.org/D129997.

Similar to D88823.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D130127
2022-08-31 13:07:10 -07:00
Hassnaa Hamdi a6d9c944df [AArch64 - SVE]: Use SVE to lower reduce.fadd.
Differential Revision: https://reviews.llvm.org/D132573

Skip custom lowering for v1f64 so it is expanded instead, because it has only one lane.

Differential Revision: https://reviews.llvm.org/D132959
2022-08-31 12:31:06 +00:00
Hassnaa Hamdi d8655bdeb4 [AArch64-SVE-fixed]:
change vscale_range<2,0> to vscale_range<1,0> for 64/128-bit vectors of fadda tests
2022-08-31 11:40:46 +00:00
Markus Böck 2fdf963daf [GlobalISel] Explicitly fail trying to translate `gc.statepoint` and related intrinsics
The provided testcase would previously fail with an assertion, because code further down tried to allocate registers for `token` return types and arguments. This is especially problematic as the process would then exit instead of falling back to using FastIsel.

This patch fixes that by simply explicitly failing translation if either of these intrinsics are encountered.

Fixes https://github.com/llvm/llvm-project/issues/57349

Differential Revision: https://reviews.llvm.org/D132974
2022-08-31 00:47:17 +02:00
Mingming Liu 4df696fbe9 [NFC] Move a test case across files.
The test case is about a pmull2 instruction being generated rather than a SIMD
ldr, so aarch64-pmull2.ll is a better test file for it.

Differential Revision: https://reviews.llvm.org/D132277
2022-08-30 14:16:28 -07:00
Stephen Long 40999cbd93 [SVE] Fix SVEDup0 matching -0.0f
Because of D128669, CPY is being used to zero active lanes even in the case of -0.0f. This patch checks for floating point positive zero. That way SVEDup0 won't match -0.0f.

Fixes https://github.com/llvm/llvm-project/issues/57428

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D132880
2022-08-30 11:07:17 -07:00
Tomas Matheson 050dad57f7 [AArch64][GISel] constrain regclass for 128->64 copy
When selecting G_EXTRACT to COPY for extracting a 64-bit GPR from
a 128-bit register pair (XSeqPair) we know enough to constrain the
destination register class to gpr64. Without this it may have only
a register bank and some copy elimination code would assert while
assuming that a register class existed.

The register class has to be set explicitly because we might hit the
COPY -> COPY case where register class can't be inferred.

This would cause the following to crash in selection, where the store
is commented (otherwise the store constrains the register class):

  define dso_local i128 @load_atomic_i128_unordered(i128* %p) {
    %pair = cmpxchg i128* %p, i128 0, i128 0 acquire acquire
    %val = extractvalue { i128, i1 } %pair, 0
    ; store i128 %val, i128* %p
    ret i128 %val
  }

Differential Revision: https://reviews.llvm.org/D132665
2022-08-30 11:02:51 +01:00
Tomas Matheson 9a390d6692 [AArch64][GISel] fix G_ADD*/G_SUB* legalization
widenScalarDst updates the insert point to after MI, so
widenScalarSrc must be called before widenScalarDst. Otherwise
the updated Src values will appear after MI and break SSA, e.g.:

  %14:_(s64), %15:_(s1) = G_UADDE %9:_, %11:_, %13:_

becomes

  %14:_(s64), %16:_(s32) = G_UADDE %9:_, %11:_, %17:_
  %15:_(s1) = G_TRUNC %16:_(s32)
  %17:_(s32) = G_ZEXT %13:_(s1)

Differential Revision: https://reviews.llvm.org/D132547

Change-Id: Ie3458747a6879433f4d5ab9939d2bd102dd0f2db
2022-08-30 10:59:32 +01:00
Paul Walker 11b4dce7d3 [SVE] Lower fixed-length floating point loads and stores to integer variants.
There's no advantage to emitting floating point scalable accesses,
whereas by lowering them to integer variants we can benefit from
several combines that seek to replace explicit extends/truncates
with extending/truncating accesses.

Differential Revision: https://reviews.llvm.org/D132393
2022-08-26 11:10:23 +01:00
Matthias Gehre 6d13b80fcb Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers"
This reverts https://reviews.llvm.org/D120329.
I abandoned the PR [0] to add __divei4 functions to compiler-rt
in favor of adding a pass to transform div/rem [1].

This removes the backend code that was supposed to emit calls to the __divei4 functions.

[0] https://reviews.llvm.org/D120327
[1] https://reviews.llvm.org/D130076

Differential Revision: https://reviews.llvm.org/D130079
2022-08-26 10:52:56 +01:00
Alex Richardson 0483b00875 Mark the $local function begin symbol as a function
While this does not matter for most targets, when building for Arm Morello,
we have to mark the symbol as a function and add size information, so that
LLD can correctly evaluate relocations against the local symbol.
Since Morello is an out-of-tree target, I tried to reproduce this with
in-tree backends and with the previous reviews applied this results in
a noticeable difference when targeting Thumb.

Background: Morello uses a method similar to Thumb, where the encoding mode is
specified in the LSB of the symbol. If we don't mark the target as a
function, the relocation will not have the LSB set and calls will end up
using the wrong encoding mode (which will almost certainly crash).

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D131429
2022-08-26 09:34:04 +00:00
Stephen Long 525af9f8eb [MC] Omit fill value if it's zero when emitting code alignment
Previously, we were generating zeroes when generating code alignments for AArch64, but now we should omit the value and let the assembler choose to generate nops or zeroes.

Reviewed By: efriedma, MaskRay

Differential Revision: https://reviews.llvm.org/D132508
2022-08-25 10:07:33 -07:00
Matt Devereau 30b045aba6 [AArch64][SVE] Extend LD1RQ ISel patterns to cover missing addressing modes
Add some missing patterns for ld1rq's scalar + scalar addressing mode.
Also, add the scalar + imm and scalar + scalar addressing modes for
the patterns added in https://reviews.llvm.org/D130010
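
A hedged C illustration of the scalar + scalar addressing form (an assumed snippet using the ACLE ld1rq intrinsic, not taken from the patch):

```c
#include <arm_sve.h>
#include <stdint.h>

/* Loads a 128-bit quadword from base + offset and replicates it across the
 * vector; with a register offset this can now select the scalar + scalar
 * ld1rq addressing mode. */
svint32_t load_rep_quad(const int32_t *base, int64_t offset) {
  return svld1rq_s32(svptrue_b32(), base + offset);
}
```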

Differential Revision: https://reviews.llvm.org/D130993
2022-08-25 13:07:37 +00:00
Usman Nadeem 46768052e0 [AArch64][DAGCombine] Fix a bug in performBuildVectorCombine where it could produce an invalid EXTRACT_SUBVECTOR
EXTRACT_SUBVECTOR requires that Idx be a constant multiple of ResultType's
known minimum vector length.

Something like this will produce an invalid extract_subvector:

t1: v4i16 = .....
t2: i32 = extract_vector_elt t1, Constant:i64<1>
t3: i32 = extract_vector_elt t1, Constant:i64<2>
t4: v2i32 = BUILD_VECTOR t2, t3
// produces
t5: v2i32 = extract_subvector t...., Constant:i64<1>

Differential Revision: https://reviews.llvm.org/D132517

Change-Id: I7a5acf054edee3e89c0f85a28d8869256403ce08
2022-08-24 16:24:19 -07:00
Sami Tolvanen cff5bef948 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Relands 67504c9549 with a fix for
32-bit builds.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 22:41:38 +00:00
Sami Tolvanen a79060e275 Revert "KCFI sanitizer"
This reverts commit 67504c9549 as using
PointerEmbeddedInt to store 32 bits breaks 32-bit arm builds.
2022-08-24 19:30:13 +00:00
Sami Tolvanen 67504c9549 KCFI sanitizer
The KCFI sanitizer, enabled with `-fsanitize=kcfi`, implements a
forward-edge control flow integrity scheme for indirect calls. It
uses a !kcfi_type metadata node to attach a type identifier for each
function and injects verification code before indirect calls.

Unlike the current CFI schemes implemented in LLVM, KCFI does not
require LTO, does not alter function references to point to a jump
table, and never breaks function address equality. KCFI is intended
to be used in low-level code, such as operating system kernels,
where the existing schemes can cause undue complications because
of the aforementioned properties. However, unlike the existing
schemes, KCFI is limited to validating only function pointers and is
not compatible with executable-only memory.

KCFI does not provide runtime support, but always traps when a
type mismatch is encountered. Users of the scheme are expected
to handle the trap. With `-fsanitize=kcfi`, Clang emits a `kcfi`
operand bundle to indirect calls, and LLVM lowers this to a
known architecture-specific sequence of instructions for each
callsite to make runtime patching easier for users who require this
functionality.

A KCFI type identifier is a 32-bit constant produced by taking the
lower half of xxHash64 from a C++ mangled typename. If a program
contains indirect calls to assembly functions, they must be
manually annotated with the expected type identifiers to prevent
errors. To make this easier, Clang generates a weak SHN_ABS
`__kcfi_typeid_<function>` symbol for each address-taken function
declaration, which can be used to annotate functions in assembly
as long as at least one C translation unit linked into the program
takes the function address. For example on AArch64, we might have
the following code:

```
.c:
  int f(void);
  int (*p)(void) = f;
  p();

.s:
  .4byte __kcfi_typeid_f
  .global f
  f:
    ...
```

Note that X86 uses a different preamble format for compatibility
with Linux kernel tooling. See the comments in
`X86AsmPrinter::emitKCFITypeId` for details.

As users of KCFI may need to locate trap locations for binary
validation and error handling, LLVM can additionally emit the
locations of traps to a `.kcfi_traps` section.

Similarly to other sanitizers, KCFI checking can be disabled for a
function with a `no_sanitize("kcfi")` function attribute.

Reviewed By: nickdesaulniers, kees, joaomoreira, MaskRay

Differential Revision: https://reviews.llvm.org/D119296
2022-08-24 18:52:42 +00:00
Hassnaa Hamdi d8f63382e8 AArch64 SVE
Add SVE patterns to make use of predicated smin, umin, smax, and umax instructions,
and add the sve-min-max-pred.ll test file for the new patterns.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D132122
2022-08-24 11:09:22 +00:00
Sanjay Patel f8dfbea324 [SDAG] expand more is-power-of-2 patterns that use popcount
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)

Adjust the legality check to avoid the poor codegen on AArch64.
We probably only want to use popcount on this pattern when it
is a single instruction.
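
A small C example of the pattern being expanded (illustrative; the rewrite in the comment is what the DAG expansion produces):

```c
#include <stdbool.h>

/* (ctpop x) == 1  -->  (x != 0) && ((x & (x - 1)) == 0) */
bool is_power_of_two(unsigned x) {
  return __builtin_popcount(x) == 1;
}
```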

fixes #57225

Differential Revision: https://reviews.llvm.org/D132237
2022-08-23 17:53:53 -04:00
Sanjay Patel 7d670976db [AArch64] add test for popcount i32; NFC
More coverage for D132237
2022-08-23 17:53:53 -04:00
Alvin Wong c0214db51a [llvm] Mark CFGuard fn ptr symbol as DSO local and add tests for mingw
For the mingw target, if a symbol is not marked DSO local, a `.refptr` is
generated for it. This makes CFG check calls use an extra pointer
dereference, which adds extra overhead compared to the MSVC version,
so mark the CFG guard check function pointer DSO local to stop it.
This should have no effect on the MSVC target.

Also adapt the existing cfguard tests to run for mingw targets, so that
this change is checked.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D132331
2022-08-23 23:39:39 +03:00
Mingming Liu 945a306501 [AArch64] Change aarch64_neon_pmull{,64} intrinsic ISel through a new
SDNode.

How:
1) Add AArch64ISD::PMULL SDNode, and extend aarch64_neon_pmull intrinsic
   tablegen pattern for this SDNode.
2) For aarch64_neon_pmull64, canonicalize i64 operands to v1i64 vectors
   during legalization.
3) For {aarch64_neon_pmull, aarch64_neon_pmull64}, combine intrinsic to
   SDNode.

Why:
1) Adding the SDNode makes it easier to canonicalize i64 inputs (required by
   aarch64_neon_pmull64) to vector inputs. Vector inputs carries lane
   information, which helps dag-combiner to combine nodes (e.g. rewrite to a
   better node to prepare for instruction selection) and instruction-selection
   to emit instructions that use higher-half inputs in place
   (i.e., no need to move lane 1 content to lane 0).
2) Using the SDNode for aarch64_neon_pmull64 is NFC, yet without this we
   have to move the definition of {PMULLv1i64, PMULLv2i64} out of its
   current group of records without gains.

Test cases are commented with what is being tested in
`aarch64-pmull2.ll` and `pmull-ldr-merge.ll` under directory
`llvm/test/CodeGen/AArch64`.

Differential Revision: https://reviews.llvm.org/D131047
2022-08-19 13:17:13 -07:00
Mingming Liu 3e6d1a6f54 [NFC][AArch64] Precommit test to optimize instruction selection for aarch64_neon_pmull64 intrinsic.
Differential Revision: https://reviews.llvm.org/D131045
2022-08-19 13:17:13 -07:00
Archibald Elliott 270c179afd [AArch64][GISel] Lower llvm.prefetch
This change adds support for lowering llvm.prefetch directly using
GlobalISel. Currently, llvm.prefetch falls back to SelectionDAG.

This Change:
- Adds an AArch64-specific G_PREFETCH generic instruction, to be used
  where AArch64ISD::PREFETCH is used in SelectionDAG.
- Adds the GINodeEquiv so patterns are translated over to GlobalISel
  automatically.
- Corrects the AArch64Prefetch patterns to use a target immediate, which
  is needed to get the patterns to translate across correctly.
- Translates the SelectionDAG legalisation of the prefetch intrinsic
  into the corresponding GlobalISel legalisation.
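
A hedged C example of code that reaches this path (an assumed snippet; the builtin lowers to the llvm.prefetch intrinsic, which GlobalISel can now select directly):

```c
/* Prefetch for read with high temporal locality; previously this fell back
 * to SelectionDAG when compiling with GlobalISel. */
void prefetch_line(const char *p) {
  __builtin_prefetch(p, /*rw=*/0, /*locality=*/3);
}
```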

Differential Revision: https://reviews.llvm.org/D132043
2022-08-19 09:11:18 +01:00
wanglian fc2b4dfef2 [DAGCombiner] Add use check for VSCALE in visitSUB.
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D132115
2022-08-19 09:46:18 +08:00
Alexander Shaposhnikov fba0367e03 [AArch64] Enable AdrpAdd fusion for neoverse-n1
AdrpAdd fusion is already enabled for "generic" and it helps
the linker apply relocation relaxations for more adrp+add pairs.
This patch enables it for -mtune=neoverse-n1.

Differential revision: https://reviews.llvm.org/D132075
2022-08-19 00:27:32 +00:00
Karl Meakin 71f0ec242f [AArch64] Add `foldCSELOfCSEL` combine.
This time more conservative.

Differential review: https://reviews.llvm.org/D125504
2022-08-19 01:04:29 +01:00
Paul Walker 96c8d615d6 [SVE] Extend findMoreOptimalIndexType so BUILD_VECTORs do not force 64bit indices.
Extends findMoreOptimalIndexType to allow ISD::BUILD_VECTOR based
indices to be truncated when such truncation is lossless. This can
enable the use of 32bit gather/scatter indices thus making it less
likely to have to split a gather/scatter in two.

Depends on D125194

Differential Revision: https://reviews.llvm.org/D130533
2022-08-18 18:00:53 +01:00
wanglian eeac894418 Precommit tests for D132115
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D132116
2022-08-18 17:58:15 +08:00
Sanjay Patel 7f72a0f5bb [SDAG] avoid generating libcall to function with same name
This is a potentially better alternative to D131452 that also
should avoid the infinite loop bug from:
issue #56403

This is again a minimal fix to reduce merging pain for the
release. But if this makes sense, then we might want to guard
all of the RTLIB generation (and other libcalls?) with a
similar name check.
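
A hedged illustration of the hazard (an assumed example, not the exact case from issue #56403): if a function shares its name with the runtime routine its own body lowers to, codegen would emit a call back into the same function.

```c
/* If the builtin below were lowered to a libcall named "expf", this
 * definition of expf would end up calling itself. */
float expf(float x) {
  return __builtin_expf(x);
}
```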

Differential Revision: https://reviews.llvm.org/D131521
2022-08-17 16:19:34 -04:00
Matthias Braun 19ce5e515f RAGreedyStats: Ignore identity COPYs; count COPYs from/to physregs
Improve copy statistics:

- Count copies from or to physical registers: They are used to model function parameters and calling conventions and the register allocator optimizes for them.
- Check the physical registers assigned to virtual registers and stop counting "identity" `COPY`s where source and destination are the same physical register; they will be removed in the `virtregmap` pass anyway.

Differential Revision: https://reviews.llvm.org/D131932
2022-08-17 12:53:29 -07:00
Sanjay Patel 8eddd1ec60 [AArch64] add test for recursive libcall lowering; NFC
Issue #56403
2022-08-17 14:54:50 -04:00
Vladislav Dzhidzhoev 4b57939583 [AArch64][GlobalISel] Fallback to generic lowering of G_CTPOP
Use generic lowering of G_CTPOP for s32 and s64 scalars when
noimplicitfloat is specified.

Differential Revision: https://reviews.llvm.org/D131454
2022-08-17 21:10:27 +03:00
Nick Desaulniers 6b0e2fa6f0 [SelectionDAG] make INLINEASM_BR use MachineBasicBlocks instead of BlockAddresses
As part of re-architecting callbr to no longer use blockaddresses
(https://reviews.llvm.org/D129288), we don't really need them in MIR.
They make comparing MachineBasicBlocks of indirect targets during
MachineVerifier a PITA.

Suggested by @efriedma from the discussion:
https://reviews.llvm.org/D130290#3669531

Reviewed By: efriedma, void

Differential Revision: https://reviews.llvm.org/D130316
2022-08-17 09:34:31 -07:00
OverMighty 232953f996 [AArch64] Add pattern for SQDML*Lv1i32_indexed
There was no pattern to fold into these instructions. This patch adds
the pattern obtained from the following ACLE intrinsics so that they
generate sqdmlal/sqdmlsl instructions instead of separate sqdmull and
sqadd/sqsub instructions:
 - vqdmlalh_s16, vqdmlslh_s16
 - vqdmlalh_lane_s16, vqdmlalh_laneq_s16, vqdmlslh_lane_s16,
   vqdmlslh_laneq_s16 (when the lane index is 0)

It also modifies the result of the existing pattern for the latter, when
the lane index is not 0, to use the v1i32_indexed instructions instead
of the v4i16_indexed ones.
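
A hedged C snippet using one of the listed intrinsics (an assumed example, not the patch's test): with the new pattern this should select a single sqdmlal rather than separate sqdmull and sqadd instructions.

```c
#include <arm_neon.h>

/* Saturating doubling multiply-accumulate long on scalar operands. */
int32_t fused_mla(int32_t acc, int16_t a, int16_t b) {
  return vqdmlalh_s16(acc, a, b);
}
```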

Fixes #49997.

Differential Revision: https://reviews.llvm.org/D131700
2022-08-17 12:00:47 +01:00
Vitaly Buka 16fecdfa70 Revert "[AArch64] Add `foldCSELOfCSEl` DAG combine"
Breaks ubsan on buildbot, details in D125504

This reverts commit 6f9423ef06.
2022-08-16 20:29:37 -07:00
Eli Friedman cfd2c5ce58 Untangle the mess which is MachineBasicBlock::hasAddressTaken().
There are two different senses in which a block can be "address-taken".
There can be a BlockAddress involved, which means we need to map the
IR-level value to some specific block of machine code.  Or there can be
constructs inside a function which involve using the address of a basic
block to implement certain kinds of control flow.

Mixing these together causes a problem: if target-specific passes are
marking random blocks "address-taken", if we have a BlockAddress, we
can't actually tell which MachineBasicBlock corresponds to the
BlockAddress.

So split this into two separate bits: one for BlockAddress, and one for
the machine-specific bits.

Discovered while trying to sort out related stuff on D102817.

Differential Revision: https://reviews.llvm.org/D124697
2022-08-16 16:15:44 -07:00
Karl Meakin 6f9423ef06 [AArch64] Add `foldCSELOfCSEl` DAG combine
Differential Revision: https://reviews.llvm.org/D125504
2022-08-16 12:49:11 +01:00
Zain Jaffal 7155ed4289
[AArch64] Add support for 256-bit non temporal loads
Currently, non-temporal loads are mapped to `LDP` or `LDR` like ordinary loads. This patch maps all the non-temporal 256-bit loads to `LDNP` instead. Future patches should address other non-temporal loads.
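
A hedged C example of a 256-bit non-temporal load (an assumed snippet using Clang's non-temporal builtin, not the patch's test):

```c
typedef int v8si __attribute__((vector_size(32)));

/* A 256-bit load marked non-temporal; with this patch it can lower to ldnp. */
v8si load_nontemporal(v8si *p) {
  return __builtin_nontemporal_load(p);
}
```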

Reviewed By: fhahn, dmgreen

Differential Revision: https://reviews.llvm.org/D131773
2022-08-16 12:19:36 +01:00
Vitaly Buka e0e960923f [AArch64] Fix signed integer overflow in CSINC case
Follow-up to D131815, which overflows on different
values.
2022-08-15 15:04:20 -07:00
Peter Waller 6e85db7293 [DAGCombine] Combine signext_inreg of extract-extend
The outer signext_inreg is redundant in the following:

  Fold (signext_inreg (extract_subvector (zext|anyext|sext iN_value to _) _) from iN)
       -> (extract_subvector (signext iN_value to iM))

Tests are precommitted and cloned by analogy from the AND case in
the same file. A negative test is added to check that the extension width is handled
correctly.

This patch supersedes D130700.

Differential Revision: https://reviews.llvm.org/D131503
2022-08-15 10:58:07 +00:00
Zain Jaffal df4878d28d
[AArch64] Tests for non-temporal loads.
Add some test cases for D131773 where LDNP could be used as well as
negative tests.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D131767
2022-08-15 09:16:02 +01:00
Vitaly Buka f1596952f9 [AArch64] Fix signed integer overflow in CSINC case
https://lab.llvm.org/staging/#/builders/224/builds/2/steps/16/logs/stdio

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D131815
2022-08-13 13:12:09 -07:00
Florian Hahn c2af37dcdb
Revert "[AArch64][GlobalISel] Recognise some CCMPri"
This reverts commit 38c2366b3f.

This patch seems to break bootstrapping LLVM with `-fglobal-isel -O3`
on AArch64 hardware. Without the revert, there are 500+ test
failures for the `check-llvm-codegen-x86` target.
2022-08-13 17:44:41 +01:00
David Green a9e9dd9a3a [AArch64] Add bf16 select handling
A bfloat select operation will currently crash, but is allowed from C.
This adds handling for the operation, turning it into a FCSELHrrr if
fullfp16 is present, or converting it to a FCSELSrrr if not. The
FCSELSrrr is created by using INSERT_SUBREG/EXTRACT_SUBREG to convert
the bf16 to an f32 and using the f32 pattern for FCSELSrrr. (I originally
attempted to do this via a tablegen pattern, but it appears that the
nzcv glue is placed onto the wrong node, causing it to be forgotten and
incorrect scheduling to be emitted.)

The FCSELSrrr can also be used for fp16 selects when +fullfp16 is not
present, which helps avoid an unnecessary promotion to f32.
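
A minimal C sketch of the operation in question (assuming __bf16 is available for the target; not the patch's test):

```c
/* A select on bfloat values, which previously crashed during ISel. */
__bf16 select_bf16(int cond, __bf16 a, __bf16 b) {
  return cond ? a : b;
}
```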

Differential Revision: https://reviews.llvm.org/D131253
2022-08-11 14:20:36 +01:00
Andre Vieira 1640679187 [TypePromotion] Search from ZExt + PHI
Expand TypePromotion pass to try to promote PHI-nodes in loops that are the
operand of a ZExt, using the ZExt's result type to determine the Promote Width.

Differential Revision: https://reviews.llvm.org/D111237
2022-08-11 09:50:10 +01:00
Edd Barrett fa250250b2
Migrate llvm.experimental.patchpoint() to ptr.
This intrinsic used a typed pointer for a call target operand. This
change updates the operand to be an opaque pointer and updates all
pointers in all test files that use the intrinsic.

Differential revision: https://reviews.llvm.org/D131261
2022-08-10 13:18:02 +01:00
David Truby b1b9c39629 [AArch64][SVE] Use SVE for VLS fcopysign for wide vectors
Currently fcopysign for VLS vectors lowers through NEON even when the
vector width is wider than a NEON vector, causing bad codegen as the
vectors are split. This patch causes SVE to be used for these vectors
instead, giving much better codegen on wide VLS vectors.
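
A hedged C example of the kind of code affected (an assumed snippet; with a fixed SVE vector length such as -msve-vector-bits=512, the copysign vectorizes to vectors wider than NEON):

```c
#include <math.h>

/* Element-wise copysign over fixed-width vectors wider than 128 bits should
 * now lower via SVE instead of being split into NEON-sized pieces. */
void vls_copysign(double *dst, const double *mag, const double *sgn, int n) {
  for (int i = 0; i < n; i++)
    dst[i] = copysign(mag[i], sgn[i]);
}
```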

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D128642
2022-08-10 10:17:19 +00:00
David Green 20e6239a44 [AArch64] Regenerate arm64-fmax.ll test. NFC 2022-08-09 16:59:00 +01:00
Peter Waller 310962f25e [DAGCombine][NFC] Precommit extract-subvec-combine sext tests 2022-08-09 15:44:15 +00:00
Luo, Yuanke aaf6c7b05c [globalisel] Select register bank for DBG_VALUE
The register operand of DBG_VALUE is not selected to a proper register
bank on either AArch64 or X86, which would cause getRegClass to crash after
global ISel. After discussion, we think the MIR should assume all
virtual registers have a proper register class set after global ISel,
so this patch fixes that gap for DBG_VALUE on AArch64 and X86.

Differential Revision: https://reviews.llvm.org/D129037
2022-08-09 13:11:51 +08:00