llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	95ac3d15e9	[AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization. For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize completely if the source is <= 64b. This change adds support for that in the legalizer. If the source has a pow-2 num elements, then we can do a tree reduction using the scalar operation in the individual elements. Otherwise, we just create a sequential chain of operations. For AArch64, we only need to scalarize if the input is <64b. If it's greater than 64b then we can first do a fewerElements step to 64b, taking advantage of vector instructions until we reach the point of scalarization. I also had to relax the verifier checks for reductions because the intrinsics support <1 x EltTy> types, which we lower to scalars for GlobalISel. Differential Revision: https://reviews.llvm.org/D108276	2021-08-19 16:38:52 -07:00
Amara Emerson	a0051f7149	[AArch64][GlobalISel] Fix miscompile of <16 x s8> G_EXTRACT_VECTOR_ELT. When support for copying vector s8 lanes was added recently, this also had the side effect of fixing a fallback for <16 x s8> extracts since both used the same helper. However, there was a bug in another helper to get the regclass for a specific FPR-native type, which was assigning FPR16 to s8 instead of FPR8.	2021-08-19 16:22:32 -07:00
Tim Northover	edab411ee6	AArch64: copy all parts of the mem operand across when combining a store In particular we were dropping volatility, which can lead to unwanted transformations.	2021-08-19 18:26:39 +01:00
Owen Anderson	06a4c85890	Use v16i8 rather than v2i64 as the VT for memset expansion on AArch64. This allows the instruction selector to realize that it can directly broadcast the low byte of the memset value, rather than replicating it to a 64-bit GPR before broadcasting. This fixes PR50985. Differential Revision: https://reviews.llvm.org/D108354	2021-08-19 16:54:07 +00:00
Matthew Devereau	734708e04f	[AArch64][SVE] Teach cost model that masked loads/stores are cheap Reduce the cost of VLS masked loads/stores to make the vectorizer emit them more frequently.	2021-08-19 13:01:33 +01:00
David Sherwood	f4122398e7	[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64 I have added a new TTI interface called enableOrderedReductions() that controls whether or not ordered reductions should be enabled for a given target. By default this returns false, whereas for AArch64 it returns true and we rely upon the cost model to make sensible vectorisation choices. It is still possible to override the new TTI interface by setting the command line flag: -force-ordered-reductions=true|false I have added a new RUN line to show that we use ordered reductions by default for SVE and Neon: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D106653	2021-08-19 09:29:40 +01:00
Jessica Paquette	c22b64ef66	[AArch64][GlobalISel] Don't allow s128 for G_ISNAN getAPFloatFromSize doesn't support s128, so we can't lower this without asserting right now. To fix the buildbots, don't allow any scalars other than s16, s32, and s64.	2021-08-18 13:59:00 -07:00
Jessica Paquette	3d91d5b757	[AArch64][GlobalISel] Mark G_FMINNUM/G_FMAXNUM as floating point opcodes We need to ensure that these end up on FPR to allow imported patterns to select them. This will also ensure that we get good regbank selection when dealing with instructions like G_PHI/G_LOAD/G_STORE which deduce their banks from their uses/users. Differential Revision: https://reviews.llvm.org/D108260	2021-08-18 13:32:19 -07:00
Jessica Paquette	45e1a6bd25	[AArch64][GlobalISel] Legalize scalar G_FMINNUM + G_FMAXNUM For subtargets with full FP16, this is legal for s16, s32, and s64. Without full FP16, it's legal for s32 and s64. For s128, this is a libcall. We also support some vector types, but for now, let's just support scalars. Differential Revision: https://reviews.llvm.org/D108259	2021-08-18 13:30:03 -07:00
Jessica Paquette	791006fb8c	[GlobalISel] Implement lowering for G_ISNAN + use it in AArch64 GlobalISel equivalent to `TargetLowering::expandISNAN`. Use it in AArch64 and add a testcase. Differential Revision: https://reviews.llvm.org/D108227	2021-08-18 10:54:25 -07:00
David Sherwood	219d4518fc	[Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive For tight loops like this: float r = 0; for (int i = 0; i < n; i++) { r += a[i]; } it's better not to vectorise at -O3 using fixed-width ordered reductions on AArch64 targets. Although the resulting number of instructions in the generated code ends up being comparable to not vectorising at all, there may be additional costs on some CPUs, for example perhaps the scheduling is worse. It makes sense to deter vectorisation in tight loops. Differential Revision: https://reviews.llvm.org/D108292	2021-08-18 17:01:56 +01:00
Tim Northover	8eb054a87d	AArch64: compare correct type for multi-valued SDNode. If Orig produces more than one value (rare) with different types (rarer) then we need to make sure we check against the one that Orig actually represents, not just the first type. Unfortunately because of the combination of things that need to happen I wasn't able to produce a test.	2021-08-18 09:35:31 +01:00
Amara Emerson	284006079e	[AArch64][GlobalISel] Add support for selection of s8:fpr = G_UNMERGE <8 x s8>	2021-08-18 00:34:06 -07:00
Arthur Eubanks	46cf82532c	[NFC] Replace Function handling of attributes with less confusing calls To avoid magic constants and confusing indexes.	2021-08-17 21:05:40 -07:00
Simon Pilgrim	caff2acae1	[AArch64] AArch64DAGToDAGISel::tryReadRegister/tryWriteRegister - don't dereference dyn_cast<> results. dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warnings.	2021-08-17 18:40:59 +01:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Simon Pilgrim	895ed64009	[AArch64] LowerCONCAT_VECTORS - merge getNumOperands() calls. NFCI. Improves on the unused variable fix from rG4357562067003e25ab343a2d67a60bd89cd66dbf	2021-08-17 11:23:03 +01:00
Jordan Rupprecht	4357562067	[NFC][AArch64] Fix unused var in release build	2021-08-16 10:04:32 -07:00
Simon Pilgrim	d6fe8d37c6	[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b) Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597	2021-08-16 16:06:54 +01:00
Cullen Rhodes	09507b5325	[AArch64][SME] Disable NEON in streaming mode In streaming mode most of the NEON instruction set is illegal, disable NEON when compiling with `+streaming-sve`, unless NEON is explicitly requested. Subsequent patches will add support for the small subset of NEON instructions that are legal in streaming mode. Reviewed By: paulwalker-arm, david-arm Differential Revision: https://reviews.llvm.org/D107902	2021-08-16 07:56:48 +00:00
Nikita Popov	81b106584f	[AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476) This is a non-intrusive fix for https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport to the 13.x release branch. It expands on the current hack by distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have the obvious meaning and 2 means "anything else". The new optimization from D98564 should only be performed for CmpValue of 0 or 1. For main, I think we should switch the analyzeCompare() and optimizeCompare() APIs to use int64_t instead of int, which is in line with MachineOperand's notion of an immediate, and avoids this problem altogether. Differential Revision: https://reviews.llvm.org/D108076	2021-08-15 12:35:52 +02:00
Kazu Hirata	915cc69259	[Aarch64] Remove redundant c_str (NFC) Identified with readability-redundant-string-cstr.	2021-08-14 08:49:40 -07:00
Arthur Eubanks	d7593ebaee	[NFC] Clean up users of AttributeList::hasAttribute() AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.	2021-08-13 11:59:18 -07:00
Arthur Eubanks	92ce6db9ee	[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr() This is more consistent with similar methods.	2021-08-13 11:09:18 -07:00
Jessica Paquette	ccfc079047	[AArch64][GlobalISel] Legalize scalar G_SSUBSAT + G_SADDSAT These are lowered, matching SDAG behaviour. (See llvm/test/CodeGen/AArch64/ssub_sat.ll and llvm/test/CodeGen/AArch64/sadd_sat.ll) These fall back ~159 times on a build of clang with GISel enabled. Differential Revision: https://reviews.llvm.org/D107777	2021-08-13 09:02:25 -07:00
David Truby	9c47d6b48d	[llvm][sve] Lowering for VLS extending loads This patch enables extending loads for fixed length SVE code generation. There is a slight regression here in the mulh tests; since these tests load the parameter and then extend it these are treated as extending loads which are merged, preventing the mulh instruction from being generated. As this affects scalable SVE codegen as well this should be addressed in a separate patch. Reviewed By: bsmith Differential Revision: https://reviews.llvm.org/D107057	2021-08-12 09:43:39 +00:00
Cullen Rhodes	419deccfd1	[AArch64] NFC: Remove register decoder tables in disassembler The register classes are generated by TableGen, use them instead of handwritten tables. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107763	2021-08-12 07:28:56 +00:00
Amara Emerson	73056f239e	[AArch64][GlobalISel] Simplify/nuke the merge/unmerge legalizer rules. These rules were originally written when the new predicate based legalizer was introduced in an attempt to preserve existing behaviour. It wasn't properly kept up to date as things like vector support was split out into G_CONCAT_VECTORS, and frankly, even if it was, it was too complex. It's much easier to start from scratch with what we can actually support, which is just a few type combinations. Anything illegal we should either legalize, or should be eliminated as a side effect of artifact combination. Differential Revision: https://reviews.llvm.org/D107937	2021-08-11 16:45:23 -07:00
Usman Nadeem	9396c3ec7b	[AArch64][SVE] Remove assertion/range check for i16 values during immediate selection The assertion can fail in some cases when an i16 constant is promoted to i32. e.g. in the added test case the value `i16 -32768` is within the range of i16 but the assert fails when the constant is promoted to positive `i32 32768` by an earlier call to DAG.getConstant(). Differential Revision: https://reviews.llvm.org/D107880 Change-Id: I2f6179783cbc9630e6acab149a762b43c65664de	2021-08-11 14:50:20 -07:00
Amara Emerson	2c1789bc8c	[AArch64][GlobalISel] Add ptradd_immed_chain combine to post-legalizer combiner.	2021-08-11 13:59:23 -07:00
Cullen Rhodes	1fe0e6a380	[AArch64][SME] Support ptrue(s) in streaming mode The ptrue and ptrues instructions are legal in streaming mode, missed in D106272. The reference can be found here: https://developer.arm.com/documentation/ddi0602/2021-06/SVE-Instructions Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D107807	2021-08-11 07:49:36 +00:00
Christopher Di Bella	c874dd5362	[llvm][clang][NFC] updates inline licence info Some files still contained the old University of Illinois Open Source Licence header. This patch replaces that with the Apache 2 with LLVM Exception licence. Differential Revision: https://reviews.llvm.org/D107528	2021-08-11 02:48:53 +00:00
David Green	013030a0b2	[AArch64] Correct sinking of shuffles to adds/subs This was checking extends as shuffles, whereas we should be checking the operands. This helps sink the shuffles, creating more addl/subl instructions. Differential Revision: https://reviews.llvm.org/D107623	2021-08-10 13:25:42 +01:00
Tim Northover	5ad0860899	AArch64: support @llvm.va_copy in GISel	2021-08-10 13:11:03 +01:00
Cullen Rhodes	81f057c253	[AArch64][SVE] NFC: Remove unused p0-p7 with element size predicates Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D107752	2021-08-10 07:56:22 +00:00
Usman Nadeem	5420fc4a27	[AArch64][SVE][InstCombine] Unpack of a splat vector -> Scalar extend Replace vector unpack operation with a scalar extend operation. unpack(splat(X)) --> splat(extend(X)) If we have both, unpkhi and unpklo, for the same vector then we may save a register in some cases, e.g: Hi = unpkhi (splat(X)) Lo = unpklo(splat(X)) --> Hi = Lo = splat(extend(X)) Differential Revision: https://reviews.llvm.org/D106929 Change-Id: I77c5c201131e3a50de1cdccbdcf84420f5b2244b	2021-08-09 14:58:54 -07:00
Usman Nadeem	85bbc05154	[AArch64][SVE][InstCombine] Move last{a,b} before binop if one operand is a splat value Move the last{a,b} operation to the vector operand of the binary instruction if the binop's operand is a splat value. This essentially converts the binop to a scalar operation. Example: // If x and/or y is a splat value: lastX (binop (x, y)) --> binop(lastX(x), lastX(y)) Differential Revision: https://reviews.llvm.org/D106932 Change-Id: I93ff5302f9a7972405ee0d3854cf115f072e99c0	2021-08-09 14:48:41 -07:00
Eli Friedman	ac20e56911	[AArch64] Implement FCOPYSIGN for SVE. I was originally going to try to implement this in target-independent code, but it's actually sort of tricky to generate the correct sequence for vectors like nxv2f32. So just stick this in target-specific code, at least for now. Differential Revision: https://reviews.llvm.org/D107608	2021-08-09 12:06:48 -07:00
Bradley Smith	73ecb9987b	[AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter The patterns for fixed length gather/scatter with 32-bit offsets and 64-bit memory type are slightly different than the rest of the patterns, as such the lowering needs to be slightly different to ensure the correct types are used. Differential Revision: https://reviews.llvm.org/D107576	2021-08-09 14:05:22 +00:00
Cullen Rhodes	1a18bb9270	[AArch64] NFC: Remove DecodeVectorRegisterClass from disassembler The decoder function and table are the same as FPR128, use that instead. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107644	2021-08-09 06:52:47 +00:00
Amara Emerson	2b067e3335	Change TargetLowering::canMergeStoresTo() to take a MF instead of DAG. DAG is unnecessary and we need this hook to implement store merging on GlobalISel too.	2021-08-06 12:57:53 -07:00
Cullen Rhodes	08bc441174	[AArch64] NFC: drop unnecessary llvm:: namespace prefix on MCInst	2021-08-06 09:23:18 +00:00
Jessica Paquette	e6a3944ea9	[AArch64][GlobalISel] Overhaul G_INSERT legalization Similar cleanup to G_EXTRACT (`51bd4e874f`). Also swap the order of clamp/widen to avoid unnecessary complex merges. Add a bunch of missing testcases to legalize-inserts while we're at it. Differential Revision: https://reviews.llvm.org/D107601	2021-08-05 18:28:22 -07:00
Jessica Paquette	562c8e14d9	[AArch64][GlobalISel] Widen G_IMPLICIT_DEF and G_FREEZE before clamping Similar to other cleanup commits which widen instructions before clamping during legalization. Purpose of this is to avoid weird type breakdowns. In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions. The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so this can help with emitting merges elsewhere. Differential Revision: https://reviews.llvm.org/D107604	2021-08-05 18:21:14 -07:00
Jessica Paquette	8a557d8311	[AArch64][GlobalISel] Widen extloads before clamping during legalization Allows us to avoid awkward type breakdowns on types like s88, like the other commits. Differential Revision: https://reviews.llvm.org/D107587	2021-08-05 16:14:06 -07:00
David Green	649cf4514d	[AArch64] Expand the SVE min/max reduction costs to NEON This takes the existing SVE costing for the various min/max reduction intrinsics and expands it to NEON, where I believe it applies equally well. In the process it changes the lowering to use min/max cost, as opposed to summing up the cost of ICmp+Select. Differential Revision: https://reviews.llvm.org/D106239	2021-08-05 23:23:24 +01:00
Jessica Paquette	36498374d4	[AArch64][GlobalISel] Widen G_BSWAP before clamping This allows us to avoid odd type breakdowns + allows us to legalize types like s88 in the first place. Add some testcases for known legal types + testcases for s4 and s88. Differential Revision: https://reviews.llvm.org/D107607	2021-08-05 15:16:00 -07:00
Jessica Paquette	51bd4e874f	[AArch64][GlobalISel] Overhaul G_EXTRACT legalization This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly changing this because it should make it easier to improve legalization for instructions which use G_EXTRACT as part of the legalization process. This also adds support for legalizing some weird types. Similar to other recent legalizer changes, this changes the order of widening/clamping. There was some dead code in our existing rules (e.g. the p0 case would never get hit), so this knocks those out and makes the types we want to handle explicit. This also removes some checks which, nowadays, are handled by the MachineVerifier. Differential Revision: https://reviews.llvm.org/D107505	2021-08-05 13:55:15 -07:00
Jon Roelofs	98f38c151b	[AArch64][GlobalISel] Legalize ctpop s128 This is re-landing the same patch again, but without the changes to LegalizerHelper that regressed the Mips test: test/CodeGen/Mips/GlobalISel/llvm-ir/ctpop.ll Differential revision: https://reviews.llvm.org/D106494	2021-08-05 11:54:53 -07:00
Jessica Paquette	f3f3098afe	[AArch64][GlobalISel] Mark v16s8 <- v8s8, v8s8 G_CONCAT_VECTOR as legal G_CONCAT_VECTORS shows up from time to time when legalizing other instructions. We actually import patterns for the v16s8 <- v8s8, v8s8 case so marking it as legal gives us selection for free. Differential Revision: https://reviews.llvm.org/D107512	2021-08-05 09:40:46 -07:00
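
Note on 95ac3d15e9: the message describes two scalarization shapes, a tree reduction when the source has a power-of-two number of elements and a sequential chain otherwise. The sketch below only illustrates those two shapes on plain Python lists. It is not the LegalizerHelper code from the patch, and the function name `scalarize_reduction` is hypothetical.

```python
from operator import or_, sub


def scalarize_reduction(lanes, op):
    """Reduce a list of scalar lanes with the binary operation `op`.

    Hypothetical illustration only: a power-of-two lane count is combined
    pairwise as a tree, any other count as a left-to-right chain.
    """
    n = len(lanes)
    if n == 0:
        raise ValueError("need at least one lane")

    if n & (n - 1) == 0:
        # Power-of-two lane count: combine adjacent pairs until one value is left.
        work = list(lanes)
        while len(work) > 1:
            work = [op(work[i], work[i + 1]) for i in range(0, len(work), 2)]
        return work[0]

    # Otherwise: sequential chain, ((l0 op l1) op l2) op ...
    acc = lanes[0]
    for lane in lanes[1:]:
        acc = op(acc, lane)
    return acc


if __name__ == "__main__":
    print(scalarize_reduction([1, 2, 4, 8], or_))   # tree shape, 4 lanes
    print(scalarize_reduction([1, 2, 4], or_))      # chain shape, 3 lanes
    # A non-associative op makes the shape visible: (10 - 3) - (2 - 1).
    print(scalarize_reduction([10, 3, 2, 1], sub))
```

For an associative operation such as OR both shapes give the same value; the tree form has a shorter dependency chain (log2 of the lane count instead of the lane count minus one).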

5409 Commits