llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	a35c2c7942	[GlobalISel] Implement fewerElements legalization for vector reductions. This patch adds 3 methods, one for power-of-2 vectors which use tree reductions using vector ops, before a final reduction op. For non-pow-2 types it generates multiple narrow reductions and combines the values with scalar ops. Differential Revision: https://reviews.llvm.org/D97163	2021-03-30 11:19:21 -07:00
Amara Emerson	1bc90847ee	[AArch64][GlobalISel] Define some legalization rules for G_ROTR and G_ROTL. For imported pattern purposes, we have a custom rule that promotes the rotate amount to 64b as well. Differential Revision: https://reviews.llvm.org/D99463	2021-03-30 11:11:19 -07:00
Amara Emerson	91887cd4ec	[AArch64][GlobalISel] Combine funnel shifts to rotates. Differential Revision: https://reviews.llvm.org/D99388	2021-03-30 11:00:36 -07:00
Jessica Paquette	700431128e	[GlobalISel][AArch64] Combine G_SEXT_INREG + right shift -> G_SBFX Basically a port of isBitfieldExtractOpFromSExtInReg in AArch64ISelDAGToDAG. This is only done post-legalization for now. Once the legalizer knows how to decompose these back into shifts, this requirement can probably be removed. Differential Revision: https://reviews.llvm.org/D99230	2021-03-30 10:14:30 -07:00
Joe Ellis	a7dde4c5f7	[AArch64][SVE] Lower fixed length INSERT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98496	2021-03-30 09:37:11 +00:00
Joe Ellis	c4d39f64d0	[AArch64][SVE] Lower fixed length EXTRACT_VECTOR_ELT Differential Revision: https://reviews.llvm.org/D98625	2021-03-30 09:35:44 +00:00
Jun Ma	1af373c673	[AArch64][SVE] Codegen dup_lane for dup(vector_extract) Differential Revision: https://reviews.llvm.org/D99324	2021-03-30 10:35:08 +08:00
Jun Ma	b0db2dbc29	[AArch64][SVEIntrinsicOpts] Optimize tbl+dup into dup+extractelement Differential Revision: https://reviews.llvm.org/D99412	2021-03-30 10:35:08 +08:00
Jessica Paquette	247ff26a89	[AArch64][GlobalISel] NFC: Replace IR regbankselect test with MIR test regbank-ceil.ll -> regbank-ceil.mir The IR test was intended to only check register banks. This makes it brittle, especially as we improve load/store combines in GlobalISel. Rewriting this as a MIR test also makes it more consistent with the rest of the testcases in GlobalISel.	2021-03-29 16:32:34 -07:00
Florian Hahn	482283042f	[AArch64] Remove custom zext/sext legalization code. Currently performExtendCombine assumes that the src-element bitwidth * 2 is a valid MVT. But this is not the case for i1 and it causes a crash on the v64i1 test cases added in this patch. It turns out that this code appears to not be needed; the same patterns are handled by other code and we end up with the same results, even without the custom lowering. I also added additional test cases in `a50037aaa6`. Let's just remove the unneeded code. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D99437	2021-03-29 22:22:05 +01:00
Florian Hahn	a50037aaa6	[AArch64] Add a few more vector extension tests.	2021-03-29 18:56:00 +01:00
Bradley Smith	9745dce8c3	[SelectionDAG][AArch64][SVE] Perform SETCC condition legalization in LegalizeVectorOps This is currently performed in SelectionDAGLegalize, here we make it also happen in LegalizeVectorOps, allowing a target to lower the SETCC condition codes first in LegalizeVectorOps and then lower to a custom node afterwards, without having to duplicate all of the SETCC condition legalization in the target specific lowering. As a result of this, fixed length floating point SETCC nodes can now be properly lowered for SVE. Differential Revision: https://reviews.llvm.org/D98939	2021-03-29 15:32:25 +01:00
Matt Arsenault	2f779e79d5	AArch64/GlobalISel: Remove IR section from test	2021-03-28 11:12:59 -04:00
Amara Emerson	55533203d7	[GlobalISel] Add G_ROTR and G_ROTL opcodes for rotates. Differential Revision: https://reviews.llvm.org/D99383	2021-03-25 17:23:30 -07:00
Jessica Paquette	23f657c165	[AArch64][GlobalISel] Emit bzero on Darwin Darwin platforms for both AArch64 and X86 can provide optimized `bzero()` routines. In this case, it may be preferable to use `bzero` in place of a memset of 0. This adds a G_BZERO generic opcode, similar to G_MEMSET et al. This opcode can be generated by platforms which may want to use bzero. To emit the G_BZERO, this adds a pre-legalize combine for AArch64. The conditions for this are largely a port of the bzero case in `AArch64SelectionDAGInfo::EmitTargetCodeForMemset`. The only difference in comparison to the SelectionDAG code is that, when compiling for minsize, this will fire for all memsets of 0. The original code notes that it's not beneficial to do this for small memsets; however, using bzero here will save a mov from wzr. For minsize, I think that it's preferable to prioritise omitting the mov. This also fixes a bug in the libcall legalization code which would delete instructions which could not be legalized. It also adds a check to make sure that we actually get a libcall name. Code size improvements (Darwin): - CTMark -Os: -0.0% geomean (-0.1% on pairlocalalign) - CTMark -Oz: -0.2% geomean (-0.5% on bullet) Differential Revision: https://reviews.llvm.org/D99358	2021-03-25 17:14:25 -07:00
Amara Emerson	0d2c4db637	[GlobalISel] Fix crash in RBS with a non-generic IMPLICIT_DEF. This may occur when swifterror codegen in the translator generates these, but we shouldn't try to handle them since they should have regclasses anyway. rdar://75784009 Differential Revision: https://reviews.llvm.org/D99287	2021-03-24 23:08:51 -07:00
Jessica Paquette	a141c7d06b	[AArch64][GlobalISel] Select G_SBFX and G_UBFX Add selection support for G_SBFX and G_UBFX and add a test. These must always have a constant LSB and width. Differential Revision: https://reviews.llvm.org/D99224	2021-03-24 11:15:57 -07:00
Jessica Paquette	1818dc394f	[AArch64][GlobalISel] Mark G_SBFX/G_UBFX as legal for s32 and s64 This isn't perfect, since we should also verify that these only use constants. Differential Revision: https://reviews.llvm.org/D99219	2021-03-24 11:08:41 -07:00
Nashe Mncube	ac2a1e9596	[SVE] Suppress vselect warning from incorrect interface call The VSelectCombine handler within AArch64ISelLowering, uses an interface call which only expects fixed vectors. This generates a warning when the call is made on a scalable vector. This warning has been suppressed with this change, by using the ElementCount interface, which supports both fixed and scalable vectors. I have also added a regression test which recreates the warning. Differential Revision: https://reviews.llvm.org/D98249	2021-03-24 14:34:34 +00:00
Amara Emerson	45a7fe1911	[AArch64][GlobalISel] Add test for G_FSHR legalization.	2021-03-23 16:11:45 -07:00
Amara Emerson	7bddf00581	[AArch64][GlobalISel] Lower G_FSHL and G_FSHR. Codegen isn't as good as we need it, but that'll be done later.	2021-03-23 16:09:19 -07:00
Amara Emerson	75b6a47bd0	[AArch64][GlobalISel] Lower G_CTLZ_ZERO_UNDEF. This adds some missing legalizer tests, which uncovered a v2s64 selection test that wasn't working since there's no legalization or instruction for that.	2021-03-23 12:49:10 -07:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
Joe Ellis	6dc32da1b0	[AArch64][SVE] Test more types in sve-fixed-length-subvector.ll Previously only the i32 type was tested. Now, the {i,f}{16,32,64} types are tested. The v8{i,f}16 cases lower differently to the other cases, which is worth defending. The lowering for the other cases is currently identical, but probably worth having for the better coverage. Differential Revision: https://reviews.llvm.org/D98690	2021-03-22 14:09:05 +00:00
Sjoerd Meijer	7515e81e8c	[AArch64] Add some float -> int -> float conversion patterns This adds some conversion match patterns for which we want to keep the int values in FP registers using the corresponding NEON instructions (not the FP instructions) to avoid more costly int <-> fp register transfers. Differential Revision: https://reviews.llvm.org/D98956	2021-03-22 11:06:08 +00:00
Jessica Paquette	0ca83730cc	Recommit "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE" This reverts commit `962b73dd0f`. This commit was reverted because of some internal SPEC test failures. It turns out that this wasn't actually relevant to anything in open source, so it's safe to recommit this.	2021-03-18 16:01:02 -07:00
Peter Waller	0d6482a76a	[llvm][AArch64][SVE] Lower fixed length vector fabs Seemingly straightforward. Differential Revision: https://reviews.llvm.org/D98434	2021-03-18 17:20:08 +00:00
Matt Arsenault	61f834cc09	GlobalISel: Insert memcpy for outgoing byval arguments byval requires an implicit copy between the caller and callee such that the callee may write into the stack area without it modifying the value in the parent. Previously, this was passing through the raw pointer value which would break if the callee wrote into it. Most of the time, this copy can be optimized out (however we don't have the optimization SelectionDAG does yet). This will trigger more fallbacks for AMDGPU now, since we don't have legalization for memcpy yet (although we should stop using byval anyway).	2021-03-18 09:16:54 -04:00
Thomas Preud'homme	b79044391e	[test] Fix incorrect use of string variable use LLVM test CodeGen/AArch64/machine-outliner-retaddr-sign-thunk.ll uses a string substitution block that contains a regex matching block. This seems like a copy/paste from another similar test where the match also defines a variable, hence the [[]] syntax. In this case however this is a CHECK-NOT variable so nothing should match. No variable definition is thus expected and the square brackets can be dropped. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D98853	2021-03-18 12:19:51 +00:00
Sjoerd Meijer	90ecb862a0	[AArch64] Rewrite (add, csel) to cinc Don't rewrite an add instruction with 2 SET_CC operands into a csel instruction. The total instruction sequence uses an extra instruction and register. Preventing this allows us to match a `(add, csel)` pattern and rewrite this into a `cinc`. Differential Revision: https://reviews.llvm.org/D98704	2021-03-18 08:49:27 +00:00
Amara Emerson	28963d895b	[GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers. These are pseudos without any users, so DCE was killing them in the combiner. Marking them as having side effects doesn't seem quite right since they don't. Gives a nice 0.3% geomean size win on CTMark -Os. Differential Revision: https://reviews.llvm.org/D98811	2021-03-17 18:02:08 -07:00
Amara Emerson	d7fed7b899	[AArch64][GlobalISel] Fall back if disabling neon/fp in the translator. The previous technique relied on early-exiting the legalizer predicate initialization, leaving an empty rule table. That causes a fallback for most instructions, but some have legacy rules defined like G_ZEXT which can try to continue, but then crash. We should fall back earlier, in the translator, to avoid this issue. Differential Revision: https://reviews.llvm.org/D98730	2021-03-17 15:08:08 -07:00
Pavel Iliin	bd79b565e3	[NFC][AArch64] Add codegen tests for various csinc-cmp sequences.	2021-03-17 20:17:40 +00:00
Bradley Smith	cf0da91ba5	[AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE Previously NEON used a target specific intrinsic for frintn, given that the FROUNDEVEN ISD node now exists, move over to that instead and add codegen support for that node for both NEON and fixed length SVE. Differential Revision: https://reviews.llvm.org/D98487	2021-03-17 11:41:22 +00:00
Joe Ellis	ff2dd8a212	[AArch64][SVE] Fold vector ZExt/SExt into gather loads where possible This commit folds sxtw'd or uxtw'd offsets into gather loads where possible with a DAGCombine optimization. As an example, the following code: #include <arm_sve.h> svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) { return svld1sw_gather_s64offset_u64(pred, base, svextw_s64_x(pred, offsets)); } would previously lower to the following assembly: sxtw z0.d, p0/m, z0.d ld1sw { z0.d }, p0/z, [x0, z0.d] ret but now lowers to: ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw] ret Differential Revision: https://reviews.llvm.org/D97858	2021-03-16 15:09:46 +00:00
Joe Ellis	14bd44edc6	[AArch64][SVEIntrinsicOpts] Factor out redundant SVE mul/fmul intrinsics This commit implements an IR-level optimization to eliminate idempotent SVE mul/fmul intrinsic calls. Currently, the following patterns are captured: fmul pg (dup_x 1.0) V => V mul pg (dup_x 1) V => V fmul pg V (dup_x 1.0) => V mul pg V (dup_x 1) => V fmul pg V (dup v pg 1.0) => V mul pg V (dup v pg 1) => V The result of this commit is that code such as: #include <arm_sve.h> svfloat64_t foo(svfloat64_t a) { svbool_t t = svptrue_b64(); svfloat64_t b = svdup_f64(1.0); return svmul_m(t, a, b); } will lower to a nop. This commit does not capture all possibilities; only the simple cases described above. There is still room for further optimisation. Differential Revision: https://reviews.llvm.org/D98033	2021-03-16 14:50:17 +00:00
Amara Emerson	9575c48b89	[AArch64][GlobalISel] Fix crash on lowering <1 x half> types.	2021-03-15 23:27:43 -07:00
Stelios Ioannou	ab86edbc88	[AArch64] Implement __rndr, __rndrrs intrinsics This patch implements the __rndr and __rndrrs intrinsics to provide access to the random number instructions introduced in Armv8.5-A. They are only defined for the AArch64 execution state and are available when __ARM_FEATURE_RNG is defined. These intrinsics store the random number in their pointer argument and return a status code if the generation succeeded. The difference between __rndr and __rndrrs is that the latter intrinsic reseeds the random number generator. The instructions write the NZCV flags indicating the success of the operation that we can then read with a CSET. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics [2] https://bugs.llvm.org/show_bug.cgi?id=47838 Differential Revision: https://reviews.llvm.org/D98264 Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc	2021-03-15 17:51:48 +00:00
David Green	0b2aae42e5	[AArch64] Zero extended extract_vector_elt pattern This adds a pattern for i64 zext_inreg(i32 extract_vector_elt X), producing a single UMOVvi16 instruction that is already expected to clear the top bits. The exact pattern that this matches is and(anyext(vector_extract X, lane), 0xff), similar to the sext patterns higher up in the same file. Differential Revision: https://reviews.llvm.org/D98599	2021-03-15 14:56:20 +00:00
Bradley Smith	d09ae9328f	[AArch64][SVE] Add unpredicated ld1/st1 patterns for reg+reg addressing modes Differential Revision: https://reviews.llvm.org/D95677	2021-03-15 12:36:28 +00:00
David Green	b0b9126897	[AArch64] Expand build-vector-extract.ll tests to i8's. NFC	2021-03-14 15:29:14 +00:00
Simonas Kazlauskas	a2eca31da2	Test cases for rem-seteq fold with illegal types This also briefly tests a larger set of architectures than the more exhaustive functionality tests for AArch64 and x86. As requested in D88785 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D98339	2021-03-12 16:28:04 +02:00
Matt Arsenault	6b76d82853	GlobalISel: Fix marking byval arguments as immutable byval arguments need to be assumed writable. Only implicitly stack passed arguments which aren't addressable in the IR can be assumed immutable. Mips is still broken since for some reason it's doing its own thing with the ValueHandlers (and x86 doesn't actually handle byval arguments now, although some of the code is there).	2021-03-12 09:01:53 -05:00
Matt Arsenault	34471c3060	GlobalISel: Partially fix handling of byval arguments This was essentially ignoring byval and treating them as a pointer argument which needed to be loaded from. This should copy the frame index value to the virtual register, not insert a load from the frame index into the pointer value. For AMDGPU, this was producing a load from the byval pointer argument, to a pointer used for the byval arguments. I do not understand how AArch64 managed to work before since it appears to be similarly broken. We could also change the ValueHandler API to avoid the extra copy from the frame index, since currently it returns a new register. I believe there is still an issue with outgoing byval arguments. These should have a copy inserted in case the callee decided to overwrite the memory.	2021-03-12 09:01:53 -05:00
Matt Arsenault	d44a3dad99	AArch64/GlobalISel: Don't use common prefix in test Unlike update_llc_test_checks, update_mir_test_checks isn't actually smart enough to common functions with identical output.	2021-03-12 09:01:52 -05:00
Bradley Smith	860ae9d50c	[AArch64][SVE] Add fixed/scalable lowering of FMAXIMUM/FMINIMUM ISD nodes Differential Revision: https://reviews.llvm.org/D98348	2021-03-11 13:37:47 +00:00
Bradley Smith	ea834c8365	Revert "[AArch64][SVE] Allow accesses to SVE stack objects to use frame pointer" This patch introduced codegen faults. An attempt to fix this was done in https://reviews.llvm.org/D97193, but ultimately it was decided to approach this differently. This reverts commit `42635856ed`. Differential Revision: https://reviews.llvm.org/D98350	2021-03-11 13:32:35 +00:00
Daniel Sanders	134a179dee	[mir] Change 'undef' for MMO base addresses to 'unknown-address' Differential Revision: https://reviews.llvm.org/D98100	2021-03-10 16:46:44 -08:00
David Green	1a808286ef	[AArch64] Extend vecreduce -> udot handling to mla reductions We previously have lowering for: vecreduce.add(zext(X)) to vecreduce.add(UDOT(zero, X, one)) This extends that to also handle: vecreduce.add(mul(zext(X), zext(Y))) to vecreduce.add(UDOT(zero, X, Y)) It extends the existing code to optionally handle a mul with equal extends. Differential Revision: https://reviews.llvm.org/D97280	2021-03-10 22:25:12 +00:00
David Green	a02f506876	[AArch64] Extend vecreduce -> udot handling to v8i8 https://reviews.llvm.org/D88577 added v16i8 vecreduce to udot/sdot lowering. This extends that to v8i8 too, generalizing the pattern to handle the extra types. Differential Revision: https://reviews.llvm.org/D97279	2021-03-10 21:03:15 +00:00
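
The fewerElements reduction strategy described in commit a35c2c7942 above can be illustrated with a rough sketch. This is not LLVM code: it models vectors as Python lists, assumes an integer add reduction (reordering a floating-point reduction could change the result), and uses hypothetical function names. It shows only the two shapes the message names: a tree of element-wise vector ops followed by one final reduction for power-of-2 lengths, and several narrow reductions combined with scalar ops otherwise.

```python
from functools import reduce
import operator


def tree_reduce_pow2(vec, narrow_width, op=operator.add):
    """Power-of-2 case: halve the vector with element-wise ops until it is
    narrow_width wide, then perform a single final reduction."""
    n = len(vec)
    # Both widths must be powers of two so that halving lands on narrow_width.
    assert n > 0 and n & (n - 1) == 0
    assert narrow_width > 0 and narrow_width & (narrow_width - 1) == 0
    assert n >= narrow_width
    while len(vec) > narrow_width:
        half = len(vec) // 2
        # One "vector op": combine the low half with the high half lane by lane.
        vec = [op(a, b) for a, b in zip(vec[:half], vec[half:])]
    # Final reduction on the narrow vector.
    return reduce(op, vec)


def chunked_reduce(vec, narrow_width, op=operator.add):
    """Non-power-of-2 case: reduce each narrow piece separately, then combine
    the partial results with scalar ops."""
    partials = [
        reduce(op, vec[i:i + narrow_width])
        for i in range(0, len(vec), narrow_width)
    ]
    return reduce(op, partials)
```

For example, `tree_reduce_pow2(list(range(16)), 4)` performs two halving steps (16 to 8 to 4 lanes) before the final reduction, while `chunked_reduce(list(range(6)), 4)` reduces a 4-lane piece and a 2-lane piece and adds the two scalars.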

4491 Commits