llvm-project

Commit Graph

Author	SHA1	Message	Date
Guillaume Chatelet	189d2e215f	[Alignment][NFC] Use more Align versions of various functions Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: MatzeB, qcolombet, arsenm, sdardis, jvesely, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77291	2020-04-02 09:00:53 +00:00
Guillaume Chatelet	3a78f44daf	[Alignment][NFC] Convert SelectionDAG::InferPtrAlignment to MaybeAlign Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77212	2020-04-01 13:22:11 +00:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Guillaume Chatelet	b9810988b2	[Alignment][NFC] Transitionning more getMachineMemOperand call sites Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77127	2020-03-31 11:04:10 +00:00
Guillaume Chatelet	c9d5c19597	[Alignment][NFC] Transitionning more getMachineMemOperand call sites Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77121	2020-03-31 08:36:18 +00:00
Guillaume Chatelet	b91535f6c7	[Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment Summary: Also deprecate getOriginalAlignment, getAlignment will take much more time as it is pervasive through the codebase (including TableGened files). This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76933	2020-03-30 07:26:48 +00:00
David Green	c9eaed5149	[ARM] MVE VMOV.i64 In the original batch of MVE VMOVimm code generation VMOV.i64 was left out due to the way it was done downstream. It turns out that it's fairly simple though. This adds the codegen for it, similar to NEON. Bigendian is technically incorrect in this version, which John is fixing in a Neon patch.	2020-03-30 07:44:23 +01:00
David Green	37b9cc8f29	[ARM] Sink splats to vector float instructions Some MVE floating point instructions have gpr register variants that take the scalar gpr value and splat them to all lanes. In order to accept them in loops, the shuffle_vector and insert need to be sunk down into the loop, next to the instruction so that ISel can see the whole pattern. This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for mul are slightly more constrained as there are no fms variants taking register arguments. Differential Revision: https://reviews.llvm.org/D76023	2020-03-26 09:02:18 +00:00
David Green	f8c79b94af	[ARM] Fold VMOVrh VLDR to LDRH This adds a simple fold to combine VMOVrh load to a integer load. Similar to what is already performed for BITCAST, but needs to account for the types being of different sizes, creating an zero extending load. Differential Revision: https://reviews.llvm.org/D76485	2020-03-24 15:51:03 +00:00
David Green	1232cfa385	[ARM] Don't split trunc stores that can be better handled as VMOVN We deliberately split stores of the form store(truncate(larger-than-legal-type)) into two stores, allowing each store to perform part of the truncate for free. There are times however where it makes more sense to use VMOVN to de-interlace the results back into a single vector, and store that in one go. This adds a check for that situation, not splitting the store if it looks like a VMOVN can be more useful. Differential Revision: https://reviews.llvm.org/D76511	2020-03-24 08:48:52 +00:00
Simon Tatham	1adfa4c991	[ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family. Summary: I've implemented them as target-specific IR intrinsics rather than using `@llvm.experimental.vector.reduce.add`, on the grounds that the 'experimental' intrinsic doesn't currently have much code generation benefit, and my replacements encapsulate the sign- or zero-extension so that you don't expose the illegal MVE vector type (`<4 x i64>`) in IR. The machine instructions come in two versions: with and without an input accumulator. My new IR intrinsics, like the 'experimental' one, don't take an accumulator parameter: we represent that by just adding on the input value using an ordinary i32 or i64 add. So if you write the `vaddvaq` C-language intrinsic with an input accumulator of zero, it can be optimised to VADDV, and conversely, if you write something like `x += vaddvq(y)` then that can be combined into VADDVA. Most of this is achieved in isel lowering, by converting these IR intrinsics into the existing `ARMISD::VADDV` family of custom SDNode types. For the difficult case (64-bit accumulators), isel lowering already implements the optimization of folding an addition into a VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is already done, except that I had to introduce a parallel set of ARMISD nodes for the //predicated// forms of VADDLV. For the simpler VADDV, we handle the predicated form by just leaving the IR intrinsic alone and matching it in an ordinary dag pattern. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76491	2020-03-20 15:42:33 +00:00
Simon Tatham	45a9945b9e	[ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family. Summary: I've implemented these as target-specific IR intrinsics, because they're not //quite// enough like @llvm.experimental.vector.reduce.min (which doesn't take the extra scalar parameter). Also this keeps the predicated and unpredicated versions looking similar, and the floating-point minnm/maxnm versions fold into the same schema. We had a couple of min/max reductions already implemented, from the initial pathfinding exercise in D67158. Those were done by having separate IR intrinsic names for the signed and unsigned integer versions; as part of this commit, I've changed them to use a flag parameter indicating signedness, which is how we ended up deciding that the rest of the MVE intrinsics family ought to work. So now hopefully the ewhole lot is consistent. In the new llc test, the output code from the `v8f16` test functions looks quite unpleasant, but most of it is PCS lowering (you can't pass a `half` directly in or out of a function). In other circumstances, where you do something else with your `half` in the same function, it doesn't look nearly as nasty. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: MarkMurrayARM Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76490	2020-03-20 15:42:33 +00:00
David Green	b3499f572d	[ARM] Change VDUP type to i32 for MVE The MVE VDUP instruction take a GPR and splats into every lane of a vector register. Unlike NEON we do not have a VDUPLANE equivalent instruction, doing the same splat from a fp register. Previously a VDUP to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which would mean the instruction pattern needs to add a COPY_TO_REGCLASS to the GPR. Instead this now converts that earlier during an ISel DAG combine, converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction selection to tell that the input needs to be an i32, which in one of the testcases allows it to use ldr (or specifically ldm) over (vldr;vmov). Whilst being simple enough for floats, as the types sizes are the same, these is no BITCAST equivalent for getting a half into a i32. This uses a VMOVrh ARMISD node, which doesn't know the same tricks yet. Differential Revision: https://reviews.llvm.org/D76292	2020-03-20 09:48:45 +00:00
Eli Friedman	e24e95fe90	Remove CompositeType class. The existence of the class is more confusing than helpful, I think; the commonality is mostly just "GEP is legal", which can be queried using APIs on GetElementPtrInst. Differential Revision: https://reviews.llvm.org/D75660	2020-03-18 13:53:17 -07:00
Oliver Stannard	73cea83a6f	[IPRA][ARM] Spill extra registers at -Oz When optimising for code size at the expense of performance, it is often worth saving and restoring some of r0-r3, if IPRA will be able to take advantage of them. This doesn't cost any extra code size if we already have a PUSH/POP pair, and increases the number of available registers across any calls to the function. We already have an optimisation which tries fold the subtract/add of the SP into the PUSH/POP by using extra registers, which somewhat conflicts with this. I've made the new optimisation less aggressive in cases where the existing one is likely to trigger, which gives better results than either of these optimisations by themselves. Differential revision: https://reviews.llvm.org/D69936	2020-03-18 13:51:16 +00:00
Simon Tatham	928776de92	[ARM,MVE] Add intrinsics for the VQDMLAH family. Summary: These are complicated integer multiply+add instructions with extra saturation, taking the high half of a double-width product, and optional rounding. There's no sensible way to represent that in standard IR, so I've converted the clang builtins directly to target-specific intrinsics. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: miyuki Subscribers: kristof.beyls, hiraditya, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76123	2020-03-18 10:55:04 +00:00
Simon Tatham	28c5d97bee	[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. Summary: These instructions compute multiply+add in integers, with one of the operands being a splat of a scalar. (VMLA and VMLAS differ in whether the splat operand is a multiplier or the addend.) I've represented these in IR using existing standard IR operations for the unpredicated forms. The predicated forms are done with target- specific intrinsics, as usual. When operating on n-bit vector lanes, only the bottom n bits of the i32 scalar operand are used. So we have to tell that to isel lowering, to allow it to remove a pointless sign- or zero-extension instruction on that input register. That's done in `PerformIntrinsicCombine`, but first I had to enable `PerformIntrinsicCombine` for MVE targets (previously all the intrinsics it handled were for NEON), and make it a method of `ARMTargetLowering` so that it can get at `SimplifyDemandedBits`. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76122	2020-03-18 10:55:04 +00:00
David Green	2c6c169dbd	[ARM] Optimise ASRL/LSRL to smaller shifts using demand bits. The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have them, it might turn out that enough of the 64bit result was not required that we can use a smaller shift to perform the same result. As the smaller shift can in general be folded in more way, such as into add instructions in one of the test cases here, we can use the demand bit analysis to prefer the smaller shifts where we can. Differential Revision: https://reviews.llvm.org/D75371	2020-03-13 10:09:03 +00:00
David Green	f67d93dc23	[ARM] Constant long shift combines This changes the way that asrl and lsrl intrinsics are lowered, going via a the ISEL ASRL and LSLL nodes instead of straight to machine nodes. On top of that, it adds some constant folds for long shifts, in case it turns out that the shift amount was either constant or 0. Differential Revision: https://reviews.llvm.org/D75553	2020-03-13 08:54:59 +00:00
Victor Campos	8a12553223	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-03-11 10:19:27 +00:00
James Greenhalgh	f0de8d0940	[Arm] Do not lower vmax/vmin to Neon instructions On some Arm cores there is a performance penalty when forwarding from an S register to a D register. Calculating VMAX in a D register creates false forwarding hazards, so don't do that unless we're on a core which specifically asks for it. Patch by James Greenhalgh Differential Revision: https://reviews.llvm.org/D75248	2020-03-10 10:48:48 +00:00
Djordje Todorovic	c15c68abdc	[CallSiteInfo] Enable the call site info only for -g + optimizations Emit call site info only in the case of '-g' + 'O>0' level. Differential Revision: https://reviews.llvm.org/D75175	2020-03-09 12:12:44 +01:00
David Green	13f2a5883f	[ARM] Fixup FP16 bitcasts Under fp16 we optimise the bitcast between a VMOVhr and a CopyToReg via custom lowering. This rewrites that to be a DAG combine instead, which helps produce better code in the cases where the bitcast is actaully legal. Differential Revision: https://reviews.llvm.org/D72753	2020-02-27 12:19:31 +00:00
Craig Topper	735d27dc40	[SelectionDAG][PowerPC][AArch64][X86][ARM] Add chain input and output the ISD::FLT_ROUNDS_ This node reads the rounding control which means it needs to be ordered properly with operations that change the rounding control. So it needs to be chained to maintain order. This patch adds a chain input and output to the node and connects it to the chain in SelectionDAGBuilder. I've update all in-tree targets to connect their chain through their lowering code. Differential Revision: https://reviews.llvm.org/D75132	2020-02-25 16:58:23 -08:00
Hans Wennborg	decd021fac	Don't generate libcalls for wide shift on Windows ARM (PR42711) The previous patch (`cff90f07cb`) didn't cover ARM.	2020-02-25 11:54:07 +01:00
Sam Parker	03756a4197	[ARM][MVE] Combine more extending masked loads For MVE, don't look at the users of the extending loads so that more as desirable for folding. Differential Revision: https://reviews.llvm.org/D74958	2020-02-24 07:50:15 +00:00
David Green	83012cb217	[ARM] Correct Formatting. NFC Also removed an unnecessary TODO that I don't believe is relevant for the instruction in question.	2020-02-21 16:08:56 +00:00
Djordje Todorovic	2f215cf36a	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGfaff707db82d. A failure found on an ARM 2-stage buildbot. The investigation is needed.	2020-02-20 14:41:39 +01:00
David Green	33aa5dfe9c	[ARM] VMLAVA reduction patterns Similar to VADDV and VADDLV that have been added recently, this adds lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They perform the same roles as the add's, just folding a mul into the same instruction (and so taking two inputs). As such, they need to be lowered in the same way as the types are often not legal. Differential Revision: https://reviews.llvm.org/D74390	2020-02-19 12:39:58 +00:00
David Green	fceb3e3b4a	[ARM] MVE VADDLV lowering Following on from the extra VADDV lowering, this extends things to handle VADDLV which allows summing values into a pair of i32 registers, together treated as a i64. This needs to be done in DAGCombine too as the types are otherwise illegal, which is a fairly simple addition on top of the existing code. There is also a VADDLVA instruction handled here, that adds the incoming values from the two general purpose registers. As opposed to the non-long version where we could just add patterns for add(x, VADDV), the long version needs to handle this early before the i64 has being split into too many pieces. Differential Revision: https://reviews.llvm.org/D74224	2020-02-19 11:07:20 +00:00
Djordje Todorovic	faff707db8	Reland "[DebugInfo] Enable the debug entry values feature by default" Differential Revision: https://reviews.llvm.org/D73534	2020-02-19 11:12:26 +01:00
David Green	51c6e9445c	[ARM] Extra MVE VADDV reduction patterns We already make use of the VADDV vector reduction instruction for cases where the input and the output start out at the same type. The MVE instruction however will sum into an i32, so if we are summing a v16i8 into an i32, we can still use the same instructions. In terms of IR, this looks like a sext of a legal type (v16i8) into a very illegal type (v16i32) and a vecreduce.add of that into the result. This means we have to catch the pattern early in a DAG combine, producing a target VADDVs/u node, where the signedness is now important. This is the first part, handling VADDV and VADDVA. There are also VADDVL/VADDVLA instructions, which are interesting because they sum into a 64bit value. And VMLAV and VMLALV, which are interesting because they also do a multiply of two values. It may look a little odd in places as a result. On it's own this will probably not do very much, as the vectorizer will not produce this IR yet. Differential Revision: https://reviews.llvm.org/D74218	2020-02-19 09:45:35 +00:00
Djordje Todorovic	2bf44d11cb	Revert "Reland "[DebugInfo] Enable the debug entry values feature by default"" This reverts commit rGa82d3e8a6e67.	2020-02-18 16:38:11 +01:00
Djordje Todorovic	a82d3e8a6e	Reland "[DebugInfo] Enable the debug entry values feature by default" This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-18 14:41:08 +01:00
John Brawn	594a89f727	[FPEnv][ARM] Don't call mutateStrictFPToFP when lowering mutateStrictFPToFP can delete the node and replace it with another with the same value which can later cause problems, and returning the result of mutateStrictFPToFP doesn't work because SelectionDAGLegalize expects that the returned value has the same number of results as the original. Instead handle things by doing the mutation manually. Differential Revision: https://reviews.llvm.org/D74726	2020-02-17 18:19:25 +00:00
Benjamin Kramer	5fc5c7db38	Strength reduce vectors into arrays. NFCI.	2020-02-17 15:37:35 +01:00
John Brawn	0ec5797296	[ARM] Fix infinite loop when lowering STRICT_FP_EXTEND If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause in infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering. Differential Revision: https://reviews.llvm.org/D74559	2020-02-13 16:12:50 +00:00
David Green	9d4c597541	[ARM] Fix ReconstructShuffle for bigendian Simon pointed out that this function is doing a bitcast, which can be incorrect for big endian. That makes the lowering of VMOVN in MVE wrong, but the function is shared between Neon and MVE so both can be incorrect. This attempts to fix things by using the newly added VECTOR_REG_CAST instead of the BITCAST. As it may now be used on Neon, I've added the relevant patterns for it there too. I've also added a quick dag combine for it to remove them where possible. Differential Revision: https://reviews.llvm.org/D74485	2020-02-13 09:56:46 +00:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
Victor Campos	af2a384581	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `60e0120c91`.	2020-02-08 13:18:45 +00:00
Guillaume Chatelet	f85d3408e6	[NFC] Introduce an API for MemOp Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73964	2020-02-07 11:32:27 +01:00
Guillaume Chatelet	b8144c0536	[NFC] Encapsulate MemOp logic Summary: This patch simply introduces functions instead of directly accessing the fields. This helps introducing additional check logic. A second patch will add simplifying functions. Reviewers: courbet Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73945	2020-02-04 10:36:26 +01:00
Guillaume Chatelet	333f2ad8b8	[Alignment][NFC] Use Align for getMemcpy/Memmove/Memset Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73885	2020-02-03 17:13:19 +01:00
John Brawn	b37d59353f	[FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194	2020-02-03 12:59:12 +00:00
Simon Tatham	961530fdc9	[ARM,MVE] Fix vreinterpretq in big-endian mode. Summary: In big-endian MVE, the simple vector load/store instructions (i.e. both contiguous and non-widening) don't all store the bytes of a register to memory in the same order: it matters whether you did a VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register formats of different vector types relate to each other in a different way from the in-memory formats. So, if you want to 'bitcast' or 'reinterpret' one vector type as another, you have to carefully specify which you mean: did you want to reinterpret the //register// format of one type as that of the other, or the //memory// format? The ACLE `vreinterpretq` intrinsics are specified to reinterpret the register format. But I had implemented them as LLVM IR bitcast, which is specified for all types as a reinterpretation of the memory format. So a `vreinterpretq` intrinsic, applied to values already in registers, would code-generate incorrectly if compiled big-endian: instead of emitting no code, it would emit a `vrev`. To fix this, I've introduced a new IR intrinsic to perform a register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's implemented by a trivial isel pattern that expects the input in an MQPR register, and just returns it unchanged. In the clang codegen, I only emit this new intrinsic where it's actually needed: I prefer a bitcast wherever it will have the right effect, because LLVM understands bitcasts better. So we still generate bitcasts in little-endian mode, and even in big-endian when you're casting between two vector types with the same lane size. For testing, I've moved all the codegen tests of vreinterpretq out into their own file, so that they can have a different set of RUN lines to check both big- and little-endian. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73786	2020-02-03 11:20:06 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Benjamin Kramer	adcd026838	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Simon Tatham	4321c6af28	[ARM,MVE] Support immediate vbicq,vorrq,vmvnq intrinsics. Summary: Immediate vmvnq is code-generated as a simple vector constant in IR, and left to the backend to recognize that it can be created with an MVE VMVN instruction. The predicated version is represented as a select between the input and the same constant, and I've added a Tablegen isel rule to turn that into a predicated VMVN. (That should be better than the previous VMVN + VPSEL: it's the same number of instructions but now it can fold into an adjacent VPT block.) The unpredicated forms of VBIC and VORR are done by enabling the same isel lowering as for NEON, recognizing appropriate immediates and rewriting them as ARMISD::VBICIMM / ARMISD::VORRIMM SDNodes, which I then instruction-select into the right MVE instructions (now that I've also reworked those instructions to use the same MC operand encoding). In order to do that, I had to promote the Tablegen SDNode instance `NEONvorrImm` to a general `ARMvorrImm` available in MVE as well, and similarly for `NEONvbicImm`. The predicated forms of VBIC and VORR are represented as a vector select between the original input vector and the output of the unpredicated operation. The main convenience of this is that it still lets me use the existing isel lowering for VBICIMM/VORRIMM, and not have to write another copy of the operand encoding translation code. This intrinsic family is the first to use the `imm_simd` system I put into the MveEmitter tablegen backend. So, naturally, it showed up a bug or two (emitting bogus range checks and the like). Fixed those, and added a full set of tests for the permissible immediates in the existing Sema test. Also adjusted the isel pattern for `vmovlb.u8`, which stopped matching because lowering started turning its input into a VBICIMM. Now it recognizes the VBICIMM instead. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D72934	2020-01-23 11:53:52 +00:00
David Green	ff2e67a4f7	[ARM] MVE VLDn postinc This adds Post inc variants of the VLD2/4 and VST2/4 instructions in MVE. It uses the same mechanism/nodes as Neon, transforming the intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a post-inc instruction. The code to do that is mostly taken from the existing Neon code, but simplified as less variants are needed. It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4 instrinsics, which allow the nodes to have MMO's, calculated as the full length to the memory being loaded/stored. Differential Revision: https://reviews.llvm.org/D71194	2020-01-20 06:57:07 +00:00
Craig Topper	bb2553175a	[TargetLowering][ARM][Mips][WebAssembly] Remove the ordered FP compare from RunttimeLibcalls.def and all associated usages Summary: This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare. This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code. Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn Reviewed By: efriedma Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72536	2020-01-10 19:30:08 -08:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Victor Campos	60e0120c91	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-01-07 13:16:18 +00:00
James Henderson	d68904f957	[NFC] Fix trivial typos in comments Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D72143 Patch by Kazuaki Ishizaki.	2020-01-06 10:50:26 +00:00
David Green	fb8c9a339a	[ARM] Use isFMAFasterThanFMulAndFAdd for scalars as well as MVE vectors This adds extra scalar handling to isFMAFasterThanFMulAndFAdd, allowing the target independent code to handle more folds in more situations (for example if the fast math flags are present, but the global AllowFPOpFusion option isnt). It also splits apart the HasSlowFPVMLx into HasSlowFPVFMx, to allow VFMA and VMLA to be controlled separately if needed. Differential Revision: https://reviews.llvm.org/D72139	2020-01-05 11:24:04 +00:00
Reid Kleckner	9c2b72821b	Move tail call disabling code to target independent code When the "disable-tail-calls" attribute was added, checks were added for it in various backends. Now this code has proliferated, and it is something the target is responsible for checking. Move that responsibility back to the ISels (fast, global, and SD). There's no major functionality change, except for targets that never implemented this check. This LLVM attribute was originally added in `d9699bc7bd` (2015). Reviewers: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D72118	2020-01-03 11:27:41 -08:00
QingShan Zhang	2133d3c558	[DAGCombine] Initialize the default operation action for SIGN_EXTEND_INREG for vector type as 'expand' instead of 'legal' For now, we didn't set the default operation action for SIGN_EXTEND_INREG for vector type, which is 0 by default, that is legal. However, most target didn't have native instructions to support this opcode. It should be set as expand by default, as what we did for ANY_EXTEND_VECTOR_INREG. Differential Revision: https://reviews.llvm.org/D70000	2020-01-03 03:26:41 +00:00
David Green	b4abe7afbf	[ARM] Sink splat to ICmp This adds ICmp to the list of instructions that we sink a splat to in a loop, allowing the register forms of instructions to be selected more often. It does not add FCmp yet as the results look a little odd, trying to keep the register in an float reg and having to move it back to a GPR. Differential Revision: https://reviews.llvm.org/D70997	2019-12-30 12:58:14 +00:00
Martin Storsjö	b774aa1011	[ARM] [Windows] Use COFF stubs for calls to extern_weak functions As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Instead check the shouldAssumeDSOLocal method and load the address from a COFF stub. This matches what was done for X86 in `6bf108d77a`. Differential Revision: https://reviews.llvm.org/D71720	2019-12-23 12:13:49 +02:00
Victor Campos	2ff5a596cb	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `bbcf1c3496`.	2019-12-20 18:08:24 +00:00
Victor Campos	bbcf1c3496	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2019-12-19 11:23:01 +00:00
John Brawn	01ba201abc	[ARM] Add custom strict fp conversion lowering when non-strict is custom We have custom lowering for operations converting to/from floating-point types when we don't have hardware support for those types, and this doesn't interact well with the target-independent legalization of the strict versions of these operations. Fix this by adding similar custom lowering of the strict versions. This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics test, with the remaining failures due to poor instruction selection. Differential Revision: https://reviews.llvm.org/D71127	2019-12-13 13:00:00 +00:00
Alex Richardson	be15dfa88f	[NFC] Use EVT instead of bool for getSetCCInverse() Summary: The use of a boolean isInteger flag (generally initialized using VT.isInteger()) caused errors in our out-of-tree CHERI backend (https://github.com/CTSRD-CHERI/llvm-project). In our backend, pointers use a separate ValueType (iFATPTR) and therefore .isInteger() returns false. This meant that getSetCCInverse() was using the floating-point variant and generated incorrect code for us: `(void )0x12033091e < (void )0xffffffffffffffff` would return false. Committing this change will significantly reduce our merge conflicts for each upstream merge. Reviewers: spatel, bogner Reviewed By: bogner Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70917	2019-12-13 12:22:03 +00:00
Sam Parker	1274ac3dc2	[ARM][MVE] Sink vector shift operand Recommit `e0b966643f`. sub instructions were being generated for the negated value, and for some reason they were the register only ones. I think the problem was because I was grabbing the 'zero' from vmovimm, which is a target constant. Now I'm just generating a new Constant zero and so rsb instructions are now generated. Original commit message: The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 14:34:00 +00:00
Sam Parker	f8ff3bf55b	Revert "[ARM][MVE] Sink vector shift operand" This reverts commit `e0b966643f`. Instruction selection is failing with expensive checks.	2019-12-12 07:52:57 +00:00
Sam Parker	e0b966643f	[ARM][MVE] Sink vector shift operand The shift amount operand can be provided in a general purpose register so sink it. Flip the vdup and negate so the existing patterns can be used for matching. Differential Revision: https://reviews.llvm.org/D70841	2019-12-12 07:35:21 +00:00
Reid Kleckner	5d986953c8	[IR] Split out target specific intrinsic enums into separate headers This has two main effects: - Optimizes debug info size by saving 221.86 MB of obj file size in a Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of object file size. - Incremental step towards decoupling target intrinsics. The enums are still compact, so adding and removing a single target-specific intrinsic will trigger a rebuild of all of LLVM. Assigning distinct target id spaces is potential future work. Part of PR34259 Reviewers: efriedma, echristo, MaskRay Reviewed By: echristo, MaskRay Differential Revision: https://reviews.llvm.org/D71320	2019-12-11 18:02:14 -08:00
David Green	792fab343b	[ARM] Attempt to use whole register vmovs for MVE shuffles. MVE doesn't have the range of shuffle instructions available in Neon. We also cannot use the trick of cutting a difficult vector shuffle in half to simplify things. Instead we need to be more careful about how we lower shuffles. This patch adds an extra combine that attempts to find "whole lane" vmovs when lowering shuffles of smaller types. This helps us make some shuffles a lot simpler, generating single lane movs for the parts that can make use of it, falling back to the original shuffle for the rest. Differential Revision: https://reviews.llvm.org/D69509	2019-12-08 10:53:54 +00:00
David Green	3a6eb5f160	[ARM] Disable VLD4 under MVE Alas, using half the available vector registers in a single instruction is just too much for the register allocator to handle. The mve-vldst4.ll test here fails when these instructions are enabled at present. This patch disables the generation of VLD4 and VST4 by adding a mve-max-interleave-factor option, which we currently default to 2. Differential Revision: https://reviews.llvm.org/D71109	2019-12-08 10:37:29 +00:00
David Green	57d96ab593	[ARM] Add some VCMP folding and canonicalisation The VCMP instructions in MVE can accept a register or ZR, but only as the right hand operator. Most of the time this will already be correct because the icmp will have been canonicalised that way already. There are some cases in the lowering of float conditions that this will not apply to though. This code should fix up those cases. Differential Revision: https://reviews.llvm.org/D70822	2019-12-02 19:57:12 +00:00
Carey Williams	76fd58d0fe	Revert "[ARM] Allocatable Global Register Variables for ARM" This reverts commit `2d739f98d8`.	2019-11-29 17:01:05 +00:00
David Green	9f15fcc271	[ARM] Replace arm_neon_vqadds with sadd_sat This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat. This helps generate vqadds from standard IR nodes, which might be produced from the vectoriser. The old variants are removed in the process. Differential Revision: https://reviews.llvm.org/D69350	2019-11-27 13:32:29 +00:00
David Green	b5315ae8ff	[Codegen][ARM] Add addressing modes from masked loads and stores MVE has a basic symmetry between it's normal loads/store operations and the masked variants. This means that masked loads and stores can use pre-inc and post-inc addressing modes, just like the standard loads and stores already do. To enable that, this patch adds all the relevant infrastructure for treating masked loads/stores addressing modes in the same way as normal loads/stores. This involves: - Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra Offset operand that is added after the PtrBase. - Extending the IndexedModeActions from 8bits to 16bits to store the legality of masked operations as well as normal ones. This array is fairly small, so doubling the size still won't make it very large. Offset masked loads can then be controlled with setIndexedMaskedLoadAction, similar to standard loads. - The same methods that combine to indexed loads, such as CombineToPostIndexedLoadStore, are adjusted to handle masked loads in the same way. - The ARM backend is then adjusted to make use of these indexed masked loads/stores. - The X86 backend is adjusted to hopefully be no functional changes. Differential Revision: https://reviews.llvm.org/D70176	2019-11-26 16:21:01 +00:00
David Green	882f23caea	[ARM] MVE interleaving load and stores. Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392	2019-11-19 18:37:30 +00:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Graham Hunter	3f08ad611a	[SVE][CodeGen] Scalable vector MVT size queries * Implements scalable size queries for MVTs, split out from D53137. * Contains a fix for FindMemType to avoid using scalable vector type to contain non-scalable types. * Explicit casts for several places where implicit integer sign changes or promotion from 32 to 64 bits caused problems. * CodeGenDAGPatterns will treat scalable and non-scalable vector types as different. Reviewers: greened, cameron.mcinally, sdesmalen, rovka Reviewed By: rovka Differential Revision: https://reviews.llvm.org/D66871	2019-11-18 12:30:59 +00:00
Anna Welker	2d739f98d8	[ARM] Allocatable Global Register Variables for ARM Provides support for using r6-r11 as globally scoped register variables. This requires a -ffixed-rN flag in order to reserve rN against general allocation. If for a given GRV declaration the corresponding flag is not found, or the the register in question is the target's FP, we fail with a diagnostic. Differential Revision: https://reviews.llvm.org/D68862	2019-11-18 10:07:37 +00:00
Simon Tatham	5b9e4daef0	[ARM,MVE] Use VMOV.{S8,S16} for sign-extended extractelement. MVE includes instructions that extract an 8- or 16-bit lane from a vector and sign-extend it into the output 32-bit GPR. `ARMInstrMVE.td` already included isel patterns to select those instructions in response to the `ARMISD::VGETLANEs` selection-DAG node type. But `ARMISD::VGETLANEs` was never actually generated, because the code that creates it was conditioned on NEON only. It's an easy fix to enable the same code for integer MVE, and now IR that sign-extends the result of an extractelement (whether explicitly or as part of the function call ABI) will use `vmov.s8` instead of `vmov.u8` followed by `sxtb`. Reviewers: SjoerdMeijer, dmgreen, ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70132	2019-11-13 09:08:41 +00:00
Eli Friedman	5df3a87224	[AArch64][X86] Don't assume __powidf2 is available on Windows. We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013	2019-11-08 12:43:21 -08:00
David Green	91b0cad813	[ARM] Use isFMAFasterThanFMulAndFAdd for MVE The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd, where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually available for scalar code. For MVE we don't have the non-fused version though. It makes more sense for isFMAFasterThanFMulAndFAdd to return true, allowing us to simplify some of the existing ISel patterns. The tests here are that non of the existing tests failed, and so we are still selecting VFMA and VFMS. The one test that changed shows we can now select from fast math flags, as opposed to just relying on the isFMADLegalForFAddFSub option. Differential Revision: https://reviews.llvm.org/D69115	2019-11-04 15:05:41 +00:00
Andrew Paverd	d157a9bc8b	Add Windows Control Flow Guard checks (/guard:cf). Summary: A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on indirect function calls, using either the check mechanism (X86, ARM, AArch64) or or the dispatch mechanism (X86-64). The check mechanism requires a new calling convention for the supported targets. The dispatch mechanism adds the target as an operand bundle, which is processed by SelectionDAG. Another pass (CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as required by /guard:cf. This feature is enabled using the `cfguard` CC1 option. Reviewers: thakis, rnk, theraven, pcc Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65761	2019-10-28 15:19:39 +00:00
vhscampos	f6e11a36c4	[ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE Summary: Writing support for three ACLE functions: unsigned int __cls(uint32_t x) unsigned int __clsl(unsigned long x) unsigned int __clsll(uint64_t x) CLS stands for "Count number of leading sign bits". In AArch64, these two intrinsics can be translated into the 'cls' instruction directly. In AArch32, on the other hand, this functionality is achieved by implementing it in terms of clz (count number of leading zeros). Reviewers: compnerd Reviewed By: compnerd Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69250	2019-10-28 11:06:58 +00:00
Simon Tatham	1b45297e01	[ARM] Begin adding IR intrinsics for MVE instructions. This commit, together with the next few, will add a representative sample of the kind of IR intrinsics that we'll need in order to implement the user-facing ACLE intrinsics for MVE. Supporting all of them will take more work; the intention of this initial series of commits is to implement an intrinsic or two from lots of different categories, as examples and proofs of concept. This initial commit introduces a small number of IR intrinsics for instructions simple enough that they can use Tablegen ISel patterns: the predicated versions of the VADD and VSUB instructions (both integer and FP), VMIN and VMAX, and the float->half VCVT instruction (predicated and unpredicated). When using VPT-predicated instructions in automatic code generation, it will be convenient to specify the predicate value as a vector of the appropriate number of i1. To make it easy to specify all sizes of an instruction in one go and give each one the matching predicate vector type, I've added a system of Tablegen informational records describing MVE's vector types: each one gives the underlying LLVM IR ValueType (which may not be the same if the MVE vector is of explicitly signed or unsigned integers) and an appropriate vNi1 to use as the predicate vector. (Also, those info records include the usual encoding for the types, so that as we add associations between each instruction encoding and one of the new `MVEVectorVTInfo` records, we can remove some of the existing template parameters and replace them with references to the vector type info's fields.) The user-facing ACLE intrinsics will receive a predicate mask as a 16-bit integer, so I've also provided a pair of intrinsics i2v and v2i, to convert between an integer and a vector of i1 by just changing the register class. Reviewers: dmgreen, miyuki, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67158	2019-10-24 16:33:13 +01:00
David Green	d7b77f2203	[ARM] Add qadd lowering from a sadd_sat This lowers a sadd_sat to a qadd by treating it as legal. Also adds qsub at the same time. The qadd instruction sets the q flag, but we already have many cases where we do not model this in llvm. Differential Revision: https://reviews.llvm.org/D68976 llvm-svn: 375411	2019-10-21 12:33:46 +00:00
Guillaume Chatelet	bac5f6bd21	[Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69243 llvm-svn: 375407	2019-10-21 11:01:55 +00:00
David Green	fba831e791	[ARM] Lower sadd_sat to qadd8 and qadd16 Lower the target independent signed saturating intrinsics to qadd8 and qadd16. This custom lowers them from a sadd_sat, catching the node early before it is promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a qadd8/qadd16, so that we can call demand bits on it to show that it does not use the upper bits. Also handles QSUB8 and QSUB16. Differential Revision: https://reviews.llvm.org/D68974 llvm-svn: 375402	2019-10-21 09:53:38 +00:00
Sam Parker	39af8a3a3b	[DAGCombine][ARM] Enable extending masked loads Add generic DAG combine for extending masked loads. Allow us to generate sext/zext masked loads which can access v4i8, v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively. Differential Revision: https://reviews.llvm.org/D68337 llvm-svn: 375085	2019-10-17 07:55:55 +00:00
David Green	543236232c	[ARM] Selection for MVE VMOVN The adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form: 0, N, 2, N+2, 4, N+4, ... or 0, N+1, 2, N+3, 4, N+5, ... ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively. Differential Revision: https://reviews.llvm.org/D68283 llvm-svn: 374781	2019-10-14 15:19:33 +00:00
David Green	8628bb0491	[ARM] VQSUB instruction Same as VQADD, VQSUB can be selected from llvm.ssub.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68567 llvm-svn: 374377	2019-10-10 16:34:30 +00:00
David Green	39596ec2fe	[ARM] VQADD instructions This selects MVE VQADD from the vector llvm.sadd.sat or llvm.uadd.sat intrinsics. Differential Revision: https://reviews.llvm.org/D68566 llvm-svn: 374336	2019-10-10 13:05:04 +00:00
Nikola Prica	02682498b8	[ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers Support for tracking registers that forward function parameters into the following function frame. For now we only support cases when parameter is forwarded through single register. Reviewers: aprantl, vsk, t.p.northover Reviewed By: vsk Differential Revision: https://reviews.llvm.org/D66953 llvm-svn: 374033	2019-10-08 09:43:05 +00:00
Kristof Beyls	78bfe3ab94	[ARM] Generate vcmp instead of vcmpe Based on the discussion in http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the conclusion was reached that the ARM backend should produce vcmp instead of vcmpe instructions by default, i.e. not be producing an Invalid Operation exception when either arguments in a floating point compare are quiet NaNs. In the future, after constrained floating point intrinsics for floating point compare have been introduced, vcmpe instructions probably should be produced for those intrinsics - depending on the exact semantics they'll be defined to have. This patch logically consists of the following parts: - Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which implemented fine-tuning for when to produce vcmpe (i.e. not do it for equality comparisons). The complexity introduced by those patches isn't needed anymore if we just always produce vcmp instead. Maybe these patches need to be reintroduced again once support is needed to map potential LLVM-IR constrained floating point compare intrinsics to the ARM instruction set. - Simply select vcmp, instead of vcmpe, see simple changes in lib/Target/ARM/ARMInstrVFP.td - Adapt lots of tests that tested for vcmpe (instead of vcmp). For all of these test, the intent of what is tested for isn't related to whether the vcmp should produce an Invalid Operation exception or not. Fixes PR43374. Differential Revision: https://reviews.llvm.org/D68463 llvm-svn: 374025	2019-10-08 08:25:42 +00:00
Benjamin Kramer	12e915b3fc	[ARM] Make helpers static. NFC. llvm-svn: 373503	2019-10-02 18:20:24 +00:00
David Green	c9b5ab8b1c	[ARM] Identity shuffles are legal Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE (they essentially just become bitcasts). We were not catching that in the existing set of what we considered legal though. On NEON, they would be covered by vext's, but that is not generally available in MVE. This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here but does what we want and prevents us from just rewriting what is the same function. Differential Revision: https://reviews.llvm.org/D68241 llvm-svn: 373446	2019-10-02 11:40:51 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
Guillaume Chatelet	18f805a7ea	[Alignment][NFC] Remove unneeded llvm:: scoping on Align types llvm-svn: 373081	2019-09-27 12:54:21 +00:00
David Green	10d10102a4	[ARM] Ensure we do not attempt to create lsll #0 During legalisation we can end up with some pretty strange nodes, like shifts of 0. We need to make sure we don't try to make long shifts of these, ending up with invalid assembly instructions. A long shift with a zero immediate actually encodes a shift by 32. Differential Revision: https://reviews.llvm.org/D67664 llvm-svn: 372839	2019-09-25 10:16:48 +00:00
David Green	2fb41fc70c	[ARM] Split large widening MVE loads Similar to rL372717, we can force the splitting of extends of vector loads in MVE, in order to use the better widening loads as opposed to going through expensive extends. This adds a combine to early-on detect extends of loads and split the load in two, from where normal legalisation will kick in and we get a series of widening loads. Differential Revision: https://reviews.llvm.org/D67909 llvm-svn: 372721	2019-09-24 10:53:09 +00:00
David Green	49d851f403	[ARM] Split large truncating MVE stores MVE does not have a simple sign extend instruction that can move elements across lanes. We currently often end up moving each lane into and out of a GPR, in order to get elements into the correct places. When we have a store of a trunc (or a extend of a load), we can instead just split the store/load in two, using the narrowing/widening load/store instructions from each half of the vector. This does that for stores. It happens very early in a store combine, so as to easily detect the truncates. (It would be possible to do this later, but that would involve looking through a buildvector of extract elements. Not impossible but this way seemed simpler). By enabling store combines we also get a vmovdrr combine for free, helping some other tests. Differential Revision: https://reviews.llvm.org/D67828 llvm-svn: 372717	2019-09-24 10:10:41 +00:00
Guillaume Chatelet	c281b40814	[Alignment] Get DataLayout::StackAlignment as Align Summary: Internally it is needed to know if StackAlignment is set but we can expose it as llvm::Align. This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67852 llvm-svn: 372585	2019-09-23 12:01:32 +00:00
Oliver Cruickshank	c84722ff27	[ARM] Fix CTTZ not generating correct instructions MVE CTTZ intrinsic should have been set to Custom, not Expand llvm-svn: 372401	2019-09-20 15:03:44 +00:00
Matt Arsenault	3ecab8e455	Reapply r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This reverts r372314, reapplying r372285 and the commits which depend on it (r372286-r372293, and r372296-r372297) This was missing one switch to getTargetConstant in an untested case. llvm-svn: 372338	2019-09-19 16:26:14 +00:00
Hans Wennborg	13bdae8541	Revert r372285 "GlobalISel: Don't materialize immarg arguments to intrinsics" This broke the Chromium build, causing it to fail with e.g. fatal error: error in backend: Cannot select: t362: v4i32 = X86ISD::VSHLI t392, Constant:i8<15> See llvm-commits thread of r372285 for details. This also reverts r372286, r372287, r372288, r372289, r372290, r372291, r372292, r372293, r372296, and r372297, which seemed to depend on the main commit. > Encode them directly as an imm argument to G_INTRINSIC. > > Since now intrinsics can now define what parameters are required to be > immediates, avoid using registers for them. Intrinsics could > potentially want a constant that isn't a legal register type. Also, > since G_CONSTANT is subject to CSE and legalization, transforms could > potentially obscure the value (and create extra work for the > selector). The register bank of a G_CONSTANT is also meaningful, so > this could throw off future folding and legalization logic for AMDGPU. > > This will be much more convenient to work with than needing to call > getConstantVRegVal and checking if it may have failed for every > constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth > immarg operands, many of which need inspection during lowering. Having > to find the value in a register is going to add a lot of boilerplate > and waste compile time. > > SelectionDAG has always provided TargetConstant for constants which > should not be legalized or materialized in a register. The distinction > between Constant and TargetConstant was somewhat fuzzy, and there was > no automatic way to force usage of TargetConstant for certain > intrinsic parameters. They were both ultimately ConstantSDNode, and it > was inconsistently used. It was quite easy to mis-select an > instruction requiring an immediate. For SelectionDAG, start emitting > TargetConstant for these arguments, and using timm to match them. > > Most of the work here is to cleanup target handling of constants. Some > targets process intrinsics through intermediate custom nodes, which > need to preserve TargetConstant usage to match the intrinsic > expectation. Pattern inputs now need to distinguish whether a constant > is merely compatible with an operand or whether it is mandatory. > > The GlobalISelEmitter needs to treat timm as a special case of a leaf > node, simlar to MachineBasicBlock operands. This should also enable > handling of patterns for some G_ instructions with immediates, like > G_FENCE or G_EXTRACT. > > This does include a workaround for a crash in GlobalISelEmitter when > ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372314	2019-09-19 12:33:07 +00:00
David Green	0cfb78e52a	[ARM] MVE i1 splat We needn't BFI each lane individually into a predicate register when each lane in the same. A simple sign extend and a vmsr will do. Differential Revision: https://reviews.llvm.org/D67653 llvm-svn: 372313	2019-09-19 12:17:41 +00:00
Matt Arsenault	d8399d12cd	GlobalISel: Don't materialize immarg arguments to intrinsics Encode them directly as an imm argument to G_INTRINSIC. Since now intrinsics can now define what parameters are required to be immediates, avoid using registers for them. Intrinsics could potentially want a constant that isn't a legal register type. Also, since G_CONSTANT is subject to CSE and legalization, transforms could potentially obscure the value (and create extra work for the selector). The register bank of a G_CONSTANT is also meaningful, so this could throw off future folding and legalization logic for AMDGPU. This will be much more convenient to work with than needing to call getConstantVRegVal and checking if it may have failed for every constant intrinsic parameter. AMDGPU has quite a lot of intrinsics wth immarg operands, many of which need inspection during lowering. Having to find the value in a register is going to add a lot of boilerplate and waste compile time. SelectionDAG has always provided TargetConstant for constants which should not be legalized or materialized in a register. The distinction between Constant and TargetConstant was somewhat fuzzy, and there was no automatic way to force usage of TargetConstant for certain intrinsic parameters. They were both ultimately ConstantSDNode, and it was inconsistently used. It was quite easy to mis-select an instruction requiring an immediate. For SelectionDAG, start emitting TargetConstant for these arguments, and using timm to match them. Most of the work here is to cleanup target handling of constants. Some targets process intrinsics through intermediate custom nodes, which need to preserve TargetConstant usage to match the intrinsic expectation. Pattern inputs now need to distinguish whether a constant is merely compatible with an operand or whether it is mandatory. The GlobalISelEmitter needs to treat timm as a special case of a leaf node, simlar to MachineBasicBlock operands. This should also enable handling of patterns for some G_ instructions with immediates, like G_FENCE or G_EXTRACT. This does include a workaround for a crash in GlobalISelEmitter when ARM tries to uses "imm" in an output with a "timm" pattern source. llvm-svn: 372285	2019-09-19 01:33:14 +00:00
Graham Hunter	1a9195d817	[SVE][MVT] Fixed-length vector MVT ranges * Reordered MVT simple types to group scalable vector types together. * New range functions in MachineValueType.h to only iterate over the fixed-length int/fp vector types. * Stopped backends which don't support scalable vector types from iterating over scalable types. Reviewers: sdesmalen, greened Reviewed By: greened Differential Revision: https://reviews.llvm.org/D66339 llvm-svn: 372099	2019-09-17 10:19:23 +00:00
David Green	8d21460dc5	[ARM] A predicate cast of a predicate cast is a predicate cast The adds some very basic folding of PREDICATE_CASTS, removing cases when they are chained together. These would already be removed eventually, as these are lowered to copies. This just allows it to happen earlier, which can help other simplifications. Differential Revision: https://reviews.llvm.org/D67591 llvm-svn: 372012	2019-09-16 17:29:07 +00:00
Oliver Cruickshank	ee6fbebbaf	[ARM] Add patterns for BSWAP intrinsic on MVE BSWAP can use the VREV instruction on MVE to produce better results than expanding. llvm-svn: 372002	2019-09-16 15:20:10 +00:00
Oliver Cruickshank	e9510a6cad	[ARM] Add patterns for bitreverse intrinsic on MVE BITREVERSE can use the VBRSR which will reverse and right shift. Shifting right by 0 will just reverse the bits. llvm-svn: 372001	2019-09-16 15:20:03 +00:00
Oliver Cruickshank	5f799ef162	[ARM] Lower CTTZ on MVE Lower CTTZ on MVE using VBRSR and VCLS which will reverse the bits and count the leading zeros, equivalent to a count trailing zeros (CTTZ). llvm-svn: 372000	2019-09-16 15:19:56 +00:00
Oliver Cruickshank	cd1a0b9271	[ARM] Add patterns for CTLZ on MVE CTLZ intrinsic can use the VCLS instruction on MVE, which produces better results than expanding. llvm-svn: 371999	2019-09-16 15:19:49 +00:00
David Green	b325c05732	[ARM] Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc, and so is currently behind an option. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. Differential Revision: https://reviews.llvm.org/D67186 llvm-svn: 371932	2019-09-15 14:14:47 +00:00
Sam Tebbs	1572b68509	[ARM] Add support for MVE vmaxv and vminv This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv. Differential Revision: https://reviews.llvm.org/D66413 llvm-svn: 371827	2019-09-13 09:11:46 +00:00
Guillaume Chatelet	b6722af068	[Alignment] Use Align for TargetLowering::MinStackArgumentAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, nemanjai, hiraditya, kbarton, jrtc27, MaskRay, atanasyan, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67288 llvm-svn: 371498	2019-09-10 09:01:18 +00:00
David Green	2b7089949e	[ARM] Fix loads and stores for predicate vectors These predicate vectors can usually be loaded and stored with a single instruction, a VSTR_P0. However this instruction will store the entire P0 predicate, 16 bits, zeroextended to 32bits. Each lane of the the v4i1/v8i1/v16i1 representing 4/2/1 bits. As far as I understand, when llvm says "store this v4i1", it really does need to store 4 bits (or 8, that being the size of a byte, with this bottom 4 as the interesting bits). For example a bitcast from a v8i1 to a i8 is defined as a store followed by a load, which is how the code is expanded. So this instead lowers the v4i1/v8i1 load/store through some shuffles to get the bits into the correct positions. This, as you might imagine, is not as efficient as a single instruction. But I believe it is needed for correctness. v16i1 equally should not load/store 32bits, only storing the 16bits of data. Stack loads/stores are still using the VSTR_P0 (as can be seen by the test not changing). This is fine as they are self-consistent, it is only "externally observable loads/stores" (from our point of view) that need to be corrected. Differential revision: https://reviews.llvm.org/D67085 llvm-svn: 371419	2019-09-09 16:35:49 +00:00
Simon Pilgrim	d7d8bb937a	Fix MSVC "32-bit shift implicitly converted to 64 bits" warnings. NFCI. llvm-svn: 371302	2019-09-07 11:04:04 +00:00
Sam Tebbs	f1cdd95a2f	[ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection This patch sinks add/mul(shufflevector(insertelement())) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers. Loop tests have been added to the vmla test file to make sure vmlas are generated in loops. Differential revision: https://reviews.llvm.org/D66295 llvm-svn: 371218	2019-09-06 16:01:32 +00:00
Guillaume Chatelet	9fcf066d0c	[Alignment][NFC] Use Align with TargetLowering::setPrefLoopAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, ychen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67278 llvm-svn: 371210	2019-09-06 14:51:15 +00:00
Guillaume Chatelet	4fc3ad9e13	[Alignment][NFC] Use Align with TargetLowering::setMinFunctionAlignment Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: jyknight, sdardis, nemanjai, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, s.egerton, pzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67229 llvm-svn: 371200	2019-09-06 12:48:34 +00:00
David Candler	a59bffb576	[ARM] Add support for the s,j,x,N,O inline asm constraints A number of inline assembly constraints are currently supported by LLVM, but rejected as invalid by Clang: Target independent constraints: s: An integer constant, but allowing only relocatable values ARM specific constraints: j: An immediate integer between 0 and 65535 (valid for MOVW) x: A 32, 64, or 128-bit floating-point/SIMD register: s0-s15, d0-d7, or q0-q3 N: An immediate integer between 0 and 31 (Thumb1 only) O: An immediate integer which is a multiple of 4 between -508 and 508. (Thumb1 only) This patch adds support to Clang for the missing constraints along with some checks to ensure that the constraints are used with the correct target and Thumb mode, and that immediates are within valid ranges (at least where possible). The constraints are already implemented in LLVM, but just a couple of minor corrections to checks (V8M Baseline includes MOVW so should work with 'j', 'N' and 'O' shouldn't be valid in Thumb2) so that Clang and LLVM are in line with each other and the documentation. Differential Revision: https://reviews.llvm.org/D65863 Change-Id: I18076619e319bac35fbb60f590c069145c9d9a0a llvm-svn: 371079	2019-09-05 15:17:25 +00:00
Guillaume Chatelet	aff45e4b23	[LLVM][Alignment] Make functions using log of alignment explicit Summary: This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align. The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment. A few renames uncovered dubious assignments: - `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation. - `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation, - `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation, Reviewers: lattner, thegameg, courbet Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet Tags: #llvm Differential Revision: https://reviews.llvm.org/D65945 llvm-svn: 371045	2019-09-05 10:00:22 +00:00
David Green	61973d978b	[ARM] Invert CSEL predicates if the opposite is a simpler constant to materialise This moves ConstantMaterializationCost into ARMBaseInstrInfo so that it can also be used in ISel Lowering, adding codesize values to the computed costs, to be able to compare either approximate instruction counts or codesize costs. It also adds a HasLowerConstantMaterializationCost, which compares the ConstantMaterializationCost of two values, returning true if the first is smaller either in instruction count/codesize, or falling back to the other in the case that they are equal. This is used in constant CSEL lowering to invert the predicate if the opposite is easier to materialise. Differential revision: https://reviews.llvm.org/D66701 llvm-svn: 370741	2019-09-03 11:06:24 +00:00
David Green	57cc65ff47	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739	2019-09-03 10:53:07 +00:00
David Green	a95ec59fa5	[ARM] Use MQPR not QPR for MVE registers We should be using MQPR, and if we don't we can get COPYs and PHIs created for QPR. These get folded into instructions, failing verification checks. Differential revision: https://reviews.llvm.org/D66214 llvm-svn: 370676	2019-09-02 17:18:23 +00:00
David Green	8469a39af3	[ARM] Remove MVE masked loads/stores These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607	2019-09-01 10:11:40 +00:00
David Green	942c2e3795	[ARM] MVE Masked loads and stores Masked loads and store fit naturally with MVE, the instructions being easily predicated. This adds lowering for the simple cases of masked loads and stores. It does not yet deal with widening/narrowing or pre/post inc. The llvm masked load intrinsic will accept a "passthru" value, dictating the values used for the zero masked lanes. In MVE the instructions write 0 to the zero predicated lanes, so we need to match a passthru that isn't 0 (or undef) with a select instruction to pull in the correct data after the load. We also need to do something with unaligned loads/stores. Currently this uses a similar method used in big endian, using an VLDRB.8 (and potentially a VREV in BE). This does mean that the predicate mask is converted from, for example, a v4i1 to a v16i1. The VLDR instructions are defined as using the first bit of the relevant mask lane, so this could potentially load different results if the predicate is little odd. As the input is a v4i1 however, I believe this is OK and all the bits required should be set in the predicate, making the VLDRB.8 load the same data. Differential Revision: https://reviews.llvm.org/D66534 llvm-svn: 370329	2019-08-29 10:54:35 +00:00
Shiva Chen	b39876d8cd	[RISCV] Avoid generating AssertZext for LP64 ABI when lowering floating LibCall The patch fixed the issue that RV64 didn't clear the upper bits when return complex floating value with lp64 ABI. float _Complex complex_add(float _Complex a, float _Complex b) { return a + b; } RealResult = zero_extend(RealA + RealB) ImageResult = ImageA + ImageB Return (RealResult \| (ImageResult << 32)) The patch introduces shouldExtendTypeInLibCall target hook to suppress the AssertZext generation when lowering floating LibCall. Thanks to Eli's comments from the Bugzilla https://bugs.llvm.org/show_bug.cgi?id=42820 Differential Revision: https://reviews.llvm.org/D65497 llvm-svn: 370275	2019-08-28 23:40:37 +00:00
Amaury Sechet	4f4387dd12	[TargetLowering] Add buildLegalVectorShuffle facility to help build legal shuffles Summary: There are at least 2 ways to express the same shuffle. Various pieces of code explicit check for both option, but other places do not when they would benefit from doing it. This patches refactor the codebase to use buildLegalVectorShuffle in order to make that behavior more consistent. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66804 llvm-svn: 370190	2019-08-28 12:00:06 +00:00
Sam Tebbs	a69d9d6156	Reapply: [ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 The CodeGen/Thumb2/mve-vaddv.ll test needed to be amended to reflect the changes from the above patch. This reverts commit `cd53ff6`, reapplying `7c6b229`. llvm-svn: 369638	2019-08-22 10:29:20 +00:00
Hans Wennborg	cd53ff6c0d	Revert r369626 "[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32" It broke the bots, see e.g. http://lab.llvm.org:8011/builders/clang-cuda-build/builds/36275/ > This patch fixes shifts by a 128/256 bit shift amount. It also fixes > codegen for shifts of 32 by delegating to LLVM's default optimisation > instead of emitting a long shift. > > Tests that used to generate long shifts of 32 are updated to check for the > more optimised codegen. > > Differential revision: https://reviews.llvm.org/D66519 > > llvm-svn: 369626 llvm-svn: 369636	2019-08-22 09:16:53 +00:00
Sam Tebbs	7c6b229204	[ARM] Fix lsrl with a 128/256 bit shift amount or a shift of 32 This patch fixes shifts by a 128/256 bit shift amount. It also fixes codegen for shifts of 32 by delegating to LLVM's default optimisation instead of emitting a long shift. Tests that used to generate long shifts of 32 are updated to check for the more optimised codegen. Differential revision: https://reviews.llvm.org/D66519 llvm-svn: 369626	2019-08-22 08:12:06 +00:00
Shiva Chen	72a41e7b0d	[TargetLowering] Remove optional arguments passing to makeLibCall The patch introduces MakeLibCallOptions struct as suggested by @efriedma on D65497. The struct contain argument flags which will pass to makeLibCall function. The patch should not has any functionality changes. Differential Revision: https://reviews.llvm.org/D65795 llvm-svn: 369622	2019-08-22 04:59:43 +00:00
Sam Tebbs	f312c1ecf4	[ARM] Add support for MVE vaddv This patch adds vecreduce_add and the relevant instruction selection for vaddv. Differential revision: https://reviews.llvm.org/D66085 llvm-svn: 369245	2019-08-19 09:38:28 +00:00
Jian Cai	16fa8b0970	Reland "[ARM] push LR before __gnu_mcount_nc" This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173	2019-08-16 23:30:16 +00:00
Jian Cai	2d957cfe02	Revert "[ARM] push LR before __gnu_mcount_nc" This reverts commit `f4cf3b9593`. llvm-svn: 369149	2019-08-16 20:40:21 +00:00
Jian Cai	f4cf3b9593	[ARM] push LR before __gnu_mcount_nc Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147	2019-08-16 20:21:08 +00:00
David Green	8c2c5f5045	[ARM] Don't pretend we know how to generate MVE VLDn We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101	2019-08-16 13:06:49 +00:00
Daniel Sanders	0c47611131	Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM Summary: This clang-tidy check is looking for unsigned integer variables whose initializer starts with an implicit cast from llvm::Register and changes the type of the variable to llvm::Register (dropping the llvm:: where possible). Partial reverts in: X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned& MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register PPCFastISel.cpp - No Register::operator-=() PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned& MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor Manual fixups in: ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned& HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register. PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned& Depends on D65919 Reviewers: arsenm, bogner, craig.topper, RKSimon Reviewed By: arsenm Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65962 llvm-svn: 369041	2019-08-15 19:22:08 +00:00
David Green	27ca82f32a	[ARM] Add support for MVE pre and post inc loads and stores This adds pre- and post- increment and decrements for MVE loads and stores. It uses the builtin pre and post load/store detection, unlike Neon. Loads are selected with the code in tryT2IndexedLoad, stores are selected with tablegen patterns. The immediates have a +/-7bit range, multiplied by the size of the element. Differential Revision: https://reviews.llvm.org/D63840 llvm-svn: 368305	2019-08-08 15:27:58 +00:00
David Green	824ffd8b12	[ARM] MVE big endian loads/stores This adds some missing patterns for big endian loads/stores, allowing unaligned loads/stores to also be selected with an extra VREV, which produces better code than aligning through a stack. Also moves VLDR_P0 to not be LE only, and adjusts some of the tests to show all that working. Differential Revision: https://reviews.llvm.org/D65583 llvm-svn: 368304	2019-08-08 15:15:19 +00:00
David Green	1becefd3f7	[ARM] Tighten up VLDRH.32 with low alignments VLDRH needs to have an alignment of at least 2, including the widening/narrowing versions. This tightens up the ISel patterns for it and alters allowsMisalignedMemoryAccesses so that unaligned accesses are expanded through the stack. It also fixed some incorrect shift amounts, which seemed to be passing a multiple not a shift. Differential Revision: https://reviews.llvm.org/D65580 llvm-svn: 368256	2019-08-08 06:22:03 +00:00
Oliver Cruickshank	4d4eefda6c	[ARM] Expand CTPOP intrinsic for MVE llvm-svn: 368180	2019-08-07 15:47:45 +00:00
Matt Arsenault	3922392969	AMDGPU: Correct behavior of f16 buffer loads Don't assume format loads for f16. Also fixes support for targets without i16. llvm-svn: 367879	2019-08-05 15:59:07 +00:00
Guillaume Chatelet	c97a3d15d2	[LLVM][Alignment] Introduce Alignment Type Summary: This is patch is part of a serie to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, jfb, jakehehrlich Reviewed By: jfb Subscribers: wuzish, jholewinski, arsenm, dschuff, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, s.egerton, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65514 llvm-svn: 367828	2019-08-05 11:02:05 +00:00
Bill Wendling	41a2847a9a	Emit diagnostic if an inline asm constraint requires an immediate Summary: An inline asm call can result in an immediate after inlining. Therefore emit a diagnostic here if constraint requires an immediate but one isn't supplied. Reviewers: joerg, mgorny, efriedma, rsmith Reviewed By: joerg Subscribers: asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, s.egerton, MaskRay, jyknight, dylanmckay, javed.absar, fedor.sergeev, jrtc27, Jim, krytarowski, eraman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60942 llvm-svn: 367750	2019-08-03 05:52:47 +00:00
Daniel Sanders	2bea69bf65	Finish moving TargetRegisterInfo::isVirtualRegister() and friends to llvm::Register as started by r367614. NFC llvm-svn: 367633	2019-08-01 23:27:28 +00:00
Eli Friedman	89b80f1239	[ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1. This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492	2019-07-31 23:19:21 +00:00
Eli Friedman	2f45ec1c39	[ARM] Transform compare of masked value to shift on Thumb1. Thumb1 has very limited immediate modes, so turning an "and" into a shift can save multiple instructions. It's possible to simplify the generated code for test2 and test3 in cmp-and-fold.ll a little more, but I'll implement that as a followup. Differential Revision: https://reviews.llvm.org/D65175 llvm-svn: 367491	2019-07-31 23:17:34 +00:00

1 2 3 4 5 ...

1954 Commits