Add patterns that use normal, non-wrapping add and sub nodes along
with an ARM VSHR imm node.
Differential Revision: https://reviews.llvm.org/D77065
Fix for the address optimization for gathers and scatters, which in
some complex cases would push instructions not only to the vector loop
preheader but to other locations as well, leading to a scrambled order
and the compilation failing.
This patch ensures that said instructions are always pushed to the end
of the vector loop preheader.
Differential Revision: https://reviews.llvm.org/D78293
Summary:
The INLINEASM MIR instructions use immediate operands to encode the values of some operands.
The MachineInstr pretty printer function already handles those operands and prints human-readable annotations instead of the immediates. This patch adds similar annotations to the output of the MIRPrinter, but uses the new MIROperandComment feature.
Reviewers: SjoerdMeijer, arsenm, efriedma
Reviewed By: arsenm
Subscribers: qcolombet, sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78088
The pass was incorrectly reverting back to a "T" when something wrote
to VPR inside an "E" block. This is not the correct behaviour: the
predicate should stay the same.
Differential Revision: https://reviews.llvm.org/D77798
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, avoiding executing them on every
loop iteration.
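As a rough illustration (hypothetical function, not part of the patch), this is the kind of loop the analysis targets, where parts of the gather offset computation do not change per iteration:
// Hypothetical example: the offsets feeding the gather are 'base + 4*i';
// the splat of 'base' and the multiply by the constant stride are loop
// invariant and can be computed once in the vector loop preheader.
void gather(int *dst, const int *src, int base, int n) {
  for (int i = 0; i < n; i++)
    dst[i] = src[base + 4 * i];
}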
Differential Revision: https://reviews.llvm.org/D76681
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code, the .pad directive is missing.
Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However, when we are generating execute-only code the size of the
adjustment is instead materialised using the MOVW/MOVT instruction pair.
As a by-product of handling the execute-only case this also fixes an
existing issue that in the non-execute-only case the .pad directive was
generated against the instruction that loads the constant into a register,
instead of the instruction which adds the register to the stack pointer.
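A minimal sketch of the situation (hypothetical function; the exact size threshold depends on the encodable immediates): the stack adjustment is too large for an immediate, so its size ends up in a register before being added to sp, and the .pad directive must describe that adjustment.
// Hypothetical example: the large local buffer forces a stack adjustment
// whose size is materialised in a register (via MOVW/MOVT when building
// execute-only code) before being subtracted from / added back to sp.
void consume(char *);
void f(void) {
  char buf[4100];
  consume(buf);
}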
Differential Revision: https://reviews.llvm.org/D76849
Currently, when the target is big-endian, vmov.i64 reverses the order of the two
words of the vector. This is correct only when the underlying element type is
32-bit; what it should actually be doing is treating the value as a vector of the
underlying element type and reversing the elements of that.
Differential Revision: https://reviews.llvm.org/D76515
If we have an element-wise vmov immediate instruction followed by a vrev
with width greater than or equal to the vmov element width, then that vrev won't do
anything. Add a DAG combine to convert bitcasts that would become such vrevs
into vector_reg_casts instead.
Differential Revision: https://reviews.llvm.org/D76514
This adds MVE vmull patterns, which are conceptually the same as
mul(vmovl, vmovl), and so the tablegen patterns follow the same
structure.
For i8 and i16 this is simple enough, but in the i32 version the
multiply (in 64 bits) is illegal, meaning we need to catch the pattern
earlier in a dag fold. Because bitcasts are involved in the zext
versions, the patterns are a little different in little and big
endian; I have only added little-endian support in this patch.
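A hedged sketch (hypothetical function) of source that produces the mul(sext, sext) shape these patterns match:
#include <stdint.h>
// Hypothetical example: each 16-bit lane is sign-extended before the
// multiply, which is the mul(vmovl, vmovl) shape that selects vmull.
void widening_mul(const int16_t *a, const int16_t *b, int32_t *c, int n) {
  for (int i = 0; i < n; i++)
    c[i] = (int32_t)a[i] * (int32_t)b[i];
}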
Differential Revision: https://reviews.llvm.org/D76740
The unpredictable/hasSideEffects flag is usually inferred by tablegen
from whether the instruction has a tablegen pattern (and that pattern
only has a single output instruction). Now that the MVE intrinsics are
all committed and producing code, the remaining instructions still
marked as unpredictable need to be specially handled. This adds the flag
directly to instructions that need it, notably the V*MLAL instructions
and some of the MOVs.
Differential Revision: https://reviews.llvm.org/D76910
This allows the MVE VPT Block insertion pass to remove VPNOTs in
order to create more complex VPT blocks such as TE, TEET, TETE, etc.
Differential Revision: https://reviews.llvm.org/D75993
Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.
Differential Revision: https://reviews.llvm.org/D76708
In the original batch of MVE VMOVimm code generation VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.
Big-endian is technically incorrect in this version, which John is fixing
in a NEON patch.
Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non-zero values in the
false lanes.
Differential Revision: https://reviews.llvm.org/D76766
Some MVE floating point instructions have gpr register variants that take
the scalar gpr value and splat it to all lanes. In order to accept
them in loops, the shuffle_vector and insert need to be sunk down into
the loop, next to the instruction so that ISel can see the whole
pattern.
This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for
mul are slightly more constrained as there are no fms variants taking
register arguments.
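A minimal sketch (hypothetical function) of a loop where the splat of the scalar operand needs to be sunk next to the multiply so isel can see the whole pattern:
// Hypothetical example: 's' is splatted to all lanes; sinking the splat into
// the loop lets isel select the vector-by-scalar (gpr) form of the fmul.
void scale(float *a, float s, int n) {
  for (int i = 0; i < n; i++)
    a[i] *= s;
}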
Differential Revision: https://reviews.llvm.org/D76023
Summary:
This patch implements the following CDE intrinsics:
T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p);
T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p);
T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p);
T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p);
T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p);
T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p);
The intrinsics are not part of the released ACLE spec, but internally at
Arm we have reached consensus to add them to the next ACLE release.
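A minimal usage sketch (hypothetical function; coprocessor number and immediate are arbitrary, and mve_pred16_t is the concrete predicate type from arm_mve.h):
#include <arm_cde.h>
#include <arm_mve.h>
// Hypothetical usage: predicated CX1 on coprocessor 0; lanes where the
// predicate is false keep the corresponding lane of 'inactive'.
uint8x16_t do_vcx1q_m(uint8x16_t inactive, mve_pred16_t p) {
  return __arm_vcx1q_m(0, inactive, 1, p);
}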
Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76610
Summary:
These were accidentally left out of D76123. I added tests for the
other three instructions in this small cross-product family (vqdmlah,
vqrdmlah, vqrdmlash) but missed this one.
Reviewers: miyuki
Reviewed By: miyuki
Subscribers: kristof.beyls, dmgreen, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76714
This adds a simple fold to combine a VMOVrh of a load into an integer load.
It is similar to what is already performed for BITCAST, but needs to account
for the types being of different sizes, creating a zero-extending load.
Differential Revision: https://reviews.llvm.org/D76485
We deliberately split stores of the form
store(truncate(larger-than-legal-type)) into two stores, allowing each
store to perform part of the truncate for free.
There are times however where it makes more sense to use VMOVN to
de-interlace the results back into a single vector, and store that in
one go. This adds a check for that situation, not splitting the store if
it looks like a VMOVN can be more useful.
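A hedged sketch (hypothetical function) of the store(truncate) shape involved; whether it gets split into two stores or narrowed with VMOVN and stored in one go now depends on the surrounding code:
#include <stdint.h>
// Hypothetical example: a truncating store of a wider-than-legal result that
// the combine can either split or de-interlace with VMOVN.
void truncate_store(const int32_t *src, int16_t *dst, int n) {
  for (int i = 0; i < n; i++)
    dst[i] = (int16_t)src[i];
}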
Differential Revision: https://reviews.llvm.org/D76511
When deciding whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any precede
the current node, don't do the combine.
The change only seems to be affecting the Arm backend, which I was
surprised at, but it appears to fix a lot of our issues around MVE
masked load/stores having to store a temporary address after an early
post-increment on a shared base address.
Differential Revision: https://reviews.llvm.org/D75847
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.
The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.
Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.
For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.
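A minimal sketch of the accumulating pattern described above (hypothetical function; vaddvq_s32 is the ACLE intrinsic):
#include <arm_mve.h>
// Hypothetical example: writing the accumulation as an ordinary add lets the
// combine described above fold VADDV plus add into a single VADDVA.
int32_t add_across(int32_t acc, int32x4_t v) {
  return acc + vaddvq_s32(v);
}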
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76491
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.
We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the whole lot is consistent.
In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.
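A minimal usage sketch (hypothetical function) showing the extra scalar parameter these reductions take:
#include <arm_mve.h>
// Hypothetical example: 'lowest' is the extra scalar operand folded into the
// reduction, which is what distinguishes these intrinsics from the
// experimental vector.reduce ones.
int32_t running_min(int32_t lowest, int32x4_t v) {
  return vminvq_s32(lowest, v);
}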
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76490
Summary:
This patch implements the following intrinsics:
uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm);
T __arm_vcx1qa(int coproc, T acc, uint32_t imm);
T __arm_vcx2q(int coproc, T n, uint32_t imm);
uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm);
T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm);
T __arm_vcx3q(int coproc, T n, U m, uint32_t imm);
uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm);
T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm);
Most of them are polymorphic. Furthermore, some intrinsics are
polymorphic in 2 or 3 parameter types. Such polymorphism is not
supported by the existing MVE/CDE tablegen backends, and we don't
really want a combinatorial explosion caused by 1000 different
combinations of 3 vector types. Because of this, some intrinsics are
implemented as macros involving a cast of the polymorphic arguments to
uint8x16_t.
The IR intrinsics are even more restricted in terms of types: all MVE
vectors are cast to v16i8.
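A minimal usage sketch (hypothetical function; arbitrary coprocessor and immediate) of the polymorphic form on a non-u8 vector, which the macro implementation casts through uint8x16_t:
#include <arm_cde.h>
#include <arm_mve.h>
// Hypothetical usage: polymorphic __arm_vcx2q on a float vector; under the
// hood the argument is cast to uint8x16_t for the underlying builtin.
float32x4_t do_vcx2q(float32x4_t n) {
  return __arm_vcx2q(0, n, 3);
}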
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76299
Summary:
This change implements ACLE CDE intrinsics that translate to
instructions working with general-purpose registers.
The specification is available at
https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf
Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because
they have distinct function prototypes). Dual-register operands are
represented as pairs of i32 values. Because of this the instruction
selection for these intrinsics cannot be represented as TableGen
patterns and requires custom C++ code.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76296
The MVE VDUP instruction takes a GPR and splats it into every lane of a
vector register. Unlike NEON we do not have a VDUPLANE equivalent
instruction, doing the same splat from an fp register. Previously a VDUP
to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which
would mean the instruction pattern needs to add a COPY_TO_REGCLASS to
the GPR.
Instead this now converts that earlier during an ISel DAG combine,
converting (VDUP x) to (VDUP (bitcast x)). This can allow instruction
selection to tell that the input needs to be an i32, which in one of the
testcases allows it to use ldr (or specifically ldm) over (vldr;vmov).
Whilst this is simple enough for floats, as the type sizes are the same,
there is no BITCAST equivalent for getting a half into an i32. This uses
a VMOVrh ARMISD node, which doesn't know the same tricks yet.
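A hedged sketch (hypothetical function) of a splat whose scalar only lives in memory, where the bitcast form lets the value be loaded straight into a GPR:
#include <arm_mve.h>
// Hypothetical example: with the (VDUP (bitcast x)) form, isel can see the
// input is really an i32 and load it with an integer ldr instead of
// vldr followed by vmov.
float32x4_t splat_from_mem(const float *p) {
  return vdupq_n_f32(*p);
}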
Differential Revision: https://reviews.llvm.org/D76292
Summary:
This is another set of instructions too complicated to be sensibly
expressed in IR by anything short of a target-specific intrinsic.
Given input vectors a,b, the instruction generates intermediate values
2*(a[0]*b[0]+a[1]*b[1]), 2*(a[2]*b[2]+a[3]*b[3]), etc; takes the high
half of each double-width value, and overwrites half the lanes in the
output vector c, which you therefore have to provide the input value
of. Optionally you can swap the elements of b so that the products are
things like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
like a[0]*b[1]+a[1]*b[0]; optionally you can round to nearest when
taking the high half; and optionally you can take the difference
rather than sum of the two products. Finally, saturation is applied
when converting back to a single-width vector lane.
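A rough scalar model of one output lane in the 16-bit, non-rounding, non-exchanging case, based only on the description above (the helper is hypothetical, not the architectural pseudocode):
#include <stdint.h>
// Hypothetical scalar model: double the sum of the two products, take the
// high half of the double-width value, then saturate to the lane width.
int16_t lane_model(int16_t a0, int16_t a1, int16_t b0, int16_t b1) {
  int64_t sum = 2 * ((int64_t)a0 * b0 + (int64_t)a1 * b1);
  int64_t high = sum >> 16;                 // high half of the 32-bit value
  if (high > INT16_MAX) high = INT16_MAX;   // saturate to a single-width lane
  if (high < INT16_MIN) high = INT16_MIN;
  return (int16_t)high;
}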
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76359
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.
We already have an optimisation which tries to fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.
Differential revision: https://reviews.llvm.org/D69936
Summary:
These are complicated integer multiply+add instructions with extra
saturation, taking the high half of a double-width product, and
optional rounding. There's no sensible way to represent that in
standard IR, so I've converted the clang builtins directly to
target-specific intrinsics.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76123
Summary:
These instructions compute multiply+add in integers, with one of the
operands being a splat of a scalar. (VMLA and VMLAS differ in whether
the splat operand is a multiplier or the addend.)
I've represented these in IR using existing standard IR operations for
the unpredicated forms. The predicated forms are done with target-
specific intrinsics, as usual.
When operating on n-bit vector lanes, only the bottom n bits of the
i32 scalar operand are used. So we have to tell that to isel lowering,
to allow it to remove a pointless sign- or zero-extension instruction
on that input register. That's done in `PerformIntrinsicCombine`, but
first I had to enable `PerformIntrinsicCombine` for MVE targets
(previously all the intrinsics it handled were for NEON), and make it
a method of `ARMTargetLowering` so that it can get at
`SimplifyDemandedBits`.
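A hedged sketch (hypothetical function) of the demanded-bits point: only the bottom 16 bits of the scalar matter, so the extension created by the int16_t conversion can be removed:
#include <arm_mve.h>
// Hypothetical example: the scalar multiplier is narrowed to 16 bits, but the
// instruction only reads the bottom 16 bits of the i32 operand anyway, so the
// sign-extension of the GPR can be dropped.
int16x8_t mla(int16x8_t acc, int16x8_t v, int x) {
  return vmlaq_n_s16(acc, v, (int16_t)x);
}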
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76122
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
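A small C illustration (hypothetical helper) of why only the low bits of the amount are demanded for a power-of-2 bitwidth:
#include <stdint.h>
// Hypothetical illustration: a 32-bit rotate only observes the amount modulo
// 32, so rotl32(x, amt) == rotl32(x, amt & 31) and only the low 5 bits of the
// rotation amount are demanded.
uint32_t rotl32(uint32_t x, unsigned amt) {
  amt &= 31;
  return (x << amt) | (x >> ((32 - amt) & 31));
}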
Differential Revision: https://reviews.llvm.org/D76201
The ASRL/LSRL long shifts are generated from 64-bit shifts. Once we have
them, it might turn out that not all of the 64-bit result is required,
in which case we can use a smaller shift to produce the same result. As the
smaller shift can in general be folded in more ways, such as into add
instructions in one of the test cases here, we can use the demanded bits
analysis to prefer the smaller shifts where we can.
Differential Revision: https://reviews.llvm.org/D75371
This changes the way that asrl and lsrl intrinsics are lowered, going
via the ISel ASRL and LSLL nodes instead of straight to machine nodes.
On top of that, it adds some constant folds for long shifts, in case it
turns out that the shift amount was either constant or 0.
Differential Revision: https://reviews.llvm.org/D75553
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.
I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.
The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.
In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)
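A minimal usage sketch (hypothetical function) of the unpredicated form that is represented with @llvm.fma:
#include <arm_mve.h>
// Hypothetical example: add + m1*m2, fused; the unpredicated intrinsic maps
// onto @llvm.fma in IR and is selected to a vector VFMA.
float32x4_t fma_vec(float32x4_t add, float32x4_t m1, float32x4_t m2) {
  return vfmaq_f32(add, m1, m2);
}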
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75998