llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Tatham	1adfa4c991	[ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family. Summary: I've implemented them as target-specific IR intrinsics rather than using `@llvm.experimental.vector.reduce.add`, on the grounds that the 'experimental' intrinsic doesn't currently have much code generation benefit, and my replacements encapsulate the sign- or zero-extension so that you don't expose the illegal MVE vector type (`<4 x i64>`) in IR. The machine instructions come in two versions: with and without an input accumulator. My new IR intrinsics, like the 'experimental' one, don't take an accumulator parameter: we represent that by just adding on the input value using an ordinary i32 or i64 add. So if you write the `vaddvaq` C-language intrinsic with an input accumulator of zero, it can be optimised to VADDV, and conversely, if you write something like `x += vaddvq(y)` then that can be combined into VADDVA. Most of this is achieved in isel lowering, by converting these IR intrinsics into the existing `ARMISD::VADDV` family of custom SDNode types. For the difficult case (64-bit accumulators), isel lowering already implements the optimization of folding an addition into a VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is already done, except that I had to introduce a parallel set of ARMISD nodes for the //predicated// forms of VADDLV. For the simpler VADDV, we handle the predicated form by just leaving the IR intrinsic alone and matching it in an ordinary dag pattern. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76491	2020-03-20 15:42:33 +00:00
Simon Tatham	28c5d97bee	[ARM,MVE] Add intrinsics and isel for MVE integer VMLA. Summary: These instructions compute multiply+add in integers, with one of the operands being a splat of a scalar. (VMLA and VMLAS differ in whether the splat operand is a multiplier or the addend.) I've represented these in IR using existing standard IR operations for the unpredicated forms. The predicated forms are done with target- specific intrinsics, as usual. When operating on n-bit vector lanes, only the bottom n bits of the i32 scalar operand are used. So we have to tell that to isel lowering, to allow it to remove a pointless sign- or zero-extension instruction on that input register. That's done in `PerformIntrinsicCombine`, but first I had to enable `PerformIntrinsicCombine` for MVE targets (previously all the intrinsics it handled were for NEON), and make it a method of `ARMTargetLowering` so that it can get at `SimplifyDemandedBits`. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76122	2020-03-18 10:55:04 +00:00
David Green	2c6c169dbd	[ARM] Optimise ASRL/LSRL to smaller shifts using demand bits. The ASRL/LSRL long shifts are generated from 64bit shifts. Once we have them, it might turn out that enough of the 64bit result was not required that we can use a smaller shift to perform the same result. As the smaller shift can in general be folded in more way, such as into add instructions in one of the test cases here, we can use the demand bit analysis to prefer the smaller shifts where we can. Differential Revision: https://reviews.llvm.org/D75371	2020-03-13 10:09:03 +00:00
Victor Campos	8a12553223	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. The code generation explicitly deviates from using the register-offset variant of LDRD/STRD. In this variant, the register allocated to the register-offset cannot be reused in any of the remaining operands. Such restriction seems to be non-trivial to implement in LLVM, thus it is left as a to-do. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-03-11 10:19:27 +00:00
David Green	33aa5dfe9c	[ARM] VMLAVA reduction patterns Similar to VADDV and VADDLV that have been added recently, this adds lowering and patterns for VMLAV, VMLAVA, VMLALV and VMLALVA. They perform the same roles as the add's, just folding a mul into the same instruction (and so taking two inputs). As such, they need to be lowered in the same way as the types are often not legal. Differential Revision: https://reviews.llvm.org/D74390	2020-02-19 12:39:58 +00:00
David Green	fceb3e3b4a	[ARM] MVE VADDLV lowering Following on from the extra VADDV lowering, this extends things to handle VADDLV which allows summing values into a pair of i32 registers, together treated as a i64. This needs to be done in DAGCombine too as the types are otherwise illegal, which is a fairly simple addition on top of the existing code. There is also a VADDLVA instruction handled here, that adds the incoming values from the two general purpose registers. As opposed to the non-long version where we could just add patterns for add(x, VADDV), the long version needs to handle this early before the i64 has being split into too many pieces. Differential Revision: https://reviews.llvm.org/D74224	2020-02-19 11:07:20 +00:00
Florian Hahn	216afd3301	[TargetLower] Update shouldFormOverflowOp check if math is used. On some targets, like SPARC, forming overflow ops is only profitable if the math result is used: https://godbolt.org/z/DxSmdB This patch adds a new MathUsed parameter to allow the targets to make the decision and defaults to only allowing it if the math result is used. That is the conservative choice. This patch also updates AArch64ISelLowering, X86ISelLowering, ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow ops if the math result is not used. On those targets using the overflow intrinsic for the overflow check only generates better code. Reviewers: nikic, RKSimon, lebedev.ri, spatel Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D74722	2020-02-19 11:28:33 +01:00
David Green	51c6e9445c	[ARM] Extra MVE VADDV reduction patterns We already make use of the VADDV vector reduction instruction for cases where the input and the output start out at the same type. The MVE instruction however will sum into an i32, so if we are summing a v16i8 into an i32, we can still use the same instructions. In terms of IR, this looks like a sext of a legal type (v16i8) into a very illegal type (v16i32) and a vecreduce.add of that into the result. This means we have to catch the pattern early in a DAG combine, producing a target VADDVs/u node, where the signedness is now important. This is the first part, handling VADDV and VADDVA. There are also VADDVL/VADDVLA instructions, which are interesting because they sum into a 64bit value. And VMLAV and VMLALV, which are interesting because they also do a multiply of two values. It may look a little odd in places as a result. On it's own this will probably not do very much, as the vectorizer will not produce this IR yet. Differential Revision: https://reviews.llvm.org/D74218	2020-02-19 09:45:35 +00:00
Victor Campos	af2a384581	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `60e0120c91`.	2020-02-08 13:18:45 +00:00
John Brawn	b37d59353f	[FPEnv][ARM] Add lowering of STRICT_FSETCC and STRICT_FSETCCS These can be lowered to code sequences using CMPFP and CMPFPE which then get selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain operand isn't handled correctly, but resolving that looks like it would involve changes around FPSCR-handling instructions and how the FPSCR is modelled. The fp-intrinsics test was already testing some of this but as the entire test was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the cases where we aren't generating the right instruction sequences as FIXME. Differential Revision: https://reviews.llvm.org/D73194	2020-02-03 12:59:12 +00:00
Simon Tatham	961530fdc9	[ARM,MVE] Fix vreinterpretq in big-endian mode. Summary: In big-endian MVE, the simple vector load/store instructions (i.e. both contiguous and non-widening) don't all store the bytes of a register to memory in the same order: it matters whether you did a VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register formats of different vector types relate to each other in a different way from the in-memory formats. So, if you want to 'bitcast' or 'reinterpret' one vector type as another, you have to carefully specify which you mean: did you want to reinterpret the //register// format of one type as that of the other, or the //memory// format? The ACLE `vreinterpretq` intrinsics are specified to reinterpret the register format. But I had implemented them as LLVM IR bitcast, which is specified for all types as a reinterpretation of the memory format. So a `vreinterpretq` intrinsic, applied to values already in registers, would code-generate incorrectly if compiled big-endian: instead of emitting no code, it would emit a `vrev`. To fix this, I've introduced a new IR intrinsic to perform a register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's implemented by a trivial isel pattern that expects the input in an MQPR register, and just returns it unchanged. In the clang codegen, I only emit this new intrinsic where it's actually needed: I prefer a bitcast wherever it will have the right effect, because LLVM understands bitcasts better. So we still generate bitcasts in little-endian mode, and even in big-endian when you're casting between two vector types with the same lane size. For testing, I've moved all the codegen tests of vreinterpretq out into their own file, so that they can have a different set of RUN lines to check both big- and little-endian. Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard Reviewed By: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73786	2020-02-03 11:20:06 +00:00
Guillaume Chatelet	3c89b75f23	[NFC] Introduce a type to model memory operation Summary: This is a first step before changing the types to llvm::Align and introduce functions to ease client code. Reviewers: courbet Subscribers: arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73785	2020-01-31 17:29:01 +01:00
Matt Arsenault	255cc5a760	CodeGen: Use LLT instead of EVT in getRegisterByName Only PPC seems to be using it, and only checks some simple cases and doesn't distinguish between FP. Just switch to using LLT to simplify use from GlobalISel.	2020-01-09 17:37:52 -05:00
Victor Campos	60e0120c91	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers Reviewed By: efriedma, nickdesaulniers Subscribers: nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2020-01-07 13:16:18 +00:00
Victor Campos	2ff5a596cb	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `bbcf1c3496`.	2019-12-20 18:08:24 +00:00
Victor Campos	bbcf1c3496	[ARM] Improve codegen of volatile load/store of i64 Summary: Instead of generating two i32 instructions for each load or store of a volatile i64 value (two LDRs or STRs), now emit LDRD/STRD. These improvements cover architectures implementing ARMv5TE or Thumb-2. Reviewers: dmgreen, efriedma, john.brawn Reviewed By: efriedma Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70072	2019-12-19 11:23:01 +00:00
David Green	882f23caea	[ARM] MVE interleaving load and stores. Now that we have the intrinsics, we can add VLD2/4 and VST2/4 lowering for MVE. This works the same way as Neon, recognising the load/shuffles combination and converting them into intrinsics in a pre-isel pass, which just calls getMaxSupportedInterleaveFactor, lowerInterleavedLoad and lowerInterleavedStore. The main difference to Neon is that we do not have a VLD3 instruction. Otherwise most of the code works very similarly, with just some minor differences in the form of the intrinsics to work around. VLD3 is disabled by making isLegalInterleavedAccessType return false for those cases. We may need some other future adjustments, such as VLD4 take up half the available registers so should maybe cost more. This patch should get the basics in though. Differential Revision: https://reviews.llvm.org/D69392	2019-11-19 18:37:30 +00:00
Matt Arsenault	b696b9dba7	DAG: Add function context to isFMAFasterThanFMulAndFAdd AMDGPU needs to know the FP mode for the function to answer this correctly when this is removed from the subtarget. AArch64 had to make this more complicated by using this from an IR hook, so add an IR typed overload.	2019-11-19 19:25:26 +05:30
Matthew Malcomson	e5f3760e8c	Fix comment spelling {addresing -> addressing} (NFC)	2019-11-13 16:14:32 +00:00
David Green	91b0cad813	[ARM] Use isFMAFasterThanFMulAndFAdd for MVE The Arm backend will usually return false for isFMAFasterThanFMulAndFAdd, where both the fused VFMA.f32 and a non-fused VMLA.f32 are usually available for scalar code. For MVE we don't have the non-fused version though. It makes more sense for isFMAFasterThanFMulAndFAdd to return true, allowing us to simplify some of the existing ISel patterns. The tests here are that non of the existing tests failed, and so we are still selecting VFMA and VFMS. The one test that changed shows we can now select from fast math flags, as opposed to just relying on the isFMADLegalForFAddFSub option. Differential Revision: https://reviews.llvm.org/D69115	2019-11-04 15:05:41 +00:00
Guillaume Chatelet	bac5f6bd21	[Alignment][NFC] TargetCallingConv::setOrigAlign and TargetLowering::getABIAlignmentForCallingConv Summary: This is patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: sdardis, hiraditya, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69243 llvm-svn: 375407	2019-10-21 11:01:55 +00:00
David Green	fba831e791	[ARM] Lower sadd_sat to qadd8 and qadd16 Lower the target independent signed saturating intrinsics to qadd8 and qadd16. This custom lowers them from a sadd_sat, catching the node early before it is promoted. It also adds a QADD8b and QADD16b node to mean the bottom "lane" of a qadd8/qadd16, so that we can call demand bits on it to show that it does not use the upper bits. Also handles QSUB8 and QSUB16. Differential Revision: https://reviews.llvm.org/D68974 llvm-svn: 375402	2019-10-21 09:53:38 +00:00
David Green	543236232c	[ARM] Selection for MVE VMOVN The adds both VMOVNt and VMOVNb instruction selection from the appropriate shuffles. We detect shuffle masks of the form: 0, N, 2, N+2, 4, N+4, ... or 0, N+1, 2, N+3, 4, N+5, ... ISel will also try the opposite patterns, with inputs reversed. These are selected to VMOVNt and VMOVNb respectively. Differential Revision: https://reviews.llvm.org/D68283 llvm-svn: 374781	2019-10-14 15:19:33 +00:00
Kristof Beyls	78bfe3ab94	[ARM] Generate vcmp instead of vcmpe Based on the discussion in http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the conclusion was reached that the ARM backend should produce vcmp instead of vcmpe instructions by default, i.e. not be producing an Invalid Operation exception when either arguments in a floating point compare are quiet NaNs. In the future, after constrained floating point intrinsics for floating point compare have been introduced, vcmpe instructions probably should be produced for those intrinsics - depending on the exact semantics they'll be defined to have. This patch logically consists of the following parts: - Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which implemented fine-tuning for when to produce vcmpe (i.e. not do it for equality comparisons). The complexity introduced by those patches isn't needed anymore if we just always produce vcmp instead. Maybe these patches need to be reintroduced again once support is needed to map potential LLVM-IR constrained floating point compare intrinsics to the ARM instruction set. - Simply select vcmp, instead of vcmpe, see simple changes in lib/Target/ARM/ARMInstrVFP.td - Adapt lots of tests that tested for vcmpe (instead of vcmp). For all of these test, the intent of what is tested for isn't related to whether the vcmp should produce an Invalid Operation exception or not. Fixes PR43374. Differential Revision: https://reviews.llvm.org/D68463 llvm-svn: 374025	2019-10-08 08:25:42 +00:00
Matt Arsenault	f24ac13aaa	TLI: Remove DAG argument from getRegisterByName Replace with the MachineFunction. X86 is the only user, and only uses it for the function. This removes one obstacle from using this in GlobalISel. The other is the more tolerable EVT argument. The X86 use of the function seems questionable to me. It checks hasFP, before frame lowering. llvm-svn: 373292	2019-10-01 01:44:39 +00:00
David Green	57cc65ff47	[ARM] Generate 8.1-m CSINC, CSNEG and CSINV instructions. Arm 8.1-M adds a number of related CSEL instructions, including CSINC, CSNEG and CSINV. These choose between two values given the content in CPSR and a condition, performing an increment, negation or inverse of the false value. This adds some selection for them, either from constant values or patterns. It does not include CSEL directly, which is currently not always making code better. It is still useful, but we will have to check more carefully where it should and shouldn't be used. Code by Ranjeet Singh and Simon Tatham, with some modifications from me. Differential revision: https://reviews.llvm.org/D66483 llvm-svn: 370739	2019-09-03 10:53:07 +00:00
Jian Cai	16fa8b0970	Reland "[ARM] push LR before __gnu_mcount_nc" This relands r369147 with fixes to unit tests. https://reviews.llvm.org/D65019 llvm-svn: 369173	2019-08-16 23:30:16 +00:00
Jian Cai	2d957cfe02	Revert "[ARM] push LR before __gnu_mcount_nc" This reverts commit `f4cf3b9593`. llvm-svn: 369149	2019-08-16 20:40:21 +00:00
Jian Cai	f4cf3b9593	[ARM] push LR before __gnu_mcount_nc Push LR register before calling __gnu_mcount_nc as it expects the value of LR register to be the top value of the stack on ARM32. Differential Revision: https://reviews.llvm.org/D65019 llvm-svn: 369147	2019-08-16 20:21:08 +00:00
David Green	8c2c5f5045	[ARM] Don't pretend we know how to generate MVE VLDn We don't yet know how to generate these instructions for MVE. And in the case of VLD3, we don't even have the instruction. For the moment don't tell the vectoriser that we have VLD4, just to end up serialising the results. Differential Revision: https://reviews.llvm.org/D66009 llvm-svn: 369101	2019-08-16 13:06:49 +00:00
Eli Friedman	89b80f1239	[ARM] Lower "(x<<c) > 0x80000000U" to "lsls" on Thumb1. This is extremely specific, but saves three instructions when it's legal. I don't think the code can be usefully generalized. Differential Revision: https://reviews.llvm.org/D65351 llvm-svn: 367492	2019-07-31 23:19:21 +00:00
David Green	cd7a6fa314	[ARM] Rewrite how VCMP are lowered, using a single node This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and VCMPZ with an extra operand as the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it. Differential Revision: https://reviews.llvm.org/D65072 llvm-svn: 366934	2019-07-24 17:36:47 +00:00
David Green	bab4d8ac5a	[ARM] Better OR's for MVE compares This adds a DeMorgan combine for OR's of compares to turn them into AND's, helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares. Differential Revision: https://reviews.llvm.org/D65059 llvm-svn: 366920	2019-07-24 16:42:09 +00:00
David Green	c7e55d4f52	[ARM] MVE predicate register support This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles, either converting the predicate into an scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or by converting the register to an full vector register and back. Some of the code here is a not super efficient but will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches. Some code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65052 llvm-svn: 366890	2019-07-24 11:51:36 +00:00
David Green	b9d96ceca0	[ARM] MVE integer compares and selects This adds the very basics for MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc). An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT which are added later). VPSEL is also added here, simply selecting on the vselect. Original code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65051 llvm-svn: 366885	2019-07-24 11:08:14 +00:00
Sam Parker	57e87dd81b	[ARM][LowOverheadLoops] Fix branch target codegen While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc and now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809	2019-07-23 14:08:46 +00:00
David Green	fdedf240f8	[ARM] Rename NEONModImm to VMOVModImm. NFC Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE. llvm-svn: 366790	2019-07-23 09:19:24 +00:00
David Green	da750b1688	[ARM] Adjust how NEON shifts are lowered This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shifts operations in the same the way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases. Differential Revision: https://reviews.llvm.org/D64426 llvm-svn: 366051	2019-07-15 10:44:50 +00:00
Martin Storsjo	8d9d290d4c	[ARM] Add support for MSVC stack cookie checking Heavily based on the same for AArch64, from SVN r346469. Differential Revision: https://reviews.llvm.org/D64292 llvm-svn: 365283	2019-07-07 18:57:31 +00:00
David Green	25cf705097	[ARM] MVE VMOV immediate handling This adds some handling for VMOVimm, using the same method that NEON uses. We create VMOVIMM/VMVNIMM/VMOVFPIMM nodes based on the immediate, and select them using the now renamed ARMvmovImm/etc. There is also an extra 64bit immediate mode that I have not yet added here. Code by David Sherwood Differential Revision: https://reviews.llvm.org/D63884 llvm-svn: 365178	2019-07-05 10:02:43 +00:00
Roman Lebedev	c4b83a6054	[Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) Summary: This is the backend part of [[ https://bugs.llvm.org/show_bug.cgi?id=42457 \| PR42457 ]]. In middle-end, we'd want to prefer the form with two adds - D63992, but as this diff shows, not every target will prefer that pattern. Out of 4 targets for which i added tests all seem to be ok with inc-of-add for scalars, but only X86 prefer that same pattern for vectors. Here i'm adding a new TLI hook, always defaulting to the inc-of-add, but adding AArch64,ARM,PowerPC overrides to prefer inc-of-add only for scalars. Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel Reviewed By: efriedma Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64090 llvm-svn: 365010	2019-07-03 09:41:35 +00:00
Simon Tatham	bffd099d15	[ARM] MVE: allow soft-float ABI to pass vector types. Passing a vector type over the soft-float ABI involves it being split into four GPRs, so the first thing that has to happen at the start of the function is to recombine those into a vector register. The ABI types all vectors as v2f64, so we need to support BUILD_VECTOR for that type, which I do in this patch by allowing it to be expanded in terms of INSERT_VECTOR_ELT, and writing an ISel pattern for that in turn. Similarly, I provide a rule for EXTRACT_VECTOR_ELT so that a returned vector can be marshalled back into GPRs. While I'm here, I've also added ISD::UNDEF to the list of operations we turn back on in `setAllExpand`, because I noticed that otherwise it gets expanded into a BUILD_VECTOR with explicit zero inputs, leading to pointless machine instructions to zero out a vector register that's about to have every lane overwritten of in any case. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63937 llvm-svn: 364910	2019-07-02 11:26:11 +00:00
Simon Tatham	7b63a9533c	[ARM] Stop using scalar FP instructions in integer-only MVE mode. If you compile with `-mattr=+mve` (enabling integer MVE instructions but not floating-point ones), then the scalar FP //registers// exist and it's legal to move things in and out of them, load and store them, but it's not legal to do arithmetic on them. In D60708, the calls to `addRegisterClass` in ARMISelLowering that enable use of the scalar FP registers became conditionalised on `Subtarget->hasFPRegs()` instead of `Subtarget->hasVFP2Base()`, so that loads, stores and moves of those registers would work. But I didn't realise that that would also enable all the operations on those types by default. Now, if the target doesn't have basic VFP, we follow up those `addRegisterClass` calls by turning back off all the nontrivial operations you can perform on f32 and f64. That causes several knock-on failures, which are fixed by allowing the `VMOVDcc` and `VMOVScc` instructions to be selected even if all you have is `HasFPRegs`, and adjusting several checks for 'is this a double in a single-precision-only world?' to the more general 'is this any FP type we can't do arithmetic on?'. Between those, the whole of the `float-ops.ll` and `fp16-instructions.ll` tests can now run in MVE-without-FP mode and generate correct-looking code. One odd side effect is that I had to relax the check lines in that test so that they permit test functions like `add_f` to be generated as tailcalls to software FP library functions, instead of ordinary calls. Doing that is entirely legal, but the mystery is why this is the first RUN line that's needed the relaxation: on the usual kind of non-FP target, no tailcalls ever seem to be generated. Going by the llc messages, I think `SoftenFloatResult` must be perturbing the code generation in some way, but that's as much as I can guess. Reviewers: dmgreen, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63938 llvm-svn: 364909	2019-07-02 11:26:00 +00:00
Sam Parker	98722691b0	[ARM] WLS/LE Code Generation Backend changes to enable WLS/LE low-overhead loops for armv8.1-m: 1) Use TTI to communicate to the HardwareLoop pass that we should try to generate intrinsics that guard the loop entry, as well as setting the loop trip count. 2) Lower the BRCOND that uses said intrinsic to an Arm specific node: ARMWLS. 3) ISelDAGToDAG the node to a new pseudo instruction: t2WhileLoopStart. 4) Add support in ArmLowOverheadLoops to handle the new pseudo instruction. Differential Revision: https://reviews.llvm.org/D63816 llvm-svn: 364733	2019-07-01 08:21:28 +00:00
Sam Tebbs	e39e958da3	[ARM] Add support for the MVE long shift instructions MVE adds the lsll, lsrl and asrl instructions, which perform a shift on a 64 bit value separated into two 32 bit registers. The Expand64BitShift function is modified to accept ISD::SHL, ISD::SRL and ISD::SRA and convert it into the appropriate opcode in ARMISD. An SHL is converted into an lsll, an SRL is converted into an lsrl for the immediate form and a negation and lsll for the register form, and SRA is converted into an asrl. test/CodeGen/ARM/shift_parts.ll is added to test the logic of emitting these instructions. Differential Revision: https://reviews.llvm.org/D63430 llvm-svn: 364654	2019-06-28 15:43:31 +00:00
David Green	eb7080ac6e	[ARM] Widening loads and narrowing stores MVE has instructions to widen as it loads, and narrow as it stores. This adds the required patterns and legalisation to make them work including specifying that they are legal, patterns to select them and test changes. Patch by David Sherwood. Differential Revision: https://reviews.llvm.org/D63839 llvm-svn: 364636	2019-06-28 09:47:55 +00:00
David Green	8be372b190	[ARM] MVE vector shuffles This patch adds necessary shuffle vector and buildvector support for ARM MVE. It essentially adds support for VDUP, VREVs and some VMOVs, which are often required by other code (like upcoming patches). This mostly uses the same code from Neon that already generated NEONvdup/NEONvduplane/NEONvrev's. These have been renamed to ARMvdup/etc and moved to ARMInstrInfo as they are common to both architectures. Most of the selection code seems to be applicable to both, but NEON does have some more instructions making some parts specific. Most code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63567 llvm-svn: 364626	2019-06-28 07:08:42 +00:00
Simon Tatham	a4b415a683	[ARM] Code-generation infrastructure for MVE. This provides the low-level support to start using MVE vector types in LLVM IR, loading and storing them, passing them to __asm__ statements containing hand-written MVE vector instructions, and if you have the hard-float ABI turned on, using them as function parameters. (In the soft-float ABI, vector types are passed in integer registers, and combining all those 32-bit integers into a q-reg requires support for selection DAG nodes like insert_vector_elt and build_vector which aren't implemented yet for MVE. In fact I've also had to add `arm_aapcs_vfpcc` to a couple of existing tests to avoid that problem.) Specifically, this commit adds support for: * spills, reloads and register moves for MVE vector registers * ditto for the VPT predication mask that lives in VPR.P0 * make all the MVE vector types legal in ISel, and provide selection DAG patterns for BITCAST, LOAD and STORE * make loads and stores of scalar FP types conditional on `hasFPRegs()` rather than `hasVFP2Base()`. As a result a few existing tests needed their llc command lines updating to use `-mattr=-fpregs` as their method of turning off all hardware FP support. Reviewers: dmgreen, samparker, SjoerdMeijer Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60708 llvm-svn: 364329	2019-06-25 16:48:46 +00:00
Simon Pilgrim	4e0648a541	[TargetLowering] Add MachineMemOperand::Flags to allowsMemoryAccess tests (PR42123) As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space. This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them. If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores. Differential Revision: https://reviews.llvm.org/D63075 llvm-svn: 363179	2019-06-12 17:14:03 +00:00
Diogo N. Sampaio	df92f84110	[ARM][FIX] Ran out of registers due tail recursion Summary: - pr42062 When compiling for MinSize, ARMTargetLowering::LowerCall decides to indirect multiple calls to a same function. However, it disconsiders the limitation that thumb1 indirect calls require the callee to be in a register from r0 to r3 (llvm limiation). If all those registers are used by arguments, the compiler dies with "error: run out of registers during register allocation". This patch tells the function IsEligibleForTailCallOptimization if we intend to perform indirect calls, as to avoid tail call optimization. Reviewers: dmgreen, efriedma Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62683 llvm-svn: 362366	2019-06-03 08:58:05 +00:00

1 2 3 4 5 ...

487 Commits