llvm-project

Commit Graph

Author	SHA1	Message	Date
Daniel Egger	98c2e4115d	[ARM] Add lowering of uadd_sat to uq{add|sub}8 and uq{add|sub}16 This follows the lead of https://reviews.llvm.org/D68974 to add lowering of unsigned saturated addition/subtraction. Differential Revision: https://reviews.llvm.org/D105413	2021-07-11 15:58:11 +01:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reduction trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Craig Topper	631516301e	[ARM] Pass 2 instead of 0 to PHINode::Create in MVEGatherScatterLowering. NFC This parameter controls how much space is reserved for incoming values. There are always going to be 2 incoming values in this case. While there remove the unused std::vector right below. Found while looking at porting this code to RISCV.	2021-07-08 15:59:33 -07:00
Craig Topper	6dd94cbff5	[ARM] Use matchSimpleRecurrence to simplify some code in MVEGatherScatterLowering. NFCI Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105262	2021-07-08 11:42:56 -07:00
Matt Arsenault	9b057f647d	GlobalISel: Track original argument index in ArgInfo SelectionDAG's equivalents in ISD::InputArg/OutputArg track the original argument index. Mips relies on this, and it's currently reinventing its own parallel CallLowering infrastructure which tracks these indexes on the side. Add this to help move towards deleting the custom mips handling.	2021-07-08 13:39:02 -04:00
David Green	a77e2d196c	[ARM] Fix arm.mve.pred.v2i range upper limit The range metadata specifies a half open range, so our top limit was one off.	2021-07-05 21:06:30 +01:00
Krzysztof Parzyszek	df88c26f0d	[OpaquePtr] Add type parameter to emitLoadLinked Differential Revision: https://reviews.llvm.org/D105353	2021-07-02 13:07:40 -05:00
David Green	3d48775b89	[ARM] Reassociate BFI D104868 removed an (incorrect) fold for distributing BFI instructions in a chain, combining them into a single instruction. BFIs like that are hard to test, as the patterns are often destroyed before they become BFIs. But it can come up in places, with chains of BFIs that can be combined. This patch adds a replacement, which reassociates BFI instructions with non-overlapping insertion masks so that low bits are inserted first. This can end up sorting the nodes so that adjacent inserts are next to one another, allowing the existing folds to combine into a single BFI. Differential Revision: https://reviews.llvm.org/D105096	2021-07-01 21:08:13 +01:00
Matt Arsenault	99c7e918b5	GlobalISel: Use LLT in call lowering callbacks This preserves the memory type so the lowerings can rely on them.	2021-07-01 12:15:54 -04:00
Sam Tebbs	24d76419d6	[ARM] Transform a floating-point to fixed-point conversion to a VCVT_fix Much like fixed-point to floating-point conversion, the converse can also be transformed into a fixed-point VCVT. This patch transforms multiplications of floating point numbers by 2^n into a VCVT_fix. The exception is that a float to fixed conversion with 1 fractional bit ends up being an FADD (FADD(x, x) emulates FMUL(x, 2)) rather than an FMUL so there is a special case for that. This patch also moves the code from https://reviews.llvm.org/D103903 into a separate function as fixed to float and float to fixed are very similar. Differential Revision: https://reviews.llvm.org/D104793	2021-07-01 15:10:40 +01:00
Matt Arsenault	28f2f66200	GlobalISel: Use LLT in memory legality queries This enables proper lowering of non-byte sized loads. We still aren't faithfully preserving memory types everywhere, so the legality checks still only consider the size.	2021-06-30 17:44:13 -04:00
Jonas Paulsson	7aef99351a	[MCStreamer] Move emission of attributes section into MCELFStreamer Enable the emission of a GNU attributes section by reusing the code for emitting the ARM build attributes section. The GNU attributes follow the exact same section format as the ARM BuildAttributes section, so this can be factored out and reused for GNU attributes generally. The immediate motivation for this is to emit a GNU attributes section for the vector ABI on SystemZ (https://reviews.llvm.org/D105067). Review: Logan Chien, Ulrich Weigand Differential Revision: https://reviews.llvm.org/D102894	2021-06-30 16:00:27 -05:00
David Green	cd76f43b49	[ARM] Set the immediate cost of GEP operands to 0 This prevents constant gep operands from being hoisted by the Constant Hoisting pass, leaving them to CodegenPrepare which can usually do a better job at splitting large offsets. This can, in general, improve performance and decrease codesize, especially for v6m where many constants have a high cost. Differential Revision: https://reviews.llvm.org/D104877	2021-06-30 19:19:03 +01:00
Craig Topper	0f1f92156f	[ARM] Fix incorrect assignment of Changed variable in MVEGatherScatterLowering::optimiseOffsets. I believe this Changed flag should be initialized to false, otherwise the if (!Changed) is always dead. This doesn't manifest in a functional issue because the PHINode checks will fail if nothing changed. They are identical to the earlier checks that must have already failed to get into this else block. While there remove an else after return to reduce indentation. Differential Revision: https://reviews.llvm.org/D105159	2021-06-30 07:52:57 -07:00
Igor Kudrin	657e067bb5	[ARMInstPrinter] Print the target address of a branch instruction This follows other patches that changed printing immediate values of branch instructions to target addresses, see D76580 (x86), D76591 (PPC), D77853 (AArch64). As observing immediate values might sometimes be useful, they are printed as comments for branch instructions. // llvm-objdump -d output (before) 000200b4 <_start>: 200b4: ff ff ff fa blx #-4 <thumb> 000200b8 <thumb>: 200b8: ff f7 fc ef blx #-8 <_start> // llvm-objdump -d output (after) 000200b4 <_start>: 200b4: ff ff ff fa blx 0x200b8 <thumb> @ imm = #-4 000200b8 <thumb>: 200b8: ff f7 fc ef blx 0x200b4 <_start> @ imm = #-8 // GNU objdump -d. 000200b4 <_start>: 200b4: faffffff blx 200b8 <thumb> 000200b8 <thumb>: 200b8: f7ff effc blx 200b4 <_start> Differential Revision: https://reviews.llvm.org/D104701	2021-06-30 16:35:28 +07:00
Igor Kudrin	17bcae8906	[ARM][NFC] Remove an unused method `ARMInstPrinter::printMveAddrModeQOperand()` was added in D62680, but was never used. It looks like `printT2AddrModeImm8Operand<false>()` is used instead. Differential Revision: https://reviews.llvm.org/D105124	2021-06-30 15:55:37 +07:00
Matt Arsenault	990278d026	CodeGen: Store LLT instead of uint64_t in MachineMemOperand GlobalISel is relying on regular MachineMemOperands to track all of the memory properties of accesses. Just the raw byte size is insufficient to disambiguate all situations. For example, if we need to split an unaligned extending load, we need to know the number of bits in the original source value and can't infer it from the result type. This is also a problem for extending vector loads. This does decrease the maximum representable size from the full uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this, other than places using UINT64_MAX for unknown sizes. This may be an issue for G_MEMCPY and co., although they can just use unknown size for large static sizes. This also has potential for backend abuse by relying on the type when it really shouldn't be relevant after selection. This does not include the necessary MIR printer/parser changes to represent this.	2021-06-29 17:38:51 -04:00
Tim Northover	c82957e792	ARM: fix vacuously true assertion to actually check what it should. NFC.	2021-06-29 14:24:03 +01:00
David Green	371ee32e01	[ARM] Fold extract of ARM_BUILD_VECTOR This adds a small fold for extract (ARM_BUILD_VECTOR) to fold to the original node. This can help simplify the resulting codegen in some cases. Differential Revision: https://reviews.llvm.org/D104860	2021-06-29 11:03:19 +01:00
David Spickett	558d9e8228	[llvm][ARM] Treat xscale arch as an alias of armv5te Previously xscale was known to everything apart from the ELF streamer so we would crash as soon as you tried to output an object file. Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D104776	2021-06-28 15:20:24 +00:00
David Green	a1c0f09a89	[ARM] Add an extra fold for f32 extract(vdup(i32)) This adds another small fold for extract of a vdup, between a i32 and a f32, converting to a BITCAST. This allows some extra folding to happen, simplifying the resulting code. Differential Revision: https://reviews.llvm.org/D104857	2021-06-28 08:54:03 +01:00
David Green	41d8149ee9	[ARM] Lower MVETRUNC to stack operations The MVETRUNC node truncates two wide vectors to a single vector with narrower elements. This is usually lowered to a series of extract/insert elements, going via GPR registers. This patch changes that to instead use a pair of truncating stores and a stack reload. This cuts down the number of instructions at the expense of some stack space. Differential Revision: https://reviews.llvm.org/D104515	2021-06-26 22:12:57 +01:00
David Green	5955812927	[ARM] Introduce MVETRUNC ISel lowering Currently, when encountering store(trunc(..)) where the trunc is double a legal vector length in MVE, we split the node into two different stores each performing half of the trunc from the wider type. This works well for efficiently lowering wider than legal types, else the trunc becomes a series of individual lane moves. Unfortunately this splitting is currently one of the first combines attempted, so can happen before any other combines which might be more preferable. This patch instead introduces the concept of a MVETRUNC ISel node that the trunc is initially lowered to, to keep it intact as a single item as opposed to splitting it up. This allows us to push the store(trunc(..)) combine later, allowing other optimisations to potentially happen on the trunc first. The store(trunc(..)) splitting can then be done later in the legalisation period if needed, or else fall back to a buildvector as before. This can also be used in the future to lower to loads/stores, as opposed to the more expensive lane extracts/inserts. Some extra combines are added to keep all the existing tests happy. Differential Revision: https://reviews.llvm.org/D91921	2021-06-26 22:00:26 +01:00
David Green	0f83d37a14	[ARM] MVE vabd This adds MVE lowering for VABDS/VABDU, using the code ported from AArch64 in D91937. Differential Revision: https://reviews.llvm.org/D91938	2021-06-26 19:41:32 +01:00
Amara Emerson	f9b3840c3d	[ARM] Fix crash in chained BFI combine due to incorrectly RAUW'ing a node. For a bfi chain like: a = bfi input, x, y b = bfi a, x', y' The previous code was RAUW'ing a with x, mutating the second 'b' bfi, and when SelectionDAG's CSE code ended up deleting it unexpectedly, bad things happened. There's no need to RAUW in this case because we can just return our newly created replacement BFI node. It also looked incorrect because it didn't account for other users of the 'a' bfi. Since it seems that chains of more than 2 BFI nodes are hard/impossible to produce without this combine kicking in at some point, I've removed that functionality since it had no test coverage. rdar://79095399 Differential Revision: https://reviews.llvm.org/D104868	2021-06-24 23:35:47 -07:00
Martin Storsjö	42f74e8249	[llvm] Rename StringRef _lower() method calls to _insensitive() This is a mechanical change. This actually also renames the similarly named methods in the SmallString class, however these methods don't seem to be used outside of the llvm subproject, so this doesn't break building of the rest of the monorepo.	2021-06-25 00:22:01 +03:00
David Green	1113e06821	[ARM] Extend narrow values to allow using truncating scatters As a minor adjustment to the existing lowering of offset scatters, this extends any smaller-than-legal vectors into full vectors using a zext, so that the truncating scatters can be used. Due to the way MVE legalizes the vectors this should be cheap in most situations, and will prevent the vector from being scalarized. Differential Revision: https://reviews.llvm.org/D103704	2021-06-24 13:09:11 +01:00
David Green	8cfc080132	[ARM] Limit v6m unrolling with multiple live outs v6m cores only have a limited number of registers available. Unrolling can mean we spend more on stack spills and reloads than we save from the unrolling. This patch adds an extra heuristic to put a limit on the unroll count for loops with multiple live out values, as measured from the LCSSA phi nodes. Differential Revision: https://reviews.llvm.org/D104659	2021-06-23 16:36:37 +01:00
Nikita Popov	8c01deb8e6	[ARMParallelDSP] Remove unnecessary wrapper function (NFC) AreSequentialAccesses() forwards directly to isConsecutiveAccess() and has an unnecessary template parameter to boot.	2021-06-23 15:27:54 +02:00
David Green	015c27caa2	[ARM] Change some Gather/Scatter interface types to Instructions. NFC These returned Values are cast to an Instruction already, this just cleans up the interface a little to match the expected types.	2021-06-22 19:11:39 +01:00
Eli Friedman	74909e4b6e	Rename MachineMemOperand::getOrdering -> getSuccessOrdering. Since this method can apply to cmpxchg operations, make sure it's clear what value we're actually retrieving. This will help ensure we don't accidentally ignore the failure ordering of cmpxchg in the future. We could potentially introduce a getOrdering() method on AtomicSDNode that asserts the operation isn't cmpxchg, but not sure that's worthwhile. Differential Revision: https://reviews.llvm.org/D103338	2021-06-21 16:49:27 -07:00
Eli Friedman	bf0d0671a1	[ARM] Make sure we don't transform unaligned store to stm on Thumb1. This isn't likely to come up in practice; the combination of compiler flags required to hit this issue should be rare. Found by inspection.	2021-06-21 14:32:42 -07:00
Sam Tebbs	bbe16b7af2	[ARM] Transform a fixed-point to floating-point conversion into a VCVT_fix Conversion from a fixed-point number to a floating-point number is done by multiplying the fixed-point number by 2^(-n) where n is the number of fractional bits. Currently this is lowered to a vcvt (integer to floating-point) then a vmul, but it can instead be lowered directly to a vcvt (fixed-point to floating-point). This patch enables such transformations as long as the multiplication factor is a power of 2. Differential Revision: https://reviews.llvm.org/D103903	2021-06-21 14:14:09 +01:00
Fangrui Song	521d373274	Fix -Wunused-variable and -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC	2021-06-20 11:09:07 -07:00
Tomas Matheson	18dbe68978	[ARM][NFC] Tidy up subtarget frame pointer routines getFramePointerReg only depends on information in ARMSubtarget, so move it in there so it can be accessed from more places. Make use of ARMSubtarget::getFramePointerReg to remove duplicated code. The main use of useR7AsFramePointer is getFramePointerReg, so inline it. Differential Revision: https://reviews.llvm.org/D104476	2021-06-19 17:00:45 +01:00
Igor Kudrin	85ec210751	[objdump][ARM] Fix evaluating the target address of a Thumb BLX(i) The instruction can be 16-bit aligned while targeting 32-bit aligned code. To calculate the target address correctly, the address of the instruction has to be adjusted. Differential Revision: https://reviews.llvm.org/D104446	2021-06-18 10:40:55 +07:00
David Spickett	e4ecd83fe9	[llvm][AArch64] Handle arrays of struct properly (from IR) This only applies to FastIsel. GlobalIsel seems to sidestep the issue. This fixes https://bugs.llvm.org/show_bug.cgi?id=46996 One of the things we do in llvm is decide if a type needs consecutive registers. Previously, we just checked if it was an array or not. (plus an SVE specific check that is not changing here) This causes some confusion when you have arbitrary IR like: ``` %T1 = type { double, i1 }; define [ 1 x %T1 ] @foo() { entry: ret [ 1 x %T1 ] zeroinitializer } ``` We see it is an array so we call CC_AArch64_Custom_Block which bails out when it sees the i1, a type we don't want to put into a block. This leaves the location of the double in some kind of intermediate state and leads to odd codegen. Which then crashes the backend because it doesn't know how to implement what it's been asked for. You get this: ``` renamable $d0 = FMOVD0 $w0 = COPY killed renamable $d0 ``` Rather than this: ``` $d0 = FMOVD0 $w0 = COPY $wzr ``` The backend knows how to copy 64 bit to 64 bit registers, but not 64 to 32. It can certainly be taught how but the real issue seems to be us even trying to assign a register block in the first place. This change makes the logic of AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters a bit more in depth. If we find an array, also check that all the nested aggregates in that array have a single member type. Then CC_AArch64_Custom_Block's assumption of a type that looks like [ N x type ] will be valid and we get the expected codegen. New tests have been added to exercise these situations. Note that some of the output is not ABI compliant. The aim of this change is to simply handle these situations and not to make our processing of arbitrary IR ABI compliant. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D104123	2021-06-16 13:56:01 +00:00
David Green	0a714eaa51	[ARM] Correct type of setcc results for FP vectors Under MVE v4f32 and v8f16 vectors should be using v4i1/v8i1 predicates for the setcc result type, as they have predicated registers for those types. Setting this correctly prevents some inefficient optimizations from happening.	2021-06-16 11:11:03 +01:00
David Green	93aa445e16	Revert "[ARM] Extend narrow values to allow using truncating scatters" This commit adds nodes that might not always be used, which the expensive checks builder does not like. Reverting for now to think up a better way of handling it.	2021-06-15 18:19:25 +01:00
David Green	b9bd2936f9	[ARM] Extend narrow values to allow using truncating scatters As a minor adjustment to the existing lowering of offset scatters, this extends any smaller-than-legal vectors into full vectors using a zext, so that the truncating scatters can be used. Due to the way MVE legalizes the vectors this should be cheap in most situations, and will prevent the vector from being scalarized. Differential Revision: https://reviews.llvm.org/D103704	2021-06-15 17:45:14 +01:00
David Green	680d3f8f17	[ARM] Use rq gather/scatters for smaller v4 vectors A pointer will always fit into an i32, so a rq offset gather/scatter can be used with v4i8 and v4i16 gathers, using a base of 0 and the Ptr as the offsets. The rq gather can then correctly extend the type, allowing us to use the gathers without falling back to scalarizing. This patch rejigs tryCreateMaskedGatherOffset in the MVEGatherScatterLowering pass to decompose the Ptr into Base:0 + Offset:Ptr (with a scale of 1), if the Ptr could not be decomposed from a GEP. v4i32 gathers will already use qi gathers, this extends that to v4i8 and v4i16 gathers using the extending rq variants. Differential Revision: https://reviews.llvm.org/D103674	2021-06-15 17:06:15 +01:00
David Green	09924cbab7	[ARM] Rejig some of the MVE gather/scatter lowering pass. NFC This adjusts some of how the gather/scatter lowering pass passes around data and where certain gathers/scatters are created from. It should not affect code generation on its own, but allows other patches to more clearly reason about the code. A number of extra test cases were also added for smaller gathers/scatters that can be extended, and some of the test comments were updated.	2021-06-15 15:38:39 +01:00
David Green	bee2f618d5	[ARM] Introduce t2WhileLoopStartTP This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in D90591. It keeps a reference to both the tripcount register and the element count register, so that the ARMLowOverheadLoops pass in the backend can pick the correct one without having to search for it from the operand of a VCTP. Differential Revision: https://reviews.llvm.org/D103236	2021-06-13 13:55:34 +01:00
Kristina Bessonova	f6b9836b09	[ARM][NEON] Combine base address updates for vld1Ndup intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D103836	2021-06-13 11:18:32 +02:00
Koutheir Attouchi	789708617d	Do not generate calls to the 128-bit function __multi3() on 32-bit ARM Re-applying this patch after bots failures. Should be fine now. The function __multi3() is undefined on 32-bit ARM, so a call to it should never be emitted. Instead, plain instructions need to be generated to perform 128-bit multiplications. Differential Revision: https://reviews.llvm.org/D103906	2021-06-11 11:45:21 +01:00
David Green	5d5b686f6b	[ARM] Fix Changed status in MVEGatherScatterLoweringPass. Now that we are calling SimplifyInstructionsInBlock, make sure we update Changed when it reports alterations.	2021-06-10 21:53:04 +01:00
David Green	e0c605f638	[ARM] Ensure instructions are simplified prior to GatherScatter lowering. Surprisingly, not all instructions are always simplified after unrolling and before MVE gather/scatter lowering. Notably dead gather operations can be left around which cause the gather/scatter lowering pass to crash if there are multiple gathers, some of which are dead. This patch ensures they are simplified before we modify anything, which can change some of the existing tests, including making them no longer test what they originally tested. This uses a combination of disabling the gather/scatter lowering pass and adjusting the test to keep them as before. Differential Revision: https://reviews.llvm.org/D103150	2021-06-10 20:18:12 +01:00
David Green	9872551ca0	[ARM] Skip debug during vpt block creation Debug info is currently preventing VPT block creation, leading to different codegen. This patch attempts to skip any debug instructions during vpt block creation, making sure they do not interfere. Differential Revision: https://reviews.llvm.org/D103610	2021-06-10 14:49:04 +01:00
Simon Pilgrim	4eb47e3cd4	[TargetLowering] getABIAlignmentForCallingConv - pass DataLayout by const reference. NFCI. Avoid unnecessary copies and match every other method in TargetLowering that takes DataLayout as an argument.	2021-06-10 10:55:24 +01:00
Nico Weber	68a1d9a1f5	Revert "Do not generate calls to the 128-bit function __multi3() on 32-bit ARM" This reverts commit `64e9aa3302`. Breaks check-llvm everywhere, see https://reviews.llvm.org/D103906	2021-06-09 13:21:05 -04:00
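
The two VCVT_fix commits above (bbe16b7af2 and 24d76419d6) describe source-level patterns: a fixed-point value becomes a float by multiplying by 2^(-n), and a float becomes fixed-point by multiplying by 2^n before the integer conversion, where n is the number of fractional bits. A minimal C++ sketch of what such source code might look like follows. The function names are hypothetical and are not LLVM APIs, and the sketch makes no claim about the exact instructions or rounding the backend produces.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper names for illustration only; not part of LLVM.

// Fixed-point -> floating-point: convert the integer, then scale by
// 2^(-frac_bits). This is the int-to-float + multiply pattern that
// bbe16b7af2 describes as a candidate for a single fixed-point VCVT.
inline float fixedToFloat(int32_t value, unsigned frac_bits) {
  assert(frac_bits < 32 && "illustrative limit for a 32-bit value");
  return static_cast<float>(value) *
         std::ldexp(1.0f, -static_cast<int>(frac_bits));
}

// Floating-point -> fixed-point: scale by 2^frac_bits, then truncate
// toward zero. This is the converse pattern described in 24d76419d6.
// The caller must keep the scaled value within int32_t range; an
// out-of-range float-to-int conversion is undefined behaviour in C++.
inline int32_t floatToFixed(float value, unsigned frac_bits) {
  assert(frac_bits < 32 && "illustrative limit for a 32-bit value");
  return static_cast<int32_t>(
      value * std::ldexp(1.0f, static_cast<int>(frac_bits)));
}
```

With 8 fractional bits, the integer 384 represents 1.5, so `fixedToFloat(384, 8)` yields `1.5f` and `floatToFixed(1.5f, 8)` yields `384`. Both commit messages tie the transformation to the multiplication factor being a power of 2, which is why the sketch builds the scale with `std::ldexp`.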

11482 Commits