Commit Graph

309 Commits

Author SHA1 Message Date
David Green de6dfbbb30 [ARM] Fix for MVE i128 vector icmp costs.
We were hitting an assert as the legalied type needn't be a vector.

Fixes #58364
2022-10-14 18:49:25 +01:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Kazu Hirata 2833760c57 [Target] Qualify auto in range-based for loops (NFC) 2022-08-28 17:35:09 -07:00
Philip Reames df20ff9ae2 [TTI] Kill last couple uses of OperandValueKind in targets [nfc]
Use the accessor methods on the containing class instead so that we can change the representation.
2022-08-23 08:54:41 -07:00
Philip Reames c9608d57b8 [TTI] Plumb through OperandValueInfo in getMemoryOpCost [NFC]
This has the effect of exposing the power-of-two property for use in memory op costing, but no target actually uses it yet.  The main point of this change is simple consistency with the recently changes getArithmeticInstrCost, and to remove the last (interface) use of OperandValueKind.
2022-08-23 07:55:42 -07:00
Philip Reames 104fa367ee [TTI] Use OperandValueInfo in getArithmeticInstrCost implementation [NFC]
This change completes the process of replacing OperandValueKind and OperandValueProperties which were previously passed independently in this API with a single container class which contains both.

This is the change which motivated the whole sequence which preceeded it.  In an original spike version of this change, I'd noticed a nasty bug: I'd changed the signature without changing names, and as result, we silently passed additional information through a callsite which previously dropped the power-of-two fact.  This might be harmless in most cases, but at least a couple clearly dependend for correctness on not passing that property through.

I did my best to split off prior changes which reduced the scope of this one, and which made it possible to use compiler assistance.  For instance, every parameter which changes type in this change also changes name.  This was intentional to make sure that every call site possible effected must show up in the diff.  This let me audit each one closely.
2022-08-22 15:16:39 -07:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Daniil Fukalov 7ed3d81333 [NFCI] Move cost estimation from TargetLowering to TargetTransformInfo.
TragetLowering had two last InstructionCost related `getTypeLegalizationCost()`
and `getScalingFactorCost()` members, but all other costs are processed in TTI.

E.g. it is not comfortable to use other TTI members in these two functions
overrided in a target.

Minor refactoring: `getTypeLegalizationCost()` now doesn't need DataLayout
parameter - it was always passed from TTI.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D117723
2022-08-18 00:38:55 +03:00
David Sherwood 4ef9cb6c17 [AArch64][LoopVectorize] Disable tail-folding for SVE when loop has interleaved accesses
If we have interleave groups in the loop we want to vectorise then
we should fall back on normal vectorisation with a scalar epilogue. In
such cases when tail-folding is enabled we'll almost certainly go on to
create vplans with very high costs for all vector VFs and fall back on
VF=1 anyway. This is likely to be worse than if we'd just used an
unpredicated vector loop in the first place.

Once the vectoriser has proper support for analysing all the costs
for each combination of VF and vectorisation style, then we should
be able to remove this.

Added an extra test here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D128342
2022-08-02 09:52:33 +01:00
jacquesguan e38af7ba95 [LV] Refactor getExtendedAddReductionCost to support other extended reduction more than Add.
Now the API getExtendedAddReductionCost is used to determine the cost of extended Add reduction with optional Mul. For Arm, it could cover the cases. But for other target, for example: RISCV, they support other kinds of extended recution, such as FAdd.

This patch does the following changes:
1, Split getExtendedAddReductionCost into 2 new API: getExtendedReductionCost which handles the extended reduction with addtional input of Opcode; getMulAccReductionCost which handle the MLA cases the getExtendedAddReductionCost.
2, Refactor getReductionPatternCost, add some contraint condition to make sure the getMulAccReductionCost should only handle the reuction of Add + Mul.

Differential Revision: https://reviews.llvm.org/D130868
2022-08-02 16:02:38 +08:00
David Sherwood f15b6b2907 [AArch64] Add target hook for preferPredicateOverEpilogue
This patch adds the AArch64 hook for preferPredicateOverEpilogue,
which currently returns true if SVE is enabled and one of the
following conditions (non-exhaustive) is met:

1. The "sve-tail-folding" option is set to "all", or
2. The "sve-tail-folding" option is set to "all+noreductions"
and the loop does not contain reductions,
3. The "sve-tail-folding" option is set to "all+norecurrences"
and the loop has no first-order recurrences.

Currently the default option is "disabled", but this will be
changed in a later patch.

I've added new tests to show the options behave as expected here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding-option.ll

Differential Revision: https://reviews.llvm.org/D129560
2022-07-21 17:20:06 +01:00
David Sherwood 03fee6712a [LoopVectorize] Add option to use active lane mask for loop control flow
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.

This patch adds support for using the active lane mask for control
flow by:

1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extract the first bit of this mask and use this bit for the
conditional branch.

I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.

Differential Revision: https://reviews.llvm.org/D125301
2022-07-11 13:46:55 +01:00
David Green 53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
David Green b4dd9fc370 [ARM] Cost modelling for MVE vector fptoi_sat
Building on top of D125665, this adds MVE costs for fptosi.sat and
fptoui.sat, providing MVE is available and the types are legal.

Differential Revision: https://reviews.llvm.org/D125666
2022-05-20 11:00:34 +01:00
David Green 80aab0312a [ARM] Cost modelling for scalar fptoi_sat
Similar to D124357, this adds some cost modelling for fptoi_sat for Arm
targets. Where VFP2 is available (and FP64/FP16 for the relevant types),
the operations are legal as the Arm instructions naturally saturate.
Otherwise they will need an extra smin/smax clamp, similar to AArch64.

Differential Revision: https://reviews.llvm.org/D125665
2022-05-19 19:53:21 +01:00
Vasileios Porpodas fa8a9fea47 Recommit "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 6a9bbd9f20.

Code review: https://reviews.llvm.org/D124202
2022-04-26 14:02:40 -07:00
Vasileios Porpodas 6a9bbd9f20 Revert "[SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`"
This reverts commit 55ce296d6f.
2022-04-26 11:25:26 -07:00
Vasileios Porpodas 55ce296d6f [SLP][TTI] Refactoring of `getShuffleCost` `Args` to work like `getArithmeticInstrCost`
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.

Differential Revision: https://reviews.llvm.org/D124202
2022-04-26 11:11:29 -07:00
Craig Topper 65942554e2 [ARM] Add missing return to ARMTTIImpl::isLoweredToCall.
I assume we meant to return the result of the call to
BaseT::isLoweredToCall(F).

This might not be a functional change in practice because it would
still hit the default case in the switch and call
BaseT::isLoweredToCall(F) at the end.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D123333
2022-04-07 12:52:54 -07:00
Vasileios Porpodas 39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b77.
2022-03-23 18:32:17 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Nikita Popov 93122b2567 [ARM] Don't look through pointer types in canTailPredicateLoop()
Inspecting the pointer element type here is incompatible with
opaque pointers, and doesn't seem necessary to me. I think the
intention might have been to check the type of load/store pointer
arguments, but I believe those should get checked through their
return type or value operand anyway. I don't get any test failures
if I simply drop this.

Differential Revision: https://reviews.llvm.org/D118353
2022-01-28 09:34:13 +01:00
Rosie Sumpter 552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
David Green 646c872f9d [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Recommitted from 0e98659ea1 with a fix for APInt comparison.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-02 07:56:27 +00:00
David Green 13e66c070b Revert "[ARM] Teach getIntImmCostInst about the cost of saturating fp converts"
This reverts commit 6d41de380f as the
windows bots are not happy, in a way I do not understand. Revert whilst
we figure out what is wrong.
2021-12-01 15:25:19 +00:00
David Green 6d41de380f [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.

Differential Revision: https://reviews.llvm.org/D114380
2021-12-01 10:25:52 +00:00
Simon Pilgrim 71e39e3f18 [ADT] Add APInt::isNegatedPowerOf2() helper
Inspired by D111968, provide a isNegatedPowerOf2() wrapper instead of obfuscating code with (-Value).isPowerOf2() patterns, which I'm sure are likely avenues for typos.....

Differential Revision: https://reviews.llvm.org/D111998
2021-10-19 14:38:21 +01:00
Jay Foad a9bceb2b05 [APInt] Stop using soft-deprecated constructors and methods in llvm. NFC.
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests which should still test the deprecated methods.

Differential Revision: https://reviews.llvm.org/D110807
2021-10-04 08:57:44 +01:00
David Green 20b1a16a69 [ARM] Mark <= -1 immediate constant as cheap
A <= -1 constant on a compare can be converted to a < 0 operation, which
is usually cheap. If we mark the constant as cheap, preventing hoisting,
we allow that fold to happen even across different blocks.

Differential Revision: https://reviews.llvm.org/D109360
2021-10-03 19:30:08 +01:00
Kazu Hirata c1e32b3fc0 [Target] Migrate from getNumArgOperands to arg_size (NFC)
Note that getNumArgOperands is considered a legacy name.  See
llvm/include/llvm/IR/InstrTypes.h for details.
2021-10-02 12:06:29 -07:00
David Green 5a6dfbb8cd [ARM] Teach DemandedVectorElts about VMOVN lanes
The class of instructions that write to narrow top/bottom lanes only
demand the even or odd elements of the input lanes. Which means that a
pair of VMOVNT; VMOVNB demands no lanes from the original input. This
teaches that to instcombine from the target hooks available through
ARMTTIImpl.

Differential Revision: https://reviews.llvm.org/D109325
2021-09-14 11:05:31 +01:00
Nikita Popov 45c467346a [LAA] Pass access type to getPtrStride()
Pass the access type to getPtrStride(), so it is not determined
from the pointer element type. Many cases still fetch the element
type at a higher level though, so this only partially addresses
the issue.
2021-09-11 19:16:49 +02:00
Florian Hahn 5d1a6d0d1a
[ARM] Remove unnecessary use of replaceSymbolicStrideSCEV (NFC).
When passing an empty strides map, there's nothing to replace for
replaceSymbolicStrideSCEV and it just returns the SCEV for Ptr. There
should be no need to call the function.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D109462
2021-09-10 10:47:26 +02:00
David Green efa340fbd2 [ARM] Workaround tailpredication min/max costmodel
The min/max intrinsics are not yet canonical, but when they are the tail
predications analysis will change from treating them like icmp to
treating them like intrinsics. Unfortunately, they can currently produce
better code by not being tail predicated thanks to the vectorizer picking
higher VF's and the backend folding to better instructions (especially
for saturate patterns). In the long run we will need to improve the
vectorizers cost modelling, recognizing the instruction directly, but in
the meantime this treats min/max as before to prevent performance
regressions.
2021-08-30 19:19:51 +01:00
Roman Lebedev 6f6e9a867f
[BasicTTIImpl][LoopUnroll] getUnrollingPreferences(): emit ORE remark when advising against unrolling due to a call in a loop
I'm not sure this is the best way to approach this,
but the situation is rather not very detectable unless we explicitly call it out when refusing to advise to unroll.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D107271
2021-08-03 00:57:26 +03:00
Simon Pilgrim 0579050116 Fix MSVC signed/unsigned comparison warning. NFCI. 2021-08-02 11:23:43 +01:00
David Green 41cedb1c9a [LV][ARM] Tighten up MLA reduction costing
This makes a couple of changes to the costing of MLA reduction patterns,
to more accurately cost various patterns that can come up from
vectorization.

 - The Arm implementation of getExtendedAddReductionCost is altered to
   only provide costs for legal or smaller types. Larger than legal types
   need to be split, which currently does not work very well, especially
   for predicated reductions where the predicate may be legal but needs to
   be split. Currently we limit it to legal or smaller input types.
 - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext))
   is a pattern that can come up, and can be treated the same as
   reduce(mul(ext, ext)) providing the extension types match.
 - And it has been adjusted to not count the ext in reduce(mul(ext, ext))
   as part of a reduce(mul) pattern.

Together these changes help to more accurately cost the mla reductions
in cases such as where the extend types don't match or the extend
opcodes are different, picking better vector factors that don't result
in expanded reductions.

Differential Revision: https://reviews.llvm.org/D106166
2021-07-28 12:50:58 +01:00
David Sherwood 0aff1798b5 [Analysis] Add simple cost model for strict (in-order) reductions
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
  continually splitting a vector up into halves and adding each
  half together until we get a scalar result. This is the default
  behaviour for integers, whereas for floating point we only do this
  if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
  a strict vector reduction by treating it as a series of scalar
  operations in lane order. This is the case when FP reassociation
  is not permitted. For scalable vectors this is more difficult
  because at compile time we do not know how many lanes there are,
  and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
2021-07-26 10:26:06 +01:00
David Green 38c9a4068d [TTI] Remove IsPairwiseForm from getArithmeticReductionCost
This patch removes the IsPairwiseForm flag from the Reduction Cost TTI
hooks, along with some accompanying code for pattern matching reductions
from trees starting at extract elements. IsPairWise is now assumed to be
false, which was the predominant way that the value was used from both
the Loop and SLP vectorizers. Since the adjustments such as D93860, the
SLP vectorizer has not relied upon this distinction between paiwise and
non-pairwise reductions.

This also removes some code that was detecting reductions trees starting
from extract elements inside the costmodel. This case was
double-counting costs though, adding the individual costs on the
individual instruction _and_ the total cost of the reduction. Removing
it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to
not double count. The cost of reduction intrinsics is still tested
through the various tests in
llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Differential Revision: https://reviews.llvm.org/D105484
2021-07-09 11:51:16 +01:00
David Green a77e2d196c [ARM] Fix arm.mve.pred.v2i range upper limit
The range metadata specifies a half open range, so our top limit was one
off.
2021-07-05 21:06:30 +01:00
David Green cd76f43b49 [ARM] Set the immediate cost of GEP operands to 0
This prevents constant gep operands from being hoisted by the Constant
Hoisting pass, leaving them to CodegenPrepare which can usually do a
better job at splitting large offsets. This can, in general, improve
performance and decrease codesize, especially for v6m where many
constants have a high cost.

Differential Revision: https://reviews.llvm.org/D104877
2021-06-30 19:19:03 +01:00
David Green 8cfc080132 [ARM] Limit v6m unrolling with multiple live outs
v6m cores only have a limited number of registers available. Unrolling
can mean we spend more on stack spills and reloads than we save from the
unrolling. This patch adds an extra heuristic to put a limit on the
unroll count for loops with multiple live out values, as measured from
the LCSSA phi nodes.

Differential Revision: https://reviews.llvm.org/D104659
2021-06-23 16:36:37 +01:00
Daniil Fukalov 0195e594fe [TTI] NFC: Change getIntImmCodeSizeCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102915
2021-06-02 16:04:11 +03:00
Daniil Fukalov 3489c2d7b1 [TTI] NFC: Change getTypeLegalizationCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen, kparzysz

Differential Revision: https://reviews.llvm.org/D101533
2021-04-30 22:51:51 +03:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00