Commit Graph

150 Commits

Author SHA1 Message Date
Simon Pilgrim fdec50182d [CostModel] Replace getUserCost with getInstructionCost
* Replace getUserCost with getInstructionCost, covering all cost kinds.
* Remove getInstructionLatency, it's not implemented by any backends, and we should fold the functionality into getUserCost (now getInstructionCost) to make it easier for targets to handle the cost kinds with their existing cost callbacks.

Original Patch by @samparker (Sam Parker)

Differential Revision: https://reviews.llvm.org/D79483
2022-08-18 11:55:23 +01:00
Simon Pilgrim 4178e33470 [CostModel] Update RUN -passes=* to double quotes to appease update scripts on windows
DOS really doesn't like `` quotes to be used in command lines

Some prep work as I'm intending to resurrect D79483 soon
2022-08-10 17:54:06 +01:00
David Green 0a11ad2aa8 [ARM] Expand MVE i1 fptoint and inttofp if mve.fp is not present.
If MVE.fp is not present then we cannot select the vector i1 fp
operations to VCMP instructions, so need to expand.
2022-07-11 13:03:30 +01:00
David Green 438ffdb821 [ARM] Switch the costs of mve1beat and mve4beat
These three subtarget features are meant to control where MVE
instructions take 1 vs 2 vs 4 architectural beats. The mve1beat feature
is described as "Model MVE instructions as a 1 beat per tick
architecture", meaning MVE instruction will execute over 4 cycles.
mve4beat is the opposite where the entire 4 beats of the MVE instruction
execute in a single cycle. The costs for the two were backwards though,
not matching the cycle counts like they should. This patch switches the
costs on the two to bring them in-line with expectations.

Differential Revision: https://reviews.llvm.org/D129141
2022-07-07 16:10:00 +01:00
David Green 53be6ab25c [ARM] Fix MVE getShuffleCost legalized type check
The MVE shuffle costing for VREV instructions was making incorrect
assumptions as to legalized vector types remaining as vectors. Add a
quick check to ensure they are indeed vectors before attempting to get
the number of elements.
2022-06-07 14:36:04 +01:00
David Green b4dd9fc370 [ARM] Cost modelling for MVE vector fptoi_sat
Building on top of D125665, this adds MVE costs for fptosi.sat and
fptoui.sat, providing MVE is available and the types are legal.

Differential Revision: https://reviews.llvm.org/D125666
2022-05-20 11:00:34 +01:00
David Green 80aab0312a [ARM] Cost modelling for scalar fptoi_sat
Similar to D124357, this adds some cost modelling for fptoi_sat for Arm
targets. Where VFP2 is available (and FP64/FP16 for the relevant types),
the operations are legal as the Arm instructions naturally saturate.
Otherwise they will need an extra smin/smax clamp, similar to AArch64.

Differential Revision: https://reviews.llvm.org/D125665
2022-05-19 19:53:21 +01:00
David Green 4a8c13a6f4 [CostModel] Add basic fptoi_sat costs
This adds some basic fptosi_sat and fptoui_sat target independent cost
modelling. The fptosi_sat is modelled as a fmin/fmax to saturate the
value, followed by a fp convert. The signed values then have an
additional fcmp+select for handling Nan correctly.

The AArch64/Arm costs may be more incorrect, as the instruction exist
natively. This can be fixed with target specific cost updates.

Differential Revision: https://reviews.llvm.org/D124269
2022-04-27 09:30:00 +01:00
David Green 1159984802 [CostModel] Add fptoi_sat costmodel tests. NFC 2022-04-25 18:44:35 +01:00
David Sherwood e7b89c2fc3 Add BasicTTIImpl cost model for llvm.get.active.lane.mask intrinsic
The vectoriser sometimes generates predicated vector loops using
the llvm.get.active.lane.mask intrinsic so it's important that we
are able to calculate a valid cost for the call instruction. When
SVE is enabled we are able to use a single whilelo instruction
for some vector types - in such cases I've marked the cost as 1.
For all other cases I've set the cost according to how the intrinsic
will be expanded.

Tests added here:

  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Analysis/CostModel/ARM/active_lane_mask.ll
  Analysis/CostModel/RISCV/active_lane_mask.ll

Differential Revision: https://reviews.llvm.org/D121109
2022-03-14 09:35:05 +00:00
Arthur Eubanks 15ba588d6d [test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>' 2022-02-09 15:42:16 -08:00
Andrew Litteken 4ff4e7ea30 [CostModel] Use cost of target trunc type when only it is the only use of a non-register sized load
The code size cost model for most targets uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so.

This did not show any changes in CTMark code size benchmarks.

Reviewers: paquette, samparker, dmgreen

Differential Revision: https://reviews.llvm.org/D109388
2022-01-12 18:03:50 -06:00
David Green 255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
Zarko Todorovski 7f7dac7126 [NFC][llvm] Inclusive language: reword uses of sanity test and check
Part of continuing work to use more inclusive language. Reworded uses
of sanity check and sanity test in llvm/test/
2021-11-25 07:21:42 -05:00
David Green 309f1e4ac8 [ARM] Add datalayout to costmodel tests. NFC
This adds a sensible datalayout to the ARM cost model tests, to prevent
the costs reported being incorrect for the size of pointers.
2021-11-16 09:49:42 +00:00
Simon Pilgrim 7bd097fd1e [CostModel][TTI] Fix ops used for generic smulo/umulo cost expansion
Fix copy+pasta that was checking for smul_fix instead of smul_with_overflow to detected signed values.

The LShr is performed on the extended type as we use it to truncate+extract the upper/hi bits of the extended multiply.

More closely matches the default expansion from TargetLowering::expandMULO
2021-10-06 19:11:32 +01:00
Craig Topper 765348298c [CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering
The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted.

I believe the cost model was also incorrect for the old expansion.
The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result
to calculate theirs signs. Then 2 icmps to compare the signs. Followed
by an And. The previous cost model was using 3 icmps and 2 selects.
Digging back through git blame, those 2 selects in the cost model used to
be 2 icmps, but were changed in https://reviews.llvm.org/D90681

Differential Revision: https://reviews.llvm.org/D110739
2021-09-30 09:41:14 -07:00
Simon Pilgrim 7397dcb403 [TTI] Add basic SK_InsertSubvector shuffle mask recognition
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188

This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.

The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.

The widening case will be addressed in a follow up patch that treats the cost as 0.

Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.

Differential Revision: https://reviews.llvm.org/D107228
2021-08-02 11:23:44 +01:00
Sander de Smalen 97215fe3f4 [CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand.
The Legalizer expands the operations of urem/srem into a div+mul+sub or divrem
when those are legal/custom. This patch changes the cost-model to reflect that
cost.

Since there is no 'divrem' Instruction in LLVM IR, the cost of divrem
is assumed to be the same as div+mul+sub since the three operations will
need to be executed at runtime regardless.

Patch co-authored by David Sherwood (@david-arm)

Reviewed By: RKSimon, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D103799
2021-07-07 14:40:28 +01:00
Sander de Smalen 4ca860742d [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
David Green a2e0312cda [ARM] Tone down the MVE scalarization overhead
The scalarization overhead was set deliberately high for MVE, whilst the
codegen was new. It helps protect us against the negative ramifications
of mixing scalar and vector instructions. This decreases that,
especially for floating point where the cost of extracting/inserting
lane elements can be low. For integer the cost is still fairly high due
to the cross-register-bank copy, but is no longer n^2 in the length of
the vector.

In general, this will decrease the cost of scalarizing floats and long
integer vectors. i64 increase in cost, having a high cost before and
after this patch. For floats this allows up to start doing things like
vectorizing fdiv instructions, even if they are scalarized.

Differential Revision: https://reviews.llvm.org/D98245
2021-03-19 18:30:11 +00:00
Alexey Bataev 14ae0cf0f5 [Cost]Canonicalize the cost for logical or/and reductions.
The generic cost of logical or/and reductions should be cost of bitcast
<ReduxWidth x i1> to iReduxWidth + cmp eq|ne iReduxWidth.

Differential Revision: https://reviews.llvm.org/D97961
2021-03-19 11:01:58 -07:00
David Green 35e0567d58 [ARM] Add VREV MVE shuffle costs
This uses the shuffle mask cost from D98206 to give a better cost of MVE
VREV instructions. This helps especially in VectorCombine where the cost
of shuffles is used to reorder bitcasts, which this helps keep the phase
ordering test for fp16 reductions producing optimal code. The isVREVMask
has been moved to a header file to allow it to be used across target
transform and isel lowering.

Differential Revision: https://reviews.llvm.org/D98210
2021-03-17 21:21:43 +00:00
Alexey Bataev 60470ac7ff [Cost]Add tests for boolean and/or reductions, NFC.
Tests with the default costs for boolean and/or reductions.

Differential Revision: https://reviews.llvm.org/D97793
2021-03-03 12:34:30 -08:00
Juneyoung Lee c89d9d8a48 [TTI] Consider select form of and/or i1 as having arithmetic cost
This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b`
as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`.

Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked.
These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D97360
2021-03-02 02:18:19 +09:00
David Green 33ba220611 [ARM] Ensure types provided to getIntrinsicCost are valid
It appears that pointer types were causing issues for the min/max cost
code in getIntrinsicInstrCost. This makes sure that when matching
icmp/select to a min/max, we only do that for normal int or float types.
2021-02-18 14:00:23 +00:00
David Green 1a6744e3dc [ARM] Add larger than legal ICmp costs
A v8i32 compare will produce a v8i1 predicate, but during codegen the
v8i32 will be split into two v4i32, potentially requiring two v4i1
predicates to be merged into a single v8i1. Because this merging of two
v4i1's into a v8i1 is very expensive, we need to make the cost of the
compare equally high.

This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost.
Because we don't know whether the user of the predicate can be split,
and the cost model is mostly pre-instruction, we may be pessimistic but
that should only be for larger and legal types. This also adds min/max
detection to the costmodel where it can be detected, to keep those in
line with the cost of simple min/max instructions. Otherwise for the
most part, costs that were already expensive have become more expensive.

Differential Revision: https://reviews.llvm.org/D96692
2021-02-18 11:42:17 +00:00
David Green 1fbb3287fc [ARM] MVE ICmp costing tests. NFC 2021-02-18 10:50:34 +00:00
David Green 6d835c5fcd [ARM] Add MVE abs costs
Similar to min/max, this increases the accuracy of abs intrinsics costs
under MVE.
2021-02-17 14:21:09 +00:00
David Green 415deff10b [ARM] MVE abs intrinsic costs. NFC 2021-02-17 13:54:17 +00:00
David Green 0a98efb049 [ARM] Add some basic Min/Max costs
This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and
MAXNUM representing fmin and fmax. It tightens up the costs, not using a
ICmp+Select cost.

Differential Revision: https://reviews.llvm.org/D96603
2021-02-15 15:06:19 +00:00
David Green 6abe362ed7 [ARM] Fix duplicate fdiv tests, changing them to frem. NFC 2021-02-13 15:16:11 +00:00
David Green 7c2e061188 [ARM] Extra vector shuffle tests of various kinds. NFC 2021-02-13 15:03:10 +00:00
David Green b7c3de8d5a [ARM] MVE min/max cost tests. NFC 2021-02-13 11:12:12 +00:00
Sander de Smalen 63d787e5d4 [CostModel] An extending load to illegal type is not free.
COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when
<4 x i64> is an illegal result type that requires splitting
of the operation.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D96250
2021-02-12 07:59:21 +00:00
David Green b1ef919aad [ARM] Add CostKind to getMVEVectorCostFactor.
This adds the CostKind to getMVEVectorCostFactor, so that it can
automatically account for CodeSize costs, where it returns a cost of 1
not the MVEFactor used for Throughput/Latency. This helps simplify the
caller code and allows us to get the codesize cost more correct in more
cases.
2021-02-11 15:33:59 +00:00
Sander de Smalen b9417c3616 [CostModel] Handle CTLZ and CCTZ in getTypeBasedIntrinsicInstrCost
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D95355
2021-01-26 14:37:51 +00:00
David Green 6a563eef13 [ARM] Expand vXi1 VSELECT's
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.

Differential Revision: https://reviews.llvm.org/D94946
2021-01-19 17:56:50 +00:00
David Green f373b30923 [ARM] Add MVE add.sat costs
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. With smaller than legal types
that are promoted we generate shr(qadd(shl, shl)), so the cost is 4
appropriately.

Differential Revision: https://reviews.llvm.org/D94958
2021-01-19 15:38:46 +00:00
David Green 54e38440e7 [ARM] Expand add.sat/sub.sat cost checks. NFC 2021-01-19 15:06:06 +00:00
David Green dcefcd51e0 [ARM] Update trunc costs
We did not have specific costs for larger than legal truncates that were
not otherwise cheap (where they were next to stores, for example). As
MVE does not have a dedicated instruction for them (and we do not use
loads/stores yet), they should be expensive as they get expanded to a
series of lane moves.

Differential Revision: https://reviews.llvm.org/D94260
2021-01-11 08:59:28 +00:00
David Green 0c8b748f32 [ARM] Additional trunc cost tests. NFC 2021-01-11 08:35:16 +00:00
David Green a4823377fd [ARM] Add basic masked load/store costs
This adds some basic MVE masked load/store costs, notably changing the
cost of legal loads/stores to the MVECostFactor and the cost of
scalarized instructions to 8*NumElts.

Differential Revision: https://reviews.llvm.org/D86538
2020-12-12 15:26:32 +00:00
Sanjay Patel 2717252c92 [CostModel] add basic handling for FP maximum/minimum intrinsics
This might be a regression for some ARM targets, but that should
be changed in the target-specific overrides.

There is apparently still no default lowering for these nodes,
so I am assuming these intrinsics are not in common use.
X86, PowerPC, and RISC-V for example, just crash given the most
basic IR.
2020-11-22 13:43:53 -05:00
Sanjay Patel 3a18f26723 [CostModel] add tests for FP maximum; NFC
These min/max intrinsics are not handled in the basic
implementation and probably not handled in target-specific
overrides either.
2020-11-22 13:33:42 -05:00
Sanjay Patel e32bd35120 [CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI implementation
This is re-applying a combination of f7eac51b9b and 8ec7ea3ddc as one patch
to avoid regressions now that we have better testing in place.

Those were reverted with 32dd5870ee because of crashing in experimental intrinsics.
That bug should be fixed with 7ae346434.

Paraphrased original commit messages:

This is the last step in removing cost-kind as a consideration in the
basic class model for intrinsics.
See D89461 for the start of that.
Subsequent commits dealt with each of the special-case intrinsics that
had customization here in the basic class. This should remove a barrier
to retrying D87188 (canonicalization to the abs intrinsic).

The ARM and x86 cost diffs seen here may be wrong because the
target-specific overrides have their own bugs, but we hope this is
less wrong - if something has a significant throughput cost, then it
should have a significant size / blended cost too by default.

The only behavioral diff in current regression tests is shown in the
x86 scatter-gather test (which is misplaced or broken because it runs
the entire -O3 pipeline) - we unrolled less, and we assume that is
a improvement.

Exception: in general, we want the *size* cost for a scalar call to be
cheap even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.

Differential Revision: https://reviews.llvm.org/D90554
2020-11-20 11:21:10 -05:00
Sanjay Patel 7ae346434a [CostModel] avoid crashing while finding scalarization overhead
The constrained intrinsics have metadata arguments, so the
tests here were crashing as noted in D90554 (and that was
reverted even though this bug exists independently of that
change).
2020-11-20 10:18:29 -05:00
Sanjay Patel 1285781fc5 [CostModel] add tests for math library calls; NFC
This is a partial un-revert of 32dd5870ee (originally df09f82599 ).

I'm adding back the baseline tests first, so we don't have
to back-track as much in case there are still problems.
2020-11-20 08:24:49 -05:00
Eric Christopher 32dd5870ee Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation"
as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up.

This reverts commit f7eac51b9b.

Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert.

This reverts commit 8ec7ea3ddc.

Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert.

This reverts commit df09f82599.

Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert.

This reverts commit 618d555e8d.
2020-11-19 22:10:23 -08:00
Sanjay Patel 8ec7ea3ddc [CostModel] make default size cost for libcalls small (again)
This was changed recently with D90554 / f7eac51b9b
...because we had a regression testing blindspot for intrinsics
that are expected to be lowered to libcalls.

In general, we want the *size* cost for a scalar call to be cheap
even if the other costs are expensive - we expect it to just be
a branch with some optional stack manipulation.

It is likely that we will want to carve out some
exceptions/overrides to this rule as follow-up patches for
calls that have some general and/or target-specific difference
to the expected lowering.

This was noticed as a regression in unrolling, so we have a test
for that now along with a couple of direct cost model tests.

If the assumed scalarization costs for the oversized vector
calls are not realistic, that would be another follow-up
refinement of the cost models.
2020-11-14 08:15:35 -05:00