Commit Graph

13 Commits

Author SHA1 Message Date
Simon Pilgrim 4455c5cdea [CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows 2022-03-18 11:44:18 +00:00
Arthur Eubanks 15ba588d6d [test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>' 2022-02-09 15:42:16 -08:00
Simon Pilgrim 872a950033 [CostModel] Treat 'widen subvector' patterns as zero cost
As discussed on D107228, widening a subvector by inserting the whole subvector into the bottom a larger undef vector should always be cheap enough that we can treat it as zero cost.

NOTE: If this proves to cause issues we have the option of introducing a "SK_WidenSubvector" shuffle kind enum that targets could override the zero cost, but that doesn't seem necessary atm.

Differential Revision: https://reviews.llvm.org/D107228
2021-08-02 11:43:10 +01:00
Simon Pilgrim 7397dcb403 [TTI] Add basic SK_InsertSubvector shuffle mask recognition
This patch adds an initial ShuffleVectorInst::isInsertSubvectorMask helper to recognize 2-op shuffles where the lowest elements of one of the sources are being inserted into the "in-place" other operand, this includes "concat_vectors" patterns as can be seen in the Arm shuffle cost changes. This also helped fix a x86 issue with irregular/length-changing SK_InsertSubvector costs - I'm hoping this will help with D107188

This doesn't currently attempt to work with 1-op shuffles that could either be a "widening" shuffle or a self-insertion.

The self-insertion case is tricky, but we currently always match this with the existing SK_PermuteSingleSrc logic.

The widening case will be addressed in a follow up patch that treats the cost as 0.

Masks with a high number of undef elts will still struggle to match optimal subvector widths - its currently bounded by minimum-width possible insertion, whilst some cases would benefit from wider (pow2?) subvectors.

Differential Revision: https://reviews.llvm.org/D107228
2021-08-02 11:23:44 +01:00
Sander de Smalen 4ca860742d [InstructionCost] Don't conflate Invalid costs with Unknown costs.
We previously made a change to getUserCost to return a Invalid cost
when one of the TTI costs returned '-1' (meaning 'unknown' or
'infinitely expensive'). It makes no sense to say that:

  shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3>

has an invalid cost. Perhaps the cost is not known, but the IR is valid
and can be code-generated. Invalid should only be used for IR that
cannot possibly be code-generated and where a cost is nonsensical.

With more passes now asserting that the cost must be valid, it is possible
that those assertions will fail for perfectly valid IR. An incomplete
cost-model probably shouldn't be a reason for the compiler to break.

It's better to consider these costs as 'very expensive' and ignore them
for other reasons. At some point, we should consider replacing -1 with
some other mechanism.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D99502
2021-03-30 09:29:42 +01:00
Simon Pilgrim fe9403df06 [CostModel][X86] Remove unused check-prefixes 2020-11-10 12:48:35 +00:00
Bing1 Yu ec24e50553 [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)
add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87884
2020-09-23 10:29:10 +08:00
Craig Topper ff66919020 [X86][CostModel] Bump the cost of vpermw/vpermt2b/vperm2w
vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop.

This patch bumps the cost to 2 for these operations. Maybe should go to 3 for the vpermt2*, but I've started conservative.

I've also removed a few entries that were now the same as earlier subtargets or that I didn't think we really did. Like I don't think we extend v32i8 to v32i16, shuffle, and then truncate.

Differential Revision: https://reviews.llvm.org/D79148
2020-04-30 11:32:25 -07:00
Simon Pilgrim 12c629ec6c [CostModel][X86] Add shuffle costs for some common sub-128bit vectors
v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.
2020-04-09 19:57:06 +01:00
Simon Pilgrim be84d2b5b7 [CostModel][X86] Add some insert subvector cost tests for vXf32/vXi32/vXi16/vXi8 types 2020-04-04 22:46:57 +01:00
Simon Pilgrim 168a44a70e [CostModel][X86] Improve extract/insert element costs (PR43605)
This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index.

It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true).

Differential Revision: https://reviews.llvm.org/D74976
2020-02-27 15:54:13 +00:00
Simon Pilgrim eaa41e103c [CostModel][X86] Try to check against common prefixes before using target-specific cpu checks
SLM/GLM is still a mess so not all of them have been updated yet.
2020-02-24 11:59:07 +00:00
Simon Pilgrim acd10b9f6c [CostModel][X86] Add some initial extract/insert subvector shuffle cost tests
Just f64/i64 tests initially to demonstrate PR39368

llvm-svn: 344857
2018-10-20 17:38:33 +00:00