Commit Graph

34 Commits

Author SHA1 Message Date
Simon Pilgrim 4455c5cdea [CostModel][X86] Update RUN -passes=* to double quotes to appease update scripts on windows 2022-03-18 11:44:18 +00:00
Arthur Eubanks 15ba588d6d [test] Migrate '-analyze -cost-model' to '-passes=print<cost-model>' 2022-02-09 15:42:16 -08:00
Simon Pilgrim 4c7e9a3852 [CostModel][X86] Adjust sext/zext SSE/AVX legalized costs based on llvm-mca reports.
Update costs based on the worst case costs from the script in D103695.

Move to using legalized types wherever possible, which allows us to prune the cost tables.
2021-07-07 13:58:27 +01:00
Simon Pilgrim a7da0296a6 [CostModel][X86] Adjust sitofp/uitofp SSE/AVX legalized costs based on llvm-mca reports.
Update (mainly) vXi8/vXi16 -> vXf32/vXf64 sitofp/uitofp costs based on the worst case costs from the script in D103695.

Move to using legalized types wherever possible, which allows us to prune the cost tables.
2021-07-07 12:03:45 +01:00
Simon Pilgrim 6f3f9535fc [CostModel][X86] i8/i16 sitofp/uitofp are sext/zext to i32 for sitofp
Provide a generic fallback that extends sub-i32 scalars before using the existing sitofp instructions.

These numbers can be tweaked for specific sse levels, but we should get the default handling in place first.

We get the extension for free for non-vector loads.
2021-07-06 13:58:52 +01:00
Simon Pilgrim 65e4240fa1 [CostModel][X86] Adjust i32/i64 to f32/f64 scalar based on llvm-mca reports (+ Agner).
Older SSE targets have slower gpr->fpu scalar conversions - we also need to account for uitofp i32 > f32/f64 being lowered as sitofp i64 -> f32/f64
2021-07-05 13:26:53 +01:00
Simon Pilgrim 2aecffcd40 [CostModel][X86] Find AVX conversion costs using legalized types if custom types didn't match
Building on rG2a1ef8784ad9a, fallback to attempting to match against legalized types like we do for SSE targets.
2021-07-02 13:49:31 +01:00
Simon Pilgrim cdca1785d3 [CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports.
Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695.

Fixes a few regressions before we start adding AVX costs for legalized types.
2021-07-02 13:09:00 +01:00
Simon Pilgrim 5e5ba14b4d [CostModel][X86] Adjust fp<->int vXi32 SSE legalized costs based on llvm-mca reports.
Building on rG2a1ef8784ad9a, adjust the SSE cost tables to use the legalized types based on the worst case costs from the script in D103695.

To account for different numbers of src/dst legalized type registers we must scale the cost by maximum of the src/dst, not just use src
2021-07-01 15:34:20 +01:00
Simon Pilgrim 47941d601d [CostModel][X86] Adjust fp<->int vXi32 AVX1+ costs based on llvm-mca reports
Based off the worse case numbers generated by D103695, the AVX1/2/512 sitofp/uitofp/fptosi/fptoui costs were higher than necessary (based off instruction counts instead of actual throughput).

The SSE costs still need further fixes, but I hit an issue with the order in which SSE costs are checked - we need to check CUSTOM costs (with non-legal types) first, and then fallback to LEGALIZED types. I'm looking at this now, and this should let us start thinning out a lot of the duplicates in the costs tables.

Then we can finally start work on vXi64 / vXi16 / vXi8 / vXi1 integers, which should let us look at sub-128-bit vectorization (D103925).
2021-06-30 15:23:34 +01:00
Simon Pilgrim 3ae7f7ae0a [CostModel][X86] Tweak fptoui v4f32->v4i32 + v8f32->v8i32 SSE/AVX costs
Adjust for worst case for atom/slm (SSE), btver2/sandybridge (AVX1) and haswell/znver* (AVX2)
2021-05-21 12:09:31 +01:00
Simon Pilgrim 4865ed3020 [CostModel][X86] Match SSE41 legalized conversion costs as well as SSE2 2021-05-21 11:42:22 +01:00
Simon Pilgrim fe9403df06 [CostModel][X86] Remove unused check-prefixes 2020-11-10 12:48:35 +00:00
Craig Topper e39c7ab2b9 [CostModel][X86][ARM] Teach default implementation of getCastInstrCost to not add a split/join cost if source type and the destination type both have a SplitVector action
If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join.

Differential Revision: https://reviews.llvm.org/D79111
2020-05-01 18:55:23 -07:00
Craig Topper 5f2ea70980 [X86] Add cost model tests for conversions between <2 x float> and integers.
For all but 2 x i32 we were starting from 4 x float.
2020-04-26 19:59:01 -07:00
Simon Pilgrim eaa41e103c [CostModel][X86] Try to check against common prefixes before using target-specific cpu checks
SLM/GLM is still a mess so not all of them have been updated yet.
2020-02-24 11:59:07 +00:00
Simon Pilgrim 5d986a68a5 [CostModel][X86] Add missing scalar i64->f32 uitofp costs 2020-01-06 13:17:02 +00:00
Craig Topper 8b5f2ab2a4 Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 368183
2019-08-07 16:24:26 +00:00
Mitch Phillips bd0d97e1c4 Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

llvm-svn: 368107
2019-08-06 23:00:43 +00:00
Craig Topper 3de33245d2 [X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 367901
2019-08-05 18:25:36 +00:00
Simon Pilgrim 53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim 0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Simon Pilgrim cd9ccf8824 [CostModel][X86] Split off BtVer2 cost checks
llvm-svn: 330433
2018-04-20 13:50:33 +00:00
Simon Pilgrim 34b397a318 [CostModel][X86] Add some specific cpu targets to the cost models
We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well.

llvm-svn: 330056
2018-04-13 19:30:15 +00:00
Simon Pilgrim ad768585ff [CostModel][X86] Regenerate int<->fp cost tests with update_analyze_test_checks.py
llvm-svn: 329398
2018-04-06 15:12:36 +00:00
Simon Pilgrim d1aed9a9e6 [CostModel][X86] Updated sitofp/uitofp scalar/vector cost tests
Better coverage of all legal types + special cases.

Removed old fptoui tests which are all handled in fptoui.ll

llvm-svn: 287678
2016-11-22 18:55:49 +00:00
Michael Kuperstein f0c59330e9 [X86] Make some cast costs more precise
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064

llvm-svn: 275106
2016-07-11 21:39:44 +00:00
Michael Kuperstein aa71bdd3af [TTI] The cost model should not assume vector casts get completely scalarized
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast  gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.

Differential Revision: http://reviews.llvm.org/D21251

llvm-svn: 274642
2016-07-06 17:30:56 +00:00
Elena Demikhovsky a1a40cce9f AVX-512: Updated cost of FP/SINT/UINT conversion operations
I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode.
Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets).

Differential Revision: http://reviews.llvm.org/D15074

llvm-svn: 254494
2015-12-02 08:59:47 +00:00
Simon Pilgrim d862e0f33a [X86][SSE][CostModel] Added full set of sitofp/uitofp costings for SSE2/AVX/AVX2/AVX512F.
Merged separate (but equivalent) SSE2/AVX512F tests.

Removed codegen tests since these are already done better in test/CodeGen/X86.

The actual cost values still need to be updated to match recent codegen improvements.

llvm-svn: 240219
2015-06-20 14:58:01 +00:00
Simon Pilgrim de94fa6438 [X86][SSE][CostModel] Fixed uitofp/sitofp cost target tests to specify sse2/avx2/avx512f directly instead of via a cpu model.
llvm-svn: 240062
2015-06-18 21:26:01 +00:00
Quentin Colombet 360460ba64 [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

rdar://problem/18153096>

llvm-svn: 221657
2014-11-11 02:23:47 +00:00
Arnold Schwaighofer c0c7ff4ac0 X86 cost model: Exit before calling getSimpleVT on non-simple VTs
getSimpleVT can only handle simple value types.

radar://13676022

llvm-svn: 179714
2013-04-17 20:04:53 +00:00
Arnold Schwaighofer f47d2d7f6b X86 cost model: Model cost for uitofp and sitofp on SSE2
The costs are overfitted so that I can still use the legalization factor.

For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}

radar://13599001

llvm-svn: 179033
2013-04-08 18:05:48 +00:00