Commit Graph

8 Commits

Author SHA1 Message Date
Michael Kuperstein f0c59330e9 [X86] Make some cast costs more precise
Make some AVX and AVX512 cast costs more precise.
Based on part of a patch by Elena Demikhovsky (D15604).

Differential Revision: http://reviews.llvm.org/D22064

llvm-svn: 275106
2016-07-11 21:39:44 +00:00
Michael Kuperstein aa71bdd3af [TTI] The cost model should not assume vector casts get completely scalarized
The cost model should not assume vector casts get completely scalarized, since
on targets that have vector support, the common case is a partial split up to
the legal vector size. So, when a vector cast  gets split, the resulting casts
end up legal and cheap.

Instead of pessimistically assuming scalarization, base TTI can use the costs
the concrete TTI provides for the split vector, plus a fudge factor to account
for the cost of the split itself. This fudge factor is currently 1 by default,
except on AMDGPU where inserts and extracts are considered free.

Differential Revision: http://reviews.llvm.org/D21251

llvm-svn: 274642
2016-07-06 17:30:56 +00:00
Elena Demikhovsky a1a40cce9f AVX-512: Updated cost of FP/SINT/UINT conversion operations
I checked and updated the cost of AVX-512 conversion operations. Added cost of conversion operations in DQ mode.
Conversion of illegal types that requires vector split is not calculated right now (like for other X86 targets).

Differential Revision: http://reviews.llvm.org/D15074

llvm-svn: 254494
2015-12-02 08:59:47 +00:00
Simon Pilgrim d862e0f33a [X86][SSE][CostModel] Added full set of sitofp/uitofp costings for SSE2/AVX/AVX2/AVX512F.
Merged separate (but equivalent) SSE2/AVX512F tests.

Removed codegen tests since these are already done better in test/CodeGen/X86.

The actual cost values still need to be updated to match recent codegen improvements.

llvm-svn: 240219
2015-06-20 14:58:01 +00:00
Simon Pilgrim de94fa6438 [X86][SSE][CostModel] Fixed uitofp/sitofp cost target tests to specify sse2/avx2/avx512f directly instead of via a cpu model.
llvm-svn: 240062
2015-06-18 21:26:01 +00:00
Quentin Colombet 360460ba64 [X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if
AVX2 is available.
According to IACA, the new lowering has a throughput of 8 cycles instead of 13
with the previous one.

Althought this lowering kicks in some SPECs benchmarks, the performance
improvement was within the noise.

Correctness testing has been done for the whole range of uint32_t with the
following program:
    uint4 v = (uint4) {0,1,2,3};
    uint32_t i;
    
    //Check correctness over entire range for uint4 -> float4 conversion
    for( i = 0; i < 1U << (32-2); i++ )
    {
        float4 t = test(v);
        float4 c = correct(v);
        
        if( 0xf != _mm_movemask_ps( t == c ))
        {
            printf( "Error @ %vx: %vf vs. %vf\n", v, c, t);
            return -1;
        }
        
        v += 4;
    }
Where "correct" is the old lowering and "test" the new one.

The patch adds a test case for the two custom lowering instruction.
It also modifies the vector cost model, which is why cast.ll and uitofp.ll are
modified.
2009-02-26-MachineLICMBug.ll is also modified because we now hoist 7
instructions instead of 4 (3 more constant loads).

rdar://problem/18153096>

llvm-svn: 221657
2014-11-11 02:23:47 +00:00
Arnold Schwaighofer c0c7ff4ac0 X86 cost model: Exit before calling getSimpleVT on non-simple VTs
getSimpleVT can only handle simple value types.

radar://13676022

llvm-svn: 179714
2013-04-17 20:04:53 +00:00
Arnold Schwaighofer f47d2d7f6b X86 cost model: Model cost for uitofp and sitofp on SSE2
The costs are overfitted so that I can still use the legalization factor.

For example the following kernel has about half the throughput vectorized than
unvectorized when compiled with SSE2. Before this patch we would vectorize it.

unsigned short A[1024];
double B[1024];
void f() {
  int i;
  for (i = 0; i < 1024; ++i) {
    B[i] = (double) A[i];
  }
}

radar://13599001

llvm-svn: 179033
2013-04-08 18:05:48 +00:00