Commit Graph

8 Commits

Author SHA1 Message Date
Simon Pilgrim 86478c6909 [X86][SSE] Vectorize i64 ASHR operations
This patch vectorizes the v2i64/v4i64 ASHR shift operations - the last remaining integer vector shifts that are still being transferred to/from the scalar unit to be completed.

Differential Revision: http://reviews.llvm.org/D11439

llvm-svn: 243569
2015-07-29 20:31:45 +00:00
Nadav Rotem 87a0af6e1b CostModel: increase the default cost of supported floating point operations from 1 to two. Fixed a few tests that changes because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.
llvm-svn: 179413
2013-04-12 21:15:03 +00:00
Arnold Schwaighofer e9b5016411 X86 cost model: Vector shifts are expensive in most cases
The default logic does not correctly identify costs of casts because they are
marked as custom on x86.

For some cases, where the shift amount is a scalar we would be able to generate
better code. Unfortunately, when this is the case the value (the splat) will get
hoisted out of the loop, thereby making it invisible to ISel.

radar://13130673
radar://13537826

llvm-svn: 178703
2013-04-03 21:46:05 +00:00
Michael Liao 70dd7f999d Correct cost model for vector shift on AVX2
- After moving logic recognizing vector shift with scalar amount from
  DAG combining into DAG lowering, we declare to customize all vector
  shifts even vector shift on AVX is legal. As a result, the cost model
  needs special tuning to identify these legal cases.

llvm-svn: 177586
2013-03-20 22:01:10 +00:00
Arnold Schwaighofer 20ef54f4c1 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into 2 128bits and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Nadav Rotem 0a471ea66c Cost Model: change the default cost of control flow instructions (br / ret / ...) to zero.
llvm-svn: 169423
2012-12-05 21:21:26 +00:00
Nadav Rotem 7411623fd8 Implement the cost of abnormal x86 instruction lowering as a table.
llvm-svn: 167395
2012-11-05 19:32:46 +00:00
Nadav Rotem c2345cbe73 X86 CostModel: Add support for a some of the common arithmetic instructions for SSE4, AVX and AVX2.
llvm-svn: 167347
2012-11-03 00:39:56 +00:00