Commit Graph

7 Commits

Author SHA1 Message Date
Sander de Smalen 51c2fa0e2a Improve reduction intrinsics by overloading result value.
This patch uses the mechanism from D62995 to strengthen the
definitions of the reduction intrinsics by letting the scalar
result/accumulator type be overloaded from the vector element type.

For example:

  ; The LLVM LangRef specifies that the scalar result must equal the
  ; vector element type, but this is not checked/enforced by LLVM.
  declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a)

This patch changes that into:

  declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a)

Which has the type-constraint more explicit and causes LLVM to check
the result type with the vector element type.

Reviewers: RKSimon, arsenm, rnk, greened, aemerson

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D62996

llvm-svn: 363240
2019-06-13 09:37:38 +00:00
Sander de Smalen cbeb563cfb Change semantics of fadd/fmul vector reductions.
This patch changes how LLVM handles the accumulator/start value
in the reduction, by never ignoring it regardless of the presence of
fast-math flags on callsites. This change introduces the following
new intrinsics to replace the existing ones:

  llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
  llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul

and adds functionality to auto-upgrade existing LLVM IR and bitcode.

Reviewers: RKSimon, greened, dmgreen, nikic, simoll, aemerson

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D60261

llvm-svn: 363035
2019-06-11 08:22:10 +00:00
Sanjay Patel 5a4f7cf2ff [IR] allow fast-math-flags on select of FP values
This is a minimal start to correcting a problem most directly discussed in PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086

We have been hacking around a limitation for FP select patterns by using the
fast-math-flags on the condition of the select rather than the select itself.
This patch just allows FMF to appear with the 'select' opcode. No changes are
needed to "FPMathOperator" because it already includes select-of-FP because
that definition is based on the (return) value type.

Once we have this ability, we can start correcting and adding IR transforms
to use the FMF on a 'select' instruction. The instcombine and vectorizer test
diffs only show that the IRBuilder change is behaving as expected by applying
an FMF guard value to 'select'.

For reference:
rL241901 - allowed FMF with fcmp
rL255555 - allowed FMF with FP calls

Differential Revision: https://reviews.llvm.org/D61917

llvm-svn: 361401
2019-05-22 15:50:46 +00:00
Simon Pilgrim 23c2182c2b Support generic expansion of ordered vector reduction (PR36732)
Without the fast math flags, the llvm.experimental.vector.reduce.fadd/fmul intrinsic expansions must be expanded in order.

This patch scalarizes the reduction, applying the accumulator at the start of the sequence: ((((Acc + Scl[0]) + Scl[1]) + Scl[2]) + ) ... + Scl[NumElts-1]

Differential Revision: https://reviews.llvm.org/D45366

llvm-svn: 329585
2018-04-09 15:44:20 +00:00
Simon Pilgrim 389dc7f0c8 Add additional tests from D45336
llvm-svn: 329427
2018-04-06 17:18:44 +00:00
Simon Pilgrim 1e6659c1f0 Add additional tests from D45366
llvm-svn: 329425
2018-04-06 17:15:56 +00:00
Amara Emerson 836b0f48c1 Add a late IR expansion pass for the experimental reduction intrinsics.
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.

Differential Revision: https://reviews.llvm.org/D32245

llvm-svn: 302631
2017-05-10 09:42:49 +00:00