Commit Graph

4 Commits

Author SHA1 Message Date
Charlie Turner b69b92855d [NFC] Update horizontal reduction test cases.
These testcases no longer need to specify -slp-vectorize-hor, since it was
enabled by default in r252733.

llvm-svn: 255783
2015-12-16 17:22:24 +00:00
Charlie Turner ab3215fa11 [SLP] Be more aggressive about reduction width selection.
Summary:
This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach.

It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on neither ARM nor X86) before this patch, but with it does vectorize.

I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT.

The more principled approach I thought of was to generate several candidate tree's and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something!

Reviewers: nadav, jmolloy

Subscribers: mssimpso, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D14116

llvm-svn: 251428
2015-10-27 17:59:03 +00:00
Charlie Turner cd6e8cf8c2 [SLP] Try a bit harder to find reduction PHIs
Summary:
Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phi's whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads.

There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT.

Reviewers: jmolloy, mcrosier, nadav

Subscribers: mssimpso, nadav, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14063

llvm-svn: 251425
2015-10-27 17:54:16 +00:00
Charlie Turner 74c387feb7 [SLP] Treat SelectInsts as reduction values.
Summary:
Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values.

The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here.

Reviewers: jmolloy, mzolotukhin, spatel, nadav

Subscribers: nadav, mssimpso, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D13949

llvm-svn: 251424
2015-10-27 17:49:11 +00:00