Commit Graph

639 Commits

Author SHA1 Message Date
Karthik Bhat 5ab7795649 Allow vectorization of intrinsics such as powi,cttz and ctlz in Loop and SLP Vectorizer.
This patch adds support to vectorize intrinsics such as powi, cttz and ctlz in Vectorizer. These intrinsics are different from other
intrinsics as second argument to these function must be same in order to vectorize them and it should be represented as a scalar.
Review: http://reviews.llvm.org/D3851#inline-32769 and http://reviews.llvm.org/D3937#inline-32857

llvm-svn: 209873
2014-05-30 04:31:24 +00:00
Arnold Schwaighofer e2067680a6 LoopVectorizer: Add a check that the backedge taken count + 1 does not overflow
The loop vectorizer instantiates be-taken-count + 1 as the loop iteration count.
If this expression overflows the generated code was invalid.

In case of overflow the code now jumps to the scalar loop.

Fixes PR17288.

llvm-svn: 209854
2014-05-29 22:10:01 +00:00
Diego Novillo 7f8af8bf91 Add support for missed and analysis optimization remarks.
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.

-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.

-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.

The patch also:

1- Adds support in the inliner for the two new remarks and a
   test case.

2- Moves emitOptimizationRemark* functions to the llvm namespace.

3- Adds an LLVMContext argument instead of making them member functions
   of LLVMContext.

Reviewers: qcolombet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3682

llvm-svn: 209442
2014-05-22 14:19:46 +00:00
Eric Christopher 650c8f2a06 Clean up language and grammar.
Based on a patch by jfcaron3@gmail.com!
PR19806

llvm-svn: 209216
2014-05-20 17:11:11 +00:00
Zinovy Nis abdf44e7f3 [LV][REFACTOR] One more tiny fix for printing debug locations in loop vectorizer. Now consistent with the remarks emitter.
Differential Revision: http://reviews.llvm.org/D3821

llvm-svn: 209197
2014-05-20 08:26:20 +00:00
Benjamin Kramer 1fef64214c SLPVectorizer: Instead of just performing CSE on dead blocks ignore them completely.
Turns out that there is a very cheap way of testing whether a block is dead,
just look it up in the DomTree. We have to do this anyways so just ignore
unreachable blocks before sorting by domination. This restores a proper
ordering for std::stable_sort when dead code is present.

Covered by existing tests & buildbots running in STL debug mode (MSVC).

llvm-svn: 208492
2014-05-11 10:28:58 +00:00
Benjamin Kramer 8722aa5754 SLPVectorizer: When sorting by domination for CSE don't assert on unreachable code.
There is no total ordering if the CFG is disconnected. We don't care if we
catch all CSE opportunities in dead code either so just exclude ignore them in
the assert.

PR19646

llvm-svn: 208461
2014-05-09 23:28:49 +00:00
Zinovy Nis da925c0d7c [BUG][REFACTOR]
1) Fix for printing debug locations for absolute paths.
2) Location printing is moved into public method DebugLoc::print() to avoid re-inventing the wheel.

Differential Revision: http://reviews.llvm.org/D3513

llvm-svn: 208177
2014-05-07 09:51:22 +00:00
Yi Jiang a4821fc9fb Always set alignment of vectorized LD/ST in SLP-Vectorizer. <rdar://problem/16812145>
llvm-svn: 207983
2014-05-05 17:59:14 +00:00
Arnold Schwaighofer cd566c423a SLPVectorizer: Bring back the insertelement patch (r205965) with fixes
When can't assume a vectorized tree is rooted in an instruction. The IRBuilder
could have constant folded it. When we rebuild the build_vector (the series of
InsertElement instructions) use the last original InsertElement instruction. The
vectorized tree root is guaranteed to be before it.

Also, we can't assume that the n-th InsertElement inserts the n-th element into
a vector.

This reverts r207746 which reverted the revert of the revert of r205018 or so.

Fixes the test case in PR19621.

llvm-svn: 207939
2014-05-04 17:10:15 +00:00
Benjamin Kramer 64425fe875 SLPVectorizer: Lazily allocate the map for block numbering.
There is no point in creating it if we're not going to vectorize
anything. Creating the map is expensive as it creates large values.
No functionality change.

llvm-svn: 207916
2014-05-03 15:50:37 +00:00
Karthik Bhat ddd0cb5ecf Vectorize intrinsic math function calls in SLPVectorizer.
This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer.
Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559

llvm-svn: 207901
2014-05-03 09:59:54 +00:00
Eric Christopher 6c26beb770 Clean up constructor logic and member access for LoopVectorizeHints.
There are public functions that mutate various members as well as
another private member already, so make all the members private to
avoid the discontinuity and add accessors for the values. Should
be no functional change.

llvm-svn: 207868
2014-05-02 20:40:04 +00:00
Chandler Carruth 18c2fbb143 Revert r205965, which essentially reverts r205018 for the second time.
=[

Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.

llvm-svn: 207746
2014-05-01 11:24:11 +00:00
Benjamin Kramer bf2368d94b Add a <tuple> include to more files that aren't getting it transitively on MSVC.
llvm-svn: 207617
2014-04-30 07:21:01 +00:00
Diego Novillo cd64780d18 Fix vectorization remarks.
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.

Added tests to exercise some loop remarks.

llvm-svn: 207574
2014-04-29 20:06:10 +00:00
Yi Jiang 1a3f18b161 Continue slp vectorization even the BB already has vectorized store radar://16641956
llvm-svn: 207572
2014-04-29 19:37:20 +00:00
Diego Novillo 34fc8a7c4c Add optimization remarks to the loop unroller and vectorizer.
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3456

llvm-svn: 207528
2014-04-29 14:27:31 +00:00
Zinovy Nis 487268574a [BUG] Fix -Wunused-variable warning in Release mode. Thnx to Kostya Serebryany for pointing.
llvm-svn: 207516
2014-04-29 09:45:08 +00:00
Kostya Serebryany dc8e551d84 fix -Wunused-variable warning in Release mode
llvm-svn: 207514
2014-04-29 09:33:02 +00:00
Zinovy Nis d373fec199 [OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizer
llvm-svn: 207512
2014-04-29 08:55:11 +00:00
Craig Topper e73658ddbb [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Craig Topper f40110f4d8 [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Karthik Bhat 6a48f7d66e Allow vectorization of bit intrinsics in BB Vectorizer.
This patch adds support for vectorization of  bit intrinsics such as bswap,ctpop,ctlz,cttz.

llvm-svn: 207174
2014-04-25 03:33:48 +00:00
Karthik Bhat 81e6bf0a41 Allow vectorization of few missed llvm intrinsic calls in BBVectorizor by handling them in isVectorizableIntrinsic function.
llvm-svn: 207085
2014-04-24 07:29:55 +00:00
Alexander Musman f0785f4db4 [LV] Statistics numbers for LoopVectorize introduced: a number of analyzed loops & a number of vectorized loops.
Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized.
Patch by Zinovy Nis

Differential Revision: http://reviews.llvm.org/D3438

llvm-svn: 206956
2014-04-23 08:40:37 +00:00
Chandler Carruth 964daaaf19 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Transforms/...
edition.

This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.

Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.

llvm-svn: 206844
2014-04-22 02:55:47 +00:00
Alexey Bataev b97f9e8698 D3348 - [BUG] "Rotate Loop" pass kills "llvm.vectorizer.enable" metadata
llvm-svn: 206266
2014-04-15 09:37:30 +00:00
Arnold Schwaighofer b373e01d87 Reapply "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This commit reapplies 205018. After 205855 we should correctly vectorize
intrinsics.

llvm-svn: 205965
2014-04-10 13:41:35 +00:00
Arnold Schwaighofer fd0bf5d6e5 SLPVectorizer: Only vectorize intrinsics whose operands are widened equally
The vectorizer only knows how to vectorize intrinics by widening all operands by
the same factor.

Patch by Tyler Nowicki!

llvm-svn: 205855
2014-04-09 14:20:47 +00:00
Eric Christopher fcd222aa47 Add NDEBUG markers around debug only function.
llvm-svn: 205706
2014-04-07 12:46:30 +00:00
Eric Christopher 6ab6a0648b Add debug location information to the vectorizer debug statements.
Patch by Zinovy Nis.

llvm-svn: 205705
2014-04-07 12:32:17 +00:00
David Blaikie 6425696818 Fixing typo.
Differential Revision: http://reviews.llvm.org/D3154

llvm-svn: 205674
2014-04-05 20:30:31 +00:00
Tim Northover 670df3d937 SLPVectorizer: compare entire intrinsic for SLP compatibility.
Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.

llvm-svn: 205424
2014-04-02 14:39:02 +00:00
Hal Finkel b0ebdc0f43 [LoopVectorizer] Count dependencies of consecutive pointers as uniforms
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).

For example, the TSVC benchmark function s173 has:
  ...
  %3 = add nsw i64 %indvars.iv, 16000
  %arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
  ...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
detects collects uniforms.

Fixes PR19296.

llvm-svn: 205387
2014-04-02 02:34:49 +00:00
Arnold Schwaighofer 15262e6703 Revert "SLPVectorizer: Ignore users that are insertelements we can reschedule them"
This reverts commit r205018.

Conflicts:
	lib/Transforms/Vectorize/SLPVectorizer.cpp
	test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

This is breaking libclc build.

llvm-svn: 205260
2014-03-31 23:05:56 +00:00
Arnold Schwaighofer c9d58e8d32 SLPVectorizer: Take credit for free extractelement instructions
Extract element instructions that will be removed when vectorzing lower the
cost.

Patch by Arch D. Robison!

llvm-svn: 205020
2014-03-28 17:21:32 +00:00
Arnold Schwaighofer b0d3bcdd32 SLPVectorizer: Fix typos
Patch by Arch D. Robison!

llvm-svn: 205019
2014-03-28 17:21:27 +00:00
Arnold Schwaighofer b190cb30c3 SLPVectorizer: Ignore users that are insertelements we can reschedule them
Patch by Arch D. Robison!

llvm-svn: 205018
2014-03-28 17:21:22 +00:00
Andrew Trick c8ac7ea261 SLP vectorizer: Don't hoist vector extracts of phis.
Extracts coming from phis were being hoisted, while all others were
sunk to their uses. This was inconsistent and didn't seem to serve a
purpose. Changing all extracts to be sunk to uses is a prerequisite
for adding block frequency to the SLP vectorizer's cost model.

I benchmarked the change in isolation (without block frequency). I
only saw noise on x86 and some potentially significant improvements on
ARM. No major regressions is good enough for me.

llvm-svn: 204699
2014-03-25 02:18:47 +00:00
Chandler Carruth 4c5001cc9c [LV] While I'm here, use range based for loops which are so much cleaner
for this kind of walk.

llvm-svn: 204188
2014-03-18 22:00:32 +00:00
Chandler Carruth ae324439d0 [LV] The actual change I intended to commit in r204148. Sorry for the
noise.

Original commit log:
Replace some dead code with an assert. When I first ported this pass
from a loop pass to a function pass I did so in the naive, recursive
way. It doesn't actually work, we need a worklist instead. When
I switched to the worklist I didn't delete the naive recursion. That
recursion was also buggy because it was dead and never really exercised.

llvm-svn: 204187
2014-03-18 21:58:38 +00:00
Chandler Carruth f73079ca89 [LV] Replace some dead code with an assert. When I first ported this
pass from a loop pass to a function pass I did so in the naive,
recursive way. It doesn't actually work, we need a worklist instead.
When I switched to the worklist I didn't delete the naive recursion.
That recursion was also buggy because it was dead and never really
exercised.

llvm-svn: 204184
2014-03-18 21:51:46 +00:00
Raul E. Silvera 62f0236d36 Resubmit "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit 86cb795388643710dab34941ddcb5a9470ac39d8.
The problems previously found have been resolved through other CLs.

llvm-svn: 203707
2014-03-12 20:21:50 +00:00
Ahmed Charles 5d461ede5b Fix build break.
llvm-svn: 203366
2014-03-09 03:50:36 +00:00
Chandler Carruth cdf4788401 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Arnold Schwaighofer ab12363c02 LoopVectorizer: Preserve fast-math flags
Fixes PR19045.

llvm-svn: 203008
2014-03-05 21:10:47 +00:00
Craig Topper 3e4c697ca1 [C++11] Add 'override' keyword to virtual methods that override their base class.
llvm-svn: 202953
2014-03-05 09:10:37 +00:00
Chandler Carruth 4220e9c154 [Modules] Move ValueHandle into the IR library where Value itself lives.
Move the test for this class into the IR unittests as well.

This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.

llvm-svn: 202821
2014-03-04 11:17:44 +00:00
Chandler Carruth 820a908df7 [Modules] Move the LLVM IR pattern match header into the IR library, it
obviously is coupled to the IR.

llvm-svn: 202818
2014-03-04 11:08:18 +00:00
Benjamin Kramer d6f1f84f51 [C++11] Replace llvm::tie with std::tie.
The old implementation is no longer needed in C++11.

llvm-svn: 202644
2014-03-02 13:30:33 +00:00
Benjamin Kramer b6d0bd48bd [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Benjamin Kramer 3a377bce4e Now that we have C++11, turn simple functors into lambdas and remove a ton of boilerplate.
No intended functionality change.

llvm-svn: 202588
2014-03-01 11:47:00 +00:00
Rafael Espindola 935125126c Make DataLayout a plain object, not a pass.
Instead, have a DataLayoutPass that holds one. This will allow parts of LLVM
don't don't handle passes to also use DataLayout.

llvm-svn: 202168
2014-02-25 17:30:31 +00:00
Rafael Espindola 43b5a51e7c Make a few more DataLayout variables const.
llvm-svn: 202155
2014-02-25 14:24:11 +00:00
Rafael Espindola aeff8a9c05 Make some DataLayout pointers const.
No functionality change. Just reduces the noise of an upcoming patch.

llvm-svn: 202087
2014-02-24 23:12:18 +00:00
Arnold Schwaighofer 9611d23d63 SLPVectorizer: Try vectorizing 'splat' stores
Vectorize sequential stores of a broadcasted value.
5% on eon.

radar://16124699

llvm-svn: 202067
2014-02-24 19:52:29 +00:00
Rafael Espindola 37dc9e19f5 Rename many DataLayout variables from TD to DL.
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.

llvm-svn: 201827
2014-02-21 00:06:31 +00:00
Gerolf Hoflehner 7a463d0650 fix for null VectorizedValue assertion in the SLP Vectorizer (in function vectorizeTree()). radar://16064178
llvm-svn: 201501
2014-02-17 03:06:16 +00:00
Gerolf Hoflehner 282949bf4d fixed typo in comment as my test commit
llvm-svn: 201486
2014-02-16 10:43:25 +00:00
Benjamin Kramer 989b92936c Reduce code duplication resulting from the ConstantVector/ConstantDataVector split.
No intended functionality change.

llvm-svn: 201344
2014-02-13 16:48:38 +00:00
Andrea Di Biagio b7882b3bd1 [Vectorizer] Add a new 'OperandValueKind' in TargetTransformInfo called
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.

The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.

With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.

This patch applies the following changes:
 - The cost model computation now takes into account non-uniform constants;
 - The cost of vector shift instructions has been improved in
   X86TargetTransformInfo analysis pass;
 - BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
   between non-uniform and uniform constant operands.

Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.

llvm-svn: 201272
2014-02-12 23:43:47 +00:00
Arnold Schwaighofer 348e1b60be LoopVectorizer: Keep track of conditional store basic blocks
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But, I forgot to add the basic blocks to this vector!

Fixes PR18724.

llvm-svn: 201028
2014-02-08 20:41:13 +00:00
Paul Robinson af4e64d095 Disable most IR-level transform passes on functions marked 'optnone'.
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.

llvm-svn: 200892
2014-02-06 00:07:05 +00:00
Arnold Schwaighofer 17455633c7 LoopVectorizer: Enable unrolling of conditional stores and the load/store
unrolling heuristic per default

Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.

llvm-svn: 200621
2014-02-02 03:12:34 +00:00
Reid Kleckner a04504fe97 Revert "[SLPV] Recognize vectorizable intrinsics during SLP vectorization ..."
This reverts commit r200576.  It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.

llvm-svn: 200602
2014-02-01 01:37:30 +00:00
Chandler Carruth b3da389e30 [SLPV] Recognize vectorizable intrinsics during SLP vectorization and
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.

Patch by Raul Silvera! (Much delayed due to my not running dcommit)

llvm-svn: 200576
2014-01-31 21:14:40 +00:00
Chandler Carruth c12224cb93 [vectorizer] Tweak the way we do small loop runtime unrolling in the
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.

I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.

llvm-svn: 200530
2014-01-31 10:51:08 +00:00
Arnold Schwaighofer 1aab75ab49 LoopVectorizer: Don't count the induction variable multiple times
When estimating register pressure, don't count the induction variable mulitple
times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").

llvm-svn: 200371
2014-01-29 04:36:12 +00:00
Chandler Carruth b783628560 [vectorizer] Completely disable the block frequency guidance of the loop
vectorizer, placing it behind an off-by-default flag.

It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.

I'm planning to email the dev list with a summary of why its not really
useful and a couple of ideas about how to better structure these types
of heuristics.

llvm-svn: 200294
2014-01-28 09:10:41 +00:00
Arnold Schwaighofer 18865db3c1 LoopVectorize: Support conditional stores by scalarizing
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.

  for (i = 0; i < 128; ++i) {
    if (a[i] < 10)
      a[i] += val;
  }

  for (i = 0; i < 128; i+=2) {
    v = a[i:i+1];
    v0 = (extract v, 0) + 10;
    v1 = (extract v, 1) + 10;
    if (v0 < 10)
      a[i] = v0;
    if (v1 < 10)
      a[i] = v1;
  }

The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.

The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled per default for
now.

This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.

I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do good at the moment.

rdar://15892953

Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
   Applications/siod/siod         2.18%
 Performance improvements:
   mesa                          -4.42%
   libquantum                    -4.15%

 With a patch that slightly changes the register heuristics (by subtracting the
 induction variable on both sides of the register pressure equation, as the
 induction variable is probably not really unrolled):

 Performance Regressions:
   Benchmarks/Ptrdist/yacr2/yacr2  7.73%
   Applications/siod/siod          1.97%

 Performance Improvements:
   libquantum                    -13.05% (we now also unroll quantum_toffoli)
   mesa                           -4.27%

llvm-svn: 200270
2014-01-28 01:01:53 +00:00
Chandler Carruth e24f3973eb [vectorize] Initial version of respecting PGO in the vectorizer: treat
cold loops as-if they were being optimized for size.

Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.

The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:

1) Don't disable the entire pass when the target is lacking vector
   registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
   non-if-converted loops. This is trivial for the unroller but hard for
   the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
   various places that make cost tradeoffs (very likely only the
   unroller makes sense here, and then only when dealing with loops that
   are small enough for unrolling to not completely blow out the LSD).

I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.

One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.

llvm-svn: 200219
2014-01-27 13:11:50 +00:00
Chandler Carruth edfa37effa [vectorizer] Add an override for the target instruction cost and use it
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.

llvm-svn: 200215
2014-01-27 11:41:50 +00:00
Chandler Carruth 2bb03ba605 [vectorizer] Simplify code to use existing helpers on the Function
object and fewer pointless variables.

Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.

llvm-svn: 200214
2014-01-27 11:27:37 +00:00
Chandler Carruth 147c23278f [vectorizer] Teach the loop vectorizer's unroller to only unroll by
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.

Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).

While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.

llvm-svn: 200213
2014-01-27 11:12:24 +00:00
Chandler Carruth 7f90b4530b [vectorizer] Add some flags which are useful for conducting experiments
with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.

These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.

llvm-svn: 200212
2014-01-27 11:12:19 +00:00
Chandler Carruth 328998b2f7 [vectorizer] Fix a trivial oversight where we always requested the
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.

If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.

Also made the variable name a bit more sensible while I'm here.

llvm-svn: 200211
2014-01-27 11:12:14 +00:00
Chandler Carruth 56612b204a [vectorizer] Clean up the handling of unvectorized loop unrolling in the
LoopVectorize pass.

The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes more sense of unrolling if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.

This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.

llvm-svn: 200198
2014-01-27 08:17:58 +00:00
Chandler Carruth 3aebcb99f7 [LPM] Conclude my immediate work by making the LoopVectorizer
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.

The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.

llvm-svn: 200074
2014-01-25 10:01:55 +00:00
Alp Toker cb40291100 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Arnold Schwaighofer cc742dd9e4 LoopVectorizer: A reduction that has multiple uses of the reduction value is not
a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.

Fixes PR18526.
radar://15851149

llvm-svn: 199570
2014-01-19 03:18:31 +00:00
Arnold Schwaighofer dc4c9460a2 LoopVectorize: Only strip casts from integer types when replacing symbolic
strides

Fixes PR18480.

llvm-svn: 199291
2014-01-15 03:35:46 +00:00
Chandler Carruth 73523021d0 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth 5ad5f15cff [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Arnold Schwaighofer 66c742aeea LoopVectorizer: Enable strided memory accesses versioning per default
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
2014-01-11 20:40:34 +00:00
NAKAMURA Takumi 41c409ce0d LoopVectorize.cpp: Appease MSC16.
Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001
2014-01-11 09:59:27 +00:00
Arnold Schwaighofer c2e9d759f2 LoopVectorizer: Handle strided memory accesses by versioning
for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Strided1/2' are one
and drop to the scalar loop if they are not.

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.

radar://13075509

llvm-svn: 198950
2014-01-10 18:20:32 +00:00
Chandler Carruth 8a8cd2bab9 Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Arnold Schwaighofer 50b8302c55 LoopVectorizer: Don't if-convert constant expressions that can trap
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.

PR16729
radar://15653590

llvm-svn: 197449
2013-12-17 01:11:01 +00:00
NAKAMURA Takumi 8bc9bfaa5a Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
NAKAMURA Takumi e3afe2ef62 Whitespaces.
llvm-svn: 196880
2013-12-10 05:39:12 +00:00
Jakub Staszak 3ab283c157 Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces
overall time of LLVM compilation by ~1%.

llvm-svn: 196667
2013-12-07 21:20:17 +00:00
Renato Golin 729a3ae90a Add #pragma vectorize enable/disable to LLVM
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.

This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.

The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.

Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.

llvm-svn: 196537
2013-12-05 21:20:02 +00:00
Rafael Espindola cdbde3aacc Fix non-deterministic behavior.
We use CSEBlocks to initialize a worklist:

SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());

so it must have a deterministic order.

llvm-svn: 196520
2013-12-05 18:28:01 +00:00
Arnold Schwaighofer 7ee53cac80 SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

llvm-svn: 196508
2013-12-05 15:14:40 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Nadav Rotem b0082d246a PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.

llvm-svn: 195791
2013-11-26 22:24:25 +00:00
Arnold Schwaighofer a2c8e008d2 LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.

Fixes PR18049.

llvm-svn: 195787
2013-11-26 22:11:23 +00:00
Nadav Rotem f9f8482e3a PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.

llvm-svn: 195773
2013-11-26 17:29:19 +00:00
Chandler Carruth 57458517ef Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.

Fixes PR17741.

Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]

llvm-svn: 195528
2013-11-23 00:48:34 +00:00