Commit Graph

1749 Commits

Author SHA1 Message Date
Simon Pilgrim 5cd4eb96f6 [SLPVectorizer] shouldReorderOperands - just check for reordering. NFCI.
Remove the I.getOperand() calls from inside shouldReorderOperands - reorderInputsAccordingToOpcode should handle the creation of the operand lists and shouldReorderOperands should just check to see whether the i'th element should be commuted.

llvm-svn: 356854
2019-03-24 13:36:32 +00:00
Simon Pilgrim 1466e5c383 Fix unused variable warning on non-asserts builds. NFCI.
llvm-svn: 356841
2019-03-23 16:56:23 +00:00
Simon Pilgrim 64feec7977 Remove unused function argument. NFCI.
llvm-svn: 356840
2019-03-23 16:20:34 +00:00
Simon Pilgrim c7ba9555cf [SLPVectorizer] reorderInputsAccordingToOpcode - use InstructionState directly. NFCI.
llvm-svn: 356832
2019-03-23 13:44:06 +00:00
Simon Pilgrim f4f01f3cff [SLPVectorizer] Don't repeat VL.size() call. NFCI.
llvm-svn: 356830
2019-03-23 12:11:25 +00:00
Simon Pilgrim b68322f9d0 [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 356814
2019-03-22 21:27:11 +00:00
Simon Pilgrim d3a8fd8bfb Revert rL355906: [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059
........

Reverted due to buildbot failures that I don't have time to track down.

llvm-svn: 355913
2019-03-12 11:51:59 +00:00
Simon Pilgrim 5db95efdbd Try to fix SLPVectorizer BoUpSLP::BoEdgeInfo::dump visibility on non-debug builds
llvm-svn: 355912
2019-03-12 11:31:06 +00:00
Simon Pilgrim 2086a8894d [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 355906
2019-03-12 10:51:51 +00:00
Sanjoy Das 3f5ce18658 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das 2136a5bc49 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das 93f8cc186a Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Jonas Hahnfeld e071cd86df Hide two unused debugging methods, NFCI.
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.

llvm-svn: 355205
2019-03-01 17:15:21 +00:00
Simon Pilgrim a066f1f9e6 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Alina Sbirlea 0a8bc14ad7 [MemorySSA & LoopPassManager] Add remaining book keeping [NFCI].
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].

llvm-svn: 353901
2019-02-12 23:48:02 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Chandler Carruth b53f0e1145 Update files that were mistakenly added with the old file header to the
new one.

llvm-svn: 353665
2019-02-11 08:07:38 +00:00
Florian Hahn f557a94aa3 [LV] Remove unnecessary assignment to UserIC.
llvm-svn: 353469
2019-02-07 21:23:37 +00:00
Florian Hahn ba5acbc4fe [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

llvm-svn: 353461
2019-02-07 20:49:10 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
Yevgeny Rouban 4cdd783955 [SLPVectorizer] Get rid of IndexQueue array from vectorizeStores. NFCI.
Indices are checked as they are generated. No need to fill the whole array of indices.

Differential Revision: https://reviews.llvm.org/D57144

llvm-svn: 352839
2019-02-01 06:44:08 +00:00
Mircea Trofin ec02630278 [llvm] Clarify responsiblity of some of DILocation discriminator APIs
Summary:
Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
similar APIs. Also changed its behavior to copy over the other
discriminator components, instead of eliding them.

Renamed cloneWithDuplicationFactor to
cloneByMultiplyingDuplicationFactor, which more closely matches what
this API does.

Reviewers: dblaikie, wmi

Reviewed By: dblaikie

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56220

llvm-svn: 351996
2019-01-24 00:10:25 +00:00
Hideki Saito 4e4ecae028 [LV][VPlan] Change to implement VPlan based predication for
VPlan-native path

Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.

Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.

Reviewers: fhahn, rengolin, hsaito, dcaballe

Reviewed By: fhahn

Patch by Satish Guggilla, thanks!

Differential Revision: https://reviews.llvm.org/D53349

llvm-svn: 351990
2019-01-23 22:43:12 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Alexey Bataev 18809a6bbb [SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside of the loops,
when there are some vectorizable PHIs. Patch fixes this by checking if
the node is the reduction node and thus it must not be vectorized, it must
be gathered.

Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56783

llvm-svn: 351349
2019-01-16 15:39:52 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Mircea Trofin b53eeb6f4c [llvm] API for encoding/decoding DWARF discriminators.
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)

The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.

Reviewers: davidxl, danielcdh, wmi, dblaikie

Reviewed By: dblaikie

Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D55681

llvm-svn: 349973
2018-12-21 22:48:50 +00:00
Anton Afanasyev ce28791e20 Test commit
Fix typos.

llvm-svn: 349644
2018-12-19 17:18:40 +00:00
Michael Kruse d4eb13c880 [LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced

Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.

Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.

Differential Revision: https://reviews.llvm.org/D55785

llvm-svn: 349513
2018-12-18 17:46:09 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Alexey Bataev 3689747619 [SLP]PR39774: Update references of the replaced external instructions.
Summary:
An additional fix for PR39774. Need to update the references for the
RedcutionRoot instruction when it is replaced during the vectorization
phase to avoid compiler crash on reduction vectorization.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55017

llvm-svn: 347997
2018-11-30 15:14:20 +00:00
Alexey Bataev 579c2d9d64 [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.
Summary:
If the original reduction root instruction was vectorized, it might be
removed from the tree. It means that the insertion point may become
invalidated and the whole vectorization of the reduction leads to the
incorrect output result.
The ReductionRoot instruction must be marked as externally used so it
could not be removed. Otherwise it might cause inconsistency with the
cost model and we may end up with too optimistic optimization.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54955

llvm-svn: 347759
2018-11-28 14:34:11 +00:00
Vedant Kumar 4de31bba51 [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.

This can be more efficient than using pred_size(), which is a linear
time operation.

We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).

With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.

See llvm.org/PR39702 for a harder but more general way of achieving
similar results.

Differential Revision: https://reviews.llvm.org/D54686

llvm-svn: 347256
2018-11-19 19:54:27 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
Florian Hahn 6df11868b5 [VPlan, SLP] Use SmallPtrSet for Candidates.
This slightly improves the candidate handling in getBest().

llvm-svn: 346870
2018-11-14 15:58:40 +00:00
Florian Hahn 02cb67deb9 [VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle.
The caller should take care of only calling it with debug enabled.

llvm-svn: 346860
2018-11-14 13:33:44 +00:00
Florian Hahn 2eca3728ee [VPlan] Update ifdef.
llvm-svn: 346858
2018-11-14 13:21:26 +00:00
Florian Hahn 09e516c54b [VPlan, SLP] Add simple SLP analysis on top of VPlan.
This patch adds an initial implementation of the look-ahead SLP tree
construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence
of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha,
Luís F. W. Góes'.

It returns an SLP tree represented as VPInstructions, with combined
instructions represented as a single, wider VPInstruction.

This initial version does not support instructions with multiple
different users (either inside or outside the SLP tree) or
non-instruction operands; it won't generate any shuffles or
insertelement instructions.

It also just adds the analysis that builds an SLP tree rooted in a set
of stores. It does not include any cost modeling or memory legality
checks. The plan is to integrate it with VPlan based cost modeling, once
available and to only apply it to operations that can be widened.

A follow-up patch will add a support for replacing instructions in a
VPlan with their SLP counter parts.

Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D4949

llvm-svn: 346857
2018-11-14 13:11:49 +00:00
Florian Hahn a4dc7feeea [VPlan] VPlan version of InterleavedAccessInfo.
This patch turns InterleaveGroup into a template with the instruction type
being a template parameter. It also adds a VPInterleavedAccessInfo class, which
only contains a mapping from VPInstructions to their respective InterleaveGroup.
As we do not have access to scalar evolution in VPlan, we can re-use
convert InterleavedAccessInfo to VPInterleavedAccess info.


Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D49489

llvm-svn: 346758
2018-11-13 15:58:18 +00:00
Simon Pilgrim 631f2bf51e [CostModel] Add more realistic SK_ExtractSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.

llvm-svn: 346656
2018-11-12 14:25:23 +00:00
Ayal Zaks 45a3ca7be7 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612

llvm-svn: 345959
2018-11-02 09:16:12 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Simon Pilgrim 44a9a71d2a [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617
2018-10-30 18:10:02 +00:00
Jonas Paulsson 1f067c94dc [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Dorit Nuzman 5114390e48 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559

llvm-svn: 345115
2018-10-24 07:11:38 +00:00
Simon Pilgrim 532a0f122e [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00
Dorit Nuzman da5dc13355 Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left
out of revision 344883

llvm-svn: 345021
2018-10-23 11:51:55 +00:00
Dorit Nuzman 3ec99fe21b [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when
optimizing for size

LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.

The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53420

llvm-svn: 344883
2018-10-22 06:17:09 +00:00
Ayal Zaks b0b5312e67 [LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size
When optimizing for size, a loop is vectorized only if the resulting vector loop
completely replaces the original scalar loop. This holds if no runtime guards
are needed, if the original trip-count TC does not overflow, and if TC is a
known constant that is a multiple of the VF. The last two TC-related conditions
can be overcome by
1. rounding the trip-count of the vector loop up from TC to a multiple of VF;
2. masking the vector body under a newly introduced "if (i <= TC-1)" condition.

The patch allows loops with arbitrary trip counts to be vectorized under -Os,
subject to the existing cost model considerations. It also applies to loops with
small trip counts (under -O2) which are currently handled as if under -Os.

The patch does not handle loops with reductions, live-outs, or w/o a primary
induction variable, and disallows interleave groups.

(Third, final and main part of -)
Differential Revision: https://reviews.llvm.org/D50480

llvm-svn: 344743
2018-10-18 15:03:15 +00:00
Anna Thomas 6f732bfb79 [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

llvm-svn: 344613
2018-10-16 15:46:26 +00:00
Chandler Carruth e303c87e19 [TI removal] Make `getTerminator()` return a generic `Instruction`.
This removes the primary remaining API producing `TerminatorInst` which
will reduce the rate at which code is introduced trying to use it and
generally make it much easier to remove the remaining APIs across the
codebase.

Also clean up some of the stragglers that the previous mechanical update
of variables missed.

Users of LLVM and out-of-tree code generally will need to update any
explicit variable types to handle this. Replacing `TerminatorInst` with
`Instruction` (or `auto`) almost always works. Most of these edits were
made in prior commits using the perl one-liner:
```
perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g'
```

This also my break some rare use cases where people overload for both
`Instruction` and `TerminatorInst`, but these should be easily fixed by
removing the `TerminatorInst` overload.

llvm-svn: 344504
2018-10-15 10:42:50 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Ayal Zaks e567b5b526 [LV] Fix comments reported when not vectorizing single iteration loops; NFC
Landing this as a separate part of https://reviews.llvm.org/D50480, being a
seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count
without remainder under opt for size).

llvm-svn: 344483
2018-10-14 17:53:02 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Florian Hahn 18e07bb822 [LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC).
We assign indices sequentially for seen instructions, so we can just use
a vector and push back the seen instructions. No need for using a
DenseMap.

Reviewers: hsaito, rengolin, nadav, dcaballe

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D53089

llvm-svn: 344233
2018-10-11 09:46:25 +00:00
Florian Hahn 7eb5cb4ebc [LV] Ignore more debug info.
We can avoid doing some unnecessary work by skipping debug instructions
in a few loops. It also helps to ensure debug instructions do not
prevent vectorization, although I do not have any concrete test cases
for that.

Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk

Reviewed By: rengolin, dcaballe

Differential Revision: https://reviews.llvm.org/D53091

llvm-svn: 344232
2018-10-11 09:27:24 +00:00
Renato Golin d8e7ca4a32 [VPlan] Fix CondBit quoting in dumpBasicBlock
Quotes were being printed for VPInstructions but not the rest.

llvm-svn: 344161
2018-10-10 17:55:21 +00:00
Max Kazantsev b07369651e [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

llvm-svn: 343954
2018-10-08 05:46:29 +00:00
Jonas Paulsson 29d80f07ee [LoopVectorizer] Use TTI.getOperandInfo()
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().

This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.

getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.

Review: Florian Hahn
https://reviews.llvm.org/D52883

llvm-svn: 343852
2018-10-05 14:34:04 +00:00
Anna Thomas b1e3d45318 [LV][LAA] Vectorize loop invariant values stored into loop invariant address
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.

This also includes changes to ORE for stores to invariant address.

Reviewers: anemet, Ayal, mkuper, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50665

llvm-svn: 343028
2018-09-25 20:57:20 +00:00
Warren Ristow 4f27730eaf [Loop Vectorizer] Abandon vectorization when no integer IV found
Support for vectorizing loops with secondary floating-point induction
variables was added in r276554.  A primary integer IV is still required
for vectorization to be done.  If an FP IV was found, but no integer IV
was found at all (primary or secondary), the attempt to vectorize still
went forward, causing a compiler-crash.  This change abandons that
attempt when no integer IV is found.  (Vectorizing FP-only cases like
this, rather than bailing out, is discussed as possible future work
in D52327.)

See PR38800 for more information.

Differential Revision: https://reviews.llvm.org/D52327

llvm-svn: 342786
2018-09-21 23:03:50 +00:00
Matt Arsenault c640798597 LSV: Fix adjust alloca alignment trick for AMDGPU
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.

Also try to split if the unaligned access isn't allowed.

llvm-svn: 342442
2018-09-18 02:05:44 +00:00
Hideki Saito d19851ac7e Fix for the buildbot failure http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/23635
from the commit (r342197) of https://reviews.llvm.org/D50820.

llvm-svn: 342201
2018-09-14 02:02:57 +00:00
Hideki Saito ea7f3035a0 [VPlan] Implement initial vector code generation support for simple outer loops.
Summary:
[VPlan] Implement vector code generation support for simple outer loops.

Context: Patch Series #1 for outer loop vectorization support in LV  using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
                                                          
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:

  - force vector code generation using explicit vectorize_width

  - add conservative early returns in cost model and other places for VPlanNativePath

  - add code for setting up outer loop inductions 

  - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches

  - support for generating uniform inner branches

We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. 

Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal

Reviewed By: fhahn, hsaito

Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits

Differential Revision: https://reviews.llvm.org/D50820

llvm-svn: 342197
2018-09-14 00:36:00 +00:00
Florian Hahn 1086ce2397 [LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC)
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp

Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00

Reviewed By: rengolin, xbolva00

Differential Revision: https://reviews.llvm.org/D49488

llvm-svn: 342027
2018-09-12 08:01:57 +00:00
Vikram TV 09be521d4d Move a transformation routine from LoopUtils to LoopVectorize.
Summary:
Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp.
Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex().

This is a child to D51153.

Reviewers: dmgreen, llvm-commits

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D51837

llvm-svn: 341776
2018-09-10 06:16:44 +00:00
Vikram TV 6594dc377d Move createMinMaxOp() out of RecurrenceDescriptor.
Reviewers: dmgreen, llvm-commits

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D51838

llvm-svn: 341773
2018-09-10 05:05:08 +00:00
Anna Thomas 110df11a1a [LV] Fix code gen for conditionally executed loads and stores
Fix a latent bug in loop vectorizer which generates incorrect code for
memory accesses that are executed conditionally. As pointed in review,
this bug definitely affects uniform loads and may affect conditional
stores that should have turned into scatters as well).

The code gen for conditionally executed uniform loads on architectures
that support masked gather instructions is broken.

Without this patch, we were unconditionally executing the *conditional*
load in the vectorized version.

This patch does the following:
1. Uniform conditional loads on architectures with gather support will
   have correct code generated. In particular, the cost model
   (setCostBasedWideningDecision) is fixed.
2. For the recipes which are handled after the widening decision is set,
   we use the isScalarWithPredication(I, VF) form which is added in the
   patch.

3. Fix the vectorization cost model for scalarization
   (getMemInstScalarizationCost): implement and use isPredicatedInst to
   identify *all* predicated instructions, not just scalar+predicated. So,
   now the cost for scalarization will be increased for maskedloads/stores
   and gather/scatter operations. In short, we should be choosing the
   gather/scatter in place of scalarization on archs where it is
   profitable.
4. We needed to weaken the assert in useEmulatedMaskMemRefHack.

Reviewers: Ayal, hsaito, mkuper

Differential Revision: https://reviews.llvm.org/D51313

llvm-svn: 341673
2018-09-07 15:53:48 +00:00
Anna Thomas dbacea188b [LV] First order recurrence phis should not be treated as uniform
This is fix for PR38786.
First order recurrence phis were incorrectly treated as uniform,
which caused them to be vectorized as uniform instructions.

Patch by Ayal Zaks and Orivej Desh!

Reviewed by: Anna

Differential Revision: https://reviews.llvm.org/D51639

llvm-svn: 341416
2018-09-04 22:12:23 +00:00
Matt Arsenault c807ce0ee4 SLPVectorizer: Fix assert with different sized address spaces
llvm-svn: 341215
2018-08-31 14:34:53 +00:00
David Bolvansky 589bb484f6 [LoopVectorize][NFCI] Use find instead of count
Summary:
Avoid "count" if possible -> use "find" to check for the existence of keys.

Passed llvm test suite.

Reviewers: fhahn, dcaballe, mkuper, rengolin

Reviewed By: fhahn

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51054

llvm-svn: 340563
2018-08-23 18:34:58 +00:00
Anna Thomas b02b0ad8c7 [LV] Vectorize loops where non-phi instructions used outside loop
Summary:
Follow up change to rL339703, where we now vectorize loops with non-phi
instructions used outside the loop. Note that the cyclic dependency
identification occurs when identifying reduction/induction vars.

We also need to identify that we do not allow users where the PSCEV information
within and outside the loop are different. This was the fix added in rL307837
for PR33706.

Reviewers: Ayal, mkuper, fhahn

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D50778

llvm-svn: 340278
2018-08-21 14:40:27 +00:00
Anna Thomas 6a1dd77f5d NFC: Clarify comment in loop vectorization legality
Clarifying the comment about PSCEV and external IV users by referencing
the bug in question.

llvm-svn: 339722
2018-08-14 20:25:13 +00:00
Anna Thomas 60a1e4dddc [LV] Teach about non header phis that have uses outside the loop
Summary:
This patch teaches the loop vectorizer to vectorize loops with non
header phis that have have outside uses.  This is because the iteration
dependence distance for these phis can be widened upto VF (similar to
how we do for induction/reduction) if they do not have a cyclic
dependence with header phis. When identifying reduction/induction/first
order recurrence header phis, we already identify if there are any cyclic
dependencies that prevents vectorization.

The vectorizer is taught to extract the last element from the vectorized
phi and update the scalar loop exit block phi to contain this extracted
element from the vector loop.

This patch can be extended to vectorize loops where instructions other
than phis have outside uses.

Reviewers: Ayal, mkuper, mssimpso, efriedma

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50579

llvm-svn: 339703
2018-08-14 18:22:19 +00:00
Alexey Bataev 0edcd0278d [SLP] Fix insert point for reused extract instructions.
Summary:
Reworked the previously committed patch to insert shuffles for reused
extract element instructions in the correct position. Previous logic was
incorrect, and might lead to the crash with PHIs and EH instructions.

Reviewers: efriedma, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50143

llvm-svn: 339166
2018-08-07 19:21:05 +00:00
Alexey Bataev c0c3a6ed5e [SLP] Fix PR38339: Instruction does not dominate all uses!
Summary:
If the ExtractElement instructions can be optimized out during the
vectorization and we need to reshuffle the parent vector, this
ShuffleInstruction may be inserted in the wrong place causing compiler
to produce incorrect code.

Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49928

llvm-svn: 338380
2018-07-31 14:02:43 +00:00
Diego Caballero 3587150fcb [VPlan] Introduce VPLoopInfo analysis.
The patch introduces loop analysis (VPLoopInfo/VPLoop) for VPBlockBases.
This analysis will be necessary to perform some H-CFG transformations and
detect and introduce regions representing a loop in the H-CFG.

Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso

Reviewed By: fhahn 

Differential Revision: https://reviews.llvm.org/D48816

llvm-svn: 338346
2018-07-31 01:57:29 +00:00
Diego Caballero 2a34ac86d3 [VPlan] Introduce VPlan-based dominator analysis.
The patch introduces dominator analysis for VPBlockBases and extend
VPlan's GraphTraits specialization with the required interfaces. Dominator
analysis will be necessary to perform some H-CFG transformations and
to introduce VPLoopInfo (LoopInfo analysis on top of the VPlan representation).

Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48815

llvm-svn: 338310
2018-07-30 21:33:31 +00:00
Fangrui Song f78650a8de Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h}

llvm-svn: 338293
2018-07-30 19:41:25 +00:00
Anastasis Grammenos f6e143e67f Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction"
This reverts commit r338106.

llvm-svn: 338109
2018-07-27 08:22:54 +00:00
Anastasis Grammenos 03948d0e0f [LV][DebugInfo] Set DL to the middle block Icmp instruction
Reviewers: hsaito

Differential Revision: https://reviews.llvm.org/D49746

llvm-svn: 338106
2018-07-27 07:12:44 +00:00
Fangrui Song 984a424c8a [LoadStoreVectorizer] Use const reference
llvm-svn: 337992
2018-07-26 01:11:36 +00:00
Roman Tereshin 4f10a9d3a3 [LSV] Look through selects for consecutive addresses
In some cases LSV sees (load/store _ (select _ <pointer expression>
<pointer expression>)) patterns in input IR, often due to sinking and
other forms of CFG simplification, sometimes interspersed with
bitcasts and all-constant-indices GEPs. With this
patch`areConsecutivePointers` method would attempt to handle select
instructions. This leads to an increased number of successful
vectorizations.

Technically, select instructions could appear in index arithmetic as
well, however, we don't see those in our test suites / benchmarks.
Also, there is a lot more freedom in IR shapes computing integral
indices in general than in what's common in pointer computations, and
it appears that it's quite unreliable to do anything short of making
select instructions first class citizens of Scalar Evolution, which
for the purposes of this patch is most definitely an overkill.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49428

llvm-svn: 337965
2018-07-25 21:33:00 +00:00
Hideki Saito ef380b0fc5 [LV] Fix for PR38110, LV encountered llvm_unreachable()
Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash).

Reviewers: sbaranga, mssimpso, hsaito

Reviewed By: hsaito

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D49461

llvm-svn: 337861
2018-07-24 22:30:31 +00:00
Roman Tereshin 31d52847ef Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reapplies commit r337489 reverted by r337541
Additionally, this commit contains a speculative fix to the issue reported in r337541
(the report does not contain an actionable reproducer, just a stack trace)

llvm-svn: 337606
2018-07-20 20:10:04 +00:00
Sam McCall 57743883f1 Revert "[LSV] Refactoring + supporting bitcasts to a type of different size"
This reverts commit r337489.
It causes asserts to fire in some TensorFlow tests, e.g.
tensorflow/compiler/tests/gather_test.py on GPU.

Example stack trace:
Start test case: GatherTest.testHigherRank
assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits"
    @     0x5559446ebe10  __assert_fail
    @     0x55593ef32f5e  llvm::APInt::trunc()
    @     0x55593d78f86e  (anonymous namespace)::Vectorizer::lookThroughComplexAddresses()
    @     0x55593d78f2bc  (anonymous namespace)::Vectorizer::areConsecutivePointers()
    @     0x55593d78d128  (anonymous namespace)::Vectorizer::isConsecutiveAccess()
    @     0x55593d78c926  (anonymous namespace)::Vectorizer::vectorizeInstructions()
    @     0x55593d78c221  (anonymous namespace)::Vectorizer::vectorizeChains()
    @     0x55593d78b948  (anonymous namespace)::Vectorizer::run()
    @     0x55593d78b725  (anonymous namespace)::LoadStoreVectorizer::runOnFunction()
    @     0x55593edf4b17  llvm::FPPassManager::runOnFunction()
    @     0x55593edf4e55  llvm::FPPassManager::runOnModule()
    @     0x55593edf563c  (anonymous namespace)::MPPassManager::runOnModule()
    @     0x55593edf5137  llvm::legacy::PassManagerImpl::run()
    @     0x55593edf5b71  llvm::legacy::PassManager::run()
    @     0x55593ced250d  xla::gpu::IrDumpingPassManager::run()
    @     0x55593ced5033  xla::gpu::(anonymous namespace)::EmitModuleToPTX()
    @     0x55593ced40ba  xla::gpu::(anonymous namespace)::CompileModuleToPtx()
    @     0x55593ced33d0  xla::gpu::CompileToPtx()
    @     0x55593b26b2a2  xla::gpu::NVPTXCompiler::RunBackend()
    @     0x55593b21f973  xla::Service::BuildExecutable()
    @     0x555938f44e64  xla::LocalService::CompileExecutable()
    @     0x555938f30a85  xla::LocalClient::Compile()
    @     0x555938de3c29  tensorflow::XlaCompilationCache::BuildExecutable()
    @     0x555938de4e9e  tensorflow::XlaCompilationCache::CompileImpl()
    @     0x555938de3da5  tensorflow::XlaCompilationCache::Compile()
    @     0x555938c5d962  tensorflow::XlaLocalLaunchBase::Compute()
    @     0x555938c68151  tensorflow::XlaDevice::Compute()
    @     0x55593f389e1f  tensorflow::(anonymous namespace)::ExecutorState::Process()
    @     0x55593f38a625  tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()()
*** SIGABRT received by PID 7798 (TID 7837) from PID 7798; ***

llvm-svn: 337541
2018-07-20 12:03:00 +00:00
Roman Tereshin b49b2a601f [LSV] Refactoring + supporting bitcasts to a type of different size
This is mostly a preparation work for adding a limited support for
select instructions. It proved to be difficult to do due to size and
irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I
believe.

It also turned out that these changes make it simpler to finish one of
the TODOs and fix a number of other small issues, namely:

1. Looking through bitcasts to a type of a different size (requires
careful tracking of the original load/store size and some math
converting sizes in bytes to expected differences in indices of GEPs).

2. Reusing partial analysis of pointers done by first attempt in proving
them consecutive instead of starting from scratch. This added limited
support for nested GEPs co-existing with difficult sext/zext
instructions. This also required a careful handling of negative
differences between constant parts of offsets.

3. Handing a case where the first pointer index is not an add, but
something else (a function parameter for instance).

I observe an increased number of successful vectorizations on a large
set of shader programs. Only few shaders are affected, but those that
are affected sport >5% less loads and stores than before the patch.

Reviewed By: rampitec

Differential-Revision: https://reviews.llvm.org/D49342
llvm-svn: 337489
2018-07-19 19:42:43 +00:00
Farhana Aleen 8c7a30baea [LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers.
Summary: Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by
         comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or
         both of them are not factorized. But isConsecutiveAccess() fails if one of the
         pointers is factorized but the other one is not.

         Here is an example:
         PtrA = 4 * (A + B)
         PtrB = 4 + 4A + 4B

         This patch uses getMinusSCEV() to compute the distance between two pointers.
         getMinusSCEV() allows combining the expressions and computing the simplified distance.

Author: FarhanaAleen

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D49516

llvm-svn: 337471
2018-07-19 16:50:27 +00:00
Simon Pilgrim 2b37ddce4b [SLPVectorizer] Avoid duplicate scalar cost calculations in BoUpSLP::getEntryCost. NFCI.
Pulled out from D49225, we have a lot of repeated scalar cost calculations, often with arguments that don't look the same but turn out to be.

llvm-svn: 337390
2018-07-18 13:53:55 +00:00
Simon Pilgrim 1a4f3c93fb [SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191)
TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only.

llvm-svn: 337280
2018-07-17 13:43:33 +00:00
Simon Pilgrim 36b944e778 [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED-2)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154).

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336989
2018-07-13 11:09:52 +00:00
Martin Storsjo 86b95489fd Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)"
This reverts commit r336812, which broke compilation of a number
of projects, see PR38154.

llvm-svn: 336949
2018-07-12 21:33:42 +00:00
Simon Pilgrim 876e99bf2c [SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Reapplied with fix to only accept 2 different casts if they come from the same source type.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336812
2018-07-11 15:05:10 +00:00
Simon Pilgrim 1edde95abd Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions.
Reverting due to buildbot failures

llvm-svn: 336806
2018-07-11 14:08:16 +00:00
Simon Pilgrim 2f963a7e83 [SLPVectorizer] Add initial alternate opcode support for cast instructions.
We currently only support binary instructions in the alternate opcode shuffles.

This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:

1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.

Differential Revision: https://reviews.llvm.org/D49135

llvm-svn: 336804
2018-07-11 13:34:09 +00:00
Anastasis Grammenos 612bf7cac5 [DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add
Differential Revision: https://reviews.llvm.org/D48968

llvm-svn: 336667
2018-07-10 13:29:50 +00:00
Diego Caballero d09530144a [VPlan][LV] Introduce condition bit in VPBlockBase
This patch introduces a VPValue in VPBlockBase to represent the condition
bit that is used as successor selector when a block has multiple successors.
This information wasn't necessary until now, when we are about to introduce
outer loop vectorization support in VPlan code gen.

Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D48814

llvm-svn: 336554
2018-07-09 15:57:09 +00:00
Simon Pilgrim dafd828c97 [SLPVectorizer] Begin abstracting InstructionsState alternate matching away from opcodes. NFCI.
This is an early step towards matching Instructions by attributes other than the opcode. This will be necessary for cast/call alternates which share the same opcode but have different types/intrinsicIDs etc. - which we could vectorize as long as we split them using the alternate mechanism.

Differential Revision: https://reviews.llvm.org/D48945

llvm-svn: 336344
2018-07-05 12:30:44 +00:00
Simon Pilgrim ae1c4dcc6e Fix some irregular whitespace/indentation. NFCI.
llvm-svn: 336291
2018-07-04 17:24:05 +00:00
Anastasis Grammenos 204726b345 [DebugInfo][LoopVectorize] Preserve DL in generated phi instruction
When creating `phi` instructions to resume at the scalar part of the loop,
copy the DebugLoc from the original phi over to the new one.

Differential Revision: https://reviews.llvm.org/D48769

llvm-svn: 336256
2018-07-04 10:16:55 +00:00
Farhana Aleen 3b416db19b [SLP] Recognize min/max pattern using instructions producing same values.
Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization.

         %1 = extractelement <2 x i32> %a, i32 0
         %2 = extractelement <2 x i32> %a, i32 1
         %cond = icmp sgt i32 %1, %2
         %3 = extractelement <2 x i32> %a, i32 0
         %4 = extractelement <2 x i32> %a, i32 1
         %select = select i1 %cond, i32 %3, i32 %4

Author: FarhanaAleen

Reviewed By: ABataev, RKSimon, spatel

Differential Revision: https://reviews.llvm.org/D47608

llvm-svn: 336130
2018-07-02 17:55:31 +00:00
Simon Pilgrim d5fb50e3bf [SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost
This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure.

llvm-svn: 336102
2018-07-02 13:41:29 +00:00
Simon Pilgrim 265793d52a [SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns.
We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case.

This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now...

llvm-svn: 336095
2018-07-02 11:28:01 +00:00
Simon Pilgrim 409bd5f487 [SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI.
Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators.

llvm-svn: 336092
2018-07-02 10:54:19 +00:00
Simon Pilgrim 3dafb553d9 [SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI.
llvm-svn: 336069
2018-07-01 20:22:46 +00:00
Simon Pilgrim ef9c97c343 [SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI.
This is a basic step towards matching more general instructions types than just opcodes.

llvm-svn: 336068
2018-07-01 20:07:30 +00:00
Simon Pilgrim 77d2067677 [SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI.
llvm-svn: 336063
2018-07-01 13:41:58 +00:00
Simon Pilgrim bbfc18b5b5 [SLPVectorizer] Recognise non uniform power of 2 constants
Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them.

As SLP works with arrays of values I don't think we can easily use the pattern match helpers here.

Differential Revision: https://reviews.llvm.org/D48214

llvm-svn: 335621
2018-06-26 16:20:16 +00:00
Simon Pilgrim 9d3ef8ee2b [SLPVectorizer] Support alternate opcodes in tryToVectorizeList
Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree.

NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc.

Differential Revision: https://reviews.llvm.org/D48488

llvm-svn: 335364
2018-06-22 16:37:34 +00:00
Simon Pilgrim 213cb1b82d [SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI.
All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly

llvm-svn: 335359
2018-06-22 16:10:26 +00:00
Simon Pilgrim 1e564504bb [SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair
SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle.

This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle.

Differential Revision: https://reviews.llvm.org/D48477

llvm-svn: 335349
2018-06-22 14:04:06 +00:00
Simon Pilgrim 3d1c8c97b8 [SLPVectorizer] Provide InstructionsState down the BoUpSLP vectorization call tree
As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode.

Differential Revision: https://reviews.llvm.org/D48382

llvm-svn: 335170
2018-06-20 20:54:52 +00:00
Simon Pilgrim 292651a5b7 [SLPVectorizer] Move isOneOf after InstructionsState type. NFCI.
A future patch will have isOneOf use InstructionsState.

llvm-svn: 335142
2018-06-20 16:11:00 +00:00
Simon Pilgrim 0c9f8dcde7 [SLPVectorizer] Use InstructionsState to record AltOpcode
This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts.

The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often.

I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches.

Differential Revision: https://reviews.llvm.org/D48359

llvm-svn: 335134
2018-06-20 15:13:40 +00:00
Simon Pilgrim 2e2f20a949 [SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern
D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand.

This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation.

The AArch64 regression will be fixed by D48172.

Differential Revision: https://reviews.llvm.org/D48174

llvm-svn: 335130
2018-06-20 14:26:28 +00:00
Simon Pilgrim b7ac037797 [SLPVectorizer] Split Tree/Reduction cost calls to simplify debugging. NFCI.
llvm-svn: 335110
2018-06-20 09:39:01 +00:00
Simon Pilgrim 0461393660 [SLPVectorizer] Remove default OperandValueKind arguments from getArithmeticInstrCost calls (NFC)
The getArithmeticInstrCost calls for shuffle vectors entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in anyway. The TargetTransformInfo::OperandValueProperties default arguments are already not being used.

Noticed while working on D47985.

Differential Revision: https://reviews.llvm.org/D48008

llvm-svn: 335045
2018-06-19 13:40:00 +00:00
Simon Pilgrim c966f7213e [SLPVectorizer] Pull out AltOpcode determination from reorderAltShuffleOperands.
Minor step towards making the alternate opcode system work with a wider range of opcode pairs.

llvm-svn: 335032
2018-06-19 09:16:06 +00:00
Florian Hahn 3385caaafd [VPlan] Add VPInstruction to VPRecipe transformation.
This patch introduces a VPInstructionToVPRecipe transformation, which
allows us to generate code for a VPInstruction based VPlan re-using the
existing infrastructure.

Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46827

llvm-svn: 334969
2018-06-18 18:28:49 +00:00
Simon Pilgrim 5b962b2fc3 [SLPVectorizer] Tidyup isShuffle helper
Ensure we keep track of the input vectors in all cases instead of just for SK_Select.

Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here.

Differential Revision: https://reviews.llvm.org/D48023

llvm-svn: 334958
2018-06-18 16:25:01 +00:00
Florian Hahn 63cbcf98a5 [VPlanRecipeBase] Add eraseFromParent().
Reviewers: dcaballe, hsaito, mkuper, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48081

llvm-svn: 334951
2018-06-18 15:18:48 +00:00
Florian Hahn 3bcff3662c [VPlan] Fix sanitizer problem with insertBefore.
llvm-svn: 334943
2018-06-18 13:51:28 +00:00
Simon Pilgrim 99a5832016 [SLPVectorizer] Avoid calling const VL.size() repeatedly in for-loop. NFCI.
llvm-svn: 334934
2018-06-18 11:35:36 +00:00
Florian Hahn 7591e4e94a [VPlanRecipeBase] Add insertBefore helper.
Reviewers: dcaballe, mkuper, hfinkel, hsaito, mssimpso

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D48080

llvm-svn: 334933
2018-06-18 11:34:17 +00:00
Diego Caballero 68795245cf [LV] Prevent LV to run cost model twice for VF=2
This is a minor fix for LV cost model, where the cost for VF=2 was
computed twice when the vectorization of the loop was forced without
specifying a VF.

Reviewers: xusx595, hsaito, fhahn, mkuper

Reviewed By: hsaito, xusx595

Differential Revision: https://reviews.llvm.org/D48048

llvm-svn: 334840
2018-06-15 16:21:35 +00:00
Simon Pilgrim b234ff136e [SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into getSameOpcode
This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type.

Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon.

Differential Revision: https://reviews.llvm.org/D48120

llvm-svn: 334701
2018-06-14 10:25:19 +00:00
Simon Pilgrim 2c9d2adff5 [SLPVectorizer] getSameOpcode - remove useless cast [NFC]
There's no need to cast the base Value to an Instruction

llvm-svn: 334588
2018-06-13 10:49:24 +00:00
Simon Pilgrim 1224260f83 [SLPVectorizer] getSameOpcode - remove unusued alternate code [NFC]
We early-out for the case where we don't use alternate opcodes, so no need to check for it later.

llvm-svn: 334587
2018-06-13 10:14:27 +00:00
Simon Pilgrim e39fa6cbbb [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Florian Hahn a1cc848399 Use SmallPtrSet explicitly for SmallSets with pointer types (NFC).
Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This
patch replaces such types with SmallPtrSet, because IMO it is slightly
clearer and allows us to get rid of unnecessarily including SmallSet.h

Reviewers: dblaikie, craig.topper

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47836

llvm-svn: 334492
2018-06-12 11:16:56 +00:00
Craig Topper 61998289f9 Use SmallPtrSet instead of SmallSet in places where we iterate over the set.
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.

These places were found by hiding the begin/end methods in the SmallSet forwarding

llvm-svn: 334343
2018-06-09 05:04:20 +00:00
Florian Hahn 45e5d5b4be [VPlan] Move recipe construction to VPRecipeBuilder.
This patch moves the recipe-creation functions out of
LoopVectorizationPlanner, which should do the high-level
orchestration of the transformations.

Reviewers: dcaballe, rengolin, hsaito, Ayal

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D47595

llvm-svn: 334305
2018-06-08 17:30:45 +00:00
Florian Hahn b3c6f07dde [VPlan] Move recipe based VPlan generation to separate function.
This first step separates VPInstruction-based and VPRecipe-based
VPlan creation, which should make it easier to migrate to VPInstruction
based code-gen step by step.

Reviewers: Ayal, rengolin, dcaballe, hsaito, mkuper, mzolotukhin

Reviewed By: dcaballe

Subscribers: bollu, tschuett, rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D47477

llvm-svn: 334284
2018-06-08 12:53:51 +00:00
Roman Shirokiy 9ba0aa2da0 [LV] Fix PR36983. For a given recurrence, fix all phis in exit block
There could be more than one PHIs in exit block using same loop recurrence.
Don't assume there is only one and fix each user.

Differential Revision: https://reviews.llvm.org/D47788

llvm-svn: 334271
2018-06-08 08:21:20 +00:00
David Blaikie 31b98d2e99 Move Analysis/Utils/Local.h back to Transforms
Review feedback from r328165. Split out just the one function from the
file that's used by Analysis. (As chandlerc pointed out, the original
change only moved the header and not the implementation anyway - which
was fine for the one function that was used (since it's a
template/inlined in the header) but not in general)

llvm-svn: 333954
2018-06-04 21:23:21 +00:00
Diego Caballero b94b21d441 [VPlan] Replace LLVM_ATTRIBUTE_USED with ifndef NDEBUG
Minor replacement. LLVM_ATTRIBUTE_USED was introduced to silence
a warning but using #ifndef NDEBUG makes more sense in this case.

Reviewers: dblaikie, fhahn, hsaito

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D47498

llvm-svn: 333476
2018-05-29 23:10:44 +00:00
Fangrui Song afa95ee03d [LLVM-C] [OCaml] Remove LLVMAddBBVectorizePass
Summary: It was fully replaced back in 2014, and the implementation was removed 11 months ago by r306797.

Reviewers: hfinkel, chandlerc, whitequark, deadalnix

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47436

llvm-svn: 333378
2018-05-28 16:58:10 +00:00
Andrei Elovikov d34b765cb2 [NFC][VPlan] Wrap PlainCFGBuilder with an anonymous namespace.
Summary:
It's internal to the VPlanHCFGBuilder and should not be visible outside of its
translation unit.

Reviewers: dcaballe, fhahn

Reviewed By: fhahn

Subscribers: rengolin, bollu, tschuett, llvm-commits, rkruppe

Differential Revision: https://reviews.llvm.org/D47312

llvm-svn: 333187
2018-05-24 14:31:00 +00:00
Nicola Zaghen 03d0b91f43 Remove DEBUG macro.
Now that the LLVM_DEBUG() macro landed on the various sub-projects
the DEBUG macro can be removed.
Also change the new uses of DEBUG to LLVM_DEBUG.

Differential Revision: https://reviews.llvm.org/D46952

llvm-svn: 333091
2018-05-23 15:09:29 +00:00
Diego Caballero 1bd5f2261d Fix warning from r332654 with LLVM_ATTRIBUTE_USED
r332654 tried to fix an unused function warning with
a void cast. This approach worked for clang and gcc 
but not for MSVC. This commit replaces the void cast
with the LLVM_ATTRIBUTE_USED approach.

llvm-svn: 332910
2018-05-21 22:12:38 +00:00
Diego Caballero 168d04d544 [VPlan] Reland r332654 and silence unused func warning
r332654 was reverted due to an unused function warning in
release build. This commit includes the same code with the
warning silenced.

Differential Revision: https://reviews.llvm.org/D44338

llvm-svn: 332860
2018-05-21 18:14:23 +00:00
Galina Kistanova 083ea389d6 Reverted r332654 as it has broken some buildbots and left unfixed for a long time.
The introduced problem is:
llvm.src/lib/Transforms/Vectorize/VPlanVerifier.cpp:29:13: error: unused function 'hasDuplicates' [-Werror,-Wunused-function]
static bool hasDuplicates(const SmallVectorImpl<VPBlockBase *> &VPBlockVec) {
            ^

llvm-svn: 332747
2018-05-18 18:14:06 +00:00
Diego Caballero f58ad3129c [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.
Patch #3 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Expected to be NFC for the current inner loop vectorization path. It
introduces the basic algorithm to build the VPlan plain CFG (single-level
CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization
path using VPInstructions. It includes:
  - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
  - VPlanVerifier: Main class with utilities to check the consistency of a H-CFG.
  - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.

Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl.

Differential Revision: https://reviews.llvm.org/D44338

llvm-svn: 332654
2018-05-17 19:24:47 +00:00
Nicola Zaghen d34e60ca85 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Vedant Kumar e0b5f86b30 [STLExtras] Add distance() for ranges, pred_size(), and succ_size()
This commit adds a wrapper for std::distance() which works with ranges.
As it would be a common case to write `distance(predecessors(BB))`, this
also introduces `pred_size()` and `succ_size()` helpers to make that
easier to write.

Differential Revision: https://reviews.llvm.org/D46668

llvm-svn: 332057
2018-05-10 23:01:54 +00:00
Krzysztof Parzyszek ea4c1bb772 [LV] Change MaxVectorSize bound to 256 in assertion, NFC otherwise
It's possible to have a vector of 256 bytes in HVX code on Hexagon
(vector pair in 128-byte mode).

llvm-svn: 331885
2018-05-09 15:18:12 +00:00