Commit Graph

2081 Commits

Author SHA1 Message Date
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Benjamin Kramer 247a177cf7 Give helpers internal linkage. NFC. 2020-03-10 18:27:42 +01:00
Florian Hahn 2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
Sanjay Patel a69158c12a [VectorCombine] fold extract-extract-op with different extraction indexes
opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1

The first part of this patch generalizes the cost calculation to accept
different extraction indexes. The second part creates a shuffle+extract
before feeding into the existing code to create a vector op+extract.

The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc"
rather than "TargetTransformInfo::SK_Broadcast" (splat specifically
from element 0) because we do not have a more general "SK_Splat"
currently. That does not affect any of the current regression tests,
but we might be able to find some cost model target specialization where
that comes into play.

I suspect that we can expose some missing x86 horizontal op codegen with
this transform, so I'm speculatively adding a debug flag to disable the
binop variant of this transform to allow easier testing.

The test changes show that we're sensitive to cost model diffs (as we
should be), so that means that patches like D74976
should have better coverage.

Differential Revision: https://reviews.llvm.org/D75689
2020-03-08 09:57:55 -04:00
Florian Hahn 40e7bfc424 [VPlan] Use consecutive numbers to print VPValues instead of addresses.
Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.

This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73078
2020-03-05 14:55:15 +00:00
Simon Pilgrim 01a91a6de7 Fix static analyzer uninitialized variable warning. NFCI. 2020-03-05 14:22:24 +00:00
Florian Hahn 05afa55521 [VPlan] Add getPlan() to VPBlockBase.
This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.

VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D74445
2020-03-03 13:20:13 +00:00
David Green d0d38df091 [LoopVectorizer] Change types of lists from pointers to references. NFC
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.

Differential Revision: https://reviews.llvm.org/D75448
2020-03-02 15:04:41 +00:00
Austin Kerbow 4fa63fd452 [VectorCombine] Fix assert on compare extract index
Extract index could be a differnet integral type.

Differential Revision: https://reviews.llvm.org/D75327
2020-02-28 10:37:08 -08:00
Valery N Dmitriev d723ec4f04 [SLP][NFC] Assert that tree entry operands completed when scheduler looks for dependencies.
This change adds an assertion to prevent tricky bug related to recursive
approach of building vectorization tree. For loop below takes number of
operands directly from tree entry rather than from scalars.
If the entry at this moment turns out incomplete (i.e. not all operands set)
then not all the dependencies will be seen by the scheduler.
This can lead to failed scheduling (and thus failed vectorization)
for perfectly vectorizable tree.
Here is code example which is likely to fire the assertion:
for (i : VL0->getNumOperands()) {
  ...
  TE->setOperand(i, Operands);
  buildTree_rec(Operands, Depth + 1,...);
}

Correct way is two steps process: first set all operands to a tree entry
and then recursively process each operand.

Differential Revision: https://reviews.llvm.org/D75296
2020-02-28 10:34:48 -08:00
Valery N Dmitriev 02e5e47e17 [SLP][NFC] Delete some unreachable code.
This patch deletes some dead code out of SLP vectorizer.
Couple of changes taken out of D57059 to slightly lighten it
plus one more similar case fixed.

Differential Revision: https://reviews.llvm.org/D75276
2020-02-28 09:22:51 -08:00
Nemanja Ivanovic c46b85aaf4 [LoopVectorize] Fix cost for calls to functions that have vector versions
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.

This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).

Differential revision: https://reviews.llvm.org/D74944
2020-02-26 21:39:11 -06:00
Sanjay Patel 25c6544f32 [VectorCombine] add a debug flag to skip all transforms
As suggested in D75145 -

I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.

We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.

Differential Revision: https://reviews.llvm.org/D75204
2020-02-26 15:15:42 -05:00
Sanjay Patel 10ea01d80d [VectorCombine] make cost calc consistent for binops and cmps
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.

We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.

We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.

AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).

Differential Revision: https://reviews.llvm.org/D75062
2020-02-25 08:41:59 -05:00
Sanjay Patel e9c79a7aef [VectorCombine] refactor to reduce duplicated code; NFC
This should be the last step in the current cleanup.
Follow-ups should resolve the TODO about cost calc
and enable the more general case where we extract
different elements.
2020-02-21 15:56:00 -05:00
Sanjay Patel 34e3485560 [VectorCombine] refactor cost calcs to reduce duplication; NFC
More cleanup is possible now, but we probably need to
resolve the TODO about the existing difference between
compares and binops.
2020-02-21 15:12:00 -05:00
Florian Hahn 98f5268a72 [VectorUtils] Move ToVectorTy to VectorUtils.h (NFC).
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.

Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D74959
2020-02-21 17:31:24 +00:00
Sanjay Patel fc4455891c [VectorCombine] refactor matching code to reduce duplication; NFC
cmp/binop were already diverging even though they are largely
the same logic.
2020-02-21 12:06:51 -05:00
Reid Kleckner 0c2b09a9b6 [IR] Lazily number instructions for local dominance queries
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.

The downside is that Instruction grows from 56 bytes to 64 bytes.  The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.

The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.

We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.

Reviewed By: lattner, vsk, fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D51664
2020-02-18 14:44:24 -08:00
Huihui Zhang 8ee0e1dc02 [NFC] Silence compiler warning [-Wmissing-braces]. 2020-02-18 10:37:12 -08:00
Florian Hahn e32522ca17 [SLPVectorizer] Do not assume extracelement idx is a ConstantInt.
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.

The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.

Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D74758
2020-02-18 18:16:06 +01:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
Sanjay Patel 62dd44d76d [VectorCombine] fix cost calc for extract-cmp
getOperationCost() is not the cost we wanted; that's not the
throughput value that the rest of the calculation uses.

We may want to switch everything in this code to use the
getInstructionThroughput() wrapper to avoid these kinds of
problems, but I'll look at that as a follow-up because that
can create other logical diffs via using optional parameters
(we'd need to speculatively create the vector instruction to
make a fair(er) comparison).
2020-02-16 10:40:28 -05:00
Kadir Cetinkaya 1674f772b4
[VecotrCombine] Fix unused variable for assertion disabled builds 2020-02-14 09:30:29 +01:00
Sanjay Patel 19b62b79db [VectorCombine] try to form vector binop to eliminate an extract element
binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C

This is a transform that has been considered for canonicalization (instcombine)
in the past because it reduces instruction count. But as shown in the x86 tests,
it's impossible to know if it's profitable without a cost model. There are many
potential target constraints to consider.

We have implemented similar transforms in the backend (DAGCombiner and
target-specific), but I don't think we have this exact fold there either (and if
we did it in SDAG, it wouldn't work across blocks).

Note: this patch was intended to handle the more general case where the extract
indexes do not match, but it got too big, so I scaled it back to this pattern
for now.

Differential Revision: https://reviews.llvm.org/D74495
2020-02-13 17:23:27 -05:00
Matt Arsenault 86f9117d47 AMDGPU: Don't report 2-byte alignment as fast
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.

Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
2020-02-11 18:35:00 -05:00
Sanjay Patel a2a0f9a43a [VectorCombine] remove unused debug counter; NFC
The variable was added to the initial commit via copy/paste of existing
code, but it wasn't actually used in the code. We can add it back with
the proper usage if/when that is needed.
2020-02-11 08:24:07 -05:00
Sanjay Patel a17f03bd93 [VectorCombine] new IR transform pass for partial vector ops
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
2020-02-09 10:04:41 -05:00
Florian Hahn d1f849a284 [LV] Hoist code to mark conditional assumes as dead to caller (NFC).
This is a follow-up suggested in D73423. It is sufficient to just add
the conditional assumes to DeadInstructions once.
2020-01-28 08:50:44 -08:00
Florian Hahn a911fef3dd [LV] Do not try to sink dead instructions.
Dead instructions do not need to be sunk. Currently we try and record
the recipies for them, but there are no recipes emitted for them and
there's nothing to sink. They can be removed from SinkAfter while
marking them for recording.

Fixes PR44634.

Reviewers: rengolin, hsaito, fhahn, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73423
2020-01-28 08:28:03 -08:00
Guillaume Chatelet d0a7cc7177 [Alignment][NFC] Use Align with CreateMaskedScatter/Gather
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

This patch shows that CreateMaskedScatter/CreateMaskedGather can only take positive non zero alignment values.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits, delena

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73361
2020-01-27 10:17:14 +01:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Florian Hahn f14f2a8568 [LV] Fix predication for branches with matching true and false succs.
Currently due to the edge caching, we create wrong predicates for
branches with matching true and false successors. We will cache the
condition for the edge from the true successor, and then lookup the same
edge (src and dst are the same) for the edge to the false successor.

If both successors match, the condition should always be true. At the
moment, we cannot really create constant VPValues, but we can just
create a true condition as X | !X. Later passes will clean that up.

Fixes PR44488.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D73079
2020-01-22 18:34:11 -08:00
Guillaume Chatelet 0957233320 [Alignment][NFC] Use Align with CreateMaskedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73106
2020-01-22 11:04:39 +01:00
Andrei Elovikov e1d6d36852 [SLP] Don't allow Div/Rem as alternate opcodes
Summary:
We don't have control/verify what will be the RHS of the division, so it might
happen to be zero, causing UB.

Reviewers: Vasilis, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72740
2020-01-21 15:21:17 -08:00
Guillaume Chatelet bc8a1ab26f [Alignment][NFC] Use Align with CreateMaskedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73087
2020-01-21 14:13:22 +01:00
Evgeniy Brevnov af7e158872 [LV] Vectorizer should adjust trip count in profile information
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.

Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov

Reviewed By: Ayal, DaniilSuchkov

Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67905
2020-01-20 18:36:28 +07:00
Francesco Petrogalli 66c120f025 [VectorUtils] Rework the Vector Function Database (VFDatabase).
Summary:
This commits is a rework of the patch in
https://reviews.llvm.org/D67572.

The rework was requested to prevent out-of-tree performance regression
when vectorizing out-of-tree IR intrinsics. The vectorization of such
intrinsics is enquired via the static function `isTLIScalarize`. For
detail see the discussion in https://reviews.llvm.org/D67572.

Reviewers: uabelho, fhahn, sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72734
2020-01-16 15:08:26 +00:00
Florian Hahn 23c113802e [LV] Allow assume calls in predicated blocks.
The assume intrinsic is intentionally marked as may reading/writing
memory, to avoid passes moving them around. When flattening the CFG
for predicated blocks, we have to drop the assume calls, as they
are control-flow dependent.

There are some cases where we can do better (when control flow is
preserved), but that is follow-up work.

Fixes PR43620.

Reviewers: hsaito, rengolin, dcaballe, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D68814
2020-01-16 10:11:35 +00:00
Benjamin Kramer 498856fca5 [LV] Silence unused variable warning in Release builds. NFC. 2020-01-10 11:21:27 +01:00
Gil Rapaport 8647a72c4a [LV] VPValues for memory operation pointers (NFCI)
Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.

Differential revision: https://reviews.llvm.org/D70865
2020-01-10 09:24:59 +02:00
Sjoerd Meijer 8f1887456a [LV] Still vectorise when tail-folding can't find a primary inducation variable
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:

  void foo(char *A, char *B, char *C, uint32_t N) {
    while (N > 0) {
      *C++ = *A++ + *B++;
       N--;
    }
  }

These are loops that can be vectorised, but when tail-folding is requested, it
can't find a primary induction variable which we do need for predicating the
loop. As a result, the loop isn't vectorised at all, which it is able to do
when tail-folding is not attempted. So, this adds a check for the primary
induction variable where we decide how to lower the scalar epilogue. I.e., when
there isn't a primary induction variable, a scalar epilogue loop is allowed
(i.e. don't request tail-folding) so that vectorisation could still be
triggered.

Having this check for the primary induction variable make sense anyway, and in
addition, in a follow-up of this I will look into discovering earlier the
primary induction variable for counting down loops, so that this can also be
tail-folded.

Differential revision: https://reviews.llvm.org/D72324
2020-01-09 09:14:00 +00:00
Florian Hahn b8a3c34eee Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."
This reverts commit 51ef53f3bd, as it
breaks some bots.
2020-01-04 18:44:38 +00:00
Florian Hahn 51ef53f3bd [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-01-04 18:29:35 +00:00
Evgeniy Brevnov 948e745270 [LV][NFC] Keep dominator tree up to date during vectorization. 2019-12-30 18:38:41 +07:00
Evgeniy Brevnov 1b6286b945 [LV][NFC] Some refactoring and renaming to facilitate next change. 2019-12-30 18:38:41 +07:00
Gil Rapaport d62bf16131 [LV] Use getMask() when printing recipe [NFCI]
Use dedicated API for getting the mask instead of duplicating it.

Differential Revision: https://reviews.llvm.org/D71964
2019-12-29 08:50:40 +02:00
Dinar Temirbulatov a755ccefe6 [SLP] Replace NeedToGather variable with enum. 2019-12-23 08:21:53 +01:00
Ayal Zaks e498be5738 [LV] Strip wrap flags from vectorized reductions
A sequence of additions or multiplications that is known not to wrap, may wrap
if it's order is changed (i.e., reassociated). Therefore when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.

Fixes PR43828

Patch by Denis Antrushin

Differential Revision: https://reviews.llvm.org/D69563
2019-12-20 14:48:53 +02:00
Anna Welker 7cd1cfdd6b [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.

Differential Revision: https://reviews.llvm.org/D71610
2019-12-18 09:14:39 +00:00
Francesco Petrogalli 19f73f0d1b Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)."
This reverts commit 0be81968a2.

The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
2019-12-13 19:42:04 +00:00
Francesco Petrogalli 0be81968a2 [VectorUtils] Introduce the Vector Function Database (VFDatabase).
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]

In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.

The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...)  the system should prefer, when multiple versions with the
same VFShape are present.

Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.

[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.

Differential Revision: https://reviews.llvm.org/D67572
2019-12-10 16:36:44 +00:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
Florian Hahn c491949694 [LV] Pick correct BB as insert point when fixing PHI for FORs.
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.

Fixes PR44020.

Reviewers: hsaito, fhahn, Ayal, dorit

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D71071
2019-12-07 19:32:00 +00:00
Florian Hahn e60b36cf92 [VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC).
The file is intended to gather various VPlan transformations, not only
CFG related transforms. Actually, the only transformation there is not
CFG related.

Reviewers: Ayal, gilr, hsaito, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70732
2019-12-07 08:56:35 +00:00
Gil Rapaport 39ccc099c9 [LV] Record GEP widening decisions in recipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit moves GEP operand queries controlling how GEPs are widened to a
dedicated recipe and extracts GEP widening code to its own ILV method taking
those recorded decisions as arguments. This reduces ingredient def-use usage by
ILV as a step towards full VPlan-based def-use relations.

Differential revision: https://reviews.llvm.org/D69067
2019-12-06 13:41:19 +02:00
Ayal Zaks 6ed9cef25f [LV] Scalar with predication must not be uniform
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.

Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of VF lanes is used, such a
replicating region becomes erroneous - only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".

Differential Revision: https://reviews.llvm.org/D70298
2019-12-03 19:50:24 +02:00
Anton Afanasyev a315519c17 [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates
Summary:
Make SLPVectorize to recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` to
one `findBuildAggregate()` function making it recursive
to recognize multidimensional aggregates. Aggregates required
to be homogeneous.

Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70587
2019-12-03 19:29:27 +03:00
Florian Hahn e9c68422de [VPlan] Add dump function to VPlan class.
This adds a dump() function to VPlan, which uses the existing
operator<<.

This method provides a convenient way to dump a VPlan while debugging,
e.g. from lldb.

Reviewers: hsaito, Ayal, gilr, rengolin

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D70920
2019-12-03 11:59:10 +00:00
Hiroshi Yamauchi 8cdfdfeee6 [PGO][PGSO] Add an optional query type parameter to shouldOptimizeForSize.
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.

Reviewers: davidxl

Subscribers: hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70510
2019-12-02 13:54:13 -08:00
Florian Hahn fe459ce65a [VPlan] Move graph traits (NFC).
By defining the graph traits right after the VPBlockBase definitions, we
can make use of them earlier in the file.

Reviewers: hsaito, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70733
2019-12-02 18:23:11 +00:00
Florian Hahn 9d24933f79 Recommit f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences."
This version contains 2 fixes for reported issues:
1. Make sure we do not try to sink terminator instructions.
2. Make sure we bail out, if we try to sink an instruction that needs to
   stay in place for another recurrence.

Original message:
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.

With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.

As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.

Fixes PR43398.

Reviewers: hsaito, dcaballe, Ayal, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D69228
2019-11-24 21:21:55 +00:00
Anton Afanasyev 80cd6b6e04 [SLP] Enhance SLPVectorizer to vectorize vector aggregate
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.

Fixes vector part of llvm.org/PR42022

Reviewers: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70068
2019-11-22 20:01:59 +03:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Sjoerd Meijer 901cd3b3f6 [LV] PreferPredicateOverEpilog respecting option
Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.

Differential Revision: https://reviews.llvm.org/D70382
2019-11-21 14:06:10 +00:00
Eric Christopher 714aabacfb Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update testcases.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 8a0aa5310b.
2019-11-20 16:00:53 -08:00
Eric Christopher 8a0aa5310b Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction patterns""
as there were testcase changes after that need to also be reverted.

This reverts commit cd8748a15f.
2019-11-20 15:39:47 -08:00
Eric Christopher cd8748a15f Temporarily Revert "[SLP] allow forming 2-way reduction patterns"
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 7ff57705ba.
2019-11-20 15:19:31 -08:00
Sanjay Patel 0a8e7ca402 [SLP] fix miscompile on min/max reductions with extra uses (PR43948) (2nd try)
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2

The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-19 14:57:35 -05:00
Sanjay Patel 39de82ecc9 [SLP] fix insertion point for min/max reduction
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
2019-11-19 10:50:10 -05:00
Evgeniy Brevnov 4a64d710ae [NFC] Test commit. Please ignore.
As a test commit I fixed a misspelling in one of comments in SLP
vectorizer.
2019-11-19 15:41:57 +07:00
Eric Christopher 6f1cc4151a Temporarily revert "[SLP] fix miscompile on min/max reductions with extra uses (PR43948)"
as it causes an ICE on valid. A testcase was followed up on the original thread.

This reverts commit a3e61946c5.
2019-11-18 14:41:37 -08:00
Richard Smith 7889d8e7eb Revert "[LoadStoreVectorize] Use '||' instead of '|' between sides with function calls. NFCI."
This broke two tests. Presumably the non-short-circuting '|' was
intentional here.

This reverts commit f7efea0ded.
2019-11-15 12:49:35 -08:00
Dávid Bolvanský f7efea0ded [LoadStoreVectorize] Use '||' instead of '|' between sides with function calls. NFCI.
Fixes warning from PVS Studio
2019-11-15 18:51:13 +01:00
Reid Kleckner 4c1a1d3cf9 Add missing includes needed to prune LLVMContext.h include, NFC
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
2019-11-14 15:23:15 -08:00
Alexey Bataev bfa32573bf Revert "Temporarily Revert:"
This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:

    Temporarily Revert:

     "[SLP] Generalization of stores vectorization."
     "[SLP] Fix -Wunused-variable. NFC"
     "[SLP] Vectorize jumbled stores."

after fixing the problem with compile time.
2019-11-14 16:38:20 -05:00
Sjoerd Meijer cb47b87830 [LV] PreferPredicateOverEpilog respecting predicate loop hint
The vectoriser queries TTI->preferPredicateOverEpilogue to determine if
tail-folding is preferred for a loop, but it was not respecting loop hint
'predicate' that can disable this, which has now been added. This showed that
we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0
(i.e. FK_Disabled) but this should have been FK_Undefined, which has been
fixed.

Differential Revision: https://reviews.llvm.org/D70125
2019-11-14 13:10:44 +00:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Sanjay Patel a3e61946c5 [SLP] fix miscompile on min/max reductions with extra uses (PR43948)
The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-13 15:57:35 -05:00
Sanjay Patel e9bf7a60a0 [SLP] reduce code duplication for min/max vs. other reductions; NFCI 2019-11-13 11:26:08 -05:00
Simon Pilgrim d1bd5e476b SLPVectorizer - make comparison operators + isInSchedulingRegion const
Fixes cppcheck warnings.
2019-11-13 14:40:19 +00:00
Vasileios Porpodas 6a18a95487 [SLP] Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897
2019-11-11 21:06:51 -08:00
Simon Pilgrim 4ff246fef2 Remove unused variable (which allows us to remove vector include). NFC. 2019-11-10 12:16:23 +00:00
Gil Rapaport 7f152543e4 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)
This recommits 11ed1c0239 (reverted in
9f08ce0d21 for failing an assert) with a fix:
tryToWidenMemory() now first checks if the widening decision is to interleave,
thus maintaining previous behavior where tryToInterleaveMemory() was called
first, giving priority to interleave decisions over widening/scalarization. This
commit adds the test case that exposed this bug as a LIT.
2019-11-09 20:52:25 +02:00
Gil Rapaport 9f08ce0d21 Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)"
This reverts commit 11ed1c0239 - causes an assert failure.
2019-11-08 22:17:11 +02:00
Gil Rapaport 11ed1c0239 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)
This recommits 100e797adb (reverted in
009e032634 for failing an assert). While the
root cause was independently reverted in eaff300401,
this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not
try to sink branch instructions.
2019-11-08 15:25:14 +02:00
Sanjay Patel 7ff57705ba [SLP] allow forming 2-way reduction patterns
We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2

Or slightly reduced here:

define i1 @cmp2(<2 x double> %a0) {
  %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
  %b = extractelement <2 x i1> %a, i32 0
  %c = extractelement <2 x i1> %a, i32 1
  %d = and i1 %b, %c
  ret i1 %d
}

SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.

As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.

Differential Revision: https://reviews.llvm.org/D59710
2019-11-07 06:08:42 -05:00
Eric Christopher 009e032634 Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"
as it's causing assert failures.

This reverts commit 100e797adb.
2019-11-06 21:58:28 -08:00
Eric Christopher e511c4b0df Temporarily Revert:
"[SLP] Generalization of stores vectorization."
 "[SLP] Fix -Wunused-variable. NFC"
 "[SLP] Vectorize jumbled stores."

As they're causing significant (10-30x) compile time regressions on
vectorizable code.

The primary cause of the compile-time regression is f228b53716.

This reverts commits:

f228b53716
5503455ccb
21d498c9c0
2019-11-06 16:06:15 -08:00
Sjoerd Meijer 6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Sergey Dmitriev 82588e05cc [SLP] - Add couple safety checks to TreeEntry::dump(). NFC
Summary: Check for MainOp and AltOp for NULL before dereferencing or issue NULL.

Reviewers: Vasilis, dtemirbulatov, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69812
2019-11-05 09:57:30 -08:00
Gil Rapaport 100e797adb [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
This recommits 2be17087f8 (reverted in
d3ec06d219 for heap-use-after-free) with a fix
in IAI's reset() which was not clearing the set of interleave groups after
deleting them.
2019-11-05 17:29:13 +02:00
Alexey Bataev b80c41cd3c [SLP]Fix PR43799: Crash on different sizes of GEP indices.
Summary:
If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.

Reviewers: RKSimon, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69627
2019-11-04 10:36:26 -05:00
Benjamin Kramer d3ec06d219 Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"
This reverts commit 2be17087f8. Fails ASAN.
2019-11-04 15:04:42 +01:00
Gil Rapaport 2be17087f8 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
The sink-after and interleave-group vectorization decisions were so far applied to
VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence. This patch refactors buildVPlanWithRecipes() to construct a simpler
initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.

Differential Revision: https://reviews.llvm.org/D68577
2019-11-04 10:37:39 +02:00
Simon Pilgrim 81ba611e88 Ensure VPlanPrinter::Depth is initialized to fix static analyzer warning. NFCI. 2019-11-03 11:17:05 +00:00
Alexey Bataev 70ad617dd6 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-31 16:02:25 -04:00
Haojian Wu e65ddcafee Revert "[SLP] Vectorize jumbled stores."
This reverts commit 21d498c9c0.

This commit causes some crashes on some targets.
2019-10-31 10:21:24 +01:00
Alexey Bataev 21d498c9c0 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-30 13:33:52 -04:00
Simon Pilgrim d52f5ed01a [SLPVectorizer] Use getAPInt() for comparison. NFCI.
Technically integers can assert on getZExtValue() if beyond i64 range, and a fuzzer usually find this.....
2019-10-30 16:16:55 +00:00
Fangrui Song 5503455ccb [SLP] Fix -Wunused-variable. NFC 2019-10-29 09:38:55 -07:00
Alexey Bataev f228b53716 [SLP] Generalization of stores vectorization.
Stores are vectorized with maximum vectorization factor of 16. Patch
tries to improve the situation and use maximal vectorization factor.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Differential Revision: https://reviews.llvm.org/D43582
2019-10-29 11:46:36 -04:00
Craig Topper 18824d25d8 [LV] Interleaving should not exceed estimated loop trip count.
Currently we may do iterleaving by more than estimated trip count
coming from the profile or computed maximum trip count. The solution is to
use "best known" trip count instead of exact one in interleaving analysis.

Patch by Evgeniy Brevnov.

Differential Revision: https://reviews.llvm.org/D67948
2019-10-28 10:58:22 -07:00
Guillaume Chatelet a4783ef58d [Alignment][NFC] getMemoryOpCost uses MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69307
2019-10-25 21:26:59 +02:00
Sanjay Patel b82fa80e80 [SLP] adjust code comment; NFC
(check commit access)
2019-10-25 11:39:43 -04:00
Guillaume Chatelet 5e1e83ee23 [Alignment][NFC] Instructions::getLoadStoreAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69256

llvm-svn: 375416
2019-10-21 14:49:28 +00:00
Sanjay Patel 8cc6d42e8d [SLP] avoid reduction transform on patterns that the backend can load-combine (2nd try)
The 1st attempt at this modified the cost model in a bad way to avoid the vectorization,
but that caused problems for other users (the loop vectorizer) of the cost model.

I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a cost-independent bailout with a conservative pattern match for a
multi-instruction sequence that can probably be reduced later.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 375025
2019-10-16 18:06:24 +00:00
Sam Parker 527a35e155 [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

llvm-svn: 374763
2019-10-14 10:00:21 +00:00
Benjamin Kramer 97c9804e06 [LV] Merge LLVM_DEBUG blocks.
Avoids unused variable warnings about the range-based for loops in
there. NFCI.

llvm-svn: 374646
2019-10-12 10:57:22 +00:00
Zi Xuan Wu 9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
Florian Hahn 39d4c9fd56 [VPlan] Add moveAfter to VPRecipeBase.
This patch adds a moveAfter method to VPRecipeBase, which can be used to
move elements after other elements, across VPBasicBlocks, if necessary.

Reviewers: dcaballe, hsaito, rengolin, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46825

llvm-svn: 374565
2019-10-11 15:36:55 +00:00
Florian Hahn a3ca7acb4f [LV][NFC] Factor out calculation of "best" estimated trip count.
This is just small refactoring to minimize changes in upcoming patch.
In the next path I'm going to introduce changes into heuristic for vectorization of "tiny trip count" loops.

Patch by Evgeniy Brevnov <evgueni.brevnov@gmail.com>

Reviewers: hsaito, Ayal, fhahn, reames

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D67690

llvm-svn: 374338
2019-10-10 13:07:01 +00:00
Guillaume Chatelet 837a1b84ce [Alignment][NFC] Make VectorUtils uas llvm::Align
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68784

llvm-svn: 374330
2019-10-10 12:35:04 +00:00
Sanjay Patel df14bd315d [SLP] respect target register width for GEP vectorization (PR43578)
We failed to account for the target register width (max vector factor)
when vectorizing starting from GEPs. This causes vectorization to
proceed to obviously illegal widths as in:
https://bugs.llvm.org/show_bug.cgi?id=43578

For x86, this also means that SLP can produce rogue AVX or AVX512
code even when the user specifies a narrower vector width.

The AArch64 test in ext-trunc.ll appears to be better using the
narrower width. I'm not exactly sure what getelementptr.ll is trying
to do, but it's testing with "-slp-threshold=-18", so I'm not worried
about those diffs. The x86 test is an over-reduction from SPEC h264;
this patch appears to restore the perf loss caused by SLP when using
-march=haswell.

Differential Revision: https://reviews.llvm.org/D68667

llvm-svn: 374183
2019-10-09 16:32:49 +00:00
Sjoerd Meijer d1170dbe58 [LV] Emitting SCEV checks with OptForSize
When optimising for size and SCEV runtime checks need to be emitted to check
overflow behaviour, the loop vectorizer can run in this assert:

  LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
  llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
  && "Cannot SCEV check stride or overflow when opt

We should not generate predicates while optimising for size because
code will be generated for predicates such as these SCEV overflow runtime
checks.

This should fix PR43371.

Differential Revision: https://reviews.llvm.org/D68082

llvm-svn: 374166
2019-10-09 13:19:41 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Kadir Cetinkaya 18b6fe07bc [LoopVectorize] Fix non-debug builds after rL374017
llvm-svn: 374021
2019-10-08 07:39:50 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
Guillaume Chatelet d400d45150 [Alignment][NFC] Remove StoreInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68268

llvm-svn: 373595
2019-10-03 13:17:21 +00:00
Guillaume Chatelet 17380227e8 [Alignment][NFC] Remove LoadInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68142

llvm-svn: 373195
2019-09-30 09:37:05 +00:00
Alexey Bataev 8b1eeafb91 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 373166
2019-09-29 14:18:06 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Jordan Rupprecht f98d2c099a Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
This reverts r372626 (git commit 6a278d9073)

llvm-svn: 373019
2019-09-26 22:09:17 +00:00
Simon Pilgrim 934f18144d LoopVectorize - silence static analyzer dyn_cast<CmpInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us.

llvm-svn: 372732
2019-09-24 11:27:38 +00:00
Sjoerd Meijer 0fcb3afb40 [LV] Forced vectorization with runtime checks and OptForSize
When vectorisation is forced with a pragma, we optimise for min size, and we
need to emit runtime memory checks, then allow this code growth and don't run
in an assert like we currently do.

This is the result of D65197 and D66803, and was a use-case not really
considered before. If this now happens, we emit an optimisation remark warning
about the code-size expansion, which can be avoided by not forcing
vectorisation or possibly source-code modifications.

Differential Revision: https://reviews.llvm.org/D67764

llvm-svn: 372694
2019-09-24 08:03:34 +00:00
Alexey Bataev 6a278d9073 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Summary:
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 372626
2019-09-23 16:25:03 +00:00
Simon Pilgrim a56bd6c51e [VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI.
llvm-svn: 372502
2019-09-22 13:02:00 +00:00
Simon Pilgrim a2719f38c1 [LoopVectorize] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 372116
2019-09-17 13:24:54 +00:00
Simon Pilgrim 1aaefbca24 [VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI.
The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place.

llvm-svn: 371975
2019-09-16 11:22:44 +00:00
Simon Pilgrim bfe6b35c70 [SLPVectorizer] Assert that we find a LastInst to silence analyzer null dereference warning. NFCI.
llvm-svn: 371974
2019-09-16 10:48:16 +00:00
Simon Pilgrim ae625d70cd [SLPVectorizer] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 371973
2019-09-16 10:35:09 +00:00
Simon Pilgrim 4e46ea3946 [LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down the load chain. NFCI.
Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value.

llvm-svn: 371936
2019-09-15 16:44:35 +00:00
Sanjay Patel b6a0faaa0c [SLP] limit vectorization of Constant subclasses (PR33958)
This is a fix for:
https://bugs.llvm.org/show_bug.cgi?id=33958

It seems universally true that we would not want to transform this kind of
sequence on any target, but if that's not correct, then we could view this
as a target-specific cost model problem. We could also white-list ConstantInt,
ConstantFP, etc. rather than blacklist Global and ConstantExpr.

Differential Revision: https://reviews.llvm.org/D67362

llvm-svn: 371931
2019-09-15 13:03:24 +00:00
Philip Reames cffa630c80 [Loads] Move generic code out of vectorizer into a location it might be reused [NFC]
llvm-svn: 371558
2019-09-10 21:33:53 +00:00
Philip Reames 1e1db80048 [ValueTracking] Factor our common speculation suppression logic [NFC]
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.

llvm-svn: 371556
2019-09-10 21:12:29 +00:00
Philip Reames 7403569be7 [LoopVectorize] Leverage speculation safety to avoid masked.loads
If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated.  This allows us to generate a normal vector load instead of a masked.load.

To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned.  This is equivelent to proving that hoisting the load into the loop header in the original scalar loop is safe.

Note: There are a couple of code motion todos in the code.  My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without furthe review.

Differential Revision: https://reviews.llvm.org/D66688

llvm-svn: 371452
2019-09-09 20:54:13 +00:00
Simon Pilgrim 879ed20bde Fix typo. NFCI
llvm-svn: 371317
2019-09-07 18:09:09 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
Bjorn Pettersson dd18ce4501 [LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit
Summary:
Fold-tail currently supports reduction last-vector-value live-out's,
but has yet to support last-scalar-value live-outs, including
non-header phi's. As it relies on AllowedExit in order to detect
them and bail out we need to add the non-header PHI nodes to
AllowedExit, otherwise we end up with miscompiles.

Solves https://bugs.llvm.org/show_bug.cgi?id=43166

Reviewers: fhahn, Ayal

Reviewed By: fhahn, Ayal

Subscribers: anna, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67074

llvm-svn: 370721
2019-09-03 09:33:55 +00:00
Sjoerd Meijer 718f909ccd [LV] Tail-folding, runtime scev checks
Now that we allow tail-folding, not only when we optimise for size, make
sure we do not run in this assert.

Differential revision: https://reviews.llvm.org/D66932

llvm-svn: 370711
2019-09-03 08:53:02 +00:00
Sjoerd Meijer 0469b0e4ef [LV] Tail-folding with runtime memory checks
The loop vectorizer was running in an assert when it tried to fold the tail and
had to emit runtime memory disambiguation checks.

Differential revision: https://reviews.llvm.org/D66803

llvm-svn: 370707
2019-09-03 08:38:24 +00:00
Simon Pilgrim 757cc16ab7 Fix cppcheck shadow variable and variable scope warnings. NFCI.
llvm-svn: 370580
2019-08-31 12:30:19 +00:00
Ayal Zaks d15df0ede5 [LV] Fold tail by masking - handle reductions
Allow vectorizing loops that have reductions when tail is folded by masking.
A select is introduced in VPlan, choosing between the last value carried by the
loop-exit/live-out instruction of the reduction, and the penultimate value
carried by the reduction phi, according to the "i < n" mask of fold-tail.
This select replaces the last value as the live-out value of the loop.

Differential Revision: https://reviews.llvm.org/D66720

llvm-svn: 370173
2019-08-28 09:02:23 +00:00
Philip Reames cf3b555973 Add a clarify comment for meaning of SafePointes [NFC]
Extracted from D66688 as requested.

llvm-svn: 369962
2019-08-26 20:48:35 +00:00
Sanjay Patel 5a5d44e801 [SLP] use range-for loops, fix formatting; NFC
These are part of D57059, but that patch doesn't apply cleanly to trunk
at this point, so we might as well remove some of the noise.

llvm-svn: 369776
2019-08-23 16:22:32 +00:00
Sanjay Patel 9182467886 [SLP] fix formatting; NFC
These are part of D57059, but that patch doesn't apply cleanly to trunk
at this point, so we might as well remove some of the noise.

llvm-svn: 369769
2019-08-23 15:26:12 +00:00
Dinar Temirbulatov 081c57989e [SLP][NFC] Avoid repetitive calls to getSameOpcode()
We can avoid repetitive calls getSameOpcode() for already known tree elements by keeping MainOp and AltOp in TreeEntry.

Differential Revision: https://reviews.llvm.org/D64700

llvm-svn: 369315
2019-08-20 00:22:04 +00:00
Sanjay Patel b38bac3699 [SLP] reduce duplicated code; NFC
llvm-svn: 369250
2019-08-19 11:39:56 +00:00
Vasileios Porpodas 1d254f3dae [SLPVectorizer] Make the scheduler aware of the TreeEntry operands.
Summary:
The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions.

In short, this patch:
- Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData.
- Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle().
- Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly.

Reviewers: ABataev, RKSimon, dtemirbulatov, Ayal, dorit, hfinkel

Reviewed By: ABataev

Subscribers: hiraditya, llvm-commits, lebedev.ri, rcorcs

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62432

llvm-svn: 369131
2019-08-16 17:21:18 +00:00
Simon Pilgrim 59894d4668 [SLPVectorizer] Silence null dereference warning. NFCI.
cppcheck + MSVC analyzer both over zealously warn that we might dereference a null Bundle pointer - add an assertion to check for null to silence the warning, plus its a good idea to check that we succeeded in finding a schedule bundle anyway....

llvm-svn: 369094
2019-08-16 10:28:23 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Dorit Nuzman d57d73daed [LV] fold-tail predication should be respected even with assume_safety
assume_safety implies that loads under "if's" can be safely executed
speculatively (unguarded, unmasked). However this assumption holds only for the
original user "if's", not those introduced by the compiler, such as the
fold-tail "if" that guards us from loading beyond the original loop trip-count.
Currently the combination of fold-tail and assume-safety pragmas results in
ignoring the fold-tail predicate that guards the loads, generating unmasked
loads. This patch fixes this behavior.

Differential Revision: https://reviews.llvm.org/D66106

Reviewers: Ayal, hsaito, fhahn
llvm-svn: 368973
2019-08-15 07:12:14 +00:00
Dinar Temirbulatov da0435a690 [SLP][NFC] Use pointers to address to ScalarToTreeEntry elements, instead of indexes.
llvm-svn: 368906
2019-08-14 19:46:50 +00:00
Dorit Nuzman 491ca2425d [LV] Fold-tail flag
This is the compiler-flag equivalent of the Predicate pragma
(https://reviews.llvm.org/D65197), to direct the vectorizer to fold the
remainder-loop into the main-loop using predication.

Differential Revision: https://reviews.llvm.org/D66108

Reviewers: Ayal, hsaito, fhahn, SjoerdMeije
llvm-svn: 368801
2019-08-14 05:22:20 +00:00
Craig Topper 005b22855e [LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave
If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it.

Improves one of the cases from PR42674

Differential Revision: https://reviews.llvm.org/D65896

llvm-svn: 368215
2019-08-07 21:44:14 +00:00
Mitch Phillips 924359dc0f Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors."
This reverts commit fc33e33776.

This commit depends on the rolled back commit rL367901, and thus needs
to be rolled back.

llvm-svn: 368109
2019-08-06 23:38:14 +00:00
Craig Topper fc33e33776 [X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors.
With the switch to widening legalization, we need to a better
job of costing extractions of less than 128-bits.

llvm-svn: 368081
2019-08-06 20:12:41 +00:00
Hideki Saito ec818d7fb3 [LV][NFC] Share the LV illegality reporting with LoopVectorize.
Reviewers: hsaito, fhahn, rengolin
 
Reviewed By: rengolin
 
Patch by psamolysov, thanks!
 
Differential Revision: https://reviews.llvm.org/D62997

llvm-svn: 367980
2019-08-06 06:08:48 +00:00
Stanislav Mekhanoshin 6fe00a21f2 Handle casts changing pointer size in the vectorizer
Added code to truncate or shrink offsets so that we can continue
base pointer search if size has changed along the way.

Differential Revision: https://reviews.llvm.org/D65612

llvm-svn: 367646
2019-08-02 04:03:37 +00:00
Stanislav Mekhanoshin eee9312a85 Relax load store vectorizer pointer strip checks
The previous change to fix crash in the vectorizer introduced
performance regressions. The condition to preserve pointer
address space during the search is too tight, we only need to
match the size.

Differential Revision: https://reviews.llvm.org/D65600

llvm-svn: 367624
2019-08-01 22:18:56 +00:00
Sjoerd Meijer e0dfce0723 Follow up of rL367592, fix the build
Some buildbots complained about:
error: default label in switch which covers all enumeration values

llvm-svn: 367603
2019-08-01 18:54:29 +00:00
Sjoerd Meijer 20b198ec5e [LV] Tail-Loop Folding
This allows folding of the scalar epilogue loop (the tail) into the main
vectorised loop body when the loop is annotated with a "vector predicate"
metadata hint. To fold the tail, instructions need to be predicated (masked),
enabling/disabling lanes for the remainder iterations.

Differential Revision: https://reviews.llvm.org/D65197

llvm-svn: 367592
2019-08-01 18:21:44 +00:00
Stanislav Mekhanoshin ba1e845c21 [AMDGPU] Fix for vectorizer crash with pointers of different size
When vectorizer strips pointers it can eventually end up with
pointers of two different sizes, then SCEV will crash.

Differential Revision: https://reviews.llvm.org/D65480

llvm-svn: 367443
2019-07-31 16:33:11 +00:00
Sjoerd Meijer 5c606cef79 [LV] Scalar Epilogue Lowering. NFC.
This refactors boolean 'OptForSize' that was passed around in a lot of places.
It controlled folding of the tail loop, the scalar epilogue, into the main loop
but code-size reasons may not be the only reason to do this. Thus, this is a
first step to generalise the concept of tail-loop folding, and hence OptForSize
has been renamed and is using an enum ScalarEpilogueStatus that holds the
status how the epilogue should be lowered.

This will be followed up by D65197, that picks up the predicate loop hint and
performs the tail-loop folding.

Differential Revision: https://reviews.llvm.org/D64916

llvm-svn: 366993
2019-07-25 08:06:02 +00:00
Simon Pilgrim 5d4bb8628c [SLPVectorizer] Revert local change that got accidently got committed in rL366799
This wasn't part of D63281

llvm-svn: 366807
2019-07-23 13:42:01 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Simon Pilgrim 87adcf8c47 [SLPVectorizer] Remove null-pointer test. NFCI.
cast<CallInst> shouldn't return null and we dereference the pointer in a lot of other places, causing both MSVC + cppcheck to warn about dereferenced null pointers

llvm-svn: 366793
2019-07-23 10:51:43 +00:00
Simon Pilgrim 3ebd2fe91a [SLPVectorizer] Fix some MSVC/cppcheck uninitialized variable warnings. NFCI.
llvm-svn: 366712
2019-07-22 17:57:36 +00:00
Eric Christopher 93dfb93ad6 Temporarily Revert "[SLP] Recommit: Look-ahead operand reordering heuristic."
As there are some reported miscompiles with AVX512 and performance regressions
in Eigen. Verified with the original committer and testcases will be forthcoming.

This reverts commit r364964.

llvm-svn: 366154
2019-07-15 23:36:02 +00:00
Florian Hahn 1d554b7441 [LoopVectorize] Pass unfiltered list of arguments to getIntrinsicInstCost.
We do not compute the scalarization overhead in getVectorIntrinsicCost
and TTI::getIntrinsicInstrCost requires the full arguments list.

llvm-svn: 366049
2019-07-15 08:48:47 +00:00
Florian Hahn 9428d95ce7 [LV] Exclude loop-invariant inputs from scalar cost computation.
Loop invariant operands do not need to be scalarized, as we are using
the values outside the loop. We should ignore them when computing the
scalarization overhead.

Fixes PR41294

Reviewers: hsaito, rengolin, dcaballe, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D59995

llvm-svn: 366030
2019-07-14 20:12:36 +00:00
Fangrui Song b251cc0d91 Delete dead stores
llvm-svn: 365903
2019-07-12 14:58:15 +00:00
Nikita Popov 5ca39e828c [SLP] Optimize getSpillCost(); NFCI
For a given set of live values, the spill cost will always be the
same for each call. Compute the cost once and multiply it by the
number of calls.

(I'm not sure this spill cost modeling makes sense if there are
multiple calls, as the spill cost will likely be shared across
calls in that case. But that's how it currently works.)

llvm-svn: 365552
2019-07-09 20:24:44 +00:00
Vasileios Porpodas cf47ff5ffb [SLP] Recommit: Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: hiraditya, phosek, rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364964
2019-07-02 20:20:28 +00:00
Jordan Rupprecht a7972dc04a Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364478 (git commit 574cb0eb3a)

The patch is causing compilation timeouts.

llvm-svn: 364846
2019-07-01 21:10:43 +00:00
Vasileios Porpodas 574cb0eb3a [SLP] Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364478
2019-06-26 21:25:24 +00:00
Vasileios Porpodas 3081f78776 [SLP] NFC: Fixed typo in comment
llvm-svn: 364237
2019-06-24 21:40:48 +00:00
Cameron McInally fe3f15cf90 [SLP] Support unary FNeg vectorization
Differential Revision: https://reviews.llvm.org/D63609

llvm-svn: 364219
2019-06-24 19:24:23 +00:00
Reid Kleckner 592a193285 Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364084 (git commit 5698921be2)

It caused crashes while compiling a file in Chrome. Reduction
forthcoming.

llvm-svn: 364111
2019-06-21 23:10:25 +00:00
Simon Pilgrim 5698921be2 [SLP] Look-ahead operand reordering heuristic.
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364084
2019-06-21 17:57:01 +00:00
Orlando Cazalet-Hyams 1251cac62a [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

> llvm-svn: 363046

llvm-svn: 363786
2019-06-19 10:50:47 +00:00
Warren Ristow 6452bdd29b [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764

llvm-svn: 363581
2019-06-17 17:20:08 +00:00
Whitney Tsang 15b7f5b72d PHINode: introduce setIncomingValueForBlock() function, and use it.
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.

Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338

llvm-svn: 363566
2019-06-17 14:38:56 +00:00
Bjorn Pettersson 83773b77a5 [LV] Deny irregular types in interleavedAccessCanBeWidened
Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.

Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.

Reviewers: fhahn, Ayal, craig.topper

Reviewed By: fhahn

Subscribers: hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63386

llvm-svn: 363547
2019-06-17 12:02:24 +00:00
Orlando Cazalet-Hyams a947156396 Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion"
This reverts commit 1a0f7a2077.
See phabricator thread for D60831.

llvm-svn: 363132
2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams 1a0f7a2077 [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 363046
2019-06-11 10:37:20 +00:00
Fangrui Song 19189993c9 [LV] Fix -Wunused-function after r362736
llvm-svn: 362762
2019-06-07 01:48:26 +00:00
Renato Golin 9e97caf594 [LV] Wrap LV illegality reporting in a function. NFC.
A function for loop vectorization illegality reporting has been
introduced:

void LoopVectorizationLegality::reportVectorizationFailure(
    const StringRef DebugMsg, const StringRef OREMsg,
    const StringRef ORETag, Instruction * const I) const;

The function prints a debug message when the debug for the compilation
unit is enabled as well as invokes the optimization report emitter to
generate a message with a specified tag. The function doesn't cover any
complicated logic when a custom lambda should be passed to the emitter,
only generating a message with a tag is supported.

The function always prints the instruction `I` after the debug message
whenever the instruction is specified, otherwise the debug message
ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'

Patch by Pavel Samolysov <samolisov@gmail.com>

llvm-svn: 362736
2019-06-06 19:15:52 +00:00
Dinar Temirbulatov 15c657d13d [SLP] Fix regression in broadcasts caused by operand reordering patch D59973.
This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.

Committed on behalf of @vporpo (Vasileios Porpodas)
    
Differential Revision: https://reviews.llvm.org/D62427

llvm-svn: 362613
2019-06-05 15:26:28 +00:00
Sanjay Patel ad62a3a299 [LoopUtils][SLPVectorizer] clean up management of fast-math-flags
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.

The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)

Differential Revision: https://reviews.llvm.org/D62272

llvm-svn: 362612
2019-06-05 14:58:04 +00:00
Florian Hahn 9bbdde2598 [LV] Remove the redundant using LoopVectorizationPlanner:VPlanPtr
VPlan.h already contains the declaration of VPlanPtr type alias:

using VPlanPtr = std::unique_ptr<VPlan>;

The LoopVectorizationPlanner class also contains the same declaration
of VPlanPtr and therefore LoopVectorize requires a long wording when
its methods return VPlanPtr:

    LoopVectorizationPlanner::VPlanPtr
    LoopVectorizationPlanner::buildVPlanWithVPRecipes(...)

but LoopVectorize.cpp includes VPlan.h (via LoopVectorizationPlanner.h)
and can use VPlanPtr from that header.

Patch by Pavel Samolysov.

Reviewers: hsaito, rengolin, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D62576

llvm-svn: 362126
2019-05-30 18:46:13 +00:00
Craig Topper 778e445c58 [LoopVectorize] Add FNeg instruction support
Differential Revision: https://reviews.llvm.org/D62510

llvm-svn: 362124
2019-05-30 18:19:35 +00:00
Florian Hahn e4cfa89915 [LV] Inform about exactly reason of loop illegality
Currently, only the following information is provided by LoopVectorizer
in the case when the CF of the loop is not legal for vectorization:

 LV: Can't vectorize the instructions or CFG
    LV: Not vectorizing: Cannot prove legality.

But this information is not enough for the root cause analysis; what is
exactly wrong with the loop should also be printed:

 LV: Not vectorizing: The exiting block is not the loop latch.

Patch by Pavel Samolysov.

Reviewers: mkuper, hsaito, rengolin, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D62311

llvm-svn: 362056
2019-05-30 05:03:12 +00:00
Alina Sbirlea 63729b0c49 [SLPVectorizer] Set flag to previous default.
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.

The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, eraman, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61955

llvm-svn: 361537
2019-05-23 19:07:41 +00:00
Simon Pilgrim 3c05cad03e LoopVectorizationCostModel::selectInterleaveCount - assert we have a non-zero loop cost. NFCI.
The input LoopCost value can be zero, but if so it should be recalculated with the current VF. After that it should always be non-zero.

llvm-svn: 361387
2019-05-22 14:18:17 +00:00
Dinar Temirbulatov 2ff72f6654 [SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec().
This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706.
The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D61795

llvm-svn: 361110
2019-05-19 01:30:41 +00:00
Florian Hahn 9e778e6c73 [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC)
This reduces the number of parameters we need to pass in and they seem a
natural fit in LoopVectorizationCostModel. Also simplifies things for
D59995.

As a follow up refactoring, we could only expose a expose a
shouldUseVectorIntrinsic() helper in LoopVectorizationCostModel, instead
of calling getVectorCallCost/getVectorIntrinsicCost in
InnerLoopVectorizer/VPRecipeBuilder.

Reviewers: Ayal, hsaito, dcaballe, rengolin

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D61638

llvm-svn: 360758
2019-05-15 10:05:49 +00:00
Simon Pilgrim 6c3ae79e9b [SLP] Refactor VectorizableTree to use unique_ptr.
This patch fixes the TreeEntry dangling pointer issue caused by reallocations of VectorizableTree.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D61706

llvm-svn: 360456
2019-05-10 18:55:17 +00:00
Alina Sbirlea 458c7339e1 [NewPassManager] Add tuning option: SLPVectorization [NFC].
Summary: Mirror tuning option from old pass manager in new pass manager.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61616

llvm-svn: 360276
2019-05-08 17:58:35 +00:00
Alina Sbirlea f31eba6494 [MemorySSA] Teach LoopSimplify to preserve MemorySSA.
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.

Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60833

llvm-svn: 360270
2019-05-08 17:05:36 +00:00
Simon Pilgrim cced3ecc35 [VPlan] Fix "value never used" static analyzer warning. NFCI.
llvm-svn: 360241
2019-05-08 10:52:26 +00:00
Kostya Serebryany b9c5768302 revert r360162 as it breaks most of the buildbots
llvm-svn: 360190
2019-05-07 20:57:11 +00:00
Orlando Cazalet-Hyams 78a6062c24 [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel

Reviewed By: hfinkel

Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 360162
2019-05-07 15:37:38 +00:00
Simon Pilgrim 97fbc2abfe [LoadStoreVectorizer] vectorizeStoreChain - ensure we find a store type.
Properly initialize store type to null then ensure we find a real store type in the chain.

Fixes scan-build null dereference warning and makes the code clearer.

llvm-svn: 360031
2019-05-06 10:25:11 +00:00
Simon Pilgrim afb0e664e6 [SLPVectorizer] Prefer pre-increments. NFCI.
llvm-svn: 359989
2019-05-05 17:53:09 +00:00
Simon Pilgrim 5b05f20a3a [SLPVectorizer] Make getSpillCost() const. NFCI.
Ideally getTreeCost() should be const as well but non-const Type creation would need to be addressed first.

llvm-svn: 359975
2019-05-05 10:37:38 +00:00
Alina Sbirlea 733c8c40c8 Enable LoopVectorization by default.
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.

Reviewers: chandlerc, jgorbe

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61091

llvm-svn: 359167
2019-04-25 04:49:48 +00:00
Alexey Bataev ef3c1884ec [SLP] Fix crash after r358519, by V. Porpodas.
Summary: The code did not check if operand was undef before casting it to Instruction.

Reviewers: RKSimon, ABataev, dtemirbulatov

Reviewed By: ABataev

Subscribers: uabelho

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61024

llvm-svn: 359136
2019-04-24 20:21:32 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Alina Sbirlea 0499a2f961 [NewPassManager] Adding pass tuning options: loop vectorize.
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.

Reviewers: chandlerc

Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59723

llvm-svn: 358763
2019-04-19 16:11:59 +00:00
Ali Tamur e8de5cd602 Fix a typo in comments. [NFC]
llvm-svn: 358531
2019-04-16 21:37:43 +00:00
Simon Pilgrim 82ffa88a04 [SLP] Refactoring of the operand reordering code.
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.

This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59973

llvm-svn: 358519
2019-04-16 19:27:00 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Florian Hahn db1a69c250 [VPLAN] Minor improvement to testing and debug messages.
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D59952

llvm-svn: 358056
2019-04-10 08:17:28 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Vedant Kumar c6bceec01a [DebugInfo] Fix pr41180 : Loop Vectorization Debugify Failure
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180

In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.

The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.

A regression test has been added that covers these issues.

Patch by Orlando Cazalet-Hyams!

Differential Revision: https://reviews.llvm.org/D59944

llvm-svn: 357499
2019-04-02 17:28:34 +00:00
Simon Pilgrim a3b71018d9 [SLP] reorderInputsAccordingToOpcode is const method. NFCI.
llvm-svn: 357490
2019-04-02 16:27:11 +00:00
Simon Pilgrim b06935fa8c [SLP] getVectorElementSize and isTreeTinyAndNotFullyVectorizable are const methods. NFCI.
llvm-svn: 357416
2019-04-01 17:48:03 +00:00
Simon Pilgrim f6c04ad486 [SLP] getGatherCost and isFullyVectorizableTinyTree are const methods. NFCI.
llvm-svn: 357414
2019-04-01 17:32:46 +00:00
Simon Pilgrim 6a75c36ea9 [SLP] Add support for commutative icmp/fcmp predicates
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.

This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).

Differential Revision: https://reviews.llvm.org/D59992

llvm-svn: 357266
2019-03-29 15:28:25 +00:00
Simon Pilgrim 62f0d1650a [SLP] Add support for swapping icmp/fcmp predicates to permit vectorization
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.

Differential Revision: https://reviews.llvm.org/D59956

llvm-svn: 357243
2019-03-29 10:41:00 +00:00
Benjamin Kramer ba2ea93ad1 Make helper functions static. NFC.
llvm-svn: 357187
2019-03-28 17:18:42 +00:00
Florian Hahn e21ed594d8 [VPlan] Determine Vector Width programmatically.
With this change, the VPlan native path is triggered with the directive:

   #pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D57598

llvm-svn: 357156
2019-03-28 10:37:12 +00:00
Simon Pilgrim 6f96795b88 [SLPVectorizer] Merge reorderAltShuffleOperands into reorderInputsAccordingToOpcode
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.

Differential Revision: https://reviews.llvm.org/D59784

llvm-svn: 356939
2019-03-25 20:05:27 +00:00
Simon Pilgrim ff3abef395 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356913
2019-03-25 15:53:55 +00:00
Simon Pilgrim 5cd4eb96f6 [SLPVectorizer] shouldReorderOperands - just check for reordering. NFCI.
Remove the I.getOperand() calls from inside shouldReorderOperands - reorderInputsAccordingToOpcode should handle the creation of the operand lists and shouldReorderOperands should just check to see whether the i'th element should be commuted.

llvm-svn: 356854
2019-03-24 13:36:32 +00:00
Simon Pilgrim 1466e5c383 Fix unused variable warning on non-asserts builds. NFCI.
llvm-svn: 356841
2019-03-23 16:56:23 +00:00
Simon Pilgrim 64feec7977 Remove unused function argument. NFCI.
llvm-svn: 356840
2019-03-23 16:20:34 +00:00
Simon Pilgrim c7ba9555cf [SLPVectorizer] reorderInputsAccordingToOpcode - use InstructionState directly. NFCI.
llvm-svn: 356832
2019-03-23 13:44:06 +00:00
Simon Pilgrim f4f01f3cff [SLPVectorizer] Don't repeat VL.size() call. NFCI.
llvm-svn: 356830
2019-03-23 12:11:25 +00:00
Simon Pilgrim b68322f9d0 [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 356814
2019-03-22 21:27:11 +00:00
Simon Pilgrim d3a8fd8bfb Revert rL355906: [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059
........

Reverted due to buildbot failures that I don't have time to track down.

llvm-svn: 355913
2019-03-12 11:51:59 +00:00
Simon Pilgrim 5db95efdbd Try to fix SLPVectorizer BoUpSLP::BoEdgeInfo::dump visibility on non-debug builds
llvm-svn: 355912
2019-03-12 11:31:06 +00:00
Simon Pilgrim 2086a8894d [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 355906
2019-03-12 10:51:51 +00:00
Sanjoy Das 3f5ce18658 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das 2136a5bc49 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das 93f8cc186a Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Jonas Hahnfeld e071cd86df Hide two unused debugging methods, NFCI.
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.

llvm-svn: 355205
2019-03-01 17:15:21 +00:00
Simon Pilgrim a066f1f9e6 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Alina Sbirlea 0a8bc14ad7 [MemorySSA & LoopPassManager] Add remaining book keeping [NFCI].
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].

llvm-svn: 353901
2019-02-12 23:48:02 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Chandler Carruth b53f0e1145 Update files that were mistakenly added with the old file header to the
new one.

llvm-svn: 353665
2019-02-11 08:07:38 +00:00
Florian Hahn f557a94aa3 [LV] Remove unnecessary assignment to UserIC.
llvm-svn: 353469
2019-02-07 21:23:37 +00:00