2117 Commits

Author SHA1 Message Date
Francesco Petrogalli c8d2b065b9 [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]
Changes:

* Change `ToVectorTy` to deal directly with `ElementCount` instances.
* `VF == 1` replaced with `VF.isScalar()`.
* `VF > 1` and `VF >= 2` replaced with `VF.isVector()`.
* `VF <= 1` is replaced with `VF.isZero() || VF.isScalar()`.
* Add `<` operator to `ElementCount` to be able to use
`llvm::SmallSetVector<ElementCount, ...>`.
* Bits and pieces around printing the ElementCount to string streams.
* Added a static method to `ElementCount` to represent a scalar.

To guarantee that this change is a NFC, `VF.Min` and asserts are used
in the following places:

1. When it doesn't make sense to deal with the scalable property, for
example:
   a. When computing unrolling factors.
   b. When shuffle masks are built for fixed width vector types.
In these cases, an
`assert(!VF.Scalable && "<msg>")` has been added to make sure we don't
enter codepaths that don't make sense for scalable vectors.
2. When there is a conscious decision to use `FixedVectorType`. These
uses of `FixedVectorType` will likely be removed in favour of
`VectorType` once the vectorizer is generic enough to deal with both
fixed vector types and scalable vector types.
3. When dealing with building constants out of the value of VF, for
example when computing the vectorization `step`, or building vectors
of indices. These operations _make sense_ for scalable vectors too,
but changing the code in these places to be generic and make it work
for scalable vectors is to be submitted in a separate patch, as it is
a functional change.
4. When building the potential VFs in VPlan. Making the VPlan generic
enough to handle scalable vectorization factors is a functional change
that needs a separate patch. See for example `void
LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned
MaxVF)`.
5. The class `IntrinsicCostAttribute`: this class still uses `unsigned
VF` as updating the field to use `ElementCount` would require changes
that could result in changing the behavior of the compiler. Will be done
in a separate patch.
6. When dealing with user input for forcing the vectorization
factor. In this case, adding support for scalable vectorization is a
functional change that might require changes at the command line.

Differential Revision: https://reviews.llvm.org/D85794
2020-08-24 13:39:42 +00:00
David Green 2b69efded0 [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
David Green 816097e4e5 [LV] Allow tail folded reduction selects to remain in the loop
The normal scheme for tail folding reductions is to use:

loop:
  p = phi(0, a)
  mask = ...
  x = masked_load(..., mask)
  a = add(x, p)
s = select(mask, a, p)

This means we need to keep the registers p and a alive out of the loop, plus
the mask. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This keeps the select in the loop, and we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.
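
The two schemes can be compared with a scalar simulation. This is only a sketch: the 4-lane width, the helper names and the value 999 used for inactive lanes are assumptions for illustration and say nothing about how the vectorizer generates code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4; // illustrative vector width
using Vec = std::array<int, VF>;
using Mask = std::array<bool, VF>;

// Lane I is active while Base + I is still a valid index.
static Mask laneMask(std::size_t Base, std::size_t N) {
  Mask M{};
  for (std::size_t I = 0; I < VF; ++I)
    M[I] = Base + I < N;
  return M;
}

// Inactive lanes get an arbitrary value (999) to show they never reach the sum.
static Vec maskedLoad(const std::vector<int> &A, std::size_t Base,
                      const Mask &M) {
  Vec X{};
  for (std::size_t I = 0; I < VF; ++I)
    X[I] = M[I] ? A[Base + I] : 999;
  return X;
}

static Vec addLanes(const Vec &L, const Vec &R) {
  Vec Out{};
  for (std::size_t I = 0; I < VF; ++I)
    Out[I] = L[I] + R[I];
  return Out;
}

static Vec selectLanes(const Mask &M, const Vec &T, const Vec &F) {
  Vec Out{};
  for (std::size_t I = 0; I < VF; ++I)
    Out[I] = M[I] ? T[I] : F[I];
  return Out;
}

static int sumLanes(const Vec &V) {
  int Sum = 0;
  for (int Lane : V)
    Sum += Lane;
  return Sum;
}

// Scheme 1: p = phi(0, a); the select happens after the loop, so p, a and
// the mask all have to stay live out of the loop.
int reduceSelectAfterLoop(const std::vector<int> &A) {
  Vec P{}, Acc{}, BackEdge{};
  Mask M{};
  for (std::size_t Base = 0; Base < A.size(); Base += VF) {
    P = BackEdge; // p = phi(0, a)
    M = laneMask(Base, A.size());
    Acc = addLanes(maskedLoad(A, Base, M), P);
    BackEdge = Acc;
  }
  return sumLanes(selectLanes(M, Acc, P)); // s = select(mask, a, p)
}

// Scheme 2: p = phi(0, s); the select stays inside the loop.
int reduceSelectInLoop(const std::vector<int> &A) {
  Vec S{};
  for (std::size_t Base = 0; Base < A.size(); Base += VF) {
    Vec P = S; // p = phi(0, s)
    Mask M = laneMask(Base, A.size());
    Vec Acc = addLanes(maskedLoad(A, Base, M), P);
    S = selectLanes(M, Acc, P);
  }
  return sumLanes(S);
}
```

Both functions return the same sum for any input length; the second only needs the select result after the loop.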

Differential Revision: https://reviews.llvm.org/D84741
2020-08-20 14:31:14 +01:00
Hiroshi Yamauchi ab401a8c8a [PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts.
D81345 appears to accidentally disable vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.

Differential Revision: https://reviews.llvm.org/D85784
2020-08-19 12:13:34 -07:00
Mehdi Amini a407ec9b6d Revert "Revert "[NFC][llvm] Make the constructors of `ElementCount` private.""
Was reverted because MLIR/Flang builds were broken, these APIs have been
fixed in the meantime.
2020-08-19 17:26:36 +00:00
Mehdi Amini 4fc56d70aa Revert "[NFC][llvm] Make the constructors of `ElementCount` private."
This reverts commit 264afb9e6a.
(and dependent 6b742cc48 and fc53bd610f)

MLIR/Flang are broken.
2020-08-19 17:21:37 +00:00
Francesco Petrogalli 264afb9e6a [NFC][llvm] Make the constructors of `ElementCount` private.
Differential Revision: https://reviews.llvm.org/D86120
2020-08-19 16:26:44 +00:00
Bjorn Pettersson 11446b02c7 [VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load
This is a fixup to commit 43bdac2906, to make sure the
address space from the original load pointer is retained in the
vector pointer.

Resolves problem with
  Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
due to address space mismatch.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D85912
2020-08-13 18:25:32 +02:00
Sanjay Patel cc892fd9f4 [VectorCombine] early exit if target has no vector registers
Based on post-commit discussion in:
D81766

Other vectorization passes (SLP and Loop) use this TTI API similarly.
2020-08-12 09:22:31 -04:00
Sanjay Patel b0b95dab1c [VectorCombine] add safety check for 0-width register
Based on post-commit discussion in D81766, Hexagon sets this to "0".
I'll see if I can come up with a test, but making the obvious
code fix first to unblock that target.
2020-08-11 20:30:02 -04:00
Dinar Temirbulatov b1600d8b89 [NFC] Guard the cost report block of debug outputs with NDEBUG and
switch to SmallString, this is part of D57779.
2020-08-11 16:34:47 +02:00
Florian Hahn 0b774acf11 [SLP] Make sure instructions are ordered when computing spill cost.
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, e.g. on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82444
2020-08-11 11:18:12 +02:00
Sanjay Patel 43bdac2906 [VectorCombine] try to create vector loads from scalar loads
This patch was adjusted to match the most basic pattern that starts with an insertelement
(so there's no extract created here). Hopefully, that removes any concern about
interfering with other passes. I.e., the transform should almost always be profitable.

We could make an argument that this could be part of canonicalization, but we
conservatively try not to create vector ops from scalar ops in passes like instcombine.

If the transform is not profitable, the backend should be able to re-scalarize the load.

Differential Revision: https://reviews.llvm.org/D81766
2020-08-09 09:05:06 -04:00
Anton Afanasyev a7478fab6c [SLP] Fix order of `insertelement`/`insertvalue` seed operands
Summary:
This patch takes the index operands of `insertelement`/`insertvalue`
into account while generating seed elements in `findBuildAggregate()`.
Previously, this function kept the original order of the `insert`s.
The patch also optimizes `findBuildAggregate()` by avoiding redundant
temporary vector allocations and repeated reversals.

Fixes llvm.org/pr44067

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83779
2020-08-06 22:09:24 +03:00
David Green 745bf6cf44 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiply them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).
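
As a rough illustration of what an in-loop reduction computes, the sketch below models the multiply-accumulate step described above in plain C++. The function names and the zero-padded tail handling are assumptions for illustration; this models neither the MVE instruction encoding nor the vectorizer's output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t Lanes = 16; // sixteen i8 lanes in a 128-bit vector
using I8x16 = std::array<std::int8_t, Lanes>;

// Models the step described above: sign extend both inputs to i32, multiply
// lane by lane, and add every product into one scalar accumulator.
std::int32_t mulAccAcross(std::int32_t Acc, const I8x16 &A, const I8x16 &B) {
  for (std::size_t I = 0; I < Lanes; ++I)
    Acc += static_cast<std::int32_t>(A[I]) * static_cast<std::int32_t>(B[I]);
  return Acc;
}

// In-loop reduction: the accumulator is a scalar carried around the loop, so
// no vector of partial sums has to be reduced after the loop.
std::int32_t dotInLoop(const std::vector<std::int8_t> &A,
                       const std::vector<std::int8_t> &B) {
  assert(A.size() == B.size());
  std::int32_t Acc = 0;
  for (std::size_t Base = 0; Base < A.size(); Base += Lanes) {
    I8x16 VA{}, VB{}; // zero-filled, so missing tail lanes contribute nothing
    for (std::size_t I = 0; I < Lanes && Base + I < A.size(); ++I) {
      VA[I] = A[Base + I];
      VB[I] = B[Base + I];
    }
    Acc = mulAccAcross(Acc, VA, VB);
  }
  return Acc;
}
```

Because the products are formed in 32 bits, sixteen i8 by i8 products can be summed without intermediate truncation, which matches the description above.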

Differential Revision: https://reviews.llvm.org/D75069
2020-08-06 10:10:50 +01:00
Jordan Rupprecht 3c39db0c44 Revert "[LoopVectorizer] Inloop vector reductions"
This reverts commit e9761688e4. It breaks the build:

```
~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>'
  return ReductionOperations;
```
2020-08-05 10:24:15 -07:00
David Green e9761688e4 [LoopVectorizer] Inloop vector reductions
Arm MVE has multiple instructions such as VMLAVA.s8, which (in this
case) can take two 128bit vectors, sign extend the inputs to i32,
multiply them together and sum the result into a 32bit general
purpose register. So taking 16 i8's as inputs, they can multiply and
accumulate the result into a single i32 without any rounding/truncating
along the way. There are also reduction instructions for plain integer
add and min/max, and operations that sum into a pair of 32bit registers
together treated as a 64bit integer (even though MVE does not have a
plain 64bit addition instruction). So giving the vectorizer the ability
to use these instructions both enables us to vectorize at higher
bitwidths, and to vectorize things we previously could not.

In order to do that we need a way to represent that the reduction
operation, specified with a llvm.experimental.vector.reduce when
vectorizing for Arm, occurs inside the loop not after it like most
reductions. This patch attempts to do that, teaching the vectorizer
about in-loop reductions. It does this through a vplan recipe
representing the reductions that the original chain of reduction
operations is replaced by. Cost modelling is currently just done through
a prefersInloopReduction TTI hook (which follows in a later patch).

Differential Revision: https://reviews.llvm.org/D75069
2020-08-05 18:14:05 +01:00
Bardia Mahjour 3c0f347002 [NFC][LV] Vectorized Loop Skeleton Refactoring
This patch tries to improve readability and maintenance
of createVectorizedLoopSkeleton by reorganizing some lines,
updating some of the comments and breaking it up into
smaller logical units.

Reviewed By: pjeeva01

Differential Revision: https://reviews.llvm.org/D83824
2020-08-04 14:50:57 -04:00
Florian Hahn 98db27711d [LV] Do not check widening decision for instrs outside of loop.
No widening decisions will be computed for instructions outside the
loop. Do not try to get a widening decision. The load/store will be just
a scalar load, so treating it as normal should be fine I think.

Fixes PR46950.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85087
2020-08-03 10:09:24 +01:00
Vitaly Buka b0eb40ca39 [NFC] Remove unused GetUnderlyingObject parameter
Depends on D84617.

Differential Revision: https://reviews.llvm.org/D84621
2020-07-31 02:10:03 -07:00
Vitaly Buka 89051ebace [NFC] GetUnderlyingObject -> getUnderlyingObject
I am going to touch them in the next patch anyway
2020-07-30 21:08:24 -07:00
David Green 1da0c47fa2 [LoopVectorizer] Don't create unused block masks for reductions. NFC
This removes some unneeded block masks when we don't have any
reductions. It should not have any effect on codegen as the values
created are dead anyway.

Differential Revision: https://reviews.llvm.org/D81415
2020-07-30 14:28:08 +01:00
Simon Pilgrim cc529285fd VectorUtils.h - reduce unnecessary includes. NFC.
Replace TargetLibraryInfo.h include with forward declaration and fix implicit dependencies.

Reduce SmallSet.h include to SmallVector.h include.
2020-07-30 12:27:49 +01:00
David Sherwood 9ad7c980bb [SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock
In vectorizeChainsInBlock we try to collect chains of PHI nodes
that have the same element type, but the code is relying upon
the implicit conversion from TypeSize -> uint64_t. For now, I have
modified the code to ignore PHI nodes with scalable types.

Differential Revision: https://reviews.llvm.org/D83542
2020-07-29 16:29:19 +01:00
David Green 60280e9818 [Analysis] TTI: Add CastContextHint for getCastInstrCost
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types.  Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization.  Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this patch adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162
2020-07-29 13:32:53 +01:00
Kazu Hirata 902cbcd59e Use llvm::is_contained where appropriate (NFC)
Summary:
This patch replaces std::find with llvm::is_contained where
appropriate.
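
A before/after sketch of this kind of replacement is shown below. `isContainedSketch` is a local stand-in with the same intent as `llvm::is_contained`, defined here only so the example builds without LLVM headers; the opcode list is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Local stand-in for illustration; it searches the whole range for Value.
template <typename Range, typename T>
bool isContainedSketch(const Range &R, const T &Value) {
  return std::find(std::begin(R), std::end(R), Value) != std::end(R);
}

// Before: the caller spells out the iterator pair and the end comparison.
bool hasOpcodeBefore(const std::vector<int> &Ops, int Op) {
  return std::find(Ops.begin(), Ops.end(), Op) != Ops.end();
}

// After: the intent ("is Op in Ops?") is stated directly.
bool hasOpcodeAfter(const std::vector<int> &Ops, int Op) {
  return isContainedSketch(Ops, Op);
}
```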

Reviewers: efriedma, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84489
2020-07-27 10:20:44 -07:00
Hiroshi Yamauchi 7bedae7dee [PGO][PGSO] Add profile guided size optimization to loop vectorization legality. 2020-07-21 11:16:36 -07:00
Arthur Eubanks 0dfa4a83fa Revert "[PGO][PGSO] Add profile guided size optimization to loop vectorization legality."
This reverts commit 30c382a7c6.

See https://crbug.com/1106813.
2020-07-17 16:47:41 -07:00
Stanislav Mekhanoshin efb5040262 Fixed warning about signed/unsigned comparison
I've got a report that clang11 issues a signed/unsigned mismatch
warning here. For some reason only clang11 seems to issue
this warning.

Differential Revision: https://reviews.llvm.org/D83916
2020-07-17 11:03:42 -07:00
Anna Welker 23c9534515 [LV] Enable the LoopVectorizer to create pointer inductions
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector type
getelementptrs built from the pointer induction variable, which
produces much less instructions than the previous approach of
creating scalar getelementpointers and glue them together to a
vector.

Differential Revision: https://reviews.llvm.org/D81267
2020-07-17 13:35:07 +01:00
Hiroshi Yamauchi 30c382a7c6 [PGO][PGSO] Add profile guided size optimization to loop vectorization legality.
Differential Revision: https://reviews.llvm.org/D83329
2020-07-15 11:49:36 -07:00
Sanne Wouda 13fec93a77 [NFC] rename to reflect F is not necessarily an Intrinsic 2020-07-13 15:28:46 +01:00
Sanne Wouda 7b84045565 [SLPVectorizer] handle vectorizable library functions
Teaches the SLPVectorizer to use vectorized library functions for
non-intrinsic calls.

This already worked for intrinsics that have vectorized library
functions, thanks to D75878, but schedules with library functions with a
vector variant were being rejected early.

-   assume that there are no load/store dependencies between lib
    functions with a vector variant; this would otherwise prevent the
    bundle from becoming "ready"

-   check during legalization that the vector variant can be used

-   fix-up where we previously assumed that a call would be an intrinsic

Differential Revision: https://reviews.llvm.org/D82550
2020-07-13 15:28:46 +01:00
Ayal Zaks 82a5157ff1 [LV] Fixing versioning-for-unit-stride of loops with small trip count
This patch fixes D81345 and PR46652.

If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis
still generates runtime checks for unit strides that will version the loop.

In such cases, the loop vectorizer should either re-run the analysis or bail-out
from vectorizing the loop, as done prior to D81345. The latter is applied for
now as the former requires refactoring.

Differential Revision: https://reviews.llvm.org/D83470
2020-07-12 19:51:47 +03:00
Florian Hahn 264ab1e2c8 [LV] Pick vector loop body as insert point for SCEV expansion.
Currently the DomTree is not kept up to date for additional blocks
generated in the vector loop, for example when vectorizing with
predication. SCEVExpander relies on dominance checks when looking for
existing instructions to re-use and in some cases that can lead to the
expander picking instructions that do not actually dominate their insert
point (e.g. as in PR46525).

Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG
is only patched up after generating code for a block. For now, we can
just use the vector loop header, as this ensures the inserted
instructions dominate all uses in the vector loop. There should be no
noticeable impact on the generated code, as other passes should sink
those instructions, if profitable.

Fixes PR46525.

Reviewers: Ayal, gilr, mkazantsev, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D83288
2020-07-10 10:37:12 +01:00
Benjamin Kramer b44470547e Make helpers static. NFC. 2020-07-09 13:48:56 +02:00
Nicolai Hähnle 3fa989d4fd DomTree: remove explicit use of DomTreeNodeBase::iterator
Summary:
Almost all uses of these iterators, including implicit ones, really
only need the const variant (as it should be). The only exception is
in NewGVN, which changes the order of dominator tree child nodes.

Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4

Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik

Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits

Tags: #clang, #mlir, #llvm

Differential Revision: https://reviews.llvm.org/D83087
2020-07-08 18:18:49 +02:00
Stanislav Mekhanoshin 64030099c3 SLP: honor requested max vector size merging PHIs
At the moment this place does not check the maximum size set
by TTI and just creates the largest possible vectors.

Differential Revision: https://reviews.llvm.org/D82227
2020-07-08 08:06:15 -07:00
Florian Hahn 04b85e2bcb Revert "[SLP] Make sure instructions are ordered when computing spill cost."
This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371

This reverts commit eb46137daa.
2020-07-07 23:15:01 +01:00
Ayal Zaks 7bf299c8d8 [LV] Vectorize without versioning-for-unit-stride under -Os/-Oz
If a loop is in a function marked OptSize, Loop Access Analysis should refrain
from generating runtime checks for unit strides that will version the loop.

If a loop is in a function marked OptSize and its vectorization is enabled, it
should be vectorized w/o any versioning.

Fixes PR46228.

Differential Revision: https://reviews.llvm.org/D81345
2020-07-07 15:04:21 +03:00
Jordan Rupprecht 10c82eecbc Revert "[LV] Enable the LoopVectorizer to create pointer inductions"
This reverts commit a8fe12065e.

It causes a crash when building gzip. Will post the detailed reduced test case to D81267.
2020-07-06 17:50:38 -07:00
Florian Hahn cff5739157 [LV] Pass dbgs() to verifyFunction call.
This is done in other places of the pass already and improves the output
on verification failure.
2020-07-06 15:09:20 +01:00
Florian Hahn eb46137daa [SLP] Make sure instructions are ordered when computing spill cost.
The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, e.g. on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D82444
2020-07-03 17:30:17 +01:00
Anna Welker a8fe12065e [LV] Enable the LoopVectorizer to create pointer inductions
This patch enables the LoopVectorizer to build a phi of pointer
type and provide the vector loads and stores with vector type
getelementptrs built from the pointer induction variable, which
produces much less instructions than the previous approach of
creating scalar getelementpointers and glue them together to a
vector.

Differential Revision: https://reviews.llvm.org/D81267
2020-07-02 11:39:28 +01:00
Sanjay Patel b6315aee5b [VectorCombine] try to form vector compare and binop to eliminate scalar ops
binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0

This is a larger pattern than the existing extractelement folds because we can't
reasonably vectorize the sub-patterns with constants based on cost model calcs
(it doesn't usually make sense to replace a single extracted scalar op with
constant operand with a vector op).
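
The equivalence behind the pattern above can be checked on a small model. In this sketch `Pred` is fixed to signed greater-than, the binop to a logical 'and', and the vector to four `int` lanes; all of these, and the function names, are assumptions for illustration, and the two extract indices are assumed to differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t NumLanes = 4;
using VecI = std::array<int, NumLanes>;
using VecB = std::array<bool, NumLanes>;

// Before: two extracts, two scalar compares, one scalar 'and'.
bool scalarForm(const VecI &X, std::size_t I0, int C0, std::size_t I1,
                int C1) {
  bool Cmp0 = X[I0] > C0;
  bool Cmp1 = X[I1] > C1;
  return Cmp0 && Cmp1;
}

// After: one vector compare, one shuffle, one vector 'and', one extract.
bool vectorForm(const VecI &X, std::size_t I0, int C0, std::size_t I1,
                int C1) {
  assert(I0 != I1 && I0 < NumLanes && I1 < NumLanes);
  VecI VecC{}; // lanes other than I0 and I1 are don't-care (0 here)
  VecC[I0] = C0;
  VecC[I1] = C1;

  VecB VCmp{}; // vcmp = cmp Pred X, VecC
  for (std::size_t I = 0; I < NumLanes; ++I)
    VCmp[I] = X[I] > VecC[I];

  VecB Shuf{}; // shuffle moves lane I1 into lane I0; other lanes don't-care
  Shuf[I0] = VCmp[I1];

  VecB And{}; // binop vNi1 vcmp, shuffle
  for (std::size_t I = 0; I < NumLanes; ++I)
    And[I] = VCmp[I] && Shuf[I];

  return And[I0]; // ext ..., Index0
}
```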

I salvaged as much of the existing logic as I could, but there might be better
ways to share and reduce code.

The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that
particular pattern in D59710, but that caused crashing and regressions.
This patch is more general, but hopefully safer.

The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5

Differential Revision: https://reviews.llvm.org/D82474
2020-06-29 10:38:52 -04:00
Sanjay Patel 3b95d8346d [VectorCombine] refactor - make helper function for extract to shuffle logic; NFC
Preliminary for D82474
2020-06-29 09:55:34 -04:00
Florian Hahn c0cdba727a [VPlan] Add & use VPValue for VPWidenGEPRecipe operands (NFC).
This patch adds VPValue version of the GEP's operands to
VPWidenGEPRecipe and uses them during code-generation.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80220
2020-06-26 20:59:17 +01:00
Guillaume Chatelet 1507fc1506 [Alignment][NFC] Migrate TTI::isLegalToVectorize{Load,Store}Chain to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82653
2020-06-26 14:14:27 +00:00
Guillaume Chatelet b66e33a689 [Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82577
2020-06-26 11:08:27 +00:00
Guillaume Chatelet fdc7c7fb87 [Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82573
2020-06-26 11:00:53 +00:00
Guillaume Chatelet 7e1f79c3de [Alignment][NFC] Migrate TTI::getMaskedMemoryOpCost to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82569
2020-06-26 10:14:16 +00:00
Simon Pilgrim 1b10c618e9 LoopVectorize.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-26 10:49:00 +01:00
dfukalov 7ddee0922f [NFCI][CostModel] Add const to Value*.
Summary:
Get back `const` partially lost in one of the recent changes.
Additionally specify explicit qualifiers in a few places.

Reviewers: samparker

Reviewed By: samparker

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82383
2020-06-24 23:16:08 +03:00
Florian Hahn 35bb9bfbb0 [SLP] Limit GEP lists based on width of index computation.
D68667 introduced a tighter limit to the number of GEPs to simplify
together. The limit was based on the vector element size of the pointer,
but the pointers themselves are not actually put in vectors.

IIUC we try to vectorize the index computations here, so we should base
the limit on the vector element size of the computation of the index.

This fixes the test regression on AArch64 and also restores the
vectorization for an important pattern in SPEC2006/464.h264ref on
AArch64 (@test_i16_extend). We get a large benefit from doing a single
load up front and then processing the index computations in vectors.

Note that we could probably even further improve the AArch64 codegen, if
we would do zexts to i32 instead of i64 for the sub operands and then do
a single vector sext on the result of the subtractions. AArch64 provides
dedicated vector instructions to do so. Sketch of proof in Alive:
https://alive2.llvm.org/ce/z/A4xYAB

Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel

Reviewed By: ABataev, spatel

Differential Revision: https://reviews.llvm.org/D82418
2020-06-24 19:56:53 +01:00
Sanjay Patel a0f967418f [VectorCombine] give invalid index value a name; NFC 2020-06-24 11:10:36 -04:00
Sanjay Patel 54143e2bd5 [VectorCombine] do not use magic number for undef mask element; NFC 2020-06-22 20:47:09 -04:00
Sanjay Patel 9934cc544c [VectorCombine] make helper function for shift-shuffle; NFC
This will probably be useful for other extract patterns.
2020-06-22 12:23:52 -04:00
Sanjay Patel 98c2f4eea5 [VectorCombine] add helper to replace uses and rename
The tests are regenerated to show a path that missed renaming,
but there should be no functional difference from this patch.
2020-06-22 09:58:49 -04:00
Sanjay Patel de65b356dc [VectorCombine] add/use pass-level IRBuilder
This saves creating/destroying a builder every time we
perform some transform.

The tests show instruction ordering diffs resulting from
always inserting at the root instruction now, but those
should be benign.
2020-06-22 09:01:29 -04:00
Sanjay Patel cce625f73d [VectorCombine] improve IR debugging by providing/salvaging value names
The tests are regenerated to show the diffs, but there should be no
functional change from this patch.
2020-06-22 08:35:47 -04:00
Sanjay Patel 6bdd531af5 [VectorCombine] create class for pass to hold analyses, etc; NFC
This doesn't change anything currently, but it would make sense
to create a class-level IRBuilder instead of recreating that
everywhere. As we expand to more optimizations, we will probably
also want to hold things like the DataLayout or other constant
refs in here too.
2020-06-21 16:07:33 -04:00
Sanjay Patel 741e20f3d6 [VectorCombine] fix assert for type of compare operand
As shown in the post-commit comment for D81661 - we need to
loosen the type assertion to allow scalarization of a compare
for vectors of pointers.
2020-06-20 15:20:17 -04:00
Sanjay Patel 216a37bb46 [VectorCombine] refactor extract-extract logic; NFCI 2020-06-19 14:52:27 -04:00
Sanjay Patel 6d864097a2 [VectorCombine] fix crash while transforming constants
This is a variation of the proposal in D82049 with an extra test.
2020-06-19 12:30:32 -04:00
Sanjay Patel 46a285ad9e [IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC
The predicate can always be used to distinguish between icmp and fcmp,
so we don't need to keep repeating this check in the callers.
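
The shape of such a wrapper can be sketched without LLVM headers. Everything below is a hypothetical stand-in: the reduced `Pred` enumeration and the string-returning builders only model the dispatch and are not the IRBuilder API.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced predicate set; the real enumeration is much larger.
enum class Pred { FCmpOEQ, FCmpOLT, ICmpEQ, ICmpSLT };

bool isFPPredicate(Pred P) {
  return P == Pred::FCmpOEQ || P == Pred::FCmpOLT;
}

// Stand-ins for the two specific builders; they only name what they'd emit.
std::string createICmp(Pred) { return "icmp"; }
std::string createFCmp(Pred) { return "fcmp"; }

// The generic wrapper: callers no longer repeat the FP-vs-integer check.
std::string createCmp(Pred P) {
  return isFPPredicate(P) ? createFCmp(P) : createICmp(P);
}
```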
2020-06-18 15:47:06 -04:00
Simon Pilgrim a5f1f9c9b8 ScalarEvolution.h - reduce LoopInfo.h include to forward declarations. NFC.
Move ScalarEvolution::forgetLoopDispositions implementation to ScalarEvolution.cpp to remove the dependency.

Add implicit header dependency to source files where necessary.
2020-06-17 15:48:23 +01:00
Sjoerd Meijer c1034d044a Follow up of rGe345d547a0d5, and attempt to pacify buildbot:
"error: 'get' is deprecated: The base class version of get with the scalable
argument defaulted to false is deprecated."

Changed VectorType::get() -> FixedVectorType::get().
2020-06-17 13:24:09 +01:00
Sjoerd Meijer e345d547a0 Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops"
Fixed ARM regression test.

Please see the original commit message rG47650451738c for details.
2020-06-17 13:12:15 +01:00
Sjoerd Meijer d4e183f686 Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops"
This reverts commit 4765045173
while I investigate the build bot failures.
2020-06-17 10:09:54 +01:00
Sjoerd Meijer 4765045173 [LV] Emit @llvm.get.active.mask for tail-folded loops
This emits the new IR intrinsic @llvm.get.active.lane.mask for tail-folded
vectorised loops if the intrinsic is supported by the backend, which is checked
by querying the TargetTransform hook emitGetActiveLaneMask.

This intrinsic creates a mask representing active and inactive vector lanes,
which is used by the masked load/store instructions that are created for
tail-folded loops. The semantics of @llvm.get.active.lane.mask are described
in the LangRef:

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics

This intrinsic is also used to provide a hint to the backend. That is, the
second argument of the intrinsic represents the back-edge taken count of the
loop. For MVE, for example, we use that to set up tail-predication, which is a
new form of predication in MVE for vector loops that implicitly predicates the
last vector loop iteration by implicitly setting active/inactive lanes, i.e.
the tail loop is predicated. In order to set up a tail-predicated vector loop,
we need to know the number of data elements processed by the vector loop, which
corresponds to the trip count of the scalar loop, which we can now reconstruct
using @llvm.get.active.lane.mask.

Differential Revision: https://reviews.llvm.org/D79100
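
A minimal scalar model of the lane mask described in this message, for
illustration only. It assumes that lane I is active when Base + I does not
exceed the back-edge taken count, which is how this message describes the
second argument; the LangRef page linked above is the authority on the exact
comparison. The names activeLaneMask and tripCount are hypothetical, and
integer overflow of Base + I is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of the active lane mask (an illustration, not LLVM code).
// Assumption: lane I is active when Base + I <= BackedgeTakenCount.
std::vector<bool> activeLaneMask(uint64_t Base, uint64_t BackedgeTakenCount,
                                 unsigned VF) {
  assert(VF > 0 && "need at least one lane");
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = Base + I <= BackedgeTakenCount;
  return Mask;
}

// Trip count of the scalar loop, reconstructed from the back-edge taken count.
uint64_t tripCount(uint64_t BackedgeTakenCount) {
  return BackedgeTakenCount + 1;
}
```

With VF = 4 and a scalar trip count of 10 (back-edge taken count 9), the last
vector iteration starts at Base = 8, so only the first two lanes are active.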
2020-06-17 09:53:58 +01:00
Christopher Tetreault ff628f5f5e [SVE] Eliminate calls to default-false VectorType::get() from Vectorize
Reviewers: efriedma, fhahn, spatel, sdesmalen, kmclaughlin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81521
2020-06-16 12:50:13 -07:00
Sanjay Patel ed67f5e7ab [VectorCombine] scalarize compares with insertelement operand(s)
Generalize scalarization (recently enhanced with D80885)
to allow compares as well as binops.
Similar to binops, we are avoiding scalarization of a loaded
value because that could avoid a register transfer in codegen.
This requires 1 extra predicate that I am aware of: we do not
want to scalarize the condition value of a vector select. That
might also invert a transform that we do in instcombine that
prefers a vector condition operand for a vector select.

I think this is the final step in solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

Differential Revision: https://reviews.llvm.org/D81661
2020-06-16 13:48:10 -04:00
Sam Parker 2596da3174 [CostModel] getCFInstrCost in getUserCost.
Have BasicTTI call the base implementation so that both agree on the
default behaviour, with the default being a cost of '1'. This has
required an X86 specific implementation as it seems to be very
reliant on those instructions being free. Changes are also made to
AMDGPU so that their implementations distinguish between cost kinds,
so that the unrolling isn't affected. PowerPC also has its own
implementation to prevent changes to the reg-usage vectorizer test.

The cost model test changes now reflect that ret instructions are not
generally free.

Differential Revision: https://reviews.llvm.org/D79164
2020-06-15 09:28:46 +01:00
Roman Lebedev 7aeb41b3c8 [NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform 2020-06-12 23:10:53 +03:00
Florian Hahn 3a846d4d92 [VPlan] Reject loops without computable backedge taken counts
getOrCreateTripCount is used to generate code for the outer loop, but it
requires a computable backedge taken count. Check that in the VPlan
native path.

Reviewers: Ayal, gilr, rengolin, sguggill

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D81088
2020-06-12 10:31:18 +01:00
Sanjay Patel 039ff29ef6 [VectorCombine] remove unused parameters; NFC 2020-06-11 19:15:03 -04:00
Simon Pilgrim 5dc4e7c2b9 [VectorCombine] scalarizeBinop - support an all-constant src vector operand
scalarizeBinop currently folds

  vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index))
  ->
  inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index)

This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense:

  vec_bo((inselt VecC0, V0, Index), VecC1)
  ->
  inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index)

Fixes PR42174

Differential Revision: https://reviews.llvm.org/D80885
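
The second fold can be checked on plain arrays. This is a sketch under stated
assumptions, not the VectorCombine code: integer addition stands in for the
binop, the vectors are fixed at 4 lanes, V1 in the fold above is read as the
constant operand VecC1, and all helper names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;

// inselt: returns a copy of V with lane Index replaced by Scalar.
Vec4 insertElt(Vec4 V, int Scalar, std::size_t Index) {
  assert(Index < V.size() && "lane index out of range");
  V[Index] = Scalar;
  return V;
}

// vec_bo: lane-wise addition, standing in for any binop.
Vec4 vecAdd(const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Before the fold: vec_bo((inselt VecC0, V0, Index), VecC1)
Vec4 beforeFold(const Vec4 &VecC0, int V0, const Vec4 &VecC1,
                std::size_t Index) {
  return vecAdd(insertElt(VecC0, V0, Index), VecC1);
}

// After the fold:
// inselt(vec_bo(VecC0, VecC1), scl_bo(V0, extractelt(VecC1, Index)), Index)
Vec4 afterFold(const Vec4 &VecC0, int V0, const Vec4 &VecC1,
               std::size_t Index) {
  return insertElt(vecAdd(VecC0, VecC1), V0 + VecC1[Index], Index);
}
```

Both forms produce the same vector; the folded form does the variable part of
the work with one scalar operation, and vec_bo(VecC0, VecC1) has only constant
operands.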
2020-06-09 19:02:05 +01:00
Benjamin Kramer 3badd17b69 SmallPtrSet::find -> SmallPtrSet::count
The latter is more readable and more efficient. While there, clean up
some double lookups. NFCI.
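
A sketch of the two patterns, using std::unordered_set as a stand-in so the
example compiles without LLVM headers; that substitution is an assumption made
for illustration, and the efficiency claim above is about SmallPtrSet, not the
standard container. The helper names are hypothetical.

```cpp
#include <cassert>
#include <unordered_set>

using PtrSet = std::unordered_set<const void *>;

// Membership test written with find(): compares an iterator against end().
bool containsViaFind(const PtrSet &S, const void *P) {
  return S.find(P) != S.end();
}

// The same test written with count(): reads as a yes/no question.
bool containsViaCount(const PtrSet &S, const void *P) {
  return S.count(P) != 0;
}

// Double lookup: one search to test membership, a second one to insert.
bool insertIfNewDoubleLookup(PtrSet &S, const void *P) {
  if (S.count(P))
    return false;
  S.insert(P);
  return true;
}

// Single lookup: insert() already reports whether the element was new.
bool insertIfNewSingleLookup(PtrSet &S, const void *P) {
  return S.insert(P).second;
}
```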
2020-06-07 22:38:08 +02:00
Simon Pilgrim 5006e551d3 LoopAnalysisManager.h - reduce includes to forward declarations. NFC.
Move implicit include dependencies down to header/source files.
2020-06-06 14:06:46 +01:00
Florian Hahn 211596c94e [VPlan] Support extracting lanes for defs managed in VPTransformState.
Currently extracting a lane for a VPValue def is not supported, if it is
managed directly by VPTransformState (e.g. because it is created by a
VPInstruction or an external VPValue def).

For now, simply extract the requested lane. In the future, we should
also cache the extracted scalar values, similar to LV.

Reviewers: Ayal, rengolin, gilr, SjoerdMeijer

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D80787
2020-06-03 12:14:16 +01:00
Florian Hahn b446ec56a2 [LV] Make sure the MaxVF is a power-of-2 by rounding down.
LV currently only supports power of 2 vectorization factors, which has
been made explicit with the assertion added in
840450549c.

However, if the widest type is not a power-of-2 the computed MaxVF won't
be a power-of-2 either. This patch updates computeFeasibleMaxVF to
ensure the returned value is a power-of-2 by rounding down to the
nearest power-of-2.

Fixes PR46139.

Reviewers: Ayal, gilr, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D80870
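
Rounding down to the nearest power of 2 can be sketched as below. This is a
standalone illustration, not the code in computeFeasibleMaxVF; the helper name
roundDownToPowerOf2 is hypothetical.

```cpp
#include <cassert>

// Largest power of 2 that is less than or equal to N (requires N > 0).
unsigned roundDownToPowerOf2(unsigned N) {
  assert(N > 0 && "zero has no power-of-2 floor");
  unsigned P = 1;
  // Compare against N / 2 so that P never overflows when doubled.
  while (P <= N / 2)
    P *= 2;
  return P;
}
```

For instance, a computed MaxVF of 6 would be rounded down to 4, while a value
that is already a power of 2, such as 8, is returned unchanged.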
2020-06-02 10:40:49 +01:00
Valery N Dmitriev a45688a72c [SLP] Apply external to vectorizable tree users cost adjustment for
relevant aggregate build instructions only (UserCost).
Users are detected with the findBuildAggregate routine, and the catch is
that subsequent SLP vectorization may end up vectorizing the entire list
in smaller chunks. The cost adjustment is then applied to individual
chunks, and these adjustments have to be smaller than the
entire aggregate build cost.

Differential Revision: https://reviews.llvm.org/D80773
2020-05-29 15:37:41 -07:00
Christopher Tetreault d2befc6633 [SVE] Eliminate calls to default-false VectorType::get() from Vectorize
Reviewers: efriedma, c-rhodes, david-arm, fhahn

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80339
2020-05-29 11:31:24 -07:00
Florian Hahn 9b507b2127 [LAA] We only need pointer checks if there are non-zero checks (NFC).
If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969

Note: This is a recommit of 259abfc7cb,
with some suggested renaming.
2020-05-27 12:47:36 +01:00
Florian Hahn 2d0389821e Revert "[LAA] We only need pointer checks if there are non-zero checks (NFC)."
This reverts commit 259abfc7cb.

Reverting this, as I missed a case where we return without setting
RtCheck.Need.
2020-05-27 12:39:45 +01:00
Florian Hahn 259abfc7cb [LAA] We only need pointer checks if there are non-zero checks (NFC).
If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969
2020-05-27 12:37:20 +01:00
Simon Pilgrim 35963f6d85 VPlanValue.h - reduce unnecessary includes to forward declarations. NFC. 2020-05-27 11:26:14 +01:00
Ayal Zaks 840450549c [LV] Clamp MaxVF to power of 2.
If a loop has a constant trip count known to be a multiple of MaxVF (times user
UF), LV infers that no tail will be generated for any chosen VF. This relies on
the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power
of 2. Make sure the latter holds, in particular when MaxVF is set by a memory
dependence distance which may not be a power of 2.

Differential Revision: https://reviews.llvm.org/D80491
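
The divisibility argument can be checked with a small sketch. It assumes a
user UF of 1 for simplicity, and the helper name anyPowerOf2VFLeavesTail is
hypothetical; this is not the vectorizer's code.

```cpp
#include <cassert>

// Returns true if some power-of-2 VF in [1, MaxVF] does not divide TripCount,
// i.e. choosing that VF would leave a scalar tail. Assumes UF == 1.
bool anyPowerOf2VFLeavesTail(unsigned TripCount, unsigned MaxVF) {
  assert(MaxVF > 0 && "need at least VF = 1");
  // VF <= MaxVF / 2 before doubling keeps VF from overflowing.
  for (unsigned VF = 1;; VF *= 2) {
    if (TripCount % VF != 0)
      return true;
    if (VF > MaxVF / 2)
      return false;
  }
}
```

With MaxVF = 8, a trip count of 16 is a multiple of MaxVF and of every
power-of-2 VF up to 8, so no tail is needed. With MaxVF = 6 (for example, from
a dependence distance), a trip count of 18 is a multiple of MaxVF, yet VF = 4
leaves a tail of 2 iterations, which is why the inference needs MaxVF to be a
power of 2.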
2020-05-25 11:24:33 +03:00
Florian Hahn 0deab8a54f [LV] Either get invariant condition OR vector condition.
Currently we unconditionally get the first lane of the condition
operand, even if we later use the full vector condition. This can result
in some unnecessary instructions being generated.

Suggested as follow-up in D80219.
2020-05-24 17:16:42 +01:00
Sanjay Patel 7eed772a27 [PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.
2020-05-24 09:19:47 -04:00
Florian Hahn 15224408f0 [VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC).
VPWidenSelectRecipe already contains a VPUser, but it is not used. This
patch updates the code related to VPWidenSelectRecipe to use VPUser for
its operands.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80219
2020-05-24 13:58:08 +01:00
Sanjay Patel 024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00
Anh Tuyen Tran 13bf6039c9 Title: [LV] Handle Fold-Tail of loops with vectorization factor equal to 1
Summary:
When handling loops whose VF is 1, fold-tail vectorization sets the
backedge taken count of the original loop with a vector of a single
element. This causes type-mismatch during instruction generation.

The purpose of this patch is to address the case of VF==1.

Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin)

Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn)

Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits)

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D79976
2020-05-22 13:30:56 +00:00
Sanjay Patel 21f7cf4057 [SLP] fix verification check for valid IR
This is a fix for PR45965 - https://bugs.llvm.org/show_bug.cgi?id=45965 -
which was left out of D80106 because of a test failure.

SLP does its own mini-CSE after potentially creating redundant instructions,
so we need to wait for that to complete before running the verifier.
Otherwise, we will see a test failure for
test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll (not changed here)
because a phi temporarily has identical but different incoming values for
the same incoming block.

A related, but independent, test that would have been altered here was
fixed with:
rG880df55

The test was escaping verification in SLP without this change because we
were not running verifyFunction() unless SLP actually changed the IR.

Differential Revision: https://reviews.llvm.org/D80401
2020-05-22 09:15:27 -04:00
Dinar Temirbulatov df3b95bc0a [SLP][NFC] PR45269 getVectorElementSize() is slow
The algorithm inside getVectorElementSize() has almost O(x^2) complexity.
When, for example, we compile MultiSource/Applications/ClamAV/shared_sha256.c
with 1k instructions inside the sha256_transform() function, that results in
almost 800k iterations. The following change uses a map to improve the
algorithm to linear complexity.

Differential Revision: https://reviews.llvm.org/D80241
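
The general technique, caching per-node results in a map so that shared
operands are not re-walked, can be sketched as follows. Everything here is
hypothetical and for illustration only: the Node type, the "widest reachable
width" query, and the function names are not taken from the SLP vectorizer.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical expression node: its own bit width and its operand indices.
struct Node {
  unsigned Width;
  std::vector<int> Operands;
};

// Widest bit width reachable from node Id. Results are cached in Cache so
// each node is evaluated once; Evaluations counts the uncached evaluations.
unsigned widestReachable(const std::vector<Node> &Graph, int Id,
                         std::unordered_map<int, unsigned> &Cache,
                         unsigned &Evaluations) {
  auto It = Cache.find(Id);
  if (It != Cache.end())
    return It->second;
  ++Evaluations;
  unsigned Result = Graph[Id].Width;
  for (int Op : Graph[Id].Operands)
    Result = std::max(Result, widestReachable(Graph, Op, Cache, Evaluations));
  Cache[Id] = Result;
  return Result;
}

// Builds a chain in which every node uses the previous node twice, so an
// uncached walk would revisit shared operands exponentially often.
std::vector<Node> makeSharedChain(int N) {
  std::vector<Node> Graph;
  Graph.push_back({32, {}});
  for (int I = 1; I < N; ++I)
    Graph.push_back({8, {I - 1, I - 1}});
  return Graph;
}
```

On a 30-node chain built this way, the cached walk evaluates each node exactly
once, so the evaluation count equals the node count.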
2020-05-21 17:26:50 +02:00
Sam Parker 8cc911fa5b [NFCI][CostModel] Refactor getIntrinsicInstrCost
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.

Differential Revision: https://reviews.llvm.org/D79941
2020-05-20 11:59:08 +01:00
Florian Hahn bcbd26bfe6 [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-05-20 10:53:40 +01:00
Florian Hahn 7cefd1b4cd [LV] Remove duplicated return stmt (NFC). 2020-05-19 17:20:50 +01:00
Florian Hahn cff9399f6b [VPlan] Fix comment for User in VPWidenSelectRecipe (NFC).
The comment was referring to the arguments of the call, but the recipe
widens a select.
2020-05-19 15:31:39 +01:00
Florian Hahn f828d75b46 [VPlan] Add & use VPValue operands for VPReplicateRecipe (NFC).
This patch adds VPValue versions of the instruction operands to
VPReplicateRecipe and uses them during code-generation.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80114
2020-05-19 15:12:17 +01:00