Commit Graph

2332 Commits

Author SHA1 Message Date
Florian Hahn 259abfc7cb [LAA] We only need pointer checks if there are non-zero checks (NFC).
If it turns out that we can do runtime checks, but there are no
runtime-checks to generate, set RtCheck.Need to false.

This can happen if we can prove statically that the pointers passed in
to canCheckPtrAtRT do not alias. This should not change any results, but
allows us to skip some work and assert that runtime checks are
generated, if LAA indicates that runtime checks are required.

Reviewers: anemet, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79969
2020-05-27 12:37:20 +01:00
Simon Pilgrim 35963f6d85 VPlanValue.h - reduce unnecessary includes to forward declarations. NFC. 2020-05-27 11:26:14 +01:00
Ayal Zaks 840450549c [LV] Clamp MaxVF to power of 2.
If a loop has a constant trip count known to be a multiple of MaxVF (times user
UF), LV infers that no tail will be generated for any chosen VF. This relies on
the chosen VF's being powers of 2 bound by MaxVF, and assumes MaxVF is a power
of 2. Make sure the latter holds, in particular when MaxVF is set by a memory
dependence distance which may not be a power of 2.

Differential Revision: https://reviews.llvm.org/D80491
2020-05-25 11:24:33 +03:00
Florian Hahn 0deab8a54f [LV] Either get invariant condition OR vector condition.
Currently we unconditionally get the first lane of the condition
operand, even if we later use the full vector condition. This can result
in some unnecessary instructions being generated.

Suggested as follow-up in D80219.
2020-05-24 17:16:42 +01:00
Sanjay Patel 7eed772a27 [PatternMatch] abbreviate vector inst matchers; NFC
Readability is not reduced with these opcodes/match lines,
so reduce odds of awkward wrapping from 80-col limit.
2020-05-24 09:19:47 -04:00
Florian Hahn 15224408f0 [VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC).
VPWidenSelectRecipe already contains a VPUser, but it is not used. This
patch updates the code related to VPWidenSelectRecipe to use VPUser for
its operands.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80219
2020-05-24 13:58:08 +01:00
Sanjay Patel 024098ae53 [VectorCombine] set preserve alias analysis
As noted in D80236, moving the pass in the pipeline exposed this
shortcoming. Extra work to recalculate the alias results showed
up as a compile-time slowdown.
2020-05-22 16:25:16 -04:00
Anh Tuyen Tran 13bf6039c9 Title: [LV] Handle Fold-Tail of loops with vectorizarion factor equal to 1
Summary:
When handling loops whose VF is 1, fold-tail vectorization sets the
backedge taken count of the original loop with a vector of a single
element. This causes type-mismatch during instruction generartion.

The purpose of this patch is toto address the case of VF==1.

Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin)

Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn)

Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits)

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D79976
2020-05-22 13:30:56 +00:00
Sanjay Patel 21f7cf4057 [SLP] fix verification check for valid IR
This is a fix for PR45965 - https://bugs.llvm.org/show_bug.cgi?id=45965 -
which was left out of D80106 because of a test failure.

SLP does its own mini-CSE after potentially creating redundant instructions,
so we need to wait for that to complete before running the verifier.
Otherwise, we will see a test failure for
test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll (not changed here)
because a phi temporarily has identical but different incoming values for
the same incoming block.

A related, but independent, test that would have been altered here was
fixed with:
rG880df55

The test was escaping verification in SLP without this change because we
were not running verifyFunction() unless SLP actually changed the IR.

Differential Revision: https://reviews.llvm.org/D80401
2020-05-22 09:15:27 -04:00
Dinar Temirbulatov df3b95bc0a [SLP][NFC] PR45269 getVectorElementSize() is slow
The algorithm inside getVectorElementSize() is almost O(x^2) complexity and
when, for example, we compile MultiSource/Applications/ClamAV/shared_sha256.c
with 1k instructions inside sha256_transform() function that resulted in almost
~800k iterations. The following change improves the algorithm with the map to
a liner complexity.

Differential Revision: https://reviews.llvm.org/D80241
2020-05-21 17:26:50 +02:00
Sam Parker 8cc911fa5b [NFCI][CostModel] Refactor getIntrinsicInstrCost
Combine the two API calls into one by introducing a structure to hold
the relevant data. This has the added benefit of moving the boiler
plate code for arguments and flags, into the constructors. This is
intended to be a non-functional change, but the complicated web of
logic involved here makes it very hard to guarantee.

Differential Revision: https://reviews.llvm.org/D79941
2020-05-20 11:59:08 +01:00
Florian Hahn bcbd26bfe6 [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

This patch was originally committed as b8a3c34eee, but broke the
modules build, as LoopAccessAnalysis was using the Expander.

The code-gen part of LAA was moved to lib/Transforms recently, so this
patch can be landed again.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-05-20 10:53:40 +01:00
Florian Hahn 7cefd1b4cd [LV] Remove duplicated return stmt (NFC). 2020-05-19 17:20:50 +01:00
Florian Hahn cff9399f6b [VPlan] Fix comment for User in VPWidenSelectRecipe (NFC).
The comment was referring the arguments of the call, but the recipe
widens a select.
2020-05-19 15:31:39 +01:00
Florian Hahn f828d75b46 [VPlan] Add & use VPValue operands for VPReplicateRecipe (NFC).
This patch adds VPValue version of the instruction operands to
VPReplicateRecipe and uses them during code-generation.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D80114
2020-05-19 15:12:17 +01:00
Florian Hahn 66ad107452 [VPlan] Remove unique_ptr from VPBranchOnRecipeMask (NFC).
We can remove a dynamic memory allocation, by checking the number of
operands: no operands = all true, 1 operand = mask.

Reviewers: Ayal, gilr, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D80110
2020-05-19 15:01:37 +01:00
Eli Friedman 27b4e6931d [NFC] Replace MaybeAlign with Align in TargetTransformInfo. 2020-05-18 19:25:49 -07:00
Ayal Zaks 682e739638 [LV] Fix FoldTail under user VF and UF
LV considers an internally computed MaxVF to decide if a constant trip-count is
a multiple of any subsequently chosen VF, and conclude that no scalar remainder
iterations (tail) will be left for Fold Tail to handle. If an external VF is
provided via -force-vector-width, it must be considered instead of the internal
MaxVF.
If an external UF is provided via -force-vector-interleave, it too must be
considered in addition to MaxVF or user VF.

Fixes PR45679.

Differential Revision: https://reviews.llvm.org/D80085
2020-05-19 01:32:25 +03:00
Craig Topper c9f63297e2 Fix several places that were calling verifyFunction or verifyModule without checking the return value.
verifyFunction/verifyModule don't assert or error internally. They
also don't print anything if you don't pass a raw_ostream to them.
So the caller needs to check the result and ideally pass a stream
to get the messages. Otherwise they're just really expensive no-ops.

I've filed PR45965 for another instance in SLPVectorizer
that causes a lit test failure.

Differential Revision: https://reviews.llvm.org/D80106
2020-05-18 13:28:46 -07:00
Volkan Keles 63081dc6f6 LoadStoreVectorizer: Match nested adds to prove vectorization is safe
If both OpA and OpB is an add with NSW/NUW and with the same LHS operand,
we can guarantee that the transformation is safe if we can prove that OpA
won't overflow when IdxDiff added to the RHS of OpA.

Review: https://reviews.llvm.org/D79817
2020-05-18 12:13:01 -07:00
Nikita Popov 52e98f620c [Alignment] Remove unnecessary getValueOrABITypeAlignment calls (NFC)
Now that load/store alignment is required, we no longer need most
of them. Also switch the getLoadStoreAlignment() helper to return
Align instead of MaybeAlign.
2020-05-17 22:19:15 +02:00
Sanjay Patel 81e9ede3a2 [VectorCombine] forward walk through instructions to improve chaining of transforms
This is split off from D79799 - where I was proposing to fully iterate
over a function until there are no more transforms. I suspect we are
still going to want to do something like that eventually.

But we can achieve the same gains much more efficiently on the current
set of regression tests just by reversing the order that we visit the
instructions.

This may also reduce the motivation for D79078, but we are still not
getting the optimal pattern for a reduction.
2020-05-16 13:08:01 -04:00
Florian Hahn 4c8285c750 [VPlan] Move emission of \\l\"+\n to dumpBasicBlock (NFC).
The patch standardizes printing of VPRecipes a bit, by hoisting out the
common emission of \\l\"+\n. It simplifies the code and is also a first
step towards untangling printing from DOT format output, with the goal
of making the DOT output optional and to provide a more concise debug
output if DOT output is disabled.

Reviewers: gilr, Ayal, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D78883
2020-05-14 13:07:59 +01:00
Alina Sbirlea bd541b217f [NewPassManager] Add assertions when getting statefull cached analysis.
Summary:
Analyses that are statefull should not be retrieved through a proxy from
an outer IR unit, as these analyses are only invalidated at the end of
the inner IR unit manager.
This patch disallows getting the outer manager and provides an API to
get a cached analysis through the proxy. If the analysis is not
stateless, the call to getCachedResult will assert.

Reviewers: chandlerc

Subscribers: mehdi_amini, eraman, hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72893
2020-05-13 12:38:38 -07:00
Sjoerd Meijer 9529597cf4 Recommit #2: "[LV] Induction Variable does not remain scalar under tail-folding."
This was reverted because of a miscompilation. At closer inspection, the
problem was actually visible in a changed llvm regression test too. This
one-line follow up fix/recommit will splat the IV, which is what we are trying
to avoid if unnecessary in general, if tail-folding is requested even if all
users are scalar instructions after vectorisation. Because with tail-folding,
the splat IV will be used by the predicate of the masked loads/stores
instructions. The previous version omitted this, which caused the
miscompilation. The original commit message was:

If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.

Thanks to Ayal Zaks for the direction how to fix this.
2020-05-13 13:50:09 +01:00
Sanjay Patel 5f730b645d [VectorCombine] account for extra uses in scalarization cost
Follow-up to D79452.
Mimics the extra use cost formula for the inverse transform with extracts.
2020-05-11 15:20:57 -04:00
Florian Hahn 8528186b9b [LAA] Move runtime-check generation to Transforms/Utils/loopUtils (NFC)
Currently LAA's uses of ScalarEvolutionExpander blocks moving the
expander from Analysis to Transforms. Conceptually the expander does not
fit into Analysis (it is only used for code generation) and
runtime-check generation also seems to be better suited as a
transformation utility.

Reviewers: Ayal, anemet

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78460
2020-05-10 17:39:26 +01:00
Florian Hahn 96c63f544f Recommit "[LAA] Remove one addRuntimeChecks function (NFC)."
The failing assertion has been fixed and the problematic test case has
been added.

This reverts the revert commit fc44617f28.
2020-05-10 15:19:57 +01:00
Florian Hahn fc44617f28 Revert "[LAA] Remove one addRuntimeChecks function (NFC)."
This reverts commit c28114c8ff.

This causes some bots to fail:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/30596/steps/build%20android%2Faarch64/logs/stdio
2020-05-10 13:28:00 +01:00
Florian Hahn c28114c8ff [LAA] Remove one addRuntimeChecks function (NFC).
In order to reduce the API surface area (preparation for D78460), remove
a addRuntimeChecks() function and do the additional check in the single
caller.

Reviewers: Ayal, anemet

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D79679
2020-05-10 12:48:55 +01:00
Sanjay Patel 0d2a0b44c8 [VectorCombine] scalarize binop of inserted elements into vector constants
As with the extractelement patterns that are currently in vector-combine,
there are going to be several possible variations on this theme. This
should be the clearest, simplest example.

Scalarization is the right direction for target-independent canonicalization,
and InstCombine has some of those folds already, but it doesn't do this.
I proposed a similar transform in D50992. Here in vector-combine, we can
check the cost model to be sure it's profitable, so there should be less risk.

Differential Revision: https://reviews.llvm.org/D79452
2020-05-08 16:31:12 -04:00
Benjamin Kramer f936457f80 Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding.""
This reverts commit ae45b4dbe7. It
causes miscompilations, test case on the mailing list.
2020-05-08 14:49:10 +02:00
Sanjay Patel 02051c7f3a [SLP] add another bailout for load-combine patterns (2nd try)
The original patch (rG86dfbc676ebe) exposed an existing bug:
we could wrongly cast a constant expression to BinaryOperator
because the pattern matching allows that. This adds a check
for that case, and there's a reduced test case to verify no
crashing.

Original commit message:

This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.

The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.

That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:

  vpmovzxwd     (%rsi), %xmm0
  vmovdqu       %xmm0, (%rdi)

rather than:

  movzwl        (%rsi), %eax
  movl  %eax, (%rdi)
  ...

In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:

  movzbl (%rdi), %eax
  vmovd %eax, %xmm0
  movzbl 1(%rdi), %eax
  vmovd %eax, %xmm1
  movzbl 2(%rdi), %eax
  vpinsrb $4, 4(%rdi), %xmm0, %xmm0
  vpinsrb $8, 8(%rdi), %xmm0, %xmm0
  vpinsrb $12, 12(%rdi), %xmm0, %xmm0
  vmovd %eax, %xmm2
  movzbl 3(%rdi), %eax
  vpinsrb $1, 5(%rdi), %xmm1, %xmm1
  vpinsrb $2, 9(%rdi), %xmm1, %xmm1
  vpinsrb $3, 13(%rdi), %xmm1, %xmm1
  vpslld $24, %xmm0, %xmm0
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpslld $16, %xmm1, %xmm1
  vpor %xmm0, %xmm1, %xmm0
  vpinsrb $1, 6(%rdi), %xmm2, %xmm1
  vmovd %eax, %xmm2
  vpinsrb $2, 10(%rdi), %xmm1, %xmm1
  vpinsrb $3, 14(%rdi), %xmm1, %xmm1
  vpinsrb $1, 7(%rdi), %xmm2, %xmm2
  vpinsrb $2, 11(%rdi), %xmm2, %xmm2
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpinsrb $3, 15(%rdi), %xmm2, %xmm2
  vpslld $8, %xmm1, %xmm1
  vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
  vpor %xmm2, %xmm1, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vmovdqu %xmm0, (%rsi)

  movl  (%rdi), %eax
  movl  4(%rdi), %ecx
  movl  8(%rdi), %edx
  movbel        %eax, (%rsi)
  movbel        %ecx, 4(%rsi)
  movl  12(%rdi), %ecx
  movbel        %edx, 8(%rsi)
  movbel        %ecx, 12(%rsi)

Differential Revision: https://reviews.llvm.org/D78997
2020-05-07 15:04:37 -04:00
Hans Wennborg c54c6ee1a7 Revert "[SLP] add another bailout for load-combine patterns"
It caused asserts building Chromium, see discussion on
https://reviews.llvm.org/D78997

This reverts commit 86dfbc676e.
2020-05-07 16:31:52 +02:00
Sjoerd Meijer 3bbc71d6c9 [LV] Fix typo in variable name. NFC. 2020-05-07 13:53:44 +01:00
Sjoerd Meijer ae45b4dbe7 Recommit "[LV] Induction Variable does not remain scalar under tail-folding."
With 3 llvm regr tests fixed/updated that I had missed.
2020-05-07 11:52:20 +01:00
Sjoerd Meijer 20d67ffeae Revert "[LV] Induction Variable does not remain scalar under tail-folding."
This reverts commit 617aa64c84.

while I investigate buildbot failures.
2020-05-07 09:29:56 +01:00
Sjoerd Meijer 617aa64c84 [LV] Induction Variable does not remain scalar under tail-folding.
If tail-folding of the scalar remainder loop is applied, the primary induction
variable is splat to a vector and used by the masked load/store vector
instructions, thus the IV does not remain scalar. Because we now mark
that the IV does not remain scalar for these cases, we don't emit the vector IV
if it is not used. Thus, the vectoriser produces less dead code.

Thanks to Ayal Zaks for the direction how to fix this.

Differential Revision: https://reviews.llvm.org/D78911
2020-05-07 09:15:23 +01:00
Sanjay Patel 86dfbc676e [SLP] add another bailout for load-combine patterns
This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.

The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.

That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:

  vpmovzxwd	(%rsi), %xmm0
  vmovdqu	%xmm0, (%rdi)

rather than:

  movzwl	(%rsi), %eax
  movl	%eax, (%rdi)
  ...

In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:

  movzbl (%rdi), %eax
  vmovd %eax, %xmm0
  movzbl 1(%rdi), %eax
  vmovd %eax, %xmm1
  movzbl 2(%rdi), %eax
  vpinsrb $4, 4(%rdi), %xmm0, %xmm0
  vpinsrb $8, 8(%rdi), %xmm0, %xmm0
  vpinsrb $12, 12(%rdi), %xmm0, %xmm0
  vmovd %eax, %xmm2
  movzbl 3(%rdi), %eax
  vpinsrb $1, 5(%rdi), %xmm1, %xmm1
  vpinsrb $2, 9(%rdi), %xmm1, %xmm1
  vpinsrb $3, 13(%rdi), %xmm1, %xmm1
  vpslld $24, %xmm0, %xmm0
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpslld $16, %xmm1, %xmm1
  vpor %xmm0, %xmm1, %xmm0
  vpinsrb $1, 6(%rdi), %xmm2, %xmm1
  vmovd %eax, %xmm2
  vpinsrb $2, 10(%rdi), %xmm1, %xmm1
  vpinsrb $3, 14(%rdi), %xmm1, %xmm1
  vpinsrb $1, 7(%rdi), %xmm2, %xmm2
  vpinsrb $2, 11(%rdi), %xmm2, %xmm2
  vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
  vpinsrb $3, 15(%rdi), %xmm2, %xmm2
  vpslld $8, %xmm1, %xmm1
  vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
  vpor %xmm2, %xmm1, %xmm1
  vpor %xmm1, %xmm0, %xmm0
  vmovdqu %xmm0, (%rsi)

  movl	(%rdi), %eax
  movl	4(%rdi), %ecx
  movl	8(%rdi), %edx
  movbel	%eax, (%rsi)
  movbel	%ecx, 4(%rsi)
  movl	12(%rdi), %ecx
  movbel	%edx, 8(%rsi)
  movbel	%ecx, 12(%rsi)

Differential Revision: https://reviews.llvm.org/D78997
2020-05-05 12:44:38 -04:00
Simon Pilgrim 4e3c005554 [TTI] getScalarizationOverhead - use explicit VectorType operand
getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions).

Followup to D78357

Reviewed By: @samparker

Differential Revision: https://reviews.llvm.org/D79341
2020-05-05 16:59:23 +01:00
Sam Parker 40574fefe9 [NFC][CostModel] Add TargetCostKind to relevant APIs
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.

RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html

Differential Revision: https://reviews.llvm.org/D79002
2020-05-05 10:35:54 +01:00
Florian Hahn bbdfcf8f69 [VPlan] Remove unused & undefined print method (NFC). 2020-05-03 18:36:20 +01:00
Anh Tuyen Tran c7878ad231 [VFDatabase] Scalar functions are vector functions with VF =1
Summary:
Return scalar function when VF==1. The new trivial mapping scalar --> scalar when VF==1 to prevent false positive for "isVectorizable" query.

Author: masoud.ataei (Masoud Ataei)

Reviewers: Whitney (Whitney Tsang), fhahn (Florian Hahn), pjeeva01 (Jeeva P.), fpetrogalli (Francesco Petrogalli), rengolin (Renato Golin)

Reviewed By: fpetrogalli (Francesco Petrogalli)

Subscribers: hiraditya (Aditya Kumar), llvm-commits, LLVM

Tag: LLVM

Differential Revision: https://reviews.llvm.org/D78054
2020-04-29 17:20:37 +00:00
Simon Pilgrim 090cae8491 [TTI] Add DemandedElts to getScalarizationOverhead
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.

This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.

This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.

A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.

Reviewed By: @craig.topper

Differential Revision: https://reviews.llvm.org/D78216
2020-04-29 12:00:38 +01:00
Florian Hahn e89379856a Recommit "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)."
The crash that caused the original revert has been fixed in
a3c964a278. I also added a reduced version of the crash reproducer.

This reverts the revert commit 2107af9ccf.
2020-04-29 11:40:39 +01:00
Sanjay Patel 21acc0612a [SLP] refactor load-combine logic; NFC
We may want to identify sequences that are not
reductions, but still qualify as load-combines
in the back-end, so make most of the body a
helper function.
2020-04-27 16:02:37 -04:00
Ayal Zaks a3c964a278 [LV] Fix recording of BranchTakenCount for FoldTail
When folding tail, branch taken count is computed during initial VPlan execution
and recorded to be used by the compare computing the loop's mask. This recording
should directly set the State, instead of reusing Value2VPValue mapping which
serves original Values present prior to vectorization.
The branch taken count may be a constant Value, which may be used elsewhere in
the loop; trying to employ Value2VPValue for both leads to the issue reported in
https://reviews.llvm.org/D76992#inline-721028

Differential Revision: https://reviews.llvm.org/D78847
2020-04-26 20:13:10 +03:00
Max Kazantsev 9cd4debd5a [LoopVectorize] Preserve CFG analyses if CFG wasn't modified
One of transforms the loop vectorizer makes is LCSSA formation. In some cases it
is the only transform it makes. We should not drop CFG analyzes if only LCSSA was
formed and no actual CFG changes was made.

We should think of expanding this logic to other passes as well, and maybe make
it a part of PM framework.

Reviewed By: Florian Hahn
Differential Revision: https://reviews.llvm.org/D78360
2020-04-24 17:22:24 +07:00
Mehdi Amini 2107af9ccf Revert "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)."
This reverts commit 9245c7ac13.

This is triggering a segfault in XLA downstream, we'll follow-up with
a reproducer, it is likely influenced by TTI/TLI settings or other
options as a simple `opt -loop-vectorize` invocation on the IR
before the crash does not reproduce immediately.
2020-04-24 05:07:32 +00:00
Simon Pilgrim b108a457e1 [VPlan] Remove unused forward declarations. NFC.
Move VPlan.h include from VPlanVerifier.h down to VPlanVerifier.cpp
2020-04-23 12:34:20 +01:00
Florian Hahn 9245c7ac13 [VPlan] Add & use VPValue operands for VPWidenRecipe (NFC).
This patch adds VPValue version of the instruction operands to
VPWidenRecipe and uses them during code-generation.

Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.

Reviewers: rengolin, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76992
2020-04-23 12:16:46 +01:00
Florian Hahn 647c9e72e4 [VPlan] Make various tryTo* helpers private and mark as const (NFC).
The individual tryTo* helpers do not need to be public. Also, the
builder contained two consecutive public: sections, which is not
necessary. Moved the remaining public methods after the constructor.

Also make some of the tryTo* helpers const.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed by: gilr

Differential Revision: https://reviews.llvm.org/D78288
2020-04-21 14:49:02 +01:00
Craig Topper 68b2e507e4 [Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78443
2020-04-20 21:31:44 -07:00
Craig Topper fcc9d70260 Revert "[Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign."
This is breaking the clang build.

This reverts commit 897409fb56.
2020-04-20 13:25:06 -07:00
Craig Topper 897409fb56 [Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign.
Differential Revision: https://reviews.llvm.org/D78443
2020-04-20 13:08:05 -07:00
Florian Hahn fa284e136e [VPlan] Clean up tryToCreate(Widen)Recipe. (NFC)
This patch includes some clean-ups to tryToCreateRecipe, suggested in
D77973.

It includes:
  * Renaming tryToCreateRecipe to tryToCreateWidenRecipe.
  * Move VPBB insertion logic to caller of tryToCreateWidenRecipe.
  * Hoists instruction checks to tryToCreateWidenRecipe, making it
    clearer which instructions are handled by which recipe, simplifying
    the checks by using early exits.
  * Split up handling of induction PHIs and truncates using inductions.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78287
2020-04-20 10:06:35 +01:00
Sam Parker e3056ae9a0 [NFC][TTI] Explicit use of VectorType
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562

Differential Revision: https://reviews.llvm.org/D78357
2020-04-20 09:16:52 +01:00
Sanjay Patel bef6e67e95 [VectorCombine] transform bitcasted shuffle to wider elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

This is the widen shuffle elements enhancement to D76727.
It builds on the analysis and simplifications in
D77881 and rG6a7e958a423e.

The phase ordering tests show that we can simplify inverse
shuffles across a binop in both directions (widen/narrow or
narrow/widen) now.

There's another potential transform visible in some of the
remaining TODOs - move a bitcasted operand of a shuffle
after the shuffle.

Differential Revision: https://reviews.llvm.org/D78371
2020-04-19 08:24:38 -04:00
Benjamin Kramer ff54d1c897 Remove remaining callers of CreateShuffleVector with unsigned indices and mark it as deprecated
No functionality change intended.
2020-04-19 11:48:28 +02:00
Ayal Zaks 8e0c5f7200 [LV] Mark first-order recurrences as allowed exits
First-order recurrences require special treatment when they are live-out;
such treatment is provided by fixFirstOrderRecurrence(), so they should be
included in AllowedExit set.

(Should probably have been included originally in D16197.)

Fixes PR45526: AllowedExit set is used by prepareToFoldTailByMasking() to
check whether the treatment for live-outs also holds when folding the tail,
which is not (yet) the case for first-order recurrences.

Differential Revision: https://reviews.llvm.org/D78210
2020-04-18 23:54:21 +03:00
Florian Hahn 4ee45ab60f [LV] Invalidate cost model decisions along with interleave groups.
Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.

Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.

Alternatively to forcing a complete recompute later on, we could try
to selectively invalidate the decisions connected to the interleave
groups. But we would likely need to run the uniform/scalar value
detection parts again anyways and the extra complexity is probably not
worth it.

Fixes PR45572.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D78298
2020-04-18 10:23:49 +01:00
Benjamin Kramer c5e7c2691d Remove accidental include.
Thank you clangd.
2020-04-17 16:36:30 +02:00
Benjamin Kramer b639091c02 Change users of CreateShuffleVector to pass the masks as int instead of Constants
No functionality change intended.
2020-04-17 16:34:29 +02:00
Benjamin Kramer 166467e822 [VectorUtils] Create shufflevector masks as int vectors instead of Constants
No functionality change intended.
2020-04-17 15:28:00 +02:00
Simon Pilgrim fa7f328a15 [cmake] LLVMVectorize - add include/llvm/Transforms/Vectorize header path
MSVC projects were missing the llvm/Transforms/Vectorize/* headers
2020-04-17 11:06:26 +01:00
Florian Hahn 3f7f06888b [VPlan] Branches are not widened by VPWidenRecipe, assert (NFC). 2020-04-15 12:03:45 +01:00
Benjamin Kramer 6f64daca8f Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints
No functionality change intended.
2020-04-15 12:51:38 +02:00
Florian Hahn 5b4b3e0b6e [VPlan] Move widening check for non-memory/non-calls to function (NFC).
After introducing VPWidenSelectRecipe, the duplicated logic can be
shared.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77973
2020-04-15 11:48:37 +01:00
Florian Hahn 79d185c792 [VPlan] Move Load/Store checks out of tryToWiden (NFC).
Handling LoadInst and StoreInst in tryToWiden seems a bit
counter-intuitive, as there is only an assertion for them and in no
case VPWidenRefipes are created for them.

I think it makes sense to move the assertion to handleReplication, where
the non-widened loads and store are handled.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77972
2020-04-15 10:18:42 +01:00
Gil Rapaport b747d72c19 [LV] Fix PR45525: Incorrect assert in blend recipe
Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a
mask is a valid case which is already supported by the code.

Differential Revision: https://reviews.llvm.org/D78115
2020-04-15 10:39:07 +03:00
Teresa Johnson 33ffb62e23 Allow disabling of vectorization using internal options
Summary:
Currently, the internal options -vectorize-loops, -vectorize-slp, and
-interleave-loops do not have much practical effect. This is because
they are used to initialize the corresponding flags in the pass
managers, and those flags are then unconditionally overwritten when
compiling via clang or via LTO from the linkers. The only exception was
-vectorize-loops via opt because of some special hackery there.

While vectorization could still be disabled when compiling via clang,
using -fno-[slp-]vectorize, this meant that there was no way to disable
it when compiling in LTO mode via the linkers. This only affected
ThinLTO, since for regular LTO vectorization is done during the compile
step for scalability reasons. For ThinLTO it is invoked in the LTO
backends. See also the discussion on PR45434.

This patch makes it so the internal options can actually be used to
disable these optimizations. Ultimately, the best long term solution is
to mark the loops with metadata (similar to the approach used to fix
-fno-unroll-loops in D77058), but this enables a shorter term
workaround, and actually makes these internal options useful.

I constant propagated the initial values of these internal flags into
the pass manager flags (for some reasons vectorize-loops and
interleave-loops were initialized to true, while vectorize-slp was
initialized to false). As mentioned above, they are overwritten
unconditionally so this doesn't have any real impact, and these initial
values aren't particularly meaningful.

I then changed the passes to check the internl values and return without
performing the associated optimization when false (I changed the default
of -vectorize-slp to true so the options behave similarly). I was able
to remove the hackery in opt used to get -vectorize-loops=false to work,
as well as a special option there used to disable SLP vectorization.

Finally, I changed thinlto-slp-vectorize-pm.c to:
a) Only test SLP (moved the loop vectorization checking to a new test).
b) Use code that is slp vectorized when it is enabled, and check that
instead of whether the pass is enabled.
c) Test the new behavior of -vectorize-slp.
d) Test both pass managers.

The loop vectorization (and associated interleaving) testing I moved to
a new thinlto-loop-vectorize-pm.c test, with several changes:
a) Changed the flags on the interleaving testing so that it will
actually interleave, and check that.
b) Test the new behavior of -vectorize-loops and -interleave-loops.
c) Test both pass managers.

Reviewers: fhahn, wmi

Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77989
2020-04-14 18:09:10 -07:00
Christopher Tetreault 3297e9b7c3 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: rriddle, sdesmalen, efriedma

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77259
2020-04-13 12:29:43 -07:00
Gil Rapaport 41ed5d856c [LV] Clean up vectorizeInterleaveGroup (NFCI)
Pass from the calling recipe the interleave group itself instead of passing the
group's insertion position and having the function query CM for its interleave
group and making sure that given instruction is the insertion point of.

Differential Revision: https://reviews.llvm.org/D78002
2020-04-13 13:15:06 +03:00
Florian Hahn 18138e0252 [VPlan] Introduce VPWidenSelectRecipe (NFC).
Widening a selects depends on whether the condition is loop invariant or
not. Rather than checking during codegen-time, the information can be
recorded at the VPlan construction time.

This was suggested as part of D76992, to reduce the reliance on
accessing the original underlying IR values.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77869
2020-04-13 08:35:28 +01:00
Florian Hahn ae1e353a25 [VPlan] Turn classes with all public members into structs (NFC).
struct should be used when all members are public:
 https://llvm.org/docs/CodingStandards.html#use-of-class-and-struct-keywords

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77865
2020-04-12 11:03:39 +01:00
Sanjay Patel 1318ddbc14 [VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.

While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
2020-04-11 10:05:49 -04:00
Florian Hahn 719846c469 [VPlan] Drop redundant private: at beginning of class defs (NFC).
Default visibility for classes is private, so the private: at the top of
various class definitions is redundant.

Reviewers: gilr, rengolin, Ayal, hsaito

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D77810
2020-04-11 13:27:10 +01:00
Gil Rapaport e2a1867880 [LV] Add VPValue operands to VPBlendRecipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces VPValues for VPBlendRecipe to use as the values to
blend. The recipe is generated with VPValues wrapping the phi's incoming values
of the scalar phi. This reduces ingredient def-use usage by ILV as a step
towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D77539
2020-04-09 18:48:33 +03:00
Ayal Zaks 1678489234 [LV] FoldTail w/o Primary Induction
Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector
induction for use in fold-tail-with-masking, if a primary induction is absent.

The canonical scalar IV having start = 0 and step = VF*UF, created during code
-gen to control the vector loop, is widened into a canonical vector IV having
start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and
step = <VF*UF, VF*UF, ..., VF*UF>.

Differential Revision: https://reviews.llvm.org/D77635
2020-04-09 17:45:23 +03:00
Florian Hahn a7efe06af0 [LV] Assert no DbgInfoIntrinsic calls are passed to widening (NFC).
When building a VPlan, BasicBlock::instructionsWithoutDebug() is used to
iterate over the instructions in a block. This means that no recipes
should be created for debug info intrinsics already and we can turn the
early exit into an assertion.

Reviewers: Ayal, gilr, rengolin, aprantl

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D77636
2020-04-09 11:37:32 +01:00
Florian Hahn 9997ee23ed [VPlan] Add & use VPValue operands for VPWidenCallRecipe (NFC).
This patch adds VPValue versions for the arguments of the call to
VPWidenCallRecipe and uses them during code-generation.

Similar to D76373 this reduces ingredient def-use usage by ILV as
a step towards full VPlan-based def-use relations.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77655
2020-04-09 10:23:26 +01:00
Eli Friedman 3f13ee8a00 [NFC] Modernize misc. uses of Align/MaybeAlign APIs.
Use the current getAlign() APIs where it makes sense, and use Align
instead of MaybeAlign when we know the value is non-zero.
2020-04-06 17:53:04 -07:00
Eli Friedman 68b03aee1a Remove SequentialType from the type heirarchy.
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.

In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.

In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)

Differential Revision: https://reviews.llvm.org/D75661
2020-04-06 17:03:49 -07:00
Florian Hahn 7aba6a0333 [LV] Fix value that could be read uninitialized.
This should fix
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/18569
2020-04-06 17:54:50 +01:00
Florian Hahn 90be3c24a7 [VPlan] Introduce new VPWidenCallRecipe (NFC).
This patch moves calls to their own recipe, to simplify the transition
to VPUser for operands of VPWidenRecipe, as discussed in D76992.

Subsequently additional information can be added to the recipe rather
than computing it during the execute step.

Reviewers: rengolin, Ayal, gilr, hsaito

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D77467
2020-04-06 16:07:37 +01:00
Florian Hahn a2b18c5a08 [LV] Simplify tryToWiden as recipes are not re-used (NFC).
After 49d00824bb, VPWidenRecipe only stores a single instruction.
tryToWiden can simply return the widen recipe, like other helpers in
VPRecipeBuilder.
2020-04-04 18:30:50 +01:00
Sanjay Patel ce97ce3a5d [VectorCombine] try to form a better extractelement
Extracting to the same index that we are going to insert back into
allows forming select ("blend") shuffles and enables further transforms.

Admittedly, this is a quick-fix for a more general problem that I'm
hoping to solve by adding transforms for patterns that start with an
insertelement.

But this might resolve some regressions known to be caused by the
extract-extract transform (although I have not gotten more details on
those yet).

In the motivating case from PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724

The combination of subsequent instcombine and codegen transforms gets us this improvement:

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm4
  vmovshdup	%xmm1, %xmm3    ## xmm3 = xmm1[1,1,3,3]
  vaddps	%xmm0, %xmm2, %xmm0
  vaddps	%xmm1, %xmm3, %xmm1
  vshufps	$200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3]
  vinsertps	$177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2]

  -->

  vmovshdup	%xmm0, %xmm2    ## xmm2 = xmm0[1,1,3,3]
  vhaddps	%xmm1, %xmm1, %xmm1
  vaddps	%xmm0, %xmm2, %xmm0
  vshufps	$200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3]

Differential Revision: https://reviews.llvm.org/D76623
2020-04-03 13:55:13 -04:00
Guillaume Chatelet 1a584a8d50 [Alignment][NFC] Remove unused private functions
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77297
2020-04-03 09:16:20 +00:00
Sanjay Patel b6050ca181 [VectorCombine] transform bitcasted shuffle to narrower elements
bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC'

We do not attempt this in InstCombine because we do not want to change
types and create new shuffle ops that are potentially not lowered as
well as the original code. Here, we can check the cost model to see if
it is worthwhile.

I've aggressively enabled this transform even if the types are the same
size and/or equal cost because moving the bitcast allows InstCombine to
make further simplifications.

In the motivating cases from PR35454:
https://bugs.llvm.org/show_bug.cgi?id=35454
...this is enough to let instcombine and the backend eliminate the
redundant shuffles, but we probably want to extend VectorCombine to
handle the inverse pattern (shuffle-of-bitcast) to get that
simplification directly in IR.

Differential Revision: https://reviews.llvm.org/D76727
2020-04-02 13:30:22 -04:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Vedant Kumar dcc410b5cf [LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259)
In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken
count is a SCEV add expression, its type is defined by the type of the
last operand of the add expression.

In the test case from PR45259, this last operand happens to be a
pointer, which (according to llvm::Type) does not have a primitive size
in bits. In this case, LoopVectorize fails to truncate the SCEV and
crashes as a result.

Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected.

https://bugs.llvm.org/show_bug.cgi?id=45259

Differential Revision: https://reviews.llvm.org/D76669
2020-03-30 10:14:14 -07:00
Sanjay Patel fc3cc8a4b0 [VectorCombine] skip debug intrinsics first for efficiency 2020-03-29 13:58:04 -04:00
Florian Hahn 49d00824bb [VPlan] Use one VPWidenRecipe per original IR instruction. (NFC).
This patch changes VPWidenRecipe to only store a single original IR
instruction. This is the first required step towards modeling it's
operands as VPValues and also towards breaking it up into a
VPInstruction.

Discussed as part of D74695.

Reviewers: Ayal, gilr, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D76988
2020-03-29 13:47:28 +01:00
Sjoerd Meijer 401a324c51 [LV] Refactor widenIntOrFpInduction. NFC.
This untangles the logic in widenIntOrFpInduction in order to make more
explicit and visible how exactly the induction variable is lowered.

Differential Revision: https://reviews.llvm.org/D76686
2020-03-27 12:58:50 +00:00
Gil Rapaport 078c863305 [LV] Replace stored value with a VPValue (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as
the stored value. The recipe is generated with a VPValue wrapping the stored
value of the scalar store. This reduces ingredient def-use usage by ILV as a
step towards full VPlan-based def-use relations.

Differential Revision: https://reviews.llvm.org/D76373
2020-03-25 19:36:55 +02:00
Guillaume Chatelet 32851f8d63 [Alignment][NFC] Deprecate VectorUtils::getAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76542
2020-03-23 13:54:15 +01:00
Nikita Popov a63eaa5449 [SLP] Avoid repeated visitation in getVectorElementSize(); NFC
We need to insert into the Visited set at the same time we insert
into the worklist. Otherwise we may end up pushing the same
instruction to the worklist multiple times, and only adding it to
the visited set later.
2020-03-22 14:34:29 +01:00
Florian Hahn fd2c15e602 [VPlan] Do not print mapping for Value2VPValue.
The latest improvements to VPValue printing make this mapping clear when
printing the operand. Printing the mapping separately is not required
any longer.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76375
2020-03-18 21:44:07 +00:00
Florian Hahn 00c1cd1934 [VPlan] Record underlying value for VPValues created by addVPValue (NFC).
Now that printing VPValues uses the underlying IR value name, if
available, recording the underlying value here improves printing.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76374
2020-03-18 21:30:58 +00:00
Eli Friedman e24e95fe90 Remove CompositeType class.
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.

Differential Revision: https://reviews.llvm.org/D75660
2020-03-18 13:53:17 -07:00
Florian Hahn e6a74803d4 [VPlan] Use underlying value for printing, if available.
When the an underlying value is available, we can use its name for
printing, as discussed in D73078.

Reviewers: rengolin, hsaito, Ayal, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D76200
2020-03-18 17:46:57 +00:00
Huihui Zhang fc1f205745 [SLPVectorizer][SVE] Bail out early for scalable vector.
Summary:
SLPVectorizer try to vectorize list of scalar instructions of the same type,
instructions already vectorized are rejected through isValidElementType().

Without this patch, tryToVectorizeList() will first try to determine vectorization
factor of a list of Instructions before checking whether each instruction has unsupported
type or not. For instructions already vectorized for SVE, it will crash at getVectorElementSize(),
where it try to return a fixed size.

This patch make sure invalid element types are rejected before trying to get vectorization
factor. This make sure we are not trying to vectorize instructions already vectorized.

Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76017
2020-03-13 11:23:31 -07:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Anna Welker a6d3bec83f [TTI][ARM][MVE] Refine gather/scatter cost model
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.

Differential Revision: https://reviews.llvm.org/D75525
2020-03-11 10:23:41 +00:00
Benjamin Kramer 247a177cf7 Give helpers internal linkage. NFC. 2020-03-10 18:27:42 +01:00
Florian Hahn 2d6ecf4648 [SLP] Support vectorizing functions provided by vector libs.
It seems like the SLPVectorizer is currently not aware of vector
versions of functions provided by libraries like Accelerate [1].
This patch updates SLPVectorizer to use the same infrastructure
the LoopVectorizer uses to detect vectorizable library functions.

For calls, it computes the cost of an intrinsic call (existing behavior)
and the cost of a vector function library call, if available. Like
LoopVectorizer, it assumes the cost of the vector function is simply the
cost of a call to a vector function.

[1] https://developer.apple.com/documentation/accelerate

Reviewers: ABataev, RKSimon, spatel

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D75878
2020-03-10 13:10:50 +00:00
Sanjay Patel a69158c12a [VectorCombine] fold extract-extract-op with different extraction indexes
opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1

The first part of this patch generalizes the cost calculation to accept
different extraction indexes. The second part creates a shuffle+extract
before feeding into the existing code to create a vector op+extract.

The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc"
rather than "TargetTransformInfo::SK_Broadcast" (splat specifically
from element 0) because we do not have a more general "SK_Splat"
currently. That does not affect any of the current regression tests,
but we might be able to find some cost model target specialization where
that comes into play.

I suspect that we can expose some missing x86 horizontal op codegen with
this transform, so I'm speculatively adding a debug flag to disable the
binop variant of this transform to allow easier testing.

The test changes show that we're sensitive to cost model diffs (as we
should be), so that means that patches like D74976
should have better coverage.

Differential Revision: https://reviews.llvm.org/D75689
2020-03-08 09:57:55 -04:00
Florian Hahn 40e7bfc424 [VPlan] Use consecutive numbers to print VPValues instead of addresses.
Currently when printing VPValues we use the object address, which makes
it hard to distinguish VPValues as they usually are large numbers with
varying distance between them.

This patch adds a simple slot tracker, similar to the ModuleSlotTracker
used for IR values. In order to dump a VPValue or anything containing a
VPValue, a slot tracker for the enclosing VPlan needs to be created. The
existing VPlanPrinter can take care of that for the existing code. We
assign consecutive numbers to each VPValue we encounter in a reverse
post order traversal of the VPlan.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73078
2020-03-05 14:55:15 +00:00
Simon Pilgrim 01a91a6de7 Fix static analyzer uninitialized variable warning. NFCI. 2020-03-05 14:22:24 +00:00
Florian Hahn 05afa55521 [VPlan] Add getPlan() to VPBlockBase.
This patch adds a getPlan accessor to VPBlockBase, which finds the entry
block of the plan containing the block and returns the plan set for this
block.

VPBlockBase contains a VPlan pointer, but it should only be set for
the entry block of a plan. This allows moving blocks without updating
the pointer for each moved block and in the future we might introduce a
parent relationship between plans and blocks, similar to the one in LLVM IR.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D74445
2020-03-03 13:20:13 +00:00
David Green d0d38df091 [LoopVectorizer] Change types of lists from pointers to references. NFC
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.

Differential Revision: https://reviews.llvm.org/D75448
2020-03-02 15:04:41 +00:00
Austin Kerbow 4fa63fd452 [VectorCombine] Fix assert on compare extract index
Extract index could be a differnet integral type.

Differential Revision: https://reviews.llvm.org/D75327
2020-02-28 10:37:08 -08:00
Valery N Dmitriev d723ec4f04 [SLP][NFC] Assert that tree entry operands completed when scheduler looks for dependencies.
This change adds an assertion to prevent tricky bug related to recursive
approach of building vectorization tree. For loop below takes number of
operands directly from tree entry rather than from scalars.
If the entry at this moment turns out incomplete (i.e. not all operands set)
then not all the dependencies will be seen by the scheduler.
This can lead to failed scheduling (and thus failed vectorization)
for perfectly vectorizable tree.
Here is code example which is likely to fire the assertion:
for (i : VL0->getNumOperands()) {
  ...
  TE->setOperand(i, Operands);
  buildTree_rec(Operands, Depth + 1,...);
}

Correct way is two steps process: first set all operands to a tree entry
and then recursively process each operand.

Differential Revision: https://reviews.llvm.org/D75296
2020-02-28 10:34:48 -08:00
Valery N Dmitriev 02e5e47e17 [SLP][NFC] Delete some unreachable code.
This patch deletes some dead code out of SLP vectorizer.
Couple of changes taken out of D57059 to slightly lighten it
plus one more similar case fixed.

Differential Revision: https://reviews.llvm.org/D75276
2020-02-28 09:22:51 -08:00
Nemanja Ivanovic c46b85aaf4 [LoopVectorize] Fix cost for calls to functions that have vector versions
A recent commit
(https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed
the cost for calls to functions that have a vector version for some
vectorization factor. However, no check is performed for whether the
vectorization factor matches the current one being cost modeled. This leads to
attempts to widen call instructions to a vectorization factor for which such a
function does not exist, which in turn leads to an assertion failure.

This patch adds the check for vectorization factor (i.e. not just that the
called function has a vector version for some VF, but that it has a vector
version for this VF).

Differential revision: https://reviews.llvm.org/D74944
2020-02-26 21:39:11 -06:00
Sanjay Patel 25c6544f32 [VectorCombine] add a debug flag to skip all transforms
As suggested in D75145 -

I'm not sure why, but several passes have this kind of disable/enable flag
implemented at the pass manager level. But that means we have to duplicate
the flag for both pass managers and add code to check the flag every time
the pass appears in the pipeline.

We want a debug option to see if this pass is misbehaving regardless of the
pass managers, so just add a disablement check at the single point before
any transforms run.

Differential Revision: https://reviews.llvm.org/D75204
2020-02-26 15:15:42 -05:00
Sanjay Patel 10ea01d80d [VectorCombine] make cost calc consistent for binops and cmps
Code duplication (subsequently removed by refactoring) allowed
a logic discrepancy to creep in here.

We were being conservative about creating a vector binop -- but
not a vector cmp -- in the case where a vector op has the same
estimated cost as the scalar op. We want to be more aggressive
here because that can allow other combines based on reduced
instruction count/uses.

We can reverse the transform in DAGCombiner (potentially with a
more accurate cost model) if this causes regressions.

AFAIK, this does not conflict with InstCombine. We have a
scalarize transform there, but it relies on finding a constant
operand or a matching insertelement, so that means it eliminates
an extractelement from the sequence (so we won't have 2 extracts
by the time we get here if InstCombine succeeds).

Differential Revision: https://reviews.llvm.org/D75062
2020-02-25 08:41:59 -05:00
Sanjay Patel e9c79a7aef [VectorCombine] refactor to reduce duplicated code; NFC
This should be the last step in the current cleanup.
Follow-ups should resolve the TODO about cost calc
and enable the more general case where we extract
different elements.
2020-02-21 15:56:00 -05:00
Sanjay Patel 34e3485560 [VectorCombine] refactor cost calcs to reduce duplication; NFC
More cleanup is possible now, but we probably need to
resolve the TODO about the existing difference between
compares and binops.
2020-02-21 15:12:00 -05:00
Florian Hahn 98f5268a72 [VectorUtils] Move ToVectorTy to VectorUtils.h (NFC).
ToVectorTy is defined and used in multiple places. Hoist it to
VectorUtils.h to avoid duplication and improve re-usability.

Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D74959
2020-02-21 17:31:24 +00:00
Sanjay Patel fc4455891c [VectorCombine] refactor matching code to reduce duplication; NFC
cmp/binop were already diverging even though they are largely
the same logic.
2020-02-21 12:06:51 -05:00
Reid Kleckner 0c2b09a9b6 [IR] Lazily number instructions for local dominance queries
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.

The downside is that Instruction grows from 56 bytes to 64 bytes.  The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.

The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.

We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.

Reviewed By: lattner, vsk, fhahn, dexonsmith

Differential Revision: https://reviews.llvm.org/D51664
2020-02-18 14:44:24 -08:00
Huihui Zhang 8ee0e1dc02 [NFC] Silence compiler warning [-Wmissing-braces]. 2020-02-18 10:37:12 -08:00
Florian Hahn e32522ca17 [SLPVectorizer] Do not assume extracelement idx is a ConstantInt.
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.

The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.

Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D74758
2020-02-18 18:16:06 +01:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
Sanjay Patel 62dd44d76d [VectorCombine] fix cost calc for extract-cmp
getOperationCost() is not the cost we wanted; that's not the
throughput value that the rest of the calculation uses.

We may want to switch everything in this code to use the
getInstructionThroughput() wrapper to avoid these kinds of
problems, but I'll look at that as a follow-up because that
can create other logical diffs via using optional parameters
(we'd need to speculatively create the vector instruction to
make a fair(er) comparison).
2020-02-16 10:40:28 -05:00
Kadir Cetinkaya 1674f772b4
[VecotrCombine] Fix unused variable for assertion disabled builds 2020-02-14 09:30:29 +01:00
Sanjay Patel 19b62b79db [VectorCombine] try to form vector binop to eliminate an extract element
binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C

This is a transform that has been considered for canonicalization (instcombine)
in the past because it reduces instruction count. But as shown in the x86 tests,
it's impossible to know if it's profitable without a cost model. There are many
potential target constraints to consider.

We have implemented similar transforms in the backend (DAGCombiner and
target-specific), but I don't think we have this exact fold there either (and if
we did it in SDAG, it wouldn't work across blocks).

Note: this patch was intended to handle the more general case where the extract
indexes do not match, but it got too big, so I scaled it back to this pattern
for now.

Differential Revision: https://reviews.llvm.org/D74495
2020-02-13 17:23:27 -05:00
Matt Arsenault 86f9117d47 AMDGPU: Don't report 2-byte alignment as fast
This is apparently worse than 1-byte alignment. This does not attempt
to decompose 2-byte aligned wide stores, but will stop trying to
produce them.

Also fix bug in LoadStoreVectorizer which was decreasing the alignment
and vectorizing stack accesses. It was assuming a stack object was an
alloca that could have its base alignment changed, which is not true
if the pointer is derived from a function argument.
2020-02-11 18:35:00 -05:00
Sanjay Patel a2a0f9a43a [VectorCombine] remove unused debug counter; NFC
The variable was added to the initial commit via copy/paste of existing
code, but it wasn't actually used in the code. We can add it back with
the proper usage if/when that is needed.
2020-02-11 08:24:07 -05:00
Sanjay Patel a17f03bd93 [VectorCombine] new IR transform pass for partial vector ops
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
2020-02-09 10:04:41 -05:00
Florian Hahn d1f849a284 [LV] Hoist code to mark conditional assumes as dead to caller (NFC).
This is a follow-up suggested in D73423. It is sufficient to just add
the conditional assumes to DeadInstructions once.
2020-01-28 08:50:44 -08:00
Florian Hahn a911fef3dd [LV] Do not try to sink dead instructions.
Dead instructions do not need to be sunk. Currently we try and record
the recipies for them, but there are no recipes emitted for them and
there's nothing to sink. They can be removed from SinkAfter while
marking them for recording.

Fixes PR44634.

Reviewers: rengolin, hsaito, fhahn, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D73423
2020-01-28 08:28:03 -08:00
Guillaume Chatelet d0a7cc7177 [Alignment][NFC] Use Align with CreateMaskedScatter/Gather
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

This patch shows that CreateMaskedScatter/CreateMaskedGather can only take positive non zero alignment values.

Reviewers: courbet

Subscribers: hiraditya, llvm-commits, delena

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73361
2020-01-27 10:17:14 +01:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Guillaume Chatelet 279fa8e006 [Alignement][NFC] Deprecate untyped CreateAlignedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73260
2020-01-23 13:34:32 +01:00
Florian Hahn f14f2a8568 [LV] Fix predication for branches with matching true and false succs.
Currently due to the edge caching, we create wrong predicates for
branches with matching true and false successors. We will cache the
condition for the edge from the true successor, and then lookup the same
edge (src and dst are the same) for the edge to the false successor.

If both successors match, the condition should always be true. At the
moment, we cannot really create constant VPValues, but we can just
create a true condition as X | !X. Later passes will clean that up.

Fixes PR44488.

Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D73079
2020-01-22 18:34:11 -08:00
Guillaume Chatelet 0957233320 [Alignment][NFC] Use Align with CreateMaskedStore
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73106
2020-01-22 11:04:39 +01:00
Andrei Elovikov e1d6d36852 [SLP] Don't allow Div/Rem as alternate opcodes
Summary:
We don't have control/verify what will be the RHS of the division, so it might
happen to be zero, causing UB.

Reviewers: Vasilis, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72740
2020-01-21 15:21:17 -08:00
Guillaume Chatelet bc8a1ab26f [Alignment][NFC] Use Align with CreateMaskedLoad
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73087
2020-01-21 14:13:22 +01:00
Evgeniy Brevnov af7e158872 [LV] Vectorizer should adjust trip count in profile information
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.

Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov

Reviewed By: Ayal, DaniilSuchkov

Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67905
2020-01-20 18:36:28 +07:00
Francesco Petrogalli 66c120f025 [VectorUtils] Rework the Vector Function Database (VFDatabase).
Summary:
This commits is a rework of the patch in
https://reviews.llvm.org/D67572.

The rework was requested to prevent out-of-tree performance regression
when vectorizing out-of-tree IR intrinsics. The vectorization of such
intrinsics is enquired via the static function `isTLIScalarize`. For
detail see the discussion in https://reviews.llvm.org/D67572.

Reviewers: uabelho, fhahn, sdesmalen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72734
2020-01-16 15:08:26 +00:00
Florian Hahn 23c113802e [LV] Allow assume calls in predicated blocks.
The assume intrinsic is intentionally marked as may reading/writing
memory, to avoid passes moving them around. When flattening the CFG
for predicated blocks, we have to drop the assume calls, as they
are control-flow dependent.

There are some cases where we can do better (when control flow is
preserved), but that is follow-up work.

Fixes PR43620.

Reviewers: hsaito, rengolin, dcaballe, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D68814
2020-01-16 10:11:35 +00:00
Benjamin Kramer 498856fca5 [LV] Silence unused variable warning in Release builds. NFC. 2020-01-10 11:21:27 +01:00
Gil Rapaport 8647a72c4a [LV] VPValues for memory operation pointers (NFCI)
Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.

Differential revision: https://reviews.llvm.org/D70865
2020-01-10 09:24:59 +02:00
Sjoerd Meijer 8f1887456a [LV] Still vectorise when tail-folding can't find a primary inducation variable
This addresses a vectorisation regression for tail-folded loops that are
counting down, e.g. loops as simple as this:

  void foo(char *A, char *B, char *C, uint32_t N) {
    while (N > 0) {
      *C++ = *A++ + *B++;
       N--;
    }
  }

These are loops that can be vectorised, but when tail-folding is requested, it
can't find a primary induction variable which we do need for predicating the
loop. As a result, the loop isn't vectorised at all, which it is able to do
when tail-folding is not attempted. So, this adds a check for the primary
induction variable where we decide how to lower the scalar epilogue. I.e., when
there isn't a primary induction variable, a scalar epilogue loop is allowed
(i.e. don't request tail-folding) so that vectorisation could still be
triggered.

Having this check for the primary induction variable make sense anyway, and in
addition, in a follow-up of this I will look into discovering earlier the
primary induction variable for counting down loops, so that this can also be
tail-folded.

Differential revision: https://reviews.llvm.org/D72324
2020-01-09 09:14:00 +00:00
Florian Hahn b8a3c34eee Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)."
This reverts commit 51ef53f3bd, as it
breaks some bots.
2020-01-04 18:44:38 +00:00
Florian Hahn 51ef53f3bd [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC).
SCEVExpander modifies the underlying function so it is more suitable in
Transforms/Utils, rather than Analysis. This allows using other
transform utils in SCEVExpander.

Reviewers: sanjoy.google, efriedma, reames

Reviewed By: sanjoy.google

Differential Revision: https://reviews.llvm.org/D71537
2020-01-04 18:29:35 +00:00
Evgeniy Brevnov 948e745270 [LV][NFC] Keep dominator tree up to date during vectorization. 2019-12-30 18:38:41 +07:00
Evgeniy Brevnov 1b6286b945 [LV][NFC] Some refactoring and renaming to facilitate next change. 2019-12-30 18:38:41 +07:00
Gil Rapaport d62bf16131 [LV] Use getMask() when printing recipe [NFCI]
Use dedicated API for getting the mask instead of duplicating it.

Differential Revision: https://reviews.llvm.org/D71964
2019-12-29 08:50:40 +02:00
Dinar Temirbulatov a755ccefe6 [SLP] Replace NeedToGather variable with enum. 2019-12-23 08:21:53 +01:00
Ayal Zaks e498be5738 [LV] Strip wrap flags from vectorized reductions
A sequence of additions or multiplications that is known not to wrap, may wrap
if it's order is changed (i.e., reassociated). Therefore when vectorizing
integer sum or product reductions, their no-wrap flags need to be removed.

Fixes PR43828

Patch by Denis Antrushin

Differential Revision: https://reviews.llvm.org/D69563
2019-12-20 14:48:53 +02:00
Anna Welker 7cd1cfdd6b [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.

Differential Revision: https://reviews.llvm.org/D71610
2019-12-18 09:14:39 +00:00
Francesco Petrogalli 19f73f0d1b Revert "[VectorUtils] Introduce the Vector Function Database (VFDatabase)."
This reverts commit 0be81968a2.

The VFDatabase needs some rework to be able to handle vectorization
and subsequent scalarization of intrinsics in out-of-tree versions of
the compiler. For more details, see the discussion in
https://reviews.llvm.org/D67572.
2019-12-13 19:42:04 +00:00
Francesco Petrogalli 0be81968a2 [VectorUtils] Introduce the Vector Function Database (VFDatabase).
This patch introduced the VFDatabase, the framework proposed in
http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html. [*]

In this patch the VFDatabase is used to bridge the TargetLibraryInfo
(TLI) calls that were previously used to query for the availability of
vector counterparts of scalar functions.

The VFISAKind field `ISA` of VFShape have been moved into into VFInfo,
under the assumption that different vector ISAs may provide the same
vector signature. At the moment, the vectorizer accepts any of the
available ISAs as long as the signature provided by the VFDatabase
matches the one expected in the vectorization process. For example,
when targeting AVX or AVX2, which both have 256-bit registers, the IR
signature of the two vector functions associated to the two ISAs is
the same. The `getVectorizedFunction` method at the moment returns the
first available match. We will need to add more heuristics to the
search system to decide which of the available version (TLI, AVX,
AVX2, ...)  the system should prefer, when multiple versions with the
same VFShape are present.

Some of the code in this patch is based on the work done by Sumedh
Arani in https://reviews.llvm.org/D66025.

[*] Notice that in the proposal the VFDatabase was called SVFS. The
name VFDatabase is more in line with LLVM recommendations for
naming classes and variables.

Differential Revision: https://reviews.llvm.org/D67572
2019-12-10 16:36:44 +00:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
Florian Hahn c491949694 [LV] Pick correct BB as insert point when fixing PHI for FORs.
Currently we fail to pick the right insertion point when
PreviousLastPart of a first-order-recurrence is a PHI node not in the
LoopVectorBody. This can happen when PreviousLastPart is produce in a
predicated block. In that case, we should pick the insertion point in
the BB the PHI is in.

Fixes PR44020.

Reviewers: hsaito, fhahn, Ayal, dorit

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D71071
2019-12-07 19:32:00 +00:00
Florian Hahn e60b36cf92 [VPlan] Rename VPlanHCFGTransforms to VPlanTransforms (NFC).
The file is intended to gather various VPlan transformations, not only
CFG related transforms. Actually, the only transformation there is not
CFG related.

Reviewers: Ayal, gilr, hsaito, rengolin

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70732
2019-12-07 08:56:35 +00:00
Gil Rapaport 39ccc099c9 [LV] Record GEP widening decisions in recipe (NFCI)
InnerLoopVectorizer's code called during VPlan execution still relies on
original IR's def-use relations to decide which vector code to generate,
limiting VPlan transformations ability to modify def-use relations and still
have ILV generate the vector code.
This commit moves GEP operand queries controlling how GEPs are widened to a
dedicated recipe and extracts GEP widening code to its own ILV method taking
those recorded decisions as arguments. This reduces ingredient def-use usage by
ILV as a step towards full VPlan-based def-use relations.

Differential revision: https://reviews.llvm.org/D69067
2019-12-06 13:41:19 +02:00
Ayal Zaks 6ed9cef25f [LV] Scalar with predication must not be uniform
Fix PR40816: avoid considering scalar-with-predication instructions as also
uniform-after-vectorization.

Instructions identified as "scalar with predication" will be "vectorized" using
a replicating region. If such instructions are also optimized as "uniform after
vectorization", namely when only the first of VF lanes is used, such a
replicating region becomes erroneous - only the first instance of the region can
and should be formed. Fix such cases by not considering such instructions as
"uniform after vectorization".

Differential Revision: https://reviews.llvm.org/D70298
2019-12-03 19:50:24 +02:00
Anton Afanasyev a315519c17 [SLP] Enhance SLPVectorizer to vectorize different combinations of aggregates
Summary:
Make SLPVectorize to recognize homogeneous aggregates like
`{<2 x float>, <2 x float>}`, `{{float, float}, {float, float}}`,
`[2 x {float, float}]` and so on.
It's a follow-up of https://reviews.llvm.org/D70068.
Merged `findBuildVector()` and `findBuildAggregate()` to
one `findBuildAggregate()` function making it recursive
to recognize multidimensional aggregates. Aggregates required
to be homogeneous.

Reviewers: RKSimon, ABataev, dtemirbulatov, spatel, vporpo

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70587
2019-12-03 19:29:27 +03:00
Florian Hahn e9c68422de [VPlan] Add dump function to VPlan class.
This adds a dump() function to VPlan, which uses the existing
operator<<.

This method provides a convenient way to dump a VPlan while debugging,
e.g. from lldb.

Reviewers: hsaito, Ayal, gilr, rengolin

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D70920
2019-12-03 11:59:10 +00:00
Hiroshi Yamauchi 8cdfdfeee6 [PGO][PGSO] Add an optional query type parameter to shouldOptimizeForSize.
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.

Reviewers: davidxl

Subscribers: hiraditya, zzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70510
2019-12-02 13:54:13 -08:00
Florian Hahn fe459ce65a [VPlan] Move graph traits (NFC).
By defining the graph traits right after the VPBlockBase definitions, we
can make use of them earlier in the file.

Reviewers: hsaito, Ayal, gilr

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D70733
2019-12-02 18:23:11 +00:00
Florian Hahn 9d24933f79 Recommit f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences."
This version contains 2 fixes for reported issues:
1. Make sure we do not try to sink terminator instructions.
2. Make sure we bail out, if we try to sink an instruction that needs to
   stay in place for another recurrence.

Original message:
If the recurrence PHI node has a single user, we can sink any
instruction without side effects, given that all users are dominated by
the instruction computing the incoming value of the next iteration
('Previous'). We can sink instructions that may cause traps, because
that only causes the trap to occur later, but not on any new paths.

With the relaxed check, we also have to make sure that we do not have a
direct cycle (meaning PHI user == 'Previous), which indicates a
reduction relation, which potentially gets missed by
ReductionDescriptor.

As follow-ups, we can also sink stores, iff they do not alias with
other instructions we move them across and we could also support sinking
chains of instructions and multiple users of the PHI.

Fixes PR43398.

Reviewers: hsaito, dcaballe, Ayal, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D69228
2019-11-24 21:21:55 +00:00
Anton Afanasyev 80cd6b6e04 [SLP] Enhance SLPVectorizer to vectorize vector aggregate
Summary:
Vector aggregate is homogeneous aggregate of vectors like `{ <2 x float>, <2 x float> }`.
This patch allows `findBuildAggregate()` to consider vector aggregates as
well as scalar ones. For instance, `{ <2 x float>, <2 x float> }` maps to `<4 x float>`.

Fixes vector part of llvm.org/PR42022

Reviewers: RKSimon

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70068
2019-11-22 20:01:59 +03:00
Tom Stellard ab411801b8 [cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO.  I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so.  Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraires:

1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so.  This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.

With this change, llvm_map_components_to_libnames(LIB_NAMES, "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.

2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set.  This is only true because libraries
defined lib lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.

I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:

- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON

Reviewers: beanz, smeenai, compnerd, phosek

Reviewed By: beanz

Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70179
2019-11-21 10:48:08 -08:00
Sjoerd Meijer 901cd3b3f6 [LV] PreferPredicateOverEpilog respecting option
Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when
option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not
to predicate the loop.

Differential Revision: https://reviews.llvm.org/D70382
2019-11-21 14:06:10 +00:00
Eric Christopher 714aabacfb Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update testcases.
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 8a0aa5310b.
2019-11-20 16:00:53 -08:00
Eric Christopher 8a0aa5310b Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction patterns""
as there were testcase changes after that need to also be reverted.

This reverts commit cd8748a15f.
2019-11-20 15:39:47 -08:00
Eric Christopher cd8748a15f Temporarily Revert "[SLP] allow forming 2-way reduction patterns"
After speaking with Sanjay - seeing a number of miscompiles and working
on tracking down a testcase. None of the follow on patches seem to
have helped so far.

This reverts commit 7ff57705ba.
2019-11-20 15:19:31 -08:00
Sanjay Patel 0a8e7ca402 [SLP] fix miscompile on min/max reductions with extra uses (PR43948) (2nd try)
The 1st attempt was reverted because it revealed an existing
bug where we could produce invalid IR (use of value before
definition). That should be fixed with:
rG39de82ecc9c2

The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-19 14:57:35 -05:00
Sanjay Patel 39de82ecc9 [SLP] fix insertion point for min/max reduction
As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
2019-11-19 10:50:10 -05:00
Evgeniy Brevnov 4a64d710ae [NFC] Test commit. Please ignore.
As a test commit I fixed a misspelling in one of comments in SLP
vectorizer.
2019-11-19 15:41:57 +07:00
Eric Christopher 6f1cc4151a Temporarily revert "[SLP] fix miscompile on min/max reductions with extra uses (PR43948)"
as it causes an ICE on valid. A testcase was followed up on the original thread.

This reverts commit a3e61946c5.
2019-11-18 14:41:37 -08:00
Richard Smith 7889d8e7eb Revert "[LoadStoreVectorize] Use '||' instead of '|' between sides with function calls. NFCI."
This broke two tests. Presumably the non-short-circuting '|' was
intentional here.

This reverts commit f7efea0ded.
2019-11-15 12:49:35 -08:00
Dávid Bolvanský f7efea0ded [LoadStoreVectorize] Use '||' instead of '|' between sides with function calls. NFCI.
Fixes warning from PVS Studio
2019-11-15 18:51:13 +01:00
Reid Kleckner 4c1a1d3cf9 Add missing includes needed to prune LLVMContext.h include, NFC
These are a pre-requisite to removing #include "llvm/Support/Options.h"
from LLVMContext.h: https://reviews.llvm.org/D70280
2019-11-14 15:23:15 -08:00
Alexey Bataev bfa32573bf Revert "Temporarily Revert:"
This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:

    Temporarily Revert:

     "[SLP] Generalization of stores vectorization."
     "[SLP] Fix -Wunused-variable. NFC"
     "[SLP] Vectorize jumbled stores."

after fixing the problem with compile time.
2019-11-14 16:38:20 -05:00
Sjoerd Meijer cb47b87830 [LV] PreferPredicateOverEpilog respecting predicate loop hint
The vectoriser queries TTI->preferPredicateOverEpilogue to determine if
tail-folding is preferred for a loop, but it was not respecting loop hint
'predicate' that can disable this, which has now been added. This showed that
we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0
(i.e. FK_Disabled) but this should have been FK_Undefined, which has been
fixed.

Differential Revision: https://reviews.llvm.org/D70125
2019-11-14 13:10:44 +00:00
Reid Kleckner 05da2fe521 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Sanjay Patel a3e61946c5 [SLP] fix miscompile on min/max reductions with extra uses (PR43948)
The bug manifests as replacing a reduction operand with an undef
value.

The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.

In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.

For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.

Differential Revision: https://reviews.llvm.org/D70148
2019-11-13 15:57:35 -05:00
Sanjay Patel e9bf7a60a0 [SLP] reduce code duplication for min/max vs. other reductions; NFCI 2019-11-13 11:26:08 -05:00
Simon Pilgrim d1bd5e476b SLPVectorizer - make comparison operators + isInSchedulingRegion const
Fixes cppcheck warnings.
2019-11-13 14:40:19 +00:00
Vasileios Porpodas 6a18a95487 [SLP] Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897
2019-11-11 21:06:51 -08:00
Simon Pilgrim 4ff246fef2 Remove unused variable (which allows us to remove vector include). NFC. 2019-11-10 12:16:23 +00:00
Gil Rapaport 7f152543e4 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)
This recommits 11ed1c0239 (reverted in
9f08ce0d21 for failing an assert) with a fix:
tryToWidenMemory() now first checks if the widening decision is to interleave,
thus maintaining previous behavior where tryToInterleaveMemory() was called
first, giving priority to interleave decisions over widening/scalarization. This
commit adds the test case that exposed this bug as a LIT.
2019-11-09 20:52:25 +02:00
Gil Rapaport 9f08ce0d21 Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)"
This reverts commit 11ed1c0239 - causes an assert failure.
2019-11-08 22:17:11 +02:00
Gil Rapaport 11ed1c0239 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)
This recommits 100e797adb (reverted in
009e032634 for failing an assert). While the
root cause was independently reverted in eaff300401,
this commit includes a LIT to make sure IVDescriptor's SinkAfter logic does not
try to sink branch instructions.
2019-11-08 15:25:14 +02:00
Sanjay Patel 7ff57705ba [SLP] allow forming 2-way reduction patterns
We have a vector compare reduction problem seen in PR39665 comment 2:
https://bugs.llvm.org/show_bug.cgi?id=39665#c2

Or slightly reduced here:

define i1 @cmp2(<2 x double> %a0) {
  %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
  %b = extractelement <2 x i1> %a, i32 0
  %c = extractelement <2 x i1> %a, i32 1
  %d = and i1 %b, %c
  ret i1 %d
}

SLP would not attempt to turn this into a vector reduction because there is an
artificial lower limit on that transform. We can not completely remove that limit
without inducing regressions though, so this patch just hacks an extra attempt at
creating a 2-way reduction to the end of the analysis.

As shown in the test file, we are still not getting some of the motivating cases,
so follow-on patches will be needed to solve those cases.

Differential Revision: https://reviews.llvm.org/D59710
2019-11-07 06:08:42 -05:00
Eric Christopher 009e032634 Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"
as it's causing assert failures.

This reverts commit 100e797adb.
2019-11-06 21:58:28 -08:00
Eric Christopher e511c4b0df Temporarily Revert:
"[SLP] Generalization of stores vectorization."
 "[SLP] Fix -Wunused-variable. NFC"
 "[SLP] Vectorize jumbled stores."

As they're causing significant (10-30x) compile time regressions on
vectorizable code.

The primary cause of the compile-time regression is f228b53716.

This reverts commits:

f228b53716
5503455ccb
21d498c9c0
2019-11-06 16:06:15 -08:00
Sjoerd Meijer 6c2a4f5ff9 [TTI][LV] preferPredicateOverEpilogue
We have two ways to steer creating a predicated vector body over creating a
scalar epilogue. To force this, we have 1) a command line option and 2) a
pragma available. This adds a third: a target hook to TargetTransformInfo that
can be queried whether predication is preferred or not, which allows the
vectoriser to make the decision without forcing it.

While this change behaves as a non-functional change for now, it shows the
required TTI plumbing, usage of this new hook in the vectoriser, and the
beginning of an ARM MVE implementation. I will follow up on this with:
- a complete MVE implementation, see D69845.
- a patch to disable this, i.e. we should respect "vector_predicate(disable)"
  and its corresponding loophint.

Differential Revision: https://reviews.llvm.org/D69040
2019-11-06 10:14:20 +00:00
Sergey Dmitriev 82588e05cc [SLP] - Add couple safety checks to TreeEntry::dump(). NFC
Summary: Check for MainOp and AltOp for NULL before dereferencing or issue NULL.

Reviewers: Vasilis, dtemirbulatov, RKSimon, ABataev

Reviewed By: ABataev

Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69812
2019-11-05 09:57:30 -08:00
Gil Rapaport 100e797adb [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
This recommits 2be17087f8 (reverted in
d3ec06d219 for heap-use-after-free) with a fix
in IAI's reset() which was not clearing the set of interleave groups after
deleting them.
2019-11-05 17:29:13 +02:00
Alexey Bataev b80c41cd3c [SLP]Fix PR43799: Crash on different sizes of GEP indices.
Summary:
If the GEP instructions are going to be vectorized, the indices in those
GEP instructions must be of the same type. Otherwise, the compiler may
crash when trying to build the vector constant.

Reviewers: RKSimon, spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69627
2019-11-04 10:36:26 -05:00
Benjamin Kramer d3ec06d219 Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"
This reverts commit 2be17087f8. Fails ASAN.
2019-11-04 15:04:42 +01:00
Gil Rapaport 2be17087f8 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)
The sink-after and interleave-group vectorization decisions were so far applied to
VPlan during initial VPlan construction, which complicates VPlan construction – also because of
their inter-dependence. This patch refactors buildVPlanWithRecipes() to construct a simpler
initial VPlan and later apply both these vectorization decisions, in order, as VPlan-to-VPlan
transformations.

Differential Revision: https://reviews.llvm.org/D68577
2019-11-04 10:37:39 +02:00
Simon Pilgrim 81ba611e88 Ensure VPlanPrinter::Depth is initialized to fix static analyzer warning. NFCI. 2019-11-03 11:17:05 +00:00
Alexey Bataev 70ad617dd6 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-31 16:02:25 -04:00
Haojian Wu e65ddcafee Revert "[SLP] Vectorize jumbled stores."
This reverts commit 21d498c9c0.

This commit causes some crashes on some targets.
2019-10-31 10:21:24 +01:00
Alexey Bataev 21d498c9c0 [SLP] Vectorize jumbled stores.
Summary:
Patch adds support for vectorization of the jumbled stores. The value
operands are vectorized and then shuffled in the right order before
store.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D43339
2019-10-30 13:33:52 -04:00
Simon Pilgrim d52f5ed01a [SLPVectorizer] Use getAPInt() for comparison. NFCI.
Technically integers can assert on getZExtValue() if beyond i64 range, and a fuzzer usually find this.....
2019-10-30 16:16:55 +00:00
Fangrui Song 5503455ccb [SLP] Fix -Wunused-variable. NFC 2019-10-29 09:38:55 -07:00
Alexey Bataev f228b53716 [SLP] Generalization of stores vectorization.
Stores are vectorized with maximum vectorization factor of 16. Patch
tries to improve the situation and use maximal vectorization factor.

Reviewers: spatel, RKSimon, mkuper, hfinkel

Differential Revision: https://reviews.llvm.org/D43582
2019-10-29 11:46:36 -04:00
Craig Topper 18824d25d8 [LV] Interleaving should not exceed estimated loop trip count.
Currently we may do iterleaving by more than estimated trip count
coming from the profile or computed maximum trip count. The solution is to
use "best known" trip count instead of exact one in interleaving analysis.

Patch by Evgeniy Brevnov.

Differential Revision: https://reviews.llvm.org/D67948
2019-10-28 10:58:22 -07:00
Guillaume Chatelet a4783ef58d [Alignment][NFC] getMemoryOpCost uses MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: nemanjai, hiraditya, kbarton, MaskRay, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69307
2019-10-25 21:26:59 +02:00
Sanjay Patel b82fa80e80 [SLP] adjust code comment; NFC
(check commit access)
2019-10-25 11:39:43 -04:00
Guillaume Chatelet 5e1e83ee23 [Alignment][NFC] Instructions::getLoadStoreAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69256

llvm-svn: 375416
2019-10-21 14:49:28 +00:00
Sanjay Patel 8cc6d42e8d [SLP] avoid reduction transform on patterns that the backend can load-combine (2nd try)
The 1st attempt at this modified the cost model in a bad way to avoid the vectorization,
but that caused problems for other users (the loop vectorizer) of the cost model.

I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a cost-independent bailout with a conservative pattern match for a
multi-instruction sequence that can probably be reduced later.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 375025
2019-10-16 18:06:24 +00:00
Sam Parker 527a35e155 [NFC][TTI] Add Alignment for isLegalMasked[Load/Store]
Add an extra parameter so the backend can take the alignment into
consideration.

Differential Revision: https://reviews.llvm.org/D68400

llvm-svn: 374763
2019-10-14 10:00:21 +00:00
Benjamin Kramer 97c9804e06 [LV] Merge LLVM_DEBUG blocks.
Avoids unused variable warnings about the range-based for loops in
there. NFCI.

llvm-svn: 374646
2019-10-12 10:57:22 +00:00
Zi Xuan Wu 9802268ad3 recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374634
2019-10-12 02:53:04 +00:00
Florian Hahn 39d4c9fd56 [VPlan] Add moveAfter to VPRecipeBase.
This patch adds a moveAfter method to VPRecipeBase, which can be used to
move elements after other elements, across VPBasicBlocks, if necessary.

Reviewers: dcaballe, hsaito, rengolin, hfinkel

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D46825

llvm-svn: 374565
2019-10-11 15:36:55 +00:00
Florian Hahn a3ca7acb4f [LV][NFC] Factor out calculation of "best" estimated trip count.
This is just small refactoring to minimize changes in upcoming patch.
In the next path I'm going to introduce changes into heuristic for vectorization of "tiny trip count" loops.

Patch by Evgeniy Brevnov <evgueni.brevnov@gmail.com>

Reviewers: hsaito, Ayal, fhahn, reames

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D67690

llvm-svn: 374338
2019-10-10 13:07:01 +00:00
Guillaume Chatelet 837a1b84ce [Alignment][NFC] Make VectorUtils uas llvm::Align
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68784

llvm-svn: 374330
2019-10-10 12:35:04 +00:00
Sanjay Patel df14bd315d [SLP] respect target register width for GEP vectorization (PR43578)
We failed to account for the target register width (max vector factor)
when vectorizing starting from GEPs. This causes vectorization to
proceed to obviously illegal widths as in:
https://bugs.llvm.org/show_bug.cgi?id=43578

For x86, this also means that SLP can produce rogue AVX or AVX512
code even when the user specifies a narrower vector width.

The AArch64 test in ext-trunc.ll appears to be better using the
narrower width. I'm not exactly sure what getelementptr.ll is trying
to do, but it's testing with "-slp-threshold=-18", so I'm not worried
about those diffs. The x86 test is an over-reduction from SPEC h264;
this patch appears to restore the perf loss caused by SLP when using
-march=haswell.

Differential Revision: https://reviews.llvm.org/D68667

llvm-svn: 374183
2019-10-09 16:32:49 +00:00
Sjoerd Meijer d1170dbe58 [LV] Emitting SCEV checks with OptForSize
When optimising for size and SCEV runtime checks need to be emitted to check
overflow behaviour, the loop vectorizer can run in this assert:

  LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
  llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
  && "Cannot SCEV check stride or overflow when opt

We should not generate predicates while optimising for size because
code will be generated for predicates such as these SCEV overflow runtime
checks.

This should fix PR43371.

Differential Revision: https://reviews.llvm.org/D68082

llvm-svn: 374166
2019-10-09 13:19:41 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Kadir Cetinkaya 18b6fe07bc [LoopVectorize] Fix non-debug builds after rL374017
llvm-svn: 374021
2019-10-08 07:39:50 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Martin Storsjo dfc1aee25b Revert "[SLP] avoid reduction transform on patterns that the backend can load-combine"
This reverts SVN r373833, as it caused a failed assert "Non-zero loop
cost expected" on building numerous projects, see PR43582 for details
and reproduction samples.

llvm-svn: 373882
2019-10-07 08:21:37 +00:00
Sanjay Patel e2321bb448 [SLP] avoid reduction transform on patterns that the backend can load-combine
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708
https://bugs.llvm.org/show_bug.cgi?id=43146

We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).

Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.

In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:

  movbe   rax, qword ptr [rdi]

or:

  mov     rax, qword ptr [rdi]

Not some (half) vector monstrosity as we currently do using SLP:

  vpmovzxbq       ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
  vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
  movzx   eax, byte ptr [rdi]
  movzx   ecx, byte ptr [rdi + 5]
  shl     rcx, 40
  movzx   edx, byte ptr [rdi + 6]
  shl     rdx, 48
  or      rdx, rcx
  movzx   ecx, byte ptr [rdi + 7]
  shl     rcx, 56
  or      rcx, rdx
  or      rcx, rax
  vextracti128    xmm1, ymm0, 1
  vpor    xmm0, xmm0, xmm1
  vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
  vpor    xmm0, xmm0, xmm1
  vmovq   rax, xmm0
  or      rax, rcx
  vzeroupper
  ret

Differential Revision: https://reviews.llvm.org/D67841

llvm-svn: 373833
2019-10-05 18:03:58 +00:00
Guillaume Chatelet d400d45150 [Alignment][NFC] Remove StoreInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68268

llvm-svn: 373595
2019-10-03 13:17:21 +00:00
Guillaume Chatelet 17380227e8 [Alignment][NFC] Remove LoadInst::setAlignment(unsigned)
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68142

llvm-svn: 373195
2019-09-30 09:37:05 +00:00
Alexey Bataev 8b1eeafb91 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 373166
2019-09-29 14:18:06 +00:00
Guillaume Chatelet 18f805a7ea [Alignment][NFC] Remove unneeded llvm:: scoping on Align types
llvm-svn: 373081
2019-09-27 12:54:21 +00:00
Jordan Rupprecht f98d2c099a Revert [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
This reverts r372626 (git commit 6a278d9073)

llvm-svn: 373019
2019-09-26 22:09:17 +00:00
Simon Pilgrim 934f18144d LoopVectorize - silence static analyzer dyn_cast<CmpInst> null dereference warning. NFCI.
The static analyzer is warning about a potential null dereference, but we should be able to use cast<CmpInst> directly and if not assert will fire for us.

llvm-svn: 372732
2019-09-24 11:27:38 +00:00
Sjoerd Meijer 0fcb3afb40 [LV] Forced vectorization with runtime checks and OptForSize
When vectorisation is forced with a pragma, we optimise for min size, and we
need to emit runtime memory checks, then allow this code growth and don't run
in an assert like we currently do.

This is the result of D65197 and D66803, and was a use-case not really
considered before. If this now happens, we emit an optimisation remark warning
about the code-size expansion, which can be avoided by not forcing
vectorisation or possibly source-code modifications.

Differential Revision: https://reviews.llvm.org/D67764

llvm-svn: 372694
2019-09-24 08:03:34 +00:00
Alexey Bataev 6a278d9073 [SLP] Fix for PR31847: Assertion failed: (isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!")
Summary:
Initially SLP vectorizer replaced all going-to-be-vectorized
instructions with Undef values. It may break ScalarEvaluation and may
cause a crash.
Reworked SLP vectorizer so that it does not replace vectorized
instructions by UndefValue anymore. Instead vectorized instructions are
marked for deletion inside if BoUpSLP class and deleted upon class
destruction.

Reviewers: mzolotukhin, mkuper, hfinkel, RKSimon, davide, spatel

Subscribers: RKSimon, Gerolf, anemet, hans, majnemer, llvm-commits, sanjoy

Differential Revision: https://reviews.llvm.org/D29641

llvm-svn: 372626
2019-09-23 16:25:03 +00:00
Simon Pilgrim a56bd6c51e [VPlan] Silence static analyzer dyn_cast null dereference warning. NFCI.
llvm-svn: 372502
2019-09-22 13:02:00 +00:00
Simon Pilgrim a2719f38c1 [LoopVectorize] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results, we can use cast<> directly as we know that these cases should all be CastInst, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 372116
2019-09-17 13:24:54 +00:00
Simon Pilgrim 1aaefbca24 [VPlanSLP] Don't dereference a cast_or_null<VPInstruction> result. NFCI.
The static analyzer is warning about a potential null dereference of the cast_or_null result, I've split the cast_or_null check from the ->getUnderlyingInstr() call to avoid this, but it appears that we weren't seeing any null pointers in the dumped bundles in the first place.

llvm-svn: 371975
2019-09-16 11:22:44 +00:00
Simon Pilgrim bfe6b35c70 [SLPVectorizer] Assert that we find a LastInst to silence analyzer null dereference warning. NFCI.
llvm-svn: 371974
2019-09-16 10:48:16 +00:00
Simon Pilgrim ae625d70cd [SLPVectorizer] Don't dereference a dyn_cast result. NFCI.
The static analyzer is warning about potential null dereferences of dyn_cast<> results - in these cases we can safely use cast<> directly as we know that these cases should all be the correct type, which is why its working atm and anyway cast<> will assert if they aren't.

llvm-svn: 371973
2019-09-16 10:35:09 +00:00
Simon Pilgrim 4e46ea3946 [LoadStoreVectorizer] vectorizeLoadChain - ensure we find a valid Type down the load chain. NFCI.
Silence static analyzer uninitialized variable warning by setting the LoadTy to null and then asserting we find a real value.

llvm-svn: 371936
2019-09-15 16:44:35 +00:00
Sanjay Patel b6a0faaa0c [SLP] limit vectorization of Constant subclasses (PR33958)
This is a fix for:
https://bugs.llvm.org/show_bug.cgi?id=33958

It seems universally true that we would not want to transform this kind of
sequence on any target, but if that's not correct, then we could view this
as a target-specific cost model problem. We could also white-list ConstantInt,
ConstantFP, etc. rather than blacklist Global and ConstantExpr.

Differential Revision: https://reviews.llvm.org/D67362

llvm-svn: 371931
2019-09-15 13:03:24 +00:00
Philip Reames cffa630c80 [Loads] Move generic code out of vectorizer into a location it might be reused [NFC]
llvm-svn: 371558
2019-09-10 21:33:53 +00:00
Philip Reames 1e1db80048 [ValueTracking] Factor our common speculation suppression logic [NFC]
Expose a utility function so that all places which want to suppress speculation (when otherwise legal) due to ordering and/or sanitizer interaction can do so.

llvm-svn: 371556
2019-09-10 21:12:29 +00:00
Philip Reames 7403569be7 [LoopVectorize] Leverage speculation safety to avoid masked.loads
If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated.  This allows us to generate a normal vector load instead of a masked.load.

To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned.  This is equivelent to proving that hoisting the load into the loop header in the original scalar loop is safe.

Note: There are a couple of code motion todos in the code.  My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without furthe review.

Differential Revision: https://reviews.llvm.org/D66688

llvm-svn: 371452
2019-09-09 20:54:13 +00:00
Simon Pilgrim 879ed20bde Fix typo. NFCI
llvm-svn: 371317
2019-09-07 18:09:09 +00:00
Teresa Johnson 9c27b59cec Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.

This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.

Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.

There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.

Reviewers: chandlerc, hfinkel

Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66428

llvm-svn: 371284
2019-09-07 03:09:36 +00:00
Guillaume Chatelet 33671ceffa [LLVM][Alignment] Convert isLegalNTStore/isLegalNTLoad to llvm::Align
Summary:
This is patch is part of a serie to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67223

llvm-svn: 371063
2019-09-05 13:09:42 +00:00
Philip Reames 4228245e41 [NFC] Switch last couple of invariant_load checks to use hasMetadata
llvm-svn: 370948
2019-09-04 18:27:31 +00:00
Bjorn Pettersson dd18ce4501 [LV] Fix miscompiles by adding non-header PHI nodes to AllowedExit
Summary:
Fold-tail currently supports reduction last-vector-value live-out's,
but has yet to support last-scalar-value live-outs, including
non-header phi's. As it relies on AllowedExit in order to detect
them and bail out we need to add the non-header PHI nodes to
AllowedExit, otherwise we end up with miscompiles.

Solves https://bugs.llvm.org/show_bug.cgi?id=43166

Reviewers: fhahn, Ayal

Reviewed By: fhahn, Ayal

Subscribers: anna, hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67074

llvm-svn: 370721
2019-09-03 09:33:55 +00:00
Sjoerd Meijer 718f909ccd [LV] Tail-folding, runtime scev checks
Now that we allow tail-folding, not only when we optimise for size, make
sure we do not run in this assert.

Differential revision: https://reviews.llvm.org/D66932

llvm-svn: 370711
2019-09-03 08:53:02 +00:00
Sjoerd Meijer 0469b0e4ef [LV] Tail-folding with runtime memory checks
The loop vectorizer was running in an assert when it tried to fold the tail and
had to emit runtime memory disambiguation checks.

Differential revision: https://reviews.llvm.org/D66803

llvm-svn: 370707
2019-09-03 08:38:24 +00:00
Simon Pilgrim 757cc16ab7 Fix cppcheck shadow variable and variable scope warnings. NFCI.
llvm-svn: 370580
2019-08-31 12:30:19 +00:00
Ayal Zaks d15df0ede5 [LV] Fold tail by masking - handle reductions
Allow vectorizing loops that have reductions when tail is folded by masking.
A select is introduced in VPlan, choosing between the last value carried by the
loop-exit/live-out instruction of the reduction, and the penultimate value
carried by the reduction phi, according to the "i < n" mask of fold-tail.
This select replaces the last value as the live-out value of the loop.

Differential Revision: https://reviews.llvm.org/D66720

llvm-svn: 370173
2019-08-28 09:02:23 +00:00
Philip Reames cf3b555973 Add a clarify comment for meaning of SafePointes [NFC]
Extracted from D66688 as requested.

llvm-svn: 369962
2019-08-26 20:48:35 +00:00
Sanjay Patel 5a5d44e801 [SLP] use range-for loops, fix formatting; NFC
These are part of D57059, but that patch doesn't apply cleanly to trunk
at this point, so we might as well remove some of the noise.

llvm-svn: 369776
2019-08-23 16:22:32 +00:00
Sanjay Patel 9182467886 [SLP] fix formatting; NFC
These are part of D57059, but that patch doesn't apply cleanly to trunk
at this point, so we might as well remove some of the noise.

llvm-svn: 369769
2019-08-23 15:26:12 +00:00
Dinar Temirbulatov 081c57989e [SLP][NFC] Avoid repetitive calls to getSameOpcode()
We can avoid repetitive calls getSameOpcode() for already known tree elements by keeping MainOp and AltOp in TreeEntry.

Differential Revision: https://reviews.llvm.org/D64700

llvm-svn: 369315
2019-08-20 00:22:04 +00:00
Sanjay Patel b38bac3699 [SLP] reduce duplicated code; NFC
llvm-svn: 369250
2019-08-19 11:39:56 +00:00
Vasileios Porpodas 1d254f3dae [SLPVectorizer] Make the scheduler aware of the TreeEntry operands.
Summary:
The scheduler's dependence graph gets the use-def dependencies by accessing the operands of the instructions in a bundle. However, buildTree_rec() may change the order of the operands in TreeEntry, and the scheduler is currently not aware of this. This is not causing any functional issues currently, because reordering is restricted to the operands of a single instruction. Once we support operand reordering across multiple TreeEntries, as shown here: http://www.llvm.org/devmtg/2019-04/slides/Poster-Porpodas-Supernode_SLP.pdf , the scheduler will need to get the correct operands from TreeEntry and not from the individual instructions.

In short, this patch:
- Connects the scheduler's bundle with the corresponding TreeEntry. It introduces new TE and Lane fields in ScheduleData.
- Moves the location where the operands of the TreeEntry are initialized. This used to take place in newTreeEntry() setting one operand at a time, but is now moved pre-order just before the recursion of buildTree_rec(). This is required because the scheduler needs to access both operands of the TreeEntry in tryScheduleBundle().
- Updates the scheduler to access the instruction operands through the TreeEntry operands instead of accessing the instruction operands directly.

Reviewers: ABataev, RKSimon, dtemirbulatov, Ayal, dorit, hfinkel

Reviewed By: ABataev

Subscribers: hiraditya, llvm-commits, lebedev.ri, rcorcs

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62432

llvm-svn: 369131
2019-08-16 17:21:18 +00:00
Simon Pilgrim 59894d4668 [SLPVectorizer] Silence null dereference warning. NFCI.
cppcheck + MSVC analyzer both over zealously warn that we might dereference a null Bundle pointer - add an assertion to check for null to silence the warning, plus its a good idea to check that we succeeded in finding a schedule bundle anyway....

llvm-svn: 369094
2019-08-16 10:28:23 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Dorit Nuzman d57d73daed [LV] fold-tail predication should be respected even with assume_safety
assume_safety implies that loads under "if's" can be safely executed
speculatively (unguarded, unmasked). However this assumption holds only for the
original user "if's", not those introduced by the compiler, such as the
fold-tail "if" that guards us from loading beyond the original loop trip-count.
Currently the combination of fold-tail and assume-safety pragmas results in
ignoring the fold-tail predicate that guards the loads, generating unmasked
loads. This patch fixes this behavior.

Differential Revision: https://reviews.llvm.org/D66106

Reviewers: Ayal, hsaito, fhahn
llvm-svn: 368973
2019-08-15 07:12:14 +00:00
Dinar Temirbulatov da0435a690 [SLP][NFC] Use pointers to address to ScalarToTreeEntry elements, instead of indexes.
llvm-svn: 368906
2019-08-14 19:46:50 +00:00
Dorit Nuzman 491ca2425d [LV] Fold-tail flag
This is the compiler-flag equivalent of the Predicate pragma
(https://reviews.llvm.org/D65197), to direct the vectorizer to fold the
remainder-loop into the main-loop using predication.

Differential Revision: https://reviews.llvm.org/D66108

Reviewers: Ayal, hsaito, fhahn, SjoerdMeije
llvm-svn: 368801
2019-08-14 05:22:20 +00:00
Craig Topper 005b22855e [LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave
If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it.

Improves one of the cases from PR42674

Differential Revision: https://reviews.llvm.org/D65896

llvm-svn: 368215
2019-08-07 21:44:14 +00:00
Mitch Phillips 924359dc0f Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors."
This reverts commit fc33e33776.

This commit depends on the rolled back commit rL367901, and thus needs
to be rolled back.

llvm-svn: 368109
2019-08-06 23:38:14 +00:00
Craig Topper fc33e33776 [X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors.
With the switch to widening legalization, we need to a better
job of costing extractions of less than 128-bits.

llvm-svn: 368081
2019-08-06 20:12:41 +00:00
Hideki Saito ec818d7fb3 [LV][NFC] Share the LV illegality reporting with LoopVectorize.
Reviewers: hsaito, fhahn, rengolin
 
Reviewed By: rengolin
 
Patch by psamolysov, thanks!
 
Differential Revision: https://reviews.llvm.org/D62997

llvm-svn: 367980
2019-08-06 06:08:48 +00:00
Stanislav Mekhanoshin 6fe00a21f2 Handle casts changing pointer size in the vectorizer
Added code to truncate or shrink offsets so that we can continue
base pointer search if size has changed along the way.

Differential Revision: https://reviews.llvm.org/D65612

llvm-svn: 367646
2019-08-02 04:03:37 +00:00
Stanislav Mekhanoshin eee9312a85 Relax load store vectorizer pointer strip checks
The previous change to fix crash in the vectorizer introduced
performance regressions. The condition to preserve pointer
address space during the search is too tight, we only need to
match the size.

Differential Revision: https://reviews.llvm.org/D65600

llvm-svn: 367624
2019-08-01 22:18:56 +00:00
Sjoerd Meijer e0dfce0723 Follow up of rL367592, fix the build
Some buildbots complained about:
error: default label in switch which covers all enumeration values

llvm-svn: 367603
2019-08-01 18:54:29 +00:00
Sjoerd Meijer 20b198ec5e [LV] Tail-Loop Folding
This allows folding of the scalar epilogue loop (the tail) into the main
vectorised loop body when the loop is annotated with a "vector predicate"
metadata hint. To fold the tail, instructions need to be predicated (masked),
enabling/disabling lanes for the remainder iterations.

Differential Revision: https://reviews.llvm.org/D65197

llvm-svn: 367592
2019-08-01 18:21:44 +00:00
Stanislav Mekhanoshin ba1e845c21 [AMDGPU] Fix for vectorizer crash with pointers of different size
When vectorizer strips pointers it can eventually end up with
pointers of two different sizes, then SCEV will crash.

Differential Revision: https://reviews.llvm.org/D65480

llvm-svn: 367443
2019-07-31 16:33:11 +00:00
Sjoerd Meijer 5c606cef79 [LV] Scalar Epilogue Lowering. NFC.
This refactors boolean 'OptForSize' that was passed around in a lot of places.
It controlled folding of the tail loop, the scalar epilogue, into the main loop
but code-size reasons may not be the only reason to do this. Thus, this is a
first step to generalise the concept of tail-loop folding, and hence OptForSize
has been renamed and is using an enum ScalarEpilogueStatus that holds the
status how the epilogue should be lowered.

This will be followed up by D65197, that picks up the predicate loop hint and
performs the tail-loop folding.

Differential Revision: https://reviews.llvm.org/D64916

llvm-svn: 366993
2019-07-25 08:06:02 +00:00
Simon Pilgrim 5d4bb8628c [SLPVectorizer] Revert local change that got accidently got committed in rL366799
This wasn't part of D63281

llvm-svn: 366807
2019-07-23 13:42:01 +00:00
Simon Pilgrim 743d45ee25 [TargetLowering] Add SimplifyMultipleUseDemandedBits
This patch introduces the DAG version of SimplifyMultipleUseDemandedBits, which attempts to peek through ops (mainly and/or/xor so far) that don't contribute to the demandedbits/elts of a node - which means we can do this even in cases where we have multiple uses of an op, which normally requires us to demanded all bits/elts. The intention is to remove a similar instruction - SelectionDAG::GetDemandedBits - once SimplifyMultipleUseDemandedBits has matured.

The InstCombine version of SimplifyMultipleUseDemandedBits can constant fold which I haven't added here yet, and so far I've only wired this up to some basic binops (and/or/xor/add/sub/mul) to demonstrate its use.

We do see a couple of regressions that need to be addressed:

    AMDGPU unsigned dot product codegen retains an AND mask (for ZERO_EXTEND) that it previously removed (but otherwise the dotproduct codegen is a lot better).
	
    X86/AVX2 has poor handling of vector ANY_EXTEND/ANY_EXTEND_VECTOR_INREG - it prematurely gets converted to ZERO_EXTEND_VECTOR_INREG.

The code owners have confirmed its ok for these cases to fixed up in future patches.

Differential Revision: https://reviews.llvm.org/D63281

llvm-svn: 366799
2019-07-23 12:39:08 +00:00
Simon Pilgrim 87adcf8c47 [SLPVectorizer] Remove null-pointer test. NFCI.
cast<CallInst> shouldn't return null and we dereference the pointer in a lot of other places, causing both MSVC + cppcheck to warn about dereferenced null pointers

llvm-svn: 366793
2019-07-23 10:51:43 +00:00
Simon Pilgrim 3ebd2fe91a [SLPVectorizer] Fix some MSVC/cppcheck uninitialized variable warnings. NFCI.
llvm-svn: 366712
2019-07-22 17:57:36 +00:00
Eric Christopher 93dfb93ad6 Temporarily Revert "[SLP] Recommit: Look-ahead operand reordering heuristic."
As there are some reported miscompiles with AVX512 and performance regressions
in Eigen. Verified with the original committer and testcases will be forthcoming.

This reverts commit r364964.

llvm-svn: 366154
2019-07-15 23:36:02 +00:00
Florian Hahn 1d554b7441 [LoopVectorize] Pass unfiltered list of arguments to getIntrinsicInstCost.
We do not compute the scalarization overhead in getVectorIntrinsicCost
and TTI::getIntrinsicInstrCost requires the full arguments list.

llvm-svn: 366049
2019-07-15 08:48:47 +00:00
Florian Hahn 9428d95ce7 [LV] Exclude loop-invariant inputs from scalar cost computation.
Loop invariant operands do not need to be scalarized, as we are using
the values outside the loop. We should ignore them when computing the
scalarization overhead.

Fixes PR41294

Reviewers: hsaito, rengolin, dcaballe, Ayal

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D59995

llvm-svn: 366030
2019-07-14 20:12:36 +00:00
Fangrui Song b251cc0d91 Delete dead stores
llvm-svn: 365903
2019-07-12 14:58:15 +00:00
Nikita Popov 5ca39e828c [SLP] Optimize getSpillCost(); NFCI
For a given set of live values, the spill cost will always be the
same for each call. Compute the cost once and multiply it by the
number of calls.

(I'm not sure this spill cost modeling makes sense if there are
multiple calls, as the spill cost will likely be shared across
calls in that case. But that's how it currently works.)

llvm-svn: 365552
2019-07-09 20:24:44 +00:00
Vasileios Porpodas cf47ff5ffb [SLP] Recommit: Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: hiraditya, phosek, rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364964
2019-07-02 20:20:28 +00:00
Jordan Rupprecht a7972dc04a Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364478 (git commit 574cb0eb3a)

The patch is causing compilation timeouts.

llvm-svn: 364846
2019-07-01 21:10:43 +00:00
Vasileios Porpodas 574cb0eb3a [SLP] Look-ahead operand reordering heuristic.
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk

Reviewed By: RKSimon, dtemirbulatov

Subscribers: rnk, rcorcs, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364478
2019-06-26 21:25:24 +00:00
Vasileios Porpodas 3081f78776 [SLP] NFC: Fixed typo in comment
llvm-svn: 364237
2019-06-24 21:40:48 +00:00
Cameron McInally fe3f15cf90 [SLP] Support unary FNeg vectorization
Differential Revision: https://reviews.llvm.org/D63609

llvm-svn: 364219
2019-06-24 19:24:23 +00:00
Reid Kleckner 592a193285 Revert [SLP] Look-ahead operand reordering heuristic.
This reverts r364084 (git commit 5698921be2)

It caused crashes while compiling a file in Chrome. Reduction
forthcoming.

llvm-svn: 364111
2019-06-21 23:10:25 +00:00
Simon Pilgrim 5698921be2 [SLP] Look-ahead operand reordering heuristic.
This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for an example).

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D60897

llvm-svn: 364084
2019-06-21 17:57:01 +00:00
Orlando Cazalet-Hyams 1251cac62a [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

> llvm-svn: 363046

llvm-svn: 363786
2019-06-19 10:50:47 +00:00
Warren Ristow 6452bdd29b [LV] Suppress vectorization in some nontemporal cases
When considering a loop containing nontemporal stores or loads for
vectorization, suppress the vectorization if the corresponding
vectorized store or load with the aligment of the original scaler
memory op is not supported with the nontemporal hint on the target.

This adds two new functions:
  bool isLegalNTStore(Type *DataType, unsigned Alignment) const;
  bool isLegalNTLoad(Type *DataType, unsigned Alignment) const;

to TTI, leaving the target independent default implementation as
returning true, but with overriding implementations for X86 that
check the legality based on available Subtarget features.

This fixes https://llvm.org/PR40759

Differential Revision: https://reviews.llvm.org/D61764

llvm-svn: 363581
2019-06-17 17:20:08 +00:00
Whitney Tsang 15b7f5b72d PHINode: introduce setIncomingValueForBlock() function, and use it.
Summary:
There is PHINode::getBasicBlockIndex() and PHINode::setIncomingValue()
but no function to replace incoming value for a specified BasicBlock*
predecessor.
Clearly, there are a lot of places that could use that functionality.

Reviewer: craig.topper, lebedev.ri, Meinersbur, kbarton, fhahn
Reviewed By: Meinersbur, fhahn
Subscribers: fhahn, hiraditya, zzheng, jsji, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D63338

llvm-svn: 363566
2019-06-17 14:38:56 +00:00
Bjorn Pettersson 83773b77a5 [LV] Deny irregular types in interleavedAccessCanBeWidened
Summary:
Avoid that loop vectorizer creates loads/stores of vectors
with "irregular" types when interleaving. An example of
an irregular type is x86_fp80 that is 80 bits, but that
may have an allocation size that is 96 bits. So an array
of x86_fp80 is not bitcast compatible with a vector
of the same type.

Not sure if interleavedAccessCanBeWidened is the best
place for this check, but it solves the problem seen
in the added test case. And it is the same kind of check
that already exists in memoryInstructionCanBeWidened.

Reviewers: fhahn, Ayal, craig.topper

Reviewed By: fhahn

Subscribers: hiraditya, rkruppe, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63386

llvm-svn: 363547
2019-06-17 12:02:24 +00:00
Orlando Cazalet-Hyams a947156396 Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion"
This reverts commit 1a0f7a2077.
See phabricator thread for D60831.

llvm-svn: 363132
2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams 1a0f7a2077 [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse

Reviewed By: hfinkel, jmorse

Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 363046
2019-06-11 10:37:20 +00:00
Fangrui Song 19189993c9 [LV] Fix -Wunused-function after r362736
llvm-svn: 362762
2019-06-07 01:48:26 +00:00
Renato Golin 9e97caf594 [LV] Wrap LV illegality reporting in a function. NFC.
A function for loop vectorization illegality reporting has been
introduced:

void LoopVectorizationLegality::reportVectorizationFailure(
    const StringRef DebugMsg, const StringRef OREMsg,
    const StringRef ORETag, Instruction * const I) const;

The function prints a debug message when the debug for the compilation
unit is enabled as well as invokes the optimization report emitter to
generate a message with a specified tag. The function doesn't cover any
complicated logic when a custom lambda should be passed to the emitter,
only generating a message with a tag is supported.

The function always prints the instruction `I` after the debug message
whenever the instruction is specified, otherwise the debug message
ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'

Patch by Pavel Samolysov <samolisov@gmail.com>

llvm-svn: 362736
2019-06-06 19:15:52 +00:00
Dinar Temirbulatov 15c657d13d [SLP] Fix regression in broadcasts caused by operand reordering patch D59973.
This patch fixes a regression caused by the operand reordering refactoring patch https://reviews.llvm.org/D59973 .
The fix changes the strategy to Splat instead of Opcode, if broadcast opportunities are found.
Please see the lit test for some examples.

Committed on behalf of @vporpo (Vasileios Porpodas)
    
Differential Revision: https://reviews.llvm.org/D62427

llvm-svn: 362613
2019-06-05 15:26:28 +00:00
Sanjay Patel ad62a3a299 [LoopUtils][SLPVectorizer] clean up management of fast-math-flags
Instead of passing around fast-math-flags as a parameter, we can set those
using an IRBuilder guard object. This is no-functional-change-intended.

The motivation is to eventually fix the vectorizers to use and set the
correct fast-math-flags for reductions. Examples of that not behaving as
expected are:
https://bugs.llvm.org/show_bug.cgi?id=23116 (should be able to reduce with less than 'fast')
https://bugs.llvm.org/show_bug.cgi?id=35538 (possible miscompile for -0.0)
D61802 (should be able to reduce with IR-level FMF)

Differential Revision: https://reviews.llvm.org/D62272

llvm-svn: 362612
2019-06-05 14:58:04 +00:00
Florian Hahn 9bbdde2598 [LV] Remove the redundant using LoopVectorizationPlanner:VPlanPtr
VPlan.h already contains the declaration of VPlanPtr type alias:

using VPlanPtr = std::unique_ptr<VPlan>;

The LoopVectorizationPlanner class also contains the same declaration
of VPlanPtr and therefore LoopVectorize requires a long wording when
its methods return VPlanPtr:

    LoopVectorizationPlanner::VPlanPtr
    LoopVectorizationPlanner::buildVPlanWithVPRecipes(...)

but LoopVectorize.cpp includes VPlan.h (via LoopVectorizationPlanner.h)
and can use VPlanPtr from that header.

Patch by Pavel Samolysov.

Reviewers: hsaito, rengolin, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D62576

llvm-svn: 362126
2019-05-30 18:46:13 +00:00
Craig Topper 778e445c58 [LoopVectorize] Add FNeg instruction support
Differential Revision: https://reviews.llvm.org/D62510

llvm-svn: 362124
2019-05-30 18:19:35 +00:00
Florian Hahn e4cfa89915 [LV] Inform about exactly reason of loop illegality
Currently, only the following information is provided by LoopVectorizer
in the case when the CF of the loop is not legal for vectorization:

 LV: Can't vectorize the instructions or CFG
    LV: Not vectorizing: Cannot prove legality.

But this information is not enough for the root cause analysis; what is
exactly wrong with the loop should also be printed:

 LV: Not vectorizing: The exiting block is not the loop latch.

Patch by Pavel Samolysov.

Reviewers: mkuper, hsaito, rengolin, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D62311

llvm-svn: 362056
2019-05-30 05:03:12 +00:00
Alina Sbirlea 63729b0c49 [SLPVectorizer] Set flag to previous default.
Summary:
The refactoring in r360276 moved the `RunSLPVectorization` flag and added the default explicitly. The default should have been `false`, as before.

The new pass manager used to have SLPVectorization on by default, now it's off in opt, and needs D61617 checked in to enable it in clang.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, eraman, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61955

llvm-svn: 361537
2019-05-23 19:07:41 +00:00
Simon Pilgrim 3c05cad03e LoopVectorizationCostModel::selectInterleaveCount - assert we have a non-zero loop cost. NFCI.
The input LoopCost value can be zero, but if so it should be recalculated with the current VF. After that it should always be non-zero.

llvm-svn: 361387
2019-05-22 14:18:17 +00:00
Dinar Temirbulatov 2ff72f6654 [SLP] Refactoring of EdgeInfo and UserTreeIdx in buildTree_rec().
This is a follow-up refactoring patch after the introduction of usable TreeEntry pointers in D61706.
The EdgeInfo struct can now use a TreeEntry pointer instead of an index in VectorizableTree.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D61795

llvm-svn: 361110
2019-05-19 01:30:41 +00:00
Florian Hahn 9e778e6c73 [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC)
This reduces the number of parameters we need to pass in and they seem a
natural fit in LoopVectorizationCostModel. Also simplifies things for
D59995.

As a follow up refactoring, we could only expose a expose a
shouldUseVectorIntrinsic() helper in LoopVectorizationCostModel, instead
of calling getVectorCallCost/getVectorIntrinsicCost in
InnerLoopVectorizer/VPRecipeBuilder.

Reviewers: Ayal, hsaito, dcaballe, rengolin

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D61638

llvm-svn: 360758
2019-05-15 10:05:49 +00:00
Simon Pilgrim 6c3ae79e9b [SLP] Refactor VectorizableTree to use unique_ptr.
This patch fixes the TreeEntry dangling pointer issue caused by reallocations of VectorizableTree.

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D61706

llvm-svn: 360456
2019-05-10 18:55:17 +00:00
Alina Sbirlea 458c7339e1 [NewPassManager] Add tuning option: SLPVectorization [NFC].
Summary: Mirror tuning option from old pass manager in new pass manager.

Reviewers: chandlerc

Subscribers: mehdi_amini, jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61616

llvm-svn: 360276
2019-05-08 17:58:35 +00:00
Alina Sbirlea f31eba6494 [MemorySSA] Teach LoopSimplify to preserve MemorySSA.
Summary:
Preserve MemorySSA in LoopSimplify, in the old pass manager, if the analysis is available.
Do not preserve it in the new pass manager.
Update tests.

Subscribers: nemanjai, jlebar, javed.absar, Prazek, kbarton, zzheng, jsji, llvm-commits, george.burgess.iv, chandlerc

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60833

llvm-svn: 360270
2019-05-08 17:05:36 +00:00
Simon Pilgrim cced3ecc35 [VPlan] Fix "value never used" static analyzer warning. NFCI.
llvm-svn: 360241
2019-05-08 10:52:26 +00:00
Kostya Serebryany b9c5768302 revert r360162 as it breaks most of the buildbots
llvm-svn: 360190
2019-05-07 20:57:11 +00:00
Orlando Cazalet-Hyams 78a6062c24 [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel

Reviewed By: hfinkel

Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D60831

llvm-svn: 360162
2019-05-07 15:37:38 +00:00
Simon Pilgrim 97fbc2abfe [LoadStoreVectorizer] vectorizeStoreChain - ensure we find a store type.
Properly initialize store type to null then ensure we find a real store type in the chain.

Fixes scan-build null dereference warning and makes the code clearer.

llvm-svn: 360031
2019-05-06 10:25:11 +00:00
Simon Pilgrim afb0e664e6 [SLPVectorizer] Prefer pre-increments. NFCI.
llvm-svn: 359989
2019-05-05 17:53:09 +00:00
Simon Pilgrim 5b05f20a3a [SLPVectorizer] Make getSpillCost() const. NFCI.
Ideally getTreeCost() should be const as well but non-const Type creation would need to be addressed first.

llvm-svn: 359975
2019-05-05 10:37:38 +00:00
Alina Sbirlea 733c8c40c8 Enable LoopVectorization by default.
Summary:
When refactoring vectorization flags, vectorization was disabled by default in the new pass manager.
This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults.
Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization.

Reviewers: chandlerc, jgorbe

Subscribers: jlebar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61091

llvm-svn: 359167
2019-04-25 04:49:48 +00:00
Alexey Bataev ef3c1884ec [SLP] Fix crash after r358519, by V. Porpodas.
Summary: The code did not check if operand was undef before casting it to Instruction.

Reviewers: RKSimon, ABataev, dtemirbulatov

Reviewed By: ABataev

Subscribers: uabelho

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61024

llvm-svn: 359136
2019-04-24 20:21:32 +00:00
Fangrui Song efd94c56ba Use llvm::stable_sort
While touching the code, simplify if feasible.

llvm-svn: 358996
2019-04-23 14:51:27 +00:00
Alina Sbirlea 0499a2f961 [NewPassManager] Adding pass tuning options: loop vectorize.
Summary:
Trying to add the plumbing necessary to add tuning options to the new pass manager.
Testing with the flags for loop vectorize.

Reviewers: chandlerc

Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59723

llvm-svn: 358763
2019-04-19 16:11:59 +00:00
Ali Tamur e8de5cd602 Fix a typo in comments. [NFC]
llvm-svn: 358531
2019-04-16 21:37:43 +00:00
Simon Pilgrim 82ffa88a04 [SLP] Refactoring of the operand reordering code.
This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.

This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .

Committed on behalf of @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59973

llvm-svn: 358519
2019-04-16 19:27:00 +00:00
Hiroshi Yamauchi 09e539fcae [PGO] Profile guided code size optimization.
Summary:
Enable some of the existing size optimizations for cold code under PGO.

A ~5% code size saving in big internal app under PGO.

The way it gets BFI/PSI is discussed in the RFC thread

http://lists.llvm.org/pipermail/llvm-dev/2019-March/130894.html 

Note it doesn't currently touch loop passes.

Reviewers: davidxl, eraman

Reviewed By: eraman

Subscribers: mgorny, javed.absar, smeenai, mehdi_amini, eraman, zzheng, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59514

llvm-svn: 358422
2019-04-15 16:49:00 +00:00
Florian Hahn db1a69c250 [VPLAN] Minor improvement to testing and debug messages.
1. Use computed VF for stress testing.
2. If the computed VF does not produce vector code (VF smaller than 2), force VF to be 4.
3. Test vectorization of i64 data on AArch64 to make sure we generate VF != 4 (on X86 that was already tested on AVX).

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D59952

llvm-svn: 358056
2019-04-10 08:17:28 +00:00
Evandro Menezes 85bd3978ae [IR] Refactor attribute methods in Function class (NFC)
Rename the functions that query the optimization kind attributes.

Differential revision: https://reviews.llvm.org/D60287

llvm-svn: 357731
2019-04-04 22:40:06 +00:00
Vedant Kumar c6bceec01a [DebugInfo] Fix pr41180 : Loop Vectorization Debugify Failure
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180

In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.

The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.

A regression test has been added that covers these issues.

Patch by Orlando Cazalet-Hyams!

Differential Revision: https://reviews.llvm.org/D59944

llvm-svn: 357499
2019-04-02 17:28:34 +00:00
Simon Pilgrim a3b71018d9 [SLP] reorderInputsAccordingToOpcode is const method. NFCI.
llvm-svn: 357490
2019-04-02 16:27:11 +00:00
Simon Pilgrim b06935fa8c [SLP] getVectorElementSize and isTreeTinyAndNotFullyVectorizable are const methods. NFCI.
llvm-svn: 357416
2019-04-01 17:48:03 +00:00
Simon Pilgrim f6c04ad486 [SLP] getGatherCost and isFullyVectorizableTinyTree are const methods. NFCI.
llvm-svn: 357414
2019-04-01 17:32:46 +00:00
Simon Pilgrim 6a75c36ea9 [SLP] Add support for commutative icmp/fcmp predicates
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.

This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).

Differential Revision: https://reviews.llvm.org/D59992

llvm-svn: 357266
2019-03-29 15:28:25 +00:00
Simon Pilgrim 62f0d1650a [SLP] Add support for swapping icmp/fcmp predicates to permit vectorization
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.

Differential Revision: https://reviews.llvm.org/D59956

llvm-svn: 357243
2019-03-29 10:41:00 +00:00
Benjamin Kramer ba2ea93ad1 Make helper functions static. NFC.
llvm-svn: 357187
2019-03-28 17:18:42 +00:00
Florian Hahn e21ed594d8 [VPlan] Determine Vector Width programmatically.
With this change, the VPlan native path is triggered with the directive:

   #pragma clang loop vectorize(enable)

There is no need to specify the vectorize_width(N) clause.

Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>

Differential Revision: https://reviews.llvm.org/D57598

llvm-svn: 357156
2019-03-28 10:37:12 +00:00
Simon Pilgrim 6f96795b88 [SLPVectorizer] Merge reorderAltShuffleOperands into reorderInputsAccordingToOpcode
As discussed on D59738, this generalizes reorderInputsAccordingToOpcode to handle multiple + non-commutative instructions so we can get rid of reorderAltShuffleOperands and make use of the extra canonicalizations that reorderInputsAccordingToOpcode brings.

Differential Revision: https://reviews.llvm.org/D59784

llvm-svn: 356939
2019-03-25 20:05:27 +00:00
Simon Pilgrim ff3abef395 [SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization
Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops.

This is prep work towards:
(a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands
(b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops.

Differential Revision: https://reviews.llvm.org/D59738

llvm-svn: 356913
2019-03-25 15:53:55 +00:00
Simon Pilgrim 5cd4eb96f6 [SLPVectorizer] shouldReorderOperands - just check for reordering. NFCI.
Remove the I.getOperand() calls from inside shouldReorderOperands - reorderInputsAccordingToOpcode should handle the creation of the operand lists and shouldReorderOperands should just check to see whether the i'th element should be commuted.

llvm-svn: 356854
2019-03-24 13:36:32 +00:00
Simon Pilgrim 1466e5c383 Fix unused variable warning on non-asserts builds. NFCI.
llvm-svn: 356841
2019-03-23 16:56:23 +00:00
Simon Pilgrim 64feec7977 Remove unused function argument. NFCI.
llvm-svn: 356840
2019-03-23 16:20:34 +00:00
Simon Pilgrim c7ba9555cf [SLPVectorizer] reorderInputsAccordingToOpcode - use InstructionState directly. NFCI.
llvm-svn: 356832
2019-03-23 13:44:06 +00:00
Simon Pilgrim f4f01f3cff [SLPVectorizer] Don't repeat VL.size() call. NFCI.
llvm-svn: 356830
2019-03-23 12:11:25 +00:00
Simon Pilgrim b68322f9d0 [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 356814
2019-03-22 21:27:11 +00:00
Simon Pilgrim d3a8fd8bfb Revert rL355906: [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059
........

Reverted due to buildbot failures that I don't have time to track down.

llvm-svn: 355913
2019-03-12 11:51:59 +00:00
Simon Pilgrim 5db95efdbd Try to fix SLPVectorizer BoUpSLP::BoEdgeInfo::dump visibility on non-debug builds
llvm-svn: 355912
2019-03-12 11:31:06 +00:00
Simon Pilgrim 2086a8894d [SLP] Remove redundancy of performing operand reordering twice: once in buildTree() and later in vectorizeTree().
This is a refactoring patch that removes the redundancy of performing operand reordering twice, once in buildTree() and later in vectorizeTree().
To achieve this we need to keep track of the operands within the TreeEntry struct while building the tree, and later in vectorizeTree() we are just accessing them from the TreeEntry in the right order.

This patch is the first in a series of patches that will allow for better operand reordering across chains of instructions (e.g., a chain of ADDs), as presented here: https://www.youtube.com/watch?v=gIEn34LvyNo

Patch by: @vporpo (Vasileios Porpodas)

Differential Revision: https://reviews.llvm.org/D59059

llvm-svn: 355906
2019-03-12 10:51:51 +00:00
Sanjoy Das 3f5ce18658 Reland "Relax constraints for reduction vectorization"
Change from original commit: move test (that uses an X86 triple) into the X86
subdirectory.

Original description:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355889
2019-03-12 01:31:44 +00:00
Sanjoy Das 2136a5bc49 Revert "Relax constraints for reduction vectorization"
This reverts commit r355868.  Breaks hexagon.

llvm-svn: 355873
2019-03-11 22:37:31 +00:00
Sanjoy Das 93f8cc186a Relax constraints for reduction vectorization
Summary:
Gating vectorizing reductions on *all* fastmath flags seems unnecessary;
`reassoc` should be sufficient.

Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal

Reviewed By: sdesmalen

Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57728

llvm-svn: 355868
2019-03-11 21:36:41 +00:00
Jonas Hahnfeld e071cd86df Hide two unused debugging methods, NFCI.
GCC correctly moans that PlainCFGBuilder::isExternalDef(llvm::Value*) and
StackSafetyDataFlowAnalysis::verifyFixedPoint() are defined but not used
in Release builds. Hide them behind 'ifndef NDEBUG'.

llvm-svn: 355205
2019-03-01 17:15:21 +00:00
Simon Pilgrim a066f1f9e6 [Vectorizer] Add vectorization support for fixed smul/umul intrinsics
This requires a couple of tweaks to existing vectorization functions as they were assuming that only the second call argument (ctlz/cttz/powi) could ever be the 'always scalar' argument, but for smul.fix + umul.fix its the third argument.

Differential Revision: https://reviews.llvm.org/D58616

llvm-svn: 354790
2019-02-25 15:42:02 +00:00
Alina Sbirlea 0a8bc14ad7 [MemorySSA & LoopPassManager] Add remaining book keeping [NFCI].
Add plumbing to get MemorySSA in the remaining loop passes.
Also update unit test to add the dependency.
[EnableMSSALoopDependency remains disabled].

llvm-svn: 353901
2019-02-12 23:48:02 +00:00
Michael Kruse 77a614a6e1 Refactor setAlreadyUnrolled() and setAlreadyVectorized().
Loop::setAlreadyUnrolled() and
LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that
stops the same loop from being transformed multiple times. This patch
merges both implementations.

In doing so we fix 3 potential issues:

 * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.*
   metadata even though it will not be used anymore. This already caused
   problems such as http://llvm.org/PR40546. Change the behavior to the
   one of setAlreadyUnrolled which deletes this loop metadata.

 * setAlreadyUnrolled() used to create a new LoopID by calling
   MDNode::get with nullptr as the first operand, then replacing it by
   the returned references using replaceOperandWith. It is possible
   that MDNode::get would instead return an existing node (due to
   de-duplication) that then gets modified. To avoid, use a fresh
   TempMDNode that does not get uniqued with anything else before
   replacing it with replaceOperandWith.

 * LoopVectorizeHints::matchesHintMetadataName() only compares the
   suffix of the attribute to set the new value for. That is, when
   called with "enable", would erase attributes such as
   "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and
   "llvm.loop.distribute.enable" instead of the one to replace.
   Fortunately, function was only called with "isvectorized".

Differential Revision: https://reviews.llvm.org/D57566

llvm-svn: 353738
2019-02-11 19:45:44 +00:00
Chandler Carruth b53f0e1145 Update files that were mistakenly added with the old file header to the
new one.

llvm-svn: 353665
2019-02-11 08:07:38 +00:00
Florian Hahn f557a94aa3 [LV] Remove unnecessary assignment to UserIC.
llvm-svn: 353469
2019-02-07 21:23:37 +00:00
Florian Hahn ba5acbc4fe [LV] Prevent interleaving if computeMaxVF returned None.
As discussed in D57382, interleaving should be avoided if computeMaxVF
returns None, same as we currently do for vectorization.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6477

Reviewers: Ayal, dcaballe, hsaito, mkuper, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D57837

llvm-svn: 353461
2019-02-07 20:49:10 +00:00
James Y Knight 7716075a17 [opaque pointer types] Pass value type to GetElementPtr creation.
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57173

llvm-svn: 352913
2019-02-01 20:44:47 +00:00
James Y Knight 14359ef1b6 [opaque pointer types] Pass value type to LoadInst creation.
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.

Differential Revision: https://reviews.llvm.org/D57172

llvm-svn: 352911
2019-02-01 20:44:24 +00:00
Yevgeny Rouban 4cdd783955 [SLPVectorizer] Get rid of IndexQueue array from vectorizeStores. NFCI.
Indices are checked as they are generated. No need to fill the whole array of indices.

Differential Revision: https://reviews.llvm.org/D57144

llvm-svn: 352839
2019-02-01 06:44:08 +00:00
Mircea Trofin ec02630278 [llvm] Clarify responsiblity of some of DILocation discriminator APIs
Summary:
Renamed setBaseDiscriminator to cloneWithBaseDiscriminator, to match
similar APIs. Also changed its behavior to copy over the other
discriminator components, instead of eliding them.

Renamed cloneWithDuplicationFactor to
cloneByMultiplyingDuplicationFactor, which more closely matches what
this API does.

Reviewers: dblaikie, wmi

Reviewed By: dblaikie

Subscribers: zzheng, llvm-commits

Differential Revision: https://reviews.llvm.org/D56220

llvm-svn: 351996
2019-01-24 00:10:25 +00:00
Hideki Saito 4e4ecae028 [LV][VPlan] Change to implement VPlan based predication for
VPlan-native path

Context: Patch Series #2 for outer loop vectorization support in LV
using VPlan. (RFC:
http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).

Patch series #2 checks that inner loops are still trivially lock-step
among all vector elements. Non-loop branches are blindly assumed as
divergent.

Changes here implement VPlan based predication algorithm to compute
predicates for blocks that need predication. Predicates are computed
for the VPLoop region in reverse post order. A block's predicate is
computed as OR of the masks of all incoming edges. The mask for an
incoming edge is computed as AND of predecessor block's predicate and
either predecessor's Condition bit or NOT(Condition bit) depending on
whether the edge from predecessor block to the current block is true
or false edge.

Reviewers: fhahn, rengolin, hsaito, dcaballe

Reviewed By: fhahn

Patch by Satish Guggilla, thanks!

Differential Revision: https://reviews.llvm.org/D53349

llvm-svn: 351990
2019-01-23 22:43:12 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Alexey Bataev 18809a6bbb [SLP] Fix PR40310: The reduction nodes should stay scalar.
Summary:
Sometimes the SLP vectorizer tries to vectorize the horizontal reduction
nodes during regular vectorization. This may happen inside of the loops,
when there are some vectorizable PHIs. Patch fixes this by checking if
the node is the reduction node and thus it must not be vectorized, it must
be gathered.

Reviewers: RKSimon, spatel, hfinkel, fedor.sergeev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D56783

llvm-svn: 351349
2019-01-16 15:39:52 +00:00
Sanjay Patel 7d65fe5cd5 [LoopVectorizer] give more advice in remark about failure to vectorize call
Something like this is requested by:
https://bugs.llvm.org/show_bug.cgi?id=40265
...and it seems like a common enough case that we should acknowledge it.

Differential Revision: https://reviews.llvm.org/D56551

llvm-svn: 351010
2019-01-12 15:27:15 +00:00
Mircea Trofin b53eeb6f4c [llvm] API for encoding/decoding DWARF discriminators.
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)

The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.

Reviewers: davidxl, danielcdh, wmi, dblaikie

Reviewed By: dblaikie

Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D55681

llvm-svn: 349973
2018-12-21 22:48:50 +00:00
Anton Afanasyev ce28791e20 Test commit
Fix typos.

llvm-svn: 349644
2018-12-19 17:18:40 +00:00
Michael Kruse d4eb13c880 [LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced

Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to
determine whether vectorization is profitable to all loops. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.

Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.

Differential Revision: https://reviews.llvm.org/D55785

llvm-svn: 349513
2018-12-18 17:46:09 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Alexey Bataev 3689747619 [SLP]PR39774: Update references of the replaced external instructions.
Summary:
An additional fix for PR39774. Need to update the references for the
RedcutionRoot instruction when it is replaced during the vectorization
phase to avoid compiler crash on reduction vectorization.

Reviewers: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55017

llvm-svn: 347997
2018-11-30 15:14:20 +00:00
Alexey Bataev 579c2d9d64 [SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.
Summary:
If the original reduction root instruction was vectorized, it might be
removed from the tree. It means that the insertion point may become
invalidated and the whole vectorization of the reduction leads to the
incorrect output result.
The ReductionRoot instruction must be marked as externally used so it
could not be removed. Otherwise it might cause inconsistency with the
cost model and we may end up with too optimistic optimization.

Reviewers: RKSimon, spatel, hfinkel, mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54955

llvm-svn: 347759
2018-11-28 14:34:11 +00:00
Vedant Kumar 4de31bba51 [IR] Add hasNPredecessors, hasNPredecessorsOrMore to BasicBlock
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.

This can be more efficient than using pred_size(), which is a linear
time operation.

We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).

With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.

See llvm.org/PR39702 for a harder but more general way of achieving
similar results.

Differential Revision: https://reviews.llvm.org/D54686

llvm-svn: 347256
2018-11-19 19:54:27 +00:00
Anna Thomas 5e9215f02b [LV] Avoid vectorizing unsafe dependencies in uniform address
Summary:
Currently, when vectorizing stores to uniform addresses, the only
instance we prevent vectorization is if there are multiple stores to the
same uniform address causing an unsafe dependency.
This patch teaches LAA to avoid vectorizing loops that have an unsafe
cross-iteration dependency between a load and a store to the same uniform address.

Fixes PR39653.

Reviewers: Ayal, efriedma

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D54538

llvm-svn: 347220
2018-11-19 15:39:59 +00:00
Florian Hahn 6df11868b5 [VPlan, SLP] Use SmallPtrSet for Candidates.
This slightly improves the candidate handling in getBest().

llvm-svn: 346870
2018-11-14 15:58:40 +00:00
Florian Hahn 02cb67deb9 [VPlan] Remove LLVM_DEBUG from VPlanSlp::dumpBundle.
The caller should take care of only calling it with debug enabled.

llvm-svn: 346860
2018-11-14 13:33:44 +00:00
Florian Hahn 2eca3728ee [VPlan] Update ifdef.
llvm-svn: 346858
2018-11-14 13:21:26 +00:00
Florian Hahn 09e516c54b [VPlan, SLP] Add simple SLP analysis on top of VPlan.
This patch adds an initial implementation of the look-ahead SLP tree
construction described in 'Look-Ahead SLP: Auto-vectorization in the Presence
of Commutative Operations, CGO 2018 by Vasileios Porpodas, Rodrigo C. O. Rocha,
Luís F. W. Góes'.

It returns an SLP tree represented as VPInstructions, with combined
instructions represented as a single, wider VPInstruction.

This initial version does not support instructions with multiple
different users (either inside or outside the SLP tree) or
non-instruction operands; it won't generate any shuffles or
insertelement instructions.

It also just adds the analysis that builds an SLP tree rooted in a set
of stores. It does not include any cost modeling or memory legality
checks. The plan is to integrate it with VPlan based cost modeling, once
available and to only apply it to operations that can be widened.

A follow-up patch will add a support for replacing instructions in a
VPlan with their SLP counter parts.

Reviewers: Ayal, mssimpso, rengolin, mkuper, hfinkel, hsaito, dcaballe, vporpo, RKSimon, ABataev

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D4949

llvm-svn: 346857
2018-11-14 13:11:49 +00:00
Florian Hahn a4dc7feeea [VPlan] VPlan version of InterleavedAccessInfo.
This patch turns InterleaveGroup into a template with the instruction type
being a template parameter. It also adds a VPInterleavedAccessInfo class, which
only contains a mapping from VPInstructions to their respective InterleaveGroup.
As we do not have access to scalar evolution in VPlan, we can re-use
convert InterleavedAccessInfo to VPInterleavedAccess info.


Reviewers: Ayal, mssimpso, hfinkel, dcaballe, rengolin, mkuper, hsaito

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D49489

llvm-svn: 346758
2018-11-13 15:58:18 +00:00
Simon Pilgrim 631f2bf51e [CostModel] Add more realistic SK_ExtractSubvector generic costs.
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.

This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.

llvm-svn: 346656
2018-11-12 14:25:23 +00:00
Ayal Zaks 45a3ca7be7 [LV] Avoid vectorizing loops under opt for size that involve SCEV checks
Fix PR39417, PR39497

The loop vectorizer may generate runtime SCEV checks for overflow and stride==1
cases, leading to execution of original scalar loop. The latter is forbidden
when optimizing for size. An assert introduced in r344743 triggered the above
PR's showing it does happen. This patch fixes this behavior by preventing
vectorization in such cases.

Differential Revision: https://reviews.llvm.org/D53612

llvm-svn: 345959
2018-11-02 09:16:12 +00:00
Dorit Nuzman 34da6dd696 [LV] Support vectorization of interleave-groups that require an epilog under
optsize using masked wide loads 

Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53668

llvm-svn: 345705
2018-10-31 09:57:56 +00:00
Simon Pilgrim 44a9a71d2a [TTI] Fix uses of SK_ExtractSubvector shuffle costs (PR39368)
Correct costings of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.

Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!

I've done my best to fix a number of vectorizer uses:

SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.

LV - I'm not clear on what the SK_ExtractSubvector should represents for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.

Differential Revision: https://reviews.llvm.org/D53573

llvm-svn: 345617
2018-10-30 18:10:02 +00:00
Jonas Paulsson 1f067c94dc [LoopVectorizer] Fix for cost values of memory accesses.
This commit is a combination of two patches:

* "Fix in getScalarizationOverhead()"

   If target returns false in TTI.prefersVectorizedAddressing(), it means the
   address registers will not need to be extracted. Therefore, there should
   be no operands scalarization overhead for a load instruction.

* "Don't pass the instruction pointer from getMemInstScalarizationCost."

   Since VF is always > 1, this is a cost query for an instruction in the
   vectorized loop and it should not be evaluated within the scalar
   context of the instruction.

Review: Ulrich Weigand, Hal Finkel
https://reviews.llvm.org/D52351
https://reviews.llvm.org/D52417

llvm-svn: 345603
2018-10-30 14:34:15 +00:00
Dorit Nuzman 5114390e48 [LV] Don't have fold-tail under optsize invalidate interleave-groups when
masked-interleaving is enabled

Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: hsaito

Differential Revision: https://reviews.llvm.org/D53559

llvm-svn: 345115
2018-10-24 07:11:38 +00:00
Simon Pilgrim 532a0f122e [SLPVectorizer] Add basic support for mul/and/or/xor horizontal reductions
Expand arithmetic reduction to include mul/and/or/xor instructions.

This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.

This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.

Differential Revision: https://reviews.llvm.org/D53473

llvm-svn: 345037
2018-10-23 15:13:09 +00:00
Dorit Nuzman da5dc13355 Leftover bits from https://reviews.llvm.org/D53420 that were accidentally left
out of revision 344883

llvm-svn: 345021
2018-10-23 11:51:55 +00:00
Dorit Nuzman 3ec99fe21b [IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when
optimizing for size

LV is careful to respect -Os and not to create a scalar epilog in all cases
(runtime tests, trip-counts that require a remainder loop) except for peeling
due to gaps in interleave-groups. This patch fixes that; -Os will now have us
invalidate such interleave-groups and vectorize without an epilog.

The patch also removes a related FIXME comment that is now obsolete, and was
also inaccurate:
"FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller
MaxVF that does not require a scalar epilog."
(requiresScalarEpilog() has nothing to do with VF).

Reviewers: Ayal, hsaito, dcaballe, fhahn

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53420

llvm-svn: 344883
2018-10-22 06:17:09 +00:00
Ayal Zaks b0b5312e67 [LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size
When optimizing for size, a loop is vectorized only if the resulting vector loop
completely replaces the original scalar loop. This holds if no runtime guards
are needed, if the original trip-count TC does not overflow, and if TC is a
known constant that is a multiple of the VF. The last two TC-related conditions
can be overcome by
1. rounding the trip-count of the vector loop up from TC to a multiple of VF;
2. masking the vector body under a newly introduced "if (i <= TC-1)" condition.

The patch allows loops with arbitrary trip counts to be vectorized under -Os,
subject to the existing cost model considerations. It also applies to loops with
small trip counts (under -O2) which are currently handled as if under -Os.

The patch does not handle loops with reductions, live-outs, or w/o a primary
induction variable, and disallows interleave groups.

(Third, final and main part of -)
Differential Revision: https://reviews.llvm.org/D50480

llvm-svn: 344743
2018-10-18 15:03:15 +00:00
Anna Thomas 6f732bfb79 [LV] Teach vectorizer about variant value store into uniform address
Summary:
Teach vectorizer about vectorizing variant value stores to uniform
address. Similar to rL343028, we do not allow vectorization if we have
multiple stores to the same uniform address.

Cost model already has the change for considering the extract
instruction cost for a variant value store. See added test cases for how
vectorization is done.
The patch also contains changes to the ORE messages.

Reviewers: Ayal, mkuper, anemet, hsaito

Subscribers: rkruppe, llvm-commits

Differential Revision: https://reviews.llvm.org/D52656

llvm-svn: 344613
2018-10-16 15:46:26 +00:00
Chandler Carruth e303c87e19 [TI removal] Make `getTerminator()` return a generic `Instruction`.
This removes the primary remaining API producing `TerminatorInst` which
will reduce the rate at which code is introduced trying to use it and
generally make it much easier to remove the remaining APIs across the
codebase.

Also clean up some of the stragglers that the previous mechanical update
of variables missed.

Users of LLVM and out-of-tree code generally will need to update any
explicit variable types to handle this. Replacing `TerminatorInst` with
`Instruction` (or `auto`) almost always works. Most of these edits were
made in prior commits using the perl one-liner:
```
perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator\(\))/Instruction\1/g'
```

This also my break some rare use cases where people overload for both
`Instruction` and `TerminatorInst`, but these should be easily fixed by
removing the `TerminatorInst` overload.

llvm-svn: 344504
2018-10-15 10:42:50 +00:00
Chandler Carruth edb12a838a [TI removal] Make variables declared as `TerminatorInst` and initialized
by `getTerminator()` calls instead be declared as `Instruction`.

This is the biggest remaining chunk of the usage of `getTerminator()`
that insists on the narrow type and so is an easy batch of updates.
Several files saw more extensive updates where this would cascade to
requiring API updates within the file to use `Instruction` instead of
`TerminatorInst`. All of these were trivial in nature (pervasively using
`Instruction` instead just worked).

llvm-svn: 344502
2018-10-15 10:04:59 +00:00
Ayal Zaks e567b5b526 [LV] Fix comments reported when not vectorizing single iteration loops; NFC
Landing this as a separate part of https://reviews.llvm.org/D50480, being a
seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count
without remainder under opt for size).

llvm-svn: 344483
2018-10-14 17:53:02 +00:00
Dorit Nuzman 38bbf81ade recommit 344472 after fixing build failure on ARM and PPC.
llvm-svn: 344475
2018-10-14 08:50:06 +00:00
Dorit Nuzman 5118c68cde revert 344472 due to failures.
llvm-svn: 344473
2018-10-14 07:21:20 +00:00
Dorit Nuzman 8174368955 [IAI,LV] Add support for vectorizing predicated strided accesses using masked
interleave-group

The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.

Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D53011

llvm-svn: 344472
2018-10-14 07:06:16 +00:00
Florian Hahn 18e07bb822 [LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC).
We assign indices sequentially for seen instructions, so we can just use
a vector and push back the seen instructions. No need for using a
DenseMap.

Reviewers: hsaito, rengolin, nadav, dcaballe

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D53089

llvm-svn: 344233
2018-10-11 09:46:25 +00:00
Florian Hahn 7eb5cb4ebc [LV] Ignore more debug info.
We can avoid doing some unnecessary work by skipping debug instructions
in a few loops. It also helps to ensure debug instructions do not
prevent vectorization, although I do not have any concrete test cases
for that.

Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk

Reviewed By: rengolin, dcaballe

Differential Revision: https://reviews.llvm.org/D53091

llvm-svn: 344232
2018-10-11 09:27:24 +00:00
Renato Golin d8e7ca4a32 [VPlan] Fix CondBit quoting in dumpBasicBlock
Quotes were being printed for VPInstructions but not the rest.

llvm-svn: 344161
2018-10-10 17:55:21 +00:00
Max Kazantsev b07369651e [LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160
At the point when we perform `emitTransformedIndex`, we have a broken IR (in
particular, we have Phis for which not every incoming value is properly set). On
such IR, it is illegal to create SCEV expressions, because their internal
simplification process may try to prove some predicates and break when it
stumbles across some broken IR.

The only purpose of using SCEV in this particular place is attempt to simplify
the generated code slightly. It seems that the result isn't worth it, because
some trivial cases (like addition of zero and multiplication by 1) can be
handled separately if needed, but more generally InstCombine is able to achieve
the goals we want to achieve by using SCEV.

This patch fixes a functional crash described in PR39160, and as side-effect it
also generates a bit smarter code in some simple cases. It also may cause some
optimality loss (i.e. we will now generate `mul` by power of `2` instead of
shift etc), but there is nothing what InstCombine could not handle later. In
case of dire need, we can support more trivial cases just in place.

Note that this patch only fixes one particular case of the general problem that
LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR.
The general solution, however, seems complex enough.

Differential Revision: https://reviews.llvm.org/D52881
Reviewed By: fhahn, hsaito

llvm-svn: 343954
2018-10-08 05:46:29 +00:00
Jonas Paulsson 29d80f07ee [LoopVectorizer] Use TTI.getOperandInfo()
Call getOperandInfo() instead of using (near) duplicated code in
LoopVectorizationCostModel::getInstructionCost().

This gets the OperandValueKind and OperandValueProperties values for a Value
passed as operand to an arithmetic instruction.

getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but
is now instead a public member.

Review: Florian Hahn
https://reviews.llvm.org/D52883

llvm-svn: 343852
2018-10-05 14:34:04 +00:00
Anna Thomas b1e3d45318 [LV][LAA] Vectorize loop invariant values stored into loop invariant address
Summary:
We are overly conservative in loop vectorizer with respect to stores to loop
invariant addresses.
More details in https://bugs.llvm.org/show_bug.cgi?id=38546
This is the first part of the fix where we start with vectorizing loop invariant
values to loop invariant addresses.

This also includes changes to ORE for stores to invariant address.

Reviewers: anemet, Ayal, mkuper, mssimpso

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50665

llvm-svn: 343028
2018-09-25 20:57:20 +00:00
Warren Ristow 4f27730eaf [Loop Vectorizer] Abandon vectorization when no integer IV found
Support for vectorizing loops with secondary floating-point induction
variables was added in r276554.  A primary integer IV is still required
for vectorization to be done.  If an FP IV was found, but no integer IV
was found at all (primary or secondary), the attempt to vectorize still
went forward, causing a compiler-crash.  This change abandons that
attempt when no integer IV is found.  (Vectorizing FP-only cases like
this, rather than bailing out, is discussed as possible future work
in D52327.)

See PR38800 for more information.

Differential Revision: https://reviews.llvm.org/D52327

llvm-svn: 342786
2018-09-21 23:03:50 +00:00
Matt Arsenault c640798597 LSV: Fix adjust alloca alignment trick for AMDGPU
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.

Also try to split if the unaligned access isn't allowed.

llvm-svn: 342442
2018-09-18 02:05:44 +00:00
Hideki Saito d19851ac7e Fix for the buildbot failure http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/23635
from the commit (r342197) of https://reviews.llvm.org/D50820.

llvm-svn: 342201
2018-09-14 02:02:57 +00:00