Commit Graph

3283 Commits

Author SHA1 Message Date
David Green 9727c77d58 [NFC] Rename Instrinsic to Intrinsic 2022-04-25 18:13:23 +01:00
Valery N Dmitriev edf7bed87b [SLP][NFC] Outline lookahead heuristics into a separate helper class.
Minor refactoring to reduce size of functional change D124309:
  look-ahead scoring routines pulled out of VLOperands and formed
  new LookAheadHeuristics helper class.

Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo)
Differential Revision: https://reviews.llvm.org/D124313
2022-04-22 18:59:08 -07:00
Vasileios Porpodas 889588ee97 [SLP] Refactoring isLegalBroadcastLoad() to use `ElementCount`.
Replacing `unsigned` with `ElementCount` in the argument of `isLegalBroadcastLoad()`.
This helps reduce the diff of a future SLP patch for AArch64.
2022-04-21 10:19:00 -07:00
Florian Hahn bea69b232f
[VPlan] Initial modeling of middle block in VPlan.
This patch extends the scope of VPlan to also include the exit (aka
middle) block.

For now, the exit block remains empty, but handling of exit values will
subsequently be moved to VPlan, by adding recipes to model exit values
in the exit block.

As a first step, this will allow fixing #51366.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123457
2022-04-20 19:34:41 +01:00
Vasileios Porpodas 8d4b5e0833 [NFC][SLP] Improved description of getShallowScore() and getScoreAtLevelRec()
Differential Revision: https://reviews.llvm.org/D124027
2022-04-19 12:15:36 -07:00
Florian Hahn 4026b718b8
[VPlan] Remove unused SCEV forward declaration (NFC). 2022-04-19 17:16:17 +02:00
Alexey Bataev 883571928c Revert "[SLP]Improve reductions analysis and emission, part 1."
This reverts commit 0e1f4d4d3c to fix
a crash reported in PR54976
2022-04-19 06:17:03 -07:00
Florian Hahn a65f2730d2
[VPlan] Expand induction step in VPlan pre-header.
This patch moves SCEV expansion of steps used by
VPWidenIntOrFpInductionRecipes to the pre-header using
VPExpandSCEVRecipe. This ensures that those steps are expanded while the
CFG is in a valid state. Previously, SCEV expansion may happen during
vector body code-generation, during which the CFG may be invalid,
causing issues with SCEV expansion.

Depends on D122095.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122096
2022-04-19 13:06:39 +02:00
Vasileios Porpodas b1333f03d9 Recommit "[SLP] Support internal users of splat loads"
Code review: https://reviews.llvm.org/D121940

This reverts commit 359dbb0d3d.
2022-04-18 15:58:01 -07:00
Vasileios Porpodas 359dbb0d3d Revert "[SLP] Support internal users of splat loads"
This reverts commit f8e1337115.
2022-04-18 12:12:34 -07:00
Vasileios Porpodas f8e1337115 [SLP] Support internal users of splat loads
Until now we would only accept a broadcast load pattern if it is only used
by a single vector of instructions.

This patch relaxes this, and allows for the broadcast to have more than one
user vector, as long as all of its uses are internal to the SLP graph and
vectorized.

Differential Revision: https://reviews.llvm.org/D121940
2022-04-18 11:59:44 -07:00
Florian Hahn 73f5d7d0d6
[VPlan] Handle equal address and store ops in onlyFirstLaneDemanded.
With opaque pointers, the stored value and address can be the same.

Previously the code in VPWidenMemoryInstructionRecipe::onlyFirstLaneDemanded
incorrectly considers stores with matching store and pointer operands as
only demanding the first lane, causing a crash.
2022-04-15 22:53:33 +02:00
Johannes Doerfert 1fb415fee9 [AMDGPU][FIX] Proper load-store-vectorizer result with opaque pointers
The original code relied on the fact that we needed a bitcast
instruction (for non constant base objects). With opaque pointers there
might not be a bitcast. Always check if reordering is required instead.

Fixes: https://github.com/llvm/llvm-project/issues/54896

Differential Revision: https://reviews.llvm.org/D123694
2022-04-15 13:42:46 -05:00
Florian Hahn 2c14cdf831
[VPlan] Turn external defs in Value -> VPValue mapping.
This addresses an existing TODO by keeping a mapping of external IR
Value * definitions wrapped in VPValues for use in a VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123700
2022-04-14 12:03:09 +02:00
serge-sans-paille fa5a4e1b95 [iwyu] Handle regressions in libLLVM header include
Running iwyu-diff on LLVM codebase since a96638e50e detected a few
regressions, fixing them.
2022-04-13 20:53:19 +02:00
Alexey Bataev 0e1f4d4d3c [SLP]Improve reductions analysis and emission, part 1.
Currently SLP vectorizer walks through the instructions and selects
3 main classes of values: 1) reduction operations - instructions with same
reduction opcode (add, mul, min/max, etc.), which build the reduction,
2) reduced values - instructions with the same opcodes, but different
from the reduction opcode, 3) extra arguments - all other values,
instructions from the different basic block rather than the root node,
instructions with to many/less uses.

This scheme is not very efficient. It excludes some instructions and all
non-instruction values from the reductions (constants, proficient
gathers), to many possibly reduced values are marked as extra arguments.
Patch improves this process by introducing a bit extended analysis
stage. During this stage, we still try to select 3 classes of the
values: 1) reduction operations - same as before, 2) possibly reduced
values - all instructions from the current block/non-instructions, which
may build a vectorization tree, 3) extra arguments - instructions from
the different basic blocks. Additionally, an extra sorting of the
possibly reduced values occurs to build the scalar sequences which
highly likely will bed vectorized, e.g. loads are grouped by the
distance between them, constants are grouped together, cmp instructions
are sorted by their compare types and predicates, extractelement
instructions are sorted by the vector operand, etc. Also, these groups
are reordered by their length so the longest group is the first in the
list of the possibly reduced values.

The vectorization process tries to emit the reductions for all these
groups. These reductions, remaining non-vectorized possible reduced
values and extra arguments are then combined into the final expression
just like it was before.

Differential Revision: https://reviews.llvm.org/D114171
2022-04-12 17:46:11 -07:00
Muhammad Omair Javaid 42ebfa8269 Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth"
This reverts commit 64b6192e81.

This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage:

https://lab.llvm.org/buildbot/#/builders/176/builds/1515

llvm-tblgen crashes after applying this patch.
2022-04-13 04:53:07 +05:00
Florian Hahn 5f1eb74850
[VPlan] Place VPExpandSCEVRecipe in pre-header.
After D121624 models the pre-header in VPlan, VPExpandSCEVRecipes can be
placed there. This ensures SCEV expansion happens before modifying the
CFG during VPlan execution, when CFG is incomplete.

Depends on D121624.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D122095
2022-04-10 10:26:20 +02:00
Florian Hahn 256c6b0ba1
[VPlan] Model pre-header explicitly.
This patch extends the scope of VPlan to also model the pre-header.
The pre-header can be used to place recipes that should be code-gen'd
outside the loop, like SCEV expansion.

Depends on D121623.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121624
2022-04-09 14:19:47 +02:00
Florian Hahn 467dbcd9f1
[LV] Set debug loc after setting insert point.
This fixes the code to actually use the location of the instruction, if
available. Previously, SetInsertPoint would overwrite the insert point
set from the instruction.
2022-04-08 20:34:40 +02:00
Florian Hahn 29fe998eaa
[VPlan] Preserve debug location when creating branch.
Update createEmptyBasicBlock to preserve the debug location of the
previous terminator.
2022-04-08 17:22:53 +02:00
Florian Hahn 4388c979da
[VPlan] Use vector.body as header name in VPlan native path.
This brings the VPlan block naming in line with the naming of the
generated basic blocks.
2022-04-07 10:31:12 +02:00
Jingu Kang 64b6192e81 [AArch64] Set maximum VF with shouldMaximizeVectorBandwidth
Set the maximum VF of AArch64 with 128 / the size of smallest type in loop.

Differential Revision: https://reviews.llvm.org/D118979
2022-04-05 13:16:52 +01:00
Florian Hahn 1ff022e21b
[LV] Add vector.body block to parent loop during skeleton creation.
When creating induction resume values, SCEV queries may rely on
LoopInfo. Make sure vector.body gets added to the loop of the pre-header
during skeleton construction.

%vector.body will be moved to the vector preheader during VPlan
execution.

Fixes #54745.
2022-04-05 11:54:17 +01:00
Florian Hahn 1817c526e1
[VPlan] Update VPInterleavedAccessInfo to use getVectorLoopRegion.
Update VPInterleavedAccessInfo  to use the generic getVectorLoopRegion
helper instead of relying on the entry block being the top-most vector
loop region.
2022-04-04 10:26:39 +01:00
Florian Hahn 8cd1892725
[VPlan] Remember previous loop and reset vector loop.
At the moment this is NFC, but will be needed once nested loops are also
modeled as regions. Preparation for D123005.
2022-04-04 09:27:15 +01:00
Philip Reames 88de27e3fd [LV] Handle non-integral types when considering interleave widening legality
In general, anywhere we might need to insert a blind bitcast, we need to make sure the types are losslessly convertible.

This fixes pr54634.
2022-04-03 20:16:20 -07:00
Florian Hahn 95b2aa511e
[VPlan] Set VPlan header block name to vector.body.
This brings the VPlan block naming in line with the naming of the
generated basic blocks.
2022-04-02 19:34:32 +01:00
Florian Hahn f8101e4d68
Recommit "[LV] Remove unneeded createHeaderBranch.(NFCI)"
This reverts commit 14e3650f01.

The issue causing the revert were fixed independently in
a08c90a402 and 14e5f9785c.
2022-04-01 16:53:39 +01:00
Florian Hahn 14e5f9785c
[LV] Add SCEV workaround from 80e8025 to epilogue vector code path.
This was exposed by 14e3650f. The recommit of 14e3650f will hit the
problematic code path requiring the workaround.
test case that crashes without the workaround.
2022-04-01 15:14:47 +01:00
Florian Hahn a08c90a402
[LV] Re-use TripCount from EPI.TripCount.
During skeleton construction for the epilogue vector loop, generic
helpers use getOrCreateTripCount, which will re-expand the trip count
computation. Instead, re-use the TripCount created during main loop
vectorization.
2022-04-01 13:47:34 +01:00
Florian Hahn 14e3650f01
Revert "Recommit "[LV] Remove unneeded createHeaderBranch.(NFCI)""
This reverts commit 8378a71b6c.

It looks like this patch uncovered another issue, e.g. see
https://lab.llvm.org/buildbot/#/builders/168/builds/5518
2022-03-31 19:00:48 +01:00
Florian Hahn 8378a71b6c
Recommit "[LV] Remove unneeded createHeaderBranch.(NFCI)"
This reverts the revert commit 2760cdc9c6.

This version pulls in the code to create the vector loop object in VPlan
from D121624.

This is needed because otherwise existing LoopInfo verification will
fail, as a loop block doesn't have in-loop successors now that we
do not replace the branch.

Now that we do not add new loops during skeleton construction, there's
also no need to verify LI there.
2022-03-31 14:48:32 +01:00
Florian Hahn 2760cdc9c6
Revert "[LV] Remove unneeded createHeaderBranch.(NFCI)"
This reverts commit 32bc83d11e.

This is causing bots with expensive-checks to fail. Revert while I
investigate.
2022-03-31 12:32:50 +01:00
Florian Hahn 32bc83d11e
[LV] Remove unneeded createHeaderBranch.(NFCI)
The only remaining use was to get the exit block of the loop. Instead of
relying on the loop, use the successor of VectorHeaderBB
(LoopMiddleBlock) directly to set VPTransformState::CFG::ExitB

Depends on D121621.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121623
2022-03-31 11:48:52 +01:00
Florian Hahn 2c494f0941
[VPlan] Remove unneeded Loop variable (NFC).
Suggested in D121623. The remaining uses of L can be replaced, reducing
the need for the variable.
2022-03-31 10:34:28 +01:00
David Green b65267ca7b [LV] Invalidate widening decisions after maximizing vector bandwidth
When MaximizeVectorBandwidth is enabled, we can end up (via calls to
collectUniformsAndScalars/setCostBasedWideningDecision through
calculateRegisterUsage) making widening decisions before we have decided
whether to fold the tail by masking. These decisions will be wrong if we
later decided to fold the tail, for example when the trip count is very
low. It will use incorrect costs for loads that should get masked, using
standard memory operation costs instead.

This still at the moment uses the EmulatedMaskMemRefHack costs (a bit
unfortunately), but the old costs without this change were 1, leading to
too optimistic vectorization.

This slightly changes the way that the MaximizeVectorBandwidth option
works to make it easier to test, always honouring the option if it is
set.

Differential Revision: https://reviews.llvm.org/D120215
2022-03-31 09:19:31 +01:00
Florian Hahn e4543af4e6
[VPlan] Track current vector loop in VPTransformState (NFC).
Instead of looking up the vector loop using the header, keep track of
the current vector loop in VPTransformState. This removes the
requirement for the vector header block being part of the loop up front.

A follow-up patch will move the code to generate the Loop object for the
vector loop to VPRegionBlock.

Depends on D121619.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121621
2022-03-30 22:16:40 +01:00
Florian Hahn e8673f2f20
[LV] Do not create separate latch block in VPlan::execute.
Now that all dependencies on creating the latch block up-front have been
removed, there is no need to create it early.

Depends on D121618.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121619
2022-03-30 17:31:38 +01:00
Florian Hahn 8a4077fac0
[LV] Pass LoopHeaderBB directly to updateDominatorTree. (NFC)
At the call site, we already know what the vector header block is. Pass
it directly.
2022-03-30 13:11:20 +01:00
Florian Hahn ecb4171dcb
[LV] Handle zero cost loops in selectInterleaveCount.
In some case, like in the added test case, we can reach
selectInterleaveCount with loops that actually have a cost of 0.

Unfortunately a loop cost of 0 is also used to communicate that the cost
has not been computed yet. To resolve the crash, bail out if the cost
remains zero after computing it.

This seems like the best option, as there are multiple code paths that
return a cost of 0 to force a computation in selectInterleaveCount.
Computing the cost at multiple places up front there would unnecessarily
complicate the logic.

Fixes #54413.
2022-03-29 22:52:43 +01:00
Florian Hahn d1d3563278
[LV] Move code to place pointer induction increment to VPlan post-processing.
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.

When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.

This change removes the requirement to create the latch block up-front
for VPWidenInductionPHIRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.

Depends on D121617.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121618
2022-03-29 20:27:59 +01:00
Philip Reames 7d6e8f2a96 [slp] Delete dead scalar instructions feeding vectorized instructions
If we vectorize a e.g. store, we leave around a bunch of getelementptrs for the individual scalar stores which we removed. We can go ahead and delete them as well.

This is purely for test output quality and readability. It should have no effect in any sane pipeline.

Differential Revision: https://reviews.llvm.org/D122493
2022-03-28 20:10:13 -07:00
Florian Hahn e7bf2ea934
[LV] Move code to place induction increment to VPlan post-processing.
This patch moves the code to set the correct incoming block for the
backedge value to VPlan::execute.

When generating the phi node, the backedge value is temporarily added
using the pre-header as incoming block. The invalid phi node will be
fixed up during VPlan::execute after main VPlan code generation.
At the same time, the backedge value is also moved to the latch.

This change removes the requirement to create the latch block up-front
for VPWidenIntOrFpInductionRecipe::execute, which in turn will enable
modeling the pre-header in VPlan.

As an alternative, the increment could be modeled as separate recipe,
but that would require more work and a bit of redundant code, as we need
to create the step-vector during VPWidenIntOrFpInductionRecipe::execute
anyways, to create the values for different parts.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121617
2022-03-28 16:20:02 +01:00
Philip Reames f80aaa675f [SLP] Simplify eraseInstruction [NFC]
This simplifies the implementation of eraseInstruction by moving the odd-replace-users-with-undef handling back to the only caller which uses it.  This handling was not obviously correct, so add the asserts which make it clear why this is safe to do at all.  The result is simpler code and stronger assertions.
2022-03-25 12:01:52 -07:00
Philip Reames 48cc9287f5 Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3)
The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling).  Most of these were fixed over the weekend and have had several days to bake.  The last was fixed this morning after being noticed in manual review of test changes yesterday.  See the review thread for links to each change.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

   Before this patch:

   704357 SLP                          - Number of calcDeps actions
   699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
   59 SLP                          - Number of ReScheduleOnFail actions
   10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

   After this patch:

   102895 SLP                          - Number of calcDeps actions
   161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
   55 SLP                          - Number of ReScheduleOnFail actions
   10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-25 10:39:23 -07:00
Philip Reames ec858f0201 [SLP] Optimize stacksave dependence handling [NFC]
After writing the commit message for 4b1bace28, realized that the mentioned optimization was rather straight forward.  We already have the code for scanning a block during region initialization, we can simply keep track if we've seen a stacksave or stackrestore.  If we haven't, none of these dependencies are relevant and we can avoid the relatively expensive scans entirely.
2022-03-25 10:04:10 -07:00
Philip Reames a16308c282 [SLP] Explicit track required stacksave/alloca dependency (try 3)
This is an extension of commit b7806c to handle one last case noticed in test changes for D118538.  Again, this is thought to be a latent bug in the existing code, though this time I have not managed to reduce tests for the original algoritthm.

The prior attempt had failed to account for this case:
  %a = alloca i8
  stacksave
  stackrestore
  store i8 0, i8* %a

If we allow '%a' to reorder into the stacksave/restore region, then the alloca will be deallocated before the use.  We will have taken a well defined program, and introduced a use-after-free bug.

There's also an inverse case where the alloca originally follows the stackrestore, and we need to prevent the reordering it above the restore.

Compile time wise, we potentially do an extra scan of the block for each alloca seen in a bundle.  This is significantly more expensive than the stacksave rooted version and is why I'd tried to avoid this in the initial patch.  There is room to optimize this (by essentially caching a "has stacksave" bit per block), but I'm leaving that to future work if it actually shows up in practice.  Since allocas in bundles should be rare in practice, I suspect we can defer the complexity for a long while.
2022-03-25 10:04:10 -07:00
Florian Hahn e47d220230
[LV] Use getVectorLoopRegion to retrieve header. (NFC)
Update all places that currently assume the entry block to the plan is
also the vector loop header to use getVectorLoopRegion instead.

getVectorLoopRegion will keep doing the right thing when the pre-header
is modeled explicitly (and becomes the new entry block in the plan).
2022-03-25 16:57:12 +00:00
Philip Reames d9756fa723 [slp] Factor out a lambda to avoid uplicating code a third time in upcoming patch [nfc] 2022-03-25 09:02:39 -07:00
Fraser Cormack 2e44b7872b [VectorCombine] Insert addrspacecast when crossing address space boundaries
We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enhancement was added
which peeks further through the ponter operand, opening up the
possibility that address-space violations could be introduced.

Instead of bailing as the previous fix did, simply insert an
addrspacecast cast instruction.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D121787
2022-03-24 19:08:08 +00:00
Simon Pilgrim 597aefa89c Fix unused variable warning by embedding inside assertion 2022-03-24 17:41:24 +00:00
Florian Hahn 46432a0088
[VPlan] Add VPWidenPointerInductionRecipe.
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.

Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121615
2022-03-24 14:58:45 +00:00
Alexey Bataev 20973c0841 [SLP][NFC]Fix param name in comments, NFC. 2022-03-24 05:58:42 -07:00
Vasileios Porpodas 39aa202aff Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 3, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit e6ead19b77.
2022-03-23 18:32:17 -07:00
Arthur Eubanks e6ead19b77 Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash."
This reverts commit 27bd8f9492.

Causes crashes, see comments in D121973
2022-03-23 10:57:45 -07:00
serge-sans-paille 1b89c83254 Cleanup includes: Transforms/Instrumentation & Transforms/Vectorize
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D122181
2022-03-23 11:06:13 +01:00
Vasileios Porpodas 27bd8f9492 Recommit "[SLP] Fix lookahead operand reordering for splat loads." attempt 2, fixed assertion crash.
Original review: https://reviews.llvm.org/D121354

This reverts commit f7d7d2a08d.
2022-03-22 16:41:55 -07:00
Arthur Eubanks f7d7d2a08d Revert "Recommit "[SLP] Fix lookahead operand reordering for splat loads.""
This reverts commit 79613185d3.

Causes crashes, see comments in https://reviews.llvm.org/D121973.
2022-03-22 13:33:49 -07:00
Florian Hahn 50c8588e44
[LV] Remove Loop argument from createInductionResumeValues (NFCI).
createInductionResumeValues only uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
2022-03-22 14:23:12 +00:00
Vasileios Porpodas 79613185d3 Recommit "[SLP] Fix lookahead operand reordering for splat loads."
Original review: https://reviews.llvm.org/D121354

The original commit 9136145eb0 broke the build on several targets.

Differential Revision: https://reviews.llvm.org/D121973
2022-03-21 15:57:32 -07:00
Philip Reames ee7324b898 Rename mayBeMemoryDependent to mayHaveNonDefUseDependency [nfc] 2022-03-21 10:01:40 -07:00
Alexey Bataev 79a182371e [SLP]Make stricter check for instructions that do not require
scheduling.

Need to check that the instructions with external operands can be
reordered safely before actualy exclude them from the scheduling.
2022-03-21 06:09:12 -07:00
Sophia 72bde608d2 [LV] Fix typo in comment
Reviewed by: fhahn (Florian Hahn)

    Differential Revision: https://reviews.llvm.org/D121781
2022-03-21 20:30:05 +08:00
Florian Hahn 0ebac76e6e
[LV] Remove unneeded Loop argument from completeLoopSkeleton. (NFCI)
completeLoopSkeleton only uses its loop argument only to get the
pre-header, but the pre-header is already known (we created/cached it
earlier). Remove the unneeded loop argument.
2022-03-21 10:07:25 +00:00
Florian Hahn 487629cc61
[LV] Remove dead Loop argument from emitMemRuntimeChecks. (NFC) 2022-03-20 21:01:15 +00:00
Philip Reames b7806c8b37 [SLP] Explicit track required stacksave/alloca dependency
The semantics of an inalloca alloca instruction requires that it not be reordered with a preceeding stacksave intrinsic call.  Unfortunately, there's no def/use edge or memory dependence edge.  (THe memory point is slightly subtle, but in general a new allocation can't alias with a call which executes strictly before it comes into existance.)

I'd tried to tackle this same case previously in 689babdf6, but the fix chosen there turned out to be incomplete.  As such, this change contains a fully revert of the first fix attempt.

This was noticed when investigating problems which surfaced with D118538, but this is definitely an existing bug.  This time around, I managed to reduce a couple of additional cases, including one which was being actively miscompiled even without the new scheduling change.  (See test diffs)

Compile time wise, we only spend extra time when seeing a stacksave (rare), and even then we walk the block at most once per schedule window extension.  Likely a non-issue.
2022-03-20 13:58:45 -07:00
Kazu Hirata bce1bf0ee2 [Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 10:41:22 -07:00
Philip Reames 6253b77da9 [SLP] Respect control dependence within a block during scheduling
This fixes an active miscompile visible in the test changes.  The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block.  The result is that we could (and in some rare cases *did*) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any.

Impact wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: may throw calls, and calls containing infinite loops.
* The former case was mostly covered by the memory dependencies, and to trigger require a function which can throw, but not write to memory.  In theory, such a case is possible, but not likely in practice.
* The later case is likely more of an issue in practice.  After this code was first written, we changed the IR semantics to allow well defined infinite loops without satisifying mustprogress.  Even for C/C++ - which do imply mustprogress - recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue.  I'm a bit shocked we don't seem to have a bug report which hit this in real code actually.

Compile time wise, this results in a single extra scan of the scheduling window in the common case.  Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once.  The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies, and thus rescan.  So the potentially expensive case is when we a call in a big schedule window which is frequently extended.  We could optimize this case (by caching the last instruction not guaranteeed to transfer execution and scanning only the extended window) and starting there), but I decided to leave the complexity until it mattered.  That same case is already degenerate with memory dependences which is more expensive than the control dependence scan.

We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now.

This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.
2022-03-19 13:36:24 -07:00
Florian Hahn 1a820ff039
[LV] Remove unnecessary uses of Loop* (NFC).
Update functions that previously took a loop pointer but only to get the
pre-header. Instead, pass the block directly. This removes the
requirement for the loop object to be created up-front.
2022-03-19 20:18:47 +00:00
Philip Reames 1093949cff [SLP] Add comment clarifying assumption that tripped me up [NFC]
I keep thinking this assumption is probably exploitable for a bug in the existing implementation, but all of my attempts at writing a test case have failed.  So for the moment, just document this very subtle assumption.
2022-03-18 11:40:19 -07:00
Kazu Hirata 3e0f7c7881 [Vectorize] Fix an 'unused function' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3917:13: error:
  unused function 'needToScheduleSingleInstruction'
  [-Werror,-Wunused-function]
2022-03-18 11:24:57 -07:00
Kazu Hirata b3d8c0d069 [Vectorize] Fix an 'unused variable' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8148:18: error:
  unused variable 'SDTE' [-Werror,-Wunused-variable]
2022-03-18 11:24:54 -07:00
Philip Reames 8f108c32bc Revert "[SLP] Optionally preserve MemorySSA"
This reverts commit 1cfa986d68.  See https://github.com/llvm/llvm-project/issues/54256 for why I'm discontinuing the project.

Seperately, it turns out that while this patch does correctly preserve MSSA, it's correct only at the end of the pass; not between vectorization attempts.  Even if we decide to resurrect this, we'll need to fix that before reapplying.
2022-03-18 10:45:59 -07:00
Vasileios Porpodas 9136145eb0 Revert "[SLP] Fix lookahead operand reordering for splat loads." due to build failures
This reverts commit 5efa78985b.
2022-03-17 18:22:04 -07:00
Vasileios Porpodas 5efa78985b [SLP] Fix lookahead operand reordering for splat loads.
Splat loads are inexpensive in X86. For a 2-lane vector we need just one
instruction: `movddup (%reg), xmm0`. Using the standard Splat score leads
to worse code. This patch adds a new score dedicated for splat loads.

Please note that a splat is usually three IR instructions:
- It is usually a load and 2 inserts:
 %ld = load double, double* %gep
 %ins1 = insertelement <2 x double> poison, double %ld, i32 0
 %ins2 = insertelement <2 x double> %ins1, double %ld, i32 1

- But it can also be a load, an insert and a shuffle:
 %ld = load double, double* %gep
 %ins = insertelement <2 x double> poison, double %ld, i32 0
 %shf = shufflevector <2 x double> %ins, <2 x double> poison, <2 x i32> zeroinitializer

Because of this some of the lit tests contain more IR instructions.

Differential Revision: https://reviews.llvm.org/D121354
2022-03-17 18:05:54 -07:00
Alexey Bataev d65cc85977 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-17 11:03:45 -07:00
Florian Hahn 151c144350
[LV] Use usesScalars in widenPHIInstruction.
This uses the existing VPlan helpers to check whether there are scalar
uses of a phi recipe. It remove one of the few remaining dependencies on
the cost model from VPlan code generation.

Depends on D121612.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121613
2022-03-17 13:16:32 +00:00
Florian Hahn a6e70e4056
[VPlan] VPInterleaveRecipe only requires the first lane of the address.
VPInterleaveRecipe only uses the first lane of the address. Add
onlyFirstLaneUsed implementation. This is needed for a follow-up patch.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D121612
2022-03-17 11:56:43 +00:00
Nikita Popov 1dbeb64493 [SLP] Avoid unnecessary getIncomingValueForBlock() call (NFC)
This code just wants to check all incoming values, we don't care
care what the incoming block is here.
2022-03-17 12:23:46 +01:00
Alexey Bataev 150ea76543 Revert "[SLP]Do not schedule instructions with constants/argument/phi operands and external users."
This reverts commit 1eeb2bfe72 to fix
a bug reported in https://reviews.llvm.org/D121121
2022-03-16 13:54:59 -07:00
Malhar Jajoo a36d269658 [VPlan] Avoid collecting scalars for SVE
This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.

This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.

Relevant test has also been added.

Differential Revision: https://reviews.llvm.org/D121452
2022-03-16 16:33:34 +00:00
Alexey Bataev 1eeb2bfe72 [SLP]Do not schedule instructions with constants/argument/phi operands and external users.
No need to schedule entry nodes where all instructions are not memory
read/write instructions and their operands are either constants, or
arguments, or phis, or instructions from others blocks, or their users
are phis or from the other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if operands does
not need to be scheduled) or at the end of the block (if users are
outside of the block).
It may save some compile time and scheduling resources.

Differential Revision: https://reviews.llvm.org/D121121
2022-03-16 06:05:43 -07:00
Philip Reames 1cfa986d68 [SLP] Optionally preserve MemorySSA
This initial patch adds code to preserve MemorySSA through a run of SLP vectorizer. The eventual plan is to use MemorySSA to accelerate SLP's memory dependence checking, but we're a ways from that.  In particular, this patch is correct, but really slow. It's being landed so that we can work incrementally in tree, not because it's expected to be useful to anyone just yet.

The broader effort is being tracked in https://github.com/llvm/llvm-project/issues/54256.  Its worth noting expicitly that this may not work out, and if not, we will be reverting all of the MSSA support in SLP at some point in the next few weeks.

Differential Revision: https://reviews.llvm.org/D117926
2022-03-15 16:36:15 -07:00
Florian Hahn ca1b2fc9fb
[LV] Remove LoopVectorBody from InnerLoopVectorizer. (NFCI)
Update places still referencing LoopVectorBody to use the vector loop to
get the vector loop header. This is needed to move vector loop
code-generation to VPlan completely, which in turn is needed to model
pre-header & exit blocks in VPlan as well.
2022-03-15 08:22:31 +00:00
Florian Hahn 4a0481e981
[LV] Check for users of truncated IVs, add more detailed comment.
Add missing outside user check for truncated IVs. Also hoist the code in
the helper with additional explanations.

Fixes #54370.
2022-03-14 19:39:30 +00:00
Nikita Popov 8361c5da30 [SLPVectorizer] Handle external load/store pointer uses with opaque pointers
In this case we may not generate a bitcast, so the new load/store
becomes the external user.
2022-03-14 16:55:09 +01:00
Florian Hahn d621ae30e2
[LV] Remove dead Loop argument from emitMinimumVector... (NFC)
The argument is not used, remove it.
2022-03-14 15:47:40 +00:00
Florian Hahn 3ee2d908a9
[LV] Remove dead Loop argument from emitSCEVChecks. (NFC)
The argument is not used, remove it.
2022-03-14 13:00:03 +00:00
Florian Hahn 8896c36624
[LV] Do not set insert point in completeLoopSkeleton. (NFCI)
The insertion point for the builder used during VPlan code generation is
set during code generation. Setting the insert point here is dead code
and can be removed.
2022-03-14 12:21:26 +00:00
Florian Hahn 1c0fc1f074
[VPlan] Ensure each iv user is only visited once in transform.
If a recipe has multiple uses of an IV, we crash. It causes a crash when
building llvm-test-suite.

Exposed by 95f76bff1c.
2022-03-13 21:42:17 +00:00
Florian Hahn 95f76bff1c
[LV] Create & use VPScalarIVSteps for all scalar users.
This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.

It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.

This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.

The code to genereate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.

Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.

Depends on D120827.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120828
2022-03-13 17:15:24 +00:00
serge-sans-paille ed98c1b376 Cleanup includes: DebugInfo & CodeGen
Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D121332
2022-03-12 17:26:40 +01:00
Florian Hahn d3e1094473
[VPlan] Implement VPCanonicalIVPHIRecipe::onlyFirstLaneUsed.
The recipe only uses the first lane of its operands.

Suggested & split off D120827.
2022-03-11 18:07:26 +00:00
Florian Hahn ecea477df3
[VPlan] Helper to check if a recipe uses scalar values of op.
This patch adds a helper to check if a recipe only uses scalars of a
given operand. This is similar to onlyFirstLaneUsed, which was
introduced earlier.

By default, usesScalars falls back on onlyFirstLaneUsed.

Will be used by D120828.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D120827
2022-03-11 13:41:08 +00:00
Florian Hahn a12403cfea
[LV] Do not consider instrs dead if used by phi that's not in plan.
Single value phis won't be modeled in VPlan. If the phi only gets used
outside the loop, the current code misses the fact that the incoming
value is not dead. Update the code to also look through such phis to
check for outside users.

Fixes #54266
2022-03-09 16:04:44 +00:00
Philip Reames a2e9c68fcd [SLP] Extract a helper for buildvector [nfc] 2022-03-07 19:11:40 -08:00
Philip Reames 8ab3befa3f [SLP] Fix spelling in a lambda name [NFC] 2022-03-07 18:52:57 -08:00
Roman Lebedev 2f80ea7f4f
[NFC][LV] Use different braces in debug output
The analysis passes output function name encapsulated in `'` braces,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV costmodel test checks.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D121105
2022-03-07 19:32:37 +03:00
Philip Reames deae979a2c Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"""
This reverts commit 738042711b. A second, apparently separate, issue has been reported on the original review.
2022-03-03 11:35:34 -08:00
Philip Reames 738042711b Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""
Root issue which triggered the revert was fixed in 689bab.  No changes in the reapplied patch.

Original commit message follows:

SLP currently schedules all instructions within a scheduling window which stretches from the first instr
uction potentially vectorized to the last. This window can include a very large number of unrelated instruct
ions which are not being considered for vectorization. This change switches the code to only schedule the su
b-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

    Before this patch:

    704357 SLP                          - Number of calcDeps actions
     699021 SLP                          - Number of schedule calls
       5598 SLP                          - Number of ReSchedule actions
         59 SLP                          - Number of ReScheduleOnFail actions
      10084 SLP                          - Number of schedule resets
       8523 SLP                          - Number of vector instructions generated

    After this patch:

    102895 SLP                          - Number of calcDeps actions
     161916 SLP                          - Number of schedule calls
       5637 SLP                          - Number of ReSchedule actions
         55 SLP                          - Number of ReScheduleOnFail actions
      10083 SLP                          - Number of schedule resets
       8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-03-02 10:47:20 -08:00
Philip Reames 689babdf68 [SLP] Don't try to vectorize allocas
While a collection of allocas are technically vectorizeable - by forming a wider alloca - this was not a transform SLP actually knows how to do.  Instead, we were forming a bundle with missing dependencies, and then relying on the scheduling code to preserve program order if multiple instructions were scheduleable at once.  I haven't been able to write a test case, but I'm 99% sure this was wrong in some edge case.

The unknown op case was flowing down the shufflevector path.  This did result in some splat handling being lost with this change, but the same lack of splat handling is visible in a whole bunch of simple examples for the gather path.  I didn't consider this interesting to fix given how narrow the splat of allocas case is.
2022-03-02 10:08:43 -08:00
Florian Hahn 8777cb66a8
[VPlan] Remove reliance on underlying instr for ScalarIVSteps (NFCI).
Instead of relying on underlying instructions, this patch updates
VPScalarIVStepsRecipe to only store the required type information.

This removes access to unrelated information, as well as avoiding issues
with the same underlying instruction being shared by multiple recipes.

This change should only change the debug output and not cause any
codegen changes, hence NFCI.
2022-03-02 16:23:19 +00:00
Florian Hahn 9e46866c0c
[LV] Remove dead EntryVal argument from buildScalarSteps (NFC).
The EntryVal argument is not needed after recent refactoring. Remove it.
2022-03-02 14:59:22 +00:00
Arthur Eubanks 9c6250ee41 Revert "[SLP] Schedule only sub-graph of vectorizable instructions"
This reverts commit 0539a26d91.

Causes a miscompile, see comments on D118538.

Required updating bottom-to-top-reorder.ll.
2022-03-01 17:31:16 -08:00
Arthur Eubanks 6987ac7903 Revert "[SLP] Remove SchedulingPriority from ScheduleData [NFC]"
This reverts commit a3e9b32c00.

Required for reverting D118538.
2022-03-01 17:28:52 -08:00
serge-sans-paille a494ae43be Cleanup includes: TransformsUtils
Estimation on the impact on preprocessor output:
before: 1065307662
after:  1064800684

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120741
2022-03-01 21:00:07 +01:00
serge-sans-paille 71c3a5519d Cleanup includes: LLVMAnalysis
Number of lines output by preprocessor:
before: 1065940348
after:  1065307662

Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup
Differential Revision: https://reviews.llvm.org/D120659
2022-03-01 18:01:54 +01:00
Philip Reames 8cb0ac5825 [SLP] Check invariant that all instructions in bundle are in same block [NFC] 2022-02-28 13:17:44 -08:00
Alexey Bataev e4b9640867 [SLP]Improve bottom-to-top reordering.
Currently bottom-to-top reordering analysis counts orders of the
operands and then adds natural order counts for the operand users. It is
very conservative, this the user nodes themselves may require
reordering. Patch improves bottom-to-top analysis by checking for the
user nodes if they require/allows the reordring. If the user node must
be reordered, has reused scalars, is an alternate op vectorization node,
is a non-ordered gather node or may allow reordering because of the
reordered operands, such node is considered as the node that allows
reodring and is not counted as a node with the natural order.

Differential Revision: https://reviews.llvm.org/D120492
2022-02-28 06:48:46 -08:00
Florian Hahn b3e8ace198
Recommit "[VPlan] Introduce recipe to build scalar steps."
This reverts the revert commit ff93260bf6.

The underlying issue causing the PPC bot failures has been fixed in
cbaac14734 and a corresponding test case has been added in
ad2cad1c52.

Original message:

    This patch adds a new VPScalarIVStepsRecipe to handle building scalar
    steps.

    In the first patch, it only handles the case where there is no vector
    induction variable needed.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D115953
2022-02-28 14:12:20 +00:00
Florian Hahn cbaac14734
[LV] Remove induction recipes only used outside vector loop.
Exit values of vector inductions are generated completely independent of
the induction recipes. Consider them for removal, if they are not used
in loop.

This fixes a crash exposed by 49b23f451c.
2022-02-28 11:14:22 +00:00
Philip Reames 319265328c [SLP] Remove field unused after 33ce97f to silence buildbots [NFC] 2022-02-27 10:18:10 -08:00
Florian Hahn ff93260bf6
Revert "[VPlan] Introduce recipe to build scalar steps."
This reverts commit 49b23f451c.

This appears to break some PPC build bots. Revert while I investigate.
2022-02-27 17:51:19 +00:00
Philip Reames 33ce97f413 [SLP] Use BatchAA to reduce capture analysis cost [NFC]
SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes.  AA internally usings pointerMayBeCaptured to prove some noalias results.  In a local profile, we were spending about 4% of total O2 time in capture tracking.  By using BatchAA interface - which caches capture results - this drops to 2%.

Note that there is no invalidation of BatchAA here.  This assumes that no transformation done by SLP invalidates alias or capture results.  This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.
2022-02-27 09:47:24 -08:00
Florian Hahn 49b23f451c
[VPlan] Introduce recipe to build scalar steps.
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.

In the first patch, it only handles the case where there is no vector
induction variable needed.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115953
2022-02-27 17:32:41 +00:00
Florian Hahn 9bc866cc6f
[VPlan] Add recipe to handle SCEV expansion (NFC).
This can be used to explicitly model VPValues that depend on SCEV
expansion, like the step for inductions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116288
2022-02-27 12:47:02 +00:00
Florian Hahn da740492b0
[VPlan] Remove dead header-phi recipes.
This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118051
2022-02-26 16:26:39 +00:00
Evgeniy Brevnov 10e99eb7e4 [SLP] "Normal" instructions should not go between PHI and Lading pad
Currently, SLP can insert "shuffle" instruction beween PHI and Landing pad instruction. The problem is demonstrated by LIT test. The solution is to adjust insertion point once we are done with PHI generation.

Differential Revision: https://reviews.llvm.org/D120552
2022-02-26 11:44:26 +07:00
Vasileios Porpodas 4bbc3290a2 [SLP] Fix for the min/max intrinsic cost.
The min/max intrinsic cost is currently too low because in the cost calculation
we subtract the cost of the vector compare as we will not emit it.
For the cost of the vector compare we are currently passing BAD_ICMP_PREDICATE
which returns 3, the worst case cost.
I think we should be passing VecPred instead, since we know the predicates of
the compare instr.

I think this is related to commit b3b993a7ad which introduced the predicate
argument to getCmpSelInstrCost().
https://reviews.llvm.org/rGb3b993a7ad817c3c5801341fa78f34332900eb83

Differential Revision: https://reviews.llvm.org/D120439
2022-02-24 18:08:40 -08:00
Philip Reames ed54296ea3 [SLP] Fastpath instructions not in block being scheduled [nfc] 2022-02-23 13:51:36 -08:00
Philip Reames a4541fdfe4 [SLP] Replace a impossible branch condition with an assert [NFC]
An entire bundle must be inside the scheduling window.  Assert that this property holds as opposed to checking it at runtime.
2022-02-23 13:43:45 -08:00
Philip Reames 9a40f9f681 {SLP] Make it clear ScheduleDataMap is keyed by instructions [NFC] 2022-02-23 13:31:36 -08:00
Philip Reames 9392c0d4ef Revert "[SLP] Remove cap on schedule window size"
This reverts commit 6adf4b039e.  Reverting while investigating https://github.com/llvm/llvm-project/issues/54029
2022-02-23 13:12:07 -08:00
Philip Reames a83441e8cd Revert "[SLP] Simplify extendSchedulingRegion"
This reverts commit 8c85f3a052.
2022-02-23 13:12:07 -08:00
Philip Reames 222e8610f1 [SLP] Rearrange fields in ScheduleData for density [NFC] 2022-02-23 12:33:43 -08:00
Philip Reames a3e9b32c00 [SLP] Remove SchedulingPriority from ScheduleData [NFC]
First step in trying to shrink the memory footprint of ScheduleData to improve cache locality.
2022-02-23 11:43:46 -08:00
Philip Reames 8c85f3a052 [SLP] Simplify extendSchedulingRegion
This change uses instruction's comesBefore method to simplify the code significantly. There's little compile time concern here because getSpillCost already calls comesBefore on every basic block which contains a vectorization candidate. The only additional times we'll build basic block ordering is when we can't schedule a vector candidate anywhere in the containing block.

Differential Revision: https://reviews.llvm.org/D120364
2022-02-23 11:23:38 -08:00
Philip Reames 6adf4b039e [SLP] Remove cap on schedule window size
This cap was first added in 848c1aa45 (back in 2015).  Per the original commit message, the purpose was to avoid a compile time explosion in long basic blocks.  The algorithmic problem in scheduling has now been fixed in 0539a26d.

In the meantime, the code has rotten fairly badly.  Some intermediate refactoring caused the size to only be incremented if *both* iterators advance in the window search.  This causes the size to be badly undercounted when near one end of a basic block.  We no longer have any test which exercises the logic in an intentional way; there's one test which differs with this change, but the changes appear fairly orthoganol to the purpose of the test file.

Unfortunately, we no longer have the original motivating example, so it's possible that it also hits some other issue.  I tested locally with a large example, but even at it's worst, that one doesn't demonstrate anything too extreme even without the algorithmic fix.  It's clearly faster with, but only by ~20% which doesn't seem in line with the original commit message.   If regressions with this patch are seen, please file a bug and I'll try to fix any other algorithmic problems which fall out.
2022-02-23 08:27:45 -08:00
Brendon Cahoon 3cc15e2cb6 [SLP] Fix assert from non-constant index in insertelement
A call to getInsertIndex() in getTreeCost() is returning None,
which causes an assert because a non-constant index value for
insertelement was not expected. This case occurs when the
insertelement index value is defined with a PHI.

Differential Revision: https://reviews.llvm.org/D120223
2022-02-22 15:57:14 -06:00
Philip Reames 8612b11c86 [SLP] Use isInSchedulingRegion consistently [NFC] 2022-02-22 10:27:16 -08:00
Philip Reames 0539a26d91 [SLP] Schedule only sub-graph of vectorizable instructions
SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users.

This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example:

Before this patch:

704357 SLP                          - Number of calcDeps actions
 699021 SLP                          - Number of schedule calls
   5598 SLP                          - Number of ReSchedule actions
     59 SLP                          - Number of ReScheduleOnFail actions
  10084 SLP                          - Number of schedule resets
   8523 SLP                          - Number of vector instructions generated

After this patch:

102895 SLP                          - Number of calcDeps actions
 161916 SLP                          - Number of schedule calls
   5637 SLP                          - Number of ReSchedule actions
     55 SLP                          - Number of ReScheduleOnFail actions
  10083 SLP                          - Number of schedule resets
   8403 SLP                          - Number of vector instructions generated

I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore.

The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass.

For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch.

Differential Revision: https://reviews.llvm.org/D118538
2022-02-22 10:15:55 -08:00
Kerry McLaughlin 12fb133eba [LoopVectorize] Support conditional in-loop vector reductions
Extends getReductionOpChain to look through Phis which may be part of
the reduction chain. adjustRecipesForReductions will now also create a
CondOp for VPReductionRecipe if the block is predicated and not only if
foldTailByMasking is true.

Changes were required in tryToBlend to ensure that we don't attempt
to convert the reduction Phi into a select by returning a VPBlendRecipe.
The VPReductionRecipe will create a select between the Phi and the reduction.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D117580
2022-02-22 12:04:35 +00:00
Florian Hahn c141d158e5
[VectorCombine] Remove redundant checks (NFC).
The removed conditions are already checked by the if above.

Fixes #53761.
2022-02-19 21:05:32 +00:00
Philip Reames 3ad0bdae8f [SLP] Address post commit comment from 2e50760 2022-02-18 10:57:15 -08:00
Alexey Bataev b0a0df9809 [SLP]Fix vectorization of the alternate cmp instruction with swapped predicates.
If the alternate cmp instruction is a swapped predicate of the main cmp
instruction, need to generate alternate instruction, not the one with
the swapped predicate. Also, the lane with the alternate opcode should
be selected only, if the corresponding operands are not compatible.

Correctness confirmed:
https://alive2.llvm.org/ce/z/94BG66

Differential Revision: https://reviews.llvm.org/D119855
2022-02-18 04:27:45 -08:00
Alexey Bataev d1cd64ffdd [SLP][NFC]Fix misprint in function name, NFC. 2022-02-17 05:57:51 -08:00
Arthur Eubanks 826fae51d2 [SLPVectorizer][OpaquePtrs] Check GEP source element type
Fixes a miscompile with opaque pointers.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D119980
2022-02-16 14:47:20 -08:00
Philip Reames 2e50760775 [SLP] Add assert that entities are scheduled as expected
Requested in D118538
2022-02-15 12:21:49 -08:00
Anton Afanasyev b7574b092a [SLP] Don't try to vectorize pair with insertelement
Particularly this breaks vectorization of insertelements where some of
intermediate (i.e. not last) insertelements are used externally.

Fixes PR52275
Fixes #51617

Differential Revision: https://reviews.llvm.org/D119679
2022-02-15 16:12:59 +03:00
Anton Afanasyev 954ea0f044 [SLP] Simplify indices processing for insertelements
Get rid of non-constant and undef indices of insertelements
at `buildTree()` stage. Fix bugs.

Differential Revision: https://reviews.llvm.org/D119623
2022-02-14 14:50:44 +03:00
Florian Hahn 2cd22ce0d0
[LV] Pass start value directly to emitTransformedIndex (NFC). 2022-02-12 19:03:32 +00:00
Anton Afanasyev cd685f5736 [NFC][SLP] Set default parameter for Offset equal to zero 2022-02-11 17:22:33 +03:00
Philip Reames 5ba115031d [PSE] Remove assumption that top level predicate is union from public interface [NFC*]
Note that this doesn't actually cause the top level predicate to become a non-union just yet.

The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
2022-02-10 16:14:52 -08:00
Simon Pilgrim 6af7c1371a [LoopVectorize] getStepVector - reduce scope of local variable. NFC. 2022-02-10 20:44:25 +00:00
David Green b55d4c2ad8 Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`"
This reverts commit 77a0da926c as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!

Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
2022-02-09 20:02:54 +00:00
Alexey Bataev 370ea1a199 [SLP][NFC]Fix comment, NFC. 2022-02-09 07:14:14 -08:00
Florian Hahn 8aa122081f
[LV] Pass step to emitTransformedIndex (NFC).
Move out the induction step creation from emitTransformedIndex to the
callers. In some places (e.g. widenIntOrFpInduction) the step is already
created. Passing the step in ensures the steps are kept in sync.
2022-02-09 11:12:45 +00:00
Florian Hahn c9e6678b56
[LV] Move buildScalarSteps out of ILV (NFC).
This makes the function independent of shared state in ILV (ensures no
new dependencies on things like the cost model are introduced) and allows
for use directly in recipe's ::execute functions.
2022-02-08 21:18:40 +00:00
David Green b4c6d1bb37 [LoopVectorizer] Don't perform interleaving of predicated scalar loops
The vectorizer will choose at times to "vectorize" loops with a scalar
factor (VF=1) with interleaving (IC > 1). This can occasionally produce
better code than the unroller (notable for reductions where it can
produce independent reduction chains that are combined after the loop).
At times this is not very beneficial though, for example when runtime
checks are needed or when the scalar code requires predication.

This addresses the second point, preventing the vectorizer from
interleaving when the scalar loop will require predication. This
prevents it from making a bit of a mess, that is worse than the original
and better left for the unroller to unroll if beneficial. It helps
reverse some of the regressions from D118090.

Differential Revision: https://reviews.llvm.org/D118566
2022-02-07 19:34:28 +00:00
Florian Hahn 5a72357697
[LV] Use IRBuilderBase in VPlan.h, remove IRBuilder.h include (NFC).
By using IRBuilderBase instead of IRBuilder<> a forward declaration can
be used instead of including IRBuilder.h
2022-02-07 17:46:16 +00:00
Roman Lebedev 77a0da926c
[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevents scalarized vectorization of masked memory operations:
```
  // TODO: Cost model for emulated masked load/store is completely
  // broken. This hack guides the cost model to use an artificially
  // high enough value to practically disable vectorization with such
  // operations, except where previously deployed legality hack allowed
  // using very low cost values. This is to avoid regressions coming simply
  // from moving "masked load/store" check from legality to cost model.
  // Masked Load/Gather emulation was previously never allowed.
  // Limited number of Masked Store/Scatter emulation was allowed.
```

While i don't really understand about what specifically `is completely broken`
was talking about, i believe that at least on X86 with AVX2-or-later,
this is no longer true. (or at least, i would like to know what is still broken).
So i would like to follow suit after D111460, and like wise disable that hack for AVX2+.

But since this was added for X86 specifically, let's just instead completely remove this hack.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114779
2022-02-07 16:08:31 +03:00
Djordje Todorovic afd54e1ed1 [SLPVectorizer] Fix "unused variable" build warning 2022-02-07 10:38:19 +01:00
Kazu Hirata 3a3cb929ab [llvm] Use = default (NFC) 2022-02-06 22:18:35 -08:00
Florian Hahn 541ca12dcd
[LV] Use VPReplicateRecipe::isUniform instead isUniformAfterVec (NFCI).
In scalarizeInstruction(), isUniformAfterVectorization is used to detect
cases where it is sufficient to always access the first lane. This
should map directly checking whether the operand is a uniform replicate
recipe.

Differential Revision: https://reviews.llvm.org/D116654
2022-02-06 16:37:20 +00:00
Benjamin Kramer ce9417348e [SLP] Skip a DenseSet<unsigned> -> bit vector conversion. NFCI. 2022-02-06 00:57:47 +01:00
Philip Reames 0cc6165d05 [SLP] Strengthen internal asserts about scheduled node state [NFC]
All members of a scheduled bundle must have valid dependencies, with no unscheduled ones, and only the lead element gets marked scheduled.
2022-02-04 12:22:52 -08:00
Philip Reames f3f8e3da9f [SLP] Remove ScheduleData::UnscheduledDepsInBundle field [NFC-ish]
We can simply compute the value of this field on demand.  Doing so clarifies the behavior when one of the instructions within a bundle doesn't have valid dependencies.  I vaguely thing this could change behavior slightly, but none of the test cases are affected, and my attempts to write one by hand have failed.

This also minorly reduces memory usage, but that's a secondary value at best.
2022-02-04 10:12:09 -08:00
Philip Reames bb9964ba43 [SLP] Have only ready items in ready list [NFC]
This adds the assertion that all items in the ready list are in-fact scheduleable entities ready to be scheduled.  This involves changing the ReadyInsts structure to be a set, and fixing a couple places where we left nodes on the list when they were no longer ready.
2022-02-03 19:49:24 -08:00
Philip Reames 2cbc92fb11 [SLP] Strengthen internal invariant assertions slightly
This builds on the invariant checks introduced in 1519629, and adds a couple more than seem to hold without additional work.
2022-02-03 14:56:39 -08:00
Philip Reames 1519629a20 [SLP] Add basic self consistency asserts into scheduling
The idea here is to have a verify routine we can call during scheduling to ensure broken invariants are reported.  The intent is to help in debugging scheduling bugs.

At the moment, only the most basic properties are checked as adding several I thought held reported failures.
2022-02-03 13:27:35 -08:00
Philip Reames 6d0c007bc1 [SLP] Fix a typo in comment 2022-02-03 09:11:47 -08:00
Sander de Smalen eaee477eda [LV] Use VScaleForTuning to allow wider epilogue VFs.
When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot
be any smaller, the vectorizer should try to estimate how many lanes are
executed at runtime and allow a suitable fixed-width VF to be chosen. It
can use VScaleForTuning to figure out what a suitable fixed-width VF could
be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8,
it could still choose an epilogue VF upto VF=4.

This was a bit tricky to test, so this patch also introduces a wrapper
function to get 'VScaleForTuning' by also considering vscale_range.
If min and max are equal, then that will be the vscale we compile for.
It makes little sense to tune for a different width if the code
will not be portable for other widths.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118709
2022-02-03 15:40:17 +00:00
Alexey Bataev 802ceb8343 [SLP]Excluded external uses from the reordering estimation.
Compiler adds the estimation for the external uses during operands
reordering analysis, which makes it tend to prefer duplicates in the
lanes rather than diamond/shuffled match in the graph. It changes the sizes of
the vector operands and may prevent some vectorization. We don't need
this kind of estimation for the analysis phase, because we just need to
choose the most compatible instruction and it does not matter if it has
external user or used in the non-matching lane. Instead, we count the number
of unique instruction in the lane and see if the reassociation changes
the number of unique scalars to be power of 2 or not. If we have power
of 2 unique scalars in the lane, it is considered more profitable rather
than having non-power-of-2 number of unique scalars.

Metric: SLP.NumVectorInstructions

                          test-suite :: MultiSource/Benchmarks/FreeBench/distray/distray.test   70.00   86.00   22.9%
                             test-suite :: External/SPEC/CFP2017rate/544.nab_r/544.nab_r.test  346.00  353.00    2.0%
                            test-suite :: External/SPEC/CFP2017speed/644.nab_s/644.nab_s.test  346.00  353.00    2.0%
                         test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test  235.00  239.00    1.7%
                  test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test  235.00  239.00    1.7%
                     test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 8723.00 8834.00    1.3%
                                 test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 1051.00 1064.00    1.2%
                         test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1628.00 1646.00    1.1%
                          test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1628.00 1646.00    1.1%
                       test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 9100.00 9184.00    0.9%
                     test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3565.00 3577.00    0.3%
                    test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3565.00 3577.00    0.3%
                       test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4235.00 4245.00    0.2%
                              test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1996.00 1998.00    0.1%
                                 test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1671.00 1672.00    0.1%

test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  783.00  782.00   -0.1%
                      test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   69.00   68.00   -1.4%
        test-suite :: External/SPEC/CINT2017speed/641.leela_s/641.leela_s.test  207.00  192.00   -7.2%
         test-suite :: External/SPEC/CINT2017rate/541.leela_r/541.leela_r.test  207.00  192.00   -7.2%
 test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   89.00   80.00  -10.1%
test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   89.00   80.00  -10.1%
       test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  260.00  215.00  -17.3%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  256.00  211.00  -17.6%

MultiSource/Benchmarks/Prolangs-C/TimberWolfMC - pretty the same.
SingleSource/Benchmarks/Misc/oourafft.test - 2 <2 x > loads replaced by
one <4 x> load.
External/SPEC/CINT2017speed/641.leela_s - function gets vectorized and
not inlined anymore.
External/SPEC/CINT2017rate/541.leela_r - same
xternal/SPEC/CINT2017rate/531.deepsjeng_r - changed the order in
multi-block tree, the result is pretty the same.
External/SPEC/CINT2017speed/631.deepsjeng_s - same.
MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a - the result is the same
as before.
MultiSource/Benchmarks/MiBench/consumer-jpeg - same.

Differential Revision: https://reviews.llvm.org/D116688
2022-02-03 06:50:06 -08:00
Alexey Bataev ad2a0ccf8f [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-03 06:24:10 -08:00
Alexey Bataev 8a1dfbc4d8 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 842a2360a8 to fix the
bugs reported by users in https://reviews.llvm.org/D115955#3291538.
2022-02-02 12:06:36 -08:00
Alexey Bataev 842a2360a8 [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-02 10:32:52 -08:00
Benjamin Kramer 0c3d22a592 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit 83620bd2ad.

It's causing miscompilations, see review comments at
https://reviews.llvm.org/D115955
2022-02-02 13:08:51 +01:00
serge-sans-paille e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Sander de Smalen 2a44eaf20f [LV] Allow a scalable VF for the epilogue.
For some reason we limited the epilogue VF to be fixed-width, but there
is not necessarily a reason for doing so. If the main VF=vscale x 16, the
epilogue VF could be either fixed-width, or a scalable VF upto vscale x 8.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D118688
2022-02-01 22:38:55 +00:00
Alexey Bataev 83620bd2ad [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-02-01 09:54:20 -08:00
Benjamin Kramer 5281f0dab2 Revert "[SLP]Alternate vectorization for cmp instructions."
This reverts commit afaaecc88c.

Crashes when compiling SciPy, test case https://reviews.llvm.org/P8276
2022-02-01 11:40:43 +01:00
Florian Hahn 7fe4fa9a0a
[LV] Use onlyFirstLaneDemanded when widening pointer phis (NFCI).
This removes another instance of recipe execution still relying on
the cost model.

Depends on D116554.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D116656
2022-02-01 09:50:47 +00:00
Alexey Bataev afaaecc88c [SLP]Alternate vectorization for cmp instructions.
Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Differential Revision: https://reviews.llvm.org/D115955
2022-01-31 11:11:25 -08:00
Florian Hahn 8f12175fed
[VPlan] Use VPlan to check if only the first lane is used.
This removes the remaining dependence on LoopVectorizationCostModel from
buildScalarSteps and is required so it can be moved out of ILV.

It also improves allows us to remove a few unneeded instructions.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116554
2022-01-30 13:07:29 +00:00
Florian Hahn efd4938723
[VPlan] Handle IV vector splat using VPWidenCanonicalIV.
This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.

This has the following benefits:

 1. First step to avoid setting both vector and scalar values for the
    same induction def.
 2. Reducing complexity of widenIntOrFpInduction through making things
    more explicit in VPlan
 3. Only need to splat the vector IV for block in masks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116123
2022-01-29 16:25:27 +00:00
Philip Reames 6888081e32 [SLP] Use moveBefore to simplify code [NFC] 2022-01-28 12:44:07 -08:00
Philip Reames 746e435ff7 Revert "[SLP] Add a clarifying assert in block scheduling [NFC]"
This reverts commit db49a78900.  The reasoning in the patch applied to a downstream branch, and I got myself confused when trying to split apart pieces.  Thankfully, the assert was simply weaker than the actual invariant currently upstream which is that ReadyInsts is not empty.
2022-01-28 12:10:31 -08:00
Philip Reames db49a78900 [SLP] Add a clarifying assert in block scheduling [NFC]
The fact we could have a block with a valid scheduling window, but nothing to schedule was surprising to me.  After digging through the code, this can only happen if we don't find anything to directly vectorize.  However, the reduction handling code relies on this mode, so we can't simply consider such trees unvectorizeable.  The assert conveys both that this situation can happen, but also that it can *only* happen for an immediate gather.

Context: We built the bundle before deciding that vectorization of a bundle is possible.  A side effect of bundle construction is manipulating the scheduling window, so a bundle which isn't vectorizable can cause the creation or expansion of a scheduling window.
2022-01-28 11:08:59 -08:00
Alexey Bataev cec8b614f3 [SLP]Do not reorder top nodes if they do not require reordering.
No need to reorder the top nodes, if they are not stores or
insertelement instructions and each node should be analized only
once, when the bottom-to-top analysis is performed.
We still endup with extractelements for the top node scalars and
the final shuffle just adds an extra cost and currently
crashes the compiler for PHI nodes.

Differential Revision: https://reviews.llvm.org/D116760
2022-01-28 09:16:18 -08:00
Florian Hahn 96400f179f
[VPlan] Record whether scalar IVs are need in induction recipe. (NFC)
This explicitly records whether a scalar IV is needed in the
VPWidenIntOrFpInductionRecipe, to remove a dependence on the cost-model
during its ::execute.

It will also be used in D116123 to determine if a vector phi will be
generated.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D118167
2022-01-28 09:34:03 +00:00
Benjamin Kramer 0776f6e04d [LSV] Vectorize loads of vectors by turning it into a larger vector
Use shufflevector to do the subvector extracts. This allows a lot more
load merging on AMDGPU and also on NVPTX when <2 x half> is involved.

Differential Revision: https://reviews.llvm.org/D117219
2022-01-26 11:38:41 +01:00
eopXD 6be77561f8 [SLP][NFC] Add debug logs for entry.
Tell the users they are specifying something without vector register.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D117980
2022-01-24 09:05:21 -08:00
Kerry McLaughlin 8082ab2fc3 [LoopVectorize] Support epilogue vectorisation of loops with reductions
isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:

- `fixReduction`: If fixReduction is being called during vectorisation of the
    epilogue, the phi node it creates will need to additionally carry incoming
     values from the middle block of the main loop.

- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
    created by fixReduction are updated after the vec.epilog.iter.check block
    is added. The phi is also moved to the preheader of the epilogue.

- `processLoop`: The start value of any VPReductionPHIRecipes are updated before
    vectorising the epilogue loop. The getResumeInstr function added to the ILV
    will return the resume instruction associated with the recurrence descriptor.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D116928
2022-01-24 12:03:31 +00:00
Kazu Hirata f63a9cd99d [Vectorize] Remove unused variables (NFC) 2022-01-23 20:32:54 -08:00
Florian Hahn 5f2854f1da
[LV] Always create VPWidenCanonicalIVRecipe, optimize away later.
This patch updates createBlockInMask to always generate
VPWidenCanonicalIVRecipe and adds a transform to optimize it away later,
if it is not needed.

This is a step towards breaking up VPWidenIntOrFpInductionRecipe and
explicitly distinguishing between vector phis and scalarizing.

Split off from D116123.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D117140
2022-01-22 15:34:20 +00:00
Florian Hahn 55689904d2
[VPlan] Move ::isCanonical outside ifdef.
This fixes a build failure with assertions disabled.
2022-01-21 09:44:31 +00:00
Florian Hahn c0cf209076
[VPlan] Add VPWidenIntOrFpInductionRecipe::isCanonical, use it (NFCI).
This patch adds VPWidenIntOrFpInductionRecipe::isCanonical to check if
an induction recipe is canonical. The code is also updated to use it
instead of isCanonicalID.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D117551
2022-01-21 09:35:06 +00:00
Philip Reames c0906f6b21 [SLP] Remove stray semicolon to make bots happy
Certain bots (e.g. sanitizer-x86_64-linux-android) appear to be running with strict c++98 flags which disallow ; at global scope.
2022-01-20 14:09:28 -08:00
Philip Reames 5a670f1378 [SLP] Kill an unused param and use a for-loop in calculateDependencies [NFC] 2022-01-20 13:58:20 -08:00
Philip Reames 60f6191879 [SLP] Extract formBundle helper for readability [NFC] 2022-01-20 13:08:37 -08:00
Philip Reames 118babe67a [SLP] Use for loops for walking bundle elements 2022-01-20 12:44:33 -08:00
Philip Reames 860038e0d7 [SLP] Rename a couple lambdas to be more clearly separate from method names 2022-01-20 12:13:30 -08:00
Philip Reames c104fca36b {SLP] Delete dead code in favor of proper assert [NFC] 2022-01-20 08:54:12 -08:00
Philip Reames c43ebae838 [SLP] Reduce nesting depth in calculateDependencies via for loop and early continue [NFC] 2022-01-20 08:46:44 -08:00
Philip Reames 3c422cbe6b [SLP] Add an asser to make a non-obvious precondition clear [NFC] 2022-01-20 08:24:10 -08:00
Florian Hahn 165e36bf18
[VPlan] Assert can IV is only used by increments during epilogue vec.
After resetting the start value of the canonical IV, it might not be
canonical any more. Add an assertion to make sure it is only used by its
increment, to avoid potential mis-use. Suggested in D117140.
2022-01-19 10:10:05 +00:00
David Sherwood e781620dee [LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled
When SVE is enabled for AArch64 targets it makes more sense to use the
get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping
from the intrinsic to the 'whilelo' instruction for legal vector types.
This instruction neatly takes overflow into account as well. This patch
fixes an issue in VPInstruction::generateInstruction that assumed we are
only dealing with fixed-width vectors.

Differential Revision: https://reviews.llvm.org/D117109
2022-01-18 11:59:30 +00:00
Florian Hahn 500fe60957
[VPlan] Drop unnecessary uses of getVPSingleValue (NFC). 2022-01-17 13:27:33 +00:00
Florian Hahn 070d1034da
[LV] Restore metadata to disable runtime unrolling for epilogue loop.
After d4a8fc3a87 LV stopped adding metadata to disable runtime
unrolling to the vectorized epilogue loop. This was missed because
278aa65cc4 removed the relevant test coverage.

This patch fixes that by adding the relevant metadata after
vector loop generation.
2022-01-16 13:14:16 +00:00
Florian Hahn 62739204d4
[LV] Move AddRuntimeUnrollDisableMetaData so it can be used earlier (NFC)
Move up the definition of AddRuntimeUnrollDisableMetaData, so it can be
re-used earlier in the file in a follow-up patch.
2022-01-16 10:30:24 +00:00
Florian Hahn 42b34facfd
Recommit "[LV] Inline CreateSplatIV call for scalar VFs."
This reverts the revert commit 073c27b5e5.

A reduced test case has been added in 5e4966cbae and the code has
been updated to handle the case where getInductionOpcode returns
BinaryOpsEnd. In this case, the original code was always using
Instruction::Add. Do the same in the patch.

Note this commit may slightly change the value naming, because it now
also assigns the 'induction' name in the floating point case.
2022-01-14 19:03:49 +00:00
James Y Knight 073c27b5e5 Revert "[LV] Inline CreateSplatIV call for scalar VFs (NFC)."
Causes a crash with the following (creduce'd) test-case:

clang -O3 '--target=aarch64-grtev4-linux-gnu' -xc - -c -o /dev/null <<EOF
int *e;
int f;
int g() {
  int h;
  int *j = 0;
  while (&f - j > 0) {
    int k;
    k = j;
    if (e == j && *e)
      k = 5;
    h = k;
    j++;
  }
  return h;
}
EOF

This reverts commit 7ce48be0fd.
2022-01-14 00:00:02 +00:00
Florian Hahn 3f2fb767e3
[VPlan] Make IV operand explicit for VPWidenCanonicalIVRecipe (NFC).
This makes the def-use relationship between VPCanonicalIVPHIRecipe and
VPWidenCanonicalIVRecipe explicit. Needed for D117140.
2022-01-13 11:13:05 +00:00
Florian Hahn 7ce48be0fd
[LV] Inline CreateSplatIV call for scalar VFs (NFC).
This is a NFC change split off from D116123, as suggested there.
D116123 will remove the last user of CreateSplatIV.
2022-01-13 09:34:31 +00:00
Florian Hahn d4a8fc3a87
[VPlan] Introduce and use BranchOnCount VPInstruction.
This patch adds a new BranchOnCount VPInstruction opcode with 2
operands. It first compares its 2 operands (increment of canonical
induction and vector trip count), followed by a branch to either the
exit block or back to the vector header.

It must be the last recipe in the exit block of the topmost vector loop
region.

This extracts parts from D113224 and was discussed in D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116479
2022-01-12 13:42:13 +00:00
Rosie Sumpter 552eb372cb [LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter
This is required to query the legality more precisely in the LoopVectorizer.

This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.

Differential Revision: https://reviews.llvm.org/D115329
2022-01-12 13:34:12 +00:00
David Sherwood b0922a9dcd [LoopVectorize] Make VPWidenCanonicalIVRecipe::execute work for scalable vectors
The code in VPWidenCanonicalIVRecipe::execute only worked for fixed-width
vectors due to the way we generate the values per lane. This patch changes
the code to use a combination of vector splats and step vectors to get
the same result. This then works for both fixed-width and scalable vectors.

Tests that exercise this code path for scalable vectors have been added here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding.ll

Differential Revision: https://reviews.llvm.org/D113180
2022-01-10 14:12:32 +00:00
David Sherwood e3c84fb948 [LoopVectorize] Add support for tail folding using scalable vectors
This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount
whereby we weren't correctly generating the runtime trip count
for scalable vectors when tail-folding.

It also removes some asserts in the tail-folding path for cases when
the VF is not scalable.

In this patch I have only permitted tail-folding to be enabled
explicitly for scalable vectors when the user has specified one
of the following flags:

  -prefer-predicate-over-epilogue=predicate-dont-vectorize
  -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue

For now it's best not to enable tail-folding with scalable vectors for
low trip counts or when optimising for code size, since there has been
no analysis on whether this is worth it.

Various tests have been added here:

  Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
  Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll

The tests cannot be target independent because they require masked
load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore
need to return true.

Differential Revision: https://reviews.llvm.org/D113003
2022-01-10 10:55:40 +00:00
Florian Hahn 1ce01b7dfe
[SCEVExpander] Simplify cleanup, skip sorting by dominance.
There is no need to sort inserted instructions by dominance, as the
deletion loop still requires RAUW with undef before deleting. Removing
instructions in reverse insertion order should still insure that the
number of uselist updates is kept to a minimum.
2022-01-09 18:38:41 +00:00
Kazu Hirata 435a5a3652 [llvm] Fix bugprone argument comments (NFC)
Identified with bugprone-argument-comment.
2022-01-08 11:56:38 -08:00
Kazu Hirata 4e2ec7e38d [llvm] Remove unused forward declarations (NFC) 2022-01-07 20:00:34 -08:00
Kazu Hirata b932bdf59f [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-07 17:45:09 -08:00
Alexey Bataev d130df544d [SLP]Improve reordering for the nodes beeing used in alternate vectorization.
No need to include the order of the scalars beeing used as part of the
alternate vectorization into account when trying to reorder the whole
graph. Such elements better to reorder in the following phase because
the subtree still ends up in shuffle.

Part of D116688, fixes the regression in D116690.

Differential Revision: https://reviews.llvm.org/D116740
2022-01-06 11:18:57 -08:00
Alexey Bataev 7cb19fe493 [SLP]Initialize the lane with the given value instead of default 0.
There is a bug in the reordering analysis stage. If the element with the
given hash is not added to the map but has the same number of APOs and
instructions with same parent, but different instruction opcode, it will
be initalized with default values and then the counter is increased by
1. But the lane is not updated and default to 0 instead of the actual
   `Lane` value. It leads to the fact that the analysis is useless in
   many cases and default to lane 0 instead of actual lane with the
   minimum amount of APO operands.

Differential Revision: https://reviews.llvm.org/D116690
2022-01-06 10:57:11 -08:00
Alexey Bataev 700997aef8 [SLP][NFC]Fix comment, NFC. 2022-01-06 06:38:29 -08:00
Sander de Smalen 9cbe000df2 [LV] Load/store/reduction type must be sized, assert it.
This addresses a suggestion by @nikic on D115356.
2022-01-06 12:35:27 +00:00
Alexey Bataev dd83befe33 [SLP][NFC]Improved isAltShuffle by comparing instructions instead of
opcodes, NFC.

NFC part of D115955.
2022-01-05 12:30:13 -08:00
Florian Hahn 2ee8154816
[LV] Don't use getVPSingleValue for VPWidenMemoryInstRecipe (NFC).
VPWidenMemoryInstructionRecipe is a VPValue, so this can be passed
directly, instead of relying on getVPSingleValue.
2022-01-05 13:51:50 +00:00
Sander de Smalen 95a93722db [LV] Remove what seems like stale code in collectElementTypesForWidening.
This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be
although that patch doesn't really mention any reasons for ignoring the
pointer type in this calculation if the memory access isn't consecutive.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D115356
2022-01-05 12:20:59 +00:00
Florian Hahn 65c4d6191f
[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable.
At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents from modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.

This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the
primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for
VPInstruction to model the increment.

This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D113223
2022-01-05 10:46:06 +00:00
Rosie Sumpter 961f51fdf0 [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,

float s = 0;
for (int i = 0; i < N; ++i)
  s += (float) i*i;

which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.

In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.

Differential Revision: https://reviews.llvm.org/D113973
2022-01-04 10:12:57 +00:00
Kazu Hirata e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Florian Hahn 791523bae6
[LV] Set loop metadata after VPlan execution (NFC).
Setting the loop metadata for the vector loop after VPlan execution
allows generating the full loop body during VPlan execution. This is in
preparation for D113224.
2022-01-03 09:59:50 +00:00
Nikita Popov 330cb03269 [LoadStoreVectorizer] Check for guaranteed-to-transfer (PR52950)
Rather than checking for nounwind in particular, make sure the
instruction is guaranteed to transfer execution, which will also
handle non-willreturn calls correctly.

Fixes https://github.com/llvm/llvm-project/issues/52950.
2022-01-03 10:55:47 +01:00
Florian Hahn 6e0a333f71
[LV] Use Builder.CreateVectorReverse directly. (NFC)
IRBuilder::CreateVectorReverse already handles all cases required by
LoopVectorize. It can be used directly instead of reverseVector.
2022-01-02 19:09:30 +00:00
Kazu Hirata 7e163afd9e Remove redundant void arguments (NFC)
Identified by modernize-redundant-void-arg.
2022-01-02 10:20:19 -08:00
Florian Hahn b1a333f0fe
[VPlan] Don't consider VPWidenCanonicalIVRecipe phi-like.
VPWidenCanonicalIVRecipe does not create PHI instructions, so it does
not need to be placed in the phi section of a VPBasicBlock.

Also tidies the code so the WidenCanonicalIV recipe and the
compare/lane-masks are created in the header.

Discussed D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116473
2022-01-02 12:48:17 +00:00
Kazu Hirata fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Florian Hahn 7305798049
[VPlan] Remove VPWidenPHIRecipe constructor without start value (NFC).
This was suggested as a separate cleanup in recent reviews.
2022-01-01 13:53:48 +00:00
Florian Hahn e2f1c4c706
[LV] Turn check for unexpected VF into assertion (NFC).
VF should always be non-zero in widenIntOrFpInduction. Turn check into
assertion.
2021-12-31 13:19:03 +00:00
Alexey Bataev e0efedd2c3 [SLP][NFC]Fix non-determinism in reordering, NFC.
Need to clear CurrentOrder order mask if it is determined that
extractelements form identity order and need to use a vector-like
construct when iterating over ordered entries in the reorderTopToBottom
function.
2021-12-30 13:10:25 -08:00
Florian Hahn ba9016a030
[LV] Replace redundant tail-fold check with assert (NFC).
The code path can only be reached when folding the tail, so turn the
check into an assertion.
2021-12-29 19:00:41 +01:00
Florian Hahn 9d297c7894
[VPlan] Add prepareToExecute to set up live-ins (NFC).
This patch adds a new prepareToExecute helper to set up live-ins, so
VPTransformState doesn't need to hold values like TripCount.

This also requires making the trip count operand for ActiveLaneMask
explicit in VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116320
2021-12-28 17:49:47 +01:00
Sanjay Patel 0edf99950e [Analysis] allow caller to choose signed/unsigned when computing constant range
We should not lose analysis precision if an 'add' has both no-wrap
flags (nsw and nuw) compared to just one or the other.

This patch is modeled on a similar construct that was added with
D59386.

I don't think it is possible to expose a problem with an unsigned
compare because of the way this was coded (nuw is handled first).

InstCombine has an assert that fires with the example from:
https://github.com/llvm/llvm-project/issues/52884
...because it was expecting InstSimplify to handle this kind of
pattern with an smax.

Fixes #52884

Differential Revision: https://reviews.llvm.org/D116322
2021-12-28 09:45:37 -05:00
Florian Hahn c2275278c6
[VPlan] Add abstract base class for header phi recipes (NFC).
Not all header phis widen the phi, e.g. like the new
VPCanonicalIVPHIRecipe in D113223. To let those recipes also inherit
from a phi-like base class, add a more generic VPHeaderPHIRecipe
abstract base class.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D116304
2021-12-28 15:37:47 +01:00
Florian Hahn c66286ed59
[LV] Use specific first-order recurrence recipe as arg type (NFC).
Required for further refactoring in D116304.
2021-12-28 10:58:21 +01:00
Florian Hahn 2e630eabd3
[LV] Sink BTC creation to actual use (NFC).
Suggested separately in D116123.
2021-12-27 11:25:46 +01:00
Florian Hahn 511726c64d
[LV] Move getStepVector out of ILV (NFC).
First step to split up induction handling and move it outside ILV.
Used in D116123 and following.
2021-12-26 21:17:26 +01:00
Kazu Hirata 76f0f1cc5c Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-12-24 21:43:06 -08:00
Florian Hahn ede7c2438f
[VPlan] Create header & latch blocks for skeleton up front (NFC).
By creating the header and latch blocks up front and adding blocks and
recipes in between those 2 blocks we ensure that the entry and exits of
the plan remain valid throughout construction.

In order to avoid test changes and keep printing of the plans the same,
we use the new header block instead of creating a new block on the first
iteration of the loop traversing the original loop.

We also fold the latch into its predecessor.

This is a follow up to a post-commit suggestion in D114586.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115793
2021-12-22 12:44:25 +00:00
Florian Hahn c83ef407df
[LV] Adjust comment to say the induction is created in header.
Follow-up suggested post-commit for 1a54889f48.
2021-12-22 11:56:40 +00:00
Florian Hahn 1a54889f48
[LV] Ensure WidenCanonicalIVRecipe is always created in header (NFC).
The VPWidenCanonicalIVRecipe must always be created in the phi section
of the header block. Use that block as insert point.
2021-12-21 15:14:48 +00:00
Paul Walker 7c68ed8892 [SVE] Reintroduce -scalable-vectorization=preferred as an alias to "on".
Some buildbots still rely on the experimental flag, so let's keep
it until everything has been migrated to the new "on by default"
state.
2021-12-21 12:54:04 +00:00
Kazu Hirata 500c4b68dc [llvm] Construct SmallVector with iterator ranges (NFC) 2021-12-20 23:43:24 -08:00
Sander de Smalen b1ff20fd35 [LV] Enable scalable vectorization by default for SVE cores.
The availability of SVE should be sufficient to enable scalable
auto-vectorization.

This patch adds a new TTI interface to query the target what style of
vectorization it wants when scalable vectors are available. For other
targets than AArch64, this currently defaults to 'FixedWidthOnly'.

Differential Revision: https://reviews.llvm.org/D115651
2021-12-20 16:23:29 +00:00
Alexey Bataev ab9078f3d3 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 07:21:20 -08:00
Alexey Bataev 4459a11f4d Revert "[SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy."
This reverts commit fcaf290d02 to fix test
mismatch reported in https://lab.llvm.org/buildbot#builders/117/builds/3531
2021-12-20 07:21:18 -08:00
Florian Hahn 5b362e4c7f
[VPlan] Add Debugloc to VPInstruction.
Upcoming changes require attaching debug locations to VPInstructions,
e.g. adding induction increment recipes in D113223.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D115123
2021-12-20 15:10:41 +00:00
Alexey Bataev fcaf290d02 [SLP]Fix PR52756: SLPVectorizer crashes with assertion VecTy == FinalVecTy.
Need to check for the number of the unique non-constant values since the
unique values may include several constants.

Differential Revision: https://reviews.llvm.org/D115939
2021-12-20 05:15:01 -08:00