Commit Graph

3362 Commits

Author SHA1 Message Date
Florian Hahn ac80b0e84f
[LV] Mark Instr as const in scalarizeInstruction. (NFC).
This is to reduce the diff in follow-up changes.
2022-09-13 09:10:02 +01:00
Florian Hahn 3fd1cc2574
[SLP] Add Preheader to CSE blocks after hoisting CSE-able instrs.
Adding the pre-header to CSEBlocks ensures instructions are CSE'd even
after hoisting.

This was original discovered by @atrick a while ago.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D133649
2022-09-12 15:53:31 +01:00
Alexey Bataev dfe1e9dd79 [SLP]Improve reordering of clustered reused scalars.
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of of the clustered
vectors.

Differential Revision: https://reviews.llvm.org/D133524
2022-09-12 06:52:25 -07:00
Florian Hahn 69d9bb2aad
[VPlan] Check recipe uses instead of type of underlying instr (NFC).
Suggested by @Ayal post-commit, to reduce the dependence on the
underlying instruction in favor of information available directly for
the recipe.
2022-09-11 12:24:44 +01:00
Florian Hahn da734473fa
[LV] Remove now dead variable after 2a78890b7b (NFC). 2022-09-09 20:25:55 +01:00
Florian Hahn 2a78890b7b
[VPlan] Move SCEV expansion for pointer induction to VPExpandSCEV (NFC).
Use VPExpandSCEVRecipe to expand the step of pointer inductions. This
cleanup addresses a corresponding FIXME.

It should be NFC, as steps for pointer induction must be constants,
which makes expansion trivial.
2022-09-09 19:20:13 +01:00
Philip Reames a33d98e20a [LV] Pull out common expression [nfc] 2022-09-09 07:31:46 -07:00
Philip Reames edb26268ce [VPlan] Only generate single instr for stores uniform across all parts.
Extend the approach taken by D133019 to store instructions.

Differential Revision: https://reviews.llvm.org/D133497
2022-09-09 07:15:12 -07:00
Philip Reames 4c4c0d2c06 [LV] Use safe-divisor lowering for fixed vectors if profitable
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.

Differential Revision: https://reviews.llvm.org/D132591
2022-09-08 09:15:54 -07:00
Florian Hahn 422cf99161
[VPlan] Only generate single instr for loads uniform across all parts.
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.

This is a potential alternative D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.

It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.

This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133019
2022-09-08 14:27:58 +01:00
Florian Hahn 408ebe5e3a
[VPlan] Move VPWidenCallRecipe to VPlanRecipes.cpp (NFC).
Depends on D132585.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132586
2022-09-05 10:48:29 +01:00
Kazu Hirata 9eca5ed790 [llvm] Use std::enable_if_t (NFC) 2022-09-03 11:17:44 -07:00
Alexey Bataev 982d9ef1c1 [SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.
Need either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.

Differential Revision: https://reviews.llvm.org/D126877
2022-09-01 05:34:45 -07:00
Florian Hahn fc444ddc77
[VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Alexey Bataev 588115c117 [SLP][NFC]Add a check for SelectInst to match description, NFC. 2022-08-31 13:04:21 -07:00
Alexey Bataev d8d9ee10bb [SLP][NFC]Fix comment and make function following naming standard, NFC. 2022-08-31 12:37:55 -07:00
Philip Reames 8524622bdc [SLP] Simplify getOperandInfo implementation and be consistent
This is NOT nfc.  Specifically, the following behavior changes:
* Pointers are now allowed.  Both uniform, and constants.
* FP uniform non-constants can now be recognized.
* FP undefs are no longer considered constant.  This matches int behavior which we had tests for.  FP behavior was untested.  Its not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.
2022-08-31 12:24:05 -07:00
Fangrui Song 13f0795425 [SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build 2022-08-30 23:01:22 -07:00
Alexey Bataev ec06df9459 [SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.

Differential Revision: https://reviews.llvm.org/D132949
2022-08-30 12:30:14 -07:00
Alexey Bataev afbf5466ba [SLP]Improve operands kind analaysis for constants.
Removed EnableFP parameter in getOperandInfo function since it is not
needed, the operands kinds also controlled by the operation code, which
allows to remove extra check for the type of the operands. Also, added
analysis for uniform constant float values.

This change currently does not trigger any changes in the code since TTI
does not do analysis for constant floats, so it can be considered NFC.
Tested with llvm-test-suite + SPEC2017, no changes.

Differential Revision: https://reviews.llvm.org/D132886
2022-08-30 06:35:39 -07:00
Philip Reames 8936d86469 [LV] Add debug output for force scalar tracing [nfc]
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
2022-08-29 15:17:51 -07:00
Valery N Dmitriev 329b972d41 [SLP] Try to match reductions before trying to vectorize a vector build sequence.
This patch changes order of searching for reductions vs other vectorization possibilities.
The idea is if we do not match a reduction it won't be harmful for further attempts to
find vectorizable operations on a vector build sequences. But doing it in the opposite
order we have good chance to ruin opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early as 2-way vectorization
may effectively prohibit wider ones leading to producing less effective code.

Differential Revision: https://reviews.llvm.org/D132590
2022-08-29 13:32:14 -07:00
Philip Reames 033a97a8f3 [LV] Minor code restructure of isUniformAfterVectorization [nfc]
Mostly just to make a future patch easier to review.
2022-08-29 12:48:27 -07:00
Alexey Bataev beacf9bd9e [SLP]Fix PR57322: vectorize constant float stores.
Stores for constant floats must be vectorized, improve analysis in SLP
vectorizer for stores.

Differential Revision: https://reviews.llvm.org/D132750
2022-08-29 11:02:53 -07:00
Alexey Bataev e6345bf644 [SLP]Improve lookup of the buildvector top insertelement instruction.
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, need to take into account the vectorized
insertelement instruction. The top of the buildvector seuences is the
topmost vectorized insertelement instruction, because it will have
> than 1 use after the vectorization.

For the affected test case improves througput from 21 to 16 (per
llvm-mca).

Differential Revision: https://reviews.llvm.org/D132740
2022-08-29 08:19:52 -07:00
Florian Hahn c78696813f
[LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC).
Suggested as independent fix during the review of D132585.
2022-08-29 10:19:47 +01:00
Florian Hahn af98b875e8
[VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC).
This addresses a suggestion to simplify the check from D131989. This
also makes it easier to ensure that VPHeaderPHIRecipe::classof checks
for all header phi ids.
2022-08-28 15:54:12 +01:00
Kazu Hirata 56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
Florian Hahn 7743badafa
[VPlan] Verify that header only contains header phi recipes.
Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also
adds missing checks for VPPointerInductionRecipe to
VPHeaderPHIRecipe::classof.

Split off from D119661.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D131989
2022-08-27 22:06:12 +01:00
Kazu Hirata 21de2888a4 Use llvm::is_contained (NFC) 2022-08-27 09:53:11 -07:00
Kazu Hirata a33ef8f2b7 Use llvm::all_equal (NFC) 2022-08-27 09:53:10 -07:00
Philip Reames 3dcec5e29f [LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc]
I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them.

For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPLan) is explicitly handled above the uniform check.  On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism.  It would be correct to consider values defined outside of vplan uniform here.
2022-08-26 11:09:17 -07:00
Florian Hahn 4e5c44964a
[VPlan] Move isUniformAfterVectorization from VPlan to vputils (NFC).
This allows re-using the utility without a VPlan object. The helper also
doesn't access any data from VPlan.
2022-08-26 18:26:33 +01:00
Philip Reames 2d5f025779 [LV] Extract utility for checking if VPValue is uniform [nfc] 2022-08-26 09:56:13 -07:00
Daniil Fukalov 9c710ebbdb [TTI] NFC: Reduce InstructionCost::getValue() usage...
in order to propagate `InstructionCost` value upper.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D103406
2022-08-26 16:37:32 +03:00
Valery N Dmitriev a4c8fb9d1f [SLP][NFC] Refactor SLPVectorizerPass::vectorizeRootInstruction method.
The goal is to separate collecting items for post-processing
and processing them. Post processing also outlined as
dedicated method.

Differential Revision: https://reviews.llvm.org/D132603
2022-08-24 17:07:53 -07:00
Philip Reames 23245a914b [LV] Simplify code given isPredicatedInst doesn't dependent on VF any more [nfc] 2022-08-24 11:42:10 -07:00
Philip Reames 3ab00cfca9 [LV] Adjust code added in f79214d1 for 531dd3634 [nfc]
When rebasing the review which became f79214d1, I forgot to adjust for the changed semantics introduced by 531dd3634.  Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst.  I noticed this while doing a follow up change.
2022-08-24 10:38:17 -07:00
Philip Reames f79214d1e1 [LV] Support predicated div/rem operations via safe-divisor select idiom
This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.

The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditional select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others.  This one was chosen mostly because it was straight forward, and none of the others seemed oviously better.

I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches.

Differential Revision: https://reviews.llvm.org/D130164
2022-08-24 10:07:59 -07:00
Florian Hahn 689895f432
[VPlan] Remove unneeded `struct` prefix for VPTransformState args (NFC). 2022-08-24 17:58:08 +01:00
David Green 8d830f8d68 [LV] Replace fixed-order cost model with a SK_Splice shuffle
The existing cost model for fixed-order recurrences models the phi as an
extract shuffle of a v1 vector. The shuffle produced should be a splice,
as they take two vectors inputs are extracting from a subset of the
lanes. On certain architectures the existing cost model can drastically
under-estimate the correct cost for the shuffle, so this changes it to a
SK_Splice and passes a correct Mask through to the getShuffleCost call.

I believe this might be the first use of a SK_Splice shuffle cost model
outside of scalable vectors, and some targets may require additions to
the cost-model to correctly account for them. In tree targets appear to
all have been updated where needed.

Differential Revision: https://reviews.llvm.org/D132308
2022-08-24 13:00:32 +01:00
Philip Reames 49547b2241 [slp] Pull out a getOperandInfo variant helper [nfc] 2022-08-23 13:46:05 -07:00
Florian Hahn ff34432649
[LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC).
The argument is no longer used, remove it.
2022-08-23 10:15:28 +01:00
Philip Reames 27d3321c4f [TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc]
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allow use of the same properties in store costing as arithmetic costing.
2022-08-22 11:26:31 -07:00
Philip Reames 274f86e7a6 [TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc]
This completes the client side transition to the OperandValueInfo version of this routine.  Backend TTI implementations still use the prior versions for now.
2022-08-22 11:06:32 -07:00
Philip Reames c42a5f1cc2 [TTI] Migrate getOperandInfo to OperandVaueInfo [nfc]
This is part of merging OperandValueKind and OperandValueProperties.
2022-08-22 10:19:02 -07:00
Philip Reames 5cd427106d [TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc]
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling.  We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.

This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works.  Target TTI implementations still use the split flags.  I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
2022-08-22 09:48:15 -07:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput - as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or was just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet) - but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata 8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00