3351 Commits

Author SHA1 Message Date
Kazu Hirata 9eca5ed790 [llvm] Use std::enable_if_t (NFC) 2022-09-03 11:17:44 -07:00
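For context on the commit above: `std::enable_if_t<C, T>` is the C++14 alias for `typename std::enable_if<C, T>::type`. The sketch below shows both spellings side by side. The names `halveOld`, `halveNew` and `checkHalve` are hypothetical and are not taken from the LLVM tree.

```cpp
#include <cassert>
#include <type_traits>

// Pre-C++14 spelling: requires `typename` and `::type`.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, T>::type halveOld(T V) {
  return V / 2;
}

// C++14 spelling using the alias template: same constraint, less noise.
template <typename T>
std::enable_if_t<std::is_integral<T>::value, T> halveNew(T V) {
  return V / 2;
}

// Usage check: both spellings accept integral types and behave identically.
inline void checkHalve() {
  assert(halveOld(10) == 5);
  assert(halveNew(10) == 5);
}
```

Because the two forms are equivalent, a change of this kind is expected to be NFC, as the commit title states.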
Alexey Bataev 982d9ef1c1 [SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison.
Need to either follow the original order of the operands for bool logical
ops, or emit a freeze instruction to avoid poison propagation.

Differential Revision: https://reviews.llvm.org/D126877
2022-09-01 05:34:45 -07:00
Florian Hahn fc444ddc77 [VPlan] Add field to track if intrinsic should be used for call. (NFC)
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D132585
2022-09-01 13:14:40 +01:00
Alexey Bataev 588115c117 [SLP][NFC]Add a check for SelectInst to match description, NFC. 2022-08-31 13:04:21 -07:00
Alexey Bataev d8d9ee10bb [SLP][NFC]Fix comment and make function following naming standard, NFC. 2022-08-31 12:37:55 -07:00
Philip Reames 8524622bdc [SLP] Simplify getOperandInfo implementation and be consistent
This is NOT nfc.  Specifically, the following behavior changes:
* Pointers are now allowed.  Both uniform, and constants.
* FP uniform non-constants can now be recognized.
* FP undefs are no longer considered constant.  This matches int behavior which we had tests for.  FP behavior was untested.  It's not clear to me that the int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.
2022-08-31 12:24:05 -07:00
Fangrui Song 13f0795425 [SLPVectorizer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off build 2022-08-30 23:01:22 -07:00
Alexey Bataev ec06df9459 [SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.

Differential Revision: https://reviews.llvm.org/D132949
2022-08-30 12:30:14 -07:00
Alexey Bataev afbf5466ba [SLP]Improve operands kind analysis for constants.
Removed the EnableFP parameter of the getOperandInfo function since it is not
needed: the operand kinds are also controlled by the operation code, which
allows removing the extra check for the type of the operands. Also added
analysis for uniform constant float values.

This change currently does not trigger any changes in the code since TTI
does not do analysis for constant floats, so it can be considered NFC.
Tested with llvm-test-suite + SPEC2017, no changes.

Differential Revision: https://reviews.llvm.org/D132886
2022-08-30 06:35:39 -07:00
Philip Reames 8936d86469 [LV] Add debug output for force scalar tracing [nfc]
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
2022-08-29 15:17:51 -07:00
Valery N Dmitriev 329b972d41 [SLP] Try to match reductions before trying to vectorize a vector build sequence.
This patch changes the order of searching for reductions vs. other vectorization possibilities.
The idea is that if we do not match a reduction, it won't be harmful for further attempts to
find vectorizable operations on vector build sequences. But doing it in the opposite
order, we have a good chance of ruining the opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early, as 2-way vectorization
may effectively prohibit wider ones, leading to less efficient code.

Differential Revision: https://reviews.llvm.org/D132590
2022-08-29 13:32:14 -07:00
Philip Reames 033a97a8f3 [LV] Minor code restructure of isUniformAfterVectorization [nfc]
Mostly just to make a future patch easier to review.
2022-08-29 12:48:27 -07:00
Alexey Bataev beacf9bd9e [SLP]Fix PR57322: vectorize constant float stores.
Stores for constant floats must be vectorized, improve analysis in SLP
vectorizer for stores.

Differential Revision: https://reviews.llvm.org/D132750
2022-08-29 11:02:53 -07:00
Alexey Bataev e6345bf644 [SLP]Improve lookup of the buildvector top insertelement instruction.
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, we need to take into account the vectorized
insertelement instruction. The top of the buildvector sequence is the
topmost vectorized insertelement instruction, because it will have
more than 1 use after the vectorization.

For the affected test case, this improves throughput from 21 to 16 (per
llvm-mca).

Differential Revision: https://reviews.llvm.org/D132740
2022-08-29 08:19:52 -07:00
Florian Hahn c78696813f [LV] Remove unneeded getVectorIntrinsicIDForCall call (NFC).
Suggested as independent fix during the review of D132585.
2022-08-29 10:19:47 +01:00
Florian Hahn af98b875e8 [VPlan] Use range check in VPHeaderPHIRecipe::classof (NFC).
This addresses a suggestion to simplify the check from D131989. This
also makes it easier to ensure that VPHeaderPHIRecipe::classof checks
for all header phi ids.
2022-08-28 15:54:12 +01:00
Kazu Hirata 56ea4f9bd3 [Transforms] Qualify auto in range-based for loops (NFC)
Identified with readability-qualified-auto.
2022-08-27 21:21:02 -07:00
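As an illustration of what the `readability-qualified-auto` check asks for, the sketch below qualifies a deduced pointer type in a range-based for loop. The function `sumPointees` is a hypothetical example and is not code from the commit.

```cpp
#include <cassert>
#include <vector>

// Sum the pointees of a list of pointers.
inline int sumPointees(const std::vector<int *> &Ptrs) {
  int Sum = 0;
  // Plain `auto P` would also compile, but hides that P is a pointer.
  // `const auto *P` spells out both pointer-ness and that the pointee is
  // only read here.
  for (const auto *P : Ptrs) {
    assert(P && "null entries are not expected in this sketch");
    Sum += *P;
  }
  return Sum;
}
```

The deduced type is unchanged apart from the added `const` on the pointee, which is why such edits are normally NFC.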
Florian Hahn 7743badafa [VPlan] Verify that header only contains header phi recipes.
Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also
adds missing checks for VPPointerInductionRecipe to
VPHeaderPHIRecipe::classof.

Split off from D119661.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D131989
2022-08-27 22:06:12 +01:00
Kazu Hirata 21de2888a4 Use llvm::is_contained (NFC) 2022-08-27 09:53:11 -07:00
Kazu Hirata a33ef8f2b7 Use llvm::all_equal (NFC) 2022-08-27 09:53:10 -07:00
Philip Reames 3dcec5e29f [LV] Consistently use vputils::isUniformAfterVectorization [mostly nfc]
I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them.

For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPlan) is explicitly handled above the uniform check.  On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism.  It would be correct to consider values defined outside of VPlan uniform here.
2022-08-26 11:09:17 -07:00
Florian Hahn 4e5c44964a [VPlan] Move isUniformAfterVectorization from VPlan to vputils (NFC).
This allows re-using the utility without a VPlan object. The helper also
doesn't access any data from VPlan.
2022-08-26 18:26:33 +01:00
Philip Reames 2d5f025779 [LV] Extract utility for checking if VPValue is uniform [nfc] 2022-08-26 09:56:13 -07:00
Daniil Fukalov 9c710ebbdb [TTI] NFC: Reduce InstructionCost::getValue() usage...
in order to propagate `InstructionCost` value upper.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D103406
2022-08-26 16:37:32 +03:00
Valery N Dmitriev a4c8fb9d1f [SLP][NFC] Refactor SLPVectorizerPass::vectorizeRootInstruction method.
The goal is to separate collecting items for post-processing
from processing them. Post-processing is also outlined as a
dedicated method.

Differential Revision: https://reviews.llvm.org/D132603
2022-08-24 17:07:53 -07:00
Philip Reames 23245a914b [LV] Simplify code given isPredicatedInst doesn't depend on VF any more [nfc] 2022-08-24 11:42:10 -07:00
Philip Reames 3ab00cfca9 [LV] Adjust code added in f79214d1 for 531dd3634 [nfc]
When rebasing the review which became f79214d1, I forgot to adjust for the changed semantics introduced by 531dd3634.  Functionally, this had no impact, but semantically it resulted in an incorrect result for isPredicatedInst.  I noticed this while doing a follow up change.
2022-08-24 10:38:17 -07:00
Philip Reames f79214d1e1 [LV] Support predicated div/rem operations via safe-divisor select idiom
This patch adds support for vectorizing conditionally executed div/rem operations via a variant of widening. The existing support for predicated divrem in the vectorizer requires scalarization which we can't do for scalable vectors.

The basic idea is that we can always divide (take remainder) by 1 without executing UB. As such, we can use the active lane mask to conditionally select either the actual divisor for active lanes, or a constant one for inactive lanes. We already account for the cost of the active lane mask, so the only additional cost is a splat of one and the vector select. This is one of several possible approaches to this problem; see the review thread for discussion on some of the others.  This one was chosen mostly because it was straightforward, and none of the others seemed obviously better.

I enabled the new code only for scalable vectors. We could also legally enable it for fixed vectors as well, but I haven't thought through the cost tradeoffs between widening and scalarization enough to know if that's profitable. This will be explored in future patches.

Differential Revision: https://reviews.llvm.org/D130164
2022-08-24 10:07:59 -07:00
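The safe-divisor select idiom described in the commit above can be sketched in plain C++ as follows. This is only an illustration on a fixed-width array, assuming a hypothetical 4-lane `Vec` type and a hypothetical `maskedDivide` helper; the actual transform operates on LLVM IR vectors inside the loop vectorizer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane "vector" types used only for this illustration.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Divide Num by Den lane by lane. Inactive lanes divide by a constant 1, so
// the division is executed unconditionally without risking a divide by zero.
inline Vec maskedDivide(const Vec &Num, const Vec &Den, const Mask &Active) {
  Vec Result{};
  for (std::size_t I = 0; I < kLanes; ++I) {
    // The "select": real divisor for active lanes, 1 for inactive lanes.
    const int SafeDivisor = Active[I] ? Den[I] : 1;
    assert(SafeDivisor != 0 && "active lanes must have a nonzero divisor");
    Result[I] = Num[I] / SafeDivisor;
  }
  return Result;
}
```

Inactive lanes simply yield the numerator, and their results are expected to be ignored by whatever consumes the masked value.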
Florian Hahn 689895f432 [VPlan] Remove unneeded `struct` prefix for VPTransformState args (NFC). 2022-08-24 17:58:08 +01:00
David Green 8d830f8d68 [LV] Replace fixed-order cost model with a SK_Splice shuffle
The existing cost model for fixed-order recurrences models the phi as an
extract shuffle of a v1 vector. The shuffle produced should be a splice,
as it takes two vector inputs and extracts from a subset of the
lanes. On certain architectures the existing cost model can drastically
under-estimate the correct cost for the shuffle, so this changes it to a
SK_Splice and passes a correct Mask through to the getShuffleCost call.

I believe this might be the first use of a SK_Splice shuffle cost model
outside of scalable vectors, and some targets may require additions to
the cost-model to correctly account for them. In tree targets appear to
all have been updated where needed.

Differential Revision: https://reviews.llvm.org/D132308
2022-08-24 13:00:32 +01:00
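To make the splice shape concrete, the sketch below models a two-input shuffle over fixed-width arrays and builds the mask that takes the last lane of the previous vector followed by the first N-1 lanes of the current one (for N = 4, the mask 3, 4, 5, 6). The helpers `shuffleLanes` and `recurrenceSplice` are hypothetical names for this illustration and are not LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: shuffle two N-lane vectors with a mask that indexes
// into their concatenation (indices 0..N-1 pick from A, N..2N-1 from B).
template <std::size_t N>
std::array<int, N> shuffleLanes(const std::array<int, N> &A,
                                const std::array<int, N> &B,
                                const std::array<std::size_t, N> &Mask) {
  std::array<int, N> Out{};
  for (std::size_t I = 0; I < N; ++I) {
    assert(Mask[I] < 2 * N && "mask index out of range");
    Out[I] = Mask[I] < N ? A[Mask[I]] : B[Mask[I] - N];
  }
  return Out;
}

// Splice for a recurrence: last lane of Prev followed by the first N-1
// lanes of Cur, i.e. the mask <N-1, N, ..., 2N-2>.
template <std::size_t N>
std::array<int, N> recurrenceSplice(const std::array<int, N> &Prev,
                                    const std::array<int, N> &Cur) {
  std::array<std::size_t, N> Mask{};
  for (std::size_t I = 0; I < N; ++I)
    Mask[I] = N - 1 + I;
  return shuffleLanes(Prev, Cur, Mask);
}
```

The mask reads from both inputs, which is what distinguishes a splice from a single-source extract.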
Philip Reames 49547b2241 [slp] Pull out a getOperandInfo variant helper [nfc] 2022-08-23 13:46:05 -07:00
Florian Hahn ff34432649 [LoopUtils] Remove unused Loop arg from addDiffRuntimeChecks (NFC).
The argument is no longer used, remove it.
2022-08-23 10:15:28 +01:00
Philip Reames 27d3321c4f [TTI] Use OperandValueInfo in getMemoryOpCost client api [nfc]
This removes the last use of OperandValueKind from the client side API, and (once this is fully plumbed through TTI implementation) allows use of the same properties in store costing as arithmetic costing.
2022-08-22 11:26:31 -07:00
Philip Reames 274f86e7a6 [TTI] Remove OperandValueKind/Properties from getArithmeticInstrCost interface [nfc]
This completes the client side transition to the OperandValueInfo version of this routine.  Backend TTI implementations still use the prior versions for now.
2022-08-22 11:06:32 -07:00
Philip Reames c42a5f1cc2 [TTI] Migrate getOperandInfo to OperandValueInfo [nfc]
This is part of merging OperandValueKind and OperandValueProperties.
2022-08-22 10:19:02 -07:00
Philip Reames 5cd427106d [TTI] Start process of merging OperandValueKind and OperandValueProperties [nfc]
OperandValueKind and OperandValueProperties both provide facts about the operands of an instruction for purposes of cost modeling.  We've discussed merging them several times; before I plumb through more flags, let's go ahead and do so.

This change only adds the client side interface for getArithmeticInstrCost and makes a couple of minor changes in client code to prove that it works.  Target TTI implementations still use the split flags.  I'm deliberately splitting what could be one big change into a series of smaller ones so that I can lean on the compiler to catch errors along the way.
2022-08-22 09:48:15 -07:00
Simon Pilgrim 5263155d5b [CostModel] Add CostKind argument to getShuffleCost
Defaults to TCK_RecipThroughput, as most explicit calls were assuming TCK_RecipThroughput (vectorizers) or were just doing a before-vs-after comparison (vectorcombiner). Calls via getInstructionCost were just dropping the CostKind, so again there should be no change at this time (as getShuffleCost and its expansions don't use CostKind yet), but it will make it easier for us to better account for size/latency shuffle costs in inline/unroll passes in the future.

Differential Revision: https://reviews.llvm.org/D132287
2022-08-21 10:54:51 +01:00
Kazu Hirata 8b1b0d1d81 Revert "Use std::is_same_v instead of std::is_same (NFC)"
This reverts commit c5da37e42d.

This patch seems to break builds with some versions of MSVC.
2022-08-20 23:00:39 -07:00
Kazu Hirata c5da37e42d Use std::is_same_v instead of std::is_same (NFC) 2022-08-20 22:36:26 -07:00
Kazu Hirata 258531b7ac Remove redundant initialization of Optional (NFC) 2022-08-20 21:18:28 -07:00
Philip Reames b0a2c48e9f [tti] Consolidate getOperandInfo without OperandValueProperties copies [nfc] 2022-08-19 16:22:22 -07:00
Alexey Bataev c167028684 [SLP]Delay vectorization of postponable values for instructions with no users.
The SLP vectorizer tries to find reductions starting from the operands of
instructions with no users, void returns, etc. But such operands can be
postponable instructions, like Cmp, InsertElement or InsertValue. Such
operands must still be postponed; the vectorizer should not try to vectorize
them immediately.

Differential Revision: https://reviews.llvm.org/D131965
2022-08-19 08:39:16 -07:00
Alexey Bataev 0e7ed32c71 [SLP]Cost for a constant buildvector.
In many cases constant buildvector results in a vector load from a
constant/data pool. Need to consider this cost too.

Differential Revision: https://reviews.llvm.org/D126885
2022-08-19 08:02:42 -07:00
Alexey Bataev d53e245951 [COST][NFC]Introduce OperandValueKind in getMemoryOpCost, NFC.
Added OperandValueKind OpdInfo parameter to getMemoryOpCost functions to
better estimate cost with immediate values.

Part of D126885.
2022-08-19 07:33:00 -07:00
Florian Hahn b8709a9d03 [LV] Support fixed order recurrences.
If the incoming previous value of a fixed-order recurrence is a phi in
the header, go through incoming values from the latch until we find a
non-phi value. Use this as the new Previous; all uses in the header
will be dominated by the original phi, but need to be moved after
the non-phi previous value.

At the moment, fixed-order recurrences are modeled as a chain of
first-order recurrences.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D119661
2022-08-18 19:15:52 +01:00
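A scalar loop with the shape this commit targets might look like the sketch below: the value used in each iteration was produced two iterations earlier, which can be read as a chain of two first-order recurrences (`Prev2` follows `Prev1`, and `Prev1` follows the loaded value). The function `twoBack` and its parameters are hypothetical, chosen only to illustrate the pattern.

```cpp
#include <cassert>
#include <vector>

// Emit, for each input element, the value seen two iterations earlier.
// Prev1 and Prev2 play the role of header phis: Prev2's incoming value from
// the previous iteration is Prev1 (itself a phi), and Prev1's is X (non-phi).
inline std::vector<int> twoBack(const std::vector<int> &In, int Init1,
                                int Init2) {
  std::vector<int> Out;
  Out.reserve(In.size());
  int Prev1 = Init1; // value from one iteration ago
  int Prev2 = Init2; // value from two iterations ago
  for (int X : In) {
    Out.push_back(Prev2);
    Prev2 = Prev1; // second link of the chain
    Prev1 = X;     // first link of the chain
  }
  return Out;
}

// Usage check: the first two outputs come from the initial values, the rest
// from the input shifted by two positions.
inline void checkTwoBack() {
  const std::vector<int> Out = twoBack({10, 20, 30, 40}, 1, 2);
  assert((Out == std::vector<int>{2, 1, 10, 20}));
}
```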
Philip Reames 1436adae2c [LV-L] Add const and move method body out of line [nfc] 2022-08-18 11:10:19 -07:00
Philip Reames c064d3f139 [LV] Use early continue to simplify code [nfc] 2022-08-18 10:31:55 -07:00
Philip Reames 531dd3634d [LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops)
This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is *not* an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predication. Specifically, for the case of a uniform memory operation we were previously considering it *not* to be a predicated instruction, but *were* considering it to be scalar with predication.

As can be seen with the test changes, this caused uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowered via naive scalarization or, if scalarization is infeasible (i.e. scalable vectors), aborted entirely. I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalarizable. I haven't found a concrete example here, but I am suspicious.

Differential Revision: https://reviews.llvm.org/D131093
2022-08-18 07:14:04 -07:00
Simon Pilgrim 594c5b1a42 [SLP] Update TODO comment about shuffle mask decoding
This is handled in ShuffleVectorInst/getShuffleCost - getInstructionThroughput is (slowly) being removed.
2022-08-17 11:41:46 +01:00
Alexey Bataev 65c7cecb13 [SLP]Fix PR51320: Try to vectorize single store operands.
Currently, we try to vectorize values feeding into stores only if the
slp-vectorize-hor-store option is provided. We can safely enable
vectorization of the value operand of a single store in the basic block
if the operand value is used only in the store.
It should enable extra vectorization and should not increase compile
time significantly.
Fixes https://github.com/llvm/llvm-project/issues/51320

Differential Revision: https://reviews.llvm.org/D131894
2022-08-16 07:25:21 -07:00