The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.
This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.
Fixes#50778
Differential Revision: https://reviews.llvm.org/D111968
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)
This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)
Differential Revision: https://reviews.llvm.org/D133580
The compiler does not reorder the gather nodes with reused scalars, just
does it for opernads of the user nodes. This currently does not affect
the compiler but breaks internal logic of the SLP graph. In future, it
is supposed to actually use all nodes instead of just list of operands
and this will affect the vectorization result.
Also, did some early check to avoid complex logic in cost estimation
analysis, should improve compiler time a bit.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.
At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.
This can lead to widened phis with incorrect start values being created
in the epilogue vector body.
This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization
Fixes#57712.
My understanding is that NoImplicitFloat, despite it's name, is
supposed to disable all vectors not just float vectors.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134084
Added the mask and the analysis of the buildvector sequence in the
isUndefVector function, improves codegen and cost estimation.
Metric: SLP.NumVectorInstructions
Program SLP.NumVectorInstructions
results results0 diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00 27360.00 -0.0%
Metric: size..text
Program size..text
results results0 diff
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 805299.00 806035.00 0.1%
526.blender_r - some extra code is vectorized.
508.namd_r - some extra code is optimized out.
Differential Revision: https://reviews.llvm.org/D133891
Adding the pre-header to CSEBlocks ensures instructions are CSE'd even
after hoisting.
This was original discovered by @atrick a while ago.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D133649
If the reused scalars are clustered, i.e. each part of the reused mask
contains all elements of the original scalars exactly once, we can
reorder those clusters to improve the whole ordering of of the clustered
vectors.
Differential Revision: https://reviews.llvm.org/D133524
Use VPExpandSCEVRecipe to expand the step of pointer inductions. This
cleanup addresses a corresponding FIXME.
It should be NFC, as steps for pointer induction must be constants,
which makes expansion trivial.
This extends the safe-divisor widening scheme recently added for scalable vectors to handle fixed vectors as well.
Differential Revision: https://reviews.llvm.org/D132591
VPReplicateRecipe::isUniform actually means uniform-per-parts, hence a
scalar instruction is generated per-part.
This is a potential alternative D132892. For now the current patch only
catches cases where the address is trivially invariant (defined outside
VPlan), while D132892 catches any address that is considered invariant
by SCEV AFAICT.
It should be possible to hoist fully invariant recipes feeding loads out
of the vector loop region as well, but in practice LICM should do that
already.
This version of the patch artificially limits this to loads to make it
easier to compare, but this restriction should be easily liftable.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D133019
Need either follow the original order of the operands for bool logical
ops, or emit freeze instruction to avoid poison propagation.
Differential Revision: https://reviews.llvm.org/D126877
This patch moves the cost-based decision whether to use an intrinsic or
library call to the point where the recipe is created. This untangles
code-gen from the cost model and also avoids doing some extra work as
the information is already computed at construction.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D132585
This is NOT nfc. Specifically, the following behavior changes:
* Pointers are now allowed. Both uniform, and constants.
* FP uniform non-constants can now be recognized.
* FP undefs are no longer considered constant. This matches int behavior which we had tests for. FP behavior was untested. Its not clear to me int behavior is reasonable, but it's what tests seem to expect, so go with minimum impact for now.
The pointer operands for the ScatterVectorize node may contain
non-instruction values and they are not checked for "already being
vectorized". Need to check that such pointers are already vectorized and
gather them instead of trying to build vectorize node to avoid compiler
crash.
Differential Revision: https://reviews.llvm.org/D132949
Removed EnableFP parameter in getOperandInfo function since it is not
needed, the operands kinds also controlled by the operation code, which
allows to remove extra check for the type of the operands. Also, added
analysis for uniform constant float values.
This change currently does not trigger any changes in the code since TTI
does not do analysis for constant floats, so it can be considered NFC.
Tested with llvm-test-suite + SPEC2017, no changes.
Differential Revision: https://reviews.llvm.org/D132886
I keep finding myself needing to rule this out as a possible source of scalarization, so add debug output like we have for other instructions we decide to scalarize.
This patch changes order of searching for reductions vs other vectorization possibilities.
The idea is if we do not match a reduction it won't be harmful for further attempts to
find vectorizable operations on a vector build sequences. But doing it in the opposite
order we have good chance to ruin opportunity to match a reduction later.
We also don't want to try vectorizing binary operations too early as 2-way vectorization
may effectively prohibit wider ones leading to producing less effective code.
Differential Revision: https://reviews.llvm.org/D132590
When estimating the cost of the in-tree vectorized scalars in
buildvector sequences, need to take into account the vectorized
insertelement instruction. The top of the buildvector seuences is the
topmost vectorized insertelement instruction, because it will have
> than 1 use after the vectorization.
For the affected test case improves througput from 21 to 16 (per
llvm-mca).
Differential Revision: https://reviews.llvm.org/D132740
This addresses a suggestion to simplify the check from D131989. This
also makes it easier to ensure that VPHeaderPHIRecipe::classof checks
for all header phi ids.
Add verification that VPHeaderPHIRecipes are only in header VPBBs. Also
adds missing checks for VPPointerInductionRecipe to
VPHeaderPHIRecipe::classof.
Split off from D119661.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D131989
I'd extracted isUniform, and Florian moved isUniformAfterVectorization out of VPlan at basically the same time. Let's go ahead and merge them.
For the VPTransformState::get path, a VPValue without a def (which corresponds to an external IR value outside of VPLan) is explicitly handled above the uniform check. On the scalarizeInstruction path, I'm less sure why the change isn't visible, but test cases which would seem likely to hit it were already being handled as uniform through some other mechanism. It would be correct to consider values defined outside of vplan uniform here.