Commit Graph

3413 Commits

Author SHA1 Message Date
Alexey Bataev b8b740c834 [SLP][NFC]Remove unused variable, NFC. 2022-10-19 12:35:27 -07:00
Florian Hahn d72fcee8f4
[VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC).
@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.

The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133666
2022-10-19 13:20:30 +01:00
Alexey Bataev 087dadfd37 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
  cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 11:55:59 -07:00
Alexey Bataev 62267e8de0 Revert "[SLP]Generalize cost model."
This reverts commit f12fb91188 and
f5c747bfbe to fix detected non-initialized
var use.
2022-10-18 11:25:59 -07:00
Alexey Bataev f5c747bfbe [SLP][NFC]Fix a warning for ?: with enum/unsigned, NFC. 2022-10-18 10:08:05 -07:00
Alexey Bataev f12fb91188 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
  cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 08:49:32 -07:00
Alexey Bataev e79532d28c [SLP][NFC]Try to fix MSVC buildbots with a workaround, NFC. 2022-10-18 07:50:10 -07:00
Alexey Bataev 6a6fc4890d [SLP][NFC]Formatting of the getEntryCost function, NFC. 2022-10-18 07:18:26 -07:00
Sanjay Patel 8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
Kazu Hirata b2f41e9ac1 [Vectorize] Use std::conditional_t (NFC) 2022-10-15 14:52:25 -07:00
Craig Topper d3366efd43 [LV] Simplify register usage code and avoid double map lookups. NFC
Instead of checking whether a map entry exists to decide if we should
initialize it or add to it, we can rely on the map entry being constructed
and initialized to 0 before the addition happens.

For the std::max case, I've made a reference to the map entry to
avoid looking it up twice.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D135977
2022-10-14 11:55:48 -07:00
Alexey Bataev c787986cdd [SLP]Improve costs of vectorized loads/stores by analyzing GEPs.
When generating masked gathers nodes, SLP vectorizer accounts the cost
of the GEPs for loads as part of the scalar-vector transformation cost
estimation. But it does not do it for vectorized loads/stores, while it
may completely remove some of the GEPs completely. Because of this in
some cases masked gather operation can be much more profitable rather
than regular vectorization (masked-gather cost + vector GEP - scalar
loads + GEPs comparing to vectorized loads - scalar loads).
Added the analysis of the removed scalarGEPs for vectorized load/store nodes for better cost estimation.

Differential Revision: https://reviews.llvm.org/D135282
2022-10-13 07:20:41 -07:00
Alexey Bataev d71ad41080 [SLP]Fix insertpoint of the extractellements instructions to avoid reshuffle crash.
Need to set the insertpoint for extractelement to point to the first
instruction in the node to avoid possible crash during external uses
combine  process. Without it we may endup with the incorrect
transformation.

Differential Revision: https://reviews.llvm.org/D135591
2022-10-12 08:18:30 -07:00
Jordan Rupprecht cbae57c0e1 [NFC] Ignore unused var in no-asserts builds 2022-10-12 08:11:10 -07:00
Alexey Bataev 1be3428ea0 [SLP]Fix PR58177: Improve isUndefVector function to avoid extra freeze.
Freeze instruction in some cases makes codegen worse, so need to be very
careful when emitting it. Instead improve analysis in isUndefVector
function to generate mask of unused elements and use it in the analysis.

Differential Revision: https://reviews.llvm.org/D135382
2022-10-12 07:32:54 -07:00
Florian Hahn c1fe52bfa6
[VPlan] Remove dead recipes before sinking.
optimizeInductions may leave dead recipes which can prevent sinking.
Sinking on the other hand should not introduce new dead recipes, so
clean up dead recipes before sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133762
2022-10-12 12:49:42 +01:00
Sanjay Patel baab4aa1ba [VectorCombine] convert scalar fneg with insert/extract to vector fneg
insertelt DestVec, (fneg (extractelt SrcVec, Index)), Index --> shuffle DestVec, (fneg SrcVec), Mask

This is a specialized form of what could be a more general fold for a binop.
It's also possible that fneg is overlooked by SLP in this kind of
insert/extract pattern since it's a unary op.

This shows up in the motivating example from #issue 58139, but it won't solve
it (that probably requires some x86-specific backend changes). There are also
some small enhancements (see TODO comments) that can be done as follow-up
patches.

Differential Revision: https://reviews.llvm.org/D135278
2022-10-10 14:59:56 -04:00
Florian Hahn 73950f26f5
[LV] Replace check with assert for reduction resume values (NFC).
At this point, we need to have resume values for all inductions. If not,
this would result in silent mis-compiles.
2022-10-08 16:26:10 +01:00
Alexey Bataev 323ed2308a [SLP]Improve/fix CSE analysis of the blocks/instructions.
Added analysis for invariant extractelement instructions and improved
detection of the CSE blocks for generated extractelement instructions.

Differential Revision: https://reviews.llvm.org/D135279
2022-10-06 12:08:48 -07:00
Alexey Bataev ab9a81f736 [SLP]Try to emit canonical shuffle with undef operand.
In the canonical form of the shuffle the poison/undef operand is the
second operand, the patch tries to emit canonical form for partial
vectorization of the buildvector sequence.
Also, this patch starts emitting freeze instruction for shuffles with undef indices if the second shuffle operan is undef, not poison. It is an initial step to D93818, where undef mask element are treated as returning poison value.

Differential Revision: https://reviews.llvm.org/D134377
2022-10-04 08:16:07 -07:00
Florian Hahn 825e16969e
[LAA] Pass LoopAccessInfoManager instead of GetLAA function.
Use LoopAccessInfoManager directly instead of various GetLAA lambdas.

Depends on D134608.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134609
2022-10-04 11:51:25 +01:00
Florian Hahn 7c0ff64b0f
[LAA] Change to function analysis for new PM.
At the moment, LoopAccessAnalysis is a loop analysis for the new pass
manager. The issue with that is that LAI caches SCEV expressions and
modifications in a loop may impact SCEV expressions in other loops, but
we do not have a convenient way to invalidate LAI for other loops
withing a loop pipeline.

To avoid this issue, turn it into a function analysis which returns a
manager object that keeps track of the individual LAI objects per loop.

Fixes #50940.

Fixes #51669.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D134606
2022-10-01 15:44:27 +01:00
Simon Pilgrim 5849fcb635 Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions"
Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI."
Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds"

Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.
2022-09-30 11:22:48 +01:00
Florian Hahn 080a1e2bbb
[LV] Create createInductionResumeValue helper (NFC).
Factor out the logic to create induction resume values for a specific
induction. This will be used in D92132 to support widened IVs during
epilogue vectorization.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D134211
2022-09-29 11:13:01 +01:00
serge-sans-paille 16544cbe64 [iwyu] Move <cmath> out of llvm/Support/MathExtras.h
Interestingly, MathExtras.h doesn't use <cmath> declaration, so move it out of
that header and include it when needed.

No functional change intended, but there's no longer a transitive include
fromMathExtras.h to cmath.
2022-09-28 20:49:01 +02:00
Simon Pilgrim 926ccfef03 [SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds
Workaround for a chromium bug reported on D134605 - test case will be added later
2022-09-28 11:03:37 +01:00
Igor Kirillov 2d60d7ba1a [LoopVectorize][Fix] Crash when invariant store address is calculated inside loop
Fixes #57572

Generally LICM pass is responsible for sinking out code that calculates
invariant address inside loop as it only needed to be calculated once.
But in rare case it does not happen we will not be vectorizing the
loop.

Differential Revision: https://reviews.llvm.org/D133687
2022-09-28 10:33:50 +01:00
Philip Reames f6d110e26f [LAA] Make getPtrStride return Option instead of overloading zero as error value [nfc]
This is purely NFC restructure in advance of a change which actually exposes zero strides.  This is mostly because I find this interface confusing each time I look at it.
2022-09-27 15:55:44 -07:00
Philip Reames 899ebd7e99 [LV] Remove two unused default arguments [nfc] 2022-09-27 14:33:53 -07:00
Philip Reames dc7387b587 [LV] Adjust cost model to use uniform store lowering for unpredicated uniform stores
Follow up to D133580; adjust the cost model to prefer uniform store lowering for scalable stores which are unpredicated.

The impact here isn't in the uniform store lowering quality itself. InstCombine happily converts the scatter form into the single store form. The main impact is in letting the rest of the cost model make choices based on the knowledge that the vector will be scalarized on use.

Differential Revision: https://reviews.llvm.org/D134460
2022-09-27 07:28:40 -07:00
Simon Pilgrim ef89409a59 Fix 'unused-lambda-capture' gcc warning. NFCI. 2022-09-27 15:15:43 +01:00
Simon Pilgrim 1b7089fe67 [SLP] Add ScalarizationOverheadBuilder helper to track vector extractions
Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs.

I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first.

This replaces my initial attempt in D124769.

Differential Revision: https://reviews.llvm.org/D134605
2022-09-27 14:49:07 +01:00
Florian Hahn 2c692d891e
[LV] Update handling of scalable pointer inductions after b73d2c8.
The dependent code has been changed quite a lot since 151c144 which
b73d2c8 effectively reverts. Now we run into a case where lowering
didn't expect/support the behavior pre 151c144 any longer.

Update the code dealing with scalable pointer inductions to also check
for uniformity in combination with isScalarAfterVectorization. This
should ensure scalable pointer inductions are handled properly during
epilogue vectorization.

Fixes #57912.
2022-09-23 18:23:02 +01:00
Philip Reames 954c1ed009 [SLP] Adjust debug output for store vectorization failure
When store vectorization is infeasible, it's helpful to have a debug logging indication of why.  A case I've hit a couple times now is accidentally using -march instead of -mtriple and getting the default TTI results.  This causes max-vf to become 1, and thus hits the added logging line.
2022-09-23 09:58:15 -07:00
Philip Reames 42ef572049 [SLP] Fix cost model w.r.t. operand properties
We allow the target to report different costs depending on properties of the operands; given this, we have to make sure we pass the right set of operands and account for the fact that different scalar instructions can have operands with different properties.

As a motivating example, consider a set of multiplies which each multiply by a constant (but not all the same constant). Most of the constants are power of two (but not all).

If the target doesn't have support for non-uniform constant immediates, this will likely require constant materialization and a non-uniform multiply. However, depending on the balance of target costs for constant scalar multiplies vs a single vector multiply, this might or might not be a profitable vectorization.

This ends up basically being a rewrite of the existing code. Normally, I'd scope the change more narrowly, but I kept noticing things which seemed highly suspicious, and none of the existing code appears to have any test coverage at all. I think this is a case where simply throwing out the existing code and starting from scratch is reasonable.

This is a follow on to Alexey's D126885, but also handles the arithmetic instruction case since the existing code appears to have the same problem.

Differential Revision: https://reviews.llvm.org/D132566
2022-09-23 08:40:23 -07:00
Simon Pilgrim a6e9141505 [TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436)
The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both.

This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling.

Fixes #50778

Differential Revision: https://reviews.llvm.org/D111968
2022-09-23 14:03:18 +01:00
Philip Reames 32dc1151e2 [VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Differential Revision: https://reviews.llvm.org/D133580
2022-09-22 08:53:46 -07:00
Alexey Bataev e664dea182 [SLP]Fix write-after-bounds.
Mask might be larger than the NumElts-OffsetBeg, need to use actual
indices to avoid acces out of bounds.
2022-09-21 08:00:15 -07:00
Florian Hahn dcbc8a0daa
[LV] Remove unused widenCallInstruction declaration (NFC).
The definition and uses have been removed a while ago. Clean up the
unused declaration.
2022-09-20 15:20:28 +01:00
Matt Arsenault 2adae8e1b7 VectorCombine: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Matt Arsenault 555af0274c SLPVectorizer: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Matt Arsenault b609741958 LoopVectorize: Pass through AssumptionCache 2022-09-19 19:25:22 -04:00
Alexey Bataev ce39bdbd65 [SLP][NFC]Reorder gather nodes with reused scalars, NFC.
The compiler does not reorder the gather nodes with reused scalars, just
does it for opernads of the user nodes. This currently does not affect
the compiler but breaks internal logic of the SLP graph. In future, it
is supposed to actually use all nodes instead of just list of operands
and this will affect the vectorization result.
Also, did some early check to avoid complex logic in cost estimation
analysis, should improve compiler time a bit.
2022-09-19 14:00:17 -07:00
Florian Hahn 582f8ef19f
[LV] Keep track of cost-based ScalarAfterVec in VPWidenPointerInd.
Epilogue vectorization uses isScalarAfterVectorization to check if
widened versions for inductions need to be generated and bails out in
those cases.

At the moment, there are scenarios where isScalarAfterVectorization
returns true but VPWidenPointerInduction::onlyScalarsGenerated would
return false, causing widening.

This can lead to widened phis with incorrect start values being created
in the epilogue vector body.

This patch addresses the issue by storing the cost-model decision in
VPWidenPointerInductionRecipe and restoring the behavior before 151c144.
This effectively reverts 151c144, but the long-term fix is to properly
support widened inductions during epilogue vectorization

Fixes #57712.
2022-09-19 18:14:35 +01:00
Craig Topper 90a004b4a1 [LV] Remove FIXME about NoImplicitFloat. NFC
My understanding is that NoImplicitFloat, despite it's name, is
supposed to disable all vectors not just float vectors.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D134084
2022-09-19 10:01:02 -07:00
Kazu Hirata 5e5a6c5b07 Use std::conditional_t (NFC) 2022-09-18 10:25:06 -07:00
Alexey Bataev 5d13b12674 [SLP]Improve isUndefVector function by adding insertelement analysis.
Added the mask and the analysis of the buildvector sequence in the
isUndefVector function, improves codegen and cost estimation.

Metric: SLP.NumVectorInstructions

Program                                                                                       SLP.NumVectorInstructions
                                                                          results                   results0 diff
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00                  27360.00 -0.0%

Metric: size..text

Program                                                                                                           size..text
                                                                   results     results0    diff
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   805299.00   806035.00  0.1%

526.blender_r - some extra code is vectorized.
508.namd_r - some extra code is optimized out.

Differential Revision: https://reviews.llvm.org/D133891
2022-09-16 14:36:38 -07:00
Alexey Bataev d647312e3f [SLP][NFC]Extract getLastInstructionInBundle function for better
dependence  checking, NFC.

Part of D110978
2022-09-14 08:43:15 -07:00
Alexey Bataev 796af0c027 [SLP] Move getInsertIndex function, NFC.
Part of D110978.
2022-09-14 06:22:52 -07:00
Kazu Hirata d3649c2be4 [Vectorize] Fix a warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5879:5: error:
  expression result unused [-Werror,-Wunused-value]
2022-09-13 09:30:06 -07:00