Commit Graph

3453 Commits

Author SHA1 Message Date
Alexey Bataev 07015e12f0 [SLP]Fix PR59053: trying to erase instruction with users.
Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.
2022-11-17 17:23:48 -08:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can report whether or not a misaligned access is 'fast'
as defined by the target. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

A subsequent patch will start using an actual value of speed in
the load/store vectorizer to check whether a vectorized access is going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
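The shift from a boolean to an unsigned speed can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `toyAccessSpeed`, `isFast`, and `notSlowerThan` are invented names, and the speed values 0, 1, and 2 are arbitrary. None of it is LLVM's actual interface.

```cpp
#include <cassert>

// Hypothetical toy target, not LLVM's real hook: reports a relative speed for
// an access of SizeBytes at alignment AlignBytes. 0 plays the role of the old
// `false` (slow); any nonzero value plays the role of the old `true` (fast).
unsigned toyAccessSpeed(unsigned SizeBytes, unsigned AlignBytes) {
  assert(SizeBytes > 0 && "access size must be nonzero");
  if (AlignBytes >= SizeBytes)
    return 2; // naturally aligned: assumed fastest in this toy model
  if (AlignBytes >= 4)
    return 1; // misaligned, but still counted as fast
  return 0;   // slow
}

// Old-style query: collapses the speed to a boolean.
bool isFast(unsigned SizeBytes, unsigned AlignBytes) {
  return toyAccessSpeed(SizeBytes, AlignBytes) != 0;
}

// New-style comparison a vectorizer could make: accept the wide access only
// if it is not slower than the narrow accesses it would replace.
bool notSlowerThan(unsigned WideSpeed, unsigned NarrowSpeed) {
  return WideSpeed >= NarrowSpeed;
}
```

In this toy model a 16-byte access at 4-byte alignment is still "fast" in the boolean sense, yet `notSlowerThan` rejects it when compared against naturally aligned 4-byte accesses, which is the kind of distinction a boolean cannot express.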
Florian Hahn 55f56cdc33 [VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:12:40 +00:00
Florian Hahn aa16689f82 [VPlan] Use recipe type to avoid getDefiningRecipe call (NFC).
Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:03:34 +00:00
Florian Hahn 239b52d4b6 [VPlan] Update stale comment (NFC).
Update comment to reflect current code, which also allows for
VPScalarIVStepsRecipes to be uniform.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:39:50 +00:00
Florian Hahn bcc9c5d959 [LV] Replace unnecessary cast_or_null with cast (NFC).
The existing code already unconditionally dereferences RepR, so
cast_or_null can be replaced by just cast.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:31:59 +00:00
Florian Hahn 32f1c5531b [VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC)
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually cast to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.

Also rename it to getDefiningRecipe to make the name a bit clearer.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136068
2022-11-16 22:12:08 +00:00
Alexey Bataev 9f9fdab9f1 [SLP]Fix PR58766: deleted value used after vectorization.
If the same instruction is reduced several times, but in one graph is part
of a buildvector sequence and in another it is vectorized, we may lose
the information that it was part of a buildvector and must be extracted
from the later vectorized value.
2022-11-16 10:57:03 -08:00
Alexey Bataev 2f8f17c157 [SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs.
If the graph is only the buildvector node without a main operation, we need
to inherit the insertpoint from the reduction instruction. Otherwise the
compiler crashes trying to insert an instruction at the entry block.
2022-11-16 07:38:49 -08:00
Alexey Bataev 0a33ceee01 [SLP]Fix a crash on analysis of the vectorized node.
Need to use advanced check for the same vectorized node to avoid
possible compiler crash. We may have 2 similar nodes (vector one and
gather) after graph nodes rotation, need to do extra checks for the
exact match.
2022-11-15 13:40:28 -08:00
OCHyams 139e08efc5 [Assignment Tracking][23/*] Account for assignment tracking in SLP Vectorizer
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

The SLP-Vectorizer can merge a set of scalar stores into a single vectorized
store. Merge DIAssignID intrinsics from the scalar stores onto the new
vectorized store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133320
2022-11-15 15:20:18 +00:00
Jordan Rupprecht 81896f88ce [NFC] Remove unused OrigLoopID vars
2022-11-11 07:51:40 -08:00
Florian Hahn 2d7e5e29b7 [LV] Remove unused OrigLoopID argument from completeLoopSkeleton (NFC).
The argument is not used any longer and can be removed.
2022-11-11 15:39:08 +00:00
Sanjay Patel b57819e130 [VectorCombine] widen a load with subvector insert
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.

We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining loads on those patterns is not
solved though.

Differential Revision: https://reviews.llvm.org/D137341
2022-11-10 14:11:32 -05:00
Alexey Bataev b505fd559d [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-10 10:59:54 -08:00
Alexey Bataev b5d91ab73e [SLP]Fix PR58863: Mask index beyond mask size for non-power-2 insertelement analysis.
Need to check if the insertelement mask size is reached during cost analysis to avoid compiler crash.

Differential Revision: https://reviews.llvm.org/D137639
2022-11-08 07:54:57 -08:00
skc7 42bce72536 Reapply "[SLP] Extend reordering data of tree entry to support PHInodes".
Reapplies 87a2086 (which was reverted in 656f1d8).
Fix for scalable vectors in getInsertIndex merged in 46d53f4.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137537
2022-11-08 21:21:28 +05:30
Nathan James 6aa050a690 Reland "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 632a389f96.

This relands commit 1834a310d0.

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 14:15:15 +00:00
Nathan James 632a389f96 Revert "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 1834a310d0.
2022-11-08 13:11:41 +00:00
skc7 46d53f45d8 [SLP][NFC] Restructure getInsertIndex
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137567
2022-11-08 18:07:50 +05:30
Nathan James 1834a310d0 [llvm][NFC] Use c++17 style variable type traits
This was done as a test for D137302 and it makes sense to push these changes.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 12:22:52 +00:00
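For readers unfamiliar with the C++17 variable-template form, the two spellings can be compared side by side. This is an illustrative sketch only: `isPlainIntegerOld` and `isPlainIntegerNew` are made-up helpers, not code from the patch.

```cpp
#include <type_traits>

// Pre-C++17 spelling: read the nested ::value member of each trait.
template <typename T> constexpr bool isPlainIntegerOld() {
  return std::is_integral<T>::value && !std::is_same<T, bool>::value;
}

// C++17 spelling: the _v variable templates yield the same constants.
template <typename T> constexpr bool isPlainIntegerNew() {
  return std::is_integral_v<T> && !std::is_same_v<T, bool>;
}

// Both spellings agree, which is why such a rewrite can be NFC.
static_assert(isPlainIntegerOld<int>() == isPlainIntegerNew<int>());
static_assert(isPlainIntegerOld<bool>() == isPlainIntegerNew<bool>());
```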
skc7 9d96feb19b [SLP][NFC] Restructure areTwoInsertFromSameBuildVector
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137569
2022-11-08 09:32:19 +05:30
Alexey Bataev ecd0b5a532 Revert "[SLP]Redesign vectorization of the gather nodes."
This reverts commit 8ddd1ccdf8 to fix
buildbot failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839
2022-11-07 08:35:21 -08:00
Alexey Bataev 8ddd1ccdf8 [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-07 07:04:38 -08:00
David Green 656f1d8b74 Revert "[SLP] Extend reordering data of tree entry to support PHI nodes"
This reverts commit 87a20868eb as it has
problems with scalable vectors and use-list orders. Test to follow.
2022-11-06 11:43:51 +00:00
Sanjay Patel 710e34e136 [VectorCombine] move load safety checks to helper function; NFC
These checks can be re-used with other potential transforms
such as a load of a subvector-insert.
2022-11-04 10:39:37 -04:00
LiDongjin d1cee3539f [LoopVectorize] Fix crash on "Cannot dereference end iterator!"(PR56627)
Check hasOneUser before user_back().

Differential Revision: https://reviews.llvm.org/D136227
2022-11-03 23:13:37 +08:00
Alexey Bataev f090e3c00f [SLP]Fix write after bounds.
Need to use comma instead of + symbol to prevent writing after bounds.
2022-11-03 05:30:41 -07:00
Alexey Bataev 8b015b2078 [SLP][NFC]Formatting and reduce number of iterations, NFC.
2022-11-03 05:30:13 -07:00
skc7 87a20868eb [SLP] Extend reordering data of tree entry to support PHI nodes
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D136757
2022-11-01 04:50:04 +00:00
Alexey Bataev 99f9bd4807 [SLP]Fix a crash in the analysis of the compatible cmp operands.
We can skip the analysis of the operand opcodes and compare them
directly in some cases.
2022-10-31 09:47:25 -07:00
Florian Hahn 43f0f1a66f [VPlan] Use onlyFirstLaneUsed in sinkScalarOperands.
Replace custom code to check if only the first lane is used by generic
helper `onlyFirstLaneUsed`. This enables VPlan-based sinking in a few
additional cases and was suggested in D133760.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136368
2022-10-29 19:45:19 +01:00
Alexey Bataev 2ec51f1c75 [SLP]Improve analysis of same/alternate code ops and scheduling.
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                         SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
2022-10-27 16:29:16 -07:00
Alexey Bataev 8ce0c7b1c9 Revert "[SLP]Improve analysis of same/alternate code ops and scheduling."
This reverts commit dad64448c6 to fix
a crash in https://lab.llvm.org/buildbot/#/builders/74/builds/14584
2022-10-27 15:21:35 -07:00
Alexey Bataev dad64448c6 [SLP]Improve analysis of same/alternate code ops and scheduling.
Should improve compile time for analysis and vectorization.

Metric: SLP.NumVectorInstructions

Program                                                                         SLP.NumVectorInstructions
test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test  6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test   6380.00                   6378.00 -0.0%
test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test           2023.00                   2022.00 -0.0%
test-suite :: External/SPEC/CINT2006/471.omnetpp/471.omnetpp.test               148.00                    146.00 -1.4%

Generated more vector instructions.

Differential Revision: https://reviews.llvm.org/D127531
2022-10-27 11:31:18 -07:00
Philip Reames 269bc684e7 [LV][RISCV] Disable vectorization of epilogue loops
Epilogue loop vectorization is a feature in the vectorizer intended to avoid running fully scalar code when the vector length of the main loop turns out to be either longer than the trip count of the actual loop, or leaves a huge remainder.

In practice, this feature appears to not have been well tuned. I honestly don't think it should be on by default at all, but it definitely shouldn't be on for RISCV. Note that other targets have also disabled it, but they've done so via disabling interleaving - which is, well, completely unrelated - and we don't want to do that for RISCV.

In the near term, many examples I'm seeing have terrible codegen for epilogue vectorization. We are greatly increasing code size for little value at reasonable VLEN values for small types. In the long term, the cases that epilogue vectorization is intended to handle are likely better handled via tail folding on RISCV.

As an aside, I also don't really trust the correctness of epilogue vectorization. The code structure is such that otherwise straightforward changes sometimes break only epilogue vectorization. The reuse of an existing vplan without careful validation opens significant room for nasty bugs. Given how rarely the code is exercised, that is not a good combination.

As such, this patch introduces a TTI hook, and completely disables epilogue vectorization on RISCV.

Differential Revision: https://reviews.llvm.org/D136695
2022-10-25 14:28:02 -07:00
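The trade-off described in this message can be pictured with a toy model of how a trip count is divided among the main vector loop, a vectorized epilogue, and the scalar remainder. This is a simplified sketch under stated assumptions, not LLVM's implementation: `LoopSplit` and `splitTripCount` are hypothetical names, interleaving is ignored, and an `EpilogueVF` of 0 stands in for a target that disables epilogue vectorization.

```cpp
#include <cassert>

// Hypothetical model: how many iterations each loop executes.
struct LoopSplit {
  unsigned MainVectorIters;     // iterations of the main vector loop
  unsigned EpilogueVectorIters; // iterations of the vectorized epilogue
  unsigned ScalarIters;         // leftover scalar iterations
};

// MainVF elements are processed per main-loop iteration. EpilogueVF == 0
// models a target that turns epilogue vectorization off.
LoopSplit splitTripCount(unsigned N, unsigned MainVF, unsigned EpilogueVF) {
  assert(MainVF > 0 && "main vector factor must be nonzero");
  LoopSplit S{};
  S.MainVectorIters = N / MainVF;
  unsigned Remainder = N % MainVF;
  if (EpilogueVF > 0) {
    S.EpilogueVectorIters = Remainder / EpilogueVF;
    S.ScalarIters = Remainder % EpilogueVF;
  } else {
    S.ScalarIters = Remainder; // everything left over runs scalar
  }
  return S;
}
```

With `N = 30`, `MainVF = 16`, and `EpilogueVF = 4`, the split is 1 main iteration, 3 epilogue iterations, and 2 scalar iterations; with `EpilogueVF = 0`, all 14 leftover elements run in the scalar loop. The extra loop is where the code-size cost mentioned above comes from.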
Alexey Bataev da4e0f7ac5 [SLP][NFC]Fix PR58476: Fix compile time for reductions, NFC.
Improve O(N^2) to O(N) in some cases, reduce number of allocations by
reserving memory.
Also, improve analysis of loads reduction values to avoid analysis
of not vectorizable cases.
2022-10-24 10:13:24 -07:00
Florian Hahn 7eb4ec1c75 [VPlan] Print predicates for widened cmp instructions (NFC).
2022-10-21 08:54:11 +01:00
Paul Walker ab8257ca0e [NFC] Fix a few whitespace inconsistencies.
2022-10-20 14:52:25 +00:00
Florian Hahn e25ed058bc [LV] Use buildScalarSteps to also handle VF = 1. (NFCI)
The code in buildScalarSteps already properly handles creating the
scalar induction values with VF = 1. Use it directly instead of using
extra code to handle that case.

Suggested by @Ayal in D133760.
2022-10-20 14:30:01 +01:00
Alexey Bataev b8b740c834 [SLP][NFC]Remove unused variable, NFC.
2022-10-19 12:35:27 -07:00
Florian Hahn d72fcee8f4 [VPlan] Add VPValue::isDefinedOutsideVectorRegions helper (NFC).
@Ayal suggested a better named helper than using `!getDef()` to check if
a value is invariant across all parts.

The property we are using here is that the VPValue is defined outside
any vector loop region. There's a TODO left to handle recipes defined in
pre-header blocks.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D133666
2022-10-19 13:20:30 +01:00
Alexey Bataev 087dadfd37 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 11:55:59 -07:00
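The summary figure in this message can be cross-checked with a short calculation. The sketch below rests on two assumptions that the message does not state: that the listed values are percentage changes, and that the geometric mean is taken over the corresponding ratios (1 + p/100). The names `geomeanPercent` and `listedChanges` are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of percentage changes: convert each to a ratio, average the
// logarithms, then convert the mean ratio back to a percentage.
double geomeanPercent(const std::vector<double> &PercentChanges) {
  assert(!PercentChanges.empty() && "need at least one value");
  double LogSum = 0.0;
  for (double P : PercentChanges)
    LogSum += std::log(1.0 + P / 100.0);
  double MeanRatio = std::exp(LogSum / PercentChanges.size());
  return (MeanRatio - 1.0) * 100.0;
}

// The seven per-benchmark values as listed in the commit message.
std::vector<double> listedChanges() {
  return {0.57, -0.98, -0.01, 3.59, -0.12, -0.07, -1.42};
}
```

Under those assumptions, `geomeanPercent(listedChanges())` comes out at roughly 0.21, consistent with the figure quoted in the message.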
Alexey Bataev 62267e8de0 Revert "[SLP]Generalize cost model."
This reverts commit f12fb91188 and
f5c747bfbe to fix detected non-initialized
var use.
2022-10-18 11:25:59 -07:00
Alexey Bataev f5c747bfbe [SLP][NFC]Fix a warning for ?: with enum/unsigned, NFC.
2022-10-18 10:08:05 -07:00
Alexey Bataev f12fb91188 [SLP]Generalize cost model.
Generalized the cost model estimation. Improved cost model estimation
for repeated scalars (no need to count their cost anymore), improved
cost model for extractelement instructions.

cpu2017
   511.povray_r             0.57
   520.omnetpp_r           -0.98
   521.wrf_r               -0.01
   525.x264_r               3.59 <+
   526.blender_r           -0.12
   531.deepsjeng_r         -0.07
   538.imagick_r           -1.42
Geometric mean:  0.21

Differential Revision: https://reviews.llvm.org/D115757
2022-10-18 08:49:32 -07:00
Alexey Bataev e79532d28c [SLP][NFC]Try to fix MSVC buildbots with a workaround, NFC.
2022-10-18 07:50:10 -07:00
Alexey Bataev 6a6fc4890d [SLP][NFC]Formatting of the getEntryCost function, NFC.
2022-10-18 07:18:26 -07:00
Sanjay Patel 8d76fbb5f0 [VectorCombine] fix crashing on match of non-canonical fneg
We can't assume that operand 0 is the negated operand because
the matcher handles "fsub -0.0, X" (and also +0.0 with FMF).

By capturing the extract within the match, we avoid the bug
and make the transform more robust (can't assume that this
pass will only see canonical IR).
2022-10-17 10:47:48 -04:00
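Why the matcher treats "fsub -0.0, X" as a negation unconditionally, but "+0.0" only with fast-math flags, follows from IEEE 754 signed-zero arithmetic. The sketch below demonstrates this in plain C++ rather than LLVM IR; the function names are illustrative, and it assumes default floating-point semantics (round-to-nearest, no fast-math options).

```cpp
#include <cassert>
#include <cmath>

// Counterpart of "fsub -0.0, X": equals -X for every input, signed zeros included.
double subFromNegZero(double X) { return -0.0 - X; }

// Counterpart of "fsub +0.0, X": differs from -X when X is +0.0, because
// 0.0 - 0.0 yields +0.0 while negating +0.0 yields -0.0.
double subFromPosZero(double X) { return 0.0 - X; }

// True when A and B compare equal and also carry the same sign bit.
bool sameValueAndSign(double A, double B) {
  return A == B && std::signbit(A) == std::signbit(B);
}
```

For `X = +0.0`, `subFromNegZero` returns -0.0 (a true negation) while `subFromPosZero` returns +0.0, so the second form is only a negation when the sign of zero may be ignored.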
Kazu Hirata b2f41e9ac1 [Vectorize] Use std::conditional_t (NFC)
2022-10-15 14:52:25 -07:00
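As a brief illustration of the kind of rewrite this title describes, `std::conditional_t` is the alias-template shorthand for `typename std::conditional<...>::type`. The aliases below are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>
#include <type_traits>

// Longer spelling: name the nested ::type explicitly.
template <bool Wide>
using IndexOld =
    typename std::conditional<Wide, std::uint64_t, std::uint32_t>::type;

// Shorthand: std::conditional_t selects the same type.
template <bool Wide>
using IndexNew = std::conditional_t<Wide, std::uint64_t, std::uint32_t>;

// Both aliases name identical types, so the change has no functional effect.
static_assert(std::is_same_v<IndexOld<true>, IndexNew<true>>);
static_assert(std::is_same_v<IndexOld<false>, IndexNew<false>>);
```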