Commit Graph

3480 Commits

Author SHA1 Message Date
Kazu Hirata 595f1a6aaf [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 19:47:13 -08:00
Kazu Hirata 9f252e5567 [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:31:17 -08:00
Kazu Hirata 3c09ed006a [llvm] Use std::nullopt instead of None in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-04 17:12:44 -08:00
Florian Hahn 37809c867a
[VPlan] Support sinking VPScalarIVStepsRecipe.
This patch extends VP-based sinking to also sink VPScalarStepsRecipe.
This takes us a step closer towards retiring the IR based sinking.

The main change is extending VPScalarIVStepsRecipe::execute to support
executing in a replicate-region.

Depends on D133758.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133760
2022-12-04 22:59:17 +00:00
Florian Hahn 3c5f07349f
[VPlan] Mark VPScalarIVStepsRecipe as not reading/writing memory.
The recipe only computes the inductions steps using its operands. It
does neither read nor write memory.

Split of from D133760.
2022-12-04 12:58:46 +00:00
Kazu Hirata 343de6856e [Transforms] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-02 21:11:37 -08:00
Krzysztof Parzyszek 26424c96c0 Attributes: convert Optional to std::optional 2022-12-02 08:15:45 -06:00
Florian Hahn 0c5df7cd2f
Recommit "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit bf15f1e489.

The updated version fixes a crash by checking the induction kind instead
of the opcode; for integer inductions, the step is always added, but the
opcode might not be set.
2022-11-30 17:04:20 +00:00
Alexey Bataev 0cc15050a4 [SLP]Fix PR59230: Use actual vector factor when sorting entries.
When we sort entries for attempting to reorder scalars, need to use
actual vectorization factor, not the number of scalars. Otherwise the
compiler crashes, if the scalars has to be reordered.

Differential Revision: https://reviews.llvm.org/D138819
2022-11-29 06:46:06 -08:00
Stanislav Mekhanoshin c46634554d [LoadStoreVectorizer] Consider if operation is faster than before
Compare a relative speed of misaligned accesses before and
after vectorization, not just check the new instruction is
not going to be slower.

Since no target now returns anything but 0 or 1 for Fast
argument of the allowsMisalignedMemoryAccesses this is still NFCI.

The subsequent patch will tune actual vaues of Fast on AMDGPU.

Differential Revision: https://reviews.llvm.org/D124218
2022-11-28 15:52:32 -08:00
Florian Hahn bf15f1e489
Revert "[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe."
This reverts commit 0fa666eced.

This triggers an assertion during AArch64 stage2 builds. Revert while I
investigate.

See https://lab.llvm.org/buildbot/#/builders/179/builds/4967/steps/11/logs/stdio
2022-11-28 22:43:11 +00:00
Florian Hahn 0fa666eced
[VPlan] Add VPDerivedIVRecipe, use for VPScalarIVStepsRecipe.
This patch splits off the logic to transform the canonical IV to a
a value for an induction with a different start and step. This
transformation only needs to be done once (independent of VF/UF) and
enables sinking of VPScalarIVStepsRecipe as follow-up.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D133758
2022-11-28 16:32:31 +00:00
Qiongsi Wu f946c70130 [SLPVectorizer] Do Not Move Loads/Stores Beyond Stacksave/Stackrestore Boundaries
If left unchecked, the SLPVecrtorizer can move loads/stores below a stackrestore. The move can cause issues if the loads/stores have pointer operands from `alloca`s that are reset by the stackrestores. This patch adds the dependency check.

The check is conservative, in that it does not check if the pointer operands of the loads/stores are actually from `alloca`s that may be reset. We did not observe any SPECCPU2017 performance degradation so this simple fix seems sufficient.

The test could have been added to `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll`, but that test has not been updated to use opaque pointers. I am not inclined to add tests that still use typed pointers, or to refactor `llvm/test/Transforms/SLPVectorizer/X86/stacksave-dependence.ll` to use opaque pointers in this patch. If desired, I will open a different patch to refactor and consolidate the tests.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D138585
2022-11-28 10:00:29 -05:00
Kazu Hirata 41c638875e [Vectorize] Use std::optional in VPlanSLP.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 18:11:32 -08:00
Kazu Hirata 5fc8f6c37c [Vectorize] Use std::optional in SLPVectorizer.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-11-26 18:03:49 -08:00
Florian Hahn bf0bd85f9d
[LV] Move trunc codegen to buildScalarSteps (NFCI).
This moves the code to truncate step and IV into buildScalarSteps,
closer to the place where they are actually used.

Suggested in D133758.
2022-11-26 23:48:46 +00:00
Florian Hahn 12bb5535d2
[VPlan] Move cast codegen to emitTransformedIndex (NFCI).
This reduces duplication a bit.

Suggested as simplification in D133758.
2022-11-26 22:47:13 +00:00
Florian Hahn ed2fdace89
[LV] Use separate index to access StoredValues in vectorizeInterleave.
StoredValues only has entries for members of the interleave group. If
there are gaps, then using the index i here will either access a wrong
entry or be out-of-bounds.

Instead use a dedicated index that only gets incremented for members of
the interleave group.

Fixes #59090.
2022-11-25 15:28:05 +00:00
Fangrui Song fa36d72305 [LoopVectorize] Internalize some cl::opt 2022-11-23 23:03:02 -08:00
Matt Devereau ee4d6c8bf0 [VectorCombine] Enable scalarizeBinopOrCmp for scalable vectors
This reverts a change to exclude scalarizeBinopOrCmp in VectorCombine for
scalable vectors which caused poor scalable Binop codegen.

Differential Revision: https://reviews.llvm.org/D138545
2022-11-23 13:17:21 +00:00
Benjamin Kramer f116107f2d [VectorCombine] Don't touch instruction after foldSingleElementStore, it might be deleted
Use after free found by asan.
2022-11-22 21:12:42 +01:00
Sanjay Patel ede6d608f4 [VectorCombine] switch on opcode to compile faster
This follows 87debdadaf to further eliminate wasting time
calling helper functions only to early return to the main
run loop.

Once again, this results in significant savings based on
experimental data:
https://llvm-compile-time-tracker.com/compare.php?from=01023bfcd33f922ed8c934ce563e54abe8bfe246&to=3dce4f70b73e48ccb045decb634c185e6b4c67d5&stat=instructions:u

This is NFCI other than making the pass faster. The total
cost of VectorCombine runs in an -O3 build appears to be
well under 0.1% of compile-time now, so there's not much
left to do AFAICT.

There's a TODO about making the code cleaner, but it
probably doesn't change timing much. I didn't include those
changes here because it requires updating much more code.
2022-11-22 10:23:32 -05:00
Sanjay Patel 163bb6d64e [Passes][VectorCombine] enable early run generally and try load folds
An early run of VectorCombine was added with D102496 specifically to
deal with unnecessary vector ops produced with the C matrix extension.
This patch is proposing to try those folds in general and add a pair
of load folds to the menu.

The load transform will partly solve (see PhaseOrdering diffs) a
longstanding vectorization perf bug by removing redundant loads via GVN:
issue #17113

The main reason for not enabling the extra pass generally in the initial
patch was compile-time cost. The cost of VectorCombine was significantly
(surprisingly) improved with:
87debdadaf
https://llvm-compile-time-tracker.com/compare.php?from=ffe05b8f57d97bc4340f791cb386c8d00e0739f2&to=87debdadaf18f8a5c7e5d563889e10731dc3554d&stat=instructions:u

...so the extra run is going to cost very little now - the total cost of
the 2 runs should be less than the 1 run before that micro-optimization:
https://llvm-compile-time-tracker.com/compare.php?from=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&to=2c4b68eab5ae969811f422714e0eba44c5f7eefb&stat=instructions:u

It may be possible to reduce the cost slightly more with a few more
earlier-exits like that, but it's probably in the noise based on timing
experiments.

Differential Revision: https://reviews.llvm.org/D138353
2022-11-21 13:57:55 -05:00
Sanjay Patel 8f337f8ffe [VectorCombine] generalize pass param name for early combines; NFC
The option was added with https://reviews.llvm.org/D102496,
and currently the name is accurate, but I am hoping to add
a load transform that is not a scalarization. See issue #17113.
2022-11-21 13:57:55 -05:00
Alexey Bataev ac93b61165 [SLP]Fix PR59098: check if the vector type is scalarized for
extractelements.

If the resulting type is going to be scalarized, no need to adjust the
cost of removed extractelement and insert/extract subvector costs.
Otherwise, the compiler can crash because of the wrong type sizes.
2022-11-21 10:26:01 -08:00
Bjorn Pettersson 1c308d6641 [LV] Clean up LoopVectorizationCostModel::calculateRegisterUsage. NFC
Minor refactoring in LoopVectorizationCostModel::calculateRegisterUsage.

Also adding some FIXME:s related to what appears to be some short
comings related to how the register usage is calculated.

Differential Revision: https://reviews.llvm.org/D138342
2022-11-20 20:52:13 +01:00
Sanjay Patel 87debdadaf [VectorCombine] check instruction type before dispatching to folds
This is no externally visible change intended, but appears to be a
noticeable (surprising) improvement in compile-time based on:
https://llvm-compile-time-tracker.com/compare.php?from=0f3e72e86c8c7c6bf0ec24bf1e2acd74b4123e7b&to=5e8c2026d10e8e2c93c038c776853bed0e7c8fc1&stat=instructions:u

The early returns in the individual fold functions are not good
enough to avoid the overhead of the many "fold*" calls, so this
speeds up the main instruction loop enough to make a difference.
2022-11-18 16:03:18 -05:00
Alexey Bataev 07015e12f0 [SLP]Fix PR59053: trying to erase instruction with users.
Need to count the reduced values, vectorized in the tree but not in the top node. Such scalars still must be extracted out of the vector node instead of the original scalar.
2022-11-17 17:23:48 -08:00
Stanislav Mekhanoshin bcaf31ec3f [AMDGPU] Allow finer grain control of an unaligned access speed
A target can return if a misaligned access is 'fast' as defined
by the target or not. In reality there can be different levels
of 'fast' and 'slow'. This patch changes the boolean 'Fast'
argument of the allowsMisalignedMemoryAccesses family of functions
to an unsigned representing its speed.

A target can still define it as it wants and the direct translation
of the current code uses 0 and 1 for current false and true. This
makes the change an NFC.

Subsequent patch will start using an actual value of speed in
the load/store vectorizer to compare if a vectorized access going
to be not just fast, but not slower than before.

Differential Revision: https://reviews.llvm.org/D124217
2022-11-17 09:23:53 -08:00
Florian Hahn 55f56cdc33
[VPlan] Introduce VPValue::hasDefiningRecipe helper (NFC).
This clarifies the intention of code that uses the helper.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:12:40 +00:00
Florian Hahn aa16689f82
[VPlan] Use recipe type to avoid getDefiningRecipe call (NFC).
Suggested by @Ayal during review of D136068, thanks!
2022-11-16 23:03:34 +00:00
Florian Hahn 239b52d4b6
[VPlan] Update stale comment (NFC).
Update comment to reflect current code, which also allows for
VPScalarIVStepsRecipes to be uniform.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:39:50 +00:00
Florian Hahn bcc9c5d959
[LV] Replace unnecessary cast_or_null with cast (NFC).
The existing code already unconditionally dereferences RepR, so
cast_or_null can be replaced by just cast.

Suggested by @Ayal during review of D136068, thanks!
2022-11-16 22:31:59 +00:00
Florian Hahn 32f1c5531b
[VPlan] Update VPValue::getDef to return VPRecipeBase, adjust name(NFC)
The return value of getDef is guaranteed to be a VPRecipeBase and all
users can also accept a VPRecipeBase *. Most users actually case to
VPRecipeBase or a specific recipe before using it, so this change
removes a number of redundant casts.

Also rename it to getDefiningRecipe to make the name a bit clearer.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D136068
2022-11-16 22:12:08 +00:00
Alexey Bataev 9f9fdab9f1 [SLP]Fix PR58766: deleted value used after vectorization.
If same instruction is reduced several times, but in one graph is part
of buildvector sequence and in another it is vectorized, we may loose
information that it was part of buildvector and must be extracted from
later vectorized value.
2022-11-16 10:57:03 -08:00
Alexey Bataev 2f8f17c157 [SLP]Fix PR58956: fix insertpoint for reduced buildvector graphs.
If the graph is only the buildvector node without main operation, need
to inherit insrtpoint from the redution instruction. Otherwise the
compiler crashes trying to insert instruction at the entry block.
2022-11-16 07:38:49 -08:00
Alexey Bataev 0a33ceee01 [SLP]Fix a crash on analysis of the vectorized node.
Need to use advanced check for the same vectorized node to avoid
possible compiler crash. We may have 2 similar nodes (vector one and
gather) after graph nodes rotation, need to do extra checks for the
exact match.
2022-11-15 13:40:28 -08:00
OCHyams 139e08efc5 [Assignment Tracking][23/*] Account for assignment tracking in SLP Vectorizer
The Assignment Tracking debug-info feature is outlined in this RFC:

https://discourse.llvm.org/t/
rfc-assignment-tracking-a-better-way-of-specifying-variable-locations-in-ir

The SLP-Vectorizer can merge a set of scalar stores into a single vectorized
store. Merge DIAssignID intrinsics from the scalar stores onto the new
vectorized store.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D133320
2022-11-15 15:20:18 +00:00
Jordan Rupprecht 81896f88ce [NFC] Remove unused OrigLoopID vars 2022-11-11 07:51:40 -08:00
Florian Hahn 2d7e5e29b7
[LV] Remove unused OrigLoopID argument from completeLoopSekelton (NFC).
The argument is not used any longer and can be removed.
2022-11-11 15:39:08 +00:00
Sanjay Patel b57819e130 [VectorCombine] widen a load with subvector insert
This adapts/copies code from the existing fold that allows
widening of load scalar+insert. It can help in IR because
it removes a shuffle, and the backend can already narrow
loads if that is profitable in codegen.

We might be able to consolidate more of the logic, but
handling this basic pattern should be enough to make a small
difference on one of the motivating examples from issue #17113.
The final goal of combining loads on those patterns is not
solved though.

Differential Revision: https://reviews.llvm.org/D137341
2022-11-10 14:11:32 -05:00
Alexey Bataev b505fd559d [SLP]Redesign vectorization of the gather nodes.
Gather nodes are vectorized as simply vector of the scalars instead of
relying on the actual node. It leads to the fact that in some cases
we may miss incorrect transformation (non-matching set of scalars is
just ended as a gather node instead of possible vector/gather node).
Better to rely on the actual nodes, it allows to improve stability and
better detect missed cases.

Differential Revision: https://reviews.llvm.org/D135174
2022-11-10 10:59:54 -08:00
Alexey Bataev b5d91ab73e [SLP]Fix PR58863: Mask index beyond mask size for non-power-2 insertelement analysis.
Need to check if the insertelement mask size is reached during cost analysis to avoid compiler crash.

Differential Revision: https://reviews.llvm.org/D137639
2022-11-08 07:54:57 -08:00
skc7 42bce72536 Reapply "[SLP] Extend reordering data of tree entry to support PHInodes".
Reapplies 87a2086 (which was reverted in 656f1d8).
Fix for scalable vectors in getInsertIndex merged in 46d53f4.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137537
2022-11-08 21:21:28 +05:30
Nathan James 6aa050a690 Reland "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 632a389f96.

This relands commit
1834a310d0.

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 14:15:15 +00:00
Nathan James 632a389f96 Revert "[llvm][NFC] Use c++17 style variable type traits"
This reverts commit 1834a310d0.
2022-11-08 13:11:41 +00:00
skc7 46d53f45d8 [SLP][NFC] Restructure getInsertIndex
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137567
2022-11-08 18:07:50 +05:30
Nathan James 1834a310d0
[llvm][NFC] Use c++17 style variable type traits
This was done as a test for D137302 and it makes sense to push these changes

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D137493
2022-11-08 12:22:52 +00:00
skc7 9d96feb19b [SLP][NFC] Restructure areTwoInsertFromSameBuildVector
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D137569
2022-11-08 09:32:19 +05:30
Alexey Bataev ecd0b5a532 Revert "[SLP]Redesign vectorization of the gather nodes."
This reverts commit 8ddd1ccdf8 to fix
buildbots failures reported in https://lab.llvm.org/buildbot#builders/74/builds/14839
2022-11-07 08:35:21 -08:00