llvm-project

Author	SHA1	Message	Date
Valery N Dmitriev	8e9216fe87	[SLP] Do not make an attempt to match reduction on already erased instruction. Differential Revision: https://reviews.llvm.org/D105752	2021-07-09 17:13:15 -07:00
Sanjay Patel	c2b7f09d8c	[SLP] make invalid operand explicit for extra arg in reduction matching; NFC This makes it clearer when we have encountered the extra arg. Also, we may need to adjust the way the operand iteration works when handling logical and/or.	2021-07-09 15:32:12 -04:00
Sanjay Patel	486992f958	[SLP] improve code comments; NFC This likely started out only supporting binops, but now we handle min/max using cmp+sel, and we may extend to handle bool logic in the form of select.	2021-07-09 12:49:54 -04:00
Sanjay Patel	544f2711bb	[SLP] make checks for cmp+select min/max more explicit This is NFC-intended currently (so no test diffs). The motivation is to eventually allow matching for poison-safe logical-and and logical-or (these are in the form of a select-of-bools). ( https://llvm.org/PR41312 ) Those patterns will not have all of the same constraints as min/max in the form of cmp+sel. We may also end up removing the cmp+sel min/max matching entirely (if we canonicalize to intrinsics), so this will make that step easier.	2021-07-09 12:43:43 -04:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reduction trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Alexey Bataev	c574d2fbac	[SLP]Improve vectorization of stores. Patch tries to improve the vectorization of stores. Originally, we just checked the type and the base pointer of the store. Patch adds some extra checks to avoid non-profitable vectorization cases. It includes analysis of the scalar values to be stored and triggers the vectorization attempt only if the scalar values have same/alt opcode and are from same basic block, i.e. we don't end up immediately with the gather node, which is not profitable. This also improves compile time by filtering out non-profitable cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D104122	2021-07-08 12:35:39 -07:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization checking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which increase code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for further vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Sanjay Patel	97c473ad39	[SLP] rename variable to not be misleading; NFC The reduction matching was probably only dealing with binops when it was written, but we have now generalized it to handle select and intrinsics too, so assert on that too.	2021-07-07 14:40:21 -04:00
Alexey Bataev	4e1a0684f1	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Caroline Concatto	b868a2d2c6	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vectors, because functions like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vectors. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Alexey Bataev	908b753661	[SLP]Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Before, we just tried to vectorize the PHIs of the same type. Patch improves this and tries to vectorize PHIs with incoming values which come from the same basic block, have the same and/or alternative opcodes. It saves compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls, several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Valery N Dmitriev	94a07c79cf	[SLP][NFC] Fix condition that was supposed to save a bit of compile time. It was found by chance revealing discrepancy between comment (few lines above), the condition and how re-ordering of instruction is done inside the if statement it guards. The condition was always evaluated to true. Differential Revision: https://reviews.llvm.org/D104064	2021-06-11 10:08:55 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started supporting their vectorization and it should not do it after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of the very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Alexey Bataev	a0086add2e	[SLP]Improve gathering of scalar elements. 1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions. 2. Better detection of shuffle candidates in gathering function. 3. The cost of insertelement for constants is 0. Part of D57059. Differential Revision: https://reviews.llvm.org/D103458	2021-06-09 05:23:21 -07:00
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. There is no need to recalculate the cost of extractelements or to compensate the cost of all of them; we need to check beforehand whether each one is actually going to be removed by the vectorization. Also, there is no need to generate a new extractelement instruction; we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Alexey Bataev	36911971a5	[SLP]Better detection of perfect/shuffles matches for gather nodes. Implemented better scheme for perfect/shuffled matches of the gather nodes which allows to fix the performance regressions introduced by earlier patches. Starting detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Alexey Bataev	27d3528acf	[SLP]Fix vectorization of insertelements with multiple uses. SLP vectorizer should not consider insertelements with multiple uses as a part of a high-level build vector; such an insertelement must be considered the terminating insertelement of the vector build, otherwise it may produce incorrect code. Differential Revision: https://reviews.llvm.org/D103164	2021-05-26 09:42:18 -07:00
Anton Afanasyev	b2cd895011	[SLP] Fix "gathering" of insertelement instructions For rare exceptional case vector tree node (insertelements for now only) is marked as `NeedToGather`, this case is processed by patch. Follow-up of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can be represented as a result of shuffle of the vectorized element and non-consecutive insertelements too. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Alexey Bataev	182162b616	[SLP]Try to vectorize tiny trees with shuffled gathers of extractelements. If we gather extract elements and they actually are just shuffles, it might be profitable to vectorize them even if the tree is tiny. Differential Revision: https://reviews.llvm.org/D101460	2021-05-20 08:36:16 -07:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Anton Afanasyev	207cdd7ed9	[SLP] Fix spill cost computation for insertelement tree node This is follow up for D98714, bugfixing.	2021-05-14 13:14:41 +03:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add new type of tree node for `InsertElementInst` chain forming vector. These instructions could be either removed, or replaced by shuffles during vectorization and we can add this node to cost model, so naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this patch is the first step towards revectorization of partially vectorized code (to fix PR42022 completely). After adding inserts to tree the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
Sanjay Patel	49950cb1f6	[SLP] restrict matching of load combine candidates The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction. Differential Revision: https://reviews.llvm.org/D102074	2021-05-11 08:46:40 -04:00
Alexey Bataev	30463bc3f1	[SLP]Do not count perfect diamond matches for gathers several times. Need to remove the old code for avoiding double counting of the gather nodes with perfect diamond matches within the tree after we started detecting perfect/shuffled matching in the previous patch D100495. We may skip the cost for such nodes completely. Differential Revision: https://reviews.llvm.org/D102023	2021-05-10 07:08:07 -07:00
Simon Pilgrim	338c1b701f	[SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	2dab059021	[SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	1b47489fd0	[SLP] Use empty() instead of size() == 0. NFCI.	2021-05-06 16:20:18 +01:00
Alexey Bataev	369cd2ae52	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `fd18547e07`. Need to add a check for the size of the vectorization tree to avoid some extra vectorization.	2021-05-04 04:53:22 -07:00
Alexey Bataev	fd18547e07	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 08:06:20 -07:00
Alexey Bataev	2e4cc9a725	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `b5f64768cf` to fix a compiler crash revealed by buildbots.	2021-05-03 07:20:00 -07:00
Alexey Bataev	b5f64768cf	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 06:45:42 -07:00
Alexey Bataev	a3fd82c289	[SLP]Fix the crash on cost calculation if non-compatible vectors shuffled. If the extracts from the non-power-2 vectors are recognized as shuffles, need some extra checks to not crash cost calculations if trying to get the cost for subvector extracts. In this case need to check carefully that we do not exit out of bounds of the original vector, otherwise the TTI's cost model will crash on assert. Differential Revision: https://reviews.llvm.org/D101477	2021-04-30 09:34:20 -07:00
Alexey Bataev	12c51f2358	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 12:48:00 -07:00
Alexey Bataev	6e859f3cd4	Revert "[COST] Improve shuffle kind detection if shuffle mask is provided." This reverts commit `9239932221` to fix a compiler crash on mask checks.	2021-04-29 12:40:33 -07:00
Alexey Bataev	9239932221	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 09:42:56 -07:00
Alexey Bataev	8af4723c58	[SLP]Try to vectorize tiny trees with shuffled gathers. If the first tree element is vectorized and the second is gather, it still might be profitable to vectorize it if the gather node contains less scalars to vectorize than the original tree node. It might be profitable to use shuffles. Differential Revision: https://reviews.llvm.org/D101397	2021-04-28 06:35:31 -07:00
Alexey Bataev	24590d8d67	[SLP]Improved isGatherShuffledEntry, NFC. Reworked isGatherShuffledEntry function, simplified and moved common code to the lambda (it shall go away when non-power-2 patch will be landed).	2021-04-27 05:59:46 -07:00
Alexey Bataev	18c61fc498	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip check for undefs trying to find perfect/shuffled tree entries matching, they can be ignored completely improving the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
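
The `5d2b3de284` and `4e1a0684f1` entries both rest on the rule that `std::stable_sort` needs a strict weak ordering, which dominance alone does not provide. The standalone C++ sketch below illustrates that reasoning; it is not the SLPVectorizer code. `DomTree`, `properlyDominates`, `equivalenceIsTransitive`, `Phi` and `sortPhis` are hypothetical stand-ins for LLVM's dominator tree and PHI nodes, and the sketch assumes blocks are numbered in DFS pre-order so that a block's index doubles as its DFS number.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical dominator tree: IDom[b] is the immediate dominator of block b,
// or -1 for the entry block. Blocks are assumed to be numbered in DFS
// pre-order, so the block index doubles as its DFS-in number.
using DomTree = std::vector<int>;

// True if A strictly dominates B (A is a proper ancestor of B in the tree).
bool properlyDominates(const DomTree &IDom, int A, int B) {
  assert(B >= 0 && B < static_cast<int>(IDom.size()));
  for (int X = IDom[B]; X != -1; X = IDom[X])
    if (X == A)
      return true;
  return false;
}

// A strict weak ordering requires "neither X < Y nor Y < X" (equivalence)
// to be transitive. Brute-force check over elements 0..N-1.
bool equivalenceIsTransitive(int N, const std::function<bool(int, int)> &Less) {
  auto Equiv = [&](int X, int Y) { return !Less(X, Y) && !Less(Y, X); };
  for (int A = 0; A < N; ++A)
    for (int B = 0; B < N; ++B)
      for (int C = 0; C < N; ++C)
        if (Equiv(A, B) && Equiv(B, C) && !Equiv(A, C))
          return false;
  return true;
}

// Hypothetical PHI record: sort by type ID first, then by the DFS number of
// the parent block, so the result never depends on pointer addresses.
struct Phi {
  int TypeID;
  int ParentDFS;
  std::string Name;
};

std::vector<std::string> sortPhis(std::vector<Phi> Phis) {
  std::stable_sort(Phis.begin(), Phis.end(), [](const Phi &L, const Phi &R) {
    if (L.TypeID != R.TypeID)
      return L.TypeID < R.TypeID;
    return L.ParentDFS < R.ParentDFS;
  });
  std::vector<std::string> Names;
  for (const Phi &P : Phis)
    Names.push_back(P.Name);
  return Names;
}
```

With `IDom = {-1, 0, 1, 0, 3}` (entry 0 dominating A=1 and C=3, A dominating B=2, C dominating D=4), `equivalenceIsTransitive` returns `false` for the `properlyDominates` comparator (A and C are unordered, C and B are unordered, yet A < B) and `true` for comparing DFS numbers, which form a total order over reachable blocks. `sortPhis({{2, 4, "c"}, {1, 3, "b"}, {2, 0, "a"}, {1, 1, "d"}})` yields `d, b, a, c`.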

901 Commits