llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	f057756a1a	[SLP] Fix Wdocumentation warning - remove \returns from void function. NFC.	2021-11-07 15:08:39 +00:00
Kazu Hirata	1b108ab975	[Transforms] Use make_early_inc_range (NFC)	2021-11-02 18:13:23 -07:00
Alexey Bataev	07ef9f513f	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-28 05:45:09 -07:00
Alexey Bataev	f06e332982	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit `64d1617d18` to fix test non-stability.	2021-10-27 11:16:58 -07:00
Alexey Bataev	64d1617d18	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 08:49:13 -07:00
Alexey Bataev	9b12975cbf	Revert "[SLP]Improve/fix reordering of the gathered graph nodes." This reverts commit `f719b794bc` to fix instability in tests.	2021-10-27 07:31:36 -07:00
Alexey Bataev	f719b794bc	[SLP]Improve/fix reordering of the gathered graph nodes. Gathered loads/extractelements/extractvalue instructions should be checked if they can represent a vector reordering node too and their order should ve taken into account for better graph reordering analysis/ Also, if the gather node has reused scalars, they must be reordered instead of the scalars themselves. Differential Revision: https://reviews.llvm.org/D112454	2021-10-27 06:08:40 -07:00
Alexey Bataev	cb4feae7bd	[SLP]Fix logical and/or reductions. Need to emit select(cmp) instructions for poison-safe forms of select ops. Currently alive reports that `Target is more poisonous than source` for operations we generating for such instructions. https://alive2.llvm.org/ce/z/FiNiAA Differential Revision: https://reviews.llvm.org/D112562	2021-10-27 04:25:20 -07:00
Alexey Bataev	ce14d1b690	[SLP]Do not reorder reduction nodes. The final reduction nodes should not be reordered, the order does not matter for reductions. Also, it might be profitable to vectorize smaller reduction trees, reduction cost may compensate small tree cost. Part of D111574 Differential Revision: https://reviews.llvm.org/D112467	2021-10-26 07:41:24 -07:00
Alexey Bataev	eb9b75dd4d	[SLP]Change the order of the reduction/binops args pair vectorization attempts. Need to change the order of the reduction/binops args pair vectorization attempts. Need to try to find the reduction at first and postpone vectorization of binops args. This may help to find more reduction patterns and vectorize them. Part of D111574. Differential Revision: https://reviews.llvm.org/D112224	2021-10-25 06:27:14 -07:00
Alexey Bataev	3ea7877c8b	[SLP]Unify vectorization of PHI and store nodes with improved tiny tree vectorization. Vectorization of PHIs and stores very similar, it might be beneficial to try to revectorize stores (like PHIs) if the total number of stores with the same/alternate opcode is less than the vector size but number of stores with the same type is larger than the vector size. Differential Revision: https://reviews.llvm.org/D109831	2021-10-21 06:25:32 -07:00
Alexey Bataev	b9cfa016da	[SLP]Fix emission of the shrink shuffles. Need to follow the order of the reused scalars from the ReuseShuffleIndices mask rather than rely on the natural order. Differential Revision: https://reviews.llvm.org/D111898	2021-10-18 13:13:12 -07:00
Alexey Bataev	414abff1fe	[SLP]Fix PR52090: clang crashes: Assertion `Index < Length && "Invalid index!"' failed. Need to check that either Idx is UndefMaskElem and value is UndefValue or Idx is valid and value is the same as the scalar value in the node. Differential Revision: https://reviews.llvm.org/D111802	2021-10-14 14:26:29 -07:00
Simon Pilgrim	0dcd2b40e6	[TTI] Remove default condition type and predicate arguments from getCmpSelInstrCost We need to be better at exposing the comparison predicate to getCmpSelInstrCost calls as some targets (e.g. X86 SSE) have very different costs for different comparisons (PR48337), and we can't always rely on the optional Instruction argument. This initial commit requires explicit condition type and predicate arguments. The next step will be to review a lot of the existing getCmpSelInstrCost calls which have used BAD_ICMP_PREDICATE even when the predicate is known. Differential Revision: https://reviews.llvm.org/D111024	2021-10-06 15:40:35 +01:00
Alexey Bataev	bebe702dbe	[SLP]Detect reused scalars in all possible gathers for better vectorization cost. Some initially gathered nodes missed the check for the reused scalars, which leads to high gather cost. Such nodes still can be represented as m gathers + shuffle instead of n gathers, where m < n. Differential Revision: https://reviews.llvm.org/D111153	2021-10-05 09:43:03 -07:00
Kazu Hirata	4f0225f6d2	[Transforms] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-01 09:57:40 -07:00
Kerry McLaughlin	c1d46d3461	[SLPVectorizer] Fix crash in isShuffle with scalable vectors D104809 changed `buildTree_rec` to check for extract element instructions with scalable types. However, if the extract is extended or truncated, these changes do not apply and we assert later on in isShuffle(), which attempts to cast the type of the extract to FixedVectorType. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D110640	2021-10-01 10:56:44 +01:00
Alexey Bataev	f701505c45	[SLP]Improve vectorization of phi nodes by trying wider vectors. Try to improve vectorization of the PHI nodes by trying to vectorize similar instructions at the size of the widest possible vectors, then aggregating with compatible type PHIs and trying to vectoriza again and only if this failed, try smaller sizes of the vector factors for compatible PHI nodes. This restores performance of several benchmarks after tuning of the fp/int conversion instructions costs. Differential Revision: https://reviews.llvm.org/D108740	2021-09-28 07:20:36 -07:00
Alexey Bataev	8bacfb9bed	[SLP]No need to schedule/check parent for extract{element/value} instruction. The instruction extractelement/extractvalue are not required to be scheduled since they only depend on the source vector/aggregate (with constant indices), smae applies to the parent basic block checks. Improves compile time and saves scheduling budget. Differential Revision: https://reviews.llvm.org/D108703	2021-09-28 06:13:55 -07:00
Jameson Nash	e27a6db529	Bad SLPVectorization shufflevector replacement, resulting in write to wrong memory location We see that it might otherwise do: %10 = getelementptr {}, <2 x {}> %9, <2 x i32> <i32 10, i32 4> %11 = bitcast <2 x {}*> %10 to <2 x i64> ... %27 = extractelement <2 x i64> %11, i32 0 %28 = bitcast i64 %27 to <2 x i64>* store <2 x i64> %22, <2 x i64>* %28, align 4, !tbaa !2 Which is an out-of-bounds store (the extractelement got offset 10 instead of offset 4 as intended). With the fix, we correctly generate extractelement for i32 1 and generate correct code. Differential Revision: https://reviews.llvm.org/D106613	2021-09-27 14:06:13 -04:00
Simon Pilgrim	8a44281f47	[SLP] getReductionCost - use explicit TTI::TCK_RecipThroughput CostKind. NFCI. Avoid relying on the default cost kinds in TTI calls (we already do this in other places in SLP) - noticed while trying to see how much work it'd be to extend D110242 and remove all remaining uses of default CostKind arguments.	2021-09-22 16:52:22 +01:00
Alexey Bataev	bc69dd62c0	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reordarable nodes (loads, stores, extractelements,extractvalues) and then fully rebuilding the graph in the best order. This was not effecient, since it required an extra memory and time for building/rebuilding tree, double the use of the scheduling budget, which could lead to missing vectorization due to exausted scheduling resources. Patch provide 2-way approach for graph reodering problem. At first, all reordering is done in-place, it doe not required tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when buildiong operands for the graph nodes with lasrger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reorder in the best fashion, they are reordered and their user too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-09-20 08:42:19 -07:00
Chris Lattner	735f46715d	[APInt] Normalize naming on keep constructors / predicate methods. This renames the primary methods for creating a zero value to `getZero` instead of `getNullValue` and renames predicates like `isAllOnesValue` to simply `isAllOnes`. This achieves two things: 1) This starts standardizing predicates across the LLVM codebase, following (in this case) ConstantInt. The word "Value" doesn't convey anything of merit, and is missing in some of the other things. 2) Calling an integer "null" doesn't make any sense. The original sin here is mine and I've regretted it for years. This moves us to calling it "zero" instead, which is correct! APInt is widely used and I don't think anyone is keen to take massive source breakage on anything so core, at least not all in one go. As such, this doesn't actually delete any entrypoints, it "soft deprecates" them with a comment. Included in this patch are changes to a bunch of the codebase, but there are more. We should normalize SelectionDAG and other APIs as well, which would make the API change more mechanical. Differential Revision: https://reviews.llvm.org/D109483	2021-09-09 09:50:24 -07:00
Kazu Hirata	5648f7170e	[Analysis, Target, Transforms] Construct SmallVector with iterator ranges (NFC)	2021-09-07 09:19:33 -07:00
Nikita Popov	48ebe427c9	[SLPVectorizer] Make aliasing check more precise SLPVectorizer currently uses AA::isNoAlias() to determine whether two locations alias. This does not work if one of the instructions is a call. Instead, we should check getModRefInfo(), which determines whether an arbitrary instruction modifies or references a given location. Among other things, this prevents @llvm.experimental.noalias.scope.decl() and other inaccessiblmemonly intrinsics from interfering with SLP vectorization. Differential Revision: https://reviews.llvm.org/D109012	2021-08-31 22:35:30 +02:00
Anton Afanasyev	077d4cb3ab	Revert "[SLP]No need to schedule/check parent for extract{element/value} instruction." Revert since introduced issure reported here: https://lists.llvm.org/pipermail/llvm-dev/2021-August/152411.html Discussed starting from here: https://reviews.llvm.org/D108703#2974289 This reverts commit `a36bc873a2`.	2021-08-31 15:29:06 +03:00
Mikhail Goncharov	5097b6e352	Revert "[SLP]Improve graph reordering." This reverts commit `84cbd71c95`. This commit breaks one of the internal tests. As agreed with Alexey I will provide the reproducer later.	2021-08-30 19:16:44 +02:00
Alexey Bataev	84cbd71c95	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reordarable nodes (loads, stores, extractelements,extractvalues) and then fully rebuilding the graph in the best order. This was not effecient, since it required an extra memory and time for building/rebuilding tree, double the use of the scheduling budget, which could lead to missing vectorization due to exausted scheduling resources. Patch provide 2-way approach for graph reodering problem. At first, all reordering is done in-place, it doe not required tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when buildiong operands for the graph nodes with lasrger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reorder in the best fashion, they are reordered and their user too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-08-26 12:31:18 -07:00
Alexey Bataev	b00f73d8bf	Revert "[SLP]Improve graph reordering." This reverts commit `a28234e37a` to investigate a compiler crash caused by the commit.	2021-08-26 09:19:40 -07:00
Alexey Bataev	a28234e37a	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reordarable nodes (loads, stores, extractelements,extractvalues) and then fully rebuilding the graph in the best order. This was not effecient, since it required an extra memory and time for building/rebuilding tree, double the use of the scheduling budget, which could lead to missing vectorization due to exausted scheduling resources. Patch provide 2-way approach for graph reodering problem. At first, all reordering is done in-place, it doe not required tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when buildiong operands for the graph nodes with lasrger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reorder in the best fashion, they are reordered and their user too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-08-26 07:19:07 -07:00
Alexey Bataev	a36bc873a2	[SLP]No need to schedule/check parent for extract{element/value} instruction. The instruction extractelement/extractvalue are not required to be scheduled since they only depend on the source vector/aggregate (with constant indices), smae applies to the parent basic block checks. Improves compile time and saves scheduling budget. Differential Revision: https://reviews.llvm.org/D108703	2021-08-25 09:27:55 -07:00
Alexey Bataev	e7c3eaa8ae	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form indentity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107494	2021-08-05 08:41:24 -07:00
Alexey Bataev	214f99b27c	Revert "[SLP]Do not emit extra shuffle for insertelements vectorization." This reverts commit `871ea69803` to fix the problem if the first vector is not just undef.	2021-08-04 11:28:59 -07:00
Alexey Bataev	871ea69803	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form indentity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107344	2021-08-03 13:18:41 -07:00
Alexey Bataev	7d9d926a18	Revert "[SLP]Improve graph reordering." This reverts commit `e408d1dfab` and 2 other (`4b25c11321` and `c2deb2afaf`) related to fix the problem with the reordering shuffles.	2021-08-03 12:13:43 -07:00
Alexey Bataev	95e5d401ae	[SLP]Improve splats vectorization. Replace insertelement instructions for splats with just single insertelement + broadcast shuffle. Also, try to merge these instructions if they come from the same/shuffled gather node. Differential Revision: https://reviews.llvm.org/D107104	2021-07-30 10:17:45 -07:00
Alexey Bataev	4b25c11321	[SLP]Fix an assertion for the size of user nodes. For the nodes with reused scalars the user may be not only of the size of the final shuffle but also of the size of the scalars themselves, need to check for this. It is safe to just modify the check here, since the order of the scalars themselves is preserved, only indeces of the reused scalars are changed. So, the users with the same size as the number of scalars in the node, will not be affected, they still will get the operands in the required order. Reported by @mstorsjo in D105020. Differential Revision: https://reviews.llvm.org/D107080	2021-07-30 05:46:44 -07:00
Alexey Bataev	f4fb854811	[SLP]Do not consider deleted instruction as external users. If the instruction was previously deleted, it should not be treated as an external user. This fixes cost estimation and removes dead extractelement instructions. Differential Revision: https://reviews.llvm.org/D107106	2021-07-30 05:37:43 -07:00
Alexey Bataev	c2deb2afaf	[SLP]Fix a crash in gathered loads analysis. Need to check that the minimum acceptable vector factor is at least 2, not 0, to avoid compiler crash during gathered loads analysis. Differential Revision: https://reviews.llvm.org/D107058	2021-07-30 05:19:17 -07:00
Alexey Bataev	3ad6437fcc	[SLP]Fix build on MacOS, NFC.	2021-07-28 06:33:13 -07:00
Alexey Bataev	e408d1dfab	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reordarable nodes (loads, stores, extractelements,extractvalues) and then fully rebuilding the graph in the best order. This was not effecient, since it required an extra memory and time for building/rebuilding tree, double the use of the scheduling budget, which could lead to missing vectorization due to exausted scheduling resources. Patch provide 2-way approach for graph reodering problem. At first, all reordering is done in-place, it doe not required tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when buildiong operands for the graph nodes with lasrger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reorder in the best fashion, they are reordered and their user too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-07-28 05:49:06 -07:00
Alexey Bataev	6ca48efcf6	[SLP]Fix costs calculations. Need to fix several cost-related problems. The final type may be defined incorrectly because of to early definition (we may end up with the wider type), the CommonCost should not be redefined in ExtractElements cost related calculations and the shuffle of the final insertelements vectors should be calculated as a cost of single vector permutations + costs of two vector permutations for other n-1 incoming vectors. Differential Revision: https://reviews.llvm.org/D106578	2021-07-26 07:14:03 -07:00
Alexey Bataev	d7cb2a0796	Revert "[SLP]Fix costs calculations." This reverts commit `a053afed49` to fix buildbots.	2021-07-26 05:42:34 -07:00
Alexey Bataev	a053afed49	[SLP]Fix costs calculations. Need to fix several cost-related problems. The final type may be defined incorrectly because of to early definition (we may end up with the wider type), the CommonCost should not be redefined in ExtractElements cost related calculations and the shuffle of the final insertelements vectors should be calculated as a cost of single vector permutations + costs of two vector permutations for other n-1 incoming vectors. Differential Revision: https://reviews.llvm.org/D106578	2021-07-26 04:37:22 -07:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Alexey Bataev	d8d8b4574a	[SLP]Fix possible crash on unreachable incoming values sorting. The incoming values for PHI nodes may come from unreachable BasicBlocks, need to handle this case. Differential Revision: https://reviews.llvm.org/D106264	2021-07-19 04:54:53 -07:00
Alexey Bataev	da3dbfcacf	[SLP]Improve calculations of the cost for reused/reordered scalars. Part of D105020. Also, fixed FIXMEs that need to use wider vector type when trying to calculate the cost of reused scalars. This may cause regressions unless D100486 is landed to improve the cost estimations for long vectors shuffling. Differential Revision: https://reviews.llvm.org/D106060	2021-07-16 13:40:15 -07:00
Alexey Bataev	1b18e9ab67	[PATCH] D105827: [SLP]Workaround for InsertSubVector cost. The cost of the InsertSubvector shuffle kind cost is not complete and may end up with just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827	2021-07-16 12:59:08 -07:00
Sanjay Patel	81ce3aa30c	[SLP] avoid leaking poison in reduction of safe boolean logic ops This bug was introduced with D105730 / `25ee55c0ba` . If we are not converting all of the operations of a reduction into a vector op, we need to preserve the existing select form of the remaining ops. Otherwise, we are potentially leaking poison where it did not in the original code. Alive2 agrees that the version that freezes some inputs and then falls back to scalar is correct: https://alive2.llvm.org/ce/z/erF4K2	2021-07-15 17:33:06 -04:00
Arthur Eubanks	99cb2507f3	Revert "[SLP]Workaround for InsertSubVector cost." This reverts commit `2eb50baf05`. Causes hangs, see comments on D105827.	2021-07-15 10:19:41 -07:00

1 2 3 4 5 ...

955 Commits