llvm-project

Commit Graph

Author	SHA1	Message	Date
David Sherwood	f4122398e7	[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64 I have added a new TTI interface called enableOrderedReductions() that controls whether or not ordered reductions should be enabled for a given target. By default this returns false, whereas for AArch64 it returns true and we rely upon the cost model to make sensible vectorisation choices. It is still possible to override the new TTI interface by setting the command line flag: -force-ordered-reductions=true|false I have added a new RUN line to show that we use ordered reductions by default for SVE and Neon: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D106653	2021-08-19 09:29:40 +01:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Nikita Popov	570c9beb8e	[MemorySSA] Remove unnecessary MSSA dependencies LoopLoadElimination, LoopVersioning and LoopVectorize currently fetch MemorySSA when constructing LoopAccessAnalysis. However, LoopAccessAnalysis does not actually use MemorySSA and we can pass nullptr instead. This saves one MemorySSA calculation in the default pipeline, and thus improves compile-time. Differential Revision: https://reviews.llvm.org/D108074	2021-08-16 20:40:55 +02:00
Paul Walker	f7a831daa6	[LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested. Previously we emitted a "does not support scalable vectors" remark for all targets whenever vectorisation is attempted. This pollutes the output for architectures that don't support scalable vectors and is likely confusing to the user. Instead this patch introduces a debug message that reports when scalable vectorisation is allowed by the target and only issues the previous remark when scalable vectorisation is specifically requested, for example: #pragma clang loop vectorize_width(2, scalable) Differential Revision: https://reviews.llvm.org/D108028	2021-08-15 12:15:52 +01:00
Dorit Nuzman	67278b8a90	[LV] Support Interleaved Store Group With Gaps Teach LV to use masked-store to support interleave-store-group with gaps (instead of scatters/scalarization). The symmetric case of using masked-load to support interleaved-load-group with gaps was introduced a while ago, by https://reviews.llvm.org/D53668; This patch completes the store-scenario leftover from D53668, and solves PR50566. Reviewed by: Ayal Zaks Differential Revision: https://reviews.llvm.org/D104750	2021-08-08 10:32:02 +03:00
Florian Hahn	a00aafc30d	[VPlan] Iterate over phi recipes to detect reductions to fix. After refactoring the phi recipes, we can now iterate over all header phis in a VPlan to detect reductions when it comes to fixing them up when tail folding. This reduces the coupling with the cost model & legal by using the information directly available in VPlan. It also removes a call to getOrAddVPValue, which references the original IR value which may become outdated after VPlan transformations. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100102	2021-08-07 14:06:50 +01:00
David Sherwood	3fd96e1b2e	[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intrinsics: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-06 10:13:15 +01:00
David Sherwood	43a5c750d1	Revert "[LoopVectorize] Add support for replication of more intrinsics with scalable vectors" This reverts commit `95800da914`.	2021-08-06 09:48:16 +01:00
Florian Hahn	3e58dd19df	[LV] Move reduction PHI node fixup to VPlan::execute (NFC). All information to fix-up the reduction phi nodes in the vectorized loop is available in VPlan now. This patch moves the code to do so, to make this clearer. Fixing up the loop exit value still relies on other information and remains outside of VPlan for now. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100113	2021-08-06 08:29:20 +01:00
Kazu Hirata	72661f337a	[Transforms] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-08-05 08:53:17 -07:00
Alexey Bataev	e7c3eaa8ae	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form identity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107494	2021-08-05 08:41:24 -07:00
David Sherwood	e9177b0958	Fix build issues caused by `95800da914`	2021-08-05 16:26:34 +01:00
Sander de Smalen	3e47f009ff	[LV] Consider ExtractValue as uniform. Since all operands to ExtractValue must be loop-invariant when we deem the loop vectorizable, we can consider ExtractValue to be uniform. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107286	2021-08-05 16:20:50 +01:00
Florian Hahn	38b098be66	[VectorCombine] Limit scalarization known non-poison indices. We can only trust the range of the index if it is guaranteed non-poison. Fixes PR50949. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107364	2021-08-05 15:36:31 +01:00
David Sherwood	95800da914	[LoopVectorize] Add support for replication of more intrinsics with scalable vectors This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intrinsics: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-05 15:17:27 +01:00
Sander de Smalen	8d08a84745	[LV] Remove a change that was added in D106164. This change wasn't strictly necessary for D106164 and could be removed. This patch addresses the post-commit comments from @fhahn on D106164, and also changes sve-widen-gep.ll to use the same IR test as shown in pointer-induction.ll. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106878	2021-08-05 14:44:53 +01:00
Alexey Bataev	214f99b27c	Revert "[SLP]Do not emit extra shuffle for insertelements vectorization." This reverts commit `871ea69803` to fix the problem if the first vector is not just undef.	2021-08-04 11:28:59 -07:00
Alexey Bataev	871ea69803	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form identity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107344	2021-08-03 13:18:41 -07:00
Alexey Bataev	7d9d926a18	Revert "[SLP]Improve graph reordering." This reverts commit `e408d1dfab` and 2 other (`4b25c11321` and `c2deb2afaf`) related to fix the problem with the reordering shuffles.	2021-08-03 12:13:43 -07:00
David Sherwood	0156f91f3b	[NFC] Rename enable-strict-reductions to force-ordered-reductions I'm renaming the flag because a future patch will add a new enableOrderedReductions() TTI interface and so the meaning of this flag will change to be one of forcing the target to enable/disable them. Also, since other places in LoopVectorize.cpp use the word 'Ordered' instead of 'strict' I changed the flag to match. Differential Revision: https://reviews.llvm.org/D107264	2021-08-03 09:33:01 +01:00
Florian Hahn	bb725c9803	[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe. This patch updates VPInterleaveRecipe::print to print the actual defined VPValues for load groups and the store VPValue operands for store groups. The IR references may become outdated while transforming the VPlan and the defined and stored VPValues always are up-to-date. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D107223	2021-08-02 18:36:36 +01:00
Alexey Bataev	95e5d401ae	[SLP]Improve splats vectorization. Replace insertelement instructions for splats with just single insertelement + broadcast shuffle. Also, try to merge these instructions if they come from the same/shuffled gather node. Differential Revision: https://reviews.llvm.org/D107104	2021-07-30 10:17:45 -07:00
Alexey Bataev	4b25c11321	[SLP]Fix an assertion for the size of user nodes. For the nodes with reused scalars the user may be not only of the size of the final shuffle but also of the size of the scalars themselves, need to check for this. It is safe to just modify the check here, since the order of the scalars themselves is preserved, only indices of the reused scalars are changed. So, the users with the same size as the number of scalars in the node, will not be affected, they still will get the operands in the required order. Reported by @mstorsjo in D105020. Differential Revision: https://reviews.llvm.org/D107080	2021-07-30 05:46:44 -07:00
Alexey Bataev	f4fb854811	[SLP]Do not consider deleted instruction as external users. If the instruction was previously deleted, it should not be treated as an external user. This fixes cost estimation and removes dead extractelement instructions. Differential Revision: https://reviews.llvm.org/D107106	2021-07-30 05:37:43 -07:00
Alexey Bataev	c2deb2afaf	[SLP]Fix a crash in gathered loads analysis. Need to check that the minimum acceptable vector factor is at least 2, not 0, to avoid compiler crash during gathered loads analysis. Differential Revision: https://reviews.llvm.org/D107058	2021-07-30 05:19:17 -07:00
Alexey Bataev	3ad6437fcc	[SLP]Fix build on MacOS, NFC.	2021-07-28 06:33:13 -07:00
Alexey Bataev	e408d1dfab	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilt the graph in the best order. This was not efficient, since it required extra memory and time for building/rebuilding the tree and doubled the use of the scheduling budget, which could lead to missed vectorization due to exhausted scheduling resources. The patch provides a 2-way approach to the graph reordering problem. At first, all reordering is done in-place, it does not require deleting/rebuilding the tree, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to-bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when building operands for the graph nodes with larger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered and their users too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands. Also, the patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-07-28 05:49:06 -07:00
Florian Hahn	c07dd2b885	[LV] Move recurrence backedge fixup code to VPlan::execute (NFC). As suggested in D105008, move the code that fixes up the backedge value for first order recurrences to VPlan::execute. Now all that remains in fixFirstOrderRecurrences is the code responsible for creating the exit values in the middle block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D106244	2021-07-28 13:32:40 +01:00
David Green	41cedb1c9a	[LV][ARM] Tighten up MLA reduction costing This makes a couple of changes to the costing of MLA reduction patterns, to more accurately cost various patterns that can come up from vectorization. - The Arm implementation of getExtendedAddReductionCost is altered to only provide costs for legal or smaller types. Larger than legal types need to be split, which currently does not work very well, especially for predicated reductions where the predicate may be legal but needs to be split. Currently we limit it to legal or smaller input types. - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)) is a pattern that can come up, and can be treated the same as reduce(mul(ext, ext)) providing the extension types match. - And it has been adjusted to not count the ext in reduce(mul(ext, ext)) as part of a reduce(mul) pattern. Together these changes help to more accurately cost the mla reductions in cases such as where the extend types don't match or the extend opcodes are different, picking better vector factors that don't result in expanded reductions. Differential Revision: https://reviews.llvm.org/D106166	2021-07-28 12:50:58 +01:00
David Sherwood	a5dd6c6cf9	[LoopVectorize] Don't interleave scalar ordered reductions for inner loops Consider the following loop: void foo(float *dst, float *src, int N) { for (int i = 0; i < N; i++) { dst[i] = 0.0; for (int j = 0; j < N; j++) { dst[i] += src[(i * N) + j]; } } } When we are not building with -Ofast we may attempt to vectorise the inner loop using ordered reductions instead. In addition we also try to select an appropriate interleave count for the inner loop. However, when choosing a VF=1 the inner loop will be scalar and there is existing code in selectInterleaveCount that limits the interleave count to 2 for reductions due to concerns about increasing the critical path. For ordered reductions this problem is even worse due to the additional data dependency, and so I've added code to simply disable interleaving for scalar ordered reductions for now. Test added here: Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll Differential Revision: https://reviews.llvm.org/D106646	2021-07-27 17:41:01 +01:00
Sander de Smalen	d7dd12aee3	[LV] Disable Scalable VFs when tail folding is enabled b/c of low tripcount. The loop vectorizer may decide to use tail folding when the trip-count is low. When that happens, scalable VFs are no longer a candidate, since tail folding/predication is not yet supported for scalable vectors. This can be re-enabled in a future patch. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D106657	2021-07-27 11:37:21 +01:00
Sander de Smalen	13ccb09725	[LV] Don't let ForceTargetInstructionCost override Invalid cost. Invalid costs can be used to avoid vectorization with a given VF, which is used for scalable vectors to avoid things that the code-generator cannot handle. If we override the cost using the -force-target-instruction-cost option of the LV, we would override this mechanism, rendering the flag useless. This change ensures the cost is only overridden when the original cost that was calculated is valid. That allows the flag to be used in combination with the -scalable-vectorization option. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106677	2021-07-26 20:27:49 +01:00
Sander de Smalen	b9051ba848	[LV] Remove assert that VF cannot be scalable in setCostBasedWideningDecision. Scalarization for scalable vectors is not (yet) supported, so the LV discards a VF when scalarization is chosen as the widening decision. It should therefore not assert that the VF is not scalable when it computes the decision to scalarize. The code can get here when both the interleave-cost, gather/scatter cost and scalarization-cost are all illegal. This may e.g. happen for SVE when the VF=1, to avoid generating `<vscale x 1 x eltty>` types that the code-generator cannot yet handle. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106656	2021-07-26 17:11:45 +01:00
Sander de Smalen	981e9dce54	[LV] Don't assume isScalarAfterVectorization if one of the uses needs widening. This fixes an issue that was found in D105199, where a GEP instruction is used both as the address of a store, as well as the value of a store. For the former, the value is scalar after vectorization, but the latter (as value) requires widening. Other code in that function seems to prevent similar cases from happening, but it seems this case was missed. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106164	2021-07-26 16:01:55 +01:00
Florian Hahn	7a1e73f0b9	Recommit "[VPlan] Add recipe for first-order rec phis, make splicing explicit." This reverts the revert commit `b1777b04dc`. The patch originally got reverted due to a crash: https://bugs.chromium.org/p/chromium/issues/detail?id=1232798#c2 The underlying issue was that we were not using the stored values from the modified memory recipes, but the out-of-date values directly from the IR (accessed via the VPlan). This should be fixed in `d995d6376`. A reduced version of the reproducer has been added in `93664503be`.	2021-07-26 15:50:30 +01:00
Alexey Bataev	6ca48efcf6	[SLP]Fix costs calculations. Need to fix several cost-related problems. The final type may be defined incorrectly because of too early definition (we may end up with the wider type), the CommonCost should not be redefined in ExtractElements cost related calculations and the shuffle of the final insertelements vectors should be calculated as a cost of single vector permutations + costs of two vector permutations for other n-1 incoming vectors. Differential Revision: https://reviews.llvm.org/D106578	2021-07-26 07:14:03 -07:00
Kerry McLaughlin	e484e1ae03	[SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths Fixes more casts to `<FixedVectorType>` for the cases where the instruction is a Insert/ExtractElementInst. For fixed-width, this part of truncateToMinimalBitWidths is tested by AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part of truncateToMinimalBitWidths which uses scalable vectors, but was unable to add one. The tests in type-shrinkage-insertelt.ll rely on scalarization to create extract element instructions for instance, which is not possible for scalable vectors. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106163	2021-07-26 13:44:51 +01:00
Alexey Bataev	d7cb2a0796	Revert "[SLP]Fix costs calculations." This reverts commit `a053afed49` to fix buildbots.	2021-07-26 05:42:34 -07:00
Alexey Bataev	a053afed49	[SLP]Fix costs calculations. Need to fix several cost-related problems. The final type may be defined incorrectly because of too early definition (we may end up with the wider type), the CommonCost should not be redefined in ExtractElements cost related calculations and the shuffle of the final insertelements vectors should be calculated as a cost of single vector permutations + costs of two vector permutations for other n-1 incoming vectors. Differential Revision: https://reviews.llvm.org/D106578	2021-07-26 04:37:22 -07:00
Florian Hahn	d995d63767	[VPlan] Use stored value from recipes for interleave groups. Instead of getting the VPValue for the stored IR values through the current plan, use the stored value of the recipes directly. This way, the correct VPValues are used if the store recipes have been modified in the VPlan and the IR value is not correct any longer. This can happen, e.g. due to D105008.	2021-07-26 12:05:23 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Nico Weber	b1777b04dc	Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit." Makes clang crash: https://reviews.llvm.org/D105008#2903350 This reverts commit `d2a73fb44e`. Also revert a minor formatting follow-up: This reverts commit `82834a6732`.	2021-07-25 17:39:28 -04:00
Caroline Concatto	5a4de84d55	[LoopVectorize] Fix crash for predicated instruction with scalable VF This patch avoids computing discounts for predicated instructions when the VF is scalable. There is no support for vectorization of loops with division because the vectorizer cannot guarantee that zero divisions will not happen. This loop now does not use VF scalable ``` for (long long i = 0; i < n; i++) if (cond[i]) a[i] /= b[i]; ``` Differential Revision: https://reviews.llvm.org/D101916	2021-07-22 12:48:27 +01:00
David Green	72dc5cab4f	[LV] Make use of PatternMatchers in getReductionPatternCost. NFC Pulled out of D106166, this modifies getReductionPatternCost to use PatternMatchers, hopefully simplifying the code a little.	2021-07-21 11:34:30 +01:00
David Green	4272e64acd	[LV] Change interface of getReductionPatternCost to return Optional Currently the Instruction cost of getReductionPatternCost returns an Invalid cost to specify "did not find the pattern". This changes that to return an Optional with None specifying not found, allowing Invalid to mean an infinite cost as is used elsewhere. Differential Revision: https://reviews.llvm.org/D106140	2021-07-20 16:44:50 +01:00
Caroline Concatto	cf78995c4a	[NFC][LoopVectorizer] Remove VF.isScalable() assertion from collectInstsToScalarize and getInstructionCost This patch removes the assertion when VF is scalable and replaces getKnownMinValue() by getFixedValue(), so it still guards the code against scalable vector types. The assertions were used to guarantee that getKnownMinValue() was not used for scalable vectors. Differential Revision: https://reviews.llvm.org/D106359	2021-07-20 15:56:30 +01:00
Florian Hahn	82834a6732	[VPlan] Fix formatting glitch from `d2a73fb44e`.	2021-07-20 16:16:30 +02:00
Florian Hahn	d2a73fb44e	[VPlan] Add recipe for first-order rec phis, make splicing explicit. This patch adds a VPFirstOrderRecurrencePHIRecipe, to further untangle VPWidenPHIRecipe into distinct recipes for distinct use cases/lowering. See D104989 for a new recipe for reduction phis. This patch also introduces a new `FirstOrderRecurrenceSplice` VPInstruction opcode, which is used to make the forming of the vector recurrence value explicit in VPlan. This more accurately models def-uses in VPlan and also simplifies code-generation. Now, the vector recurrence values are created at the right place during VPlan-codegeneration, rather than during post-VPlan fixups. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D105008	2021-07-20 16:14:17 +02:00
Alexey Bataev	d8d8b4574a	[SLP]Fix possible crash on unreachable incoming values sorting. The incoming values for PHI nodes may come from unreachable BasicBlocks, need to handle this case. Differential Revision: https://reviews.llvm.org/D106264	2021-07-19 04:54:53 -07:00
Alexey Bataev	da3dbfcacf	[SLP]Improve calculations of the cost for reused/reordered scalars. Part of D105020. Also, fixed FIXMEs that need to use wider vector type when trying to calculate the cost of reused scalars. This may cause regressions unless D100486 is landed to improve the cost estimations for long vectors shuffling. Differential Revision: https://reviews.llvm.org/D106060	2021-07-16 13:40:15 -07:00
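
Several of the messages above (f4122398e7, a5dd6c6cf9, 0aff1798b5) distinguish ordered (strict) floating-point reductions from tree-wise ones. The following is only an illustrative sketch of why the two can give different results when FP reassociation is not permitted. It is not code from LLVM: the function names `orderedReduce` and `treeReduce` are hypothetical, and the tree-wise variant assumes a power-of-two lane count.

```cpp
#include <cstddef>
#include <vector>

// Ordered (strict) reduction: add the lanes one by one in lane order,
// exactly as the original scalar loop would. (Hypothetical helper.)
float orderedReduce(float start, const std::vector<float> &lanes) {
  float acc = start;
  for (float v : lanes)
    acc += v;
  return acc;
}

// Tree-wise reduction: repeatedly fold the upper half of the vector onto
// the lower half until one lane remains. This reassociates the additions.
// Assumes lanes.size() is a power of two. (Hypothetical helper.)
float treeReduce(float start, std::vector<float> lanes) {
  if (lanes.empty())
    return start;
  for (std::size_t width = lanes.size() / 2; width >= 1; width /= 2) {
    for (std::size_t i = 0; i < width; ++i)
      lanes[i] += lanes[i + width];
  }
  return start + lanes[0];
}
```

With the lanes `{1e8f, 1.0f, -1e8f, 1.0f}` and IEEE single-precision arithmetic, the ordered version loses the first `1.0f` to rounding when it is added to `1e8f`, whereas the tree-wise version cancels the two large values first and keeps both small ones. The two functions therefore return different values for the same input, which is why the tree-wise form is only legal when reassociation is allowed.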


2676 Commits