llvm-project

Commit Graph

Author	SHA1	Message	Date
Kazu Hirata	72661f337a	[Transforms] Drop unnecessary const from return types (NFC) Identified with readability-const-return-type.	2021-08-05 08:53:17 -07:00
Alexey Bataev	e7c3eaa8ae	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form identity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107494	2021-08-05 08:41:24 -07:00
David Sherwood	e9177b0958	Fix build issues caused by `95800da914`	2021-08-05 16:26:34 +01:00
Sander de Smalen	3e47f009ff	[LV] Consider ExtractValue as uniform. Since all operands to ExtractValue must be loop-invariant when we deem the loop vectorizable, we can consider ExtractValue to be uniform. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107286	2021-08-05 16:20:50 +01:00
Florian Hahn	38b098be66	[VectorCombine] Limit scalarization known non-poison indices. We can only trust the range of the index if it is guaranteed non-poison. Fixes PR50949. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D107364	2021-08-05 15:36:31 +01:00
David Sherwood	95800da914	[LoopVectorize] Add support for replication of more intrinsics with scalable vectors This patch adds more instructions to the Uniforms list, for example certain intrinsics that are uniform by definition or whose operands are loop invariant. This list includes: 1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition. 2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have loop invariant input operands then these are also uniform. Also, in VPRecipeBuilder::handleReplication we check if an instruction is uniform based purely on whether or not the instruction lives in the Uniforms list. However, there are certain cases where calls to some intrinsics can be effectively treated as uniform too. Therefore, we now also treat the following cases as uniform for scalable vectors: 1. If the 'assume' intrinsic's operand is not loop invariant, then we are free to treat this as uniform anyway since it's only a performance hint. We will get the benefit for the first lane. 2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant then for scalable vectors we assume these still ultimately come from the broadcast of an alloca. We do not support scalable vectorisation of loops containing alloca instructions, hence the alloca itself would be invariant. If the pointer does not come from an alloca then the intrinsic itself has no effect. I have updated the assume test for fixed width, since we now treat it as uniform: Transforms/LoopVectorize/assume.ll I've also added new scalable vectorisation tests for other intrinsics: Transforms/LoopVectorize/scalable-assume.ll Transforms/LoopVectorize/scalable-lifetime.ll Transforms/LoopVectorize/scalable-noalias-scope-decl.ll Differential Revision: https://reviews.llvm.org/D107284	2021-08-05 15:17:27 +01:00
Sander de Smalen	8d08a84745	[LV] Remove a change that was added in D106164. This change wasn't strictly necessary for D106164 and could be removed. This patch addresses the post-commit comments from @fhahn on D106164, and also changes sve-widen-gep.ll to use the same IR test as shown in pointer-induction.ll. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106878	2021-08-05 14:44:53 +01:00
Alexey Bataev	214f99b27c	Revert "[SLP]Do not emit extra shuffle for insertelements vectorization." This reverts commit `871ea69803` to fix the problem if the first vector is not just undef.	2021-08-04 11:28:59 -07:00
Alexey Bataev	871ea69803	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form identity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107344	2021-08-03 13:18:41 -07:00
Alexey Bataev	7d9d926a18	Revert "[SLP]Improve graph reordering." This reverts commit `e408d1dfab` and 2 other (`4b25c11321` and `c2deb2afaf`) related to fix the problem with the reordering shuffles.	2021-08-03 12:13:43 -07:00
David Sherwood	0156f91f3b	[NFC] Rename enable-strict-reductions to force-ordered-reductions I'm renaming the flag because a future patch will add a new enableOrderedReductions() TTI interface and so the meaning of this flag will change to be one of forcing the target to enable/disable them. Also, since other places in LoopVectorize.cpp use the word 'Ordered' instead of 'strict' I changed the flag to match. Differential Revision: https://reviews.llvm.org/D107264	2021-08-03 09:33:01 +01:00
Florian Hahn	bb725c9803	[VPlan] Use defined and ops VPValues to print VPInterleaveRecipe. This patch updates VPInterleaveRecipe::print to print the actual defined VPValues for load groups and the store VPValue operands for store groups. The IR references may become outdated while transforming the VPlan and the defined and stored VPValues always are up-to-date. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D107223	2021-08-02 18:36:36 +01:00
Alexey Bataev	95e5d401ae	[SLP]Improve splats vectorization. Replace insertelement instructions for splats with just single insertelement + broadcast shuffle. Also, try to merge these instructions if they come from the same/shuffled gather node. Differential Revision: https://reviews.llvm.org/D107104	2021-07-30 10:17:45 -07:00
Alexey Bataev	4b25c11321	[SLP]Fix an assertion for the size of user nodes. For the nodes with reused scalars the user may be not only of the size of the final shuffle but also of the size of the scalars themselves, need to check for this. It is safe to just modify the check here, since the order of the scalars themselves is preserved, only indices of the reused scalars are changed. So, the users with the same size as the number of scalars in the node, will not be affected, they still will get the operands in the required order. Reported by @mstorsjo in D105020. Differential Revision: https://reviews.llvm.org/D107080	2021-07-30 05:46:44 -07:00
Alexey Bataev	f4fb854811	[SLP]Do not consider deleted instruction as external users. If the instruction was previously deleted, it should not be treated as an external user. This fixes cost estimation and removes dead extractelement instructions. Differential Revision: https://reviews.llvm.org/D107106	2021-07-30 05:37:43 -07:00
Alexey Bataev	c2deb2afaf	[SLP]Fix a crash in gathered loads analysis. Need to check that the minimum acceptable vector factor is at least 2, not 0, to avoid compiler crash during gathered loads analysis. Differential Revision: https://reviews.llvm.org/D107058	2021-07-30 05:19:17 -07:00
Alexey Bataev	3ad6437fcc	[SLP]Fix build on MacOS, NFC.	2021-07-28 06:33:13 -07:00
Alexey Bataev	e408d1dfab	[SLP]Improve graph reordering. Reworked reordering algorithm. Originally, the compiler just tried to detect the most common order in the reorderable nodes (loads, stores, extractelements, extractvalues) and then fully rebuilding the graph in the best order. This was not efficient, since it required an extra memory and time for building/rebuilding tree, double the use of the scheduling budget, which could lead to missing vectorization due to exhausted scheduling resources. Patch provide 2-way approach for graph reordering problem. At first, all reordering is done in-place, it does not require tree deleting/rebuilding, it just rotates the scalars/orders/reuses masks in the graph node. The first step (top-to bottom) rotates the whole graph, similarly to the previous implementation. Compiler counts the number of the most used orders of the graph nodes with the same vectorization factor and then rotates the subgraph with the given vectorization factor to the most used order, if it is not empty. Then repeats the same procedure for the subgraphs with the smaller vectorization factor. We can do this because we still need to reshuffle smaller subgraph when building operands for the graph nodes with larger vectorization factor, we can rotate just subgraph, not the whole graph. The second step (bottom-to-top) scans through the leaves and tries to detect the users of the leaves which can be reordered. If the leaves can be reordered in the best fashion, they are reordered and their user too. It allows to remove double shuffles to the same ordering of the operands in many cases and just reorder the user operations instead. Plus, it moves the final shuffles closer to the top of the graph and in many cases allows to remove extra shuffle because the same procedure is repeated again and we can again merge some reordering masks and reorder user nodes instead of the operands.
Also, patch improves cost model for gathering of loads, which improves x264 benchmark in some cases. Gives about +2% on AVX512 + LTO (more expected for AVX/AVX2) for {625,525}x264, +3% for 508.namd, improves most of other benchmarks. The compile and link time are almost the same, though in some cases it should be better (we're not doing an extra instruction scheduling anymore) + we may vectorize more code for the large basic blocks again because of saving scheduling budget. Differential Revision: https://reviews.llvm.org/D105020	2021-07-28 05:49:06 -07:00
Florian Hahn	c07dd2b885	[LV] Move recurrence backedge fixup code to VPlan::execute (NFC). As suggested in D105008, move the code that fixes up the backedge value for first order recurrences to VPlan::execute. Now all that remains in fixFirstOrderRecurrences is the code responsible for creating the exit values in the middle block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D106244	2021-07-28 13:32:40 +01:00
David Green	41cedb1c9a	[LV][ARM] Tighten up MLA reduction costing This makes a couple of changes to the costing of MLA reduction patterns, to more accurately cost various patterns that can come up from vectorization. - The Arm implementation of getExtendedAddReductionCost is altered to only provide costs for legal or smaller types. Larger than legal types need to be split, which currently does not work very well, especially for predicated reductions where the predicate may be legal but needs to be split. Currently we limit it to legal or smaller input types. - The getReductionPatternCost has learnt that reduce(ext(mul(ext, ext)) is a pattern that can come up, and can be treated the same as reduce(mul(ext, ext)) providing the extension types match. - And it has been adjusted to not count the ext in reduce(mul(ext, ext)) as part of a reduce(mul) pattern. Together these changes help to more accurately cost the mla reductions in cases such as where the extend types don't match or the extend opcodes are different, picking better vector factors that don't result in expanded reductions. Differential Revision: https://reviews.llvm.org/D106166	2021-07-28 12:50:58 +01:00
David Sherwood	a5dd6c6cf9	[LoopVectorize] Don't interleave scalar ordered reductions for inner loops Consider the following loop: void foo(float *dst, float *src, int N) { for (int i = 0; i < N; i++) { dst[i] = 0.0; for (int j = 0; j < N; j++) { dst[i] += src[(i * N) + j]; } } } When we are not building with -Ofast we may attempt to vectorise the inner loop using ordered reductions instead. In addition we also try to select an appropriate interleave count for the inner loop. However, when choosing a VF=1 the inner loop will be scalar and there is existing code in selectInterleaveCount that limits the interleave count to 2 for reductions due to concerns about increasing the critical path. For ordered reductions this problem is even worse due to the additional data dependency, and so I've added code to simply disable interleaving for scalar ordered reductions for now. Test added here: Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll Differential Revision: https://reviews.llvm.org/D106646	2021-07-27 17:41:01 +01:00
Sander de Smalen	d7dd12aee3	[LV] Disable Scalable VFs when tail folding is enabled b/c of low tripcount. The loop vectorizer may decide to use tail folding when the trip-count is low. When that happens, scalable VFs are no longer a candidate, since tail folding/predication is not yet supported for scalable vectors. This can be re-enabled in a future patch. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D106657	2021-07-27 11:37:21 +01:00
Sander de Smalen	13ccb09725	[LV] Don't let ForceTargetInstructionCost override Invalid cost. Invalid costs can be used to avoid vectorization with a given VF, which is used for scalable vectors to avoid things that the code-generator cannot handle. If we override the cost using the -force-target-instruction-cost option of the LV, we would override this mechanism, rendering the flag useless. This change ensures the cost is only overridden when the original cost that was calculated is valid. That allows the flag to be used in combination with the -scalable-vectorization option. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106677	2021-07-26 20:27:49 +01:00
Sander de Smalen	b9051ba848	[LV] Remove assert that VF cannot be scalable in setCostBasedWideningDecision. Scalarization for scalable vectors is not (yet) supported, so the LV discards a VF when scalarization is chosen as the widening decision. It should therefore not assert that the VF is not scalable when it computes the decision to scalarize. The code can get here when both the interleave-cost, gather/scatter cost and scalarization-cost are all illegal. This may e.g. happen for SVE when the VF=1, to avoid generating `<vscale x 1 x eltty>` types that the code-generator cannot yet handle. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106656	2021-07-26 17:11:45 +01:00
Sander de Smalen	981e9dce54	[LV] Don't assume isScalarAfterVectorization if one of the uses needs widening. This fixes an issue that was found in D105199, where a GEP instruction is used both as the address of a store, as well as the value of a store. For the former, the value is scalar after vectorization, but the latter (as value) requires widening. Other code in that function seems to prevent similar cases from happening, but it seems this case was missed. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106164	2021-07-26 16:01:55 +01:00
Florian Hahn	7a1e73f0b9	Recommit "[VPlan] Add recipe for first-order rec phis, make splicing explicit." This reverts the revert commit `b1777b04dc`. The patch originally got reverted due to a crash: https://bugs.chromium.org/p/chromium/issues/detail?id=1232798#c2 The underlying issue was that we were not using the stored values from the modified memory recipes, but the out-of-date values directly from the IR (accessed via the VPlan). This should be fixed in `d995d6376`. A reduced version of the reproducer has been added in `93664503be`.	2021-07-26 15:50:30 +01:00
Alexey Bataev	6ca48efcf6	[SLP]Fix costs calculations. Need to fix several cost-related problems. The final type may be defined incorrectly because of too early definition (we may end up with the wider type), the CommonCost should not be redefined in ExtractElements cost related calculations and the shuffle of the final insertelements vectors should be calculated as a cost of single vector permutations + costs of two vector permutations for other n-1 incoming vectors. Differential Revision: https://reviews.llvm.org/D106578	2021-07-26 07:14:03 -07:00
Kerry McLaughlin	e484e1ae03	[SVE] Fix casts to <FixedVectorType> in truncateToMinimalBitwidths Fixes more casts to `<FixedVectorType>` for the cases where the instruction is a Insert/ExtractElementInst. For fixed-width, this part of truncateToMinimalBitWidths is tested by AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part of truncateToMinimalBitWidths which uses scalable vectors, but was unable to add one. The tests in type-shrinkage-insertelt.ll rely on scalarization to create extract element instructions for instance, which is not possible for scalable vectors. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D106163	2021-07-26 13:44:51 +01:00
Alexey Bataev	d7cb2a0796	Revert "[SLP]Fix costs calculations." This reverts commit `a053afed49` to fix buildbots.	2021-07-26 05:42:34 -07:00
Alexey Bataev	a053afed49	[SLP]Fix costs calculations. Need to fix several cost-related problems. The final type may be defined incorrectly because of too early definition (we may end up with the wider type), the CommonCost should not be redefined in ExtractElements cost related calculations and the shuffle of the final insertelements vectors should be calculated as a cost of single vector permutations + costs of two vector permutations for other n-1 incoming vectors. Differential Revision: https://reviews.llvm.org/D106578	2021-07-26 04:37:22 -07:00
Florian Hahn	d995d63767	[VPlan] Use stored value from recipes for interleave groups. Instead of getting the VPValue for the stored IR values through the current plan, use the stored value of the recipes directly. This way, the correct VPValues are used if the store recipes have been modified in the VPlan and the IR value is not correct any longer. This can happen, e.g. due to D105008.	2021-07-26 12:05:23 +01:00
David Sherwood	0aff1798b5	[Analysis] Add simple cost model for strict (in-order) reductions I have added a new FastMathFlags parameter to getArithmeticReductionCost to indicate what type of reduction we are performing: 1. Tree-wise. This is the typical fast-math reduction that involves continually splitting a vector up into halves and adding each half together until we get a scalar result. This is the default behaviour for integers, whereas for floating point we only do this if reassociation is allowed. 2. Ordered. This now allows us to estimate the cost of performing a strict vector reduction by treating it as a series of scalar operations in lane order. This is the case when FP reassociation is not permitted. For scalable vectors this is more difficult because at compile time we do not know how many lanes there are, and so we use the worst case maximum vscale value. I have also fixed getTypeBasedIntrinsicInstrCost to pass in the FastMathFlags, which meant fixing up some X86 tests where we always assumed the vector.reduce.fadd/mul intrinsics were 'fast'. New tests have been added here: Analysis/CostModel/AArch64/reduce-fadd.ll Analysis/CostModel/AArch64/sve-intrinsics.ll Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D105432	2021-07-26 10:26:06 +01:00
Nico Weber	b1777b04dc	Revert "[VPlan] Add recipe for first-order rec phis, make splicing explicit." Makes clang crash: https://reviews.llvm.org/D105008#2903350 This reverts commit `d2a73fb44e`. Also revert a minor formatting follow-up: This reverts commit `82834a6732`.	2021-07-25 17:39:28 -04:00
Caroline Concatto	5a4de84d55	[LoopVectorize] Fix crash for predicated instruction with scalable VF This patch avoids computing discounts for predicated instructions when the VF is scalable. There is no support for vectorization of loops with division because the vectorizer cannot guarantee that zero divisions will not happen. This loop now does not use VF scalable ``` for (long long i = 0; i < n; i++) if (cond[i]) a[i] /= b[i]; ``` Differential Revision: https://reviews.llvm.org/D101916	2021-07-22 12:48:27 +01:00
David Green	72dc5cab4f	[LV] Make use of PatternMatchers in getReductionPatternCost. NFC Pulled out of D106166, this modifies getReductionPatternCost to use PatternMatchers, hopefully simplifying the code a little.	2021-07-21 11:34:30 +01:00
David Green	4272e64acd	[LV] Change interface of getReductionPatternCost to return Optional Currently the Instruction cost of getReductionPatternCost returns an Invalid cost to specify "did not find the pattern". This changes that to return an Optional with None specifying not found, allowing Invalid to mean an infinite cost as is used elsewhere. Differential Revision: https://reviews.llvm.org/D106140	2021-07-20 16:44:50 +01:00
Caroline Concatto	cf78995c4a	[NFC][LoopVectorizer] Remove VF.isScalable() assertion from collectInstsToScalarize and getInstructionCost This patch removes the assertion when VF is scalable and replaces getKnownMinValue() by getFixedValue(), so it still guards the code against scalable vector types. The assertions were used to guarantee that getKnownMinValue was not used for scalable vectors. Differential Revision: https://reviews.llvm.org/D106359	2021-07-20 15:56:30 +01:00
Florian Hahn	82834a6732	[VPlan] Fix formatting glitch from `d2a73fb44e`.	2021-07-20 16:16:30 +02:00
Florian Hahn	d2a73fb44e	[VPlan] Add recipe for first-order rec phis, make splicing explicit. This patch adds a VPFirstOrderRecurrencePHIRecipe, to further untangle VPWidenPHIRecipe into distinct recipes for distinct use cases/lowering. See D104989 for a new recipe for reduction phis. This patch also introduces a new `FirstOrderRecurrenceSplice` VPInstruction opcode, which is used to make the forming of the vector recurrence value explicit in VPlan. This more accurately models def-uses in VPlan and also simplifies code-generation. Now, the vector recurrence values are created at the right place during VPlan-codegeneration, rather than during post-VPlan fixups. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D105008	2021-07-20 16:14:17 +02:00
Alexey Bataev	d8d8b4574a	[SLP]Fix possible crash on unreachable incoming values sorting. The incoming values for PHI nodes may come from unreachable BasicBlocks, need to handle this case. Differential Revision: https://reviews.llvm.org/D106264	2021-07-19 04:54:53 -07:00
Alexey Bataev	da3dbfcacf	[SLP]Improve calculations of the cost for reused/reordered scalars. Part of D105020. Also, fixed FIXMEs that need to use wider vector type when trying to calculate the cost of reused scalars. This may cause regressions unless D100486 is landed to improve the cost estimations for long vectors shuffling. Differential Revision: https://reviews.llvm.org/D106060	2021-07-16 13:40:15 -07:00
Alexey Bataev	1b18e9ab67	[PATCH] D105827: [SLP]Workaround for InsertSubVector cost. The cost of the InsertSubvector shuffle kind cost is not complete and may end up with just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827	2021-07-16 12:59:08 -07:00
Kerry McLaughlin	49d73130ca	[LV] Avoid scalable vectorization for loops containing alloca This patch returns an Invalid cost from getInstructionCost() for alloca instructions if the VF is scalable, as otherwise loops which contain these instructions will crash when attempting to scalarize the alloca. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105824	2021-07-16 11:47:13 +01:00
Sander de Smalen	239d01fa88	Reland "[LV] Print remark when loop cannot be vectorized due to invalid costs." The original patch was: https://reviews.llvm.org/D105806 There were some issues with nondeterministic behaviour of the sorting function, which led to scalable-call.ll passing and/or failing. This patch fixes the issue by numbering all instructions in the array first, and using that number as the order, which should provide a consistent ordering. This reverts commit `a607f64118`.	2021-07-16 10:52:01 +01:00
Sanjay Patel	81ce3aa30c	[SLP] avoid leaking poison in reduction of safe boolean logic ops This bug was introduced with D105730 / `25ee55c0ba` . If we are not converting all of the operations of a reduction into a vector op, we need to preserve the existing select form of the remaining ops. Otherwise, we are potentially leaking poison where it did not in the original code. Alive2 agrees that the version that freezes some inputs and then falls back to scalar is correct: https://alive2.llvm.org/ce/z/erF4K2	2021-07-15 17:33:06 -04:00
Arthur Eubanks	99cb2507f3	Revert "[SLP]Workaround for InsertSubVector cost." This reverts commit `2eb50baf05`. Causes hangs, see comments on D105827.	2021-07-15 10:19:41 -07:00
Philip Reames	95346ba877	[LV] Enable vectorization of multiple exit loops w/computable exit counts This change enables vectorization of multiple exit loops when the exit count is statically computable. That requirement - shared with the rest of LV - in turn requires each exit to be analyzable and to dominate the latch. The majority of work to support this was done in a set of previous patches. In particular, `72314466` avoids having multiple edges from the middle block to the exits, and `4b33b2387` which added support for non-latch single exit and multiple exits with a single exiting block. As a result, this change is basically just removing a bailout and adjusting some tests now that the prerequisite work is done and has stuck in tree for a bit. Differential Revision: https://reviews.llvm.org/D105817	2021-07-15 08:53:51 -07:00
Sander de Smalen	a607f64118	Revert "[LV] Print remark when loop cannot be vectorized due to invalid costs." This reverts commit `efaf3099c8`. This reverts commit `dc7bdc1e71`. Reverting patches due to buildbot failures.	2021-07-15 15:21:57 +01:00
Sander de Smalen	dc7bdc1e71	[LV] Fix determinism for failing scalable-call.ll test. The sort function for emitting an OptRemark was not deterministic, which caused scalable-call.ll to fail on some buildbots. This patch fixes that. This patch also fixes an issue where `Instruction::comesBefore()` is called when two Instructions are in different basic blocks, which would otherwise cause an assertion failure.	2021-07-15 13:16:59 +01:00
Alexey Bataev	ba2690b17b	[SLP][NFC]Fix variables names, NFC.	2021-07-14 12:43:45 -07:00
Simon Pilgrim	4fd0addb68	[SLP] Fix case of variable name. NFCI.	2021-07-14 20:20:04 +01:00
Sander de Smalen	efaf3099c8	[LV] Print remark when loop cannot be vectorized due to invalid costs. This patch emits remarks for instructions that have invalid costs for a given set of vectorization factors. Some example output: t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load dst[i] = sinf(src[i]); ^ t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32 dst[i] = sinf(src[i]); ^ t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store dst[i] = sinf(src[i]); ^ Reviewed By: fhahn, kmclaughlin Differential Revision: https://reviews.llvm.org/D105806	2021-07-14 17:11:33 +01:00
Alexey Bataev	2eb50baf05	[SLP]Workaround for InsertSubVector cost. The cost of the InsertSubvector shuffle kind cost is not complete and may end up with just extracts + inserts costs in many cases. Added a workaround to represent it as a generic PermuteSingleSrc, which is still pessimistic but better than InsertSubvector. Differential Revision: https://reviews.llvm.org/D105827	2021-07-14 07:54:24 -07:00
Sanjay Patel	25ee55c0ba	[SLP] match logical and/or as reduction candidates This has been a work-in-progress for a long time...we finally have all of the pieces in place to handle vectorization of compare code as shown in: https://llvm.org/PR41312 To do this (see PhaseOrdering tests), we converted SimplifyCFG and InstCombine to the poison-safe (select) forms of the logic ops, so now we need to have SLP recognize those patterns and insert a freeze op to make a safe reduction: https://alive2.llvm.org/ce/z/NH54Ah We get the minimal patterns with this patch, but the PhaseOrdering tests show that we still need adjustments to get the ideal IR in some or all of the motivating cases. Differential Revision: https://reviews.llvm.org/D105730	2021-07-14 09:02:31 -04:00
Sander de Smalen	d2e4ccc790	[LV] Ignore candidate VFs with invalid costs. This follows on from discussion on the mailing-list: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151047.html to interpret an Invalid cost as 'infinitely expensive', as this simplifies some of the legalization issues with scalable vectors. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105473	2021-07-12 09:58:22 +01:00
Florian Hahn	c6e4c1fbd8	[VPlan] Remove default arg from getVPValue (NFC). The const version of VPValue::getVPValue still had a default value for the value index. Remove the default value and use getVPSingleValue instead, which is the proper function.	2021-07-11 22:03:09 +02:00
Sander de Smalen	239fcda268	[LV] NFCI: Do cost comparison on InstructionCost directly. Instead of performing the isMoreProfitable() operation on InstructionCost::CostTy the operation is performed on InstructionCost directly, so that it can handle the case where one of the costs is Invalid. This patch also changes the CostTy to be int64_t, so that the type is wide enough to deal with multiplications with e.g. `unsigned MaxTripCount`. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D105113	2021-07-10 11:57:16 +01:00
Valery N Dmitriev	8e9216fe87	[SLP] Do not make an attempt to match reduction on already erased instruction. Differential Revision: https://reviews.llvm.org/D105752	2021-07-09 17:13:15 -07:00
Sanjay Patel	c2b7f09d8c	[SLP] make invalid operand explicit for extra arg in reduction matching; NFC This makes it clearer when we have encountered the extra arg. Also, we may need to adjust the way the operand iteration works when handling logical and/or.	2021-07-09 15:32:12 -04:00
Sanjay Patel	486992f958	[SLP] improve code comments; NFC This likely started out only supporting binops, but now we handle min/max using cmp+sel, and we may extend to handle bool logic in the form of select.	2021-07-09 12:49:54 -04:00
Sanjay Patel	544f2711bb	[SLP] make checks for cmp+select min/max more explicit This is NFC-intended currently (so no test diffs). The motivation is to eventually allow matching for poison-safe logical-and and logical-or (these are in the form of a select-of-bools). ( https://llvm.org/PR41312 ) Those patterns will not have all of the same constraints as min/max in the form of cmp+sel. We may also end up removing the cmp+sel min/max matching entirely (if we canonicalize to intrinsics), so this will make that step easier.	2021-07-09 12:43:43 -04:00
David Green	38c9a4068d	[TTI] Remove IsPairwiseForm from getArithmeticReductionCost This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizers. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between pairwise and non-pairwise reductions. This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll. Differential Revision: https://reviews.llvm.org/D105484	2021-07-09 11:51:16 +01:00
Alexey Bataev	c574d2fbac	[SLP]Improve vectorization of stores. Patch tries to improve the vectorization of stores. Originally, we just check the type and the base pointer of the store. Patch adds some extra checks to avoid non-profitable vectorization cases. It includes analysis of the scalar values to be stored and triggers the vectorization attempt only if the scalar values have same/alt opcode and are from same basic block, i.e. we don't end up immediately with the gather node, which is not profitable. This also improves compile time by filtering out non-profitable cases. Part of D57059. Differential Revision: https://reviews.llvm.org/D104122	2021-07-08 12:35:39 -07:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86]Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization checking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) in case if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which increase code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for further vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Sanjay Patel	97c473ad39	[SLP] rename variable to not be misleading; NFC The reduction matching was probably only dealing with binops when it was written, but we have now generalized it to handle select and intrinsics too, so assert on that too.	2021-07-07 14:40:21 -04:00
Philip Reames	723144665b	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 4) Resubmit after the following changes: * Fix a latent bug related to unrolling with required epilogue (see `e49d65f`). I believe this is the cause of the prior PPC buildbot failure. * Disable non-latch exits for epilogue vectorization to be safe (`9ffa90d`) * Split out assert movement (`600624a`) to reduce churn if this gets reverted again. Previous commit message (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-07-07 07:44:35 -07:00
Dylan Fleming	7215dcfe36	[SVE] Fix ShuffleVector cast<FixedVectorType> in truncateToMinimalBitwidths Depends on D104239 Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105341	2021-07-07 15:30:10 +01:00
Dylan Fleming	7586b47fb6	[SVE] Fix cast<FixedVectorType> in truncateToMinimalBitwidths Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D104239	2021-07-07 09:58:05 +01:00
Philip Reames	9ffa90d6c2	[LV] Disable epilogue vectorization for non-latch exits When skimming through old review discussion, I noticed a post commit comment on an earlier patch which had gone unaddressed. Better late (4 months), than never right? I'm not aware of an active problem with the combination of non-latch exits and epilogue vectorization, but the interaction was not considered and I'm not motivated to make epilogue vectorization work with early exits. If there were a bug in the interaction, it would be pretty hard to hit right now (as we canonicalize towards bottom tested loops), but an upcoming change to allow multiple exit loops will greatly increase the chance for error. Thus, let's play it safe for now.	2021-07-06 10:57:10 -07:00
Alexey Bataev	4e1a0684f1	[SLP]Fix non-determinism in PHI sorting. Compare type IDs and DFS numbering for basic block instead of addresses to fix non-determinism. Differential Revision: https://reviews.llvm.org/D105031	2021-07-06 08:45:45 -07:00
Florian Hahn	ef0d147cdc	Recommit "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups. This reverts commit `706bbfb35b`. The committed version moves the definition of VPReductionPHIRecipe out of an ifdef only intended for ::print helpers. This should resolve the build failures that caused the revert	2021-07-06 14:15:42 +01:00
Kerry McLaughlin	a7512401e5	[LV] Prevent vectorization with unsupported element types. This patch adds a TTI function, isElementTypeLegalForScalableVector, to query whether it is possible to vectorize a given element type. This is called by isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if any of the instruction types in the loop are unsupported, e.g: int foo(__int128_t* ptr, int N) #pragma clang loop vectorize_width(4, scalable) for (int i=0; i<N; ++i) ptr[i] = ptr[i] + 42; This example currently crashes if we attempt to vectorize since i128 is not a supported type for scalable vectorization. Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D102253	2021-07-06 13:06:21 +01:00
Florian Hahn	706bbfb35b	Revert "[VPlan] Add VPReductionPHIRecipe (NFC)." and follow-ups This reverts commit `3fed6d443f`, `bbcbf21ae6` and `6c3451cd76`. The changes causing build failures with certain configurations, e.g. https://lab.llvm.org/buildbot/#/builders/67/builds/3365/steps/6/logs/stdio lib/libLLVMVectorize.a(LoopVectorize.cpp.o): In function `llvm::VPRecipeBuilder::tryToCreateWidenRecipe(llvm::Instruction, llvm::ArrayRef<llvm::VPValue>, llvm::VFRange&, std::unique_ptr<llvm::VPlan, std::default_delete<llvm::VPlan> >&) [clone .localalias.8]': LoopVectorize.cpp:(.text._ZN4llvm15VPRecipeBuilder22tryToCreateWidenRecipeEPNS_11InstructionENS_8ArrayRefIPNS_7VPValueEEERNS_7VFRangeERSt10unique_ptrINS_5VPlanESt14default_deleteISA_EE+0x63b): undefined reference to `vtable for llvm::VPReductionPHIRecipe' collect2: error: ld returned 1 exit status	2021-07-06 12:10:03 +01:00
Florian Hahn	3fed6d443f	[VPlan] Mark overridden function in VPWidenPHIRecipe as virtual. VPReductionRecipe overrides those implementations. Mark them as virtual in the VPWidenPHIRecipe to unbreak build in certain configurations.	2021-07-06 12:00:41 +01:00
Florian Hahn	bbcbf21ae6	[VPlan] Add destructor to VPReductionRecipe to unbreak build. Attempt to unbreak https://lab.llvm.org/buildbot/#/builders/67/builds/3363/steps/6/logs/stdio	2021-07-06 11:41:20 +01:00
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Kerry McLaughlin	17b701c43c	[LV] Collect a list of all element types found in the loop (NFC) Splits `getSmallestAndWidestTypes` into two functions, one of which now collects a list of all element types found in the loop (`ElementTypesInLoop`). This ensures we do not have to iterate over all instructions in the loop again in other places, such as in D102253 which disables scalable vectorization of a loop if any of the instructions use invalid types. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D105437	2021-07-06 10:37:41 +01:00
Caroline Concatto	b868a2d2c6	[SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector. The function vectorizeChainsInBlock does not support scalable vectors, because functions like canReuseExtract and isCommutative in the code path assert with scalable vectors. This patch avoids vectorizing blocks that have extract instructions with scalable vectors. Differential Revision: https://reviews.llvm.org/D104809	2021-07-05 12:43:41 +01:00
Nikita Popov	a213f735d8	[IR] Deprecate GetElementPtrInst::CreateInBounds without element type This API is not compatible with opaque pointers, the method accepting an explicit pointer element type should be used instead. Thankfully there were few in-tree users. The BPF case still ends up using the pointer element type for now and needs something like D105407 to avoid doing so.	2021-07-04 16:49:30 +02:00
Paul Walker	287d39dd5a	[NFC] Fix a few whitespace issues and typos.	2021-07-04 11:49:58 +01:00
Nikita Popov	fabc17192e	[IRBuilder] Add type argument to CreateMaskedLoad/Gather Same as other CreateLoad-style APIs, these need an explicit type argument to support opaque pointers. Differential Revision: https://reviews.llvm.org/D105395	2021-07-04 12:17:59 +02:00
Alexey Bataev	7f7e4aed21	[SLP][NFC]Refactor findLaneForValue and make it static member, NFC, by V.Dmitriev. Reduces number of arguments	2021-07-02 10:30:13 -07:00
Alexey Bataev	28ac873bcb	[SLP]Fix gathering of the scalars by not ignoring UndefValues. The compiler should not ignore UndefValue when gathering the scalars, otherwise the resulting code may be less defined than the original one. Also, grouped scalars to insert them at first to reduce the analysis in further passes. Differential Revision: https://reviews.llvm.org/D105275	2021-07-02 04:46:48 -07:00
David Sherwood	51b4ab26ca	[NFC] Add new setDebugLocFromInst that uses the class Builder by default In lots of places we were calling setDebugLocFromInst and passing in the same Builder member variable found in InnerLoopVectorizer. I personally found this confusing so I've changed the interface to take an Optional<IRBuilder<> *> and we can now pass in None when we want to use the class member variable. Differential Revision: https://reviews.llvm.org/D105100	2021-07-01 14:23:34 +01:00
David Sherwood	7b7b5b5a26	[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-30 11:11:49 +01:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
David Sherwood	9de63367d8	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `9dde514162`.	2021-06-29 15:20:22 +01:00
David Sherwood	9dde514162	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 14:34:30 +01:00
David Sherwood	8a3365fba2	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `dcfc2c3fac`.	2021-06-29 14:04:42 +01:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possible to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
David Sherwood	dcfc2c3fac	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 09:14:35 +01:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Nikita Popov	a9129f8964	[LoadStoreVectorizer] Support opaque pointers There are remaining redundant bitcasts.	2021-06-27 15:42:16 +02:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	7f36981977	[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC). Suggested in D104197, avoids the early exit.	2021-06-26 13:13:25 +01:00
Florian Hahn	cc5ee857f9	[LV] Doxygenize VectorizationFactor member comments (NFC). Minor cleanup for follow-up patch.	2021-06-25 18:35:00 +01:00
Florian Hahn	91053e327c	[LV] Reflow comment for VectorizationCostTy (NFC).	2021-06-25 14:20:06 +01:00
Florian Hahn	833bdbe93c	[LV] Support sinking recipe in replicate region after another region. This patch handles sinking a replicate region after another replicate region. In that case, we can connect the sink region after the target region. This properly handles the case for which an assertion has been added in `337d765282`. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D103514	2021-06-24 13:58:42 +01:00
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Alexey Bataev	908b753661	[SLP]Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Before we just tried to vectorize the PHIs of the same type. Patch improves this and tries to vectorize PHIs with incoming values which come from the same basic block, have the same and/or alternative opcodes. It allows to save the compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Roman Lebedev	37dfc467ac	[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg This really isn't talking about vectors in general, but only about either fixed or scalable vectors, and it's pretty confusing to see it state that there aren't any vectors :)	2021-06-17 21:07:34 +03:00
Florian Hahn	80a403348b	[VPlan] Support PHIs as LastInst when inserting scalars in ::get(). At the moment, we create insertelement instructions directly after LastInst when inserting scalar values in a vector in VPTransformState::get. This results in invalid IR when LastInst is a phi, followed by another phi. In that case, the new instructions should be inserted just after the last PHI node in the block. At the moment, I don't think the problematic case can be triggered, but it can happen once predicate regions are merged and multiple VPredInstPHI recipes are in the same block (D100260). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104188	2021-06-17 09:36:44 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handle that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Simon Pilgrim	b013c58e82	VPlanSLP.cpp - tidy implicit header dependencies. NFCI. We don't use std::string and std::vector, but we do use std::pair and std::max.	2021-06-13 12:37:17 +01:00
Valery N Dmitriev	94a07c79cf	[SLP][NFC] Fix condition that was supposed to save a bit of compile time. It was found by chance revealing discrepancy between comment (few lines above), the condition and how re-ordering of instruction is done inside the if statement it guards. The condition was always evaluated to true. Differential Revision: https://reviews.llvm.org/D104064	2021-06-11 10:08:55 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Roman Lebedev	abc0e0125c	[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function	2021-06-11 12:47:09 +03:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Qiu Chaofan	2670c7dd5b	[VectorCombine] Fix alignment in single element store This fixes the concern in single element store scalarization that the alignment of new store may be larger than it should be. It calculates the largest alignment if index is constant, and a safe one if not. Reviewed By: lebedev.ri, spatel Differential Revision: https://reviews.llvm.org/D103419	2021-06-11 10:28:15 +08:00
Slava Nikolaev	119965865c	LoadStoreVectorizer: support different operand orders in the add sequence match First we refactor the code which does no wrapping add sequences match: we need to allow different operand orders for the key add instructions involved in the match. Then we use the refactored code trying 4 variants of matching operands. Originally the code relied on the fact that the matching operands of the two last add instructions of memory index calculations had the same LHS argument. But which operand is the same in the two instructions is actually not essential, so now we allow that to be any of LHS or RHS of each of the two instructions. This increases the chances of vectorization to happen. Reviewed By: volkan Differential Revision: https://reviews.llvm.org/D103912	2021-06-10 16:31:35 -07:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started supporting their vectorization and it should not do it after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of the very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Keith Smiley	026170d17d	Fix range-loop-analysis warning ``` llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis] for (const auto VF : VFCandidates) { ^ llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying for (const auto VF : VFCandidates) { ^~~~~~~~~~~~~~~ & 1 warning generated. ``` Differential Revision: https://reviews.llvm.org/D103970	2021-06-10 08:39:54 -07:00
Alexey Bataev	a0086add2e	[SLP]Improve gathering of scalar elements. 1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions. 2. Better detection of shuffle candidates in gathering function. 3. The cost of insertelement for constants is 0. Part of D57059. Differential Revision: https://reviews.llvm.org/D103458	2021-06-09 05:23:21 -07:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPRegionBlocks. This patch uses the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. No need to recalculate the cost of extractelements, just no need to compensate the cost of all extractelements, need to check before if this is actually going to be removed at the vectorization. Also, no need to generate new extractelement instruction, we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Sander de Smalen	034503e9d2	[LV] NFC: Remove redundant isLegalMasked(Gather\|Scatter) functions. This NFC change follows from conversation in D102437, where it was discussed to remove these functions as a separate patch.	2021-06-02 14:09:07 +01:00
Sander de Smalen	3472d3fd9d	[LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType. llvm::getLoadStoreType was added recently and has the same implementation as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no value in having two implementations, this patch removes the custom LV implementation in favor of the generic one defined in Instructions.h.	2021-06-02 14:09:06 +01:00
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Alexey Bataev	36911971a5	[SLP]Better detection of perfect/shuffles matches for gather nodes. Implemented better scheme for perfect/shuffled matches of the gather nodes which allows to fix the performance regressions introduced by earlier patches. Starting detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still causes introducing a load of poison, as flagged by Alive2 and pointed out at `575e2aff55`. This patch updates the code to freeze the index, unless it is proven to not be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Bardia Mahjour	06eaffa858	[NFC] Remove confusing info about MainLoop VF/UF from debug message	2021-05-28 16:10:04 -04:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Florian Hahn	38641ddf3e	[VPlan] Do not sink uniform recipes in sinkScalarOperands. For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.	2021-05-27 14:07:48 +01:00
Alexey Bataev	27d3528acf	[SLP]Fix vectorization of insertelements with multiple uses. SLP vectorizer should not consider insertelements with multiple uses as a part of high level build vector, it must be considered as a terminating insertelement in the vector build, otherwise it may produce incorrect code. Differential Revision: https://reviews.llvm.org/D103164	2021-05-26 09:42:18 -07:00
Kerry McLaughlin	9f76a85260	[LoopVectorize] Enable strict reductions when allowReordering() returns false When loop hints are passed via metadata, the allowReordering function in LoopVectorizationLegality will allow the order of floating point operations to be changed: bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. The -enable-strict-reductions flag introduced in D98435 will currently only vectorize reductions in-loop if hints are used, since canVectorizeFPMath() will return false if reordering is not allowed. This patch changes canVectorizeFPMath() to query whether it is safe to vectorize the loop with ordered reductions if no hints are used. For testing purposes, an additional flag (-hints-allow-reordering) has been added to disable the reordering behaviour described above. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101836	2021-05-26 13:59:12 +01:00
Florian Hahn	8e83ff58c9	[VectorCombine] Remove unneeded InsertPointGuard (NFCI). All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.	2021-05-25 17:01:05 +01:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAcceess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Anton Afanasyev	b2cd895011	[SLP] Fix "gathering" of insertelement instructions For rare exceptional case vector tree node (insertelements for now only) is marked as `NeedToGather`, this case is processed by patch. Follow-up of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
Florian Hahn	65d3dd7c88	[VPlan] Add first VPlan version of sinkScalarOperands. This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverse a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe. Continue with processing the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258	2021-05-24 15:29:58 +01:00
Florian Hahn	e9d97d7d9d	[VPlan] Add mayReadOrWriteMemory & friends. This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.	2021-05-24 13:11:32 +01:00
Florian Hahn	4e8c28b6fb	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit `94d54155e2`. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
Florian Hahn	94d54155e2	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit `86497785d5`. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Florian Hahn	86497785d5	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can be represented as a result of shuffle of the vectorized element and non-consecutive insertelements too. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Daniil Fukalov	e8e88c3353	[TTI] NFC: Change getRegUsageForType to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102541	2021-05-21 15:17:23 +03:00
Alexey Bataev	182162b616	[SLP]Try to vectorize tiny trees with shuffled gathers of extractelements. If we gather extract elements and they actually are just shuffles, it might be profitable to vectorize them even if the tree is tiny. Differential Revision: https://reviews.llvm.org/D101460	2021-05-20 08:36:16 -07:00
David Sherwood	7e95a563c8	Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst In InnerLoopVectorizer::setDebugLocFromInst we were previously asserting that the VF is not scalable. This is because we want to use the number of elements to create a duplication factor for the debug profiling data. However, for scalable vectors we only know the minimum number of elements. I've simply removed the assert for now and added a FIXME saying that we assume vscale is always 1. When vscale is not 1 it just means that the profiling data isn't as accurate, but shouldn't cause any functional problems.	2021-05-19 13:33:10 +01:00

2767 Commits