This patch is a follow-up to D115953. It updates optimizeInductions
to also introduce new VPScalarIVStepsRecipes if an IV has both vector
and scalar uses.
It updates all uses that only need scalar values to use the newly
created recipe for the scalar steps.
This completes untangling of VPWidenIntOrFpInductionRecipe
code-generation. Now the recipe *only* creates the widened vector
values, as it says on the tin.
The code to generate IR has been moved directly to
VPWidenIntOrFpInductionRecipe::execute.
Note that the recipe has been updated to hold a reference to
ScalarEvolution, which is needed to expand the step, until we can place
the corresponding SCEV expansion in the pre-header.
Depends on D120827.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D120828
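To make the vector/scalar split concrete, here is a hand-written IR sketch (not taken from the patch or its tests; names are illustrative, and %n is assumed to be a non-zero multiple of the VF) of a VF=4 vector body where the widened IV feeds a vector user while separate scalar steps feed a scalar-only address computation:
```
define void @iv_vector_and_scalar_uses(i64* %a, i64 %n) {
entry:
  br label %vector.body

vector.body:
  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
  ; Widened vector IV value, produced for vector users of the induction.
  %bcast = insertelement <4 x i64> poison, i64 %index, i32 0
  %splat = shufflevector <4 x i64> %bcast, <4 x i64> poison, <4 x i32> zeroinitializer
  %vec.iv = add <4 x i64> %splat, <i64 0, i64 1, i64 2, i64 3>
  ; Scalar step for lane 0, feeding a scalar-only use (the store address).
  %step.0 = add i64 %index, 0
  %gep.0 = getelementptr inbounds i64, i64* %a, i64 %step.0
  %vec.ptr = bitcast i64* %gep.0 to <4 x i64>*
  store <4 x i64> %vec.iv, <4 x i64>* %vec.ptr, align 8
  %index.next = add nuw i64 %index, 4
  %done = icmp eq i64 %index.next, %n
  br i1 %done, label %exit, label %vector.body

exit:
  ret void
}
```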
The analysis passes output the function name enclosed in `'` quotes,
but LV uses `"`. Harmonizing this may help in creating an update script
for the LV cost-model test checks.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D121105
The test used to run the whole O3 pipeline. Modify it to contain the LLVM IR right
before LV and limit the passes to "-loop-vectorize -simplifycfg".
For the RUN line with a forced VF, force the interleave factor as well to simplify the
CHECKs, as interleaving isn't related to the purpose of the test.
I also tried to add "noalias" to pointer arguments in
@test_gather_not_profitable_pr48429 but LAI seems unable to use them.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D119786
This reverts the revert commit ff93260bf6.
The underlying issue causing the PPC bot failures has been fixed in
cbaac14734 and a corresponding test case has been added in
ad2cad1c52.
Original message:
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.
In the first patch, it only handles the case where there is no vector
induction variable needed.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D115953
This patch adds a new transform to remove dead recipes. For now, it only
removes dead recipes in the header, to keep the number of tests that require
updating manageable. Future patches will extend this to remove dead
recipes across the whole plan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D118051
Now that integer min/max intrinsics have good support in both
InstCombine and other passes, start canonicalizing SPF min/max
to intrinsic min/max.
Once this sticks, we can stop matching SPF min/max in various
places, and can remove hacks we have for preventing infinite loops
and breaking of SPF canonicalization.
Differential Revision: https://reviews.llvm.org/D98152
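For illustration, a minimal before/after pair (my own example, not taken from the patch) showing the select-of-compare (SPF) form of a signed max being canonicalized to the @llvm.smax intrinsic:
```
; Select-of-icmp ("SPF") form of signed max.
define i32 @smax_spf(i32 %a, i32 %b) {
  %cmp = icmp sgt i32 %a, %b
  %max = select i1 %cmp, i32 %a, i32 %b
  ret i32 %max
}

; Canonical intrinsic form after InstCombine.
define i32 @smax_intrinsic(i32 %a, i32 %b) {
  %max = call i32 @llvm.smax.i32(i32 %a, i32 %b)
  ret i32 %max
}

declare i32 @llvm.smax.i32(i32, i32)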
Adds new optimization remarks when loop vectorization fails due to
the compiler being unable to find the bound of an array access inside
a loop.
Differential Revision: https://reviews.llvm.org/D115873
This reverts commit 77a0da926c as we've
received multiple reports of this significantly impacting performance,
in ways that don't seem to just be target specific cost models going
wrong. I would offer some reproducers, but the test changes here seem to
be full of them!
Reverting for now and hopefully we can remove the "hack" more carefully
as we go.
D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model.
What it essentially does is prevent scalarized vectorization of masked memory operations:
```
// TODO: Cost model for emulated masked load/store is completely
// broken. This hack guides the cost model to use an artificially
// high enough value to practically disable vectorization with such
// operations, except where previously deployed legality hack allowed
// using very low cost values. This is to avoid regressions coming simply
// from moving "masked load/store" check from legality to cost model.
// Masked Load/Gather emulation was previously never allowed.
// Limited number of Masked Store/Scatter emulation was allowed.
```
While I don't really understand what exactly `is completely broken` refers to,
I believe that at least on X86 with AVX2 or later,
this is no longer true (or at least, I would like to know what is still broken).
So I would like to follow suit after D111460 and likewise disable that hack for AVX2+.
But since this was added for X86 specifically, let's just completely remove this hack instead.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114779
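As a rough sketch of what "scalarized" (emulated) masked memory operations look like, here is the per-lane shape the vectorizer emits for a predicated store; this is a hand-written single-lane illustration, not output from the patch's tests:
```
; One lane of an emulated masked store: branch on the lane's mask bit
; around a plain scalar store. The pattern is repeated for every lane.
define void @emulated_masked_store_lane(i1 %mask.bit, i32 %val, i32* %addr) {
entry:
  br i1 %mask.bit, label %pred.store.if, label %pred.store.continue

pred.store.if:
  store i32 %val, i32* %addr
  br label %pred.store.continue

pred.store.continue:
  ret void
}
```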
This removes the remaining dependence on LoopVectorizationCostModel from
buildScalarSteps and is required so it can be moved out of ILV.
It also allows us to remove a few unneeded instructions.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116554
This patch tries to use an existing VPWidenCanonicalIVRecipe
instead of creating another step-vector for canonical
induction recipes in widenIntOrFpInduction.
This has the following benefits:
1. It is a first step towards avoiding setting both vector and scalar values for the
same induction def.
2. It reduces the complexity of widenIntOrFpInduction by making things
more explicit in VPlan.
3. The vector IV only needs to be splatted for block-in masks.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116123
isCandidateForEpilogueVectorization will currently return false for loops
which contain reductions. This patch removes this restriction and makes
the following changes to support epilogue vectorisation with reductions:
- `fixReduction`: If fixReduction is being called during vectorisation of the
epilogue, the phi node it creates will need to additionally carry incoming
values from the middle block of the main loop.
- `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi
created by fixReduction are updated after the vec.epilog.iter.check block
is added. The phi is also moved to the preheader of the epilogue.
- `processLoop`: The start values of any VPReductionPHIRecipes are updated before
vectorising the epilogue loop. The getResumeInstr function added to the ILV
will return the resume instruction associated with the recurrence descriptor.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D116928
The modified tests didn't have actual users of all inductions, making it
trivial to eliminate them. Add users to make sure the inductions are
actually used in the vectorized version.
After d4a8fc3a87 LV stopped adding metadata to disable runtime
unrolling to the vectorized epilogue loop. This was missed because
278aa65cc4 removed the relevant test coverage.
This patch fixes that by adding the relevant metadata after
vector loop generation.
This patch adds a new BranchOnCount VPInstruction opcode with 2
operands. It first compares its 2 operands (increment of canonical
induction and vector trip count), followed by a branch to either the
exit block or back to the vector header.
It must be the last recipe in the exit block of the topmost vector loop
region.
This extracts parts from D113224 and was discussed in D113223.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116479
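Roughly, BranchOnCount models the familiar latch pattern in the generated IR; the hand-written sketch below (VF=4, illustrative names, assuming %n.vec is a non-zero multiple of the VF) shows the compare of the incremented canonical induction against the vector trip count followed by the back-branch:
```
define void @branch_on_count_shape(i64 %n.vec) {
entry:
  br label %vector.body

vector.body:
  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
  ; ... widened loop body ...
  %index.next = add nuw i64 %index, 4
  %done = icmp eq i64 %index.next, %n.vec
  br i1 %done, label %middle.block, label %vector.body

middle.block:
  ret void
}
```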
9345ab3a45 updated generateOverflowCheck to skip creating checks that
always evaluate to false. This in turn means that we only need to check
for overflows if the result of the multiplication is actually used.
Sink the Or for the overflow check into ComputeEndCheck, so it is only
created when there's an actual check.
Currently generateOverflowCheck always creates code for Step being
negative and positive, followed by a select at the end depending on
Step's sign.
This patch updates the code to only create either the checks for step
being positive or negative, if the sign is known.
Follow-up to D116696.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D116747
This patch updates SCEVExpander::expandUnionPredicate to not create
redundant 'or false, x' instructions. While those are trivially
foldable, they can be easily avoided and hinder code that checks the
size/cost of the generated checks before further folds.
I am planning to look into a few other similar improvements to the code
generated by SCEVExpander.
I remember a while ago @lebedev.ri was working on doing some trivial folds
like that in IRBuilder itself, but there were concerns that such
changes may subtly break existing code.
Reviewed By: reames, lebedev.ri
Differential Revision: https://reviews.llvm.org/D116696
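A small, hand-made illustration of the redundant instruction in question (not the exact code produced by expandUnionPredicate):
```
; Unioning two runtime checks by folding into an accumulator that starts
; at 'false' used to produce a trivially foldable leading 'or':
define i1 @union_checks_before(i1 %check1, i1 %check2) {
  %acc = or i1 false, %check1
  %res = or i1 %acc, %check2
  ret i1 %res
}

; With the change, the 'or false, x' is skipped:
define i1 @union_checks_after(i1 %check1, i1 %check2) {
  %res = or i1 %check1, %check2
  ret i1 %res
}
```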
This was originally added in rG22174f5d5af1eb15b376c6d49e7925cbb7cca6be
although that patch doesn't really mention any reasons for ignoring the
pointer type in this calculation if the memory access isn't consecutive.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D115356
At the moment, the primary induction variable for the vector loop is
created as part of the skeleton creation. This is tied to creating the
vector loop latch outside of VPlan. This prevents modeling the
*whole* vector loop in VPlan, which in turn is required to model
preheader and exit blocks in VPlan as well.
This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the
primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for
VPInstruction to model the increment.
This allows us to partly retire createInductionVariable. At the moment,
a bit of patching up is done after executing all blocks in the plan.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D113223
For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example:
```
  float s = 0;
  for (int i = 0; i < N; ++i)
    s += (float) i*i;
```
which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.
In this patch, for the case where ElementTypesInLoop is empty, the widest
type is determined by finding the smallest type used by recurrences in
the loop instead of falling back to a default value of 8 bits. This
results in the cost model choosing a more sensible VF for loops like
the one above.
Differential Revision: https://reviews.llvm.org/D113973
VPWidenCanonicalIVRecipe does not create PHI instructions, so it does
not need to be placed in the phi section of a VPBasicBlock.
Also tidies the code so the WidenCanonicalIV recipe and the
compare/lane-masks are created in the header.
Discussed in D113223.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116473
((X << C) + Y) >>u C --> (X + (Y >>u C)) & (-1 >>u C)
https://alive2.llvm.org/ce/z/DY9DPg
This replaces a shift with an 'and', and in the case
where the add has a constant operand, it eliminates
both shifts.
As noted in the TODO comment, we already have this fold when
the shifts are in the opposite order (and that code handles
bitwise logic ops too).
Fixes #52851
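A concrete instance of the fold with C == 3 on i8 (my own example; the Alive2 link above has the general proof):
```
; Before: ((x << 3) + y) >>u 3
define i8 @src(i8 %x, i8 %y) {
  %shl = shl i8 %x, 3
  %add = add i8 %shl, %y
  %res = lshr i8 %add, 3
  ret i8 %res
}

; After: (x + (y >>u 3)) & (-1 >>u 3), i.e. masked with 31 for i8
define i8 @tgt(i8 %x, i8 %y) {
  %yshr = lshr i8 %y, 3
  %add = add i8 %x, %yshr
  %res = and i8 %add, 31
  ret i8 %res
}
```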
The basic idea behind this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order.
I'm restricting this to constants since, for non-constants, we'd have to decide whether the simplicity is worth the extra instructions. For constants, there are no extra instructions.
We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause.
Differential Revision: https://reviews.llvm.org/D115387
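A minimal illustration, assuming a typed-pointer GEP with a constant index (hand-written, not from the patch's tests):
```
; Before: constant index expressed with a narrow type.
define i32* @gep_i32_index(i32* %base) {
  %p = getelementptr inbounds i32, i32* %base, i32 4
  ret i32* %p
}

; After: the constant index is canonicalized to i64.
define i32* @gep_i64_index(i32* %base) {
  %p = getelementptr inbounds i32, i32* %base, i64 4
  ret i32* %p
}
```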
This reverts commit bbfaf0b170.
Post commit review noted a case where my manual update lost intentional check lines. Given I've abandoned the motivating patch, I'm just reverting the autogen prep.
For the simple copy loop (see the test case) the vectorizer selects a VF of 32 while the loop is known to have only 17 iterations. Such behavior makes no sense to me, since such a vector loop will never be executed. The only case where we may want to select a VF larger than TC is masked vectorization, so I haven't touched that case.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D114528
We ask `TTI.getAddressComputationCost()` about the cost of computing a vector address,
and then multiply it by the vector width. This doesn't make any sense;
it implies that we'd do a vector GEP and then scalarize the vector of pointers,
but there is no such thing in the vectorized IR, where we perform scalar GEPs.
This is *especially* bad on X86, and was effectively prohibiting any scalarized
vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()`
says that cost of vector address computation is `10` as compared to `1` for scalar.
The computed costs are similar to the ones with D111222+D111220,
but we end up without masked memory intrinsics that we'd then have to
expand later on, without much luck. (D111363)
Differential Revision: https://reviews.llvm.org/D111460
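The point about scalar GEPs can be seen in the shape of a scalarized gather; a hand-written VF=2 sketch (illustrative names, not from the patch's tests) is:
```
; Scalarized gather: one scalar GEP and one scalar load per lane, then the
; results are reassembled into a vector. There is no vector GEP whose
; pointers would need scalarizing.
define <2 x float> @scalarized_gather(float* %base, <2 x i64> %idx) {
  %idx0 = extractelement <2 x i64> %idx, i32 0
  %p0 = getelementptr inbounds float, float* %base, i64 %idx0
  %v0 = load float, float* %p0, align 4
  %idx1 = extractelement <2 x i64> %idx, i32 1
  %p1 = getelementptr inbounds float, float* %base, i64 %idx1
  %v1 = load float, float* %p1, align 4
  %vec0 = insertelement <2 x float> poison, float %v0, i32 0
  %vec1 = insertelement <2 x float> %vec0, float %v1, i32 1
  ret <2 x float> %vec1
}
```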
This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) in instructions that contribute to the address computation of widened loads/stores that are
guarded by a condition. It may happen that, when the code is vectorized and the control flow within the loop
is linearized, these flags lead to generating a poison value that is effectively used as the base address
of the widened load/store. The fix drops all the integer poison-generating flags from instructions that
contribute to the address computation of a widened load/store whose original instruction was in a basic block
that needed predication and is not predicated after vectorization.
Reviewed By: fhahn, spatel, nlopes
Differential Revision: https://reviews.llvm.org/D111846
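In IR terms, the fix amounts to stripping flags like `inbounds` from address computations that become unconditional after linearization; a minimal hand-written before/after sketch (not from the patch's tests):
```
; Original scalar form: the GEP only executes when the guarding condition
; holds, so its poison-generating 'inbounds' flag is harmless.
define float* @addr_predicated(float* %a, i64 %off) {
  %gep = getelementptr inbounds float, float* %a, i64 %off
  ret float* %gep
}

; After vectorization the control flow is linearized and the address is
; computed unconditionally, so the flag is dropped.
define float* @addr_linearized(float* %a, i64 %off) {
  %gep = getelementptr float, float* %a, i64 %off
  ret float* %gep
}
```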
This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp the iteration count of the loops* from decrementing to incrementing.
Why does this matter? A couple of reasons:
* SCEV doesn't have a native subtract node. Instead, all subtracts (A - B) are represented as A + -1 * B and drops any flags invalidated by such. As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones. (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language. We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced. (You can see this looking at nearby phis in the test cases.)
Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.
* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value. We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
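For reference, the two IV shapes look roughly like this (hand-written remainder-loop skeletons, assuming the remainder count %rem is at least 1):
```
; Old style: counter decrements from %rem down to zero.
define void @remainder_dec(i64 %rem) {
entry:
  br label %loop
loop:
  %iv = phi i64 [ %rem, %entry ], [ %iv.next, %loop ]
  %iv.next = add i64 %iv, -1
  %done = icmp eq i64 %iv.next, 0
  br i1 %done, label %exit, label %loop
exit:
  ret void
}

; New style: counter increments from zero up to %rem, which SCEV tends to
; reason about more precisely.
define void @remainder_inc(i64 %rem) {
entry:
  br label %loop
loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add nuw i64 %iv, 1
  %done = icmp eq i64 %iv.next, %rem
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
```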
This reverts commit 7cd273c339.
Several patches with tests fixes have been applied:
0cada82f0a "[Test] Remove incorrect test in GVN"
97cb13615d "[Test] Separate IndVars test into AArch64 and X86 parts"
985cc490f1 "[Test] Remove separated test in IndVars",
and test failures caused by 5ec2386 should be resolved now.
Changes VPReplicateRecipe to extract the last lane from an unconditional,
uniform store instruction. collectLoopUniforms will also add stores to
the list of uniform instructions where Legal->isUniformMemOp is true.
setCostBasedWideningDecision now sets the widening decision for
all uniform memory ops to Scalarize, where previously GatherScatter
may have been chosen for scalable stores.
This fixes an assert ("Cannot yet scalarize uniform stores") in
setCostBasedWideningDecision when we have a loop containing a
uniform i1 store and a scalable VF, which we cannot create a scatter for.
Reviewed By: sdesmalen, david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D112725
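As a small sketch of the "extract the last lane" codegen for a uniform store (illustrative, VF=4, not taken from the patch's tests):
```
; A store to a loop-invariant address only needs the value from the last
; vector lane of each vector iteration.
define void @uniform_store_last_lane(i32* %invariant.addr, <4 x i32> %vec.val) {
  %last = extractelement <4 x i32> %vec.val, i32 3
  store i32 %last, i32* %invariant.addr, align 4
  ret void
}
```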
This reapplies patch db289340c8.
The test failures on build with expensive checks caused by the patch happened due
to the fact that we sorted loop Phis in replaceCongruentIVs using llvm::sort,
which shuffles the given container if the expensive checks are enabled,
so equivalent Phis in the sorted vector had different mutual order from run
to run. replaceCongruentIVs tries to replace narrow Phis with truncations
of wide ones. In some test cases there were several Phis with the same
width, so if their order differs from run to run, the narrow Phis would
be replaced with a different Phi, depending on the shuffling result.
The patch ae14fae0ff fixed this issue by
replacing llvm::sort with llvm::stable_sort.
In IndVarSimplify after simplifying and extending loop IVs we call 'replaceCongruentIVs'.
This function optionally takes a TTI argument to be able to replace narrow IVs uses
with truncates of the widest one.
For some reason the TTI wasn't passed to the function, so it couldn't perform such a
transform.
This patch fixes it.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D113024
Upon further investigation and discussion,
this is actually the opposite direction from what we should be taking,
and this direction wouldn't solve the motivational problem anyway.
Additionally, some more (polly) tests have escaped being updated.
So, let's just take a step back here.
This reverts commit f3190dedee.
This reverts commit 749581d21f.
This reverts commit f3df87d57e.
This reverts commit ab1dbcecd6.