llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	d92cec4c96	[LV] Regenerate check lines for some tests. Make sure the auto-generated check lines are up-to-date for some files, to reduce the test diff in upcoming changes	2022-05-17 17:45:01 +01:00
Florian Hahn	b7315ffc3c	[LAA,LV] Add initial support for pointer-diff memory checks. This patch adds initial support for a pointer diff based runtime check scheme for vectorization. This scheme requires fewer computations and checks than the existing full overlap checking, if it is applicable. The main idea is to only check if source and sink of a dependency are far enough apart so the accesses won't overlap in the vector loop. To do so, it is sufficient to compute the difference and compare it to `VF * UF * AccessSize`. It is sufficient to check `(Sink - Src) <u VF * UF * AccessSize` to rule out a backwards dependence in the vector loop with the given VF and UF. If Src >=u Sink, there is no dependence preventing vectorization, hence the overflow should not matter and using the ULT should be sufficient. Note that the initial version is restricted in multiple ways: 1. Pointers must only either be read or written, by a single instruction (this allows re-constructing source/sink for dependences with the available information) 2. Source and sink pointers must be add-recs, with matching steps 3. The step must be a constant. 4. abs(step) == AccessSize. Most of those restrictions can be relaxed in the future. See https://github.com/llvm/llvm-project/issues/53590. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D119078	2022-05-16 15:27:22 +01:00
David Sherwood	befc952045	[LoopVectorize] Permit tail-folding for low trip counts using scalable vectors When the loop vectoriser encounters a known low trip count it tries to create a single predicated loop in order to get the benefit of vectorisation and eliminate the scalar tail. However, until now the vectoriser prevented the use of scalable vectors in this case due to concerns in the past about stability. I believe that tail-folded loops using scalable vectors are now sufficiently well tested that we can enable this. For the same reason I've also enabled it when optimising for code size too. Tests added here: Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll Transforms/LoopVectorize/RISCV/low-trip-count.ll Differential Revision: https://reviews.llvm.org/D121595	2022-05-16 09:14:24 +01:00
David Sherwood	92c645b5c1	[LoopVectorize] Add overflow checks when tail-folding with scalable vectors In InnerLoopVectorizer::getOrCreateVectorTripCount there is an assert that the known minimum value for the VF is a power of 2 when tail-folding is enabled. However, for scalable vectors the value of vscale may not be a power of 2, which means we have to worry about the possibility of overflow. I have solved this problem by adding preheader checks that prevent us from entering the vector body if the canonical IV would overflow, i.e. `if ((IntMax - TripCount) < (VF * UF)) ... skip vector loop ...` Differential Revision: https://reviews.llvm.org/D125235	2022-05-13 14:09:43 +01:00
David Sherwood	45f2e92d97	[NFC][LoopVectorize] Add SVE test for tail-folding combined with interleaving Differential Revision: https://reviews.llvm.org/D125001	2022-05-09 13:08:25 +01:00
Igor Kirillov	4e5e042d9a	[LoopVectorize] Support reductions that store intermediary result Adds ability to vectorize loops containing a store to a loop-invariant address as part of a reduction that isn't converted to SSA form due to lack of aliasing info. Runtime checks are generated to ensure the store does not alias any other accesses in the loop. Ordered fadd reductions are not yet supported. Differential Revision: https://reviews.llvm.org/D110235	2022-05-03 10:12:30 +01:00
Florian Hahn	0ef8ca6d88	[VPlan] Do not create VPWidenCall recipes for scalar vector factors. 'Widen' recipes are only used when actual vector values are generated. Fix tryToWidenCall to not create VPWidenCallRecipes for scalar vector factors. This was exposed by D123720, because the widened recipes are considered vector users. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D124718	2022-05-02 19:40:33 +01:00
Craig Topper	ac8c720d48	[IR] Allow constant folding (insertelement <vscale x 2 x i32> zeroinitializer, i32 0, i32 0). Most of insertelement constant folding is blocked if the vector type is scalable. I believe we can make an exception for inserting null into an all zeros vector. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D123413	2022-04-15 17:44:32 -07:00
Muhammad Omair Javaid	42ebfa8269	Revert "[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth" This reverts commit `64b6192e81`. This broke LLVM AArch64 buildbot clang-aarch64-sve-vls-2stage: https://lab.llvm.org/buildbot/#/builders/176/builds/1515 llvm-tblgen crashes after applying this patch.	2022-04-13 04:53:07 +05:00
Florian Hahn	256c6b0ba1	[VPlan] Model pre-header explicitly. This patch extends the scope of VPlan to also model the pre-header. The pre-header can be used to place recipes that should be code-gen'd outside the loop, like SCEV expansion. Depends on D121623. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121624	2022-04-09 14:19:47 +02:00
Jingu Kang	64b6192e81	[AArch64] Set maximum VF with shouldMaximizeVectorBandwidth Set the maximum VF of AArch64 with 128 / the size of smallest type in loop. Differential Revision: https://reviews.llvm.org/D118979	2022-04-05 13:16:52 +01:00
Dávid Bolvanský	872f7000fc	Revert "[NFCI] Regenerate SROA/LoopVectorize test checks" This reverts commit `14e3450fb5`.	2022-04-04 01:15:30 +02:00
Dávid Bolvanský	a113a582b1	[NFCI] Regenerate LoopVectorize test checks	2022-04-03 21:56:24 +02:00
Florian Hahn	95b2aa511e	[VPlan] Set VPlan header block name to vector.body. This brings the VPlan block naming in line with the naming of the generated basic blocks.	2022-04-02 19:34:32 +01:00
Florian Hahn	a08c90a402	[LV] Re-use TripCount from EPI.TripCount. During skeleton construction for the epilogue vector loop, generic helpers use getOrCreateTripCount, which will re-expand the trip count computation. Instead, re-use the TripCount created during main loop vectorization.	2022-04-01 13:47:34 +01:00
David Green	b65267ca7b	[LV] Invalidate widening decisions after maximizing vector bandwidth When MaximizeVectorBandwidth is enabled, we can end up (via calls to collectUniformsAndScalars/setCostBasedWideningDecision through calculateRegisterUsage) making widening decisions before we have decided whether to fold the tail by masking. These decisions will be wrong if we later decided to fold the tail, for example when the trip count is very low. It will use incorrect costs for loads that should get masked, using standard memory operation costs instead. This still at the moment uses the EmulatedMaskMemRefHack costs (a bit unfortunately), but the old costs without this change were 1, leading to too optimistic vectorization. This slightly changes the way that the MaximizeVectorBandwidth option works to make it easier to test, always honouring the option if it is set. Differential Revision: https://reviews.llvm.org/D120215	2022-03-31 09:19:31 +01:00
Florian Hahn	46432a0088	[VPlan] Add VPWidenPointerInductionRecipe. This patch moves pointer induction handling from VPWidenPHIRecipe to its own recipe. In the process, it adds all information required to generate code for pointer inductions without relying on Legal to access the list of induction phis. Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121615	2022-03-24 14:58:45 +00:00
Florian Hahn	151c144350	[LV] Use usesScalars in widenPHIInstruction. This uses the existing VPlan helpers to check whether there are scalar uses of a phi recipe. It removes one of the few remaining dependencies on the cost model from VPlan code generation. Depends on D121612. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D121613	2022-03-17 13:16:32 +00:00
Malhar Jajoo	a36d269658	[VPlan] Avoid collecting scalars for SVE This patch ensures scalars (except for uniforms) are no longer collected (prior to LVP planning phase) for scalable vectorization. This is to avoid the chances of generating scalarized instructions later (during LVP execute phase) as they are not supported for scalable vectorization. Relevant test has also been added. Differential Revision: https://reviews.llvm.org/D121452	2022-03-16 16:33:34 +00:00
Florian Hahn	95f76bff1c	[LV] Create & use VPScalarIVSteps for all scalar users. This patch is a follow-up to D115953. It updates optimizeInductions to also introduce new VPScalarIVStepsRecipes if an IV has both vector and scalar uses. It updates all uses that only need scalar values to use the newly created recipe for the scalar steps. This completes untangling of VPWidenIntOrFpInductionRecipe code-generation. Now the recipe only creates the widened vector values, as it says on the tin. The code to generate IR has been moved directly to VPWidenIntOrFpInductionRecipe::execute. Note that the recipe has been updated to hold a reference to ScalarEvolution, which is needed to expand the step, until we can place the corresponding SCEV expansion in the pre-header. Depends on D120827. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D120828	2022-03-13 17:15:24 +00:00
Roman Lebedev	2f80ea7f4f	[NFC][LV] Use different braces in debug output The analysis passes output function name encapsulated in `'` braces, but LV uses `"`. Harmonizing this may help in creating an update script for the LV costmodel test checks. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D121105	2022-03-07 19:32:37 +03:00
Florian Hahn	da740492b0	[VPlan] Remove dead header-phi recipes. This patch adds a new transform to remove dead recipes. For now, it only removes dead recipes in the header, to keep the number of tests that require updating manageable. Future patches will extend this to remove dead recipes across the whole plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D118051	2022-02-26 16:26:39 +00:00
Kerry McLaughlin	12fb133eba	[LoopVectorize] Support conditional in-loop vector reductions Extends getReductionOpChain to look through Phis which may be part of the reduction chain. adjustRecipesForReductions will now also create a CondOp for VPReductionRecipe if the block is predicated and not only if foldTailByMasking is true. Changes were required in tryToBlend to ensure that we don't attempt to convert the reduction Phi into a select by returning a VPBlendRecipe. The VPReductionRecipe will create a select between the Phi and the reduction. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117580	2022-02-22 12:04:35 +00:00
Florian Hahn	446e7c64c7	[LV] Add real uses in some tests, to make them more robust. Add real uses to some tests, to ensure dead instructions cannot be directly removed.	2022-02-13 09:52:59 +00:00
David Green	b55d4c2ad8	Revert "[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()`" This reverts commit `77a0da926c` as we've received multiple reports of this significantly impacting performance, in ways that don't seem to just be target specific cost models going wrong. I would offer some reproducers, but the test changes here seem to be full of them! Reverting for now and hopefully we can remove the "hack" more carefully as we go.	2022-02-09 20:02:54 +00:00
David Green	b4c6d1bb37	[LoopVectorizer] Don't perform interleaving of predicated scalar loops The vectorizer will choose at times to "vectorize" loops with a scalar factor (VF=1) with interleaving (IC > 1). This can occasionally produce better code than the unroller (notably for reductions where it can produce independent reduction chains that are combined after the loop). At times this is not very beneficial though, for example when runtime checks are needed or when the scalar code requires predication. This addresses the second point, preventing the vectorizer from interleaving when the scalar loop will require predication. This prevents it from making a bit of a mess, that is worse than the original and better left for the unroller to unroll if beneficial. It helps reverse some of the regressions from D118090. Differential Revision: https://reviews.llvm.org/D118566	2022-02-07 19:34:28 +00:00
Roman Lebedev	77a0da926c	[LV] Remove `LoopVectorizationCostModel::useEmulatedMaskMemRefHack()` D43208 extracted `useEmulatedMaskMemRefHack()` from legality into cost model. What it essentially does is prevents scalarized vectorization of masked memory operations: ``` // TODO: Cost model for emulated masked load/store is completely // broken. This hack guides the cost model to use an artificially // high enough value to practically disable vectorization with such // operations, except where previously deployed legality hack allowed // using very low cost values. This is to avoid regressions coming simply // from moving "masked load/store" check from legality to cost model. // Masked Load/Gather emulation was previously never allowed. // Limited number of Masked Store/Scatter emulation was allowed. ``` While i don't really understand about what specifically `is completely broken` was talking about, i believe that at least on X86 with AVX2-or-later, this is no longer true. (or at least, i would like to know what is still broken). So i would like to follow suit after D111460, and like wise disable that hack for AVX2+. But since this was added for X86 specifically, let's just instead completely remove this hack. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D114779	2022-02-07 16:08:31 +03:00
Sander de Smalen	eaee477eda	[LV] Use VScaleForTuning to allow wider epilogue VFs. When the main loop is e.g. VF=vscale x 1 and the epilogue VF cannot be any smaller, the vectorizer should try to estimate how many lanes are executed at runtime and allow a suitable fixed-width VF to be chosen. It can use VScaleForTuning to figure out what a suitable fixed-width VF could be. For the case where the main loop VF is VF=vscale x 1, and VScaleForTuning=8, it could still choose an epilogue VF up to VF=4. This was a bit tricky to test, so this patch also introduces a wrapper function to get 'VScaleForTuning' by also considering vscale_range. If min and max are equal, then that will be the vscale we compile for. It makes little sense to tune for a different width if the code will not be portable for other widths. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118709	2022-02-03 15:40:17 +00:00
Sander de Smalen	2a44eaf20f	[LV] Allow a scalable VF for the epilogue. For some reason we limited the epilogue VF to be fixed-width, but there is not necessarily a reason for doing so. If the main VF=vscale x 16, the epilogue VF could be either fixed-width, or a scalable VF up to vscale x 8. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D118688	2022-02-01 22:38:55 +00:00
David Green	aaa16eb023	[LV][AArch64] Add test for scalar interleaving with predication. NFC	2022-02-01 09:21:49 +00:00
Florian Hahn	efd4938723	[VPlan] Handle IV vector splat using VPWidenCanonicalIV. This patch tries to use an existing VPWidenCanonicalIVRecipe instead of creating another step-vector for canonical induction recipes in widenIntOrFpInduction. This has the following benefits: 1. First step to avoid setting both vector and scalar values for the same induction def. 2. Reducing complexity of widenIntOrFpInduction through making things more explicit in VPlan 3. Only need to splat the vector IV for block in masks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116123	2022-01-29 16:25:27 +00:00
Congzhe Cao	f3e1f44340	[IVDescriptor] Get the exact FP instruction that does not allow reordering This is a bugfix in IVDescriptor.cpp. The helper function `RecurrenceDescriptor::getExactFPMathInst()` is supposed to return the 1st FP instruction that does not allow reordering. However, when constructing the RecurrenceDescriptor, we trace the use-def chain starting from a PHI node and for each instruction in the use-def chain, its descriptor overrides the previous one. Therefore in the final RecurrenceDescriptor we constructed, we lose previous FP instructions that do not allow reordering. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D118073	2022-01-27 00:33:46 -05:00
Igor Kirillov	d3932c690d	[LoopVectorize] Add tests with reductions that are stored in invariant address This patch adds tests for functionality that is to be implemented in D110235. Differential Revision: https://reviews.llvm.org/D117213	2022-01-24 21:26:38 +00:00
Florian Hahn	b2a8eff45c	[LV] Make some tests more robust by adding missing users.	2022-01-24 13:04:09 +00:00
Kerry McLaughlin	8082ab2fc3	[LoopVectorize] Support epilogue vectorisation of loops with reductions isCandidateForEpilogueVectorization will currently return false for loops which contain reductions. This patch removes this restriction and makes the following changes to support epilogue vectorisation with reductions: - `fixReduction`: If fixReduction is being called during vectorisation of the epilogue, the phi node it creates will need to additionally carry incoming values from the middle block of the main loop. - `createEpilogueVectorizedLoopSkeleton`: The incoming values of the phi created by fixReduction are updated after the vec.epilog.iter.check block is added. The phi is also moved to the preheader of the epilogue. - `processLoop`: The start value of any VPReductionPHIRecipes are updated before vectorising the epilogue loop. The getResumeInstr function added to the ILV will return the resume instruction associated with the recurrence descriptor. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D116928	2022-01-24 12:03:31 +00:00
Kerry McLaughlin	c740a07863	[LoopVectorize] Test in-loop reductions with tail folding for scalable vectors Adds `-prefer-inloop-reductions` to the RUN line of sve-tail-folding.ll & adds a new test where in-loop reductions cannot be used (`@cond_xor_reduction`). NFC. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D117578	2022-01-19 14:36:23 +00:00
David Sherwood	e781620dee	[LoopVectorize][AArch64] Use get.active.lane.mask intrinsic when SVE is enabled When SVE is enabled for AArch64 targets it makes more sense to use the get.active.lane.mask intrinsic, because SVE has an exact 1-1 mapping from the intrinsic to the 'whilelo' instruction for legal vector types. This instruction neatly takes overflow into account as well. This patch fixes an issue in VPInstruction::generateInstruction that assumed we are only dealing with fixed-width vectors. Differential Revision: https://reviews.llvm.org/D117109	2022-01-18 11:59:30 +00:00
Florian Hahn	d4a8fc3a87	[VPlan] Introduce and use BranchOnCount VPInstruction. This patch adds a new BranchOnCount VPInstruction opcode with 2 operands. It first compares its 2 operands (increment of canonical induction and vector trip count), followed by a branch to either the exit block or back to the vector header. It must be the last recipe in the exit block of the topmost vector loop region. This extracts parts from D113224 and was discussed in D113223. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D116479	2022-01-12 13:42:13 +00:00
Rosie Sumpter	552eb372cb	[LoopVectorize] Pass a vector type to isLegalMaskedGather/Scatter This is required to query the legality more precisely in the LoopVectorizer. This adds another TTI function named 'forceScalarizeMaskedGather/Scatter' function to work around the hack introduced for MVE, where isLegalMaskedGather/Scatter would return an answer by second-guessing where the function was called from, based on the Type passed in (vector vs scalar). The new interface makes this explicit. It is also used by X86 to check for vector widths where gather/scatters aren't profitable (or don't exist) for certain subtargets. Differential Revision: https://reviews.llvm.org/D115329	2022-01-12 13:34:12 +00:00
David Sherwood	b0922a9dcd	[LoopVectorize] Make VPWidenCanonicalIVRecipe::execute work for scalable vectors The code in VPWidenCanonicalIVRecipe::execute only worked for fixed-width vectors due to the way we generate the values per lane. This patch changes the code to use a combination of vector splats and step vectors to get the same result. This then works for both fixed-width and scalable vectors. Tests that exercise this code path for scalable vectors have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Differential Revision: https://reviews.llvm.org/D113180	2022-01-10 14:12:32 +00:00
David Sherwood	e3c84fb948	[LoopVectorize] Add support for tail folding using scalable vectors This patch fixes up an issue with InnerLoopVectorizer::getOrCreateVectorTripCount whereby we weren't correctly generating the runtime trip count for scalable vectors when tail-folding. It also removes some asserts in the tail-folding path for cases when the VF is not scalable. In this patch I have only permitted tail-folding to be enabled explicitly for scalable vectors when the user has specified one of the following flags: -prefer-predicate-over-epilogue=predicate-dont-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue For now it's best not to enable tail-folding with scalable vectors for low trip counts or when optimising for code size, since there has been no analysis on whether this is worth it. Various tests have been added here: Transforms/LoopVectorize/AArch64/sve-tail-folding.ll Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll The tests cannot be target independent because they require masked load/store support, i.e. TTI.isLegalMaskedLoad and TTI.isLegalMaskedStore need to return true. Differential Revision: https://reviews.llvm.org/D113003	2022-01-10 10:55:40 +00:00
David Green	bc615e436c	[AArch64] Update addo and subo costs Similar to D116732, this adds basic scalar sadd_with_overflow, uadd_with_overflow, ssub_with_overflow and usub_with_overflow costs for aarch64, which are usually quite efficiently lowered. Differential Revision: https://reviews.llvm.org/D116734	2022-01-07 16:20:23 +00:00
Florian Hahn	65c4d6191f	[VPlan] Add VPCanonicalIVPHIRecipe, partly retire createInductionVariable. At the moment, the primary induction variable for the vector loop is created as part of the skeleton creation. This is tied to creating the vector loop latch outside of VPlan. This prevents modeling the whole vector loop in VPlan, which in turn is required to model preheader and exit blocks in VPlan as well. This patch introduces a new recipe VPCanonicalIVPHIRecipe to represent the primary IV in VPlan and CanonicalIVIncrement{NUW} opcodes for VPInstruction to model the increment. This allows us to partly retire createInductionVariable. At the moment, a bit of patching up is done after executing all blocks in the plan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113223	2022-01-05 10:46:06 +00:00
Rosie Sumpter	961f51fdf0	[LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions without loads/stores For loops that contain in-loop reductions but no loads or stores, large VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes has no element types to check through and so returns the default widths (-1U for the smallest and 8 for the widest). This results in the widest VF being chosen for the following example, `float s = 0; for (int i = 0; i < N; ++i) s += (float) i*i;` which, for more computationally intensive loops, leads to large loop sizes when the operations end up being scalarized. In this patch, for the case where ElementTypesInLoop is empty, the widest type is determined by finding the smallest type used by recurrences in the loop instead of falling back to a default value of 8 bits. This results in the cost model choosing a more sensible VF for loops like the one above. Differential Revision: https://reviews.llvm.org/D113973	2022-01-04 10:12:57 +00:00
Sander de Smalen	290ae657a6	Fix buildbot failure caused by D115651 I somehow missed updating the RUN line of this test.	2021-12-20 17:18:59 +00:00
Sander de Smalen	b1ff20fd35	[LV] Enable scalable vectorization by default for SVE cores. The availability of SVE should be sufficient to enable scalable auto-vectorization. This patch adds a new TTI interface to query the target what style of vectorization it wants when scalable vectors are available. For other targets than AArch64, this currently defaults to 'FixedWidthOnly'. Differential Revision: https://reviews.llvm.org/D115651	2021-12-20 16:23:29 +00:00
Philip Reames	e6ad9ef4e7	[instcombine] Canonicalize constant index type to i64 for extractelement/insertelement The basic idea to this is that a) having a single canonical type makes CSE easier, and b) many of our transforms are inconsistent about which types we end up with based on visit order. I'm restricting this to constants as for non-constants, we'd have to decide whether the simplicity was worth extra instructions. For constants, there are no extra instructions. We chose the canonical type as i64 arbitrarily. We might consider changing this to something else in the future if we have cause. Differential Revision: https://reviews.llvm.org/D115387	2021-12-13 16:56:22 -08:00
Philip Reames	1a18de3d0a	Autogen a bunch of instcombine and vectorizer tests Done in advance of D115387. These are all the ones which my local script could handle, there's a couple more which need manual updates.	2021-12-13 10:41:38 -08:00
David Sherwood	8b0448ce5d	[AArch64][Analysis] Add on overhead costs for SVE gathers and scatters This patch adds on an overhead cost for gathers and scatters, which is a rough estimate based on performance investigations I have performed on SVE hardware for various micro-benchmarks. Differential Revision: https://reviews.llvm.org/D115143	2021-12-09 16:02:59 +00:00
David Sherwood	def8b952eb	[LoopVectorize][AArch64] Add vectoriser cost model tests for gathers/scatters I've added some tests that were previously missing for the gather-scatter costs being calculated by the vectorizer for AArch64: Transforms/LoopVectorize/AArch64/sve-gather-scatter-cost.ll The costs are sometimes different to the ones in Analysis/CostModel/AArch64/sve-gather.ll because the vectorizer also adds on the address computation cost.	2021-12-09 15:44:12 +00:00
Cullen Rhodes	698584f89b	[IR] Remove unbounded as possible value for vscale_range minimum The default for min is changed to 1. The behaviour of -mvscale-{min,max} in Clang is also changed such that 16 is the max vscale when targeting SVE and no max is specified. Reviewed By: sdesmalen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D113294	2021-12-07 09:52:21 +00:00
Sander de Smalen	3d549dddf7	[LV] Pass compare predicate to getCmpSelInstrCost. If the condition of a select is a compare, pass its predicate to TTI::getCmpSelInstrCost to get a more accurate cost value instead of passing BAD_ICMP_PREDICATE. I noticed that the commit message from D90070 had a comment about the vectorized select predicate possibly being composed of other compares with different predicate values, but I wasn't able to construct an example where this was an actual issue. If this is an issue, I guess we could add another check that the block isn't predicated for any reason. Reviewed By: dmgreen, fhahn Differential Revision: https://reviews.llvm.org/D114646	2021-12-06 11:41:27 +00:00
Sander de Smalen	28a4deab92	[LV] Fix incorrectly marking a pointer indvar as 'scalar'. collectLoopScalars should only add non-uniform nodes to the list if they are used by a load/store instruction that is marked as CM_Scalarize. Before this patch, the LV incorrectly marked pointer induction variables as 'scalar' when they were required to be widened by something else, such as a compare instruction, and weren't used by a node marked as 'CM_Scalarize'. This case is covered by sve-widen-phi.ll. This change also allows removing some code where the LV tried to widen the PHI nodes with a stepvector, even though it was marked as 'scalarAfterVectorization'. Now that this code is more careful about marking instructions that need widening as 'scalar', this code has become redundant. Differential Revision: https://reviews.llvm.org/D114373	2021-11-28 09:49:28 +00:00
Sander de Smalen	a9f837bbf0	NFC: Simplify sve-widen-phi.ll by unrolling once. The unroll factor > 1 has little value for what is being tested.	2021-11-28 09:49:28 +00:00
David Sherwood	e20391fc5d	[LoopVectorize] When tail-folding, don't always predicate uniform loads In VPRecipeBuilder::handleReplication if we believe the instruction is predicated we then proceed to create new VP region blocks even when the load is uniform and only predicated due to tail-folding. I have updated isPredicatedInst to avoid treating a uniform load as predicated when tail-folding, which means we can do a single scalar load and a vector splat of the value. Tests added here: Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll Differential Revision: https://reviews.llvm.org/D112552	2021-11-26 11:30:54 +00:00
Rosie Sumpter	df32a39dd0	[LoopVectorize][CostModel] Update cost model for fmuladd intrinsic This patch updates the cost model for ordered reductions so that a call to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction plus the cost of an ordered fadd reduction. Differential Revision: https://reviews.llvm.org/D111630	2021-11-24 08:50:05 +00:00
Rosie Sumpter	991074012a	[LoopVectorize] Propagate fast-math flags for VPInstruction In-loop vector reductions which use the llvm.fmuladd intrinsic involve the creation of two recipes; a VPReductionRecipe for the fadd and a VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags these should be propagated through to the fmul instruction, so an interface setFastMathFlags has been added to the VPInstruction class to enable this. Differential Revision: https://reviews.llvm.org/D113125	2021-11-24 08:50:04 +00:00
Rosie Sumpter	c2441b6b89	[LoopVectorize] Add vector reduction support for fmuladd intrinsic Enables LoopVectorize to handle reduction patterns involving the llvm.fmuladd intrinsic. Differential Revision: https://reviews.llvm.org/D111555	2021-11-24 08:50:04 +00:00
Diego Caballero	4348cd42c3	[LV] Drop integer poison-generating flags from instructions that need predication This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact` and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop is linearized, these flags may lead to generating a poison value that is effectively used as the base address of the widen load/store. The fix drops all the integer poison-generating flags from instructions that contribute to the address computation of a widen load/store whose original instruction was in a basic block that needed predication and is not predicated after vectorization. Reviewed By: fhahn, spatel, nlopes Differential Revision: https://reviews.llvm.org/D111846	2021-11-22 10:57:29 +00:00
Florian Hahn	cf8efbd30e	[VPlan] Wrap vector loop blocks in region. A first step towards modeling preheader and exit blocks in VPlan as well. Keeping the vector loop in a region allows for changing the VF as we traverse region boundaries. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D113182	2021-11-20 17:59:48 +00:00
Kerry McLaughlin	ff64b2933a	[LoopVectorize] Check the number of uses of an FAdd before classifying as ordered checkOrderedReductions looks for Phi nodes which can be classified as in-order, meaning they can be vectorised without unsafe math. In order to vectorise the reduction it should also be classified as in-loop by getReductionOpChain, which checks that the reduction has two uses. In this patch, a similar check is added to checkOrderedReductions so that we now return false if there are more than two uses of the FAdd instruction. This fixes PR52515. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D114002	2021-11-18 16:41:19 +00:00
David Sherwood	8d77555b12	[Analysis] Ensure getTypeLegalizationCost returns a simple VT for TypeScalarizeScalableVector When getTypeConversion returns TypeScalarizeScalableVector we were sometimes returning a non-simple type from getTypeLegalizationCost. However, many callers depend upon this being a simple type and will crash if not. This patch changes getTypeLegalizationCost to ensure that we always return a sensible simple VT. If the vector type contains unusual integer types, e.g. <vscale x 2 x i3>, then we just set the type to MVT::i64 as a reasonable default. A test has been added here that demonstrates the vectoriser can correctly calculate the cost of vectorising a "zext i3 to i64" instruction with a VF=vscale x 1: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113777	2021-11-17 13:11:58 +00:00
David Sherwood	670dd40244	[Analysis] Fix getNumberOfParts to return 0 when the answer is unknown When asking how many parts are required for a scalable vector type there are occasions when it cannot be computed. For example, <vscale x 1 x i3> is one such vector for AArch64+SVE because at the moment no matter how we promote the i3 type we never end up with a legal vector. This means that getTypeConversion returns TypeScalarizeScalableVector as the LegalizeKind, and then getTypeLegalizationCost returns an invalid cost. This then causes BasicTTImpl::getNumberOfParts to dereference an invalid cost, which triggers an assert. This patch changes getNumberOfParts to return 0 for such cases, since the definition of getNumberOfParts in TargetTransformInfo.h states that we can use a return value of 0 to represent an unknown answer. Currently, LoopVectorize.cpp is the only place where we need to check for 0 as a return value, because all other instances will not currently ask for the number of parts for <vscale x 1 x iX> types. In addition, I have changed the target-independent interface for getNumberOfParts to return 1 and assume there is a single register that can fit the type. The loop vectoriser has lots of tests that are target-independent and they relied upon the 0 value to mean the answer is known and that we are not scalarising the vector. I have added tests here that show we correctly return an invalid cost for VF=vscale x 1 when the loop contains unusual types such as i7: Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll Differential Revision: https://reviews.llvm.org/D113772	2021-11-17 12:07:09 +00:00
Kerry McLaughlin	7647822156	[AArch64][SVE] Remove i1 type from isElementTypeLegalForScalableVector `collectElementTypesForWidening` collects the types of load, store and reduction Phis in a loop. These types are later checked using `isElementTypeLegalForScalableVector` to prevent vectorisation of loops with instruction types that are unsupported. This patch removes i1 from the list of types supported for scalable vectors. This fixes an assert ("Cannot yet scalarize uniform stores") in `setCostBasedWideningDecision` when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D113680	2021-11-12 14:24:38 +00:00
Kerry McLaughlin	6f16ee5e14	Revert "[LoopVectorize] Extract the last lane from a uniform store" This reverts commit `0d748b4d32`. This is causing some failures when building Spec2017 with scalable vectors. Reverting to investigate.	2021-11-10 11:21:19 +00:00
David Sherwood	2a48b6993a	[IR] In ConstantFoldShuffleVectorInstruction use zeroinitializer for splats of 0 When creating a splat of 0 for scalable vectors we tend to create them using a combination of shufflevector and insertelement, i.e. shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 0, i32 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer) However, for the case of a zero splat we can actually just replace the above with zeroinitializer instead. This makes the IR a lot simpler and easier to read. I have changed ConstantFoldShuffleVectorInstruction to use zeroinitializer when creating a splat of integer 0 or FP +0.0 values. Differential Revision: https://reviews.llvm.org/D113394	2021-11-10 09:42:58 +00:00
Kerry McLaughlin	0d748b4d32	[LoopVectorize] Extract the last lane from a uniform store Changes VPReplicateRecipe to extract the last lane from an unconditional, uniform store instruction. collectLoopUniforms will also add stores to the list of uniform instructions where Legal->isUniformMemOp is true. setCostBasedWideningDecision now sets the widening decision for all uniform memory ops to Scalarize, where previously GatherScatter may have been chosen for scalable stores. This fixes an assert ("Cannot yet scalarize uniform stores") in setCostBasedWideningDecision when we have a loop containing a uniform i1 store and a scalable VF, which we cannot create a scatter for. Reviewed By: sdesmalen, david-arm, fhahn Differential Revision: https://reviews.llvm.org/D112725	2021-11-09 14:43:16 +00:00
Sander de Smalen	2829376bb2	[LV] Use VScaleForTuning to fine-tune the cost per lane. When targeting a specific CPU with scalable vectorization, the knowledge of that particular CPU's vscale value can be used to tune the cost-model and make the cost per lane less pessimistic. If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane is calculated as: Cost / (VScaleForTuning * VF.KnownMinLanes) Otherwise, it assumes a value of 1 meaning that the behavior is unchanged and calculated as: Cost / VF.KnownMinLanes Reviewed By: kmclaughlin, david-arm Differential Revision: https://reviews.llvm.org/D113209	2021-11-08 16:59:46 +00:00
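The cost-per-lane formula in `2829376bb2` can be illustrated with a small sketch. `costPerLane` and its parameters are hypothetical names chosen for this example and are not LLVM's API; the only part taken from the commit message is the arithmetic, with a tuning value of 1 standing in for targets that do not provide `TTI.getVScaleForTuning()`.

```cpp
// Illustrative only: a hypothetical helper, not the code in LoopVectorize.cpp.
// Cost            - estimated cost of one vector iteration
// KnownMinLanes   - minimum lane count of the VF (e.g. 4 for vscale x 4)
// VScaleForTuning - assumed vscale of the tuned CPU; 1 leaves behaviour unchanged
constexpr double costPerLane(double Cost, unsigned KnownMinLanes,
                             unsigned VScaleForTuning = 1) {
  return Cost / (VScaleForTuning * KnownMinLanes);
}

// With no tuning information, a cost of 8 over 4 known-minimum lanes is 2 per lane.
static_assert(costPerLane(8.0, 4) == 2.0, "default tuning value of 1");
// Tuning for vscale = 2 halves the per-lane estimate.
static_assert(costPerLane(8.0, 4, 2) == 1.0, "vscale tuned to 2");
```

The second assertion shows why the estimate becomes less pessimistic: the same vector cost is spread over twice as many lanes.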
David Sherwood	c42bb30b9e	[LoopVectorize] Permit fixed-width epilogue loops for scalable vector bodies At the moment in LoopVectorizationCostModel::selectEpilogueVectorizationFactor we bail out if the main vector loop uses a scalable VF. This patch adds support for generating epilogue vector loops using a fixed-width VF when the main vector loop uses a scalable VF. I've changed LoopVectorizationCostModel::selectEpilogueVectorizationFactor so that we convert the scalable VF into a fixed-width VF and do profitability checks on that instead. In addition, since the scalable and fixed-width VFs live in different VPlans that means I had to change the calls to LVP.hasPlanWithVFs so that we only pass in the fixed-width VF. New tests added here: Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll Differential Revision: https://reviews.llvm.org/D109432	2021-11-08 09:41:13 +00:00
David Sherwood	9da8dde7fd	[NFC][LoopVectorize] Add test for tail-folding loop with conditional uniform load I've added a test for a loop containing a conditional uniform load for a target that supports masked loads. The test just ensures that we correctly use gather instructions and have the correct mask. Differential Revision: https://reviews.llvm.org/D112619	2021-11-03 09:51:11 +00:00
Rosie Sumpter	dcb8222d87	[LoopVectorize] Propagate fast-math flags for inloop reductions This patch updates VPReductionRecipe::execute so that the fast-math flags associated with the underlying instruction of the VPRecipe are propagated through to the reductions which are created. Differential Revision: https://reviews.llvm.org/D112548	2021-11-02 08:59:53 +00:00
Roman Lebedev	101aaf62ef	Revert "[NFC] `IRBuilderBase::CreateAdd()`: place constant onto RHS" Clang OpenMP codegen tests are failing, will recommit afterwards. This reverts commit `4723c9b3c6`.	2021-10-27 22:21:37 +03:00
Roman Lebedev	42712698fd	Revert "[IR] `IRBuilderBase::CreateAdd()`: short-circuit `x + 0` --> `x`" Clang OpenMP codegen tests are failing. This reverts commit `288f1f8abe`. This reverts commit `cb90e5356a`.	2021-10-27 22:21:37 +03:00
Roman Lebedev	cb90e5356a	[IR] `IRBuilderBase::CreateAdd()`: short-circuit `x + 0` --> `x` There's precedent for that in `CreateOr()`/`CreateAnd()`. The motivation here is to avoid bloating the run-time check's IR in `SCEVExpander::generateOverflowCheck()`. Refs. https://reviews.llvm.org/D109368#3089809	2021-10-27 21:34:38 +03:00
Roman Lebedev	4723c9b3c6	[NFC] `IRBuilderBase::CreateAdd()`: place constant onto RHS	2021-10-27 21:34:38 +03:00
Roman Lebedev	2eaef53023	[TTI] `BasicTTIImplBase::getInterleavedMemoryOpCost()`: fix load discounting The math here is: Cost of 1 load = cost of n loads / n Cost of live loads = num live loads * Cost of 1 load Cost of live loads = num live loads * (cost of n loads / n) Cost of live loads = cost of n loads * (num live loads / n) But, all the variables here are integers, and integer division rounds down, but this calculation clearly expects float semantics. Instead multiply upfront, and then perform round-up-division. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D112302	2021-10-22 14:08:58 +03:00
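The rounding problem described in `2eaef53023` can be reproduced in a few lines. This is a minimal sketch, not the code from `BasicTTIImplBase::getInterleavedMemoryOpCost()`: the function names are hypothetical, and the truncating variant shows only one possible order of operations that loses precision.

```cpp
#include <cstdint>

// Round-up integer division; hypothetical helper for this example.
constexpr std::uint64_t divideCeil(std::uint64_t Num, std::uint64_t Den) {
  return (Num + Den - 1) / Den;
}

// Dividing first truncates: (NumLive / NumTotal) is 0 whenever some loads are dead.
constexpr std::uint64_t liveLoadCostTruncating(std::uint64_t CostOfAllLoads,
                                               std::uint64_t NumLive,
                                               std::uint64_t NumTotal) {
  return CostOfAllLoads * (NumLive / NumTotal);
}

// Multiply upfront, then perform round-up division, as the commit describes.
constexpr std::uint64_t liveLoadCostRounded(std::uint64_t CostOfAllLoads,
                                            std::uint64_t NumLive,
                                            std::uint64_t NumTotal) {
  return divideCeil(CostOfAllLoads * NumLive, NumTotal);
}

// 3 of 4 loads live, total cost 10: the exact answer is 7.5.
static_assert(liveLoadCostTruncating(10, 3, 4) == 0, "truncation discards the cost");
static_assert(liveLoadCostRounded(10, 3, 4) == 8, "30 / 4 rounded up");
```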
David Sherwood	9448cdc900	[SVE][Analysis] Tune the cost model according to the tune-cpu attribute This patch introduces a new function: AArch64Subtarget::getVScaleForTuning that returns a value for vscale that can be used for tuning the cost model when using scalable vectors. The VScaleForTuning option in AArch64Subtarget is initialised according to the following rules: 1. If the user has specified the CPU to tune for we use that, else 2. If the target CPU was specified we use that, else 3. The tuning is set to "generic". For CPUs of type "generic" I have assumed that vscale=2. New tests added here: Analysis/CostModel/AArch64/sve-gather.ll Analysis/CostModel/AArch64/sve-scatter.ll Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll Differential Revision: https://reviews.llvm.org/D110259	2021-10-21 09:33:50 +01:00
Kerry McLaughlin	1439ef1a3f	[LoopVectorize] Classify pointer induction updates as scalar only if they have one use collectLoopScalars collects pointer induction updates in ScalarPtrs, assuming that the instruction will be scalar after vectorization. This may crash later in VPReplicateRecipe::execute() if there is another user of the instruction other than the Phi node which needs to be widened. This changes collectLoopScalars so that if there are any other users of Update other than a Phi node, it is not added to ScalarPtrs. Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D111294	2021-10-12 13:24:49 +01:00
David Sherwood	26b7d9d622	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-11 09:41:38 +01:00
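The select(cmp) reduction from `26b7d9d622` can be emulated in plain C++ to show why it is safe to vectorise. This sketch assumes C++14 and a fixed lane count of 4; the function names, `VF` constant and sample data are made up for illustration, and the code models only the reduction on `r`, not the `src[i] += 2` update or the IR the vectoriser actually emits.

```cpp
#include <cstddef>

constexpr std::size_t VF = 4; // illustrative lane count, an assumption

// Scalar form of the reduction from the commit message.
constexpr int selectCmpScalar(const int *Src, std::size_t N, int A, int B) {
  int R = A;
  for (std::size_t I = 0; I < N; ++I)
    if (Src[I] > 3)
      R = B;
  return R;
}

// Lane-wise emulation: each lane holds either A or B, and the final
// value is B if any lane contains B.
constexpr int selectCmpLanes(const int *Src, std::size_t N, int A, int B) {
  int Lanes[VF] = {A, A, A, A};
  std::size_t I = 0;
  for (; I + VF <= N; I += VF)            // "vector" body
    for (std::size_t L = 0; L < VF; ++L)
      Lanes[L] = Src[I + L] > 3 ? B : Lanes[L];
  int R = A;
  for (std::size_t L = 0; L < VF; ++L)    // final reduction across lanes
    if (Lanes[L] == B)
      R = B;
  for (; I < N; ++I)                      // scalar tail
    if (Src[I] > 3)
      R = B;
  return R;
}

constexpr int Hit[] = {1, 2, 3, 0, 1, 9, 2, 3, 1};
constexpr int NoHit[] = {1, 2, 3, 0, 1};
static_assert(selectCmpScalar(Hit, 9, 7, 42) == 42, "scalar sees src[5] > 3");
static_assert(selectCmpLanes(Hit, 9, 7, 42) == 42, "lane 1 holds b");
static_assert(selectCmpLanes(NoHit, 5, 7, 42) == 7, "no lane holds b");
```

Both forms agree because the result depends only on whether any element satisfied the comparison, not on the order in which elements were visited.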
Krasimir Georgiev	685f1bfd0a	Revert "[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns" It appears to cause stage2 clang build failures, e.g., https://lab.llvm.org/buildbot/#/builders/74/builds/7145. This reverts commit `1fb37334bd`.	2021-10-01 11:39:43 +02:00
David Sherwood	1fb37334bd	[LoopVectorize] Permit vectorisation of more select(cmp(), X, Y) reduction patterns This patch adds further support for vectorisation of loops that involve selecting an integer value based on a previous comparison. Consider the following C++ loop: int r = a; for (int i = 0; i < n; i++) { if (src[i] > 3) { r = b; } src[i] += 2; } We should be able to vectorise this loop because all we are doing is selecting between two states - 'a' and 'b' - both of which are loop invariant. This just involves building a vector of values that contain either 'a' or 'b', where the final reduced value will be 'b' if any lane contains 'b'. The IR generated by clang typically looks like this: %phi = phi i32 [ %a, %entry ], [ %phi.update, %for.body ] ... %pred = icmp ugt i32 %val, i32 3 %phi.update = select i1 %pred, i32 %b, i32 %phi We already detect min/max patterns, which also involve a select + cmp. However, with the min/max patterns we are selecting loaded values (and hence loop variant) in the loop. In addition we only support certain cmp predicates. This patch adds a new pattern matching function (isSelectCmpPattern) and new RecurKind enums - SelectICmp & SelectFCmp. We only support selecting values that are integer and loop invariant, however we can support any kind of compare - integer or float. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-select-cmp.ll Transforms/LoopVectorize/select-cmp-predicated.ll Transforms/LoopVectorize/select-cmp.ll Differential Revision: https://reviews.llvm.org/D108136	2021-10-01 08:41:03 +01:00
Craig Topper	765348298c	[CostModel] Update default cost model for sadd/ssub overflow to match TargetLowering The expansion for these was updated in https://reviews.llvm.org/D47927 but the cost model was not adjusted. I believe the cost model was also incorrect for the old expansion. The expansion prior to D47927 used 3 icmps using LHS, RHS, and Result to calculate their signs. Then 2 icmps to compare the signs. Followed by an And. The previous cost model was using 3 icmps and 2 selects. Digging back through git blame, those 2 selects in the cost model used to be 2 icmps, but were changed in https://reviews.llvm.org/D90681 Differential Revision: https://reviews.llvm.org/D110739	2021-09-30 09:41:14 -07:00
Florian Hahn	4b581e87df	[LV] Add tests where rt checks may make vectorization unprofitable. Add a few additional tests which require a large number of runtime checks for D109368.	2021-09-27 10:32:28 +01:00
Usman Nadeem	f417d9d821	[InstCombine] Eliminate vector reverse if all inputs/outputs to an instruction are reverses Differential Revision: https://reviews.llvm.org/D109808 Change-Id: I1a10d2bc33acbe0ea353c6cb3d077851391fe73e	2021-09-20 18:32:24 -07:00
David Sherwood	f988f68064	[Analysis] Add support for vscale in computeKnownBitsFromOperator In ValueTracking.cpp we use a function called computeKnownBitsFromOperator to determine the known bits of a value. For the vscale intrinsic if the function contains the vscale_range attribute we can use the maximum and minimum values of vscale to determine some known zero and one bits. This should help to improve code quality by allowing certain optimisations to take place. Tests added here: Transforms/InstCombine/icmp-vscale.ll Differential Revision: https://reviews.llvm.org/D109883	2021-09-20 15:01:59 +01:00
Rosie Sumpter	9d1bea9c88	[SVE][LoopVectorize] Optimise code generated by widenPHIInstruction For SVE, when scalarising the PHI instruction the whole vector part is generated as opposed to creating instructions for each lane for fixed-width vectors. However, in some cases the lane values may be needed later (e.g for a load instruction) so we still need to calculate these values to avoid extractelement being called on the vector part. Differential Revision: https://reviews.llvm.org/D109445	2021-09-10 11:58:04 +01:00
Simon Pilgrim	10c982e0b3	Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-08-23 21:09:26 +01:00
Florian Hahn	d024a01511	Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64" This reverts the revert `ab9296f13b`. The issue causing the revert should be fixed in `9baed023b4`.	2021-08-23 11:25:27 +01:00
Florian Hahn	9baed023b4	[LV] Adjust reduction recipes before recurrence handling. Adjusting the reduction recipes still relies on references to the original IR, which can become outdated by the first-order recurrence handling. Until reduction recipe construction does not require IR references, move it before first-order recurrence handling, to prevent a crash as exposed by D106653.	2021-08-22 11:02:33 +01:00
Florian Hahn	ab9296f13b	Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64" This reverts commit `f4122398e7` to investigate a crash exposed by it. The patch breaks building the code below with `clang -O2 --target=aarch64-linux` int a; double b, c; void d() { for (; a; a++) { b += c; c = a; } }	2021-08-20 21:24:28 +01:00
David Sherwood	f4122398e7	[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64 I have added a new TTI interface called enableOrderedReductions() that controls whether or not ordered reductions should be enabled for a given target. By default this returns false, whereas for AArch64 it returns true and we rely upon the cost model to make sensible vectorisation choices. It is still possible to override the new TTI interface by setting the command line flag: -force-ordered-reductions=true|false I have added a new RUN line to show that we use ordered reductions by default for SVE and Neon: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll Differential Revision: https://reviews.llvm.org/D106653	2021-08-19 09:29:40 +01:00
David Sherwood	219d4518fc	[Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive For tight loops like this: float r = 0; for (int i = 0; i < n; i++) { r += a[i]; } it's better not to vectorise at -O3 using fixed-width ordered reductions on AArch64 targets. Although the resulting number of instructions in the generated code ends up being comparable to not vectorising at all, there may be additional costs on some CPUs, for example perhaps the scheduling is worse. It makes sense to deter vectorisation in tight loops. Differential Revision: https://reviews.llvm.org/D108292	2021-08-18 17:01:56 +01:00
Dylan Fleming	ef198cd99e	[SVE] Remove usage of getMaxVScale for AArch64, in favour of IR Attribute Removed AArch64 usage of the getMaxVScale interface, replacing it with the vscale_range(min, max) IR Attribute. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D106277	2021-08-17 14:42:47 +01:00
Paul Walker	f7a831daa6	[LoopVectorize] Don't emit remarks about lack of scalable vectors unless they're specifically requested. Previously we emitted a "does not support scalable vectors" remark for all targets whenever vectorisation is attempted. This pollutes the output for architectures that don't support scalable vectors and is likely confusing to the user. Instead this patch introduces a debug message that reports when scalable vectorisation is allowed by the target and only issues the previous remark when scalable vectorisation is specifically requested, for example: #pragma clang loop vectorize_width(2, scalable) Differential Revision: https://reviews.llvm.org/D108028	2021-08-15 12:15:52 +01:00
David Sherwood	3ce8c31eb8	[NFC] Add extra RUN line to strict reduction tests I have added RUN lines to both: Transforms/LoopVectorize/AArch64/strict-fadd.ll Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll to show the default behaviour is to not vectorise when the following flag is unset: -force-ordered-reductions	2021-08-10 14:48:38 +01:00
David Sherwood	8439415333	[IR] Let ConstantVector::getSplat use poison instead of undef This patch updates ConstantVector::getSplat to use poison instead of undef when using insertelement/shufflevector to splat. This follows on from D93793. Differential Revision: https://reviews.llvm.org/D107751	2021-08-10 08:27:43 +01:00
Sander de Smalen	3e47f009ff	[LV] Consider ExtractValue as uniform. Since all operands to ExtractValue must be loop-invariant when we deem the loop vectorizable, we can consider ExtractValue to be uniform. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D107286	2021-08-05 16:20:50 +01:00
Sander de Smalen	8d08a84745	[LV] Remove a change that was added in D106164. This change wasn't strictly necessary for D106164 and could be removed. This patch addresses the post-commit comments from @fhahn on D106164, and also changes sve-widen-gep.ll to use the same IR test as shown in pointer-induction.ll. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D106878	2021-08-05 14:44:53 +01:00
David Sherwood	0156f91f3b	[NFC] Rename enable-strict-reductions to force-ordered-reductions I'm renaming the flag because a future patch will add a new enableOrderedReductions() TTI interface and so the meaning of this flag will change to be one of forcing the target to enable/disable them. Also, since other places in LoopVectorize.cpp use the word 'Ordered' instead of 'strict' I changed the flag to match. Differential Revision: https://reviews.llvm.org/D107264	2021-08-03 09:33:01 +01:00
James Y Knight	3d272eea08	Fix test/Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll. It was writing to the source directory (which may not be writeable), rather than using %t. Fixes: `a5dd6c6cf9` ("[LoopVectorize] Don't interleave scalar ordered reductions for inner loops")	2021-07-27 18:32:29 -04:00

358 Commits