llvm-project

Commit Graph

Author	SHA1	Message	Date
David Sherwood	7b7b5b5a26	[NFC] Rename shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-30 11:11:49 +01:00
Philip Reames	e49d65f36d	[LV] Fix bug when unrolling (only) a loop with non-latch exit If we unroll a loop in the vectorizer (without vectorizing), and the cost model requires an epilogue be generated for correctness, the code generation must actually do so. The included test case on an unmodified opt will access memory one past the expected bound. As a result, this patch is fixing a latent miscompile. Differential Revision: https://reviews.llvm.org/D103700	2021-06-29 08:04:26 -07:00
David Sherwood	9de63367d8	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `9dde514162`.	2021-06-29 15:20:22 +01:00
David Sherwood	9dde514162	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 14:34:30 +01:00
David Sherwood	8a3365fba2	Revert "[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable" This reverts commit `dcfc2c3fac`.	2021-06-29 14:04:42 +01:00
Florian Hahn	47215e1c62	[LV] Fix crash when target instruction for sinking is dead. This patch fixes a crash when the target instruction for sinking is dead. In that case, no recipe is created and trying to get the recipe for it results in a crash. To ensure all sink targets are alive, find & use the first previous alive instruction. Note that the case where the sink source is dead is already handled. Found by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=35320 Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104603	2021-06-29 13:31:22 +01:00
David Sherwood	303b6d5e98	[LoopVectorize] Add support for scalable vectorization of invariant stores Previously in setCostBasedWideningDecision if we encountered an invariant store we just assumed that we could scalarize the store and called getUniformMemOpCost to get the associated cost. However, for scalable vectors this is not an option because it is not currently possible to scalarize the store. At the moment we crash in VPReplicateRecipe::execute when trying to scalarize the store. Therefore, I have changed setCostBasedWideningDecision so that if we are storing a scalable vector out to a uniform address and the target supports scatter instructions, then we should use those instead. Tests have been added here: Transforms/LoopVectorize/AArch64/sve-inv-store.ll Differential Revision: https://reviews.llvm.org/D104624	2021-06-29 11:56:09 +01:00
David Sherwood	dcfc2c3fac	[NFC] Remove shadowed variable in InnerLoopVectorizer::createInductionVariable Avoid creating a IRBuilder stack variable with the same name as the class member.	2021-06-29 09:14:35 +01:00
Kerry McLaughlin	f99672568f	[LoopVectorize] Fix strict reductions where VF = 1 Currently we will allow loops with a fixed width VF of 1 to vectorize if the -enable-strict-reductions flag is set. However, the loop vectorizer will not use ordered reductions if `VF.isScalar()` and the resulting vectorized loop will be out of order. This patch removes `VF.isVector()` when checking if ordered reductions should be used. Also, instead of converting the FAdds to reductions if the VF = 1, operands of the FAdds are changed such that the order is preserved. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D104533	2021-06-28 11:27:10 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Nikita Popov	a9129f8964	[LoadStoreVectorizer] Support opaque pointers There are remaining redundant bitcasts.	2021-06-27 15:42:16 +02:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	7f36981977	[LV] Adjust trip count based on IsOrdered in widenPHIInstruction (NFC). Suggested in D104197, avoids the early exit.	2021-06-26 13:13:25 +01:00
Florian Hahn	cc5ee857f9	[LV] Doxygenize VectorizationFactor member comments (NFC). Minor cleanup for follow-up patch.	2021-06-25 18:35:00 +01:00
Florian Hahn	91053e327c	[LV] Reflow comment for VectorizationCostTy (NFC).	2021-06-25 14:20:06 +01:00
Florian Hahn	833bdbe93c	[LV] Support sinking recipe in replicate region after another region. This patch handles sinking a replicate region after another replicate region. In that case, we can connect the sink region after the target region. This properly handles the case for which an assertion has been added in `337d765282`. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=34842. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D103514	2021-06-24 13:58:42 +01:00
Nikita Popov	00d3f7cc3c	[LAA] Make getPointersDiff() API compatible with opaque pointers Make getPointersDiff() and sortPtrAccesses() compatible with opaque pointers by explicitly passing in the element type instead of determining it from the pointer element type. The SLPVectorizer result is slightly non-optimal in that unnecessary pointer bitcasts are added. Differential Revision: https://reviews.llvm.org/D104784	2021-06-23 18:44:34 +02:00
Alexey Bataev	908b753661	[SLP]Improve vectorization of PHI instructions. Perform better analysis when trying to vectorize PHIs. 1. Do not try to vectorize vector PHIs. 2. Do deeper analysis for more profitable nodes for the vectorization. Before we just tried to vectorize the PHIs of the same type. Patch improves this and tries to vectorize PHIs with incoming values which come from the same basic block, have the same and/or alternative opcodes. It allows to save the compile time and provides better vectorization results in general. Part of D57059. Differential Revision: https://reviews.llvm.org/D103638	2021-06-21 12:26:24 -07:00
Roman Lebedev	37dfc467ac	[NFC] LoopVectorizationCostModel::getMaximizedVFForTarget(): clarify debug msg This really isn't talking about vectors in general, but only about either fixed or scalable vectors, and it's pretty confusing to see it state that there aren't any vectors :)	2021-06-17 21:07:34 +03:00
Florian Hahn	80a403348b	[VPlan] Support PHIs as LastInst when inserting scalars in ::get(). At the moment, we create insertelement instructions directly after LastInst when inserting scalar values in a vector in VPTransformState::get. This results in invalid IR when LastInst is a phi, followed by another phi. In that case, the new instructions should be inserted just after the last PHI node in the block. At the moment, I don't think the problematic case can be triggered, but it can happen once predicate regions are merged and multiple VPredInstPHI recipes are in the same block (D100260). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104188	2021-06-17 09:36:44 +01:00
Bjorn Pettersson	4c7f820b2b	Update @llvm.powi to handle different int sizes for the exponent This can be seen as a follow up to commit `0ee439b705`, that changed the second argument of __powidf2, __powisf2 and __powitf2 in compiler-rt from si_int to int. That was to align with how those runtimes are defined in libgcc. One thing that seems to have been missing in that patch was to make sure that the rest of LLVM also handles that the argument now depends on the size of int (not using the si_int machine mode for 32-bit). When using __builtin_powi for a target with 16-bit int clang crashed. And when emitting libcalls to those rtlib functions (typically when lowering @llvm.powi), the backend would always prepare the exponent argument as an i32 which caused miscompiles when the rtlib was compiled with 16-bit int. The solution used here is to use an overloaded type for the second argument in @llvm.powi. This way clang can use the "correct" type when lowering __builtin_powi, and then later when emitting the libcall it is assumed that the type used in @llvm.powi matches the rtlib function. One thing that needed some extra attention was that when vectorizing calls several passes did not support that several arguments could be overloaded in the intrinsics. This patch allows overload of a scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for powi. Differential Revision: https://reviews.llvm.org/D99439	2021-06-17 09:38:28 +02:00
Evgeniy Brevnov	96cded5b79	[SLP] Incorrect handling of external scalar values Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D103954	2021-06-16 13:27:36 +07:00
Florian Hahn	96ca03493a	[VectorCombine] Limit scalarization to non-poison indices for now. As Eli mentioned post-commit in D103378, the result of the freeze may still be out-of-range according to Alive2. So for now, just limit the transform to indices that are non-poison.	2021-06-14 16:40:14 +01:00
Simon Pilgrim	b013c58e82	VPlanSLP.cpp - tidy implicit header dependencies. NFCI. We don't use std::string and std::vector, but we do use std::pair and std::max.	2021-06-13 12:37:17 +01:00
Valery N Dmitriev	94a07c79cf	[SLP][NFC] Fix condition that was supposed to save a bit of compile time. It was found by chance revealing discrepancy between comment (few lines above), the condition and how re-ordering of instruction is done inside the if statement it guards. The condition was always evaluated to true. Differential Revision: https://reviews.llvm.org/D104064	2021-06-11 10:08:55 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Roman Lebedev	20542b47d6	[VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper This results in slightly more optimistic alignments in some cases	2021-06-11 12:47:10 +03:00
Roman Lebedev	abc0e0125c	[NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function	2021-06-11 12:47:09 +03:00
Simon Pilgrim	5e6bfb661e	[Analysis] Pass RecurrenceDescriptor as const reference. NFCI. We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.). Differential Revision: https://reviews.llvm.org/D104029	2021-06-11 10:24:14 +01:00
Qiu Chaofan	2670c7dd5b	[VectorCombine] Fix alignment in single element store This fixes the concern in single element store scalarization that the alignment of new store may be larger than it should be. It calculates the largest alignment if index is constant, and a safe one if not. Reviewed By: lebedev.ri, spatel Differential Revision: https://reviews.llvm.org/D103419	2021-06-11 10:28:15 +08:00
Slava Nikolaev	119965865c	LoadStoreVectorizer: support different operand orders in the add sequence match First we refactor the code which does no wrapping add sequences match: we need to allow different operand orders for the key add instructions involved in the match. Then we use the refactored code trying 4 variants of matching operands. Originally the code relied on the fact that the matching operands of the two last add instructions of memory index calculations had the same LHS argument. But which operand is the same in the two instructions is actually not essential, so now we allow that to be any of LHS or RHS of each of the two instructions. This increases the chances of vectorization to happen. Reviewed By: volkan Differential Revision: https://reviews.llvm.org/D103912	2021-06-10 16:31:35 -07:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started support their vectorization and it should not do it after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of the very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Keith Smiley	026170d17d	Fix range-loop-analysis warning ``` llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis] for (const auto VF : VFCandidates) { ^ llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying for (const auto VF : VFCandidates) { ^~~~~~~~~~~~~~~ & 1 warning generated. ``` Differential Revision: https://reviews.llvm.org/D103970	2021-06-10 08:39:54 -07:00
Alexey Bataev	a0086add2e	[SLP]Improve gathering of scalar elements. 1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions. 2. Better detection of shuffle candidates in gathering function. 3. The cost of insertelement for constants is 0. Part of D57059. Differential Revision: https://reviews.llvm.org/D103458	2021-06-09 05:23:21 -07:00
Kerry McLaughlin	14eeccfe9a	[LoopVectorize] Don't use strict reductions when reordering is allowed If the `-enable-strict-reductions` flag is set to true, then currently we will always choose to vectorize the loop with strict in-order reductions. This is not necessary where we allow the reordering of FP operations, such as when loop hints are passed via metadata. This patch moves useOrderedReductions so that we can also check whether loop hints allow reordering, in which case we should use the default behaviour of vectorizing with unordered reductions. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D103814	2021-06-08 10:39:29 +01:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPRegionBlocks. This patch uses the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
Alexey Bataev	8c48d77cdf	[SLP]Improve cost estimation/emission of externally used extractelements. No need to recalculate the cost of extractelements, just no need to compensate the cost of all extractelements, need to check before if this is actually going to be removed at the vectorization. Also, no need to generate new extractelement instruction, we may just regenerate the original one. It may improve the final vectorization. Differential Revision: https://reviews.llvm.org/D102933	2021-06-03 10:26:59 -07:00
Alexey Bataev	89f3bc7698	[SLP]Allow to reorder nodes with >2 scalar values. tryToVectorizeList function allows to reorder only 2 scalars. Patch allows to reorder >2 scalars. Also, to avoid possible regressions, it allows extra vectorization of the remaining parts of the scalars elements if possible. Part of D57059. Differential Revision: https://reviews.llvm.org/D103247	2021-06-03 10:01:36 -07:00
Harald van Dijk	5d2b3de284	[SLP] Avoid std::stable_sort(properlyDominates()). As noticed by NAKAMURA Takumi back in 2017, we cannot use properlyDominates for std::stable_sort as properlyDominates only partially orders blocks. That is, for blocks A, B, C, D, where A dominates B and C dominates D, we have A == C, B == C, but A < B. This is not a valid comparison function for std::stable_sort and causes different results between libstdc++ and libc++. This change uses DFS numbering to give deterministic results for all reachable blocks. Unreachable blocks are ignored already, so do not need special consideration. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D103441	2021-06-03 17:51:52 +01:00
Sander de Smalen	d41cb6bb26	[LV] Build and cost VPlans for scalable VFs. This patch uses the calculated maximum scalable VFs to build VPlans, cost them and select a suitable scalable VF. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98722	2021-06-02 14:47:47 +01:00
Sander de Smalen	034503e9d2	[LV] NFC: Remove redundant isLegalMasked(Gather\|Scatter) functions. This NFC change follows from conversation in D102437, where it was discussed to remove these functions as a separate patch.	2021-06-02 14:09:07 +01:00
Sander de Smalen	3472d3fd9d	[LV] NFC: Replace custom getMemInstValueType by llvm::getLoadStoreType. llvm::getLoadStoreType was added recently and has the same implementation as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no value in having two implementations, this patch removes the custom LV implementation in favor of the generic one defined in Instructions.h.	2021-06-02 14:09:06 +01:00
Harald van Dijk	f126e8ec28	[SLPVectorizer] Ignore unreachable blocks As the existing test unreachable.ll shows, we should be doing more work to avoid entering unreachable blocks: we should not stop vectorization just because a PHI incoming value from an unreachable block cannot be vectorized. We know that particular value will never be used so we can just replace it with poison.	2021-06-01 20:21:04 +01:00
Alexey Bataev	36911971a5	[SLP]Better detection of perfect/shuffles matches for gather nodes. Implemented better scheme for perfect/shuffled matches of the gather nodes which allows to fix the performance regressions introduced by earlier patches. Starting detecting matches for broadcast nodes and extractelement gathering. Differential Revision: https://reviews.llvm.org/D102920	2021-06-01 07:08:07 -07:00
Florian Hahn	d4c070d801	[VectorCombine] Freeze index unless it is known to be non-poison. If the index itself is already poison, the poison propagates through instructions clamping the index to a valid range. This still causes introducing a load of poison, as flagged by Alive2 and pointed out at `575e2aff55`. This patch updates the code to freeze the index, unless it is proven to not be poison. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D103378	2021-06-01 10:40:57 +01:00
Florian Hahn	aa00b1d763	[LV] Try to sink users recursively for first-order recurrences. Update isFirstOrderRecurrence to explore all uses of a recurrence phi and check if we can sink them. If there are multiple users to sink, they are all mapped to the previous instruction. Fixes PR44286 (and another PR or two). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84951	2021-05-31 19:55:33 +01:00
Bardia Mahjour	06eaffa858	[NFC] Remove confusing info about MainLoop VF/UF from debug message	2021-05-28 16:10:04 -04:00
Florian Hahn	007f268c35	[VectorCombine] Check indices for all extracts we scalarize. We need to make sure that the indices of all extracts we scalarize are valid.	2021-05-28 18:35:29 +01:00
Florian Hahn	38641ddf3e	[VPlan] Do not sink uniform recipes in sinkScalarOperands. For uniform ReplicateRecipes, only the first lane should be used, so sinking them would mean we have to compute the value of the first lane multiple times. Also, at the moment, sinking them causes a crash because the value of the first lane is re-used by all users. Reported post-commit for D100258.	2021-05-27 14:07:48 +01:00
Alexey Bataev	27d3528acf	[SLP]Fix vectorization of insertelements with multiple uses. SLP vectorizer should not consider insertelements with multiple uses as a part of high level build vector, it must be considered as a terminating insertelement in the vector build, otherwise it may produce incorrect code. Differential Revision: https://reviews.llvm.org/D103164	2021-05-26 09:42:18 -07:00
Kerry McLaughlin	9f76a85260	[LoopVectorize] Enable strict reductions when allowReordering() returns false When loop hints are passed via metadata, the allowReordering function in LoopVectorizationLegality will allow the order of floating point operations to be changed: bool allowReordering() const { // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. The -enable-strict-reductions flag introduced in D98435 will currently only vectorize reductions in-loop if hints are used, since canVectorizeFPMath() will return false if reordering is not allowed. This patch changes canVectorizeFPMath() to query whether it is safe to vectorize the loop with ordered reductions if no hints are used. For testing purposes, an additional flag (-hints-allow-reordering) has been added to disable the reordering behaviour described above. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101836	2021-05-26 13:59:12 +01:00
Florian Hahn	8e83ff58c9	[VectorCombine] Remove unneeded InsertPointGuard (NFCI). All users of the builder should set an insert point before using the builder. There should be no need for using InsertPointGuard here.	2021-05-25 17:01:05 +01:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAcceess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Anton Afanasyev	b2cd895011	[SLP] Fix "gathering" of insertelement instructions For rare exceptional case vector tree node (insertelements for now only) is marked as `NeedToGather`, this case is processed by patch. Follow-up of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
Florian Hahn	65d3dd7c88	[VPlan] Add first VPlan version of sinkScalarOperands. This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverses a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe. Continue with processing the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258	2021-05-24 15:29:58 +01:00
Florian Hahn	e9d97d7d9d	[VPlan] Add mayReadOrWriteMemory & friends. This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.	2021-05-24 13:11:32 +01:00
Florian Hahn	4e8c28b6fb	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit `94d54155e2`. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
Florian Hahn	94d54155e2	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit `86497785d5`. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Florian Hahn	86497785d5	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can be represented as a result of shuffle of the vectorized element and non-consecutive insertelements too. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Daniil Fukalov	e8e88c3353	[TTI] NFC: Change getRegUsageForType to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102541	2021-05-21 15:17:23 +03:00
Alexey Bataev	182162b616	[SLP]Try to vectorize tiny trees with shuffled gathers of extractelements. If we gather extract elements and they actually are just shuffles, it might be profitable to vectorize them even if the tree is tiny. Differential Revision: https://reviews.llvm.org/D101460	2021-05-20 08:36:16 -07:00
David Sherwood	7e95a563c8	Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst In InnerLoopVectorizer::setDebugLocFromInst we were previously asserting that the VF is not scalable. This is because we want to use the number of elements to create a duplication factor for the debug profiling data. However, for scalable vectors we only know the minimum number of elements. I've simply removed the assert for now and added a FIXME saying that we assume vscale is always 1. When vscale is not 1 it just means that the profiling data isn't as accurate, but shouldn't cause any functional problems.	2021-05-19 13:33:10 +01:00
Sander de Smalen	4f86aa650c	[LV] Add -scalable-vectorization=<option> flag. This patch adds a new option to the LoopVectorizer to control how scalable vectors can be used. Initially, this suggests three levels to control scalable vectorization, although other more aggressive options can be added in the future. The possible options are: - Disabled: Disables vectorization with scalable vectors. - Enabled: Vectorize loops using scalable vectors or fixed-width vectors, but favors fixed-width vectors when the cost is a tie. - Preferred: Like 'Enabled', but favoring scalable vectors when the cost-model is inconclusive. Reviewed By: paulwalker-arm, vkmr Differential Revision: https://reviews.llvm.org/D101945	2021-05-19 10:40:56 +01:00
Rong Xu	886629a8c9	[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This patch implements first part of Flow Sensitive SampleFDO (FSAFDO). It has the following changes: (1) disable current discriminator encoding scheme, (2) new hierarchical discriminator for FSAFDO. For this patch, option "-enable-fs-discriminator=true" turns on the new functionality. Option "-enable-fs-discriminator=false" (the default) keeps the current SampleFDO behavior. When the fs-discriminator is enabled, we insert a flag variable, namely, llvm_fs_discriminator, to the object. This symbol will be checked by the create_llvm_prof tool, and used to generate a profile with FS-AFDO discriminators enabled. If this happens, for an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. Differential Revision: https://reviews.llvm.org/D102246	2021-05-18 16:23:43 -07:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Florian Hahn	cc1a6361d3	[VPlan] Add VPUserID to distinguish between recipes and others. This allows cast/dyn_cast'ing from VPUser to recipes. This is needed because there are VPUsers that are not recipes. Reviewed By: gilr, a.elovikov Differential Revision: https://reviews.llvm.org/D100257	2021-05-18 09:17:28 +01:00
Sander de Smalen	81fdc73e5d	[LV] Return both fixed and scalable Max VF from computeMaxVF. This patch introduces a new class, MaxVFCandidates, that holds the maximum vectorization factors that have been computed for both scalable and fixed-width vectors. This patch is intended to be NFC for fixed-width vectors, although considering a scalable max VF (which is disabled by default) pessimises tail-loop elimination, since it can no longer determine if any chosen VF (less than fixed/scalable MaxVFs) is guaranteed to handle all vector iterations if the trip-count is known. This issue will be addressed in a future patch. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D98721	2021-05-18 08:03:48 +01:00
Philip Reames	ed9d70781b	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)" This reverts commit `6d3e3ae8a9`. Still seeing PPC build bot failures, and one arm self host bot failing. I'm officially stumped, and need help from a bot owner to reduce.	2021-05-17 20:53:28 -07:00
Philip Reames	6d3e3ae8a9	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:59:25 -07:00
Philip Reames	d16da7343d	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `c23ce54b36`. I apparently missed some newly added non-x86 tests.	2021-05-17 16:49:32 -07:00
Philip Reames	c23ce54b36	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:33:56 -07:00
Sander de Smalen	f82966d19a	[LoopVectorizationLegality] NFC: Mark some interfaces as 'const' This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired and getSymbolicStrides as 'const'.	2021-05-14 11:53:54 +01:00
Anton Afanasyev	207cdd7ed9	[SLP] Fix spill cost computation for insertelement tree node This is follow up for D98714, bugfixing.	2021-05-14 13:14:41 +03:00
Sander de Smalen	459c48e04f	NFCI: Remove VF argument from isScalarWithPredication As discussed in D102437, the VF argument to isScalarWithPredication seems redundant, so this is intended to be a non-functional change. It seems wrong to query the widening decision at this point. Removing the operand and code to get the widening decision causes no unit/regression tests to fail. I've also found no issues running the LLVM test-suite. This subsequently removes the VF argument from isPredicatedInst as well, since it is no longer required.	2021-05-14 10:34:40 +01:00
Florian Hahn	bdada7546e	[VPlan] Adjust assert in splitBlock to allow splitting at end. SplitAt should only be dereferenced in the assert if it does not point to the end of the block. This fixes a crash in the added test case.	2021-05-13 13:36:35 +01:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add new type of tree node for `InsertElementInst` chain forming vector. These instructions could be either removed, or replaced by shuffles during vectorization and we can add this node to cost model, so naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this patch is the first step towards revectorization of partially vectorized code (to fix PR42022 completely). After adding inserts to tree the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
Justin Bogner	e7d26aceca	Change the context instruction for computeKnownBits in LoadStoreVectorizer pass This change enables cases for which the index value for the first load/store instruction in a pair could be a function argument. This allows using llvm.assume to provide known bits information in such cases. Patch by Viacheslav Nikolaev. Thanks! Differential Revision: https://reviews.llvm.org/D101680	2021-05-12 15:29:29 -07:00
David Sherwood	b7a11274f9	[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors In InnerLoopVectorizer::widenPHIInstruction there are cases where we have to scalarise a pointer induction variable after vectorisation. For scalable vectors we already deal with the case where the pointer induction variable is uniform, but we currently crash if not uniform. For fixed width vectors we calculate every lane of the scalarised pointer induction variable for a given VF, however this cannot work for scalable vectors. In this case I have added support for caching the whole vector value for each unrolled part so that we can always extract an arbitrary element. Additionally, we still continue to cache the known minimum number of lanes too in order to improve code quality by avoiding an extractelement operation. I have adapted an existing test `pointer_iv_mixed` from the file: Transforms/LoopVectorize/consecutive-ptr-uniforms.ll and added it here for scalable vectors instead: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D101294	2021-05-12 11:02:11 +01:00
Qiu Chaofan	6d2df18163	[VectorComine] Restrict single-element-store index to inbounds constant The vector single element update optimization was landed in `2db4979`, but its scope needs restricting. This patch restricts the index to an inbounds constant and requires the vector to be fixed-sized. In the future, we may use value tracking to relax the constant restriction. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D102146	2021-05-12 13:18:20 +08:00
Florian Hahn	faebc6bf10	[VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments. This fixes PR50298.	2021-05-11 14:32:34 +01:00
Sanjay Patel	49950cb1f6	[SLP] restrict matching of load combine candidates The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction. Differential Revision: https://reviews.llvm.org/D102074	2021-05-11 08:46:40 -04:00
Alexey Bataev	30463bc3f1	[SLP]Do not count perfect diamond matches for gathers several times. Need to remove the old code for avoiding double counting of the gather nodes with perfect diamond matches within the tree after we started detecting perfect/shuffled matching in the previous patch D100495. We may skip the cost for such nodes completely. Differential Revision: https://reviews.llvm.org/D102023	2021-05-10 07:08:07 -07:00
Qiu Chaofan	2db4979c0f	[VectorCombine] Simplify to scalar store if only one element updated This patch simplifies load-insertelt-store pattern into getelementptr-store. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98240	2021-05-08 18:14:51 +08:00
Florian Hahn	75b9997760	[LV] Remove reference of PHI from comment, they are not recorded (NFC). The comment incorrectly states that the PHI is recorded. That's not accurate, only the recipe for the incoming value is recorded. Suggested post-commit for `4ba8720f88`.	2021-05-07 21:34:23 +01:00
Florian Hahn	337d765282	[LV] Assert if trying to sink replicate region into another region (NFC) Currently sinking a replicate region into another replicate region is not supported. Add an assert, to make the problem more obvious, should it occur. Discussed post-commit for `ccebf7a109`.	2021-05-07 21:25:35 +01:00
Florian Hahn	01c26d4e04	[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC). Adjust the name to make it clearer this is the region containing the target recipe, similar to SinkRegion below. Suggested post-commit for `ccebf7a109`.	2021-05-07 21:25:35 +01:00
Caroline Concatto	cf06c8eee3	[LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction The function fixReduction used to assert/crash for scalable vector when a vector reduce could be done with a smaller vector. This patch removes this assertion as it is safe to use scalable vector for vector reduce and truncate. Differential Revision: https://reviews.llvm.org/D101260	2021-05-07 09:37:37 +01:00
Simon Pilgrim	338c1b701f	[SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	2dab059021	[SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	1b47489fd0	[SLP] Use empty() instead of size() == 0. NFCI.	2021-05-06 16:20:18 +01:00
David Green	4979c90458	[LV] Account for tripcount when calculation vectorization profitability The loop vectorizer will currently assume a large trip count when calculating which of several vectorization factors are more profitable. That is often not a terrible assumption to make as small trip count loops will usually have been fully unrolled. There are cases however where we will try to vectorize them, and especially when folding the tail by masking can incorrectly choose to vectorize loops that are not beneficial, due to the folded tail rounding the iteration count up for the vectorized loop. The motivating example here has a trip count of 5, so either performs 5 scalar iterations or 2 vector iterations (with VF=4). At a high enough trip count the vectorization becomes profitable, but the rounding up to 2 vector iterations vs only 5 scalar makes it unprofitable. This adds an alternative cost calculation when we know the max trip count and are folding tail by masking, rounding the iteration count up to the correct number for the vector width. We still do not account for anything like setup cost or the mixture of vector and scalar loops, but this is at least an improvement in a few cases that we have had reported. Differential Revision: https://reviews.llvm.org/D101726	2021-05-06 12:36:46 +01:00
Kerry McLaughlin	8c9742bd23	[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences Adds support for scalable vectorization of loops containing first-order recurrences, e.g: ``` for(int i = 0; i < n; i++) b[i] = a[i] + a[i - 1] ``` This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into account when inserting into and extracting from the last lane of a vector. CreateVectorSplice has been added to construct a vector for the recurrence, which returns a splice intrinsic for scalable types. For fixed-width the behaviour remains unchanged as CreateVectorSplice will return a shufflevector instead. The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D101076	2021-05-06 11:35:39 +01:00
Philip Reames	80e8025083	[LV] Workaround PR49900 (a crash due to analyzing partially mutated IR) LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program. Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it. I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work. Test taken from https://reviews.llvm.org/D100663. Differential Revision: https://reviews.llvm.org/D101487	2021-05-05 09:56:28 -07:00
Florian Hahn	ccebf7a109	[VPlan] Properly handle sinking of replicate regions. This patch updates the code that sinks recipes required for first-order recurrences to properly handle replicate-regions. At the moment, the code would just move the replicate recipe out of its replicate-region, producing an invalid VPlan. When sinking a recipe in a replicate-region, we have to sink the whole region. To do that, we first need to split the block at the target recipe and move the region in between. This patch also adds a splitAt helper to VPBasicBlock to split a VPBasicBlock at a given iterator. Fixes PR50009. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100751	2021-05-04 22:36:01 +01:00
Florian Hahn	4ba8720f88	[VPlan] Representing backedge def-use feeding reduction phis. This patch updates the code handling reduction recipes to also keep track of the incoming value from the latch in the recipe. This is needed to model the def-use chains completely in VPlan, so that it is possible to replace the incoming value with an arbitrary VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99294	2021-05-04 16:33:22 +01:00
Sander de Smalen	9931ae645e	Reland "[LV] Calculate max feasible scalable VF." Relands https://reviews.llvm.org/D98509 This reverts commit `51d648c119`.	2021-05-04 15:44:41 +01:00
Alexey Bataev	369cd2ae52	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `fd18547e07`. Need to add a check for the size of the vectorization tree to avoid some extra vectorization.	2021-05-04 04:53:22 -07:00
Alexey Bataev	fd18547e07	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 08:06:20 -07:00
Alexey Bataev	2e4cc9a725	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `b5f64768cf` to fix a compiler crash revealed by buildbots.	2021-05-03 07:20:00 -07:00
Alexey Bataev	b5f64768cf	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 06:45:42 -07:00
Florian Hahn	2b7fa7f744	[LV] Iterate over recipes in VPlan to fix PHI (NFC). As we gradually move more elements of LV to VPlan, we are trying to reduce the number of places that still has to check IR of the original loop. This patch adjusts the code to fix cross iteration phis to get the PHIs to fix directly from the VPlan that is executed. We still need the original PHI to check for first-order recurrences, but we can get rid of that once we model that explicitly in VPlan as well. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99293	2021-05-03 14:09:46 +01:00
Florian Hahn	942e068d7a	[VPlan] Add VPBasicBlock::phis() helper (NFC). This patch introduces a helper to obtain an iterator range for the PHI-like recipes in a block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100101	2021-05-02 19:20:13 +01:00
Justin Bogner	9542721085	Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass Patch by Viacheslav Nikolaev. Thanks!	2021-04-30 13:39:46 -07:00
Alexey Bataev	a3fd82c289	[SLP]Fix the crash on cost calculation if non-compatible vectors shuffled. If the extracts from non-power-of-2 vectors are recognized as shuffles, some extra checks are needed to avoid crashing the cost calculation when trying to get the cost of subvector extracts. In this case we need to check carefully that we do not go out of bounds of the original vector; otherwise the TTI cost model will crash on an assert. Differential Revision: https://reviews.llvm.org/D101477	2021-04-30 09:34:20 -07:00
Alexey Bataev	12c51f2358	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 12:48:00 -07:00
Alexey Bataev	6e859f3cd4	Revert "[COST] Improve shuffle kind detection if shuffle mask is provided." This reverts commit `9239932221` to fix a compiler crash on mask checks.	2021-04-29 12:40:33 -07:00
Alexey Bataev	9239932221	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 09:42:56 -07:00
Sander de Smalen	51d648c119	Revert "[LV] Calculate max feasible scalable VF." Temporarily reverting this patch due to some unexpected issue found by one of the PPC buildbots. This reverts commit `584e9b6e4b`.	2021-04-29 16:04:37 +01:00
Florian Hahn	a0e1313c23	[VPlan] Add getVPSingleValue helper. As suggested in D99294, this adds a getVPSingleValue helper to use for recipes that are guaranteed to define a single value. This replaces uses of getVPValue() which used to default to I = 0.	2021-04-29 13:37:38 +01:00
Bardia Mahjour	ddb3b26a12	[LV] Consider Loop Unroll Hints When Making Interleave Decisions This patch causes the loop vectorizer to not interleave loops that have nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)). Note that if a particular interleave count is being requested (through llvm.loop.interleave_count), it will still be honoured, regardless of the presence of nounroll hints. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D101374	2021-04-28 17:27:52 -04:00
David Sherwood	00e65f3345	[LoopVectorize][SVE] Fix crash when vectorising FP negation This patch fixes a crash encountered when vectorising the following loop: `void foo(float *dst, float *src, long long n) { for (long long i = 0; i < n; i++) dst[i] = -src[i]; }` using scalable vectors. I've added a test to Transforms/LoopVectorize/AArch64/sve-basic-vec.ll as well as cleaned up the other tests in the same file. Differential Revision: https://reviews.llvm.org/D98054	2021-04-28 15:22:35 +01:00
Tres Popp	f0e848e63d	Silence unused variable warning	2021-04-28 15:46:09 +02:00
Alexey Bataev	8af4723c58	[SLP]Try to vectorize tiny trees with shuffled gathers. If the first tree element is vectorized and the second is a gather, it might still be profitable to vectorize it if the gather node contains fewer scalars to vectorize than the original tree node, since shuffles can be used instead. Differential Revision: https://reviews.llvm.org/D101397	2021-04-28 06:35:31 -07:00
David Sherwood	6998f8ae2d	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-28 13:41:07 +01:00
Sander de Smalen	584e9b6e4b	[LV] Calculate max feasible scalable VF. This patch also refactors the way the feasible max VF is calculated, although this is NFC for fixed-width vectors. After this change scalable VF hints are no longer truncated/clamped to a shorter scalable VF, nor does it drop the 'scalable flag' from the suggested VF to vectorize with a similar VF that is fixed. Instead, the hint is ignored which means the vectorizer is free to find a more suitable VF, using the CostModel to determine the best possible VF. Reviewed By: c-rhodes, fhahn Differential Revision: https://reviews.llvm.org/D98509	2021-04-28 12:30:00 +01:00
Kerry McLaughlin	9cc217ab36	[LoopVectorize] Prevent multiple Phis being generated with in-order reductions When using the -enable-strict-reductions flag where UF>1 we generate multiple Phi nodes, though only one of these is used as an input to the vector.reduce.fadd intrinsics. The unused Phi nodes are removed later by instcombine. This patch changes widenPHIInstruction/fixReduction to only generate one Phi, and adds an additional test for unrolling to strict-fadd.ll Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100570	2021-04-28 11:29:01 +01:00
David Sherwood	6968520c3b	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `4afeda9157`.	2021-04-27 15:46:03 +01:00
David Sherwood	4afeda9157	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-27 15:26:15 +01:00
Alexey Bataev	24590d8d67	[SLP]Improved isGatherShuffledEntry, NFC. Reworked isGatherShuffledEntry function, simplified and moved common code to the lambda (it shall go away when non-power-2 patch will be landed).	2021-04-27 05:59:46 -07:00
Florian Hahn	cb96d802d4	[LV] Hoist code to get vector loop latch (NFC). Address suggestion from D99294.	2021-04-27 13:30:17 +01:00
Florian Hahn	160e729cf0	[VPlan] Use recursive traversal iterator in VPSlotTracker. This patch simplifies VPSlotTracker by using the recursive traversal iterator to traverse all blocks in a VPlan in reverse post-order when numbering VPValues in a plan. This depends on a fix to RPOT (D100169). It also extends the traversal unit tests to check RPOT. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100176	2021-04-27 12:39:06 +01:00
Florian Hahn	7302fe4328	[VPlan] Make blocksOnly work properly with ranges over const pointers. When iterating over const blocks, the base type in the lambdas needs to use const VPBlockBase *, otherwise it cannot be used with input iterators over const VPBlockBase. Also adjust the type of the input iterator range to const &, as it does not take ownership of the input range.	2021-04-26 10:52:35 +01:00
Florian Hahn	4b9be5ac08	[VPlan] Add VPBlockUtils::blocksOnly helper. This patch adds blocksOnly helpers, which take an iterator range over VPBlockBase * or const VPBlockBase * and return an iterator range that only includes BlockTy blocks. The accesses are cast to BlockTy. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D101093	2021-04-25 17:38:09 +01:00
Florian Hahn	89c4dda076	[VPlan] Add GraphTraits impl to traverse through VPRegionBlock. This patch adds a new iterator to traverse through VPRegionBlocks and a GraphTraits specialization using the iterator to traverse through VPRegionBlocks. Because there is already a GraphTraits specialization for VPBlockBase * and co, a new VPBlockRecursiveTraversalWrapper helper is introduced. This allows us to provide a new GraphTraits specialization for that type. Users can use the new recursive traversal by using this wrapper. The graph trait visits both the entry block of a region, as well as all its successors. Exit blocks of a region implicitly have their parent region's successors. This ensures all blocks in a region are visited before any blocks in a successor region when doing a reverse post-order traversal of the graph. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100175	2021-04-23 17:26:47 +01:00
Alexey Bataev	18c61fc498	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip check for undefs trying to find perfect/shuffled tree entries matching, they can be ignored completely improving the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: `void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; }` could previously be transformed in such a way that the predicated store implied by `if (data1[counter] > data2[counter]) data1[counter] = data2[counter];` was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	d4f5f23bbb	[SLP]Replace more `TTI` with `TTIRef`, NFC. To pacify MSVC buildbots.	2021-04-22 07:53:20 -07:00
Alexey Bataev	da2cdfd421	[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC.	2021-04-22 07:49:48 -07:00
Alexey Bataev	e99b98cb1b	[SLP]Improve cost model for the vectorized extractelements. 1. No need to call `areAllUsersVectorized` as later the cost is calculated only if the instruction has one use and gets vectorized. 2. Need to calculate the cost of the dead extractelement more precisely, taking the vector type of the vector operand, not the resulting vector type. Part of D57059. Differential Revision: https://reviews.llvm.org/D99980	2021-04-22 07:40:17 -07:00
David Sherwood	5a229a6702	[LoopVectorize] Don't create unnecessary vscale intrinsic calls In quite a few cases in LoopVectorize.cpp we call createStepForVF with a step value of 0, which leads to unnecessary generation of llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale and createStepForVF to return 0 when attempting to multiply vscale by 0. Differential Revision: https://reviews.llvm.org/D100763	2021-04-22 09:01:52 +01:00
Alexey Bataev	af870e11ae	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 09:08:46 -07:00
Alexey Bataev	b82344a019	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `daf6e18c55` to fix the compiler crash.	2021-04-20 08:29:32 -07:00
Alexey Bataev	daf6e18c55	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 07:46:49 -07:00
Alexey Bataev	cf00cb8bed	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `b232771aca` to fix buildbots.	2021-04-20 07:16:11 -07:00
Alexey Bataev	b232771aca	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 06:55:55 -07:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Alexey Bataev	8030481065	Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." This reverts commit `d6fde91379` to fix compiler crashes.	2021-04-19 14:10:04 -07:00
Alexey Bataev	d6fde91379	[SLP]Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but does not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Differential Revision: https://reviews.llvm.org/D100495	2021-04-19 13:29:30 -07:00
Cullen Rhodes	f0bc2782f2	[TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100377	2021-04-19 11:01:34 +00:00
Florian Hahn	49999d4364	[VPlan] Replace a few unnecessary includes with forward decls.	2021-04-15 20:08:31 +01:00
Florian Hahn	6adebe3fd2	[VPlan] Add VPRecipeBase::mayHaveSideEffects. Add an initial version of a helper to determine whether a recipe may have side-effects. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100259	2021-04-15 11:49:40 +01:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
David Sherwood	7120f89f7d	[NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts There are a few places in LoopVectorize.cpp where we have been too cautious in adding VF.isScalable() asserts and it can be confusing. It also makes it more difficult to see the genuine places where work needs doing to improve scalable vectorization support. This patch changes getMemInstScalarizationCost to return an invalid cost instead of firing an assert for scalable vectors. Also, vectorizeInterleaveGroup had multiple asserts all for the same thing. I have removed all but one assert near the start of the function, and added a new assert that we aren't dealing with masks for scalable vectors. Differential Revision: https://reviews.llvm.org/D99727	2021-04-15 09:41:03 +01:00
Simon Pilgrim	b49c41afba	[SLP] createOp - fix null dereference warning. NFCI. Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.	2021-04-14 15:24:41 +01:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Alexey Bataev	ab124bbe2a	[SLP]Fix PR49898: Infinite loop in SLP vectorizer. We should not retry finding the consecutive store chain if it was already tried before. Differential Revision: https://reviews.llvm.org/D100131	2021-04-08 14:18:06 -07:00
Florian Hahn	e4de3cdf3d	[LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC). Instead of passing the start value and the defined value to widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be used to get both (and more in future patches).	2021-04-08 14:25:10 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. No need to lookup through and/or try to vectorize operands of the CmpInst instructions during attempts to find/vectorize min/max reductions. Compiler implements postanalysis of the CmpInsts so we can skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Florian Hahn	a6b06b785c	[VPlan] Print VPValue operands for VPWidenPHI if possible. For VPWidenPHIRecipes that model all incoming values as VPValue operands, print those operands instead of printing the original PHI. D99294 updates recipes of reduction PHIs to use the VPValue for the incoming value from the loop backedge, making use of this new printing.	2021-04-06 12:11:21 +01:00
Alexey Bataev	00a84f9a7f	[SLP]Improve vectorization of the CmpInst instructions. During vectorization it is better to postpone the vectorization of the CmpInst instructions till the end of the basic block. Otherwise we may vectorize them too early and may miss some vectorization patterns, like reductions. Reworked part of D57059 Differential Revision: https://reviews.llvm.org/D99796	2021-04-05 06:22:51 -07:00
Fangrui Song	8e5f3d04f2	[SLPVectorizer] Fix divide-by-zero after D99719 Will add a test case later.	2021-04-02 11:13:51 -07:00
Florian Hahn	8867fc69f0	[LV] Hoist mapping of IR operands to VPValues (NFC). This patch moves mapping of IR operands to VPValues out of tryToCreateWidenRecipe. This allows using existing VPValue operands when widening recipes directly, which will be introduced in future patches.	2021-04-02 17:57:20 +01:00
Alexey Bataev	5fcb07a070	[SLP]Fix a bug in min/max reduction, number of condition uses. The ultimate reduction node may have multiple uses, but if the ultimate reduction is min/max reduction and based on SelectInstruction, the condition of this select instruction must have only single use. Differential Revision: https://reviews.llvm.org/D99753	2021-04-02 07:09:44 -07:00
Florian Hahn	0f3230390b	[SLP] Better estimate cost of no-op extracts on target vectors. The motivation for this patch is to better estimate the cost of extractelement instructions in cases where they are going to be free, because the source vector can be used directly. A simple example is %v1.lane.0 = extractelement <2 x double> %v.1, i32 0 %v1.lane.1 = extractelement <2 x double> %v.1, i32 1 %a.lane.0 = fmul double %v1.lane.0, %x %a.lane.1 = fmul double %v1.lane.1, %y Currently we only consider the extracts free if there are no other users. In this particular case, on AArch64 which can fit <2 x double> in a vector register, the extracts should be free, independently of other users, because the source vector of the extracts will be in a vector register directly, so it should be free to use the vector directly. The SLP vectorized version of noop_extracts_9_lanes is 30%-50% faster on certain AArch64 CPUs. It looks like this does not impact any code in SPEC2000/SPEC2006/MultiSource both on X86 and AArch64 with -O3 -flto. This originally regressed after D80773, so if there's a better alternative to explore, I'd be more than happy to do that. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D99719	2021-04-02 10:40:12 +01:00
Alexey Bataev	c03696da5e	[SLP]Improve and fix getVectorElementSize. 1. Need to cleanup InstrElementSize map for each new tree, otherwise might use sizes from the previous run of the vectorization attempt. 2. No need to include into analysis the instructions from the different basic blocks to save compile time. Differential Revision: https://reviews.llvm.org/D99677	2021-04-01 06:51:26 -07:00
Alexey Bataev	ce98a0556a	[SLP]Remove `else` after `return`, NFC.	2021-04-01 05:33:01 -07:00
Huihui Zhang	fe5c4a06a4	[LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism. Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM test consecutive-ptr-uniforms.ll . Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99549	2021-03-31 11:21:07 -07:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalalable-call.ll This marks FSIN and other operations to EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
David Sherwood	a08c7736a7	[LoopVectorize] Add support for scalable vectorization of induction variables This patch adds support for the vectorization of induction variables when using scalable vectors, which required the following changes: 1. Removed assert from InnerLoopVectorizer::getStepVector. 2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert. 3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane. 4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors. Various tests to support different cases have been added here: Transforms/LoopVectorize/AArch64/sve-inductions.ll Differential Revision: https://reviews.llvm.org/D98715	2021-03-30 11:13:31 +01:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Sanjay Patel	da381cf7ce	[SLP] allow matching integer min/max intrinsics as reduction ops This is a 2nd try of: `3c8473ba53` which was reverted at: `a26312f9d4` because of crashing. This version includes extra code and tests to avoid the known crashing examples as discussed in PR49730. Original commit message: As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-29 09:38:18 -04:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Florian Hahn	8c6c357897	[LV] Mark a few more cost-model members as const (NFC).	2021-03-28 14:59:48 +01:00
Florian Hahn	d2855eba81	[LV] Fix formatting from `2f9d68c3f1`.	2021-03-27 21:29:56 +00:00
Florian Hahn	2f9d68c3f1	[LV] Mark some methods as const (NFC). Mark a few methods as const, as they do not modify any state.	2021-03-27 21:27:53 +00:00
Sanjay Patel	b0797e0c12	[SLP] use dyn_cast instead of isa + cast; NFC	2021-03-26 13:52:31 -04:00
Sanjay Patel	a26312f9d4	Revert "[SLP] allow matching integer min/max intrinsics as reduction ops" This reverts commit `3c8473ba53` and includes test diffs to maintain testing status. There's at least 1 place that was not updated with `7202f47508` , so we can crash mismatching select and intrinsics as shown in PR49730.	2021-03-26 09:59:14 -04:00
David Sherwood	c39460cc4f	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `240aa96cf2`.	2021-03-26 11:36:53 +00:00
David Sherwood	240aa96cf2	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. Differential Revision: https://reviews.llvm.org/D98512	2021-03-26 11:27:12 +00:00
Yevgeny Rouban	f7ef26ef0b	[SLP] Fix crash in reduction for integer min/max The SCEV commit `b46c085d2b` [NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions seems to reveal a new crash in SLPVectorizer. SLP crashes expecting a SelectInst as an externally used value but umin() call is found. The patch relaxes the assumption to make the IR flag propagation safe. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D99328	2021-03-25 21:44:21 +07:00
Alexey Bataev	568c874117	[SLP]Improve and simplify extendSchedulingRegion. We do not need to scan further if the upper end or lower end of the basic block is reached already and the instruction is not found. It means that the instruction is definitely in the lower part of basic block or in the upper block relatively. This should improve compile time for the very big basic blocks. Differential Revision: https://reviews.llvm.org/D99266	2021-03-25 05:31:58 -07:00
Florian Hahn	9d45579279	[LV] Factor out phi type access to variable (NFC). A slight simplification of the code to reduce future diffs.	2021-03-24 19:25:22 +00:00
Florian Hahn	8d1342f79d	[LV] Remove redundant access to Legal::getReductionVars() (NFC). The reduction descriptor is retrieved earlier and stored in a variable RdxDesc already.	2021-03-24 19:15:14 +00:00
Sander de Smalen	55d18b3cc2	[TTI] Return a TypeSize from getRegisterBitWidth. This patch changes the interface to take a RegisterKind, to indicate whether the register bitwidth of a scalar register, fixed-width vector register, or scalable vector register must be returned. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D98874	2021-03-24 14:45:13 +00:00
Florian Hahn	cd0c00c9fe	[LV] Move exact FP math check out of Requirements. We know if the loop contains FP instructions preventing vectorization after we are done with legality checks. This patch updates the code to check for un-vectorizable FP operations earlier, to avoid unnecessarily running the cost model and picking a vectorization factor. It also makes the code more direct and moves the check to a position where similar checks are done. I might be missing something, but I don't see any reason to handle this check differently to other, similar checks. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98633	2021-03-24 11:01:44 +00:00
Alexey Bataev	99203f2004	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 14:25:36 -07:00
Alexey Bataev	f1b47ad278	Revert "[Analysis]Add getPointersDiff function to improve compile time." This reverts commit `065a14a12d` to investigate and fix crash in SLP vectorizer.	2021-03-23 13:17:54 -07:00
Alexey Bataev	065a14a12d	[Analysis]Add getPointersDiff function to improve compile time. Added getPointersDiff function to LoopAccessAnalysis and used it instead of direct calculation of the distance between pointers and/or isConsecutiveAccess function in SLP vectorizer to improve compile time and detection of consecutive store chains. Part of D57059 Differential Revision: https://reviews.llvm.org/D98967	2021-03-23 12:58:42 -07:00
Sanjay Patel	3c8473ba53	[SLP] allow matching integer min/max intrinsics as reduction ops As noted in D98152, we need to patch SLP to avoid regressions when we start canonicalizing to integer min/max intrinsics. Most of the real work to make this possible was in: `7202f47508` Differential Revision: https://reviews.llvm.org/D98981	2021-03-23 08:56:44 -04:00
David Sherwood	d70251163f	[LoopVectorize][NFC] Refactor code to use IRBuilder::CreateStepVector In places where we create a ConstantVector whose elements are a linear sequence of the form <start, start + 1, start + 2, ...> I've changed the code to make use of CreateStepVector, which creates a vector with the sequence <0, 1, 2, ...>, and a vector addition operation. This patch is a non-functional change, since the output from the vectoriser remains unchanged for fixed length vectors and there are existing asserts that still fire when attempting to use scalable vectors for vectorising induction variables. In a later patch we will enable support for scalable vectors in InnerLoopVectorizer::getStepVector(), which relies upon the new stepvector intrinsic in IRBuilder::CreateStepVector. Differential Revision: https://reviews.llvm.org/D97861	2021-03-23 11:29:05 +00:00
Florian Hahn	f759d512c8	[VPlan] Include name when printing after `93a9d2de8f`. The name is included when printing in DOT mode. Also print it in non-DOT mode after `93a9d2de8f`. This will become more important to distinguish different plans once VPlans are gradually refined.	2021-03-23 09:50:14 +00:00
Bjorn Pettersson	688cdddafb	[SLP] Honor min/max regsize and min/max VF in vectorizeStores Make sure we use PowerOf2Floor instead of PowerOf2Ceil when calculating max number of elements that fits inside a vector register (otherwise we could end up creating vectors larger than the maximum vector register size). Also make sure we honor the min/max VF (as given by TTI or cmd line parameters) when doing vectorizeStores. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D97691	2021-03-22 17:29:35 +01:00
Andrei Elovikov	92205cb27f	[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)" Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98897	2021-03-19 10:50:12 -07:00
Andrei Elovikov	93a9d2de8f	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-19 10:50:12 -07:00
Mehdi Amini	3614df3537	Revert "[VPlan] Add plain text (not DOT's digraph) dumps" This reverts commit `6b053c9867`. The build is broken: ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const >>> referenced by LoopVectorize.cpp >>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a	2021-03-18 19:20:39 +00:00
Andrei Elovikov	6b053c9867	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-18 11:33:39 -07:00
Alexey Bataev	b3ced9852c	[SLP]Fix crash on extending scheduling region. If SLP vectorizer tries to extend the scheduling region and runs out of the budget too early, but still extends the region to the new ending instructions (i.e., it was able to extend the region for the first instruction in the bundle, but not for the second), the compiler needs to recalculate dependencies in full, just like if the extending was successful. Without it, the schedule data chunks may end up with the wrong number of (unscheduled) dependencies and it may end up with the incorrect function, where the vectorized instruction does not dominate the extractelement instruction. Differential Revision: https://reviews.llvm.org/D98531	2021-03-18 06:11:08 -07:00
David Green	e2935dcfc4	[TTI] Add a Mask to getShuffleCost This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask can be provided a more accurate cost can be provided by the backend. For example VREV costs could be returned by the ARM backend. This should be an NFC until then, laying the groundwork for that to be added. Differential Revision: https://reviews.llvm.org/D98206	2021-03-17 17:46:26 +00:00
LemonBoy	4f024938e4	[LoopVectorize] Refine hasIrregularType predicate The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector. The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32. The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits). Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D97465	2021-03-17 17:03:47 +01:00
David Green	3c25c40d51	[LV] Account for the cost of predication of scalarized load/store This adds the cost of an i1 extract and a branch to the cost in getMemInstScalarizationCost when the instruction is predicated. These predicated loads/store would generate blocks of something like: %c1 = extractelement <4 x i1> %C, i32 1 br i1 %c1, label %if, label %else if: %sa = extractelement <4 x i32> %a, i32 1 %sb = getelementptr inbounds float, float* %pg, i32 %sa %sv = extractelement <4 x float> %x, i32 1 store float %sa, float* %sb, align 4 else: So this increases the cost by the extract and branch. This is probably still too low in many cases due to the cost of all that branching, but there is already an existing hack increasing the cost using useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is a load or there are more than one store. This patch improves the cost for when there is only a single store, and hopefully at some point in the future the hack can be removed. Differential Revision: https://reviews.llvm.org/D98243	2021-03-17 10:57:50 +00:00
Bu Le	9abe500473	[SLP] Fix the trunc instruction insertion problem Current SLP pass has this piece of code that inserts a trunc instruction after the vectorized instruction. In the case that the vectorized instruction is a phi node and not the last phi node in the BB, the trunc instruction will be inserted between two phi nodes, which will trigger a verifier problem in the debug version or an unpredictable error in another pass. This patch changes the algorithm to 'if the last vectorized instruction is a phi, insert it after the last phi node in current BB' to fix this problem.	2021-03-17 13:51:08 +03:00
Sanjay Patel	7202f47508	[SLP] separate min/max matching from its instruction-level implementation; NFC The motivation is to handle integer min/max reductions independently of whether they are in the current cmp+sel form or the planned intrinsic form. We assumed that min/max included a select instruction, but we can decouple that implementation detail by checking the instructions themselves rather than relying on the recurrence (reduction) type.	2021-03-16 17:16:11 -04:00
Florian Hahn	f586de8459	[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC) Instead of maintaining a separate map from predicated instructions to recipes, we can instead directly look at the VP operands. If the operand comes from a predicated instruction, the operand will be a VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.	2021-03-16 17:40:35 +00:00
Sanjay Patel	40fdb43d30	[SLP] improve readability in reduction logic; NFC We had 2 different and ambiguously-named 'I' variables.	2021-03-16 07:35:13 -04:00
Caroline Concatto	3c03635d53	[SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse This patch adds support for reverse loop vectorization. It is possible to vectorize the following loop: ``` for (int i = n-1; i >= 0; --i) a[i] = b[i] + 1.0; ``` with fixed or scalable vector. The loop-vectorizer will use 'reverse' on the loads/stores to make sure the lanes themselves are also handled in the right order. This patch adds support for scalable vector on IRBuilder interface to create a reverse vector. The IR function CreateVectorReverse lowers to experimental.vector.reverse for scalable vectors and keeps the original behavior for fixed vectors using shuffle reverse. Differential Revision: https://reviews.llvm.org/D95363	2021-03-16 07:51:59 +00:00
Florian Hahn	fb3ca70761	[LV] Account IV recipes being uniform in VPTransformState::get(). This patch fixes a crash when trying to get a scalar value using VPTransformState::get() for uniform induction values or truncated induction values. IVs and truncated IVs can be uniform and the updated code accounts for that, fixing the crash. This should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981	2021-03-12 13:29:06 +00:00
Valery N Dmitriev	73f94969b2	[SLP] Fix crash when matching associative reduction for integer min/max. Associative reduction matcher in SLP begins with select instruction but when it reached call to llvm.umax (or alike) via def-use chain the latter also matched as UMax kind. The routine's later code assumes matched instruction to be a select and thus it merely died on the first encountered cast that did not fit. Differential Revision: https://reviews.llvm.org/D98432	2021-03-11 11:52:57 -08:00
Mauri Mustonen	0de8aeae72	[VPlan] Support to widen select instructions in VPlan native path Add support to widen select instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously select instructions got handled by the wrong recipe and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97136	2021-03-10 20:59:53 +00:00
Sanjay Patel	23fd647cc6	[SLP] remove dead null check; NFC We cast<> to Instruction (not dyn_cast<>), so we already required/assumed that Cmp is not null.	2021-03-09 17:43:07 -05:00
Mauri Mustonen	494b5ba364	[VPlan] Support to widen call instructions in VPlan native path Add support to widen call instructions in VPlan native path by using a correct recipe when such instructions are encountered. This is already used by inner loop vectorizer. Previously call instructions got handled by wrong recipes and resulted in unreachable instruction errors like this one: https://bugs.llvm.org/show_bug.cgi?id=48139. Patch by Mauri Mustonen <mauri.mustonen@tuni.fi> Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97278	2021-03-06 21:59:52 +00:00
David Sherwood	fec0a0adac	[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector There are certain loops like this below: for (int i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; } that can only be vectorised if we are able to extract the last lane of the vectorised form of 'a[i]'. For fixed width vectors this already works since we know at compile time what the final lane is, however for scalable vectors this is a different story. This patch adds support for extracting the last lane from a scalable vector using a runtime determined lane value. I have added support to VPIteration for runtime-determined lanes that still permit the caching of values. I did this by introducing a new class called VPLane, which describes the lane we're dealing with and provides interfaces to get both the compile-time known lane and the runtime determined value. Whilst doing this work I couldn't find any explicit tests for extracting the last lane values of fixed width vectors so I added tests for both scalable and fixed width vectors. Differential Revision: https://reviews.llvm.org/D95139	2021-03-05 09:57:56 +00:00
Sanjay Patel	1bee549737	[LoopVectorize] propagate fast-math-flags from induction instructions This code assumed that FP math was only permissible if it was fully "fast", so it hard-coded "fast" when creating new instructions. The underlying code already allows matching recurrences/reductions that are only "reassoc", so this change should prevent the potential miscompile seen in the test diffs (we created "fast" ops even though none existed in the original code). I don't know if we need to create the temporary IRBuilder objects used here, so that could be follow-up clean-up. There's an open question about whether we should require "nsz" in addition to "reassoc" here. InstCombine uses that combo for its reassociative folds, but I think codegen is not as strict.	2021-03-04 17:21:32 -05:00
Sanjay Patel	36a489d194	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC Similar to `b3a33553ae`, but this shows a TODO and a potential miscompile is already present. We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification may be possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers. The new test shows that there is an existing bug somewhere in the callers. We assumed that the original code was fully 'fast' and so we produced IR with 'fast' even though it was just 'reassoc'.	2021-03-04 10:40:26 -05:00
Sanjay Patel	b3a33553ae	[Analysis][LoopVectorize] rename "Unsafe" variables/methods; NFC We are tracking an FP instruction that does not have FMF (reassoc) properties, so calling that "Unsafe" seems opposite of the common reading. I also removed one getter method by rolling the null check into the access. Further simplification seems possible. The motivation is to clean up the interactions between FMF and function-level attributes in these classes and their callers.	2021-03-04 08:53:04 -05:00
Andrei Elovikov	b24afec8ae	[NFCI][VPlan] Modify Recipes' print methods to honor Indent parameter Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D97787	2021-03-02 15:32:10 -08:00
Alexey Bataev	a054e94e9e	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D94992	2021-03-02 06:39:47 -08:00
Florian Hahn	a6c81d3366	[VPlan] Remove recipes from back to front. Update the deletion order when destroying VPBasicBlocks. This ensures recipes that depend on earlier ones in the block are removed first. Otherwise this may cause issues when recipes have remaining users later in the block.	2021-03-01 16:06:30 +00:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block to allow better estimating the cost and un-linked from the existing IR. After deciding to vectorize, the checks are moved back. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks it instead optimistically creates the runtime checks early and discards them later if decided to not vectorize. This has the advantage that the cost-modeling decisions can be kept together and can be done up-front and thus preserving the general code structure. I think delaying (part) of the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
Sander de Smalen	5e19208d96	[InstructionCost] NFC: Fix up missing cases in LoopVectorize and CodeGenPrep. This fixes the types of a few more cost variables to be of type InstructionCost.	2021-02-24 14:30:03 +00:00
Florian Hahn	6240f436dd	Recommit "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts the revert commit `437f0bbcd5`. It adds a new toVPRecipeResult, which forces VPRecipeOrVPValueTy to be constructed with a VPRecipeBase *. This should address ambiguous constructor issues for recipe sub-types that also inherit from VPValue.	2021-02-24 10:36:02 +00:00
Andrei Elovikov	3605b873f6	[NFC][VPlan] Use VPUser to store block's predicate Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96529	2021-02-23 11:08:27 -08:00
Florian Hahn	de40423c85	[LV] Ensure fixNonInductionPHIs uses a valid insertion point. In some cases, Builder's insertion point may be invalidated before using it in VPTransformState::get. Make sure the insertion point is up-to-date. This should fix various sanitizer errors, like https://lab.llvm.org/buildbot/#/builders/5/builds/4933/steps/9/logs/stdio	2021-02-23 18:51:05 +00:00
Florian Hahn	437f0bbcd5	Revert "[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends." This reverts commit `4efa097eb4`, because some of the compilers used for some bots do not support automatic conversions to PointerUnion.	2021-02-23 16:57:21 +00:00
Florian Hahn	4efa097eb4	[LV] Allow tryToCreateWidenRecipe to return a VPValue, use for blends. Generalize the return value of tryToCreateWidenRecipe to return either a newly created recipe or an existing VPValue. Use this to avoid creating unnecessary VPBlendRecipes. Fixes PR44800.	2021-02-23 16:52:03 +00:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Alexey Bataev	9a4dd4de9d	[SLP]No need to mark scatter load pointer as scalar as it gets vectorized. Pointer operand of scatter loads does not remain scalar in the tree (it gets vectorized) and thus must not be marked as the scalar that remains scalar in vectorized form. Differential Revision: https://reviews.llvm.org/D96818	2021-02-22 11:58:28 -08:00
Florian Hahn	c7ee57f1dc	[LV] Directly use incoming value for single VPBlendRecipes. VPBlendRecipes with single incoming (value, mask) pair are no-ops. Use the incoming value directly.	2021-02-22 16:10:08 +00:00
Florian Hahn	c11fd0df64	[VPlan] Skip VPWidenPHIRecipe in VPInterleavedAccessInfo. Update unit tests that did not expect VPWidenPHIRecipes after `15a74b64df`.	2021-02-22 10:35:09 +00:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Benjamin Kramer	59f442e6bb	[LV] Fold single-use variable into assert. NFC.	2021-02-19 18:11:39 +01:00
Florian Hahn	edc92a1c42	[LV] Remove VPCallback. Now that all state for generated instructions is managed directly in VPTransformState, VPCallBack is no longer needed. This patch updates the last use of `getOrCreateScalarValue` to instead manage the value directly in VPTransformState and removes VPCallback. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95383	2021-02-19 12:50:41 +00:00
Joseph Huber	c3a3d20093	[LV] Add analysis remark for mixed precision conversions Floating point conversions inside vectorized loops have performance implications but are very subtle. The user could specify a floating point constant, or call a function without realizing that it will force a change in the vector width. An example of this behaviour is seen in https://godbolt.org/z/M3nT6c . The vectorizer should indicate when this happens because it is most likely unintended behaviour. This patch adds a simple check for this behaviour by following floating point stores in the original loop and checking if a floating point conversion operation occurs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D95539	2021-02-17 21:37:08 -05:00
Florian Hahn	f64c626069	[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC). The member is not needed any longer after recent changes.	2021-02-16 13:53:06 +00:00
Kerry McLaughlin	ba1e150d03	[SVE] Add support for scalable vectorization of loops with int/fast FP reductions This patch enables scalable vectorization of loops with integer/fast reductions, e.g: ``` unsigned sum = 0; for (int i = 0; i < n; ++i) { sum += a[i]; } ``` A new TTI interface, isLegalToVectorizeReduction, has been added to prevent reductions which are not supported for scalable types from vectorizing. If the reduction is not supported for a given scalable VF, computeFeasibleMaxVF will fall back to using fixed-width vectorization. Reviewed By: david-arm, fhahn, dmgreen Differential Revision: https://reviews.llvm.org/D95245	2021-02-16 13:50:06 +00:00
Florian Hahn	54a14c264a	[VPlan] Manage scalarized values using VPValues. This patch updates codegen to use VPValues to manage the generated scalarized instructions. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92285	2021-02-16 09:04:10 +00:00
Juneyoung Lee	ed253ef772	[LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask This patch fixes pr48832 by correctly generating the mask when a poison value is involved. Consider this CFG (which is a part of the input): ``` for.body: ; preds = %for.cond br i1 true, label %cond.false, label %land.rhs land.rhs: ; preds = %for.body br i1 poison, label %cond.end, label %cond.false cond.false: ; preds = %for.body, %land.rhs br label %cond.end cond.end: ; preds = %land.rhs, %cond.false %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ] ``` The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead. The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation. SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026). Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D95217	2021-02-14 21:12:34 +09:00
Kerry McLaughlin	fea06efe7c	[SVE][LoopVectorize] Support for vectorization of loops with function calls Changes `getScalarizationOverhead` to return an invalid cost for scalable VFs and adds some simple tests for loops containing a function for which there is a vectorized variant available. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96356	2021-02-12 13:47:43 +00:00
Florian Hahn	85fe5c9345	[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC). The individual recipes have been updated to manage their operands using VPUser a while back. Now that the transition is done, we can instead make VPRecipeBase a VPUser and get rid of the toVPUser helper.	2021-02-12 13:06:58 +00:00
David Sherwood	01b87444cb	[NFC][Analysis] Change struct VecDesc to use ElementCount This patch changes the VecDesc struct to use ElementCount instead of an unsigned VF value, in preparation for future work that adds support for vectorized versions of math functions using scalable vectors. Since all I'm doing in this patch is switching the type I believe it's a non-functional change. I changed getWidestVF to now return both the widest fixed-width and scalable VF values, but currently the widest scalable value will be zero. Differential Revision: https://reviews.llvm.org/D96011	2021-02-12 11:07:58 +00:00
Sander de Smalen	703130fb01	[TTI] Change TargetTransformInfo::getMinimumVF to return ElementCount This will be needed in the loop-vectorizer where the minimum VF requested may be a scalable VF. getMinimumVF now takes an additional operand 'IsScalableVF' that indicates whether a scalable VF is required. Reviewed By: kparzysz, rampitec Differential Revision: https://reviews.llvm.org/D96020	2021-02-11 09:08:48 +00:00
Sander de Smalen	be9bbb57f4	[LoopVectorize] NFC: Change selectVectorizationFactor to work on ElementCount. This patch is NFC and changes occurrences of `unsigned Width` and `unsigned i` to work on type ElementCount instead. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96019	2021-02-11 08:47:59 +00:00
Sander de Smalen	9db6e97a86	[LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount. This patch is NFC and changes occurrences of `unsigned MaxVectorSize` to work on type ElementCount. This patch is a preparatory patch with the ultimate goal of making `computeMaxVF()` return both a max fixed VF and a max scalable VF, so that `selectVectorizationFactor()` can pick the most cost-effective vectorization factor. Reviewed By: kmclaughlin Differential Revision: https://reviews.llvm.org/D96018	2021-02-10 08:52:10 +00:00
Florian Hahn	fd8afa41eb	[VPlan] Use VPUser to manage CondBit VP blocks keep track of a condition, which is a VPValue. This patch updates VPBlockBase to manage the value using VPUser, so replaceAllUsesWith properly updates the condition bit as well. This is required to enable VP2VP transformations and it helps with simplifying some of the code required to manage condition bits. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95382	2021-02-09 21:53:50 +00:00
Kazu Hirata	302313a264	[Transforms] Use range-based for loops (NFC)	2021-02-08 22:33:53 -08:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
Florian Hahn	3bb6dc0b26	[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC) This patch updates some places where VectorLoopValueMap is accessed directly to instead go through VPTransformState. As we move towards managing created values exclusively in VPTransformState, this ensures the use always can fetch the correct value. This is in preparation for D92285, which switches to managing scalarized values through VPValues. In the future, the various fix* functions should be moved directly into the VPlan codegen stage. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95757	2021-02-07 18:28:21 +00:00
Adrian Kuegel	7fe41ac3df	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `3e5ce49e53`. Tests started failing on PPC, for example: http://lab.llvm.org:8011/#/builders/105/builds/5569	2021-02-05 12:51:03 +01:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
Craig Topper	11ef356d9e	[TargetLowering] Use Align in allowsMisalignedMemoryAccesses. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D96097	2021-02-04 19:22:06 -08:00
Philip Reames	3e5ce49e53	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way. Differential Revision: https://reviews.llvm.org/D94892	2021-02-04 17:28:30 -08:00
Florian Hahn	daaa0e3501	[VPlan] Manage induction value creation using VPValues. This patch updates the induction value creation to use VPValues of recipes to map the created values. This should bring us one step closer to being able to optimize induction recipes directly in VPlan. Currently widenIntOrFpInduction also generates vector values for a cast of the induction, if it exists. Make this explicit by adding the cast instruction to the values defined by the recipe. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92284	2021-02-03 17:45:03 +00:00
David Sherwood	d4626eb0bd	[VPlan][NFC] Introduce constructors for VPIteration This patch adds constructors to VPIteration as a cleaner way of initialising the struct and replaces existing constructions of the form: {Part, Lane} with VPIteration(Part, Lane) I have also added a default constructor, which is used by VPlan.cpp when deciding whether to replicate a block or not. This refactoring will be required in a later patch that adds more members and functions to VPIteration. Differential Revision: https://reviews.llvm.org/D95676	2021-02-03 08:52:27 +00:00
David Sherwood	d4d4ceeb8f	[SVE][LoopVectorize] Add masked load/store and gather/scatter support for SVE This patch updates IRBuilder::CreateMaskedGather/Scatter to work with ScalableVectorType and adds isLegalMaskedGather/Scatter functions to AArch64TargetTransformInfo. In addition I've fixed up isLegalMaskedLoad/Store to return true for supported scalar types, since this is what the vectorizer asks for. In LoopVectorize.cpp I've changed LoopVectorizationCostModel::getInterleaveGroupCost to return an invalid cost for scalable vectors, since currently this relies upon using shuffle vector for reversing vectors. In addition, in LoopVectorizationCostModel::setCostBasedWideningDecision I have assumed that the cost of scalarising memory ops is infinitely expensive. I have added some simple masked load/store and gather/scatter tests, including cases where we use gathers and scatters for conditional invariant loads and stores. Differential Revision: https://reviews.llvm.org/D95350	2021-02-02 09:52:39 +00:00
Gil Rapaport	d475030dc2	[SCEV] Apply loop guards to divisibility tests Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops. Differential Revision: https://reviews.llvm.org/D95521	2021-02-02 08:09:39 +02:00
Sanjay Patel	bbed5f2f8a	[LoopVectorize] improve IR fast-math-flags propagation in reductions This is another step (see D95452) towards correcting fast-math-flags bugs in vector reductions. There are multiple bugs visible in the test diffs, and this is still not working as it should. We still use function attributes (rather than FMF) to drive part of the logic, but we are not checking for the correct FP function attributes. Note that FMF may not be propagated optimally on selects (example in https://llvm.org/PR35607 ). That's why I'm proposing to union the FMF of a fcmp+select pair and avoid regressions on existing vectorizer tests. Differential Revision: https://reviews.llvm.org/D95690	2021-02-01 16:21:36 -05:00
Cullen Rhodes	8cda227432	[LV] Fix crash when computing max VF too early D90687 introduced a crash: llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed. when compiling the following C code: typedef struct { char a; } b; b *c; int d, e; int f() { int g = 0; for (; d; d++) { e = 0; for (; e < c[d].a; e++) g++; } return g; } with: clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o - This occurred since prior to D90687 computeFeasibleMaxVF would only be called in computeMaxVF when a scalar epilogue was allowed, but now it's always called. This causes the assert above since computeFeasibleMaxVF collects all viable VFs larger than the default MaxVF, and for each VF calculates the register usage which results in analysis being done the assert above guards against. This can occur in computeFeasibleMaxVF if TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in the hexagon backend to always return true. Reported by @iajbar. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94869	2021-02-01 12:14:59 +00:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Florian Hahn	28410d17f5	[LoopUtils] Pass SCEVExpander instead SE to addRuntimeChecks. This gives the user control over which expander to use, which in turn allows the user to decide what to do with the expanded instructions. Used in D75980. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D94295	2021-01-27 17:36:19 +00:00
Florian Hahn	76afbf60ed	[VPlan] Replace uses with new value in VPInstructionsToVPRecipe (NFC). Now that VPRecipeBase inherits from VPDef, we can always use the new VPValue for replacement, if the recipe defines one. Given the recipes that are supported at the moment, all new recipes must have either 0 or 1 defined values.	2021-01-25 19:38:08 +00:00
Florian Hahn	3201274dea	[VPlan] Handle scalarized values in VPTransformState. This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282	2021-01-25 14:21:56 +00:00
Sander de Smalen	171d12489f	[SLPVectorizer] NFC: Migrate getVectorCallCosts to use InstructionCost. This change also changes getReductionCost to return InstructionCost, and it simplifies two expressions by removing a redundant 'isValid' check.	2021-01-25 12:27:01 +00:00
Sanjay Patel	77adbe6a8c	[SLP] fix fast-math requirements for fmin/fmax reductions `a6f0221276` enabled intersection of FMF on reduction instructions, so it is safe to ease the check here. There is still some room to improve here - it looks like we have nearly duplicate flags propagation logic inside of the LoopUtils helper but it is limited to targets that do not form reduction intrinsics (they form the shuffle expansion).	2021-01-24 08:55:56 -05:00
Nikita Popov	c83cff45c7	[IR] Add NoAliasScopeDeclInst (NFC) Add an intrinsic type class to represent the llvm.experimental.noalias.scope.decl intrinsic, to make code working with it a bit nicer by hiding the metadata extraction from view.	2021-01-23 22:40:32 +01:00
Kazu Hirata	1238378f18	[llvm] Use pop_back_val (NFC)	2021-01-23 10:56:33 -08:00
Sanjay Patel	a6f0221276	[SLP] fix fast-math-flag propagation on FP reductions As shown in the test diffs, we could miscompile by propagating flags that did not exist in the original code. The flags required for fmin/fmax reductions will be fixed in a follow-up patch.	2021-01-23 11:17:20 -05:00
Anton Rapetov	a4914dc1f2	[SLP] do not traverse constant uses Walking the use list of a Constant (particularly, ConstantData) is not scalable, since a given constant may be used by many instructions in many functions in many modules. Differential Revision: https://reviews.llvm.org/D94713	2021-01-22 08:14:09 -05:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
David Green	39db5753f9	[LV][ARM] Inloop reduction cost modelling This adds cost modelling for the inloop vectorization added in `745bf6cf44`. Up until now they have been modelled as the original underlying instruction, usually an add. This happens to work OK for MVE with instructions that are reducing into the same type as they are working on. But MVE's instructions can perform the equivalent of an extended MLA as a single instruction: %sa = sext <16 x i8> A to <16 x i32> %sb = sext <16 x i8> B to <16 x i32> %m = mul <16 x i32> %sa, %sb %r = vecreduce.add(%m) -> R = VMLADAV A, B There are other instructions for performing add reductions of v4i32/v8i16/v16i8 into i32 (VADDV), for doing the same with v4i32->i64 (VADDLV) and for performing a v4i32/v8i16 MLA into an i64 (VMLALDAV). The i64 variants are particularly interesting as there are no native i64 add/mul instructions, leading to the i64 add and mul naturally getting very high costs. Also worth mentioning, under NEON there is the concept of a sdot/udot instruction which performs a partial reduction from a v16i8 to a v4i32. They extend and mul/sum the first four elements from the inputs into the first element of the output, repeating for each of the four output lanes. They could possibly be represented in the same way as above in llvm, so long as a vecreduce.add could perform a partial reduction. The vectorizer would then produce a combination of in-loop and outer-loop reductions to efficiently use the sdot and udot instructions. Although this patch does not do that yet, it does suggest that separating the input reduction type from the produced result type is a useful concept to model. It also shows that an MLA reduction as a single instruction is fairly common. This patch attempts to improve the cost modelling of in-loop reductions by: - Adding some pattern matching in the loop vectorizer cost model to match reduction patterns that are optionally extended and/or MLA patterns. 
This marks the cost of the reduction instruction correctly and the sext/zext/mul leading up to it as free, which is otherwise difficult to tell and may get a very high cost. (In the long run this can hopefully be replaced by vplan producing a single node and costing it correctly, but that is not yet something that vplan can do.) - getExtendedAddReductionCost is added to query the cost of these extended reduction patterns. - Expanded the ARM costs to account for these expanded sizes, which is a fairly simple change in itself. - Some minor alterations to allow in-loop reductions larger than the highest vector width and i64 MVE reductions. - An extra InLoopReductionImmediateChains map was added to the vectorizer for it to efficiently detect which instructions are reductions in the cost model. - The tests have some updates to show what I believe is optimal vectorization and where we are now. Taken together, this can greatly improve performance for reduction loops under MVE. Differential Revision: https://reviews.llvm.org/D93476	2021-01-21 21:03:41 +00:00
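The extended MLA pattern and the sdot/udot partial reduction described above can be modelled in scalar C++ to show what each single instruction computes. This is a minimal sketch, not LLVM code: `mlaReduce` and `sdotPartial` are hypothetical names, and the sdot lane grouping follows the message's description (four input elements feed each output lane).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of sext/sext/mul/vecreduce.add (what VMLADAV computes):
// every i8 lane is sign-extended to i32 before multiplying and summing.
int32_t mlaReduce(const std::array<int8_t, 16>& a, const std::array<int8_t, 16>& b) {
  int32_t sum = 0;
  for (int i = 0; i < 16; ++i)
    sum += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
  return sum;
}

// Scalar model of an sdot-style partial reduction from v16i8 to v4i32:
// output lane j accumulates the extended products of inputs 4*j .. 4*j+3.
std::array<int32_t, 4> sdotPartial(const std::array<int8_t, 16>& a,
                                   const std::array<int8_t, 16>& b) {
  std::array<int32_t, 4> out{};  // zero-initialized accumulators
  for (int j = 0; j < 4; ++j)
    for (int k = 0; k < 4; ++k)
      out[j] += static_cast<int32_t>(a[4 * j + k]) * static_cast<int32_t>(b[4 * j + k]);
  return out;
}
```

With every lane set to 100, `mlaReduce` returns 160000, a total that only fits because the products are accumulated at i32; summing the four `sdotPartial` lanes gives the same value, which is why a partial reduction can feed an outer-loop reduction.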
Sanjay Patel	2f03528f5e	[SLP] rename reduction variable to avoid shadowing; NFC The code structure can likely be improved now that 'OperationData' is gone.	2021-01-21 16:02:38 -05:00
Sanjay Patel	d77753381f	[SLP] simplify reduction matching This is NFC-intended and removes the "OperationData" class which had become nothing more than a recurrence (reduction) type. I adjusted the matching logic to distinguish instructions from non-instructions - that's all that the "IsLeafValue" member was keeping track of.	2021-01-21 14:58:57 -05:00
Kazu Hirata	e53472de68	[Transforms] Use llvm::append_range (NFC)	2021-01-20 21:35:54 -08:00
Sanjay Patel	c09be0d2a0	[SLP] reduce reduction code for checking vectorizable ops; NFC This is another step towards removing `OperationData` and fixing FMF matching/propagation bugs when forming reductions.	2021-01-20 11:14:48 -05:00
Sanjay Patel	1c54112a57	[SLP] refactor more reduction functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param. More streamlining is possible, but I'm trying to avoid logic/typo bugs while fixing this. Eventually, we should not need the `OperationData` class.	2021-01-20 11:14:48 -05:00
Sanjay Patel	8590d24543	[SLP] move reduction createOp functions; NFC We were able to remove almost all of the state from OperationData, so these don't make sense as members of that class - just pass the RecurKind in as a param.	2021-01-20 11:14:48 -05:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Alexey Bataev	e463bd53c0	Revert "[SLP]Merge reorder and reuse shuffles." This reverts commit `438682de6a` to fix a bug where the size of the resulting vector was reduced for an entry node with multiple users.	2021-01-19 11:48:04 -08:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
David Sherwood	c3ce262794	[NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost A previous patch has already changed getInstructionCost to return an InstructionCost type. This patch changes the other various getXXXCost functions to return an InstructionCost too. This is a non-functional change - I've added a few asserts that the costs are valid in places where we're selecting between vector call and intrinsic costs. However, since we don't yet return invalid costs from any of the TTI implementations these asserts should not fire. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D94065	2021-01-19 09:08:40 +00:00
Sanjay Patel	5b77ac32b1	[SLP] match maxnum/minnum intrinsics as FP reduction ops After much refactoring over the last 2 weeks to the reduction matching code, I think this change is finally ready. We effectively broke fmax/fmin vector reduction optimization when we started canonicalizing to intrinsics in instcombine, so this should restore that functionality for SLP. There are still FMF problems here as noted in the code comments, but we should be avoiding miscompiles on those for fmax/fmin by restricting to full 'fast' ops (negative tests are included). Fixing FMF propagation is a planned follow-up. Differential Revision: https://reviews.llvm.org/D94913	2021-01-18 17:37:16 -05:00
Sanjay Patel	3dbbadb8ef	[SLP] rename reduction query for min/max ops; NFC This will avoid confusion once we start matching min/max intrinsics. All of these hacks to accommodate cmp+sel idioms should disappear once we canonicalize to min/max intrinsics.	2021-01-18 09:32:57 -05:00
Sanjay Patel	d1c4e859ce	[SLP] reduce opcode API dependency in reduction cost calc; NFC The icmp opcode is now hard-coded in the cost model call. This will make it easier to eventually remove all opcode queries for min/max patterns as we transition to intrinsics.	2021-01-18 09:32:57 -05:00
Caroline Concatto	36710c38c1	[NFC]Migrate VectorCombine.cpp to use InstructionCost This patch changes these functions: vectorizeLoadInsert, isExtractExtractCheap, foldExtractedCmps, scalarizeBinopOrCmp, getShuffleExtract and foldBitcastShuf to use the class InstructionCost when calling TTI.get<something>Cost(). This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 P.S.: This patch adds the check || !NewCost.isValid() because we want to return false in both of these cases: !NewCost.isValid() && !OldCost.isValid() (the transformation is too expensive) and !NewCost.isValid() && OldCost.isValid(). Therefore, for simplicity, we only test !NewCost.isValid(). Differential Revision: https://reviews.llvm.org/D94069	2021-01-18 13:37:21 +00:00
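The "only test !NewCost.isValid()" simplification can be illustrated with a small decision function. This is a sketch under stated assumptions: `Cost` is a hypothetical stand-in for a cost that may be invalid (it is not LLVM's InstructionCost class), and accepting ties is an arbitrary choice made for this example.

```cpp
#include <cassert>

// Hypothetical, simplified cost value that may be invalid
// (NOT LLVM's InstructionCost; it only models the validity idea).
struct Cost {
  int Value = 0;
  bool Valid = true;
  static Cost invalid() { return Cost{0, false}; }
  bool isValid() const { return Valid; }
};

// Returns true when the transformation should go ahead.
// One check on NewCost covers both "both invalid" and "only new invalid".
bool shouldTransform(Cost OldCost, Cost NewCost) {
  if (!NewCost.isValid())
    return false;
  if (!OldCost.isValid())
    return true;  // assumption: a valid new cost beats an invalid old one
  return NewCost.Value <= OldCost.Value;  // ties accepted in this sketch
}
```

Because an invalid `NewCost` short-circuits before `OldCost` is inspected, the two cases listed in the message never need to be spelled out separately.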
Sanjay Patel	49b96cd9ef	[SLP] remove opcode field from reduction data class This is NFC-intended and another step towards supporting intrinsics as reduction candidates. The remaining bits of the OperationData class do not make much sense as-is, so I will try to improve that, but I'm trying to take minimal steps because it's still not clear how this was intended to work.	2021-01-16 13:55:52 -05:00
Sanjay Patel	fcfcc3cc6b	[SLP] fix typos; NFC	2021-01-16 13:55:52 -05:00
Sanjay Patel	48dbac5b6b	[SLP] remove unnecessary use of 'OperationData' This is another NFC-intended patch to allow matching intrinsics (example: maxnum) as candidates for reductions. It's possible that the loop/if logic can be reduced now, but it's still difficult to understand how this all works.	2021-01-16 13:55:52 -05:00
Kazu Hirata	19aacdb715	[llvm] Construct SmallVector with iterator ranges (NFC)	2021-01-16 09:40:53 -08:00
Sanjay Patel	ceb3cdccd0	[SLP] remove dead code in reduction matching; NFC To get into this block we had: !A || B || C and we checked C in the first 'if' clause, leaving !A || B. But the 2nd 'if' is checking: A && !B --> !(!A || B), so it can never be true.	2021-01-15 17:03:26 -05:00
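The dead-code argument above can be checked exhaustively over all eight truth assignments. A, B and C are the abstract predicates from the message; `secondIfIsDead` is a hypothetical helper written only to verify the reasoning.

```cpp
#include <cassert>

// Returns true if, for every assignment that reaches the block and is not
// consumed by the first 'if' (C), the second condition (A && !B) is false.
bool secondIfIsDead() {
  for (int mask = 0; mask < 8; ++mask) {
    bool A = mask & 1, B = mask & 2, C = mask & 4;
    bool reachesBlock = !A || B || C;
    if (!reachesBlock || C)
      continue;  // not in the block, or handled by the first 'if'
    if (A && !B)
      return false;  // the 2nd 'if' would be live for this assignment
  }
  return true;
}
```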
Sanjay Patel	1f21de535d	[SLP] remove unused reduction functions; NFC These were made obsolete by simplifying the code in recent patches.	2021-01-15 14:59:33 -05:00
Sanjay Patel	b21905dfe3	[SLP] remove unnecessary state in matching reductions This is NFC-intended. I'm still trying to figure out how the loop where this is used works. It does not seem like we require this data at all, but it's hard to confirm given the complicated predicates.	2021-01-14 18:32:37 -05:00
Bjorn Pettersson	d58512b2e3	[SLP] Don't vectorize stores of non-packed types (like i1, i2) In the spirit of commit `fc783e91e0` (llvm-svn: 248943) we shouldn't vectorize stores of non-packed types (i.e. types that have padding between consecutive variables in a scalar layout, but are packed in a vector layout). The problem was detected as a miscompile in a downstream test case. Reviewed By: anton-afanasyev Differential Revision: https://reviews.llvm.org/D94446	2021-01-14 11:30:33 +01:00
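Why non-packed types are a problem can be shown by counting the bits each form writes. This is a simplified model, not LLVM's DataLayout API: it assumes each scalar store occupies whole bytes (bit width rounded up to a multiple of 8) while a vector of N elements is bit-packed, as `<8 x i1>` is in LLVM IR. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified store size: bit width rounded up to whole bytes.
uint64_t storeSizeInBits(uint64_t bitWidth) { return (bitWidth + 7) / 8 * 8; }

// "Packed" means consecutive scalars leave no padding bits between them.
bool isPackedType(uint64_t bitWidth) { return storeSizeInBits(bitWidth) == bitWidth; }

// Memory written by N separate scalar stores vs. one <N x T> vector store.
uint64_t scalarStoreBits(uint64_t bitWidth, uint64_t n) { return n * storeSizeInBits(bitWidth); }
uint64_t vectorStoreBits(uint64_t bitWidth, uint64_t n) { return storeSizeInBits(bitWidth * n); }
```

For i1, eight scalar stores write 64 bits (one byte each) but a single `<8 x i1>` store writes 8 bits, so replacing the scalar stores with a vector store would change the memory layout; for i8 or i32 the two agree.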
Sanjay Patel	123674a816	[SLP] simplify type check for reductions This is NFC-intended. The 'valid' call allows int/FP/pointers for other parts of SLP. The difference here is that we can't reduce pointers.	2021-01-13 13:30:46 -05:00
Sanjay Patel	9e7895a868	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	92fb5c49e8	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	554be30a42	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	46507a96fc	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	9f61fbd75a	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Florian Hahn	a94497a342	[VPlan] Move initial quote emission from ::print to ::dumpBasicBlock. This means there will be no stray " when printing individual recipes using print()/dump() in a debugger, for example.	2021-01-11 12:22:15 +00:00
David Sherwood	40abeb11f4	[NFC][InstructionCost] Change LoopVectorizationCostModel::getInstructionCost to return InstructionCost This patch is part of a series of patches that migrate integer instruction costs to use InstructionCost. In the function selectVectorizationFactor I have simply asserted that the cost is valid and extracted the value as is. In future we expect to encounter invalid costs, but we should filter out those vectorization factors that lead to such invalid costs. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Differential Revision: https://reviews.llvm.org/D92178	2021-01-11 09:22:37 +00:00
David Sherwood	b7ccaca537	[NFC] Remove min/max functions from InstructionCost Removed the InstructionCost::min/max functions because it's fine to use std::min/max instead. Differential Revision: https://reviews.llvm.org/D94301	2021-01-11 09:00:12 +00:00
Sanjay Patel	3f09c77d33	[SLP] fix typo in assert This snuck into `0aa75fb12f` , but I didn't catch it locally.	2021-01-10 13:15:04 -05:00
Sanjay Patel	0aa75fb12f	[SLP] put verifyFunction call behind EXPENSIVE_CHECKS A severe compile-time slowdown from this call is noted in: https://llvm.org/PR48689 My naive fix was to put it under LLVM_DEBUG ( `267ff79` ), but that's not limiting in the way we want. This is a quick fix (or we could just remove the call completely and rely on some later pass to discover potentially wrong IR?). A bigger/better fix would be to improve/limit verifyFunction() as noted in: https://llvm.org/PR47712 Differential Revision: https://reviews.llvm.org/D94328	2021-01-10 12:32:21 -05:00
Kazu Hirata	6a6e382161	[llvm] Drop unnecessary make_range (NFC)	2021-01-09 09:25:00 -08:00
Florian Hahn	65f578fc0e	[VPlan] Keep start value of VPWidenPHIRecipe as VPValue. Similar to D92129, update VPWidenPHIRecipe to manage the start value as VPValue. This allows adjusting the start value as a VPlan transform, which will be used in a follow-up patch to support reductions during epilogue vectorization. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93975	2021-01-09 16:34:15 +00:00
Florian Hahn	c493e9216b	[VPlan] Move reduction start value creation to widenPHIRecipe. This was suggested to prepare for D93975. By moving the start value creation to widenPHInstruction, we set the stage to manage the start value directly in VPWidenPHIRecipe, which will be used subsequently to set the 'resume' value for reductions during epilogue vectorization. It also moves RdxDesc to the recipe, so we do not have to rely on Legal to look it up later. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D94175	2021-01-08 17:49:43 +00:00
Alexander Belyaev	bcbdeafa9c	Revert "[SLP]Need shrink the load vector after reordering." This reverts commit `4284afdf94`. This changes computed values in fused_batchnorm_test_cpu. Not equal to tolerance rtol=1e-06, atol=0.001 Mismatched value: a is different from b. not close where = (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]), array([0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5])) not close lhs = [-0.6636615 -0.9804948 -1.148275 -0.68193716 -0.8572368 -0.65046215 -0.6993756 -1.2244141 -1.0938729 -0.50369143 -0.51830524 -0.738452 -0.7214286 -0.48115745 -0.9380924 -0.9341769 -0.5916775 -1.2896856 -0.7264182 -0.9746917 -0.783249 -0.7659018 -0.86214024 -0.47784212] not close rhs = [ 0.44102234 0.12418899 -0.04359123 0.42274666 0.24744703 0.45422167 0.40530816 -0.11973029 0.01081094 0.6009924 0.5863786 0.3662318 0.38325527 0.62352633 0.1665914 0.1705069 0.5130063 -0.18500176 0.37826565 0.12999213 0.3214348 0.338782 0.24254355 0.62684166] not close dif = [1.1046839 1.1046838 1.1046838 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046839 1.1046838 1.1046838] not close tol = [0.00100044 0.00100012 0.00100004 0.00100042 0.00100025 0.00100045 0.00100041 0.00100012 0.00100001 0.0010006 0.00100059 0.00100037 0.00100038 0.00100062 0.00100017 0.00100017 0.00100051 0.00100019 0.00100038 0.00100013 0.00100032 0.00100034 0.00100024 0.00100063]	2021-01-08 14:42:26 +01:00
Sanjay Patel	267ff7901c	[SLP] limit verifyFunction to debug build (PR48689) As noted in PR48689, the verifier may have some kind of exponential behavior that should be addressed separately. For now, only run it in debug mode to prevent problems for release+asserts. That limit is what we had before D80401, and I'm not sure if there was a reason to change it in that patch.	2021-01-08 08:10:17 -05:00
Cullen Rhodes	1e7efd397a	[LV] Legalize scalable VF hints In the following loop: void foo(int *a, int *b, int N) { for (int i=0; i<N; ++i) a[i + 4] = a[i] + b[i]; } The loop dependence constrains the VF to a maximum of (4, fixed), which would mean using <4 x i32> as the vector type in vectorization. Extending this to scalable vectorization, a VF of (4, scalable) implies a vector type of <vscale x 4 x i32>. To determine if this is legal, vscale must be taken into account. For this example, unless max(vscale)=1, it's unsafe to vectorize. For SVE, the number of bits in an SVE register is architecturally defined to be a multiple of 128 bits with a maximum of 2048 bits, thus the maximum vscale is 16. In the loop above it is therefore infeasible to vectorize with SVE. However, in this loop: void foo(int *a, int *b, int N) { #pragma clang loop vectorize_width(X, scalable) for (int i=0; i<N; ++i) a[i + 32] = a[i] + b[i]; } As long as max(vscale) multiplied by the number of lanes 'X' doesn't exceed the dependence distance, it is safe to vectorize. For SVE a VF of (2, scalable) is within this constraint, since a <vscale x 2 x i32> vector holds at most 16 x 2 = 32 elements and so has no dependencies between lanes. For any number of lanes larger than this it would be unsafe to vectorize. This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs specified as loop hints, implementing the following behaviour: * If the backend does not support scalable vectors, ignore the hint. * If scalable vectorization is infeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF. * Accept scalable VFs if it's safe to do so. * Otherwise, clamp scalable VFs that exceed the maximum safe VF. Reviewed By: sdesmalen, fhahn, david-arm Differential Revision: https://reviews.llvm.org/D91718	2021-01-08 10:49:44 +00:00
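The four legalization rules above can be sketched as a small function. This is not `computeFeasibleMaxVF`; `legalizeScalableHint` and `powerOf2Floor` are hypothetical helpers, `MaxSafeElements` stands for the dependence distance in elements, and rounding the clamp down to a power of two is an added assumption that the message does not state.

```cpp
#include <cassert>
#include <optional>

// Largest power of two <= X (0 when X is 0).
unsigned powerOf2Floor(unsigned X) {
  if (X == 0)
    return 0;
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Returns the legal number of scalable lanes for a hint of
// vectorize_width(RequestedLanes, scalable), or std::nullopt when the
// vectorizer should fall back to a fixed-width VF.
std::optional<unsigned> legalizeScalableHint(unsigned RequestedLanes,
                                             unsigned MaxSafeElements,
                                             unsigned MaxVScale,
                                             bool TargetHasScalable) {
  assert(MaxVScale > 0);
  if (!TargetHasScalable)
    return std::nullopt;  // rule 1: ignore the scalable hint
  // Lanes L are safe while L * MaxVScale <= MaxSafeElements.
  unsigned MaxScalableLanes = powerOf2Floor(MaxSafeElements / MaxVScale);
  if (MaxScalableLanes == 0)
    return std::nullopt;  // rule 2: scalable VF infeasible, use a fixed VF
  if (RequestedLanes <= MaxScalableLanes)
    return RequestedLanes;  // rule 3: safe, accept as-is
  return MaxScalableLanes;  // rule 4: clamp to the maximum safe scalable VF
}
```

With SVE's maximum vscale of 16, the first loop (distance 4) gives no legal scalable VF, while the second loop (distance 32) accepts 2 lanes and clamps any larger request to 2.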
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first-order recurrence and an instruction that is replicated. The first-order recurrence forces an instruction to be sunk _into_, as opposed to after, the replication region. That causes several things to go wrong, including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Kazu Hirata	33bf1cad75	[llvm] Use *Set::contains (NFC)	2021-01-07 20:29:34 -08:00
Sanjay Patel	4c7148d75c	[SLP] remove opcode identifier for reduction; NFC Another step towards allowing intrinsics in reduction matching.	2021-01-07 14:07:27 -05:00
Alexey Bataev	4284afdf94	[SLP]Need shrink the load vector after reordering. After merging the shuffles, we cannot rely on the previous shuffle anymore and need to shrink the final shuffle, if it is required. Reported in D92668 Differential Revision: https://reviews.llvm.org/D93967	2021-01-07 04:50:48 -08:00
Oliver Stannard	76f6b125ce	Revert "[llvm] Use BasicBlock::phis() (NFC)" Reverting because this causes crashes on the 2-stage buildbots, for example http://lab.llvm.org:8011/#/builders/7/builds/1140. This reverts commit `9b228f107d`.	2021-01-07 09:43:33 +00:00
Kazu Hirata	cfeecdf7b6	[llvm] Use llvm::all_of (NFC)	2021-01-06 18:27:36 -08:00
Kazu Hirata	9b228f107d	[llvm] Use BasicBlock::phis() (NFC)	2021-01-06 18:27:35 -08:00
Sanjay Patel	4c022b5a41	[SLP] use reduction kind's opcode to create new instructions; NFC Similar to `5a1d31a28` - This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-06 14:37:44 -05:00
Sanjay Patel	5d24089a70	[SLP] reduce code for propagating flags on reductions; NFC If we add/change to match intrinsics, this might get more wordy, but there's no need to list each kind currently.	2021-01-06 14:37:44 -05:00
Florian Hahn	816dba48af	[VPlan] Keep start value in VPWidenIntOrFpInductionRecipe (NFC). This patch updates VPWidenIntOrFpInductionRecipe to hold the start value for the induction variable. This makes the start value explicit and allows for adjusting the start value for a VPlan. The flexibility will be used in further patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92129	2021-01-06 11:47:33 +00:00
Florian Hahn	0ce5f402e0	[VPlan] Add getLiveInIRValue accessor to VPValue. This patch adds a new getLiveInIRValue accessor to VPValue, which returns the underlying value, if the VPValue is defined outside of VPlan. This is required to handle scalars in VPTransformState, which requires dealing with scalars defined outside of VPlan. We can simply check VPValue::Def to determine if the value is defined inside a VPlan. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92281	2021-01-06 11:20:42 +00:00
Florian Hahn	f73c09caa2	[VPlan] Use public VPValue constructor in VPPRedInstPHIRecipe (NFC). VPPredInstPHIRecipe does not need access to VPValue via friendship. It can just use the public constructor. Discussed as part of D92281.	2021-01-06 10:47:09 +00:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is part of an effort to use a poison vector instead of undef to represent a "don't care" vector. The goal is to make valid some useful shufflevector optimizations that are currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Sanjay Patel	6a03f8ab62	[SLP] reduce code for finding reduction costs; NFC We can get both (vector/scalar) costs in a single switch instead of sequentially.	2021-01-05 17:35:54 -05:00
Sanjay Patel	5a1d31a284	[SLP] use reduction kind's opcode for cost model queries; NFC This should be no-functional-change because the reduction kind opcodes are 1-for-1 mappings to the instructions we are matching as reductions. But we want to remove the need for the `OperationData` opcode field because that does not work when we start matching intrinsics (eg, maxnum) as reduction candidates.	2021-01-05 15:12:40 -05:00
Sanjay Patel	d4a999b453	[SLP] reduce code duplication; NFC	2021-01-05 15:12:40 -05:00
Sanjay Patel	3b8b2c7da2	[SLP] delete unused pairwise reduction option SLP tries to model 2 forms of vector reductions: pairwise and splitting. From the cost model code comments, those are defined using an example as: /// Pairwise: /// (v0, v1, v2, v3) /// ((v0+v1), (v2+v3), undef, undef) /// Split: /// (v0, v1, v2, v3) /// ((v0+v2), (v1+v3), undef, undef) I don't know the full history of this functionality, but it was partly added back in D29402. There are apparently no users at this point (no regression tests change). X86 might have managed to work-around the need for this through cost model and codegen improvements. Removing this code makes it easier to continue the work that was started in D87416 / D88193. The alternative -- if there is some target that is silently using this option -- is to move this logic into LoopUtils. We have related/duplicate functionality there via llvm::createTargetReduction(). Differential Revision: https://reviews.llvm.org/D93860	2021-01-05 13:23:07 -05:00
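The two reduction shapes from the cost model comment can be compared directly. This sketch uses hypothetical helpers and drops the undef lanes instead of modelling them, so each step halves the vector: pairwise adds adjacent lanes, split adds the upper half onto the lower half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise: (v0+v1), (v2+v3), ...
std::vector<int> pairwiseStep(const std::vector<int>& v) {
  std::vector<int> out;
  for (std::size_t i = 0; i + 1 < v.size(); i += 2)
    out.push_back(v[i] + v[i + 1]);
  return out;
}

// Split: (v0+v2), (v1+v3), ... for a 4-lane input.
std::vector<int> splitStep(const std::vector<int>& v) {
  std::size_t half = v.size() / 2;
  std::vector<int> out;
  for (std::size_t i = 0; i < half; ++i)
    out.push_back(v[i] + v[i + half]);
  return out;
}

// Repeats one step until a single lane remains (size must be a power of two).
int reduce(std::vector<int> v, bool pairwise) {
  assert(!v.empty());
  while (v.size() > 1)
    v = pairwise ? pairwiseStep(v) : splitStep(v);
  return v[0];
}
```

For integer addition both shapes produce the same total; they differ only in which shuffles a target needs, which is why the choice mattered to the cost model rather than to correctness.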
Florian Hahn	8a47e6252a	[VPlan] Re-add interleave group members to plan. Creating in-loop reductions relies on IR references to map IR values to VPValues after interleave group creation. Make sure we re-add the updated member to the plan, so the look-ups still work as expected. This fixes a crash reported after D90562.	2021-01-05 15:06:47 +00:00
Florian Hahn	38c6933dcc	[LV] Simplify lambda in all_of to directly return hasVF() result. (NFC) The if in the lambda is not necessary. We can directly return the result of hasVF.	2021-01-05 10:34:06 +00:00
Sanjay Patel	36263a7ccc	[LoopUtils] remove redundant opcode parameter; NFC While here, rename the inaccurate getRecurrenceBinOp() because that was also used to get CmpInst opcodes. The recurrence/reduction kind should always refer to the expected opcode for a reduction. SLP appears to be the only direct caller of createSimpleTargetReduction(), and that calling code ideally should not be carrying around both an opcode and a reduction kind. This should allow us to generalize reduction matching to use intrinsics instead of only binops.	2021-01-04 17:05:28 -05:00
Florian Hahn	c50f9b2351	[LV] Clean up trailing whitespace (NFC). Clean up some stray whitespace that sneaked in recently.	2021-01-02 16:43:13 +00:00
Sanjay Patel	c74e8539ff	[Analysis] flatten enums for recurrence types This is almost all mechanical search-and-replace and no-functional-change-intended (NFC). Having a single enum makes it easier to match/reason about the reduction cases. The goal is to remove `Opcode` from reduction matching code in the vectorizers because that makes it harder to adapt the code to handle intrinsics. The code in RecurrenceDescriptor::AddReductionVar() is the only place that required closer inspection. It uses a RecurrenceDescriptor and a second InstDesc to sometimes overwrite part of the struct. It seems like we should be able to simplify that logic, but it's not clear exactly which cmp+sel patterns we are trying to handle/avoid.	2021-01-01 12:20:16 -05:00
Florian Hahn	d9f306aa52	[LV] Fix crash when generating remarks with multi-exit loops. If DoExtraAnalysis is true (e.g. because remarks are enabled), we continue with the analysis rather than exiting. Update the code to check whether ExitBB has phis or lacks a single predecessor only if ExitBB is non-null. Otherwise a nullptr is dereferenced with DoExtraAnalysis.	2021-01-01 13:54:41 +00:00
Sanjay Patel	8ca60db40b	[LoopUtils] reduce FMF and min/max complexity when forming reductions I don't know if there's some way this changes what the vectorizers may produce for reductions, but I have added test coverage with `3567908` and `5ced712` to show that both passes already have bugs in this area. Hopefully this does not make things worse before we can really fix it.	2020-12-30 15:22:26 -05:00
Sanjay Patel	21a3a0225d	[SLP] replace local reduction enum with RecurrenceKind; NFCI I'm not sure if the SLP enum was created before the IVDescriptor RecurrenceDescriptor / RecurrenceKind existed, but the code in SLP is now redundant with that class, so it just makes things more complicated to have both. We eventually call LoopUtils createSimpleTargetReduction() to create reduction ops, so we might as well standardize on those enum names. There's still a question of whether we need to use TTI::ReductionFlags vs. MinMaxRecurrenceKind, but that can be another clean-up step. Another option would just be to flatten the enums in RecurrenceDescriptor into a single enum. There isn't much benefit (smaller switches?) to having a min/max subset.	2020-12-29 14:52:11 -05:00
Philip Reames	4b33b23877	Reapply "[LV] Vectorize (some) early and multiple exit loops"" w/fix for builder This reverts commit `4ffcd4fe9a` thus restoring `e4df6a40da`. The only change from the original patch is to add "llvm::" before the call to empty(iterator_range). This is a speculative fix for the ambiguity reported on some builders.	2020-12-28 10:13:28 -08:00
Arthur Eubanks	4ffcd4fe9a	Revert "[LV] Vectorize (some) early and multiple exit loops" This reverts commit `e4df6a40da`. Breaks Windows bots, e.g. http://45.33.8.238/win/30472/step_4.txt and http://lab.llvm.org:8011/#/builders/83/builds/2078/steps/5/logs/stdio	2020-12-28 10:05:41 -08:00
Philip Reames	e4df6a40da	[LV] Vectorize (some) early and multiple exit loops This patch is a major step towards supporting multiple exit loops in the vectorizer. On its own, this patch extends the loop forms allowed in two ways: (1) single exit loops which are not bottom tested, and (2) multiple exit loops with a single exit block reached from all exits and no phis in the exit block (because of LCSSA, this implies no values defined in the loop are used later). The restrictions on multiple exit loop structures will be removed in follow-up patches; disallowing cases for now makes the code changes smaller and more obvious. As before, we can only handle loops with entirely analyzable exits. Removing that restriction is much harder, and is not part of currently planned efforts. The basic idea here is that we can force the last iteration to run in the scalar epilogue loop (if we have one). From the definition of SCEV's backedge taken count, we know that no earlier iteration can exit the vector body. As such, we can leave the decision on which exit to take to the scalar code and generate a bottom tested vector loop which runs all but the last iteration. The existing code already had the notion of requiring one iteration in the scalar epilogue; this patch is mainly about generalizing that support slightly, making sure we don't try to use this mechanism when tail folding, and updating the code to reflect the difference between a single exit block and a unique exit block (very mechanical). Differential Revision: https://reviews.llvm.org/D93317	2020-12-28 09:40:42 -08:00
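Forcing the last iteration into the scalar epilogue amounts to choosing how many iterations the vector loop runs. The sketch below is a hypothetical helper, not the vectorizer's code: when a scalar epilogue is required and the trip count divides evenly by VF, a full VF worth of iterations is handed to the scalar loop so the final (possibly exiting) iteration always executes there. It assumes a trip count of at least 1.

```cpp
#include <cassert>
#include <cstdint>

// Iterations executed by the vector loop for a given vectorization factor.
uint64_t vectorTripCount(uint64_t TripCount, uint64_t VF, bool RequiresScalarEpilogue) {
  assert(TripCount > 0 && VF > 0);
  uint64_t Rem = TripCount % VF;
  if (RequiresScalarEpilogue && Rem == 0)
    Rem = VF;  // keep at least one iteration (here a full VF) for the epilogue
  return TripCount - Rem;
}
```

For a trip count of 8 and VF 4, the vector loop runs all 8 iterations normally but only 4 when a scalar epilogue is required; with a trip count of 10 the remainder of 2 already leaves iterations for the epilogue, so both cases give 8.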
Florian Hahn	0ea3749b3c	[LV] Set up branch from middle block earlier. Previously the branch from the middle block to the scalar preheader & exit was being set-up at the end of skeleton creation in completeLoopSkeleton. Inserting SCEV or runtime checks may result in LCSSA phis being created, if they are required. Adjusting branches afterwards may break those PHIs. To avoid this, we can instead create the branch from the middle block to the exit after we created the middle block, so we have the final CFG before potentially adjusting/creating PHIs. This fixes a crash for the included test case. For the non-crashing case, this is almost a NFC with respect to the generated code. The only change is the order of the predecessors of the involved branch targets. Note an assertion was moved from LoopVersioning() to LoopVersioning::versionLoop. Adjusting the branches means loop-simplify form may be broken before constructing LoopVersioning. But LV only uses LoopVersioning to annotate the loop instructions with !noalias metadata, which does not require loop-simplify form. This is a fix for an existing issue uncovered by D93317.	2020-12-27 18:21:12 +00:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Sanjay Patel	badf0f20f3	[SLP] rename reduction variables for readability; NFC I am hoping to extend the reduction matching code, and it is hard to distinguish "ReductionData" from "ReducedValueData". So extend the tree/root metaphor to include leaves. Another problem is that the name "OperationData" does not provide insight into its purpose. I'm not sure if we can alter that underlying data structure to make the code clearer.	2020-12-26 11:20:25 -05:00
Sanjay Patel	c4ca108966	[SLP] use switch to improve readability; NFC This will get more complicated when we handle intrinsics like maxnum.	2020-12-26 10:59:45 -05:00
Kazu Hirata	df812115e3	[CodeGen, Transforms] Use llvm::any_of (NFC)	2020-12-24 09:08:36 -08:00
Sanjay Patel	0d15d4b6f4	[SLP] use operand index abstraction for number of operands I think this is NFC currently, but the bug would be exposed when we allow binary intrinsics (maxnum, etc) as candidates for reductions. The code in matchAssociativeReduction() is using OperationData::getNumberOfOperands() when comparing whether the "EdgeToVisit" iterator is in-bounds, so this code must use the same (potentially offset) operand value to set the "EdgeToVisit".	2020-12-22 16:05:39 -05:00
Florian Hahn	ef4dbb2b7a	[LV] Use ScalarEvolution::getURemExpr to reduce duplication. ScalarEvolution should be able to handle both constant and variable trip counts using getURemExpr, so we do not have to handle them separately. This is a small simplification of `a56280094e`. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93677	2020-12-22 14:48:42 +00:00
Florian Hahn	c0c0ae16c3	[VPlan] Make VPInstruction a VPDef This patch updates VPInstruction to manage the value it defines using VPDef. The VPValue is used during VPlan construction and code generation instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90565	2020-12-22 09:53:47 +00:00
Gil Rapaport	a56280094e	[LV] Avoid needless fold tail When the trip-count is provably divisible by the maximal/chosen VF, folding the loop's tail during vectorization is redundant. This commit extends the existing test for constant trip-counts to any trip-count known to be divisible by maximal/selected VF by SCEV. Differential Revision: https://reviews.llvm.org/D93615	2020-12-22 10:25:20 +02:00
Florian Hahn	f250892373	[VPlan] Make VPRecipeBase inherit from VPDef. This patch makes VPRecipeBase a direct subclass of VPDef, moving the SubclassID to VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90564	2020-12-21 13:34:00 +00:00
Florian Hahn	cd608dc8d3	[VPlan] Use VPDef for VPInterleaveRecipe. This patch updates VPInterleaveRecipe to manage the values it defines using VPDef. The VPValue is used during VPlan construction and code generation instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90562	2020-12-21 10:56:53 +00:00
David Sherwood	3bf7d47a97	[NFC][InstructionCost] Remove isValid() asserts in SLPVectorizer.cpp An earlier patch introduced asserts that the InstructionCost is valid because at that time the ReuseShuffleCost variable was an unsigned. However, now that the variable is an InstructionCost instance the asserts can be removed. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174	2020-12-21 09:12:28 +00:00
Sanjay Patel	37d0dda739	[SLP] fix typo; NFC	2020-12-18 16:55:52 -05:00
Sanjay Patel	47aaa99c0e	[VectorCombine] allow peeking through GEPs when creating a vector load This is an enhancement motivated by https://llvm.org/PR16739 (see D92858 for another). We can look through a GEP to find a base pointer that may be safe to use for a vector load. If so, then we shuffle (shift) the necessary vector element over to index 0. Alive2 proof based on 1 of the regression tests: https://alive2.llvm.org/ce/z/yPJLkh The vector translation is independent of endian (verify by changing to leading 'E' in the datalayout string). Differential Revision: https://reviews.llvm.org/D93229	2020-12-18 09:25:03 -05:00
Cullen Rhodes	1fd3a04775	[LV] Disable epilogue vectorization for scalable VFs Epilogue vectorization doesn't support scalable vectorization factors yet, disable it for now. Reviewed By: sdesmalen, bmahjour Differential Revision: https://reviews.llvm.org/D93063	2020-12-17 12:14:03 +00:00
Sanjay Patel	38ebc1a13d	[VectorCombine] optimize alignment for load transform Here's another minimal step suggested by D93229 / D93397. (I'm trying to be extra careful in these changes because load transforms are easy to get wrong.) We can optimistically choose the greater alignment of a load and its pointer operand. As the test diffs show, this can improve what would have been unaligned vector loads into aligned loads. When we enhance with gep offsets, we will need to adjust the alignment calculation to include that offset. Differential Revision: https://reviews.llvm.org/D93406	2020-12-16 15:25:45 -05:00
Sanjay Patel	aaaf0ec72b	[VectorCombine] loosen alignment constraint for load transform As discussed in D93229, we only need a minimal alignment constraint when querying whether a hypothetical vector load is safe. We still pass/use the potentially stronger alignment attribute when checking costs and creating the new load. There's already a test that changes with the minimum code change, so splitting this off as a preliminary commit independent of any gep/offset enhancements. Differential Revision: https://reviews.llvm.org/D93397	2020-12-16 12:25:18 -05:00
Caroline Concatto	be9184bc55	[SLPVectorizer]Migrate getEntryCost to return InstructionCost This patch also changes: the return type of getGatherCost and the signature of the debug function dumpTreeCosts to use InstructionCost. This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html See this patch for the introduction of the type: https://reviews.llvm.org/D91174 Depends on D93049 Differential Revision: https://reviews.llvm.org/D93127	2020-12-16 14:18:40 +00:00
Caroline Concatto	07217e0a1b	[CostModel]Migrate getTreeCost() to use InstructionCost This patch changes the type of cost variables (for instance: Cost, ExtractCost, SpillCost) to use InstructionCost. This patch also changes the type of cost variables to InstructionCost in other functions that use the result of getTreeCost() This patch is part of a series of patches to use InstructionCost instead of unsigned/int for the cost model functions. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Depends on D91174 Differential Revision: https://reviews.llvm.org/D93049	2020-12-16 13:08:37 +00:00
Philip Reames	1f6e15566f	[LV] Weaken an unnecessarily strong assert [NFC] Account for the fact that (in the future) the latch might be a switch, not a branch. The existing code is correct, minus the assert.	2020-12-15 19:07:53 -08:00
Philip Reames	af7ef895d4	[LV] Extend dead instruction detection to multiple exiting blocks Given we haven't yet enabled multiple exiting blocks, this is currently non-functional, but it's an obvious extension which cleans up a later patch. I don't think this is worth review (as it's pretty obvious); if anyone disagrees, feel free to revert or comment and I will.	2020-12-15 18:46:32 -08:00
Philip Reames	a81db8b315	[LV] Restructure handling of -prefer-predicate-over-epilogue option [NFC] This should be purely non-functional. When touching this code for another reason, I found the handling of the PredicateOrDontVectorize piece here very confusing. Let's make it an explicit state (instead of an implicit combination of two variables), and use early return for options/hint processing.	2020-12-15 12:38:13 -08:00
Florian Hahn	7186a3965a	[VPlan] Use VPDef for VPWidenSelectRecipe. This patch updates VPWidenSelectRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90560	2020-12-15 14:15:01 +00:00
Florian Hahn	318f5798d8	[VPlan] Use VPDef for VPWidenGEPRecipe. This patch updates VPWidenGEPRecipe to manage the value it defines using VPDef. The VPValue is used during VPlan construction and code generation instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90561	2020-12-15 09:30:14 +00:00
Florian Hahn	ad1161f9b5	[VPlan] Use VPDef for VPWidenCall. This patch updates VPWidenCallRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90559	2020-12-15 09:20:07 +00:00
Sanjay Patel	d399f870b5	[VectorCombine] make load transform poison-safe As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded. We could use freeze here (and x86 codegen at least appears to be the same either way), but we already have a shuffle in this logic to optionally change the vector size, so let's allow that instruction to serve both purposes. Differential Revision: https://reviews.llvm.org/D93238	2020-12-14 17:42:01 -05:00
Stanislav Mekhanoshin	87d7757bbe	[SLP] Control maximum vectorization factor from TTI D82227 has added a proper check to limit PHI vectorization to the maximum vector register size. That unfortunately resulted in at least a couple of regressions on SystemZ and x86. This change reverts PHI handling from D82227 and replaces it with a more general check in SLPVectorizerPass::tryToVectorizeList(). Moved to tryToVectorizeList() it allows to restart vectorization if initial chunk fails. However, this function is more general and handles not only PHI but everything which SLP handles. If vectorization factor would be limited to maximum vector register size it would limit much more vectorization than before leading to further regressions. Therefore a new TTI callback getMaximumVF() is added with the default 0 to preserve current behavior and limit nothing. Then targets can decide what is better for them. The callback gets ElementSize just like a similar getMinimumVF() function and the main opcode of the chain. The latter is to avoid regressions at least on the AMDGPU. We can have loads and stores up to 128 bit wide, and <2 x 16> bit vector math on some subtargets, where the rest shall not be vectorized. I.e. we need to differentiate based on the element size and operation itself. Differential Revision: https://reviews.llvm.org/D92059	2020-12-14 08:49:40 -08:00
Florian Hahn	e42e5263bd	[VPlan] Make VPWidenMemoryInstructionRecipe a VPDef. This patch updates VPWidenMemoryInstructionRecipe to use VPDef to manage the value it produces instead of inheriting from VPValue. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90563	2020-12-14 14:13:59 +00:00
Anton Afanasyev	fac7c7ec3c	[SLP] Fix vector element size for the store chains Vector element size could be different for different store chains. This patch prevents wrong computation of maximum number of elements for that case. Differential Revision: https://reviews.llvm.org/D93192	2020-12-14 15:51:43 +03:00
Kazu Hirata	5891ad4e22	[Transforms] Use llvm::erase_value (NFC)	2020-12-13 09:48:47 -08:00
Florian Hahn	533f85767c	[VPlan] Use interleaveComma in printOperands() (NFC).	2020-12-13 16:29:16 +00:00
Kazu Hirata	215c1b1935	[Transforms] Use is_contained (NFC)	2020-12-12 09:37:49 -08:00
David Green	ab97c9bdb7	[LV] Fix scalar cost for tail predicated loops When it comes to the scalar cost of any predicated block, the loop vectorizer by default regards this predication as a sign that it is looking at an if-conversion and divides the scalar cost of the block by 2, assuming it would only be executed half the time. This however makes no sense if the predication has been introduced to tail predicate the loop. Original patch by Anna Welker Differential Revision: https://reviews.llvm.org/D86452	2020-12-12 14:21:40 +00:00
David Sherwood	9b76160e53	[Support] Introduce a new InstructionCost class This is the first in a series of patches that attempts to migrate existing cost instructions to return a new InstructionCost class in place of a simple integer. This new class is intended to be as light-weight and simple as possible, with a full range of arithmetic and comparison operators that largely mirror the same sets of operations on basic types, such as integers. The main advantage to using an InstructionCost is that it can encode a particular cost state in addition to a value. The initial implementation only has two states - Normal and Invalid - but these could be expanded over time if necessary. An invalid state can be used to represent an unknown cost or an instruction that is prohibitively expensive. This patch adds the new class and changes the getInstructionCost interface to return the new class. Other cost functions, such as getUserCost, etc., will be migrated in future patches as I believe this to be less disruptive. One benefit of this new class is that it provides a way to unify many of the magic costs in the codebase where the cost is set to a deliberately high number to prevent optimisations taking place, e.g. vectorization. It also provides a route to represent the extremely high, and unknown, cost of scalarization of scalable vectors, which is not currently supported. Differential Revision: https://reviews.llvm.org/D91174	2020-12-11 08:12:54 +00:00
Sanjay Patel	12b684ae02	[VectorCombine] improve readability; NFC If we are going to allow adjusting the pointer for GEPs, rearranging the code a bit will make it easier to follow.	2020-12-10 13:10:26 -05:00
Sanjay Patel	b2ef264096	[VectorCombine] allow peeking through an extractelt when creating a vector load This is an enhancement to load vectorization that is motivated by a pattern in https://llvm.org/PR16739. Unfortunately, it's still not enough to make a difference there. We will have to handle multi-use cases in some better way to avoid creating multiple overlapping loads. Differential Revision: https://reviews.llvm.org/D92858	2020-12-09 10:36:14 -05:00
Anton Afanasyev	e5bf2e8989	[SLP] Use the width of value truncated just before storing For stores chain vectorization we choose the size of vector elements to ensure we fit to minimum and maximum vector register size for the number of elements given. This patch corrects vector element size choosing the width of value truncated just before storing instead of the width of value stored. Fixes PR46983 Differential Revision: https://reviews.llvm.org/D92824	2020-12-09 16:38:45 +03:00
Sander de Smalen	d568cff696	[LoopVectorizer][SVE] Vectorize a simple loop with a scalable VF. * Steps are scaled by `vscale`, a runtime value. * Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately. This can vectorize the following loop [1]: void loop(int N, double *a, double *b) { #pragma clang loop vectorize_width(4, scalable) for (int i = 0; i < N; i++) { a[i] = b[i] + 1.0; } } [1] This source-level example is based on the pragma proposed separately in D89031. This patch only implements the LLVM part. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91077	2020-12-09 11:25:21 +00:00
Sander de Smalen	adc37145de	[LoopVectorizer] NFC: Remove unnecessary asserts that VF cannot be scalable. This patch removes a number of asserts that VF is not scalable, even though the code where this assert lives does nothing that prevents VF being scalable. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91060	2020-12-09 11:25:21 +00:00
Sanjay Patel	5fe1a49f96	[SLP] fix typo in debug string; NFC	2020-12-07 15:09:21 -05:00
Bardia Mahjour	4db9b78c81	[LV] Epilogue Vectorization with Optimal Control Flow - Default Enablement This patch enables epilogue vectorization by default per reviewer requests. Differential Revision: https://reviews.llvm.org/D89566	2020-12-07 14:29:36 -05:00
Alexey Bataev	438682de6a	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D92668	2020-12-07 07:50:00 -08:00
Philip Reames	0c866a3d6a	[LoopVec] Support non-instructions as argument to uniform mem ops The initial step of the uniform-after-vectorization (lane-0 demanded only) analysis was very awkwardly written. It would revisit the use list of each pointer operand of a widened load/store. As a result, it was in the worst case O(N^2) where N was the number of instructions in a loop, and had restricted operand Value types to reduce the size of use lists. This patch replaces the original algorithm with one which is at most O(2N) in the number of instructions in the loop. (The key observation is that each use of a potentially interesting pointer is visited at most twice, once on first scan, once in the use list of its operand. Only instructions within the loop have their uses scanned.) In the process, we remove a restriction which required the operand of the uniform mem op to itself be an instruction. This allows detection of uniform mem ops involving global addresses. Differential Revision: https://reviews.llvm.org/D92056	2020-12-03 14:51:44 -08:00
Bardia Mahjour	a7e2c26939	[LV] Epilogue Vectorization with Optimal Control Flow (Recommit) This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-02 10:09:56 -05:00
Sanjay Patel	56fd29e93b	[SLP] use 'match' for binop/select; NFC This might be a small improvement in readability, but the real motivation is to make it easier to adapt the code to deal with intrinsics like 'maxnum' and/or integer min/max. There is potentially help in doing that with D92086, but we might also just add specialized wrappers here to deal with the expected patterns.	2020-12-02 09:04:08 -05:00
David Sherwood	71bd59f0cb	[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute In this patch I have added support for a new loop hint called vectorize.scalable.enable that says whether we should enable scalable vectorization or not. If a user wants to instruct the compiler to vectorize a loop with scalable vectors they can now do this as follows: br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2 ... !2 = !{!2, !3, !4} !3 = !{!"llvm.loop.vectorize.width", i32 8} !4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true} Setting the hint to false simply reverts the behaviour back to the default, using fixed width vectors. Differential Revision: https://reviews.llvm.org/D88962	2020-12-02 13:23:43 +00:00
Fangrui Song	a5309438fe	static const char *const foo => const char foo[] By default, a non-template variable of non-volatile const-qualified type having namespace-scope has internal linkage, so no need for `static`.	2020-12-01 10:33:18 -08:00
Bardia Mahjour	c94af03f7f	Revert "[LV] Epilogue Vectorization with Optimal Control Flow" This reverts commit `9c5504adce`. Reverting to investigate build failure in http://lab.llvm.org:8011/#/builders/98/builds/1461/steps/9	2020-12-01 12:50:36 -05:00
Bardia Mahjour	9c5504adce	[LV] Epilogue Vectorization with Optimal Control Flow This is yet another attempt at providing support for epilogue vectorization following discussions raised in RFC http://llvm.1065342.n5.nabble.com/llvm-dev-Proposal-RFC-Epilog-loop-vectorization-tt106322.html#none and reviews D30247 and D88819. Similar to D88819, this patch achieves epilogue vectorization by executing a single vplan twice: once on the main loop and a second time on the epilogue loop (using a different VF). However it's able to handle more loops, and generates more optimal control flow for cases where the trip count is too small to execute any code in vector form. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D89566	2020-12-01 12:04:29 -05:00
Cullen Rhodes	cba4accda0	[LV] Clamp VF hint when unsafe In the following loop the dependence distance is 2 and can only be vectorized if the vector length is no larger than this. void foo(int *a, int *b, int N) { #pragma clang loop vectorize(enable) vectorize_width(4) for (int i=0; i<N; ++i) { a[i + 2] = a[i] + b[i]; } } However, when specifying a VF of 4 via a loop hint this loop is vectorized. According to [1][2], loop hints are ignored if the optimization is not safe to apply. This patch introduces a check to bail out of vectorization if the user-specified VF is greater than the maximum feasible VF, unless explicitly forced with '-force-vector-width=X'. [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave [2] https://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations Reviewed By: sdesmalen, fhahn, Meinersbur Differential Revision: https://reviews.llvm.org/D90687	2020-12-01 11:30:34 +00:00
Caroline Concatto	4b0ef2b075	[NFC][CostModel]Extend class IntrinsicCostAttributes to use ElementCount Type This patch replaces the attribute `unsigned VF` in the class IntrinsicCostAttributes by `ElementCount VF`. This is a non-functional change to help upcoming patches to compute the cost model for scalable vector inside this class. Differential Revision: https://reviews.llvm.org/D91532	2020-12-01 11:12:51 +00:00
Sjoerd Meijer	f44ba25135	ExtractValue instruction costs Instruction ExtractValue wasn't handled in LoopVectorizationCostModel::getInstructionCost(). As a result, it was modeled as a mul which is not really accurate. Since it is free (most of the time), this now gets a cost of 0 using getInstructionCost. This is a follow-up of D92208, that required changing this regression test. In a follow up I will look at InsertValue which also isn't handled yet. Differential Revision: https://reviews.llvm.org/D92317	2020-12-01 10:42:23 +00:00
Florian Hahn	fe83adb05a	[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC). VPPredInstPHIRecipe is one of the recipes that was missed during the initial conversion. This patch adjusts the recipe to also manage its operand using VPUser.	2020-11-30 13:09:58 +00:00
Fangrui Song	5408fdcd78	[VPlan] Fix -Wunused-variable after `a813090072`	2020-11-29 10:38:01 -08:00
Florian Hahn	4bc9b909d7	[VPlan] Use VPValue and VPUser ops to print VPReplicateRecipe.	2020-11-29 18:28:27 +00:00
Florian Hahn	a813090072	[VPlan] Manage stored values of interleave groups using VPUser (NFC) Interleave groups also depend on the values they store. Manage the stored values as VPUser operands. This is currently a NFC, but is required to allow VPlan transforms and to manage generated vector values exclusively in VPTransformState.	2020-11-29 17:24:36 +00:00
Florian Hahn	ae008798a4	[VPlan] Use VPTransformState::set in widenGEP. This patch updates widenGEP to manage the resulting vector values using the VPValue of VPWidenGEP recipe.	2020-11-27 17:01:55 +00:00
Sjoerd Meijer	10ad64aa3b	[SLP] Dump Tree costs. NFC. This adds LLVM_DEBUG messages to dump the (intermediate) tree cost calculations, which is useful to trace and see how the final cost is calculated.	2020-11-27 11:37:33 +00:00
Florian Hahn	bd0b1311db	[VPlan] Turn VPReplicateRecipe into a VPValue. Update VPReplicateRecipe to inherit from VPValue. This still does not update scalarizeInstruction to set the result for the VPValue of VPReplicateRecipe, because this first requires tracking scalar values in VPTransformState. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D91500	2020-11-26 13:50:24 +00:00
Cullen Rhodes	1ba4b82f67	[LAA] NFC: Rename [get]MaxSafeRegisterWidth -> [get]MaxSafeVectorWidthInBits MaxSafeRegisterWidth is a misnomer since it actually returns the maximum safe vector width. Register suggests it relates directly to a physical register where it could be a vector spanning one or more physical registers. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D91727	2020-11-25 13:06:26 +00:00
Florian Hahn	ad5b83ddcf	[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs. This is a follow-up to `00a6601136` to make isa<VPReductionRecipe> work and unifies the VPValue ID names, by making sure they all consistently start with VPV*.	2020-11-25 11:08:25 +00:00
David Green	e0c479cd0e	[VPlan] Switch VPWidenRecipe to be a VPValue Similar to other patches, this makes VPWidenRecipe a VPValue. Because of the way it interacts with the reduction code it also slightly alters the way that VPValues are registered, removing the up front NeedDef and using getOrAddVPValue to create them on-demand if needed instead. Differential Revision: https://reviews.llvm.org/D88447	2020-11-25 08:25:06 +00:00
David Green	00a6601136	[VPlan] Turn VPReductionRecipe into a VPValue This converts the VPReductionRecipe into a VPValue, like other VPRecipes in preparation for traversing def-use chains. It also makes it a VPUser, now storing the used VPValues as operands. It doesn't yet change how the VPReductionRecipes are created. It will need to call replaceAllUsesWith from the original recipe they replace, but that is not done yet as VPWidenRecipe needs to be created first. Differential Revision: https://reviews.llvm.org/D88382	2020-11-25 08:25:05 +00:00
Philip Reames	10ddb927c1	[SCEV] Use isa<> pattern for testing for CouldNotCompute [NFC] Some older code - and code copied from older code - still directly tested against the singleton result of SE::getCouldNotCompute. Using the isa<SCEVCouldNotCompute> form is both shorter, and more readable.	2020-11-24 18:47:49 -08:00
Philip Reames	075468621c	[LoopVec] Add a minor clarifying comment	2020-11-24 10:45:06 -08:00
Ayal Zaks	32d9a386bf	[LV] Keep Primary Induction alive when folding tail by masking Fix PR47390. The primary induction should be considered alive when folding tail by masking, because it will be used by said masking; even when it may otherwise appear useless: feeding only its own 'bump', which is correctly considered dead, and as the 'bump' of another induction variable, which may wrongfully want to consider its bump = the primary induction, dead. Differential Revision: https://reviews.llvm.org/D92017	2020-11-24 15:12:54 +02:00
Philip Reames	1a9c72f8a8	[LoopVec] Reuse a lambda [NFC] Minor code refactor to improve readability.	2020-11-23 21:07:34 -08:00
Philip Reames	b06a2ad94f	[LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE) A uniform load is one which loads from a uniform address across all lanes. As currently implemented, we cost model such loads as if we did a single scalar load + a broadcast, but the actual lowering replicates the load once per lane. This change tweaks the lowering to use the REPLICATE strategy by marking such loads (and the computation leading to their memory operand) as uniform after vectorization. This is a useful change in itself, but its real purpose is to pave the way for a following change which will generalize our uniformity logic. In review discussion, there was an issue raised with coupling cost modeling with the lowering strategy for uniform inputs. The discussion on that item remains unsettled and is pending larger architectural discussion. We decided to move forward with this patch as is, and revise as warranted once the bigger picture design questions are settled. Differential Revision: https://reviews.llvm.org/D91398	2020-11-23 15:32:17 -08:00
Alexey Bataev	0b420d674a	[SLP][NFC]Fix assert condition in newTreeEntry, NFC.	2020-11-20 13:25:21 -08:00
Hongtao Yu	f3c445697d	[CSSPGO] IR intrinsic for pseudo-probe block instrumentation This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story. A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple of issues: 1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR until the very end of compilation, they can still prevent certain IR optimizations and result in lower code quality. 2. The counter atomics may not be fully cleaned up from the code stream eventually. 3. Extra work is needed for re-targeting. We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that come with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flag given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but it does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality. Let's now look at an example. Given the following LLVM IR: ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 br i1 %cmp, label %bb1, label %bb2 bb1: br label %bb3 bb2: br label %bb3 bb3: ret void } ``` The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID. ``` define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 { bb0: %cmp = icmp eq i32 %x, 0 call void @llvm.pseudoprobe(i64 837061429793323041, i64 1) br i1 %cmp, label %bb1, label %bb2 bb1: call void @llvm.pseudoprobe(i64 837061429793323041, i64 2) br label %bb3 bb2: call void @llvm.pseudoprobe(i64 837061429793323041, i64 3) br label %bb3 bb3: call void @llvm.pseudoprobe(i64 837061429793323041, i64 4) ret void } ``` Reviewed By: wmi Differential Revision: https://reviews.llvm.org/D86490	2020-11-20 10:39:24 -08:00
Sander de Smalen	41c9f4c1ce	[LoopVectorize] NFC: Fix unused variable warning for MaxSafeDepDist rGf571fe6df585127d8b045f8e8f5b4e59da9bbb73 led to a warning of an unused variable for MaxSafeDepDist (written but not used). It seems this variable and assignment can be safely removed.	2020-11-19 17:41:35 +00:00
Simon Moll	a1de391dae	[LV][NFC-ish] Allow vector widths over 256 elements The assertion that vector widths are <= 256 elements was hard-wired in the LV code. E.g., VE allows for vectors up to 512 elements. Test against the TTI vector register bit width instead - this is an NFC for non-asserting builds. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91518	2020-11-19 10:58:29 +01:00
Benjamin Kramer	4dbe12e866	[SLP] Use the minimum alignment of the load bundle when forming a masked.gather Instead of the first load. That works when vectorizing contiguous loads, but not for gathers. Fixes a miscompile introduced in `fcad8d3635`.	2020-11-18 12:53:39 +01:00
Sanjay Patel	08834979e3	[SLP] avoid unreachable code crash/infloop Example based on the post-commit comments for D88735.	2020-11-17 15:10:23 -05:00
Florian Hahn	52f3714dae	[VPlan] Add VPDef class. This patch introduces a new VPDef class, which can be used to manage VPValues defined by recipes/VPInstructions. The idea here is to mirror VPUser for values defined by a recipe. A VPDef can produce either zero (e.g. a store recipe), one (most recipes) or multiple (VPInterleaveRecipe) result VPValues. To traverse the def-use chain from a VPDef to its users, one has to traverse the users of all values defined by a VPDef. VPValues now contain a pointer to their corresponding VPDef, if one exists. To traverse the def-use chain upwards from a VPValue, we first need to check if the VPValue is defined by a VPDef. If it does not have a VPDef, this means we have a VPValue that is not directly defined inside the plan and we are done. If we have a VPDef, it is defined inside the region by a recipe, which is a VPUser, and the upwards def-use chain traversal continues by traversing all its operands. Note that we need to add an additional field to VPValue to link them to their defs. The space increase is going to be offset by being able to remove the SubclassID field in future patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D90558	2020-11-17 16:18:11 +00:00
Anton Afanasyev	0a1d315f9f	[SLPVectorizer] Fix assert	2020-11-17 18:46:31 +03:00
Anton Afanasyev	fcad8d3635	[SLP] Make SLPVectorizer to use `llvm.masked.gather` intrinsic For the scattered operands of load instructions it makes sense to use gathering load intrinsic, which can lower to native instruction for X86/AVX512 and ARM/SVE. This also enables building vectorization tree with entries containing scattered operands. The next step is to add scattered store. Fixes PR47629 and PR47623 Differential Revision: https://reviews.llvm.org/D90445	2020-11-17 18:11:45 +03:00
Sander de Smalen	f571fe6df5	Reland [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This relands https://reviews.llvm.org/D91059 and reverts commit `30fded75b4`. GetRegUsage now returns 0 when Ty is not a valid vector element type.	2020-11-17 13:45:10 +00:00
Philip Reames	2240d3d054	[LoopVec] Introduce an api for detecting uniform memory ops Split off D91398 at request of reviewer.	2020-11-16 13:30:48 -08:00
Florian Hahn	0c119ba8a8	[VPlan] Use VPValue def for VPWidenGEPRecipe. This patch turns VPWidenGEPRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84683	2020-11-15 15:12:47 +00:00
Florian Hahn	a70b511e78	Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts the revert commit `c8d73d939f`. It includes a fix for cases where we missed inserting VPValues for some selects, which should fix PR48142.	2020-11-14 20:00:25 +00:00
serge-sans-paille	9218ff50f9	llvmbuildectomy - replace llvm-build by plain cmake No longer rely on an external tool to build the llvm component layout. Instead, leverage the existing `add_llvm_component_library` cmake function and introduce `add_llvm_component_group` to accurately describe component behavior. These functions store extra properties in the created targets. These properties are processed once all components are defined to resolve library dependencies and produce the header expected by llvm-config. Differential Revision: https://reviews.llvm.org/D90848	2020-11-13 10:35:24 +01:00
Sander de Smalen	30fded75b4	Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost." This reverts commits: * [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. `b873aba394`. * [LoopVectorizer] Silence warning in GetRegUsage. `9ff701100a`.	2020-11-11 14:41:55 +00:00
Sander de Smalen	9ff701100a	[LoopVectorizer] Silence warning in GetRegUsage. This patch silences the warning: error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture] auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) { ~^~~ 1 error generated. Introduced in: https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88	2020-11-11 10:54:20 +00:00
Sander de Smalen	b873aba394	[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost. This is more accurate than dividing the bitwidth based on the element count by the maximum register size, as it can just reuse whatever has been calculated for legalization of these types. This change is also necessary when calculating register usage for scalable vectors, where the legalization of these types cannot be done based on the widest register size, because that does not take the 'vscale' component into account. Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D91059	2020-11-11 10:18:50 +00:00
Sander de Smalen	0141f5a49d	[LoopVectorizer] NFC: Return ElementCount from compute[Feasible]MaxVF Interfaces changed to return `ElementCount`: * LoopVectorizationCostModel::computeMaxVF * LoopVectorizationCostModel::computeFeasibleMaxVF This is NFC for fixed-width vectors. Reviewed By: dmgreen, ctetreau Differential Revision: https://reviews.llvm.org/D90880	2020-11-11 09:55:06 +00:00
Florian Hahn	c8d73d939f	Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts commit `a8e50f1c6e`. This reportedly breaks building the Linux kernel. https://bugs.llvm.org/show_bug.cgi?id=48142	2020-11-10 22:50:46 +00:00
Florian Hahn	a8e50f1c6e	[VPlan] Use VPValue def for VPWidenSelectRecipe. This patch turns VPWidenSelectRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84682	2020-11-10 19:39:37 +00:00
Sander de Smalen	f47573f9bf	[LoopVectorizer] NFC: Propagate ElementCount to more interfaces. Interfaces changed to take `ElementCount` as parameters: * LoopVectorizationPlanner::buildVPlans * LoopVectorizationPlanner::buildVPlansWithVPRecipes * LoopVectorizationCostModel::selectVectorizationFactor This patch is NFC for fixed-width vectors. Reviewed By: dmgreen, ctetreau Differential Revision: https://reviews.llvm.org/D90879	2020-11-10 11:11:02 +00:00
Florian Hahn	f0d76275cb	[VPlan] Print result value for loads in VPWidenMemoryInst (NFC). For loads, print the result value.	2020-11-09 14:01:29 +00:00
Florian Hahn	537829f2a7	[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC). Move logic to check if the recipe is a store to a helper for easier reuse.	2020-11-09 14:01:29 +00:00
Florian Hahn	fec64de261	[VPlan] Use VPValue def for VPWidenCall. This patch turns VPWidenCall into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84681	2020-11-09 13:29:41 +00:00
Florian Hahn	091c5c9a18	[VPlan] Add printOperands helper to VPUser (NFC). Factor out the code for printing operands of a VPUser so it can be re-used when printing other recipes.	2020-11-09 12:30:57 +00:00
Florian Hahn	d8d1cc647d	[SLP] Also try to vectorize incoming values of PHIs. Currently we do not consider incoming values of PHIs as roots for SLP vectorization. This means we miss scenarios like the one in the test case and PR47670. It appears quite straight-forward to consider incoming values of PHIs as roots for vectorization, but I might be missing something that makes this problematic. In terms of vectorized instructions, this applies to quite a few benchmarks across MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto Same hash: 185 (filtered out) Remaining: 52 Metric: SLP.NumVectorInstructions Program base patch diff test-suite...ProxyApps-C++/HPCCG/HPCCG.test 9.00 27.00 200.0% test-suite...C/CFP2000/179.art/179.art.test 8.00 22.00 175.0% test-suite...T2006/458.sjeng/458.sjeng.test 14.00 30.00 114.3% test-suite...ce/Benchmarks/PAQ8p/paq8p.test 11.00 18.00 63.6% test-suite...s/FreeBench/neural/neural.test 12.00 18.00 50.0% test-suite...rimaran/enc-3des/enc-3des.test 65.00 95.00 46.2% test-suite...006/450.soplex/450.soplex.test 63.00 89.00 41.3% test-suite...ProxyApps-C++/CLAMR/CLAMR.test 177.00 250.00 41.2% test-suite...nchmarks/McCat/18-imp/imp.test 13.00 18.00 38.5% test-suite.../Applications/sgefa/sgefa.test 26.00 35.00 34.6% test-suite...pplications/oggenc/oggenc.test 100.00 133.00 33.0% test-suite...6/482.sphinx3/482.sphinx3.test 103.00 134.00 30.1% test-suite...oxyApps-C++/miniFE/miniFE.test 169.00 213.00 26.0% test-suite.../Benchmarks/Olden/tsp/tsp.test 59.00 73.00 23.7% test-suite...TimberWolfMC/timberwolfmc.test 503.00 622.00 23.7% test-suite...T2006/456.hmmer/456.hmmer.test 65.00 79.00 21.5% test-suite...libquantum/462.libquantum.test 58.00 68.00 17.2% test-suite...ternal/HMMER/hmmcalibrate.test 84.00 98.00 16.7% test-suite...ications/JM/ldecod/ldecod.test 351.00 401.00 14.2% test-suite...arks/VersaBench/dbms/dbms.test 52.00 57.00 9.6% test-suite...ce/Benchmarks/Olden/bh/bh.test 118.00 128.00 8.5% test-suite.../Benchmarks/Bullet/bullet.test 6355.00 6880.00 8.3% test-suite...nsumer-lame/consumer-lame.test 480.00 519.00 8.1% test-suite...000/183.equake/183.equake.test 226.00 244.00 8.0% test-suite...chmarks/Olden/power/power.test 105.00 113.00 7.6% test-suite...6/471.omnetpp/471.omnetpp.test 92.00 99.00 7.6% test-suite...ications/JM/lencod/lencod.test 1173.00 1261.00 7.5% test-suite...0/253.perlbmk/253.perlbmk.test 55.00 59.00 7.3% test-suite...oxyApps-C/miniAMR/miniAMR.test 92.00 98.00 6.5% test-suite...chmarks/MallocBench/gs/gs.test 446.00 473.00 6.1% test-suite.../CINT2006/403.gcc/403.gcc.test 464.00 491.00 5.8% test-suite...6/464.h264ref/464.h264ref.test 998.00 1055.00 5.7% test-suite...006/453.povray/453.povray.test 5711.00 6007.00 5.2% test-suite...FreeBench/distray/distray.test 102.00 107.00 4.9% test-suite...:: External/Povray/povray.test 4184.00 4378.00 4.6% test-suite...DOE-ProxyApps-C/CoMD/CoMD.test 112.00 117.00 4.5% test-suite...T2006/445.gobmk/445.gobmk.test 104.00 108.00 3.8% test-suite...CI_Purple/SMG2000/smg2000.test 789.00 819.00 3.8% test-suite...yApps-C++/PENNANT/PENNANT.test 233.00 241.00 3.4% test-suite...marks/7zip/7zip-benchmark.test 417.00 428.00 2.6% test-suite...arks/mafft/pairlocalalign.test 627.00 643.00 2.6% test-suite.../Benchmarks/nbench/nbench.test 259.00 265.00 2.3% test-suite...006/447.dealII/447.dealII.test 4641.00 4732.00 2.0% test-suite...lications/ClamAV/clamscan.test 106.00 108.00 1.9% test-suite...CFP2000/177.mesa/177.mesa.test 1639.00 1664.00 1.5% test-suite...oxyApps-C/RSBench/rsbench.test 66.00 65.00 -1.5% test-suite.../CINT2000/252.eon/252.eon.test 3416.00 3444.00 0.8% test-suite...CFP2000/188.ammp/188.ammp.test 1846.00 1861.00 0.8% test-suite.../CINT2000/176.gcc/176.gcc.test 152.00 153.00 0.7% test-suite...CFP2006/444.namd/444.namd.test 3528.00 3544.00 0.5% test-suite...T2006/473.astar/473.astar.test 98.00 98.00 0.0% test-suite...frame_layout/frame_layout.test NaN 39.00 nan% On ARM64, there appears to be a slight regression on SPEC2006, which might be interesting to investigate: test-suite...T2006/473.astar/473.astar.test 0.9% Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D88735	2020-11-06 12:50:32 +00:00
Sander de Smalen	4a3bb9ea6c	[VPlan] NFC: Change VFRange to take ElementCount This patch changes the type of Start, End in VFRange to be an ElementCount instead of `unsigned`. This is done as preparation to make VPlans for scalable vectors, but is otherwise NFC. Reviewed By: dmgreen, fhahn, vkmr Differential Revision: https://reviews.llvm.org/D90715	2020-11-06 09:50:20 +00:00
Florian Hahn	d9cbf39a37	[SLP] Pass VecPred argument to getCmpSelInstrCost. Check if all compares in VL have the same predicate and pass it to getCmpSelInstrCost, to improve cost-modeling on targets that only support compare/select combinations for certain uniform predicates. This leads to additional vectorization in some cases ``` Same hash: 217 (filtered out) Remaining: 19 Metric: SLP.NumVectorInstructions Program base slp2 diff test-suite...marks/SciMark2-C/scimark2.test 11.00 26.00 136.4% test-suite...T2006/445.gobmk/445.gobmk.test 79.00 135.00 70.9% test-suite...ediabench/gsm/toast/toast.test 54.00 71.00 31.5% test-suite...telecomm-gsm/telecomm-gsm.test 54.00 71.00 31.5% test-suite...CI_Purple/SMG2000/smg2000.test 426.00 542.00 27.2% test-suite...ch/g721/g721encode/encode.test 30.00 24.00 -20.0% test-suite...000/186.crafty/186.crafty.test 116.00 138.00 19.0% test-suite...ications/JM/ldecod/ldecod.test 697.00 765.00 9.8% test-suite...6/464.h264ref/464.h264ref.test 822.00 886.00 7.8% test-suite...chmarks/MallocBench/gs/gs.test 154.00 162.00 5.2% test-suite...nsumer-lame/consumer-lame.test 621.00 651.00 4.8% test-suite...lications/ClamAV/clamscan.test 223.00 231.00 3.6% test-suite...marks/7zip/7zip-benchmark.test 680.00 695.00 2.2% test-suite...CFP2000/177.mesa/177.mesa.test 2121.00 2129.00 0.4% test-suite...:: External/Povray/povray.test 2406.00 2412.00 0.2% test-suite...TimberWolfMC/timberwolfmc.test 634.00 634.00 0.0% test-suite...CFP2006/433.milc/433.milc.test 1036.00 1036.00 0.0% test-suite.../Benchmarks/nbench/nbench.test 321.00 321.00 0.0% test-suite...ctions-flt/Reductions-flt.test NaN 5.00 nan% ``` Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D90124	2020-11-03 10:16:43 +00:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Florian Hahn	ca38652b9a	[VPlan] Assert no users remaining when deleting a VPValue. When deleting a VPValue, all users must already be deleted. Add an assertion to catch violations.	2020-11-01 17:44:53 +00:00
Florian Hahn	799033d8c5	Reland "[SLP] Consider alternatives for cost of select instructions." This reverts the revert commit `a1b53db324`. This patch includes a fix for a reported issue, caused by matchSelectPattern returning UMIN for selects of pointers in some cases by looking through some connected casts. For now, ensure integer intrinsics are only returned for selects of ints or int vectors.	2020-10-31 16:52:36 +00:00
Florian Hahn	a1b53db324	Revert "[SLP] Consider alternatives for cost of select instructions." This reverts commit `1922570489`. This appears to cause a crash in the following example a, b, c; l() { int e = a, f = l, g, h, i, j; float d = c, k = b; for (;;) for (; g < f; g++) { k[h] = d[i]; k[h - 1] = d[j]; h += e << 1; i += e; } } clang -cc1 -triple i386-unknown-linux-gnu -emit-obj -target-cpu pentium-m -O1 -vectorize-loops -vectorize-slp reduced.c llvm::Type *llvm::Type::getWithNewBitWidth(unsigned int) const: Assertion `isIntOrIntVectorTy() && "Original type expected to be a vector of integers or a scalar integer."' failed.	2020-10-30 21:26:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Florian Hahn	aa1a198a64	[VPlan] Use isa<> instead getVPRecipeID in getFirstNonPhi (NFC). As per the comment in VPRecipeBase, clients should not rely on getVPRecipeID, as it may change in the future. It should only be used in classof implementations. Use isa instead in getFirstNonPhi.	2020-10-30 14:56:06 +00:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Florian Hahn	1922570489	[SLP] Consider alternatives for cost of select instructions. Some architectures do not have general vector select instructions (e.g. AArch64). But some cmp/select patterns can be vectorized using other instructions/intrinsics. One example is using min/max instructions for certain patterns. This patch updates the cost calculations for selects in the SLP vectorizer to consider using min/max intrinsics. This patch does not change SLP vectorizer's codegen itself to actually generate those intrinsics, but relies on the backends to lower the vector cmps & selects. This keeps things simple on the SLP side and works well in practice for AArch64. This exposes additional SLP vectorization opportunities in some benchmarks on AArch64 (-O3 -flto). Metric: SLP.NumVectorInstructions Program base slp diff test-suite...ications/JM/ldecod/ldecod.test 502.00 697.00 38.8% test-suite...ications/JM/lencod/lencod.test 1023.00 1414.00 38.2% test-suite...-typeset/consumer-typeset.test 56.00 65.00 16.1% test-suite...6/464.h264ref/464.h264ref.test 804.00 822.00 2.2% test-suite...006/453.povray/453.povray.test 3335.00 3357.00 0.7% test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2121.00 0.5% test-suite...:: External/Povray/povray.test 2378.00 2382.00 0.2% Reviewed By: RKSimon, samparker Differential Revision: https://reviews.llvm.org/D89969	2020-10-29 20:39:50 +00:00
Nicolai Hähnle	e025d09b21	Revert multiple patches based on "Introduce CfgTraits abstraction" These logically belong together since it's a base commit plus followup fixes to less common build configurations. The patches are: Revert "CfgInterface: rename interface() to getInterface()" This reverts commit `a74fc48158`. Revert "Wrap CfgTraitsFor in namespace llvm to please GCC 5" This reverts commit `f2a06875b6`. Revert "Try to make GCC5 happy about the CfgTraits thing" This reverts commit `03a5f7ce12`. Revert "Introduce CfgTraits abstraction" This reverts commit `c0cdd22c72`.	2020-10-27 20:33:30 +01:00
Joe Ellis	467e5cf40f	[SVE][AArch64] Fix TypeSize warning in loop vectorization legality The warning would fire when calling isDereferenceableAndAlignedInLoop with a scalable load. Calling isDereferenceableAndAlignedInLoop with a scalable load would result in the use of the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes this issue by: - no longer considering vector loads as candidates in canVectorizeWithIfConvert. This doesn't make sense in the context of identifying scalar loads to vectorize. - making use of getFixedSize inside isDereferenceableAndAlignedInLoop -- this removes the dependency on the deprecated interface, and will trigger an assertion error if the function is ever called with a scalable type. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D89798	2020-10-26 17:40:04 +00:00
Nicolai Hähnle	c0cdd22c72	Introduce CfgTraits abstraction The CfgTraits abstraction simplifies writing algorithms that are generic over the type of CFG, and enables writing such algorithms as regular non-template code that operates on opaque references to CFG blocks and values. Implementations of CfgTraits provide operations on the concrete CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`. CfgInterface is an abstract base class which provides operations on opaque types CfgBlockRef and CfgValueRef. Those opaque types encapsulate a `void *`, but the meaning depends on the concrete CFG type. For example, MachineCfgTraits -- for use with MachineIR in SSA form -- encodes a Register inside CfgValueRef. Converting between concrete references and opaque/generic ones is done by CfgTraits::{fromGeneric,toGeneric}. Convenience methods CfgTraits::{un}wrap{Iterator,Range} are available as well. Writing algorithms in terms of CfgInterface adds some overhead (virtual method calls, plus in some cases it removes the opportunity to inline iterators), but can be much more convenient since generic algorithms can be written as non-templates. This patch adds implementations of CfgTraits for all CFGs on which dominator trees are calculated, so that the dominator tree can be ported to this machinery. Only IrCfgTraits (LLVM IR) and MachineCfgTraits (Machine IR in SSA form) are complete; the other implementations are limited to the absolute minimum required to make the upcoming dominator tree changes work. v5: - fix MachineCfgTraits::blockdef_iterator and allow it to iterate over the instructions in a bundle - use MachineBasicBlock::printName v6: - implement predecessors/successors for all CfgTraits implementations - fix error in unwrapRange - rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming that is consistent with {wrap,unwrap}{Iterator,Range} - use getVRegDef instead of getUniqueVRegDef v7: - std::forward fix in wrapping_iterator - fix typos v8: - cleanup operators on CfgOpaqueType - address other review comments Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d Differential Revision: https://reviews.llvm.org/D83088	2020-10-20 13:50:52 +02:00
Artem Belevich	c36c0fabd1	[VectorCombine] Avoid crossing address space boundaries. We cannot bitcast pointers across different address spaces, and VectorCombine should be careful when it attempts to find the original source of the loaded data. Differential Revision: https://reviews.llvm.org/D89577	2020-10-16 13:19:31 -07:00
Florian Hahn	89c0124273	[LoopVersion] Unify SCEVChecks and alias check handling (NFC). This is an initial cleanup of the way LoopVersioning interacts with LAA. Currently LoopVersioning has 2 ways of initializing things: 1. Passing LAI and passing UseLAIChecks = true; 2. Passing UseLAIChecks = false, followed by calling setSCEVChecks and setAliasChecks. Both ways of initializing lead to the same result and the duplication seems more complicated than necessary. This patch removes the UseLAIChecks flag from the constructor and the setSCEVChecks & setAliasChecks helpers and moves initialization exclusively to the constructor. This simplifies things by providing a single way to initialize LoopVersioning and reducing duplication. Reviewed By: Meinersbur, lebedev.ri Differential Revision: https://reviews.llvm.org/D84406	2020-10-15 22:02:17 +01:00
David Green	13ec3dd66f	[LV] Add a getRecurrenceBinOp and make use of it. NFC	2020-10-15 18:21:41 +01:00
Florian Hahn	93f6c6b79c	Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe." This reverts the revert commit `710aceb645` and includes a fix for a memsan failure. Original message: This patch turns VPMemoryInstructionRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible.	2020-10-14 17:41:23 +01:00
Evgeniy Brevnov	d0c95808e5	[LV] Unroll factor is expected to be > 0 LV fails with assertion checking that UF > 0. We already set UF to 1 if it is 0 except the case when IC > MaxInterleaveCount. The fix is to set UF to 1 for that case as well. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D87679	2020-10-14 16:48:17 +07:00
Vitaly Buka	710aceb645	Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe." It introduced a memory leak. This reverts commit `525b085a65`.	2020-10-13 03:14:08 -07:00
Florian Hahn	525b085a65	[VPlan] Use VPValue def for VPMemoryInstructionRecipe. This patch turns VPMemoryInstructionRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84680	2020-10-12 18:02:33 +01:00
Florian Hahn	ea058d289c	[VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe. Now that operands of the recipe are managed through VPUser, we can simplify the printing by just using the operands.	2020-10-12 16:51:54 +01:00
David Sherwood	c5ba0d33cc	[SVE] Make ElementCount and TypeSize use a new PolySize class I have introduced a new template PolySize class, where the template parameter determines the type of quantity, e.g. for an element count this is just an unsigned value. The ElementCount class is now just a simple derivation of PolySize<unsigned>, whereas TypeSize is more complicated because it still needs to contain the uint64_t cast operator, since there are still many places in the code that rely upon this implicit cast. As such the class also still needs some of its own operators. I've tried to minimise the amount of code in the base PolySize class, which led to a couple of changes: 1. In some places we were relying on '==' operator comparisons between ElementCounts and the scalar value 1. I didn't put this operator in the new PolySize class, and thought it was actually clearer to use the isScalar() function instead. 2. I removed the isByteSized function and replaced it with calls to isKnownMultipleOf(8). I've also renamed NextPowerOf2 to be coefficientNextPowerOf2 so that it's more consistent with coefficientDivideBy. Differential Revision: https://reviews.llvm.org/D88409	2020-10-12 08:23:38 +01:00
David Green	be6e8e50f4	[LV] Tail folded inloop reductions. This expands upon the inloop reductions added in e9761688e41cb9e976, allowing them to be inserted into tail folded loops. Reductions are generated in the form: x = select(mask, vecop, zero); v = vecreduce.add(x); c = add chain, v. Here zero is chosen as the identity value for add reductions. The backend is then expected to fold the select and the vecreduce into a single predicated instruction. Most of the code is fairly straightforward, except for the creation of blockmasks, which need to be created in dominance order. The order they are added in is altered to be after any phis, preserving the requirements of the underlying IR. Differential Revision: https://reviews.llvm.org/D84451	2020-10-11 16:58:34 +01:00
Simon Pilgrim	0716805c02	[SLP] optimizeGatherSequence - assert every Instruction in the worklist is non-null. Fixes clang static analyzer warning.	2020-10-08 20:02:18 +01:00
David Green	498f89d188	[LV] Collect dead induction truncates We currently collect the ICmp and Add from an induction variable, marking them as dead so that vplan values are not created for them. This extends that to include any single-use trunc from the ICmp, which allows the Add to more readily be removed too. This can help with costing vplan nodes, as the ICmp and Add are more reliably removed and are not double-counted. Differential Revision: https://reviews.llvm.org/D88873	2020-10-08 08:28:58 +01:00
Florian Hahn	348d85a6c7	[VPlan] Clean up uses/operands on VPBB deletion. Update the code responsible for deleting VPBBs and recipes to properly update users and release operands. This is another preparation for D84680 & following patches towards enabling modeling def-use chains in VPlan.	2020-10-05 14:43:52 +01:00
Florian Hahn	357bbaab66	[VPlan] Add VPRecipeBase::toVPUser helper (NFC). This adds a helper to convert a VPRecipeBase pointer to a VPUser, for recipes that inherit from VPUser. Once VPRecipeBase directly inherits from VPUser this helper can be removed.	2020-10-04 19:43:27 +01:00
Florian Hahn	f5fe7abe8a	[VPlan] Account for removed users in replaceAllUsesWith. Make sure we do not iterate using an invalid iterator. Another small fix/step towards traversing the def-use chains in VPlan.	2020-10-04 18:18:58 +01:00
Florian Hahn	82dcd383c4	[VPlan] Properly update users when updating operands. When updating operands of a VPUser, we also have to adjust the list of users for the new and old VPValues. This is required once we start transitioning recipes to become VPValues.	2020-10-03 20:54:58 +01:00
Florian Hahn	0867a9e85a	[VPlan] Use isa<> instead of directly checking VPRecipeID (NFC). getVPRecipeID is intended to be only used in `classof` helpers. Instead of checking it directly, use isa<> with the correct recipe type.	2020-10-02 17:47:35 +01:00
Florian Hahn	d856365470	[VPlan] Change recipes to inherit from VPUser instead of a member var. Now that VPUser is not inheriting from VPValue, we can take the next step and turn the recipes that already manage their operands via VPUser into VPUsers directly. This is another small step towards traversing def-use chains in VPlan. This is NFC with respect to the generated code, but makes the interface more powerful.	2020-09-30 14:39:00 +01:00
Sanjay Patel	0a349d5827	[SLP] clean up - use 'const' and ArrayRef constructor; NFC Follow-on tidying suggested in the post-commit review of `6a23668`.	2020-09-24 15:31:07 -04:00
Craig Topper	03f22b08e2	[SLP] Remove LHS and RHS from OperationData. These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction. For the first case we still have the operation; we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in. For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects. Differential Revision: https://reviews.llvm.org/D88193	2020-09-24 10:57:11 -07:00
Craig Topper	7a3c643c35	[SLP] Make HorizontalReduction::getOperationData take an Instruction* instead of a Value. NFCI All of the callers already have an Instruction *. Many of them from a dyn_cast. Also update the OperationData constructor to use an Instruction& to remove a dyn_cast and make it clear that the pointer is non-null. Differential Revision: https://reviews.llvm.org/D88132	2020-09-23 10:51:03 -07:00
Simon Pilgrim	474dc33d07	Add missing namespace closure comment. NFCI. Fixes clang-tidy llvm-namespace-comment warning.	2020-09-23 16:19:25 +01:00
Florian Hahn	31923f6b36	[VPlan] Disconnect VPValue and VPUser. This refactors VPUser to not inherit from VPValue to facilitate introducing operations that introduce multiple VPValues (e.g. VPInterleaveRecipe). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84679	2020-09-23 14:44:31 +01:00
Alexey Bataev	d6ac649ccd	[SLP]Fix coding style, NFC.	2020-09-22 17:44:29 -04:00
Stefanos Baziotis	89c1e35f3c	[LoopInfo] empty() -> isInnermost(), add isOutermost() Differential Revision: https://reviews.llvm.org/D82895	2020-09-22 23:28:51 +03:00
Florian Hahn	c671e34bf2	[VPlan] Add dump() helper to VPValue & VPRecipeBase. This provides a convenient way to print VPValues and recipes in a debugger. In particular it saves the user from instantiating VPSlotTracker to print recipes or values.	2020-09-22 15:55:16 +01:00
Sanjay Patel	0c3bfbe4bc	[SLP] reduce code duplication for checking parent block; NFC	2020-09-22 09:21:20 -04:00
Sanjay Patel	bbd49a0266	[SLP] move misplaced code comments; NFC	2020-09-22 09:21:20 -04:00
Sanjay Patel	062276c691	[SLP] clean up code in gather(); NFC 1. Use range for-loop to avoid repeatedly accessing end index. 2. Better variable names.	2020-09-22 09:21:20 -04:00
Simon Pilgrim	d682a36ef9	[SLP] Merge null and dyn_cast<> checks into dyn_cast_or_null<>. NFCI.	2020-09-22 14:01:47 +01:00
Sanjay Patel	7451bf0b0b	[SLP] use std::distance/find to reduce code; NFC We were already using this code pattern right after the loop, so this makes it consistent.	2020-09-21 16:22:55 -04:00
Sanjay Patel	be93505986	[LoopVectorize] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 15:34:24 -04:00
Sanjay Patel	a44238cb44	[SLP] use unary shuffle creator to reduce code duplication; NFC	2020-09-21 13:54:06 -04:00
Sanjay Patel	1e6b240d7d	[IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC This reduces code duplication for common construct. Follow-ups can use this in SLP, LoopVectorizer, and other passes.	2020-09-21 13:47:01 -04:00
Simon Pilgrim	005f826a05	[SLP] Use for-range loops across ValueLists. NFCI. Also rename some existing loops that used a 'j' iterator to consistently use 'V'.	2020-09-21 18:24:23 +01:00
Sanjay Patel	46075e0b78	[SLP] simplify interface for gather(); NFC The implementation of gather() should be reduced too, but this change by itself makes things a little clearer: we don't try to gather to a different type or number-of-values than whatever is passed in as the value list itself.	2020-09-21 12:57:28 -04:00
Simon Pilgrim	3ddecfd220	SLPVectorizer.cpp - fix include ordering. NFCI.	2020-09-21 17:17:11 +01:00
Alexey Bataev	3ff07fcd54	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we cannot try to reorder the tree, and we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, f>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-21 10:51:03 -04:00
Fangrui Song	6913812abc	Fix some clang-tidy bugprone-argument-comment issues	2020-09-19 20:41:25 -07:00
Eric Christopher	ecfd8161bf	Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions." as it's infinite looping on occasion. This reverts commit `455ca0ebb6`.	2020-09-18 12:50:04 -07:00
Alexey Bataev	455ca0ebb6	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we cannot try to reorder the tree, and we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, f>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-18 09:34:59 -04:00
Sanjay Patel	48a23bccf3	[VectorCombine] limit load+insert transform to one-use As discussed in: https://llvm.org/PR47558 ...there are several potential fixes/follow-ups visible in the test case, but this is the quickest and safest fix of the perf regression.	2020-09-17 14:29:15 -04:00
Sanjay Patel	ddd9575d15	[VectorCombine] rearrange bailouts for load insert for efficiency; NFC	2020-09-17 13:50:37 -04:00
Sanjay Patel	03783f19dc	[SLP] sort candidates to increase chance of optimal compare reduction This is one (small) part of improving PR41312: https://llvm.org/PR41312 As shown there and in the smaller tests here, if we have some member of the reduction values that does not match the others, we want to push it to the end (bring the matching members forward and together). In the regression tests, we have 5 candidates for the 4 slots of the reduction. If the one "wrong" compare is grouped with the others, it prevents forming the ideal v4i1 compare reduction. Differential Revision: https://reviews.llvm.org/D87772	2020-09-17 08:49:27 -04:00
Sanjay Patel	24238f09ed	[SLP] fix formatting; NFC Also move variable declarations closer to usage and add code comments.	2020-09-16 08:50:27 -04:00
Sanjay Patel	6a23668e78	[SLP] remove uses of 'auto' that obscure functionality; NFC	2020-09-16 08:26:21 -04:00
Sanjay Patel	0cee1bf5d1	[SLP] remove redundant size check; NFC We bail out on small array size anyway.	2020-09-16 08:11:19 -04:00
Sanjay Patel	bbad998bab	[SLP] move loop index variable declaration to its use; NFC	2020-09-16 07:59:31 -04:00
Sanjay Patel	158989184e	[SLP] change poorly named variable; NFC 'V' shadows a function argument.	2020-09-16 07:59:31 -04:00
Wenlei He	2ea4c2c598	[BFI] Make BFI information available through loop passes inside LoopStandardAnalysisResults ~~D65060 uncovered that trying to use BFI in loop passes can lead to non-deterministic behavior when blocks are re-used while retaining old BFI data.~~ ~~To make sure BFI is preserved through loop passes a Value Handle (VH) callback is registered on blocks themselves. When a block is freed it now also wipes out the accompanying BFI entry such that stale BFI data can no longer persist resolving the determinism issue. ~~ ~~An optimistic approach would be to incrementally update BFI information throughout the loop passes rather than only invalidating them on removed blocks. The issues with that are:~~ ~~1. It is not clear how BFI information should be incrementally updated: If a block is duplicated does its BFI information come with? How about if it's split/modified/moved around? ~~ ~~2. Assuming we can address these problems the implementation here will be a massive undertaking. ~~ ~~There's a known need of BFI in LICM analysis which requires correct but not incrementally updated BFI data. A follow-up change can register BFI in all loop passes so this preserved but potentially lossy data is available to any loop pass that wants it.~~ See: D75341 for an identical implementation of preserving BFI via VH callbacks. The previous statements do still apply but this change no longer has to be in this diff because it's already upstream 😄 . This diff also moves BFI to be a part of LoopStandardAnalysisResults since the previous method using getCachedResults now (correctly!) statically asserts (D72893) that this data isn't static through the loop passes. Testing Ninja check Reviewed By: asbirlea, nikic Differential Revision: https://reviews.llvm.org/D86156	2020-09-15 16:16:24 -07:00
Huihui Zhang	3b7f5166bd	[SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions. For scalable types, the aggregate size is unknown at compile time. Skip instructions with scalable type to ensure the list of instructions for vectorizeSimpleInstructions does not contain any scalable-vector instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87550	2020-09-15 13:10:15 -07:00
Fangrui Song	4452cc4086	[VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan Similar to the tsan suppression in `Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize` (rL175034; load widening used by GVN), the D81766 optimization should be suppressed under tsan due to potential spurious data race reports: struct A { int i; const short s; // the load cannot be vectorized because int modify; // it overlaps with bytes being concurrently modified long pad1, pad2; }; // __tsan_read16 does not know that some bytes are undef and accessing is safe Similarly, under asan, users can mark memory regions with `__asan_poison_memory_region`. A widened load can lead to a spurious use-after-poison error. hwasan/memtag should be similarly suppressed. `mustSuppressSpeculation` suppresses asan/hwasan/tsan but not memtag, so we need to exclude memtag in `vectorizeLoadInsert`. Note that memtag suppression can be relaxed if the load is aligned to its granule (usually 16), but that is out of the scope of this patch. Reviewed By: spatel, vitalybuka Differential Revision: https://reviews.llvm.org/D87538	2020-09-15 09:47:21 -07:00
Simon Pilgrim	2b42d53e5e	SLPVectorizer.h - remove unnecessary AliasAnalysis.h include. NFCI. Forward declare AAResults instead of the (old) AliasAnalysis type. Remove includes from SLPVectorizer.cpp that are already included in SLPVectorizer.h.	2020-09-15 16:24:05 +01:00
David Green	74760bb00f	[LV][ARM] Add preferInloopReduction target hook. This allows the backend to tell the vectorizer to produce inloop reductions through a TTI hook. For the moment on ARM under MVE this means allowing integer add reductions of the correct size. In the future this can include integer min/max too, under -Os. Differential Revision: https://reviews.llvm.org/D75512	2020-09-12 17:47:04 +01:00
Sanjay Patel	40f12ef621	[SLP] further limit bailout for load combine candidate (PR47450) The test example based on PR47450 shows that we can match non-byte-sized shifts, but those won't ever be bswap opportunities. This isn't a full fix (we'd still match if the shifts were by 8-bits for example), but this should be enough until there's evidence that we need to do more (this is a borderline case for vectorization in the first place).	2020-09-11 11:56:11 -04:00
Craig Topper	c195ae2f00	[SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support. Previously we could match fcmp+select to a reduction if the fcmp had the nonans fast math flag. But if the select had the nonans fast math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic before SLP gets to it. Seems fairly likely that if one of the fcmp+select pair have the fast math flag, they both would. My plan is to start vectorizing the fmaxnum/fminnum version soon, but I wanted to get this code out as it had some of the strangest fast math flag behaviors.	2020-09-10 11:49:19 -07:00
Simon Pilgrim	5ea9e655ef	VPlan.h - remove unnecessary forward declarations. NFCI. Already defined in includes.	2020-09-07 18:35:06 +01:00
Huihui Zhang	b4f04d7135	[VectorCombine][SVE] Do not fold bitcast shuffle for scalable type. First, the shuffle cost is not known for scalable types; second, we cannot reason about whether the narrowed shuffle mask for a scalable type is a splat or not. E.g., bitcasting a splat vector from type <vscale x 4 x i32> to <vscale x 8 x i16> involves narrowing the shuffle mask <vscale x 4 x i32> zeroinitializer to <vscale x 8 x i32> with element sequence <0, 1, 0, 1, ...>, for which we cannot determine whether it is a valid splat. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86995	2020-09-02 15:02:16 -07:00
Sanjay Patel	8fb055932c	[VectorCombine] allow vector loads with mismatched insert type This is an enhancement to D81766 to allow loading the minimum target vector type into an IR vector with a different number of elements. In one of the motivating tests from PR16739, SLP creates <2 x float> load ops mixed with <4 x float> insert ops, so we want to handle that pattern in addition to potential oversized vectors created by the vectorizers. For now, we are assuming the insert/extract subvector with undef is free because there is no exact corresponding TTI modeling for that. Differential Revision: https://reviews.llvm.org/D86160	2020-09-02 08:11:36 -04:00
Aaron Liu	d7e16ca28f	[LV] Interleave to expose ILP for small loops with scalar reductions. Interleave small loops that contain reductions, which breaks dependencies and exposes ILP. This gives very significant performance improvements for some benchmarks, because small loops can be in very hot functions in real applications. Differential Revision: https://reviews.llvm.org/D81416	2020-09-01 19:47:32 +00:00
Florian Hahn	eb35ebb3a2	[LV] Update CFG before adding runtime checks. addRuntimeChecks uses SCEVExpander, which relies on the DT/LoopInfo to be up-to-date. Changing the CFG afterwards may invalidate some inserted instructions, especially LCSSA phis. Reorder the code to first update the CFG and then create the runtime checks. This should not have any impact on the generated code, as we adjust the CFG and generate runtime checks together. Fixes PR47343.	2020-08-30 18:21:44 +01:00
Sanjay Patel	af4581e8ab	[SLP] make commutative check apply only to binops; NFC As discussed in D86798, it's not clear if the caller code works with a more liberal definition of "commutative" that includes intrinsics like min/max. This makes the binop restriction (current functionality is unchanged) explicit until the code is audited/tested.	2020-08-30 10:55:44 -04:00
David Green	543c5425f1	[LV] Add some const to RecurrenceDescriptor. NFC	2020-08-30 12:27:51 +01:00
Florian Hahn	5067f4b626	[LV] Check opt-for-size before expanding runtime checks. Move the bail-out when optimizing for size before runtime check generation. In that case, we do not use the result of the expansion; the expanded instructions will be dead and cleaned up later. By doing the check before expanding the runtime checks, we can save a bit of unnecessary work.	2020-08-29 20:35:14 +01:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Christopher Tetreault	5e63083435	[SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D82056	2020-08-27 12:02:20 -07:00
Sjoerd Meijer	bda8fbe2d2	[LV] Fallback strategies if tail-folding fails This implements 2 different vectorisation fallback strategies if tail-folding fails: 1) don't vectorise at all, or 2) vectorise using a scalar epilogue. This can be controlled with option -prefer-predicate-over-epilogue, that has been changed to take a numeric value corresponding to the tail-folding preference and preferred fallback. Patch by: Pierre van Houtryve, Sjoerd Meijer. Differential Revision: https://reviews.llvm.org/D79783	2020-08-26 16:55:25 +01:00
Sjoerd Meijer	ae366479e8	[LV] get.active.lane.mask consuming tripcount instead of backedge-taken count This adapts LV to the new semantics of get.active.lane.mask as discussed in D86147, which means that the LV now emits intrinsic get.active.lane.mask with the loop tripcount instead of the backedge-taken count as its second argument. The motivation for this is described in D86147. Differential Revision: https://reviews.llvm.org/D86304	2020-08-25 13:49:19 +01:00
Francesco Petrogalli	5a34b3ab95	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`. * Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need for an ordering function for the `ElementCount` class. * Bits and pieces around printing the `ElementCount` to string streams. To guarantee that this change is an NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types. In these cases, an assert(!VF.Scalable && "<msg>") has been added to make sure we don't enter code paths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operations _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` would require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that might require changes at the command line. Note that in some places the idiom ``` unsigned VF = ... auto VTy = FixedVectorType::get(ScalarTy, VF) ``` has been replaced with ``` ElementCount VF = ... assert(!VF.Scalable && ...); auto VTy = VectorType::get(ScalarTy, VF) ``` The assertion guarantees that the new code is (at least in debug mode) functionally equivalent to the old version. Notice that this change had been possible because none of the methods that are specific to `FixedVectorType` were used after the instantiation of `VTy`. Reviewed By: rengolin, ctetreau Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:54:03 +00:00
Francesco Petrogalli	bad7d6b373	Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]" Reverting because the commit message doesn't reflect the one agreed on phabricator at https://reviews.llvm.org/D85794. This reverts commit `c8d2b065b9`.	2020-08-24 13:50:55 +00:00
Francesco Petrogalli	c8d2b065b9	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() || VF.isScalar()`. * Add `<` operator to `ElementCount` to be able to use `llvm::SmallSetVector<ElementCount, ...>`. * Bits and pieces around printing the ElementCount to string streams. * Added a static method to `ElementCount` to represent a scalar. To guarantee that this change is an NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types. In these cases, an assert(!VF.Scalable && "<msg>") has been added to make sure we don't enter code paths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operations _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` would require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that might require changes at the command line. Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:39:42 +00:00
David Green	2b69efded0	[ARM][LV] Add a preferPredicatedReductionSelect target hook As part of D84741, this adds a target hook for the preferPredicatedReductionSelect option and makes use of it under MVE, allowing us to tail predicate most reduction loops. Differential Revision: https://reviews.llvm.org/D85980	2020-08-21 08:48:12 +01:00
David Green	816097e4e5	[LV] Allow tail folded reduction selects to remain in the loop The normal scheme for tail folding reductions is to use: loop: p = phi(0, a) mask = ... x = masked_load(..., mask) a = add(x, p) s = select(mask, a, p) This means we need to keep the registers p and a alive out of the loop, plus the mask. On a target with predicated operations we can instead generate the phi as p = phi(0, s). This keeps the select in the loop, and we can fold select(m, add(a, b), c) to something like a vaddt c, a, b using the m predicate. This in turn allows us to tail predicate the entire loop. Differential Revision: https://reviews.llvm.org/D84741	2020-08-20 14:31:14 +01:00
Hiroshi Yamauchi	ab401a8c8a	[PGO][PGSO][LV] Fix loop not vectorized issue under profile guided size opts. D81345 appears to accidentally disable vectorization when it is explicitly enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert to vectorization with versioning-for-unit-stride for PGSO. Differential Revision: https://reviews.llvm.org/D85784	2020-08-19 12:13:34 -07:00
Mehdi Amini	a407ec9b6d	Revert "Revert "[NFC][llvm] Make the constructors of `ElementCount` private."" Was reverted because MLIR/Flang builds were broken; these APIs have been fixed in the meantime.	2020-08-19 17:26:36 +00:00
Mehdi Amini	4fc56d70aa	Revert "[NFC][llvm] Make the constructors of `ElementCount` private." This reverts commit `264afb9e6a`. (and dependent `6b742cc48` and `fc53bd610f`) MLIR/Flang are broken.	2020-08-19 17:21:37 +00:00
Francesco Petrogalli	264afb9e6a	[NFC][llvm] Make the constructors of `ElementCount` private. Differential Revision: https://reviews.llvm.org/D86120	2020-08-19 16:26:44 +00:00
Bjorn Pettersson	11446b02c7	[VectorCombine] Fix for non-zero addrspace when creating vector load from scalar load This is a fixup to commit `43bdac2906`, to make sure the address space from the original load pointer is retained in the vector pointer. Resolves problem with Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed. due to address space mismatch. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D85912	2020-08-13 18:25:32 +02:00
Sanjay Patel	cc892fd9f4	[VectorCombine] early exit if target has no vector registers Based on post-commit discussion in: D81766 Other vectorization passes (SLP and Loop) use this TTI API similarly.	2020-08-12 09:22:31 -04:00
Sanjay Patel	b0b95dab1c	[VectorCombine] add safety check for 0-width register Based on post-commit discussion in D81766, Hexagon sets this to "0". I'll see if I can come up with a test, but making the obvious code fix first to unblock that target.	2020-08-11 20:30:02 -04:00
Dinar Temirbulatov	b1600d8b89	[NFC] Guard the cost report block of debug outputs with NDEBUG and switch to SmallString, this is part of D57779.	2020-08-11 16:34:47 +02:00
Florian Hahn	0b774acf11	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact; e.g. on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-08-11 11:18:12 +02:00
Sanjay Patel	43bdac2906	[VectorCombine] try to create vector loads from scalar loads This patch was adjusted to match the most basic pattern that starts with an insertelement (so there's no extract created here). Hopefully, that removes any concern about interfering with other passes. I.e., the transform should almost always be profitable. We could make an argument that this could be part of canonicalization, but we conservatively try not to create vector ops from scalar ops in passes like instcombine. If the transform is not profitable, the backend should be able to re-scalarize the load. Differential Revision: https://reviews.llvm.org/D81766	2020-08-09 09:05:06 -04:00
Anton Afanasyev	a7478fab6c	[SLP] Fix order of `insertelement`/`insertvalue` seed operands Summary: This patch takes the index operands of `insertelement`/`insertvalue` into account while generating seed elements for `findBuildAggregate()`. Previously, this function kept the original order of the `insert`s. This patch also optimizes `findBuildAggregate()`, preventing redundant temporary vector allocations and repeated reversals. Fixes llvm.org/pr44067 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83779	2020-08-06 22:09:24 +03:00
David Green	745bf6cf44	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiply them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with an llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop, not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-06 10:10:50 +01:00
Jordan Rupprecht	3c39db0c44	Revert "[LoopVectorizer] Inloop vector reductions" This reverts commit `e9761688e4`. It breaks the build: ``` ~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>' return ReductionOperations; ```	2020-08-05 10:24:15 -07:00
David Green	e9761688e4	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiply them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with an llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop, not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-05 18:14:05 +01:00
Bardia Mahjour	3c0f347002	[NFC][LV] Vectorized Loop Skeleton Refactoring This patch tries to improve readability and maintenance of createVectorizedLoopSkeleton by reorganizing some lines, updating some of the comments and breaking it up into smaller logical units. Reviewed By: pjeeva01 Differential Revision: https://reviews.llvm.org/D83824	2020-08-04 14:50:57 -04:00
Florian Hahn	98db27711d	[LV] Do not check widening decision for instrs outside of loop. No widening decisions will be computed for instructions outside the loop. Do not try to get a widening decision. The load/store will be just a scalar load, so treating it as normal should be fine I think. Fixes PR46950. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85087	2020-08-03 10:09:24 +01:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
David Green	1da0c47fa2	[LoopVectorizer] Don't create unused block masks for reductions. NFC This removes some unneeded block masks when we don't have any reductions. It should not have any effect on codegen as the values created are dead anyway. Differential Revision: https://reviews.llvm.org/D81415	2020-07-30 14:28:08 +01:00
Simon Pilgrim	cc529285fd	VectorUtils.h - reduce unnecessary includes. NFC. Replace TargetLibraryInfo.h include with forward declaration and fix implicit dependencies. Reduce SmallSet.h include to SmallVector.h include.	2020-07-30 12:27:49 +01:00
David Sherwood	9ad7c980bb	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
David Green	60280e9818	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this patch adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
Kazu Hirata	902cbcd59e	Use llvm::is_contained where appropriate (NFC) Summary: This patch replaces std::find with llvm::is_contained where appropriate. Reviewers: efriedma, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr Tags: #llvm Differential Revision: https://reviews.llvm.org/D84489	2020-07-27 10:20:44 -07:00
Hiroshi Yamauchi	7bedae7dee	[PGO][PGSO] Add profile guided size optimization to loop vectorization legality.	2020-07-21 11:16:36 -07:00
Arthur Eubanks	0dfa4a83fa	Revert "[PGO][PGSO] Add profile guided size optimization to loop vectorization legality." This reverts commit `30c382a7c6`. See https://crbug.com/1106813.	2020-07-17 16:47:41 -07:00
Stanislav Mekhanoshin	efb5040262	Fixed warning about signed/unsigned comparison I received a report that clang11 issues a signed/unsigned mismatch warning here. For some reason only clang11 seems to issue this warning. Differential Revision: https://reviews.llvm.org/D83916	2020-07-17 11:03:42 -07:00
Anna Welker	23c9534515	[LV] Enable the LoopVectorizer to create pointer inductions This patch enables the LoopVectorizer to build a phi of pointer type and provide the vector loads and stores with vector type getelementptrs built from the pointer induction variable, which produces far fewer instructions than the previous approach of creating scalar getelementptrs and gluing them together into a vector. Differential Revision: https://reviews.llvm.org/D81267	2020-07-17 13:35:07 +01:00
Hiroshi Yamauchi	30c382a7c6	[PGO][PGSO] Add profile guided size optimization to loop vectorization legality. Differential Revision: https://reviews.llvm.org/D83329	2020-07-15 11:49:36 -07:00
Sanne Wouda	13fec93a77	[NFC] rename to reflect F is not necessarily an Intrinsic	2020-07-13 15:28:46 +01:00
Sanne Wouda	7b84045565	[SLPVectorizer] handle vectorizable library functions Teaches the SLPVectorizer to use vectorized library functions for non-intrinsic calls. This already worked for intrinsics that have vectorized library functions, thanks to D75878, but schedules with library functions with a vector variant were being rejected early. - assume that there are no load/store dependencies between lib functions with a vector variant; this would otherwise prevent the bundle from becoming "ready" - check during legalization that the vector variant can be used - fix-up where we previously assumed that a call would be an intrinsic Differential Revision: https://reviews.llvm.org/D82550	2020-07-13 15:28:46 +01:00
Ayal Zaks	82a5157ff1	[LV] Fixing versioning-for-unit-stride of loops with small trip count This patch fixes D81345 and PR46652. If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis still generates runtime checks for unit strides that will version the loop. In such cases, the loop vectorizer should either re-run the analysis or bail out from vectorizing the loop, as done prior to D81345. The latter is applied for now as the former requires refactoring. Differential Revision: https://reviews.llvm.org/D83470	2020-07-12 19:51:47 +03:00
Florian Hahn	264ab1e2c8	[LV] Pick vector loop body as insert point for SCEV expansion. Currently the DomTree is not kept up to date for additional blocks generated in the vector loop, for example when vectorizing with predication. SCEVExpander relies on dominance checks when looking for existing instructions to re-use and in some cases that can lead to the expander picking instructions that do not actually dominate their insert point (e.g. as in PR46525). Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG is only patched up after generating code for a block. For now, we can just use the vector loop header, as this ensures the inserted instructions dominate all uses in the vector loop. There should be no noticeable impact on the generated code, as other passes should sink those instructions, if profitable. Fixes PR46525. Reviewers: Ayal, gilr, mkazantsev, dmgreen Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D83288	2020-07-10 10:37:12 +01:00
Benjamin Kramer	b44470547e	Make helpers static. NFC.	2020-07-09 13:48:56 +02:00
Nicolai Hähnle	3fa989d4fd	DomTree: remove explicit use of DomTreeNodeBase::iterator Summary: Almost all uses of these iterators, including implicit ones, really only need the const variant (as it should be). The only exception is in NewGVN, which changes the order of dominator tree child nodes. Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4 Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits Tags: #clang, #mlir, #llvm Differential Revision: https://reviews.llvm.org/D83087	2020-07-08 18:18:49 +02:00
Stanislav Mekhanoshin	64030099c3	SLP: honor requested max vector size merging PHIs At the moment this place does not check the maximum size set by TTI and just creates the maximum possible vectors. Differential Revision: https://reviews.llvm.org/D82227	2020-07-08 08:06:15 -07:00
Florian Hahn	04b85e2bcb	Revert "[SLP] Make sure instructions are ordered when computing spill cost." This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371 This reverts commit `eb46137daa`.	2020-07-07 23:15:01 +01:00
Ayal Zaks	7bf299c8d8	[LV] Vectorize without versioning-for-unit-stride under -Os/-Oz If a loop is in a function marked OptSize, Loop Access Analysis should refrain from generating runtime checks for unit strides that will version the loop. If a loop is in a function marked OptSize and its vectorization is enabled, it should be vectorized w/o any versioning. Fixes PR46228. Differential Revision: https://reviews.llvm.org/D81345	2020-07-07 15:04:21 +03:00
Jordan Rupprecht	10c82eecbc	Revert "[LV] Enable the LoopVectorizer to create pointer inductions" This reverts commit `a8fe12065e`. It causes a crash when building gzip. Will post the detailed reduced test case to D81267.	2020-07-06 17:50:38 -07:00
Florian Hahn	cff5739157	[LV] Pass dbgs() to verifyFunction call. This is done in other places of the pass already and improves the output on verification failure.	2020-07-06 15:09:20 +01:00
Florian Hahn	eb46137daa	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact, e.g. on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-07-03 17:30:17 +01:00
Anna Welker	a8fe12065e	[LV] Enable the LoopVectorizer to create pointer inductions This patch enables the LoopVectorizer to build a phi of pointer type and provide the vector loads and stores with vector type getelementptrs built from the pointer induction variable, which produces far fewer instructions than the previous approach of creating scalar getelementptrs and gluing them together into a vector. Differential Revision: https://reviews.llvm.org/D81267	2020-07-02 11:39:28 +01:00
Sanjay Patel	b6315aee5b	[VectorCombine] try to form vector compare and binop to eliminate scalar ops binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1) --> vcmp = cmp Pred X, VecC ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0 This is a larger pattern than the existing extractelement folds because we can't reasonably vectorize the sub-patterns with constants based on cost model calcs (it doesn't usually make sense to replace a single extracted scalar op with constant operand with a vector op). I salvaged as much of the existing logic as I could, but there might be better ways to share and reduce code. The motivating case from PR43745: https://bugs.llvm.org/show_bug.cgi?id=43745 ...is the special case of a 2-way reduction. We tried to get SLP to handle that particular pattern in D59710, but that caused crashing and regressions. This patch is more general, but hopefully safer. The v2f64 test with SSE2 surprised me - the cost model accounting looks like this: OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4 NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5 Differential Revision: https://reviews.llvm.org/D82474	2020-06-29 10:38:52 -04:00
Sanjay Patel	3b95d8346d	[VectorCombine] refactor - make helper function for extract to shuffle logic; NFC Preliminary for D82474	2020-06-29 09:55:34 -04:00
Florian Hahn	c0cdba727a	[VPlan] Add & use VPValue for VPWidenGEPRecipe operands (NFC). This patch adds VPValue version of the GEP's operands to VPWidenGEPRecipe and uses them during code-generation. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80220	2020-06-26 20:59:17 +01:00
Guillaume Chatelet	1507fc1506	[Alignment][NFC] Migrate TTI::isLegalToVectorize{Load,Store}Chain to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82653	2020-06-26 14:14:27 +00:00
Guillaume Chatelet	b66e33a689	[Alignment][NFC] Migrate TTI::getGatherScatterOpCost to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82577	2020-06-26 11:08:27 +00:00
Guillaume Chatelet	fdc7c7fb87	[Alignment][NFC] Migrate TTI::getInterleavedMemoryOpCost to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82573	2020-06-26 11:00:53 +00:00
Guillaume Chatelet	7e1f79c3de	[Alignment][NFC] Migrate TTI::getMaskedMemoryOpCost to Align This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Differential Revision: https://reviews.llvm.org/D82569	2020-06-26 10:14:16 +00:00
Simon Pilgrim	1b10c618e9	LoopVectorize.h - reduce AliasAnalysis.h include to forward declaration. NFC. Replace legacy AliasAnalysis typedef with AAResults where necessary.	2020-06-26 10:49:00 +01:00
dfukalov	7ddee0922f	[NFCI][CostModel] Add const to Value*. Summary: Get back `const` partially lost in one of recent changes. Additionally specify explicit qualifiers in few places. Reviewers: samparker Reviewed By: samparker Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82383	2020-06-24 23:16:08 +03:00
Florian Hahn	35bb9bfbb0	[SLP] Limit GEP lists based on width of index computation. D68667 introduced a tighter limit to the number of GEPs to simplify together. The limit was based on the vector element size of the pointer, but the pointers themselves are not actually put in vectors. IIUC we try to vectorize the index computations here, so we should base the limit on the vector element size of the computation of the index. This restores the test regression on AArch64 and also restores the vectorization for an important pattern in SPEC2006/464.h264ref on AArch64 (@test_i16_extend). We get a large benefit from doing a single load up front and then processing the index computations in vectors. Note that we could probably improve the AArch64 codegen even further if we did zexts to i32 instead of i64 for the sub operands and then did a single vector sext on the result of the subtractions. AArch64 provides dedicated vector instructions to do so. Sketch of proof in Alive: https://alive2.llvm.org/ce/z/A4xYAB Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel Reviewed By: ABataev, spatel Differential Revision: https://reviews.llvm.org/D82418	2020-06-24 19:56:53 +01:00
Sanjay Patel	a0f967418f	[VectorCombine] give invalid index value a name; NFC	2020-06-24 11:10:36 -04:00
Sanjay Patel	54143e2bd5	[VectorCombine] do not use magic number for undef mask element; NFC	2020-06-22 20:47:09 -04:00
Sanjay Patel	9934cc544c	[VectorCombine] make helper function for shift-shuffle; NFC This will probably be useful for other extract patterns.	2020-06-22 12:23:52 -04:00
Sanjay Patel	98c2f4eea5	[VectorCombine] add helper to replace uses and rename The tests are regenerated to show a path that missed renaming, but there should be no functional difference from this patch.	2020-06-22 09:58:49 -04:00
Sanjay Patel	de65b356dc	[VectorCombine] add/use pass-level IRBuilder This saves creating/destroying a builder every time we perform some transform. The tests show instruction ordering diffs resulting from always inserting at the root instruction now, but those should be benign.	2020-06-22 09:01:29 -04:00
Sanjay Patel	cce625f73d	[VectorCombine] improve IR debugging by providing/salvaging value names The tests are regenerated to show the diffs, but there should be no functional change from this patch.	2020-06-22 08:35:47 -04:00
Sanjay Patel	6bdd531af5	[VectorCombine] create class for pass to hold analyses, etc; NFC This doesn't change anything currently, but it would make sense to create a class-level IRBuilder instead of recreating that everywhere. As we expand to more optimizations, we will probably also want to hold things like the DataLayout or other constant refs in here too.	2020-06-21 16:07:33 -04:00
Sanjay Patel	741e20f3d6	[VectorCombine] fix assert for type of compare operand As shown in the post-commit comment for D81661 - we need to loosen the type assertion to allow scalarization of a compare for vectors of pointers.	2020-06-20 15:20:17 -04:00
Sanjay Patel	216a37bb46	[VectorCombine] refactor extract-extract logic; NFCI	2020-06-19 14:52:27 -04:00
Sanjay Patel	6d864097a2	[VectorCombine] fix crash while transforming constants This is a variation of the proposal in D82049 with an extra test.	2020-06-19 12:30:32 -04:00
Sanjay Patel	46a285ad9e	[IRBuilder] add/use wrapper to create a generic compare based on predicate type; NFC The predicate can always be used to distinguish between icmp and fcmp, so we don't need to keep repeating this check in the callers.	2020-06-18 15:47:06 -04:00
Simon Pilgrim	a5f1f9c9b8	ScalarEvolution.h - reduce LoopInfo.h include to forward declarations. NFC. Move ScalarEvolution::forgetLoopDispositions implementation to ScalarEvolution.cpp to remove the dependency. Add implicit header dependency to source files where necessary.	2020-06-17 15:48:23 +01:00
Sjoerd Meijer	c1034d044a	Follow up of rGe345d547a0d5, and attempt to pacify buildbot: "error: 'get' is deprecated: The base class version of get with the scalable argument defaulted to false is deprecated." Changed VectorType::get() -> FixedVectorType::get().	2020-06-17 13:24:09 +01:00
Sjoerd Meijer	e345d547a0	Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops" Fixed ARM regression test. Please see the original commit message rG47650451738c for details.	2020-06-17 13:12:15 +01:00
Sjoerd Meijer	d4e183f686	Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops" This reverts commit `4765045173` while I investigate the build bot failures.	2020-06-17 10:09:54 +01:00
Sjoerd Meijer	4765045173	[LV] Emit @llvm.get.active.mask for tail-folded loops This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised loops if the intrinsic is supported by the backend, which is checked by querying TargetTransform hook emitGetActiveLaneMask. This intrinsic creates a mask representing active and inactive vector lanes, which is used by the masked load/store instructions that are created for tail-folded loops. The semantics of @llvm.get.active.mask are described here in LangRef: https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics This intrinsic is also used to provide a hint to the backend. That is, the second argument of the intrinsic represents the back-edge taken count of the loop. For MVE, for example, we use that to set up tail-predication, which is a new form of predication in MVE for vector loops that implicitly predicates the last vector loop iteration by implicitly setting active/inactive lanes, i.e. the tail loop is predicated. In order to set up a tail-predicated vector loop, we need to know the number of data elements processed by the vector loop, which corresponds to the trip count of the scalar loop, which we can now reconstruct using @llvm.get.active.mask. Differential Revision: https://reviews.llvm.org/D79100	2020-06-17 09:53:58 +01:00
Christopher Tetreault	ff628f5f5e	[SVE] Eliminate calls to default-false VectorType::get() from Vectorize Reviewers: efriedma, fhahn, spatel, sdesmalen, kmclaughlin Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D81521	2020-06-16 12:50:13 -07:00
Sanjay Patel	ed67f5e7ab	[VectorCombine] scalarize compares with insertelement operand(s) Generalize scalarization (recently enhanced with D80885) to allow compares as well as binops. Similar to binops, we are avoiding scalarization of a loaded value because that could avoid a register transfer in codegen. This requires 1 extra predicate that I am aware of: we do not want to scalarize the condition value of a vector select. That might also invert a transform that we do in instcombine that prefers a vector condition operand for a vector select. I think this is the final step in solving PR37463: https://bugs.llvm.org/show_bug.cgi?id=37463 Differential Revision: https://reviews.llvm.org/D81661	2020-06-16 13:48:10 -04:00
Sam Parker	2596da3174	[CostModel] getCFInstrCost in getUserCost. Have BasicTTI call the base implementation so that both agree on the default behaviour, with the default being a cost of '1'. This has required an X86 specific implementation as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that their implementations distinguish between cost kinds, so that the unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test. The cost model test changes now reflect that ret instructions are not generally free. Differential Revision: https://reviews.llvm.org/D79164	2020-06-15 09:28:46 +01:00
Roman Lebedev	7aeb41b3c8	[NFCI] VectorCombine: add statistic for bitcast(shuf()) -> shuf(bitcast()) xform	2020-06-12 23:10:53 +03:00
Florian Hahn	3a846d4d92	[VPlan] Reject loops without computable backedge taken counts getOrCreateTripCount is used to generate code for the outer loop, but it requires a computable backedge taken count. Check that in the VPlan native path. Reviewers: Ayal, gilr, rengolin, sguggill Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D81088	2020-06-12 10:31:18 +01:00
Sanjay Patel	039ff29ef6	[VectorCombine] remove unused parameters; NFC	2020-06-11 19:15:03 -04:00
Simon Pilgrim	5dc4e7c2b9	[VectorCombine] scalarizeBinop - support an all-constant src vector operand scalarizeBinop currently folds vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index)) -> inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index) This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense: vec_bo((inselt VecC0, V0, Index), VecC1) -> inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index) Fixes PR42174 Differential Revision: https://reviews.llvm.org/D80885	2020-06-09 19:02:05 +01:00
Benjamin Kramer	3badd17b69	SmallPtrSet::find -> SmallPtrSet::count The latter is more readable and more efficient. While there clean up some double lookups. NFCI.	2020-06-07 22:38:08 +02:00
Simon Pilgrim	5006e551d3	LoopAnalysisManager.h - reduce includes to forward declarations. NFC. Move implicit include dependencies down to header/source files.	2020-06-06 14:06:46 +01:00
Florian Hahn	211596c94e	[VPlan] Support extracting lanes for defs managed in VPTransformState. Currently extracting a lane for a VPValue def is not supported, if it is managed directly by VPTransformState (e.g. because it is created by a VPInstruction or an external VPValue def). For now, simply extract the requested lane. In the future, we should also cache the extracted scalar values, similar to LV. Reviewers: Ayal, rengolin, gilr, SjoerdMeijer Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D80787	2020-06-03 12:14:16 +01:00
Florian Hahn	b446ec56a2	[LV] Make sure the MaxVF is a power-of-2 by rounding down. LV currently only supports power of 2 vectorization factors, which has been made explicit with the assertion added in `840450549c`. However, if the widest type is not a power-of-2 the computed MaxVF won't be a power-of-2 either. This patch updates computeFeasibleMaxVF to ensure the returned value is a power-of-2 by rounding down to the nearest power-of-2. Fixes PR46139. Reviewers: Ayal, gilr, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D80870	2020-06-02 10:40:49 +01:00
Valery N Dmitriev	a45688a72c	[SLP] Apply external to vectorizable tree users cost adjustment for relevant aggregate build instructions only (UserCost). Users are detected with the findBuildAggregate routine, and the trick is that the subsequent SLP vectorization may end up vectorizing the entire list in smaller chunks. The cost adjustment is then applied to individual chunks, and these adjustments obviously have to be smaller than the entire aggregate build cost. Differential Revision: https://reviews.llvm.org/D80773	2020-05-29 15:37:41 -07:00
Christopher Tetreault	d2befc6633	[SVE] Eliminate calls to default-false VectorType::get() from Vectorize Reviewers: efriedma, c-rhodes, david-arm, fhahn Reviewed By: david-arm Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80339	2020-05-29 11:31:24 -07:00
Florian Hahn	9b507b2127	[LAA] We only need pointer checks if there are non-zero checks (NFC). If it turns out that we can do runtime checks, but there are no runtime-checks to generate, set RtCheck.Need to false. This can happen if we can prove statically that the pointers passed in to canCheckPtrAtRT do not alias. This should not change any results, but allows us to skip some work and assert that runtime checks are generated, if LAA indicates that runtime checks are required. Reviewers: anemet, Ayal Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D79969 Note: This is a recommit of `259abfc7cb`, with some suggested renaming.	2020-05-27 12:47:36 +01:00
Florian Hahn	2d0389821e	Revert "[LAA] We only need pointer checks if there are non-zero checks (NFC)." This reverts commit `259abfc7cb`. Reverting this, as I missed a case where we return without setting RtCheck.Need.	2020-05-27 12:39:45 +01:00
Florian Hahn	259abfc7cb	[LAA] We only need pointer checks if there are non-zero checks (NFC). If it turns out that we can do runtime checks, but there are no runtime-checks to generate, set RtCheck.Need to false. This can happen if we can prove statically that the pointers passed in to canCheckPtrAtRT do not alias. This should not change any results, but allows us to skip some work and assert that runtime checks are generated, if LAA indicates that runtime checks are required. Reviewers: anemet, Ayal Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D79969	2020-05-27 12:37:20 +01:00
Simon Pilgrim	35963f6d85	VPlanValue.h - reduce unnecessary includes to forward declarations. NFC.	2020-05-27 11:26:14 +01:00
Ayal Zaks	840450549c	[LV] Clamp MaxVF to power of 2. If a loop has a constant trip count known to be a multiple of MaxVF (times user UF), LV infers that no tail will be generated for any chosen VF. This relies on the chosen VFs being powers of 2 bound by MaxVF, and assumes MaxVF is a power of 2. Make sure the latter holds, in particular when MaxVF is set by a memory dependence distance which may not be a power of 2. Differential Revision: https://reviews.llvm.org/D80491	2020-05-25 11:24:33 +03:00
Florian Hahn	0deab8a54f	[LV] Either get invariant condition OR vector condition. Currently we unconditionally get the first lane of the condition operand, even if we later use the full vector condition. This can result in some unnecessary instructions being generated. Suggested as follow-up in D80219.	2020-05-24 17:16:42 +01:00
Sanjay Patel	7eed772a27	[PatternMatch] abbreviate vector inst matchers; NFC Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.	2020-05-24 09:19:47 -04:00
Florian Hahn	15224408f0	[VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC). VPWidenSelectRecipe already contains a VPUser, but it is not used. This patch updates the code related to VPWidenSelectRecipe to use VPUser for its operands. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80219	2020-05-24 13:58:08 +01:00
Sanjay Patel	024098ae53	[VectorCombine] set preserve alias analysis As noted in D80236, moving the pass in the pipeline exposed this shortcoming. Extra work to recalculate the alias results showed up as a compile-time slowdown.	2020-05-22 16:25:16 -04:00
Anh Tuyen Tran	13bf6039c9	Title: [LV] Handle Fold-Tail of loops with vectorization factor equal to 1 Summary: When handling loops whose VF is 1, fold-tail vectorization sets the backedge taken count of the original loop with a vector of a single element. This causes a type mismatch during instruction generation. The purpose of this patch is to address the case of VF==1. Reviewer: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn), gilr (Gil Rapaport), rengolin (Renato Golin) Reviewed By: Ayal (Ayal Zaks), bmahjour (Bardia Mahjour), fhahn (Florian Hahn) Subscribers: Ayal (Ayal Zaks), rkruppe (Hanna Kruppe), bmahjour (Bardia Mahjour), rogfer01 (Roger Ferrer Ibanez), vkmr (Vineet Kumar), bollu (Siddharth Bhat), hiraditya (Aditya Kumar), llvm-commits (Mailing List llvm-commits) Tag: LLVM Differential Revision: https://reviews.llvm.org/D79976	2020-05-22 13:30:56 +00:00
Sanjay Patel	21f7cf4057	[SLP] fix verification check for valid IR This is a fix for PR45965 - https://bugs.llvm.org/show_bug.cgi?id=45965 - which was left out of D80106 because of a test failure. SLP does its own mini-CSE after potentially creating redundant instructions, so we need to wait for that to complete before running the verifier. Otherwise, we will see a test failure for test/Transforms/SLPVectorizer/X86/crash_vectorizeTree.ll (not changed here) because a phi temporarily has identical but different incoming values for the same incoming block. A related, but independent, test that would have been altered here was fixed with: rG880df55 The test was escaping verification in SLP without this change because we were not running verifyFunction() unless SLP actually changed the IR. Differential Revision: https://reviews.llvm.org/D80401	2020-05-22 09:15:27 -04:00
Dinar Temirbulatov	df3b95bc0a	[SLP][NFC] PR45269 getVectorElementSize() is slow The algorithm inside getVectorElementSize() has almost O(x^2) complexity, and when, for example, we compile MultiSource/Applications/ClamAV/shared_sha256.c, which has 1k instructions inside the sha256_transform() function, that resulted in almost ~800k iterations. The following change uses a map to bring the algorithm down to linear complexity. Differential Revision: https://reviews.llvm.org/D80241	2020-05-21 17:26:50 +02:00
Sam Parker	8cc911fa5b	[NFCI][CostModel] Refactor getIntrinsicInstrCost Combine the two API calls into one by introducing a structure to hold the relevant data. This has the added benefit of moving the boilerplate code for arguments and flags into the constructors. This is intended to be a non-functional change, but the complicated web of logic involved here makes it very hard to guarantee. Differential Revision: https://reviews.llvm.org/D79941	2020-05-20 11:59:08 +01:00
Florian Hahn	bcbd26bfe6	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. This patch was originally committed as `b8a3c34eee`, but broke the modules build, as LoopAccessAnalysis was using the Expander. The code-gen part of LAA was moved to lib/Transforms recently, so this patch can be landed again. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-05-20 10:53:40 +01:00
Florian Hahn	7cefd1b4cd	[LV] Remove duplicated return stmt (NFC).	2020-05-19 17:20:50 +01:00
Florian Hahn	cff9399f6b	[VPlan] Fix comment for User in VPWidenSelectRecipe (NFC). The comment was referring to the arguments of the call, but the recipe widens a select.	2020-05-19 15:31:39 +01:00
Florian Hahn	f828d75b46	[VPlan] Add & use VPValue operands for VPReplicateRecipe (NFC). This patch adds VPValue version of the instruction operands to VPReplicateRecipe and uses them during code-generation. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80114	2020-05-19 15:12:17 +01:00
Florian Hahn	66ad107452	[VPlan] Remove unique_ptr from VPBranchOnRecipeMask (NFC). We can remove a dynamic memory allocation, by checking the number of operands: no operands = all true, 1 operand = mask. Reviewers: Ayal, gilr, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D80110	2020-05-19 15:01:37 +01:00
Eli Friedman	27b4e6931d	[NFC] Replace MaybeAlign with Align in TargetTransformInfo.	2020-05-18 19:25:49 -07:00
Ayal Zaks	682e739638	[LV] Fix FoldTail under user VF and UF LV considers an internally computed MaxVF to decide if a constant trip-count is a multiple of any subsequently chosen VF, and concludes that no scalar remainder iterations (tail) will be left for Fold Tail to handle. If an external VF is provided via -force-vector-width, it must be considered instead of the internal MaxVF. If an external UF is provided via -force-vector-interleave, it too must be considered in addition to MaxVF or user VF. Fixes PR45679. Differential Revision: https://reviews.llvm.org/D80085	2020-05-19 01:32:25 +03:00
Craig Topper	c9f63297e2	Fix several places that were calling verifyFunction or verifyModule without checking the return value. verifyFunction/verifyModule don't assert or error internally. They also don't print anything if you don't pass a raw_ostream to them. So the caller needs to check the result and ideally pass a stream to get the messages. Otherwise they're just really expensive no-ops. I've filed PR45965 for another instance in SLPVectorizer that causes a lit test failure. Differential Revision: https://reviews.llvm.org/D80106	2020-05-18 13:28:46 -07:00
Volkan Keles	63081dc6f6	LoadStoreVectorizer: Match nested adds to prove vectorization is safe If both OpA and OpB are adds with NSW/NUW and with the same LHS operand, we can guarantee that the transformation is safe if we can prove that OpA won't overflow when IdxDiff is added to the RHS of OpA. Review: https://reviews.llvm.org/D79817	2020-05-18 12:13:01 -07:00
Nikita Popov	52e98f620c	[Alignment] Remove unnecessary getValueOrABITypeAlignment calls (NFC) Now that load/store alignment is required, we no longer need most of them. Also switch the getLoadStoreAlignment() helper to return Align instead of MaybeAlign.	2020-05-17 22:19:15 +02:00
Sanjay Patel	81e9ede3a2	[VectorCombine] forward walk through instructions to improve chaining of transforms This is split off from D79799 - where I was proposing to fully iterate over a function until there are no more transforms. I suspect we are still going to want to do something like that eventually. But we can achieve the same gains much more efficiently on the current set of regression tests just by reversing the order that we visit the instructions. This may also reduce the motivation for D79078, but we are still not getting the optimal pattern for a reduction.	2020-05-16 13:08:01 -04:00
Florian Hahn	4c8285c750	[VPlan] Move emission of \\l\"+\n to dumpBasicBlock (NFC). The patch standardizes printing of VPRecipes a bit, by hoisting out the common emission of \\l\"+\n. It simplifies the code and is also a first step towards untangling printing from DOT format output, with the goal of making the DOT output optional and to provide a more concise debug output if DOT output is disabled. Reviewers: gilr, Ayal, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D78883	2020-05-14 13:07:59 +01:00
Alina Sbirlea	bd541b217f	[NewPassManager] Add assertions when getting statefull cached analysis. Summary: Analyses that are stateful should not be retrieved through a proxy from an outer IR unit, as these analyses are only invalidated at the end of the inner IR unit manager. This patch disallows getting the outer manager and provides an API to get a cached analysis through the proxy. If the analysis is not stateless, the call to getCachedResult will assert. Reviewers: chandlerc Subscribers: mehdi_amini, eraman, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72893	2020-05-13 12:38:38 -07:00
Sjoerd Meijer	9529597cf4	Recommit #2 : "[LV] Induction Variable does not remain scalar under tail-folding." This was reverted because of a miscompilation. At closer inspection, the problem was actually visible in a changed llvm regression test too. This one-line follow up fix/recommit will splat the IV, which is what we are trying to avoid if unnecessary in general, if tail-folding is requested even if all users are scalar instructions after vectorisation. Because with tail-folding, the splat IV will be used by the predicate of the masked loads/stores instructions. The previous version omitted this, which caused the miscompilation. The original commit message was: If tail-folding of the scalar remainder loop is applied, the primary induction variable is splat to a vector and used by the masked load/store vector instructions, thus the IV does not remain scalar. Because we now mark that the IV does not remain scalar for these cases, we don't emit the vector IV if it is not used. Thus, the vectoriser produces less dead code. Thanks to Ayal Zaks for the direction how to fix this.	2020-05-13 13:50:09 +01:00
Sanjay Patel	5f730b645d	[VectorCombine] account for extra uses in scalarization cost Follow-up to D79452. Mimics the extra use cost formula for the inverse transform with extracts.	2020-05-11 15:20:57 -04:00
Florian Hahn	8528186b9b	[LAA] Move runtime-check generation to Transforms/Utils/loopUtils (NFC) Currently, LAA's use of ScalarEvolutionExpander blocks moving the expander from Analysis to Transforms. Conceptually the expander does not fit into Analysis (it is only used for code generation) and runtime-check generation also seems to be better suited as a transformation utility. Reviewers: Ayal, anemet Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D78460	2020-05-10 17:39:26 +01:00
Florian Hahn	96c63f544f	Recommit "[LAA] Remove one addRuntimeChecks function (NFC)." The failing assertion has been fixed and the problematic test case has been added. This reverts the revert commit `fc44617f28`.	2020-05-10 15:19:57 +01:00
Florian Hahn	fc44617f28	Revert "[LAA] Remove one addRuntimeChecks function (NFC)." This reverts commit `c28114c8ff`. This causes some bots to fail: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-android/builds/30596/steps/build%20android%2Faarch64/logs/stdio	2020-05-10 13:28:00 +01:00
Florian Hahn	c28114c8ff	[LAA] Remove one addRuntimeChecks function (NFC). In order to reduce the API surface area (preparation for D78460), remove an addRuntimeChecks() function and do the additional check in the single caller. Reviewers: Ayal, anemet Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D79679	2020-05-10 12:48:55 +01:00
Sanjay Patel	0d2a0b44c8	[VectorCombine] scalarize binop of inserted elements into vector constants As with the extractelement patterns that are currently in vector-combine, there are going to be several possible variations on this theme. This should be the clearest, simplest example. Scalarization is the right direction for target-independent canonicalization, and InstCombine has some of those folds already, but it doesn't do this. I proposed a similar transform in D50992. Here in vector-combine, we can check the cost model to be sure it's profitable, so there should be less risk. Differential Revision: https://reviews.llvm.org/D79452	2020-05-08 16:31:12 -04:00
Benjamin Kramer	f936457f80	Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding."" This reverts commit `ae45b4dbe7`. It causes miscompilations, test case on the mailing list.	2020-05-08 14:49:10 +02:00
Sanjay Patel	02051c7f3a	[SLP] add another bailout for load-combine patterns (2nd try) The original patch (rG86dfbc676ebe) exposed an existing bug: we could wrongly cast a constant expression to BinaryOperator because the pattern matching allows that. This adds a check for that case, and there's a reduced test case to verify no crashing. Original commit message: This builds on the or-reduction bailout that was added with D67841. We still do not have IR-level load combining, although that could be a target-specific enhancement for -vector-combiner. The heuristic is narrowly defined to catch the motivating case from PR39538: https://bugs.llvm.org/show_bug.cgi?id=39538 ...while preserving existing functionality. That is, there's an unmodified test of pure load/zext/store that is not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll. That's the reason for the logic difference to require the 'or' instructions. The chances that vectorization would actually help a memory-bound sequence like that seem small, but it looks nicer with: vpmovzxwd (%rsi), %xmm0 vmovdqu %xmm0, (%rdi) rather than: movzwl (%rsi), %eax movl %eax, (%rdi) ... In the motivating test, we avoid creating a vector mess that is unrecoverable in the backend, and SDAG forms the expected bswap instructions after load combining: movzbl (%rdi), %eax vmovd %eax, %xmm0 movzbl 1(%rdi), %eax vmovd %eax, %xmm1 movzbl 2(%rdi), %eax vpinsrb $4, 4(%rdi), %xmm0, %xmm0 vpinsrb $8, 8(%rdi), %xmm0, %xmm0 vpinsrb $12, 12(%rdi), %xmm0, %xmm0 vmovd %eax, %xmm2 movzbl 3(%rdi), %eax vpinsrb $1, 5(%rdi), %xmm1, %xmm1 vpinsrb $2, 9(%rdi), %xmm1, %xmm1 vpinsrb $3, 13(%rdi), %xmm1, %xmm1 vpslld $24, %xmm0, %xmm0 vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero vpslld $16, %xmm1, %xmm1 vpor %xmm0, %xmm1, %xmm0 vpinsrb $1, 6(%rdi), %xmm2, %xmm1 vmovd %eax, %xmm2 vpinsrb $2, 10(%rdi), %xmm1, %xmm1 vpinsrb $3, 14(%rdi), %xmm1, %xmm1 vpinsrb $1, 7(%rdi), %xmm2, %xmm2 vpinsrb $2, 11(%rdi), %xmm2, %xmm2 vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero vpinsrb $3, 15(%rdi), %xmm2, %xmm2 vpslld $8, %xmm1, %xmm1 vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero vpor %xmm2, %xmm1, %xmm1 vpor %xmm1, %xmm0, %xmm0 vmovdqu %xmm0, (%rsi) movl (%rdi), %eax movl 4(%rdi), %ecx movl 8(%rdi), %edx movbel %eax, (%rsi) movbel %ecx, 4(%rsi) movl 12(%rdi), %ecx movbel %edx, 8(%rsi) movbel %ecx, 12(%rsi) Differential Revision: https://reviews.llvm.org/D78997	2020-05-07 15:04:37 -04:00
Hans Wennborg	c54c6ee1a7	Revert "[SLP] add another bailout for load-combine patterns" It caused asserts building Chromium, see discussion on https://reviews.llvm.org/D78997 This reverts commit `86dfbc676e`.	2020-05-07 16:31:52 +02:00
Sjoerd Meijer	3bbc71d6c9	[LV] Fix typo in variable name. NFC.	2020-05-07 13:53:44 +01:00
Sjoerd Meijer	ae45b4dbe7	Recommit "[LV] Induction Variable does not remain scalar under tail-folding." With 3 llvm regr tests fixed/updated that I had missed.	2020-05-07 11:52:20 +01:00
Sjoerd Meijer	20d67ffeae	Revert "[LV] Induction Variable does not remain scalar under tail-folding." This reverts commit `617aa64c84`. while I investigate buildbot failures.	2020-05-07 09:29:56 +01:00
Sjoerd Meijer	617aa64c84	[LV] Induction Variable does not remain scalar under tail-folding. If tail-folding of the scalar remainder loop is applied, the primary induction variable is splat to a vector and used by the masked load/store vector instructions, thus the IV does not remain scalar. Because we now mark that the IV does not remain scalar for these cases, we don't emit the vector IV if it is not used. Thus, the vectoriser produces less dead code. Thanks to Ayal Zaks for the direction how to fix this. Differential Revision: https://reviews.llvm.org/D78911	2020-05-07 09:15:23 +01:00
Sanjay Patel	86dfbc676e	[SLP] add another bailout for load-combine patterns This builds on the or-reduction bailout that was added with D67841. We still do not have IR-level load combining, although that could be a target-specific enhancement for -vector-combiner. The heuristic is narrowly defined to catch the motivating case from PR39538: https://bugs.llvm.org/show_bug.cgi?id=39538 ...while preserving existing functionality. That is, there's an unmodified test of pure load/zext/store that is not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll. That's the reason for the logic difference to require the 'or' instructions. The chances that vectorization would actually help a memory-bound sequence like that seem small, but it looks nicer with: vpmovzxwd (%rsi), %xmm0 vmovdqu %xmm0, (%rdi) rather than: movzwl (%rsi), %eax movl %eax, (%rdi) ... In the motivating test, we avoid creating a vector mess that is unrecoverable in the backend, and SDAG forms the expected bswap instructions after load combining: movzbl (%rdi), %eax vmovd %eax, %xmm0 movzbl 1(%rdi), %eax vmovd %eax, %xmm1 movzbl 2(%rdi), %eax vpinsrb $4, 4(%rdi), %xmm0, %xmm0 vpinsrb $8, 8(%rdi), %xmm0, %xmm0 vpinsrb $12, 12(%rdi), %xmm0, %xmm0 vmovd %eax, %xmm2 movzbl 3(%rdi), %eax vpinsrb $1, 5(%rdi), %xmm1, %xmm1 vpinsrb $2, 9(%rdi), %xmm1, %xmm1 vpinsrb $3, 13(%rdi), %xmm1, %xmm1 vpslld $24, %xmm0, %xmm0 vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero vpslld $16, %xmm1, %xmm1 vpor %xmm0, %xmm1, %xmm0 vpinsrb $1, 6(%rdi), %xmm2, %xmm1 vmovd %eax, %xmm2 vpinsrb $2, 10(%rdi), %xmm1, %xmm1 vpinsrb $3, 14(%rdi), %xmm1, %xmm1 vpinsrb $1, 7(%rdi), %xmm2, %xmm2 vpinsrb $2, 11(%rdi), %xmm2, %xmm2 vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero vpinsrb $3, 15(%rdi), %xmm2, %xmm2 vpslld $8, %xmm1, %xmm1 vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero vpor %xmm2, %xmm1, %xmm1 vpor %xmm1, %xmm0, %xmm0 vmovdqu %xmm0, (%rsi) movl (%rdi), %eax movl 4(%rdi), %ecx movl 8(%rdi), %edx movbel %eax, (%rsi) movbel %ecx, 4(%rsi) movl 12(%rdi), %ecx movbel %edx, 8(%rsi) movbel %ecx, 12(%rsi) Differential Revision: https://reviews.llvm.org/D78997	2020-05-05 12:44:38 -04:00
Simon Pilgrim	4e3c005554	[TTI] getScalarizationOverhead - use explicit VectorType operand getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions). Followup to D78357 Reviewed By: @samparker Differential Revision: https://reviews.llvm.org/D79341	2020-05-05 16:59:23 +01:00
Sam Parker	40574fefe9	[NFC][CostModel] Add TargetCostKind to relevant APIs Make the kind of cost explicit throughout the cost model which, apart from making the cost clear, will allow the generic parts to calculate better costs. It will also allow some backends to approximate and correlate the different costs if they wish. Another benefit is that it will also help simplify the cost model around immediate and intrinsic costs, where we currently have multiple APIs. RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html Differential Revision: https://reviews.llvm.org/D79002	2020-05-05 10:35:54 +01:00
Florian Hahn	bbdfcf8f69	[VPlan] Remove unused & undefined print method (NFC).	2020-05-03 18:36:20 +01:00
Anh Tuyen Tran	c7878ad231	[VFDatabase] Scalar functions are vector functions with VF =1 Summary: Return scalar function when VF==1. The new trivial mapping scalar --> scalar when VF==1 prevents a false positive for the "isVectorizable" query. Author: masoud.ataei (Masoud Ataei) Reviewers: Whitney (Whitney Tsang), fhahn (Florian Hahn), pjeeva01 (Jeeva P.), fpetrogalli (Francesco Petrogalli), rengolin (Renato Golin) Reviewed By: fpetrogalli (Francesco Petrogalli) Subscribers: hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D78054	2020-04-29 17:20:37 +00:00
Simon Pilgrim	090cae8491	[TTI] Add DemandedElts to getScalarizationOverhead The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing more than expected. This is particularly noticeable on pre-SSE4 targets where the availability of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things: 1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern. 2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs. This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well. Reviewed By: @craig.topper Differential Revision: https://reviews.llvm.org/D78216	2020-04-29 12:00:38 +01:00
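What a DemandedElts mask buys can be sketched with a flat per-lane cost model. The function and its parameters are hypothetical and do not reproduce the TargetTransformInfo signature. InsertCost and ExtractCost stand in for the per-element insertelement/extractelement costs a real target would query.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: the overhead of building (Insert) and/or scalarizing
// (Extract) a vector is charged only for lanes set in DemandedElts.
unsigned scalarizationOverheadSketch(const std::vector<bool> &DemandedElts,
                                     bool Insert, bool Extract,
                                     unsigned InsertCost,
                                     unsigned ExtractCost) {
  assert((Insert || Extract) && "nothing to cost");
  unsigned Cost = 0;
  for (bool Demanded : DemandedElts) {
    if (!Demanded)
      continue; // undemanded lanes add no cost
    if (Insert)
      Cost += InsertCost;
    if (Extract)
      Cost += ExtractCost;
  }
  return Cost;
}
```

With this shape, a gather cost can be computed in one call over the demanded lanes instead of summing raw insertion costs per element, and a target override (such as the x86 one) can replace the per-lane sum with a cheaper whole-vector estimate.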
Florian Hahn	e89379856a	Recommit "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)." The crash that caused the original revert has been fixed in `a3c964a278`. I also added a reduced version of the crash reproducer. This reverts the revert commit `2107af9ccf`.	2020-04-29 11:40:39 +01:00
Sanjay Patel	21acc0612a	[SLP] refactor load-combine logic; NFC We may want to identify sequences that are not reductions, but still qualify as load-combines in the back-end, so make most of the body a helper function.	2020-04-27 16:02:37 -04:00
Ayal Zaks	a3c964a278	[LV] Fix recording of BranchTakenCount for FoldTail When folding tail, branch taken count is computed during initial VPlan execution and recorded to be used by the compare computing the loop's mask. This recording should directly set the State, instead of reusing Value2VPValue mapping which serves original Values present prior to vectorization. The branch taken count may be a constant Value, which may be used elsewhere in the loop; trying to employ Value2VPValue for both leads to the issue reported in https://reviews.llvm.org/D76992#inline-721028 Differential Revision: https://reviews.llvm.org/D78847	2020-04-26 20:13:10 +03:00
Max Kazantsev	9cd4debd5a	[LoopVectorize] Preserve CFG analyses if CFG wasn't modified One of the transforms the loop vectorizer makes is LCSSA formation. In some cases it is the only transform it makes. We should not drop CFG analyses if only LCSSA was formed and no actual CFG changes were made. We should think of expanding this logic to other passes as well, and maybe make it a part of the PM framework. Reviewed By: Florian Hahn Differential Revision: https://reviews.llvm.org/D78360	2020-04-24 17:22:24 +07:00
Mehdi Amini	2107af9ccf	Revert "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)." This reverts commit `9245c7ac13`. This is triggering a segfault in XLA downstream, we'll follow-up with a reproducer, it is likely influenced by TTI/TLI settings or other options as a simple `opt -loop-vectorize` invocation on the IR before the crash does not reproduce immediately.	2020-04-24 05:07:32 +00:00
Simon Pilgrim	b108a457e1	[VPlan] Remove unused forward declarations. NFC. Move VPlan.h include from VPlanVerifier.h down to VPlanVerifier.cpp	2020-04-23 12:34:20 +01:00
Florian Hahn	9245c7ac13	[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC). This patch adds VPValue version of the instruction operands to VPWidenRecipe and uses them during code-generation. Similar to D76373 this reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Reviewers: rengolin, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D76992	2020-04-23 12:16:46 +01:00
Florian Hahn	647c9e72e4	[VPlan] Make various tryTo* helpers private and mark as const (NFC). The individual tryTo* helpers do not need to be public. Also, the builder contained two consecutive public: sections, which is not necessary. Moved the remaining public methods after the constructor. Also make some of the tryTo* helpers const. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed by: gilr Differential Revision: https://reviews.llvm.org/D78288	2020-04-21 14:49:02 +01:00
Craig Topper	68b2e507e4	[Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign. Differential Revision: https://reviews.llvm.org/D78443	2020-04-20 21:31:44 -07:00
Craig Topper	fcc9d70260	Revert "[Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign." This is breaking the clang build. This reverts commit `897409fb56`.	2020-04-20 13:25:06 -07:00
Craig Topper	897409fb56	[Local] Update getOrEnforceKnownAlignment/getKnownAlignment to use Align/MaybeAlign. Differential Revision: https://reviews.llvm.org/D78443	2020-04-20 13:08:05 -07:00
Florian Hahn	fa284e136e	[VPlan] Clean up tryToCreate(Widen)Recipe. (NFC) This patch includes some clean-ups to tryToCreateRecipe, suggested in D77973. It includes: * Renaming tryToCreateRecipe to tryToCreateWidenRecipe. * Move VPBB insertion logic to caller of tryToCreateWidenRecipe. * Hoists instruction checks to tryToCreateWidenRecipe, making it clearer which instructions are handled by which recipe, simplifying the checks by using early exits. * Split up handling of induction PHIs and truncates using inductions. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D78287	2020-04-20 10:06:35 +01:00
Sam Parker	e3056ae9a0	[NFC][TTI] Explicit use of VectorType The API for shuffles and reductions uses generic Type parameters, instead of VectorType, and so assertions and casts are used a lot. This patch makes those types explicit, which means that the clients can't be lazy, but results in less ambiguity, and that can only be a good thing. Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562 Differential Revision: https://reviews.llvm.org/D78357	2020-04-20 09:16:52 +01:00
Sanjay Patel	bef6e67e95	[VectorCombine] transform bitcasted shuffle to wider elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' This is the widen shuffle elements enhancement to D76727. It builds on the analysis and simplifications in D77881 and rG6a7e958a423e. The phase ordering tests show that we can simplify inverse shuffles across a binop in both directions (widen/narrow or narrow/widen) now. There's another potential transform visible in some of the remaining TODOs - move a bitcasted operand of a shuffle after the shuffle. Differential Revision: https://reviews.llvm.org/D78371	2020-04-19 08:24:38 -04:00
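Rewriting bitcast (shuf V, MaskC) as shuf (bitcast V), MaskC' needs a mask over elements Scale times wider. The function below is a simplified, hypothetical sketch of that widening, not LLVM's helper. It is stricter than necessary: a group must be either entirely undef (negative) or Scale consecutive indices starting at a multiple of Scale. LLVM's own helper may accept more masks.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the widened mask, or std::nullopt if Mask cannot be expressed
// with elements Scale times wider (simplified rules, see above).
std::optional<std::vector<int>> widenMaskSketch(int Scale,
                                                const std::vector<int> &Mask) {
  assert(Scale > 0 && "scale must be positive");
  if (Mask.size() % Scale != 0)
    return std::nullopt;
  std::vector<int> Wide;
  for (std::size_t I = 0; I < Mask.size(); I += Scale) {
    int First = Mask[I];
    if (First < 0) {
      // An undef wide element requires the whole narrow group to be undef.
      for (int J = 1; J < Scale; ++J)
        if (Mask[I + J] >= 0)
          return std::nullopt;
      Wide.push_back(-1);
      continue;
    }
    // A defined group must start on a wide-element boundary...
    if (First % Scale != 0)
      return std::nullopt;
    // ...and select consecutive narrow elements.
    for (int J = 1; J < Scale; ++J)
      if (Mask[I + J] != First + J)
        return std::nullopt;
    Wide.push_back(First / Scale);
  }
  return Wide;
}
```

For example, the i16 mask {2, 3, -1, -1, 0, 1} widens by 2 to the i32 mask {1, -1, 0}, while {1, 2, 0, 1} cannot be widened because its first pair straddles two i32 elements.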
Benjamin Kramer	ff54d1c897	Remove remaining callers of CreateShuffleVector with unsigned indices and mark it as deprecated No functionality change intended.	2020-04-19 11:48:28 +02:00
Ayal Zaks	8e0c5f7200	[LV] Mark first-order recurrences as allowed exits First-order recurrences require special treatment when they are live-out; such treatment is provided by fixFirstOrderRecurrence(), so they should be included in AllowedExit set. (Should probably have been included originally in D16197.) Fixes PR45526: AllowedExit set is used by prepareToFoldTailByMasking() to check whether the treatment for live-outs also holds when folding the tail, which is not (yet) the case for first-order recurrences. Differential Revision: https://reviews.llvm.org/D78210	2020-04-18 23:54:21 +03:00
Florian Hahn	4ee45ab60f	[LV] Invalidate cost model decisions along with interleave groups. Cost-modeling decisions are tied to the computed interleave groups (widening decisions, scalar and uniform values). When invalidating the interleave groups, those decisions also need to be invalidated. Otherwise there is a mismatch during VPlan construction. VPWidenMemoryRecipes created initially are left around w/o converting them into VPInterleave recipes. Such a conversion indeed should not take place, and these gather/scatter recipes may in fact be right. The crux is leaving around obsolete CM_Interleave (and dependent) markings of instructions along with their costs, instead of recalculating decisions, costs, and recipes. As an alternative to forcing a complete recompute later on, we could try to selectively invalidate the decisions connected to the interleave groups. But we would likely need to run the uniform/scalar value detection parts again anyway and the extra complexity is probably not worth it. Fixes PR45572. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D78298	2020-04-18 10:23:49 +01:00
Benjamin Kramer	c5e7c2691d	Remove accidental include. Thank you clangd.	2020-04-17 16:36:30 +02:00
Benjamin Kramer	b639091c02	Change users of CreateShuffleVector to pass the masks as int instead of Constants No functionality change intended.	2020-04-17 16:34:29 +02:00
Benjamin Kramer	166467e822	[VectorUtils] Create shufflevector masks as int vectors instead of Constants No functionality change intended.	2020-04-17 15:28:00 +02:00
Simon Pilgrim	fa7f328a15	[cmake] LLVMVectorize - add include/llvm/Transforms/Vectorize header path MSVC projects were missing the llvm/Transforms/Vectorize/* headers	2020-04-17 11:06:26 +01:00
Florian Hahn	3f7f06888b	[VPlan] Branches are not widened by VPWidenRecipe, assert (NFC).	2020-04-15 12:03:45 +01:00
Benjamin Kramer	6f64daca8f	Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints No functionality change intended.	2020-04-15 12:51:38 +02:00
Florian Hahn	5b4b3e0b6e	[VPlan] Move widening check for non-memory/non-calls to function (NFC). After introducing VPWidenSelectRecipe, the duplicated logic can be shared. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D77973	2020-04-15 11:48:37 +01:00
Florian Hahn	79d185c792	[VPlan] Move Load/Store checks out of tryToWiden (NFC). Handling LoadInst and StoreInst in tryToWiden seems a bit counter-intuitive, as there is only an assertion for them and VPWidenRecipes are never created for them. I think it makes sense to move the assertion to handleReplication, where the non-widened loads and stores are handled. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D77972	2020-04-15 10:18:42 +01:00
Gil Rapaport	b747d72c19	[LV] Fix PR45525: Incorrect assert in blend recipe Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a mask is a valid case which is already supported by the code. Differential Revision: https://reviews.llvm.org/D78115	2020-04-15 10:39:07 +03:00
Teresa Johnson	33ffb62e23	Allow disabling of vectorization using internal options Summary: Currently, the internal options -vectorize-loops, -vectorize-slp, and -interleave-loops do not have much practical effect. This is because they are used to initialize the corresponding flags in the pass managers, and those flags are then unconditionally overwritten when compiling via clang or via LTO from the linkers. The only exception was -vectorize-loops via opt because of some special hackery there. While vectorization could still be disabled when compiling via clang, using -fno-[slp-]vectorize, this meant that there was no way to disable it when compiling in LTO mode via the linkers. This only affected ThinLTO, since for regular LTO vectorization is done during the compile step for scalability reasons. For ThinLTO it is invoked in the LTO backends. See also the discussion on PR45434. This patch makes it so the internal options can actually be used to disable these optimizations. Ultimately, the best long term solution is to mark the loops with metadata (similar to the approach used to fix -fno-unroll-loops in D77058), but this enables a shorter term workaround, and actually makes these internal options useful. I constant propagated the initial values of these internal flags into the pass manager flags (for some reason vectorize-loops and interleave-loops were initialized to true, while vectorize-slp was initialized to false). As mentioned above, they are overwritten unconditionally so this doesn't have any real impact, and these initial values aren't particularly meaningful. I then changed the passes to check the internal values and return without performing the associated optimization when false (I changed the default of -vectorize-slp to true so the options behave similarly). I was able to remove the hackery in opt used to get -vectorize-loops=false to work, as well as a special option there used to disable SLP vectorization. Finally, I changed thinlto-slp-vectorize-pm.c to: a) Only test SLP (moved the loop vectorization checking to a new test). b) Use code that is slp vectorized when it is enabled, and check that instead of whether the pass is enabled. c) Test the new behavior of -vectorize-slp. d) Test both pass managers. The loop vectorization (and associated interleaving) testing I moved to a new thinlto-loop-vectorize-pm.c test, with several changes: a) Changed the flags on the interleaving testing so that it will actually interleave, and check that. b) Test the new behavior of -vectorize-loops and -interleave-loops. c) Test both pass managers. Reviewers: fhahn, wmi Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77989	2020-04-14 18:09:10 -07:00
Christopher Tetreault	3297e9b7c3	Clean up usages of asserting vector getters in Type Summary: Remove usages of asserting vector getters in Type in preparation for the VectorType refactor. The existence of these functions complicates the refactor while adding little value. Reviewers: rriddle, sdesmalen, efriedma Reviewed By: efriedma Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77259	2020-04-13 12:29:43 -07:00
Gil Rapaport	41ed5d856c	[LV] Clean up vectorizeInterleaveGroup (NFCI) Pass from the calling recipe the interleave group itself instead of passing the group's insertion position and having the function query CM for its interleave group and verify that the given instruction is that group's insertion point. Differential Revision: https://reviews.llvm.org/D78002	2020-04-13 13:15:06 +03:00
Florian Hahn	18138e0252	[VPlan] Introduce VPWidenSelectRecipe (NFC). Widening a select depends on whether the condition is loop invariant or not. Rather than checking at codegen time, the information can be recorded at VPlan construction time. This was suggested as part of D76992, to reduce the reliance on accessing the original underlying IR values. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D77869	2020-04-13 08:35:28 +01:00
Florian Hahn	ae1e353a25	[VPlan] Turn classes with all public members into structs (NFC). struct should be used when all members are public: https://llvm.org/docs/CodingStandards.html#use-of-class-and-struct-keywords Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D77865	2020-04-12 11:03:39 +01:00
Sanjay Patel	1318ddbc14	[VectorUtils] rename scaleShuffleMask to narrowShuffleMaskElts; NFC As proposed in D77881, we'll have the related widening operation, so this name becomes too vague. While here, change the function signature to take an 'int' rather than 'size_t' for the scaling factor, add an assert for overflow of 32-bits, and improve the documentation comments.	2020-04-11 10:05:49 -04:00
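The narrowing operation named above can be illustrated with a standalone re-implementation written in its spirit. narrowMaskSketch is hypothetical and is not the LLVM source. Each wide index M expands to the Scale narrow indices Scale*M + 0 through Scale*M + Scale-1. Negative (undef) entries are copied Scale times unchanged. As in the commit, the scale is an int and 32-bit overflow is asserted against.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of narrowing a shuffle mask by an integer Scale.
std::vector<int> narrowMaskSketch(int Scale, const std::vector<int> &Mask) {
  assert(Scale > 0 && "scale must be positive");
  std::vector<int> Result;
  Result.reserve(Mask.size() * Scale);
  for (int M : Mask) {
    // The largest produced index, Scale*M + Scale-1, must fit in 32 bits.
    assert((M < 0 ||
            static_cast<int64_t>(Scale) * M + (Scale - 1) <= INT32_MAX) &&
           "narrowed mask index overflows 32 bits");
    for (int I = 0; I < Scale; ++I)
      Result.push_back(M >= 0 ? Scale * M + I : M); // undef stays undef
  }
  return Result;
}
```

For example, narrowing {1, -1, 0} by 2 gives {2, 3, -1, -1, 0, 1}, so a shuffle on two i32 lanes becomes the equivalent shuffle on four i16 lanes.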
Florian Hahn	719846c469	[VPlan] Drop redundant private: at beginning of class defs (NFC). Default visibility for classes is private, so the private: at the top of various class definitions is redundant. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D77810	2020-04-11 13:27:10 +01:00
Gil Rapaport	e2a1867880	[LV] Add VPValue operands to VPBlendRecipe (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations ability to modify def-use relations and still have ILV generate the vector code. This commit introduces VPValues for VPBlendRecipe to use as the values to blend. The recipe is generated with VPValues wrapping the phi's incoming values of the scalar phi. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential Revision: https://reviews.llvm.org/D77539	2020-04-09 18:48:33 +03:00
Ayal Zaks	1678489234	[LV] FoldTail w/o Primary Induction Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector induction for use in fold-tail-with-masking, if a primary induction is absent. The canonical scalar IV having start = 0 and step = VF*UF, created during code-gen to control the vector loop, is widened into a canonical vector IV having start = {<Part*VF, Part*VF+1, ..., Part*VF+VF-1> for 0 <= Part < UF} and step = <VF*UF, VF*UF, ..., VF*UF>. Differential Revision: https://reviews.llvm.org/D77635	2020-04-09 17:45:23 +03:00
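The lane values of the widened canonical IV, and the fold-tail mask derived from them, can be sketched as below. Both functions are hypothetical illustrations, not LLVM code. Base is the scalar canonical IV, which advances by VF*UF per vector iteration. The mask assumes lanes are active while their IV value is unsigned-less-or-equal to the backedge-taken count (trip count - 1).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Lane L of unroll part Part holds Base + Part*VF + L.
std::vector<uint64_t> canonicalIVPart(uint64_t Base, unsigned VF, unsigned UF,
                                      unsigned Part) {
  assert(Part < UF && "part out of range");
  std::vector<uint64_t> Lanes(VF);
  for (unsigned L = 0; L < VF; ++L)
    Lanes[L] = Base + uint64_t(Part) * VF + L;
  return Lanes;
}

// Fold-tail mask: a lane is active while its IV value is <= BTC.
std::vector<bool> foldTailMask(const std::vector<uint64_t> &IV, uint64_t BTC) {
  std::vector<bool> Mask(IV.size());
  for (std::size_t I = 0; I < IV.size(); ++I)
    Mask[I] = IV[I] <= BTC;
  return Mask;
}
```

With VF = 4 and UF = 2, the first vector iteration covers <0,1,2,3> and <4,5,6,7>, and the second starts at Base = 8. For a trip count of 10 (BTC = 9), part 0 of the second iteration is <8,9,10,11> with mask <1,1,0,0>.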
Florian Hahn	a7efe06af0	[LV] Assert no DbgInfoIntrinsic calls are passed to widening (NFC). When building a VPlan, BasicBlock::instructionsWithoutDebug() is used to iterate over the instructions in a block. This means that no recipes should be created for debug info intrinsics already and we can turn the early exit into an assertion. Reviewers: Ayal, gilr, rengolin, aprantl Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D77636	2020-04-09 11:37:32 +01:00
Florian Hahn	9997ee23ed	[VPlan] Add & use VPValue operands for VPWidenCallRecipe (NFC). This patch adds VPValue versions for the arguments of the call to VPWidenCallRecipe and uses them during code-generation. Similar to D76373 this reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D77655	2020-04-09 10:23:26 +01:00
Eli Friedman	3f13ee8a00	[NFC] Modernize misc. uses of Align/MaybeAlign APIs. Use the current getAlign() APIs where it makes sense, and use Align instead of MaybeAlign when we know the value is non-zero.	2020-04-06 17:53:04 -07:00
Eli Friedman	68b03aee1a	Remove SequentialType from the type heirarchy. Now that we have scalable vectors, there's a distinction that isn't getting captured in the original SequentialType: some vectors don't have a known element count, so counting the number of elements doesn't make sense. In some cases, there's a better way to express the commonality using other methods. If we're dealing with GEPs, there's GEP methods; if we're dealing with a ConstantDataSequential, we can query its element type directly. In the relatively few remaining cases, I just decided to write out the type checks. We're talking about relatively few places, and I think the abstraction doesn't really carry its weight. (See thread "[RFC] Refactor class hierarchy of VectorType in the IR" on llvmdev.) Differential Revision: https://reviews.llvm.org/D75661	2020-04-06 17:03:49 -07:00
Florian Hahn	7aba6a0333	[LV] Fix value that could be read uninitialized. This should fix http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-msan/builds/18569	2020-04-06 17:54:50 +01:00
Florian Hahn	90be3c24a7	[VPlan] Introduce new VPWidenCallRecipe (NFC). This patch moves calls to their own recipe, to simplify the transition to VPUser for operands of VPWidenRecipe, as discussed in D76992. Subsequently additional information can be added to the recipe rather than computing it during the execute step. Reviewers: rengolin, Ayal, gilr, hsaito Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D77467	2020-04-06 16:07:37 +01:00
Florian Hahn	a2b18c5a08	[LV] Simplify tryToWiden as recipes are not re-used (NFC). After `49d00824bb`, VPWidenRecipe only stores a single instruction. tryToWiden can simply return the widen recipe, like other helpers in VPRecipeBuilder.	2020-04-04 18:30:50 +01:00
Sanjay Patel	ce97ce3a5d	[VectorCombine] try to form a better extractelement Extracting to the same index that we are going to insert back into allows forming select ("blend") shuffles and enables further transforms. Admittedly, this is a quick-fix for a more general problem that I'm hoping to solve by adding transforms for patterns that start with an insertelement. But this might resolve some regressions known to be caused by the extract-extract transform (although I have not gotten more details on those yet). In the motivating case from PR34724: https://bugs.llvm.org/show_bug.cgi?id=34724 The combination of subsequent instcombine and codegen transforms gets us this improvement: vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3] vhaddps %xmm1, %xmm1, %xmm4 vmovshdup %xmm1, %xmm3 ## xmm3 = xmm1[1,1,3,3] vaddps %xmm0, %xmm2, %xmm0 vaddps %xmm1, %xmm3, %xmm1 vshufps $200, %xmm4, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm4[0,3] vinsertps $177, %xmm1, %xmm0, %xmm0 ## xmm0 = zero,xmm0[1,2],xmm1[2] --> vmovshdup %xmm0, %xmm2 ## xmm2 = xmm0[1,1,3,3] vhaddps %xmm1, %xmm1, %xmm1 vaddps %xmm0, %xmm2, %xmm0 vshufps $200, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,3] Differential Revision: https://reviews.llvm.org/D76623	2020-04-03 13:55:13 -04:00
Guillaume Chatelet	1a584a8d50	[Alignment][NFC] Remove unused private functions Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77297	2020-04-03 09:16:20 +00:00
Sanjay Patel	b6050ca181	[VectorCombine] transform bitcasted shuffle to narrower elements bitcast (shuf V, MaskC) --> shuf (bitcast V), MaskC' We do not attempt this in InstCombine because we do not want to change types and create new shuffle ops that are potentially not lowered as well as the original code. Here, we can check the cost model to see if it is worthwhile. I've aggressively enabled this transform even if the types are the same size and/or equal cost because moving the bitcast allows InstCombine to make further simplifications. In the motivating cases from PR35454: https://bugs.llvm.org/show_bug.cgi?id=35454 ...this is enough to let instcombine and the backend eliminate the redundant shuffles, but we probably want to extend VectorCombine to handle the inverse pattern (shuffle-of-bitcast) to get that simplification directly in IR. Differential Revision: https://reviews.llvm.org/D76727	2020-04-02 13:30:22 -04:00
Eli Friedman	1ee6ec2bf3	Remove "mask" operand from shufflevector. Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467	2020-03-31 13:08:59 -07:00
Vedant Kumar	dcc410b5cf	[LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259) In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken count is a SCEV add expression, its type is defined by the type of the last operand of the add expression. In the test case from PR45259, this last operand happens to be a pointer, which (according to llvm::Type) does not have a primitive size in bits. In this case, LoopVectorize fails to truncate the SCEV and crashes as a result. Using ScalarEvolution::getTypeSizeInBits makes the truncation work as expected. https://bugs.llvm.org/show_bug.cgi?id=45259 Differential Revision: https://reviews.llvm.org/D76669	2020-03-30 10:14:14 -07:00
Sanjay Patel	fc3cc8a4b0	[VectorCombine] skip debug intrinsics first for efficiency	2020-03-29 13:58:04 -04:00
Florian Hahn	49d00824bb	[VPlan] Use one VPWidenRecipe per original IR instruction. (NFC). This patch changes VPWidenRecipe to only store a single original IR instruction. This is the first required step towards modeling its operands as VPValues and also towards breaking it up into a VPInstruction. Discussed as part of D74695. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D76988	2020-03-29 13:47:28 +01:00
Sjoerd Meijer	401a324c51	[LV] Refactor widenIntOrFpInduction. NFC. This untangles the logic in widenIntOrFpInduction in order to make more explicit and visible how exactly the induction variable is lowered. Differential Revision: https://reviews.llvm.org/D76686	2020-03-27 12:58:50 +00:00
Gil Rapaport	078c863305	[LV] Replace stored value with a VPValue (NFCI) InnerLoopVectorizer's code called during VPlan execution still relies on original IR's def-use relations to decide which vector code to generate, limiting VPlan transformations' ability to modify def-use relations and still have ILV generate the vector code. This commit introduces a VPValue for VPWidenMemoryInstructionRecipe to use as the stored value. The recipe is generated with a VPValue wrapping the stored value of the scalar store. This reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Differential Revision: https://reviews.llvm.org/D76373	2020-03-25 19:36:55 +02:00
Guillaume Chatelet	32851f8d63	[Alignment][NFC] Deprecate VectorUtils::getAlignment Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, rogfer01, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76542	2020-03-23 13:54:15 +01:00
Nikita Popov	a63eaa5449	[SLP] Avoid repeated visitation in getVectorElementSize(); NFC We need to insert into the Visited set at the same time we insert into the worklist. Otherwise we may end up pushing the same instruction to the worklist multiple times, and only adding it to the visited set later.	2020-03-22 14:34:29 +01:00
Florian Hahn	fd2c15e602	[VPlan] Do not print mapping for Value2VPValue. The latest improvements to VPValue printing make this mapping clear when printing the operand. Printing the mapping separately is not required any longer. Reviewers: rengolin, hsaito, Ayal, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D76375	2020-03-18 21:44:07 +00:00
Florian Hahn	00c1cd1934	[VPlan] Record underlying value for VPValues created by addVPValue (NFC). Now that printing VPValues uses the underlying IR value name, if available, recording the underlying value here improves printing. Reviewers: rengolin, hsaito, Ayal, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D76374	2020-03-18 21:30:58 +00:00
Eli Friedman	e24e95fe90	Remove CompositeType class. The existence of the class is more confusing than helpful, I think; the commonality is mostly just "GEP is legal", which can be queried using APIs on GetElementPtrInst. Differential Revision: https://reviews.llvm.org/D75660	2020-03-18 13:53:17 -07:00
Florian Hahn	e6a74803d4	[VPlan] Use underlying value for printing, if available. When an underlying value is available, we can use its name for printing, as discussed in D73078. Reviewers: rengolin, hsaito, Ayal, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D76200	2020-03-18 17:46:57 +00:00
Huihui Zhang	fc1f205745	[SLPVectorizer][SVE] Bail out early for scalable vector. Summary: SLPVectorizer tries to vectorize a list of scalar instructions of the same type; instructions already vectorized are rejected through isValidElementType(). Without this patch, tryToVectorizeList() will first try to determine the vectorization factor of a list of Instructions before checking whether each instruction has an unsupported type. For instructions already vectorized for SVE, it will crash at getVectorElementSize(), where it tries to return a fixed size. This patch makes sure invalid element types are rejected before trying to get the vectorization factor, so that we do not try to vectorize instructions that are already vectorized. Reviewers: sdesmalen, efriedma, spatel, RKSimon, ABataev, apazos, rengolin Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D76017	2020-03-13 11:23:31 -07:00
Huihui Zhang	118abf2017	[SVE] Update API ConstantVector::getSplat() to use ElementCount. Summary: Support ConstantInt::get() and Constant::getAllOnesValue() for scalable vector type, this requires ConstantVector::getSplat() to take in 'ElementCount', instead of 'unsigned' number of element count. This change is needed for D73753. Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett Reviewed By: efriedma Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74386	2020-03-12 13:22:41 -07:00
Anna Welker	a6d3bec83f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Benjamin Kramer	247a177cf7	Give helpers internal linkage. NFC.	2020-03-10 18:27:42 +01:00
Florian Hahn	2d6ecf4648	[SLP] Support vectorizing functions provided by vector libs. It seems like the SLPVectorizer is currently not aware of vector versions of functions provided by libraries like Accelerate [1]. This patch updates SLPVectorizer to use the same infrastructure the LoopVectorizer uses to detect vectorizable library functions. For calls, it computes the cost of an intrinsic call (existing behavior) and the cost of a vector function library call, if available. Like LoopVectorizer, it assumes the cost of the vector function is simply the cost of a call to a vector function. [1] https://developer.apple.com/documentation/accelerate Reviewers: ABataev, RKSimon, spatel Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D75878	2020-03-10 13:10:50 +00:00
Sanjay Patel	a69158c12a	[VectorCombine] fold extract-extract-op with different extraction indexes opcode (extelt V0, Ext0), (ext V1, Ext1) --> extelt (opcode (splat V0, Ext0), V1), Ext1 The first part of this patch generalizes the cost calculation to accept different extraction indexes. The second part creates a shuffle+extract before feeding into the existing code to create a vector op+extract. The patch conservatively uses "TargetTransformInfo::SK_PermuteSingleSrc" rather than "TargetTransformInfo::SK_Broadcast" (splat specifically from element 0) because we do not have a more general "SK_Splat" currently. That does not affect any of the current regression tests, but we might be able to find some cost model target specialization where that comes into play. I suspect that we can expose some missing x86 horizontal op codegen with this transform, so I'm speculatively adding a debug flag to disable the binop variant of this transform to allow easier testing. The test changes show that we're sensitive to cost model diffs (as we should be), so that means that patches like D74976 should have better coverage. Differential Revision: https://reviews.llvm.org/D75689	2020-03-08 09:57:55 -04:00
Florian Hahn	40e7bfc424	[VPlan] Use consecutive numbers to print VPValues instead of addresses. Currently when printing VPValues we use the object address, which makes it hard to distinguish VPValues as they usually are large numbers with varying distance between them. This patch adds a simple slot tracker, similar to the ModuleSlotTracker used for IR values. In order to dump a VPValue or anything containing a VPValue, a slot tracker for the enclosing VPlan needs to be created. The existing VPlanPrinter can take care of that for the existing code. We assign consecutive numbers to each VPValue we encounter in a reverse post order traversal of the VPlan. Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D73078	2020-03-05 14:55:15 +00:00
Simon Pilgrim	01a91a6de7	Fix static analyzer uninitialized variable warning. NFCI.	2020-03-05 14:22:24 +00:00
Florian Hahn	05afa55521	[VPlan] Add getPlan() to VPBlockBase. This patch adds a getPlan accessor to VPBlockBase, which finds the entry block of the plan containing the block and returns the plan set for this block. VPBlockBase contains a VPlan pointer, but it should only be set for the entry block of a plan. This allows moving blocks without updating the pointer for each moved block and in the future we might introduce a parent relationship between plans and blocks, similar to the one in LLVM IR. Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D74445	2020-03-03 13:20:13 +00:00
David Green	d0d38df091	[LoopVectorizer] Change types of lists from pointers to references. NFC getReductionVars, getInductionVars and getFirstOrderRecurrences were all being returned from LoopVectorizationLegality as pointers to lists. This just changes them to be references, cleaning up the interface slightly. Differential Revision: https://reviews.llvm.org/D75448	2020-03-02 15:04:41 +00:00
Austin Kerbow	4fa63fd452	[VectorCombine] Fix assert on compare extract index Extract index could be a different integral type. Differential Revision: https://reviews.llvm.org/D75327	2020-02-28 10:37:08 -08:00
Valery N Dmitriev	d723ec4f04	[SLP][NFC] Assert that tree entry operands completed when scheduler looks for dependencies. This change adds an assertion to prevent a tricky bug related to the recursive approach of building the vectorization tree. The for loop below takes the number of operands directly from the tree entry rather than from the scalars. If the entry at this moment turns out to be incomplete (i.e. not all operands set), then not all the dependencies will be seen by the scheduler. This can lead to failed scheduling (and thus failed vectorization) for a perfectly vectorizable tree. Here is a code example which is likely to fire the assertion: for (i : VL0->getNumOperands()) { ... TE->setOperand(i, Operands); buildTree_rec(Operands, Depth + 1,...); } The correct way is a two-step process: first set all operands of a tree entry and then recursively process each operand. Differential Revision: https://reviews.llvm.org/D75296	2020-02-28 10:34:48 -08:00
Valery N Dmitriev	02e5e47e17	[SLP][NFC] Delete some unreachable code. This patch deletes some dead code from the SLP vectorizer. A couple of changes were taken out of D57059 to slightly lighten it, plus one more similar case was fixed. Differential Revision: https://reviews.llvm.org/D75276	2020-02-28 09:22:51 -08:00
Nemanja Ivanovic	c46b85aaf4	[LoopVectorize] Fix cost for calls to functions that have vector versions A recent commit (https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed the cost for calls to functions that have a vector version for some vectorization factor. However, no check is performed for whether the vectorization factor matches the current one being cost modeled. This leads to attempts to widen call instructions to a vectorization factor for which such a function does not exist, which in turn leads to an assertion failure. This patch adds the check for vectorization factor (i.e. not just that the called function has a vector version for some VF, but that it has a vector version for this VF). Differential revision: https://reviews.llvm.org/D74944	2020-02-26 21:39:11 -06:00
Sanjay Patel	25c6544f32	[VectorCombine] add a debug flag to skip all transforms As suggested in D75145 - I'm not sure why, but several passes have this kind of disable/enable flag implemented at the pass manager level. But that means we have to duplicate the flag for both pass managers and add code to check the flag every time the pass appears in the pipeline. We want a debug option to see if this pass is misbehaving regardless of the pass managers, so just add a disablement check at the single point before any transforms run. Differential Revision: https://reviews.llvm.org/D75204	2020-02-26 15:15:42 -05:00
Sanjay Patel	10ea01d80d	[VectorCombine] make cost calc consistent for binops and cmps Code duplication (subsequently removed by refactoring) allowed a logic discrepancy to creep in here. We were being conservative about creating a vector binop -- but not a vector cmp -- in the case where a vector op has the same estimated cost as the scalar op. We want to be more aggressive here because that can allow other combines based on reduced instruction count/uses. We can reverse the transform in DAGCombiner (potentially with a more accurate cost model) if this causes regressions. AFAIK, this does not conflict with InstCombine. We have a scalarize transform there, but it relies on finding a constant operand or a matching insertelement, so that means it eliminates an extractelement from the sequence (so we won't have 2 extracts by the time we get here if InstCombine succeeds). Differential Revision: https://reviews.llvm.org/D75062	2020-02-25 08:41:59 -05:00
Sanjay Patel	e9c79a7aef	[VectorCombine] refactor to reduce duplicated code; NFC This should be the last step in the current cleanup. Follow-ups should resolve the TODO about cost calc and enable the more general case where we extract different elements.	2020-02-21 15:56:00 -05:00
Sanjay Patel	34e3485560	[VectorCombine] refactor cost calcs to reduce duplication; NFC More cleanup is possible now, but we probably need to resolve the TODO about the existing difference between compares and binops.	2020-02-21 15:12:00 -05:00
Florian Hahn	98f5268a72	[VectorUtils] Move ToVectorTy to VectorUtils.h (NFC). ToVectorTy is defined and used in multiple places. Hoist it to VectorUtils.h to avoid duplication and improve re-usability. Reviewers: rengolin, hsaito, Ayal, gilr, fpetrogalli Reviewed By: fpetrogalli Differential Revision: https://reviews.llvm.org/D74959	2020-02-21 17:31:24 +00:00
Sanjay Patel	fc4455891c	[VectorCombine] refactor matching code to reduce duplication; NFC cmp/binop were already diverging even though they are largely the same logic.	2020-02-21 12:06:51 -05:00
Reid Kleckner	0c2b09a9b6	[IR] Lazily number instructions for local dominance queries Essentially, fold OrderedBasicBlock into BasicBlock, and make it auto-invalidate the instruction ordering when new instructions are added. Notably, we don't need to invalidate it when removing instructions, which is helpful when a pass mostly deletes dead instructions rather than transforming them. The downside is that Instruction grows from 56 bytes to 64 bytes. The resulting LLVM code is substantially simpler and automatically handles invalidation, which makes me think that this is the right speed and size tradeoff. The important change is in SymbolTableTraitsImpl.h, where the numbering is invalidated. Everything else should be straightforward. We probably want to implement a fancier re-numbering scheme so that local updates don't invalidate the ordering, but I plan for that to be future work, maybe for someone else. Reviewed By: lattner, vsk, fhahn, dexonsmith Differential Revision: https://reviews.llvm.org/D51664	2020-02-18 14:44:24 -08:00
Huihui Zhang	8ee0e1dc02	[NFC] Silence compiler warning [-Wmissing-braces].	2020-02-18 10:37:12 -08:00
Florian Hahn	e32522ca17	[SLPVectorizer] Do not assume extracelement idx is a ConstantInt. The index of an ExtractElementInst is not guaranteed to be a ConstantInt. It can be any integer value. Check explicitly for ConstantInts. The new test cases illustrate scenarios where we crash without this patch. I've also added another test case to check the matching of extractelement vector ops works. Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D74758	2020-02-18 18:16:06 +01:00
Benjamin Kramer	5fc5c7db38	Strength reduce vectors into arrays. NFCI.	2020-02-17 15:37:35 +01:00
Sanjay Patel	62dd44d76d	[VectorCombine] fix cost calc for extract-cmp getOperationCost() is not the cost we wanted; that's not the throughput value that the rest of the calculation uses. We may want to switch everything in this code to use the getInstructionThroughput() wrapper to avoid these kinds of problems, but I'll look at that as a follow-up because that can create other logical diffs via using optional parameters (we'd need to speculatively create the vector instruction to make a fair(er) comparison).	2020-02-16 10:40:28 -05:00
Kadir Cetinkaya	1674f772b4	[VecotrCombine] Fix unused variable for assertion disabled builds	2020-02-14 09:30:29 +01:00
Sanjay Patel	19b62b79db	[VectorCombine] try to form vector binop to eliminate an extract element binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C This is a transform that has been considered for canonicalization (instcombine) in the past because it reduces instruction count. But as shown in the x86 tests, it's impossible to know if it's profitable without a cost model. There are many potential target constraints to consider. We have implemented similar transforms in the backend (DAGCombiner and target-specific), but I don't think we have this exact fold there either (and if we did it in SDAG, it wouldn't work across blocks). Note: this patch was intended to handle the more general case where the extract indexes do not match, but it got too big, so I scaled it back to this pattern for now. Differential Revision: https://reviews.llvm.org/D74495	2020-02-13 17:23:27 -05:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Sanjay Patel	a2a0f9a43a	[VectorCombine] remove unused debug counter; NFC The variable was added to the initial commit via copy/paste of existing code, but it wasn't actually used in the code. We can add it back with the proper usage if/when that is needed.	2020-02-11 08:24:07 -05:00
Sanjay Patel	a17f03bd93	[VectorCombine] new IR transform pass for partial vector ops We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html ...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes. There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws: InstCombine - can't happen without TTI, but we don't want target-specific folds there. SDAG - too late to assist other vectorization passes; TLI is not equipped for these kind of cost queries; limited to a single basic block. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine. SLP - doesn't fit with existing transforms; limited to a single basic block. This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable. We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with a most basic case and evolve from there, so I didn't try to do anything fancy with this initial implementation. Differential Revision: https://reviews.llvm.org/D73480	2020-02-09 10:04:41 -05:00
Florian Hahn	d1f849a284	[LV] Hoist code to mark conditional assumes as dead to caller (NFC). This is a follow-up suggested in D73423. It is sufficient to just add the conditional assumes to DeadInstructions once.	2020-01-28 08:50:44 -08:00
Florian Hahn	a911fef3dd	[LV] Do not try to sink dead instructions. Dead instructions do not need to be sunk. Currently we try and record the recipes for them, but there are no recipes emitted for them and there's nothing to sink. They can be removed from SinkAfter while marking them for recording. Fixes PR44634. Reviewers: rengolin, hsaito, fhahn, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D73423	2020-01-28 08:28:03 -08:00
Guillaume Chatelet	d0a7cc7177	[Alignment][NFC] Use Align with CreateMaskedScatter/Gather Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 This patch shows that CreateMaskedScatter/CreateMaskedGather can only take positive non-zero alignment values. Reviewers: courbet Subscribers: hiraditya, llvm-commits, delena Tags: #llvm Differential Revision: https://reviews.llvm.org/D73361	2020-01-27 10:17:14 +01:00
Guillaume Chatelet	59f95222d4	[Alignment][NFC] Use Align with CreateAlignedStore Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet, bollu Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73274	2020-01-23 17:34:32 +01:00
Guillaume Chatelet	279fa8e006	[Alignement][NFC] Deprecate untyped CreateAlignedLoad Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73260	2020-01-23 13:34:32 +01:00
Florian Hahn	f14f2a8568	[LV] Fix predication for branches with matching true and false succs. Currently due to the edge caching, we create wrong predicates for branches with matching true and false successors. We will cache the condition for the edge from the true successor, and then lookup the same edge (src and dst are the same) for the edge to the false successor. If both successors match, the condition should always be true. At the moment, we cannot really create constant VPValues, but we can just create a true condition as X | !X. Later passes will clean that up. Fixes PR44488. Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D73079	2020-01-22 18:34:11 -08:00
Guillaume Chatelet	0957233320	[Alignment][NFC] Use Align with CreateMaskedStore Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73106	2020-01-22 11:04:39 +01:00
Andrei Elovikov	e1d6d36852	[SLP] Don't allow Div/Rem as alternate opcodes Summary: We don't have control/verify what will be the RHS of the division, so it might happen to be zero, causing UB. Reviewers: Vasilis, RKSimon, ABataev Reviewed By: ABataev Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie Tags: #llvm Differential Revision: https://reviews.llvm.org/D72740	2020-01-21 15:21:17 -08:00
Guillaume Chatelet	bc8a1ab26f	[Alignment][NFC] Use Align with CreateMaskedLoad Summary: This patch is part of a series to introduce an Alignment type. See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html See this patch for the introduction of the type: https://reviews.llvm.org/D64790 Reviewers: courbet Subscribers: hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73087	2020-01-21 14:13:22 +01:00
Evgeniy Brevnov	af7e158872	[LV] Vectorizer should adjust trip count in profile information Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly. Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov Reviewed By: Ayal, DaniilSuchkov Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67905	2020-01-20 18:36:28 +07:00
Francesco Petrogalli	66c120f025	[VectorUtils] Rework the Vector Function Database (VFDatabase). Summary: This commit is a rework of the patch in https://reviews.llvm.org/D67572. The rework was requested to prevent out-of-tree performance regressions when vectorizing out-of-tree IR intrinsics. The vectorization of such intrinsics is queried via the static function `isTLIScalarize`. For details, see the discussion in https://reviews.llvm.org/D67572. Reviewers: uabelho, fhahn, sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72734	2020-01-16 15:08:26 +00:00
Florian Hahn	23c113802e	[LV] Allow assume calls in predicated blocks. The assume intrinsic is intentionally marked as possibly reading/writing memory, to prevent passes from moving such calls around. When flattening the CFG for predicated blocks, we have to drop the assume calls, as they are control-flow dependent. There are some cases where we can do better (when control flow is preserved), but that is follow-up work. Fixes PR43620. Reviewers: hsaito, rengolin, dcaballe, Ayal Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D68814	2020-01-16 10:11:35 +00:00
Benjamin Kramer	498856fca5	[LV] Silence unused variable warning in Release builds. NFC.	2020-01-10 11:21:27 +01:00
Gil Rapaport	8647a72c4a	[LV] VPValues for memory operation pointers (NFCI) Memory instruction widening recipes use the pointer operand of their load/store ingredient for generating the needed GEPs, making it difficult to feed these recipes with pointers based on other ingredients or none at all. This patch modifies these recipes to use a VPValue for the pointer instead, in order to reduce ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. The recipes are constructed with VPValues bound to these ingredients, maintaining current behavior. Differential revision: https://reviews.llvm.org/D70865	2020-01-10 09:24:59 +02:00
Sjoerd Meijer	8f1887456a	[LV] Still vectorise when tail-folding can't find a primary induction variable This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this: void foo(char *A, char *B, char *C, uint32_t N) { while (N > 0) { *C++ = *A++ + *B++; N--; } } These are loops that can be vectorised, but when tail-folding is requested, it can't find a primary induction variable, which we do need for predicating the loop. As a result, the loop isn't vectorised at all, even though it can be vectorised when tail-folding is not attempted. So, this adds a check for the primary induction variable where we decide how to lower the scalar epilogue. I.e., when there isn't a primary induction variable, a scalar epilogue loop is allowed (i.e. don't request tail-folding) so that vectorisation can still be triggered. Having this check for the primary induction variable makes sense anyway, and in addition, in a follow-up of this I will look into discovering the primary induction variable earlier for counting down loops, so that they can also be tail-folded. Differential revision: https://reviews.llvm.org/D72324	2020-01-09 09:14:00 +00:00
Florian Hahn	b8a3c34eee	Revert "[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC)." This reverts commit `51ef53f3bd`, as it breaks some bots.	2020-01-04 18:44:38 +00:00
Florian Hahn	51ef53f3bd	[SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537	2020-01-04 18:29:35 +00:00
Evgeniy Brevnov	948e745270	[LV][NFC] Keep dominator tree up to date during vectorization.	2019-12-30 18:38:41 +07:00
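Commit af7e158872 (D67905) rescales profile trip counts after vectorization: each vector iteration covers VF×UF original iterations, and the scalar epilogue handles at most VF×UF - 1 leftovers. The C sketch below shows only that arithmetic. `split_trip_count` and `trip_split` are hypothetical names, not LLVM APIs, and the sketch assumes a plain scalar epilogue (no tail folding, no requirement that at least one scalar iteration runs) and VF×UF > 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type (not an LLVM API): how an original trip count
   is divided between the vectorized loop and the scalar epilogue. */
typedef struct {
    uint64_t vector_iters;   /* iterations of the vectorized loop */
    uint64_t epilogue_iters; /* iterations left for the scalar epilogue */
} trip_split;

/* Hypothetical helper. Each vector iteration covers vf * uf original
   iterations, so the vector loop runs orig_tc / (vf * uf) times and the
   epilogue handles the remainder, which is at most vf * uf - 1.
   Assumes no tail folding and no mandatory scalar iteration. */
static trip_split split_trip_count(uint64_t orig_tc, uint64_t vf, uint64_t uf) {
    uint64_t step = vf * uf;
    assert(step != 0); /* VF and UF must both be non-zero */
    trip_split s;
    s.vector_iters = orig_tc / step;
    s.epilogue_iters = orig_tc % step;
    return s;
}
```

For example, with VF = 4 and UF = 2, an original trip count of 1003 splits into 125 vector iterations and 3 epilogue iterations.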

3283 Commits