llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	6c3451cd76	[VPlan] Add VPReductionPHIRecipe (NFC). This patch is a first step towards splitting up VPWidenPHIRecipe into separate recipes for the 3 distinct cases they model: 1. reduction phis, 2. first-order recurrence phis, 3. pointer induction phis. This allows untangling the code generation and allows us to reduce the reliance on LoopVectorizationCostModel during VPlan code generation. Discussed/suggested in D100102, D100113, D104197. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104989	2021-07-06 11:25:28 +01:00
Florian Hahn	f1a6430272	[VPlan] Track both incoming values for first-order recurrence phis. This patch updates VPWidenPHI recipes for first-order recurrences to also track the incoming value from the back-edge. Similar to D99294, which did the same for reductions. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D104197	2021-06-27 14:29:35 +01:00
Florian Hahn	1465e7770b	[VPlan] Print successors of VPRegionBlocks. The non-DOT printing does not include the successors of VPregionBlocks. This patch use the same style for printing successors as for VPBasicBlock. I think the printing of successors could be a bit improved further, as at the moment it is hard to ensure a check line matches all successors. But that can be done as follow-up. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D103515	2021-06-07 17:57:21 +01:00
Florian Hahn	e9d97d7d9d	[VPlan] Add mayReadOrWriteMemory & friends. This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.	2021-05-24 13:11:32 +01:00
Florian Hahn	cc1a6361d3	[VPlan] Add VPUserID to distinguish between recipes and others. This allows cast/dyn_cast'ing from VPUser to recipes. This is needed because there are VPUsers that are not recipes. Reviewed By: gilr, a.elovikov Differential Revision: https://reviews.llvm.org/D100257	2021-05-18 09:17:28 +01:00
Florian Hahn	ccebf7a109	[VPlan] Properly handle sinking of replicate regions. This patch updates the code that sinks recipes required for first-order recurrences to properly handle replicate-regions. At the moment, the code would just move the replicate recipe out of its replicate-region, producing an invalid VPlan. When sinking a recipe in a replicate-region, we have to sink the whole region. To do that, we first need to split the block at the target recipe and move the region in between. This patch also adds a splitAt helper to VPBasicBlock to split a VPBasicBlock at a given iterator. Fixes PR50009. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100751	2021-05-04 22:36:01 +01:00
Florian Hahn	4ba8720f88	[VPlan] Representing backedge def-use feeding reduction phis. This patch updates the code handling reduction recipes to also keep track of the incoming value from the latch in the recipe. This is needed to model the def-use chains completely in VPlan, so that it is possible to replace the incoming value with an arbitrary VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99294	2021-05-04 16:33:22 +01:00
Florian Hahn	2b7fa7f744	[LV] Iterate over recipes in VPlan to fix PHI (NFC). As we gradually move more elements of LV to VPlan, we are trying to reduce the number of places that still has to check IR of the original loop. This patch adjusts the code to fix cross iteration phis to get the PHIs to fix directly from the VPlan that is executed. We still need the original PHI to check for first-order recurrences, but we can get rid of that once we model that explicitly in VPlan as well. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99293	2021-05-03 14:09:46 +01:00
Florian Hahn	942e068d7a	[VPlan] Add VPBasicBlock::phis() helper (NFC). This patch introduces a helper to obtain an iterator range for the PHI-like recipes in a block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100101	2021-05-02 19:20:13 +01:00
Florian Hahn	a0e1313c23	[VPlan] Add getVPSingleValue helper. As suggested in D99294, this adds a getVPSingleValue helper to use for recipes that are guaranteed to define a single value. This replaces uses of getVPValue() which used to default to I = 0.	2021-04-29 13:37:38 +01:00
Florian Hahn	7302fe4328	[VPlan] Make blocksOnly work properly with ranges over const pointers. When iterating over const blocks, the base type in the lambdas needs to use const VPBlockBase *, otherwise it cannot be used with input iterators over const VPBlockBase. Also adjust the type of the input iterator range to const &, as it does not take ownership of the input range.	2021-04-26 10:52:35 +01:00
Florian Hahn	4b9be5ac08	[VPlan] Add VPBlockUtils::blocksOnly helper. This patch adds a blocksOnly helpers which take an iterator range over VPBlockBase * or const VPBlockBase * and returns an interator range that only include BlockTy blocks. The accesses are casted to BlockTy. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D101093	2021-04-25 17:38:09 +01:00
Florian Hahn	89c4dda076	[VPlan] Add GraphTraits impl to traverse through VPRegionBlock. This patch adds a new iterator to traverse through VPRegionBlocks and a GraphTraits specialization using the iterator to traverse through VPRegionBlocks. Because there is already a GraphTraits specialization for VPBlockBase * and co, a new VPBlockRecursiveTraversalWrapper helper is introduced. This allows us to provide a new GraphTraits specialization for that type. Users can use the new recursive traversal by using this wrapper. The graph trait visits both the entry block of a region, as well as all its successors. Exit blocks of a region implicitly have their parent region's successors. This ensures all blocks in a region are visited before any blocks in a successor region when doing a reverse post-order traversal of the graph. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100175	2021-04-23 17:26:47 +01:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the some cost, is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Florian Hahn	6adebe3fd2	[VPlan] Add VPRecipeBase::mayHaveSideEffects. Add an initial version of a helper to determine whether a recipe may have side-effects. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100259	2021-04-15 11:49:40 +01:00
Huihui Zhang	d857a81437	[VPlan] Use SetVector for VPExternalDefs to prevent non-determinism. Use SetVector instead of SmallPtrSet for external definitions created for VPlan. Doing this can help avoid non-determinism caused by iterating over unordered containers. This bug was found with reverse iteration turning on, --extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON". Failing LLVM-Unit test VPRecipeTest.dump. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D99544	2021-03-30 12:10:56 -07:00
Andrei Elovikov	92205cb27f	[NFC][VPlan] Guard print routines with "#if !defined(NDEBUG) \|\| defined(LLVM_ENABLE_DUMP)" Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D98897	2021-03-19 10:50:12 -07:00
Andrei Elovikov	93a9d2de8f	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-19 10:50:12 -07:00
Mehdi Amini	3614df3537	Revert "[VPlan] Add plain text (not DOT's digraph) dumps" This reverts commit `6b053c9867`. The build is broken: ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const >>> referenced by LoopVectorize.cpp >>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a	2021-03-18 19:20:39 +00:00
Andrei Elovikov	6b053c9867	[VPlan] Add plain text (not DOT's digraph) dumps I foresee two uses for this: 1) It's easier to use those in debugger. 2) Once we start implementing more VPlan-to-VPlan transformations (especially inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in LIT test would become too obscure. I can imagine that we'd want to CHECK against VPlan dumps after multiple transformations instead. That would be easier with plain text dumps than with DOT format. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96628	2021-03-18 11:33:39 -07:00
Florian Hahn	f586de8459	[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC) Instead of maintaining a separate map from predicated instructions to recipes, we can instead directly look at the VP operands. If the operand comes from a predicated instruction, the operand will be a VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.	2021-03-16 17:40:35 +00:00
David Sherwood	fec0a0adac	[SVE][LoopVectorize] Add support for extracting the last lane of a scalable vector There are certain loops like this below: for (int i = 0; i < n; i++) { a[i] = b[i] + 1; *inv = a[i]; } that can only be vectorised if we are able to extract the last lane of the vectorised form of 'a[i]'. For fixed width vectors this already works since we know at compile time what the final lane is, however for scalable vectors this is a different story. This patch adds support for extracting the last lane from a scalable vector using a runtime determined lane value. I have added support to VPIteration for runtime-determined lanes that still permit the caching of values. I did this by introducing a new class called VPLane, which describes the lane we're dealing with and provides interfaces to get both the compile-time known lane and the runtime determined value. Whilst doing this work I couldn't find any explicit tests for extracting the last lane values of fixed width vectors so I added tests for both scalable and fixed width vectors. Differential Revision: https://reviews.llvm.org/D95139	2021-03-05 09:57:56 +00:00
Florian Hahn	a6c81d3366	[VPlan] Remove recipes from back to front. Update the deletion order when destroying VPBasicBlocks. This ensures recipes that depend on earlier ones in the block are removed first. Otherwise this may cause issues when recipes have remaining users later in the block.	2021-03-01 16:06:30 +00:00
Andrei Elovikov	3605b873f6	[NFC][VPlan] Use VPUser to store block's predicate Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D96529	2021-02-23 11:08:27 -08:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Florian Hahn	edc92a1c42	[LV] Remove VPCallback. Now that all state for generated instructions is managed directly in VPTransformState, VPCallBack is no longer needed. This patch updates the last use of `getOrCreateScalarValue` to instead manage the value directly in VPTransformState and removes VPCallback. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95383	2021-02-19 12:50:41 +00:00
Florian Hahn	f64c626069	[VPlan] Remove unused Phi member from VPWidenPHIRecipe (NFC). The member is not needed any longer after recent changes.	2021-02-16 13:53:06 +00:00
Florian Hahn	54a14c264a	[VPlan] Manage scalarized values using VPValues. This patch updates codegen to use VPValues to manage the generated scalarized instructions. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92285	2021-02-16 09:04:10 +00:00
Florian Hahn	85fe5c9345	[VPlan] Make VPRecipeBase inherit from VPUser directly (NFC). The individual recipes have been updated to manage their operands using VPUser a while back. Now that the transition is done, we can instead make VPRecipeBase a VPUser and get rid of the toVPUser helper.	2021-02-12 13:06:58 +00:00
Florian Hahn	fd8afa41eb	[VPlan] Use VPUser to manage CondBit VP blocks keep track of a condition, which is a VPValue. This patch updates VPBlockBase to manage the value using VPUser, so replaceAllUsesWith properly updates the condition bit as well. This is required to enable VP2VP transformations and it helps with simplifying some of the code required to manage condition bits. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95382	2021-02-09 21:53:50 +00:00
Florian Hahn	3bb6dc0b26	[LV] Replace some uses of VectorLoopValueMap with VPTransformState (NFC) This patch updates some places where VectorLoopValueMap is accessed directly to instead go through VPTransformState. As we move towards managing created values exclusively in VPTransformState, this ensures the use always can fetch the correct value. This is in preparation for D92285, which switches to managing scalarized values through VPValues. In the future, the various fix* functions should be moved directly into the VPlan codegen stage. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D95757	2021-02-07 18:28:21 +00:00
Florian Hahn	daaa0e3501	[VPlan] Manage induction value creation using VPValues. This patch updates the induction value creation to use VPValues of recipes to map the created values. This should bring is one step closer to being able to optimize induction recipes directly in VPlan. Currently widenIntOrFpInduction also generates vector values for a cast of the induction, if it exists. Make this explicit by adding the cast instruction to the values defined by the recipe. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92284	2021-02-03 17:45:03 +00:00
David Sherwood	d4626eb0bd	[VPlan][NFC] Introduce constructors for VPIteration This patch adds constructors to VPIteration as a cleaner way of initialising the struct and replaces existing constructions of the form: {Part, Lane} with VPIteration(Part, Lane) I have also added a default constructor, which is used by VPlan.cpp when deciding whether to replicate a block or not. This refactoring will be required in a later patch that adds more members and functions to VPIteration. Differential Revision: https://reviews.llvm.org/D95676	2021-02-03 08:52:27 +00:00
Sanjay Patel	ab93c18c12	[LoopVectorize] use IR fast-math-flags exclusively (not FP function attributes) I am trying to untangle the fast-math-flags propagation logic in the vectorizers (see `a6f022127` for SLP). The loop vectorizer has a mix of checking FP function attributes, IR-level FMF, and just wrong assumptions. I am trying to avoid regressions while fixing this, and I think the IR-level logic is good enough for that, but it's hard to say for sure. This would be the 1st step in the clean-up. The existing test that I changed to include 'fast' actually shows a miscompile: the function only had the equivalent of nnan, but we created new instructions that had fast (all FMF set). This is similar to the example in https://llvm.org/PR35538 Differential Revision: https://reviews.llvm.org/D95452	2021-01-27 14:17:11 -05:00
Florian Hahn	3201274dea	[VPlan] Handle scalarized values in VPTransformState. This patch adds plumbing to handle scalarized values directly in VPTransformState. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D92282	2021-01-25 14:21:56 +00:00
Kazu Hirata	8857202489	[llvm] Use llvm::find (NFC)	2021-01-19 20:19:14 -08:00
Florian Hahn	eb0371e403	[VPlan] Unify value/recipe printing after VPDef transition. This patch unifies the way recipes and VPValues are printed after the transition to VPDef. VPSlotTracker has been updated to iterate over all recipes and all their defined values to number those. There is no need to number values in Value2VPValue. It also updates a few places that only used slot numbers for VPInstruction. All recipes now can produce numbered VPValues.	2021-01-11 14:42:46 +00:00
Florian Hahn	65f578fc0e	[VPlan] Keep start value of VPWidenPHIRecipe as VPValue. Similar to D92129, update VPWidenPHIRecipe to manage the start value as VPValue. This allows adjusting the start value as a VPlan transform, which will be used in a follow-up patch to support reductions during epilogue vectorization. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D93975	2021-01-09 16:34:15 +00:00
Florian Hahn	c493e9216b	[VPlan] Move reduction start value creation to widenPHIRecipe. This was suggested to prepare for D93975. By moving the start value creation to widenPHInstruction, we set the stage to manage the start value directly in VPWidenPHIRecipe, which be used subsequently to set the 'resume' value for reductions during epilogue vectorization. It also moves RdxDesc to the recipe, so we do not have to rely on Legal to look it up later. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D94175	2021-01-08 17:49:43 +00:00
David Green	72fb5ba079	[LV] Don't sink into replication regions The new test case here contains a first order recurrences and an instruction that is replicated. The first order recurrence forces an instruction to be sunk _into_, as opposed to after the replication region. That causes several things to go wrong including registering vector instructions multiple times and failing to create dominance relations correctly. Instead we should be sinking to after the replication region, which is what this patch makes sure happens. Differential Revision: https://reviews.llvm.org/D93629	2021-01-08 09:50:10 +00:00
Florian Hahn	816dba48af	[VPlan] Keep start value in VPWidenIntOrFpInductionRecipe (NFC). This patch updates VPWidenIntOrFpInductionRecipe to hold the start value for the induction variable. This makes the start value explicit and allows for adjusting the start value for a VPlan. The flexibility will be used in further patches. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D92129	2021-01-06 11:47:33 +00:00
Florian Hahn	f73c09caa2	[VPlan] Use public VPValue constructor in VPPRedInstPHIRecipe (NFC). VPPredInstPHIRecipe does not need access to VPValue via friendship. It can just use the public constructor, Discussed as part of D92281.	2021-01-06 10:47:09 +00:00
Kazu Hirata	789d250613	[CodeGen, Transforms] Use *Map::lookup (NFC)	2020-12-27 09:57:27 -08:00
Florian Hahn	c0c0ae16c3	[VPlan] Make VPInstruction a VPDef This patch turns updates VPInstruction to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90565	2020-12-22 09:53:47 +00:00
Florian Hahn	f250892373	[VPlan] Make VPRecipeBase inherit from VPDef. This patch makes VPRecipeBase a direct subclass of VPDef, moving the SubclassID to VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90564	2020-12-21 13:34:00 +00:00
Florian Hahn	cd608dc8d3	[VPlan] Use VPDef for VPInterleaveRecipe. This patch turns updates VPInterleaveRecipe to manage the values it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90562	2020-12-21 10:56:53 +00:00
Florian Hahn	7186a3965a	[VPlan] Use VPDef for VPWidenSelectRecipe. This patch turns updates VPWidenSelectRecipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90560	2020-12-15 14:15:01 +00:00
Florian Hahn	318f5798d8	[VPlan] Use VPDef for VPWidenGEPRecipe. This patch turns updates VPWidenGEPRecipe to manage the value it defines using VPDef. The VPValue is used during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90561	2020-12-15 09:30:14 +00:00
Florian Hahn	ad1161f9b5	[VPlan] Use VPdef for VPWidenCall. This patch turns updates VPWidenREcipe to manage the value it defines using VPDef. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90559	2020-12-15 09:20:07 +00:00
Florian Hahn	e42e5263bd	[VPlan] Make VPWidenMemoryInstructionRecipe a VPDef. This patch updates VPWidenMemoryInstructionRecipe to use VPDef to manage the value it produces instead of inheriting from VPValue. Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D90563	2020-12-14 14:13:59 +00:00
Sander de Smalen	d568cff696	[LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF. * Steps are scaled by `vscale`, a runtime value. * Changes to circumvent the cost-model for now (temporary) so that the cost-model can be implemented separately. This can vectorize the following loop [1]: void loop(int N, double a, double b) { #pragma clang loop vectorize_width(4, scalable) for (int i = 0; i < N; i++) { a[i] = b[i] + 1.0; } } [1] This source-level example is based on the pragma proposed separately in D89031. This patch only implements the LLVM part. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91077	2020-12-09 11:25:21 +00:00
Florian Hahn	fe83adb05a	[VPlan] Use VPUser to manage VPPredInstPHIRecipe operand (NFC). VPPredInstPHIRecipe is one of the recipes that was missed during the initial conversion. This patch adjusts the recipe to also manage its operand using VPUser.	2020-11-30 13:09:58 +00:00
Florian Hahn	a813090072	[VPlan] Manage stored values of interleave groups using VPUser (NFC) Interleave groups also depend on the values they store. Manage the stored values as VPUser operands. This is currently a NFC, but is required to allow VPlan transforms and to manage generated vector values exclusively in VPTransformState.	2020-11-29 17:24:36 +00:00
Florian Hahn	bd0b1311db	[VPlan] Turn VPReplicateRecipe into a VPValue. Update VPReplicateRecipe to inherit from VPValue. This still does not update scalarizeInstruction to set the result for the VPValue of VPReplicateRecipe, because this first requires tracking scalar values in VPTransformState. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D91500	2020-11-26 13:50:24 +00:00
Florian Hahn	ad5b83ddcf	[VPlan] Add VPReductionSC to VPUser::classof, unify VPValue IDs. This is a follow-up to `00a6601136` to make isa<VPReductionRecipe> work and unifies the VPValue ID names, by making sure they all consistently start with VPV*.	2020-11-25 11:08:25 +00:00
David Green	e0c479cd0e	[VPlan] Switch VPWidenRecipe to be a VPValue Similar to other patches, this makes VPWidenRecipe a VPValue. Because of the way it interacts with the reduction code it also slightly alters the way that VPValues are registered, removing the up front NeedDef and using getOrAddVPValue to create them on-demand if needed instead. Differential Revision: https://reviews.llvm.org/D88447	2020-11-25 08:25:06 +00:00
David Green	00a6601136	[VPlan] Turn VPReductionRecipe into a VPValue This converts the VPReductionRecipe into a VPValue, like other VPRecipe's in preparation for traversing def-use chains. It also makes it a VPUser, now storing the used VPValues as operands. It doesn't yet change how the VPReductionRecipes are created. It will need to call replaceAllUsesWith from the original recipe they replace, but that is not done yet as VPWidenRecipe need to be created first. Differential Revision: https://reviews.llvm.org/D88382	2020-11-25 08:25:05 +00:00
Florian Hahn	0c119ba8a8	[VPlan] Use VPValue def for VPWidenGEPRecipe. This patch turns VPWidenGEPRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84683	2020-11-15 15:12:47 +00:00
Florian Hahn	a70b511e78	Recommit "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts the revert commit `c8d73d939f`. It includes a fix for cases where we missed inserting VPValues for some selects, which should fix PR48142.	2020-11-14 20:00:25 +00:00
Florian Hahn	c8d73d939f	Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe." This reverts commit `a8e50f1c6e`. This reportedly breaks building the Linux kernel. https://bugs.llvm.org/show_bug.cgi?id=48142	2020-11-10 22:50:46 +00:00
Florian Hahn	a8e50f1c6e	[VPlan] Use VPValue def for VPWidenSelectRecipe. This patch turns VPWidenSelectRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84682	2020-11-10 19:39:37 +00:00
Florian Hahn	537829f2a7	[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC). Move logic to check if the recipe is a store to a helper for easier reuse.	2020-11-09 14:01:29 +00:00
Florian Hahn	fec64de261	[VPlan] Use VPValue def for VPWidenCall. This patch turns VPWidenCall into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84681	2020-11-09 13:29:41 +00:00
Sander de Smalen	4a3bb9ea6c	[VPlan] NFC: Change VFRange to take ElementCount This patch changes the type of Start, End in VFRange to be an ElementCount instead of `unsigned`. This is done as preparation to make VPlans for scalable vectors, but is otherwise NFC. Reviewed By: dmgreen, fhahn, vkmr Differential Revision: https://reviews.llvm.org/D90715	2020-11-06 09:50:20 +00:00
Florian Hahn	93f6c6b79c	Recommit "[VPlan] Use VPValue def for VPMemoryInstructionRecipe." This reverts the revert commit `710aceb645` and includes a fix for a memsan failure. Original message: This patch turns VPMemoryInstructionRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible.	2020-10-14 17:41:23 +01:00
Vitaly Buka	710aceb645	Revert "[VPlan] Use VPValue def for VPMemoryInstructionRecipe." It introduced a memory leak. This reverts commit `525b085a65`.	2020-10-13 03:14:08 -07:00
Florian Hahn	525b085a65	[VPlan] Use VPValue def for VPMemoryInstructionRecipe. This patch turns VPMemoryInstructionRecipe into a VPValue and uses it during VPlan construction and codegeneration instead of the plain IR reference where possible. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D84680	2020-10-12 18:02:33 +01:00
Florian Hahn	ea058d289c	[VPlan] Use operands for printing of VPWidenMemoryInstructionRecipe. Now that operands of the recipe are managed through VPUser, we can simplify the printing by just using the operands.	2020-10-12 16:51:54 +01:00
David Green	be6e8e50f4	[LV] Tail folded inloop reductions. This expands upon the inloop reductions added in e9761688e41cb9e976, allowing them to be inserted into tail folded loops. Reductions are generates with the form: x = select(mask, vecop, zero) v = vecreduce.add(x) c = add chain, v Where zero here is chosen as the identity value for add reductions. The backend is then expected to fold the select and the vecreduce into a single predicated instruction. Most of the code is fairly straight forward, except for the creation of blockmasks which need to ensure they are created in dominance order. The order they are added is altered to be after any phis, keeping the requirements for the underlying IR. Differential Revision: https://reviews.llvm.org/D84451	2020-10-11 16:58:34 +01:00
Florian Hahn	348d85a6c7	[VPlan] Clean up uses/operands on VPBB deletion. Update the code responsible for deleting VPBBs and recipes to properly update users and release operands. This is another preparation for D84680 & following patches towards enabling modeling def-use chains in VPlan.	2020-10-05 14:43:52 +01:00
Florian Hahn	357bbaab66	[VPlan] Add VPRecipeBase::toVPUser helper (NFC). This adds a helper to convert a VPRecipeBase pointer to a VPUser, for recipes that inherit from VPUser. Once VPRecipeBase directly inherits from VPUser this helper can be removed.	2020-10-04 19:43:27 +01:00
Florian Hahn	d856365470	[VPlan] Change recipes to inherit from VPUser instead of a member var. Now that VPUser is not inheriting from VPValue, we can take the next step and turn the recipes that already manage their operands via VPUser into VPUsers directly. This is another small step towards traversing def-use chains in VPlan. This is NFC with respect to the generated code, but makes the interface more powerful.	2020-09-30 14:39:00 +01:00
Florian Hahn	31923f6b36	[VPlan] Disconnect VPValue and VPUser. This refactors VPuser to not inherit from VPValue to facilitate introducing operations that introduce multiple VPValues (e.g. VPInterleaveRecipe). Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D84679	2020-09-23 14:44:31 +01:00
Florian Hahn	c671e34bf2	[VPlan] Add dump() helper to VPValue & VPRecipeBase. This provides a convenient way to print VPValues and recipes in a debugger. In particular it saves the user from instantiating VPSlotTracker to print recipes or values.	2020-09-22 15:55:16 +01:00
Simon Pilgrim	5ea9e655ef	VPlan.h - remove unnecessary forward declarations. NFCI. Already defined in includes.	2020-09-07 18:35:06 +01:00
David Sherwood	f4257c5832	[SVE] Make ElementCount members private This patch changes ElementCount so that the Min and Scalable members are now private and can only be accessed via the get functions getKnownMinValue() and isScalable(). In addition I've added some other member functions for more commonly used operations. Hopefully this makes the class more useful and will reduce the need for calling getKnownMinValue(). Differential Revision: https://reviews.llvm.org/D86065	2020-08-28 14:43:53 +01:00
Francesco Petrogalli	5a34b3ab95	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() \|\| VF.isScalar()`. * Replaced the uses of `llvm::SmallSet<ElementCount, ...>` with `llvm::SmallSetVector<ElementCount, ...>`. This avoids the need of an ordering function for the `ElementCount` class. * Bits and pieces around printing the `ElementCount` to string streams. To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types In this cases, an assert(!VF.Scalable && "<mgs>") has been added to make sure we don't enter coepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operation _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` woudl require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that migh require changes at command line. Note that in some places the idiom ``` unsigned VF = ... auto VTy = FixedVectorType::get(ScalarTy, VF) ``` has been replaced with ``` ElementCount VF = ... assert(!VF.Scalable && ...); auto VTy = VectorType::get(ScalarTy, VF) ``` The assertion guarantees that the new code is (at least in debug mode) functionally equivalent to the old version. Notice that this change had been possible because none of the methods that are specific to `FixedVectorType` were used after the instantiation of `VTy`. Reviewed By: rengolin, ctetreau Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:54:03 +00:00
Francesco Petrogalli	bad7d6b373	Revert "[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]" Reverting because the commit message doesn't reflect the one agreed on phabricator at https://reviews.llvm.org/D85794. This reverts commit `c8d2b065b9`.	2020-08-24 13:50:55 +00:00
Francesco Petrogalli	c8d2b065b9	[llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI] Changes: * Change `ToVectorTy` to deal directly with `ElementCount` instances. * `VF == 1` replaced with `VF.isScalar()`. * `VF > 1` and `VF >=2` replaced with `VF.isVector()`. * `VF <=1` is replaced with `VF.isZero() \|\| VF.isScalar()`. * Add `<` operator to `ElementCount` to be able to use `llvm::SmallSetVector<ElementCount, ...>`. * Bits and pieces around printing the ElementCount to string streams. * Added a static method to `ElementCount` to represent a scalar. To guarantee that this change is a NFC, `VF.Min` and asserts are used in the following places: 1. When it doesn't make sense to deal with the scalable property, for example: a. When computing unrolling factors. b. When shuffle masks are built for fixed width vector types In this cases, an assert(!VF.Scalable && "<mgs>") has been added to make sure we don't enter coepaths that don't make sense for scalable vectors. 2. When there is a conscious decision to use `FixedVectorType`. These uses of `FixedVectorType` will likely be removed in favour of `VectorType` once the vectorizer is generic enough to deal with both fixed vector types and scalable vector types. 3. When dealing with building constants out of the value of VF, for example when computing the vectorization `step`, or building vectors of indices. These operation _make sense_ for scalable vectors too, but changing the code in these places to be generic and make it work for scalable vectors is to be submitted in a separate patch, as it is a functional change. 4. When building the potential VFs in VPlan. Making the VPlan generic enough to handle scalable vectorization factors is a functional change that needs a separate patch. See for example `void LoopVectorizationPlanner::buildVPlans(unsigned MinVF, unsigned MaxVF)`. 5. The class `IntrinsicCostAttribute`: this class still uses `unsigned VF` as updating the field to use `ElementCount` woudl require changes that could result in changing the behavior of the compiler. Will be done in a separate patch. 7. When dealing with user input for forcing the vectorization factor. In this case, adding support for scalable vectorization is a functional change that migh require changes at command line. Differential Revision: https://reviews.llvm.org/D85794	2020-08-24 13:39:42 +00:00
David Green	745bf6cf44	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-06 10:10:50 +01:00
Jordan Rupprecht	3c39db0c44	Revert "[LoopVectorizer] Inloop vector reductions" This reverts commit `e9761688e4`. It breaks the build: ``` ~/src/llvm-project/llvm/lib/Analysis/IVDescriptors.cpp:868:10: error: no viable conversion from returned value of type 'SmallVector<[...], 8>' to function return type 'SmallVector<[...], 4>' return ReductionOperations; ```	2020-08-05 10:24:15 -07:00
David Green	e9761688e4	[LoopVectorizer] Inloop vector reductions Arm MVE has multiple instructions such as VMLAVA.s8, which (in this case) can take two 128bit vectors, sign extend the inputs to i32, multiplying them together and sum the result into a 32bit general purpose register. So taking 16 i8's as inputs, they can multiply and accumulate the result into a single i32 without any rounding/truncating along the way. There are also reduction instructions for plain integer add and min/max, and operations that sum into a pair of 32bit registers together treated as a 64bit integer (even though MVE does not have a plain 64bit addition instruction). So giving the vectorizer the ability to use these instructions both enables us to vectorize at higher bitwidths, and to vectorize things we previously could not. In order to do that we need a way to represent that the reduction operation, specified with a llvm.experimental.vector.reduce when vectorizing for Arm, occurs inside the loop not after it like most reductions. This patch attempts to do that, teaching the vectorizer about in-loop reductions. It does this through a vplan recipe representing the reductions that the original chain of reduction operations is replaced by. Cost modelling is currently just done through a prefersInloopReduction TTI hook (which follows in a later patch). Differential Revision: https://reviews.llvm.org/D75069	2020-08-05 18:14:05 +01:00
Florian Hahn	c0cdba727a	[VPlan] Add & use VPValue for VPWidenGEPRecipe operands (NFC). This patch adds VPValue version of the GEP's operands to VPWidenGEPRecipe and uses them during code-generation. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80220	2020-06-26 20:59:17 +01:00
Sjoerd Meijer	e345d547a0	Recommit "[LV] Emit @llvm.get.active.lane.mask for tail-folded loops" Fixed ARM regression test. Please see the original commit message rG47650451738c for details.	2020-06-17 13:12:15 +01:00
Sjoerd Meijer	d4e183f686	Revert "[LV] Emit @llvm.get.active.mask for tail-folded loops" This reverts commit `4765045173` while I investigate the build bot failures.	2020-06-17 10:09:54 +01:00
Sjoerd Meijer	4765045173	[LV] Emit @llvm.get.active.mask for tail-folded loops This emits new IR intrinsic @llvm.get.active.mask for tail-folded vectorised loops if the intrinsic is supported by the backend, which is checked by querying TargetTransform hook emitGetActiveLaneMask. This intrinsic creates a mask representing active and inactive vector lanes, which is used by the masked load/store instructions that are created for tail-folded loops. The semantics of @llvm.get.active.mask are described here in LangRef: https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics This intrinsic is also used to provide a hint to the backend. That is, the second argument of the intrinsic represents the back-edge taken count of the loop. For MVE, for example, we use that to set up tail-predication, which is a new form of predication in MVE for vector loops that implicitely predicates the last vector loop iteration by implicitely setting active/inactive lanes, i.e. the tail loop is predicated. In order to set up a tail-predicated vector loop, we need to know the number of data elements processed by the vector loop, which corresponds the the tripcount of the scalar loop, which we can now reconstruct using @llvm.get.active.mask. Differential Revision: https://reviews.llvm.org/D79100	2020-06-17 09:53:58 +01:00
Florian Hahn	211596c94e	[VPlan] Support extracting lanes for defs managed in VPTransformState. Currently extracting a lane for a VPValue def is not supported, if it is managed directly by VPTransformState (e.g. because it is created by a VPInstruction or an external VPValue def). For now, simply extract the requested lane. In the future, we should also cache the extracted scalar values, similar to LV. Reviewers: Ayal, rengolin, gilr, SjoerdMeijer Reviewed By: SjoerdMeijer Differential Revision: https://reviews.llvm.org/D80787	2020-06-03 12:14:16 +01:00
Florian Hahn	15224408f0	[VPlan] Use VPUser for VPWidenSelectRecipe operands (NFC). VPWidenSelectRecipe already contains a VPUser, but it is not used. This patch updates the code related to VPWidenSelectRecipe to use VPUser for its operands. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80219	2020-05-24 13:58:08 +01:00
Florian Hahn	cff9399f6b	[VPlan] Fix comment for User in VPWidenSelectRecipe (NFC). The comment was referring the arguments of the call, but the recipe widens a select.	2020-05-19 15:31:39 +01:00
Florian Hahn	f828d75b46	[VPlan] Add & use VPValue operands for VPReplicateRecipe (NFC). This patch adds VPValue version of the instruction operands to VPReplicateRecipe and uses them during code-generation. Reviewers: Ayal, gilr, rengolin Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D80114	2020-05-19 15:12:17 +01:00
Florian Hahn	66ad107452	[VPlan] Remove unique_ptr from VPBranchOnRecipeMask (NFC). We can remove a dynamic memory allocation, by checking the number of operands: no operands = all true, 1 operand = mask. Reviewers: Ayal, gilr, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D80110	2020-05-19 15:01:37 +01:00
Florian Hahn	bbdfcf8f69	[VPlan] Remove unused & undefined print method (NFC).	2020-05-03 18:36:20 +01:00
Florian Hahn	e89379856a	Recommit "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)." The crash that caused the original revert has been fixed in `a3c964a278`. I also added a reduced version of the crash reproducer. This reverts the revert commit `2107af9ccf`.	2020-04-29 11:40:39 +01:00
Ayal Zaks	a3c964a278	[LV] Fix recording of BranchTakenCount for FoldTail When folding tail, branch taken count is computed during initial VPlan execution and recorded to be used by the compare computing the loop's mask. This recording should directly set the State, instead of reusing Value2VPValue mapping which serves original Values present prior to vectorization. The branch taken count may be a constant Value, which may be used elsewhere in the loop; trying to employ Value2VPValue for both leads to the issue reported in https://reviews.llvm.org/D76992#inline-721028 Differential Revision: https://reviews.llvm.org/D78847	2020-04-26 20:13:10 +03:00
Mehdi Amini	2107af9ccf	Revert "[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC)." This reverts commit `9245c7ac13`. This is triggering a segfault in XLA downstream, we'll follow-up with a reproducer, it is likely influenced by TTI/TLI settings or other options as a simple `opt -loop-vectorize` invocation on the IR before the crash does not reproduce immediately.	2020-04-24 05:07:32 +00:00
Simon Pilgrim	b108a457e1	[VPlan] Remove unused forward declarations. NFC. Move VPlan.h include from VPlanVerifier.h down to VPlanVerifier.cpp	2020-04-23 12:34:20 +01:00
Florian Hahn	9245c7ac13	[VPlan] Add & use VPValue operands for VPWidenRecipe (NFC). This patch adds VPValue version of the instruction operands to VPWidenRecipe and uses them during code-generation. Similar to D76373 this reduces ingredient def-use usage by ILV as a step towards full VPlan-based def-use relations. Reviewers: rengolin, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D76992	2020-04-23 12:16:46 +01:00
Gil Rapaport	b747d72c19	[LV] Fix PR45525: Incorrect assert in blend recipe Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a mask is a valid case which is already supported by the code. Differential Revision: https://reviews.llvm.org/D78115	2020-04-15 10:39:07 +03:00
Florian Hahn	18138e0252	[VPlan] Introduce VPWidenSelectRecipe (NFC). Widening a selects depends on whether the condition is loop invariant or not. Rather than checking during codegen-time, the information can be recorded at the VPlan construction time. This was suggested as part of D76992, to reduce the reliance on accessing the original underlying IR values. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D77869	2020-04-13 08:35:28 +01:00
Florian Hahn	719846c469	[VPlan] Drop redundant private: at beginning of class defs (NFC). Default visibility for classes is private, so the private: at the top of various class definitions is redundant. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D77810	2020-04-11 13:27:10 +01:00

1 2 3 4 5

208 Commits