llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	e9d97d7d9d	[VPlan] Add mayReadOrWriteMemory & friends. This patch adds initial implementation of mayReadOrWriteMemory, mayReadFromMemory and mayWriteToMemory to VPRecipeBase. Used by D100258.	2021-05-24 13:11:32 +01:00
Florian Hahn	4e8c28b6fb	Recommit "[VectorCombine] Scalarize vector load/extract." This reverts commit `94d54155e2`. This fixes a sanitizer failure by moving scalarizeLoadExtract(I) before foldSingleElementStore(I), which may remove instructions.	2021-05-24 11:35:07 +01:00
Florian Hahn	94d54155e2	Revert "[VectorCombine] Scalarize vector load/extract." This reverts commit `86497785d5`. One of the tests causes an ASAN failure. https://lab.llvm.org/buildbot/#/builders/5/builds/7927/steps/12/logs/stdio	2021-05-24 10:11:00 +01:00
Florian Hahn	86497785d5	[VectorCombine] Scalarize vector load/extract. This patch adds a new combine that tries to scalarize chains of `extractelement (load %ptr), %idx` to `load (gep %ptr, %idx)`. This is profitable when extracting only a few elements out of a large vector. At the moment, `store (extractelement (load %ptr), %idx), %ptr` operations on large vectors result in huge code in the backend. This can easily be triggered by using the matrix extension, e.g. https://clang.godbolt.org/z/qsccPdPf4 This should complement D98240. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D100273	2021-05-24 09:29:08 +01:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can be represented as a result of shuffle of the vectorized element and noconsecutive insertlements too. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Daniil Fukalov	e8e88c3353	[TTI] NFC: Change getRegUsageForType to return InstructionCost. This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D102541	2021-05-21 15:17:23 +03:00
Alexey Bataev	182162b616	[SLP]Try to vectorize tiny trees with shuffled gathers of extractelements. If we gather extract elements and they actually are just shuffles, it might be profitable to vectorize them even if the tree is tiny. Differential Revision: https://reviews.llvm.org/D101460	2021-05-20 08:36:16 -07:00
David Sherwood	7e95a563c8	Remove scalable vector assert from InnerLoopVectorizer::setDebugLocFromInst In InnerLoopVectorizer::setDebugLocFromInst we were previously asserting that the VF is not scalable. This is because we want to use the number of elements to create a duplication factor for the debug profiling data. However, for scalable vectors we only know the minimum number of elements. I've simply removed the assert for now and added a FIXME saying that we assume vscale is always 1. When vscale is not 1 it just means that the profiling data isn't as accurate, but shouldn't cause any functional problems.	2021-05-19 13:33:10 +01:00
Sander de Smalen	4f86aa650c	[LV] Add -scalable-vectorization=<option> flag. This patch adds a new option to the LoopVectorizer to control how scalable vectors can be used. Initially, this suggests three levels to control scalable vectorization, although other more aggressive options can be added in the future. The possible options are: - Disabled: Disables vectorization with scalable vectors. - Enabled: Vectorize loops using scalable vectors or fixed-width vectors, but favors fixed-width vectors when the cost is a tie. - Preferred: Like 'Enabled', but favoring scalable vectors when the cost-model is inconclusive. Reviewed By: paulwalker-arm, vkmr Differential Revision: https://reviews.llvm.org/D101945	2021-05-19 10:40:56 +01:00
Rong Xu	886629a8c9	[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO This patch implements first part of Flow Sensitive SampleFDO (FSAFDO). It has the following changes: (1) disable current discriminator encoding scheme, (2) new hierarchical discriminator for FSAFDO. For this patch, option "-enable-fs-discriminator=true" turns on the new functionality. Option "-enable-fs-discriminator=false" (the default) keeps the current SampleFDO behavior. When the fs-discriminator is enabled, we insert a flag variable, namely, llvm_fs_discriminator, to the object. This symbol will checked by create_llvm_prof tool, and used to generate a profile with FS-AFDO discriminators enabled. If this happens, for an extbinary format profile, create_llvm_prof tool will add a flag to profile summary section. Differential Revision: https://reviews.llvm.org/D102246	2021-05-18 16:23:43 -07:00
Arthur Eubanks	6b9524a05b	[NewPM] Don't mark AA analyses as preserved Currently all AA analyses marked as preserved are stateless, not taking into account their dependent analyses. So there's no need to mark them as preserved, they won't be invalidated unless their analyses are. SCEVAAResults was the one exception to this, it was treated like a typical analysis result. Make it like the others and don't invalidate unless SCEV is invalidated. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D102032	2021-05-18 13:49:03 -07:00
Florian Hahn	cc1a6361d3	[VPlan] Add VPUserID to distinguish between recipes and others. This allows cast/dyn_cast'ing from VPUser to recipes. This is needed because there are VPUsers that are not recipes. Reviewed By: gilr, a.elovikov Differential Revision: https://reviews.llvm.org/D100257	2021-05-18 09:17:28 +01:00
Sander de Smalen	81fdc73e5d	[LV] Return both fixed and scalable Max VF from computeMaxVF. This patch introduces a new class, MaxVFCandidates, that holds the maximum vectorization factors that have been computed for both scalable and fixed-width vectors. This patch is intended to be NFC for fixed-width vectors, although considering a scalable max VF (which is disabled by default) pessimises tail-loop elimination, since it can no longer determine if any chosen VF (less than fixed/scalable MaxVFs) is guaranteed to handle all vector iterations if the trip-count is known. This issue will be addressed in a future patch. Reviewed By: fhahn, david-arm Differential Revision: https://reviews.llvm.org/D98721	2021-05-18 08:03:48 +01:00
Philip Reames	ed9d70781b	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3)" This reverts commit `6d3e3ae8a9`. Still seeing PPC build bot failures, and one arm self host bot failing. I'm officially stumped, and need help from a bot owner to reduce.	2021-05-17 20:53:28 -07:00
Philip Reames	6d3e3ae8a9	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute (try 3) Resubmit after fixing test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll Previous commit message... This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:59:25 -07:00
Philip Reames	d16da7343d	Revert "[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute" This reverts commit `c23ce54b36`. I apparently missed some newly added non-x86 tests.	2021-05-17 16:49:32 -07:00
Philip Reames	c23ce54b36	[LV] Unconditionally branch from middle to scalar preheader if the scalar loop must execute This is a resubmit of 3e5ce4 (which was reverted by `7fe41ac`). The original commit caused a PPC build bot failure we never really got to the bottom of. I can't reproduce the issue, and the bot owner was non-responsive. In the meantime, we stumbled across an issue which seems possibly related, and worked around a latent bug in `80e8025`. My best guess is that the original patch exposed that latent issue at higher frequency, but it really is just a guess. Original commit message follows... If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block. The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and which exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed. This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCIish prep work, but the changes are a bit too involved for me to feel comfortable tagging the review that way. Differential Revision: https://reviews.llvm.org/D94892	2021-05-17 16:33:56 -07:00
Sander de Smalen	f82966d19a	[LoopVectorizationLegality] NFC: Mark some interfaces as 'const' This patch marks blockNeedsPredication, isConsecutivePtr, isMaskRequired and getSymbolicStrides as 'const'.	2021-05-14 11:53:54 +01:00
Anton Afanasyev	207cdd7ed9	[SLP] Fix spill cost computation for insertelement tree node This is follow up for D98714, bugfixing.	2021-05-14 13:14:41 +03:00
Sander de Smalen	459c48e04f	NFCI: Remove VF argument from isScalarWithPredication As discussed in D102437, the VF argument to isScalarWithPredication seems redundant, so this is intended to be a non-functional change. It seems wrong to query the widening decision at this point. Removing the operand and code to get the widening decision causes no unit/regression tests to fail. I've also found no issues running the LLVM test-suite. This subsequently removes the VF argument from isPredicatedInst as well, since it is no longer required.	2021-05-14 10:34:40 +01:00
Florian Hahn	bdada7546e	[VPlan] Adjust assert in splitBlock to allow splitting at end. SplitAt should only be dereferenced in the assert if it does not point to the end of the block. This fixes a crash in the added test case.	2021-05-13 13:36:35 +01:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add new type of tree node for `InsertElementInst` chain forming vector. These instructions could be either removed, or replaced by shuffles during vectorization and we can add this node to cost model, so naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this patch is the first step towards revectorization of partially vectorization (to fix PR42022 completely). After adding inserts to tree the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
Justin Bogner	e7d26aceca	Change the context instruction for computeKnownBits in LoadStoreVectorizer pass This change enables cases for which the index value for the first load/store instruction in a pair could be a function argument. This allows using llvm.assume to provide known bits information in such cases. Patch by Viacheslav Nikolaev. Thanks! Differential Revision: https://reviews.llvm.org/D101680	2021-05-12 15:29:29 -07:00
David Sherwood	b7a11274f9	[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors In InnerLoopVectorizer::widenPHIInstruction there are cases where we have to scalarise a pointer induction variable after vectorisation. For scalable vectors we already deal with the case where the pointer induction variable is uniform, but we currently crash if not uniform. For fixed width vectors we calculate every lane of the scalarised pointer induction variable for a given VF, however this cannot work for scalable vectors. In this case I have added support for caching the whole vector value for each unrolled part so that we can always extract an arbitrary element. Additionally, we still continue to cache the known minimum number of lanes too in order to improve code quality by avoiding an extractelement operation. I have adapted an existing test `pointer_iv_mixed` from the file: Transforms/LoopVectorize/consecutive-ptr-uniforms.ll and added it here for scalable vectors instead: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D101294	2021-05-12 11:02:11 +01:00
Qiu Chaofan	6d2df18163	[VectorComine] Restrict single-element-store index to inbounds constant Vector single element update optimization is landed in `2db4979`. But the scope needs restriction. This patch restricts the index to inbounds and vector must be fixed sized. In future, we may use value tracking to relax constant restrictions. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D102146	2021-05-12 13:18:20 +08:00
Florian Hahn	faebc6bf10	[VPlan] Register recipe for instr if the simplified value is recipe. If the simplified VPValue is a recipe, we need to register it for Instr, in case it needs to be recorded. The way this is handled in general may change soon, following some post-commit comments. This fixes PR50298.	2021-05-11 14:32:34 +01:00
Sanjay Patel	49950cb1f6	[SLP] restrict matching of load combine candidates The test example from https://llvm.org/PR50256 (and reduced here) shows that we can match a load combine candidate even when there are no "or" instructions. We can avoid that by confirming that we do see an "or". This doesn't apply when matching an or-reduction because that match begins from the operands of the reduction. Differential Revision: https://reviews.llvm.org/D102074	2021-05-11 08:46:40 -04:00
Alexey Bataev	30463bc3f1	[SLP]Do not count perfect diamond matches for gathers several times. Need to remove the old code for avoiding double counting of the gather nodes with perfect diamond matches within the tree after we started detecting perfect/shuffled matching in the previous patch D100495. We may skip the cost for such nodes completely. Differential Revision: https://reviews.llvm.org/D102023	2021-05-10 07:08:07 -07:00
Qiu Chaofan	2db4979c0f	[VectorCombine] Simplify to scalar store if only one element updated This patch simplifies load-insertelt-store pattern into getelementptr-store. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D98240	2021-05-08 18:14:51 +08:00
Florian Hahn	75b9997760	[LV] Remove reference of PHI from comment, they are not recorded (NFC). The comment incorrectly states that the PHI is recorded. That's not accurate, only the recipe for the incoming value is recorded. Suggested post-commit for `4ba8720f88`.	2021-05-07 21:34:23 +01:00
Florian Hahn	337d765282	[LV] Assert if trying to sink replicate region into another region (NFC) Currently sinking a replicate region into another replicate region is not supported. Add an assert, to make the problem more obvious, should it occur. Discussed post-commit for `ccebf7a109`.	2021-05-07 21:25:35 +01:00
Florian Hahn	01c26d4e04	[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC). Adjust the name to make it clearer this is the region containing the target recipe, similar to SinkRegion below. Suggested post-commit for `ccebf7a109`.	2021-05-07 21:25:35 +01:00
Caroline Concatto	cf06c8eee3	[LoopVectorize][SVE] Remove assert for scalable vector in InnerLoopVectorizer::fixReduction The function fixReduction used to assert/crash for scalable vector when a vector reduce could be done with a smaller vector. This patch removes this assertion as it is safe to use scalable vector for vector reduce and truncate. Differential Revision: https://reviews.llvm.org/D101260	2021-05-07 09:37:37 +01:00
Simon Pilgrim	338c1b701f	[SLP] Constify the TreeEntry* input into getEntryCost() + setInsertPointAfterBundle(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	2dab059021	[SLP] Constify the TreeEntry* input into dumpTreeCosts(). NFCI.	2021-05-06 16:20:19 +01:00
Simon Pilgrim	1b47489fd0	[SLP] Use empty() instead of size() == 0. NFCI.	2021-05-06 16:20:18 +01:00
David Green	4979c90458	[LV] Account for tripcount when calculation vectorization profitability The loop vectorizer will currently assume a large trip count when calculating which of several vectorization factors are more profitable. That is often not a terrible assumption to make as small trip count loops will usually have been fully unrolled. There are cases however where we will try to vectorize them, and especially when folding the tail by masking can incorrectly choose to vectorize loops that are not beneficial, due to the folded tail rounding the iteration count up for the vectorized loop. The motivating example here has a trip count of 5, so either performs 5 scalar iterations or 2 vector iterations (with VF=4). At a high enough trip count the vectorization becomes profitable, but the rounding up to 2 vector iterations vs only 5 scalar makes it unprofitable. This adds an alternative cost calculation when we know the max trip count and are folding tail by masking, rounding the iteration count up to the correct number for the vector width. We still do not account for anything like setup cost or the mixture of vector and scalar loops, but this is at least an improvement in a few cases that we have had reported. Differential Revision: https://reviews.llvm.org/D101726	2021-05-06 12:36:46 +01:00
Kerry McLaughlin	8c9742bd23	[SVE][LoopVectorize] Add support for scalable vectorization of first-order recurrences Adds support for scalable vectorization of loops containing first-order recurrences, e.g: ``` for(int i = 0; i < n; i++) b[i] = a[i] + a[i - 1] ``` This patch changes fixFirstOrderRecurrence for scalable vectors to take vscale into account when inserting into and extracting from the last lane of a vector. CreateVectorSplice has been added to construct a vector for the recurrence, which returns a splice intrinsic for scalable types. For fixed-width the behaviour remains unchanged as CreateVectorSplice will return a shufflevector instead. The tests included here are the same as test/Transform/LoopVectorize/first-order-recurrence.ll Reviewed By: david-arm, fhahn Differential Revision: https://reviews.llvm.org/D101076	2021-05-06 11:35:39 +01:00
Philip Reames	80e8025083	[LV] Workaround PR49900 (a crash due to analyzing partially mutated IR) LoopVectorize has a fairly deeply baked in design problem where it will try to query analysis (primarily SCEV, but also ValueTracking) in the midst of mutating IR. In particular, the intermediate IR state does not represent the semantics of the original (or final) program. Fixing this for real is hard, but all of the cases seen so far share a common symptom. In cases seen to date, the analysis being queried is the computation of the original loop's trip count. We can fix this particular instance of the issue by simply computing the trip count early, and caching it. I want to be really clear that this is nothing but a workaround. It does nothing to fix the root issue, and at best, delays the time until we have to fix this for real. Florian and I have discussed an eventual solution in the review comments for https://reviews.llvm.org/D100663, but it's a lot of work. Test taken from https://reviews.llvm.org/D100663. Differential Revision: https://reviews.llvm.org/D101487	2021-05-05 09:56:28 -07:00
Florian Hahn	ccebf7a109	[VPlan] Properly handle sinking of replicate regions. This patch updates the code that sinks recipes required for first-order recurrences to properly handle replicate-regions. At the moment, the code would just move the replicate recipe out of its replicate-region, producing an invalid VPlan. When sinking a recipe in a replicate-region, we have to sink the whole region. To do that, we first need to split the block at the target recipe and move the region in between. This patch also adds a splitAt helper to VPBasicBlock to split a VPBasicBlock at a given iterator. Fixes PR50009. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100751	2021-05-04 22:36:01 +01:00
Florian Hahn	4ba8720f88	[VPlan] Representing backedge def-use feeding reduction phis. This patch updates the code handling reduction recipes to also keep track of the incoming value from the latch in the recipe. This is needed to model the def-use chains completely in VPlan, so that it is possible to replace the incoming value with an arbitrary VPValue. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99294	2021-05-04 16:33:22 +01:00
Sander de Smalen	9931ae645e	Reland "[LV] Calculate max feasible scalable VF." Relands https://reviews.llvm.org/D98509 This reverts commit `51d648c119`.	2021-05-04 15:44:41 +01:00
Alexey Bataev	369cd2ae52	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `fd18547e07`. Need to add a check for the size of the vectorization tree to avoid some extra vectorization.	2021-05-04 04:53:22 -07:00
Alexey Bataev	fd18547e07	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 08:06:20 -07:00
Alexey Bataev	2e4cc9a725	Revert "[SLP]Allow masked gathers only if allowed by target." This reverts commit `b5f64768cf` to fix a compiler crash revealed by buildbots.	2021-05-03 07:20:00 -07:00
Alexey Bataev	b5f64768cf	[SLP]Allow masked gathers only if allowed by target. Need to check if target allows/supports masked gathers before trying to estimate its cost, otherwise we may fail to vectorize some of the patterns because of too pessimistic cost model. Part of D57059. Differential Revision: https://reviews.llvm.org/D101297	2021-05-03 06:45:42 -07:00
Florian Hahn	2b7fa7f744	[LV] Iterate over recipes in VPlan to fix PHI (NFC). As we gradually move more elements of LV to VPlan, we are trying to reduce the number of places that still has to check IR of the original loop. This patch adjusts the code to fix cross iteration phis to get the PHIs to fix directly from the VPlan that is executed. We still need the original PHI to check for first-order recurrences, but we can get rid of that once we model that explicitly in VPlan as well. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D99293	2021-05-03 14:09:46 +01:00
Florian Hahn	942e068d7a	[VPlan] Add VPBasicBlock::phis() helper (NFC). This patch introduces a helper to obtain an iterator range for the PHI-like recipes in a block. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100101	2021-05-02 19:20:13 +01:00
Justin Bogner	9542721085	Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass Patch by Viacheslav Nikolaev. Thanks!	2021-04-30 13:39:46 -07:00
Alexey Bataev	a3fd82c289	[SLP]Fix the crash on cost calculation if non-compatible vectors shuffled. If the extracts from the non-power-2 vectors are recognized as shuffles, need some extra checks to not crash cost calculations if trying to gext the ecost for subvector extracts. In this case need to check carefully that we do not exit out of bounds of the original vector, otherwise the TTI's cost model will crash on assert. Differential Revision: https://reviews.llvm.org/D101477	2021-04-30 09:34:20 -07:00
Alexey Bataev	12c51f2358	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 12:48:00 -07:00
Alexey Bataev	6e859f3cd4	Revert "[COST] Improve shuffle kind detection if shuffle mask is provided." This reverts commit `9239932221` to fix a compiler crash on mask checks.	2021-04-29 12:40:33 -07:00
Alexey Bataev	9239932221	[COST] Improve shuffle kind detection if shuffle mask is provided. Added an extra analysis for better choosing of shuffle kind in getShuffleCost functions for better cost estimation if mask was provided. Differential Revision: https://reviews.llvm.org/D100865	2021-04-29 09:42:56 -07:00
Sander de Smalen	51d648c119	Revert "[LV] Calculate max feasible scalable VF." Temporarily reverting this patch due to some unexpected issue found by one of the PPC buildbots. This reverts commit `584e9b6e4b`.	2021-04-29 16:04:37 +01:00
Florian Hahn	a0e1313c23	[VPlan] Add getVPSingleValue helper. As suggested in D99294, this adds a getVPSingleValue helper to use for recipes that are guaranteed to define a single value. This replaces uses of getVPValue() which used to default to I = 0.	2021-04-29 13:37:38 +01:00
Bardia Mahjour	ddb3b26a12	[LV] Consider Loop Unroll Hints When Making Interleave Decisions This patch causes the loop vectorizer to not interleave loops that have nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)). Note that if a particular interleave count is being requested (through llvm.loop.interleave_count), it will still be honoured, regardless of the presence of nounroll hints. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D101374	2021-04-28 17:27:52 -04:00
David Sherwood	00e65f3345	[LoopVectorize][SVE] Fix crash when vectorising FP negation This patch fixes a crash encountered when vectorising the following loop: void foo(float dst, float src, long long n) { for (long long i = 0; i < n; i++) dst[i] = -src[i]; } using scalable vectors. I've added a test to Transforms/LoopVectorize/AArch64/sve-basic-vec.ll as well as cleaned up the other tests in the same file. Differential Revision: https://reviews.llvm.org/D98054	2021-04-28 15:22:35 +01:00
Tres Popp	f0e848e63d	Silence unused variable warning	2021-04-28 15:46:09 +02:00
Alexey Bataev	8af4723c58	[SLP]Try to vectorize tiny trees with shuffled gathers. If the first tree element is vectorize and the second is gather, it still might be profitable to vectorize it if the gather node contains less scalars to vectorize than the original tree node. It might be profitable to use shuffles. Differential Revision: https://reviews.llvm.org/D101397	2021-04-28 06:35:31 -07:00
David Sherwood	6998f8ae2d	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-28 13:41:07 +01:00
Sander de Smalen	584e9b6e4b	[LV] Calculate max feasible scalable VF. This patch also refactors the way the feasible max VF is calculated, although this is NFC for fixed-width vectors. After this change scalable VF hints are no longer truncated/clamped to a shorter scalable VF, nor does it drop the 'scalable flag' from the suggested VF to vectorize with a similar VF that is fixed. Instead, the hint is ignored which means the vectorizer is free to find a more suitable VF, using the CostModel to determine the best possible VF. Reviewed By: c-rhodes, fhahn Differential Revision: https://reviews.llvm.org/D98509	2021-04-28 12:30:00 +01:00
Kerry McLaughlin	9cc217ab36	[LoopVectorize] Prevent multiple Phis being generated with in-order reductions When using the -enable-strict-reductions flag where UF>1 we generate multiple Phi nodes, though only one of these is used as an input to the vector.reduce.fadd intrinsics. The unused Phi nodes are removed later by instcombine. This patch changes widenPHIInstruction/fixReduction to only generate one Phi, and adds an additional test for unrolling to strict-fadd.ll Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D100570	2021-04-28 11:29:01 +01:00
David Sherwood	6968520c3b	Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost" This reverts commit `4afeda9157`.	2021-04-27 15:46:03 +01:00
David Sherwood	4afeda9157	[LoopVectorize] Simplify scalar cost calculation in getInstructionCost This patch simplifies the calculation of certain costs in getInstructionCost when isScalarAfterVectorization() returns a true value. There are a few places where we multiply a cost by a number N, i.e. unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1; return N * TTI.getArithmeticInstrCost(... After some investigation it seems that there are only these cases that occur in practice: 1. VF is a scalar, in which case N = 1. 2. VF is a vector. We can only get here if: a) the instruction is a GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction variable that remains scalar. I have changed the code so that N is assumed to always be 1. For GEPs the cost is always 0, since this is calculated later on as part of the load/store cost. PHI nodes are costed separately and were never previously multiplied by VF. For all other cases I have added an assert that none of the users needs scalarising, which didn't fire in any unit tests. Only one test required fixing and I believe the original cost for the scalar add instruction to have been wrong, since only one copy remains after vectorisation. I have also added a new test for the case when a pointer PHI feeds directly into a store that will be scalarised as we were previously never testing it. Differential Revision: https://reviews.llvm.org/D99718	2021-04-27 15:26:15 +01:00
Alexey Bataev	24590d8d67	[SLP]Improved isGatherShuffledEntry, NFC. Reworked isGatherShuffledEntry function, simplified and moved common code to the lambda (it shall go away when non-power-2 patch will be landed).	2021-04-27 05:59:46 -07:00
Florian Hahn	cb96d802d4	[LV] Hoist code to get vector loop latch (NFC). Address suggestion from D99294.	2021-04-27 13:30:17 +01:00
Florian Hahn	160e729cf0	[VPlan] Use recursive traversal iterator in VPSlotTracker. This patch simplifies VPSlotTracker by using the recursive traversal iterator to traverse all blocks in a VPlan in reverse post-order when numbering VPValues in a plan. This depends on a fix to RPOT (D100169). It also extends the traversal unit tests to check RPOT. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100176	2021-04-27 12:39:06 +01:00
Florian Hahn	7302fe4328	[VPlan] Make blocksOnly work properly with ranges over const pointers. When iterating over const blocks, the base type in the lambdas needs to use const VPBlockBase *, otherwise it cannot be used with input iterators over const VPBlockBase. Also adjust the type of the input iterator range to const &, as it does not take ownership of the input range.	2021-04-26 10:52:35 +01:00
Florian Hahn	4b9be5ac08	[VPlan] Add VPBlockUtils::blocksOnly helper. This patch adds a blocksOnly helpers which take an iterator range over VPBlockBase * or const VPBlockBase * and returns an interator range that only include BlockTy blocks. The accesses are casted to BlockTy. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D101093	2021-04-25 17:38:09 +01:00
Florian Hahn	89c4dda076	[VPlan] Add GraphTraits impl to traverse through VPRegionBlock. This patch adds a new iterator to traverse through VPRegionBlocks and a GraphTraits specialization using the iterator to traverse through VPRegionBlocks. Because there is already a GraphTraits specialization for VPBlockBase * and co, a new VPBlockRecursiveTraversalWrapper helper is introduced. This allows us to provide a new GraphTraits specialization for that type. Users can use the new recursive traversal by using this wrapper. The graph trait visits both the entry block of a region, as well as all its successors. Exit blocks of a region implicitly have their parent region's successors. This ensures all blocks in a region are visited before any blocks in a successor region when doing a reverse post-order traversal of the graph. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100175	2021-04-23 17:26:47 +01:00
Alexey Bataev	18c61fc498	[SLP]Skip undefs trying to find perfect/shuffled tree entries matching. We can skip check for undefs trying to find perfect/shuffled tree entries matching, they can be ignored completely improving the final cost/vectorization results. Differential Revision: https://reviews.llvm.org/D101061	2021-04-22 08:59:07 -07:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: 1 void foo(int restrict data1, int restrict data2) 2 { 3 int counter = 1024; 4 while (counter--) 5 if (data1[counter] > data2[counter]) 6 data1[counter] = data2[counter]; 7 } ... could previously be transformed in such a way that the predicated store implied by: if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Alexey Bataev	d4f5f23bbb	[SLP]Replace more `TTI` with `TTIRef`, NFC. To pacify MSVC buildbots.	2021-04-22 07:53:20 -07:00
Alexey Bataev	da2cdfd421	[SLP]Added explicit ref to TargetTransformInfo to try to pacify MSVC buildbots, NFC.	2021-04-22 07:49:48 -07:00
Alexey Bataev	e99b98cb1b	[SLP]Improve cost model for the vectorized extractelements. 1. No need to call `areAllUsersVectorized` as later the cost is calculated only if the instruction has one use and gets vectorized. 2. Need to calculate the cost of the dead extractelement more precisely, taking the vector type of the vector operand, not the resulting vector type. Part of D57059. Differential Revision: https://reviews.llvm.org/D99980	2021-04-22 07:40:17 -07:00
David Sherwood	5a229a6702	[LoopVectorize] Don't create unnecessary vscale intrinsic calls In quite a few cases in LoopVectorize.cpp we call createStepForVF with a step value of 0, which leads to unnecessary generation of llvm.vscale intrinsic calls. I've optimised IRBuilder::CreateVScale and createStepForVF to return 0 when attempting to multiply vscale by 0. Differential Revision: https://reviews.llvm.org/D100763	2021-04-22 09:01:52 +01:00
Alexey Bataev	af870e11ae	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 09:08:46 -07:00
Alexey Bataev	b82344a019	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `daf6e18c55` to fix the compiler crash.	2021-04-20 08:29:32 -07:00
Alexey Bataev	daf6e18c55	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 07:46:49 -07:00
Alexey Bataev	cf00cb8bed	Revert "[SLP] Add detection of shuffled/perfect matching of tree entries." This reverts commit `b232771aca` to fix buildbots.	2021-04-20 07:16:11 -07:00
Alexey Bataev	b232771aca	[SLP] Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100495	2021-04-20 06:55:55 -07:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the some cost, is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Alexey Bataev	8030481065	Revert "[SLP]Add detection of shuffled/perfect matching of tree entries." This reverts commit `d6fde91379` to fix compiler crashes.	2021-04-19 14:10:04 -07:00
Alexey Bataev	d6fde91379	[SLP]Add detection of shuffled/perfect matching of tree entries. SLP supports perfect diamond matching for the vectorized tree entries but do not support it for gathered entries and does not support non-perfect (shuffled) matching with 1 or 2 tree entries. Patch adds support for this matching to improve cost of the vectorized tree. Differential Revision: https://reviews.llvm.org/D100495	2021-04-19 13:29:30 -07:00
Cullen Rhodes	f0bc2782f2	[TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D100377	2021-04-19 11:01:34 +00:00
Florian Hahn	49999d4364	[VPlan] Replace a few unnecessary includes with forward decls.	2021-04-15 20:08:31 +01:00
Florian Hahn	6adebe3fd2	[VPlan] Add VPRecipeBase::mayHaveSideEffects. Add an initial version of a helper to determine whether a recipe may have side-effects. Reviewed By: a.elovikov Differential Revision: https://reviews.llvm.org/D100259	2021-04-15 11:49:40 +01:00
David Sherwood	ea14df695e	[SVE][LoopVectorize] Fix crash in InnerLoopVectorizer::widenPHIInstruction There were a few places in widenPHIInstruction where calculations of offsets were failing to take the runtime calculation of VF into account for scalable vectors. I've fixed those cases in this patch as well as adding an assert that we should not be scalarising for scalable vectors. Tests are added here: Transforms/LoopVectorize/AArch64/sve-widen-phi.ll Differential Revision: https://reviews.llvm.org/D99254	2021-04-15 10:51:49 +01:00
David Sherwood	7120f89f7d	[NFC][LoopVectorize] Remove unnecessary VF.isScalable asserts There are a few places in LoopVectorize.cpp where we have been too cautious in adding VF.isScalable() asserts and it can be confusing. It also makes it more difficult to see the genuine places where work needs doing to improve scalable vectorization support. This patch changes getMemInstScalarizationCost to return an invalid cost instead of firing an assert for scalable vectors. Also, vectorizeInterleaveGroup had multiple asserts all for the same thing. I have removed all but one assert near the start of the function, and added a new assert that we aren't dealing with masks for scalable vectors. Differential Revision: https://reviews.llvm.org/D99727	2021-04-15 09:41:03 +01:00
Simon Pilgrim	b49c41afba	[SLP] createOp - fix null dereference warning. NFCI. Only attempt to propagateIRFlags if we have both SelectInst - afaict we shouldn't have matched a min/max reduction without both SelectInst, but static analyzer doesn't know that.	2021-04-14 15:24:41 +01:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
Sander de Smalen	92d8421f49	[TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100199	2021-04-13 14:20:58 +01:00
dfukalov	d066079728	[NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset. Main reason is preparation to transform AliasResult to class that contains offset for PartialAlias case. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D98027	2021-04-09 12:54:22 +03:00
Alexey Bataev	ab124bbe2a	[SLP]Fix PR49898: Infinite loop in SLP vectorizer. We should not re-try attempt of finding of the consecutive store chain if it was tried before. Differential Revision: https://reviews.llvm.org/D100131	2021-04-08 14:18:06 -07:00
Florian Hahn	e4de3cdf3d	[LV] Pass VPWidenPHIRecipe to widenPHIInstruction (NFC). Instead of passing the start value and the defined value to widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be used to get both (and more in future patches).	2021-04-08 14:25:10 +01:00
David Green	8675ef100f	[LV] Logical and/or select costs D99674 stopped the folding of certain select operations into and/or, due to incorrect folding in the presence of poison. D97360 added some costs to attempt to account for the change, but only worked at the getUserCost level, not the getCmpSelInstrCost that the vectorizer will use directly. This adds similar logic into the vectorizer to handle these logical and/or selects, treating them like and/or directly. This fixes 60% performance regressions from code like the attached test case. Differential Revision: https://reviews.llvm.org/D99884	2021-04-08 10:39:47 +01:00
Alexey Bataev	a78e86e6be	[SLP]Avoid multiple attempts to vectorize CmpInsts. No need to lookup through and/or try to vectorize operands of the CmpInst instructions during attempts to find/vectorize min/max reductions. Compiler implements postanalysis of the CmpInsts so we can skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save compile time. Differential Revision: https://reviews.llvm.org/D99950	2021-04-07 06:15:42 -07:00
Philip Reames	a6d2a8d6f5	Add a subclass of IntrinsicInst for llvm.assume [nfc] Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.	2021-04-06 11:16:22 -07:00
Kerry McLaughlin	7344f3d39a	[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to reorder FP operations. However, it may still be beneficial to vectorize the loop by moving the reduction inside the vectorized loop and making sure that the scalar reduction value be an input to the horizontal reduction, e.g: %phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ] %load = load <8 x float> %reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load) This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions. For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D98435	2021-04-06 14:45:34 +01:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00

1 2 3 4 5 ...

2575 Commits