If the condition of a select is a compare, pass its predicate to
TTI::getCmpSelInstrCost to get a more accurate cost value instead
of passing BAD_ICMP_PREDICATE.
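A minimal sketch of the change in the cost query, assuming hypothetical
locals SI (the select), VectorTy/CondTy (the vectorized types) and
CostKind:
// Sketch only, not the exact LV code: forward the compare's predicate
// instead of BAD_ICMP_PREDICATE so the target can price the select
// more accurately.
CmpInst::Predicate Pred = CmpInst::BAD_ICMP_PREDICATE;
if (auto *Cmp = dyn_cast<CmpInst>(SI->getCondition()))
  Pred = Cmp->getPredicate();
InstructionCost Cost = TTI.getCmpSelInstrCost(
    Instruction::Select, VectorTy, CondTy, Pred, CostKind);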
I noticed that the commit message from D90070 had a comment about the
vectorized select predicate possibly being composed of other compares with
different predicate values, but I wasn't able to construct an example
where this was an actual issue. If this is an issue, I guess we could
add another check that the block isn't predicated for any reason.
Reviewed By: dmgreen, fhahn
Differential Revision: https://reviews.llvm.org/D114646
VPWidenIntOrFpInductionRecipes can either be constructed with a PHI and
an optional cast or a PHI and a trunc instruction. Reflect this in 2
separate constructors. This also simplifies a follow-up change.
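A sketch of the two constructor shapes; the parameter lists here are
simplified, not the exact signatures:
// PHI plus an optional cast instruction.
VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start,
                              Instruction *Cast = nullptr);
// PHI plus a trunc instruction.
VPWidenIntOrFpInductionRecipe(PHINode *IV, VPValue *Start,
                              TruncInst *Trunc);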
If an extractelement instruction is used multiple times in different
tree entries (either vectorized or gathered), we need to compensate for
the scalar cost of such instructions. They are completely removed if
all their users are part of the tree, but the cost needs to be
compensated only once for each instruction.
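A minimal sketch of the bookkeeping, with hypothetical locals VL (the
scalars of the entry) and Cost, assuming constant extract indices; the
set guarantees the compensation happens once per extract:
SmallPtrSet<Value *, 8> CheckedExtracts;
for (Value *V : VL)
  if (auto *EE = dyn_cast<ExtractElementInst>(V))
    // Compensate only the first time we see this extract, even if it
    // is referenced from several tree entries.
    if (CheckedExtracts.insert(EE).second)
      Cost -= TTI.getVectorInstrCost(
          Instruction::ExtractElement, EE->getVectorOperandType(),
          cast<ConstantInt>(EE->getIndexOperand())->getZExtValue());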
Differential Revision: https://reviews.llvm.org/D114958
The code for finding common vectors in insertelement instructions needs
to be outlined into a separate function for future patches. This change
also improves the process by adding some extra checks for early exit
and fixes a bug where a match was always found because of an erroneous
comparison of the same values.
Differential Revision: https://reviews.llvm.org/D114909
If several shuffle instructions are emitted, some of them might be the
same as, or compatible with (less defined than), previously emitted
ones. Such shuffles can be removed safely, improving the total cost of
the vectorized code.
Differential Revision: https://reviews.llvm.org/D114087
Improved the cost calculation for shuffled extracts, where possible. We
need to count the cost of the extracted scalars if some of their users
are not insertelements, and the total estimate of the shuffled scalars
used in insertelement build vectors has also been improved.
Differential Revision: https://reviews.llvm.org/D113782
An undefined vector is not necessarily an UndefValue; it can also be
a constant vector with undef or poison elements, so we need to check
for this kind of undef too.
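A sketch of such a check (a hypothetical helper, not the exact one
added by the patch):
static bool isUndefVectorValue(const Value *V) {
  if (isa<UndefValue>(V)) // Covers PoisonValue as well.
    return true;
  auto *C = dyn_cast<Constant>(V);
  auto *VecTy = C ? dyn_cast<FixedVectorType>(C->getType()) : nullptr;
  if (!VecTy)
    return false;
  // A constant vector counts as undefined if every element is undef
  // or poison.
  for (unsigned I = 0, E = VecTy->getNumElements(); I != E; ++I) {
    Constant *Elem = C->getAggregateElement(I);
    if (!Elem || !isa<UndefValue>(Elem))
      return false;
  }
  return true;
}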
Differential Revision: https://reviews.llvm.org/D114873
The code in widenMemoryInstruction has already been transitioned
to only rely on information provided by VPWidenMemoryInstructionRecipe
directly.
Moving the code directly to VPWidenMemoryInstructionRecipe::execute
completes the transition for the recipe.
It provides the following advantages:
1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.
Point (2) in particular ensures that no dependencies on fields in ILV
for vector code generation are re-introduced.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D114324
Extended support for undefined source vectors, undefined extract
indices, and non-fixed vector types; there is also no need to check the
parent of extractelement instructions with constant indices.
Differential Revision: https://reviews.llvm.org/D114121
The code in widenSelectInstruction has already been transitioned
to only rely on information provided by VPWidenSelectRecipe directly.
Moving the code directly to VPWidenSelectRecipe::execute completes
the transition for the recipe.
It provides the following advantages:
1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.
Point (2) in particular ensures that no dependencies on fields in ILV
for vector code generation are re-introduced.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D114323
We ask `TTI.getAddressComputationCost()` about the cost of computing a vector address,
and then multiply it by the vector width. This doesn't make any sense:
it implies that we'd do a vector GEP and then scalarize the vector of pointers,
but there is no such thing in the vectorized IR; we perform scalar GEPs.
This is *especially* bad on X86, and was effectively prohibiting any scalarized
vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()`
says that cost of vector address computation is `10` as compared to `1` for scalar.
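A sketch of the corrected query, with hypothetical locals PtrTy and VF:
price one scalar address computation and scale it by the number of
lanes, rather than scaling the vector address cost.
InstructionCost AddrCost =
    TTI.getAddressComputationCost(PtrTy->getScalarType()) *
    VF.getKnownMinValue();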
The computed costs are similar to the ones with D111222+D111220,
but we end up without masked memory intrinsics that we'd then have to
expand later on, without much luck. (D111363)
Differential Revision: https://reviews.llvm.org/D111460
The code in widenInstruction has already been transitioned to
only rely on information provided by VPWidenRecipe directly.
Moving the code directly to VPWidenRecipe::execute completes
the transition for the recipe.
It provides the following advantages:
1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.
Point (2) in particular ensures that no dependencies on fields in ILV
for vector code generation are re-introduced.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D114322
The code in widenGEP has already been transitioned to only rely on
information provided by VPWidenGEPRecipe directly.
Moving the code directly to VPWidenGEPRecipe::execute completes
the transition for the recipe.
It provides the following advantages:
1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.
Point (2) in particular ensures that no dependencies on fields in ILV
for GEP code generation are re-introduced.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D114321
collectLoopScalars should only add non-uniform nodes to the list if they
are used by a load/store instruction that is marked as CM_Scalarize.
Before this patch, the LV incorrectly marked pointer induction variables
as 'scalar' when they needed to be widened by something else,
such as a compare instruction, and weren't used by a node marked as
'CM_Scalarize'. This case is covered by sve-widen-phi.ll.
This change also allows removing some code where the LV tried to
widen PHI nodes with a stepvector even though they were marked as
'scalarAfterVectorization'. Now that the code is more careful about
marking instructions that need widening as 'scalar', that code has
become redundant.
Differential Revision: https://reviews.llvm.org/D114373
The compiler has an analysis for perfect diamond matching, but it does
not support nodes with main/alternate opcodes. The problem is that the
scalars themselves are different and might not match other nodes
directly, but their operands and main/alternate opcodes might match,
and the compiler might reuse some previously emitted vector
instructions. This analysis needs to be included in the cost model and
in the actual vector instruction emission process.
Differential Revision: https://reviews.llvm.org/D114101
In VPRecipeBuilder::handleReplication, if we believe the instruction
is predicated, we proceed to create new VP region blocks even
when the load is uniform and only predicated due to tail-folding.
I have updated isPredicatedInst to avoid treating a uniform load as
predicated when tail-folding, which means we can do a single scalar
load and a vector splat of the value.
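A sketch of the pattern we can now emit, using hypothetical IRBuilder
locals:
// One scalar load of the uniform address, then a splat across the
// vector; valid because at least one lane is known to execute.
Value *Scalar = Builder.CreateLoad(ScalarTy, Ptr, "uniform.load");
Value *Splat = Builder.CreateVectorSplat(VF, Scalar);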
Tests added here:
Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll
Differential Revision: https://reviews.llvm.org/D112552
This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.
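A sketch of the shape of the computation, with hypothetical locals and
era-appropriate signatures, not the exact cost-model code:
// Price the fmul as a plain vector multiply...
InstructionCost FMulCost =
    TTI.getArithmeticInstrCost(Instruction::FMul, VectorTy, CostKind);
// ...and add the cost of a strict (non-reassociated) fadd reduction
// for the ordered accumulation.
InstructionCost RedCost = TTI.getArithmeticReductionCost(
    Instruction::FAdd, VectorTy, /*FMF=*/llvm::None, CostKind);
InstructionCost FMulAddCost = FMulCost + RedCost;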
Differential Revision: https://reviews.llvm.org/D111630
In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes: a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math
flags, these should be propagated through to the fmul instruction, so a
setFastMathFlags interface has been added to the VPInstruction class to
enable this.
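A sketch of the propagation at recipe-creation time, with hypothetical
operand names:
// The fmul half of the fmuladd becomes a VPInstruction; copy the
// call's fast-math flags onto it via the new interface.
auto *FMul = new VPInstruction(Instruction::FMul, {OpA, OpB});
FMul->setFastMathFlags(FMulAddCall->getFastMathFlags());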
Differential Revision: https://reviews.llvm.org/D113125
This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) on instructions that contribute to the address computation of widened loads/stores that are
guarded by a condition. When the code is vectorized and the control flow within the loop
is linearized, these flags may lead to generating a poison value that is effectively used as the base address
of the widened load/store. The fix drops all the integer poison-generating flags from instructions that
contribute to the address computation of a widened load/store whose original instruction was in a basic block
that needed predication and is not predicated after vectorization.
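A minimal sketch of the fix, where AddrDefs and the two predicates are
hypothetical stand-ins for the analysis the patch performs:
for (Instruction *I : AddrDefs) // Feeds a widened load/store address.
  if (OrigBlockNeedsPredication(I) && !IsPredicatedAfterVectorization(I))
    I->dropPoisonGeneratingFlags(); // Clears nuw/nsw/exact/inbounds.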
Reviewed By: fhahn, spatel, nlopes
Differential Revision: https://reviews.llvm.org/D111846
A first step towards modeling preheader and exit blocks in VPlan as well.
Keeping the vector loop in a region allows for changing the VF as we
traverse region boundaries.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D113182
When asking how many parts are required for a scalable vector type
there are occasions when it cannot be computed. For example, <vscale x 1 x i3>
is one such vector for AArch64+SVE because at the moment no matter how we
promote the i3 type we never end up with a legal vector. This means
that getTypeConversion returns TypeScalarizeScalableVector as the
LegalizeKind, and then getTypeLegalizationCost returns an invalid cost.
This then causes BasicTTIImpl::getNumberOfParts to dereference an invalid
cost, which triggers an assert. This patch changes getNumberOfParts to
return 0 for such cases, since the definition of getNumberOfParts in
TargetTransformInfo.h states that we can use a return value of 0 to represent
an unknown answer.
Currently, LoopVectorize.cpp is the only place where we need to check for
0 as a return value, because all other instances will not currently
ask for the number of parts for <vscale x 1 x iX> types.
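The caller-side check is a one-liner; a sketch, assuming a hypothetical
VectorTy:
unsigned NumParts = TTI.getNumberOfParts(VectorTy);
if (NumParts == 0) // 0 now means "cannot be computed".
  return InstructionCost::getInvalid();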
In addition, I have changed the target-independent interface for
getNumberOfParts to return 1 and assume there is a single register
that can fit the type. The loop vectoriser has lots of tests that are
target-independent and they relied upon the 0 value to mean the
answer is known and that we are not scalarising the vector.
I have added tests here that show we correctly return an invalid cost
for VF=vscale x 1 when the loop contains unusual types such as i7:
Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
Differential Revision: https://reviews.llvm.org/D113772
No need to count the final shuffle cost for constants; gathering
constants is just a constant vector plus extra inserts, if required.
Differential Revision: https://reviews.llvm.org/D113770
Need to adjust the types of GEP indices when building the tree
entries/operands. Otherwise some of the nodes might differ, and the
vectorizer is unable to correctly find them and count their cost.
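A sketch of the adjustment, with hypothetical locals DL and GEP: cast
constant indices to the pointer's canonical index type so
otherwise-identical nodes compare equal.
Type *IdxTy = DL.getIndexType(GEP->getType());
Value *Idx = GEP->getOperand(1);
if (auto *CI = dyn_cast<ConstantInt>(Idx))
  Idx = ConstantExpr::getIntegerCast(CI, IdxTy, /*IsSigned=*/false);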
Differential Revision: https://reviews.llvm.org/D113792
A group of scalars can be treated as a splat not only if all of its
elements are the same, but also if some of them are undef values.
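A sketch of the relaxed check (a hypothetical helper):
static bool isSplatAllowingUndef(ArrayRef<Value *> VL) {
  Value *First = nullptr;
  for (Value *V : VL) {
    if (isa<UndefValue>(V))
      continue; // Undef lanes do not break a splat.
    if (!First)
      First = V;
    else if (V != First)
      return false;
  }
  return true;
}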
Differential Revision: https://reviews.llvm.org/D113774
If a vector intrinsic has a scalar argument, we currently still create
a tree entry for this argument. This entry is not used; it just
consumes resources and increases the cost of the tree.
Differential Revision: https://reviews.llvm.org/D113806
The interface is a convenience function to ask if a block requires
predication when widening, but it's important that there are two
separate concepts to consider:
(A) The block was predicated in the original loop.
(B) The block was unpredicated in the original loop, but requires
predication because of tail folding.
In the case of (B) we know that at least one lane of the vector will
be executed, which means we can implement a load from a uniform address
with a scalar load + splat (D112552). In the case of (A), we cannot do
this, because the scalar load itself requires predication.
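A sketch of the two conditions, with hypothetical accessors mirroring
those the vectorizer has:
bool PredicatedInOrigLoop = Legal->blockNeedsPredication(BB); // (A)
bool PredicatedByTailFoldingOnly =
    CM.foldTailByMasking() && !PredicatedInOrigLoop;          // (B)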
The name 'blockNeedsPredication' does not make the distinction between
(A) and (B), hence the reason to rename it.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D113392
Need to fix the cost estimation for split loads: since we already look
at the subregisters, there is no need to permute them; we just need to
estimate the subregister insert, if it is smaller than the real
register. Also, with split loads it might already be profitable to
vectorize smaller trees with gathering of the loads.
Differential Revision: https://reviews.llvm.org/D107188
Unfortunately, sinking recipes for first-order recurrences relies on
the original position of recipes. So if a recipe needs to be sunk after
an optimized induction, it needs to stay in its original position until
sinking is done. This is causing PR52460.
To fix the crash, keep the recipes in the original position until
sink-after is done.
Post-commit follow-up to c45045bfd0 to address PR52460.
Changes VPReplicateRecipe to extract the last lane from an unconditional,
uniform store instruction. collectLoopUniforms will also add stores to
the list of uniform instructions where Legal->isUniformMemOp is true.
setCostBasedWideningDecision now sets the widening decision for
all uniform memory ops to Scalarize, where previously GatherScatter
may have been chosen for scalable stores.
This fixes an assert ("Cannot yet scalarize uniform stores") in
setCostBasedWideningDecision when we have a loop containing a
uniform i1 store and a scalable VF, which we cannot create a scatter for.
Reviewed By: sdesmalen, david-arm, fhahn
Differential Revision: https://reviews.llvm.org/D112725
This patch adds a function to verify general properties of VPlans. The
first check makes sure that all phi-like recipes are at the beginning of
a block, with no other recipes in between.
Note that this may not hold for VPBlendRecipes at the moment, as other
recipes may be inserted before the VPBlendRecipe during mask creation.
Note that this patch depends on D111300 and D111301, which fix code that
breaks the checked invariant.
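A sketch of the invariant check, assuming a hypothetical isPhiLike(R)
classification helper:
bool verifyPhisFormPrefix(const VPBasicBlock *VPBB) {
  bool SeenNonPhi = false;
  for (const VPRecipeBase &R : *VPBB) {
    bool IsPhi = isPhiLike(R); // Hypothetical classification helper.
    if (IsPhi && SeenNonPhi)
      return false; // A phi-like recipe appeared after a non-phi one.
    SeenNonPhi |= !IsPhi;
  }
  return true;
}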
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D111302
All phi-like recipes should be at the beginning of a VPBasicBlock with
no other recipes in between. Ensure that the recurrence-splicing recipe
is not added between phi-like recipes, but after them.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D111301
When targeting a specific CPU with scalable vectorization, the knowledge
of that particular CPU's vscale value can be used to tune the cost-model
and make the cost per lane less pessimistic.
If the target implements 'TTI.getVScaleForTuning()', the cost-per-lane
is calculated as:
Cost / (VScaleForTuning * VF.KnownMinLanes)
Otherwise, a vscale value of 1 is assumed, meaning that the behavior
is unchanged and the cost is calculated as:
Cost / VF.KnownMinLanes
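A sketch of the adjusted divisor (simplified):
unsigned EstimatedLanes = VF.getKnownMinValue();
if (Optional<unsigned> VScale = TTI.getVScaleForTuning())
  EstimatedLanes *= *VScale; // Tune for the target CPU's vscale.
InstructionCost CostPerLane = Cost / EstimatedLanes;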
Reviewed By: kmclaughlin, david-arm
Differential Revision: https://reviews.llvm.org/D113209
The common use case for calling createStepForVF is currently something
like:
Value *Step = createStepForVF(Builder, ConstantInt::get(Ty, UF), VF);
and it makes more sense to reduce overall lines of code and change the
function to let it create the constant instead. With my patch this
becomes:
Value *Step = createStepForVF(Builder, Ty, VF, UF);
and the ConstantInt is created inside createStepForVF instead. A
side-effect of this is that the code in createStepForVF also becomes
simpler.
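What the updated helper might look like (a sketch of the change, not a
verbatim copy):
Value *createStepForVF(IRBuilderBase &B, Type *Ty, ElementCount VF,
                       int64_t Step) {
  // The constant is now materialized here instead of by the caller.
  Constant *StepVal = ConstantInt::get(Ty, Step * VF.getKnownMinValue());
  return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;
}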
As part of this patch I've also replaced some calls to getRuntimeVF
with calls to createStepForVF, i.e.
getRuntimeVF(Builder, Count->getType(), VFactor * UFactor) ->
createStepForVF(Builder, Count->getType(), VFactor, UFactor)
because this feels semantically better.
Differential Revision: https://reviews.llvm.org/D113122