llvm-project


Author	SHA1	Message	Date
Roman Lebedev	833b33a7f4	[NFC][X86][CostModel] Add tests for byteswap intrinsic	2021-05-05 20:11:46 +03:00
Alexey Bataev	f19e8f424f	[COST][X86]Improve cost model for reverse shuffle v32i16/v64i8 in AVX512F. Improved cost model for reverse shuffle on AVX512F for types v32i16/v64i8. Differential Revision: https://reviews.llvm.org/D100974	2021-04-27 11:14:21 -07:00
David Sherwood	a458b7855e	[AArch64] Add AArch64TTIImpl::getMaskedMemoryOpCost function When vectorising for AArch64 targets if you specify the SVE attribute we automatically then treat masked loads and stores as legal. Also, since we have no cost model for masked memory ops we believe it's cheap to use the masked load/store intrinsics even for fixed width vectors. This can lead to poor code quality as the intrinsics will currently be scalarised in the backend. This patch adds a basic cost model that marks fixed-width masked memory ops as significantly more expensive than for scalable vectors. Tests for the cost model are added here: Transforms/LoopVectorize/AArch64/masked-op-cost.ll Differential Revision: https://reviews.llvm.org/D100745	2021-04-26 11:00:03 +01:00
Roman Lebedev	7b312e228c	[NFC][X86][AVX2] Add baseline CodeGen/CostModel tests for interleaved loads/stores of i16 w/ strides 2/3/4 `X86TTIImpl::getInterleavedMemoryOpCostAVX2()` currently contains data only for a handful of tuples. For now, at least add tests for a few more. I'm guessing that we care how well the patterns codegen since we use their presumed cost for vectorization decisions, so I've added codegen tests too. There's one really easy caveat for these codegen tests: for interleaved load tests, we really have to ensure that the deinterleaved vectors are escaped separately. Similarly for stores.	2021-04-26 01:13:07 +03:00
Simon Pilgrim	043bc88dba	[CostModel][X86] Improve v2f32 fadd reduction cost This was being reported as a similar cost to v4f32 when it's a lot cheaper (just a shufps+addps).	2021-04-23 16:56:13 +01:00
Daniil Fukalov	f79d055791	[TTI] Fix ScalarizationCost initialization. In cases when ScalarizationCostPassed has no value, UINT_MAX is actually used for cost estimation in `return ScalarCalls * ScalarCost + ScalarizationCost`. Reviewed By: sdesmalen Differential Revision: https://reviews.llvm.org/D101099	2021-04-23 17:59:59 +03:00
David Sherwood	57ca65e21e	[AArch64] Add instruction costs for FP_TO_UINT and FP_TO_SINT with half types We were missing some instruction costs when converting vectors of floating point half types into integers, so I've added those here. I also manually generated assembly code for each FP->int case and looked at the number of instructions generated, which meant adjusting some of the existing costs too. I've updated an existing test to reflect the new costs: Analysis/CostModel/AArch64/sve-fptoi.ll Differential Revision: https://reviews.llvm.org/D99935	2021-04-21 09:39:45 +01:00
Alexey Bataev	673e2f1b70	[COST][AARCH64] Improve cost of reverse shuffles for AArch64. Introduced the cost of the reverse shuffles for AArch64, currently just copied from the costs for PermuteSingleSrc. Differential Revision: https://reviews.llvm.org/D100871	2021-04-20 13:47:56 -07:00
Alexey Bataev	683dc41695	Update tests checks, NFC.	2021-04-20 10:20:15 -07:00
Alexey Bataev	e7d8105373	[COST]Add a test for reverse shuffles cost on AArch64, NFC.	2021-04-20 10:01:14 -07:00
Roman Lebedev	df9597cf5a	[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap This is similar to the subvector extractions, except that the 0'th subvector isn't free to insert, because we generally don't know whether or not the upper elements need to be preserved: https://godbolt.org/z/rsxP5W4sW This is needed to avoid regressions in D100684 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100698	2021-04-19 13:24:58 +03:00
Roman Lebedev	b9fc47745a	[NFC][X86][CostModel] Rewrite load_store.ll Test SSE41, since that added float/i64/i32/i8 inserts/extracts. Don't forget to test vectors of pointers. Do test byte-aligned loads/stores. Fix up test coverage to be rather more exhaustive, testing all reasonable permutations of element size and element count that fit within ZMM.	2021-04-18 11:12:36 +03:00
Roman Lebedev	b06c55a698	[X86][CostModel] Fix cost model for non-power-of-two vector load/stores Sometimes LV has to produce really wide vectors, and sometimes they end up being not powers of two. As can be seen from the diff, the cost computation is currently completely nonsensical in those cases. Instead of just scalarizing everything, split/factorize the wide vector into a number of subvectors, each one having a power-of-two number of elements, and recurse to get the cost of the op on each subvector. Also, check how we'd legalize this subvector, and if the legalized type is scalar, also account for the scalarization cost. Note that for sub-vector loads, we might be able to do better, when the vectors are properly aligned. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100099	2021-04-16 15:30:57 +03:00
Simon Pilgrim	2a1a2f5733	[CostModel][X86] Add fully aligned load/store tests As noted on D100099, if these illegal vector types are suitably aligned they should be much cheaper to load (but probably not store).	2021-04-16 10:35:40 +01:00
Florian Hahn	acd9cc7495	[AArch64] Use type-legalization cost for code size memop cost. At the moment, getMemoryOpCost returns 1 for all inputs if CostKind is CodeSize or SizeAndLatency. This fools LoopUnroll into thinking memory operations on large vectors have a cost of one, even if they will get expanded to a large number of memory operations in the backend. This patch updates getMemoryOpCost to return the cost for the type legalization for both CodeSize and SizeAndLatency. This should more accurately reflect the number of memory operations required. I am not sure how latency should properly be included in SizeAndLatency from the description, but returning the size cost should be clearly more accurate. This does not cause any binary changes when building MultiSource/SPEC2000/SPEC2006 with -O3 -flto for AArch64, likely because large vector memops are not really formed by code emitted from Clang. But using the C/C++ matrix extension can easily result in code with very large vector operations directly from Clang, e.g. https://clang.godbolt.org/z/6xzxcTGvb Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D100291	2021-04-15 10:11:05 +01:00
Sander de Smalen	bd86824d98	[TTI] NFC: Change getArithmeticReductionCost to return InstructionCost This patch migrates the TTI cost interfaces to return an InstructionCost. See this patch for the introduction of the type: https://reviews.llvm.org/D91174 See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html This patch is practically NFC, with the exception of an AArch64 SVE related cost-model change, where we can now return an Invalid cost instead of some bogus number. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100201	2021-04-13 14:20:59 +01:00
dfukalov	8f4b7e94a2	[AMDGPU][CostModel] Refine cost model for control-flow instructions. Added cost estimation for switch instruction, updated costs of branches, fixed phi cost. Had to increase `-amdgpu-unroll-threshold-if` default value since conditional branch cost (size) was corrected to higher value. Test renamed to "control-flow.ll". Removed redundant code in `X86TTIImpl::getCFInstrCost()` and `PPCTTIImpl::getCFInstrCost()`. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D96805	2021-04-10 09:20:24 +03:00
Roman Lebedev	aa165eac32	[NFC][X86][CostModel] Add some load/store tests w/ non-power-of-two elt cnt vectors For example the cost to load <48 x i16> should likely be 3, because that's just 3x load i256.	2021-04-08 15:00:28 +03:00
Sander de Smalen	672f673004	[SVE] Remove checks for warnings in scalable-vector tests. After D98856 these tests will by default break (fatal_error) if any of the wrong interfaces are used, so there's no longer a need to have a RUN line that checks for a warning message emitted by the compiler.	2021-04-07 15:59:32 +01:00
Simon Pilgrim	201877d572	[CostModel][X86] Improve accuracy of vXi8 multiply reduction costs After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....). Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.	2021-04-06 11:53:22 +01:00
Sander de Smalen	7108b2dec1	[SVE] Fix LoopVectorizer test scalable-call.ll This marks FSIN and other operations as EXPAND for scalable vectors, so that they are not assumed to be legal by the cost-model. Depends on D97470 Reviewed By: dmgreen, paulwalker-arm Differential Revision: https://reviews.llvm.org/D97471	2021-03-31 14:52:49 +01:00
Sander de Smalen	b6d0529780	[CostModel] Align the cost model for intrinsics for scalable/fixed-width vectors. Let getIntrinsicInstrCost call getTypeBasedIntrinsicInstrCost for scalable vectors, similar to how this is done for fixed-width vectors, instead of falling back on BaseT::getIntrinsicInstrCost(). If the intrinsic cannot be costed (or is not overloaded by the target), it will return InstructionCost::getInvalid() instead. Depends on D97469 Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D97470	2021-03-31 14:52:49 +01:00
Sander de Smalen	4ca860742d	[InstructionCost] Don't conflate Invalid costs with Unknown costs. We previously made a change to getUserCost to return an Invalid cost when one of the TTI costs returned '-1' (meaning 'unknown' or 'infinitely expensive'). It makes no sense to say that: shufflevector <2 x i8> %x, <2 x i8> %y, <4 x i32> <i32 0, i32 1, i32 2, i32 3> has an invalid cost. Perhaps the cost is not known, but the IR is valid and can be code-generated. Invalid should only be used for IR that cannot possibly be code-generated and where a cost is nonsensical. With more passes now asserting that the cost must be valid, it is possible that those assertions will fail for perfectly valid IR. An incomplete cost-model probably shouldn't be a reason for the compiler to break. It's better to consider these costs as 'very expensive' and ignore them for other reasons. At some point, we should consider replacing -1 with some other mechanism. Reviewed By: paulwalker-arm, dmgreen Differential Revision: https://reviews.llvm.org/D99502	2021-03-30 09:29:42 +01:00
Nashe Mncube	19601a4c6c	[SVE][Analysis]Instruction costs for ops on scalable-vec The following operations have no associated cost for them when applied to scalable vectors, and as a consequence can trigger a crash when a call is made to AArch64TTIImpl::getCastInstrCost(): - fptrunc - trunc - fpext - fpto(u,s)i This patch adds costs for these operations and relevant regression tests. Differential Revision: https://reviews.llvm.org/D98934	2021-03-29 11:15:50 +01:00
Craig Topper	5797feaa55	[RISCV] Reorder checks in RISCVTTIImpl::getGatherScatterOpCost to avoid calling getMinRVVVectorSizeInBits() when V extension is not enabled. getMinRVVVectorSizeInBits() asserts if the V extension isn't enabled. So check that gather/scatter is legal first since it already contains a check for V extension being enabled. It also already checks getMinRVVVectorSizeInBits for fixed length vectors so we don't need a check in getGatherScatterOpCost.	2021-03-25 14:20:47 -07:00
Craig Topper	512bae81cc	[RISCV] Add basic cost modelling for fixed vector gather/scatter. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D99142	2021-03-24 11:14:14 -07:00
David Sherwood	748ae5281d	[IR][SVE] Add new llvm.experimental.stepvector intrinsic This patch adds a new llvm.experimental.stepvector intrinsic, which takes no arguments and returns a linear integer sequence of values of the form <0, 1, ...>. It is primarily intended for scalable vectors, although it will work for fixed width vectors too. It is intended that later patches will make use of this new intrinsic when vectorising induction variables, currently only supported for fixed width. I've added a new CreateStepVector method to the IRBuilder, which will generate a call to this intrinsic for scalable vectors and fall back on creating a ConstantVector for fixed width. For scalable vectors this intrinsic is lowered to a new ISD node called STEP_VECTOR, which takes a single constant integer argument as the step. During lowering this argument is set to a value of 1. The reason for this additional argument at the codegen level is because in future patches we will introduce various generic DAG combines such as mul step_vector(1), 2 -> step_vector(2) add step_vector(1), step_vector(1) -> step_vector(2) shl step_vector(1), 1 -> step_vector(2) etc. that encourage a canonical format for all targets. This hopefully means all other targets supporting scalable vectors can benefit from this too. I've added cost model tests for both fixed width and scalable vectors: llvm/test/Analysis/CostModel/AArch64/neon-stepvector.ll llvm/test/Analysis/CostModel/AArch64/sve-stepvector.ll as well as codegen lowering tests for fixed width and scalable vectors: llvm/test/CodeGen/AArch64/neon-stepvector.ll llvm/test/CodeGen/AArch64/sve-stepvector.ll See this thread for discussion of the intrinsic: https://lists.llvm.org/pipermail/llvm-dev/2021-January/147943.html	2021-03-23 10:43:35 +00:00
David Green	a2e0312cda	[ARM] Tone down the MVE scalarization overhead The scalarization overhead was set deliberately high for MVE, whilst the codegen was new. It helps protect us against the negative ramifications of mixing scalar and vector instructions. This decreases that, especially for floating point where the cost of extracting/inserting lane elements can be low. For integer the cost is still fairly high due to the cross-register-bank copy, but is no longer n^2 in the length of the vector. In general, this will decrease the cost of scalarizing floats and long integer vectors. i64 costs increase, but they are high both before and after this patch. For floats this allows us to start doing things like vectorizing fdiv instructions, even if they are scalarized. Differential Revision: https://reviews.llvm.org/D98245	2021-03-19 18:30:11 +00:00
Alexey Bataev	14ae0cf0f5	[Cost]Canonicalize the cost for logical or/and reductions. The generic cost of logical or/and reductions should be the cost of bitcast <ReduxWidth x i1> to iReduxWidth + cmp eq|ne iReduxWidth. Differential Revision: https://reviews.llvm.org/D97961	2021-03-19 11:01:58 -07:00
David Green	35e0567d58	[ARM] Add VREV MVE shuffle costs This uses the shuffle mask cost from D98206 to give a better cost of MVE VREV instructions. This helps especially in VectorCombine, where the cost of shuffles is used to reorder bitcasts; it keeps the phase ordering test for fp16 reductions producing optimal code. The isVREVMask function has been moved to a header file to allow it to be used across target transform and isel lowering. Differential Revision: https://reviews.llvm.org/D98210	2021-03-17 21:21:43 +00:00
Caroline Concatto	f2b749be15	[CostModel][SVE] Add cost model for shuffle reverse with i1 and scalable vector This patch adds the cost model for experimental.vector.reverse with scalable vector types: nxv16i1, nxv8i1, nxv4i1 and nxv2i1. These types are missing from the previous cost model patch D95603. The cost model for experimental.vector.reverse with 1 bit mask is used by loop vectorization in the patch D95363 Differential Revision: https://reviews.llvm.org/D97758	2021-03-04 18:52:59 +00:00
Alexey Bataev	60470ac7ff	[Cost]Add tests for boolean and/or reductions, NFC. Tests with the default costs for boolean and/or reductions. Differential Revision: https://reviews.llvm.org/D97793	2021-03-03 12:34:30 -08:00
Juneyoung Lee	c89d9d8a48	[TTI] Consider select form of and/or i1 as having arithmetic cost This is a patch that updates the cost of `select i1 a, b, false` to be equivalent to that of `and i1 a, b` as well as the cost of `select i1 a, true, b` equivalent to `or i1 a, b`. Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe. This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked. These selects should be lowered to the same assembly as and/or i1, so the cost should be equivalent. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D97360	2021-03-02 02:18:19 +09:00
Sander de Smalen	f870c551f0	[AArch64] NFC: Cleanup some SVE cost-model tests. Moved some of the `sve-getIntrinsicCost-<..>` into a single sve-intrinsics.ll file, and simplified the tests a bit by bundling all the intrinsics in one function (instead of testing one intrinsic per function). That makes it easier to see the cost of the intrinsics.	2021-03-01 13:26:31 +00:00
Stelios Ioannou	30cb9c03b5	[AArch64] Add abs intrinsic costs This patch adds cost-modelling for abs vector intrinsic. Change-Id: I89007971bfb15f5b4a02a2eadfd43018e9a73976	2021-02-25 09:31:52 +00:00
David Green	dd2dbf7ee2	[TTI] Change getOperandsScalarizationOverhead to take Type args As a followup to D95291, getOperandsScalarizationOverhead was still using a VF as a vector factor if the arguments were scalar, and would assert on certain matrix intrinsics with differently sized vector arguments. This patch removes the VF arg, instead passing the Types through directly. This should allow it to more accurately compute the cost without having to guess at which operands will be vectorized, something difficult with more complex intrinsics. This adjusts one SVE test as it is now calling the wrong intrinsic vs veccall. Without invalid InstructionCosts the cost of the scalarized intrinsic is too low. This should get fixed when the cost of scalarization is accounted for with scalable types. Differential Revision: https://reviews.llvm.org/D96287	2021-02-23 13:04:59 +00:00
David Green	33ba220611	[ARM] Ensure types provided to getIntrinsicCost are valid It appears that pointer types were causing issues for the min/max cost code in getIntrinsicInstrCost. This makes sure that when matching icmp/select to a min/max, we only do that for normal int or float types.	2021-02-18 14:00:23 +00:00
David Green	1a6744e3dc	[ARM] Add larger than legal ICmp costs A v8i32 compare will produce a v8i1 predicate, but during codegen the v8i32 will be split into two v4i32, potentially requiring two v4i1 predicates to be merged into a single v8i1. Because this merging of two v4i1's into a v8i1 is very expensive, we need to make the cost of the compare equally high. This patch adds the cost of that to ARMTTIImpl::getCmpSelInstrCost. Because we don't know whether the user of the predicate can be split, and the cost model is mostly per-instruction, we may be pessimistic but that should only be for larger than legal types. This also adds min/max detection to the cost model where it can be detected, to keep those in line with the cost of simple min/max instructions. Otherwise for the most part, costs that were already expensive have become more expensive. Differential Revision: https://reviews.llvm.org/D96692	2021-02-18 11:42:17 +00:00
David Green	1fbb3287fc	[ARM] MVE ICmp costing tests. NFC	2021-02-18 10:50:34 +00:00
Stanislav Mekhanoshin	a8d9d50762	[AMDGPU] gfx90a support Differential Revision: https://reviews.llvm.org/D96906	2021-02-17 16:01:32 -08:00
David Green	6d835c5fcd	[ARM] Add MVE abs costs Similar to min/max, this increases the accuracy of abs intrinsics costs under MVE.	2021-02-17 14:21:09 +00:00
David Green	415deff10b	[ARM] MVE abs intrinsic costs. NFC	2021-02-17 13:54:17 +00:00
David Green	0a98efb049	[ARM] Add some basic Min/Max costs This adds basic MVE costs for SMIN/SMAX/UMIN/UMAX, as well as MINNUM and MAXNUM representing fmin and fmax. It tightens up the costs, not using an ICmp+Select cost. Differential Revision: https://reviews.llvm.org/D96603	2021-02-15 15:06:19 +00:00
Caroline Concatto	b52e6c5891	[CostModel]Add cost model for experimental.vector.reverse This patch uses the function getShuffleCost with SK_Reverse to compute the cost for experimental.vector.reverse. For scalable vector types, it adds a table with the legal types to AArch64TTIImpl::getShuffleCost so that it does not assert in BasicTTIImpl::getShuffleCost, and for fixed vectors, it relies on the existing cost model in BasicTTIImpl. Depends on D94883 Differential Revision: https://reviews.llvm.org/D95603	2021-02-15 14:23:57 +00:00
David Green	6abe362ed7	[ARM] Fix duplicate fdiv tests, changing them to frem. NFC	2021-02-13 15:16:11 +00:00
David Green	7c2e061188	[ARM] Extra vector shuffle tests of various kinds. NFC	2021-02-13 15:03:10 +00:00
David Green	b7c3de8d5a	[ARM] MVE min/max cost tests. NFC	2021-02-13 11:12:12 +00:00
Sander de Smalen	1d42ba254f	[BasicTTIImpl] Fix getCastInstrCost for scalable vectors by querying for ElementCount. This fixes an overly restrictive assumption that the vector is a FixedVectorType, in code that tries to calculate the cost of a cast operation when splitting a too-wide vector. The algorithm works the same for scalable vectors, so this patch removes the cast<FixedVectorType>. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D96253	2021-02-12 08:28:52 +00:00
Sander de Smalen	63d787e5d4	[CostModel] An extending load to illegal type is not free. COST(zext (<4 x i32> load(...) to <4 x i64>)) != 0 when <4 x i64> is an illegal result type that requires splitting of the operation. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D96250	2021-02-12 07:59:21 +00:00
David Green	b1ef919aad	[ARM] Add CostKind to getMVEVectorCostFactor. This adds the CostKind to getMVEVectorCostFactor, so that it can automatically account for CodeSize costs, where it returns a cost of 1 not the MVEFactor used for Throughput/Latency. This helps simplify the caller code and allows us to get the codesize cost more correct in more cases.	2021-02-11 15:33:59 +00:00
Caroline Concatto	2cbcf3e297	[AArch64][SVE]Add cost model for broadcast shuffle This patch adds a cost model for SK_Broadcast in AArch64TTIImpl::getShuffleCost with scalable vectors. Without this patch, the scalable vector type relies on the BasicTTIImpl cost implementation and asserts. Differential Revision: https://reviews.llvm.org/D95598	2021-02-03 09:53:22 +00:00
David Green	0175cd00a1	[AArch64] Add vector saturating add intrinsic costs This adds sadd.sat, uadd.sat, ssub.sat and usub.sat costs for AArch64, similar to how they were recently added for ARM. Differential Revision: https://reviews.llvm.org/D95292	2021-01-27 10:38:32 +00:00
Sander de Smalen	b9417c3616	[CostModel] Handle CTLZ and CTTZ in getTypeBasedIntrinsicInstrCost Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D95355	2021-01-26 14:37:51 +00:00
David Green	06ab7953e9	[AArch64] Saturating add cost tests. NFC	2021-01-24 13:49:17 +00:00
David Sherwood	83e7a96c06	Fix build failure caused by `2e080eb00a`	2021-01-22 09:56:53 +00:00
David Sherwood	2e080eb00a	[SVE] Add support for scalable vectorization of loops with selects and cmps I have removed an unnecessary assert in LoopVectorizationCostModel::getInstructionCost that prevented a cost being calculated for select instructions when using scalable vectors. In addition, I have changed AArch64TTIImpl::getCmpSelInstrCost to only do special cost calculations for fixed width vectors and fall back to the base version for scalable vectors. I have added a simple cost model test for cmps and selects: test/Analysis/CostModel/sve-cmpsel.ll and some simple tests that show we vectorize loops with cmp and select: test/Transforms/LoopVectorize/AArch64/sve-basic-vec.ll Differential Revision: https://reviews.llvm.org/D95039	2021-01-22 09:48:13 +00:00
Jeroen Dobbelaere	121cac01e8	[noalias.decl] Look through llvm.experimental.noalias.scope.decl Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93042	2021-01-19 20:09:42 +01:00
David Green	6a563eef13	[ARM] Expand vXi1 VSELECT's We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as expanded to turn them into a series of logical operations. Differential Revision: https://reviews.llvm.org/D94946	2021-01-19 17:56:50 +00:00
David Green	f373b30923	[ARM] Add MVE add.sat costs This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs, based on when the instruction is legal. With smaller than legal types that are promoted we generate shr(qadd(shl, shl)), so the cost is 4 appropriately. Differential Revision: https://reviews.llvm.org/D94958	2021-01-19 15:38:46 +00:00
David Green	54e38440e7	[ARM] Expand add.sat/sub.sat cost checks. NFC	2021-01-19 15:06:06 +00:00
Caroline Concatto	172f1f8952	[AArch64][SVE]Add cost model for vector reduce for scalable vector This patch computes the cost for vector.reduce<operand> for scalable vectors. The cost is split into two parts: the legalization cost and the horizontal reduction. Differential Revision: https://reviews.llvm.org/D93639	2021-01-19 11:54:16 +00:00
David Green	dcefcd51e0	[ARM] Update trunc costs We did not have specific costs for larger than legal truncates that were not otherwise cheap (where they were next to stores, for example). As MVE does not have a dedicated instruction for them (and we do not use loads/stores yet), they should be expensive as they get expanded to a series of lane moves. Differential Revision: https://reviews.llvm.org/D94260	2021-01-11 08:59:28 +00:00
David Green	0c8b748f32	[ARM] Additional trunc cost tests. NFC	2021-01-11 08:35:16 +00:00
Caroline Concatto	01c190e907	[AArch64][CostModel]Fix gather scatter cost model This patch fixes a bug introduced in the patch: https://reviews.llvm.org/D93030 This patch moves the check for scalable vectors so that it is the first one performed. This prevents the AArch64 gather and scatter cost model from computing the number of vector elements for something that is not a vector and therefore crashing.	2021-01-07 14:02:08 +00:00
Peter Waller	3e357ecd44	[llvm][NFC] Disallow all warnings in TypeSize tests This is a follow-up to a request from a reviewer [0]. The text may change in the future and these tests should not produce any warning output. [0] https://reviews.llvm.org/D91806#inline-879243 Reviewed By: sdesmalen, david-arm Differential Revision: https://reviews.llvm.org/D94161	2021-01-06 17:17:07 +00:00
Caroline Concatto	060cfd9795	[AArch64][SVE]Add cost model for masked gather and scatter for scalable vector. A new TTI interface, 'Optional<unsigned> getMaxVScale', has been added that returns the maximum vscale for a given target. When known, getMaxVScale is used to compute the cost of masked gather/scatter for scalable vectors. Depends on D92094 Differential Revision: https://reviews.llvm.org/D93030	2021-01-04 13:59:58 +00:00
Juneyoung Lee	c1f3033697	Update inselt tests at llvm/test/Analysis to have poison as shufflevector's placeholder (NFC) File listed by: grep -R -E "^[^;]*shufflevector <.*> .*, <.*> undef" . | grep inseltpoison Updated with: sed -i -E 's/shufflevector <(.*)> (.*), <(.*)> undef/shufflevector <\1> \2, <\3> poison/g' $1	2020-12-31 17:12:37 +09:00
Juneyoung Lee	3036547248	Precommit analysis/etc tests for inselt poison placeholder This adds tests in directories missing from https://reviews.llvm.org/rGdb7a2f347f132b3920415013d62d1adfb18d8d58	2020-12-24 12:14:24 +09:00
Bradley Smith	e0b9c5df26	[CostModel] Add costs for llvm.experimental.vector.{extract,insert} intrinsics Adds cost model support for the new llvm.experimental.vector.{extract,insert} intrinsics, using the existing getExtractSubvectorOverhead and getInsertSubvectorOverhead functions for shuffles. Previously this case would throw an assertion. Differential Revision: https://reviews.llvm.org/D93043	2020-12-16 13:39:04 +00:00
Caroline Concatto	60e4698b9a	[CostModel]Replace FixedVectorType by VectorType in getIntrinsicInstrCost This patch replaces FixedVectorType by VectorType in getIntrinsicInstrCost in BasicTTIImpl.h. It moves the early return for scalable types earlier and adds tests for scalable types. Depends on D91532 Differential Revision: https://reviews.llvm.org/D92094	2020-12-16 13:06:23 +00:00
David Green	a4823377fd	[ARM] Add basic masked load/store costs This adds some basic MVE masked load/store costs, notably changing the cost of legal loads/stores to the MVECostFactor and the cost of scalarized instructions to 8*NumElts. Differential Revision: https://reviews.llvm.org/D86538	2020-12-12 15:26:32 +00:00
Simon Pilgrim	db900995ed	[CostModel][X86] getGatherScatterOpCost - use default implementation for alt costkinds Noticed while looking at D92701 - we only really handle TCK_RecipThroughput gather/scatter costs - for now drop back to the default implementation for non-legal gathers/scatters.	2020-12-06 14:08:26 +00:00
Sanjay Patel	136f98e523	[x86] adjust cost model values for minnum/maxnum with fast-math-flags Without FMF, we lower these intrinsics into something like this: vmaxsd %xmm0, %xmm1, %xmm2 vcmpunordsd %xmm0, %xmm0, %xmm0 vblendvpd %xmm0, %xmm1, %xmm2, %xmm0 But if we can ignore NANs, the single min/max instruction is enough because there is no need to fix up the x86 logic that corresponds to X > Y ? X : Y. We probably want to make other adjustments for FP intrinsics with FMF to account for specialized codegen (for example, FSQRT). Differential Revision: https://reviews.llvm.org/D92337	2020-12-01 10:45:53 -05:00
Sanjay Patel	40dc535b5a	[x86] add tests for maxnum/minnum with nnan; NFC	2020-11-30 14:30:28 -05:00
Sjoerd Meijer	5110ff0817	[AArch64][CostModel] Fix cost for mul <2 x i64> This was modeled to have a cost of 1, but since we do not have a MUL.2d this is scalarized into vector inserts/extracts and scalar muls. Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll, which we don't want to SLP vectorize. Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll unfortunately needed changing, but the reason is documented in LoopVectorize.cpp:6855: // The cost of executing VF copies of the scalar instruction. This opcode // is unknown. Assume that it is the same as 'mul'. which I will address next as a follow up of this. Differential Revision: https://reviews.llvm.org/D92208	2020-11-30 11:36:55 +00:00
Simon Pilgrim	969918e177	[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion. Allows us to move the x86-specific lowering to the generic expansion code. Differential Revision: https://reviews.llvm.org/D92183	2020-11-27 11:18:58 +00:00
Sjoerd Meijer	a3b1fcbc0c	[AArch64][CostModel] Precommit some vector mul tests. NFC. The cost-model is not getting the cost right for a mul with <2 x i64> operands, i.e. we don't have a MUL.2d, and this is precommitting some tests before adjusting this.	2020-11-26 13:23:11 +00:00
Florian Hahn	926681b6be	[CostModel] Add basic implementation of getGatherScatterOpCost. Add a basic implementation of getGatherScatterOpCost to BasicTTIImpl. The implementation estimates the cost of scalarizing the loads/stores, the cost of packing/extracting the individual lanes and the cost of only selecting enabled lanes. This more accurately reflects the current cost on targets like AArch64. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D91984	2020-11-26 12:02:25 +00:00
Simon Pilgrim	385a27d6cd	[CostModel][X86] Refresh ISD::ABS costs Update costs now that D92095 and D92102 have tweaked the SSE2 implementation. The SSE42 BLENDVPD cost can actually be used on SSE41 as we don't attempt to generate PCMPGT anymore. Add scalar i16/i32/i64 costs as we can do this cheaply with CMOV.	2020-11-25 18:40:19 +00:00
Florian Hahn	14c0185bfe	[AArch64] Add scatter cost model tests.	2020-11-23 18:36:56 +00:00
Florian Hahn	3a1c6cec15	[AArch64] Add tests for masked.gather costs.	2020-11-23 17:33:27 +00:00
Sanjay Patel	2717252c92	[CostModel] add basic handling for FP maximum/minimum intrinsics This might be a regression for some ARM targets, but that should be changed in the target-specific overrides. There is apparently still no default lowering for these nodes, so I am assuming these intrinsics are not in common use. X86, PowerPC, and RISC-V for example, just crash given the most basic IR.	2020-11-22 13:43:53 -05:00
Sanjay Patel	3a18f26723	[CostModel] add tests for FP maximum; NFC These min/max intrinsics are not handled in the basic implementation and probably not handled in target-specific overrides either.	2020-11-22 13:33:42 -05:00
Sanjay Patel	e32bd35120	[CostModel] mostly remove cost-kind predicate for intrinsics in basic TTI implementation This is re-applying a combination of `f7eac51b9b` and `8ec7ea3ddc` as one patch to avoid regressions now that we have better testing in place. Those were reverted with `32dd5870ee` because of crashing in experimental intrinsics. That bug should be fixed with `7ae346434`. Paraphrased original commit messages: This is the last step in removing cost-kind as a consideration in the basic class model for intrinsics. See D89461 for the start of that. Subsequent commits dealt with each of the special-case intrinsics that had customization here in the basic class. This should remove a barrier to retrying D87188 (canonicalization to the abs intrinsic). The ARM and x86 cost diffs seen here may be wrong because the target-specific overrides have their own bugs, but we hope this is less wrong - if something has a significant throughput cost, then it should have a significant size / blended cost too by default. The only behavioral diff in current regression tests is shown in the x86 scatter-gather test (which is misplaced or broken because it runs the entire -O3 pipeline) - we unrolled less, and we assume that is an improvement. Exception: in general, we want the size cost for a scalar call to be cheap even if the other costs are expensive - we expect it to just be a branch with some optional stack manipulation. It is likely that we will want to carve out some exceptions/overrides to this rule as follow-up patches for calls that have some general and/or target-specific difference to the expected lowering. This was noticed as a regression in unrolling, so we have a test for that now along with a couple of direct cost model tests. If the assumed scalarization costs for the oversized vector calls are not realistic, that would be another follow-up refinement of the cost models. Differential Revision: https://reviews.llvm.org/D90554	2020-11-20 11:21:10 -05:00
Sanjay Patel	7ae346434a	[CostModel] avoid crashing while finding scalarization overhead The constrained intrinsics have metadata arguments, so the tests here were crashing as noted in D90554 (and that was reverted even though this bug exists independently of that change).	2020-11-20 10:18:29 -05:00
Sanjay Patel	1285781fc5	[CostModel] add tests for math library calls; NFC This is a partial un-revert of `32dd5870ee` (originally `df09f82599` ). I'm adding back the baseline tests first, so we don't have to back-track as much in case there are still problems.	2020-11-20 08:24:49 -05:00
Eric Christopher	32dd5870ee	Temporarily Revert "[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation" as it's causing crashes in the optimizer. A reduced testcase has been posted as a follow-up. This reverts commit `f7eac51b9b`. Temporarily Revert "[CostModel] make default size cost for libcalls small (again)" as it depends upon the primary revert. This reverts commit `8ec7ea3ddc`. Temporarily Revert "[CostModel] add tests for math library calls; NFC" as it depends upon the primary revert. This reverts commit `df09f82599`. Temporarily Revert "[LoopUnroll] add test for full unroll that is sensitive to cost-model; NFC" as it depends upon the primary revert. This reverts commit `618d555e8d`.	2020-11-19 22:10:23 -08:00
Craig Topper	f0b0bab34d	[X86] Use GF2P8AFFINEQB to implement vector bitreverse. We can use GF2P8AFFINEQB to reverse bits in a byte. Shuffles are needed to reverse the bytes in elements larger than i8. LegalizeVectorOps takes care of inserting the shuffle for the larger element size. We already have Custom lowering for v16i8 with SSSE3, v32i8 with AVX, and v64i8 with AVX512BW. I think we might be able to use this for scalars too by moving into a vector and back. But I'll save that for a follow up as it's a little more involved. Reviewed By: RKSimon, pengfei Differential Revision: https://reviews.llvm.org/D91515	2020-11-17 23:49:06 -08:00
Caroline Concatto	6c4d8f4651	[AArch64] Add check for widening instruction for SVE. This patch fixes the function isWideningInstruction for scalable vectors. Now the cost model can check the widening pattern for SVE. Differential Revision: https://reviews.llvm.org/D91260	2020-11-16 12:30:08 +00:00
Sanjay Patel	8ec7ea3ddc	[CostModel] make default size cost for libcalls small (again) This was changed recently with D90554 / `f7eac51b9b` ...because we had a regression testing blindspot for intrinsics that are expected to be lowered to libcalls. In general, we want the size cost for a scalar call to be cheap even if the other costs are expensive - we expect it to just be a branch with some optional stack manipulation. It is likely that we will want to carve out some exceptions/overrides to this rule as follow-up patches for calls that have some general and/or target-specific difference to the expected lowering. This was noticed as a regression in unrolling, so we have a test for that now along with a couple of direct cost model tests. If the assumed scalarization costs for the oversized vector calls are not realistic, that would be another follow-up refinement of the cost models.	2020-11-14 08:15:35 -05:00
Sanjay Patel	df09f82599	[CostModel] add tests for math library calls; NFC	2020-11-14 08:15:35 -05:00
Simon Pilgrim	e11195d0a9	[CostModel][X86] Remove unused CHECK prefixes Allows us to remove the "CHECK: {{^}}" hack and help simplify D91275	2020-11-13 17:31:48 +00:00
Caroline Concatto	37f4ccb275	[AArch64]Add memory op cost model for SVE This patch adds/fixes memory op cost model for SVE with fixed-width vector. Differential Revision: https://reviews.llvm.org/D90950	2020-11-11 12:49:19 +00:00
Sanjay Patel	f7eac51b9b	[CostModel] remove cost-kind predicate for intrinsics in basic TTI implementation This is the last step in removing cost-kind as a consideration in the basic class model for intrinsics. See D89461 for the start of that. Subsequent commits dealt with each of the special-case intrinsics that had customization here in the basic class. This should remove a barrier to retrying D87188 (canonicalization to the abs intrinsic). The ARM and x86 cost diffs seen here may be wrong because the target-specific overrides have their own bugs, but we hope this is less wrong - if something has a significant throughput cost, then it should have a significant size / blended cost too by default. The only behavioral diff in current regression tests is shown in the x86 scatter-gather test (which is misplaced or broken because it runs the entire -O3 pipeline) - we unrolled less, and we assume that is an improvement. Differential Revision: https://reviews.llvm.org/D90554	2020-11-10 08:19:31 -05:00
Simon Pilgrim	20bbe14ac8	[CostModel][ARM] Remove unused check-prefix	2020-11-10 13:10:12 +00:00
Simon Pilgrim	bd2c0e2c9f	[CostModel][AArch64] Remove unused check-prefix	2020-11-10 13:10:11 +00:00
Simon Pilgrim	fe9403df06	[CostModel][X86] Remove unused check-prefixes	2020-11-10 12:48:35 +00:00
Sanjay Patel	264a6df353	[ARM] remove cost-kind predicate for cmp/sel costs This is the cmp/sel sibling to D90692. Again, the reasoning is: the throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uops). We need to check for a valid (non-null) condition type parameter because SimplifyCFG may pass nullptr for that (and so we will crash multiple regression tests without that check). I'm not sure if passing nullptr makes sense, but other code in the cost model does appear to check if that param is set or not. Differential Revision: https://reviews.llvm.org/D90781	2020-11-05 14:52:25 -05:00
Sanjay Patel	c40126e740	[ARM] remove cost-kind predicate for most math op costs This is based on the same idea that I am using for the basic model implementation and what I have partly already done for x86: throughput cost is number of instructions/uops, so size/blended costs are identical except in special cases (for example, fdiv or other known-expensive machine instructions or things like MVE that may require cracking into >1 uop). Differential Revision: https://reviews.llvm.org/D90692	2020-11-03 17:23:46 -05:00
Sanjay Patel	3c050a597c	[CostModel] fix cost calc bug for sadd/ssub with overflow As noted in D90554, there's an opcode typo in using an easily misused cost model API: getCmpSelInstrCost(). Beyond that, the assumed sequence of ops is questionable, but that would be another patch. My guess is that the x86 test diffs show that we are probably wrong both before and after this change, so there will be no practical difference. As an example, I tried this test which shows a cost of '7' either way: define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) { %V4I32 = call {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb) %ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1 %r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0 %z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r ret <4 x i32> %z } $ llc -o - sadd.ll -mattr=avx vpaddd %xmm1, %xmm0, %xmm2 vpcmpgtd %xmm2, %xmm0, %xmm0 vpxor %xmm0, %xmm1, %xmm0 vblendvps %xmm0, LCPI0_0(%rip), %xmm2, %xmm0 Differential Revision: https://reviews.llvm.org/D90681	2020-11-03 11:03:47 -05:00
David Green	90131e3ecb	[CostModel] Make target intrinsics cheap by default This patch changes the intrinsics cost model to assume that by default target intrinsics are cheap. This didn't seem to be the case for all intrinsics, and is potentially an MVE problem due to our scalarization overheads. Cheap seems to be a good default in general though. Differential Revision: https://reviews.llvm.org/D90597	2020-11-03 09:58:28 +00:00
David Green	5ac21f9bfe	[ARM] Cost model test for target intrinsics. NFC	2020-11-02 17:46:48 +00:00
Sanjay Patel	35fa3c474f	[x86] add AVX2 cost model entries for maxnum of 256-bit vectors As noticed in D90554 , the AVX2 costs for 256-bit vectors did not include FMAXNUM entries, so we fell back to AVX1 which assumes those ops will be split into 128-bit halves or something close to that. Differential Revision: https://reviews.llvm.org/D90613	2020-11-02 12:20:17 -05:00
Florian Hahn	b3b993a7ad	Reland "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts the revert commit `408c4408fa`. This version of the patch includes a fix for a crash caused by treating ICmp/FCmp constant expressions as instructions. Original message: On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV.	2020-11-02 15:39:29 +00:00
Fangrui Song	7979f24954	[test] Fix some unused check prefixes in test/Analysis/CostModel/X86	2020-10-31 23:29:57 -07:00
David Green	30ad742644	[ARM] Fix crash for gather of pointer costs. If the elt size is unknown due to it being a pointer, a comparison against 0 will cause an assert. Make sure the elt size is large enough before comparing and for the moment just return the scalar cost.	2020-10-31 13:10:14 +00:00
Florian Hahn	408c4408fa	Revert "[TTI] Add VecPred argument to getCmpSelInstrCost." This reverts commit `73f01e3df5`. This appears to break http://lab.llvm.org:8011/#/builders/85/builds/383.	2020-10-30 21:26:14 +00:00
Sanjay Patel	251dd7c0f9	[x86] add cost overrides for mul with overflow I'm assuming the standard size integer instructions for this end up as something like: mulq %rsi seto %al And the 'mul' generally has reciprocal throughput of 1 on typical implementations (higher latency, but that's not handled here). The default costs may end up much higher than that, and that's what we see in the test diffs. Vector types are left as a 'TODO'. Differential Revision: https://reviews.llvm.org/D90431	2020-10-30 12:38:16 -04:00
Florian Hahn	73f01e3df5	[TTI] Add VecPred argument to getCmpSelInstrCost. On some targets, like AArch64, vector selects can be efficiently lowered if the vector condition is a compare with a supported predicate. This patch adds a new argument to getCmpSelInstrCost, to indicate the predicate of the feeding select condition. Note that it is not sufficient to use the context instruction when querying the cost of a vector select starting from a scalar one, because the condition of the vector select could be composed of compares with different predicates. This change greatly improves modeling the costs of certain compare/select patterns on AArch64. I am also planning on putting up patches to make use of the new argument in SLPVectorizer & LV. Reviewed By: dmgreen, RKSimon Differential Revision: https://reviews.llvm.org/D90070	2020-10-30 13:49:08 +00:00
Sanjay Patel	d5a75e7738	[x86] add test for umul intrinsic costs; NFC	2020-10-29 12:12:52 -04:00
Sanjay Patel	7c395f31a6	[CostModel][x86] remove cost-kind predicate for intrinsic costs We model cost as number of instructions / uops, so it does not make sense to treat size/blended costs any differently than throughput.	2020-10-28 14:33:37 -04:00
Sanjay Patel	9df32c9044	[CostModel] remove cost-kind predicate for funnel shift costs Completing the series of FIXME removals for special-case intrinsics: `50dfa19cc7` `f2c25c7079` `c963bde015` `01ea93d85d` This one looks quite different than the others. The size/blended cost is still potentially very far off from the throughput cost, but this is hopefully not worse on the whole. It looks like the underlying costs for the expanded shift/logic have their own cost-kind limitations. Also, we are not asking the target if it has a legal funnel shift op, so we just assume that the intrinsic gets expanded.	2020-10-28 14:02:34 -04:00
David Green	066737fdbc	[AArch64] Remove AArch64ISD::NOT, use vnot instead vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT node, but allow more folding thanks to all the target independent optimizations. Specifically this allows select(icmp ne, x, y) to become "cmeq; bsl y, x" as opposed to needing to convert the predicate with "cmeq; mvn; bsl x, y" Unfortunately there is a regression in a cmtst test, but the code it selected from was already non-canonical, with instcombine preferring to use an eq predicate instead. Plus the more common case of icmp ne is improved. Differential Revision: https://reviews.llvm.org/D90126	2020-10-28 08:15:37 +00:00
Sanjay Patel	50dfa19cc7	[CostModel] remove cost-kind predicate for FP add/mul vector reduction costs This was originally part of: `f2c25c7079` but that was reverted because there was an underlying bug in processing the vector type of these intrinsics. That was fixed with: `74ffc823ed` This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-27 18:00:20 -04:00
Sanjay Patel	138fda5dd2	[CostModel] add tests for FP reductions; NFC	2020-10-27 18:00:20 -04:00
Bing1 Yu	2c08f1b4b6	[CostModel][X86] teach TTI to calculate the cost of a chain of vector inserts/extracts more precisely and correctly: In each 128-bit lane, if at least one index is demanded but not all indices are demanded, and this lane is not the first 128-bit lane of the legalized vector, then this lane needs an extracti128. If at least one index in a 128-bit lane is demanded, this lane needs an inserti128. The following cases illustrate this. Assume we insert several elements into a v8i32 vector on AVX2: Case #1: inserting into index 1 needs vpinsrd + inserti128. Case #2: inserting into index 5 needs extracti128 + vpinsrd + inserti128. Case #3: inserting into indices 4, 5, 6 and 7 needs 4*vpinsrd + inserti128. Reviewed By: pengfei, RKSimon Differential Revision: https://reviews.llvm.org/D89767	2020-10-27 11:21:13 +08:00
Joe Ellis	0383a1a8c2	[SVE][AArch64] Fix TypeSize warning in GEP cost analysis The warning would fire when calling getGEPCost for analyzing the cost of a GEP instruction. This would result in the use of the now deprecated implicit cast of TypeSize to uint64_t through the overloaded operator. This patch fixes the issue by using getKnownMinSize instead of the implicit cast. This is possible because the code is already scalable-vector aware. The semantic behaviour of the code is unchanged by this patch. Reviewed By: sdesmalen, fpetrogalli Differential Revision: https://reviews.llvm.org/D89872	2020-10-26 17:40:19 +00:00
Tyker	d3205bbca3	[Annotation] Allows annotation to carry some additional constant arguments. This allows annotations to be used in many more contexts than currently, especially with templates or constexpr. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D88645	2020-10-26 10:50:05 +01:00
Sanjay Patel	f2c25c7079	[CostModel] remove cost-kind predicate for some vector reduction costs This is a modified 2nd try of `22d10b8ab4` (reverted by `1c8371692d` because it managed to expose an existing crashing bug that should be fixed by `74ffc823` ). Original commit message: This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. The ARM costs show a small difference between throughput and size because there's an underlying difference in cmp/sel costs that is also predicated on cost-kind. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-25 15:17:52 -04:00
Sanjay Patel	74ffc823ed	[CostModel] fix operand/type accounting for fadd/fmul reductions I'm not sure if/how this ever worked, but it must not be tested currently because the basic tests added here were crashing as noted in the post-review comments for `1c83716` (which reverted another cost-model fix in `22d10b8ab4`).	2020-10-25 15:01:19 -04:00
Martin Storsjö	1c8371692d	Revert "[CostModel] remove cost-kind predicate for vector reduction costs" This reverts commit `22d10b8ab4`. This broke compilation, e.g. like this: $ cat synth.c a; float b; c() { for (;;) { float d = -b a++; d -= --b * a++; d -= --b * a; d -= --b * a; e(d); } } $ clang -target x86_64-linux-gnu -c -O2 -ffast-math synth.c clang: ../include/llvm/Support/Casting.h:104: static bool llvm::isa_impl_cl<To, const From>::doit(const From*) [with To = llvm::PointerType; From = llvm::Type]: Assertion `Val && "isa<> used on a null pointer"' failed.	2020-10-25 08:47:54 +02:00
Sanjay Patel	22d10b8ab4	[CostModel] remove cost-kind predicate for vector reduction costs This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant targets could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. The ARM costs show a small difference between throughput and size because there's an underlying difference in cmp/sel costs that is also predicated on cost-kind. Paraphrasing from the previous commits: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-24 13:20:17 -04:00
dfukalov	9068c20965	[AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions. 1. Throughput and codesize cost estimations were separated and updated. 2. Updated fdiv cost estimation for different cases. 3. Added scalarization processing for types that are treated as !isSimple() to improve codesize estimation in getArithmeticInstrCost() and getArithmeticInstrCost(). The code was borrowed from the TCK_RecipThroughput path of the base implementation. The next step is to unify the scalarization part in the base class, which currently works for the TCK_RecipThroughput path only. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89973	2020-10-24 19:53:08 +03:00
Florian Hahn	089c1ccd6d	[AArch64] Add vector compare/select cost-model tests.	2020-10-23 20:43:04 +01:00
Florian Hahn	0fcc6f7a76	[AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics. This patch adds a specialized implementation of getIntrinsicInstrCost and adds initial cost-modeling for min/max vector intrinsics. AArch64 NEON supports umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953	2020-10-23 11:32:42 +01:00
Florian Hahn	c1705e0ba4	[AArch64] Add min/max cost-model tests for v2i32.	2020-10-22 16:04:13 +01:00
Florian Hahn	d6efc87518	[AArch64] Add min/max cost-model tests for v4i16.	2020-10-22 15:47:50 +01:00
Florian Hahn	fbb6375db0	[AArch64] Add cost model tests for min/max intrinsics.	2020-10-22 13:28:04 +01:00
Sanjay Patel	c963bde015	[CostModel] remove cost-kind predicate for scatter/gather cost This is similar in spirit to `01ea93d85d` (memcpy) except that here the underlying caller assumptions were created for vectorizer use (throughput) rather than other passes. That meant ARM could have an enormous throughput cost with no corresponding size, latency, or blended cost increase. X86 has the same throughput restriction as the basic implementation, so it is still unchanged. Paraphrasing from the previous commit: This may not make sense for some callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. Targets should provide better overrides if the current modeling is not accurate.	2020-10-21 14:26:05 -04:00
Sanjay Patel	729610a51a	[ARM] add cost-kind tests for intrinsics; NFC This is a copy of the x86 file to provide better coverage; x86 may have strange overrides that mask changes in the generic model.	2020-10-21 14:26:04 -04:00
Sanjay Patel	01ea93d85d	[CostModel] remove cost-kind predicate for memcpy cost The default implementation base returns TCC_Expensive (currently set to '4'), so that explains the test diff. This probably does not make sense for most callers, but at least now the costs will be consistently wrong instead of mysteriously wrong. The ARM target has an override that tries to model codegen expansion, and that should likely be adapted for general usage. This probably does not affect anything because the vectorizers are the primary users of the throughput cost, but memcpy is not listed as a trivially vectorizable intrinsic.	2020-10-21 08:50:44 -04:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdups, adds and cmps. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so this makes the assumption that the vectorizer is adding them, and only adds them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
Sanjay Patel	9f6048f83d	[CostModel] remove cost-kind predicate for ctlz/cttz intrinsics in basic TTI implementation The cost modeling for intrinsics is a patchwork based on different expectations from the callers, so it's a mess. I'm hoping to untangle this to allow canonicalization to the new min/max intrinsics in IR. The general goal is to remove the cost-kind restriction here in the basic implementation class. Ie, if some intrinsic has throughput cost of 104, assume that it has the same size, latency, and blended costs. Effectively, an intrinsic with cost N is composed of N simple instructions. If that's not correct, the target should provide a more accurate override. The x86-64 SSE2 subtarget cost diffs require explanation: 1. The scalar ctlz/cttz are assuming "BSR+XOR+CMOV" or "TEST+BSF+CMOV/BRANCH", so not cheap. 2. The 128-bit SSE vector width versions assume cost of 18 or 26 (no explanation provided in the tables, but this corresponds to a bunch of shift/logic/compare). 3. The 512-bit vectors in the test file are scaled up by a factor of 4 from the legal vector width costs. 4. The plain latency cost-kind is not affected in this patch because that calc is diverted before we get to getIntrinsicInstrCost(). Differential Revision: https://reviews.llvm.org/D89461	2020-10-15 13:14:41 -04:00
Sanjay Patel	ef748583c2	[CostModel] rearrange basic intrinsic cost implementation This is bigger/uglier than before, but it should allow fixing all of the broken paths more easily. Test coverage added with rGfab028b and other commits. This is not NFC - the scalable vector test would crash without this patch.	2020-10-13 11:52:00 -04:00
Sanjay Patel	1b94261e36	[x86] add cost model test for memcpy; NFC This is treated as a special-case in the base class implementation of getIntrinsicInstrCost().	2020-10-13 11:42:44 -04:00
Sanjay Patel	1c90878e60	[AArch64] fix spacing in test's RUN lines; NFC	2020-10-13 10:44:18 -04:00
Sanjay Patel	fab028b914	[x86] add tests for cost model kinds of intrinsics; NFC This provides coverage for existing special-cases and a sampling of other intrinsics. Current output appears to be wrong in several cases.	2020-10-13 10:39:43 -04:00
Sanjay Patel	937d782e38	[AArch64] add cost model test for scalable vector math; NFC Testing for the various cost model "TargetCostKind" is limited, and testing for scalable vectors is limited. The motivating example of an intrinsic is not included here yet because that just crashes.	2020-10-13 08:39:04 -04:00
Simon Pilgrim	913d7a110e	[X86][SSE2] Use smarter instruction patterns for lowering UMIN/UMAX with v8i16. This is my first LLVM patch, so please tell me if there are any process issues. The main observation for this patch is that we can lower UMIN/UMAX with v8i16 by using unsigned saturated subtractions in a clever way. Previously this operation was lowered by flipping the sign bit of both inputs and the output, which turns the unsigned minimum/maximum into a signed one. We could use this trick in reverse for lowering SMIN/SMAX with v16i8 instead. In terms of latency/throughput, this needs one large move instruction. It's just that the sign-bit flipping has an increased chance of being optimized further. This is particularly apparent in the "reduce" test cases. However, due to the slight regression in the single use case, this patch no longer proposes this. Unfortunately this argument also applies in reverse to the new lowering of UMIN/UMAX with v8i16, which regresses the "horizontal-reduce-umax", "horizontal-reduce-umin", "vector-reduce-umin" and "vector-reduce-umax" test cases a bit with this patch. Maybe some extra casework would be possible to avoid this. However, independent of that, I believe that the benefits in the common case of just 1 to 3 chained min/max instructions outweigh the downsides in that specific case. Patch By: @TomHender (Tom Hender) ActuallyaDeviloper Differential Revision: https://reviews.llvm.org/D87236	2020-10-11 11:21:23 +01:00
David Green	4c3515cd62	[ARM] Add MVE vecreduce costmodel tests. NFC There were some existing tests that were not super useful. New ones are added for testing MVE specific patterns.	2020-10-09 16:25:25 +01:00
Amara Emerson	322d0afd87	[llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787	2020-10-07 10:36:44 -07:00
Sanjay Patel	816b0a9c9f	[CostModel] add cl option to check size and latency costs; NFC This is a setting used by SimplifyCFG, LoopUnroll, and InlineCost, but there is apparently no direct test coverage for any of those cost model values.	2020-09-27 09:52:56 -04:00
Jonas Paulsson	370a8c8025	[SystemZ] Make sure not to call getZExtValue on a >64 bit constant. Use isZero() and isIntN() in SystemZTargetTransformInfo instead of calling getZExtValue(), since the immediate operand may be wider than 64 bits, which getZExtValue() does not allow. Fixes https://bugs.llvm.org/show_bug.cgi?id=47600 Review: Simon Pilgrim	2020-09-23 15:36:32 +02:00
Bing1 Yu	ec24e50553	[CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8) Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87884	2020-09-23 10:29:10 +08:00
Simon Pilgrim	18a3ebcd30	[CostModel][X86] Add some select shuffle costs tests for D87884	2020-09-21 16:09:05 +01:00
Simon Pilgrim	de25ebaac6	[CostModel][X86] Add vXi32 division by uniform constant costs (PR47476) Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.	2020-09-10 12:17:54 +01:00
Sam Parker	0af4147804	[ARM][CostModel] CodeSize costs for i1 arith ops When optimising for size, make the cost of i1 logical operations relatively expensive so that optimisations don't try to combine predicates. Differential Revision: https://reviews.llvm.org/D86525	2020-09-07 09:27:18 +01:00
Anna Welker	064981f0ce	[ARM][MVE] Enable MVE gathers and scatters by default Enable MVE gather/scatters by default, which requires some minor adaptations in some tests. Differential revision: https://reviews.llvm.org/D86776	2020-08-28 19:05:29 +01:00
David Green	677c1590c0	[ARM] Increase MVE gather/scatter cost by MVECostFactor. MVE gather/scatter code generation is looking a lot better than it used to, but still has some issues. We currently model the instructions as 1 cycle per element, which is a bit low for some cases. Increasing the cost by the MVECostFactor brings them in line with our other instruction costs. This will have the effect of only generating them when the extra benefit is more likely to overcome some of the issues, notably running out of registers and vectorizing loops that could otherwise be SLP vectorized. In the short term, whilst we look at other ways of dealing with those more directly, we can increase the costs of gathers to make them more likely to be beneficial when created. Differential Revision: https://reviews.llvm.org/D86444	2020-08-26 13:03:46 +01:00
Sam Parker	da4ada116e	[NFC][ARM] arith code size cost tests Add a run to measure the code size cost of arithmetic instructions and add a function for i1 types.	2020-08-25 11:16:01 +01:00
David Sherwood	7b64765cd1	[SVE] Fix TypeSize related warnings with IR truncates of scalable vectors In getCastInstrCost when the instruction is a truncate we were relying upon the implicit TypeSize -> uint64_t cast when asking if a given type has the same size as a legal integer. I've changed the code to only ask the question if the type is fixed length. I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail out for now if the type is a scalable vector. I've added the following new tests: Analysis/CostModel/AArch64/sve-trunc.ll Transforms/InstCombine/AArch64/sve-trunc.ll for both of these fixes. Differential revision: https://reviews.llvm.org/D86432	2020-08-25 09:17:56 +01:00
Christopher Tetreault	5eff21c8ff	[NFC][documentation] clarify comment in test test referenced a relative path to a file, but the path was not correct relative to the project the test is in Differential Revision: https://reviews.llvm.org/D86368	2020-08-21 14:30:47 -07:00
Sam Parker	acf0bb41e4	[ARM][CostModel] Select instruction costs. Modify the ARM getCmpSelInstrCost implementation for the code size costs of selects. Now consider the legalization cost and increase the cost of i1 because those values wouldn't live in a general purpose register. We also make selects +1 more expensive to account for the IT instruction. Differential Revision: https://reviews.llvm.org/D82091	2020-08-21 08:49:56 +01:00
dfukalov	4ccc38813e	[AMDGPU][CostModel] Add f16, f64 and contract cases to fused costs estimation. Add cases of fused fmul+fadd/fsub with f16 and f64 operands to cost model. Also added operations with contract attribute. Fixed line endings in test. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84995	2020-08-06 21:43:27 +03:00
Sam Parker	f2675ab45f	[ARM][CostModel] Implement getCFInstrCost As with other targets, set the throughput cost of control-flow instructions to free so that we don't miss out on vectorization opportunities. Differential Revision: https://reviews.llvm.org/D85283	2020-08-05 12:44:51 +01:00
Simon Pilgrim	d1abca187d	[CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics	2020-07-29 15:55:43 +01:00
Simon Pilgrim	0a0f28254a	[CostModel][X86] Add SSE costs for ABS intrinsics	2020-07-29 14:33:59 +01:00
David Green	9ddb28964c	[ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores This patch uses the feature added in D79162 to fix the cost of a sext/zext of a masked load, or a trunc for a masked store. Previously, those were considered cheap or even free, but it's not the case as we cannot split the load in the same way we would for normal loads. This updates the costs to better reflect reality, and adds a test for it in test/Analysis/CostModel/ARM/cast.ll. It also adds a vectorizer test that showcases the improvement: in some cases, the vectorizer will now choose a smaller VF when tail-predication is enabled, which results in better codegen. (Because if it were to use a higher VF in those cases, the code we see above would be generated, and the vmovs would block tail-predication later in the process, resulting in very poor codegen overall) Original Patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79163	2020-07-29 13:41:34 +01:00
Simon Pilgrim	c5ef1f1edd	[TTI] Add default cost expansion for abs/smax/smin/umax/umin intrinsics	2020-07-29 12:13:06 +01:00
Simon Pilgrim	3f7249046a	[CostModel][X86] Add smax/smin/umin/umax intrinsics cost model tests Costs currently fall back to scalar generic intrinsic calls	2020-07-28 19:56:11 +01:00
Simon Pilgrim	c6920081a8	[CostModel][X86] Add abs intrinsics cost model tests abs costs currently fall back to scalar generic intrinsic calls	2020-07-28 19:56:10 +01:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite, revert while investigation.	2020-07-27 21:07:00 +00:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch removes support for QPX/A2Q in llvm, BGQ/BGP in clang, and CNK in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
dfukalov	76a0c0ee6f	[AMDGPU][CostModel] Improve cost estimation for fused {fadd|fsub}(a,fmul(b,c)) Summary: If the result of fmul(b,c) has one use, in almost all cases (except when denormals are IEEE) the pair of operations will be fused into one fma/mad/mac/etc. Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits, kerbowa Tags: #llvm Differential Revision: https://reviews.llvm.org/D83919	2020-07-16 03:06:38 +03:00
David Sherwood	c06b7e2ab5	[SVE] Fix implicit TypeSize->uint64_t conversion getCastInstrCost In getCastInstrCost() when comparing different sizes for src and dst types we should be using the TypeSize comparison operators instead of relying upon TypeSize being converted to a uint64_t. Previously this meant we were dropping the scalable property and treating fixed and scalable vector types the same. Differential Revision: https://reviews.llvm.org/D83461	2020-07-14 08:16:31 +01:00
Stanislav Mekhanoshin	f7a7efbf88	[AMDGPU] Tweak getTypeLegalizationCost() Even though wide vectors are legal they still cost more as we will have to eventually split them. Not all operations can be uniformly done on vector types. Conservatively add the cost of splitting at least to 8 dwords, which is our widest possible load. We are more or less lying to the cost model with this change, but this can prevent the vectorizer from creating wide vectors, which result in RA problems for us. Differential Revision: https://reviews.llvm.org/D83078	2020-07-06 14:07:48 -07:00
David Green	146dad0077	[ARM] MVE FP16 cost adjustments This adjusts the MVE fp16 cost model, similar to how we already do for integer casts. It uses the base cost of 1 per cvt for most fp extend / truncates, but adjusts it for loads and stores where we know that an extending load has been used to get the load into the correct lane, and only an MVE VCVTB is then needed. Differential Revision: https://reviews.llvm.org/D81813	2020-07-06 15:57:51 +01:00
David Green	afdb2ef2ed	[ARM] Adjust default fp extend and trunc costs This adds some default costs for fp extends and truncates, generally costing them as 1 per lane. If the type is not legal then the cost will include a call to an __aeabi_ function. Some NEON code is also adjusted to make sure it applies to the expected types, now that fp16 is a more common thing. Differential Revision: https://reviews.llvm.org/D82458	2020-07-06 14:23:17 +01:00
David Green	60b8b2beea	[ARM] Add extra extend and trunc costs for cast instructions This expands the existing extend costs with a few extras for types larger than legal, which will usually be split under MVE. It also adds trunc support for the same thing. These should not have a large effect on many things, but make the costs explicit and keep a certain balance between the truncs and extends. Differential Revision: https://reviews.llvm.org/D82457	2020-07-06 11:33:05 +01:00
David Green	55227f85d0	[ARM] Use BaseT::getMemoryOpCost for getMemoryOpCost This alters getMemoryOpCost to use the Base TargetTransformInfo version that includes some additional checks for whether extending loads are legal. This will generally have the effect of making <2 x ..> and some <4 x ..> loads/stores more expensive, which in turn should help favour larger vector factors. Notably it alters the cost of a <4 x half>, which with the current codegen will be expensive if it is not extended. Differential Revision: https://reviews.llvm.org/D82456	2020-07-06 10:58:40 +01:00
Florian Hahn	1ccc49924a	[AArch64] Add getCFInstrCost, treat branches as free for throughput. D79164/2596da31740f changed getCFInstrCost to return 1 per default. AArch64 did not have its own implementation, hence the throughput cost of CFI instructions is overestimated. On most cores, most branches should be predicted and essentially free throughput-wise. This fixes a 9% performance regression on a SPEC2006 benchmark on AArch64 with -O3 LTO & PGO. This patch effectively restores pre `2596da3174` behavior for AArch64 and undoes the AArch64 test changes of the patch. Reviewers: samparker, dmgreen, anemet Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D82755	2020-06-30 20:34:04 +01:00
David Green	f14457f5d8	[ARM] Split cast cost tests, and add masked load/store tests. NFC This file has grown quite large and could do with being split up. This splits away the load/store + cast tests into a separate file. Some masked load/store + cast tests have been added too, along with some extra load/store + fpcast tests.	2020-06-25 13:24:17 +01:00
dfukalov	129388ddc4	[AMDGPU][CostModel] Add fneg cost estimation Summary: The estimation uses AMDGPUTargetLowering::isFNegFree() Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82065	2020-06-19 17:31:35 +03:00
Paul Walker	4612f39120	[SVE] Add flag to specify SVE register size, using this to calculate legal vector types. Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE data registers (in bits) to be specified. This allows the code generator to make assumptions it normally couldn't. As a starting point this information is used to mark fixed length vector types that can fit within the specified size as legal. Reviewers: rengolin, efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80384	2020-06-18 12:11:16 +00:00
Sam Parker	2596da3174	[CostModel] getCFInstrCost in getUserCost. Have BasicTTI call the base implementation so that both agree on the default behaviour, with the default being a cost of '1'. This has required an X86 specific implementation as it seems to be very reliant on those instructions being free. Changes are also made to AMDGPU so that their implementations distinguish between cost kinds, so that the unrolling isn't affected. PowerPC also has its own implementation to prevent changes to the reg-usage vectorizer test. The cost model test changes now reflect that ret instructions are not generally free. Differential Revision: https://reviews.llvm.org/D79164	2020-06-15 09:28:46 +01:00
David Green	7507186b94	[ARM] Additional cast cost tests. This adds additional cast cost tests useful for MVE, notably around half types.	2020-06-14 14:30:07 +01:00
Simon Pilgrim	28947bc23c	[CostModel][X86] Add broadcast costs for vXi1 bool vectors Doesn't mean much on non-AVX512 targets but better to keep with the other shuffles	2020-06-10 15:27:15 +01:00
Sam Parker	e70cf280f8	[NFC][ARM][AArch64] Test runs Add code size tests runs for memory ops for both architectures.	2020-06-02 09:05:30 +01:00
Sam Parker	792575ff32	[NFC][ARM][AArch64] More code size tests Add analysis runs for icmp, fcmp and select instructions.	2020-05-26 14:47:02 +01:00
Sam Parker	c5bbc8dd6d	[NFC][ARM] Fix for previous commit Actually analyse code-size for the size runs...	2020-05-26 10:45:35 +01:00
Sam Parker	48cdbd081c	[NFC][ARM] Add code size analysis tests Add code size runs for the cast costs.	2020-05-26 10:30:43 +01:00
Sam Parker	64cfb8a864	[NFC][ARM] Add intrinsic code size runs Add code size analysis of arithmetic intrinsics.	2020-05-26 09:41:54 +01:00
Sam Parker	1f72d5880e	[CostModel] Check for free intrinsics in BasicTTI Recommitting part of "[CostModel] Unify Intrinsic Costs." `de71def3f5` Now that the 'free' intrinsic information has been sunk to the lowest level, query the base implementation in BasicTTI before doing anything else. I suspect this is the change that was causing the main changes, particularly the large effects on debug builds. Differential Revision: https://reviews.llvm.org/D80012	2020-05-26 08:37:13 +01:00
Sam Parker	fb3ba38021	[CostModel] Remove getExtCost This has not been implemented by any backends which appear to cover the functionality through getCastInstrCost. Sink what there is in the default implementation into BasicTTI. Differential Revision: https://reviews.llvm.org/D78922	2020-05-21 07:18:06 +01:00
Eli Friedman	11aa3707e3	StoreInst should store Align, not MaybeAlign This is D77454, except for stores. All the infrastructure work was done for loads, so the remaining changes necessary are relatively small. Differential Revision: https://reviews.llvm.org/D79968	2020-05-15 12:26:58 -07:00
Sam Parker	0ef62fc25d	[NFC][ARM] Intrinsic CostModel Tests Add throughput tests for saturating, overflowing and reduction operations.	2020-05-15 13:38:42 +01:00
Stanislav Mekhanoshin	184b383457	Add v16f64 value type We need to use it to handle <16 x double> indirect indexes in the AMDGPU BE. The only visible change from adding it is in the ARM cost model. To me it looks reasonable. When the vector size doubles, the cost quadruples up to size 8, but beyond that it previously only doubled. Now it also quadruples, which seems a logical progression to me. Actual AMDGPU code is to follow; this is the common part, plus load/store legalization in the AMDGPU BE so as not to break what works now. Differential Revision: https://reviews.llvm.org/D79952	2020-05-14 14:28:00 -07:00
Sam Parker	6bbad7285c	[CostModel] Modify BasicTTI getCastInstrCost Fix the assumption that all bitcasts of the same type sizes are free. We now only assume that bitcasts between ints and ptrs of the same size are free. This allows TTImpl to just call the concrete implementation of getCastInstrCost. Differential Revision: https://reviews.llvm.org/D78918	2020-05-13 07:26:08 +01:00
Sam Parker	f1f8cffce4	[NFC][AArch64] More casts tests... Don't use truncs as users because sometimes they're free too.	2020-05-12 13:06:17 +01:00
Sam Parker	e114bdf072	[NFC][AArch64] More cast cost tests Add truncating stores and casts with users.	2020-05-12 11:32:52 +01:00
Sam Parker	b4a8091a11	[ARM][CostModel] Improve getCastInstrCost - Specifically check for sext/zext users which have 'long' form NEON instructions. - Add more entries to the table for sext/zexts so that we can report more accurately the number of vmovls required for NEON. - Pass the instruction to the base implementation. Differential Revision: https://reviews.llvm.org/D79561	2020-05-12 10:32:20 +01:00
Sam Parker	1952c86d61	[AArch64][CostModel] getCastInstrCost Pass the instruction to the base implementation. Differential Revision: https://reviews.llvm.org/D79562	2020-05-12 10:02:29 +01:00
Sam Parker	494c7ecef9	[NFC][AArch64] Update tests Add cost model tests for extending loads.	2020-05-12 08:49:05 +01:00
Sam Parker	751da4d596	[NFC][AArch64] Add test Add cost model test for cast operations.	2020-05-07 13:16:03 +01:00
Craig Topper	e39c7ab2b9	[CostModel][X86][ARM] Teach default implementation of getCastInstrCost to not add a split/join cost if source type and the destination type both have a SplitVector action If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join. Differential Revision: https://reviews.llvm.org/D79111	2020-05-01 18:55:23 -07:00
Craig Topper	b938168aef	[X86] Lower the cost of v4i64->v4i32 truncate with avx512. We use the vpmovqd instruction which is a single uop. So the cost should be 1.	2020-05-01 11:09:37 -07:00
Craig Topper	6a1ad76dab	[X86] Don't return true from isTruncateFree for vectors Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to. Differential Revision: https://reviews.llvm.org/D79045	2020-04-30 16:43:35 -07:00
Craig Topper	ff66919020	[X86][CostModel] Bump the cost of vpermw/vpermt2b/vperm2w vpermw is 2 uops. vpermt2b/vpermt2w are two shuffle uops and a port 015 uop. Weirdly vpermb is a single uop. This patch bumps the cost to 2 for these operations. Maybe should go to 3 for the vpermt2*, but I've started conservative. I've also removed a few entries that were now the same as earlier subtargets or that I didn't think we really did. Like I don't think we extend v32i8 to v32i16, shuffle, and then truncate. Differential Revision: https://reviews.llvm.org/D79148	2020-04-30 11:32:25 -07:00
Craig Topper	cff6686532	[X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these For v4i64->v4i32 we generate: vextractf128 xmm1, ymm0, 1 vshufps xmm0, xmm0, xmm1, 136 # xmm0 = xmm0[0,2],xmm1[0,2] And for v8i64->v8i32 we generate: vperm2f128 ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3] vinsertf128 ymm0, ymm0, xmm1, 1 vshufps ymm0, ymm0, ymm2, 136 # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6] Differential Revision: https://reviews.llvm.org/D79109	2020-04-29 13:21:44 -07:00
Sam Parker	e9d0f1c8ea	[NFC][ARM] Modify cost model test	2020-04-29 12:42:47 +01:00
Sam Parker	850bdefa65	[NFC][ARM] Add two cost model tests	2020-04-29 12:36:05 +01:00
Simon Pilgrim	090cae8491	[TTI] Add DemandedElts to getScalarizationOverhead The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization rising higher than expected. This is particularly noticeable on pre-SSE4 targets where the availability of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things: 1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern. 2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs. This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well. Reviewed By: @craig.topper Differential Revision: https://reviews.llvm.org/D78216	2020-04-29 12:00:38 +01:00
Craig Topper	59b9e6fe76	[X86] Update costs for truncates from less than 128-bit vectors to vXi1 on pre-avx512 targets vXi1 types are legalized by promoting, but the narrow vectors are legalized by widening. This results in some truncates turning into any_extends.	2020-04-28 11:35:41 -07:00
Craig Topper	d42192c50f	[X86][CostModel] Correct the costs for truncate to a mask register with avx512 I've modified isTruncateFree to get an accurate cost for types that need to be split. I'm planning to look into fixing it for all vectors, but need more cost cleanups first. Differential Revision: https://reviews.llvm.org/D78973	2020-04-28 10:39:36 -07:00
Craig Topper	9ea5cc8a25	[X86][CostModel] Add vXiY->vXi1 truncate tests to min-legal-vector-width.ll. NFC	2020-04-27 15:48:11 -07:00
Craig Topper	37ec709233	[X86][CostModel] Update truncate costs for some narrow vector cases to match their wider version. This updates v4i16->v4i8 with sse2 to match v8i16->v8i8. Update v2i16->v2i8 and v4i16->v4i8 with sse 4.1 to match v8i16->v8i8.	2020-04-27 13:47:48 -07:00
Craig Topper	bdbbed115f	[X86][CostModel] Update costs for vector truncate with avx512f/avx512bw. All avx512 truncate instructions except vXi64->vXi32 are 2 uops on port 5. So raise their costs to 2, except when we have an earlier, faster sequence like pshufb for 128-bit input vectors. Add a lower cost of 3 for v16i16->v16i8 with avx512f, where we can extend to v16i32 then truncate, and a cost of 2 for avx512bw with and without avx512vl. There we can use vpmovwb with either a ymm or zmm input. Both of these beat masking, splitting, and using packuswb, which is our avx/avx2 codegen.	2020-04-27 12:00:24 -07:00
Craig Topper	5eff75d86a	[X86][CostModel] Improve costs for fp_to_uint/fp_to_sint for vXi8/vXi16/v2i32 results. Differential Revision: https://reviews.llvm.org/D78893	2020-04-27 10:35:15 -07:00
Craig Topper	8296bcf76f	[X86][CostModel] Fix typos in test. NFC	2020-04-26 21:17:38 -07:00
Craig Topper	5f2ea70980	[X86] Add cost model tests for conversions between <2 x float> and integers. For all but 2 x i32 we were starting from 4 x float.	2020-04-26 19:59:01 -07:00
Craig Topper	b9de62c2b6	[X86] Fix the cost of v16i1->v16i16 sext/zext on avx targets. Previously we were hitting the scalarization case in the default implementation.	2020-04-25 23:16:20 -07:00
Craig Topper	19cb26f517	[X86][CostModel] Improve costs for vXi1 sign_extend/zero_extend with avx512. With avx512 vXi1 is legal and uses k-registers with many custom cases for extending.	2020-04-25 23:16:20 -07:00
Craig Topper	084433702d	[X86][CostModel] Add sext/zext from vXi1 tests to min-legal-vector-width.ll. NFC We aren't properly costing extends from k-registers. I also added command lines without avx512bw to be able to show all the different extending strategies we have.	2020-04-25 23:15:40 -07:00
Craig Topper	061f330d7e	[X86] Add avx512vl to the truncate cost model test. NFC	2020-04-25 12:59:10 -07:00
Craig Topper	999058ba5e	[X86] Add cost model tests for truncating from v2i8/v4i8/v8i8/v16i8 to vXi1. NFC	2020-04-24 23:11:17 -07:00
Craig Topper	7664a0d282	[X86] Improve accuracy of cost for v16i64->v16i8 truncate with avx512. The 2 vpmovqds are only 1 uop each.	2020-04-24 19:13:55 -07:00
Craig Topper	03aa967c0d	[CostModel][X86][ARM] Teach getCastInstrCost to include the splitting factor when handling operations that type legalize to the same number of subvectors or scalar components Previously, we just always returned 1. But that ignores that we have to do the operation for each subvector or scalar component. Differential Revision: https://reviews.llvm.org/D78824	2020-04-24 13:36:26 -07:00
Craig Topper	4cf73a3fc6	[CostModel][X86] Account for splitting cost when vector zext/sext type legalize to the same size vector.	2020-04-24 09:59:23 -07:00
Sam Parker	04ef154124	[NFC] Test changes Add some more targets for the ARM cost model tests and add some tests for icmps and bitcasts.	2020-04-22 08:28:52 +01:00
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for vXi16/vXi8 vector types larger than 512 bits to pass in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. Cost model is still a bit of a mess. In some places I tried to match existing behavior. But really we need to account for splitting and concatenating costs. Cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Simon Pilgrim	2f951e99c6	[CostModel][X86] Regenerate load_store.ll costs tests Add SSE + AVX512 targets Add some illegal type store tests	2020-04-15 11:54:39 +01:00
Craig Topper	2f60fbce6c	[X86] Use a more realistic cost for truncate v16i64->v16i8 with avx512f. Still not great and we could probably codegen this better, but 11 was clearly ridiculous.	2020-04-13 21:09:43 -07:00
Craig Topper	071c64d68d	[X86] Add a more accurate truncate cost for v8i64->v8i8	2020-04-13 21:09:41 -07:00
Craig Topper	b37b1840eb	[X86] Add truncate cost model tests to min-legal-vector-width.ll for when we're avoiding 512 bit vectors.	2020-04-13 21:09:40 -07:00
Simon Pilgrim	353347288b	[CostModel][X86] Remove comments that begin with a filecheck prefix. Stop filecheck from confusing a general comment with a check.	2020-04-13 18:39:24 +01:00
Simon Pilgrim	91bc50c0d7	[CostModel][X86] Improve InsertElement costs for sub-128bit vectors If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs	2020-04-10 14:55:46 +01:00
Craig Topper	5625e6ab37	[X86] Improve min/max reduction costs. This is similar to what I recently did for getArithmeticReductionCost. I'm trying to account for the narrowing from 512->256->128 as we go. I've also added a new helper method getMinMaxCost that tries to handle the cases where we have native min/max instructions and fall back to cmp+select when we don't. Differential Revision: https://reviews.llvm.org/D76634	2020-04-09 17:28:50 -07:00
Simon Pilgrim	12c629ec6c	[CostModel][X86] Add shuffle costs for some common sub-128bit vectors v2i8/v4i8/v8i8 + v2i16/v4i16 all show up in vectorizer code and by just using the legalized types (v16i8/v8i16) we're highly exaggerating the actual cost of the shuffle.	2020-04-09 19:57:06 +01:00
Sanjay Patel	a2bb19ca42	[x86] add size cost tests for casts and binops; NFC Shows bugs for div/rem/fdiv and possibly others.	2020-04-06 12:38:15 -04:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
Simon Pilgrim	be84d2b5b7	[CostModel][X86] Add some insert subvector cost tests for vXf32/vXi32/vXi16/vXi8 types	2020-04-04 22:46:57 +01:00
Simon Pilgrim	6a57ba17c0	[CostModel][X86] Add shuffle cost tests for sub-128bit vectors	2020-04-04 13:08:25 +01:00
Simon Pilgrim	87fd686f6f	[CostModel][X86] Add insert/extract cost tests for sub-128bit vXi8/vXi16 vectors	2020-04-04 13:08:25 +01:00
Matt Arsenault	5660bb6bc9	AMDGPU: Remove denormal subtarget features Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.	2020-04-02 17:17:12 -04:00
Craig Topper	b4695351cb	[TTI][X86] Fix the value passed to IsUnsigned for cost modeling of experimental.vector.reduce.smin/smax/umin/umax. We were passing true for smax/smin and false for umax/umin.	2020-03-29 23:34:22 -07:00
Craig Topper	d74533a18b	[X86] Add sse4.1 RUN lines to the min/max reduction cost model tests. Mostly this matches the sse4.2 we already had command lines for, except in the i64 case since sse4.1 doesn't have pcmpgtq.	2020-03-29 16:05:35 -07:00
Craig Topper	c0aa97b632	[X86] Add cost model test cases for fmin/fmax reduction.	2020-03-28 17:12:49 -07:00
Craig Topper	f4c67dfa92	[X86] More accurately model the cost of horizontal reductions. This patch attempts to more accurately model the reduction of power of 2 vectors of types we natively support. This takes into account the narrowing of vectors that occurs as we go from 512 bits to 256 bits, to 128 bits. It also takes into account the use of wider elements in the shuffles for the first 2 steps of a reduction from 128 bits, and uses a v8i16 shift for the final step of vXi8 reduction. The default implementation uses the legalized type for the arithmetic for all levels, and uses the single source permute cost of the legalized type for all levels. This penalizes things like lack of v16i8 pshufb on pre-ssse3 targets and the splitting and joining that needs to be done for integer types on AVX1. We never need v16i8 shuffle for a reduction and we only need to split AVX1 ops when the type is wide and needs to be split. I think we're still over costing splits and joins for AVX1, but we're closer now. I've also removed all pairwise special casing because I don't think we ever want to generate that on X86. I've also adjusted the add handling to more accurately account for any type splitting that occurs before we reach a legal type. Differential Revision: https://reviews.llvm.org/D76478	2020-03-22 14:20:15 -07:00
Craig Topper	c13aa36bb7	[X86] Attempt to more accurately model the cost of a bool reduction of wide vector type. Previously we multiplied the cost for the table entries by the number of splits needed. But that implies that each split goes through a reduction to scalar independently. I think what really happens is that we AND/OR the split pieces until we're down to a single value with a legal type and then do a special reduction sequence on that. So to model that, this patch takes the number of splits minus one multiplied by the cost of an AND/OR at the legal element count and adds that on top of the table lookup. Differential Revision: https://reviews.llvm.org/D76400	2020-03-19 09:31:05 -07:00
Sam Parker	ef56b55e12	[NFC][ARM] Add thumb triple to test Test the costs of selects for thumbv8m.base too.	2020-03-18 09:15:19 +00:00
Craig Topper	b2da1ddaef	[X86] Add a non-zero cost for truncating v32i16->v32i8 on avx512bw.	2020-03-15 17:18:46 -07:00
Simon Pilgrim	a2db388dce	[CostModel][X86] Improve ISD::CTTZ costs accounting for BSF/TZCNT implementations	2020-03-13 16:51:13 +00:00
Anna Welker	a6d3bec83f	[TTI][ARM][MVE] Refine gather/scatter cost model Refines the gather/scatter cost model, but also changes the TTI function getIntrinsicInstrCost to accept an additional parameter which is needed for the gather/scatter cost evaluation. This did require trivial changes in some non-ARM backends to adopt the new parameter. Extending gathers and truncating scatters are now priced cheaper. Differential Revision: https://reviews.llvm.org/D75525	2020-03-11 10:23:41 +00:00
Simon Pilgrim	5cbddf7cbc	[X86][SSE] Add more accurate costs for fmaxnum/fminnum codegen Based off llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll	2020-03-10 11:59:40 +00:00
Simon Pilgrim	0b1dc6016f	[CostModel][X86] Add fmaxnum/fminnum costs tests	2020-03-10 11:18:27 +00:00
David Green	587feec07e	[ARM] Change all tests from "thumbv8.1-m.main" to "thumbv8.1m.main". NFC	2020-03-04 13:47:35 +00:00
Simon Pilgrim	174cb7c695	[CostModel][X86] Add vXi1 extract/insert cost tests	2020-03-02 11:41:20 +00:00
Simon Pilgrim	168a44a70e	[CostModel][X86] Improve extract/insert element costs (PR43605) This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index. It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true). Differential Revision: https://reviews.llvm.org/D74976	2020-02-27 15:54:13 +00:00
Simon Pilgrim	b82438872b	[CostModel][X86] We don't need a scale factor for SLM extract costs D74976 will handle larger vector types, but since SLM doesn't support AVX+ then we will always be extracting from 128-bit vectors so don't need to scale the cost.	2020-02-24 14:23:04 +00:00
Simon Pilgrim	eaa41e103c	[CostModel][X86] Try to check against common prefixes before using target-specific cpu checks SLM/GLM is still a mess so not all of them have been updated yet.	2020-02-24 11:59:07 +00:00
Jonas Paulsson	41bd9ead35	[SystemZ] Return scalarized costs for vector instructions on older archs. A cost query for a vector instruction should return a cost even without target vector support, and not trigger an assert. VectorCombine does this with an input containing source code vectors. Review: Ulrich Weigand	2020-02-21 09:17:37 -08:00
Craig Topper	35625464c6	[X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2 We seem to be inheriting the cost from sse4.1. But if we have 256-bit registers we should be able to do this with just one extract to split the 16i16 and two v8i16->v8i32 operations so our cost should be 3 not 4. Differential Revision: https://reviews.llvm.org/D73646	2020-01-29 15:52:10 -08:00
David Green	e9c198278e	[ARM] Basic gather scatter cost model This is a very basic MVE gather/scatter cost model, based roughly on the code that we will currently produce. It does not handle truncating scatters or extending gathers correctly yet, as it is difficult to tell that they are going to be correctly extended/truncated from the limited information in the cost function. This can be improved as we extend support for these in the future. Based on code originally written by David Sherwood. Differential Revision: https://reviews.llvm.org/D73021	2020-01-22 14:41:38 +00:00
David Green	0b83e14804	[ARM] MVE Gather Scatter cost model tests. NFC	2020-01-22 14:41:38 +00:00
Simon Pilgrim	5d986a68a5	[CostModel][X86] Add missing scalar i64->f32 uitofp costs	2020-01-06 13:17:02 +00:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Stanislav Mekhanoshin	58578f7056	[AMDGPU] Implemented fma cost analysis Differential Revision: https://reviews.llvm.org/D71676	2019-12-18 23:54:20 -08:00
Stanislav Mekhanoshin	b8ac5894a1	[AMDGPU] Fixed cost model for packed 16 bit ops Differential Revision: https://reviews.llvm.org/D71622	2019-12-17 15:14:17 -08:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instructions that the shift can be folded into includes ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
David Green	f008b5b8ce	[ARM] Additional tests and minor formatting. NFC This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.	2019-12-09 10:24:33 +00:00
Sanjay Patel	7ff0fcb53f	[x86] add cost model special-case for insert/extract from element 0 This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023	2019-12-06 13:50:25 -05:00
Stefan Pintilie	8e84c9ae99	[PowerPC] Separate Features that are known to be Power9 specific from Future CPU The Power 9 CPU has some features that are unlikely to be passed on to future versions of the CPU. This patch separates this out so that future CPU does not inherit them. Differential Revision: https://reviews.llvm.org/D70466	2019-11-27 15:40:13 -06:00
Sanjay Patel	5c166f1d19	[x86] make SLM extract vector element more expensive than default I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607	2019-11-27 14:08:56 -05:00
Matt Arsenault	b337bce871	AMDGPU: Split test functions to avoid dependency on subtarget Prepare this test for moving the denormal setting out of the subtarget features.	2019-11-19 11:12:13 +05:30
Craig Topper	a4b7613a49	[X86] Remove setOperationAction for FP_TO_SINT v8i16. This is no longer needed after widening legalization as we custom legalize v8i8 ourselves. Added entries to the cost model, but bumped the cost slightly to account for the truncate shuffle that wasn't costed before.	2019-11-12 22:45:52 -08:00
Tim Renouf	0703db3989	[CostModel] Fixed isExtractSubvectorMask for undef index off end ShuffleVectorInst::isExtractSubvectorMask, introduced in [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) erroneously thought that %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef> is a subvector extract, even though it goes off the end of the parent vector with the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead. This commit fixes that, by not considering the above a subvector extract. Differential Revision: https://reviews.llvm.org/D70005 Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf	2019-11-08 15:40:09 +00:00
dfukalov	6e8251046b	[AMDGPU] Fix bug introduced in `47a5c36b37` Summary: [AMDGPU] Fix bug introduced in `47a5c36b37` Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69915	2019-11-07 11:50:14 +03:00
Simon Pilgrim	a091f70610	[CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM As noted on D59710 we weren't handling the high costs of these operations on SLM.	2019-11-06 17:55:38 +00:00
Simon Pilgrim	1b986b41ac	[CostModel][X86] Add add/fadd reduction tests for SLM	2019-11-06 17:04:22 +00:00
dfukalov	47a5c36b37	[AMDGPU] Improve code size cost model (part 2) Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629	2019-11-06 13:55:48 +03:00
Craig Topper	103968d147	[X86] Lower the cost of avx512 horizontal bool and/or reductions to 2*log2(bitwidth)+1 for legal types. This better represents the kshift+binop we'd get for each stage before the final extract. It's likely we'll do even better by doing a kmov and a cmp with a GPR, but this is a good start. The default handling was costing a worst case single source permute shuffle of the vector before the binop. This worst case assumes the shuffle might have to be emulated with extracts and inserts. But since we know we're doing a reduction we can assume we'll get kshift lowering. There's still some room for improvement here, but this is much better than it was.	2019-11-04 22:58:04 -08:00
Daniil Fukalov	3972057511	[AMDGPU] Improve code size cost model Summary: Added estimation for zero size insertelement, extractelement and llvm.fabs operators. Updated inline/unroll parameters default values. Reviewers: rampitec, arsenm Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68881 llvm-svn: 375109	2019-10-17 12:15:35 +00:00
Simon Pilgrim	1385b27e92	[CostModel][X86] Add CTLZ scalar costs Add specific scalar costs for CTLZ instructions, we can't discriminate between CTLZ and CTLZ_ZERO_UNDEF so we have to assume the worst. Given how BSR is often a microcoded nightmare on some older targets we might still be underestimating it. For targets supporting LZCNT (Intel Haswell+ or AMD Fam10+), we provide overrides that assume 1cy costs. llvm-svn: 374786	2019-10-14 16:30:17 +00:00
Simon Pilgrim	151bbba758	[CostModel][X86] Add CTPOP scalar costs (PR43656) Add specific scalar costs for ctpop instructions, these are based on the llvm-mca's SLM throughput numbers (the oldest model we have). For targets supporting POPCNT, we provide overrides that assume 1cy costs. llvm-svn: 374775	2019-10-14 14:07:43 +00:00
Simon Pilgrim	1b59a16c0b	[CostModel][X86] Improve sum reduction costs. I can't see any notable differences in costs between SSE2 and SSE42 arches for FADD/ADD reduction, so I've lowered the target to just SSE2. I've also added vXi8 sum reduction costs in line with the PSADBW codegen and discussions on PR42674. llvm-svn: 374655	2019-10-12 13:21:50 +00:00
Simon Pilgrim	d7ac255325	[CostModel][X86] Add tests for insertelement to non-immediate vector element indices llvm-svn: 374161	2019-10-09 12:36:34 +00:00
Simon Pilgrim	a21176ffb1	[CostModel][X86] Add tests for extractelement from non-immediate vector element indices llvm-svn: 374160	2019-10-09 12:36:22 +00:00
Simon Pilgrim	d7f0207d73	[CostModel][X86] Fix SLM <2 x i64> icmp costs SLM is 2 x slower for <2 x i64> comparison ops than other vector types, we should account for this like we do for SLM <2 x i64> add/sub/mul costs. This should remove some of the SLM codegen diffs in D43582 llvm-svn: 372954	2019-09-26 10:14:38 +00:00
Simon Pilgrim	4d486156e7	[Cost][X86] Add more missing vector truncation costs The AVX512 cases still need some work to correctly recognise the PMOV truncation cases. llvm-svn: 372514	2019-09-22 16:46:15 +00:00
Simon Pilgrim	665ccbff60	[Cost][X86] Add v2i64 truncation costs We are missing costs for a lot of truncation cases, I'm hoping to address all the 'zero cost' cases in trunc.ll I thought this was a vector widening side effect, but even before this we had some interesting LV decisions (notably over indvars) being made due to these zero costs. llvm-svn: 372498	2019-09-22 12:04:38 +00:00
Ulrich Weigand	819c1651f7	[SystemZ] Support z15 processor name The recently announced IBM z15 processor implements the architecture already supported as "arch13" in LLVM. This patch adds support for "z15" as an alternate architecture name for arch13. The patch also uses z15 in a number of places where we used arch13 as long as the official name was not yet announced. llvm-svn: 372435	2019-09-20 23:04:45 +00:00
Simon Pilgrim	cacf4db571	[CostModel][X86] Add scalar sext/zext cost tests llvm-svn: 370684	2019-09-02 21:02:51 +00:00
Roman Lebedev	cc95a45f8a	[CostModel] Model all `extractvalue`s as free. Summary: As discussed in https://reviews.llvm.org/D65148#1606412, `extractvalue`s don't actually generate any code, so we should treat them as free. Reviewers: craig.topper, RKSimon, jnspaulsson, greened, asb, t.p.northover, jmolloy, dmgreen Reviewed By: jmolloy Subscribers: javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66098 llvm-svn: 370339	2019-08-29 11:50:30 +00:00
Roland Froese	18db4e9ae1	Recommit [PowerPC] Update P9 vector costs for insert/extract Now that the v1i128 smin regression has been fixed, recommit the P9 cost updates from D60160. llvm-svn: 369952	2019-08-26 19:26:08 +00:00
Craig Topper	d420616313	[X86] Lower the cost of v2i32->v2f64 sint_to_fp under vector widening legalization. I don't really understand the costs we're using for fp_to_sint, but prior to widening legalization we used 20 as the cost for this via the v2i64->v2f64 entry. That number seems better than the 40 we got with widening legalization. So now we need either a v2i32->v2f64 entry or a v4i32->v2f64 entry depending on whether AVX is enabled or not since we skip the first SSE2 table look up under AVX. llvm-svn: 369628	2019-08-22 08:18:45 +00:00
David Green	2bfc13fde1	[ARM] MVE sext costs This adds some sext costs for MVE, taken from the length of assembly sequences that we currently generate. Differential Revision: https://reviews.llvm.org/D66010 llvm-svn: 369244	2019-08-19 09:13:22 +00:00
David Green	b782e61e47	[ARM] MVE sext of a load is free MVE also has some sext of loads, which will be free just as scalar instructions are. Differential Revision: https://reviews.llvm.org/D66008 llvm-svn: 369118	2019-08-16 15:13:37 +00:00
Craig Topper	6eebd2bcd7	[X86] Improve cost model for subvector extraction of less than 128-bit vectors Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements. Differential Revision: https://reviews.llvm.org/D65892 llvm-svn: 369022	2019-08-15 17:29:42 +00:00
Craig Topper	30d3e9c395	[X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG. For AVX2, when the destination type is legal it's clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high. Differential Revision: https://reviews.llvm.org/D66169 llvm-svn: 368858	2019-08-14 14:52:39 +00:00
Simon Pilgrim	13447d3664	[X86] Add missing regular 512-bit vXi8 extract subvector cost model tests These tests don't cover many cases where the subvectors don't start on aligned indices, but that can be added later. llvm-svn: 368839	2019-08-14 12:36:23 +00:00
David Green	a655393f17	[ARM] Add MVE beats vector cost model The MVE architecture has the idea of "beats", where a vector instruction can be executed over several ticks of the architecture. This adds a similar system into the Arm backend cost model, multiplying the cost of all vector instructions by a factor. This factor essentially becomes the expected difference between scalar code and vector code, on average. MVE Vector instructions can also overlap so their true cost is often lower. But equally scalar instructions can in some situations be dual issued, or have other optimisations such as unrolling or make use of dsp instructions. The default is chosen as 2. This should not prevent vectorisation in most cases (as the vector instructions will still be doing at least 4 times the work), but it will help prevent over vectorising in cases where the benefits are less likely. This adds things so far to the obvious places in ARMTargetTransformInfo, and updates a few related costs like not treating float instructions as cost 2 just because they are floats. Differential Revision: https://reviews.llvm.org/D66005 llvm-svn: 368733	2019-08-13 18:12:08 +00:00
Simon Pilgrim	e842314e76	[X86] Add some vXi8 extract subvector cost model tests We don't have full 512-bit test coverage yet - but there's enough to help test D65892 llvm-svn: 368716	2019-08-13 16:44:40 +00:00
Roman Lebedev	31ba61bb0d	[CostModel][X86][AArch64] Check all 3 cost kinds in aggregates.ll llvm-svn: 368595	2019-08-12 17:45:12 +00:00
David Green	86876422ef	[ARM] sext of a load is free This teaches the cost model that the sext or zext of a load is going to be free. Differential Revision: https://reviews.llvm.org/D66006 llvm-svn: 368593	2019-08-12 17:39:56 +00:00
David Green	3e39f39ad9	[ARM] MVE shuffle broadcast costs A VDUP will perform a vector broadcast in a single instruction. Update the cost model for MVE accordingly. Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D63448 llvm-svn: 368589	2019-08-12 16:54:07 +00:00
David Green	83bbfaa5e4	[ARM] Put some of the TTI costmodel behind hasNeon calls. This puts some of the calls in ARMTargetTransformInfo.cpp behind hasNeon() checks, now that we have MVE, and updates all the tests accordingly. Differential Revision: https://reviews.llvm.org/D63447 llvm-svn: 368587	2019-08-12 15:59:52 +00:00
David Green	84cb4b2b53	[ARM] Add or update a number of costmodel tests. NFC This adds a number of cost model tests for ARM, useful for MVE. It also re-jigs some of the existing tests to make them easier to update and read. llvm-svn: 368586	2019-08-12 15:40:27 +00:00
Roman Lebedev	d68a277f23	[CostModel][X86][AArch64] Add some tests for extractvalue In https://reviews.llvm.org/D65148 it is suggested that it should have zero cost, always. llvm-svn: 368548	2019-08-12 09:24:33 +00:00
Craig Topper	396f6c7e90	Recommit r368081 "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors." llvm-svn: 368185	2019-08-07 16:42:47 +00:00
Craig Topper	8b5f2ab2a4	Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183	2019-08-07 16:24:26 +00:00
Mitch Phillips	924359dc0f	Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors." This reverts commit `fc33e33776`. This commit depends on the rolled back commit rL367901, and thus needs to be rolled back. llvm-svn: 368109	2019-08-06 23:38:14 +00:00
Mitch Phillips	bd0d97e1c4	Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default." This reverts commit `3de33245d2`. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107	2019-08-06 23:00:43 +00:00
Craig Topper	fc33e33776	[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors. With the switch to widening legalization, we need to do a better job of costing extractions of less than 128-bits. llvm-svn: 368081	2019-08-06 20:12:41 +00:00
Craig Topper	b1e4da2b90	[X86] Remove tests for -x86-experimental-vector-widening-legalization from test/Analysis/CostModel/X86/ This flag is now the default behavior so we don't need separate tests. llvm-svn: 368080	2019-08-06 20:12:34 +00:00
Craig Topper	3de33245d2	[X86] Enable -x86-experimental-vector-widening-legalization by default. This patch changes our default legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the element widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 367901	2019-08-05 18:25:36 +00:00
Ulrich Weigand	0f0a8b7784	[SystemZ] Add support for new cpu architecture - arch13 This patch series adds support for the next-generation arch13 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Assembler/disassembler support for new instructions. - CodeGen for new instructions, including new LLVM intrinsics. - Scheduler description for the new processor. - Detection of arch13 as host processor. Note: No currently available Z system supports the arch13 architecture. Once new systems become available, the official system name will be added as supported -march name. llvm-svn: 365932	2019-07-12 18:13:16 +00:00
Jordan Rupprecht	351b7e7b24	Revert Recommit [PowerPC] Update P9 vector costs for insert/extract element This reverts r364557 (git commit `9f7f5858fe`) This crashes as reported on the commit thread. Repro instructions TBD. llvm-svn: 364876	2019-07-01 23:29:46 +00:00
Roland Froese	9f7f5858fe	Recommit [PowerPC] Update P9 vector costs for insert/extract element Recommit patch D60160 after regression fix patch D63463. llvm-svn: 364557	2019-06-27 16:20:24 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Sander de Smalen	51c2fa0e2a	Improve reduction intrinsics by overloading result value. This patch uses the mechanism from D62995 to strengthen the definitions of the reduction intrinsics by letting the scalar result/accumulator type be overloaded from the vector element type. For example: ; The LLVM LangRef specifies that the scalar result must equal the ; vector element type, but this is not checked/enforced by LLVM. declare i32 @llvm.experimental.vector.reduce.or.i32.v4i32(<4 x i32> %a) This patch changes that into: declare i32 @llvm.experimental.vector.reduce.or.v4i32(<4 x i32> %a) Which has the type-constraint more explicit and causes LLVM to check the result type with the vector element type. Reviewers: RKSimon, arsenm, rnk, greened, aemerson Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D62996 llvm-svn: 363240	2019-06-13 09:37:38 +00:00
David Green	c5471c2a57	[ARM] Adjust isLegalT1AddressImmediate for non-legal types Types such as float and i64's do not have legal loads in Thumb1, but will still be loaded with a LDR (or potentially multiple LDR's). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits. Differential Revision: https://reviews.llvm.org/D62968 llvm-svn: 362874	2019-06-08 10:32:53 +00:00
David Green	342d1b81a3	[ARM] Add MVE addressing to isLegalT2AddressImmediate Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored. Differential Revision: https://reviews.llvm.org/D62967 llvm-svn: 362873	2019-06-08 10:18:23 +00:00
David Green	4ecce205d5	[ARM] Add fp16 addressing to isLegalT2AddressImmediate The fp16 version of VLDR takes an imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependent upon hasFPRegs16 as this is what the load/store instructions require. Differential Revision: https://reviews.llvm.org/D62966 llvm-svn: 362872	2019-06-08 10:09:02 +00:00
David Green	990eb2d1e8	[ARM] Add extra gep costmodel tests for MVE and half float. NFC llvm-svn: 362871	2019-06-08 09:58:05 +00:00
Luis Marques	711f361596	[RISCV] Disable test/Analysis/CostModel/RISCV tests if RISCV backend not built Adds missing lit.local.cfg. Fixes rL362691. llvm-svn: 362693	2019-06-06 10:12:28 +00:00
Luis Marques	cff7d2fdc9	[RISCV] Add CostModel GEP tests Differential Revision: https://reviews.llvm.org/D61185 llvm-svn: 362691	2019-06-06 09:47:53 +00:00
Matt Arsenault	8dbeb9256c	TTI: Improve default costs for addrspacecast For some reason multiple places need to do this, and the variant the loop unroller and inliner use was not handling it. Also, introduce a new wrapper to be slightly more precise, since on AMDGPU some addrspacecasts are free, but not no-ops. llvm-svn: 362436	2019-06-03 18:41:34 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicate that it's closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Simon Pilgrim	9e7be9b745	[CostModel][X86] Add bool vector and/or/xor cost tests llvm-svn: 362083	2019-05-30 10:41:04 +00:00
Craig Topper	50d502826b	[CostModel] Add really basic support for being able to query the cost of the FNeg instruction. Summary: This reuses the getArithmeticInstrCost, but passes dummy values of the second operand flags. The X86 costs are wrong and can be improved in a follow up. I just wanted to stop it from reporting an unknown cost first. Reviewers: RKSimon, spatel, andrew.w.kaylor, cameron.mcinally Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62444 llvm-svn: 361788	2019-05-28 04:09:18 +00:00
Craig Topper	ba883e980a	[X86] Add test cases for D62444. NFC llvm-svn: 361745	2019-05-27 05:27:57 +00:00
Jonas Paulsson	19871f848b	[CodeMetrics] Don't let extends of i1 be free. getUserCost() currently returns TCC_Free for any extend of a compare (i1) result. It seems this is only true in a limited number of cases where for example two compares are chained. Even in those types of cases it seems unlikely that they are generally free, while they may be in some cases. This patch therefore removes this special handling of cast of i1. No tests are failing because of this. If some target want the old behavior, it could override getUserCost(). Review: Hal Finkel, Chandler Carruth, Evgeny Astigeevich, Simon Pilgrim, Ulrich Weigand https://reviews.llvm.org/D54742/new/ llvm-svn: 360970	2019-05-17 01:26:35 +00:00
Simon Pilgrim	6b10fde69b	[CostModel][X86] Add min/max reduction costs for all SSE targets The original costs stopped at SSE42, I've added conservative estimates for everything down to SSE1/SSE2 and moved some of the SSE42 costs to SSE41 (really only the addition of PCMPGT makes any difference). I've also added missing vXi8 costs (we use PHMINPOSUW for i8/i16 for scarily quick results) and 256-bit vector costs for AVX1. llvm-svn: 360528	2019-05-11 17:12:52 +00:00
David L. Jones	fccb505f0f	Revert "[llvm] r359313 - [PowerPC] Update P9 vector costs for insert/extract element" This causes segfaults during optimized builds. More details, including a reproducer, are on the llvm-commits thread for r359313. llvm-svn: 359648	2019-05-01 05:01:03 +00:00
Sjoerd Meijer	ea31ddb36f	[ARM] Implement TTI::getMemcpyCost This implements TargetTransformInfo method getMemcpyCost, which estimates the number of instructions a memcpy instruction expands to. Differential Revision: https://reviews.llvm.org/D59787 llvm-svn: 359547	2019-04-30 10:28:50 +00:00
Roland Froese	4b17772b9e	[PowerPC] Update P9 vector costs for insert/extract element The PPC vector cost model values for insert/extract element reflect older processors that lacked vector insert/extract and move-to/move-from VSR instructions. Update getVectorInstrCost to give appropriate values for when the newer instructions are present. Differential Revision: https://reviews.llvm.org/D60160 llvm-svn: 359313	2019-04-26 16:14:17 +00:00
David Green	0d741507f7	[ARM] Rewrite isLegalT2AddressImmediate This does two main things, firstly adding some at least basic addressing modes for i64 types, and secondly treats floats and doubles sensibly when there is no fpu. The floating point change can help codesize in some cases, especially with D60294. Most backends seem not to consider the exact VT in isLegalAddressingMode, instead switching on type size. That is now what this does when the target does not have an fpu (as the float data will be loaded using LDR's). i64's currently use the address range of an LDRD (even though they may be legalised and loaded with an LDR). This is at least better than marking them all as illegal addressing modes. I have not attempted to do much with vectors yet. That will need changing once MVE is added. Differential Revision: https://reviews.llvm.org/D60677 llvm-svn: 358845	2019-04-21 09:54:29 +00:00
Roland Froese	a5dd08cac2	[PowerPC] Add some PPC vec cost tests to prep for D60160 NFC llvm-svn: 358699	2019-04-18 18:12:09 +00:00
Simon Pilgrim	9daacec816	[CostModel][X86] Add bool anyof/allof reduction costs On pre-AVX512 targets we can use MOVMSK to extract reduced boolean results. This is properly optimized, annoyingly AVX512 isn't and produces code that is almost as bad as the (unchanged) costs suggest...... Differential Revision: https://reviews.llvm.org/D60403 llvm-svn: 358574	2019-04-17 10:58:19 +00:00
Simon Pilgrim	2ea8dbf564	[CostModel][X86] Add more exhaustive masked load/store/gather/scatter/expand/compress cost tests llvm-svn: 357838	2019-04-06 12:08:37 +00:00
Sjoerd Meijer	633fb0f266	[TTI] getMemcpyCost This adds new function getMemcpyCost to TTI so that the cost of a memcpy can be modeled and queried. The default implementation returns Expensive, but targets can override this function to model the cost more accurately. Differential Revision: https://reviews.llvm.org/D59252 llvm-svn: 356555	2019-03-20 14:15:46 +00:00
Matt Arsenault	e0c1f9e76d	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347	2019-03-17 21:31:35 +00:00
Tim Renouf	e30aa6a136	[AMDGPU] Prepare for introduction of v3 and v5 MVTs AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342	2019-03-17 21:04:16 +00:00
Simon Pilgrim	42bf2dd629	[TTI] Add generic cost model for smul/umul overflow intrinsics Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO. llvm-svn: 354784	2019-02-25 13:30:23 +00:00
Simon Pilgrim	9caf0f0d15	[TTI] Add generic cost model for fixed point smul/umul Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline. Differential Revision: https://reviews.llvm.org/D57925 llvm-svn: 354774	2019-02-25 11:59:23 +00:00
Simon Pilgrim	fbb3086fc8	[CostModel][X86] Add UMUL fixed point cost tests llvm-svn: 353153	2019-02-05 10:55:38 +00:00
Nikita Popov	8e1a464e6a	[CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD Followup to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b. Differential Revision: https://reviews.llvm.org/D56869 llvm-svn: 352409	2019-01-28 19:19:09 +00:00
Simon Pilgrim	adca820927	[TTI] Add generic SADDSAT/SSUBSAT costs Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow. This completes PR40316 Differential Revision: https://reviews.llvm.org/D57239 llvm-svn: 352315	2019-01-27 13:51:59 +00:00
Nemanja Ivanovic	7d007ddedf	[PowerPC] Update Vector Costs for P9 For the power9 CPU, vector operations consume a pair of execution units rather than one execution unit like a scalar operation. Update the target transform cost functions to reflect the higher cost of vector operations when targeting Power9. Patch by RolandF. Differential revision: https://reviews.llvm.org/D55461 llvm-svn: 352261	2019-01-26 01:18:48 +00:00
Simon Pilgrim	30b206b5da	[CostModel][X86] Add SMUL fixed point cost tests llvm-svn: 352046	2019-01-24 13:48:20 +00:00
Simon Pilgrim	47ca8606ba	[TTI] Add generic SADDO/SSUBO costs Added x86 scalar sadd_with_overflow/ssub_with_overflow costs. llvm-svn: 352045	2019-01-24 13:36:45 +00:00
Simon Pilgrim	a131e4e296	[TTI] Add generic UADDSAT/USUBSAT costs Add generic costs calculation for UADDSAT/USUBSAT intrinsics, this fallbacks to using generic costs for uadd_with_overflow/usub_with_overflow + a select. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352044	2019-01-24 12:27:10 +00:00
Simon Pilgrim	2d1964b90f	[TTI] Add generic UADDO/USUBO costs Added x86 scalar uadd_with_overflow/usub_with_overflow costs. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352043	2019-01-24 12:10:20 +00:00
Simon Pilgrim	f87226eb70	[IR] Match intrinsic parameter by scalar/vectorwidth This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth. The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth. I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type. The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour Differential Revision: https://reviews.llvm.org/D57090 llvm-svn: 351957	2019-01-23 16:00:22 +00:00
Simon Pilgrim	ee900efb30	[CostModel][X86] Add ICMP Predicate specific costs First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly. Differential Revision: https://reviews.llvm.org/D57013 llvm-svn: 351810	2019-01-22 12:29:38 +00:00
Simon Pilgrim	44feb4a87b	[CostModel][X86] Add XOP icmp cost tests (PR40376) llvm-svn: 351741	2019-01-21 11:33:52 +00:00
Simon Pilgrim	c934d3a01b	[CostModel][X86] Add explicit vector select costs Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)|(Y & ~C)) bit select. Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests. llvm-svn: 351685	2019-01-20 13:55:01 +00:00
Simon Pilgrim	1231904c48	[CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era llvm-svn: 351684	2019-01-20 13:21:43 +00:00

1068 Commits