llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	8dfb9627b7	[X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead. This moves v32i16/v64i8 to a model consistent with how we treat integer types with avx1. This does change the ABI for types vXi16/vXi8 vectors larger than 512 bits to pass in multiple zmms instead of multiple ymms. We'd already hacked some code to make v64i8/v32i16 pass in zmm. Cost model is still a bit of a mess. In some place I tried to match existing behavior. But really we need to account for splitting and concating costs. Cost model for shuffles is especially pessimistic. Differential Revision: https://reviews.llvm.org/D76212	2020-04-15 12:17:18 -07:00
Simon Pilgrim	91bc50c0d7	[CostModel][X86] Improve InsertElement costs for sub-128bit vectors If we're inserting into v2i8/v4i8/v8i8/v2i16/v4i16 style sub-128bit vectors ensure we don't use the SK_PermuteTwoSrc cost of the legalized value type - this is a followup to rG12c629ec6c59 which added equivalent sub-128bit shuffle costs	2020-04-10 14:55:46 +01:00
Simon Pilgrim	168a44a70e	[CostModel][X86] Improve extract/insert element costs (PR43605) This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index. It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true). Differential Revision: https://reviews.llvm.org/D74976	2020-02-27 15:54:13 +00:00
Simon Pilgrim	b82438872b	[CostModel][X86] We don't need a scale factor for SLM extract costs D74976 will handle larger vector types, but since SLM doesn't support AVX+ then we will always be extracting from 128-bit vectors so don't need to scale the cost.	2020-02-24 14:23:04 +00:00
Simon Pilgrim	eaa41e103c	[CostModel][X86] Try to check against common prefixes before using target-specific cpu checks SLM/GLM is still a mess so not all of them have been updated yet.	2020-02-24 11:59:07 +00:00
Sanjay Patel	7ff0fcb53f	[x86] add cost model special-case for insert/extract from element 0 This is a follow-up to D70607 where we made any extract element on SLM more costly than default. But that is pessimistic for extract from element 0 because that corresponds to x86 movd/movq instructions. These generally have >1 cycle latency, but they are probably implemented as single uop instructions. Note that no vectorization tests are affected by this change. Also, no targets besides SLM are affected because those are falling through to the default cost of 1 anyway. But this will become visible/important if we add more specializations via cost tables. Differential Revision: https://reviews.llvm.org/D71023	2019-12-06 13:50:25 -05:00
Sanjay Patel	5c166f1d19	[x86] make SLM extract vector element more expensive than default I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607	2019-11-27 14:08:56 -05:00
Tim Renouf	0703db3989	[CostModel] Fixed isExtractSubvectorMask for undef index off end ShuffleVectorInst::isExtractSubvectorMask, introduced in [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) erroneously thought that %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef> is a subvector extract, even though it goes off the end of the parent vector with the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead. This commit fixes that, by not considering the above a subvector extract. Differential Revision: https://reviews.llvm.org/D70005 Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf	2019-11-08 15:40:09 +00:00
Craig Topper	6eebd2bcd7	[X86] Improve cost model for subvector extraction of less than 128-bit vectors Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements. Differential Revision: https://reviews.llvm.org/D65892 llvm-svn: 369022	2019-08-15 17:29:42 +00:00
Simon Pilgrim	13447d3664	[X86] Add missing regular 512-bit vXi8 extract subvector cost model tests These tests don't cover many cases where the subvectors don't start on aligned indices, but that can be added later. llvm-svn: 368839	2019-08-14 12:36:23 +00:00
Simon Pilgrim	e842314e76	[X86] Add some vXi8 extract subvector cost model tests We don't have full 512-bit test coverage yet - but there's enough to help test D65892 llvm-svn: 368716	2019-08-13 16:44:40 +00:00
Craig Topper	396f6c7e90	Recommit r368081 "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors." llvm-svn: 368185	2019-08-07 16:42:47 +00:00
Mitch Phillips	924359dc0f	Revert "[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors." This reverts commit `fc33e33776`. This commit depends on the rolled back commit rL367901, and thus needs to be rolled back. llvm-svn: 368109	2019-08-06 23:38:14 +00:00
Craig Topper	fc33e33776	[X86] Add more extract subvector cost model tests for smaller element sizes and smaller than 128-bit vectors. With the switch to widening legalization, we need to a better job of costing extractions of less than 128-bits. llvm-svn: 368081	2019-08-06 20:12:41 +00:00
Simon Pilgrim	f4cd292ba2	[CostModel][X86] SK_ExtractSubvector is cheap if the (legal) subvector is aligned within the source vector llvm-svn: 346664	2018-11-12 15:48:06 +00:00
Simon Pilgrim	631f2bf51e	[CostModel] Add more realistic SK_ExtractSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type. llvm-svn: 346656	2018-11-12 14:25:23 +00:00
Simon Pilgrim	fc8f1d7da7	[CostModel][X86] SK_ExtractSubvector is free if the subvector is at the start of the source vector llvm-svn: 346538	2018-11-09 19:04:27 +00:00
Simon Pilgrim	d0c71609c5	[CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368) Add ShuffleVectorInst::isExtractSubvectorMask helper to match shuffle masks. llvm-svn: 346510	2018-11-09 16:28:19 +00:00
Simon Pilgrim	acd10b9f6c	[CostModel][X86] Add some initial extract/insert subvector shuffle cost tests Just f64/i64 tests initially to demonstrate PR39368 llvm-svn: 344857	2018-10-20 17:38:33 +00:00

19 Commits