llvm-project

Commit Graph

Author	SHA1	Message	Date
Arthur Eubanks	e23aee7175	[test] Update some legacy PM tests	2022-09-30 11:31:02 -07:00
Simon Pilgrim	5fc7bbfaa3	[SLP][X86] Add test coverage for Issue #58054	2022-09-30 13:26:31 +01:00
Simon Pilgrim	5849fcb635	Revert rG1b7089fe67b924bdd5ecef786a34bdba7a88778f "[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions" Revert rGef89409a59f3b79ae143b33b7d8e6ee6285aa42f "Fix 'unused-lambda-capture' gcc warning. NFCI." Revert rG926ccfef032d206dcbcdf74ca1e3a9ebf4d1be45 "[SLP] ScalarizationOverheadBuilder - demand all elements for scalarization if the extraction index is unknown / out of bounds" Revert ScalarizationOverheadBuilder sequence from D134605 - when accumulating extraction costs by Type (instead of specific Value), we are not distinguishing enough when they are coming from the same source or not, and we always just count the cost once. This needs addressing before we can use getScalarizationOverhead properly.	2022-09-30 11:22:48 +01:00
Simon Pilgrim	19782a46f8	[SLP][X86] Add test case for crash reported on D134605	2022-09-30 11:07:54 +01:00
Simon Pilgrim	28a3fc39a6	[SLP][AArch64] Add test case for vectorization regression case reported on D134605	2022-09-30 11:07:54 +01:00
Philip Reames	02bfe2de7c	[RISCV] Adjust vector immediate store materialization cost This change updates the costs to make constant pool loads match their actual cost, and adds the broadcast special case to avoid too many regressions. We really need more information about the constants being rematerialized, but this is an incremental improvement. Differential Revision: https://reviews.llvm.org/D134746	2022-09-29 07:37:13 -07:00
Philip Reames	17f2ee804a	[RISCV][SLP] Add test coverage for stores of constants	2022-09-27 07:53:33 -07:00
Simon Pilgrim	1b7089fe67	[SLP] Add ScalarizationOverheadBuilder helper to track vector extractions Instead of accumulating all extraction costs separately and then adjusting for repeated subvector extractions, this patch collects all the extractions and then converts to calls to getScalarizationOverhead to improve the accuracy of the costs. I'm not entirely satisfied with the getExtractWithExtendCost handling yet - this still just adds all the getExtractWithExtendCost costs together - it really needs to be replaced with a "getScalarizationOverheadWithExtend", but that will require further refactoring first. This replaces my initial attempt in D124769. Differential Revision: https://reviews.llvm.org/D134605	2022-09-27 14:49:07 +01:00
Simon Pilgrim	200fe18c5c	[SLP][X86] Add missing triple from pr49081.ll test	2022-09-25 16:22:42 +01:00
Simon Pilgrim	a6e9141505	[TTI] Add OperandValueProperties::OP_NegatedPowerOf2 enum (PR51436) The mul by constant costmodels handle power-of-2 constants, but not negated-power-of-2, despite the backends handling both. This patch adds the OperandValueProperties::OP_NegatedPowerOf2 enum and wires it for use for basic mul cost analysis and SLP handling. Fixes #50778 Differential Revision: https://reviews.llvm.org/D111968	2022-09-23 14:03:18 +01:00
Simon Pilgrim	b2cd8118d0	[CostModel][X86] Add CostKinds handling for smax/smin/umax/umin instructions This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-22 10:19:23 +01:00
Alexey Bataev	e664dea182	[SLP]Fix write-after-bounds. Mask might be larger than the NumElts-OffsetBeg, need to use actual indices to avoid acces out of bounds.	2022-09-21 08:00:15 -07:00
Vitaly Buka	bbef90ace4	[IRBuilder] Use PoisonValue in CreateMasked* Followup to `72b776168c` Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D133967	2022-09-19 11:01:41 -07:00
Sebastian Peryt	99c9b37d11	[NFC][1/n] Remove -enable-new-pm=0 flags from lit tests This is the first patch in a series intended for removing flag -enable-new-pm=0 from lit tests. This is part of a bigger effort of completely removing legacy code related to legacy pass manager in favor of currently default new pass manager. In this patch flag has been removed only from tests where no significant change has been required because checks has been duplicated for both PMs. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D134150	2022-09-19 09:57:37 -07:00
Simon Pilgrim	2538adde5c	[CostModel][X86] Add CostKinds handling for cttz This was achieved with the 'cost-tables vs llvm-mca' script D103695	2022-09-19 15:57:03 +01:00
Simon Pilgrim	d90a42d64c	[CostModel][X86] Add CTLZ_ZERO_UNDEF/CTTZ_ZERO_UNDEF cost handling Without LZCNT/BMI, the *_ZERO_UNDEF costs are cheaper as they can avoid the zero handling.	2022-09-19 14:06:33 +01:00
Simon Pilgrim	5665d0941a	[SLP][X86] Add AVX512 test coverage to CTLZ/CTTZ tests Only AVX512 has decent CTTZ/CTLZ vector ops, add tests to ensure we definitely vectorize these	2022-09-19 13:07:55 +01:00
Alexey Bataev	5d13b12674	[SLP]Improve isUndefVector function by adding insertelement analysis. Added the mask and the analysis of the buildvector sequence in the isUndefVector function, improves codegen and cost estimation. Metric: SLP.NumVectorInstructions Program SLP.NumVectorInstructions results results0 diff test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 27362.00 27360.00 -0.0% Metric: size..text Program size..text results results0 diff test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 805299.00 806035.00 0.1% 526.blender_r - some extra code is vectorized. 508.namd_r - some extra code is optimized out. Differential Revision: https://reviews.llvm.org/D133891	2022-09-16 14:36:38 -07:00
Simon Pilgrim	23cb1c42cd	[CostModel][X86] Update throughput costs for CTLZ ops This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (and recent fixes to the bdver2 + alderlake models) Adding full CostKinds costs are affecting some other tests as they make assumptions about SizeLatency costs, so they need addressing first	2022-09-16 16:56:49 +01:00
Simon Pilgrim	94620e4fc3	[CostModel][X86] Add CostKinds handling for vector shift by generic/non-uniform shift amounts These are the worst case generic vector shift costs, where nothing is known about the shift amounts - in particular this should stop us using the default sizelatency cost of 1 for so many pre-AVX2 vector shifts that can often actually expand during lowering to +20 uops, just for 128-bit vectors, resulting in some horrible inline/unroll decisions. This was achieved with an updated version of the 'cost-tables vs llvm-mca' script D103695 (I'll update the patch soon for reference)	2022-09-15 16:51:58 +01:00
Valery N Dmitriev	18dde772d6	[SLP] Unify main/alternate selection for CmpInst instructions Make main/alternate operation selection logic for CmpInst consistent across SLP vectorizer. Differential Revision: https://reviews.llvm.org/D133430	2022-09-13 09:20:25 -07:00
Florian Hahn	3fd1cc2574	[SLP] Add Preheader to CSE blocks after hoisting CSE-able instrs. Adding the pre-header to CSEBlocks ensures instructions are CSE'd even after hoisting. This was original discovered by @atrick a while ago. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D133649	2022-09-12 15:53:31 +01:00
Alexey Bataev	dfe1e9dd79	[SLP]Improve reordering of clustered reused scalars. If the reused scalars are clustered, i.e. each part of the reused mask contains all elements of the original scalars exactly once, we can reorder those clusters to improve the whole ordering of of the clustered vectors. Differential Revision: https://reviews.llvm.org/D133524	2022-09-12 06:52:25 -07:00
Florian Hahn	3ff7892005	[SLP] Add test case showing missing CSE in hoisted instructions.	2022-09-10 20:55:04 +01:00
Simon Pilgrim	10edf88458	[CostModel][X86] Update CTPOP costs With the bdver2 model updates, many of the AVX1 costs were far too high - it also helped expose some costs mismatches for Atom/Silvermont	2022-09-10 17:57:20 +01:00
Alexey Bataev	982d9ef1c1	[SLP]Fix PR55734: SLP vectorizer's reduce_and formation introduces poison. Need either follow the original order of the operands for bool logical ops, or emit freeze instruction to avoid poison propagation. Differential Revision: https://reviews.llvm.org/D126877	2022-09-01 05:34:45 -07:00
Florian Hahn	e68594ca20	[SLP] Add FMA test case with missing or partial fast-math flags. Add extra FMA tests with missing or partial fast-math flags.	2022-08-31 16:25:18 +01:00
Alexey Bataev	ec06df9459	[SLP]Fix PR57447: Assertion `!getTreeEntry(V) && "Scalar already in tree!"' failed. The pointer operands for the ScatterVectorize node may contain non-instruction values and they are not checked for "already being vectorized". Need to check that such pointers are already vectorized and gather them instead of trying to build vectorize node to avoid compiler crash. Differential Revision: https://reviews.llvm.org/D132949	2022-08-30 12:30:14 -07:00
Valery N Dmitriev	329b972d41	[SLP] Try to match reductions before trying to vectorize a vector build sequence. This patch changes order of searching for reductions vs other vectorization possibilities. The idea is if we do not match a reduction it won't be harmful for further attempts to find vectorizable operations on a vector build sequences. But doing it in the opposite order we have good chance to ruin opportunity to match a reduction later. We also don't want to try vectorizing binary operations too early as 2-way vectorization may effectively prohibit wider ones leading to producing less effective code. Differential Revision: https://reviews.llvm.org/D132590	2022-08-29 13:32:14 -07:00
Alexey Bataev	beacf9bd9e	[SLP]Fix PR57322: vectorize constant float stores. Stores for constant floats must be vectorized, improve analysis in SLP vectorizer for stores. Differential Revision: https://reviews.llvm.org/D132750	2022-08-29 11:02:53 -07:00
Florian Hahn	3c5e24a51c	[SLP] Add tests showing over-eager SLP when scalar fma can be used. Add test cases for AArch64 that show over-eager SLP vectorization on AArch64, where keeping the things scalar allows efficient lowering using scalar fmas.	2022-08-29 18:58:56 +01:00
Alexey Bataev	e6345bf644	[SLP]Improve lookup of the buildvector top insertelement instruction. When estimating the cost of the in-tree vectorized scalars in buildvector sequences, need to take into account the vectorized insertelement instruction. The top of the buildvector seuences is the topmost vectorized insertelement instruction, because it will have > than 1 use after the vectorization. For the affected test case improves througput from 21 to 16 (per llvm-mca). Differential Revision: https://reviews.llvm.org/D132740	2022-08-29 08:19:52 -07:00
Philip Reames	a310637132	[RISCV] Disable SLP vectorization by default due to unresolved profitability issues This change implements a TTI query with the goal of disabling slp vectorization on RISCV. The current default configuration disables SLP already, but its current tied to the ability to lower fixed length vectors. Over in D131508, I want to enable fixed length vectors for purposes of LoopVectorizer, but preliminary analysis has revealed a couple of SLP specific issues we need to resolve before enabling it by default. This change exists to allow us to enable LV without SLP. Differential Revision: https://reviews.llvm.org/D132680	2022-08-26 14:11:22 -07:00
Alexey Bataev	c7b815a510	[SLP][NFC]Add a test for vectorization of stores with float constants, NFC.	2022-08-26 10:22:07 -07:00
Alexey Bataev	af8477f2bc	[SLP][NFC]Add a test for incorrectly calculated cost for extracted buildvector sequence, NFC.	2022-08-26 06:29:15 -07:00
Valery N Dmitriev	f9ceb71542	[SLP][NFC] Add a coverage test for horizontal reductions. Reduction feeds single insertelement instruction.	2022-08-25 16:02:22 -07:00
Valery N Dmitriev	e3dd0ddc5b	[SLP][NFC] Add test case exposing deficiency in finding reductions that feed buildvector sequence. Differential Revision: https://reviews.llvm.org/D132506	2022-08-24 11:13:40 -07:00
Alexey Bataev	c167028684	[SLP]Delay vectorization of postponable values for instructions with no users. SLP vectorizer tries to find the reductions starting the operands of the instructions with no-users/void returns/etc. But such operands can be postponable instructions, like Cmp, InsertElement or InsertValue. Such operands still must be postponed, vectorizer should not try to vectorize them immediately. Differential Revision: https://reviews.llvm.org/D131965	2022-08-19 08:39:16 -07:00
Alexey Bataev	0e7ed32c71	[SLP]Cost for a constant buildvector. In many cases constant buildvector results in a vector load from a constant/data pool. Need to consider this cost too. Differential Revision: https://reviews.llvm.org/D126885	2022-08-19 08:02:42 -07:00
Simon Pilgrim	b864cad7b4	[CostModel][X86] Adjust SLM select costs to match poor throughput of pblendvb/blendvpd/blendvps	2022-08-18 17:05:38 +01:00
Alexey Bataev	65c7cecb13	[SLP]Fix PR51320: Try to vectorize single store operands. Currently, we try to vectorize values, feeding into stores, only if slp-vectorize-hor-store option is provided. We can safely enable vectorization of the value operand of a single store in the basic block, if the operand value is used only in store. It should enable extra vectorization and should not increase compile time significantly. Fixes https://github.com/llvm/llvm-project/issues/51320 Differential Revision: https://reviews.llvm.org/D131894	2022-08-16 07:25:21 -07:00
Alexey Bataev	aed5e3bea1	[SLP][NFC]Add a test for delaying of insertelements vectorization, NFC.	2022-08-15 13:26:51 -07:00
Philip Reames	1062595808	[RISCV][SLP] Add some basic test coverage	2022-08-11 13:05:14 -07:00
Vasileios Porpodas	f669030373	[TTI][AArch64][SLP] Sets the cost of an ADD reduction 2xi64 to 2. 2xi64 is the legalized type for wide reductions (like 16xi64) and setting the cost to 2 makes `load-reduce` and `load-zext-reduce` patterns profitable. The few performance measurments that I did on an aarch64 machine confirm that these patterns are actually faster when vectorized. Differential Revision: https://reviews.llvm.org/D130740	2022-08-01 13:03:14 -07:00
Vasileios Porpodas	2d9eae4152	[NFC][TTI][AArch64][SLP] Precommit test for a TTI cost fix of i64 add reductions.	2022-08-01 11:20:12 -07:00
William Schmidt	bccc9aa81c	Don't vectorize PHIs in catchswitch blocks We currently assert in vectorizeTree(TreeEntry*) when processing a PHI bundle in a block containing a catchswitch. We attempt to set the IRBuilder insertion point following the catchswitch, which is invalid. This is done so that ShuffleBuilder.finalize() knows where to insert a shuffle if one is needed. To avoid this occurring, watch out for catchswitch blocks during buildTree_rec() processing, and avoid adding PHIs in such blocks to the vectorizable tree. It is unlikely that constraining vectorization over an exception path will cause a noticeable performance loss, so this seems preferable to trying to anticipate when a shuffle will and will not be required.	2022-07-19 06:10:17 -07:00
Evgeniy Brevnov	8f90edeb55	Additional regression test for a crash during reorder masked gather nodes	2022-07-19 19:03:53 +07:00
Sanjay Patel	10ebaf7686	[SLP] add test for load combining + shuffling; NFC issue #38821	2022-07-04 10:55:07 -04:00
David Green	2de05afc19	[SLP] Peek into loads when hitting the RecursionMaxDepth This patch slightly extends the limit on the RecursionMaxDepth inside the SLP vectorizer. It does it only when it hits a load (or zext/sext of a load), which allows it to peek through in the places where it will be the most valuable, without ballooning out the O(..) by any 2^n factors. Differential Revision: https://reviews.llvm.org/D122148	2022-07-04 14:22:50 +01:00
Alexey Bataev	34073b5538	[SLP][NFC]Rework the test for logical and freeze, need some extra nodes, NFC.	2022-07-01 12:43:10 -07:00

1 2 3 4 5 ...

1234 Commits