llvm-project

Author	SHA1	Message	Date
Alexey Bataev	3ff07fcd54	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we simply cannot try to reorder the tree, and we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, f>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-21 10:51:03 -04:00
Eric Christopher	ecfd8161bf	Temporarily Revert "[SLP] Allow reordering of vectorization trees with reused instructions." as it's infinite looping on occasion. This reverts commit `455ca0ebb6`.	2020-09-18 12:50:04 -07:00
Alexey Bataev	455ca0ebb6	[SLP] Allow reordering of vectorization trees with reused instructions. If some leaves have the same instructions to be vectorized, we may incorrectly evaluate the best order for the root node (it is built for the vector of instructions without repeated instructions and, thus, has fewer elements than the root node). In this case we simply cannot try to reorder the tree, and we may calculate the wrong number of nodes that require the same reordering. For example, if the root node is <a+b, a+c, a+d, f+e>, then the leaves are <a, a, a, f> and <b, c, d, e>. When we try to vectorize the first leaf, it will be shrunk to <a, f>. If instructions in this leaf should be reordered, the best order will be <1, 0>. We need to extend this order for the root node. For the root node this order should look like <3, 0, 1, 2>. This patch allows extension of the orders of the nodes with the reused instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D45263	2020-09-18 09:34:59 -04:00
Sanjay Patel	03783f19dc	[SLP] sort candidates to increase chance of optimal compare reduction This is one (small) part of improving PR41312: https://llvm.org/PR41312 As shown there and in the smaller tests here, if we have some member of the reduction values that does not match the others, we want to push it to the end (bring the matching members forward and together). In the regression tests, we have 5 candidates for the 4 slots of the reduction. If the one "wrong" compare is grouped with the others, it prevents forming the ideal v4i1 compare reduction. Differential Revision: https://reviews.llvm.org/D87772	2020-09-17 08:49:27 -04:00
Sanjay Patel	b011611e37	[SLP] add tests for reduction ordering; NFC	2020-09-16 13:28:19 -04:00
Huihui Zhang	3b7f5166bd	[SLPVectorizer][SVE] Skip scalable-vector instructions before vectorizeSimpleInstructions. For scalable type, the aggregated size is unknown at compile-time. Skip instructions with scalable type to ensure the list of instructions for vectorizeSimpleInstructions does not contain any scalable-vector instructions. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87550	2020-09-15 13:10:15 -07:00
Sanjay Patel	40f12ef621	[SLP] further limit bailout for load combine candidate (PR47450) The test example based on PR47450 shows that we can match non-byte-sized shifts, but those won't ever be bswap opportunities. This isn't a full fix (we'd still match if the shifts were by 8-bits for example), but this should be enough until there's evidence that we need to do more (this is a borderline case for vectorization in the first place).	2020-09-11 11:56:11 -04:00
Sanjay Patel	54680591e8	[SLP] add test for missed store vectorization; NFC	2020-09-11 11:56:11 -04:00
Craig Topper	c195ae2f00	[SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support. Previously we could match fcmp+select to a reduction if the fcmp had the nonans fast math flag. But if the select had the nonans fast math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic before SLP gets to it. Seems fairly likely that if one of the fcmp+select pair have the fast math flag, they both would. My plan is to start vectorizing the fmaxnum/fminnum version soon, but I wanted to get this code out as it had some of the strangest fast math flag behaviors.	2020-09-10 11:49:19 -07:00
Simon Pilgrim	de25ebaac6	[CostModel][X86] Add vXi32 division by uniform constant costs (PR47476) Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.	2020-09-10 12:17:54 +01:00
Simon Pilgrim	0aea3a79ad	[SLP][X86] Add division by uniform constant tests (PR47476)	2020-09-10 11:52:20 +01:00
Arthur Eubanks	78e4aeb783	[NewPM][test] Fix accelerate-vector-functions.ll under NPM The legacy SLPVectorizer has a dependency on InjectTLIMappingsLegacy. That cannot be expressed in the new PM since they are both normal passes. Explicitly add -inject-tli-mappings as a pass. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86492	2020-08-25 10:50:14 -07:00
Sanjay Patel	c4f0a0896f	[InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try) The 1st attempt (rG557b890) was reverted because it caused miscompiles. That bug is avoided here by changing the order of folds and as verified in the new tests. Original commit message: InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-25 11:19:36 -04:00
Benjamin Kramer	c6fb72de4f	Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract" This reverts commit `557b890ff4`. Causing miscompiles, test case is on llvm-commits.	2020-08-25 11:31:31 +02:00
Sanjay Patel	557b890ff4	[InstCombine] improve demanded element analysis for vector insert-of-extract InstCombine currently has odd rules for folding insert-extract chains to shuffles, so we miss collapsing seemingly simple cases as shown in the tests here. But poison makes this not quite as easy as we might have guessed. Alive2 tests to show the subtle difference (similar to the regression tests): https://alive2.llvm.org/ce/z/hp4hv3 (this is ok) https://alive2.llvm.org/ce/z/ehEWaN (poison leakage) SLP tends to create these patterns (as shown in the SLP tests), and this could help with solving PR16739. Differential Revision: https://reviews.llvm.org/D86460	2020-08-24 17:00:16 -04:00
Sanjay Patel	7661c8c040	[SLP] avoid 'tmp' names in regression tests; NFC That can cause problems for update_test_checks.py (it warns when updating this file).	2020-08-24 17:00:16 -04:00
Arthur Eubanks	b79889c2b1	[opt][NewPM] Add basic-aa in legacy PM compatibility mode The legacy PM alias analysis pipeline by default includes basic-aa. When running `opt -foo-pass` under the NPM and -disable-basic-aa is not specified, use basic-aa. This decreases the number of check-llvm failures under NPM from 913 to 752. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86167	2020-08-21 14:05:07 -07:00
Sanjay Patel	c98fcba55c	[SLP] remove instcombine dependency from regression test; NFC InstCombine doesn't do that much here - sinks some instructions and improves alignments - but that should not be part of the SLP pass unit testing.	2020-08-18 10:18:22 -04:00
Thomas Lively	f969734c21	Reland "[SLPVectorizer] Pre-commit a test for D85759" This reverts commit `52b71aa8b1`. The problem was a missing lit.local.cfg file, which was causing the test to be incorrectly run on bots that had not built the WebAssembly target.	2020-08-11 12:18:33 -07:00
Thomas Lively	52b71aa8b1	Revert "[SLPVectorizer] Pre-commit a test for D85759" This reverts commit `94791970de`. The test is failing on multiple bots, even though it passes for me locally. Reverting while I investigate further.	2020-08-11 12:11:24 -07:00
Thomas Lively	94791970de	[SLPVectorizer] Pre-commit a test for D85759 `8cc911fa5b` refactored the `getIntrinsicInstrCost` function and was meant to be a nonfunctional change, but it accidentally changed how costs were calculated in the SLP vectorizer, which regressed WebAssembly codegen and resulted in a downstream bug report at https://github.com/emscripten-core/emscripten/issues/11449. The fix for this regression is in D85759, and this patch just pre-commits the test from that patch to demonstrate the regressed behavior first.	2020-08-11 11:30:09 -07:00
Florian Hahn	0b774acf11	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact, e.g. on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-08-11 11:18:12 +02:00
Simon Pilgrim	90f721404f	[SLP] Regenerate load-merge.ll tests Noticed this NFC change in D57779	2020-08-10 16:09:26 +01:00
Simon Pilgrim	f35992b75b	[SLP][X86] Add smax intrinsic reduction tests SLP currently only matches the ICMP+SELECT patterns for min/max reductions	2020-08-07 11:48:08 +01:00
Simon Pilgrim	aa38e97ad5	[SLP][X86] Add abs/smax/smin/umax/umin intrinsic vectorization tests	2020-08-07 11:23:43 +01:00
Anton Afanasyev	a7478fab6c	[SLP] Fix order of `insertelement`/`insertvalue` seed operands Summary: This patch takes the index operands of `insertelement`/`insertvalue` into account when generating seed elements for `findBuildAggregate()`. Previously this function kept the original order of the `insert`s. This patch also optimizes `findBuildAggregate()` by avoiding redundant temporary vector allocations and repeated reversals. Fixes llvm.org/pr44067 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83779	2020-08-06 22:09:24 +03:00
Simon Pilgrim	3b93464dcf	[SLP][X86] Regenerate sdiv test noticed in D83779. NFC.	2020-08-06 18:00:21 +01:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
David Sherwood	9ad7c980bb	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
Matt Arsenault	c230965ccf	AMDGPU: Make saturating add/sub legal for DAG path	2020-07-29 08:27:31 -04:00
Anton Afanasyev	56c92bf4b7	[SLP][Test] Precommit tests for D83779. NFC.	2020-07-22 18:25:45 +03:00
Alexey Bataev	be37f13e2d	[SLP]Add an extra test for vectorization of non-pow-2 trees, NFC.	2020-07-22 09:13:30 -04:00
Sanne Wouda	7b84045565	[SLPVectorizer] handle vectorizeable library functions Teaches the SLPVectorizer to use vectorized library functions for non-intrinsic calls. This already worked for intrinsics that have vectorized library functions, thanks to D75878, but schedules with library functions with a vector variant were being rejected early. - assume that there are no load/store dependencies between lib functions with a vector variant; this would otherwise prevent the bundle from becoming "ready" - check during legalization that the vector variant can be used - fix-up where we previously assumed that a call would be an intrinsic Differential Revision: https://reviews.llvm.org/D82550	2020-07-13 15:28:46 +01:00
Sanne Wouda	e909f6bc48	Pre-commit tests Prepare to land D82550	2020-07-13 15:28:46 +01:00
Stanislav Mekhanoshin	64030099c3	SLP: honor requested max vector size merging PHIs At the moment this place does not check the maximum size set by TTI and just creates the maximum possible vectors. Differential Revision: https://reviews.llvm.org/D82227	2020-07-08 08:06:15 -07:00
Florian Hahn	04b85e2bcb	Revert "[SLP] Make sure instructions are ordered when computing spill cost." This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371 This reverts commit `eb46137daa`.	2020-07-07 23:15:01 +01:00
Florian Hahn	eb46137daa	[SLP] Make sure instructions are ordered when computing spill cost. The entries in VectorizableTree are not necessarily ordered by their position in basic blocks. Collect them and order them by dominance so later instructions are guaranteed to be visited first. For instructions in different basic blocks, we only scan to the beginning of the block, so their order does not matter, as long as all instructions in a basic block are grouped together. Using dominance ensures a deterministic order. The modified test case contains an example where we compute a wrong spill cost (2) without this patch, even though there is no call between any instruction in the bundle. This seems to have limited practical impact, e.g. on X86 with a recent Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006 there are no binary changes. Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82444	2020-07-03 17:30:17 +01:00
Florian Hahn	039145c72b	[SLP] Precommit test for which spill cost is computed incorrectly. Test for D82444.	2020-07-03 17:15:52 +01:00
Arthur Eubanks	691c086d15	[NewPM][BasicAA] basicaa -> basic-aa in Transforms/SLPVectorizer Following https://reviews.llvm.org/D82607. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D82681	2020-06-26 14:58:41 -07:00
Florian Hahn	35bb9bfbb0	[SLP] Limit GEP lists based on width of index computation. D68667 introduced a tighter limit to the number of GEPs to simplify together. The limit was based on the vector element size of the pointer, but the pointers themselves are not actually put in vectors. IIUC we try to vectorize the index computations here, so we should base the limit on the vector element size of the computation of the index. This restores the test regression on AArch64 and also restores the vectorization for an important pattern in SPEC2006/464.h264ref on AArch64 (@test_i16_extend). We get a large benefit from doing a single load up front and then processing the index computations in vectors. Note that we could probably even further improve the AArch64 codegen, if we would do zexts to i32 instead of i64 for the sub operands and then do a single vector sext on the result of the subtractions. AArch64 provides dedicated vector instructions to do so. Sketch of proof in Alive: https://alive2.llvm.org/ce/z/A4xYAB Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel Reviewed By: ABataev, spatel Differential Revision: https://reviews.llvm.org/D82418	2020-06-24 19:56:53 +01:00
Florian Hahn	f4044dd539	[SLP] Precommit short load / wide math test for AArch64. This pattern is key to eliminate a 10% performance regression in SPEC2006.	2020-06-24 16:57:45 +01:00
Stanislav Mekhanoshin	f633b07669	Pre-committed test update. NFC.	2020-06-22 08:10:20 -07:00
Stanislav Mekhanoshin	736b0d0cf0	Pre-commit SLP test. NFC.	2020-06-22 07:41:45 -07:00
Sanjay Patel	e50059f6b6	[x86] form reduction intrinsics from vectorizers instead of raw IR Motivating examples are seen in the PhaseOrdering tests based on: https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have intrinsics there, some pass can fold them. The intrinsics are still named "experimental" at this point, but if there is no fallout from this patch, that will be a good indicator that it is safe to finalize them. Differential Revision: https://reviews.llvm.org/D80867	2020-06-05 12:38:49 -04:00
Valery N Dmitriev	a45688a72c	[SLP] Apply external to vectorizable tree users cost adjustment for relevant aggregate build instructions only (UserCost). Users are detected with findBuildAggregate routine and the trick is that following SLP vectorization may end up vectorizing entire list with smaller chunks. Cost adjustment then is applied for individual chunks and these adjustments obviously have to be smaller than the entire aggregate build cost. Differential Revision: https://reviews.llvm.org/D80773	2020-05-29 15:37:41 -07:00
Sanjay Patel	61412b762d	[SLP] auto-generate complete test checks; NFC	2020-05-29 13:45:25 -04:00
Valery N Dmitriev	38727bab6f	[NFC][SLP] Add test case exposing SLP cost model bug. The bug is related to aggregate build cost model adjustment that adds a bias to cost triggering vectorization of actually unprofitable to vectorize tree. Differential Revision: https://reviews.llvm.org/D80682	2020-05-28 17:31:29 -07:00
Sanjay Patel	880df559f9	[SLP] fix test to have valid IR; NFC This test was failing verification because the metadata is ill-formed. This commit is split from D80401 because it is an independent fix (although the test would break with that change).	2020-05-22 09:06:02 -04:00
Vedant Kumar	77ffce6954	[Instruction] Set metadata uses to undef on deletion Summary: Replace any extant metadata uses of a dying instruction with undef to preserve debug info accuracy. Some alternatives include: - Treat Instruction like any other Value, and point its extant metadata uses to an empty ValueAsMetadata node. This makes extant dbg.value uses trivially dead (i.e. fair game for deletion in many passes), leading to stale dbg.values being in effect for too long. - Call salvageDebugInfoOrMarkUndef. Not needed to make instruction removal correct. OTOH results in wasted work in some common cases (e.g. when all instructions in a BasicBlock are deleted). This came up while discussing some basic cases in https://reviews.llvm.org/D80052. Reviewers: jmorse, TWeaver, aprantl, dexonsmith, jdoerfert Subscribers: jholewinski, qcolombet, hiraditya, jfb, sstefan1, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D80264	2020-05-21 15:58:12 -07:00
Eli Friedman	11aa3707e3	StoreInst should store Align, not MaybeAlign This is D77454, except for stores. All the infrastructure work was done for loads, so the remaining changes necessary are relatively small. Differential Revision: https://reviews.llvm.org/D79968	2020-05-15 12:26:58 -07:00

706 Commits