llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	6dd29fccb8	[SLP] Support for horizontal min/max reduction. SLP vectorizer supports horizontal reductions for Add/FAdd binary operations. Patch adds support for horizontal min/max reductions. Function getReductionCost() is split to getArithmeticReductionCost() for binary operation reductions and getMinMaxReductionCost() for min/max reductions. Patch fixes PR26956. Differential revision: https://reviews.llvm.org/D27846 llvm-svn: 312791	2017-09-08 13:49:36 +00:00
Sam Elliott	b0c9753691	Keep Optimization Remark Yaml in NewPM Summary: The New Pass Manager infrastructure was forgetting to keep around the optimization remark yaml file that the compiler might have been producing. This meant setting the option to '-' for stdout worked, but setting it to a filename didn't give file output (presumably it was deleted because compilation didn't explicitly keep it). This change just ensures that the file is kept if compilation succeeds. So far I have updated one of the optimization remark output tests to add a version with the new pass manager. It is my intention for this patch to also include changes to all tests that use `-opt-remark-output=` but I wanted to get the code patch ready for review while I was making all those changes. Fixes https://bugs.llvm.org/show_bug.cgi?id=33951 Reviewers: anemet, chandlerc Reviewed By: anemet, chandlerc Subscribers: javed.absar, chandlerc, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D36906 llvm-svn: 311271	2017-08-20 01:30:45 +00:00
Dinar Temirbulatov	7aff8cfa55	[SLPVectorizer] Tighten up VLeft, VRight declaration, remove unnecessary testcase test/Transforms/SLPVectorizer/X86/reorder.ll, NFCI. llvm-svn: 311223	2017-08-19 03:15:07 +00:00
Dinar Temirbulatov	e3ce1b455e	[SLPVectorizer] Add opcode parameter to reorderAltShuffleOperands, reorderInputsAccordingToOpcode functions. Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36766 llvm-svn: 311221	2017-08-19 02:54:20 +00:00
Dinar Temirbulatov	7b78f5e52d	[SLPVectorizer] Schedule bundle with different opcodes. This change let us schedule a bundle with different opcodes in it, for example : [ load, add, add, add ] Reviewers: mkuper, RKSimon, ABataev, mzolotukhin, spatel, filcab Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D36518 llvm-svn: 310847	2017-08-14 15:40:16 +00:00
Alexey Bataev	9581b42589	[SLP] General improvements of SLP vectorization process. Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310260	2017-08-07 15:25:49 +00:00
Alexey Bataev	53d523c9eb	Revert "[SLP] General improvements of SLP vectorization process." This reverts commit r310255. llvm-svn: 310257	2017-08-07 14:51:52 +00:00
Alexey Bataev	faace8f1f1	[SLP] General improvements of SLP vectorization process. Summary: Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310255	2017-08-07 14:03:17 +00:00
Simon Pilgrim	7c53c0f586	[SLPVectorizer][X86] Cleanup test case. NFCI Remove excess attributes/metadata llvm-svn: 310227	2017-08-06 20:50:19 +00:00
Dinar Temirbulatov	cc2294a4eb	[SLPVectorizer] Add extra parameter to setInsertPointAfterBundle to handle different opcodes, NFCI. Differential Revision: https://reviews.llvm.org/D35769 llvm-svn: 310183	2017-08-05 18:43:52 +00:00
Craig Topper	ae9b87d10c	[InstCombine] Support sext in foldLogicCastConstant This adds support for sext in foldLogicCastConstant. This is a prerequisite for D36214. Differential Revision: https://reviews.llvm.org/D36234 llvm-svn: 309880	2017-08-02 20:25:56 +00:00
Alexey Bataev	65e9dd66f7	[SLPVectorizer] Test update, NFC. llvm-svn: 309814	2017-08-02 14:22:53 +00:00
Alexey Bataev	fba97e6e21	[SLP] Fix for PR31880: shuffle and vectorize repeated scalar ops on extracted elements Summary: Currently most of the time vectors of extractelement instructions are treated as scalars that must be gathered into vectors. But in some cases, like when we have extractelement instructions from single vector with different constant indeces or from 2 vectors of the same size, we can treat this operations as shuffle of a single vector or blending of 2 vectors. ``` define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) { %x0 = extractelement <2 x i8> %x, i32 0 %y1 = extractelement <2 x i8> %y, i32 1 %x0x0 = mul i8 %x0, %x0 %y1y1 = mul i8 %y1, %y1 %ins1 = insertelement <2 x i8> undef, i8 %x0x0, i32 0 %ins2 = insertelement <2 x i8> %ins1, i8 %y1y1, i32 1 ret <2 x i8> %ins2 } ``` can be converted to something like ``` define <2 x i8> @g(<2 x i8> %x, <2 x i8> %y) { %1 = shufflevector <2 x i8> %x, <2 x i8> %y, <2 x i32> <i32 0, i32 3> %2 = mul <2 x i8> %1, %1 ret <2 x i8> %2 } ``` Currently this type of conversion is considered as high cost transformation. Reviewers: mzolotukhin, delena, mkuper, hfinkel, RKSimon Subscribers: ashahid, RKSimon, spatel, llvm-commits Differential Revision: https://reviews.llvm.org/D30200 llvm-svn: 309812	2017-08-02 13:25:26 +00:00
Mohammad Shahid	126e91ab09	[SLP]: Add test to resurrect the jumbled load patch. This test has multiple uses of memory loads by different user Change-Id: I40b5ba8b810265440f3e55efca77c4b41ca98fa4 llvm-svn: 309544	2017-07-31 07:40:54 +00:00
Alexey Bataev	8843aab83a	[SLP] A test for limiting vectorization of instructions, NFC. llvm-svn: 306828	2017-06-30 14:37:32 +00:00
Matt Arsenault	67cd347e93	AMDGPU: Allow vectorization of packed types llvm-svn: 305844	2017-06-20 20:38:06 +00:00
Simon Pilgrim	448c2290f1	[X86][SLM] Add SLM arithmetic vectorization tests As discussed on D33983, as SLM has so many custom costs its worth testing as well. llvm-svn: 305151	2017-06-10 19:16:09 +00:00
Simon Pilgrim	631109c09c	Regenerate test llvm-svn: 304973	2017-06-08 10:24:49 +00:00
Alexey Bataev	0c2fca94a3	[SLP] Change extension of the test, NFC. llvm-svn: 304829	2017-06-06 20:27:45 +00:00
Alexey Bataev	7266fec0d4	[SLP] Add a test for fix of PR32164, NFC. llvm-svn: 304826	2017-06-06 20:11:35 +00:00
Amara Emerson	c9916d7e97	Re-commit r302678, fixing PR33053. The issue was that the AArch64 TTI hook allowed unpacked integer cmp reductions which didn't have a lowering. llvm-svn: 303211	2017-05-16 21:29:22 +00:00
Adam Nemet	e29686e5c1	[SLP] Enable 64-bit wide vectorization on AArch64 ARM Neon has native support for half-sized vector registers (64 bits). This is beneficial for example for 2D and 3D graphics. This patch adds the option to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer. * Performance Analysis This change was motivated by some internal benchmarks but it is also beneficial on SPEC and the LLVM testsuite. The results are with -O3 and PGO. A negative percentage is an improvement. The testsuite was run with a sample size of 4. SPEC * CFP2006/482.sphinx3 -3.34% A pretty hot loop is SLP vectorized resulting in nice instruction reduction. This used to be a +22% regression before rL299482. * CFP2000/177.mesa -3.34% * CINT2000/256.bzip2 +6.97% My current plan is to extend the fix in rL299482 to i16 which brings the regression down to +2.5%. There are also other problems with the codegen in this loop so there is further room for improvement. ** LLVM testsuite * SingleSource/Benchmarks/Misc/ReedSolomon -10.75% There are multiple small SLP vectorizations outside the hot code. It's a bit surprising that it adds up to 10%. Some of this may be code-layout noise. * MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40% The opt-viewer screenshot can be seen at F3218284. We start at a colder store but the tree leads us into the hottest loop. * MultiSource/Applications/lambda-0.1.3/lambda -2.68% * MultiSource/Benchmarks/Bullet/bullet -2.18% This is using 3D vectors. * SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67% Noise, binary is unchanged. * MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90% There is an additional SLP in the cold code. The test runs for ~1sec and prints out over 2000 lines. This is most likely noise. * MultiSource/Applications/aha/aha +1.63% * MultiSource/Applications/JM/lencod/lencod +1.41% * SingleSource/Benchmarks/Misc/richards_benchmark +1.15% Differential Revision: https://reviews.llvm.org/D31965 llvm-svn: 303116	2017-05-15 21:15:01 +00:00
Hans Wennborg	bd6e9e77a7	Revert r302678 "[AArch64] Enable use of reduction intrinsics." This caused PR33053. Original commit message: > The new experimental reduction intrinsics can now be used, so I'm enabling this > for AArch64. We will need this for SVE anyway, so it makes sense to do this for > NEON reductions as well. > > The existing code to match shufflevector patterns are replaced with a direct > lowering of the reductions to AArch64-specific nodes. Tests updated with the > new, simpler, representation. > > Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 303115	2017-05-15 20:59:32 +00:00
Simon Pilgrim	7d2f06ae22	[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 add/sub/mul llvm-svn: 303074	2017-05-15 15:48:15 +00:00
Simon Pilgrim	a12d65972a	[SLPVectorizer][X86] Add vectorization tests for vXi64/vXi32/vXi16/VXi8 shifts llvm-svn: 303069	2017-05-15 14:27:11 +00:00
Adam Nemet	0aca09fc6c	[SLP] Emit optimization remarks The approach I followed was to emit the remark after getTreeCost concludes that SLP is profitable. I initially tried emitting them after the vectorizeRootInstruction calls in vectorizeChainsInBlock but I vaguely remember missing a few cases for example in HorizontalReduction::tryToReduce. ORE is placed in BoUpSLP so that it's available from everywhere (notably HorizontalReduction::tryToReduce). We use the first instruction in the root bundle as the locator for the remark. In order to get a sense how far the tree is spanning I've include the size of the tree in the remark. This is not perfect of course but it gives you at least a rough idea about the tree. Then you can follow up with -view-slp-tree to really see the actual tree. llvm-svn: 302811	2017-05-11 17:06:17 +00:00
Amara Emerson	816542ceb3	[AArch64] Enable use of reduction intrinsics. The new experimental reduction intrinsics can now be used, so I'm enabling this for AArch64. We will need this for SVE anyway, so it makes sense to do this for NEON reductions as well. The existing code to match shufflevector patterns are replaced with a direct lowering of the reductions to AArch64-specific nodes. Tests updated with the new, simpler, representation. Differential Revision: https://reviews.llvm.org/D32247 llvm-svn: 302678	2017-05-10 15:15:38 +00:00
Matt Arsenault	6a288c1e32	Replace hardcoded intrinsic list with speculatable attribute. No change in which intrinsics should be speculated. llvm-svn: 301995	2017-05-03 02:26:10 +00:00
Easwaran Raman	76aba5f6d7	[SLP vectorizer] Allow phi node reordering in tryToVectorizeList. In tryToVectorizeList, under a very limited circumstance (when entered from tryToVectorizePair), the values may be reordered (swapped) and the SLP tree is built with the new order. This extends that to the case when starting from phis in vectorizeChainsInBlock when there are exactly two phis. The textual order of phi nodes shouldn't really matter. Without this change, the loop body in the accompnaying test case is fully vectorized when we swap the orde of the phis but not with this order. While this doesn't solve the phi-ordering problem in a general way (for more than 2 phis), this is simple fix that piggybacks on an existing mechanism and is useful in cases like multiplying two complex numbers. Differential revision: https://reviews.llvm.org/D32065 llvm-svn: 300574	2017-04-18 18:16:57 +00:00
Jonas Paulsson	4707015d46	Fix a RUN line in new test. Use '2>&1 \|' and not '\|&' to pipe debug output to FileCheck Hopefully handles a "shell parser error" on llvm-clang-x86_64-expensive-checks-win test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll llvm-svn: 300064	2017-04-12 14:25:08 +00:00
Jonas Paulsson	22776892c9	[SLPVectorizer] Pass the right type argument to getCmpSelInstrCost() In getEntryCost(), make the scalar type for a compare instruction that of the operands, not i1. This is needed in order to call getCmpSelInstrCost() for a compare in a sensible way, the same way as the LoopVectorizer does. New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll Review: Matthew Simpson https://reviews.llvm.org/D31601 llvm-svn: 300061	2017-04-12 13:29:25 +00:00
Matt Arsenault	3dbeefa978	AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel Currently the default C calling convention functions are treated the same as compute kernels. Make this explicit so the default calling convention can be changed to a non-kernel. Converted with perl -pi -e 's/define void/define amdgpu_kernel void/' on the relevant test directories (and undoing in one place that actually wanted a non-kernel). llvm-svn: 298444	2017-03-21 21:39:51 +00:00
Simon Pilgrim	06c70adcf0	[X86] Add missing BITREVERSE costs for SSE2 vectors and i8/i16/i32/i64 scalars Prep work for PR31810 llvm-svn: 297876	2017-03-15 19:34:55 +00:00
Michael Kuperstein	5fb39a7966	[SLP] Revert everything that has to do with memory access sorting. This reverts r293386, r294027, r294029 and r296411. Turns out the SLP tree isn't actually a "tree" and we don't handle accessing the same packet of loads in several different orders well, causing miscompiles. Revert until we can fix this properly. llvm-svn: 297493	2017-03-10 18:59:07 +00:00
Michael Kuperstein	768d013a03	[SLP] Revert r296863 due to miscompiles. Details and reproducer are on the email thread for r296863. llvm-svn: 297103	2017-03-06 23:54:51 +00:00
Alexey Bataev	d7db344c17	[SLP] A test for vectorization of users of extractelement instructions, NFC. llvm-svn: 297024	2017-03-06 16:26:00 +00:00
Mohammad Shahid	bdac9f30c0	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for external use of vectorizable scalars based on shuffle mask. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d llvm-svn: 296863	2017-03-03 10:02:47 +00:00
Hans Wennborg	cc4ff78c9d	Revert r296575 "[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available" It caused miscompiles, e.g. in Chromium (PR32109). llvm-svn: 296654	2017-03-01 18:57:16 +00:00
Alexey Bataev	4a45efa431	[SLP] Preserve IR flags when vectorizing horizontal reductions. Summary: The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math) when converting scalar horizontal reductions instructions into vectors, just like for other vectorized instructions. It doe not include IR propagation for extra arguments, we need to handle original scalar operations for extra args to propagate correct flags. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30418 llvm-svn: 296614	2017-03-01 12:43:39 +00:00
Alexey Bataev	74e5a36856	[SLP] Preserve IR flags for extra args. Summary: We should preserve IR flags for extra args. These IR flags should be taken from original scalar operations, not from the reduction operations. Reviewers: mkuper, mzolotukhin, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30447 llvm-svn: 296613	2017-03-01 12:22:33 +00:00
Alexey Bataev	dfec81107f	[SLP] Fix for PR32038: extra add of PHI node when it is not required. Summary: If horizontal reduction tree starts from the binary operation that is used in PHI node, but this PHI is not used in horizontal reduction, we may end up with extra addition of this PHI node after vectorization. Here is an example: ``` %phi = phi i32 [ %tmp, %end], ... ... %tmp = add i32 %tmp1, %tmp2 end: ``` after vectorization we always have something like: ``` %phi = phi i32 [ %tmp, %end], ... ... %red = extractelement <8 x 32> %vec.red, 0 %tmp = add i32 %red, %phi end: ``` even if `%phi` is not used in reduction tree. Patch considers these PHI nodes as extra arguments and considers them in the final result iff they really used in reduction. Reviewers: mkuper, hfinkel, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30409 llvm-svn: 296606	2017-03-01 10:50:44 +00:00
Mohammad Shahid	175ffa8c35	[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR. The fix is to compute the mask for out of order memory accesses while building the vectorizable tree instead of actual vectorization of vectorizable tree. Reviewers: mkuper Differential Revision: https://reviews.llvm.org/D30159 Change-Id: Id1e287f073fa4959713ba545fa4254db5da8b40d llvm-svn: 296575	2017-03-01 03:51:54 +00:00
Michael Kuperstein	c07cca85fb	[SLP] Load sorting should not try to sort things that aren't loads. We may get a VL where the first element is a load, but the others aren't. Trying to sort such VLs can only lead to sorrow. llvm-svn: 296411	2017-02-27 23:18:11 +00:00
Alexey Bataev	a79c41cf51	[SLP] Use different flags in tests for reduction ops and extra args. llvm-svn: 296376	2017-02-27 20:22:44 +00:00
Alexey Bataev	0aadc6ef13	[SLP] Modify test to check IR flags propagation for extra args. llvm-svn: 296369	2017-02-27 19:16:09 +00:00
Alexey Bataev	cb78d09d14	[SLP] A test for a fix of PR32038. llvm-svn: 296349	2017-02-27 16:07:10 +00:00
Alexey Bataev	f77d1656af	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295972	2017-02-23 13:37:09 +00:00
Alexey Bataev	14b370c1bf	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit 7c5141e577d9efd1c8e3087566a38ce6b3a41a84. llvm-svn: 295957	2017-02-23 11:09:35 +00:00
Alexey Bataev	7ae653285d	[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong result Summary: If the same value is used several times as an extra value, SLP vectorizer takes it into account only once instead of actual number of using. For example: ``` int val = 1; for (int y = 0; y < 8; y++) { for (int x = 0; x < 8; x++) { val = val + input[y * 8 + x] + 3; } } ``` We have 2 extra rguments: `1` - initial value of horizontal reduction and `3`, which is added 8*8 times to the reduction. Before the patch we added `1` to the reduction value and added once `3`, though it must be added 64 times. Reviewers: mkuper, mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D30262 llvm-svn: 295956	2017-02-23 10:57:15 +00:00
Alexey Bataev	7337212e83	Revert "[SLP] Fix for PR32036: Vectorized horizontal reduction returning wrong" This reverts commit d83c81ee6a8dea662808ac22b396d1bb0595c89d. llvm-svn: 295951	2017-02-23 09:59:29 +00:00

1 2 3 4 5 ...

350 Commits