llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	c167028684	[SLP]Delay vectorization of postponable values for instructions with no users. SLP vectorizer tries to find the reductions starting the operands of the instructions with no-users/void returns/etc. But such operands can be postponable instructions, like Cmp, InsertElement or InsertValue. Such operands still must be postponed, vectorizer should not try to vectorize them immediately. Differential Revision: https://reviews.llvm.org/D131965	2022-08-19 08:39:16 -07:00
Florian Hahn	aeb19817d6	Revert "[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly." This reverts commit `fc9c59c355`. The patch triggers an assertion when building SPEC on X86. Reduced reproducer shared at D107966. Also reverts follow-up commit `11a09af76d`.	2022-05-21 21:00:01 +01:00
Alexey Bataev	fc9c59c355	[SLP]Do not emit extract elements for insertelements users, replace with shuffles directly. SLP vectorizer emits extracts for externally used vectorized scalars and estimates the cost for each such extract. But in many cases these scalars are input for insertelement instructions, forming buildvector, and instead of extractelement/insertelement pair we can emit/cost estimate shuffle(s) cost and generate series of shuffles, which can be further optimized. Tested using test-suite (+SPEC2017), the tests passed, SLP was able to generate/vectorize more instructions in many cases and it allowed to reduce number of re-vectorization attempts (where we could try to vectorize buildector insertelements again and again). Differential Revision: https://reviews.llvm.org/D107966	2022-05-20 05:58:09 -07:00
Alexey Bataev	7ea03f0b4e	[SLP]Improve reductions analysis and emission, part 1. Currently SLP vectorizer walks through the instructions and selects 3 main classes of values: 1) reduction operations - instructions with same reduction opcode (add, mul, min/max, etc.), which build the reduction, 2) reduced values - instructions with the same opcodes, but different from the reduction opcode, 3) extra arguments - all other values, instructions from the different basic block rather than the root node, instructions with to many/less uses. This scheme is not very efficient. It excludes some instructions and all non-instruction values from the reductions (constants, proficient gathers), to many possibly reduced values are marked as extra arguments. Patch improves this process by introducing a bit extended analysis stage. During this stage, we still try to select 3 classes of the values: 1) reduction operations - same as before, 2) possibly reduced values - all instructions from the current block/non-instructions, which may build a vectorization tree, 3) extra arguments - instructions from the different basic blocks. Additionally, an extra sorting of the possibly reduced values occurs to build the scalar sequences which highly likely will bed vectorized, e.g. loads are grouped by the distance between them, constants are grouped together, cmp instructions are sorted by their compare types and predicates, extractelement instructions are sorted by the vector operand, etc. Also, these groups are reordered by their length so the longest group is the first in the list of the possibly reduced values. The vectorization process tries to emit the reductions for all these groups. These reductions, remaining non-vectorized possible reduced values and extra arguments are then combined into the final expression just like it was before. Differential Revision: https://reviews.llvm.org/D114171	2022-05-02 12:03:58 -07:00
Valery N Dmitriev	88b9e46fb5	[SLP] Steer for the best chance in tryToVectorize() when rooting with binary ops. tryToVectorize() method implements one of searching paths for vectorizable tree roots in SLP vectorizer, specifically for binary and comparison operations. Order of making probes for various scalar pairs was defined by its implementation: the instruction operands, then climb over one operand if the instruction is its sole user and then perform same actions for another operand if previous attempts failed. Problem with this approach is that among these options we can have more than a single vectorizable tree candidate and it is not necessarily the one that encountered first. Trying to build vectorizable tree for each possible combination for just evaluation is expensive. But we already have lookahead heuristics mechanism which we use for finding best pick among operands of commutative instructions. It calculates cumulative score for candidates in two consecutive lanes. This patch introduces use of the heuristics for choosing the best pair among several combinations. We only try one that looks as most promising for vectorization. Additional benefit is that we reduce total number of vectorization trees built for probes because we skip those looking non-profitable early. Reviewed By: Alexey Bataev (ABataev), Vasileios Porpodas (vporpo) Differential Revision: https://reviews.llvm.org/D124309	2022-04-25 12:25:33 -07:00
Philip Reames	48cc9287f5	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" (try 3) The original commit exposed several missing dependencies (e.g. latent bugs in SLP scheduling). Most of these were fixed over the weekend and have had several days to bake. The last was fixed this morning after being noticed in manual review of test changes yesterday. See the review thread for links to each change. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-25 10:39:23 -07:00
Philip Reames	deae979a2c	Revert "Reapply "[SLP] Schedule only sub-graph of vectorizable instructions""" This reverts commit `738042711b`. A second, apparently separate, issue has been reported on the original review.	2022-03-03 11:35:34 -08:00
Philip Reames	738042711b	Reapply "[SLP] Schedule only sub-graph of vectorizable instructions"" Root issue which triggered the revert was fixed in 689bab. No changes in the reapplied patch. Original commit message follows: SLP currently schedules all instructions within a scheduling window which stretches from the first instr uction potentially vectorized to the last. This window can include a very large number of unrelated instruct ions which are not being considered for vectorization. This change switches the code to only schedule the su b-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-03-02 10:47:20 -08:00
Arthur Eubanks	9c6250ee41	Revert "[SLP] Schedule only sub-graph of vectorizable instructions" This reverts commit `0539a26d91`. Causes a miscompile, see comments on D118538. Required updating bottom-to-top-reorder.ll.	2022-03-01 17:31:16 -08:00
Philip Reames	0539a26d91	[SLP] Schedule only sub-graph of vectorizable instructions SLP currently schedules all instructions within a scheduling window which stretches from the first instruction potentially vectorized to the last. This window can include a very large number of unrelated instructions which are not being considered for vectorization. This change switches the code to only schedule the sub-graph consisting of the instructions being vectorized and their transitive users. This has the effect of greatly reducing the amount of work performed in large basic blocks, and thus greatly improves compile time on degenerate examples. To understand the effects, I added some statistics (not planned for upstream contribution). Here's an illustration from my motivating example: Before this patch: 704357 SLP - Number of calcDeps actions 699021 SLP - Number of schedule calls 5598 SLP - Number of ReSchedule actions 59 SLP - Number of ReScheduleOnFail actions 10084 SLP - Number of schedule resets 8523 SLP - Number of vector instructions generated After this patch: 102895 SLP - Number of calcDeps actions 161916 SLP - Number of schedule calls 5637 SLP - Number of ReSchedule actions 55 SLP - Number of ReScheduleOnFail actions 10083 SLP - Number of schedule resets 8403 SLP - Number of vector instructions generated I do want to highlight that there is a small difference in number of generated vector instructions. This example is hitting the bailout due to maximum window size, and the change in scheduling is slightly perturbing when and how we hit it. This can be seen in the RescheduleOnFail counter change. Given that, I think we can safely ignore. The downside of this change can be seen in the large test diff. We group all vectorizable instructions together at the bottom of the scheduling region. This means that vector instructions can move quite far from their original point in code. While maybe undesirable, I don't see this as being a major problem as this pass is not intended to be a general scheduling pass. For context, it's worth noting that the pre-scheduling that SLP does while building the vector tree is exactly the sub-graph scheduling implemented by this patch. Differential Revision: https://reviews.llvm.org/D118538	2022-02-22 10:15:55 -08:00
Alexey Bataev	ddce6e0561	[SLP]Improve vectorization of cmp instructions sequences. Final attempt to vectorize bundles of comptatible cmp instructions after all other instructions processing. Metric: SLP.NumVectorInstructions Program results results0 diff test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test 1.00 5.00 400.0% test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test 8.00 11.00 37.5% test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test 20.00 26.00 30.0% test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00 22.6% test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00 22.6% test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test 102.00 124.00 21.6% test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test 118.00 133.00 12.7% test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00 9.9% test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00 9.9% test-suite :: MultiSource/Benchmarks/Olden/power/power.test 64.00 70.00 9.4% test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00 9.2% test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test 50.00 54.00 8.0% test-suite :: MultiSource/Applications/sqlite3/sqlite3.test 27.00 29.00 7.4% test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00 7.3% test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 694.00 738.00 6.3% test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test 361.00 382.00 5.8% test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 409.00 430.00 5.1% test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 140.00 147.00 5.0% test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 140.00 147.00 5.0% test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00 4.8% test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test 966.00 1011.00 4.7% test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 65.00 68.00 4.6% test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00 3.8% test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00 3.2% test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test 62.00 64.00 3.2% test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test 62.00 64.00 3.2% test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 852.00 877.00 2.9% test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 852.00 877.00 2.9% test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00 2.7% test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test 39.00 40.00 2.6% test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test 613.00 624.00 1.8% test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test 378.00 383.00 1.3% test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test 293.00 295.00 0.7% test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test 297.00 299.00 0.7% test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00 0.2% test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00 0.2% Differential Revision: https://reviews.llvm.org/D114799	2021-12-01 07:26:29 -08:00
Alexey Bataev	eb9b75dd4d	[SLP]Change the order of the reduction/binops args pair vectorization attempts. Need to change the order of the reduction/binops args pair vectorization attempts. Need to try to find the reduction at first and postpone vectorization of binops args. This may help to find more reduction patterns and vectorize them. Part of D111574. Differential Revision: https://reviews.llvm.org/D112224	2021-10-25 06:27:14 -07:00
Alexey Bataev	e7c3eaa8ae	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form indentity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107494	2021-08-05 08:41:24 -07:00
Alexey Bataev	214f99b27c	Revert "[SLP]Do not emit extra shuffle for insertelements vectorization." This reverts commit `871ea69803` to fix the problem if the first vector is not just undef.	2021-08-04 11:28:59 -07:00
Alexey Bataev	871ea69803	[SLP]Do not emit extra shuffle for insertelements vectorization. If the vectorized insertelements instructions form indentity subvector (the subvector at the beginning of the long vector), it is just enough to extend the vector itself, no need to generate inserting subvector shuffle. Differential Revision: https://reviews.llvm.org/D107344	2021-08-03 13:18:41 -07:00
Alexey Bataev	a010d4230e	[SLP]Allow reordering of insertelements. After we added support for non-ordered insertelements, we can allow their reordering. Differential Revision: https://reviews.llvm.org/D104057	2021-06-11 08:47:41 -07:00
Alexey Bataev	74af4bb1f4	[SLP]Remove unnecessary UndefValue in CreateShuffle. No need to use UndefValue in CreateShuffle call. Differential Revision: https://reviews.llvm.org/D104113	2021-06-11 08:08:30 -07:00
Alexey Bataev	a893b44187	[SLP]Disable scheduling of insertelements. There is no need to schedule insertelement instructions. The compiler did not schedule them before it started support their vectorization and it should not do it after. We pre-schedule them manually when finding a build vector sequence. Disabling scheduling of insertelement instructions improves compile time and vectorization of the very large basic blocks by saving scheduling budget for other instructions. Differential Revision: https://reviews.llvm.org/D104026	2021-06-10 10:25:26 -07:00
Anton Afanasyev	b2cd895011	[SLP] Fix "gathering" of insertelement instructions For rare exceptional case vector tree node (insertelements for now only) is marked as `NeedToGather`, this case is processed by patch. Follow-up of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135. Differential Revision: https://reviews.llvm.org/D102675	2021-05-25 01:35:43 +03:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attributes is quivalent to setting attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Alexey Bataev	8dab25954b	[SLP]Improve handling of compensate external uses cost. External insertelement users can be represented as a result of shuffle of the vectorized element and noconsecutive insertlements too. Added support for handling non-consecutive insertelements. Differential Revision: https://reviews.llvm.org/D101555	2021-05-21 07:45:31 -07:00
Alexey Bataev	20e2b4f6e0	[SLP][NFC]Add a test for non-consecutive inserts, NFC.	2021-05-14 12:44:35 -07:00
Anton Afanasyev	ab2c499d3a	[SLP] Add insertelement instructions to vectorizable tree Add new type of tree node for `InsertElementInst` chain forming vector. These instructions could be either removed, or replaced by shuffles during vectorization and we can add this node to cost model, so naturally estimating their cost, getting rid of `CompensateCost` tricks and reducing further work for InstCombine. This fixes PR40522 and PR35732 in a natural way. Also this patch is the first step towards revectorization of partially vectorization (to fix PR42022 completely). After adding inserts to tree the next step is to add vector instructions there (for instance, to merge `store <2 x float>` and `store <2 x float>` to `store <4 x float>`). Fixes PR40522 and PR35732. Differential Revision: https://reviews.llvm.org/D98714	2021-05-13 07:41:45 +03:00
Juneyoung Lee	4a8e6ed2f7	[SLP,LV] Use poison constant vector for shufflevector/initial insertelement This patch makes SLP and LV emit operations with initial vectors set to poison constant instead of undef. This is a part of efforts for using poison vector instead of undef to represent "doesn't care" vector. The goal is to make nice shufflevector optimizations valid that is currently incorrect due to the tricky interaction between undef and poison (see https://bugs.llvm.org/show_bug.cgi?id=44185 ). Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D94061	2021-01-06 11:22:50 +09:00
Juneyoung Lee	278aa65cc4	[IR] Let IRBuilder's CreateVectorSplat/CreateShuffleVector use poison as placeholder This patch updates IRBuilder to create insertelement/shufflevector using poison as a placeholder. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93793	2020-12-30 04:21:04 +09:00
Simon Pilgrim	119e4550dd	[SLPVectorizer][X86] Remove unused check-prefixes	2020-11-08 14:03:55 +00:00
Anton Afanasyev	a7478fab6c	[SLP] Fix order of `insertelement`/`insertvalue` seed operands Summary: This patch takes the indices operands of `insertelement`/`insertvalue` into account while generation of seed elements for `findBuildAggregate()`. This function has kept the original order of `insert`s before. Also this patch optimizes `findBuildAggregate()` preventing it from redundant temporary vector allocations and its multiple reversing. Fixes llvm.org/pr44067 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83779	2020-08-06 22:09:24 +03:00
Simon Pilgrim	168a44a70e	[CostModel][X86] Improve extract/insert element costs (PR43605) This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index. It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true). Differential Revision: https://reviews.llvm.org/D74976	2020-02-27 15:54:13 +00:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Sanjay Patel	b80033ef65	[SLP] reduce duplicate CHECK lines in tests; NFC	2019-11-20 10:12:54 -05:00
Craig Topper	8b5f2ab2a4	Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default." The assert that caused this to be reverted should be fixed now. Original commit message: This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 368183	2019-08-07 16:24:26 +00:00
Mitch Phillips	bd0d97e1c4	Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default." This reverts commit `3de33245d2`. This commit broke the MSan buildbots. See https://reviews.llvm.org/rL367901 for more information. llvm-svn: 368107	2019-08-06 23:00:43 +00:00
Craig Topper	3de33245d2	[X86] Enable -x86-experimental-vector-widening-legalization by default. This patch changes our defualt legalization behavior for 16, 32, and 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to widening. For example, v8i8 will now be widened to v16i8 instead of promoted to v8i16. This keeps the elements widths the same and pads with undef elements. We believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors. This has the potential to cause regressions so we wanted to get it in early in the 10.0 cycle so we have plenty of time to address them. Next steps will be to merge tests that explicitly test the command line option. And then we can remove the option and its associated code. llvm-svn: 367901	2019-08-05 18:25:36 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Simon Pilgrim	ff3abef395	[SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356913	2019-03-25 15:53:55 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Alexey Bataev	5b9a77d4ea	[SLP] Fix PR35777: Incorrect handling of aggregate values. Summary: Fixes the bug with incorrect handling of InsertValue\|InsertElement instrucions in SLP vectorizer. Currently, we may use incorrect ExtractElement instructions as the operands of the original InsertValue\|InsertElement instructions. Reviewers: mkuper, hfinkel, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D41767 llvm-svn: 321994	2018-01-08 14:43:06 +00:00
Alexey Bataev	fa13848da8	[SLP] Update more test checks, NFC. llvm-svn: 321872	2018-01-05 16:15:17 +00:00
Alexey Bataev	9581b42589	[SLP] General improvements of SLP vectorization process. Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310260	2017-08-07 15:25:49 +00:00
Alexey Bataev	53d523c9eb	Revert "[SLP] General improvements of SLP vectorization process." This reverts commit r310255. llvm-svn: 310257	2017-08-07 14:51:52 +00:00
Alexey Bataev	faace8f1f1	[SLP] General improvements of SLP vectorization process. Summary: Patch tries to improve two-pass vectorization analysis, existing in SLP vectorizer. What it does: 1. Defines key nodes, that are the vectorization roots. Previously vectorization started if StoreInst or ReturnInst is found. For now, the vectorization started for all Instructions with no users and void types (Terminators, StoreInst) + CallInsts. 2. CmpInsts, InsertElementInsts and InsertValueInsts are stored in the array. This array is processed only after the vectorization of the first-after-these instructions key node is finished. Vectorization goes in reverse order to try to vectorize as much code as possible. Reviewers: mzolotukhin, Ayal, mkuper, gilr, hfinkel, RKSimon Subscribers: ashahid, anemet, RKSimon, mssimpso, llvm-commits Differential Revision: https://reviews.llvm.org/D29826 llvm-svn: 310255	2017-08-07 14:03:17 +00:00
Alexey Bataev	4f0d469d45	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 289043	2016-12-08 11:57:51 +00:00
Renato Golin	5b8e7ecdb3	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508	2016-12-02 16:56:26 +00:00
Alexey Bataev	e8e94a7176	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497	2016-12-02 12:20:22 +00:00
Artem Belevich	704395a25a	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements." This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431	2016-12-01 22:52:15 +00:00
Alexey Bataev	2c01af5904	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements. When trying to vectorize trees that start at insertelement instructions function tryToVectorizeList() uses vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree cost for this fixed vectorization factor is too high. Patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412	2016-12-01 20:06:53 +00:00
Alexey Bataev	e951e5eb7b	[SLP] Add a new test for tree vectorization starting from insertelement instruction. llvm-svn: 288148	2016-11-29 15:37:52 +00:00
Hal Finkel	3b7fc86677	[SLPVectorize] Basic ephemeral-value awareness The SLP vectorizer should not vectorize ephemeral values. These are used to express information to the optimizer, and vectorizing them does not lead to faster code (because the ephemeral values are dropped prior to code generation, vectorized or not), and obscures the information the instructions are attempting to communicate (the logic that interprets the arguments to @llvm.assume generically does not understand vectorized conditions). Also, uses by ephemeral values are free (because they, and the necessary extractelement instructions, will be dropped prior to code generation). llvm-svn: 219816	2014-10-15 17:35:01 +00:00

1 2

59 Commits