llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	727e279ac4	SLPVectorizer: Move propagateMetadata to VectorUtils This will be re-used by the LoadStoreVectorizer. Fix handling of range metadata and testcase by Justin Lebar. llvm-svn: 274281	2016-06-30 21:17:59 +00:00
Wei Mi	95685faeee	Refine the set of UniformAfterVectorization instructions. Except the seed uniform instructions (conditional branch and consecutive ptr instructions), dependencies to be added into uniform set should only be used by existing uniform instructions or intructions outside of current loop. Differential Revision: http://reviews.llvm.org/D21755 llvm-svn: 274262	2016-06-30 18:42:56 +00:00
Adam Nemet	e1af3c635c	[LV] Improve accuracy and formatting of function comment llvm-svn: 274182	2016-06-29 22:04:10 +00:00
Elena Demikhovsky	5e21c94f25	Reverted patch 273864 llvm-svn: 274115	2016-06-29 10:01:06 +00:00
Adam Nemet	ad437fff53	[Diag] Add getter shouldAlwaysPrint. NFC For the new hotness attribute, the API will take the pass rather than the pass name so we can no longer play the trick of AlwaysPrint being a special pass name. This adds a getter to help the transition. There is also a corresponding clang patch. llvm-svn: 274100	2016-06-29 04:55:19 +00:00
Elena Demikhovsky	6f2ec8104a	Fixed crash of SLP Vectorizer on KNL The bug is connected to vector GEPs. https://llvm.org/bugs/show_bug.cgi?id=28313 llvm-svn: 273919	2016-06-27 20:07:00 +00:00
Elena Demikhovsky	4c58b2761a	Fixed consecutive memory access detection in Loop Vectorizer. It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) to++ = from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Re-commit rL273257 - revision: http://reviews.llvm.org/D20789 llvm-svn: 273864	2016-06-27 11:19:23 +00:00
Benjamin Kramer	135f735af1	Apply clang-tidy's modernize-loop-convert to most of lib/Transforms. Only minor manual fixes. No functionality change intended. llvm-svn: 273808	2016-06-26 12:28:59 +00:00
Matthew Simpson	e794678404	[LV] Preserve order of dependences in interleaved accesses analysis The interleaved access analysis currently assumes that the inserted run-time pointer aliasing checks ensure the absence of dependences that would prevent its instruction reordering. However, this is not the case. Issues can arise from how code generation is performed for interleaved groups. For a load group, all loads in the group are essentially moved to the location of the first load in program order, and for a store group, all stores in the group are moved to the location of the last store. For groups having members involved in a dependence relation with any other instruction in the loop, this reordering can violate the dependence. This patch teaches the interleaved access analysis how to avoid breaking such dependences, and should fix PR27626. An assumption of the original analysis was that the accesses had been collected in "program order". The analysis was then simplified by visiting the accesses bottom-up. However, this ordering was never guaranteed for anything other than single basic block loops. Thus, this patch also enforces the desired ordering. Reference: https://llvm.org/bugs/show_bug.cgi?id=27626 Differential Revision: http://reviews.llvm.org/D19984 llvm-svn: 273687	2016-06-24 15:33:25 +00:00
Elena Demikhovsky	a266cf0518	reverted the prev commit due to assertion failure llvm-svn: 273258	2016-06-21 12:10:11 +00:00
Elena Demikhovsky	9823c995bc	Fixed consecutive memory access detection in Loop Vectorizer. It did not handle correctly cases without GEP. The following loop wasn't vectorized: for (int i=0; i<len; i++) to++ = from++; I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1. Differential revision: http://reviews.llvm.org/D20789 llvm-svn: 273257	2016-06-21 11:32:01 +00:00
Adam Nemet	a9f09c6245	[LAA] Enable symbolic stride speculation for all LAA clients This is a functional change for LLE and LDist. The other clients (LV, LVerLICM) already had this explicitly enabled. The temporary boolean parameter to LAA is removed that allowed turning off speculation of symbolic strides. This makes LAA's caching interface LAA::getInfo only take the loop as the parameter. This makes the interface more friendly to the new Pass Manager. The flag -enable-mem-access-versioning is moved from LV to a LAA which now allows turning off speculation globally. llvm-svn: 273064	2016-06-17 22:35:41 +00:00
Benjamin Kramer	1afc1de406	Apply another batch of fixes from clang-tidy's performance-unnecessary-value-param. Contains some manual fixes. No functionality change intended. llvm-svn: 273047	2016-06-17 20:41:14 +00:00
Adam Nemet	c953bb9953	[LV] Move management of symbolic strides to LAA. NFCI This is still NFCI, so the list of clients that allow symbolic stride speculation does not change (yes: LV and LoopVersioningLICM, no: LLE, LDist). However since the symbolic strides are now managed by LAA rather than passed by client a new bool parameter is used to enable symbolic stride speculation. The existing test Transforms/LoopVectorize/version-mem-access.ll checks that stride speculation is performed for LV. The previously added test Transforms/LoopLoadElim/symbolic-stride.ll ensures that no speculation is performed for LLE. The next patch will change the functionality and turn on symbolic stride speculation in all of LAA's clients and remove the bool parameter. llvm-svn: 272970	2016-06-16 22:57:55 +00:00
Adam Nemet	886e0617a2	[LV] Make getSymbolicStrides return a pointer rather than a reference. NFC Turns out SymbolicStrides is actually used in canVectorizeWithIfConvert before it gets set up in canVectorizeMemory. This works fine as long as SymbolicStrides resides in LV since we just have an empty map. Based on this the conclusion is made that there are no symbolic strides which is conservatively correct. However once SymbolicStrides becomes part of LAI, LAI is nullptr at this point so we need to differentiate the uninitialized state by returning a nullptr for SymbolicStrides. llvm-svn: 272966	2016-06-16 21:55:10 +00:00
Sean Silva	a4cfb620df	Attempt to define friend function more portably. Patch written by Reid. I verified it locally with clang. llvm-svn: 272875	2016-06-16 07:00:19 +00:00
Adam Nemet	76a41d3a25	[LV] Make the new getter return a const reference. NFC LoopVectorizationLegality holds a constant reference to LAI, so this will have to be const as well. Also added missed function comment. llvm-svn: 272851	2016-06-15 22:58:27 +00:00
Adam Nemet	82b9d2a72c	[LV] Add getter function for LoopVectorizationLegality::Strides. NFC This should help moving Strides to LAA later. llvm-svn: 272796	2016-06-15 15:49:46 +00:00
Adam Nemet	927b54e48a	[LV] Remove more unused functions. NFC LoopVectorizationLegality::strides_begin/end are also unused. llvm-svn: 272781	2016-06-15 12:26:15 +00:00
Adam Nemet	b1973be8e2	[LV] Remove unused function. NFC LoopVectorizationLegality::mustCheckStrides is unused. llvm-svn: 272780	2016-06-15 12:26:11 +00:00
Sean Silva	7eeda20c72	Work around MSVC "friend" semantics. The error on clang-x86-win2008-selfhost is: C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(955) : error C2248: 'llvm::slpvectorizer::BoUpSLP::ScheduleData' : cannot access private struct declared in class 'llvm::slpvectorizer::BoUpSLP' C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(608) : see declaration of 'llvm::slpvectorizer::BoUpSLP::ScheduleData' C:\buildbot\slave-config\clang-x86-win2008-selfhost\llvm\lib\Transforms\Vectorize\SLPVectorizer.cpp(337) : see declaration of 'llvm::slpvectorizer::BoUpSLP' I reproduced this locally with both MSVC 2013 and MSVC 2015. llvm-svn: 272772	2016-06-15 10:51:40 +00:00
Sean Silva	ec3ed2097b	Speculative buildbot fix. This wasn't failing for me with clang as the compiler. I think GCC may disagree with clang about whether a friend declaration introduces a declaration in the enclosing namespace (or something). Example error: /home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:950:77: error: ‘llvm::raw_ostream& llvm::slpvectorizer::operator<<(llvm::raw_ostream&, const llvm::slpvectorizer::BoUpSLP::ScheduleData&)’ should have been declared inside ‘llvm::slpvectorizer’ const BoUpSLP::ScheduleData &SD) { ^ llvm-svn: 272767	2016-06-15 09:00:33 +00:00
Sean Silva	e0a9e66040	[PM] Port SLPVectorizer to the new PM This uses the "runImpl" approach to share code with the old PM. Porting to the new PM meant abandoning the anonymous namespace enclosing most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal compared to having to pull the pass class into a header which the new PM requires since it calls the constructor directly). llvm-svn: 272766	2016-06-15 08:43:40 +00:00
Michael Kuperstein	3277a05fcf	Recommit [LV] Enable vectorization of loops where the IV has an external use r272715 broke libcxx because it did not correctly handle cases where the last iteration of one IV is the second-to-last iteration of another. Original commit message: Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. llvm-svn: 272742	2016-06-15 00:35:26 +00:00
Michael Kuperstein	d4bd3ab5fe	Reverting r272715 since it broke libcxx. llvm-svn: 272730	2016-06-14 22:30:41 +00:00
Michael Kuperstein	23b6d6adc9	[LV] Enable vectorization of loops where the IV has an external use Vectorizing loops with "escaping" IVs has been disabled since r190790, due to PR17179. This re-enables it, with support for external use of both "post-increment" (last iteration) and "pre-increment" (second-to-last iteration) IVs. Differential Revision: http://reviews.llvm.org/D21048 llvm-svn: 272715	2016-06-14 21:27:27 +00:00
Easwaran Raman	e12c487b8c	[PM] Port LCSSA to the new PM. Differential Revision: http://reviews.llvm.org/D21090 llvm-svn: 272294	2016-06-09 19:44:46 +00:00
Michael Kuperstein	c5edcdeb0e	[LV] Use vector phis for some secondary induction variables Previously, we materialized secondary vector IVs from the primary scalar IV, by offseting the primary to match the correct start value, and then broadcasting it - inside the loop body. Instead, we can use a real vector IV, like we do for the primary. This enables using vector IVs for secondary integer IVs whose type matches the type of the primary. Differential Revision: http://reviews.llvm.org/D20932 llvm-svn: 272283	2016-06-09 18:03:15 +00:00
Xinliang David Li	ecde1c7f3d	Revert r272194 No need for it if loop Analysis Manager is used llvm-svn: 272243	2016-06-09 03:22:39 +00:00
Michael Zolotukhin	987ab631fa	[SLPVectorizer] Handle GEP with differing constant index types Summary: This fixes PR27617. Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64. The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert. Reviewers: nadav, mzolotukhin Subscribers: JesperAntonsson, llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D20685 Patch by Jesper Antonsson! llvm-svn: 272206	2016-06-08 21:55:16 +00:00
Xinliang David Li	572135f717	[PM] Refector LoopAccessInfo analysis code This is the preparation patch to port the analysis to new PM Differential Revision: http://reviews.llvm.org/D20560 llvm-svn: 272194	2016-06-08 20:15:37 +00:00
Michael Kuperstein	3a3c64d23e	[LV] For some IVs, use vector phis instead of widening in the loop body Previously, whenever we needed a vector IV, we would create it on the fly, by splatting the scalar IV and adding a step vector. Instead, we can create a real vector IV. This tends to save a couple of instructions per iteration. This only changes the behavior for the most basic case - integer primary IVs with a constant step. Differential Revision: http://reviews.llvm.org/D20315 llvm-svn: 271410	2016-06-01 17:16:46 +00:00
Guozhi Wei	b994f4cdbc	[SLP] Pass in correct alignment when query memory access cost This patch fixes bug https://llvm.org/bugs/show_bug.cgi?id=27897. When query memory access cost, current SLP always passes in alignment value of 1 (unaligned), so it gets a very high cost of scalar memory access, and wrongly vectorize memory loads in the test case. It can be fixed by simply giving correct alignment. llvm-svn: 271333	2016-05-31 20:41:19 +00:00
Michael Kuperstein	9a81b62a01	[BBVectorize] Don't vectorize selects with a scalar condition and vector operands. This fixes PR27879. Differential Revision: http://reviews.llvm.org/D20659 llvm-svn: 270888	2016-05-26 18:43:57 +00:00
Sanjay Patel	6be09ee827	fix typo; NFC llvm-svn: 270760	2016-05-25 21:03:31 +00:00
Sanjay Patel	929ebf5a54	fix typos; NFC llvm-svn: 270579	2016-05-24 16:51:26 +00:00
Wei Mi	0456d9dd18	Recommit r255691 since PR26509 has been fixed. llvm-svn: 270113	2016-05-19 20:38:03 +00:00
James Molloy	a854c0a0c3	[VectorUtils] Fix nasty use-after-free In truncateToMinimalBitwidths() we were RAUW'ing an instruction then erasing it. However, that intruction could be cached in the map we're iterating over. The first check is "I->use_empty()" which in most cases would return true, as the (deleted) object was RAUW'd first so would have zero use count. However in some cases the object could have been polluted or written over and this wouldn't be the case. Also it makes valgrind, asan and traditionalists who don't like their compiler to crash sad. No testcase as there are no externally visible symptoms apart from a crash if the stars align. Fixes PR26509. llvm-svn: 269908	2016-05-18 11:57:58 +00:00
Matthew Simpson	e43198dc4b	[LV] Ensure safe VF for loops with interleaved accesses The selection of the vectorization factor currently doesn't consider interleaved accesses. The vectorization factor is based on the maximum safe dependence distance computed by LAA. However, for loops with interleaved groups, we should instead base the vectorization factor on the maximum safe dependence distance divided by the maximum interleave factor of all the interleaved groups. Interleaved accesses not in a group will be scalarized. Differential Revision: http://reviews.llvm.org/D20241 llvm-svn: 269659	2016-05-16 15:08:20 +00:00
Matthew Simpson	c326d050ca	Correct spelling in comment (NFC) llvm-svn: 269482	2016-05-13 21:01:07 +00:00
Simon Pilgrim	3ac4d831ee	Tidied up switch cases. NFCI. Split FCMP//ICMP/SEL from the basic arithmetic cost functions. They were not sharing any notable code path (just the return) and were repeatedly testing the opcode. llvm-svn: 269348	2016-05-12 21:01:20 +00:00
Michael Kuperstein	82e7df5a58	[LoopVectorizer] LoopVectorBody doesn't need to be a vector. NFC. LoopVectorBody was changed from a single pointer to a SmallVector when store predication was introduced in r200270. Since r247139, store predication no longer splits the vector loop body in-place, so we can go back to having a single LoopVectorBody block. This reverts the no-longer-needed changes from r200270. llvm-svn: 269321	2016-05-12 18:44:51 +00:00
Elena Demikhovsky	c434d091c5	[LoopVectorize] Handling induction variable with non-constant step. Allow vectorization when the step is a loop-invariant variable. This is the loop example that is getting vectorized after the patch: int int_inc; int bar(int init, int restrict A, int N) { int x = init; for (int i=0;i<N;i++){ A[i] = x; x += int_inc; } return x; } "x" is an induction variable with loop-invariant* step. But it is not a primary induction. Primary induction variable with non-constant step is not handled yet. Differential Revision: http://reviews.llvm.org/D19258 llvm-svn: 269023	2016-05-10 07:33:35 +00:00
Denis Zobnin	15d1e64b2b	[LAA] Rename "isStridedPtr" with "getPtrStride". NFC. Changing misleading function name was approved in http://reviews.llvm.org/D17268. Patch by Roman Shirokiy. llvm-svn: 269021	2016-05-10 05:55:16 +00:00
Chad Rosier	b438a327d7	Remove dead include. NFC. llvm-svn: 268655	2016-05-05 17:55:51 +00:00
Silviu Baranga	28eb344140	Fix unused variable warning after r268632 llvm-svn: 268634	2016-05-05 15:27:57 +00:00
Silviu Baranga	c05bab8a9c	[LV] Identify more induction PHIs by coercing expressions to AddRecExprs Summary: Some PHIs can have expressions that are not AddRecExprs due to the presence of sext/zext instructions. In order to prevent the Loop Vectorizer from bailing out when encountering these PHIs, we now coerce the SCEV expressions to AddRecExprs using SCEV predicates (when possible). We only do this when the alternative would be to not vectorize. Reviewers: mzolotukhin, anemet Subscribers: mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17153 llvm-svn: 268633	2016-05-05 15:20:39 +00:00
Silviu Baranga	7e0d4353f2	[LV] Refactor the validation of PHI inductions. NFC This moves the validation of PHI inductions into a separate method, making it easier to reuse this logic. llvm-svn: 268632	2016-05-05 15:14:01 +00:00
Dehao Chen	d55bc4c7ab	clang-format some files in preparation of coming patch reviews. llvm-svn: 268583	2016-05-05 00:54:54 +00:00
David Majnemer	13d5526392	[SLPVectorizer] Add operand bundles to vectorized functions SLPVectorizing a call site should result in further propagation of its bundles. llvm-svn: 268004	2016-04-29 07:09:51 +00:00
David Majnemer	50ddc0e1b6	[LoopVectorize] Add operand bundles to vectorized functions Also, do not crash when calculating a cost model for loop-invariant token values. llvm-svn: 268003	2016-04-29 07:09:48 +00:00
Michael Zolotukhin	1816d03b7d	[PR25281] Remove AAResultsWrapper from preserved analyses of loop vectorizer. We don't preserve AAResults, because, for one, we don't preserve SCEV-AA. That fixes PR25281. llvm-svn: 267980	2016-04-29 03:31:25 +00:00
Hal Finkel	1b66f7e3c8	[LoopVectorize] Keep hints from original loop on the vector loop We need to keep loop hints from the original loop on the new vector loop. Failure to do this meant that, for example: void foo(int *b) { #pragma clang loop unroll(disable) for (int i = 0; i < 16; ++i) b[i] = 1; } this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the hints that unrolling should be disabled, and then we'd unroll it. llvm-svn: 267970	2016-04-29 01:27:40 +00:00
Arch D. Robison	0e61034018	[SLPVectorizer] Extend SLP Vectorizer to deal with aggregates. The refactoring portion part was done as r267748. http://reviews.llvm.org/D14185 llvm-svn: 267899	2016-04-28 16:11:45 +00:00
Matthew Simpson	622b95be7b	[LV] Reallow positive-stride interleaved load groups with gaps We previously disallowed interleaved load groups that may cause us to speculatively access memory out-of-bounds (r261331). We did this by ensuring each load group had an access corresponding to the first and last member. Instead of bailing out for these interleaved groups, this patch enables us to peel off the last vector iteration, ensuring that we execute at least one iteration of the scalar remainder loop. This solution was proposed in the review of the previous patch. Differential Revision: http://reviews.llvm.org/D19487 llvm-svn: 267751	2016-04-27 18:21:36 +00:00
Arch D. Robison	aca7c412b4	[SLPVectorizer] Refactor where MinVecRegSize and MaxVecRegSize live. This is the first of two commits for extending SLP Vectorizer to deal with aggregates. This commit merely refactors existing logic. http://reviews.llvm.org/D14185 llvm-svn: 267748	2016-04-27 17:46:25 +00:00
Matthew Simpson	e5dfb08fcb	[TTI] Add hook for vector extract with extension This change adds a new hook for estimating the cost of vector extracts followed by zero- and sign-extensions. The motivating example for this change is the SMOV and UMOV instructions on AArch64. These instructions move data from vector to general purpose registers while performing the corresponding extension (sign-extend for SMOV and zero-extend for UMOV) at the same time. For these operations, TargetTransformInfo can assume the extensions are free and only report the cost of the vector extract. The SLP vectorizer has been updated to make use of the new hook. Differential Revision: http://reviews.llvm.org/D18523 llvm-svn: 267725	2016-04-27 15:20:21 +00:00
Elena Demikhovsky	308a7eb0d2	Masked Store in Loop Vectorizer - bugfix Fixed a bug in loop vectorization with conditional store. Differential Revision: http://reviews.llvm.org/D19532 llvm-svn: 267597	2016-04-26 20:18:04 +00:00
Hal Finkel	411d31ad72	[LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops I really thought we were doing this already, but we were not. Given this input: void Test(int res, int c, int d, int p) { for (int i = 0; i < 16; i++) res[i] = (p[i] == 0) ? res[i] : res[i] + d[i]; } we did not vectorize the loop. Even with "assume_safety" the check that we don't if-convert conditionally-executed loads (to protect against data-dependent deferenceability) was not elided. One subtlety: As implemented, it will still prefer to use a masked-load instrinsic (given target support) over the speculated load. The choice here seems architecture specific; the best option depends on how expensive the masked load is compared to a regular load. Ideally, using the masked load still reduces unnecessary memory traffic, and so should be preferred. If we'd rather do it the other way, flipping the order of the checks is easy. The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also implies that if conversion is okay. Differential Revision: http://reviews.llvm.org/D19512 llvm-svn: 267514	2016-04-26 02:00:36 +00:00
Andrew Kaylor	aa641a5171	Re-commit optimization bisect support (r267022) without new pass manager support. The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling). Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267231	2016-04-22 22:06:11 +00:00
Vedant Kumar	6013f45f92	Revert "Initial implementation of optimization bisect support." This reverts commit r267022, due to an ASan failure: http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549 llvm-svn: 267115	2016-04-22 06:51:37 +00:00
Andrew Kaylor	f0f279291c	Initial implementation of optimization bisect support. This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations. The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used. The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way. Differential Revision: http://reviews.llvm.org/D19172 llvm-svn: 267022	2016-04-21 17:58:54 +00:00
David Majnemer	b4b27230bf	[ValueTracking, VectorUtils] Refactor getIntrinsicIDForCall The functionality contained within getIntrinsicIDForCall is two-fold: it checks if a CallInst's callee is a vectorizable intrinsic. If it isn't an intrinsic, it attempts to map the call's target to a suitable intrinsic. Move the mapping functionality into getIntrinsicForCallSite and rename getIntrinsicIDForCall to getVectorIntrinsicIDForCall while reimplementing it in terms of getIntrinsicForCallSite. llvm-svn: 266801	2016-04-19 19:10:21 +00:00
Michael Kuperstein	de16b44f74	Port DemandedBits to the new pass manager. Differential Revision: http://reviews.llvm.org/D18679 llvm-svn: 266699	2016-04-18 23:55:01 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Renato Golin	5cb666add7	[ARM] Adding IEEE-754 SIMD detection to loop vectorizer Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON. This patch teaches the loop vectorizer to only allow transformations of loops that either contain no floating-point operations or have enough allowance flags supporting lack of precision (ex. -ffast-math, Darwin). For that, the target description now has a method which tells us if the vectorizer is allowed to handle FP math without falling into unsafe representations, plus a check on every FP instruction in the candidate loop to check for the safety flags. This commit makes LLVM behave like GCC with respect to ARM NEON support, but it stops short of fixing the underlying problem: sub-normals. Neither GCC nor LLVM have a flag for allowing sub-normal operations. Before this patch, GCC only allows it using unsafe-math flags and LLVM allows it by default with no way to turn it off (short of not using NEON at all). As a first step, we push this change to make it safe and in sync with GCC. The second step is to discuss a new sub-normal's flag on both communitues and come up with a common solution. The third step is to improve the FastMath flags in LLVM to encode sub-normals and use those flags to restrict NEON FP. Fixes PR16275. llvm-svn: 266363	2016-04-14 20:42:18 +00:00
David Majnemer	0f26b0aeb4	[CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only one of the inputs is NaN: -NUM will return the non-NaN argument while -NAN would return NaN. It is desirable to lower to @llvm.{min,max}num to -NAN if they don't have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the -NAN semantics. N.B. Of course, it is only safe to do this if the intrinsic call is marked nnan. llvm-svn: 266279	2016-04-14 07:13:24 +00:00
Elena Demikhovsky	751ed0a06a	Loop vectorization with uniform load Vectorization cost of uniform load wasn't correctly calculated. As a result, a simple loop that loads a uniform value wasn't vectorized. Differential Revision: http://reviews.llvm.org/D18940 llvm-svn: 265901	2016-04-10 16:53:19 +00:00
David Majnemer	60c6abc3cc	[LoopVectorize] Register cloned assumptions InstCombine cannot effectively remove redundant assumptions without them registered in the assumption cache. The vectorizer can create identical assumptions but doesn't register them with the cache, resulting in slower compile times because InstCombine tries to reason about a lot more assumptions. Fix this by registering the cloned assumptions. llvm-svn: 265800	2016-04-08 16:37:10 +00:00
Silviu Baranga	6f444dfd55	Re-commit [SCEV] Introduce a guarded backedge taken count and use it in LAA and LV This re-commits r265535 which was reverted in r265541 because it broke the windows bots. The problem was that we had a PointerIntPair which took a pointer to a struct allocated with new. The problem was that new doesn't provide sufficient alignment guarantees. This pattern was already present before r265535 and it just happened to work. To fix this, we now separate the PointerToIntPair from the ExitNotTakenInfo struct into a pointer and a bool. Original commit message: Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265786	2016-04-08 14:29:09 +00:00
Silviu Baranga	a393baf1fd	Revert r265535 until we know how we can fix the bots llvm-svn: 265541	2016-04-06 14:06:32 +00:00
Silviu Baranga	72b4a4a330	[SCEV] Introduce a guarded backedge taken count and use it in LAA and LV Summary: When the backedge taken codition is computed from an icmp, SCEV can deduce the backedge taken count only if one of the sides of the icmp is an AddRecExpr. However, due to sign/zero extensions, we sometimes end up with something that is not an AddRecExpr. However, we can use SCEV predicates to produce a 'guarded' expression. This change adds a method to SCEV to get this expression, and the SCEV predicate associated with it. In HowManyGreaterThans and HowManyLessThans we will now add a SCEV predicate associated with the guarded backedge taken count when the analyzed SCEV expression is not an AddRecExpr. Note that we only do this as an alternative to returning a 'CouldNotCompute'. We use new feature in Loop Access Analysis and LoopVectorize to analyze and transform more loops. Reviewers: anemet, mzolotukhin, hfinkel, sanjoy Subscribers: flyingforyou, mcrosier, atrick, mssimpso, sanjoy, mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17201 llvm-svn: 265535	2016-04-06 13:18:26 +00:00
David Majnemer	6f1f85f0e1	[SLPVectorizer] Don't insert an extractelement before a catchswitch A catchswitch cannot be preceded by another instruction in the same basic block (other than a PHI node). Instead, insert the extract element right after the materialization of the vectorized value. This isn't optimal but is a reasonable compromise given the constraints of WinEH. This fixes PR27163. llvm-svn: 265157	2016-04-01 17:28:15 +00:00
Hal Finkel	2e0ff2b244	[LoopVectorize] Don't vectorize loops when everything will be scalarized This change prevents the loop vectorizer from vectorizing when all of the vector types it generates will be scalarized. I've run into this problem on the PPC's QPX vector ISA, which only holds floating-point vector types. The loop vectorizer will, however, happily vectorize loops with purely integer computation. Here's an example: LV: The Smallest and Widest types: 32 / 32 bits. LV: The Widest register is: 256 bits. LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 1 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 1 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 1 for VF 1 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 1 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 1 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 1 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Scalar loop costs: 3. LV: Found an estimated cost of 0 for VF 2 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 2 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 2 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 2 for VF 2 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 2 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 2 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 2 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 2 costs: 2. LV: Found an estimated cost of 0 for VF 4 For instruction: %indvars.iv25 = phi i64 [ 0, %entry ], [ %indvars.iv.next26, %for.body ] LV: Found an estimated cost of 0 for VF 4 For instruction: %arrayidx = getelementptr inbounds [1600 x i32], [1600 x i32]* %a, i64 0, i64 %indvars.iv25 LV: Found an estimated cost of 0 for VF 4 For instruction: %2 = trunc i64 %indvars.iv25 to i32 LV: Found an estimated cost of 4 for VF 4 For instruction: store i32 %2, i32* %arrayidx, align 4 LV: Found an estimated cost of 1 for VF 4 For instruction: %indvars.iv.next26 = add nuw nsw i64 %indvars.iv25, 1 LV: Found an estimated cost of 1 for VF 4 For instruction: %exitcond27 = icmp eq i64 %indvars.iv.next26, 1600 LV: Found an estimated cost of 0 for VF 4 For instruction: br i1 %exitcond27, label %for.cond.cleanup, label %for.body LV: Vector loop of width 4 costs: 1. ... LV: Selecting VF: 8. LV: The target has 32 registers LV(REG): Calculating max register usage: LV(REG): At #0 Interval # 0 LV(REG): At #1 Interval # 1 LV(REG): At #2 Interval # 2 LV(REG): At #4 Interval # 1 LV(REG): At #5 Interval # 1 LV(REG): VF = 8 The problem is that the cost model here is not wrong, exactly. Since all of these operations are scalarized, their cost (aside from the uniform ones) are indeed VF*(scalar cost), just as the model suggests. In fact, the larger the VF picked, the lower the relative overhead from the loop itself (and the induction-variable update and check), and so in a sense, picking the largest VF here is the right thing to do. The problem is that vectorizing like this, where all of the vectors will be scalarized in the backend, isn't really vectorizing, but rather interleaving. By itself, this would be okay, but then the vectorizer itself also interleaves, and that's where the problem manifests itself. There's aren't actually enough scalar registers to support the normal interleave factor multiplied by a factor of VF (8 in this example). In other words, the problem with this is that our register-pressure heuristic does not account for scalarization. While we might want to improve our register-pressure heuristic, I don't think this is the right motivating case for that work. Here we have a more-basic problem: The job of the vectorizer is to vectorize things (interleaving aside), and if the IR it generates won't generate any actual vector code, then something is wrong. Thus, if every type looks like it will be scalarized (i.e. will be split into VF or more parts), then don't consider that VF. This is not a problem specific to PPC/QPX, however. The problem comes up under SSE on x86 too, and as such, this change fixes PR26837 too. I've added Sanjay's reduced test case from PR26837 to this commit. Differential Revision: http://reviews.llvm.org/D18537 llvm-svn: 264904	2016-03-30 19:37:08 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00
Chad Rosier	2e5c526bb1	[SLP] Remove unnecessary member variables by using container APIs. This changes the debug output, but still retains its usefulness. Differential Revision: http://reviews.llvm.org/D18324 llvm-svn: 263975	2016-03-21 19:47:44 +00:00
Adam Nemet	b0c4eae073	[LoopVectorize] Annotate versioned loop with noalias metadata Summary: Use the new LoopVersioning facility (D16712) to add noalias metadata in the vector loop if we versioned with memchecks. This can enable some optimization opportunities further down the pipeline (see the included test or the benchmark improvement quoted in D16712). The test also covers the bug I had in the initial version in D16712. The vectorizer did not previously use LoopVersioning. The reason is that the vectorizer performs its transformations in single shot. It creates an empty single-block vector loop that it then populates with the widened, if-converted instructions. Thus creating an intermediate versioned scalar loop seems wasteful. So this patch (rather than bringing in LoopVersioning fully) adds a special interface to LoopVersioning to allow the vectorizer to add no-alias annotation while still performing its own versioning. As the vectorizer propagates metadata from the instructions in the original loop to the vector instructions we also check the pointer in the original instruction and see if LoopVersioning can add no-alias metadata based on the issued memchecks. Reviewers: hfinkel, nadav, mzolotukhin Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17191 llvm-svn: 263744	2016-03-17 20:32:37 +00:00
Chad Rosier	fea398188c	[SLP] Make DataLayout a member variable. llvm-svn: 263656	2016-03-16 19:48:42 +00:00
Adam Nemet	fdb20595a1	[LV] Preserve LoopInfo when store predication is used This was a latent bug that got exposed by the change to add LoopSimplify as a dependence to LoopLoadElimination. Since LoopInfo was corrupted after LV, LoopSimplify mis-compiled nbench in the test-suite (more details in the PR). The problem was that when we create the blocks for predicated stores we didn't add those to any loops. The original testcase for store predication provides coverage for this assuming we verify LI on the way out of LV. Fixes PR26952. llvm-svn: 263565	2016-03-15 18:06:20 +00:00
Chad Rosier	ebe559019b	[SLP] Update comment to reflect reality. NFC. llvm-svn: 263548	2016-03-15 13:27:58 +00:00
Keno Fischer	a91ae8336b	[SLPVectorizer] Fix dependency list Summary: DemandedBits was added to the requirements of SLPVectorizer in rL261212 (and various earlier version of it), but the appropriate initialization statement was accidentally forgotten. Ref [[ https://github.com/JuliaLang/julia/issues/14998 \| JuliaLang/julia#14998 ]]. Patch by Yichao Yu. Reviewers: mssimpso Differential Revision: http://reviews.llvm.org/D18152 llvm-svn: 263476	2016-03-14 20:04:24 +00:00
Mehdi Amini	ba9fba81d6	Remove PreserveNames template parameter from IRBuilder This reapplies r263258, which was reverted in r263321 because of issues on Clang side. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263393	2016-03-13 21:05:13 +00:00
Eric Christopher	35abd051c0	Temporarily revert: commit ae14bf6488e8441f0f6d74f00455555f6f3943ac Author: Mehdi Amini <mehdi.amini@apple.com> Date: Fri Mar 11 17:15:50 2016 +0000 Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258 91177308-0d34-0410-b5e6-96231b3b80d8 until we can figure out what to do about clang and Release build testing. This reverts commit 263258. llvm-svn: 263321	2016-03-12 01:47:22 +00:00
Mehdi Amini	99eab3dd06	Remove PreserveNames template parameter from IRBuilder Summary: Following r263086, we are now relying on a flag on the Context to discard Value names in release builds. Reviewers: chandlerc Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D18023 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 263258	2016-03-11 17:15:50 +00:00
Michael Zolotukhin	b88fbe08fc	[SLP] Add -slp-min-reg-size command line option. MinVecRegSize is currently hardcoded to 128; this patch adds a cl::opt to allow changing it. I tried not to change any existing behavior for the default case. Differential revision: http://reviews.llvm.org/D13278 llvm-svn: 263089	2016-03-10 02:49:47 +00:00
Duncan P. N. Exon Smith	e9bc579c37	ADT: Remove == and != comparisons between ilist iterators and pointers I missed == and != when I removed implicit conversions between iterators and pointers in r252380 since they were defined outside ilist_iterator. Since they depend on getNodePtrUnchecked(), they indirectly rely on UB. This commit removes all uses of these operators. (I'll delete the operators themselves in a separate commit so that it can be easily reverted if necessary.) There should be NFC here. llvm-svn: 261498	2016-02-21 20:39:50 +00:00
Hans Wennborg	a0f7090563	Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions." It caused PR26509. llvm-svn: 261368	2016-02-19 21:40:12 +00:00
Matthew Simpson	29c997c1a1	[LV] Vectorize first-order recurrences This patch enables the vectorization of first-order recurrences. A first-order recurrence is a non-reduction recurrence relation in which the value of the recurrence in the current loop iteration equals a value defined in the previous iteration. The load PRE of the GVN pass often creates these recurrences by hoisting loads from within loops. In this patch, we add a new recurrence kind for first-order phi nodes and attempt to vectorize them if possible. Vectorization is performed by shuffling the values for the current and previous iterations. The vectorization cost estimate is updated to account for the added shuffle instruction. Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org> Differential Revision: http://reviews.llvm.org/D16197 llvm-svn: 261346	2016-02-19 17:56:08 +00:00
Silviu Baranga	ad1dafb2c3	[LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization Summary: If we don't have the first and last access of an interleaved load group, the first and last wide load in the loop can do an out of bounds access. Even though we discard results from speculative loads, this can cause problems, since it can technically generate page faults (or worse). We now discard interleaved load groups that don't have the first and load in the group. Reviewers: hfinkel, rengolin Subscribers: rengolin, llvm-commits, mzolotukhin, anemet Differential Revision: http://reviews.llvm.org/D17332 llvm-svn: 261331	2016-02-19 15:46:10 +00:00
Matthew Simpson	92821cb4a8	Reapply commit r259357 with a fix for PR26629 Commit r259357 was reverted because it caused PR26629. We were assuming all roots of a vectorizable tree could be truncated to the same width, which is not the case in general. This commit reapplies the patch along with a fix and a new test case to ensure we don't regress because of this issue again. This should fix PR26629. llvm-svn: 261212	2016-02-18 14:14:40 +00:00
Elena Demikhovsky	88e76cad16	Create masked gather and scatter intrinsics in Loop Vectorizer. Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access. The feature is enabled on AVX-512 target. Differential Revision: http://reviews.llvm.org/D15690 llvm-svn: 261140	2016-02-17 19:23:04 +00:00
David Majnemer	f48bcb2bd9	Revert "Reapply commit r258404 with fix." This reverts commit r259357, it caused PR26629. llvm-svn: 261137	2016-02-17 19:02:36 +00:00
Silviu Baranga	ec7063ac77	[LV] Add support for insertelt/extractelt processing during type truncation Summary: While shrinking types according to the required bits, we can encounter insert/extract element instructions. This will cause us to reach an llvm_unreachable statement. This change adds support for truncating insert/extract element operations, and adds a regression test. Reviewers: jmolloy Subscribers: mzolotukhin, llvm-commits Differential Revision: http://reviews.llvm.org/D17078 llvm-svn: 260893	2016-02-15 15:38:17 +00:00
Matthew Simpson	a4e43c5b51	[SLP] Add debug output for extract cost (NFC) llvm-svn: 260614	2016-02-11 23:06:40 +00:00
Silviu Baranga	ea63a7f512	[SCEV][LAA] Re-commit r260085 and r260086, this time with a fix for the memory sanitizer issue. The PredicatedScalarEvolution's copy constructor wasn't copying the Generation value, and was leaving it un-initialized. Original commit message: [SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260112	2016-02-08 17:02:45 +00:00
Igor Breger	1a39a34eae	[SLP] Fix placement of debug statement (NFC) By Ayal Zaks (ayal.zaks@intel.com) Differential Revision: http://reviews.llvm.org/D16976 llvm-svn: 260094	2016-02-08 14:11:39 +00:00
Silviu Baranga	41b4973329	Revert r260086 and r260085. They have broken the memory sanitizer bots. llvm-svn: 260087	2016-02-08 11:56:15 +00:00
Silviu Baranga	a35fadc7c4	[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection Summary: This change adds no wrap SCEV predicates with: - support for runtime checking - support for expression rewriting: (sext ({x,+,y}) -> {sext(x),+,sext(y)} (zext ({x,+,y}) -> {zext(x),+,sext(y)} Note that we are sign extending the increment of the SCEV, even for the zext case. This is needed to cover the fairly common case where y would be a (small) negative integer. In order to do this, this change adds two new flags: nusw and nssw that are applicable to AddRecExprs and permit the transformations above. We also change isStridedPtr in LAA to be able to make use of these predicates. With this feature we should now always be able to work around overflow issues in the dependence analysis. Reviewers: mzolotukhin, sanjoy, anemet Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel Differential Revision: http://reviews.llvm.org/D15412 llvm-svn: 260085	2016-02-08 10:45:50 +00:00
Wei Mi	a49559befb	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. The original commit triggered regressions in Polly tests. The regressions exposed two problems which have been fixed in current version. 1. Polly will generate a new function based on the old one. To generate an instruction for the new function, it builds SCEV for the old instruction, applies some tranformation on the SCEV generated, then expands the transformed SCEV and insert the expanded value into new function. Because SCEV expansion may reuse value cached in ExprValueMap, the value in old function may be inserted into new function, which is wrong. In SCEVExpander::expand, there is a logic to check the cached value to be used should dominate the insertion point. However, for the above case, the check always passes. That is because the insertion point is in a new function, which is unreachable from the old function. However for unreachable node, DominatorTreeBase::dominates thinks it will be dominated by any other node. The fix is to simply add a check that the cached value to be used in expansion should be in the same function as the insertion point instruction. 2. When the SCEV is of scConstant type, expanding it directly is cheaper than reusing a normal value cached. Although in the cached value set in ExprValueMap, there is a Constant type value, but it is not easy to find it out -- the cached Value set is not sorted according to the potential cost. Existing reuse logic in SCEVExpander::expand simply chooses the first legal element from the cached value set. The fix is that when the SCEV is of scConstant type, don't try the reuse logic. simply expand it. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259736	2016-02-04 01:27:38 +00:00
Junmo Park	e90057a5f3	Minor code cleanups. NFC. llvm-svn: 259725	2016-02-03 23:16:39 +00:00
Wei Mi	97de385868	Revert r259662, which caused regressions on polly tests. llvm-svn: 259675	2016-02-03 18:05:57 +00:00
Wei Mi	ed133978a0	[SCEV] Try to reuse existing value during SCEV expansion Current SCEV expansion will expand SCEV as a sequence of operations and doesn't utilize the value already existed. This will introduce redundent computation which may not be cleaned up throughly by following optimizations. This patch introduces an ExprValueMap which is a map from SCEV to the set of equal values with the same SCEV. When a SCEV is expanded, the set of values is checked and reused whenever possible before generating a sequence of operations. Differential Revision: http://reviews.llvm.org/D12090 llvm-svn: 259662	2016-02-03 17:05:12 +00:00
Matthew Simpson	73dad62174	[LV] Rename RdxPHIsToFix to PHIsToFix (NFC) In the future, we will vectorize recurrences other than reductions. This patch renames a few variables and updates their associated comments to enable them to be reused for non-reduction PHI nodes. This change was requested in the review for D16197. llvm-svn: 259364	2016-02-01 16:07:01 +00:00
Matthew Simpson	c578d67407	Reapply commit r258404 with fix. The previous patch caused PR26364. The fix is to ensure that we don't enter a cycle when iterating over use-def chains. llvm-svn: 259357	2016-02-01 13:38:29 +00:00
Matthew Simpson	53d00ef874	[SLP] Fix printing of debug statement (NFC) llvm-svn: 259212	2016-01-29 17:21:38 +00:00
David Majnemer	b2416bd2a7	Revert "Reapply commit r258404 with fix" This reverts commit r258929, it caused PR26364. llvm-svn: 259148	2016-01-29 02:43:22 +00:00
Matthew Simpson	b95861d35e	Reapply commit r258404 with fix This patch is the second attempt to reapply commit r258404. There was bug in the initial patch and subsequent fix (mentioned below). The initial patch caused an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239 and PR26307. llvm-svn: 258929	2016-01-27 13:43:27 +00:00
Haicheng Wu	76873b6039	[SLPVectorizer] Swap the checking order of isCommutative and isConsecutiveAccess NFC llvm-svn: 258909	2016-01-27 04:59:05 +00:00
Chris Bieneman	e49730d4ba	Remove autoconf support Summary: This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html "I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened." - Obi Wan Kenobi Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D16471 llvm-svn: 258861	2016-01-26 21:29:08 +00:00
Matthew Simpson	61d5a18469	Revert "Reapply commit r258404 with fix" This commit exposes a crash in computeKnownBits on the Chromium buildbots. Reverting to investigate. Reference: https://llvm.org/bugs/show_bug.cgi?id=26307 llvm-svn: 258812	2016-01-26 15:45:49 +00:00
Haicheng Wu	f1c00a22be	[LIR] Add support for structs and hand unrolled loops This is a recommit of r258620 which causes PR26293. The original message: Now LIR can turn following codes into memset: typedef struct foo { int a; int b; } foo_t; void bar(foo_t f, unsigned n) { for (unsigned i = 0; i < n; ++i) { f[i].a = 0; f[i].b = 0; } } void test(foo_t f, unsigned n) { for (unsigned i = 0; i < n; i += 2) { f[i] = 0; f[i+1] = 0; } } llvm-svn: 258777	2016-01-26 02:27:47 +00:00
Matthew Simpson	cfe5e2c846	Reapply commit r25804 with fix We were hitting an assertion because we were computing smaller type sizes for instructions that cannot be demoted. The fix first determines the instructions that will be demoted, and then applies the smaller type size to only those instructions. This should fix PR26239. llvm-svn: 258705	2016-01-25 19:24:29 +00:00
Quentin Colombet	a392810bea	Speculatively revert r258620 as it is the likely culprid of PR26293. llvm-svn: 258703	2016-01-25 19:12:49 +00:00
Haicheng Wu	dd5e9d2159	[LIR] Add support for structs and hand unrolled loops Now LIR can turn following codes into memset: typedef struct foo { int a; int b; } foo_t; void bar(foo_t f, unsigned n) { for (unsigned i = 0; i < n; ++i) { f[i].a = 0; f[i].b = 0; } } void test(foo_t f, unsigned n) { for (unsigned i = 0; i < n; i += 2) { f[i] = 0; f[i+1] = 0; } } llvm-svn: 258620	2016-01-23 06:52:41 +00:00
Matthew Simpson	486bace5cc	Revert "[SLP] Truncate expressions to minimum required bit width" This reverts commit r258404. llvm-svn: 258408	2016-01-21 17:17:20 +00:00
Matthew Simpson	cb17d72170	[SLP] Truncate expressions to minimum required bit width This change attempts to produce vectorized integer expressions in bit widths that are narrower than their scalar counterparts. The need for demotion arises especially on architectures in which the small integer types (e.g., i8 and i16) are not legal for scalar operations but can still be used in vectors. Like similar work done within the loop vectorizer, we rely on InstCombine to perform the actual type-shrinking. We use the DemandedBits analysis and ComputeNumSignBits from ValueTracking to determine the minimum required bit width of an expression. Differential revision: http://reviews.llvm.org/D15815 llvm-svn: 258404	2016-01-21 16:31:55 +00:00
Matthew Simpson	57fe1b10db	Reapply r257800 with fix The fix uniques the bundle of getelementptr indices we are about to vectorize since it's possible for the same index to be used by multiple instructions. The original commit message is below. [SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. llvm-svn: 257918	2016-01-15 18:51:51 +00:00
Matthew Simpson	9258e013a2	Revert "[SLP] Vectorize the index computations of getelementptr instructions." This reverts commit r257800. llvm-svn: 257888	2016-01-15 13:10:46 +00:00
Matthew Simpson	791fd160c3	[SLP] Vectorize the index computations of getelementptr instructions. This patch seeds the SLP vectorizer with getelementptr indices. The primary motivation in doing so is to vectorize gather-like idioms beginning with consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these cases could be vectorized with a top-down phase, seeding the existing bottom-up phase with the index computations avoids the complexity, compile-time, and phase ordering issues associated with a full top-down pass. Only bundles of single-index getelementptrs with non-constant differences are considered for vectorization. Differential Revision: http://reviews.llvm.org/D14829 llvm-svn: 257800	2016-01-14 20:46:27 +00:00
Junmo Park	b98cc2a617	Remove extra whitespace. NFC. llvm-svn: 257578	2016-01-13 07:03:42 +00:00
Sanjay Patel	046c1d6355	rangify; NFCI llvm-svn: 257500	2016-01-12 18:47:59 +00:00
Sanjay Patel	a252815bc1	function names start with a lower case letter ; NFC llvm-svn: 257496	2016-01-12 18:03:37 +00:00
Matthew Simpson	bf894faa15	[LV] Avoid creating empty reduction entries (NFC) This patch prevents us from unintentionally creating entries in the reductions map for PHIs that are not actually reductions. This is currently not an issue since we bail out if we encounter PHIs other than inductions or reductions. However the behavior could become problematic as we add support for additional recurrence types. llvm-svn: 256930	2016-01-06 12:50:29 +00:00
Sanjoy Das	0de2feceb1	[SCEV] Add and use SCEVConstant::getAPInt; NFCI llvm-svn: 255921	2015-12-17 20:28:46 +00:00
Charlie Turner	5b8895b496	[SLPVectorizer] Ensure dominated reduction values. When considering incoming values as part of a reduction phi, ensure the incoming value is dominated by said phi. Failing to ensure this property causes miscompiles. Fixes PR25787. Many thanks to Mattias Eriksson for reporting, reducing and analyzing the problem for me. Differential Revision: http://reviews.llvm.org/D15580 llvm-svn: 255792	2015-12-16 18:23:44 +00:00
Cong Hou	a73ffa2206	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the third attempt to check in this patch, and the first two are r255454 and r255460. The once failed test file reg-usage.ll is now moved to test/Transform/LoopVectorize/X86 directory with target datalayout and target triple indicated.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255691	2015-12-15 22:45:09 +00:00
Cong Hou	ccec6e4d84	Revert r255460, which still causes test failures on some platforms. Further investigation on the failures is ongoing. llvm-svn: 255463	2015-12-13 17:15:38 +00:00
Cong Hou	e6a210f50b	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. (This is the second attempt to check in this patch: REQUIRES: asserts is added to reg-usage.ll now.) LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255460	2015-12-13 16:55:46 +00:00
Cong Hou	7c369156eb	Revert r255454 as it leads to several test failers on buildbots. llvm-svn: 255456	2015-12-13 09:28:57 +00:00
Cong Hou	7f8b43d424	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the register usage for specific VFs. However, it takes into account many instructions that won't be vectorized, such as induction variables, GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative when choosing VF. In this patch, the induction variables that won't be vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set so that their register usage won't be considered any more. Differential revision: http://reviews.llvm.org/D15177 llvm-svn: 255454	2015-12-13 08:44:08 +00:00
Hal Finkel	494393b740	AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAA GlobalsAA's assumptions that passes do not escape globals not previously escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline. http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html Patch by Vaivaswatha Nagaraj! llvm-svn: 255348	2015-12-11 17:46:01 +00:00
Silviu Baranga	9cd9a7e310	Re-commit r255115, with the PredicatedScalarEvolution class moved to ScalarEvolution.h, in order to avoid cyclic dependencies between the Transform and Analysis modules: [LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere. This also solves a problem involving the result of SCEV expression rewritting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates P1: {a,+,b} has nsw P2: b = 1. Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, and therefore avoiding this issue. The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewritting results. This also has the benefit of reducing the overall number of expression rewrites. Reviewers: mzolotukhin, anemet Subscribers: jmolloy, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D14296 llvm-svn: 255122	2015-12-09 16:06:28 +00:00
Silviu Baranga	ad1ccb357b	Revert r255115 until we figure out how to fix the bot failures. llvm-svn: 255117	2015-12-09 15:25:28 +00:00
Silviu Baranga	41eb682501	[LV][LAA] Add a layer over SCEV to apply run-time checked knowledge on SCEV expressions Summary: This change creates a layer over ScalarEvolution for LAA and LV, and centralizes the usage of SCEV predicates. The SCEVPredicatedLayer takes the statically deduced knowledge by ScalarEvolution and applies the knowledge from the SCEV predicates. The end goal is that both LAA and LV should use this interface everywhere. This also solves a problem involving the result of SCEV expression rewritting when the predicate changes. Suppose we have the expression (sext {a,+,b}) and two predicates P1: {a,+,b} has nsw P2: b = 1. Applying P1 and then P2 gives us {a,+,1}, while applying P2 and the P1 gives us sext({a,+,1}) (the AddRec expression was changed by P2 so P1 no longer applies). The SCEVPredicatedLayer maintains the order of transformations by feeding back the results of previous transformations into new transformations, and therefore avoiding this issue. The SCEVPredicatedLayer maintains a cache to remember the results of previous SCEV rewritting results. This also has the benefit of reducing the overall number of expression rewrites. Reviewers: mzolotukhin, anemet Subscribers: jmolloy, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D14296 llvm-svn: 255115	2015-12-09 15:03:52 +00:00
Cong Hou	a465312e9c	Fix a typo in LoopVectorize.cpp. NFC. llvm-svn: 254813	2015-12-05 01:00:22 +00:00
Cong Hou	1a6b5a9e4f	Fix a typo in LoopVectorize.cpp. NFC. llvm-svn: 254549	2015-12-02 21:33:47 +00:00
Charlie Turner	54336a5a4e	[LoopVectorize] Use MapVector rather than DenseMap for MinBWs. The order in which instructions are truncated in truncateToMinimalBitwidths effects code generation. Switch to a map with a determinisic order, since the iteration order over a DenseMap is not defined. This code is not hot, so the difference in container performance isn't interesting. Many thanks to David Blaikie for making me aware of MapVector! Fixes PR25490. Differential Revision: http://reviews.llvm.org/D14981 llvm-svn: 254179	2015-11-26 20:39:51 +00:00
Chad Rosier	33efdf810f	[LV] Add a helper function, isReductionVariable. NFC. llvm-svn: 253565	2015-11-19 14:19:06 +00:00
Cong Hou	7b2ae9abba	Fix several long lines (>80) in LoopVectorize.cpp. NFC. llvm-svn: 253527	2015-11-19 00:32:30 +00:00
Chad Rosier	6066dc69f1	Typo. llvm-svn: 253336	2015-11-17 13:58:10 +00:00
Charlie Turner	d82c9389e7	[SLP] Enable -slp-vectorize-hor by default. Measurements primarily on AArch64 have shown this feature does not significantly effect compile-time. The are no significant perf changes in LNT, but for AArch64 at least, there are wins in third party benchmarks. As discussed on llvm-dev, we're going to try turning this on by default and see how other targets react to the change. llvm-svn: 252733	2015-11-11 15:03:46 +00:00
James Molloy	45f67d52d0	[LoopVectorize] Address post-commit feedback on r250032 Implemented as many of Michael's suggestions as were possible: * clang-format the added code while it is still fresh. * tried to change Value* to Instruction* in many places in computeMinimumValueSizes - unfortunately there are several places where Constants need to be handled so this wasn't possible. * Reduce the pass list on loop-vectorization-factors.ll. * Fix a bug where we were querying MinBWs for I->getOperand(0) but using MinBWs[I]. llvm-svn: 252469	2015-11-09 14:32:05 +00:00
Mehdi Amini	b0e3192a48	Fix SLPVectorizer commutativity reordering The SLPVectorizer had a very crude way of trying to benefit from associativity: it tried to optimize for splat/broadcast or in order to have the same operator on the same side. This is benefitial to the cost model and allows more vectorization to occur. This patch improve the logic and make the detection optimal (locally, we don't look at the full tree but only at the immediate children). Should fix https://llvm.org/bugs/show_bug.cgi?id=25247 Reviewers: mzolotukhin Differential Revision: http://reviews.llvm.org/D13996 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 252337	2015-11-06 20:17:51 +00:00
Elena Demikhovsky	2b06b0fe2a	LoopVectorizer - skip 'bitcast' between GEP and load. Skipping 'bitcast' in this case allows to vectorize load: %arrayidx = getelementptr inbounds double, double* %in, i64 %indvars.iv %tmp53 = bitcast double** %arrayidx to i64* %tmp54 = load i64, i64* %tmp53, align 8 Differential Revision http://reviews.llvm.org/D14112 llvm-svn: 251907	2015-11-03 10:29:34 +00:00
Cong Hou	cf2ed26836	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. This is the second attempt to submit this patch. The first attempt got a test failure on ARM. This patch is updated to try to fix the failure (more specifically, by handling the case when VF=1). Differential revision: http://reviews.llvm.org/D8943 llvm-svn: 251850	2015-11-02 22:53:48 +00:00
Silviu Baranga	e3c0534b11	[SCEV][LV] Add SCEV Predicates and use them to re-implement stride versioning Summary: SCEV Predicates represent conditions that typically cannot be derived from static analysis, but can be used to reduce SCEV expressions to forms which are usable for different optimizers. ScalarEvolution now has the rewriteUsingPredicate method which can simplify a SCEV expression using a SCEVPredicateSet. The normal workflow of a pass using SCEVPredicates would be to hold a SCEVPredicateSet and every time assumptions need to be made a new SCEV Predicate would be created and added to the set. Each time after calling getSCEV, the user will call the rewriteUsingPredicate method. We add two types of predicates SCEVPredicateSet - implements a set of predicates SCEVEqualPredicate - tests for equality between two SCEV expressions We use the SCEVEqualPredicate to re-implement stride versioning. Every time we version a stride, we will add a SCEVEqualPredicate to the context. Instead of adding specific stride checks, LoopVectorize now adds a more generic SCEV check. We only need to add support for this in the LoopVectorizer since this is the only pass that will do stride versioning. Reviewers: mzolotukhin, anemet, hfinkel, sanjoy Subscribers: sanjoy, hfinkel, rengolin, jmolloy, llvm-commits Differential Revision: http://reviews.llvm.org/D13595 llvm-svn: 251800	2015-11-02 14:41:02 +00:00
Cong Hou	45bd8ce64c	Revert the revision 251592 as it fails a test on some platforms. llvm-svn: 251617	2015-10-29 05:35:22 +00:00
Cong Hou	abe042bb3e	Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor. To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers. llvm-svn: 251592	2015-10-29 01:28:44 +00:00
NAKAMURA Takumi	6f49ecc3c4	Whitespace. llvm-svn: 251437	2015-10-27 19:02:52 +00:00
NAKAMURA Takumi	7ef7293b40	Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP" It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp. See also PR25324. llvm-svn: 251436	2015-10-27 19:02:36 +00:00

1 2 3 4 5 ...

1105 Commits