llvm-project

Commit Graph

Author	SHA1	Message	Date
Dorit Nuzman	3ec99fe21b	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 llvm-svn: 344883	2018-10-22 06:17:09 +00:00
Ayal Zaks	b0b5312e67	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 llvm-svn: 344743	2018-10-18 15:03:15 +00:00
Anna Thomas	6f732bfb79	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613	2018-10-16 15:46:26 +00:00
Chandler Carruth	e303c87e19	[TI removal] Make `getTerminator()` return a generic `Instruction`. This removes the primary remaining API producing `TerminatorInst` which will reduce the rate at which code is introduced trying to use it and generally make it much easier to remove the remaining APIs across the codebase. Also clean up some of the stragglers that the previous mechanical update of variables missed. Users of LLVM and out-of-tree code generally will need to update any explicit variable types to handle this. Replacing `TerminatorInst` with `Instruction` (or `auto`) almost always works. Most of these edits were made in prior commits using the perl one-liner: ``` perl -i -ple 's/TerminatorInst(\b.* = .*getTerminator)/Instruction\1/g' ``` This also my break some rare use cases where people overload for both `Instruction` and `TerminatorInst`, but these should be easily fixed by removing the `TerminatorInst` overload. llvm-svn: 344504	2018-10-15 10:42:50 +00:00
Chandler Carruth	edb12a838a	[TI removal] Make variables declared as `TerminatorInst` and initialized by `getTerminator()` calls instead be declared as `Instruction`. This is the biggest remaining chunk of the usage of `getTerminator()` that insists on the narrow type and so is an easy batch of updates. Several files saw more extensive updates where this would cascade to requiring API updates within the file to use `Instruction` instead of `TerminatorInst`. All of these were trivial in nature (pervasively using `Instruction` instead just worked). llvm-svn: 344502	2018-10-15 10:04:59 +00:00
Ayal Zaks	e567b5b526	[LV] Fix comments reported when not vectorizing single iteration loops; NFC Landing this as a separate part of https://reviews.llvm.org/D50480, being a seemingly unrelated change ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344483	2018-10-14 17:53:02 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Florian Hahn	18e07bb822	[LV] Use SmallVector instead of DenseMap in calculateRegisterUsage (NFC). We assign indices sequentially for seen instructions, so we can just use a vector and push back the seen instructions. No need for using a DenseMap. Reviewers: hsaito, rengolin, nadav, dcaballe Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D53089 llvm-svn: 344233	2018-10-11 09:46:25 +00:00
Florian Hahn	7eb5cb4ebc	[LV] Ignore more debug info. We can avoid doing some unnecessary work by skipping debug instructions in a few loops. It also helps to ensure debug instructions do not prevent vectorization, although I do not have any concrete test cases for that. Reviewers: rengolin, hsaito, dcaballe, aprantl, vsk Reviewed By: rengolin, dcaballe Differential Revision: https://reviews.llvm.org/D53091 llvm-svn: 344232	2018-10-11 09:27:24 +00:00
Renato Golin	d8e7ca4a32	[VPlan] Fix CondBit quoting in dumpBasicBlock Quotes were being printed for VPInstructions but not the rest. llvm-svn: 344161	2018-10-10 17:55:21 +00:00
Max Kazantsev	b07369651e	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160 At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito llvm-svn: 343954	2018-10-08 05:46:29 +00:00
Jonas Paulsson	29d80f07ee	[LoopVectorizer] Use TTI.getOperandInfo() Call getOperandInfo() instead of using (near) duplicated code in LoopVectorizationCostModel::getInstructionCost(). This gets the OperandValueKind and OperandValueProperties values for a Value passed as operand to an arithmetic instruction. getOperandInfo() used to be a static method in TargetTransformInfo.cpp, but is now instead a public member. Review: Florian Hahn https://reviews.llvm.org/D52883 llvm-svn: 343852	2018-10-05 14:34:04 +00:00
Anna Thomas	b1e3d45318	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028	2018-09-25 20:57:20 +00:00
Warren Ristow	4f27730eaf	[Loop Vectorizer] Abandon vectorization when no integer IV found Support for vectorizing loops with secondary floating-point induction variables was added in r276554. A primary integer IV is still required for vectorization to be done. If an FP IV was found, but no integer IV was found at all (primary or secondary), the attempt to vectorize still went forward, causing a compiler-crash. This change abandons that attempt when no integer IV is found. (Vectorizing FP-only cases like this, rather than bailing out, is discussed as possible future work in D52327.) See PR38800 for more information. Differential Revision: https://reviews.llvm.org/D52327 llvm-svn: 342786	2018-09-21 23:03:50 +00:00
Matt Arsenault	c640798597	LSV: Fix adjust alloca alignment trick for AMDGPU This was checking the hardcoded address space 0 for the stack. Additionally, this should be checking for legality with the adjusted alignment, so defer the alignment check. Also try to split if the unaligned access isn't allowed. llvm-svn: 342442	2018-09-18 02:05:44 +00:00
Hideki Saito	d19851ac7e	Fix for the buildbot failure http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/23635 from the commit (r342197) of https://reviews.llvm.org/D50820. llvm-svn: 342201	2018-09-14 02:02:57 +00:00
Hideki Saito	ea7f3035a0	[VPlan] Implement initial vector code generation support for simple outer loops. Summary: [VPlan] Implement vector code generation support for simple outer loops. Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following: - force vector code generation using explicit vectorize_width - add conservative early returns in cost model and other places for VPlanNativePath - add code for setting up outer loop inductions - support for widening non-induction PHIs that can result from inner loops and uniform conditional branches - support for generating uniform inner branches We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer. Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal Reviewed By: fhahn, hsaito Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D50820 llvm-svn: 342197	2018-09-14 00:36:00 +00:00
Florian Hahn	1086ce2397	[LV] Move InterleaveGroup and InterleavedAccessInfo to VectorUtils.h (NFC) Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use them for VPlan outside LoopVectorize.cpp Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00 Reviewed By: rengolin, xbolva00 Differential Revision: https://reviews.llvm.org/D49488 llvm-svn: 342027	2018-09-12 08:01:57 +00:00
Vikram TV	09be521d4d	Move a transformation routine from LoopUtils to LoopVectorize. Summary: Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp. Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex(). This is a child to D51153. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51837 llvm-svn: 341776	2018-09-10 06:16:44 +00:00
Vikram TV	6594dc377d	Move createMinMaxOp() out of RecurrenceDescriptor. Reviewers: dmgreen, llvm-commits Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D51838 llvm-svn: 341773	2018-09-10 05:05:08 +00:00
Anna Thomas	110df11a1a	[LV] Fix code gen for conditionally executed loads and stores Fix a latent bug in loop vectorizer which generates incorrect code for memory accesses that are executed conditionally. As pointed in review, this bug definitely affects uniform loads and may affect conditional stores that should have turned into scatters as well). The code gen for conditionally executed uniform loads on architectures that support masked gather instructions is broken. Without this patch, we were unconditionally executing the conditional load in the vectorized version. This patch does the following: 1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. 2. For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch. 3. Fix the vectorization cost model for scalarization (getMemInstScalarizationCost): implement and use isPredicatedInst to identify all predicated instructions, not just scalar+predicated. So, now the cost for scalarization will be increased for maskedloads/stores and gather/scatter operations. In short, we should be choosing the gather/scatter in place of scalarization on archs where it is profitable. 4. We needed to weaken the assert in useEmulatedMaskMemRefHack. Reviewers: Ayal, hsaito, mkuper Differential Revision: https://reviews.llvm.org/D51313 llvm-svn: 341673	2018-09-07 15:53:48 +00:00
Anna Thomas	dbacea188b	[LV] First order recurrence phis should not be treated as uniform This is fix for PR38786. First order recurrence phis were incorrectly treated as uniform, which caused them to be vectorized as uniform instructions. Patch by Ayal Zaks and Orivej Desh! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D51639 llvm-svn: 341416	2018-09-04 22:12:23 +00:00
Matt Arsenault	c807ce0ee4	SLPVectorizer: Fix assert with different sized address spaces llvm-svn: 341215	2018-08-31 14:34:53 +00:00
David Bolvansky	589bb484f6	[LoopVectorize][NFCI] Use find instead of count Summary: Avoid "count" if possible -> use "find" to check for the existence of keys. Passed llvm test suite. Reviewers: fhahn, dcaballe, mkuper, rengolin Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51054 llvm-svn: 340563	2018-08-23 18:34:58 +00:00
Anna Thomas	b02b0ad8c7	[LV] Vectorize loops where non-phi instructions used outside loop Summary: Follow up change to rL339703, where we now vectorize loops with non-phi instructions used outside the loop. Note that the cyclic dependency identification occurs when identifying reduction/induction vars. We also need to identify that we do not allow users where the PSCEV information within and outside the loop are different. This was the fix added in rL307837 for PR33706. Reviewers: Ayal, mkuper, fhahn Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D50778 llvm-svn: 340278	2018-08-21 14:40:27 +00:00
Anna Thomas	6a1dd77f5d	NFC: Clarify comment in loop vectorization legality Clarifying the comment about PSCEV and external IV users by referencing the bug in question. llvm-svn: 339722	2018-08-14 20:25:13 +00:00
Anna Thomas	60a1e4dddc	[LV] Teach about non header phis that have uses outside the loop Summary: This patch teaches the loop vectorizer to vectorize loops with non header phis that have have outside uses. This is because the iteration dependence distance for these phis can be widened upto VF (similar to how we do for induction/reduction) if they do not have a cyclic dependence with header phis. When identifying reduction/induction/first order recurrence header phis, we already identify if there are any cyclic dependencies that prevents vectorization. The vectorizer is taught to extract the last element from the vectorized phi and update the scalar loop exit block phi to contain this extracted element from the vector loop. This patch can be extended to vectorize loops where instructions other than phis have outside uses. Reviewers: Ayal, mkuper, mssimpso, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50579 llvm-svn: 339703	2018-08-14 18:22:19 +00:00
Alexey Bataev	0edcd0278d	[SLP] Fix insert point for reused extract instructions. Summary: Reworked the previously committed patch to insert shuffles for reused extract element instructions in the correct position. Previous logic was incorrect, and might lead to the crash with PHIs and EH instructions. Reviewers: efriedma, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50143 llvm-svn: 339166	2018-08-07 19:21:05 +00:00
Alexey Bataev	c0c3a6ed5e	[SLP] Fix PR38339: Instruction does not dominate all uses! Summary: If the ExtractElement instructions can be optimized out during the vectorization and we need to reshuffle the parent vector, this ShuffleInstruction may be inserted in the wrong place causing compiler to produce incorrect code. Reviewers: spatel, RKSimon, mkuper, hfinkel, javed.absar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49928 llvm-svn: 338380	2018-07-31 14:02:43 +00:00
Diego Caballero	3587150fcb	[VPlan] Introduce VPLoopInfo analysis. The patch introduces loop analysis (VPLoopInfo/VPLoop) for VPBlockBases. This analysis will be necessary to perform some H-CFG transformations and detect and introduce regions representing a loop in the H-CFG. Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48816 llvm-svn: 338346	2018-07-31 01:57:29 +00:00
Diego Caballero	2a34ac86d3	[VPlan] Introduce VPlan-based dominator analysis. The patch introduces dominator analysis for VPBlockBases and extend VPlan's GraphTraits specialization with the required interfaces. Dominator analysis will be necessary to perform some H-CFG transformations and to introduce VPLoopInfo (LoopInfo analysis on top of the VPlan representation). Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48815 llvm-svn: 338310	2018-07-30 21:33:31 +00:00
Fangrui Song	f78650a8de	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} llvm-svn: 338293	2018-07-30 19:41:25 +00:00
Anastasis Grammenos	f6e143e67f	Revert "[LV][DebugInfo] Set DL to the middle block Icmp instruction" This reverts commit r338106. llvm-svn: 338109	2018-07-27 08:22:54 +00:00
Anastasis Grammenos	03948d0e0f	[LV][DebugInfo] Set DL to the middle block Icmp instruction Reviewers: hsaito Differential Revision: https://reviews.llvm.org/D49746 llvm-svn: 338106	2018-07-27 07:12:44 +00:00
Fangrui Song	984a424c8a	[LoadStoreVectorizer] Use const reference llvm-svn: 337992	2018-07-26 01:11:36 +00:00
Roman Tereshin	4f10a9d3a3	[LSV] Look through selects for consecutive addresses In some cases LSV sees (load/store _ (select _ <pointer expression> <pointer expression>)) patterns in input IR, often due to sinking and other forms of CFG simplification, sometimes interspersed with bitcasts and all-constant-indices GEPs. With this patch`areConsecutivePointers` method would attempt to handle select instructions. This leads to an increased number of successful vectorizations. Technically, select instructions could appear in index arithmetic as well, however, we don't see those in our test suites / benchmarks. Also, there is a lot more freedom in IR shapes computing integral indices in general than in what's common in pointer computations, and it appears that it's quite unreliable to do anything short of making select instructions first class citizens of Scalar Evolution, which for the purposes of this patch is most definitely an overkill. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D49428 llvm-svn: 337965	2018-07-25 21:33:00 +00:00
Hideki Saito	ef380b0fc5	[LV] Fix for PR38110, LV encountered llvm_unreachable() Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash). Reviewers: sbaranga, mssimpso, hsaito Reviewed By: hsaito Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49461 llvm-svn: 337861	2018-07-24 22:30:31 +00:00
Roman Tereshin	31d52847ef	Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size" This reapplies commit r337489 reverted by r337541 Additionally, this commit contains a speculative fix to the issue reported in r337541 (the report does not contain an actionable reproducer, just a stack trace) llvm-svn: 337606	2018-07-20 20:10:04 +00:00
Sam McCall	57743883f1	Revert "[LSV] Refactoring + supporting bitcasts to a type of different size" This reverts commit r337489. It causes asserts to fire in some TensorFlow tests, e.g. tensorflow/compiler/tests/gather_test.py on GPU. Example stack trace: Start test case: GatherTest.testHigherRank assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits" @ 0x5559446ebe10 __assert_fail @ 0x55593ef32f5e llvm::APInt::trunc() @ 0x55593d78f86e (anonymous namespace)::Vectorizer::lookThroughComplexAddresses() @ 0x55593d78f2bc (anonymous namespace)::Vectorizer::areConsecutivePointers() @ 0x55593d78d128 (anonymous namespace)::Vectorizer::isConsecutiveAccess() @ 0x55593d78c926 (anonymous namespace)::Vectorizer::vectorizeInstructions() @ 0x55593d78c221 (anonymous namespace)::Vectorizer::vectorizeChains() @ 0x55593d78b948 (anonymous namespace)::Vectorizer::run() @ 0x55593d78b725 (anonymous namespace)::LoadStoreVectorizer::runOnFunction() @ 0x55593edf4b17 llvm::FPPassManager::runOnFunction() @ 0x55593edf4e55 llvm::FPPassManager::runOnModule() @ 0x55593edf563c (anonymous namespace)::MPPassManager::runOnModule() @ 0x55593edf5137 llvm::legacy::PassManagerImpl::run() @ 0x55593edf5b71 llvm::legacy::PassManager::run() @ 0x55593ced250d xla::gpu::IrDumpingPassManager::run() @ 0x55593ced5033 xla::gpu::(anonymous namespace)::EmitModuleToPTX() @ 0x55593ced40ba xla::gpu::(anonymous namespace)::CompileModuleToPtx() @ 0x55593ced33d0 xla::gpu::CompileToPtx() @ 0x55593b26b2a2 xla::gpu::NVPTXCompiler::RunBackend() @ 0x55593b21f973 xla::Service::BuildExecutable() @ 0x555938f44e64 xla::LocalService::CompileExecutable() @ 0x555938f30a85 xla::LocalClient::Compile() @ 0x555938de3c29 tensorflow::XlaCompilationCache::BuildExecutable() @ 0x555938de4e9e tensorflow::XlaCompilationCache::CompileImpl() @ 0x555938de3da5 tensorflow::XlaCompilationCache::Compile() @ 0x555938c5d962 tensorflow::XlaLocalLaunchBase::Compute() @ 0x555938c68151 tensorflow::XlaDevice::Compute() @ 0x55593f389e1f tensorflow::(anonymous namespace)::ExecutorState::Process() @ 0x55593f38a625 tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()() * SIGABRT received by PID 7798 (TID 7837) from PID 7798; * llvm-svn: 337541	2018-07-20 12:03:00 +00:00
Roman Tereshin	b49b2a601f	[LSV] Refactoring + supporting bitcasts to a type of different size This is mostly a preparation work for adding a limited support for select instructions. It proved to be difficult to do due to size and irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I believe. It also turned out that these changes make it simpler to finish one of the TODOs and fix a number of other small issues, namely: 1. Looking through bitcasts to a type of a different size (requires careful tracking of the original load/store size and some math converting sizes in bytes to expected differences in indices of GEPs). 2. Reusing partial analysis of pointers done by first attempt in proving them consecutive instead of starting from scratch. This added limited support for nested GEPs co-existing with difficult sext/zext instructions. This also required a careful handling of negative differences between constant parts of offsets. 3. Handing a case where the first pointer index is not an add, but something else (a function parameter for instance). I observe an increased number of successful vectorizations on a large set of shader programs. Only few shaders are affected, but those that are affected sport >5% less loads and stores than before the patch. Reviewed By: rampitec Differential-Revision: https://reviews.llvm.org/D49342 llvm-svn: 337489	2018-07-19 19:42:43 +00:00
Farhana Aleen	8c7a30baea	[LoadStoreVectorizer] Use getMinusScev() to compute the distance between two pointers. Summary: Currently, isConsecutiveAccess() detects two pointers(PtrA and PtrB) as consecutive by comparing PtrB with BaseDelta+PtrA. This works when both pointers are factorized or both of them are not factorized. But isConsecutiveAccess() fails if one of the pointers is factorized but the other one is not. Here is an example: PtrA = 4 * (A + B) PtrB = 4 + 4A + 4B This patch uses getMinusSCEV() to compute the distance between two pointers. getMinusSCEV() allows combining the expressions and computing the simplified distance. Author: FarhanaAleen Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D49516 llvm-svn: 337471	2018-07-19 16:50:27 +00:00
Simon Pilgrim	2b37ddce4b	[SLPVectorizer] Avoid duplicate scalar cost calculations in BoUpSLP::getEntryCost. NFCI. Pulled out from D49225, we have a lot of repeated scalar cost calculations, often with arguments that don't look the same but turn out to be. llvm-svn: 337390	2018-07-18 13:53:55 +00:00
Simon Pilgrim	1a4f3c93fb	[SLPVectorizer] Don't attempt horizontal reduction on pointer types (PR38191) TTI::getMinMaxReductionCost typically can't handle pointer types - until this is changed its better to limit horizontal reduction to integer/float vector types only. llvm-svn: 337280	2018-07-17 13:43:33 +00:00
Simon Pilgrim	36b944e778	[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED-2) We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Reapplied with fix to only accept 2 different casts if they come from the same source type (PR38154). Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336989	2018-07-13 11:09:52 +00:00
Martin Storsjo	86b95489fd	Revert "[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED)" This reverts commit r336812, which broke compilation of a number of projects, see PR38154. llvm-svn: 336949	2018-07-12 21:33:42 +00:00
Simon Pilgrim	876e99bf2c	[SLPVectorizer] Add initial alternate opcode support for cast instructions. (REAPPLIED) We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Reapplied with fix to only accept 2 different casts if they come from the same source type. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336812	2018-07-11 15:05:10 +00:00
Simon Pilgrim	1edde95abd	Revert rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions. Reverting due to buildbot failures llvm-svn: 336806	2018-07-11 14:08:16 +00:00
Simon Pilgrim	2f963a7e83	[SLPVectorizer] Add initial alternate opcode support for cast instructions. We currently only support binary instructions in the alternate opcode shuffles. This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism: 1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly. 2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this. 3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc. 4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements. Differential Revision: https://reviews.llvm.org/D49135 llvm-svn: 336804	2018-07-11 13:34:09 +00:00
Anastasis Grammenos	612bf7cac5	[DebugInfo][LoopVectorize] Preserve DL in induction PHI and Add Differential Revision: https://reviews.llvm.org/D48968 llvm-svn: 336667	2018-07-10 13:29:50 +00:00
Diego Caballero	d09530144a	[VPlan][LV] Introduce condition bit in VPBlockBase This patch introduces a VPValue in VPBlockBase to represent the condition bit that is used as successor selector when a block has multiple successors. This information wasn't necessary until now, when we are about to introduce outer loop vectorization support in VPlan code gen. Reviewers: fhahn, rengolin, mkuper, hfinkel, mssimpso Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D48814 llvm-svn: 336554	2018-07-09 15:57:09 +00:00
Simon Pilgrim	dafd828c97	[SLPVectorizer] Begin abstracting InstructionsState alternate matching away from opcodes. NFCI. This is an early step towards matching Instructions by attributes other than the opcode. This will be necessary for cast/call alternates which share the same opcode but have different types/intrinsicIDs etc. - which we could vectorize as long as we split them using the alternate mechanism. Differential Revision: https://reviews.llvm.org/D48945 llvm-svn: 336344	2018-07-05 12:30:44 +00:00
Simon Pilgrim	ae1c4dcc6e	Fix some irregular whitespace/indentation. NFCI. llvm-svn: 336291	2018-07-04 17:24:05 +00:00
Anastasis Grammenos	204726b345	[DebugInfo][LoopVectorize] Preserve DL in generated phi instruction When creating `phi` instructions to resume at the scalar part of the loop, copy the DebugLoc from the original phi over to the new one. Differential Revision: https://reviews.llvm.org/D48769 llvm-svn: 336256	2018-07-04 10:16:55 +00:00
Farhana Aleen	3b416db19b	[SLP] Recognize min/max pattern using instructions producing same values. Summary: It is common to have the following min/max pattern during the intermediate stages of SLP since we only optimize at the end. This patch tries to catch such patterns and allow more vectorization. %1 = extractelement <2 x i32> %a, i32 0 %2 = extractelement <2 x i32> %a, i32 1 %cond = icmp sgt i32 %1, %2 %3 = extractelement <2 x i32> %a, i32 0 %4 = extractelement <2 x i32> %a, i32 1 %select = select i1 %cond, i32 %3, i32 %4 Author: FarhanaAleen Reviewed By: ABataev, RKSimon, spatel Differential Revision: https://reviews.llvm.org/D47608 llvm-svn: 336130	2018-07-02 17:55:31 +00:00
Simon Pilgrim	d5fb50e3bf	[SLPVectorizer] Remove nullptr early-outs from Instruction::ShuffleVector getEntryCost This code is only used by alternate opcodes so the InstructionsState has already confirmed that every Value is an Instruction, plus we use cast<Instruction> which will assert on failure. llvm-svn: 336102	2018-07-02 13:41:29 +00:00
Simon Pilgrim	265793d52a	[SLPVectorizer] Fix alternate opcode + shuffle cost function to correct handle SK_Select patterns. We were always using the opcodes of the first 2 scalars for the costs of the alternate opcode + shuffle. This made sense when we used SK_Alternate and opcodes were guaranteed to be alternating, but this fails for the more general SK_Select case. This fix exposes an issue demonstrated by the fmul_fdiv_v4f32_const test - the SLM model has v4f32 fdiv costs which are more than twice those of the f32 scalar cost, meaning that the cost model determines that the vectorization is not performant. Unfortunately it completely ignores the fact that the fdiv by a constant will be changed into a fmul by InstCombine for a much lower cost vectorization. But at least we're seeing this now... llvm-svn: 336095	2018-07-02 11:28:01 +00:00
Simon Pilgrim	409bd5f487	[SLPVectorizer] Only Alternate opcodes use ShuffleVector cases for getEntryCost/vectorizeTree. NFCI. Add assertions - we're already assuming this in how we use the AltOpcode and treat everything as BinaryOperators. llvm-svn: 336092	2018-07-02 10:54:19 +00:00
Simon Pilgrim	3dafb553d9	[SLPVectorizer] Call InstructionsState.isOpcodeOrAlt with Instruction instead of an opcode. NFCI. llvm-svn: 336069	2018-07-01 20:22:46 +00:00
Simon Pilgrim	ef9c97c343	[SLPVectorizer] Replace sameOpcodeOrAlt with InstructionsState.isOpcodeOrAlt helper. NFCI. This is a basic step towards matching more general instructions types than just opcodes. llvm-svn: 336068	2018-07-01 20:07:30 +00:00
Simon Pilgrim	77d2067677	[SLPVectorizer] Use InstructionsState Op/Alt opcodes directly. NFCI. llvm-svn: 336063	2018-07-01 13:41:58 +00:00
Simon Pilgrim	bbfc18b5b5	[SLPVectorizer] Recognise non uniform power of 2 constants Since D46637 we are better at handling uniform/non-uniform constant Pow2 detection; this patch tweaks the SLP argument handling to support them. As SLP works with arrays of values I don't think we can easily use the pattern match helpers here. Differential Revision: https://reviews.llvm.org/D48214 llvm-svn: 335621	2018-06-26 16:20:16 +00:00
Simon Pilgrim	9d3ef8ee2b	[SLPVectorizer] Support alternate opcodes in tryToVectorizeList Enable tryToVectorizeList to support InstructionsState alternate opcode patterns at a root (build vector etc.) as well as further down the vectorization tree. NOTE: This patch reduces some of the debug reporting if there are opcode mismatches - I can try to add it back if it proves a problem. But it could get rather messy trying to provide equivalent verbose debug strings via getSameOpcode etc. Differential Revision: https://reviews.llvm.org/D48488 llvm-svn: 335364	2018-06-22 16:37:34 +00:00
Simon Pilgrim	213cb1b82d	[SLPVectorizer] reorderAltShuffleOperands should just take InstructionsState. NFCI. All calls were extracting the InstructionsState Opcode/AltOpcode values so we might as well pass it directly llvm-svn: 335359	2018-06-22 16:10:26 +00:00
Simon Pilgrim	1e564504bb	[SLPVectorizer] Relax alternate opcodes to accept any BinaryOperator pair SLP currently only accepts (F)Add/(F)Sub alternate counterpart ops to be merged into an alternate shuffle. This patch relaxes this to accept any pair of BinaryOperator opcodes instead, assuming the target's cost model accepts the vectorization+shuffle. Differential Revision: https://reviews.llvm.org/D48477 llvm-svn: 335349	2018-06-22 14:04:06 +00:00
Simon Pilgrim	3d1c8c97b8	[SLPVectorizer] Provide InstructionsState down the BoUpSLP vectorization call tree As described in D48359, this patch pushes InstructionsState down the BoUpSLP call hierarchy instead of the corresponding raw OpValue. This makes it easier to track the alternate opcode etc. and avoids us having to call getAltOpcode which makes it difficult to support more than one alternate opcode. Differential Revision: https://reviews.llvm.org/D48382 llvm-svn: 335170	2018-06-20 20:54:52 +00:00
Simon Pilgrim	292651a5b7	[SLPVectorizer] Move isOneOf after InstructionsState type. NFCI. A future patch will have isOneOf use InstructionsState. llvm-svn: 335142	2018-06-20 16:11:00 +00:00
Simon Pilgrim	0c9f8dcde7	[SLPVectorizer] Use InstructionsState to record AltOpcode This is part of a move towards generalizing the alternate opcode mechanism and not just supporting (F)Add/(F)Sub counterparts. The patch embeds the AltOpcode in the InstructionsState instead of calling getAltOpcode so often. I'm hoping to eventually remove all uses of getAltOpcode and handle alternate opcode selection entirely within getSameOpcode, that will require us to use InstructionsState throughout the BoUpSLP call hierarchy (similar to some of the changes in D28907), which I will begin in future patches. Differential Revision: https://reviews.llvm.org/D48359 llvm-svn: 335134	2018-06-20 15:13:40 +00:00
Simon Pilgrim	2e2f20a949	[SLPVectorizer] Relax "alternate" opcode vectorisation to work with any SK_Select shuffle pattern D47985 saw the old SK_Alternate 'alternating' shuffle mask replaced with the SK_Select mask which accepts either input operand for each lane, equivalent to a vector select with a constant condition operand. This patch updates SLPVectorizer to make full use of this SK_Select shuffle pattern by removing the 'isOdd()' limitation. The AArch64 regression will be fixed by D48172. Differential Revision: https://reviews.llvm.org/D48174 llvm-svn: 335130	2018-06-20 14:26:28 +00:00
Simon Pilgrim	b7ac037797	[SLPVectorizer] Split Tree/Reduction cost calls to simplify debugging. NFCI. llvm-svn: 335110	2018-06-20 09:39:01 +00:00
Simon Pilgrim	0461393660	[SLPVectorizer] Remove default OperandValueKind arguments from getArithmeticInstrCost calls (NFC) The getArithmeticInstrCost calls for shuffle vectors entry costs specify TargetTransformInfo::OperandValueKind arguments, but are just using the method's default values. This seems to be a copy + paste issue and doesn't affect the costs in anyway. The TargetTransformInfo::OperandValueProperties default arguments are already not being used. Noticed while working on D47985. Differential Revision: https://reviews.llvm.org/D48008 llvm-svn: 335045	2018-06-19 13:40:00 +00:00
Simon Pilgrim	c966f7213e	[SLPVectorizer] Pull out AltOpcode determination from reorderAltShuffleOperands. Minor step towards making the alternate opcode system work with a wider range of opcode pairs. llvm-svn: 335032	2018-06-19 09:16:06 +00:00
Florian Hahn	3385caaafd	[VPlan] Add VPInstruction to VPRecipe transformation. This patch introduces a VPInstructionToVPRecipe transformation, which allows us to generate code for a VPInstruction based VPlan re-using the existing infrastructure. Reviewers: dcaballe, hsaito, mssimpso, hfinkel, rengolin, mkuper, javed.absar, sguggill Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D46827 llvm-svn: 334969	2018-06-18 18:28:49 +00:00
Simon Pilgrim	5b962b2fc3	[SLPVectorizer] Tidyup isShuffle helper Ensure we keep track of the input vectors in all cases instead of just for SK_Select. Ideally we'd reuse the shuffle mask pattern matching in TargetTransformInfo::getInstructionThroughput here to easily add support for all TargetTransformInfo::ShuffleKind without mass code duplication, I've added a TODO for now but D48236 should help us here. Differential Revision: https://reviews.llvm.org/D48023 llvm-svn: 334958	2018-06-18 16:25:01 +00:00
Florian Hahn	63cbcf98a5	[VPlanRecipeBase] Add eraseFromParent(). Reviewers: dcaballe, hsaito, mkuper, hfinkel Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D48081 llvm-svn: 334951	2018-06-18 15:18:48 +00:00
Florian Hahn	3bcff3662c	[VPlan] Fix sanitizer problem with insertBefore. llvm-svn: 334943	2018-06-18 13:51:28 +00:00
Simon Pilgrim	99a5832016	[SLPVectorizer] Avoid calling const VL.size() repeatedly in for-loop. NFCI. llvm-svn: 334934	2018-06-18 11:35:36 +00:00
Florian Hahn	7591e4e94a	[VPlanRecipeBase] Add insertBefore helper. Reviewers: dcaballe, mkuper, hfinkel, hsaito, mssimpso Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D48080 llvm-svn: 334933	2018-06-18 11:34:17 +00:00
Diego Caballero	68795245cf	[LV] Prevent LV to run cost model twice for VF=2 This is a minor fix for LV cost model, where the cost for VF=2 was computed twice when the vectorization of the loop was forced without specifying a VF. Reviewers: xusx595, hsaito, fhahn, mkuper Reviewed By: hsaito, xusx595 Differential Revision: https://reviews.llvm.org/D48048 llvm-svn: 334840	2018-06-15 16:21:35 +00:00
Simon Pilgrim	b234ff136e	[SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into getSameOpcode This is part of the work to cleanup use of 'alternate' ops so we can use the more general SK_Select shuffle type. Only getSameOpcode calls getMainOpcode and much of the logic is repeated in both functions. This will require some reworking of D28907 but that patch has hit trouble and is unlikely to be completed anytime soon. Differential Revision: https://reviews.llvm.org/D48120 llvm-svn: 334701	2018-06-14 10:25:19 +00:00
Simon Pilgrim	2c9d2adff5	[SLPVectorizer] getSameOpcode - remove useless cast [NFC] There's no need to cast the base Value to an Instruction llvm-svn: 334588	2018-06-13 10:49:24 +00:00
Simon Pilgrim	1224260f83	[SLPVectorizer] getSameOpcode - remove unusued alternate code [NFC] We early-out for the case where we don't use alternate opcodes, so no need to check for it later. llvm-svn: 334587	2018-06-13 10:14:27 +00:00
Simon Pilgrim	e39fa6cbbb	[CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744) As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources: e.g. v4f32: <0,5,2,7> or <4,1,6,3> This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline: e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc. This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns. Differential Revision: https://reviews.llvm.org/D47985 llvm-svn: 334513	2018-06-12 16:12:29 +00:00
Florian Hahn	a1cc848399	Use SmallPtrSet explicitly for SmallSets with pointer types (NFC). Currently SmallSet<PointerTy> inherits from SmallPtrSet<PointerTy>. This patch replaces such types with SmallPtrSet, because IMO it is slightly clearer and allows us to get rid of unnecessarily including SmallSet.h Reviewers: dblaikie, craig.topper Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D47836 llvm-svn: 334492	2018-06-12 11:16:56 +00:00
Craig Topper	61998289f9	Use SmallPtrSet instead of SmallSet in places where we iterate over the set. SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work. These places were found by hiding the begin/end methods in the SmallSet forwarding llvm-svn: 334343	2018-06-09 05:04:20 +00:00
Florian Hahn	45e5d5b4be	[VPlan] Move recipe construction to VPRecipeBuilder. This patch moves the recipe-creation functions out of LoopVectorizationPlanner, which should do the high-level orchestration of the transformations. Reviewers: dcaballe, rengolin, hsaito, Ayal Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D47595 llvm-svn: 334305	2018-06-08 17:30:45 +00:00
Florian Hahn	b3c6f07dde	[VPlan] Move recipe based VPlan generation to separate function. This first step separates VPInstruction-based and VPRecipe-based VPlan creation, which should make it easier to migrate to VPInstruction based code-gen step by step. Reviewers: Ayal, rengolin, dcaballe, hsaito, mkuper, mzolotukhin Reviewed By: dcaballe Subscribers: bollu, tschuett, rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D47477 llvm-svn: 334284	2018-06-08 12:53:51 +00:00
Roman Shirokiy	9ba0aa2da0	[LV] Fix PR36983. For a given recurrence, fix all phis in exit block There could be more than one PHIs in exit block using same loop recurrence. Don't assume there is only one and fix each user. Differential Revision: https://reviews.llvm.org/D47788 llvm-svn: 334271	2018-06-08 08:21:20 +00:00
David Blaikie	31b98d2e99	Move Analysis/Utils/Local.h back to Transforms Review feedback from r328165. Split out just the one function from the file that's used by Analysis. (As chandlerc pointed out, the original change only moved the header and not the implementation anyway - which was fine for the one function that was used (since it's a template/inlined in the header) but not in general) llvm-svn: 333954	2018-06-04 21:23:21 +00:00
Diego Caballero	b94b21d441	[VPlan] Replace LLVM_ATTRIBUTE_USED with ifndef NDEBUG Minor replacement. LLVM_ATTRIBUTE_USED was introduced to silence a warning but using #ifndef NDEBUG makes more sense in this case. Reviewers: dblaikie, fhahn, hsaito Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D47498 llvm-svn: 333476	2018-05-29 23:10:44 +00:00
Fangrui Song	afa95ee03d	[LLVM-C] [OCaml] Remove LLVMAddBBVectorizePass Summary: It was fully replaced back in 2014, and the implementation was removed 11 months ago by r306797. Reviewers: hfinkel, chandlerc, whitequark, deadalnix Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D47436 llvm-svn: 333378	2018-05-28 16:58:10 +00:00
Andrei Elovikov	d34b765cb2	[NFC][VPlan] Wrap PlainCFGBuilder with an anonymous namespace. Summary: It's internal to the VPlanHCFGBuilder and should not be visible outside of its translation unit. Reviewers: dcaballe, fhahn Reviewed By: fhahn Subscribers: rengolin, bollu, tschuett, llvm-commits, rkruppe Differential Revision: https://reviews.llvm.org/D47312 llvm-svn: 333187	2018-05-24 14:31:00 +00:00
Nicola Zaghen	03d0b91f43	Remove DEBUG macro. Now that the LLVM_DEBUG() macro landed on the various sub-projects the DEBUG macro can be removed. Also change the new uses of DEBUG to LLVM_DEBUG. Differential Revision: https://reviews.llvm.org/D46952 llvm-svn: 333091	2018-05-23 15:09:29 +00:00
Diego Caballero	1bd5f2261d	Fix warning from r332654 with LLVM_ATTRIBUTE_USED r332654 tried to fix an unused function warning with a void cast. This approach worked for clang and gcc but not for MSVC. This commit replaces the void cast with the LLVM_ATTRIBUTE_USED approach. llvm-svn: 332910	2018-05-21 22:12:38 +00:00
Diego Caballero	168d04d544	[VPlan] Reland r332654 and silence unused func warning r332654 was reverted due to an unused function warning in release build. This commit includes the same code with the warning silenced. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332860	2018-05-21 18:14:23 +00:00
Galina Kistanova	083ea389d6	Reverted r332654 as it has broken some buildbots and left unfixed for a long time. The introduced problem is: llvm.src/lib/Transforms/Vectorize/VPlanVerifier.cpp:29:13: error: unused function 'hasDuplicates' [-Werror,-Wunused-function] static bool hasDuplicates(const SmallVectorImpl<VPBlockBase *> &VPBlockVec) { ^ llvm-svn: 332747	2018-05-18 18:14:06 +00:00
Diego Caballero	f58ad3129c	[LV][VPlan] Build plain CFG with simple VPInstructions for outer loops. Patch #3 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). Expected to be NFC for the current inner loop vectorization path. It introduces the basic algorithm to build the VPlan plain CFG (single-level CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization path using VPInstructions. It includes: - VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now). - VPlanVerifier: Main class with utilities to check the consistency of a H-CFG. - VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan. Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl. Differential Revision: https://reviews.llvm.org/D44338 llvm-svn: 332654	2018-05-17 19:24:47 +00:00
Nicola Zaghen	d34e60ca85	Rename DEBUG macro to LLVM_DEBUG. The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240	2018-05-14 12:53:11 +00:00
Vedant Kumar	e0b5f86b30	[STLExtras] Add distance() for ranges, pred_size(), and succ_size() This commit adds a wrapper for std::distance() which works with ranges. As it would be a common case to write `distance(predecessors(BB))`, this also introduces `pred_size()` and `succ_size()` helpers to make that easier to write. Differential Revision: https://reviews.llvm.org/D46668 llvm-svn: 332057	2018-05-10 23:01:54 +00:00
Krzysztof Parzyszek	ea4c1bb772	[LV] Change MaxVectorSize bound to 256 in assertion, NFC otherwise It's possible to have a vector of 256 bytes in HVX code on Hexagon (vector pair in 128-byte mode). llvm-svn: 331885	2018-05-09 15:18:12 +00:00
Hideki Saito	d722d61402	[LV] Fix for PR37248, Broadcast codegen incorrectly assumed vector loop body is single basic block Summary: Broadcast code generation emitted instructions in pre-header, while the instruction they are dependent on in the vector loop body. This resulted in an IL verification error ---- value used before defined. Reviewers: rengolin, fhahn, hfinkel Reviewed By: rengolin, fhahn Subscribers: dcaballe, Ka-Ka, llvm-commits Differential Revision: https://reviews.llvm.org/D46302 llvm-svn: 331799	2018-05-08 18:57:34 +00:00
Craig Topper	781aa181ab	Fix a bunch of places where operator-> was used directly on the return from dyn_cast. Inspired by r331508, I did a grep and found these. Mostly just change from dyn_cast to cast. Some cases also showed a dyn_cast result being converted to bool, so those I changed to isa. llvm-svn: 331577	2018-05-05 01:57:00 +00:00
Adrian Prantl	5f8f34e459	Remove \brief commands from doxygen comments. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272	2018-05-01 15:54:18 +00:00
Daniel Neilson	9e4bbe801a	[LV] Preserve inbounds on created GEPs Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 llvm-svn: 331269	2018-05-01 15:35:08 +00:00
Davide Italiano	bd3bf1660b	[SLPVectorizer] Debug info shouldn't impact spill cost computation. <rdar://problem/39794738> (Also, PR32761). Differential Revision: https://reviews.llvm.org/D46199 llvm-svn: 331199	2018-04-30 16:57:33 +00:00
Florian Hahn	deb01ea126	[LV] Use BB::instructionsWithoutDebug to skip DbgInfo (NFC). This patch updates some code responsible the skip debug info to use BasicBlock::instructionsWithoutDebug. I think this makes things slightly simpler and more direct. Reviewers: mkuper, rengolin, dcaballe, aprantl, vsk Reviewed By: rengolin Differential Revision: https://reviews.llvm.org/D46254 llvm-svn: 331174	2018-04-30 13:28:08 +00:00
Hideki Saito	f2ec16ccc2	[NFC][LV][LoopUtil] Move LoopVectorizationLegality to its own file Summary: This is a follow up to D45420 (included here since it is still under review and this change is dependent on that) and D45072 (committed). Actual change for this patch is LoopVectorize* and cmakefile. All others are all from D45420. LoopVectorizationLegality is an analysis and thus really belongs to Analysis tree. It is modular enough and it is reusable enough ---- we can further improve those aspects once uses outside of LV picks up. Hopefully, this will make it easier for people familiar with vectorization theory, but not necessarily LV itself to contribute, by lowering the volume of code they should deal with. We probably should start adding some code in LV to check its own capability (i.e., vectorization is legal but LV is not ready to handle it) and then bail out. Reviewers: rengolin, fhahn, hfinkel, mkuper, aemerson, mssimpso, dcaballe, sguggill Reviewed By: rengolin, dcaballe Subscribers: egarcia, rogfer01, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D45552 llvm-svn: 331139	2018-04-29 07:26:18 +00:00
Daniel Neilson	a19ee7d7b6	[LV] Common duplicate vector load/store address calculation (NFC) Summary: Commoning some obviously copy/paste code in InnerLoopVectorizer::vectorizeMemoryInstruction llvm-svn: 331076	2018-04-27 20:29:18 +00:00
Diego Caballero	60f2776b2f	[LV][VPlan] Detect outer loops for explicit vectorization. Patch #2 from VPlan Outer Loop Vectorization Patch Series #1 (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html). This patch introduces the basic infrastructure to detect, legality check and process outer loops annotated with hints for explicit vectorization. All these changes are protected under the feature flag -enable-vplan-native-path. This should make this patch NFC for the existing inner loop vectorizer. Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso. Differential Revision: https://reviews.llvm.org/D42447 llvm-svn: 330739	2018-04-24 17:04:17 +00:00
Benjamin Kramer	f85f5da3b2	[LoadStoreVectorize] Ignore interleaved invariant loads. The memory location an invariant load is using can never be clobbered by any store, so it's safe to move the load ahead of the store. Differential Revision: https://reviews.llvm.org/D46011 llvm-svn: 330725	2018-04-24 15:28:47 +00:00
Stanislav Mekhanoshin	0bee630814	LoadStoreVectorizer crashes due to unsized type When we skip bitcasts while looking for GEP in LoadSoreVectorizer we should also verify that the type is sized otherwise we assert Differential Revision: https://reviews.llvm.org/D45709 llvm-svn: 330221	2018-04-17 21:40:04 +00:00
Haicheng Wu	f7466f3164	[SLP] Use getExtractWithExtendCost() to compute the scalar cost of extractelement/ext pair We use getExtractWithExtendCost to calculate the cost of extractelement and s\|zext together when computing the extract cost after vectorization, but we calculate the cost of extractelement and s\|zext separately when computing the scalar cost which is larger than it should be. Differential Revision: https://reviews.llvm.org/D45469 llvm-svn: 330143	2018-04-16 18:09:49 +00:00
Krzysztof Parzyszek	dfed941eec	[LV] Introduce TTI::getMinimumVF The function getMinimumVF(ElemWidth) will return the minimum VF for a vector with elements of size ElemWidth bits. This value will only apply to targets for which TTI::shouldMaximizeVectorBandwidth returns true. The value of 0 indicates that there is no minimum VF. Differential Revision: https://reviews.llvm.org/D45271 llvm-svn: 330062	2018-04-13 20:16:32 +00:00
Hideki Saito	d829973794	Fix for the buildbot failure. Now-unused private field TTI deleted. llvm-svn: 329649	2018-04-10 00:38:36 +00:00
Hideki Saito	dfa932b049	[NFC][LV] Move InterleaveInfo from Legal to CostModel Summary: Another clean up, following D43208. Interleaved memory access analysis/optimization has nothing to do with vectorization legality. It doesn't really belong there. On the other hand, cost model certainly has to know about it. In principle, vectorization should proceed like Legality ==> Optimization ==> CostModel ==> CodeGen, and this change just does that, by moving the interleaved access analysis/decision out of Legal, and run it just before CostModel object is created. After this, I can move LoopVectorizationLegality and Hints/Requirements classes into it's own header file, making it shareable within Transform tree. I have the patch already but I don't want to mix with this change. Eventual goal is to move to Analysis tree, but I first need to move RecurrenceDescriptor/InductionDescriptor from Transform/Util/LoopUtil.* to Analysis. Reviewers: rengolin, hfinkel, mkuper, dcaballe, sguggill, fhahn, aemerson Reviewed By: rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45072 llvm-svn: 329645	2018-04-09 23:45:40 +00:00
Alexey Bataev	d5b1f7892f	[SLP] Fixed formatting, NFC. llvm-svn: 329091	2018-04-03 17:48:14 +00:00
Alexey Bataev	428e9d9d87	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Patch does not support reordering of the repeated instruction, this must be handled in the separate patch. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 329085	2018-04-03 17:14:47 +00:00
Alexey Bataev	df989c54cf	Recommit "[SLP] Fix issues with debug output in the SLP vectorizer." The primary issue here is that using NDEBUG alone isn't enough to guard debug printing -- instead the DEBUG() macro needs to be used so that the specific pass debug logging check is employed. Without this, every asserts-enabled build was printing out information when it hit this. I also fixed another place where we had multiple statements in a DEBUG macro to use {}s to be a bit cleaner. And I fixed a place that used errs() rather than dbgs(). llvm-svn: 329082	2018-04-03 16:40:33 +00:00
Benjamin Kramer	2fc3b18922	Revert "[SLP] Fix PR36481: vectorize reassociated instructions." This reverts commit r328980 and r329046. Makes the vectorizer crash. llvm-svn: 329071	2018-04-03 14:40:33 +00:00
Chandler Carruth	597bfd8448	[SLP] Fix issues with debug output in the SLP vectorizer. The primary issue here is that using NDEBUG alone isn't enough to guard debug printing -- instead the DEBUG() macro needs to be used so that the specific pass debug logging check is employed. Without this, every asserts-enabled build was printing out information when it hit this. I also fixed another place where we had multiple statements in a DEBUG macro to use {}s to be a bit cleaner. And I fixed a place that used `errs()` rather than `dbgs()`. llvm-svn: 329046	2018-04-03 05:27:28 +00:00
Haicheng Wu	7f0daaeb86	[SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth We use two approaches for determining the minimum bitwidth. * Demanded bits * Value tracking If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded. But there is a missing piece though. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in %i = sext i32 %e1 to i64 %gep = getelementptr inbounds i64, i64* %p, i64 %i are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in %tmp15 = sext i32 %tmp14 to i64 %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0 it doesn't make sense to shrink %tmp15 and we can skip the value tracking. Ideas are from Matthew Simpson! Differential Revision: https://reviews.llvm.org/D44868 llvm-svn: 329035	2018-04-03 00:05:10 +00:00
Alexey Bataev	3decaf4275	[SLP] Fix PR36481: vectorize reassociated instructions. Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. Patch allows reordering of such instructions. Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43776 llvm-svn: 328980	2018-04-02 14:51:37 +00:00
Krzysztof Parzyszek	5d93fdfa89	[LV] Add TTI::shouldMaximizeVectorBandwidth to allow enabling it per target The default implementation returns false and keeps the current behavior. Differential Revision: https://reviews.llvm.org/D44735 llvm-svn: 328632	2018-03-27 16:14:11 +00:00
Matthew Simpson	6c289a1c74	[SLP] Stop counting cost of gather sequences with multiple uses When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316	2018-03-23 14:18:27 +00:00
David Blaikie	2be3922807	Fix a couple of layering violations in Transforms Remove #include of Transforms/Scalar.h from Transform/Utils to fix layering. Transforms depends on Transforms/Utils, not the other way around. So remove the header and the "createStripGCRelocatesPass" function declaration (& definition) that is unused and motivated this dependency. Move Transforms/Utils/Local.h into Analysis because it's used by Analysis/MemoryBuiltins.cpp. llvm-svn: 328165	2018-03-21 22:34:23 +00:00
Andrei Elovikov	8b8253fdc7	[LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960	2018-03-20 09:04:39 +00:00
Diego Caballero	cae4994a58	[LV] Test commit. Removing white space. This is just to check that I have commit access privilege. llvm-svn: 327656	2018-03-15 19:34:27 +00:00
Matt Davis	9407bb5f54	[CleanUp] Remove NumInstructions field from LoopVectorizer's RegisterUsage struct. Summary: This variable is largely going unused; aside from reporting number of instructions for in DEBUG builds. The only use of NumInstructions is in debug output to represent the LoopSize. That value can be can be misleading as it also includes metadata instructions (e.g., DBG_VALUE) which have no real impact. If we do choose to keep this around, we probably should guard it by a DEBUG macro, as it's not used in production builds. Reviewers: majnemer, congh, rengolin Reviewed By: rengolin Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D44495 llvm-svn: 327589	2018-03-14 23:30:31 +00:00
Haicheng Wu	aee0af3e23	[SLP] clean some formats llvm-svn: 327433	2018-03-13 18:44:19 +00:00
Renato Golin	038ede2a16	[NFC] Consolidate six getPointerOperand() utility functions into one place There are six separate instances of getPointerOperand() utility. LoopVectorize.cpp has one of them, and I don't want to create a 7th one while I'm trying to move LoopVectorizationLegality into a separate file (eventual objective is to move it to Analysis tree). See http://lists.llvm.org/pipermail/llvm-dev/2018-February/120999.html for llvm-dev discussions Closes D43323. Patch by Hideki Saito <hideki.saito@intel.com>. llvm-svn: 327173	2018-03-09 21:05:58 +00:00
Renato Golin	6b62039bb0	[LV] Fix vectorizer's isUniform() abuse triggers assert in SCEV Fixes PR36311. See more detailed analysis in https://bugs.llvm.org/show_bug.cgi?id=36311. isUniform() information is recomputed after LV started transforming the underlying IR and that triggered an assert in SCEV. From vectorizer's architectural perspective, such information, while still useful in vector code gen, should not be recomputed after the start of transforming the LLVM IR. Instead, we should collect and cache such information during the analysis phase of LV and use the cached info during code gen. From the symptom perspective, this assert as it stands right now is not very useful. Legality already rejected loops that would trigger the assert. As such, commenting out the assert is NFC from vectorizer's functionality perspective. On top of that, just above the assertion, we check for unit-strided load/store or gather scatter. Addresses can't be uniform below that check. From vectorization theory point of view, we don't have to reject all cases of stores to uniform addresses. Eventually, we should support safe/profitable cases. This patch resolves the issue by removing the useless assertion that is invoking LAA's isUniform() that requires up-to-date DomTree ---- once vector code gen starts modifying CFG, we don't have an up-to-date DomTree. Patch by Hideki Saito <hideki.saito@intel.com>. llvm-svn: 327109	2018-03-09 10:31:31 +00:00
Farhana Aleen	89196642f7	[AMDGPU] Increased vector length for global/constant loads. Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords. Author: FarhanaAleen Reviewed By: rampitec Subscribers: llvm-commits, AMDGPU Differential Revision: https://reviews.llvm.org/D44179 llvm-svn: 326910	2018-03-07 17:09:18 +00:00
Farhana Aleen	347d12b4ce	Revert "[AMDGPU] Widened vector length for global/constant address space." This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03. llvm-svn: 326907	2018-03-07 16:55:27 +00:00
Farhana Aleen	0d03d0588d	[AMDGPU] Widened vector length for global/constant address space. llvm-svn: 326904	2018-03-07 16:29:05 +00:00
Sven van Haastregt	19f531d31e	[LoadStoreVectorizer] Differentiate between <1 x T> and T The LoadStoreVectorizer thought that <1 x T> and T were the same types when merging stores, leading to a crash later. Patch by Erik Hogeman. Differential Revision: https://reviews.llvm.org/D44014 llvm-svn: 326884	2018-03-07 10:29:28 +00:00
Florian Hahn	515acd64fd	[LV][CFG] Add irreducible CFG detection for outer loops This patch adds support for detecting outer loops with irreducible control flow in LV. Current detection uses SCCs and only works for innermost loops. This patch adds a utility function that works on any CFG, given its RPO traversal and its LoopInfoBase. This function is a generalization of isIrreducibleCFG from lib/CodeGen/ShrinkWrap.cpp. The code in lib/CodeGen/ShrinkWrap.cpp is also updated to use the new generic utility function. Patch by Diego Caballero <diego.caballero@intel.com> Differential Revision: https://reviews.llvm.org/D40874 llvm-svn: 326568	2018-03-02 12:24:25 +00:00
David Green	7c35de124a	[Dominators] Remove verifyDomTree and add some verifying for Post Dom Trees Removes verifyDomTree, using assert(verify()) everywhere instead, and changes verify a little to always run IsSameAsFreshTree first in order to print good output when we find errors. Also adds verifyAnalysis for PostDomTrees, which will allow checking of PostDomTrees it the same way we check DomTrees and MachineDomTrees. Differential Revision: https://reviews.llvm.org/D41298 llvm-svn: 326315	2018-02-28 11:00:08 +00:00
Renato Golin	9d1b2acaaa	[LV] Move isLegalMasked* functions from Legality to CostModel All SIMD architectures can emulate masked load/store/gather/scatter through element-wise condition check, scalar load/store, and insert/extract. Therefore, bailing out of vectorization as legality failure, when they return false, is incorrect. We should proceed to cost model and determine profitability. This patch is to address the vectorizer's architectural limitation described above. As such, I tried to keep the cost model and vectorize/don't-vectorize behavior nearly unchanged. Cost model tuning should be done separately. Please see http://lists.llvm.org/pipermail/llvm-dev/2018-January/120164.html for RFC and the discussions. Closes D43208. Patch by: Hideki Saito <hideki.saito@intel.com> llvm-svn: 326079	2018-02-26 11:06:36 +00:00
Alexey Bataev	7f246e003a	[SLP] Allow vectorization of reversed loads. Summary: Reversed loads are handled as gathering. But we can just reshuffle these values. Patch adds support for vectorization of reversed loads. Reviewers: RKSimon, spatel, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43022 llvm-svn: 325134	2018-02-14 15:29:15 +00:00
Elena Demikhovsky	945b7e5aa6	Adding a width of the GEP index to the Data Layout. Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout. p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits. The index size parameter is optional, if not specified, it is equal to the pointer size. Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width. It works fine if you can convert pointer to integer for address calculation and all registered targets do this. But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout. http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account. This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size. Differential Revision: https://reviews.llvm.org/D42123 llvm-svn: 325102	2018-02-14 06:58:08 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement ` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Mircea Trofin	73b96d6dcf	[LV] Fix analyzeInterleaving when -pass-remarks enabled Summary: If -pass-remarks=loop-vectorize, atomic ops will be seen by analyzeInterleaving(), even though canVectorizeMemory() == false. This is because we are requesting extra analysis instead of bailing out. In such a case, we end up with a Group in both Load- and StoreGroups, and then we'll try to access freed memory when traversing LoadGroups after having had released the Group when iterating over StoreGroups. The fix is to include mayWriteToMemory() when validating that two instructions are the same kind of memory operation. Reviewers: mssimpso, davidxl Reviewed By: davidxl Subscribers: hsaito, fhahn, llvm-commits Differential Revision: https://reviews.llvm.org/D43064 llvm-svn: 324786	2018-02-10 00:07:45 +00:00
Mircea Trofin	06ac8cfbd1	Verify profile data confirms large loop trip counts. Summary: Loops with inequality comparers, such as: // unsigned bound for (unsigned i = 1; i < bound; ++i) {...} have getSmallConstantMaxTripCount report a large maximum static trip count - in this case, 0xffff fffe. However, profiling info may show that the trip count is much smaller, and thus counter-recommend vectorization. This change: - flips loop-vectorize-with-block-frequency on by default. - validates profiled loop frequency data supports vectorization, when static info appears to not counter-recommend it. Absence of profile data means we rely on static data, just as we've done so far. Reviewers: twoh, mkuper, davidxl, tejohnson, Ayal Reviewed By: davidxl Subscribers: bkramer, llvm-commits Differential Revision: https://reviews.llvm.org/D42946 llvm-svn: 324543	2018-02-07 23:29:52 +00:00
Clement Courbet	9c22d8018c	[SLPVectorizer][NFC] Make a loop more readable. llvm-svn: 324482	2018-02-07 14:26:43 +00:00
Chad Rosier	a097bc69df	[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize. PR35734 Differential Revision: https://reviews.llvm.org/D42309 llvm-svn: 324195	2018-02-04 15:42:24 +00:00
David Blaikie	42bcdffd24	Add missing includes llvm-svn: 324040	2018-02-02 00:11:09 +00:00
Alexey Bataev	9c5c103283	[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle. Summary: If the same value is going to be vectorized several times in the same tree entry, this entry is considered to be a gather entry and cost of this gather is counter as cost of InsertElementInstrs for each gathered value. But we can consider these elements as ShuffleInstr with SK_PermuteSingle shuffle kind. Reviewers: spatel, RKSimon, mkuper, hfinkel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38697 llvm-svn: 323662	2018-01-29 16:08:52 +00:00
Alexey Bataev	f86be12182	Revert "[SLP] Fix for PR32086: Count InsertElementInstr of the same elements as shuffle." This reverts commit r323530 to fix possible problems in users code. llvm-svn: 323581	2018-01-27 02:42:21 +00:00
Alexey Bataev	dce1614d75	Revert "[SLP] Removed the warning about unused variable, NFC." This reverts commit r323533 to fix possible problems in users code. llvm-svn: 323580	2018-01-27 02:42:17 +00:00

1 2 3 4 5 ...

1700 Commits