llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	e50059f6b6	[x86] form reduction intrinsics from vectorizers instead of raw IR Motivating examples are seen in the PhaseOrdering tests based on: https://bugs.llvm.org/show_bug.cgi?id=43953#c2 - if we have intrinsics there, some pass can fold them. The intrinsics are still named "experimental" at this point, but if there is no fallout from this patch, that will be a good indicator that it is safe to finalize them. Differential Revision: https://reviews.llvm.org/D80867	2020-06-05 12:38:49 -04:00
Florian Hahn	b446ec56a2	[LV] Make sure the MaxVF is a power-of-2 by rounding down. LV currently only supports power of 2 vectorization factors, which has been made explicit with the assertion added in `840450549c`. However, if the widest type is not a power-of-2 the computed MaxVF won't be a power-of-2 either. This patch updates computeFeasibleMaxVF to ensure the returned value is a power-of-2 by rounding down to the nearest power-of-2. Fixes PR46139. Reviewers: Ayal, gilr, rengolin Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D80870	2020-06-02 10:40:49 +01:00
Sanjay Patel	db653ff6b7	[LoopVectorize] auto-generate complete test checks; NFC	2020-05-29 13:14:08 -04:00
Sanjay Patel	f78eecbb93	[LoopVectorize] regenerate test checks; NFC Align attributes are now visible.	2020-05-29 13:02:45 -04:00
Sanjay Patel	5e94273227	[LoopVectorize] auto-generate complete checks; NFC	2020-05-29 13:01:35 -04:00
Sanjay Patel	9d1f95bf9f	[LoopVectorize] regenerate test checks; NFC Align attributes are now visible.	2020-05-29 13:01:35 -04:00
Sanjay Patel	0b21c6706a	[LoopVectorize] auto-generate complete test checks; NFC	2020-05-29 13:01:35 -04:00
Florian Hahn	0deab8a54f	[LV] Either get invariant condition OR vector condition. Currently we unconditionally get the first lane of the condition operand, even if we later use the full vector condition. This can result in some unnecessary instructions being generated. Suggested as follow-up in D80219.	2020-05-24 17:16:42 +01:00
Sjoerd Meijer	9529597cf4	Recommit #2 : "[LV] Induction Variable does not remain scalar under tail-folding." This was reverted because of a miscompilation. At closer inspection, the problem was actually visible in a changed llvm regression test too. This one-line follow up fix/recommit will splat the IV, which is what we are trying to avoid if unnecessary in general, if tail-folding is requested even if all users are scalar instructions after vectorisation. Because with tail-folding, the splat IV will be used by the predicate of the masked loads/stores instructions. The previous version omitted this, which caused the miscompilation. The original commit message was: If tail-folding of the scalar remainder loop is applied, the primary induction variable is splat to a vector and used by the masked load/store vector instructions, thus the IV does not remain scalar. Because we now mark that the IV does not remain scalar for these cases, we don't emit the vector IV if it is not used. Thus, the vectoriser produces less dead code. Thanks to Ayal Zaks for the direction how to fix this.	2020-05-13 13:50:09 +01:00
Benjamin Kramer	f936457f80	Revert "Recommit "[LV] Induction Variable does not remain scalar under tail-folding."" This reverts commit `ae45b4dbe7`. It causes miscompilations, test case on the mailing list.	2020-05-08 14:49:10 +02:00
Sjoerd Meijer	ae45b4dbe7	Recommit "[LV] Induction Variable does not remain scalar under tail-folding." With 3 llvm regr tests fixed/updated that I had missed.	2020-05-07 11:52:20 +01:00
Sjoerd Meijer	20d67ffeae	Revert "[LV] Induction Variable does not remain scalar under tail-folding." This reverts commit `617aa64c84`. while I investigate buildbot failures.	2020-05-07 09:29:56 +01:00
Sjoerd Meijer	617aa64c84	[LV] Induction Variable does not remain scalar under tail-folding. If tail-folding of the scalar remainder loop is applied, the primary induction variable is splat to a vector and used by the masked load/store vector instructions, thus the IV does not remain scalar. Because we now mark that the IV does not remain scalar for these cases, we don't emit the vector IV if it is not used. Thus, the vectoriser produces less dead code. Thanks to Ayal Zaks for the direction how to fix this. Differential Revision: https://reviews.llvm.org/D78911	2020-05-07 09:15:23 +01:00
Simon Pilgrim	090cae8491	[TTI] Add DemandedElts to getScalarizationOverhead The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things: 1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern. 2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs. This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well. Reviewed By: @craig.topper Differential Revision: https://reviews.llvm.org/D78216	2020-04-29 12:00:38 +01:00
Craig Topper	5eff75d86a	[X86][CostModel] Improve costs for fp_to_uint/fp_to_sint for vXi8/vXi16/v2i32 results. Differential Revision: https://reviews.llvm.org/D78893	2020-04-27 10:35:15 -07:00
Ayal Zaks	a3c964a278	[LV] Fix recording of BranchTakenCount for FoldTail When folding tail, branch taken count is computed during initial VPlan execution and recorded to be used by the compare computing the loop's mask. This recording should directly set the State, instead of reusing Value2VPValue mapping which serves original Values present prior to vectorization. The branch taken count may be a constant Value, which may be used elsewhere in the loop; trying to employ Value2VPValue for both leads to the issue reported in https://reviews.llvm.org/D76992#inline-721028 Differential Revision: https://reviews.llvm.org/D78847	2020-04-26 20:13:10 +03:00
Ayal Zaks	1678489234	[LV] FoldTail w/o Primary Induction Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector induction for use in fold-tail-with-masking, if a primary induction is absent. The canonical scalar IV having start = 0 and step = VFUF, created during code -gen to control the vector loop, is widened into a canonical vector IV having start = {<PartVF, PartVF+1, ..., PartVF+VF-1> for 0 <= Part < UF} and step = <VFUF, VFUF, ..., VF*UF>. Differential Revision: https://reviews.llvm.org/D77635	2020-04-09 17:45:23 +03:00
Craig Topper	ca376782ff	[LoopVectorize] Move testing for SVML vectorization of exp2f_finite/exp2_finite from svml-calls.ll to svml-calls-finite.ll where the finite versions of log, pow, and exp already were.	2020-04-08 18:13:55 -07:00
laith sakka	a0983ed3d2	Handle exp2 with proper vectorization and lowering to SVML calls Summary: Add mapping from exp2 math functions to corresponding SVML calls. This is a follow up and extension for llvm diff https://reviews.llvm.org/D19544 Test Plan: - update test case and run ninja check. - run tests locally Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77114	2020-04-02 21:11:13 -07:00
Roman Lebedev	1badf7c33a	[InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant Summary: This is potentially more friendly for further optimizations, analysies, e.g.: https://godbolt.org/z/G24anE This resolves phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua Reviewers: spatel, nikic, dmgreen, xbolva00 Reviewed By: nikic, xbolva00 Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75757	2020-03-06 21:39:07 +03:00
Simon Pilgrim	168a44a70e	[CostModel][X86] Improve extract/insert element costs (PR43605) This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index. It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true). Differential Revision: https://reviews.llvm.org/D74976	2020-02-27 15:54:13 +00:00
Roman Lebedev	d6f47aeb51	[SCEV] SCEVExpander::isHighCostExpansionHelper(): cost-model min/max (PR44668) Summary: Previosly we simply always said that `SCEVMinMaxExpr` is too costly to expand. But this isn't really true, it expands into just a comparison+swap pair. And again much like with add/mul, there will be one less such pair than the number of operands. And we need to count the cost of operands themselves. This does change a number of testcases, and as far as i can tell, all of these changes are improvements, in the sense that we fixed up more latches to do the [in]equality comparison. This concludes cost-modelling changes, no other SCEV expressions exist as of now. This is a part of addressing [[ https://bugs.llvm.org/show_bug.cgi?id=44668 \| PR44668 ]]. Reviewers: reames, mkazantsev, wmi, sanjoy Reviewed By: mkazantsev Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73744	2020-02-25 23:05:59 +03:00
Simon Pilgrim	2769fb90f0	[LoopVectorize][X86] Regenerate tests. NFCI.	2020-02-21 18:23:55 +00:00
Roman Lebedev	3bd33ccfdf	[NFC?][SCEV][LoopVectorize] Add datalayout to the X86/float-induction-x86.ll test Summary: Currently, `SCEVExpander::isHighCostExpansionHelper()` has the following logic: ``` if (auto UDivExpr = dyn_cast<SCEVUDivExpr>(S)) { // If the divisor is a power of two and the SCEV type fits in a native // integer (and the LHS not expensive), consider the division cheap // irrespective of whether it occurs in the user code since it can be // lowered into a right shift. if (auto SC = dyn_cast<SCEVConstant>(UDivExpr->getRHS())) if (SC->getAPInt().isPowerOf2()) { if (isHighCostExpansionHelper(UDivExpr->getLHS(), L, At, BudgetRemaining, TTI, Processed)) return true; const DataLayout &DL = L->getHeader()->getParent()->getParent()->getDataLayout(); unsigned Width = cast<IntegerType>(UDivExpr->getType())->getBitWidth(); return DL.isIllegalInteger(Width); } ``` Since this test does not have a datalayout specified, `SCEVExpander::isHighCostExpansionHelper()` says that `[[TMP2:%.*]] = lshr exact i64 [[TMP1]], 5` is high-cost, and didn't perform it. But future patches will change that logic to solely rely on cost-model, without any such datalayout checks, so i think it is best to show that that change is ephemeral, and can already happen without costmodel changes. Reviewers: reames, fhahn, sanjoy, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73717	2020-02-12 12:27:38 +03:00
Simon Pilgrim	105e5c940c	[ValueTracking] Add DemandedElts support to computeKnownBits/ComputeNumSignBits (PR36319) This patch adds initial support for a DemandedElts mask to the internal computeKnownBits/ComputeNumSignBits methods, matching the SelectionDAG and GlobalISel equivalents. So far only a couple of instructions have been setup to handle the DemandedElts, the remainder still using the existing 'all elements' default. The plan is to extend support as we have test coverage. Differential Revision: https://reviews.llvm.org/D73435	2020-02-01 12:45:46 +00:00
Roman Lebedev	7bca4a28f5	[NFC][LoopVectorize] Autogenerate tests affected by isHighCostExpansionHelper() cost modelling (PR44668)	2020-01-27 23:34:30 +03:00
Florian Hahn	59ac44b3c1	[LV] Make X86/assume.ll X86 independent (NFC). The test does not check anything X86 specific. This is a preparation for the D68814.	2020-01-16 10:01:35 +00:00
Matt Arsenault	f26ed6e47c	llc: Change behavior of -mcpu with existing attribute Don't overwrite existing target-cpu attributes. I've often found the replacement behavior annoying, and this is inconsistent with how the fast math command line flags interact with the function attributes. Does not yet change target-features, since I think that should behave as a concatenation.	2020-01-07 10:10:25 -05:00
Fangrui Song	a36ddf0aa9	Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351	2019-12-24 16:27:51 -08:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Ayal Zaks	e498be5738	[LV] Strip wrap flags from vectorized reductions A sequence of additions or multiplications that is known not to wrap, may wrap if it's order is changed (i.e., reassociated). Therefore when vectorizing integer sum or product reductions, their no-wrap flags need to be removed. Fixes PR43828 Patch by Denis Antrushin Differential Revision: https://reviews.llvm.org/D69563	2019-12-20 14:48:53 +02:00
Ayal Zaks	6ed9cef25f	[LV] Scalar with predication must not be uniform Fix PR40816: avoid considering scalar-with-predication instructions as also uniform-after-vectorization. Instructions identified as "scalar with predication" will be "vectorized" using a replicating region. If such instructions are also optimized as "uniform after vectorization", namely when only the first of VF lanes is used, such a replicating region becomes erroneous - only the first instance of the region can and should be formed. Fix such cases by not considering such instructions as "uniform after vectorization". Differential Revision: https://reviews.llvm.org/D70298	2019-12-03 19:50:24 +02:00
Sanjay Patel	5c166f1d19	[x86] make SLM extract vector element more expensive than default I'm not sure what the effect of this change will be on all of the affected tests or a larger benchmark, but it fixes the horizontal add/sub problems noted here: https://reviews.llvm.org/D59710?vs=227972&id=228095&whitespace=ignore-most#toc The costs are based on reciprocal throughput numbers in Agner's tables for PEXTR*; these appear to be very slow ops on Silvermont. This is a small step towards the larger motivation discussed in PR43605: https://bugs.llvm.org/show_bug.cgi?id=43605 Also, it seems likely that insert/extract is the source of perf regressions on other CPUs (up to 30%) that were cited as part of the reason to revert D59710, so maybe we'll extend the table-based approach to other subtargets. Differential Revision: https://reviews.llvm.org/D70607	2019-11-27 14:08:56 -05:00
Craig Topper	4592f70758	[LV] Move interleave_short_tc.ll into the X86 directory to hopefully make fix non-X86 bots.	2019-11-01 10:41:18 -07:00
Jay Foad	843c0adf0f	[ConstantFold] Fold extractelement of getelementptr Summary: Getelementptr has vector type if any of its operands are vectors (the scalar operands being implicitly broadcast to all vector elements). Extractelement applied to a vector getelementptr can be folded by applying the extractelement in turn to all of the vector operands. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69379	2019-10-28 18:32:39 +00:00
Craig Topper	18824d25d8	[LV] Interleaving should not exceed estimated loop trip count. Currently we may do iterleaving by more than estimated trip count coming from the profile or computed maximum trip count. The solution is to use "best known" trip count instead of exact one in interleaving analysis. Patch by Evgeniy Brevnov. Differential Revision: https://reviews.llvm.org/D67948	2019-10-28 10:58:22 -07:00
Zi Xuan Wu	9802268ad3	recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374634	2019-10-12 02:53:04 +00:00
Jinsong Ji	9912232b46	Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize" Also Revert "[LoopVectorize] Fix non-debug builds after rL374017" This reverts commit `9f41deccc0`. This reverts commit `18b6fe07bc`. The patch is breaking PowerPC internal build, checked with author, reverting on behalf of him for now due to timezone. llvm-svn: 374091	2019-10-08 17:32:56 +00:00
Zi Xuan Wu	9f41deccc0	[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not estimate different register pressure for different register class separately(especially for scalar type, float type should not be on the same position with int type), so it's not accurate. Specifically, it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance. So we need classify the register classes in IR level, and importantly these are abstract register classes, and are not the target register class of backend provided in td file. It's used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types. For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR), float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled, and 3 kinds of register class when VSX is NOT enabled. It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions. Differential revision: https://reviews.llvm.org/D67148 llvm-svn: 374017	2019-10-08 03:28:33 +00:00
Sanjay Patel	b743f18b1f	[LoopVectorize] add test that asserted after cost model change (PR43582); NFC llvm-svn: 373913	2019-10-07 14:48:27 +00:00
Philip Reames	0e8d5085ac	Remove a duplicate test Turns out I'd already added exactly the same test under the name non_unit_stride. llvm-svn: 371777	2019-09-12 21:40:15 +00:00
Florian Hahn	0741810077	[LV] Update test case after r371768. llvm-svn: 371769	2019-09-12 20:07:17 +00:00
Philip Reames	e0cab70718	Precommit tests for generalization of load dereferenceability in loop llvm-svn: 371747	2019-09-12 17:09:01 +00:00
Philip Reames	b90f94f42e	[LV] Support invariant addresses in speculation logic Implement a TODO from rL371452, and handle loop invariant addresses in predicated blocks. If we can prove that the load is safe to speculate into the header, then we can avoid using a masked.load in favour of a normal load. This is mostly about vectorization robustness. In the common case, it's generally expected that LICM/LoadStorePromotion would have eliminated such loads entirely. Differential Revision: https://reviews.llvm.org/D67372 llvm-svn: 371745	2019-09-12 16:49:10 +00:00
Philip Reames	b8cddb7611	[Tests] Fix a typo in a test llvm-svn: 371456	2019-09-09 21:33:59 +00:00
Philip Reames	847fbf7013	[Tests] Precommit test case for D67372 llvm-svn: 371455	2019-09-09 21:32:16 +00:00
Philip Reames	7403569be7	[LoopVectorize] Leverage speculation safety to avoid masked.loads If we're vectorizing a load in a predicated block, check to see if the load can be speculated rather than predicated. This allows us to generate a normal vector load instead of a masked.load. To do so, we must prove that all bytes accessed on any iteration of the original loop are dereferenceable, and that all loads (across all iterations) are properly aligned. This is equivelent to proving that hoisting the load into the loop header in the original scalar loop is safe. Note: There are a couple of code motion todos in the code. My intention is to wait about a day - to be sure this sticks - and then perform the NFC motion without furthe review. Differential Revision: https://reviews.llvm.org/D66688 llvm-svn: 371452	2019-09-09 20:54:13 +00:00
Craig Topper	a31112e357	[X86] Replace -mcpu with -mattr on some tests. llvm-svn: 371260	2019-09-06 21:48:44 +00:00
Ayal Zaks	d15df0ede5	[LV] Fold tail by masking - handle reductions Allow vectorizing loops that have reductions when tail is folded by masking. A select is introduced in VPlan, choosing between the last value carried by the loop-exit/live-out instruction of the reduction, and the penultimate value carried by the reduction phi, according to the "i < n" mask of fold-tail. This select replaces the last value as the live-out value of the loop. Differential Revision: https://reviews.llvm.org/D66720 llvm-svn: 370173	2019-08-28 09:02:23 +00:00
Philip Reames	2de9788815	Preland test cases for D66688 to make diffs clear. llvm-svn: 369959	2019-08-26 20:37:06 +00:00
Dorit Nuzman	d57d73daed	[LV] fold-tail predication should be respected even with assume_safety assume_safety implies that loads under "if's" can be safely executed speculatively (unguarded, unmasked). However this assumption holds only for the original user "if's", not those introduced by the compiler, such as the fold-tail "if" that guards us from loading beyond the original loop trip-count. Currently the combination of fold-tail and assume-safety pragmas results in ignoring the fold-tail predicate that guards the loads, generating unmasked loads. This patch fixes this behavior. Differential Revision: https://reviews.llvm.org/D66106 Reviewers: Ayal, hsaito, fhahn llvm-svn: 368973	2019-08-15 07:12:14 +00:00
Dorit Nuzman	491ca2425d	[LV] Fold-tail flag This is the compiler-flag equivalent of the Predicate pragma (https://reviews.llvm.org/D65197), to direct the vectorizer to fold the remainder-loop into the main-loop using predication. Differential Revision: https://reviews.llvm.org/D66108 Reviewers: Ayal, hsaito, fhahn, SjoerdMeije llvm-svn: 368801	2019-08-14 05:22:20 +00:00
Craig Topper	005b22855e	[LoopVectorize][X86] Clamp interleave factor if we have a known constant trip count that is less than VF*interleave If we know the trip count, we should make sure the interleave factor won't cause the vectorized loop to exceed it. Improves one of the cases from PR42674 Differential Revision: https://reviews.llvm.org/D65896 llvm-svn: 368215	2019-08-07 21:44:14 +00:00
Craig Topper	0a05a04e5b	[LoopVectorize][X86] Add test case for missed vectorization from PR42674. We do end vectorizing the code, but use an interleave factor that is too high and causes the vector code to be dead. llvm-svn: 368197	2019-08-07 19:07:10 +00:00
Jay Foad	b874b3d3fa	[LV] Fix test failure in a Release build. llvm-svn: 367666	2019-08-02 08:33:41 +00:00
Hideki Saito	8871ac41a7	Moves the newly added test interleaved-accesses-waw-dependency.ll to X86 subdirectory. ps4-buildslave1 reported a failure. The test has x86 triple. llvm-svn: 367659	2019-08-02 07:25:09 +00:00
Sjoerd Meijer	20b198ec5e	[LV] Tail-Loop Folding This allows folding of the scalar epilogue loop (the tail) into the main vectorised loop body when the loop is annotated with a "vector predicate" metadata hint. To fold the tail, instructions need to be predicated (masked), enabling/disabling lanes for the remainder iterations. Differential Revision: https://reviews.llvm.org/D65197 llvm-svn: 367592	2019-08-01 18:21:44 +00:00
Petr Hosek	e28fca29fe	Revert "[IRBuilder] Fold consistently for or/and whether constant is LHS or RHS" This reverts commit r365260 which broke the following tests: Clang :: CodeGenCXX/cfi-mfcall.cpp Clang :: CodeGenObjC/ubsan-nullability.m LLVM :: Transforms/LoopVectorize/AArch64/pr36032.ll llvm-svn: 365284	2019-07-07 22:12:01 +00:00
Philip Reames	9812668d77	[IRBuilder] Fold consistently for or/and whether constant is LHS or RHS Without this, we have the unfortunate property that tests are dependent on the order of operads passed the CreateOr and CreateAnd functions. In actual usage, we'd promptly optimize them away, but it made tests slightly more verbose than they should have been. llvm-svn: 365260	2019-07-06 04:28:00 +00:00
Orlando Cazalet-Hyams	1251cac62a	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 > llvm-svn: 363046 llvm-svn: 363786	2019-06-19 10:50:47 +00:00
Warren Ristow	6452bdd29b	[LV] Suppress vectorization in some nontemporal cases When considering a loop containing nontemporal stores or loads for vectorization, suppress the vectorization if the corresponding vectorized store or load with the aligment of the original scaler memory op is not supported with the nontemporal hint on the target. This adds two new functions: bool isLegalNTStore(Type DataType, unsigned Alignment) const; bool isLegalNTLoad(Type DataType, unsigned Alignment) const; to TTI, leaving the target independent default implementation as returning true, but with overriding implementations for X86 that check the legality based on available Subtarget features. This fixes https://llvm.org/PR40759 Differential Revision: https://reviews.llvm.org/D61764 llvm-svn: 363581	2019-06-17 17:20:08 +00:00
Bjorn Pettersson	83773b77a5	[LV] Deny irregular types in interleavedAccessCanBeWidened Summary: Avoid that loop vectorizer creates loads/stores of vectors with "irregular" types when interleaving. An example of an irregular type is x86_fp80 that is 80 bits, but that may have an allocation size that is 96 bits. So an array of x86_fp80 is not bitcast compatible with a vector of the same type. Not sure if interleavedAccessCanBeWidened is the best place for this check, but it solves the problem seen in the added test case. And it is the same kind of check that already exists in memoryInstructionCanBeWidened. Reviewers: fhahn, Ayal, craig.topper Reviewed By: fhahn Subscribers: hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63386 llvm-svn: 363547	2019-06-17 12:02:24 +00:00
Fangrui Song	ac14f7b10c	[lit] Delete empty lines at the end of lit.local.cfg NFC llvm-svn: 363538	2019-06-17 09:51:07 +00:00
Sam Parker	0cf9639a9c	[SCEV] Pass NoWrapFlags when expanding an AddExpr InsertBinop now accepts NoWrapFlags, so pass them through when expanding a simple add expression. This is the first re-commit of the functional changes from rL362687, which was previously reverted. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 363364	2019-06-14 09:19:41 +00:00
Orlando Cazalet-Hyams	a947156396	Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion" This reverts commit `1a0f7a2077`. See phabricator thread for D60831. llvm-svn: 363132	2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams	1a0f7a2077	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. I have set up a separate review D61933 for a fix which is required for this patch. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse Reviewed By: hfinkel, jmorse Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 363046	2019-06-11 10:37:20 +00:00
Benjamin Kramer	f1249442cf	Revert "[SCEV] Use wrap flags in InsertBinop" This reverts commit r362687. Miscompiles llvm-profdata during selfhost. llvm-svn: 362699	2019-06-06 12:35:46 +00:00
Sam Parker	7cc580f5e9	[SCEV] Use wrap flags in InsertBinop If the given SCEVExpr has no (un)signed flags attached to it, transfer these to the resulting instruction or use them to find an existing instruction. Differential Revision: https://reviews.llvm.org/D61934 llvm-svn: 362687	2019-06-06 08:56:26 +00:00
Simon Pilgrim	8a32ca381d	[CostModel][X86] Improve masked load/store AVX1/AVX2 costs A mixture of internal tests and review of the scheduler models indicates we're overestimating the cost of a masked load, which we're estimating at 4x regular memory ops - more realistic values indicates that its closer to 2x. Masked stores costs are a lot more diverse but 8x is roughly in the middle of the range. e.g. SandyBridge defm : X86WriteRes<WriteFMaskedLoad, [SBPort23,SBPort05], 8, [1,2], 3>; defm : X86WriteRes<WriteFMaskedLoadY, [SBPort23,SBPort05], 9, [1,2], 3>; defm : X86WriteRes<WriteFMaskedStore, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; defm : X86WriteRes<WriteFMaskedStoreY, [SBPort4,SBPort01,SBPort23], 5, [1,1,1], 3>; e.g. Btver2 defm : X86WriteRes<WriteFMaskedLoad, [JLAGU, JFPU01, JFPX], 6, [1, 2, 2], 1>; defm : X86WriteRes<WriteFMaskedLoadY, [JLAGU, JFPU01, JFPX], 6, [2, 4, 4], 2>; defm : X86WriteRes<WriteFMaskedStore, [JSAGU, JFPU01, JFPX], 6, [1, 1, 4], 1>; defm : X86WriteRes<WriteFMaskedStoreY, [JSAGU, JFPU01, JFPX], 6, [2, 2, 4], 2>; Differential Revision: https://reviews.llvm.org/D61257 llvm-svn: 362338	2019-06-02 20:37:02 +00:00
Craig Topper	778e445c58	[LoopVectorize] Add FNeg instruction support Differential Revision: https://reviews.llvm.org/D62510 llvm-svn: 362124	2019-05-30 18:19:35 +00:00
Craig Topper	a807495fd1	[LoopVectorize] Precommit tests for D62510. NFC llvm-svn: 362060	2019-05-30 06:48:13 +00:00
Sanjay Patel	012adfbb96	[LoopVectorizer] fix test file to not run the entire -O3 pipeline This test file has a long history of edits from changes outside of vectorization, and it would happen again with the proposal in D61726. End-to-end testing shouldn't be happening in a test file that is specifically checking for vector masked load/store ops. Larger-scale testing goes in PhaseOrdering or the test-suite. I've hopefully preserved the intent by taking what was completely unoptimized IR in some tests and passing that through the -O1 pipeline. That becomes the input IR, and now we just run the loop vectorizer and verify that the vector masked ops are produced as expected. llvm-svn: 360340	2019-05-09 13:43:22 +00:00
Kostya Serebryany	b9c5768302	revert r360162 as it breaks most of the buildbots llvm-svn: 360190	2019-05-07 20:57:11 +00:00
Orlando Cazalet-Hyams	78a6062c24	[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion Summary: Bug: https://bugs.llvm.org/show_bug.cgi?id=39024 The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here: A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins. B) Instructions in the middle block have different line numbers which give the impression of another iteration. In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks. Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel Reviewed By: hfinkel Subscribers: bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D60831 llvm-svn: 360162	2019-05-07 15:37:38 +00:00
Keno Fischer	a1a4adf4b9	[SCEV] Add explicit representations of umin/smin Summary: Currently we express umin as `~umax(~x, ~y)`. However, this becomes a problem for operands in non-integral pointer spaces, because `~x` is not something we can compute for `x` non-integral. However, since comparisons are generally still allowed, we are actually able to express `umin(x, y)` directly as long as we don't try to express is as a umax. Support this by adding an explicit umin/smin representation to SCEV. We do this by factoring the existing getUMax/getSMax functions into a new function that does all four. The previous two functions were largely identical. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D50167 llvm-svn: 360159	2019-05-07 15:28:47 +00:00
Alina Sbirlea	4e1ac95cf5	[PassManagerBuilder] Add option for interleaved loops, for loop vectorize. Summary: Match NewPassManager behavior: add option for interleaved loops in the old pass manager, and use that instead of the flag used to disable loop unroll. No changes in the defaults. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61030 llvm-svn: 359615	2019-04-30 21:29:20 +00:00
Alina Sbirlea	733c8c40c8	Enable LoopVectorization by default. Summary: When refactoring vectorization flags, vectorization was disabled by default in the new pass manager. This patch re-enables is for both managers, and changes the assumptions opt makes, based on the new defaults. Comments in opt.cpp should clarify the intended use of all flags to enable/disable vectorization. Reviewers: chandlerc, jgorbe Subscribers: jlebar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61091 llvm-svn: 359167	2019-04-25 04:49:48 +00:00
Eric Christopher	cee313d288	Revert "Temporarily Revert "Add basic loop fusion pass."" The reversion apparently deleted the test/Transforms directory. Will be re-reverting again. llvm-svn: 358552	2019-04-17 04:52:47 +00:00
Eric Christopher	a863435128	Temporarily Revert "Add basic loop fusion pass." As it's causing some bot failures (and per request from kbarton). This reverts commit r358543/ab70da07286e618016e78247e4a24fcb84077fda. llvm-svn: 358546	2019-04-17 02:12:23 +00:00
Florian Hahn	e21ed594d8	[VPlan] Determine Vector Width programmatically. With this change, the VPlan native path is triggered with the directive: #pragma clang loop vectorize(enable) There is no need to specify the vectorize_width(N) clause. Patch by Francesco Petrogalli <francesco.petrogalli@arm.com> Differential Revision: https://reviews.llvm.org/D57598 llvm-svn: 357156	2019-03-28 10:37:12 +00:00
Simon Pilgrim	ff3abef395	[SLPVectorizer] reorderInputsAccordingToOpcode - remove non-Instruction canonicalization Remove attempts to commute non-Instructions to the LHS - the codegen changes appear to rely on chance more than anything else and also have a tendency to fight existing instcombine canonicalization which moves constants to the RHS of commutable binary ops. This is prep work towards: (a) reusing reorderInputsAccordingToOpcode for alt-shuffles and removing the similar reorderAltShuffleOperands (b) improving reordering to optimized cases with commutable and non-commutable instructions to still find splat/consecutive ops. Differential Revision: https://reviews.llvm.org/D59738 llvm-svn: 356913	2019-03-25 15:53:55 +00:00
Craig Topper	16dc165046	[InstCombine] Don't transform ((C1 OP zext(X)) & C2) -> zext((C1 OP X) & C2) if either zext or OP has another use. If they have other users we'll just end up increasing the instruction count. We might be able to weaken this to only one of them having a single use if we can prove that the and will be removed. Fixes PR41164. Differential Revision: https://reviews.llvm.org/D59630 llvm-svn: 356690	2019-03-21 17:50:49 +00:00
Nikita Popov	208381953b	[ValueTracking] Use computeConstantRange() for unsigned add/sub overflow Improve computeOverflowForUnsignedAdd/Sub in ValueTracking by intersecting the computeConstantRange() result into the ConstantRange created from computeKnownBits(). This allows us to detect some additional never/always overflows conditions that can't be determined from known bits. This revision also adds basic handling for constants to computeConstantRange(). Non-splat vectors will be handled in a followup. The signed case will also be handled in a followup, as it needs some more groundwork. Differential Revision: https://reviews.llvm.org/D59386 llvm-svn: 356489	2019-03-19 17:53:56 +00:00
Sanjoy Das	3f5ce18658	Reland "Relax constraints for reduction vectorization" Change from original commit: move test (that uses an X86 triple) into the X86 subdirectory. Original description: Gating vectorizing reductions on all fastmath flags seems unnecessary; `reassoc` should be sufficient. Reviewers: tvvikram, mkuper, kristof.beyls, sdesmalen, Ayal Reviewed By: sdesmalen Subscribers: dcaballe, huntergr, jmolloy, mcrosier, jlebar, bixia, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57728 llvm-svn: 355889	2019-03-12 01:31:44 +00:00
Florian Hahn	6ca0985aa5	[InterleavedAccessAnalysis] Fix integer overflow in insertMember. Without checking for integer overflow, invalid members can be added e.g. if the calculated key overflows, becomes positive and the largest key. This fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7560 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13128 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13229 Reviewers: Ayal, anna, hsaito, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D55538 llvm-svn: 355613	2019-03-07 17:50:16 +00:00
Nikita Popov	af2b0bef43	[ValueTracking] More accurate unsigned sub overflow detection Second part of D58593. Compute precise overflow conditions based on all known bits, rather than just the sign bits. Unsigned a - b overflows iff a < b, and we can determine whether this always/never happens based on the minimal and maximal values achievable for a and b subject to the known bits constraint. llvm-svn: 355109	2019-02-28 18:04:20 +00:00
Michael Kruse	77a614a6e1	Refactor setAlreadyUnrolled() and setAlreadyVectorized(). Loop::setAlreadyUnrolled() and LoopVectorizeHints::setLoopAlreadyUnrolled() both add loop metadata that stops the same loop from being transformed multiple times. This patch merges both implementations. In doing so we fix 3 potential issues: * setLoopAlreadyUnrolled() kept the llvm.loop.vectorize/interleave.* metadata even though it will not be used anymore. This already caused problems such as http://llvm.org/PR40546. Change the behavior to the one of setAlreadyUnrolled which deletes this loop metadata. * setAlreadyUnrolled() used to create a new LoopID by calling MDNode::get with nullptr as the first operand, then replacing it by the returned references using replaceOperandWith. It is possible that MDNode::get would instead return an existing node (due to de-duplication) that then gets modified. To avoid, use a fresh TempMDNode that does not get uniqued with anything else before replacing it with replaceOperandWith. * LoopVectorizeHints::matchesHintMetadataName() only compares the suffix of the attribute to set the new value for. That is, when called with "enable", would erase attributes such as "llvm.loop.unroll.enable", "llvm.loop.vectorize.enable" and "llvm.loop.distribute.enable" instead of the one to replace. Fortunately, function was only called with "isvectorized". Differential Revision: https://reviews.llvm.org/D57566 llvm-svn: 353738	2019-02-11 19:45:44 +00:00
Johannes Doerfert	00102c7d95	[ValueTracking] Look through casts when determining non-nullness Bitcast and certain Ptr2Int/Int2Ptr instructions will not alter the value of their operand and can therefore be looked through when we determine non-nullness. Differential Revision: https://reviews.llvm.org/D54956 llvm-svn: 352293	2019-01-26 23:40:35 +00:00
Simon Pilgrim	c934d3a01b	[CostModel][X86] Add explicit vector select costs Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)\|(Y & ~C)) bit select. Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests. llvm-svn: 351685	2019-01-20 13:55:01 +00:00
Michael Kruse	978ba61536	Introduce llvm.loop.parallel_accesses and llvm.access.group metadata. The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()). Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass. This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop). This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together. The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated. Differential Revision: https://reviews.llvm.org/D52116 llvm-svn: 349725	2018-12-20 04:58:07 +00:00
Sanjay Patel	608d128c42	[LoopVectorize] auto-generate complete checks; NFC The first test claims to show that the vectorizer will generate a vector load/loop, but then this file runs other passes which might scalarize that op. I'm removing instcombine from the RUN line here to break that dependency. Also, I'm generating full checks to make it clear exactly what the vectorizer has done. llvm-svn: 349554	2018-12-18 22:23:04 +00:00
Michael Kruse	7244852557	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944	2018-12-12 17:32:52 +00:00
Craig Topper	88270231f8	[X86][LoopVectorize] Replace -mcpu=skylake-avx512 with -mattr=avx512f in some tests that failed when experimenting with defaulting to -mprefer-vector-width=256 for skylake-avx512. llvm-svn: 348063	2018-12-01 01:38:44 +00:00
Simon Pilgrim	47d38198eb	[CostModel] Add more realistic SK_InsertSubvector generic costs. Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles. llvm-svn: 346662	2018-11-12 15:20:24 +00:00
Ayal Zaks	45a3ca7be7	[LV] Avoid vectorizing loops under opt for size that involve SCEV checks Fix PR39417, PR39497 The loop vectorizer may generate runtime SCEV checks for overflow and stride==1 cases, leading to execution of original scalar loop. The latter is forbidden when optimizing for size. An assert introduced in r344743 triggered the above PR's showing it does happen. This patch fixes this behavior by preventing vectorization in such cases. Differential Revision: https://reviews.llvm.org/D53612 llvm-svn: 345959	2018-11-02 09:16:12 +00:00
Dorit Nuzman	34da6dd696	[LV] Support vectorization of interleave-groups that require an epilog under optsize using masked wide loads Under Opt for Size, the vectorizer does not vectorize interleave-groups that have gaps at the end of the group (such as a loop that reads only the even elements: a[2*i]) because that implies that we'll require a scalar epilogue (which is not allowed under Opt for Size). This patch extends the support for masked-interleave-groups (introduced by D53011 for conditional accesses) to also cover the case of gaps in a group of loads; Targets that enable the masked-interleave-group feature don't have to invalidate interleave-groups of loads with gaps; they could now use masked wide-loads and shuffles (if that's what the cost model selects). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53668 llvm-svn: 345705	2018-10-31 09:57:56 +00:00
Simon Pilgrim	53e8e145e9	[CostModel][X86] Add realistic vXi64 uitofp vXf64 costs Match codegen improvements from D53649/rL345256 llvm-svn: 345263	2018-10-25 13:06:20 +00:00
Dorit Nuzman	5114390e48	[LV] Don't have fold-tail under optsize invalidate interleave-groups when masked-interleaving is enabled Enable interleave-groups under fold-tail scenario for Opt for size compilation; D50480 added support for vectorizing loops of arbitrary trip-count without a remiander, which in turn makes everything in the loop conditional, including interleave-groups if any. It therefore invalidated all interleave-groups because we didn't have support for vectorizing predicated interleaved-groups at the time. In the meantime, D53011 introduced this support, so we don't have to invalidate interleave-groups when masked-interleaved support is enabled. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: hsaito Differential Revision: https://reviews.llvm.org/D53559 llvm-svn: 345115	2018-10-24 07:11:38 +00:00
Dorit Nuzman	3ec99fe21b	[IAI,LV] Avoid creating a scalar epilogue due to gaps in interleave-groups when optimizing for size LV is careful to respect -Os and not to create a scalar epilog in all cases (runtime tests, trip-counts that require a remainder loop) except for peeling due to gaps in interleave-groups. This patch fixes that; -Os will now have us invalidate such interleave-groups and vectorize without an epilog. The patch also removes a related FIXME comment that is now obsolete, and was also inaccurate: "FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a smaller MaxVF that does not require a scalar epilog." (requiresScalarEpilog() has nothing to do with VF). Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53420 llvm-svn: 344883	2018-10-22 06:17:09 +00:00
Ayal Zaks	b0b5312e67	[LV] Fold tail by masking to vectorize loops of arbitrary trip count under opt for size When optimizing for size, a loop is vectorized only if the resulting vector loop completely replaces the original scalar loop. This holds if no runtime guards are needed, if the original trip-count TC does not overflow, and if TC is a known constant that is a multiple of the VF. The last two TC-related conditions can be overcome by 1. rounding the trip-count of the vector loop up from TC to a multiple of VF; 2. masking the vector body under a newly introduced "if (i <= TC-1)" condition. The patch allows loops with arbitrary trip counts to be vectorized under -Os, subject to the existing cost model considerations. It also applies to loops with small trip counts (under -O2) which are currently handled as if under -Os. The patch does not handle loops with reductions, live-outs, or w/o a primary induction variable, and disallows interleave groups. (Third, final and main part of -) Differential Revision: https://reviews.llvm.org/D50480 llvm-svn: 344743	2018-10-18 15:03:15 +00:00
Anna Thomas	6f732bfb79	[LV] Teach vectorizer about variant value store into uniform address Summary: Teach vectorizer about vectorizing variant value stores to uniform address. Similar to rL343028, we do not allow vectorization if we have multiple stores to the same uniform address. Cost model already has the change for considering the extract instruction cost for a variant value store. See added test cases for how vectorization is done. The patch also contains changes to the ORE messages. Reviewers: Ayal, mkuper, anemet, hsaito Subscribers: rkruppe, llvm-commits Differential Revision: https://reviews.llvm.org/D52656 llvm-svn: 344613	2018-10-16 15:46:26 +00:00
Ayal Zaks	1a8713046d	[LV] Add test checks when vectorizing loops under opt for size; NFC Landing this as a separate part of https://reviews.llvm.org/D50480, recording current behavior more accurately, to clarify subsequent diff ([LV] Vectorizing loops of arbitrary trip count without remainder under opt for size). llvm-svn: 344606	2018-10-16 14:25:02 +00:00
Dorit Nuzman	38bbf81ade	recommit 344472 after fixing build failure on ARM and PPC. llvm-svn: 344475	2018-10-14 08:50:06 +00:00
Dorit Nuzman	5118c68cde	revert 344472 due to failures. llvm-svn: 344473	2018-10-14 07:21:20 +00:00
Dorit Nuzman	8174368955	[IAI,LV] Add support for vectorizing predicated strided accesses using masked interleave-group The vectorizer currently does not attempt to create interleave-groups that contain predicated loads/stores; predicated strided accesses can currently be vectorized only using masked gather/scatter or scalarization. This patch makes predicated loads/stores candidates for forming interleave-groups during the Loop-Vectorizer's analysis, and adds the proper support for masked-interleave- groups to the Loop-Vectorizer's planning and transformation stages. The patch also extends the TTI API to allow querying the cost of masked interleave groups (which each target can control); Targets that support masked vector loads/ stores may choose to enable this feature and allow vectorizing predicated strided loads/stores using masked wide loads/stores and shuffles. Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D53011 llvm-svn: 344472	2018-10-14 07:06:16 +00:00
Sanjay Patel	05aadf885d	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization; 2nd try Re-trying r344082 because it unintentionally included extra diffs. Original commit message: icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344181	2018-10-10 20:47:46 +00:00
Sanjay Patel	58fc00d0bc	revert r344082: [InstCombine] reverse 'trunc X to <N x i1>' canonicalization This commit accidentally included the diffs from D53057. llvm-svn: 344178	2018-10-10 20:39:39 +00:00
Justin Bogner	90fde0e06f	[LV] Move test for r343954 into x86 subdirectory This test uses an x86 triple, so it needs to be in the x86 specific test directory. llvm-svn: 344087	2018-10-09 22:40:04 +00:00
Sanjay Patel	e9ca7ea3e5	[InstCombine] reverse 'trunc X to <N x i1>' canonicalization icmp ne (and X, 1), 0 --> trunc X to N x i1 Ideally, we'd do the same for scalars, but there will likely be regressions unless we add more trunc folds as we're doing here for vectors. The motivating vector case is from PR37549: https://bugs.llvm.org/show_bug.cgi?id=37549 define <4 x float> @bitwise_select(<4 x float> %x, <4 x float> %y, <4 x float> %z, <4 x float> %w) { %c = fcmp ole <4 x float> %x, %y %s = sext <4 x i1> %c to <4 x i32> %s1 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 0, i32 0, i32 1, i32 1> %s2 = shufflevector <4 x i32> %s, <4 x i32> undef, <4 x i32> <i32 2, i32 2, i32 3, i32 3> %cond = or <4 x i32> %s1, %s2 %condtr = trunc <4 x i32> %cond to <4 x i1> %r = select <4 x i1> %condtr, <4 x float> %z, <4 x float> %w ret <4 x float> %r } Here's a sampling of the vector codegen for that case using mask+icmp (current behavior) vs. trunc (with this patch): AVX before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vandps LCPI0_0(%rip), %xmm0, %xmm0 vxorps %xmm1, %xmm1, %xmm1 vpcmpeqd %xmm1, %xmm0, %xmm0 vblendvps %xmm0, %xmm3, %xmm2, %xmm0 AVX after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vblendvps %xmm0, %xmm2, %xmm3, %xmm0 AVX512f before: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpbroadcastd LCPI0_0(%rip), %xmm1 ## xmm1 = [1,1,1,1] vptestnmd %zmm1, %zmm0, %k1 vblendmps %zmm3, %zmm2, %zmm0 {%k1} AVX512f after: vcmpleps %xmm1, %xmm0, %xmm0 vpermilps $80, %xmm0, %xmm1 ## xmm1 = xmm0[0,0,1,1] vpermilps $250, %xmm0, %xmm0 ## xmm0 = xmm0[2,2,3,3] vorps %xmm0, %xmm1, %xmm0 vpslld $31, %xmm0, %xmm0 vptestmd %zmm0, %zmm0, %k1 vblendmps %zmm2, %zmm3, %zmm0 {%k1} AArch64 before: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b movi v1.4s, #1 and v0.16b, v0.16b, v1.16b cmeq v0.4s, v0.4s, #0 bsl v0.16b, v3.16b, v2.16b AArch64 after: fcmge v0.4s, v1.4s, v0.4s zip1 v1.4s, v0.4s, v0.4s zip2 v0.4s, v0.4s, v0.4s orr v0.16b, v1.16b, v0.16b bsl v0.16b, v2.16b, v3.16b PowerPC-le before: xvcmpgesp 34, 35, 34 vspltisw 0, 1 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxlxor 35, 35, 35 xxland 34, 0, 32 vcmpequw 2, 2, 3 xxsel 34, 36, 37, 34 PowerPC-le after: xvcmpgesp 34, 35, 34 vmrglw 3, 2, 2 vmrghw 2, 2, 2 xxlor 0, 35, 34 xxsel 34, 37, 36, 0 Differential Revision: https://reviews.llvm.org/D52747 llvm-svn: 344082	2018-10-09 21:26:01 +00:00
Max Kazantsev	b07369651e	[LV] Do not create SCEVs on broken IR in emitTransformedIndex. PR39160 At the point when we perform `emitTransformedIndex`, we have a broken IR (in particular, we have Phis for which not every incoming value is properly set). On such IR, it is illegal to create SCEV expressions, because their internal simplification process may try to prove some predicates and break when it stumbles across some broken IR. The only purpose of using SCEV in this particular place is attempt to simplify the generated code slightly. It seems that the result isn't worth it, because some trivial cases (like addition of zero and multiplication by 1) can be handled separately if needed, but more generally InstCombine is able to achieve the goals we want to achieve by using SCEV. This patch fixes a functional crash described in PR39160, and as side-effect it also generates a bit smarter code in some simple cases. It also may cause some optimality loss (i.e. we will now generate `mul` by power of `2` instead of shift etc), but there is nothing what InstCombine could not handle later. In case of dire need, we can support more trivial cases just in place. Note that this patch only fixes one particular case of the general problem that LV misuses SCEV, attempting to create SCEVs or prove predicates on invalid IR. The general solution, however, seems complex enough. Differential Revision: https://reviews.llvm.org/D52881 Reviewed By: fhahn, hsaito llvm-svn: 343954	2018-10-08 05:46:29 +00:00
Dorit Nuzman	72f6e29980	[IAI,LV] Avoid creating interleave-groups for predicated accesse This patch fixes PR39099. When strided loads are predicated, each of them will form an interleaved-group (with gaps). However, subsequent stages of vectorization (planning and transformation) assume that if a load is part of an Interleave-Group it is not predicated, resulting in wrong code - unmasked wide loads are created. The Interleaving Analysis does take care not to have conditional interleave groups of size > 1, but until we extend the planning and transformation stages to support masked-interleave-groups we should also avoid having them for size == 1. Reviewers: Ayal, hsaito, dcaballe, fhahn Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D52682 llvm-svn: 343931	2018-10-07 06:57:25 +00:00
Anna Thomas	b1e3d45318	[LV][LAA] Vectorize loop invariant values stored into loop invariant address Summary: We are overly conservative in loop vectorizer with respect to stores to loop invariant addresses. More details in https://bugs.llvm.org/show_bug.cgi?id=38546 This is the first part of the fix where we start with vectorizing loop invariant values to loop invariant addresses. This also includes changes to ORE for stores to invariant address. Reviewers: anemet, Ayal, mkuper, mssimpso Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50665 llvm-svn: 343028	2018-09-25 20:57:20 +00:00
Tim Northover	12c1f7675f	InstCombine: move hasOneUse check to the top of foldICmpAddConstant There were two combines not covered by the check before now, neither of which actually differed from normal in the benefit analysis. The most recent seems to be because it was just added at the top of the function (naturally). The older is from way back in 2008 (r46687) when we just didn't put those checks in so routinely, and has been diligently maintained since. llvm-svn: 341831	2018-09-10 14:26:44 +00:00
Anna Thomas	110df11a1a	[LV] Fix code gen for conditionally executed loads and stores Fix a latent bug in loop vectorizer which generates incorrect code for memory accesses that are executed conditionally. As pointed in review, this bug definitely affects uniform loads and may affect conditional stores that should have turned into scatters as well). The code gen for conditionally executed uniform loads on architectures that support masked gather instructions is broken. Without this patch, we were unconditionally executing the conditional load in the vectorized version. This patch does the following: 1. Uniform conditional loads on architectures with gather support will have correct code generated. In particular, the cost model (setCostBasedWideningDecision) is fixed. 2. For the recipes which are handled after the widening decision is set, we use the isScalarWithPredication(I, VF) form which is added in the patch. 3. Fix the vectorization cost model for scalarization (getMemInstScalarizationCost): implement and use isPredicatedInst to identify all predicated instructions, not just scalar+predicated. So, now the cost for scalarization will be increased for maskedloads/stores and gather/scatter operations. In short, we should be choosing the gather/scatter in place of scalarization on archs where it is profitable. 4. We needed to weaken the assert in useEmulatedMaskMemRefHack. Reviewers: Ayal, hsaito, mkuper Differential Revision: https://reviews.llvm.org/D51313 llvm-svn: 341673	2018-09-07 15:53:48 +00:00
Anna Thomas	dbacea188b	[LV] First order recurrence phis should not be treated as uniform This is fix for PR38786. First order recurrence phis were incorrectly treated as uniform, which caused them to be vectorized as uniform instructions. Patch by Ayal Zaks and Orivej Desh! Reviewed by: Anna Differential Revision: https://reviews.llvm.org/D51639 llvm-svn: 341416	2018-09-04 22:12:23 +00:00
Nicola Zaghen	9588ad9611	[InstCombine] Fold icmp ugt/ult (add nuw X, C2), C --> icmp ugt/ult X, (C - C2) Support for sgt/slt was added in rL294898, this adds the same cases also for unsigned compares. This is the Alive proof: https://rise4fun.com/Alive/nyY Differential Revision: https://reviews.llvm.org/D50972 llvm-svn: 341353	2018-09-04 10:29:48 +00:00
Roman Tereshin	02320eee6b	Revert "[SCEV][NFC] Check NoWrap flags before lexicographical comparison of SCEVs" This reverts r319889. Unfortunately, wrapping flags are not a part of SCEV's identity (they do not participate in computing a hash value or in equality comparisons) and in fact they could be assigned after the fact w/o rebuilding a SCEV. Grep for const_cast's to see quite a few of examples, apparently all for AddRec's at the moment. So, if 2 expressions get built in 2 slightly different ways: one with flags set in the beginning, the other with the flags attached later on, we may end up with 2 expressions which are exactly the same but have their operands swapped in one of the commutative N-ary expressions, and at least one of them will have "sorted by complexity" invariant broken. 2 identical SCEV's won't compare equal by pointer comparison as they are supposed to. A real-world reproducer is added as a regression test: the issue described causes 2 identical SCEV expressions to have different order of operands and therefore compare not equal, which in its turn prevents LoadStoreVectorizer from vectorizing a pair of consecutive loads. On a larger example (the source of the test attached, which is a bugpoint) I have seen even weirder behavior: adding a constant to an existing SCEV changes the order of the existing terms, for instance, getAddExpr(1, ((A * B) + (C * D))) returns (1 + (C * D) + (A * B)). Differential Revision: https://reviews.llvm.org/D40645 llvm-svn: 340777	2018-08-27 21:41:37 +00:00
Max Kazantsev	20da7e467a	Revert "[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done" llvm-svn: 336410	2018-07-06 04:04:13 +00:00
Gabor Buella	da4a966e1c	NFC - Various typo fixes in tests llvm-svn: 336268	2018-07-04 13:28:39 +00:00
Max Kazantsev	3097b76e8c	[InstCombine] Delay foldICmpUsingKnownBits until simple transforms are done This patch changes order of transform in InstCombineCompares to avoid performing transforms based on ranges which produce complex bit arithmetics before more simple things (like folding with constants) are done. See PR37636 for the motivating example. Differential Revision: https://reviews.llvm.org/D48584 Reviewed By: spatel, lebedev.ri llvm-svn: 336172	2018-07-03 06:23:57 +00:00
Diego Caballero	72aed5e5dc	Move redundant-vf2-cost.ll test to X86 directory redundant-vf2-cost.ll is X86 specific. Moved from test/Transforms/LoopVectorize/redundant-vf2-cost.ll to test/Transforms/LoopVectorize/X86/redundant-vf2-cost.ll llvm-svn: 334854	2018-06-15 18:46:03 +00:00
Sanjay Patel	4b1205b40f	[TargetLibraryInfo] add mappings from LLVM sin/cos intrinsics to SVML calls These weren't included in D19544 - probably just an oversight. D40044 made it more likely that we'll have LLVM math intrinsics rather than libcalls, so this bug was more easily exposed. As the tests/code show, we already have the complete mappings for pow/exp/log. I don't have any experience with SVML, so I don't know if anything else is missing. It's also not clear to me that we should be doing this transform in IR rather than DAG/isel, but that's a separate issue. Differential Revision: https://reviews.llvm.org/D47610 llvm-svn: 334211	2018-06-07 18:21:24 +00:00
Karl-Johan Karlsson	6d52e5c3e4	[ConstantFold] Disallow folding vector geps into bitcasts Summary: Getelementptr returns a vector of pointers, instead of a single address, when one or more of its arguments is a vector. In such case it is not possible to simplify the expression by inserting a bitcast of operand(0) into the destination type, as it will create a bitcast between different sizes. Reviewers: majnemer, mkuper, mssimpso, spatel Reviewed By: spatel Subscribers: lebedev.ri, llvm-commits Differential Revision: https://reviews.llvm.org/D46379 llvm-svn: 333783	2018-06-01 19:34:35 +00:00
Sanjay Patel	affe450db7	[LoopVectorize, x86] add tests to show missing SVML transforms; NFC llvm-svn: 333707	2018-05-31 22:31:02 +00:00
Sanjay Patel	2ae1ab30e3	[LoopVectorize, x86] regenerate checks; NFC I removed the 'fast' flag from the calls because that's not required. llvm-svn: 333695	2018-05-31 21:30:36 +00:00
Alexander Ivchenko	5c54742da4	[X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part) This patch aims to match the changes introduced in gcc by https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The IBT feature definition is removed, with the IBT instructions being freely available on all X86 targets. The shadow stack instructions are also being made freely available, and the use of all these CET instructions is controlled by the module flags derived from the -fcf-protection clang option. The hasSHSTK option remains since clang uses it to determine availability of shadow stack instruction intrinsics, but it is no longer directly used. Comes with a clang patch (D46881). Patch by mike.dvoretsky Differential Revision: https://reviews.llvm.org/D46882 llvm-svn: 332705	2018-05-18 11:58:25 +00:00
Karl-Johan Karlsson	b27dd33c58	[LV] Add lit testcase for bitcast problem. NFC llvm-svn: 331878	2018-05-09 13:34:57 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Daniel Neilson	1091ca4640	[LV] Move test/Transforms/LoopVectorize/pr23997.ll Summary: This fixes a build break with r331269. test/Transforms/LoopVectorize/pr23997.ll should be in: test/Transforms/LoopVectorize/X86/pr23997.ll llvm-svn: 331281	2018-05-01 16:40:45 +00:00
Daniel Neilson	9e4bbe801a	[LV] Preserve inbounds on created GEPs Summary: This is a fix for PR23997. The loop vectorizer is not preserving the inbounds property of GEPs that it creates. This is inhibiting some optimizations. This patch preserves the inbounds property in the case where a load/store is being fed by an inbounds GEP. Reviewers: mkuper, javed.absar, hsaito Reviewed By: hsaito Subscribers: dcaballe, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D46191 llvm-svn: 331269	2018-05-01 15:35:08 +00:00
Evgeny Stupachenko	579507a53a	Revert r325687 (workaround for PR36032). Summary: Revert r325687 workaround for PR36032 since a fix was committed in r326154. Reviewers: sbaranga Differential Revision: http://reviews.llvm.org/D44768 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 328257	2018-03-22 22:04:39 +00:00
Andrei Elovikov	8b8253fdc7	[LV] Let recordVectorLoopValueForInductionCast to check if IV was created from the cast. Summary: It turned out to be error-prone to expect the callers to handle that - better to leave the decision to this routine and make the required data to be explicitly passed to the function. This handles the case that was missed in the r322473 and fixes the assert mentioned in PR36524. Reviewers: dorit, mssimpso, Ayal, dcaballe Reviewed By: dcaballe Subscribers: Ka-Ka, hiraditya, dneilson, hsaito, llvm-commits Differential Revision: https://reviews.llvm.org/D43812 llvm-svn: 327960	2018-03-20 09:04:39 +00:00
Max Kazantsev	f8d2969abb	[SCEV] Smart range calculation for SCEVUnknown Phis The range of SCEVUnknown Phi which merges values `X1, X2, ..., XN` can be evaluated as `U(Range(X1), Range(X2), ..., Range(XN))`. Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D43810 llvm-svn: 326418	2018-03-01 06:56:48 +00:00
Evgeny Stupachenko	f1c058d99b	Fix r326154 buildbots test fail Summary: Add specific mtriples to tests added in r326154. From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 326158	2018-02-27 01:33:11 +00:00
Alexey Bataev	650f639d33	[LV] Fix test checks, NFC llvm-svn: 325699	2018-02-21 16:48:23 +00:00
Sanjay Patel	e6143904b9	revert r325515: [TTI CostModel] change default cost of FP ops to 1 (PR36280) There are too many perf regressions resulting from this, so we need to investigate (and add tests for) targets like ARM and AArch64 before trying to reinstate. llvm-svn: 325658	2018-02-21 01:42:52 +00:00
Alexey Bataev	42bcec7d38	[LV] Fix test checks, NFC. llvm-svn: 325617	2018-02-20 19:49:25 +00:00
Sanjay Patel	3e8a76abfd	[TTI CostModel] change default cost of FP ops to 1 (PR36280) This change was mentioned at least as far back as: https://bugs.llvm.org/show_bug.cgi?id=26837#c26 ...and I found a real program that is harmed by this: Himeno running on AMD Jaguar gets 6% slower with SLP vectorization: https://bugs.llvm.org/show_bug.cgi?id=36280 ...but the change here appears to solve that bug only accidentally. The div/rem costs for x86 look very wrong in some cases, but that's already true, so we can fix those in follow-up patches. There's also evidence that more cost model changes are needed to solve SLP problems as shown in D42981, but that's an independent problem (though the solution may be adjusted after this change is made). Differential Revision: https://reviews.llvm.org/D43079 llvm-svn: 325515	2018-02-19 16:11:44 +00:00
Sanjay Patel	42b8c23cc6	[LoopVectorize] auto-generate complete checks; NFC llvm-svn: 324611	2018-02-08 15:13:47 +00:00
Craig Topper	0d797a34d8	[X86] Add support for passing 'prefer-vector-width' function attribute into X86Subtarget and exposing via X86's getRegisterWidth TTI interface. This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit. There are reasons I know of that the loop vectorizer will generate larger vectors for. I've written this in such a way that the interface will only return a properly supported width(0/128/256/512) even if the attribute says something funny like 384 or 10. This has been split from D41895 with the remainder in a follow up commit. llvm-svn: 323015	2018-01-20 00:26:08 +00:00
Craig Topper	83b0a98902	[X86] Use vmovdqu64/vmovdqa64 for unmasked integer vector stores for consistency with loads. Previously we used 64 for vXi64 stores and 32 for everything else. This change uses 64 for everything just like do for loads. llvm-svn: 322820	2018-01-18 07:44:09 +00:00
Hal Finkel	92ea8acbcd	Move Transforms/LoopVectorize/consecutive-ptr-cg-bug.ll into the X86 subdirectory This test depends on X86's TTI; move into the X86 subdirectory. llvm-svn: 320914	2017-12-16 05:10:20 +00:00
Dorit Nuzman	927b31600e	[LV] Ignore the cost of values that will not appear in the vectorized loop VecValuesToIgnore holds values that will not appear in the vectorized loop. We should therefore ignore their cost when VF > 1. Differential Revision: https://reviews.llvm.org/D40883 llvm-svn: 320463	2017-12-12 08:57:43 +00:00
Gil Rapaport	848581cadb	[LV] Introduce VPBlendRecipe, VPWidenMemoryInstructionRecipe This patch is part of D38676. The patch introduces two new Recipes to handle instructions whose vectorization involves masking. These Recipes take VPlan-level masks in D38676, but still rely on ILV's existing createEdgeMask(), createBlockInMask() in this patch. VPBlendRecipe handles intra-loop phi nodes, which are vectorized as a sequence of SELECTs. Its execute() code is refactored out of ILV::widenPHIInstruction(), which now handles only loop-header phi nodes. VPWidenMemoryInstructionRecipe handles load/store which are to be widened (but are not part of an Interleave Group). In this patch it simply calls ILV::vectorizeMemoryInstruction on execute(). Differential Revision: https://reviews.llvm.org/D39068 llvm-svn: 318149	2017-11-14 12:09:30 +00:00
Sanjay Patel	b049173157	[SimplifyCFG] use pass options and remove the latesimplifycfg pass This is no-functional-change-intended. This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and D35411 (defer folding unconditional branches) with pass parameters rather than a named "latesimplifycfg" pass. Now that we have individual options to control the functionality, we could decouple when these fire (but that's an independent patch if desired). The next planned step would be to add another option bit to disable the sinking transform mentioned in D38566. This should also make it clear that the new pass manager needs to be updated to limit simplifycfg in the same way as the old pass manager. Differential Revision: https://reviews.llvm.org/D38631 llvm-svn: 316835	2017-10-28 18:43:07 +00:00
Anna Thomas	19529f75b9	[LV] Avoid computing the register usage for default VF. NFC These are changes to reduce redundant computations when calculating a feasible vectorization factor: 1. early return when target has no vector registers 2. don't compute register usage for the default VF. Suggested during review for D37702. llvm-svn: 313176	2017-09-13 19:35:45 +00:00
Anna Thomas	9f1be02fa3	[LV] Clamp the VF to the trip count Summary: When the MaxVectorSize > ConstantTripCount, we should just clamp the vectorization factor to be the ConstantTripCount. This vectorizes loops where the TinyTripCountThreshold >= TripCount < MaxVF. Earlier we were finding the maximum vector width, which could be greater than the trip count itself. The Loop vectorizer does all the work for generating a vectorizable loop, but in the end we would always choose the scalar loop (since the VF > trip count). This allows us to choose the VF keeping in mind the trip count if available. This is a fix on top of rL312472. Reviewers: Ayal, zvi, hfinkel, dneilson Reviewed by: Ayal Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37702 llvm-svn: 313046	2017-09-12 16:32:45 +00:00
Zvi Rackover	9a087a357a	LoopVectorize: MaxVF should not be larger than the loop trip count Summary: Improve how MaxVF is computed while taking into account that MaxVF should not be larger than the loop's trip count. Other than saving on compile-time by pruning the possible MaxVF candidates, this patch fixes pr34438 which exposed the following flow: 1. Short trip count identified -> Don't bail out, set OptForSize:=True to avoid tail-loop and runtime checks. 2. Compute MaxVF returned 16 on a target supporting AVX512. 3. OptForSize -> choose VF:=MaxVF. 4. Bail out because TripCount = 8, VF = 16, TripCount % VF !=0 means we need a tail loop. With this patch step 2. will choose MaxVF=8 based on TripCount. Reviewers: Ayal, dorit, mkuper, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: https://reviews.llvm.org/D37425 llvm-svn: 312472	2017-09-04 08:35:13 +00:00
Elena Demikhovsky	f58f838495	Changed basic cost of store operation on X86 Store operation takes 2 UOps on X86 processors. The exact cost calculation affects several optimization passes including loop unroling. This change compensates performance degradation caused by https://reviews.llvm.org/D34458 and shows improvements on some benchmarks. Differential Revision: https://reviews.llvm.org/D35888 llvm-svn: 311285	2017-08-20 12:34:29 +00:00
Aditya Kumar	a525fffd07	[Loop Vectorize] Added a separate metadata Added a separate metadata to indicate when the loop has already been vectorized instead of setting width and count to 1. Patch written by Divya Shanmughan and Aditya Kumar Differential Revision: https://reviews.llvm.org/D36220 llvm-svn: 311281	2017-08-20 10:32:41 +00:00

1 2 3 4 5 ...

473 Commits