llvm-project

Commit Graph

Author	SHA1	Message	Date
Max Kazantsev	204c0bbe7f	[Test] Fix test failure: platform-dependent printout	2020-04-20 10:27:50 +07:00
Max Kazantsev	80cd36ed63	[Test] Add a test showing how CFG analyses are invalidated after LV It demonstrates that, even if LV does no actual vectorization and only forms LCSSA, CFG analyses get dropped.	2020-04-20 09:38:49 +07:00
Ayal Zaks	8e0c5f7200	[LV] Mark first-order recurrences as allowed exits First-order recurrences require special treatment when they are live-out; such treatment is provided by fixFirstOrderRecurrence(), so they should be included in AllowedExit set. (Should probably have been included originally in D16197.) Fixes PR45526: AllowedExit set is used by prepareToFoldTailByMasking() to check whether the treatment for live-outs also holds when folding the tail, which is not (yet) the case for first-order recurrences. Differential Revision: https://reviews.llvm.org/D78210	2020-04-18 23:54:21 +03:00
Florian Hahn	4ee45ab60f	[LV] Invalidate cost model decisions along with interleave groups. Cost-modeling decisions are tied to the compute interleave groups (widening decisions, scalar and uniform values). When invalidating the interleave groups, those decisions also need to be invalidated. Otherwise there is a mis-match during VPlan construction. VPWidenMemoryRecipes created initially are left around w/o converting them into VPInterleave recipes. Such a conversion indeed should not take place, and these gather/scatter recipes may in fact be right. The crux is leaving around obsolete CM_Interleave (and dependent) markings of instructions along with their costs, instead of recalculating decisions, costs, and recipes. Alternatively to forcing a complete recompute later on, we could try to selectively invalidate the decisions connected to the interleave groups. But we would likely need to run the uniform/scalar value detection parts again anyways and the extra complexity is probably not worth it. Fixes PR45572. Reviewers: gilr, rengolin, Ayal, hsaito Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D78298	2020-04-18 10:23:49 +01:00
Sjoerd Meijer	5be767d489	NFC: remove outdated TODOs from ARM test file.	2020-04-17 17:08:11 +01:00
Gil Rapaport	b747d72c19	[LV] Fix PR45525: Incorrect assert in blend recipe Fix an assert introduced in 41ed5d856c1: a phi with a single predecessor and a mask is a valid case which is already supported by the code. Differential Revision: https://reviews.llvm.org/D78115	2020-04-15 10:39:07 +03:00
Sjoerd Meijer	3ef614a007	NFC: update of ARM llvm regr test, follow up of `9633fc14ae`.	2020-04-14 21:30:22 +01:00
Sjoerd Meijer	9633fc14ae	[LV][ARM] Add tail-folding tests for MVE. NFC. D77635 added support to recognise primary induction variables for counting-down loops. This allows us to fold the scalar tail loop into the main vector body, which we need for MVE tail-predication. This adds some ARM tail-folding test cases that we want to support. This test was extracted from D76838, which implemented a different approach to reverse and thus find a primary induction variable.	2020-04-14 16:03:29 +01:00
Jon Roelofs	0b0bb1969f	[llvm] Fix yet more missing FileCheck colons	2020-04-13 10:49:19 -06:00
Ayal Zaks	1678489234	[LV] FoldTail w/o Primary Induction Introduce a new VPWidenCanonicalIVRecipe to generate a canonical vector induction for use in fold-tail-with-masking, if a primary induction is absent. The canonical scalar IV having start = 0 and step = VFUF, created during code -gen to control the vector loop, is widened into a canonical vector IV having start = {<PartVF, PartVF+1, ..., PartVF+VF-1> for 0 <= Part < UF} and step = <VFUF, VFUF, ..., VF*UF>. Differential Revision: https://reviews.llvm.org/D77635	2020-04-09 17:45:23 +03:00
Craig Topper	ca376782ff	[LoopVectorize] Move testing for SVML vectorization of exp2f_finite/exp2_finite from svml-calls.ll to svml-calls-finite.ll where the finite versions of log, pow, and exp already were.	2020-04-08 18:13:55 -07:00
Jonathan Roelofs	7c5d2bec76	[llvm] Fix missing FileCheck directive colons https://reviews.llvm.org/D77352	2020-04-06 09:59:08 -06:00
laith sakka	a0983ed3d2	Handle exp2 with proper vectorization and lowering to SVML calls Summary: Add mapping from exp2 math functions to corresponding SVML calls. This is a follow up and extension for llvm diff https://reviews.llvm.org/D19544 Test Plan: - update test case and run ninja check. - run tests locally Reviewers: wenlei, hoyFB, mmasten, mzolotukhin, spatel Reviewed By: spatel Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77114	2020-04-02 21:11:13 -07:00
Vedant Kumar	dcc410b5cf	[LoopVectorize] Fix crash on "getNoopOrZeroExtend cannot truncate!" (PR45259) In InnerLoopVectorizer::getOrCreateTripCount, when the backedge taken count is a SCEV add expression, its type is defined by the type of the last operand of the add expression. In the test case from PR45259, this last operand happens to be a pointer, which (according to llvm::Type) does not have a primitive size in bits. In this case, LoopVectorize fails to truncate the SCEV and crashes as a result. Uing ScalarEvolution::getTypeSizeInBits makes the truncation work as expected. https://bugs.llvm.org/show_bug.cgi?id=45259 Differential Revision: https://reviews.llvm.org/D76669	2020-03-30 10:14:14 -07:00
Roman Lebedev	1badf7c33a	[InstComine] Forego of one-use check in `(X - (X & Y)) --> (X & ~Y)` if Y is a constant Summary: This is potentially more friendly for further optimizations, analysies, e.g.: https://godbolt.org/z/G24anE This resolves phase-ordering bug that was introduced in D75145 for https://godbolt.org/z/2gBwF2 https://godbolt.org/z/XvgSua Reviewers: spatel, nikic, dmgreen, xbolva00 Reviewed By: nikic, xbolva00 Subscribers: hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D75757	2020-03-06 21:39:07 +03:00
David Green	587feec07e	[ARM] Change all tests from "thumbv8.1-m.main" to "thumbv8.1m.main". NFC	2020-03-04 13:47:35 +00:00
David Green	ec7e4a9a80	[LoopVectorizer] Add reduction tests for inloop reductions. NFC Also adds a force-reduction-intrinsics option for testing, for forcing the generation of reduction intrinsics even when the backend is not requesting them.	2020-03-03 10:54:00 +00:00
Simon Pilgrim	168a44a70e	[CostModel][X86] Improve extract/insert element costs (PR43605) This tries to improve the accuracy of extract/insert element costs by accounting for subvector extraction/insertion for >128-bit vectors and the shuffling of elements to/from the 0'th index. It also adds INSERTPS for f32 types and PINSR/PEXTR costs for integer types (at the moment we assume the same cost as MOVD/MOVQ - which isn't always true). Differential Revision: https://reviews.llvm.org/D74976	2020-02-27 15:54:13 +00:00
Nemanja Ivanovic	b9f3686056	Fix buildbot break after `c46b85aaf4` I added test cases that rely on the availability of the PPC target into the general directory for the loop vectorizer. This causes failures on bots that don't build the PPC target. Moving them to the PowerPC directory to fix this.	2020-02-26 21:56:11 -06:00
Nemanja Ivanovic	c46b85aaf4	[LoopVectorize] Fix cost for calls to functions that have vector versions A recent commit (https://reviews.llvm.org/rG66c120f02560ef528a60924104ead66f330190f1) changed the cost for calls to functions that have a vector version for some vectorization factor. However, no check is performed for whether the vectorization factor matches the current one being cost modeled. This leads to attempts to widen call instructions to a vectorization factor for which such a function does not exist, which in turn leads to an assertion failure. This patch adds the check for vectorization factor (i.e. not just that the called function has a vector version for some VF, but that it has a vector version for this VF). Differential revision: https://reviews.llvm.org/D74944	2020-02-26 21:39:11 -06:00
Roman Lebedev	d6f47aeb51	[SCEV] SCEVExpander::isHighCostExpansionHelper(): cost-model min/max (PR44668) Summary: Previosly we simply always said that `SCEVMinMaxExpr` is too costly to expand. But this isn't really true, it expands into just a comparison+swap pair. And again much like with add/mul, there will be one less such pair than the number of operands. And we need to count the cost of operands themselves. This does change a number of testcases, and as far as i can tell, all of these changes are improvements, in the sense that we fixed up more latches to do the [in]equality comparison. This concludes cost-modelling changes, no other SCEV expressions exist as of now. This is a part of addressing [[ https://bugs.llvm.org/show_bug.cgi?id=44668 \| PR44668 ]]. Reviewers: reames, mkazantsev, wmi, sanjoy Reviewed By: mkazantsev Subscribers: hiraditya, javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73744	2020-02-25 23:05:59 +03:00
Simon Pilgrim	2769fb90f0	[LoopVectorize][X86] Regenerate tests. NFCI.	2020-02-21 18:23:55 +00:00
Roman Lebedev	3bd33ccfdf	[NFC?][SCEV][LoopVectorize] Add datalayout to the X86/float-induction-x86.ll test Summary: Currently, `SCEVExpander::isHighCostExpansionHelper()` has the following logic: ``` if (auto UDivExpr = dyn_cast<SCEVUDivExpr>(S)) { // If the divisor is a power of two and the SCEV type fits in a native // integer (and the LHS not expensive), consider the division cheap // irrespective of whether it occurs in the user code since it can be // lowered into a right shift. if (auto SC = dyn_cast<SCEVConstant>(UDivExpr->getRHS())) if (SC->getAPInt().isPowerOf2()) { if (isHighCostExpansionHelper(UDivExpr->getLHS(), L, At, BudgetRemaining, TTI, Processed)) return true; const DataLayout &DL = L->getHeader()->getParent()->getParent()->getDataLayout(); unsigned Width = cast<IntegerType>(UDivExpr->getType())->getBitWidth(); return DL.isIllegalInteger(Width); } ``` Since this test does not have a datalayout specified, `SCEVExpander::isHighCostExpansionHelper()` says that `[[TMP2:%.*]] = lshr exact i64 [[TMP1]], 5` is high-cost, and didn't perform it. But future patches will change that logic to solely rely on cost-model, without any such datalayout checks, so i think it is best to show that that change is ephemeral, and can already happen without costmodel changes. Reviewers: reames, fhahn, sanjoy, craig.topper, RKSimon Reviewed By: RKSimon Subscribers: javed.absar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73717	2020-02-12 12:27:38 +03:00
Sanjay Patel	e78fb556c5	[InstCombine] reassociate splatted vector ops bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp This patch depends on the splat analysis enhancement in D73549. See the test with comment: ; Negative test - mismatched splat elements ...as the motivation for that first patch. The motivating case for reassociating splatted ops is shown in PR42174: https://bugs.llvm.org/show_bug.cgi?id=42174 In that example, a slight change in order-of-associative math results in a big difference in IR and codegen. This patch gets all of the unnecessary shuffles out of the way, but doesn't address the potential scalarization (see D50992 or D73480 for that). Differential Revision: https://reviews.llvm.org/D73703	2020-02-03 09:08:36 -05:00
Simon Pilgrim	105e5c940c	[ValueTracking] Add DemandedElts support to computeKnownBits/ComputeNumSignBits (PR36319) This patch adds initial support for a DemandedElts mask to the internal computeKnownBits/ComputeNumSignBits methods, matching the SelectionDAG and GlobalISel equivalents. So far only a couple of instructions have been setup to handle the DemandedElts, the remainder still using the existing 'all elements' default. The plan is to extend support as we have test coverage. Differential Revision: https://reviews.llvm.org/D73435	2020-02-01 12:45:46 +00:00
Florian Hahn	a911fef3dd	[LV] Do not try to sink dead instructions. Dead instructions do not need to be sunk. Currently we try and record the recipies for them, but there are no recipes emitted for them and there's nothing to sink. They can be removed from SinkAfter while marking them for recording. Fixes PR44634. Reviewers: rengolin, hsaito, fhahn, Ayal, gilr Reviewed By: gilr Differential Revision: https://reviews.llvm.org/D73423	2020-01-28 08:28:03 -08:00
Wei Mi	f60671f049	[LV] Remove nondeterminacy by changing LoopVectorizationLegality::Reductions from DenseMap to MapVector The iteration order of LoopVectorizationLegality::Reductions matters for the final code generation, so we better use MapVector instead of DenseMap for it to remove the nondeterminacy. reduction-order.ll in the patch is an example reduced from the case we saw. In the output of opt command, the order of the select instructions in the vector.body block keeps changing from run to run currently. Differential Revision: https://reviews.llvm.org/D73490	2020-01-27 16:53:20 -08:00
Roman Lebedev	7bca4a28f5	[NFC][LoopVectorize] Autogenerate tests affected by isHighCostExpansionHelper() cost modelling (PR44668)	2020-01-27 23:34:30 +03:00
David Green	b535aa405a	[ARM] Use reduction intrinsics for larger than legal reductions The codegen for splitting a llvm.vector.reduction intrinsic into parts will be better than the codegen for the generic reductions. This will only directly effect when vectorization factors are specified by the user. Also added tests to make sure the codegen for larger reductions is OK. Differential Revision: https://reviews.llvm.org/D72257	2020-01-24 17:07:24 +00:00
Florian Hahn	f14f2a8568	[LV] Fix predication for branches with matching true and false succs. Currently due to the edge caching, we create wrong predicates for branches with matching true and false successors. We will cache the condition for the edge from the true successor, and then lookup the same edge (src and dst are the same) for the edge to the false successor. If both successors match, the condition should always be true. At the moment, we cannot really create constant VPValues, but we can just create a true condition as X \| !X. Later passes will clean that up. Fixes PR44488. Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D73079	2020-01-22 18:34:11 -08:00
Florian Hahn	39ae86ab72	[AArch64TTI] AArch64 supports NT vector stores through STNP. This patch adds a custom implementation of isLegalNTStore to AArch64TTI that supports vector types that can be directly stored by STNP. Note that the implementation may not catch all valid cases (e.g. because the vector is a multiple of 256 and could be broken down to multiple valid 256 bit stores), but it is good enough for LV to vectorize loops with NT stores, as LV only passes in a vector with 2 elements to check. LV seems to also be the only user of isLegalNTStore. We should also do the same for NT loads, but before that we need to ensure that we properly lower LDNP of vectors, similar to D72919. Reviewers: dmgreen, samparker, t.p.northover, ab Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D73158	2020-01-22 16:45:24 -08:00
Evgeniy Brevnov	af7e158872	[LV] Vectorizer should adjust trip count in profile information Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly. Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov Reviewed By: Ayal, DaniilSuchkov Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D67905	2020-01-20 18:36:28 +07:00
Francesco Petrogalli	66c120f025	[VectorUtils] Rework the Vector Function Database (VFDatabase). Summary: This commits is a rework of the patch in https://reviews.llvm.org/D67572. The rework was requested to prevent out-of-tree performance regression when vectorizing out-of-tree IR intrinsics. The vectorization of such intrinsics is enquired via the static function `isTLIScalarize`. For detail see the discussion in https://reviews.llvm.org/D67572. Reviewers: uabelho, fhahn, sdesmalen Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72734	2020-01-16 15:08:26 +00:00
Florian Hahn	23c113802e	[LV] Allow assume calls in predicated blocks. The assume intrinsic is intentionally marked as may reading/writing memory, to avoid passes moving them around. When flattening the CFG for predicated blocks, we have to drop the assume calls, as they are control-flow dependent. There are some cases where we can do better (when control flow is preserved), but that is follow-up work. Fixes PR43620. Reviewers: hsaito, rengolin, dcaballe, Ayal Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D68814	2020-01-16 10:11:35 +00:00
Florian Hahn	59ac44b3c1	[LV] Make X86/assume.ll X86 independent (NFC). The test does not check anything X86 specific. This is a preparation for the D68814.	2020-01-16 10:01:35 +00:00
Momchil Velikov	173b711e83	[ARM][MVE] MVE-I should not be disabled by -mfpu=none Architecturally, it's allowed to have MVE-I without an FPU, thus -mfpu=none should not disable MVE-I, or moves to/from FP-registers. This patch removes `+/-fpregs` from features unconditionally added to target feature list, depending on FPU and moves the logic to Clang driver, where the negative form (`-fpregs`) is conditionally added to the target features list for the cases of `-mfloat-abi=soft`, or `-mfpu=none` without either `+mve` or `+mve.fp`. Only the negative form is added by the driver, the positive one is derived from other features in the backend. Differential Revision: https://reviews.llvm.org/D71843	2020-01-09 14:03:25 +00:00
Sjoerd Meijer	8f1887456a	[LV] Still vectorise when tail-folding can't find a primary inducation variable This addresses a vectorisation regression for tail-folded loops that are counting down, e.g. loops as simple as this: void foo(char A, char B, char C, uint32_t N) { while (N > 0) { C++ = A++ + B++; N--; } } These are loops that can be vectorised, but when tail-folding is requested, it can't find a primary induction variable which we do need for predicating the loop. As a result, the loop isn't vectorised at all, which it is able to do when tail-folding is not attempted. So, this adds a check for the primary induction variable where we decide how to lower the scalar epilogue. I.e., when there isn't a primary induction variable, a scalar epilogue loop is allowed (i.e. don't request tail-folding) so that vectorisation could still be triggered. Having this check for the primary induction variable make sense anyway, and in addition, in a follow-up of this I will look into discovering earlier the primary induction variable for counting down loops, so that this can also be tail-folded. Differential revision: https://reviews.llvm.org/D72324	2020-01-09 09:14:00 +00:00
Matt Arsenault	f26ed6e47c	llc: Change behavior of -mcpu with existing attribute Don't overwrite existing target-cpu attributes. I've often found the replacement behavior annoying, and this is inconsistent with how the fast math command line flags interact with the function attributes. Does not yet change target-features, since I think that should behave as a concatenation.	2020-01-07 10:10:25 -05:00
Jinsong Ji	e29a2e6be4	[PowerPC][LoopVectorize] Extend getRegisterClassForType to consider double and other floating point type In https://reviews.llvm.org/D67148, we use isFloatTy to test floating point type, otherwise we return GPRRC. So 'double' will be classified as GPRRC, which is not accurate. This patch covers other floating point types. Reviewed By: #powerpc, nemanjai Differential Revision: https://reviews.llvm.org/D71946	2020-01-06 18:44:59 +00:00
Jinsong Ji	1d7990228f	[PowerPC][LoopVectorize] Add tests for fp128 and fp16 Add two tests to reg-usage.ll	2020-01-03 21:39:29 +00:00
Jinsong Ji	e8c5600de8	[PowerPC][LoopVectorize]Add floating point reg usage test Copied two tests from x86 to test floating point reg usage.	2019-12-27 20:37:23 +00:00
Fangrui Song	a36ddf0aa9	Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351	2019-12-24 16:27:51 -08:00
Fangrui Song	eb16435b5e	Migrate function attribute "no-frame-pointer-elim-non-leaf" to "frame-pointer"="non-leaf" as cleanups after D56351	2019-12-24 16:05:15 -08:00
Fangrui Song	502a77f125	Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351	2019-12-24 15:57:33 -08:00
Ayal Zaks	e498be5738	[LV] Strip wrap flags from vectorized reductions A sequence of additions or multiplications that is known not to wrap, may wrap if it's order is changed (i.e., reassociated). Therefore when vectorizing integer sum or product reductions, their no-wrap flags need to be removed. Fixes PR43828 Patch by Denis Antrushin Differential Revision: https://reviews.llvm.org/D69563	2019-12-20 14:48:53 +02:00
Nemanja Ivanovic	a5da8d90da	[PowerPC] Add missing legalization for vector BSWAP We somehow missed doing this when we were working on Power9 exploitation. This just adds the missing legalization and cost for producing the vector intrinsics. Differential revision: https://reviews.llvm.org/D70436	2019-12-17 19:07:34 -06:00
David Green	d6642ed1c8	[ARM] Add missing REQUIRES: asserts to test. NFC	2019-12-09 11:43:43 +00:00
David Green	b1aba0378e	[ARM] Enable MVE masked loads and stores With the extra optimisations we have done, these should now be fine to enable by default. Which is what this patch does. Differential Revision: https://reviews.llvm.org/D70968	2019-12-09 11:37:34 +00:00
David Green	be7a107070	[ARM] Teach the Arm cost model that a Shift can be folded into other instructions This attempts to teach the cost model in Arm that code such as: %s = shl i32 %a, 3 %a = and i32 %s, %b Can under Arm or Thumb2 become: and r0, r1, r2, lsl #3 So the cost of the shift can essentially be free. To do this without trying to artificially adjust the cost of the "and" instruction, it needs to get the users of the shl and check if they are a type of instruction that the shift can be folded into. And so it needs to have access to the actual instruction in getArithmeticInstrCost, which if available is added as an extra parameter much like getCastInstrCost. We otherwise limit it to shifts with a single user, which should hopefully handle most of the cases. The list of instruction that the shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR, ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and ICmp. Differential Revision: https://reviews.llvm.org/D70966	2019-12-09 10:24:33 +00:00
David Green	f008b5b8ce	[ARM] Additional tests and minor formatting. NFC This adds some extra cost model tests for shifts, and does some minor adjustments to some Neon code to make it clear as to what it applies to. Both NFC.	2019-12-09 10:24:33 +00:00

1 2 3 4 5 ...

937 Commits