llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	f114ef3731	[CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDWD folds Based off the improved fold in D108522. This should eventually allow us to replace the SLM only cost patterns with generic versions.	2021-09-05 16:08:11 +01:00
Simon Pilgrim	10c982e0b3	Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-08-23 21:09:26 +01:00
Dorit Nuzman	67278b8a90	[LV] Support Interleaved Store Group With Gaps Teach LV to use masked-store to support interleave-store-group with gaps (instead of scatters/scalarization). The symmetric case of using masked-load to support interleaved-load-group with gaps was introduced a while ago, by https://reviews.llvm.org/D53668; This patch completes the store-scenario leftover from D53668, and solves PR50566. Reviewed by: Ayal Zaks Differential Revision: https://reviews.llvm.org/D104750	2021-08-08 10:32:02 +03:00
Simon Pilgrim	1c9bec727a	[InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069) As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead. I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst. https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!) Differential Revision: https://reviews.llvm.org/D106450	2021-07-22 10:58:51 +01:00
Mindong Chen	e908e063d1	[LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses This fixes the lower and upper bound calculation of a RuntimeCheckingPtrGroup when it has more than one loop-invariant pointer. Resolves PR50686. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D104148	2021-07-19 19:38:24 +08:00
Mindong Chen	f3814ed3e9	[LV] Re-generate check lines of some fragile tests (NFC) Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D105438	2021-07-19 19:38:24 +08:00
Simon Pilgrim	ae0d73ac3b	[CostModel][X86] Adjust fptosi/fptoui SSE/AVX legalized costs based on llvm-mca reports. Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695. Move to using legalized types wherever possible, which allows us to prune the cost tables.	2021-07-12 20:38:25 +01:00
Alexey Bataev	0d74fd3fdf	[SLP][COST][X86] Improve cost model for masked gather. Revived D101297 in its original form + added some changes in X86 legalization checking for masked gathers. This solution is the most stable and the most correct one. We have to check the legality before trying to build the masked gather in SLP. Without this check we have incorrect cost (for SLP) in case if the masked gather is not legal/slower than the gather. And we're missing some vectorization opportunities. This can be fixed in the cost model, but in this case we need to add special checks for the cost of GEPs for ScatterVectorize node, add special check for small trees, etc., i.e. there are a lot of corner cases here and there, which increase code base and make it harder to maintain the code. > Can't we rely on cost model to deal with this? This can be profitable for further vectorization, when we can start from such gather loads as seed. The question from D101297. Actually, no, it can't. Actually, simple gather may give us better result, especially after we started vectorization of insertelements. Plus, like I said before, the cost for non-legal masked gathers leads to missed vectorization opportunities. Differential Revision: https://reviews.llvm.org/D105042	2021-07-08 11:53:30 -07:00
Simon Pilgrim	cdca1785d3	[CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports. Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695. Fixes a few regressions before we start adding AVX costs for legalized types.	2021-07-02 13:09:00 +01:00
Simon Pilgrim	0af9b25aff	[LoopVectorize][X86] Regenerate conversion-cost.ll tests	2021-07-01 15:34:20 +01:00
Florian Hahn	80aa7e147e	[VPlan] Merge predicated-triangle regions, after sinking. Sinking scalar operands into predicated-triangle regions may allow merging regions. This patch adds a VPlan-to-VPlan transform that tries to merge predicate-triangle regions after sinking. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100260	2021-06-28 11:10:38 +01:00
Eli Friedman	8f3d16905d	[ScalarEvolution] Ensure backedge-taken counts are not pointers. A backedge-taken count doesn't refer to memory; returning a pointer type is nonsense. So make sure we always return an integer. The obvious way to do this would be to just convert the operands of the icmp to integers, but that doesn't quite work out at the moment: isLoopEntryGuardedByCond currently gets confused by ptrtoint operations. So we perform the ptrtoint conversion late for lt/gt operations. The test changes are mostly innocuous. The most interesting changes are more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to i64)) + %ptr)". This is expected: we can't fold this to zero because we need to preserve the pointer base. The call to isLoopEntryGuardedByCond in howFarToZero is less precise because of ptrtoint operations; this shows up in the function pr46786_c26_char in ptrtoint.ll. Fixing it here would require more complex refactoring. It should eventually be fixed by future improvements to isImpliedCond. See https://bugs.llvm.org/show_bug.cgi?id=46786 for context. Differential Revision: https://reviews.llvm.org/D103656	2021-06-21 16:24:16 -07:00
Joachim Meyer	4f01122c3f	[LV] Parallel annotated loop does not imply all loads can be hoisted. As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads. This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL. The question remains why this was initially added and what the implications of removing this optimization would be. Do we need an alternative mechanism to propagate the information about legality of if-conversion? Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general? I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous. Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103907	2021-06-10 23:37:57 +02:00
Florian Hahn	23c2f2e6b2	[LV] Mark increment of main vector loop induction variable as NUW. This patch marks the induction increment of the main induction variable of the vector loop as NUW when not folding the tail. If the tail is not folded, we know that End - Start >= Step (either statically or through the minimum iteration checks). We also know that both Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV + %Step == %End. Hence we must exit the loop before %IV + %Step unsigned overflows and we can mark the induction increment as NUW. This should make SCEV return more precise bounds for the created vector loops, used by later optimizations, like late unrolling. At the moment quite a few tests still need to be updated, but before doing so I'd like to get initial feedback to make sure I am not missing anything. Note that this could probably be further improved by using information from the original IV. Attempt of modeling of the assumption in Alive2: https://alive2.llvm.org/ce/z/H_DL_g Part of a set of fixes required for PR50412. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D103255	2021-06-07 10:47:52 +01:00
Eli Friedman	925cd6b467	Regenerate a few tests related to SCEV. In preparation for https://reviews.llvm.org/D103656	2021-06-04 13:35:00 -07:00
Juneyoung Lee	7161bb87c9	[InstCombine] Fix a few remaining vec transforms to use poison instead of undef This is a patch that replaces shufflevector and insertelement's placeholder value with poison. Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead (D93818). Consensus was reached in late 2020 via the mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 . This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.	2021-05-31 18:47:09 +09:00
serge-sans-paille	4ab3041acb	Revert "[NFC] remove explicit default value for strboolattr attribute in tests" This reverts commit `bda6e5bee0`. See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance	2021-05-24 19:43:40 +02:00
serge-sans-paille	bda6e5bee0	[NFC] remove explicit default value for strboolattr attribute in tests Since `d6de1e1a71`, no attribute is equivalent to setting the attribute to false. This is a preliminary commit for https://reviews.llvm.org/D99080	2021-05-24 19:31:04 +02:00
Florian Hahn	65d3dd7c88	[VPlan] Add first VPlan version of sinkScalarOperands. This patch adds a first VPlan-based implementation of sinking of scalar operands. The current version traverses a VPlan once and processes all operands of a predicated REPLICATE recipe. If one of those operands can be sunk, it is moved to the block containing the predicated REPLICATE recipe. Continue with processing the operands of the sunk recipe. The initial version does not re-process candidates after other recipes have been sunk. It also cannot partially sink induction increments at the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the induction is used for example in a GEP, only the first lane is used and in the lowered IR the adds for the other lanes can be sunk into the predicated blocks. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D100258	2021-05-24 15:29:58 +01:00
Simon Pilgrim	2fca555866	[CostModel][X86] Improve fneg costs These are always lowered as xor ops, so are always cheap	2021-05-21 17:23:45 +01:00
Juneyoung Lee	8a156d1c27	[InstCombine] Fully disable select to and/or i1 folding This is a patch that disables the poison-unsafe select -> and/or i1 folding. It has been blocking D72396 and also has been the source of a few miscompilations described in llvm.org/pr49688 . D99674 conditionally blocked this folding and successfully fixed the latter one. The former one was still blocked, and this patch addresses it. Note that a few test functions that have the `_logical` suffix are now deoptimized. These are created by @nikic to check the impact of disabling this optimization by copying existing original functions and replacing and/or with select. I can see that most of these are poison-unsafe; they can be revived by introducing freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll) because I think they are good targets for freeze fix. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101191	2021-05-06 09:29:52 +09:00
Juneyoung Lee	e639bccefd	run update_test_checks.py for the tests in D101191 (NFC) This is an NFC that reruns update_test_checks.py on the tests that are going to be updated in D101191.	2021-05-02 13:11:57 +09:00
Bardia Mahjour	ddb3b26a12	[LV] Consider Loop Unroll Hints When Making Interleave Decisions This patch causes the loop vectorizer to not interleave loops that have nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)). Note that if a particular interleave count is being requested (through llvm.loop.interleave_count), it will still be honoured, regardless of the presence of nounroll hints. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D101374	2021-04-28 17:27:52 -04:00
Joe Ellis	2c551aedcf	[LoopVectorize] Fix bug where predicated loads/stores were dropped This commit fixes a bug where the loop vectoriser fails to predicate loads/stores when interleaving for targets that support masked loads and stores. Code such as: void foo(int *restrict data1, int *restrict data2) { int counter = 1024; while (counter--) if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; } ... could previously be transformed in such a way that the predicated store implied by: if (data1[counter] > data2[counter]) data1[counter] = data2[counter]; ... was lost, resulting in miscompiles. This bug was causing some tests in llvm-test-suite to fail when built for SVE. Differential Revision: https://reviews.llvm.org/D99569	2021-04-22 15:05:54 +00:00
Sander de Smalen	86729538bd	[LV] Let selectVectorizationFactor reason directly on VectorizationFactor. Rather than maintaining two separate values, a `float` for the per-lane cost and a Width for the VF, maintain a single VectorizationFactor which comprises the two and also removes the need for converting an integer value to float. This simplifies the query when asking if one VF is more profitable than another when we want to extend this for scalable vectors (which may require additional options to determine if e.g. a scalable VF of the same cost, is more profitable than a fixed VF of the same cost). The patch isn't entirely NFC because it also fixes an issue in selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs no longer truncates the floating-point cost from `float` to `unsigned` to then perform the calculation on the truncated cost. It now does a cost comparison with the correct precision. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D100121	2021-04-20 09:54:45 +01:00
Roman Lebedev	df9597cf5a	[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap This is similar to the subvector extractions, except that the 0'th subvector isn't free to insert, because we generally don't know whether or not the upper elements need to be preserved: https://godbolt.org/z/rsxP5W4sW This is needed to avoid regressions in D100684 Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D100698	2021-04-19 13:24:58 +03:00
Roman Lebedev	f3953a8aba	[NFC][LoopVectorize] Autogenerate check lines in X86/gather_scatter.ll test	2021-04-18 10:26:16 +03:00
Philip Reames	ff55d01a8e	[nofree] Restrict semantics to memory visible to caller This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee. This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute". The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work. Differential Revision: https://reviews.llvm.org/D100141	2021-04-16 11:38:55 -07:00
Kerry McLaughlin	857b8a73da	[LoopVectorize] Change the identity element for FAdd Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd. Reviewed By: dmgreen, spatel Differential Revision: https://reviews.llvm.org/D98963	2021-04-06 12:13:43 +01:00
Philip Reames	e2c6621e63	[deref-at-point] restrict inference of dereferenceability based on allocsize attribute Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possible frees between allocation and use site. (At the moment, we're conservative and only allowing it in functions where we know we can't free.) This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me). There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles. Differential Revision: https://reviews.llvm.org/D95815	2021-04-01 08:34:40 -07:00
Thomas Preud'homme	8b5b03c279	[test, LoopVectorize] Fix use of var defined in CHECK-NOT LLVM test Transforms/LoopVectorize/X86/x86-pr39099.ll tries to check for the absence of a sequence of instructions with several CHECK-NOT with one of those directives using a variable defined in another. However CHECK-NOT are checked independently so that is using a variable defined in a pattern that should not occur in the input. This commit only checks for the absence of a widened load which rules out the presence of the whole sequence and does not involve an undefined variable. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D99583	2021-03-30 15:32:30 +01:00
Florian Hahn	c773d0f973	Recommit "[LV] Move runtime pointer size check to LVP::plan()." Re-apply `25fbe803d4`, with a small update to emit the right remark class. Original message: [LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri	2021-03-29 16:14:27 +01:00
Florian Hahn	485c8ce733	Revert "[LV] Move runtime pointer size check to LVP::plan()." This reverts commit `25fbe803d4`. This breaks a clang test which filters for the wrong remark type.	2021-03-29 14:41:53 +01:00
Florian Hahn	25fbe803d4	[LV] Move runtime pointer size check to LVP::plan(). This removes the need for the remaining doesNotMeet check and instead directly checks if there are too many runtime checks for vectorization in the planner. A subsequent patch will adjust the logic used to decide whether to vectorize with runtime to consider their cost more accurately. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D98634	2021-03-29 14:12:29 +01:00
Philip Reames	67e28173f1	Autogen test to account for tool output format change	2021-03-25 14:41:08 -07:00
Jeroen Dobbelaere	04790d9cfb	Support intrinsic overloading on unnamed types This patch adds support for intrinsic overloading on unnamed types. This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484). The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types. This can result in identical intrinsic mangled names for different function prototypes. This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name. Implementation details: - The mapping is created on demand and kept in Module. - It also checks for existing clashes and recycles potentially existing prototypes and declarations. - Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument. - I still kept the original two-argument 'Intrinsic::getName' around which keeps the original behavior (providing the base name). -- Main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change is -- The current situation already has a limitation. So that should not get worse with this patch. - Intrinsic::getDeclaration and the verifier are now using the new version. Other notes: - As far as I see, this should not suffer from stability issues. The count is only added for prototypes depending on at least one anonymous struct - The initial count starts from 0 for each intrinsic mangled name. - In case of name clashes, existing prototypes are remembered and reused when that makes sense. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D91250	2021-03-19 14:34:25 +01:00
Sanjay Patel	c8893f3b78	[LoopVectorize] relax FMF constraint for FP induction This makes the induction part of the loop vectorizer match the reduction part. We do not need all of the fast-math-flags. For example, there are some that clearly are not in play like arcp or afn. If we want to make FMF constraints consistent across the IR optimizer, we might want to add nsz too, but that's up for debate (users can't expect associative FP math and preservation of sign-of-zero at the same time?). The calling code was fixed to avoid miscompiles with: `1bee549737` Differential Revision: https://reviews.llvm.org/D98708	2021-03-18 08:11:22 -04:00
David Green	3c25c40d51	[LV] Account for the cost of predication of scalarized load/store This adds the cost of an i1 extract and a branch to the cost in getMemInstScalarizationCost when the instruction is predicated. These predicated loads/stores would generate blocks of something like: %c1 = extractelement <4 x i1> %C, i32 1 br i1 %c1, label %if, label %else if: %sa = extractelement <4 x i32> %a, i32 1 %sb = getelementptr inbounds float, float* %pg, i32 %sa %sv = extractelement <4 x float> %x, i32 1 store float %sv, float* %sb, align 4 else: So this increases the cost by the extract and branch. This is probably still too low in many cases due to the cost of all that branching, but there is already an existing hack increasing the cost using useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is a load or there is more than one store. This patch improves the cost for when there is only a single store, and hopefully at some point in the future the hack can be removed. Differential Revision: https://reviews.llvm.org/D98243	2021-03-17 10:57:50 +00:00
Sanjay Patel	d2eae990a1	[LoopVectorize] add FP induction test with minimal FMF; NFC	2021-03-16 12:05:34 -04:00
Roman Lebedev	78b8ce40ef	Reland [SCEV] Improve modelling for (null) pointer constants This reverts commit `329aeb5db4`, and relands commit `61f006ac65`. This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr` operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and adds constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-13 16:05:34 +03:00
Roman Lebedev	329aeb5db4	Temporarily revert "[SCEV] Improve modelling for (null) pointer constants" This appears to have broken ubsan bot: https://lab.llvm.org/buildbot/#/builders/85/builds/3062 https://reviews.llvm.org/D98147#2623549 It looks like LSR needs some kind of a change around insertion point handling. Reverting until i have a fix. This reverts commit `61f006ac65`.	2021-03-13 09:10:28 +03:00
Roman Lebedev	61f006ac65	[SCEV] Improve modelling for (null) pointer constants This is a continuation of D89456. As it was suggested there, now that SCEV models `PtrToInt`, we can try to improve SCEV's pointer handling. In particular, i believe, i will need this in the future to further fix `SCEVAddExpr` operation type handling. This removes special handling of `ConstantPointerNull` from `ScalarEvolution::createSCEV()`, and adds constant folding into `ScalarEvolution::getPtrToIntExpr()`. This way, `null` constants stay as such in SCEV's, but gracefully become zero integers when asked. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D98147	2021-03-12 22:11:58 +03:00
Roman Lebedev	b46c085d2b	[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions These intrinsics, not the icmp+select are the canonical form nowadays, so we might as well directly emit them. This should not cause any regressions, but if it does, then they would need to be fixed regardless. Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`, but that is a pessimization, not a correctness issue. Additionally, the non-intrinsic form has issues with undef, see https://reviews.llvm.org/D88287#2587863	2021-03-06 21:52:46 +03:00
Florian Hahn	53dacb7b67	[LV] Generate RT checks up-front and remove them if required. This patch updates LV to generate the runtime checks just after cost modeling, to allow a more precise estimate of the actual cost of the checks. This information will be used in future patches to generate larger runtime checks in cases where the checks only make up a small fraction of the expected scalar loop execution time. The runtime checks are created up-front in a temporary block to allow better estimating the cost and un-linked from the existing IR. After deciding to vectorize, the checks are moved back. If deciding not to vectorize, the temporary block is completely removed. This patch is similar in spirit to D71053, but explores a different direction: instead of delaying the decision on whether to vectorize in the presence of runtime checks it instead optimistically creates the runtime checks early and discards them later if decided to not vectorize. This has the advantage that the cost-modeling decisions can be kept together and can be done up-front and thus preserving the general code structure. I think delaying (part) of the decision to vectorize would also make the VPlan migration a bit harder. One potential drawback of this patch is that we speculatively generate IR which we might have to clean up later. However it seems like the code required to do so is quite manageable. Reviewed By: lebedev.ri, ebrevnov Differential Revision: https://reviews.llvm.org/D75980	2021-03-01 10:48:04 +00:00
David Green	bd4b61efbd	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-23 13:03:26 +00:00
Florian Hahn	15a74b64df	[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe. This patch extends VPWidenPHIRecipe to manage pairs of incoming (VPValue, VPBasicBlock) in the VPlan native path. This is made possible because we now directly manage defined VPValues for recipes. By keeping both the incoming value and block in the recipe directly, code-generation in the VPlan native path becomes independent of the predecessor ordering when fixing up non-induction phis, which currently can cause crashes in the VPlan native path. This fixes PR45958. Reviewed By: sguggill Differential Revision: https://reviews.llvm.org/D96773	2021-02-22 09:44:25 +00:00
Kerry McLaughlin	5fe1593438	[LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax Currently, setting the `no-nans-fp-math` attribute to true will allow loops with fmin/fmax to vectorize, though we should be requiring that `no-signed-zeros-fp-math` is also set. This patch adds the check for no-signed-zeros at the function level and includes tests to make sure we don't vectorize functions with only one of the attributes associated. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D96604	2021-02-15 13:47:05 +00:00
Juneyoung Lee	ed253ef772	[LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask This patch fixes pr48832 by correctly generating the mask when a poison value is involved. Consider this CFG (which is a part of the input): ``` for.body: ; preds = %for.cond br i1 true, label %cond.false, label %land.rhs land.rhs: ; preds = %for.body br i1 poison, label %cond.end, label %cond.false cond.false: ; preds = %for.body, %land.rhs br label %cond.end cond.end: ; preds = %land.rhs, %cond.false %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ] ``` The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead. The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation. SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026). Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D95217	2021-02-14 21:12:34 +09:00
Jinsong Ji	9202806241	Revert "[CostModel] Remove VF from IntrinsicCostAttributes" This reverts commit `502a67dd7f`. This exposes a failure in test-suite build on PowerPC, revert to unblock buildbot first, Dave will re-commit in https://reviews.llvm.org/D96287. Thanks Dave.	2021-02-09 02:14:14 +00:00
David Green	502a67dd7f	[CostModel] Remove VF from IntrinsicCostAttributes getIntrinsicInstrCost takes an IntrinsicCostAttributes holding various parameters of the intrinsic being costed. It can either be called with a scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction (RetTy==Vector, VF==1) or from the vectorizer with a scalar type and vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered an error. Both of the vector modes are expected to be treated the same, but because this is confusing many backends end up getting it wrong. Instead of trying to work with those two values separately this removes the VF parameter, widening the RetTy/ArgTys by VF when called from the vectorizer. This keeps things simpler, but does require some other modifications to keep things consistent. Most backends look like this will be an improvement (or were not using getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code from `c230965ccf` working. ARM removed the fix in `dfac521da1`, webassembly happens to get a fixup for an SLP cost issue and both X86 and AArch64 seem to now be using better costs from the vectorizer. Differential Revision: https://reviews.llvm.org/D95291	2021-02-05 09:34:24 +00:00
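
Commit `857b8a73da` in the table above changes the neutral value for FAdd reductions to -0.0. The commit message does not spell out the reasoning; one likely motivation is IEEE 754 signed-zero behaviour, which the sketch below illustrates in Python rather than LLVM. The helper names `reduce_fadd` and `is_negative_zero` are hypothetical and do not appear in LLVM.

```python
import math


def reduce_fadd(values, identity):
    """Sum values left to right, starting from the given identity value."""
    acc = identity
    for v in values:
        acc = acc + v
    return acc


def is_negative_zero(x):
    """Return True only for -0.0 (copysign exposes the sign of a zero)."""
    return x == 0.0 and math.copysign(1.0, x) < 0


# A reduction whose inputs are all -0.0 should produce -0.0.
inputs = [-0.0, -0.0]
with_pos_zero = reduce_fadd(inputs, 0.0)   # +0.0 + -0.0 gives +0.0: sign lost
with_neg_zero = reduce_fadd(inputs, -0.0)  # -0.0 + -0.0 gives -0.0: sign kept

print(is_negative_zero(with_pos_zero), is_negative_zero(with_neg_zero))
```

Under the default rounding mode, -0.0 + x equals x for every x including both zeros, whereas +0.0 + (-0.0) yields +0.0, so only -0.0 behaves as a true identity when signed zeros matter.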

475 Commits