Commit Graph

475 Commits

Author SHA1 Message Date
Simon Pilgrim f114ef3731 [CostModel][X86] Add generic costs for vXi32 MUL -> v2Xi16 PMADDDW folds
Based off the improved fold in D108522

This should eventually allow us to replace the SLM only cost patterns with generic versions.
2021-09-05 16:08:11 +01:00
Simon Pilgrim 10c982e0b3 Revert rG1c9bec727ab5c53fa060560dc8d346a911142170 : [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
Reverted (manually due to merge conflicts) while regressions reported on PR51540 are investigated

As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-08-23 21:09:26 +01:00
Dorit Nuzman 67278b8a90 [LV] Support Interleaved Store Group With Gaps
Teach LV to use masked-store to support interleave-store-group with
gaps (instead of scatters/scalarization).

The symmetric case of using masked-load to support
interleaved-load-group with gaps was introduced a while ago, by
https://reviews.llvm.org/D53668; This patch completes the store-scenario
leftover from D53668, and solves PR50566.

Reviewed by: Ayal Zaks

Differential Revision: https://reviews.llvm.org/D104750
2021-08-08 10:32:02 +03:00
Simon Pilgrim 1c9bec727a [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069)
As noticed on D106352, after we've folded "(select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0))" if the inner Ptr was also a (now one use) gep we could then merge the geps, using the sum of the indices instead.

I've limited this to basic 2-op geps - a more general case further down InstCombinerImpl.visitGetElementPtrInst doesn't have the one-use limitation but only creates the add if it can be created via SimplifyAddInst.

https://alive2.llvm.org/ce/z/f8pLfD (Thanks Roman!)

Differential Revision: https://reviews.llvm.org/D106450
2021-07-22 10:58:51 +01:00
Mindong Chen e908e063d1 [LoopUtils] Fix incorrect RT check bounds of loop-invariant mem accesses
This fixes the lower and upper bound calculation of a
RuntimeCheckingPtrGroup when it has more than one loop
invariant pointers. Resolves PR50686.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D104148
2021-07-19 19:38:24 +08:00
Mindong Chen f3814ed3e9 [LV] Re-generate check lines of some fragile tests (NFC)
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D105438
2021-07-19 19:38:24 +08:00
Simon Pilgrim ae0d73ac3b [CostModel][X86] Adjust fptosi/fptoui SSE/AVX legalized costs based on llvm-mca reports.
Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695.

Move to using legalized types wherever possible, which allows us to prune the cost tables.
2021-07-12 20:38:25 +01:00
Alexey Bataev 0d74fd3fdf [SLP][COST][X86]Improve cost model for masked gather.
Revived D101297 in its original form + added some changes in X86
legalization cehcking for masked gathers.

This solution is the most stable and the most correct one. We have to
check the legality before trying to build the masked gather in SLP.
Without this check we have incorrect cost (for SLP) in case if the masked gather
is not legal/slower than the gather. And we're missing some
vectorization opportunities.

This can be fixed in the cost model, but in this case we need to add
special checks for the cost of GEPs for ScatterVectorize node, add
special check for small trees, etc., i.e. there are a lot of corner
cases here and there, which insrease code base and make it harder to
maintain the code.

> Can't we rely on cost model to deal with this? This can be profitable for futher vectorization, when we can start from such gather loads as seed.

The question from D101297. Actually, no, it can't. Actually, simple
gather may give us better result, especially after we started
vectorization of insertelements. Plus, like I said before, the cost for
non-legal masked gathers leads to missed vectorization opportunities.

Differential Revision: https://reviews.llvm.org/D105042
2021-07-08 11:53:30 -07:00
Simon Pilgrim cdca1785d3 [CostModel][X86] Adjust uitofp(vXi64) SSE/AVX legalized costs based on llvm-mca reports.
Update v4i64 -> v4f32/v4f64 uitofp costs based on the worst case costs from the script in D103695.

Fixes a few regressions before we start adding AVX costs for legalized types.
2021-07-02 13:09:00 +01:00
Simon Pilgrim 0af9b25aff [LoopVectorize][X86] Regenerate conversion-cost.ll tests 2021-07-01 15:34:20 +01:00
Florian Hahn 80aa7e147e
[VPlan] Merge predicated-triangle regions, after sinking.
Sinking scalar operands into predicated-triangle regions may allow
merging regions. This patch adds a VPlan-to-VPlan transform that tries
to merge predicate-triangle regions after sinking.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100260
2021-06-28 11:10:38 +01:00
Eli Friedman 8f3d16905d [ScalarEvolution] Ensure backedge-taken counts are not pointers.
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.

The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.

The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.

The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring.  It should eventually be fixed by future
improvements to isImpliedCond.

See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.

Differential Revision: https://reviews.llvm.org/D103656
2021-06-21 16:24:16 -07:00
Joachim Meyer 4f01122c3f [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Florian Hahn 23c2f2e6b2
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
Eli Friedman 925cd6b467 Regenerate a few tests related to SCEV.
In preparation for https://reviews.llvm.org/D103656
2021-06-04 13:35:00 -07:00
Juneyoung Lee 7161bb87c9 [InsCombine] Fix a few remaining vec transforms to use poison instead of undef
This is a patch that replaces shufflevector and insertelement's placeholder value with poison.

Underlying motivation is to fix the semantics of shufflevector with undef mask to return poison instead
(D93818)
The consensus has been made in the late 2020 via mailing list as well as the thread in https://bugs.llvm.org/show_bug.cgi?id=44185 .

This patch is a simple syntactic change to the existing code, hence directly pushed as a commit.
2021-05-31 18:47:09 +09:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Florian Hahn 65d3dd7c88
[VPlan] Add first VPlan version of sinkScalarOperands.
This patch adds a first VPlan-based implementation of sinking of scalar
operands.

The current version traverse a VPlan once and processes all operands of
a predicated REPLICATE recipe. If one of those operands can be sunk,
it is moved to the block containing the predicated REPLICATE recipe.
Continue with processing the operands of the sunk recipe.

The initial version does not re-process candidates after other recipes
have been sunk. It also cannot partially sink induction increments at
the moment. The VPlan only contains WIDEN-INDUCTION recipes and if the
induction is used for example in a GEP, only the first lane is used and
in the lowered IR the adds for the other lanes can be sunk into the
predicated blocks.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100258
2021-05-24 15:29:58 +01:00
Simon Pilgrim 2fca555866 [CostModel][X86] Improve fneg costs
These are always lowered as xor ops, so are always cheap
2021-05-21 17:23:45 +01:00
Juneyoung Lee 8a156d1c27 [InstCombine] Fully disable select to and/or i1 folding
This is a patch that disables the poison-unsafe select -> and/or i1 folding.

It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.

Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.

I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101191
2021-05-06 09:29:52 +09:00
Juneyoung Lee e639bccefd run update_test_checks.py for the tests in D101191 (NFC)
This is an NFC that reruns update_test_checks.py on the tests that are
going to be updated in D101191.
2021-05-02 13:11:57 +09:00
Bardia Mahjour ddb3b26a12 [LV] Consider Loop Unroll Hints When Making Interleave Decisions
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101374
2021-04-28 17:27:52 -04:00
Joe Ellis 2c551aedcf [LoopVectorize] Fix bug where predicated loads/stores were dropped
This commit fixes a bug where the loop vectoriser fails to predicate
loads/stores when interleaving for targets that support masked
loads and stores.

Code such as:

     1  void foo(int *restrict data1, int *restrict data2)
     2  {
     3    int counter = 1024;
     4    while (counter--)
     5      if (data1[counter] > data2[counter])
     6        data1[counter] = data2[counter];
     7  }

... could previously be transformed in such a way that the predicated
store implied by:

    if (data1[counter] > data2[counter])
       data1[counter] = data2[counter];

... was lost, resulting in miscompiles.

This bug was causing some tests in llvm-test-suite to fail when built
for SVE.

Differential Revision: https://reviews.llvm.org/D99569
2021-04-22 15:05:54 +00:00
Sander de Smalen 86729538bd [LV] Let selectVectorizationFactor reason directly on VectorizationFactor.
Rather than maintaining two separate values, a `float` for the per-lane
cost and a Width for the VF, maintain a single VectorizationFactor which
comprises the two and also removes the need for converting an integer value
to float.

This simplifies the query when asking if one VF is more profitable than
another when we want to extend this for scalable vectors (which may
require additional options to determine if e.g. a scalable VF of the
some cost, is more profitable than a fixed VF of the same cost).

The patch isn't entirely NFC because it also fixes an issue in
selectEpilogueVectorizationFactor, where the cost passed to ProfitableVFs
no longer truncates the floating-point cost from `float` to `unsigned` to
then perform the calculation on the truncated cost. It now does
a cost comparison with the correct precision.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100121
2021-04-20 09:54:45 +01:00
Roman Lebedev df9597cf5a
[X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap
This is similar to the subvector extractions,
except that the 0'th subvector isn't free to insert,
because we generally don't know whether or not
the upper elements need to be preserved:
https://godbolt.org/z/rsxP5W4sW

This is needed to avoid regressions in D100684

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100698
2021-04-19 13:24:58 +03:00
Roman Lebedev f3953a8aba
[NFC][LoopVectorize] Autogenerate check lines in X86/gather_scatter.ll test 2021-04-18 10:26:16 +03:00
Philip Reames ff55d01a8e [nofree] Restrict semantics to memory visible to caller
This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.

This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".

The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.

Differential Revision: https://reviews.llvm.org/D100141
2021-04-16 11:38:55 -07:00
Kerry McLaughlin 857b8a73da [LoopVectorize] Change the identity element for FAdd
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.

Reviewed By: dmgreen, spatel

Differential Revision: https://reviews.llvm.org/D98963
2021-04-06 12:13:43 +01:00
Philip Reames e2c6621e63 [deref-at-point] restrict inference of dereferenceability based on allocsize attribute
Support deriving dereferenceability facts from allocation sites with known object sizes while correctly accounting for any possibly frees between allocation and use site. (At the moment, we're conservative and only allowing it in functions where we know we can't free.)

This is part of the work on deref-at-point semantics. I'm making the change unconditional as the miscompile in this case is way too easy to trip by accident, and the optimization was only recently added (by me).

There will be a follow up patch wiring through TLI since that should now be doable without introducing widespread miscompiles.

Differential Revision: https://reviews.llvm.org/D95815
2021-04-01 08:34:40 -07:00
Thomas Preud'homme 8b5b03c279 [test, LoopVectorize] Fix use of var defined in CHECK-NOT
LLVM test Transforms/LoopVectorize/X86/x86-pr39099.ll tries to check for
the absence of a sequence of instructions with several CHECK-NOT with
one of those directives using a variable defined in another. However
CHECK-NOT are checked independently so that is using a variable defined
in a pattern that should not occur in the input.

This commit only checks for the absence of a widened load which rules
out the presence of the whole sequence and does not involve an undefined
variable.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D99583
2021-03-30 15:32:30 +01:00
Florian Hahn c773d0f973
Recommit "[LV] Move runtime pointer size check to LVP::plan()."
Re-apply 25fbe803d4, with a small update to emit the right remark
class.

Original message:
    [LV] Move runtime pointer size check to LVP::plan().

    This removes the need for the remaining doesNotMeet check and instead
    directly checks if there are too many runtime checks for vectorization
    in the planner.

    A subsequent patch will adjust the logic used to decide whether to
    vectorize with runtime to consider their cost more accurately.

    Reviewed By: lebedev.ri
2021-03-29 16:14:27 +01:00
Florian Hahn 485c8ce733
Revert "[LV] Move runtime pointer size check to LVP::plan()."
This reverts commit 25fbe803d4.

This breaks a clang test which filters for the wrong remark type.
2021-03-29 14:41:53 +01:00
Florian Hahn 25fbe803d4
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98634
2021-03-29 14:12:29 +01:00
Philip Reames 67e28173f1 Autogen test to account for tool output format change 2021-03-25 14:41:08 -07:00
Jeroen Dobbelaere 04790d9cfb Support intrinsic overloading on unnamed types
This patch adds support for intrinsic overloading on unnamed types.

This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484).

The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types.
This can result in identical intrinsic mangled names for different function prototypes.

This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name.

Implementation details:
- The mapping is created on demand and kept in Module.
- It also checks for existing clashes and recycles potentially existing prototypes and declarations.
- Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument.
- I still kept the original two-argument 'Intrinsic::getName' around which keeps the original behavior (providing the base name).
-- Main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change is
-- The current situation already has a limitation. So that should not get worse with this patch.
- Intrinsic::getDeclaration and the verifier are now using the new version.

Other notes:
- As far as I see, this should not suffer from stability issues. The count is only added for prototypes depending on at least one anonymous struct
- The initial count starts from 0 for each intrinsic mangled name.
- In case of name clashes, existing prototypes are remembered and reused when that makes sense.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91250
2021-03-19 14:34:25 +01:00
Sanjay Patel c8893f3b78 [LoopVectorize] relax FMF constraint for FP induction
This makes the induction part of the loop vectorizer match the reduction part.
We do not need all of the fast-math-flags. For example, there are some that
clearly are not in play like arcp or afn.

If we want to make FMF constraints consistent across the IR optimizer, we
might want to add nsz too, but that's up for debate (users can't expect
associative FP math and preservation of sign-of-zero at the same time?).

The calling code was fixed to avoid miscompiles with:
1bee549737

Differential Revision: https://reviews.llvm.org/D98708
2021-03-18 08:11:22 -04:00
David Green 3c25c40d51 [LV] Account for the cost of predication of scalarized load/store
This adds the cost of an i1 extract and a branch to the cost in
getMemInstScalarizationCost when the instruction is predicated. These
predicated loads/store would generate blocks of something like:

    %c1 = extractelement <4 x i1> %C, i32 1
    br i1 %c1, label %if, label %else
  if:
    %sa = extractelement <4 x i32> %a, i32 1
    %sb = getelementptr inbounds float, float* %pg, i32 %sa
    %sv = extractelement <4 x float> %x, i32 1
    store float %sa, float* %sb, align 4
  else:

So this increases the cost by the extract and branch. This is probably
still too low in many cases due to the cost of all that branching, but
there is already an existing hack increasing the cost using
useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is
a load or there are more than one store. This patch improves the cost
for when there is only a single store, and hopefully at some point in
the future the hack can be removed.

Differential Revision: https://reviews.llvm.org/D98243
2021-03-17 10:57:50 +00:00
Sanjay Patel d2eae990a1 [LoopVectorize] add FP induction test with minimal FMF; NFC 2021-03-16 12:05:34 -04:00
Roman Lebedev 78b8ce40ef
Reland [SCEV] Improve modelling for (null) pointer constants
This reverts commit 329aeb5db4,
and relands commit 61f006ac65.

This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-13 16:05:34 +03:00
Roman Lebedev 329aeb5db4
Temporairly evert "[SCEV] Improve modelling for (null) pointer constants"
This appears to have broken ubsan bot:
https://lab.llvm.org/buildbot/#/builders/85/builds/3062
https://reviews.llvm.org/D98147#2623549

It looks like LSR needs some kind of a change around insertion point handling.
Reverting until i have a fix.

This reverts commit 61f006ac65.
2021-03-13 09:10:28 +03:00
Roman Lebedev 61f006ac65
[SCEV] Improve modelling for (null) pointer constants
This is a continuation of D89456.

As it was suggested there, now that SCEV models `PtrToInt`,
we can try to improve SCEV's pointer handling.
In particular, i believe, i will need this in the future
to further fix `SCEVAddExpr`operation type handling.

This removes special handling of `ConstantPointerNull`
from `ScalarEvolution::createSCEV()`, and add constant folding
into `ScalarEvolution::getPtrToIntExpr()`.
This way, `null` constants stay as such in SCEV's,
but gracefully become zero integers when asked.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D98147
2021-03-12 22:11:58 +03:00
Roman Lebedev b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then then they would needed to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
Florian Hahn 53dacb7b67
[LV] Generate RT checks up-front and remove them if required.
This patch updates LV to generate the runtime checks just after cost
modeling, to allow a more precise estimate of the actual cost of the
checks. This information will be used in future patches to generate
larger runtime checks in cases where the checks only make up a small
fraction of the expected scalar loop execution time.

The runtime checks are created up-front in a temporary block to allow better
estimating the cost and un-linked from the existing IR. After deciding to
vectorize, the checks are moved backed. If deciding not to vectorize, the
temporary block is completely removed.

This patch is similar in spirit to D71053, but explores a different
direction: instead of delaying the decision on whether to vectorize in
the presence of runtime checks it instead optimistically creates the
runtime checks early and discards them later if decided to not
vectorize. This has the advantage that the cost-modeling decisions
can be kept together and can be done up-front and thus preserving the
general code structure. I think delaying (part) of the decision to
vectorize would also make the VPlan migration a bit harder.

One potential drawback of this patch is that we speculatively
generate IR which we might have to clean up later. However it seems like
the code required to do so is quite manageable.

Reviewed By: lebedev.ri, ebrevnov

Differential Revision: https://reviews.llvm.org/D75980
2021-03-01 10:48:04 +00:00
David Green bd4b61efbd [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-23 13:03:26 +00:00
Florian Hahn 15a74b64df
[VPlan] Manage pairs of incoming (VPValue, VPBB) in VPWidenPHIRecipe.
This patch extends VPWidenPHIRecipe to manage pairs of incoming
(VPValue, VPBasicBlock) in the VPlan native path. This is made possible
because we now directly manage defined VPValues for recipes.

By keeping both the incoming value and block in the recipe directly,
code-generation in the VPlan native path becomes independent of the
predecessor ordering when fixing up non-induction phis, which currently
can cause crashes in the VPlan native path.

This fixes PR45958.

Reviewed By: sguggill

Differential Revision: https://reviews.llvm.org/D96773
2021-02-22 09:44:25 +00:00
Kerry McLaughlin 5fe1593438 [LoopVectorizer] Require no-signed-zeros-fp-math=true for fmin/fmax
Currently, setting the `no-nans-fp-math` attribute to true will allow
loops with fmin/fmax to vectorize, though we should be requiring that
`no-signed-zeros-fp-math` is also set.

This patch adds the check for no-signed-zeros at the function level and includes
tests to make sure we don't vectorize functions with only one of the attributes
associated.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D96604
2021-02-15 13:47:05 +00:00
Juneyoung Lee ed253ef772 [LoopVectorize] Fix VPRecipeBuilder::createEdgeMask to correctly generate the mask
This patch fixes pr48832 by correctly generating the mask when a poison value is involved.

Consider this CFG (which is a part of the input):

```
for.body:                                         ; preds = %for.cond
  br i1 true, label %cond.false, label %land.rhs

land.rhs:                                         ; preds = %for.body
  br i1 poison, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body, %land.rhs
  br label %cond.end

cond.end:                                         ; preds = %land.rhs, %cond.false
  %cond = phi i32 [ 0, %cond.false ], [ 1, %land.rhs ]

```

The path for.body -> land.rhs -> cond.end should be taken when 'select i1 false, i1 poison, i1 false' holds (which means it's never taken); but VPRecipeBuilder::createEdgeMask was emitting 'and i1 false, poison' instead.
The former one successfully blocks poison propagation whereas the latter one doesn't, making the condition poison and thus causing the miscompilation.

SimplifyCFG has a similar bug (which didn't expose a real-world bug yet), and a patch for this is also ongoing (see https://reviews.llvm.org/D95026).

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D95217
2021-02-14 21:12:34 +09:00
Jinsong Ji 9202806241 Revert "[CostModel] Remove VF from IntrinsicCostAttributes"
This reverts commit 502a67dd7f.

This expose a failure in test-suite build on PowerPC,
revert to unblock buildbot first,
Dave will re-commit in https://reviews.llvm.org/D96287.

Thanks Dave.
2021-02-09 02:14:14 +00:00
David Green 502a67dd7f [CostModel] Remove VF from IntrinsicCostAttributes
getIntrinsicInstrCost takes a IntrinsicCostAttributes holding various
parameters of the intrinsic being costed. It can either be called with a
scalar intrinsic (RetTy==Scalar, VF==1), with a vector instruction
(RetTy==Vector, VF==1) or from the vectorizer with a scalar type and
vector width (RetTy==Scalar, VF>1). A RetTy==Vector, VF>1 is considered
an error. Both of the vector modes are expected to be treated the same,
but because this is confusing many backends end up getting it wrong.

Instead of trying work with those two values separately this removes the
VF parameter, widening the RetTy/ArgTys by VF used called from the
vectorizer. This keeps things simpler, but does require some other
modifications to keep things consistent.

Most backends look like this will be an improvement (or were not using
getIntrinsicInstrCost). AMDGPU needed the most changes to keep the code
from c230965ccf working. ARM removed the fix in
dfac521da1, webassembly happens to get a fixup for an SLP cost
issue and both X86 and AArch64 seem to now be using better costs from
the vectorizer.

Differential Revision: https://reviews.llvm.org/D95291
2021-02-05 09:34:24 +00:00