Commit Graph

17930 Commits

Author SHA1 Message Date
Sanjay Patel c2ebad8d55 [InstCombine] add fold for demand of low bit of abs()
This is one problem shown in https://llvm.org/PR49763

https://alive2.llvm.org/ce/z/cV6-4K
https://alive2.llvm.org/ce/z/9_3g-L
2021-03-30 15:14:37 -04:00
Sanjay Patel 79ae41991c [InstCombine] add test for abs() demanded bits; NFC 2021-03-30 15:14:37 -04:00
Sourabh Singh Tomar f13f050551 [DebugInfo] Support for signed constants inside DIExpression
Negative numbers are represented using DW_OP_consts along with signed representation
of the number as the argument.

Test case IR is generated using Fortran front-end.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99273
2021-03-30 23:20:38 +05:30
spupyrev 22998738e8 [SamplePGO] Keeping prof metadata for IndirectBrInst
Currently prof metadata with branch counts is added only for BranchInst and SwitchInst, but not for IndirectBrInst. As a result, BPI/BFI make incorrect inferences for indirect branches, which can be very hot.
This diff adds metadata for IndirectBrInst, in addition to BranchInst and SwitchInst.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99550
2021-03-30 10:44:48 -07:00
Hongtao Yu 3e3fc431df [CSSPGO] Top-down processing order based on full profile.
Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example:

1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them.

2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining.

3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph.

4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.

Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4.

I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%.

The change is an enhancement to https://reviews.llvm.org/D95988.

Reviewed By: wmi, wenlei

Differential Revision: https://reviews.llvm.org/D99351
2021-03-30 10:42:22 -07:00
Thomas Preud'homme a6950c33e8 [test, ARM] Fix use of var defined in CHECK-NOT
tries to check for the
    absence of a sequence of instructions with several CHECK-NOT with
one of
    those directives using a variable defined in another.

LLVM test CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll tries to
check for the absence of a sequence of instructions with several
CHECK-NOT with one of those directives using a variable defined in
another. However, CHECK-NOT are checked independently so that is using
a variable defined in a pattern that should not occur in the input. The
bug was then copied over in
Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector-inseltpoison.ll

This commit removes the definition and uses of variable to check each
line independently, making the check stronger than the current one.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99597
2021-03-30 16:28:13 +01:00
Thomas Preud'homme 8b5b03c279 [test, LoopVectorize] Fix use of var defined in CHECK-NOT
LLVM test Transforms/LoopVectorize/X86/x86-pr39099.ll tries to check for
the absence of a sequence of instructions with several CHECK-NOT with
one of those directives using a variable defined in another. However
CHECK-NOT are checked independently so that is using a variable defined
in a pattern that should not occur in the input.

This commit only checks for the absence of a widened load which rules
out the presence of the whole sequence and does not involve an undefined
variable.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D99583
2021-03-30 15:32:30 +01:00
Thomas Preud'homme 80fb7434e9 [test, HardwareLoops] Fix use of var defined in CHECK-NOT
LLVM test Transforms/HardwareLoops/ARM/structure.ll tries to check for
the absence of a sequence of instructions with several CHECK-NOT with
one of those directives using a variable defined in another. However
CHECK-NOT are checked independently so that is using a variable defined
in a pattern that should not occur in the input.

This commit only checks for the absence of llvm.loop.decrement.i32 which
rules out the presence of the whole sequence and does not involve an
undefined variable.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D99591
2021-03-30 15:06:32 +01:00
Krasimir Georgiev c51e91e046 Revert "[Passes] Add relative lookup table converter pass"
This reverts commit 5178ffc7cf.

Compiling `llvm-profdata` with a compiler build from this produces a
crashing binary.
2021-03-30 14:13:37 +02:00
Juneyoung Lee 6b4b1dc6ec [LoopUnswitch] Simplify branch condition if it is select with constant operands
This fixes the miscompilation reported in https://reviews.llvm.org/rG5bb38e84d3d0#986154 .

`select _, true, false` matches both m_LogicalAnd and m_LogicalOr, making later
transformations confused.
Simplify the branch condition to not have the form.
2021-03-30 20:09:42 +09:00
David Sherwood a08c7736a7 [LoopVectorize] Add support for scalable vectorization of induction variables
This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:

1. Removed assert from InnerLoopVectorizer::getStepVector.
2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use
   a runtime determined value for VF and removed an assert.
3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable
   vectors. I did this by calculating the full vector value for each Part
   of the unroll factor (UF) and caching this in the VP state. This means
   that we are always able to extract an arbitrary element from the vector
   if necessary. In addition to this, I also permitted the caching of the
   individual lane values themselves for the known minimum number of elements
   in the same way we do for fixed width vectors. This is a further
   optimisation that improves the code quality since it avoids unnecessary
   extractelement operations when extracting the first lane.
4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while
   testing some code paths I noticed this is currently broken for scalable
   vectors.

Various tests to support different cases have been added here:

  Transforms/LoopVectorize/AArch64/sve-inductions.ll

Differential Revision: https://reviews.llvm.org/D98715
2021-03-30 11:13:31 +01:00
Stefan Gränitz c42c67ad60 Re-apply "[lli] Make -jit-kind=orc the default JIT engine"
MCJIT served well as the default JIT engine in lli for a long time, but the code is getting old and maintenance efforts don't seem to be in sight. In the meantime Orc became mature enough to fill that gap. The newly added greddy mode is very similar to the execution model of MCJIT. It should work as a drop-in replacement for common JIT tasks.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D98931
2021-03-30 12:08:26 +02:00
Krasimir Georgiev 8e7df996e3 Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit 92ddd3c1b6.

Causes multistage clang crashes, e.g.:
https://lab.llvm.org/buildbot/#/builders/36/builds/6678
2021-03-30 11:47:12 +02:00
Han Zhu 92ddd3c1b6 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-03-29 23:36:26 -07:00
Han Zhu 2bd4049ceb Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit deb5095833.

Bad commit message.
2021-03-29 23:35:35 -07:00
Han Zhu deb5095833 [loop-idiom] Hoist loop memcpys to loop preheader
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Blame Revision:

Differential Revision: https://phabricator.intern.facebook.com/D26380397
2021-03-29 23:14:42 -07:00
Max Kazantsev 18b3415e61 [Test] Add a test demonstrating a missing opportunity to PRE a load 2021-03-30 12:29:11 +07:00
Gulfem Savrun Yeniceri 5178ffc7cf [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-03-29 21:53:32 +00:00
Wei Mi 3cbf44190b [SampleFDO] Do not scale the magic number NOMORE_ICP_MAGICNUM in value profile
during profile update.

When we inline a function and update the profile, the value profiles of the
indirect call in the inliner and inlinee will be scaled. In
https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we start
using the magic number NOMORE_ICP_MAGICNUM (-1) to mark targets which have
been promoted. The magic number shouldn't be scaled during the profile update.

Although the problem has been suppressed by https://reviews.llvm.org/D98187
for SampleFDO, which stops profile update for inlining in sampleFDO, the patch
is still wanted since it will be more consistent to handle the magic number
properly in profile update.

Differential Revision: https://reviews.llvm.org/D99394
2021-03-29 09:34:37 -07:00
Florian Hahn c773d0f973
Recommit "[LV] Move runtime pointer size check to LVP::plan()."
Re-apply 25fbe803d4, with a small update to emit the right remark
class.

Original message:
    [LV] Move runtime pointer size check to LVP::plan().

    This removes the need for the remaining doesNotMeet check and instead
    directly checks if there are too many runtime checks for vectorization
    in the planner.

    A subsequent patch will adjust the logic used to decide whether to
    vectorize with runtime to consider their cost more accurately.

    Reviewed By: lebedev.ri
2021-03-29 16:14:27 +01:00
Florian Hahn 485c8ce733
Revert "[LV] Move runtime pointer size check to LVP::plan()."
This reverts commit 25fbe803d4.

This breaks a clang test which filters for the wrong remark type.
2021-03-29 14:41:53 +01:00
Sanjay Patel da381cf7ce [SLP] allow matching integer min/max intrinsics as reduction ops
This is a 2nd try of:
3c8473ba53
which was reverted at:
 a26312f9d4
because of crashing.

This version includes extra code and tests to avoid the known
crashing examples as discussed in PR49730.

Original commit message:
As noted in D98152, we need to patch SLP to avoid regressions when
we start canonicalizing to integer min/max intrinsics.
Most of the real work to make this possible was in:
7202f47508

Differential Revision: https://reviews.llvm.org/D98981
2021-03-29 09:38:18 -04:00
Florian Hahn 25fbe803d4
[LV] Move runtime pointer size check to LVP::plan().
This removes the need for the remaining doesNotMeet check and instead
directly checks if there are too many runtime checks for vectorization
in the planner.

A subsequent patch will adjust the logic used to decide whether to
vectorize with runtime to consider their cost more accurately.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98634
2021-03-29 14:12:29 +01:00
Jingu Kang ab72871703 [SimpleLoopUnswitch] Fix wrong assertions in partial-unswitch.ll 2021-03-29 14:04:29 +01:00
Matt Arsenault 9a0c9402fa Reapply "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 07e46367ba.
2021-03-29 08:55:30 -04:00
Jingu Kang 07142b3040 [SimpleLoopUnswitch] Add tests to check partially invariant unswitch
Differential Revision: https://reviews.llvm.org/D99493
2021-03-29 13:06:32 +01:00
Hans Wennborg c6e5c4654b Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places
Using $ breaks demangling of the symbols. For example,

$ c++filt _Z3foov\$123
_Z3foov$123

This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.

Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:

$ c++filt _Z3foov.123
foo() [clone .123]

This is already done in some places; try to do it everywhere.

Differential revision: https://reviews.llvm.org/D97484
2021-03-29 13:03:52 +02:00
Oliver Stannard 07e46367ba Revert "Reapply "OpaquePtr: Turn inalloca into a type attribute""
Reverting because test 'Bindings/Go/go.test' is failing on most
buildbots.

This reverts commit fc9df30991.
2021-03-29 11:32:22 +01:00
Craig Topper 36b5d09b07 [X86] Add phase ordering test for the problem D99427 is trying to solve. NFC 2021-03-28 12:14:30 -07:00
Matt Arsenault fc9df30991 Reapply "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 20d5c42e0e.
2021-03-28 13:35:21 -04:00
Sanjay Patel 01ae6e5ead [InstCombine] sink min/max intrinsics with common op after select
This is another step towards parity with cmp+select min/max idioms.

See D98152.
2021-03-28 13:13:04 -04:00
Sanjay Patel 4f349739ef [InstCombine] add tests for select of min/max intrinsics; NFC 2021-03-28 13:13:04 -04:00
Nico Weber 20d5c42e0e Revert "OpaquePtr: Turn inalloca into a type attribute"
This reverts commit 4fefed6563.
Broke check-clang everywhere.
2021-03-28 13:02:52 -04:00
Matt Arsenault 4fefed6563 OpaquePtr: Turn inalloca into a type attribute
I think byval/sret and the others are close to being able to rip out
the code to support the missing type case. A lot of this code is
shared with inalloca, so catch this up to the others so that can
happen.
2021-03-28 11:12:23 -04:00
Juneyoung Lee 05884d3b52 Make FoldBranchToCommonDest poison-safe by default
This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99452
2021-03-27 19:05:12 +09:00
Nikita Popov 4622648a06 Revert "[ArgPromotion] Copy additional metadata for loads."
This reverts commit 166620a4f0.

A miscompile has been reported in https://reviews.llvm.org/D93927#2653480
and following.
2021-03-26 21:34:54 +01:00
Florian Hahn 4858e081d7
[ConstraintElimination] Only strip casts preserving the representation.
Things like addrspacecast may not be no-ops, so we should not look
through them.
2021-03-26 20:07:41 +00:00
Florian Hahn af0087c03a
[ConstraintElimination] Add additional pointercast tests.
Add coverage for pointercasts other than bitcast. addrspacecast are not
handled properly at the moment.
2021-03-26 17:52:46 +00:00
Sanjay Patel 3f6e7d1550 [SLP] move test for min/max crashing; NFC
This was originally just an XFAIL test, but I modified it
to check output. To make that bot-friendly, I'm moving it
to the x86 dir since it specified an x86 target.
2021-03-26 10:28:15 -04:00
Sanjay Patel a26312f9d4 Revert "[SLP] allow matching integer min/max intrinsics as reduction ops"
This reverts commit 3c8473ba53 and includes test diffs to
maintain testing status.

There's at least 1 place that was not updated with 7202f47508 ,
so we can crash mismatching select and intrinsics as shown in
PR49730.
2021-03-26 09:59:14 -04:00
Nashe Mncube b723aa2a5a [InstCombine]Generalise regression tests for sve
The tests, test/Transforms/InstCombine/AArch64/sve-*,
have been shown to not be AArch64 specific. These tests
have been renamed and moved to reflect this.

Differential Revision: https://reviews.llvm.org/D99253
2021-03-26 12:04:50 +00:00
David Sherwood c39460cc4f Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 240aa96cf2.
2021-03-26 11:36:53 +00:00
David Sherwood 240aa96cf2 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast with scalar uses, or b) this is an update to an induction variable
that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. For all other cases I have added an assert that none of the
users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

Differential Revision: https://reviews.llvm.org/D98512
2021-03-26 11:27:12 +00:00
Max Kazantsev 6a7bcc9c8d [Test] Add failing test for pr49730 2021-03-26 18:03:39 +07:00
Richard Smith ed8d76ec60 Explicitly enable the new pass manager in this test.
Otherwise it fails under -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF.
2021-03-25 18:10:36 -07:00
Guozhi Wei 3240910f00 [DAE] Adjust param/arg attributes when changing parameter to undef
In DeadArgumentElimination pass, if a function's argument is never used, corresponding caller's parameter can be changed to undef. If the param/arg has attribute noundef or other related attributes, LLVM LangRef(https://llvm.org/docs/LangRef.html#parameter-attributes) says its behavior is undefined. SimplifyCFG(D97244) takes advantage of this behavior and does bad transformation on valid code.

To avoid this undefined behavior when change caller's parameter to undef, this patch removes noundef attribute and other attributes imply noundef on param/arg.

Differential Revision: https://reviews.llvm.org/D98899
2021-03-25 14:53:22 -07:00
Philip Reames 67e28173f1 Autogen test to account for tool output format change 2021-03-25 14:41:08 -07:00
Philip Reames 88d0f47b4f [test] Add test for hoisting to custom allocation function using allocsize
The first is currently demonstrating a miscompile.
2021-03-25 14:31:51 -07:00
Yevgeny Rouban f7ef26ef0b [SLP] Fix crash in reduction for integer min/max
The SCEV commit b46c085d2b [NFCI] SCEVExpander:
    emit intrinsics for integral {u,s}{min,max} SCEV expressions
seems to reveal a new crash in SLPVectorizer.
SLP crashes expecting a SelectInst as an externally used value
but umin() call is found.

The patch relaxes the assumption to make the IR flag propagation safe.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D99328
2021-03-25 21:44:21 +07:00
Kerry McLaughlin 1f46499690 [SVE][LoopVectorize] Verify support for vectorizing loops with invariant loads
D95598 added a cost model for broadcast shuffle, which should enable loops
such as the following to vectorize, where the load of b[42] is invariant
and can be done using a scalar load + splat:

  for (int i=0; i<n; ++i)
    a[i] = b[i] + b[42];

This patch adds tests to verify that we can vectorize such loops.

Reviewed By: joechrisellis

Differential Revision: https://reviews.llvm.org/D98506
2021-03-25 14:10:21 +00:00