Commit Graph

616 Commits

Author SHA1 Message Date
Ivan Krasin c2124e185c Revert r298620: [LV] Vectorize GEPs
Reason: breaks linking Chromium with LLD + ThinLTO (a pass crashes)
LLVM bug: https://bugs.llvm.org//show_bug.cgi?id=32413

Original change description:

[LV] Vectorize GEPs

This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Original Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298735
2017-03-24 20:49:43 +00:00
Gil Rapaport 638d4538cd [LV] Add regression test for r297610
The new test asserts that scalarized memory operations get memcheck metadata
added even if the loop is only unrolled.

Differential Revision: https://reviews.llvm.org/D30972

llvm-svn: 298641
2017-03-23 20:02:23 +00:00
Matthew Simpson 4e7b71bc86 [LV] Vectorize GEPs
This patch adds support for vectorizing GEPs. Previously, we only generated
vector GEPs on-demand when creating gather or scatter operations. All GEPs from
the original loop were scalarized by default, and if a pointer was to be stored
to memory, we would have to build up the pointer vector with insertelement
instructions.

With this patch, we will vectorize all GEPs that haven't already been marked
for scalarization.

The patch refines collectLoopScalars to more exactly identify the scalar GEPs.
The function now more closely resembles collectLoopUniforms. And the patch
moves vector GEP creation out of vectorizeMemoryInstruction and into the main
vectorization loop. The vector GEPs needed for gather and scatter operations
will have already been generated before vectoring the memory accesses.

Differential Revision: https://reviews.llvm.org/D30710

llvm-svn: 298620
2017-03-23 16:29:58 +00:00
Matthew Simpson 1fb4064531 [LV] Delete unneeded scalar GEP creation code
The code for generating scalar base pointers in vectorizeMemoryInstruction is
not needed. We currently scalarize all GEPs and maintain the scalarized values
in VectorLoopValueMap. The GEP cloning in this unneeded code is the same as
that in scalarizeInstruction. The test cases that changed as a result of this
patch changed because we were able to reuse the scalarized GEP that we
previously generated instead of cloning a new one.

Differential Revision: https://reviews.llvm.org/D30587

llvm-svn: 298615
2017-03-23 16:07:21 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Jonas Paulsson a48ea231c0 [TargetTransformInfo] getIntrinsicInstrCost() scalarization estimation improved
getIntrinsicInstrCost() used to only compute scalarization cost based on types.
This patch improves this so that the actual arguments are checked when they are
available, in order to handle only unique non-constant operands.

Tests updates:

Analysis/CostModel/X86/arith-fp.ll
Transforms/LoopVectorize/AArch64/interleaved_cost.ll
Transforms/LoopVectorize/ARM/interleaved_cost.ll

The improvement in getOperandsScalarizationOverhead() to differentiate on
constants made it necessary to update the interleaved_cost.ll tests even
though they do not relate to intrinsics.

Review: Hal Finkel
https://reviews.llvm.org/D29540

llvm-svn: 297705
2017-03-14 06:35:36 +00:00
Changpeng Fang 1be9b9f816 AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized.
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D30719

llvm-svn: 297328
2017-03-09 00:07:00 +00:00
Matthew Simpson 3388de1349 [LV] Select legal insert point when fixing first-order recurrences
Because IRBuilder performs constant-folding, it's not guaranteed that an
instruction in the original loop map to an instruction in the vector loop. It
could map to a constant vector instead. The handling of first-order recurrences
was incorrectly making this assumption when setting the IRBuilder's insert
point.

llvm-svn: 297302
2017-03-08 18:18:20 +00:00
Matthew Simpson 8966848d17 [LV] Make the test case for PR30183 less fragile
This patch also renames the PR number the test points to. The previous
reference was PR29559, but that bug was somehow deleted and recreated under
PR30183.

llvm-svn: 297295
2017-03-08 17:03:38 +00:00
Matthew Simpson 903dd5aa9b [LV] Add missing check labels to tests and reformat
llvm-svn: 297294
2017-03-08 16:55:34 +00:00
Matthew Simpson c86b2134c7 [LV] Consider users that are memory accesses in uniforms expansion step
When expanding the set of uniform instructions beyond the seed instructions
(e.g., consecutive pointers), we mark a new instruction uniform if all its
loop-varying users are uniform. We should also allow users that are consecutive
or interleaved memory accesses. This fixes cases where we have an instruction
that is used as the pointer operand of a consecutive access but also used by a
non-memory instruction that later becomes uniform as part of the expansion.

llvm-svn: 297179
2017-03-07 18:47:30 +00:00
Matthew Simpson aee9771ae2 [ARM/AArch64] Update costs for interleaved accesses with wide types
After r296750, we're able to match interleaved accesses having types wider than
128 bits. This patch updates the associated TTI costs.

Differential Revision: https://reviews.llvm.org/D29675

llvm-svn: 296751
2017-03-02 15:15:35 +00:00
Matthew Simpson 455c2ee394 [LV] Considier non-consecutive but vectorizable accesses for VF selection
When computing the smallest and largest types for selecting the maximum
vectorization factor, we currently ignore loads and stores of pointer types if
the memory access is non-consecutive. We do this because such accesses must be
scalarized regardless of vectorization factor, and thus shouldn't be considered
when determining the factor. This patch makes this check less aggressive by
also considering non-consecutive accesses that may be vectorized, such as
interleaved accesses. Because we don't know at the time of the check if an
accesses will certainly be vectorized (this is a cost model decision given a
particular VF), we consider all accesses that can potentially be vectorized.

Differential Revision: https://reviews.llvm.org/D30305

llvm-svn: 296747
2017-03-02 13:55:05 +00:00
Adam Nemet 15032a0455 [LV] These remark should have been missed remarks
The practice in LV is that we emit analysis remarks and then finally report
either a missed or applied remark on the final decision whether vectorization
is taking place.  On this code path, we were closing with an analysis remark.

llvm-svn: 296578
2017-03-01 04:31:15 +00:00
Craig Topper fe25988c68 [AVX-512] Fix the execution domain for AVX-512 integer broadcasts.
llvm-svn: 296290
2017-02-26 06:45:51 +00:00
Matthew Simpson bdc9c78880 [LV] Merge floating-point and integer induction widening code
This patch merges the existing floating-point induction variable widening code
into the integer induction variable widening code, creating a single set of
functions for both kinds of inductions. The primary motivation for doing this
is to enable vector phi node creation for floating-point induction variables.

Differential Revision: https://reviews.llvm.org/D30211

llvm-svn: 296145
2017-02-24 18:20:12 +00:00
Matthew Simpson 835b246d7f [LV] Update floating-point induction test checks (NFC)
llvm-svn: 295885
2017-02-22 21:56:02 +00:00
Matthew Simpson 5ef66ef248 [LV] Add scalar floating-point induction test (NFC)
llvm-svn: 295862
2017-02-22 19:09:38 +00:00
Karl-Johan Karlsson 6eaed7aceb [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.

This is fixing pr31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

This reverts r295042 (re-applies r295038) with an additional fix for the
buildbot problem.

llvm-svn: 295858
2017-02-22 18:37:36 +00:00
Dehao Chen 7d230325ef Increases full-unroll threshold.
Summary:
The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold

This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

Performance:

403	0.11%
433	0.51%
445	0.48%
447	3.50%
453	1.49%
464	0.75%

Code size:

403	0.56%
433	0.96%
445	2.16%
447	2.96%
453	0.94%
464	8.02%

The compiler time overhead is similar with code size.

Reviewers: davidxl, mkuper, mzolotukhin, hfinkel, chandlerc

Reviewed By: hfinkel, chandlerc

Subscribers: mehdi_amini, zzheng, efriedma, haicheng, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D28368

llvm-svn: 295538
2017-02-18 03:46:51 +00:00
Matthew Simpson f68e183f91 [LV] Remove constant restriction for vector phi creation
We previously only created a vector phi node for an induction variable if its
step had a constant integer type. However, the step actually only needs to be
loop-invariant. We only handle inductions having loop-invariant steps, so this
patch should enable vector phi node creation for all integer induction
variables that will be vectorized.

Differential Revision: https://reviews.llvm.org/D29956

llvm-svn: 295456
2017-02-17 16:09:07 +00:00
Matthew Simpson f09d13e5cc Reapply "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.

Differential Revision: https://reviews.llvm.org/D29847

llvm-svn: 295063
2017-02-14 16:28:32 +00:00
Karl-Johan Karlsson ec21b769ec Revert "[LoopVectorize] Added address space check when analysing interleaved accesses"
This reverts r295038. The buildbot clang-with-thin-lto-ubuntu failed.
I'm reverting to investigate.

llvm-svn: 295042
2017-02-14 10:06:16 +00:00
Karl-Johan Karlsson 2ec409cca2 [LoopVectorize] Added address space check when analysing interleaved accesses
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.

This is fixing pr31900.

Reviewers: HaoLiu, mssimpso, mkuper

Reviewed By: mssimpso, mkuper

Subscribers: llvm-commits, efriedma, mzolotukhin

Differential Revision: https://reviews.llvm.org/D29717

llvm-svn: 295038
2017-02-14 08:14:06 +00:00
Matthew Simpson 659f92e2aa Revert "[LV] Extend trunc optimization to all IVs with constant integer steps"
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.

llvm-svn: 294973
2017-02-13 18:02:35 +00:00
Matthew Simpson 7b7f40297f [LV] Extend trunc optimization to all IVs with constant integer steps
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (even regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.

Differential Revision: https://reviews.llvm.org/D29847

llvm-svn: 294967
2017-02-13 16:48:00 +00:00
Dorit Nuzman eac89d736c [LV/LoopAccess] Check statically if an unknown dependence distance can be
proven larger than the loop-count

This fixes PR31098: Try to resolve statically data-dependences whose
compile-time-unknown distance can be proven larger than the loop-count, 
instead of resorting to runtime dependence checking (which are not always 
possible).

For vectorization it is sufficient to prove that the dependence distance 
is >= VF; But in some cases we can prune unknown dependence distances early,
and even before selecting the VF, and without a runtime test, by comparing 
the distance against the loop iteration count. Since the vectorized code 
will be executed only if LoopCount >= VF, proving distance >= LoopCount 
also guarantees that distance >= VF. This check is also equivalent to the 
Strong SIV Test.

Reviewers: mkuper, anemet, sanjoy

Differential Revision: https://reviews.llvm.org/D28044

llvm-svn: 294892
2017-02-12 09:32:53 +00:00
Ahmed Bougacha 8425f453ef [ARM] Make f16 interleaved accesses expensive.
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.

Teach the cost model to consider f16 interleaved operations as
expensive.  Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.

llvm-svn: 294819
2017-02-11 01:53:04 +00:00
Dehao Chen fb02f7140a Encode duplication factor from loop vectorization and loop unrolling to discriminator.
Summary:
This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html

When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations.

The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.

Reviewers: probinson, aprantl, davidxl, hfinkel, echristo

Reviewed By: hfinkel

Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26420

llvm-svn: 294782
2017-02-10 21:09:07 +00:00
Matthew Simpson df124a7569 [LV] Remove type restriction for vector phi creation
We previously only created a vector phi node for an induction variable if its
type matched the type of the canonical induction variable.

Differential Revision: https://reviews.llvm.org/D29776

llvm-svn: 294755
2017-02-10 16:15:26 +00:00
Elena Demikhovsky 5267edd3e3 [Loop Vectorizer] Cost-based decision for vectorization form of memory instruction.
Making the cost model selecting between Interleave, GatherScatter or Scalar vectorization form of memory instruction.
The right decision should be done for non-consecutive memory access instrcuctions that may have more than one vectorization solution.

This patch includes the following changes:
- Cost Model calculates the cost of Load/Store vector form and choose the better option between Widening, Interleave, GatherScactter and Scalarization. Cost Model keeps the widening decision.
- Arrays of Uniform and Scalar values are moved from Legality to Cost Model.
- Cost Model collects Uniforms and Scalars per VF. The collection is based on CM decision map of Loadis/Stores vectorization form.
- Vectorization of memory instruction is performed according to the CM decision.

Differential Revision: https://reviews.llvm.org/D27919

llvm-svn: 294503
2017-02-08 19:25:23 +00:00
Craig Topper e0ac7f3beb [X86] Remove PCOMMIT instruction support since Intel has deprecated this instruction with no plans to release products with it.
Intel's documentation for the deprecation https://software.intel.com/en-us/blogs/2016/09/12/deprecate-pcommit-instruction

llvm-svn: 294405
2017-02-08 05:45:39 +00:00
Matthew Simpson 3877f397cd [LV] Add new ARM/AArch64 interleaved access cost model tests (NFC)
llvm-svn: 294342
2017-02-07 19:34:24 +00:00
Matthew Simpson 1cd02f13a5 [LV] Simplify ARM/AArch64 interleaved access cost model tests (NFC)
This patch removes unneeded instructions from the existing ARM/AArch64
interleaved access cost model tests. I'll be adding a similar set of tests in a
follow-on patch to increase coverage.

llvm-svn: 294336
2017-02-07 19:17:44 +00:00
Adam Nemet 0bf1b863b9 [LV] Also port failure remarks to new OptimizationRemarkEmitter API
llvm-svn: 293866
2017-02-02 05:41:51 +00:00
Chandler Carruth 6f4ed077d0 [LV] Fix an issue where forming LCSSA in the place that we did would
change the set of uniform instructions in the loop causing an assert
failure.

The problem is that the legalization checking also builds data
structures mapping various facts about the loop body. The immediate
cause was the set of uniform instructions. If these then change when
LCSSA is formed, the data structures would already have been built and
become stale. The included test case triggered an assert in loop
vectorize that was reduced out of the new PM's pipeline.

The solution is to form LCSSA early enough that no information is cached
across the changes made. The only really obvious position is outside of
the main logic to vectorize the loop. This also has the advantage of
removing one case where forming LCSSA could mutate the loop but we
wouldn't track that as a "Changed" state.

If it is significantly advantageous to do some legalization checking
prior to this, we can do a more careful positioning but it seemed best
to just back off to a safe position first.

llvm-svn: 293168
2017-01-26 10:41:09 +00:00
Mohammed Agabaria 20caee95e1 [X86] enable memory interleaving for X86\SLM arch.
Differential Revision: https://reviews.llvm.org/D28547

llvm-svn: 293040
2017-01-25 09:14:48 +00:00
Michael Kuperstein 230867e583 [LV] Run loop-simplify and LCSSA explicitly instead of "requiring" them
This changes the vectorizer to explicitly use the loopsimplify and lcssa utils,
instead of "requiring" the transformations as if they were analyses.

This is not NFC, since it changes the LCSSA behavior - we no longer run LCSSA
for all loops, but rather only for the loops we expect to modify.

Differential Revision: https://reviews.llvm.org/D28868

llvm-svn: 292456
2017-01-19 00:42:28 +00:00
Michael Kuperstein 7cefb409b0 [LV] Allow reductions that have several uses outside the loop
We currently check whether a reduction has a single outside user. We don't
really need to require that - we just need to make sure a single value is
used externally. The number of external users of that value shouldn't actually
matter.

Differential Revision: https://reviews.llvm.org/D28830

llvm-svn: 292424
2017-01-18 19:02:52 +00:00
Matthew Simpson e2c9ad9483 [LV] Add requires asserts to test case
llvm-svn: 292280
2017-01-17 22:21:33 +00:00
Matthew Simpson 3fbdaa5906 [LV] Mark non-consecutive-like pointers non-uniform
If a memory instruction will be vectorized, but it's pointer operand is
non-consecutive-like, the instruction is a gather or scatter operation. Its
pointer operand will be non-uniform. This should fix PR31671.

Reference: https://llvm.org/bugs/show_bug.cgi?id=31671
Differential Revision: https://reviews.llvm.org/D28819

llvm-svn: 292254
2017-01-17 20:51:39 +00:00
Mohammed Agabaria 81d0f17055 [X86] fixing failed test in commit: r291657
Missing Requires asserts.

llvm-svn: 291659
2017-01-11 09:03:11 +00:00
Mohammed Agabaria 2c96c43388 [X86] updating TTI costs for arithmetic instructions on X86\SLM arch.
updated instructions:
pmulld, pmullw, pmulhw, mulsd, mulps, mulpd, divss, divps, divsd, divpd, addpd and subpd.

special optimization case which replaces pmulld with pmullw\pmulhw\pshuf seq. 
In case if the real operands bitwidth <= 16.

Differential Revision: https://reviews.llvm.org/D28104 

llvm-svn: 291657
2017-01-11 08:23:37 +00:00
Michael Kuperstein ee31cbe35f [LV] Don't panic when encountering the IV of an outer loop.
Bail out instead of asserting when we encounter this situation,
which can actually happen.

The reason the test uses the new PM is that the "bad" phi, incidentally, gets
cleaned up by LoopSimplify. But LICM can create this kind of phi and preserve
loop simplify form, so the cleanup has no chance to run.

This fixes PR31190.
We may want to solve this in a less conservative manner, since this phi is
actually uniform within the inner loop (or we may want LICM to output a cleaner
promotion to begin with).

Differential Revision: https://reviews.llvm.org/D28490

llvm-svn: 291589
2017-01-10 19:32:30 +00:00
Matthew Simpson cf796478e9 [LV] Fix-up external IV users after updating dominator tree
This patch delays the fix-up step for external induction variable users until
after the dominator tree has been properly updated. This should fix PR30742.
The SCEVExpander in InductionDescriptor::transform can generate code in the
wrong location if the dominator tree is not up-to-date. We should work towards
keeping the dominator tree up-to-date throughout the transformation.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30742
Differential Revision: https://reviews.llvm.org/D28168

llvm-svn: 291462
2017-01-09 19:05:29 +00:00
Mohammed Agabaria 23599ba794 Currently isLikelyComplexAddressComputation tries to figure out if the given stride seems to be 'complex' and need some extra cost for address computation handling.
This code seems to be target dependent which may not be the same for all targets.
Passed the decision whether the given stride is complex or not to the target by sending stride information via SCEV to getAddressComputationCost instead of 'IsComplex'.

Specifically at X86 targets we dont see any significant address computation cost in case of the strided access in general.

Differential Revision: https://reviews.llvm.org/D27518

llvm-svn: 291106
2017-01-05 14:03:41 +00:00
Adrian Prantl 1eadba1c8c Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.

Differential Revision: https://reviews.llvm.org/D27765

llvm-svn: 290292
2016-12-22 00:45:21 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Florian Hahn 2e03213f90 [LoopVersioning] Require loop-simplify form for loop versioning.
Summary:
Requiring loop-simplify form for loop versioning ensures that the
runtime check block always dominates the exit block.
    
This patch closes #30958 (https://llvm.org/bugs/show_bug.cgi?id=30958).

Reviewers: silviu.baranga, hfinkel, anemet, ashutosh.nema

Subscribers: ashutosh.nema, mzolotukhin, efriedma, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D27469

llvm-svn: 290116
2016-12-19 17:13:37 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Matthew Simpson a4964f291a Reapply "[LV] Enable vectorization of loops with conditional stores by default"
This patch reapplies r289863. The original patch was reverted because it
exposed a bug causing the loop vectorizer to crash in the Python runtime on
PPC. The underlying issue was fixed with r289958.

llvm-svn: 289975
2016-12-16 19:12:02 +00:00
Matthew Simpson 099af810de [LV] Don't attempt to type-shrink scalarized instructions
After r288909, instructions feeding predicated instructions may be scalarized
if profitable. Since these instructions will remain scalar, we shouldn't
attempt to type-shrink them. We should only truncate vector types to their
minimal bit widths. This bug was exposed by enabling the vectorization of loops
containing conditional stores by default.

llvm-svn: 289958
2016-12-16 16:52:35 +00:00
Chandler Carruth 48b4e614d8 Revert r289863: [LV] Enable vectorization of loops with conditional
stores by default

This uncovers a crasher in the loop vectorizer on PPC when building the
Python runtime. I'll send the testcase to the review thread for the
original commit.

llvm-svn: 289934
2016-12-16 11:31:39 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Matthew Simpson 6a98bcfe33 [LV] Enable vectorization of loops with conditional stores by default
This patch sets the default value of the "-enable-cond-stores-vec" command line
option to "true".

Differential Revision: https://reviews.llvm.org/D27814

llvm-svn: 289863
2016-12-15 20:11:05 +00:00
Michael Kuperstein 3d23d4a234 [LV] Don't vectorize when we have a small static bound on trip count
We currently check if the exact trip count is known and is smaller than the
"tiny loop" bound. We should be checking the maximum bound on the trip count
instead.

Differential Revision: https://reviews.llvm.org/D27690

llvm-svn: 289583
2016-12-13 20:38:18 +00:00
Sanjoy Das 3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time an struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Matthew Simpson 364da7e527 [LV] Scalarize operands of predicated instructions
This patch attempts to scalarize the operand expressions of predicated
instructions if they were conditionally executed in the original loop. After
scalarization, the expressions will be sunk inside the blocks created for the
predicated instructions. The transformation essentially performs
un-if-conversion on the operands.

The cost model has been updated to determine if scalarization is profitable. It
compares the cost of a vectorized instruction, assuming it will be
if-converted, to the cost of the scalarized instruction, assuming that the
instructions corresponding to each vector lane will be sunk inside a predicated
block, possibly avoiding execution. If it's more profitable to scalarize the
entire expression tree feeding the predicated instruction, the expression will
be scalarized; otherwise, it will be vectorized. We only consider the cost of
the entire expression to accurately estimate the cost of the required
insertelement and extractelement instructions.

Differential Revision: https://reviews.llvm.org/D26083

llvm-svn: 288909
2016-12-07 15:03:32 +00:00
Guozhi Wei 835de1f3ab [ppc] Correctly compute the cost of loading 32/64 bit memory into VSR
VSX has instructions lxsiwax/lxsdx that can load 32/64 bit value into VSX register cheaply. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.

Differential Revision: https://reviews.llvm.org/D26713

llvm-svn: 288560
2016-12-03 00:41:43 +00:00
Robert Lougher b0905209dd [LoopVectorizer] When estimating reg usage, unused insts may "end" another use
The register usage algorithm incorrectly treats instructions whose value is
not used within the loop (e.g. those that do not produce a value).

The algorithm first calculates the usages within the loop.  It iterates over
the instructions in order, and records at which instruction index each use
ends (in fact, they're actually recorded against the next index, as this is
when we want to delete them from the open intervals).

The algorithm then iterates over the instructions again, adding each
instruction in turn to a list of open intervals.  Instructions are then
removed from the list of open intervals when they occur in the list of uses
ended at the current index.

The problem is, instructions which are not used in the loop are skipped.
However, although they aren't used, the last use of a value may have been
recorded against that instruction index.  In this case, the use is not deleted
from the open intervals, which may then bump up the estimated register usage.

This patch fixes the issue by simply moving the "is used" check after the loop
which erases the uses at the current index.

Differential Revision: https://reviews.llvm.org/D26554

llvm-svn: 286969
2016-11-15 14:27:33 +00:00
Simon Pilgrim 27fed8e5d6 [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

llvm-svn: 286832
2016-11-14 14:45:16 +00:00
Adam Nemet 9bfbf8bbdf [LV] Stop saying "use -Rpass-analysis=loop-vectorize"
This is PR28376.

Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).

So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option.  I don't think that getting this right is worth the
complexity.

Differential Revision: https://reviews.llvm.org/D26563

llvm-svn: 286662
2016-11-11 22:51:46 +00:00
Dehao Chen 06e079a530 Update vectorization debug info unittest.
Summary:
The change will test the change in r286159.
The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location.

Reviewers: probinson

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26428

llvm-svn: 286403
2016-11-09 22:25:19 +00:00
Dorit Nuzman bf2c15b5dc Second attempt at r285517.
llvm-svn: 285568
2016-10-31 13:17:31 +00:00
Dorit Nuzman 06903d16af Revert r285517 due to build failures.
llvm-svn: 285518
2016-10-30 14:34:57 +00:00
Dorit Nuzman 3c1c658f24 [LoopVectorize] Make interleaved-accesses analysis less conservative about
possible pointer-wrap-around concerns, in some cases.

Before this patch, collectConstStridedAccesses (part of interleaved-accesses
analysis) called getPtrStride with [Assume=false, ShouldCheckWrap=true] when
examining all candidate pointers. This is too conservative. Instead, this
patch makes collectConstStridedAccesses use an optimistic approach, calling
getPtrStride with [Assume=true, ShouldCheckWrap=false], and then, once the
candidate interleave groups have been formed, revisits the pointer-wrapping
analysis but only where it matters: namely, in groups that have gaps, and where
the gaps are not at the very end of the group (in which case the loop is
peeled). This second time getPtrStride is called with [Assume=false,
ShouldCheckWrap=true], but this could further be improved to using Assume=true,
once we also add the logic to track that we are not going to meet the scev
runtime checks threshold.

Differential Revision: https://reviews.llvm.org/D25276

llvm-svn: 285517
2016-10-30 12:23:26 +00:00
Matthew Simpson 9b6755362b [LV] Correct misleading comments in test (NFC)
llvm-svn: 285402
2016-10-28 14:27:45 +00:00
Matthew Simpson c62266d680 [LV] Sink scalar operands of predicated instructions
When we predicate an instruction (div, rem, store) we place the instruction in
its own basic block within the vectorized loop. If a predicated instruction has
scalar operands, it's possible to recursively sink these scalar expressions
into the predicated block so that they might avoid execution. This patch sinks
as much scalar computation as possible into predicated blocks. We previously
were able to sink such operands only if they were extractelement instructions.

Differential Revision: https://reviews.llvm.org/D25632

llvm-svn: 285097
2016-10-25 18:59:45 +00:00
Michael Kuperstein b2443ed62b [X86] Enable interleaved memory access by default
This lets the loop vectorizer generate interleaved memory accesses on x86.

Differential Revision: https://reviews.llvm.org/D25350

llvm-svn: 284779
2016-10-20 21:04:31 +00:00
Matthew Simpson 41fa838f07 [LV] Avoid emitting trivially dead instructions
Some instructions from the original loop, when vectorized, can become trivially
dead. This happens because of the way we structure the new loop. For example,
we create new induction variables and induction variable "steps" in the new
loop. Thus, when we go to vectorize the original induction variable update, it
may no longer be needed due to the instructions we've already created. This
patch prevents us from creating these redundant instructions. This reduces code
size before simplification and allows greater flexibility in code generation
since we have fewer unnecessary instruction uses.

Differential Revision: https://reviews.llvm.org/D25631

llvm-svn: 284631
2016-10-19 19:22:02 +00:00
Matthew Simpson 1d4b163fc0 [LV] Account for predicated stores in instruction costs
This patch ensures that we scale the estimated cost of predicated stores by
block probability. This is a follow-on patch for r284123.

llvm-svn: 284126
2016-10-13 14:54:31 +00:00
Matthew Simpson 6cdb5a6f96 [LV] Avoid rounding errors for predicated instruction costs
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.

Differential Revision: https://reviews.llvm.org/D25333

llvm-svn: 284123
2016-10-13 14:19:48 +00:00
Matthew Simpson a371c14ffe [LV] Don't mark multi-use branch conditions uniform
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
2016-10-07 15:20:13 +00:00
Michael Kuperstein 5185b7dde3 [LV] Remove triples from target-independent vectorizer tests. NFC.
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.

llvm-svn: 283512
2016-10-06 23:57:25 +00:00
Matthew Simpson 7808833e28 [LV] Build all scalar steps for non-uniform induction variables
When building the steps for scalar induction variables, we previously attempted
to determine if all the scalar users of the induction variable were uniform. If
they were, we would only emit the step corresponding to vector lane zero. This
optimization was too aggressive. We generally don't know the entire set of
induction variable users that will be scalar. We have
isScalarAfterVectorization, but this is only a conservative estimate of the
instructions that will be scalarized. Thus, an induction variable may have
scalar users that aren't already known to be scalar. To avoid emitting unused
steps, we can only check that the induction variable is uniform. This should
fix PR30542.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30542
llvm-svn: 282863
2016-09-30 15:13:52 +00:00
Matthew Simpson b764aba2ab [LV] Scalarize instructions marked scalar after vectorization
This patch ensures that we actually scalarize instructions marked scalar after
vectorization. Previously, such instructions may have been vectorized instead.

Differential Revision: https://reviews.llvm.org/D23889

llvm-svn: 282418
2016-09-26 17:08:37 +00:00
Matthew Simpson 15869f86d8 [LV] Don't emit unused scalars for uniform instructions
If we identify an instruction as uniform after vectorization, we know that we
should only use the value corresponding to the first vector lane of each unroll
iteration. However, when scalarizing such instructions, we still produce values
for the other vector lanes. This patch prevents us from generating the unused
scalars.

Differential Revision: https://reviews.llvm.org/D24275

llvm-svn: 282087
2016-09-21 16:50:24 +00:00
Adam Nemet e3cef93727 [LV] When reporting about a specific instruction without debug location use loop's
This can occur for example if some optimization drops the debug location.

llvm-svn: 282048
2016-09-21 03:14:20 +00:00
Elena Demikhovsky 5f8cc0c346 [Loop Vectorizer] Consecutive memory access - fixed and simplified
Amended consecutive memory access detection in Loop Vectorizer.
Load/Store were not handled properly without preceding GEP instruction.

Differential Revision: https://reviews.llvm.org/D20789

llvm-svn: 281853
2016-09-18 13:56:08 +00:00
Matthew Simpson b25e87fca5 [LV] Process pointer IVs with PHINodes in collectLoopUniforms
This patch moves the processing of pointer induction variables in
collectLoopUniforms from the consecutive pointer phase of the analysis to the
phi node phase. Previously, if a pointer induction variable was used by both a
scalarized non-memory instruction as well as a vectorized memory instruction,
we would incorrectly identify the pointer as uniform. Pointer induction
variables should be treated the same as other phi nodes. That is, they are
uniform if all users of the induction variable and induction variable update
are uniform.

Differential Revision: https://reviews.llvm.org/D24511

llvm-svn: 281485
2016-09-14 14:47:40 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
Matthew Simpson bfe5e1817b [LV] Ensure proper handling of multi-use case when collecting uniforms
The test case included in r280979 wasn't checking what it was supposed to be
checking for the predicated store case. Fixing the test revealed that the
multi-use case (when a pointer is used by both vectorized and scalarized memory
accesses) wasn't being handled properly. We can't skip over
non-consecutive-like pointers since they may have looked consecutive-like with
a different memory access.

llvm-svn: 280992
2016-09-08 21:38:26 +00:00
Matthew Simpson 408a3abcfe [LV] Don't mark pointers used by scalarized memory accesses uniform
Previously, all consecutive pointers were marked uniform after vectorization.
However, if a consecutive pointer is used by a memory access that is eventually
scalarized, the pointer won't remain uniform after all. An example is
predicated stores. Even though a predicated store may be consecutive, it will
still be scalarized, making it's pointer operand non-uniform.

This patch updates the logic in collectLoopUniforms to consider the cases where
a memory access may be scalarized. If a memory access may be scalarized, its
pointer operand is not marked uniform. The determination of whether a given
memory instruction will be scalarized or not has been moved into a common
function that is used by the vectorizer, cost model, and legality analysis.

Differential Revision: https://reviews.llvm.org/D24271

llvm-svn: 280979
2016-09-08 19:11:07 +00:00
Matthew Simpson b65c230eab [LV] Ensure reverse interleaved group GEPs remain uniform
For uniform instructions, we're only required to generate a scalar value for
the first vector lane of each unroll iteration. Thus, if we have a reverse
interleaved group, computing the member index off the scalar GEP corresponding
to the last vector lane of its pointer operand technically makes the GEP
non-uniform. We should compute the member index off the first scalar GEP
instead.

I've added the updated member index computation to the existing reverse
interleaved group test.

llvm-svn: 280497
2016-09-02 16:19:22 +00:00
Michael Kuperstein 2954d1db77 [LoopVectorizer] Predicate instructions in blocks with several incoming edges
We don't need to limit predication to blocks that have a single incoming
edge, we just need to use the right mask.
This fixes PR30172.

Differential Revision: https://reviews.llvm.org/D24009

llvm-svn: 280148
2016-08-30 20:22:21 +00:00
Matthew Simpson df19502b16 [LV] Move insertelement sequence after scalar definitions
After r279649 when getting a vector value from VectorLoopValueMap, we create an
insertelement sequence on-demand if the value has been scalarized instead of
vectorized. We previously inserted this insertelement sequence before the
value's first vector user. However, this insert location is problematic if that
user is the phi node of a first-order recurrence. With this patch, we move the
insertelement sequence after the last scalar instruction we created when
scalarizing the value. Thus, the value's vector definition in the new loop will
immediately follow its scalar definitions. This should fix PR30183.

Reference: https://llvm.org/bugs/show_bug.cgi?id=30183
llvm-svn: 280001
2016-08-29 20:14:04 +00:00
Elena Demikhovsky 3622fbfc68 [Loop Vectorizer] Fixed memory confilict checks.
Fixed a bug in run-time checks for possible memory conflicts inside loop.
The bug is in Low <-> High boundaries calculation. The High boundary should be calculated as "last memory access pointer + element size".

Differential revision: https://reviews.llvm.org/D23176

llvm-svn: 279930
2016-08-28 08:53:53 +00:00
Matthew Simpson abd2be1e2e [LV] Unify vector and scalar maps
This patch unifies the data structures we use for mapping instructions from the
original loop to their corresponding instructions in the new loop. Previously,
we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap.
WidenMap maintained the vector values each instruction from the old loop was
represented with, and ScalarIVMap maintained the scalar values each scalarized
induction variable was represented with. With this patch, all values created
for the new loop are maintained in VectorLoopValueMap.

The change allows for several simplifications. Previously, when an instruction
was scalarized, we had to insert the scalar values into vectors in order to
maintain the mapping in WidenMap. Then, if a user of the scalarized value was
also scalar, we had to extract the scalar values from the temporary vector we
created. We now aovid these unnecessary scalar-to-vector-to-scalar conversions.
If a scalarized value is used by a scalar instruction, the scalar value is used
directly. However, if the scalarized value is needed by a vector instruction,
we generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization
code), is to first get the vector values an instruction from the original loop
maps to, and then extract a particular scalar value. This patch adds
getScalarValue for this purpose along side getVectorValue as an interface into
VectorLoopValueMap. These functions work together to return the requested
values if they're available or to produce them if they're not.

The mapping has also be made less permissive. Entries can be added to
VectorLoopValue map with the new initVector and initScalar functions.
getVectorValue has been modified to return a constant reference to the mapped
entries.

There's no real functional change with this patch; however, in some cases we
will generate slightly different code. For example, instead of an insertelement
sequence following the definition of an instruction, it will now precede the
first use of that instruction. This can be seen in the test case changes.

Differential Revision: https://reviews.llvm.org/D23169

llvm-svn: 279649
2016-08-24 18:23:17 +00:00
Gil Rapaport 550148b2f6 [Loop Vectorizer] Support predication of div/rem
div/rem instructions in basic blocks that require predication currently prevent
vectorization. This patch extends the existing mechanism for predicating stores
to handle other instructions and leverages it to predicate divs and rems.

Differential Revision: https://reviews.llvm.org/D22918

llvm-svn: 279620
2016-08-24 11:37:57 +00:00
Tim Shen c9c0d2dcb5 [LoopVectorize] Detect loops in the innermost loop before creating InnerLoopVectorizer
InnerLoopVectorizer shouldn't handle a loop with cycles inside the loop
body, even if that cycle isn't a natural loop.

Fixes PR28541.

Differential Revision: https://reviews.llvm.org/D22952

llvm-svn: 278573
2016-08-12 22:47:13 +00:00
David Majnemer a19d0f2f3e [ValueTracking] Teach computeKnownBits about [su]min/max
Reasoning about a select in terms of a min or max allows us to derive a
tigher bound on the result.

llvm-svn: 277914
2016-08-06 08:16:00 +00:00
Michael Kuperstein 3ceac2bbd5 [LV, X86] Be more optimistic about vectorizing shifts.
Shifts with a uniform but non-constant count were considered very expensive to
vectorize, because the splat of the uniform count and the shift would tend to
appear in different blocks. That made the splat invisible to ISel, and we'd
scalarize the shift at codegen time.

Since r201655, CodeGenPrepare sinks those splats to be next to their use, and we
are able to select the appropriate vector shifts. This updates the cost model to
to take this into account by making shifts by a uniform cheap again.

Differential Revision: https://reviews.llvm.org/D23049

llvm-svn: 277782
2016-08-04 22:48:03 +00:00
Wei Mi dc7001afb2 [LoopVectorize] Change comment for isOutOfScope in collectLoopUniforms, NFC
Update comment for isOutOfScope and add a testcase for uniform value being used
out of scope.

Differential Revision: https://reviews.llvm.org/D23073

llvm-svn: 277515
2016-08-02 20:27:49 +00:00
Matthew Simpson 18d8898317 [LV] Generate both scalar and vector integer induction variables
This patch enables the vectorizer to generate both scalar and vector versions
of an integer induction variable for a given loop. Previously, we only
generated a scalar induction variable if we knew all its users were going to be
scalar. Otherwise, we generated a vector induction variable. In the case of a
loop with both scalar and vector users of the induction variable, we would
generate the vector induction variable and extract scalar values from it for
the scalar users. With this patch, we now generate both versions of the
induction variable when there are both scalar and vector users and select which
version to use based on whether the user is scalar or vector.

Differential Revision: https://reviews.llvm.org/D22869

llvm-svn: 277474
2016-08-02 15:25:16 +00:00
Matthew Simpson 58f562887b [LV] Untangle the concepts of uniform and scalar
This patch refactors the logic in collectLoopUniforms and
collectValuesToIgnore, untangling the concepts of "uniform" and "scalar". It
adds isScalarAfterVectorization along side isUniformAfterVectorization to
distinguish the two. Known scalar values include those that are uniform,
getelementptr instructions that won't be vectorized, and induction variables
and induction variable update instructions whose users are all known to be
scalar.

This patch includes the following functional changes:

- In collectLoopUniforms, we mark uniform the pointer operands of interleaved
  accesses. Although non-consecutive, these pointers are treated like
  consecutive pointers during vectorization.

- In collectValuesToIgnore, we insert a value into VecValuesToIgnore if it
  isScalarAfterVectorization rather than isUniformAfterVectorization. This
  differs from the previous functionaly in that we now add getelementptr
  instructions that will not be vectorized into VecValuesToIgnore.

This patch also removes the ValuesNotWidened set used for induction variable
scalarization since, after the above changes, it is now equivalent to
isScalarAfterVectorization.

Differential Revision: https://reviews.llvm.org/D22867

llvm-svn: 277460
2016-08-02 14:29:41 +00:00
Igor Breger f44b79d08e [AVX512] Don't use i128 masked gather/scatter/load/store. Do more accurately dataWidth check.
Differential Revision: http://reviews.llvm.org/D23055

llvm-svn: 277435
2016-08-02 09:15:28 +00:00
Craig Topper d2b2d745ff [AVX-512] Fix a test missed in r277327.
llvm-svn: 277330
2016-08-01 08:15:30 +00:00
Matt Masten a6669a1e05 Initial support for vectorization using svml (short vector math library).
Differential Revision: https://reviews.llvm.org/D19544

llvm-svn: 277166
2016-07-29 16:42:44 +00:00