Commit Graph

27221 Commits

Author SHA1 Message Date
Reshabh Sharma 9f51f1b927 [ASAN][AMDGPU] Add support for accesses to global and constant addrspaces
Add address sanitizer instrumentation support for accesses to global
and constant address spaces in AMDGPU. It strictly avoids instrumenting
the stack and assumes x86 as the host.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D99071
2021-05-03 09:01:15 +05:30
Florian Hahn 942e068d7a [VPlan] Add VPBasicBlock::phis() helper (NFC).
This patch introduces a helper to obtain an iterator range for the
PHI-like recipes in a block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100101
2021-05-02 19:20:13 +01:00
Juneyoung Lee 39eb2665d9 [InstCombine] Add a few more patterns for folding select of select
This is a patch that folds select of select to salvage some optimizations after select -> and/or folding is disabled.

```
select (select a, true, b), c, false -> select a, c, false
select c, (select a, true, b), false -> select c, a, false
  if c implies that b is false (isImpliedCondition).
```
https://alive2.llvm.org/ce/z/ANatjt, https://alive2.llvm.org/ce/z/rv8zTB

```
sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b
sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a
```
https://alive2.llvm.org/ce/z/U2kp-t, https://alive2.llvm.org/ce/z/bc88EE

See D101191

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101375
2021-05-02 19:00:42 +09:00
Juneyoung Lee 1977c53b2a [InstCombine] Fold overflow bit of [u|s]mul.with.overflow in a poison-safe way
As discussed in D101191, this patch adds a poison-safe folding of overflow bit check:
```
  %Op0 = icmp ne i4 %X, 0
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = select i1 %Op0, i1 %Op1, i1 false
=>
  %Y.fr = freeze %Y
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = %Op1
```

https://alive2.llvm.org/ce/z/zgPUGT
https://alive2.llvm.org/ce/z/h2gZ_6

Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`.
In this case, LLVM is already good because `%ret` is already successfully folded into `and`,
triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K

Differential Revision: https://reviews.llvm.org/D101423
2021-05-02 11:54:12 +09:00
Nikita Popov ffa5a402a9 [IndVars] Remove redundant loop invariance check (NFC)
This is checked again directly below this condition.
2021-05-01 17:22:00 +02:00
Nathan Chancellor 4397b7095d
Revert "Re-reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This reverts commit 791930d740, as per
https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy.

I observed breakage with the Linux kernel, as reported at
https://reviews.llvm.org/D91722#2724321

Fixes exist at
https://reviews.llvm.org/D101523
https://reviews.llvm.org/D101540

but they have not landed so to unbreak the tree for the weekend, revert
this commit.

Commit b11e4c9907 ("Revert "[DebugInfo] Drop DBG_VALUE_LISTs with an
excessive number of debug operands"") only reverted one follow-up fix,
not the original patch that broke the kernel.

e
2021-04-30 20:23:21 -07:00
George Balatsouras a45fd436ae [dfsan] Fix origin tracking for fast8
The problem is the following. With fast8, we broke an important
invariant when loading shadows.  A wide shadow of 64 bits used to
correspond to 4 application bytes with fast16; so, generating a single
load was okay since those 4 application bytes would share a single
origin.  Now, using fast8, a wide shadow of 64 bits corresponds to 8
application bytes that should be backed by 2 origins (but we kept
generating just one).

Let’s say our wide shadow is 64-bit and consists of the following:
0xABCDEFGH. To check if we need the second origin value, we could do
the following (on the 64-bit wide shadow) case:

 - bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
 - push the result along with the first origin load to the shadow/origin vectors
 - load the second 32-bit origin of the 64-bit wide shadow
 - push the wide shadow along with the second origin to the shadow/origin vectors.

The combineOrigins would then select the second origin if the wide
shadow is of the form 0xABCDE0000.  The tests illustrate how this
change affects the generated bitcode.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D101584
2021-04-30 15:57:33 -07:00
Justin Bogner 9542721085 Add support for llvm.assume intrinsic to the LoadStoreVectorizer pass
Patch by Viacheslav Nikolaev. Thanks!
2021-04-30 13:39:46 -07:00
Alexey Bataev a3fd82c289 [SLP]Fix the crash on cost calculation if non-compatible vectors shuffled.
If the extracts from the non-power-2 vectors are recognized as shuffles,
need some extra checks to not crash cost calculations if trying to gext
the ecost for subvector extracts. In this case need to check carefully
that we do not exit out of bounds of the original vector, otherwise the
TTI's cost model will crash on assert.

Differential Revision: https://reviews.llvm.org/D101477
2021-04-30 09:34:20 -07:00
Jingu Kang 88b259c014 [SimpleLoopUnswitch] Port partially invariant unswitch from LoopUnswitch to SimpleLoopUnswitch
Differential Revision: https://reviews.llvm.org/D99354
2021-04-30 15:55:56 +01:00
Evgeniy Brevnov 7861cb600c [NARY] Don't optimize min/max if there are side uses (part2)
Previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). Newly discovered failing case is due to not properly handled when there is a single use. It should be processed separately from 2 uses case.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D101359
2021-04-30 19:02:02 +07:00
Florian Hahn ed9df5bd2f
[Passes] Run sinking/hoisting in SimplifyCFG earlier.
Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:

1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.

After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.

This patch tires to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.

Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run on LoopRotate. But the impact
should be limited and the benefit of hosting/sinking at this stage
should outweigh the risk of not rotating.

Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions

NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geoman -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%

With a few benchmarks seeing a notable increase, but also some
improvements.

Alternative to D101290.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D101468
2021-04-30 12:23:57 +01:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Sanjay Patel 0f8b6686ac [InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
2021-04-29 15:07:16 -04:00
Roman Lebedev cc63203908
[SimplifyCFG] Common code sinking: fix application of profitability check
The profitability check is: we don't want to create more than a single PHI
per instruction sunk. We need to create the PHI unless we'll sink
all of it's would-be incoming values.

But there is a caveat there.
This profitability check doesn't converge on the first iteration!
If we first decide that we want to sink 10 instructions,
but then determine that 5'th one is unprofitable to sink,
that may result in us not sinking some instructions that
resulted in determining that some other instruction
we've determined to be profitable to sink becoming unprofitable.

So we need to iterate until we converge, as in determine
that all leftover instructions are profitable to sink.

But, the direct approach of just re-iterating seems dumb,
because in the worst case we'd find that the last instruction
is unprofitable, which would result in revisiting instructions
many many times.

Instead, i think we can get away with just two passes - forward and backward.
However then it isn't obvious what is the most performant way to update
InstructionsToSink.
2021-04-29 21:11:40 +03:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Sander de Smalen 51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b.
2021-04-29 16:04:37 +01:00
Florian Hahn a0e1313c23
[VPlan] Add getVPSingleValue helper.
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
2021-04-29 13:37:38 +01:00
Reshabh Sharma 60c60dd138 [ASAN] NFC: Use addrspace cast for pointers in non-zero addrspace
Pointers in non-zero address spaces need to be address space
casted before appending to the used list.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D101363
2021-04-29 11:06:00 +05:30
Reshabh Sharma fc1df36e6e [ASAN] NFC: Copy address space when creating globals with redzones
This patch makes sure that globals in supported address spaces
will be replaced by globals with red zones in the same address
space by copying the address space.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D101362
2021-04-29 10:21:43 +05:30
Amanieu d'Antras ad9ce8142d [ConstantMerge] Don't merge thread_local constants with non-thread_local constants
Fixes PR49932

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D100322
2021-04-28 23:44:20 +01:00
Roman Lebedev 707ad01399
[SimplifyCFG] Common code sinking: fixup variable name
As noticed in post-commit review.

I've gone through several iterations of that name,
and somehow managed to end up with an incorrect one.
2021-04-29 01:24:16 +03:00
Dávid Bolvanský e20b32ff3b [BuildLibCalls] Remove inaccessiblememonly inference for calloc
Solves regression mentioned in PR50143.

As noted in D101440, proper modelling for calloc would require new attribute inaccessible_or_returned_memonly.
2021-04-29 00:17:37 +02:00
Roman Lebedev 1886aad9d0
[SimplifyCFG] Common code sinking: relax restriction on non-uncond predecessors
While we have a known profitability issue for sinking in presence of
non-unconditional predecessors, there isn't any known issues
for having multiple such non-unconditional predecessors,
so said restriction appears to be artificial. Lift it.
2021-04-29 01:01:01 +03:00
Roman Lebedev a8e273f2ed
[NFC][SimplifyCFG] Add test showing that profitability check for sinking is broken
Essentially, we can't promise that the instruction is sinkable without
introducing PHI's until we know that it is profitable to sink.
2021-04-29 01:01:00 +03:00
Roman Lebedev 12c8027ce3
[NFC][SimplifyCFG] Common code sinking: check profitability once
We can just eagerly pre-check all the instructions that we *could*
sink that we'd actually want to sink them, clamping the number of
instructions that we'll sink to stop just before the first unprofitable one.
2021-04-29 01:01:00 +03:00
Roman Lebedev 4c27ca21d9
[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): reword comment about PR30244 2021-04-29 01:01:00 +03:00
Bardia Mahjour ddb3b26a12 [LV] Consider Loop Unroll Hints When Making Interleave Decisions
This patch causes the loop vectorizer to not interleave loops that have
nounroll loop hints (llvm.loop.unroll.disable and llvm.loop.unroll_count(1)).
Note that if a particular interleave count is being requested
(through llvm.loop.interleave_count), it will still be honoured, regardless
of the presence of nounroll hints.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D101374
2021-04-28 17:27:52 -04:00
Sanjay Patel abd7529625 [InstCombine] relax masking requirement for truncated funnel/rotate match
I was investigating a seemingly unrelated improvement in demanded
bits for shift-left, but that caused regressions on these tests
because we were able to look through/eliminate the mask.

https://alive2.llvm.org/ce/z/Ztdr22

  define i8 @src(i32 %x, i32 %y, i32 %shift) {
  %and = and i32 %shift, 3
  %conv = and i32 %x, 255
  %shr = lshr i32 %conv, %and
  %sub = sub i32 8, %and
  %shl = shl i32 %y, %sub
  %or = or i32 %shr, %shl
  %conv2 = trunc i32 %or to i8
  ret i8 %conv2
  }

  define i8 @tgt(i32 %x, i32 %y, i32 %shift) {
  %x8 = trunc i32 %x to i8
  %y8 = trunc i32 %y to i8
  %shift8 = trunc i32 %shift to i8
  %and = and i8 %shift8, 3
  %conv2 = call i8 @llvm.fshr.i8(i8 %y8, i8 %x8, i8 %and)
  ret i8 %conv2
  }

  declare i8 @llvm.fshr.i8(i8,i8,i8)
2021-04-28 16:49:50 -04:00
Roman Lebedev d16d820c2e
[SimplifyCFG] Try 2: sink all-indirect indirect calls
Note that we don't want to turn a partially-direct call
into an indirect one, that will break ICP amongst other things.
2021-04-28 19:08:54 +03:00
Roman Lebedev 262c679d32
Revert "[SimplifyCFG] Sinking indirect calls - they're already indirect anyways"
Seems to break indirect call promotion, LTO/Resolution/X86/load-sample-prof-icp.ll fails.

This reverts commit e57cf128b3.
2021-04-28 17:46:59 +03:00
Roman Lebedev e57cf128b3
[SimplifyCFG] Sinking indirect calls - they're already indirect anyways 2021-04-28 17:36:23 +03:00
Dawid Jurczak 5f5974aeac [SimplifyLibCalls] Transform printf("%s", str) --> puts(str)/noop
Before this change LLVM cannot simplify printf in following cases:

printf("%s", "") --> noop
printf("%s", str"\n") --> puts(str)

From the other hand GCC can perform such transformations for many years:
https://godbolt.org/z/7nnqbedfe

Differential Revision: https://reviews.llvm.org/D100724
2021-04-28 10:29:07 -04:00
David Sherwood 00e65f3345 [LoopVectorize][SVE] Fix crash when vectorising FP negation
This patch fixes a crash encountered when vectorising the following loop:

 void foo(float *dst, float *src, long long n) {
   for (long long i = 0; i < n; i++)
     dst[i] = -src[i];
 }

using scalable vectors. I've added a test to

 Transforms/LoopVectorize/AArch64/sve-basic-vec.ll

as well as cleaned up the other tests in the same file.

Differential Revision: https://reviews.llvm.org/D98054
2021-04-28 15:22:35 +01:00
Tres Popp f0e848e63d Silence unused variable warning 2021-04-28 15:46:09 +02:00
Alexey Bataev 8af4723c58 [SLP]Try to vectorize tiny trees with shuffled gathers.
If the first tree element is vectorize and the second is gather, it
still might be profitable to vectorize it if the gather node contains
less scalars to vectorize than the original tree node. It might be
profitable to use shuffles.

Differential Revision: https://reviews.llvm.org/D101397
2021-04-28 06:35:31 -07:00
David Sherwood 6998f8ae2d [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-28 13:41:07 +01:00
Sander de Smalen 584e9b6e4b [LV] Calculate max feasible scalable VF.
This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.

Reviewed By: c-rhodes, fhahn

Differential Revision: https://reviews.llvm.org/D98509
2021-04-28 12:30:00 +01:00
Tres Popp efce19c3b0 Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
This reverts commit 75d6b8bb40.

The reasoning is mentioned in https://reviews.llvm.org/D97667
2021-04-28 13:16:34 +02:00
Kerry McLaughlin 9cc217ab36 [LoopVectorize] Prevent multiple Phis being generated with in-order reductions
When using the -enable-strict-reductions flag where UF>1 we generate multiple
Phi nodes, though only one of these is used as an input to the vector.reduce.fadd
intrinsics. The unused Phi nodes are removed later by instcombine.

This patch changes widenPHIInstruction/fixReduction to only generate
one Phi, and adds an additional test for unrolling to strict-fadd.ll

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100570
2021-04-28 11:29:01 +01:00
Benjamin Kramer 7e5682ee62 [ADT] Make TrackingStatistic's ctor constexpr
This lets clang diagnose unused statistics, so remove them.
2021-04-28 12:00:17 +02:00
Dávid Bolvanský e81819377e [DSE] Eliminate zero memset after calloc
Solves PR11896

As noted, this can be improved futher (calloc -> malloc) in some cases. But for know, this is the first step.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101391
2021-04-28 03:30:52 +02:00
Han Zhu 75d6b8bb40 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-04-27 17:37:51 -07:00
Han Zhu cd13f19031 [loop-idiom][NFC] Extract processLoopStoreOfLoopLoad into a helper function
Differential Revision: https://reviews.llvm.org/D100979
2021-04-27 13:42:30 -07:00
Sanjay Patel 025bb52903 [InstCombine] fold clamp to 2 values from min/max intrinsics
The "select" versions of these folds is also missing and can
cause infinite loops as shown in:
https://llvm.org/PR48900
...but it seems easier to match these as max/min as a first fix.

https://alive2.llvm.org/ce/z/wv-_dT
2021-04-27 15:35:49 -04:00
David Sherwood 6968520c3b Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 4afeda9157.
2021-04-27 15:46:03 +01:00
David Sherwood 4afeda9157 [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-27 15:26:15 +01:00
Alexey Bataev 24590d8d67 [SLP]Improved isGatherShuffledEntry, NFC.
Reworked isGatherShuffledEntry function, simplified and moved
common code to the lambda (it shall go away when non-power-2 patch will
be landed).
2021-04-27 05:59:46 -07:00
Florian Hahn cb96d802d4
[LV] Hoist code to get vector loop latch (NFC).
Address suggestion from D99294.
2021-04-27 13:30:17 +01:00