Commit Graph

26805 Commits

Author SHA1 Message Date
Wei Mi 14756b70ee [SampleFDO] Don't mix up the existing indirect call value profile with the new
value profile annotated after inlining.

In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we
use the magic number -1 in the value profile to avoid repeated indirect call
promotion to the same target for an indirect call. Function updateIDTMetaData
is used to mark an target as being promoted in the value profile with the
magic number. updateIDTMetaData is also used to update the value profile
when an indirect call is inlined and new inline instance profile should be
applied. For the second case, currently updateIDTMetaData mixes up the
existing value profile of the indirect call with the new profile, leading
to the problematic senario that a target count is larger than the total count
in the value profile.

The patch fixes the problem. When updateIDTMetaData is used to update the
value profile after inlining, all the values in the existing value profile
will be dropped except the values with the magic number counts.

Differential Revision: https://reviews.llvm.org/D98835
2021-03-18 09:54:34 -07:00
Mircea Trofin 4b1c8070bb [NFC][ArgumentPromotion] Clear FAM cached results of erased function.
Not doing it here can lead to subtle bugs - the analysis results are
associated by the Function object's address. Nothing stops the memory
allocator from allocating new functions at the same address.
2021-03-18 09:17:32 -07:00
Alexey Bataev b3ced9852c [SLP]Fix crash on extending scheduling region.
If SLP vectorizer tries to extend the scheduling region and runs out of
the budget too early, but still extends the region to the new ending
instructions (i.e., it was able to extend the region for the first
instruction in the bundle, but not for the second), the compiler need to
recalculate dependecies in full, just like if the extending was
successfull. Without it, the schedule data chunks may end up with the
wrong number of (unscheduled) dependecies and it may end up with the
incorrect function, where the vectorized instruction does not dominate
on the extractelement instruction.

Differential Revision: https://reviews.llvm.org/D98531
2021-03-18 06:11:08 -07:00
Max Kazantsev 26ec76add5 [NFC] One more use case for evaluatePredicate 2021-03-18 19:21:29 +07:00
Max Kazantsev 1067a13cc1 [NFC] Use evaluatePredicate in eliminateComparison
Just makes code simpler.
2021-03-18 19:21:29 +07:00
Arthur Eubanks 792bed6a4c Revert "[NewPM] Verify LoopAnalysisResults after a loop pass"
This reverts commit 6db3ab2903.

Causing too large of compile time regression.
2021-03-17 15:22:52 -07:00
Arthur Eubanks 6db3ab2903 [NewPM] Verify LoopAnalysisResults after a loop pass
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.

Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98805
2021-03-17 13:37:22 -07:00
Philip Reames 31764ea295 [LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC]
(Triggered by a review comment on D98728, but otherwise unrelated.)
2021-03-17 12:14:01 -07:00
Philip Reames 7c7f4676cd [LICM] Fix a crash when sinking instructions w/token operands
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not.

This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well.

Differential Revision: https://reviews.llvm.org/D98728
2021-03-17 11:18:46 -07:00
David Green e2935dcfc4 [TTI] Add a Mask to getShuffleCost
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
LemonBoy 4f024938e4 [LoopVectorize] Refine hasIrregularType predicate
The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector.
The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32.

The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits).

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D97465
2021-03-17 17:03:47 +01:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
David Green 3c25c40d51 [LV] Account for the cost of predication of scalarized load/store
This adds the cost of an i1 extract and a branch to the cost in
getMemInstScalarizationCost when the instruction is predicated. These
predicated loads/store would generate blocks of something like:

    %c1 = extractelement <4 x i1> %C, i32 1
    br i1 %c1, label %if, label %else
  if:
    %sa = extractelement <4 x i32> %a, i32 1
    %sb = getelementptr inbounds float, float* %pg, i32 %sa
    %sv = extractelement <4 x float> %x, i32 1
    store float %sa, float* %sb, align 4
  else:

So this increases the cost by the extract and branch. This is probably
still too low in many cases due to the cost of all that branching, but
there is already an existing hack increasing the cost using
useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is
a load or there are more than one store. This patch improves the cost
for when there is only a single store, and hopefully at some point in
the future the hack can be removed.

Differential Revision: https://reviews.llvm.org/D98243
2021-03-17 10:57:50 +00:00
Bu Le 9abe500473 [SLP] Fix the trunc instruction insertion problem
Current SLP pass has this piece of code that inserts a trunc instruction
after the vectorized instruction. In the case that the vectorized instruction
is a phi node and not the last phi node in the BB, the trunc instruction
will be inserted between two phi nodes, which will trigger verify problem
in debug version or unpredictable error in another pass.
This patch changes the algorithm to 'if the last vectorized instruction
is a phi, insert it after the last phi node in current BB' to fix this problem.
2021-03-17 13:51:08 +03:00
Arthur Eubanks 70af2924a7 [Unswitch] Guard dbgs logging with LLVM_DEBUG 2021-03-16 22:31:57 -07:00
Sanjay Patel 7202f47508 [SLP] separate min/max matching from its instruction-level implementation; NFC
The motivation is to handle integer min/max reductions independently
of whether they are in the current cmp+sel form or the planned intrinsic
form.

We assumed that min/max included a select instruction, but we can
decouple that implementation detail by checking the instructions
themselves rather than relying on the recurrence (reduction) type.
2021-03-16 17:16:11 -04:00
Mohammad Hadi Jooybar 302b80abf0 [InstCombine] Avoid Bitcast-GEP fusion for pointers directly from allocation functions
Elimination of bitcasts with void pointer arguments results in GEPs with pure byte indexes. These GEPs do not preserve struct/array information and interrupt phi address translation in later pipeline stages.

Here is the original motivation for this patch:

```
#include<stdio.h>
#include<malloc.h>

typedef struct __Node{

  double f;
  struct __Node *next;

} Node;

void foo () {
  Node *a = (Node*) malloc (sizeof(Node));
  a->next = NULL;
  a->f = 11.5f;

  Node *ptr = a;
  double sum = 0.0f;
  while (ptr) {
    sum += ptr->f;
    ptr = ptr->next;
  }
  printf("%f\n", sum);
}
```
By explicit assignment  `a->next = NULL`, we can infer the length of the link list is `1`. In this case we can eliminate while loop traversal entirely. This elimination is supposed to be performed by GVN/MemoryDependencyAnalysis/PhiTranslation .

The final IR before this patch:

```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2
  %f = bitcast i8* %call to double*
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.lr.ph

while.body.lr.ph:                                 ; preds = %entry
  %1 = bitcast i8* %call to %struct.__Node*
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ]
  %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %2 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %2
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %3, null
  br i1 %tobool, label %while.end, label %while.body

while.end:                                        ; preds = %while.body, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %while.body ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```

Final IR after this patch:
```
; Function Attrs: nofree nounwind
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
while.end:
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double 1.150000e+01)
  ret void
}
```

IR before GVN before this patch:
```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0, align 8, !tbaa !2
  %f = bitcast i8* %call to double*
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.lr.ph

while.body.lr.ph:                                 ; preds = %entry
  %1 = bitcast i8* %call to %struct.__Node*
  br label %while.body

while.body:                                       ; preds = %while.body.lr.ph, %while.body
  %sum.014 = phi double [ 0.000000e+00, %while.body.lr.ph ], [ %add, %while.body ]
  %ptr.013 = phi %struct.__Node* [ %1, %while.body.lr.ph ], [ %3, %while.body ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %2 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %2
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %3 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %3, null
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit:                               ; preds = %while.body
  %add.lcssa = phi double [ %add, %while.body ]
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```
IR before GVN after this patch:
```
define dso_local void @foo(i32* nocapture readnone %r) local_unnamed_addr #0 {
entry:
  %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16) #2
  %0 = bitcast i8* %call to %struct.__Node*
  %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1
  store %struct.__Node* null, %struct.__Node** %next, align 8, !tbaa !2
  %f = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 0
  store double 1.150000e+01, double* %f, align 8, !tbaa !8
  %tobool12 = icmp eq i8* %call, null
  br i1 %tobool12, label %while.end, label %while.body.preheader

while.body.preheader:                             ; preds = %entry
  br label %while.body

while.body:                                       ; preds = %while.body.preheader, %while.body
  %sum.014 = phi double [ %add, %while.body ], [ 0.000000e+00, %while.body.preheader ]
  %ptr.013 = phi %struct.__Node* [ %2, %while.body ], [ %0, %while.body.preheader ]
  %f1 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 0
  %1 = load double, double* %f1, align 8, !tbaa !8
  %add = fadd contract double %sum.014, %1
  %next2 = getelementptr inbounds %struct.__Node, %struct.__Node* %ptr.013, i64 0, i32 1
  %2 = load %struct.__Node*, %struct.__Node** %next2, align 8, !tbaa !2
  %tobool = icmp eq %struct.__Node* %2, null
  br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit:                               ; preds = %while.body
  %add.lcssa = phi double [ %add, %while.body ]
  br label %while.end

while.end:                                        ; preds = %while.end.loopexit, %entry
  %sum.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add.lcssa, %while.end.loopexit ]
  %call3 = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), double %sum.0.lcssa)
  ret void
}
```

The phi translation fails before this patch and it prevents GVN to remove the loop. The reason for this failure is in InstCombine. When the Instruction combining pass decides to convert:
```
 %call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16)
  %0 = bitcast i8* %call to %struct.__Node*
  %next = getelementptr inbounds %struct.__Node, %struct.__Node* %0, i64 0, i32 1
  store %struct.__Node* null, %struct.__Node** %next
```
to
```
%call = tail call noalias dereferenceable_or_null(16) i8* @malloc(i64 16)
  %next = getelementptr inbounds i8, i8* %call, i64 8
  %0 = bitcast i8* %next to %struct.__Node**
  store %struct.__Node* null, %struct.__Node** %0

```

GEP instructions with pure byte indexes (e.g. `getelementptr inbounds i8, i8* %call, i64 8`) are obstacles for address translation. address translation is looking for structural similarity between GEPs and these GEPs usually do not match since they have different structure.

This change will cause couple of failures in LLVM-tests. However, in all cases we need to change expected result by the test. I will update those tests as soon as I get green light on this patch.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96881
2021-03-16 17:05:44 -04:00
Philip Reames cec9e7352b [rs4gc] Simplify code by cloning existing instructions when inserting base chain [NFC]
Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently.

Differential Revision: https://reviews.llvm.org/D98316
2021-03-16 13:10:32 -07:00
Philip Reames ef884e155d [rs4gc] don't force a conflict for a canonical broadcast
A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added.

The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier.

Differential Revision: https://reviews.llvm.org/D98315
2021-03-16 12:59:06 -07:00
Philip Reames 5cabf472cb [rs4gc] don't duplicate existing values which are provably base pointers
RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer.

This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly, that's probably the riskiest part of the change.

Differential Revision: D98122
2021-03-16 12:51:21 -07:00
Liam Keegan edf9565a86 [MemCpyOpt] Add missing MemorySSAWrapperPass dependency macro
Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass,
since MemCpyOpt now uses MemorySSA by default.

Differential Revision: https://reviews.llvm.org/D98484
2021-03-16 20:30:00 +01:00
Philip Reames 6972e39d47 [gvn] CSE gc.relocates based on meaning, not spelling (try 2)
This was (partially) reverted in cfe8f8e0 because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems.  This change has been reworked to not need that change (via some explicit checks in client code).  This is being done to address the original optimization issue and simplify the testing of the readonly changes.  I'm working on that piece under 49607.

Original commit message follows:

The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)

We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.

Differential Revision: https://reviews.llvm.org/D97974
2021-03-16 10:59:31 -07:00
Florian Hahn f586de8459
[VPlan] Remove PredInst2Recipe, use VP operands instead. (NFC)
Instead of maintaining a separate map from predicated instructions to
recipes, we can instead directly look at the VP operands. If the operand
comes from a predicated instruction, the operand will be a
VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.
2021-03-16 17:40:35 +00:00
Sanjay Patel 40fdb43d30 [SLP] improve readability in reduction logic; NFC
We had 2 different and ambiguously-named 'I' variables.
2021-03-16 07:35:13 -04:00
Caroline Concatto 3c03635d53 [SVE][LoopVectorize] Add support for scalable vectorization of loops with vector reverse
This patch adds support for reverse loop vectorization.
It is possible to vectorize the following loop:
```
  for (int i = n-1; i >= 0; --i)
    a[i] = b[i] + 1.0;
```
with fixed or scalable vector.
The loop-vectorizer will use 'reverse' on the loads/stores to make
sure the lanes themselves are also handled in the right order.
This patch adds support for scalable vector on IRBuilder interface to
create a reverse vector. The IR function
CreateVectorReverse lowers to experimental.vector.reverse for scalable vector
and keedp the original behavior for fixed vector using shuffle reverse.

Differential Revision: https://reviews.llvm.org/D95363
2021-03-16 07:51:59 +00:00
Wenlei He a5d30421a6 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:

 - When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
 - In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
 - When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Juneyoung Lee edf634ebc2 [AssumeBundles] Add nonnull/align to op bundle if noundef exists
This is a patch to add nonnull and align to assume's operand bundle
only if noundef exists.
Since nonnull and align in fn attr have poison semantics, they should be
paired with noundef or noundef-implying attributes to be immediate UB.

Reviewed By: jdoerfert, Tyker

Differential Revision: https://reviews.llvm.org/D98228
2021-03-16 10:23:42 +09:00
Hongtao Yu beea06c106 [NFC][Inliner] Debugging support to print funtion size after each inlining.
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98439
2021-03-14 22:11:53 -07:00
Chenguang Wang 166620a4f0
[ArgPromotion] Copy additional metadata for loads.
Current ArgPromotion implementation does not copy it: https://godbolt.org/z/zzTKof

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93927
2021-03-14 21:28:14 +00:00
Simonas Kazlauskas 7d7001b2cb [InstCombine] Restrict a GEP transform to avoid changing provenance
This is an alternative to D98120. Herein, instead of deleting the transformation entirely, we check
that the underlying objects are both the same and therefore this transformation wouldn't incur a
provenance change, if applied.

https://alive2.llvm.org/ce/z/SYF_yv

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D98588
2021-03-14 16:32:04 +02:00
Luo, Yuanke 66fbf5fafb [X86][AMX] Prevent transforming load pointer from <256 x i32>* to x86_amx*.
The load/store instruction will be transformed to amx intrinsics
in the pass of AMX type lowering. Prohibiting the pointer cast
make that pass happy.

Differential Revision: https://reviews.llvm.org/D98247
2021-03-14 09:24:56 +08:00
Nikita Popov 5556660971 [MemCpyOpt] Handle read from lifetime.start with offset
This fixes a regression from the MemDep-based implementation:
MemDep completely ignores lifetime.start intrinsics that aren't
MustAlias -- this is probably unsound, but it does mean that the
MemDep based implementation successfully eliminated memcpy's from
lifetime.start if the memcpy happens at an offset, rather than
the base address of the alloca.

Add a special case for the case where the lifetime.start spans the
whole alloca (which is pretty much the only kind of lifetime.start
that frontends ever emit), as we don't need to figure out our exact
aliasing relationship in that case, the whole alloca is dead prior
to the call.

If this doesn't cover all practically relevant cases, then it
would be possible to make use of the recently added PartialAlias
clobber offsets to make this more precise.
2021-03-13 20:38:09 +01:00
Sanjay Patel 4224a36957 [InstCombine] avoid creating an extra instruction in zext fold and possible inf-loop
The structure of this fold is suspect vs. most of instcombine
because it creates instructions and tries to delete them
immediately after.

If we don't have the operand types for the icmps, then we are
not behaving as assumed. And as shown in PR49475, we can inf-loop.
2021-03-13 08:30:51 -05:00
Roman Lebedev 6e9b9978cf
[LSR] Don't try to fixup uses in 'EH pad' instructions
The added test case crashes before this fix:
```
opt: /repositories/llvm-project/llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp:5172: BasicBlock::iterator (anonymous namespace)::LSRInstance::AdjustInsertPositionForExpand(BasicBlock::iterator, const (anonymous namespace)::LSRFixup &, const (anonymous namespace)::LSRUse &, llvm::SCEVExpander &) const: Assertion `!isa<PHINode>(LowestIP) && !LowestIP->isEHPad() && !isa<DbgInfoIntrinsic>(LowestIP) && "Insertion point must be a normal instruction"' failed.
```
This is fully analogous to the previous commit,
with the pointer constant replaced to be something non-null.

The comparison here can be strength-reduced,
but the second operand of the comparison happens to be identical
to the constant pointer in the `catch` case of `landingpad`.

While LSRInstance::CollectLoopInvariantFixupsAndFormulae()
already gave up on uses in blocks ending up with EH pads,
it didn't consider this case.

Eventually, `LSRInstance::AdjustInsertPositionForExpand()`
will be called, but the original insertion point it will get
is the user instruction itself, and it doesn't want to
deal with EH pads, and asserts as much.

It would seem that this basically never happens in-the-wild,
otherwise it would have been reported already,
so it seems safe to take the cautious approach,
and just not deal with such users.
2021-03-13 16:05:34 +03:00
Nikita Popov 2902bdeea1 [MemCpyOpt] Use AA to check for MustAlias between memset and memcpy
Rather than checking for simple equality, check for MustAlias, as
we do in other transforms. This catches equivalent GEPs.
2021-03-13 11:41:15 +01:00
Nikita Popov 9080444f33 [MemCpyOpt] Don't generate zero-size memset
If a memset destination is overwritten by a memcpy and the sizes
are exactly the same, then the memset is simply dead. We can
directly drop it, instead of replacing it with a memset of zero
size, which is particularly ugly for the case of a dynamic size.
2021-03-13 11:41:15 +01:00
Wei Mi ef9d7db723 [IndirectCallPromotion] Recommit "Don't strip ".__uniq." suffix when it strips
".llvm." suffix".

The recommit fixed a bug that symbols with "." at the beginning is not
properly handled in the last commit.

Original commit message:
Currently IndirectCallPromotion simply strip everything after the first "."
in LTO mode, in order to match the symbol name and the name with ".llvm."
suffix in the value profile. However, if -funique-internal-linkage-names
and thinlto are both enabled, the name may have both ".__uniq." suffix and
".llvm." suffix, and the current mechanism will strip them both, which is
unexpected. The patch fixes the problem.

Differential Revision: https://reviews.llvm.org/D98389
2021-03-12 13:48:14 -08:00
Nikita Popov 42eb658f65 [OpaquePtrs] Remove some uses of type-less CreateGEP() (NFC)
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.

There are a still a number of tricky uses left, as well as many
more variation APIs for CreateGEP.
2021-03-12 21:01:16 +01:00
Florian Hahn fb3ca70761
[LV] Account IV recipes being uniform in VPTransformState::get().
This patch fixes a crash when trying to get a scalar value using
VPTransformState::get() for uniform induction values or truncated
induction values. IVs and truncated IVs can be uniform and the updated
code accounts for that, fixing the crash.

This should fix
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=31981
2021-03-12 13:29:06 +00:00
Sanjay Patel bd197ed0a5 [SimplifyCFG] avoid sinking insts within an infinite-loop
The test is reduced from a C source example in:
https://llvm.org/PR49541

It's possible that the test could be reduced further or
the predicate generalized further, but it seems to require
a few ingredients (including the "late" SimplifyCFG options
on the RUN line) to fall into the infinite-loop trap.
2021-03-12 08:04:57 -05:00
Hans Wennborg f50aef745c Revert "[InstrProfiling] Don't generate __llvm_profile_runtime_user"
This broke the check-profile tests on Mac, see comment on the code
review.

> This is no longer needed, we can add __llvm_profile_runtime directly
> to llvm.compiler.used or llvm.used to achieve the same effect.
>
> Differential Revision: https://reviews.llvm.org/D98325

This reverts commit c7712087cb.

Also reverting the dependent follow-up commit:

Revert "[InstrProfiling] Generate runtime hook for ELF platforms"

> When using -fprofile-list to selectively apply instrumentation only
> to certain files or functions, we may end up with a binary that doesn't
> have any counters in the case where no files were selected. However,
> because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
> runtime would still be pulled in and incur some non-trivial overhead,
> especially in the case when the continuous or runtime counter relocation
> mode is being used. A better way would be to pull in the profile runtime
> only when needed by declaring the __llvm_profile_runtime symbol in the
> translation unit only when needed.
>
> This approach was already used prior to 9a041a7522, but we changed it
> to always generate the __llvm_profile_runtime due to a TAPI limitation.
> Since TAPI is only used on Mach-O platforms, we could use the early
> emission of __llvm_profile_runtime there, and on other platforms we
> could change back to the earlier approach where the symbol is generated
> later only when needed. We can stop passing -u__llvm_profile_runtime to
> the linker on Linux and Fuchsia since the generated undefined symbol in
> each translation unit that needed it serves the same purpose.
>
> Differential Revision: https://reviews.llvm.org/D98061

This reverts commit 87fd09b25f.
2021-03-12 13:53:46 +01:00
Serguei Katkov cfe8f8e0f0 Revert "Mark gc.relocate and gc.result as readnone"
As readnone function they become movable and LICM can hoist them
out of a loop. As a result in LCSSA form phi node of type token
is created. No one is ready that GCRelocate first operand is phi node
but expects to be token.

GVN test were also updated, it seems it does not do what is expected.
Test for LICM is also added.

This reverts commit f352463ade.
2021-03-12 16:59:17 +07:00
Johannes Doerfert ff256c1376 [Attributor] Derive `willreturn` based on `mustprogress`
Since D86233 we have `mustprogress` which, in combination with
`readonly`, implies `willreturn`. The idea is that every side-effect
has to be modeled as a "write". Consequently, `readonly` means there
is no side-effect, and `mustprogress` guarantees that we cannot "loop"
forever without side-effect.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D94125
2021-03-11 23:31:44 -06:00
Nikita Popov 2fe85dd289 [Attributor] Don't access pointer elem type in constructPointer (NFC)
Splitting this out as the change is non-trivial: The way this code
handled pointer types doesn't really make sense, as GEPs can only
apply an offset to the outermost pointer, but can't drill down
into interior pointer types (which would require dereferencing
memory).

Instead give special treatment to the first (pointer) index.
I've hardcoded it to zero as that's the only way the function is
used right now, but handling non-zero indexes would be
straightforward.

The original goal here was to have an element type for CreateGEP.
2021-03-11 21:36:40 +01:00
Petr Hosek 87fd09b25f [InstrProfiling] Generate runtime hook for ELF platforms
When using -fprofile-list to selectively apply instrumentation only
to certain files or functions, we may end up with a binary that doesn't
have any counters in the case where no files were selected. However,
because on Linux and Fuchsia, we pass -u__llvm_profile_runtime, the
runtime would still be pulled in and incur some non-trivial overhead,
especially in the case when the continuous or runtime counter relocation
mode is being used. A better way would be to pull in the profile runtime
only when needed by declaring the __llvm_profile_runtime symbol in the
translation unit only when needed.

This approach was already used prior to 9a041a7522, but we changed it
to always generate the __llvm_profile_runtime due to a TAPI limitation.
Since TAPI is only used on Mach-O platforms, we could use the early
emission of __llvm_profile_runtime there, and on other platforms we
could change back to the earlier approach where the symbol is generated
later only when needed. We can stop passing -u__llvm_profile_runtime to
the linker on Linux and Fuchsia since the generated undefined symbol in
each translation unit that needed it serves the same purpose.

Differential Revision: https://reviews.llvm.org/D98061
2021-03-11 12:29:01 -08:00
Valery N Dmitriev 73f94969b2 [SLP] Fix crash when matching associative reduction for integer min/max.
Associative reduction matcher in SLP begins with select instruction but when
it reached call to llvm.umax (or alike) via def-use chain the latter also matched
as UMax kind. The routine's later code assumes matched instruction to be a select
and thus it merely died on the first encountered cast that did not fit.

Differential Revision: https://reviews.llvm.org/D98432
2021-03-11 11:52:57 -08:00
Wenlei He 051f2c144e [SamplePGO] Skip inlinee profile scaling for sample loader inlining
For CGSCC inline, we need to scale down a function's branch weights and entry counts when thee it's inlined at a callsite. This is done through updateCallProfile. Additionally, we also scale the weigths for the inlined clone based on call site count in updateCallerBFI. Neither is needed for inlining during sample profile loader as it's using context profile that is separated from inlinee's own profile. This change skip the inlinee profile scaling for sample loader inlining.

Differential Revision: https://reviews.llvm.org/D98187
2021-03-11 10:18:26 -08:00
Hiroshi Yamauchi 365b225d46 [PGO] Fix two issues in PGOMemOPSizeOpt.
1. PGOMemOPSizeOpt grabs only the first, up to five (by default) entries from
the value profile metadata and preserves the remaining entries for the fallback
memop call site. If there are more than five entries, the rest of the entries
would get dropped. This is fine for PGOMemOPSizeOpt itself as it only promotes
up to 3 (by default) values, but potentially not for other downstream passes
that may use the value profile metadata.

2. PGOMemOPSizeOpt originally assumed that only values 0 through 8 are kept
track of. When the range buckets were introduced, it was changed to skip the
range buckets, but since it does not grab all entries (only five), if some range
buckets exist in the first five entries, it could potentially cause fewer
promotion opportunities (eg. if 4 out of 5 were range buckets, it may be able to
promote up to one non-range bucket, as opposed to 3.) Also, combined with 1, it
means that wrong entries may be preserved, as it didn't correctly keep track of
which were entries were skipped.

To fix this, PGOMemOPSizeOpt now grabs all the entries (up to the maximum number
of value profile buckets), keeps track of which entries were skipped, and
preserves all the remaining entries.

Differential Revision: https://reviews.llvm.org/D97592
2021-03-11 09:53:05 -08:00
Stephen Tozer f40976bd01 Revert "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This reverts commit c0f3dfb9f1.

Reverted due to an error on the clang-x64-windows-msvc buildbot.
2021-03-11 14:48:01 +00:00