Jon Roelofs
4b19e7dfae
[LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks
...
Differential revision: https://reviews.llvm.org/D109929
2021-09-16 15:46:26 -07:00
Eli Friedman
8f3d16905d
[ScalarEvolution] Ensure backedge-taken counts are not pointers.
...
A backedge-taken count doesn't refer to memory; returning a pointer type
is nonsense. So make sure we always return an integer.
The obvious way to do this would be to just convert the operands of the
icmp to integers, but that doesn't quite work out at the moment:
isLoopEntryGuardedByCond currently gets confused by ptrtoint operations.
So we perform the ptrtoint conversion late for lt/gt operations.
The test changes are mostly innocuous. The most interesting changes are
more complex SCEV expressions of the form "(-1 * (ptrtoint i8* %ptr to
i64)) + %ptr)". This is expected: we can't fold this to zero because we
need to preserve the pointer base.
The call to isLoopEntryGuardedByCond in howFarToZero is less precise
because of ptrtoint operations; this shows up in the function
pr46786_c26_char in ptrtoint.ll. Fixing it here would require more
complex refactoring. It should eventually be fixed by future
improvements to isImpliedCond.
See https://bugs.llvm.org/show_bug.cgi?id=46786 for context.
Differential Revision: https://reviews.llvm.org/D103656
2021-06-21 16:24:16 -07:00
Eli Friedman
925cd6b467
Regenerate a few tests related to SCEV.
...
In preparation for https://reviews.llvm.org/D103656
2021-06-04 13:35:00 -07:00
Han Zhu
da1cdffbb1
[loop-idiom] Hoist loop memcpys to loop preheader
...
For a simple loop like:
```
struct S {
int x;
int y;
char b;
};
unsigned foo(S* __restrict__ a, S* b, int n) {
for (int i = 0; i < n; i++)
a[i] = b[i];
return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%0 = bitcast %struct.S* %arrayidx2 to i8*
%1 = bitcast %struct.S* %arrayidx to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%a1 = bitcast %struct.S* %a to i8*
%b2 = bitcast %struct.S* %b to i8*
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = zext i32 %n to i64
%1 = mul nuw nsw i64 %0, 12
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%2 = bitcast %struct.S* %arrayidx2 to i8*
%3 = bitcast %struct.S* %arrayidx to i8*
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
Reviewed By: zino
Differential Revision: https://reviews.llvm.org/D97667
2021-05-04 17:05:04 -07:00
Tres Popp
efce19c3b0
Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
...
This reverts commit 75d6b8bb40
.
The reasoning is mentioned in https://reviews.llvm.org/D97667
2021-04-28 13:16:34 +02:00
Han Zhu
75d6b8bb40
[loop-idiom] Hoist loop memcpys to loop preheader
...
For a simple loop like:
```
struct S {
int x;
int y;
char b;
};
unsigned foo(S* __restrict__ a, S* b, int n) {
for (int i = 0; i < n; i++)
a[i] = b[i];
return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%0 = bitcast %struct.S* %arrayidx2 to i8*
%1 = bitcast %struct.S* %arrayidx to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%a1 = bitcast %struct.S* %a to i8*
%b2 = bitcast %struct.S* %b to i8*
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = zext i32 %n to i64
%1 = mul nuw nsw i64 %0, 12
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%2 = bitcast %struct.S* %arrayidx2 to i8*
%3 = bitcast %struct.S* %arrayidx to i8*
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
Reviewed By: zino
Differential Revision: https://reviews.llvm.org/D97667
2021-04-27 17:37:51 -07:00
Krasimir Georgiev
8e7df996e3
Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
...
This reverts commit 92ddd3c1b6
.
Causes multistage clang crashes, e.g.:
https://lab.llvm.org/buildbot/#/builders/36/builds/6678
2021-03-30 11:47:12 +02:00
Han Zhu
92ddd3c1b6
[loop-idiom] Hoist loop memcpys to loop preheader
...
For a simple loop like:
```
struct S {
int x;
int y;
char b;
};
unsigned foo(S* __restrict__ a, S* b, int n) {
for (int i = 0; i < n; i++)
a[i] = b[i];
return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%0 = bitcast %struct.S* %arrayidx2 to i8*
%1 = bitcast %struct.S* %arrayidx to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%a1 = bitcast %struct.S* %a to i8*
%b2 = bitcast %struct.S* %b to i8*
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = zext i32 %n to i64
%1 = mul nuw nsw i64 %0, 12
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%2 = bitcast %struct.S* %arrayidx2 to i8*
%3 = bitcast %struct.S* %arrayidx to i8*
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
Reviewed By: zino
Differential Revision: https://reviews.llvm.org/D97667
2021-03-29 23:36:26 -07:00
Han Zhu
2bd4049ceb
Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
...
This reverts commit deb5095833
.
Bad commit message.
2021-03-29 23:35:35 -07:00
Han Zhu
deb5095833
[loop-idiom] Hoist loop memcpys to loop preheader
...
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Blame Revision:
Differential Revision: https://phabricator.intern.facebook.com/D26380397
2021-03-29 23:14:42 -07:00
Fangrui Song
f31811f2dc
[BasicAA] Rename deprecated -basicaa to -basic-aa
...
Follow-up to D82607
Revert an accidental change (empty.ll) of D82683
2020-06-26 20:41:37 -07:00
Orlando Cazalet-Hyams
1251cac62a
[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
...
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
> llvm-svn: 363046
llvm-svn: 363786
2019-06-19 10:50:47 +00:00
Orlando Cazalet-Hyams
a947156396
Revert "[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion"
...
This reverts commit 1a0f7a2077
.
See phabricator thread for D60831.
llvm-svn: 363132
2019-06-12 08:34:51 +00:00
Orlando Cazalet-Hyams
1a0f7a2077
[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
...
Summary:
Bug: https://bugs.llvm.org/show_bug.cgi?id=39024
The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:
A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.
In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.
I have set up a separate review D61933 for a fix which is required for this patch.
Reviewers: samsonov, vsk, aprantl, probinson, anemet, hfinkel, jmorse
Reviewed By: hfinkel, jmorse
Subscribers: jmorse, javed.absar, eraman, kcc, bjope, jmellorcrummey, hfinkel, gbedwell, hiraditya, zzheng, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D60831
llvm-svn: 363046
2019-06-11 10:37:20 +00:00
Roman Lebedev
e8578953ac
[LoopIdiom] Basic OptimizationRemarkEmitter handling
...
Summary:
I'm adding ORE to memset/memcpy formation, with tests,
but mainly this is split off from D61144.
Reviewers: reames, anemet, thegameg, craig.topper
Reviewed By: thegameg
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62631
llvm-svn: 362092
2019-05-30 13:02:06 +00:00