Jon Roelofs
4b19e7dfae
[LoopIdiomRecognize][Remarks] Track loop-strided store to/from blocks
...
Differential revision: https://reviews.llvm.org/D109929
2021-09-16 15:46:26 -07:00
Han Zhu
da1cdffbb1
[loop-idiom] Hoist loop memcpys to loop preheader
...
For a simple loop like:
```
struct S {
int x;
int y;
char b;
};
unsigned foo(S* __restrict__ a, S* b, int n) {
for (int i = 0; i < n; i++)
a[i] = b[i];
return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%0 = bitcast %struct.S* %arrayidx2 to i8*
%1 = bitcast %struct.S* %arrayidx to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%a1 = bitcast %struct.S* %a to i8*
%b2 = bitcast %struct.S* %b to i8*
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = zext i32 %n to i64
%1 = mul nuw nsw i64 %0, 12
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%2 = bitcast %struct.S* %arrayidx2 to i8*
%3 = bitcast %struct.S* %arrayidx to i8*
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
Reviewed By: zino
Differential Revision: https://reviews.llvm.org/D97667
2021-05-04 17:05:04 -07:00
Tres Popp
efce19c3b0
Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
...
This reverts commit 75d6b8bb40
.
The reasoning is mentioned in https://reviews.llvm.org/D97667
2021-04-28 13:16:34 +02:00
Han Zhu
75d6b8bb40
[loop-idiom] Hoist loop memcpys to loop preheader
...
For a simple loop like:
```
struct S {
int x;
int y;
char b;
};
unsigned foo(S* __restrict__ a, S* b, int n) {
for (int i = 0; i < n; i++)
a[i] = b[i];
return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%0 = bitcast %struct.S* %arrayidx2 to i8*
%1 = bitcast %struct.S* %arrayidx to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%a1 = bitcast %struct.S* %a to i8*
%b2 = bitcast %struct.S* %b to i8*
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = zext i32 %n to i64
%1 = mul nuw nsw i64 %0, 12
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%2 = bitcast %struct.S* %arrayidx2 to i8*
%3 = bitcast %struct.S* %arrayidx to i8*
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
Reviewed By: zino
Differential Revision: https://reviews.llvm.org/D97667
2021-04-27 17:37:51 -07:00
Krasimir Georgiev
8e7df996e3
Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
...
This reverts commit 92ddd3c1b6
.
Causes multistage clang crashes, e.g.:
https://lab.llvm.org/buildbot/#/builders/36/builds/6678
2021-03-30 11:47:12 +02:00
Han Zhu
92ddd3c1b6
[loop-idiom] Hoist loop memcpys to loop preheader
...
For a simple loop like:
```
struct S {
int x;
int y;
char b;
};
unsigned foo(S* __restrict__ a, S* b, int n) {
for (int i = 0; i < n; i++)
a[i] = b[i];
return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%0 = bitcast %struct.S* %arrayidx2 to i8*
%1 = bitcast %struct.S* %arrayidx to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.
With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }
define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
%a1 = bitcast %struct.S* %a to i8*
%b2 = bitcast %struct.S* %b to i8*
%cmp7 = icmp sgt i32 %n, 0
br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup
for.body.preheader: ; preds = %entry
%0 = zext i32 %n to i64
%1 = mul nuw nsw i64 %0, 12
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
br label %for.body
for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup
for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret i32 12
for.body: ; preds = %for.body, %for.body.preheader
%i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
%idxprom = zext i32 %i.08 to i64
%arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
%arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
%2 = bitcast %struct.S* %arrayidx2 to i8*
%3 = bitcast %struct.S* %arrayidx to i8*
%inc = add nuw nsw i32 %i.08, 1
%cmp = icmp slt i32 %inc, %n
br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}
; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0
attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```
Reviewed By: zino
Differential Revision: https://reviews.llvm.org/D97667
2021-03-29 23:36:26 -07:00
Han Zhu
2bd4049ceb
Revert "[loop-idiom] Hoist loop memcpys to loop preheader"
...
This reverts commit deb5095833
.
Bad commit message.
2021-03-29 23:35:35 -07:00
Han Zhu
deb5095833
[loop-idiom] Hoist loop memcpys to loop preheader
...
Summary:
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Blame Revision:
Differential Revision: https://phabricator.intern.facebook.com/D26380397
2021-03-29 23:14:42 -07:00
Fangrui Song
f31811f2dc
[BasicAA] Rename deprecated -basicaa to -basic-aa
...
Follow-up to D82607
Revert an accidental change (empty.ll) of D82683
2020-06-26 20:41:37 -07:00
Sam Parker
60d6fb2a63
[SCEV] Use NoWrapFlags when expanding a simple mul
...
Second functional change following on from rL362687. Pass the
NoWrapFlags from the MulExpr to InsertBinop when we're generating a
shl or mul.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 363540
2019-06-17 10:05:18 +00:00
Benjamin Kramer
f1249442cf
Revert "[SCEV] Use wrap flags in InsertBinop"
...
This reverts commit r362687. Miscompiles llvm-profdata during selfhost.
llvm-svn: 362699
2019-06-06 12:35:46 +00:00
Sam Parker
7cc580f5e9
[SCEV] Use wrap flags in InsertBinop
...
If the given SCEVExpr has no (un)signed flags attached to it, transfer
these to the resulting instruction or use them to find an existing
instruction.
Differential Revision: https://reviews.llvm.org/D61934
llvm-svn: 362687
2019-06-06 08:56:26 +00:00
Roman Lebedev
e8578953ac
[LoopIdiom] Basic OptimizationRemarkEmitter handling
...
Summary:
I'm adding ORE to memset/memcpy formation, with tests,
but mainly this is split off from D61144.
Reviewers: reames, anemet, thegameg, craig.topper
Reviewed By: thegameg
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62631
llvm-svn: 362092
2019-05-30 13:02:06 +00:00