Mark ModRefInfo as a bitmask enum, which allows using the normal
& and | operators on it. This supersedes various helper functions like
unionModRef() and intersectModRef(). I think this makes the code
cleaner than going through helper functions.
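A minimal sketch of what this enables, assuming the bitmask-enum machinery from llvm/ADT/BitmaskEnum.h (the real enum lives in the AliasAnalysis headers):
```
#include "llvm/ADT/BitmaskEnum.h"

enum class ModRefInfo : uint8_t {
  NoModRef = 0,
  Ref = 1,
  Mod = 2,
  ModRef = Ref | Mod,
  // Marks the enum as a bitmask, enabling &, |, ^, ~ and their
  // compound-assignment forms (within a namespace that has
  // LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE, e.g. llvm).
  LLVM_MARK_AS_BITMASK_ENUM(/* LargestValue = */ ModRef),
};

// Before: MR = intersectModRef(MR, OtherMR);  after: MR &= OtherMR;
// Before: MR = unionModRef(MR, OtherMR);      after: MR |= OtherMR;
```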
Differential Revision: https://reviews.llvm.org/D130870
isSafeToExpand() for addrecs depends on whether the SCEVExpander
will be used in CanonicalMode. At least one caller currently gets
this wrong, resulting in PR50506.
Fix this by a) making the CanonicalMode argument on the freestanding
functions required and b) adding member functions on SCEVExpander
that automatically take the SCEVExpander mode into account. We can
use the latter variant nearly everywhere, and thus make sure that
there is no chance of CanonicalMode mismatch.
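A sketch of the new, mismatch-proof usage (member-function form per this patch; surrounding variables illustrative):
```
// The expander itself knows whether it runs in CanonicalMode, so the
// safety query can no longer disagree with the later expansion.
SCEVExpander Expander(SE, DL, "loop-idiom");
if (!Expander.isSafeToExpand(PtrSCEV))
  return false;
Value *Ptr = Expander.expandCodeFor(PtrSCEV, PtrTy, InsertPt);
```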
Fixes https://github.com/llvm/llvm-project/issues/50506.
Differential Revision: https://reviews.llvm.org/D129630
Commit dd5991cc modified the aliasing checks here to allow transforming
a memcpy where the source and destination point into the same object.
However, the change accidentally made the code skip the alias check for
other operations in the loop.
Instead of completely skipping the alias check, just skip the check for
whether the memcpy aliases itself.
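A sketch of the corrected check (mayLoopAccessLocation() is the static helper in LoopIdiomRecognize.cpp; its signature is approximated here and the variable names are illustrative):
```
// Only the memcpy being transformed is exempt from the alias scan;
// every other access in the loop is still checked.
SmallPtrSet<Instruction *, 1> IgnoredInsts;
IgnoredInsts.insert(TheMemCpy);
if (mayLoopAccessLocation(Dest, ModRefInfo::ModRef, CurLoop, BECount,
                          StoreSizeSCEV, *AA, IgnoredInsts))
  return false; // some other access may clobber the destination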
Differential Revision: https://reviews.llvm.org/D126486
libcalls." (was 0f8c626). This reverts commit 14d9390.
The patch previously failed to handle cases where the user had defined a
function alias with the same name as the library
function. Module::getFunction() would then return nullptr, which is what the
sanitizer discovered.
In this updated version a new function, isLibFuncEmittable(), has been
introduced; it is now used instead of TLI->has() whenever a library function
is to be emitted, and it additionally makes sure there is e.g. no function
alias with the same name in the module.
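A usage sketch (function as introduced by this patch; the sqrt example and variable names are illustrative):
```
// TLI->has() alone is not enough: the module may already contain e.g. a
// function alias named "sqrt", in which case no libcall can be emitted.
if (!isLibFuncEmittable(M, TLI, LibFunc_sqrt))
  return nullptr;
```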
Reviewed By: Eli Friedman
Differential Revision: https://reviews.llvm.org/D123198
test/Transforms/InstCombine/pr39177.ll failed in a -DLLVM_USE_SANITIZER=Undefined build.
```
lib/Transforms/Utils/BuildLibCalls.cpp:1217:17: runtime error: reference binding to null pointer of type 'llvm::Function'
```
`Function &F = *M->getFunction(Name);`
This reverts commit 0f8c626723.
A new set of overloaded functions named getOrInsertLibFunc() are now supposed
to be used instead of getOrInsertFunction() when building a libcall from
within an LLVM optimizer. The idea is that this new function also makes
sure that any mandatory argument attributes are added to the function
prototype (after calling getOrInsertFunction()).
inferLibFuncAttributes() is renamed to inferNonMandatoryLibFuncAttrs() as it
only adds attributes that are not necessary for correctness but merely
help with later optimizations.
Generally, the front end is responsible for building a correct function
prototype with the needed argument attributes. If, however, the middle end is
the one creating the call, e.g. when replacing one libcall with another, it
must take on this responsibility.
This continues the work of properly handling argument extension, as required
by the target ABI, when building a libcall. getOrInsertLibFunc() now does
this for all libcalls currently built by any LLVM optimizer. It is expected
that when a future optimization builds a new libcall with an
integer argument, it will be added to getOrInsertLibFunc() with the proper
handling. Note that not all targets have it in their ABI to sign/zero extend
integer arguments to the full register width; this will be done
selectively, as determined by getExtAttrForI32Param().
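A sketch of the intended usage, modeled on the emit helpers in BuildLibCalls (the strlen example and variable names are illustrative):
```
// getOrInsertLibFunc() replaces Module::getOrInsertFunction() for
// libcalls: it also puts the mandatory argument attributes (e.g.
// signext/zeroext, per getExtAttrForI32Param()) on the prototype.
Type *SizeTTy = DL.getIntPtrType(M->getContext()); // size_t stand-in
FunctionCallee StrLen = getOrInsertLibFunc(M, *TLI, LibFunc_strlen,
                                           SizeTTy, B.getInt8PtrTy());
// Non-mandatory attributes only help later optimizations.
inferNonMandatoryLibFuncAttrs(M, "strlen", *TLI);
CallInst *CI = B.CreateCall(StrLen, Ptr, "strlen");
```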
Review: Eli Friedman, Nikita Popov, Dávid Bolvanský
Differential Revision: https://reviews.llvm.org/D123198
Factor in the TBAA of adjacent stores, instead of just the head store,
when merging stores into a memset. We were seeing GVN remove a load
whose TBAA matched the 2nd store, because GVN determined that it didn't
match the TBAA of the memset; the memset had the TBAA of only the first
store.
I.e., loading the field pi_ of shared_count after a memset that creates an
array of shared_ptr:
```
template <class T>
class shared_ptr {
  T *p;
  shared_count refcount;
};

class shared_count {
  sp_counted_base *pi_;
};
```
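A minimal sketch of the fix's idea, assuming AAMDNodes::merge() (which conservatively combines two sets of AA metadata); variable names illustrative:
```
// Take the metadata of every merged store into account, not just the
// head store's, so the memset's TBAA covers all of them.
AAMDNodes AATags = HeadStore->getAAMetadata();
for (Instruction *SI : AdjacentStores)
  AATags = AATags.merge(SI->getAAMetadata());
NewMemSet->setAAMetadata(AATags);
```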
Differential Revision: https://reviews.llvm.org/D122205
When upgrading a load/store loop to a memcpy, the existing pass does not keep the existing aliasing information. This patch allows that aliasing information to be preserved.
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D108221
There is no need to sort inserted instructions by dominance, as the
deletion loop still requires RAUW with undef before deleting. Removing
instructions in reverse insertion order should still ensure that the
number of uselist updates is kept to a minimum.
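A sketch of the resulting deletion loop (container and variable names illustrative):
```
// No dominance sort needed: each instruction is detached from all of its
// users via RAUW before it is erased, and reverse insertion order keeps
// the number of use-list updates small.
for (Instruction *I : reverse(InsertedInstructions)) {
  I->replaceAllUsesWith(UndefValue::get(I->getType()));
  I->eraseFromParent();
}
```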
These are deprecated and should be replaced with getAlign().
Some of these asserts don't do anything because Load/Store/AllocaInst never have an alignment value of 0.
An expression guarded in the loop entry can be folded prior to the comparison. This patch
follows D107353 and makes LIR able to deal with nested for-loops.
Reviewed By: qianzhen, bmahjour
Differential Revision: https://reviews.llvm.org/D108112
We were using the type of the loop back edge count to represent the
store size. This failed for small loop counts (e.g. in the added test,
the loop count was an i2).
Use the index type instead.
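A sketch of the fixed size computation (helper and variable names illustrative):
```
// Compute the byte count in the pointer's index type: with an i2
// backedge count of 3, BECount+1 would wrap to 0 if computed in i2,
// but is 4 after first extending to the index type.
Type *IntIdxTy = DL.getIndexType(DestPtr->getType());
const SCEV *TripCountS = SE->getAddExpr(
    SE->getTruncateOrZeroExtend(BECount, IntIdxTy), SE->getOne(IntIdxTy));
const SCEV *NumBytesS =
    SE->getMulExpr(TripCountS, SE->getConstant(IntIdxTy, StoreSize));
```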
Fixes PR52104.
Differential Revision: https://reviews.llvm.org/D111401
This change fixes an issue found by Markus: https://reviews.llvm.org/rG11338e998df1
Before this patch, the following code was transformed into a memmove:
```
for (int i = 15; i >= 1; i--) {
  p[i] = p[i-1];
  sum += p[i-1];
}
```
However, the load from p[i-1] is used not only by the store to p[i] but also by the sum computation.
Therefore we cannot emit a memmove in the loop header.
Differential Revision: https://reviews.llvm.org/D107964
Letting it take a SCEV allows further modification of the function, to optimize
the case where the StoreSize / Stride is runtime-determined.
The plan is to let memcpy / memmove deal with runtime-determined sizes, just
as D107353 did for memset.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D108289
When dealing with memmove, we also add the load instruction to the ignored-
instructions list passed to `mayLoopAccessLocation`, and rename "Stores" to
"IgnoredInsts" to be more precise.
Differential Revision: https://reviews.llvm.org/D108275
The current LIR does not deal with a runtime-determined memset size. This patch
utilizes SCEV and checks whether the PointerStrideSCEV and the MemsetSizeSCEV are equal.
Before the comparison, the pass tries to fold the expression that is already
protected by the loop guard.
Testcase files `memset-runtime.ll` and `memset-runtime-debug.ll` are added.
This patch deals with the proper loop idiom; a follow-up patch will deal with SCEVs
that are unequal after folding with the loop guards.
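A sketch of the equality check described above (SCEV names from this commit message; the real code lives in LoopIdiomRecognize):
```
// If the two SCEVs don't compare equal directly, fold both with facts
// known from the loop guards (e.g. "n == 64" on the guarding branch)
// before giving up on the memset transform.
if (PointerStrideSCEV != MemsetSizeSCEV) {
  const SCEV *FoldedStride = SE->applyLoopGuards(PointerStrideSCEV, CurLoop);
  const SCEV *FoldedSize = SE->applyLoopGuards(MemsetSizeSCEV, CurLoop);
  if (FoldedStride != FoldedSize)
    return false;
}
```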
Reviewed By: lebedev.ri, Whitney
Differential Revision: https://reviews.llvm.org/D107353
Letting it take a SCEV allows further modification of the function, to optimize
the case where the StoreSize / Stride is runtime-determined.
This is a precursor to D107353.
The big picture is to let LoopIdiom deal with runtime-determined sizes.
Reviewed By: Whitney, lebedev.ri
Differential Revision: https://reviews.llvm.org/D104595
The purpose of this patch is to teach the loop idiom recognition pass to recognize simple memmove patterns,
in a similar way to GCC: https://godbolt.org/z/fh95e83od
LoopIdiomRecognize already has machinery for memset and memcpy recognition; this patch tries to extend the existing capabilities with minimal effort.
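The kind of loop this enables (the same shape as in the fix commit above, minus the blocking sum):
```
// Source and destination ranges overlap within one object, so the loop
// must become a memmove rather than a memcpy.
void shift_right(int *p) {
  for (int i = 15; i >= 1; i--)
    p[i] = p[i - 1];
}
```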
Differential Revision: https://reviews.llvm.org/D104464
Essentially, the cover function simply combines the loop-level check and the function-level scope into one call. This simplifies several callers and is (subjectively) less error-prone.
Nowadays LLVM does not assume that all loops are finite,
so if we want to produce a finite loop from a potentially-infinite one,
we must ensure that the original loop is known to be a finite one.
For this transform, it only matters for arithmetic right-shifts.
For them, either the function or the loop must be known to
be `mustprogress`, or the original value being shifted must be known
to be non-negative (because iff the sign bit was set,
it will never become zero, but will become `-1` in the "end").
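A small standalone demonstration (not from the patch) of why the precondition is needed:
```
#include <cstdio>

int main() {
  int val = -8; // sign bit set
  // On mainstream targets, >> on a negative int is an arithmetic shift:
  // the value converges to -1 and never reaches 0, so a "shift until
  // zero" loop over a negative value never terminates.
  for (int cnt = 0; cnt < 8; ++cnt)
    std::printf("%d >> %d = %d\n", val, cnt, val >> cnt);
  // prints: -8, -4, -2, -1, -1, -1, -1, -1
  return 0;
}
```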
It would be really good for alive2 to actually complain about this,
but it currently does not: https://github.com/AliveToolkit/alive2/issues/726
This adds support for the "count active bits" pattern, i.e.:
```
int countBits(unsigned val) {
  int cnt = 0;
  for ( ; (val << cnt) != 0; ++cnt)
    ;
  return cnt;
}
```
but a somewhat more general one:
```
int countBits(unsigned val, int start, int off) {
  int cnt;
  for (cnt = start; val << (cnt + off); cnt++)
    ;
  return cnt;
}
```
alive2 is happy with all the tests there.
Note that, again, much like with the right-shift cases,
we don't require the `val != 0` guard.
This is the last pattern that was supported by
`detectShiftUntilZeroIdiom()`, which now becomes obsolete.
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(signed val) {
  int cnt = 0;
  for ( ; (val >> cnt) != 0; ++cnt)
    ;
  return cnt;
}
```
but a somewhat more general one:
```
int countActiveBits(signed val, int start, int off) {
  int cnt;
  for (cnt = start; val >> (cnt + off); cnt++)
    ;
  return cnt;
}
```
This directly matches the existing 'logical right-shift until zero' idiom.
alive2 is happy with all the tests there.
Note that, again, much like with the original unsigned case,
we don't require the `val != 0` guard.
The old `detectShiftUntilZeroIdiom()` already supports this pattern;
the idea here is that `val` must be positive (have at least one
leading zero), because otherwise the loop is non-terminating,
but since it is not `while(1)`, that would have been UB.
I think I've added exhaustive test coverage, and I have verified that alive2 is happy with all the tests,
so in principle I'm fine with landing this without review, but just in case...
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(unsigned val) {
  int cnt = 0;
  for ( ; (val >> cnt) != 0; ++cnt)
    ;
  return cnt;
}
```
but a somewhat more general one, since that is what I need:
```
int countActiveBits(unsigned val, int start, int off) {
  int cnt;
  for (cnt = start; val >> (cnt + off); cnt++)
    ;
  return cnt;
}
```
I've followed in the footsteps of the 'left-shift until bittest' idiom (D91038),
in the sense that iff the `ctlz` intrinsic is cheap, we'll transform,
regardless of all other factors.
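A sketch (not the pass's code) of the closed form the transform computes, assuming a 32-bit `unsigned` and well-defined shift amounts (0 <= start + off):
```
#include <algorithm>

// activebits(val) = 32 - ctlz(val), with activebits(0) defined as 0;
// the loop's final count is then max(start, activebits(val) - off),
// which is why a cheap ctlz makes the whole loop removable.
int countActiveBitsClosedForm(unsigned val, int start, int off) {
  int activeBits = val ? 32 - __builtin_clz(val) : 0; // GCC/Clang builtin
  return std::max(start, activeBits - off);
}
```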
This can have a shocking effect on certain benchmarks:
```
raw.pixls.us-unique/Olympus/XZ-1$ /repositories/googlebenchmark/tools/compare.py -a benchmarks ~/rawspeed/build-{old,new}/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf
RUNNING: /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmp49_28zcm
2021-05-09T01:06:05+03:00
Running /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench
Run on (32 X 3600.24 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
Load Average: 5.26, 6.29, 3.49
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean 145 ms 145 ms 128 0.145319 0.999981 10.1568M 69.8949M 69.8936M 6.88159 6.88146 0.145322
p1319978.orf/threads:32/process_time/real_time_median 145 ms 145 ms 128 0.145317 0.999986 10.1568M 69.8941M 69.8931M 6.88151 6.88141 0.145319
p1319978.orf/threads:32/process_time/real_time_stddev 0.766 ms 0.766 ms 128 766.586u 15.1302u 0 354.167k 354.098k 0.0348699 0.0348631 766.469u
RUNNING: /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench --benchmark_counters_tabular=true --benchmark_min_time=0.00000001 --benchmark_repetitions=128 p1319978.orf --benchmark_display_aggregates_only=true --benchmark_out=/tmp/tmpwb9sw2x0
2021-05-09T01:06:24+03:00
Running /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Run on (32 X 3599.95 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x16)
L1 Instruction 32 KiB (x16)
L2 Unified 512 KiB (x16)
L3 Unified 32768 KiB (x2)
Load Average: 4.05, 5.95, 3.43
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations CPUTime,s CPUTime/WallTime Pixels Pixels/CPUTime Pixels/WallTime Raws/CPUTime Raws/WallTime WallTime,s
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_mean 99.8 ms 99.8 ms 128 0.0997758 0.999972 10.1568M 101.797M 101.794M 10.0225 10.0222 0.0997786
p1319978.orf/threads:32/process_time/real_time_median 99.7 ms 99.7 ms 128 0.0997165 0.999985 10.1568M 101.857M 101.854M 10.0284 10.0281 0.0997195
p1319978.orf/threads:32/process_time/real_time_stddev 0.224 ms 0.224 ms 128 224.166u 34.345u 0 226.81k 227.231k 0.0223309 0.0223723 224.586u
Comparing /home/lebedevri/rawspeed/build-old/src/utilities/rsbench/rsbench to /home/lebedevri/rawspeed/build-new/src/utilities/rsbench/rsbench
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------
p1319978.orf/threads:32/process_time/real_time_pvalue 0.0000 0.0000 U Test, Repetitions: 128 vs 128
p1319978.orf/threads:32/process_time/real_time_mean -0.3134 -0.3134 145 100 145 100
p1319978.orf/threads:32/process_time/real_time_median -0.3138 -0.3138 145 100 145 100
p1319978.orf/threads:32/process_time/real_time_stddev -0.7073 -0.7078 1 0 1 0
```
Reviewed By: craig.topper, zhuhan0
Differential Revision: https://reviews.llvm.org/D102116
-Make use of the CreateShl/LShr/AShr methods that take a uint64_t,
instead of creating a ConstantInt for 1 ourselves (see the sketch below).
-Use Builder.getInt1 or ConstantInt::getBool instead of a conditional.
-Pull out repeated calls to getType.
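A sketch of the cleanups (illustrative values and variable names):
```
// Before: build the shift amount by hand.
Value *ShlOld = Builder.CreateShl(X, ConstantInt::get(X->getType(), 1));
// After: the uint64_t overload creates the ConstantInt internally.
Value *ShlNew = Builder.CreateShl(X, 1);

// Instead of `Cond ? Builder.getTrue() : Builder.getFalse()`:
Value *AsI1 = Builder.getInt1(Cond);
Constant *AsBool = ConstantInt::getBool(Builder.getContext(), Cond);
```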