with and without -g.
Adding a test case to make sure that the threshold used in the memory
dependence analysis is respected. The test case also checks that debug
intrinsics are not counted towards this threshold.
Differential Revision: http://llvm-reviews.chandlerc.com/D2141
llvm-svn: 194646
BitVector/SmallBitVector::reference::operator bool remain implicit since
they model more exactly a bool, rather than something else that can be
boolean tested.
The most common (non-buggy) case are where such objects are used as
return expressions in bool-returning functions or as boolean function
arguments. In those cases I've used (& added if necessary) a named
function to provide the equivalent (or sometimes negative, depending on
convenient wording) test.
One behavior change (YAMLParser) was made, though no test case is
included as I'm not sure how to reach that code path. Essentially any
comparison of llvm::yaml::document_iterators would be invalid if neither
iterator was at the end.
This helped uncover a couple of bugs in Clang - test cases provided for
those in a separate commit along with similar changes to `operator bool`
instances in Clang.
llvm-svn: 181868
PR15000 has a testcase where the time to compile was bordering on 30s. When I
dropped the limit value to 100, it became a much more managable 6s. The compile
time seems to increase in a roughly linear fashion based on increasing the limit
value. (See the runtimes below.)
So, let's lower the limit to 100 so that they can get a more reasonable compile
time.
Limit Value Time
----------- ----
10 0.9744s
20 1.8035s
30 2.3618s
40 2.9814s
50 3.6988s
60 4.5486s
70 4.9314s
80 5.8012s
90 6.4246s
100 7.0852s
110 7.6634s
120 8.3553s
130 9.0552s
140 9.6820s
150 9.8804s
160 10.8901s
170 10.9855s
180 12.0114s
190 12.6816s
200 13.2754s
210 13.9942s
220 13.8097s
230 14.3272s
240 15.7753s
250 15.6673s
260 16.0541s
270 16.7625s
280 17.3823s
290 18.8213s
300 18.6120s
310 20.0333s
320 19.5165s
330 20.2505s
340 20.7068s
350 21.1833s
360 22.9216s
370 22.2152s
380 23.9390s
390 23.4609s
400 24.0426s
410 24.6410s
420 26.5208s
430 27.7155s
440 26.4142s
450 28.5646s
460 27.3494s
470 29.7255s
480 29.4646s
490 30.5001s
llvm-svn: 179713
The "invariant.load" metadata indicates the memory unit being accessed is immutable.
A load annotated with this metadata can be moved across any store.
As I am not sure if it is legal to move such loads across barrier/fence, this
change dose not allow such transformation.
rdar://11311484
Thank Arnold for code review.
llvm-svn: 176562
These are two related changes (one in llvm, one in clang).
LLVM:
- rename address_safety => sanitize_address (the enum value is the same, so we preserve binary compatibility with old bitcode)
- rename thread_safety => sanitize_thread
- rename no_uninitialized_checks -> sanitize_memory
CLANG:
- add __attribute__((no_sanitize_address)) as a synonym for __attribute__((no_address_safety_analysis))
- add __attribute__((no_sanitize_thread))
- add __attribute__((no_sanitize_memory))
for S in address thread memory
If -fsanitize=S is present and __attribute__((no_sanitize_S)) is not
set llvm attribute sanitize_S
llvm-svn: 176075
into their new header subdirectory: include/llvm/IR. This matches the
directory structure of lib, and begins to correct a long standing point
of file layout clutter in LLVM.
There are still more header files to move here, but I wanted to handle
them in separate commits to make tracking what files make sense at each
layer easier.
The only really questionable files here are the target intrinsic
tablegen files. But that's a battle I'd rather not fight today.
I've updated both CMake and Makefile build systems (I think, and my
tests think, but I may have missed something).
I've also re-sorted the includes throughout the project. I'll be
committing updates to Clang, DragonEgg, and Polly momentarily.
llvm-svn: 171366
directly.
This is in preparation for removing the use of the 'Attribute' class as a
collection of attributes. That will shift to the AttributeSet class instead.
llvm-svn: 171253
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
loads. It's not really profitable and may result in GVN going into an infinite
loop when it hits constructs like this:
%x = gep %some.type %x, ...
Found via an LTO build of LLVM.
llvm-svn: 166490
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
If an allocation has a must-alias relation to the access pointer, we treat it
as a Def. Otherwise, without this check, the code here was just skipping over
the allocation call and ignoring it. I noticed this by inspection and don't
have a specific testcase that it breaks, but it seems like we need to treat
a may-alias allocation as a Clobber.
llvm-svn: 163127
This code used to only handle malloc-like calls, which do not read memory.
r158919 changed it to check isNoAliasFn(), which includes strdup-like and
realloc-like calls, but it was not checking for dependencies on the memory
read by those calls.
llvm-svn: 163106
This disables malloc-specific optimization when -fno-builtin (or -ffreestanding)
is specified. This has been a problem for a long time but became more severe
with the recent memory builtin improvements.
Since the memory builtin functions are used everywhere, this required passing
TLI in many places. This means that functions that now have an optional TLI
argument, like RecursivelyDeleteTriviallyDeadFunctions, won't remove dead
mallocs anymore if the TLI argument is missing. I've updated most passes to do
the right thing.
Fixes PR13694 and probably others.
llvm-svn: 162841
Currently, if GetLocation reports that it did not find a valid pointer (this is the case for volatile load/stores),
we ignore the result. This patch adds code to handle the cases where we did not obtain a valid pointer.
rdar://11872864 PR12899
llvm-svn: 161802
- provide more extensive set of functions to detect library allocation functions (e.g., malloc, calloc, strdup, etc)
- provide an API to compute the size and offset of an object pointed by
Move a few clients (GVN, AA, instcombine, ...) to the new API.
This implementation is a lot more aggressive than each of the custom implementations being replaced.
Patch reviewed by Nick Lewycky and Chandler Carruth, thanks.
llvm-svn: 158919
so that it can be reused in MemCpyOptimizer. This analysis is needed to remove
an unnecessary memcpy when returning a struct into a local variable.
rdar://11341081
PR12686
llvm-svn: 156776
captured. This allows the tracker to look at the specific use, which may be
especially interesting for function calls.
Use this to fix 'nocapture' deduction in FunctionAttrs. The existing one does
not iterate until a fixpoint and does not guarantee that it produces the same
result regardless of iteration order. The new implementation builds up a graph
of how arguments are passed from function to function, and uses a bottom-up walk
on the argument-SCCs to assign nocapture. This gets us nocapture more often, and
does so rather efficiently and independent of iteration order.
llvm-svn: 147327
and stores capture) to permit the caller to see each capture point and decide
whether to continue looking.
Use this inside memdep to do an analysis that basicaa won't do. This lets us
solve another devirtualization case, fixing PR8908!
llvm-svn: 144580
The limit in this patch is probably too high, but it is enough to stop DSE from going completely insane on a testcase I have (which has a single block with around 50,000 non-aliasing stores in it).
rdar://9471075
llvm-svn: 133111
redundant with partially-aliasing loads.
When computing what portion of a clobbering load value is needed,
it doesn't consider phi-translation which may have occurred
between the clobbing load and the redundant load.
llvm-svn: 132631
In the given testcase, the "Clobber" was pointing to a load, and GVN was incorrectly assuming that meant that the "Clobber" load overlapped the load being analyzed (when they are actually unrelated).
The included testcase tests both this commit and r132434.
Part two of rdar://9429882. (r132434 was mislabeled.)
llvm-svn: 132442
wider load would allow elimination of subsequent loads, and when the wider
load is still a native integer type. This eliminates a ton of loads on
various benchmarks involving struct fields, though it is somewhat hobbled
by clang not being very aggressive about field alignment.
This is yet another step along the way towards resolving PR6627.
llvm-svn: 130390
an earlier load could be widened to encompass a later load. For example,
if we see:
X = load i8* P, align 4
Y = load i8* (P+3), align 1
and we have a 32-bit native integer type, we can widen the former load
to i32 which then makes the second load redundant. GVN can't actually
do anything with this load/load relation yet, so this isn't testable, but
it is the next step to resolving PR6627, and a fairly general class of
"merge neighboring loads" missed optimizations.
llvm-svn: 130250
return it as a clobber. This allows GVN to do smart things.
Enhance GVN to be smart about the case when a small load is clobbered
by a larger overlapping load. In this case, forward the value. This
allows us to compile stuff like this:
int test(void *P) {
int tmp = *(unsigned int*)P;
return tmp+*((unsigned char*)P+1);
}
into:
_test: ## @test
movl (%rdi), %ecx
movzbl %ch, %eax
addl %ecx, %eax
ret
which has one load. We already handled the case where the smaller
load was from a must-aliased base pointer.
llvm-svn: 130180
with BasicAA's DecomposeGEPExpression, which recently began
using a TargetData. This fixes PR8968, though the testcase
is awkward to reduce.
Also, update several off GetUnderlyingObject's users
which happen to have a TargetData handy to pass it in.
llvm-svn: 124134
references. For example, this allows gvn to eliminate the load in
this example:
void foo(int n, int* p, int *q) {
p[0] = 0;
p[1] = 1;
if (n) {
*q = p[0];
}
}
llvm-svn: 118714
must be called in the pass's constructor. This function uses static dependency declarations to recursively initialize
the pass's dependencies.
Clients that only create passes through the createFooPass() APIs will require no changes. Clients that want to use the
CommandLine options for passes will need to manually call the appropriate initialization functions in PassInitialization.h
before parsing commandline arguments.
I have tested this with all standard configurations of clang and llvm-gcc on Darwin. It is possible that there are problems
with the static dependencies that will only be visible with non-standard options. If you encounter any crash in pass
registration/creation, please send the testcase to me directly.
llvm-svn: 116820
perform initialization without static constructors AND without explicit initialization
by the client. For the moment, passes are required to initialize both their
(potential) dependencies and any passes they preserve. I hope to be able to relax
the latter requirement in the future.
llvm-svn: 116334
response from getModRefInfo is not useful here. Instead, check for identical
calls only in the NoModRef case.
Reapply r110270, and strengthen it to compensate for the memdep changes.
When both calls are readonly, there is no dependence between them.
llvm-svn: 110382
with a fix for self-hosting
rotate CallInst operands, i.e. move callee to the back
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101465
with a fix
rotate CallInst operands, i.e. move callee to the back
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101397
of the operand array
the motivation for this patch are laid out in my mail to llvm-commits:
more efficient access to operands and callee, faster callgraph-construction,
smaller compiler binary
llvm-svn: 101364
argument is non-null, pass it along to PHITranslateSubExpr so that it can
prefer using existing values that dominate the PredBB, instead of just
blindly picking the first equivalent value that it finds on a uselist.
Also when the DominatorTree is specified, have PHITranslateValue filter
out any result that does not dominate the PredBB. This is basically just
refactoring the check that used to be in GetAvailablePHITranslatedSubExpr
and also in GVN.
Despite my initial expectations, this change does not affect the results
of GVN for any testcases that I could find, but it should help compile time.
Before this change, if PHITranslateSubExpr picked a value that does not
dominate, PHITranslateWithInsertion would then insert a new value, which GVN
would later determine to be redundant and would replace. By picking a good
value to begin with, we save GVN the extra work of inserting and then
replacing a new value.
llvm-svn: 97010
instead of stored. This reduces memdep memory usage, and also eliminates a bunch of
weakvh's. This speeds up gvn on gcc.c-torture/20001226-1.c from 23.9s to 8.45s (2.8x)
on a different machine than earlier.
llvm-svn: 91885
cache a pointer as being unavailable due to phi trans in the
wrong place. This would cause later queries to fail even when
they didn't involve phi trans.
llvm-svn: 91787
phi translation of complex expressions like &A[i+1]. This has the
following benefits:
1. The phi translation logic is all contained in its own class with
a strong interface and verification that it is self consistent.
2. The logic is more correct than before. Previously, if intermediate
expressions got PHI translated, we'd miss the update and scan for
the wrong pointers in predecessor blocks. @phi_trans2 is a testcase
for this.
3. We have a lot less code in memdep.
We can handle phi translation across blocks of things like @phi_trans3,
which is pretty insane :).
This patch should fix the miscompiles of 255.vortex, and I tested it
with a bootstrap of llvm-gcc, llvm-test and dejagnu of course.
llvm-svn: 90926
was being added to the Result vector, but not being put in the
cache. This means that if the cache was reused wholesale for a
later query that it would be missing this entry and we'd do an
incorrect load elimination.
Unfortunately, it's not really possible to write a useful
testcase for this, but this unbreaks 255.vortex.
llvm-svn: 90093
if we don't have an address expression available in a predecessor,
then model this as the value being clobbered at the end of the pred
block instead of being modeled as a complete phi translation failure.
This is important for PRE of loads because we want to see that the
load is available in all but this predecessor, and complete phi
translation failure results in not getting any information about
predecessors.
This doesn't do anything until I renable code insertion since PRE
now sees that it is available in all but one predecessors, but can't
insert the addressing in the predecessor that is missing it to
eliminate the redundancy.
llvm-svn: 90037
translation of add with immediate. This allows us
to optimize this function:
void test(int N, double* G) {
long j;
G[1] = 1;
for (j = 1; j < N - 1; j++)
G[j+1] = G[j] + G[j+1];
}
to only do one load every iteration of the loop.
llvm-svn: 90013
Update all analysis passes and transforms to treat free calls just like FreeInst.
Remove RaiseAllocations and all its tests since FreeInst no longer needs to be raised.
llvm-svn: 84987
so that all code paths get it. PR4256 was about a case where the
phi translation loop would find all preds in the Visited cache, so
it could get by without re-sorting the NonLocalPointerDeps cache.
Fix this by resorting it earlier, there is no reason not to do this.
This patch inspired by Jakub Staszak's patch.
llvm-svn: 75476
This avoids using a dangling pointer.
Reset NumSortedEntries after restoring Cache to avoid extraneous sorts.
This fixes the reduced sqlite3 testcase, but apparently not the whole app.
llvm-svn: 62838
analyses could be run without the caches properly sorted. This
can fix all sorts of weirdness. Many thanks to Bill for coming
up with the 'issorted' verification idea.
llvm-svn: 62757
visited set before they are used. If used, their blocks need to be
added to the visited set so that subsequent queries don't use conflicting
pointer values in the cache result blocks.
llvm-svn: 61080
memdep keeps track of how PHIs affect the pointer in dep queries, which
allows it to eliminate the load in cases like rle-phi-translate.ll, which
basically end up being:
BB1:
X = load P
br BB3
BB2:
Y = load Q
br BB3
BB3:
R = phi [P] [Q]
load R
turning "load R" into a phi of X/Y. In addition to additional exposed
opportunities, this makes memdep safe in many cases that it wasn't before
(which is required for load PRE) and also makes it substantially more
efficient. For example, consider:
bb1: // has many predecessors.
P = some_operator()
load P
In this example, previously memdep would scan all the predecessors of BB1
to see if they had something that would mustalias P. In some cases (e.g.
test/Transforms/GVN/rle-must-alias.ll) it would actually find them and end
up eliminating something. In many other cases though, it would scan and not
find anything useful. MemDep now stops at a block if the pointer is defined
in that block and cannot be phi translated to predecessors. This causes it
to miss the (rare) cases like rle-must-alias.ll, but makes it faster by not
scanning tons of stuff that is unlikely to be useful. For example, this
speeds up GVN as a whole from 3.928s to 2.448s (60%)!. IMO, scalar GVN
should be enhanced to simplify the rle-must-alias pointer base anyway, which
would allow the loads to be eliminated.
In the future, this should be enhanced to phi translate through geps and
bitcasts as well (as indicated by FIXMEs) making memdep even more powerful.
llvm-svn: 61022
of a pointer. This allows is to catch more equivalencies. For example,
the type_lists_compatible_p function used to require two iterations of
the gvn pass (!) to delete its 18 redundant loads because the first pass
would CSE all the addressing computation cruft, which would unblock the
second memdep/gvn passes from recognizing them. This change allows
memdep/gvn to catch all 18 when run just once on the function (as is
typical :) instead of just 3.
On all of 403.gcc, this bumps up the # reundandancies found from:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
to:
63 gvn - Number of instructions PRE'd
154137 gvn - Number of instructions deleted
50185 gvn - Number of loads deleted
+120 loads deleted isn't bad.
llvm-svn: 60799
tricks based on readnone/readonly functions.
Teach memdep to look past readonly calls when analyzing
deps for a readonly call. This allows elimination of a
few more calls from 403.gcc:
before:
63 gvn - Number of instructions PRE'd
153986 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
after:
63 gvn - Number of instructions PRE'd
153991 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
5 calls isn't much, but this adds plumbing for the next change.
llvm-svn: 60794
load dependence queries. This allows GVN to eliminate a few more
instructions on 403.gcc:
152598 gvn - Number of instructions deleted
49240 gvn - Number of loads deleted
after:
153986 gvn - Number of instructions deleted
50069 gvn - Number of loads deleted
llvm-svn: 60786
the first block of a query specially. This makes the "complete query
caching" subsystem more effective, avoiding predecessor queries. This
speeds up GVN another 4%.
llvm-svn: 60752
track of whether the CachedNonLocalPointerInfo for a block is specific
to a block. If so, just return it without any pred scanning. This is
good for a 6% speedup on GVN (when it uses this lookup method, which
it doesn't right now).
llvm-svn: 60695
method. This will eventually take over load/store dep
queries from getNonLocalDependency. For now it works
fine, but is incredibly slow because it does no caching.
Lets not switch GVN to use it until that is fixed :)
llvm-svn: 60649
clobber with the current implementation. Instead of returning
a "precise clobber" just return a fuzzy one. This doesn't
matter to any clients anyway and should speed up analysis time
very very slightly.
llvm-svn: 60641
1. Merge the 'None' result into 'Normal', making loads
and stores return their dependencies on allocations as Normal.
2. Split the 'Normal' result into 'Clobber' and 'Def' to
distinguish between the cases when memdep knows the value is
produced from when we just know if may be changed.
3. Move some of the logic for determining whether readonly calls
are CSEs into memdep instead of it being in GVN. This still
leaves verification that the arguments are hte same to GVN to
let it know about value equivalences in different contexts.
4. Change memdep's call/call dependency analysis to use
getModRefInfo(CallSite,CallSite) instead of doing something
very weak. This only really matters for things like DSA, but
someday maybe we'll have some other decent context sensitive
analyses :)
5. This reimplements the guts of memdep to handle the new results.
6. This simplifies GVN significantly:
a) readonly call CSE is slightly simpler
b) I eliminated the "getDependencyFrom" chaining for load
elimination and load CSE doesn't have to worry about
volatile (they are always clobbers) anymore.
c) GVN no longer does any 'lastLoad' caching, leaving it to
memdep.
7. The logic in DSE is simplified a bit and sped up. A potentially
unsafe case was eliminated.
llvm-svn: 60607
vector instead of a densemap. This shrinks the memory usage of this thing
substantially (the high water mark) as well as making operations like
scanning it faster. This speeds up memdep slightly, gvn goes from
3.9376 to 3.9118s on 403.gcc
This also splits out the statistics for the cached non-local case to
differentiate between the dirty and clean cached case. Here's the stats
for 403.gcc:
6153 memdep - Number of dirty cached non-local responses
169336 memdep - Number of fully cached non-local responses
162428 memdep - Number of uncached non-local responses
yay for caching :)
llvm-svn: 60313
ReverseLocalDeps when we update it. This fixes a regression test
failure from my last commit.
Second, for each non-local cached information structure, keep a bit that
indicates whether it is dirty or not. This saves us a scan over the whole
thing in the common case when it isn't dirty.
llvm-svn: 60274
instead of containing them by value. This increases the density
(!) of NonLocalDeps as well as making the reallocation case
faster. This speeds up gvn on 403.gcc by 2% and makes room for
future improvements.
I'm not super thrilled with having to explicitly manage the new/delete
of the map, but it is necesary for the next change.
llvm-svn: 60271
If we see that a load depends on the allocation of its memory with no
intervening stores, we now return a 'None' depedency instead of "Normal".
This tweaks GVN to do its optimization with the new result.
llvm-svn: 60267