When we timeout or exceed a max number of blocks within an inlined
function, we retry with no inlining starting from a node right before
the CallEnter node. We assume the state of that node is the state of the
program before we start evaluating the call. However, the node pruning
removes this node as unimportant.
Teach the node pruning to keep the predecessors of the call enter nodes.
llvm-svn: 157860
improved the pruning heuristics. The current heuristics are pretty good, but they make diagnostics
for uninitialized variables warnings particularly useless in some cases.
llvm-svn: 157734
a horrible bug in GetLazyBindings where we falsely appended a field suffix when traversing 3 or more
layers of lazy bindings. I don't have a reduced test case yet; but I have added the original source
to an internal regression test suite. I'll see about coming up with a reduced test case.
Fixes <rdar://problem/11405978> (for real).
llvm-svn: 156580
to reason about.
As part of taint propagation, we now allow creation of non-integer
symbolic expressions like a cast from int to float.
Addresses PR12511 (radar://11215362).
llvm-svn: 156578
RegionStore, so be explicit about it and generate UnknownVal().
This is a hack to ensure we never produce undefined values for a value
coming from a compound value. (The undefined values can lead to
false positives.)
radar://10127782
llvm-svn: 156446
disruptive, but it allows RegionStore to better "see" through casts that reinterpret arrays of values
as structs. Fixes <rdar://problem/11405978>.
llvm-svn: 156428
This could conceivably cut down on state proliferation, although we don't
use BasicConstraintManager by default anymore. No functionality change.
llvm-svn: 156362
This involves keeping track of three separate types: the symbol type, the
adjustment type, and the comparison type. For example, in "$x + 5 > 0ULL",
if the type of $x is 'signed char', the adjustment type is 'int' and the
comparison type is 'unsigned long long'. Most of the time these three types
will be the same, but we should still do the right thing when the
comparison value is out of range, and wraparound should be calculated in
the adjustment type.
This also re-disables an out-of-bounds test; we were extracting the symbol
from non-additive SymIntExprs, but then throwing away the integer.
Sorry for the large patch; both the basic and range constraint managers needed
to be updated together, since they share code in SimpleConstraintManager.
llvm-svn: 156361
There are more parts of the analyzer that could use the convenience of APSIntType, particularly the constraint engine, but that needs a fair amount of rewriting to handle mixed-type constraints anyway.
llvm-svn: 156360
SValBuilder should return an UnknownVal() when comparison of int and ptr
fails. Previous to this commit, it went on assuming that we are dealing
with pointer arithmetic.
PR12509, radar://11390991
llvm-svn: 156320
The logical change is that the integers in SymIntExprs may not have the same type as the symbols they are paired with. This was already the case with taint-propagation expressions created by SValBuilder::makeSymExprValNN, but I think those integers may never have been used. SimpleSValBuilder should be able to handle mixed-integer-type SymIntExprs fine now, though, and the constraint managers were already being defensive (though not entirely correct). All existing tests pass.
The logic in evalBinOpNN has been simplified so that conversion is done as late as possible. As a result, most of the switch cases have been reduced to do the minimal amount of work, delegating to another case when they can by substituting ConcreteInts and (as before) reversing the left and right arguments when useful.
Comparisons require special handling in two places (building SymIntExprs and evaluating constant-constant operations) because we don't /know/ the best type for comparing the two values. I've approximated the rules in Sema [C99 6.3.1.8] but it'd be nice to refactor Sema's actual algorithm into ASTContext.
This is also groundwork for handling mixed-type constraints better than we do now.
llvm-svn: 156270
We need to identify the value of ptr as
ElementRegion (result of pointer arithmetic) in the following code.
However, before this commit '(2-x)' evaluated to Unknown value, and as
the result, 'p + (2-x)' evaluated to Unknown value as well.
int *p = malloc(sizeof(int));
ptr = p + (2-x);
llvm-svn: 156052
The resulting type info is stored in the SymSymExpr, so no reason not to
support construction of expression with different subexpression types.
llvm-svn: 156051
The change resulted in multiple issues on the buildbot, so it's not
ready for prime time. Only enable history tracking for tainted
data(which is experimental) for now.
llvm-svn: 156049
values through interesting expressions. This allows us to map from interesting values in a caller
to interesting values in a caller, thus recovering some precision in diagnostics lost from IPA.
Fixes <rdar://problem/11327497>
llvm-svn: 155971
reason about the expression.
This essentially keeps more history about how symbolic values were
constructed. As an optimization, previous to this commit, we only kept
the history if one of the symbols was tainted, but it's valuable keep
the history around for other purposes as well: it allows us to avoid
constructing conjured symbols.
Specifically, we need to identify the value of ptr as
ElementRegion (result of pointer arithmetic) in the following code.
However, before this commit '(2-x)' evaluated to Unknown value, and as
the result, 'p + (2-x)' evaluated to Unknown value as well.
int *p = malloc(sizeof(int));
ptr = p + (2-x);
This change brings 2% slowdown on sqlite. Fixes radar://11329382.
llvm-svn: 155944
filter_decl_iterator had a weird mismatch where both op* and op-> returned T*
making it difficult to generalize this filtering behavior into a reusable
library of any kind.
This change errs on the side of value, making op-> return T* and op* return
T&.
(reviewed by Richard Smith)
llvm-svn: 155808
This is needed to ensure that we always report issues in the correct
function. For example, leaks are identified when we call remove dead
bindings. In order to make sure we report a callee's leak in the callee,
we have to run the operation in the callee's context.
This change required quite a bit of infrastructure work since:
- We used to only run remove dead bindings before a given statement;
here we need to run it after the last statement in the function. For
this, we added additional Program Point and special mode in the
SymbolReaper to remove all symbols in context lower than the current
one.
- The call exit operation turned into a sequence of nodes, which are
now guarded by CallExitBegin and CallExitEnd nodes for clarity and
convenience.
(Sorry for the long diff.)
llvm-svn: 155244
attached. Since we do not support any attributes which appertain to a statement
(yet), testing of this is necessarily quite minimal.
Patch by Alexander Kornienko!
llvm-svn: 154723
We should not deserialize unused declarations from the PCH file. Achieve
this by storing the top level declarations during parsing
(HandleTopLevelDecl ASTConsumer callback) and analyzing/building a call
graph only for those.
Tested the patch on a sample ObjC file that uses PCH. With the patch,
the analyzes is 17.5% faster and clang consumes 40% less memory.
Got about 10% overall build/analyzes time decrease on a large Objective
C project.
A bit of CallGraph refactoring/cleanup as well..
llvm-svn: 154625
As per Jordy's review. Creating a symbol here is more flexible; however
I could not come up with an example where it was needed. (What
constrains can be added on of the symbol constrained to 0?)
llvm-svn: 154542
we use the same Expr* as the one being currently visited. This is preparation for transitioning to having
ProgramPoints refer to CFGStmts.
This required a bit of trickery. We wish to keep the old Expr* bindings in the Environment intact,
as plenty of logic relies on it and there is no reason to change it, but we sometimes want the Stmt* for
the ProgramPoint to be different than the Expr* being used for bindings. This requires adding an extra
argument for some functions (e.g., evalLocation). This looks a bit strange for some clients, but
it will look a lot cleaner when were start using CFGStmt* in the appropriate places.
As some fallout, the diagnostics arrows are a bit difference, since some of the node locations have changed.
I have audited these, and they look reasonable.
llvm-svn: 154214
consolidate some commonly used category strings into global references (more of this can be done, I just did a few).
Fixes <rdar://problem/11191537>.
llvm-svn: 154121
Store this info inside the function summary generated for all analyzed
functions. This is useful for coverage stats and can be helpful for
analyzer state space search strategies.
llvm-svn: 153923
properly reason about such accesses, but we shouldn't emit bogus "uninitialized value" warnings
either. Fixes <rdar://problem/11127008>.
llvm-svn: 153913
Fixes a false positive (radar://11152419). The current solution of
adding the info into 3 places is quite ugly. Pending a generic pointer
escapes callback.
llvm-svn: 153731
count.
This is an optimization for "retry without inlining" option. Here, if we
failed to inline a function due to reaching the basic block max count,
we are going to store this information and not try to inline it
again in the translation unit. This can be viewed as a function summary.
On sqlite, with this optimization, we are 30% faster then before and
cover 10% more basic blocks (partially because the number of times we
reach timeout is decreased by 20%).
llvm-svn: 153730
The analyzer gives up path exploration under certain conditions. For
example, when the same basic block has been visited more than 4 times.
With inlining turned on, this could lead to decrease in code coverage.
Specifically, if we give up inside the inlined function, the rest of
parent's basic blocks will not get analyzed.
This commit introduces an option to enable re-run along the failed path,
in which we do not inline the last inlined call site. This is done by
enqueueing the node before the processing of the inlined call site
with a special policy encoded in the state. The policy tells us not to
inline the call site along the path.
This lead to ~10% increase in the number of paths analyzed. Even though
we expected a much greater coverage improvement.
The option is turned off by default for now.
llvm-svn: 153534
This required adding a change count token to BugReport, but also allowed us to ditch ImmutableList as the BugReporterVisitor data type.
Also, remove the hack from MallocChecker, now that visitors appear in the opposite order. This is not exactly a fix, but the common case -- custom diagnostics after generic ones -- is now the default behavior.
llvm-svn: 153369
Specifically, we use the last store of the leaked symbol in the leak diagnostic.
(No support for struct fields since the malloc checker doesn't track those
yet.)
+ Infrastructure to track the regions used in store evaluations.
This approach is more precise than iterating the store to
obtain the region bound to the symbol, which is used in RetainCount
checker. The region corresponds to what is uttered in the code in the
last store and we do not rely on the store implementation to support
this functionality.
llvm-svn: 153212
The symbol-aware stack hint combines the checker-provided message
with the information about how the symbol was passed to the callee: as
a parameter or a return value.
For malloc, the generated messages look like this :
"Returning from 'foo'; released memory via 1st parameter"
"Returning from 'foo'; allocated memory via 1st parameter"
"Returning from 'foo'; allocated memory returned"
"Returning from 'foo'; reallocation of 1st parameter failed"
(We are yet to handle cases when the symbol is a field in a struct or
an array element.)
llvm-svn: 152962
The only use of AggExprVisitor was in #if 0 code (the analyzer's incomplete C++ support), so there is no actual behavioral change anyway.
llvm-svn: 152856
BugVisitor DiagnosticPieces.
When checkers create a DiagnosticPieceEvent, they can supply an extra
string, which will be concatenated with the call exit message for every
call on the stack between the diagnostic event and the final bug report.
(This is a simple version, which could be/will be further enhanced.)
For example, this is used in Malloc checker to produce the ",
which allocated memory" in the following example:
static char *malloc_wrapper() { // 2. Entered call from 'use'
return malloc(12); // 3. Memory is allocated
}
void use() {
char *v;
v = malloc_wrapper(); // 1. Calling 'malloc_wrappers'
// 4. Returning from 'malloc_wrapper', which allocated memory
} // 5. Memory is never released; potential
memory leak
llvm-svn: 152837
track whether the referenced declaration comes from an enclosing
local context. I'm amenable to suggestions about the exact meaning
of this bit.
llvm-svn: 152491
as aborted, but didn't treat such cases as sinks in the ExplodedGraph.
Along the way, add basic support for CXXCatchStmt, expanding the set of code we actually analyze (hopefully correctly).
Fixes: <rdar://problem/10892489>
llvm-svn: 152468
We do not reanalyze a function, which has already been analyzed as an
inlined callee. As per PRELIMINARY testing, this gives over
50% run time reduction on some benchmarks without decreasing of the
number of bugs found.
Turning the mode on by default.
llvm-svn: 152440
Essentially, a bug centers around a story for various symbols and regions. We should only include
the path diagnostic events that relate to those symbols and regions.
The pruning is done by associating a set of interesting symbols and regions with a BugReporter, which
can be modified at BugReport creation or by BugReporterVisitors.
This patch reduces the diagnostics emitted in several of our test cases. I've vetted these as
having desired behavior. The only regression is a missing null check diagnostic for the return
value of realloc() in test/Analysis/malloc-plist.c. This will require some investigation to fix,
and I have added a FIXME to the test case.
llvm-svn: 152361