During my work on analyzer dependencies, I created a great amount of new
checkers that emitted no diagnostics at all, and were purely modeling some
function or another.
However, the user shouldn't really disable/enable these by hand, hence this
patch, which hides these by default. I intentionally chose not to hide alpha
checkers, because they have a scary enough name, in my opinion, to cause no
surprise when they emit false positives or cause crashes.
The patch introduces the Hidden bit into the TableGen files (you may remember
it before I removed it in D53995), and checkers that are either marked as
hidden, or are in a package that is marked hidden won't be displayed under
-analyzer-checker-help. -analyzer-checker-help-hidden, a new flag meant for
developers only, displays the full list.
Differential Revision: https://reviews.llvm.org/D60925
llvm-svn: 359720
Currently we always inline functions that have no branches, i.e. have exactly
three CFG blocks: ENTRY, some code, EXIT. This makes sense because when there
are no branches, it means that there's no exponential complexity introduced
by inlining such function. Such functions also don't trigger various fundamental
problems with our inlining mechanism, such as the problem of inlined
defensive checks.
Sometimes the CFG may contain more blocks, but in practice it still has
linear structure because all directions (except, at most, one) of all branches
turned out to be unreachable. When this happens, still treat the function
as "small". This is useful, in particular, for dealing with C++17 if constexpr.
Differential Revision: https://reviews.llvm.org/D61051
llvm-svn: 359531
the assertion is in fact incorrect: there is a cornercase in Objective-C++
in which a C++ object is not constructed with a constructor, but merely
zero-initialized. Namely, this happens when an Objective-C message is sent
to a nil and it is supposed to return a C++ object.
Differential Revision: https://reviews.llvm.org/D60988
llvm-svn: 359262
If macro "CHECK_X(x)" expands to something like "if (x != NULL) ...",
the "Assuming..." note no longer says "Assuming 'x' is equal to CHECK_X".
Differential Revision: https://reviews.llvm.org/D59121
llvm-svn: 359037
Summary:
The existing CTU mechanism imports `FunctionDecl`s where the definition is available in another TU. This patch extends that to VarDecls, to bind more constants.
- Add VarDecl importing functionality to CrossTranslationUnitContext
- Import Decls while traversing them in AnalysisConsumer
- Add VarDecls to CTU external mappings generator
- Name changes from "external function map" to "external definition map"
Reviewers: NoQ, dcoughlin, xazax.hun, george.karpenkov, martong
Reviewed By: xazax.hun
Subscribers: Charusso, baloghadamsoftware, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, george.karpenkov, mgorny, whisperity, szepet, rnkovacs, a.sidorin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D46421
llvm-svn: 358968
When growing a body on a body farm, it's essential to use the same redeclaration
of the function that's going to be used during analysis. Otherwise our
ParmVarDecls won't match the ones that are used to identify argument regions.
This boils down to trusting the reasoning in AnalysisDeclContext. We shouldn't
canonicalize the declaration before farming the body because it makes us not
obey the sophisticated decision-making process of AnalysisDeclContext.
Differential Revision: https://reviews.llvm.org/D60899
llvm-svn: 358946
Stuffing invalid source locations (such as those in functions produced by
body farms) into path diagnostics causes crashes.
Fix a typo in a nearby function name.
Differential Revision: https://reviews.llvm.org/D60808
llvm-svn: 358945
Default RegionStore bindings represent values that can be obtained by loading
from anywhere within the region, not just the specific offset within the region
that they are said to be bound to. For example, default-binding a character \0
to an int (eg., via memset()) means that the whole int is 0, not just
that its lower byte is 0.
Even though memset and bzero were modeled this way, it didn't work correctly
when applied to simple variables. Eg., in
int x;
memset(x, 0, sizeof(x));
we did produce a default binding, but were unable to read it later, and 'x'
was perceived as an uninitialized variable even after memset.
At the same time, if we replace 'x' with a variable of a structure or array
type, accessing fields or elements of such variable was working correctly,
which was enough for most cases. So this was only a problem for variables of
simple integer/enumeration/floating-point/pointer types.
Fix loading default bindings from RegionStore for regions of simple variables.
Add a unit test to document the API contract as well.
Differential Revision: https://reviews.llvm.org/D60742
llvm-svn: 358722
Writing stuff into an argument variable is usually equivalent to writing stuff
to a local variable: it will have no effect outside of the function.
There's an important exception from this rule: if the argument variable has
a non-trivial destructor, the destructor would be invoked on
the parent stack frame, exposing contents of the otherwise dead
argument variable to the caller.
If such argument is the last place where a pointer is stored before the function
exits and the function is the one we've started our analysis from (i.e., we have
no caller context for it), we currently diagnose a leak. This is incorrect
because the destructor of the argument still has access to the pointer.
The destructor may deallocate the pointer or even pass it further.
Treat writes into such argument regions as "escapes" instead, suppressing
spurious memory leak reports but not messing with dead symbol removal.
Differential Revision: https://reviews.llvm.org/D60112
llvm-svn: 358321
The idea behind this heuristic is that normally the visitor is there to
inform the user that a certain function may fail to initialize a certain
out-parameter. For system header functions this is usually dictated by the
contract, and it's unlikely that the header function has accidentally
forgot to put the value into the out-parameter; it's more likely
that the user has intentionally skipped the error check.
Warnings on skipped error checks are more like security warnings;
they aren't necessarily useful for all users, and they should instead
be introduced on a per-API basis.
Differential Revision: https://reviews.llvm.org/D60107
llvm-svn: 357810
Requires making the llvm::MemoryBuffer* stored by SourceManager const,
which in turn requires making the accessors for that return const
llvm::MemoryBuffer*s and updating all call sites.
The original motivation for this was to use it and fix the TODO in
CodeGenAction.cpp's ConvertBackendLocation() by using the UnownedTag
version of createFileID, and since llvm::SourceMgr* hands out a const
llvm::MemoryBuffer* this is required. I'm not sure if fixing the TODO
this way actually works, but this seems like a good change on its own
anyways.
No intended behavior change.
Differential Revision: https://reviews.llvm.org/D60247
llvm-svn: 357724
It turns out that SourceManager::isInSystemHeader() crashes when an invalid
source location is passed into it. Invalid source locations are relatively
common: not only they come from body farms, but also, say, any function in C
that didn't come with a forward declaration would have an implicit
forward declaration with invalid source locations.
There's a more comfy API for us to use in the Static Analyzer:
CallEvent::isInSystemHeader(), so just use that.
Differential Revision: https://reviews.llvm.org/D59901
llvm-svn: 357329
It is now an inter-checker communication API, similar to the one that
connects MallocChecker/CStringChecker/InnerPointerChecker: simply a set of
setters and getters for a state trait.
Differential Revision: https://reviews.llvm.org/D59861
llvm-svn: 357326
The transfer function for the CFG element that represents a logical operation
computes the value of the operation and does nothing else. The element
appears after all the short circuit decisions were made, so they don't need
to be made again at this point.
Because our expression evaluation is imprecise, it is often hard to
discriminate between:
(1) we don't know the value of the RHS because we failed to evaluate it
and
(2) we don't know the value of the RHS because it didn't need to be evaluated.
This is hard because it depends on our knowledge about the value of the LHS
(eg., if LHS is true, then RHS in (LHS || RHS) doesn't need to be computed)
but LHS itself may have been evaluated imprecisely and we don't know whether
it is true or not. Additionally, the Analyzer wouldn't necessarily even remember
what the value of the LHS was because theoretically it's not really necessary
to know it for any future evaluations.
In order to work around these issues, the transfer function for logical
operations consists in looking at the ExplodedGraph we've constructed so far
in order to figure out from which CFG direction did we arrive here.
Such post-factum backtracking that doesn't involve looking up LHS and RHS values
is usually possible. However sometimes it fails because when we deduplicate
exploded nodes with the same program point and the same program state we may end
up in a situation when we reached the same program point from two or more
different directions.
By removing the assertion, we admit that the procedure indeed sometimes fails to
work. When it fails, we also admit that we don't know the value of the logical
operator.
Differential Revision: https://reviews.llvm.org/D59857
llvm-svn: 357325
Almost all path-sensitive checkers need to tell the user when something specific
to that checker happens along the execution path but does not constitute a bug
on its own. For instance, a call to operator delete in C++ has consequences
that are specific to a use-after-free bug. Deleting an object is not a bug
on its own, but when the Analyzer finds an execution path on which a deleted
object is used, it'll have to explain to the user when exactly during that path
did the deallocation take place.
Historically such custom notes were added by implementing "bug report visitors".
These visitors were post-processing bug reports by visiting every ExplodedNode
along the path and emitting path notes whenever they noticed that a change that
is relevant to a bug report occurs within the program state. For example,
it emits a "memory is deallocated" note when it notices that a pointer changes
its state from "allocated" to "deleted".
The "visitor" approach is powerful and efficient but hard to use because
such preprocessing implies that the developer first models the effects
of the event (say, changes the pointer's state from "allocated" to "deleted"
as part of operator delete()'s transfer function) and then forgets what happened
and later tries to reverse-engineer itself and figure out what did it do
by looking at the report.
The proposed approach tries to avoid discarding the information that was
available when the transfer function was evaluated. Instead, it allows the
developer to capture all the necessary information into a closure that
will be automatically invoked later in order to produce the actual note.
This should reduce boilerplate and avoid very painful logic duplication.
On the technical side, the closure is a lambda that's put into a special kind of
a program point tag, and a special bug report visitor visits all nodes in the
report and invokes all note-producing closures it finds along the path.
For now it is up to the lambda to make sure that the note is actually relevant
to the report. For instance, a memory deallocation note would be irrelevant when
we're reporting a division by zero bug or if we're reporting a use-after-free
of a different, unrelated chunk of memory. The lambda can figure these thing out
by looking at the bug report object that's passed into it.
A single checker is refactored to make use of the new functionality: MIGChecker.
Its program state is trivial, making it an easy testing ground for the first
version of the API.
Differential Revision: https://reviews.llvm.org/D58367
llvm-svn: 357323
Since rL335814, if the constraint manager cannot find a range set for `A - B`
(where `A` and `B` are symbols) it looks for a range for `B - A` and returns
it negated if it exists. However, if a range set for both `A - B` and `B - A`
is stored then it only returns the first one. If we both use `A - B` and
`B - A`, these expressions behave as two totally unrelated symbols. This way
we miss some useful deductions which may lead to false negatives or false
positives.
This tiny patch changes this behavior: if the symbolic expression the
constraint manager is looking for is a difference `A - B`, it tries to
retrieve the range for both `A - B` and `B - A` and if both exists it returns
the intersection of range `A - B` and the negated range of `B - A`. This way
every time a checker applies new constraints to the symbolic difference or to
its negated it always affects both the original difference and its negated.
Differential Revision: https://reviews.llvm.org/D55007
llvm-svn: 357167
r356634 didn't fix all the problems caused by r356222 - even though simple
constructors involving transparent init-list expressions are now evaluated
precisely, many more complicated constructors aren't, for other reasons.
The attached test case is an example of a constructor that will never be
evaluated precisely - simply because there isn't a constructor there (instead,
the program invokes run-time undefined behavior by returning without a return
statement that should have constructed the return value).
Fix another part of the problem for such situations: evaluate transparent
init-list expressions transparently, so that to avoid creating ill-formed
"transparent" nonloc::CompoundVals.
Differential Revision: https://reviews.llvm.org/D59622
llvm-svn: 356969
Summary:
If the constraint information is not changed between two program states the
analyzer has not learnt new information and made no report. But it is
possible to happen because we have no information at all. The new approach
evaluates the condition to determine if that is the case and let the user
know we just `Assuming...` some value.
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: llvm-commits, xazax.hun, baloghadamsoftware, szepet, a.sidorin,
mikhail.ramalho, Szelethus, donat.nagy, dkrupp, gsd, gerazo
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D57410
llvm-svn: 356323
Summary:
Removed the `GDM` checking what could prevent reports made by this visitor.
Now we rely on constraint changes instead.
(It reapplies 356318 with a feature from 356319 because build-bot failure.)
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: cfe-commits, jdoerfert, gerazo, xazax.hun, baloghadamsoftware,
szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp
Tags: #clang
Differential Revision: https://reviews.llvm.org/D54811
llvm-svn: 356322
Summary: If the constraint information is not changed between two program states the analyzer has not learnt new information and made no report. But it is possible to happen because we have no information at all. The new approach evaluates the condition to determine if that is the case and let the user know we just 'Assuming...' some value.
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: xazax.hun, baloghadamsoftware, szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, gsd, gerazo
Tags: #clang
Differential Revision: https://reviews.llvm.org/D57410
llvm-svn: 356319
Summary: Removed the `GDM` checking what could prevent reports made by this visitor. Now we rely on constraint changes instead.
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: jdoerfert, gerazo, xazax.hun, baloghadamsoftware, szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp
Tags: #clang
Differential Revision: https://reviews.llvm.org/D54811
llvm-svn: 356318
RegionStore now knows how to bind a nonloc::CompoundVal that represents the
value of an aggregate initializer when it has its initial segment of sub-values
correspond to base classes.
Additionally, fixes the crash from pr40022.
Differential Revision: https://reviews.llvm.org/D59054
llvm-svn: 356222
For a rather short code snippet, if debug.ReportStmts (added in this patch) was
enabled, a bug reporter visitor crashed:
struct h {
operator int();
};
int k() {
return h();
}
Ultimately, this originated from PathDiagnosticLocation::createMemberLoc, as it
didn't handle the case where it's MemberExpr typed parameter returned and
invalid SourceLocation for MemberExpr::getMemberLoc. The solution was to find
any related valid SourceLocaion, and Stmt::getBeginLoc happens to be just that.
Differential Revision: https://reviews.llvm.org/D58777
llvm-svn: 356161
Buildbot breaks when LLVm is compiled with memory sanitizer.
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0xa3d16d8 in getMacroNameAndPrintExpansion(blahblah)
lib/StaticAnalyzer/Core/PlistDiagnostics.cpp:903:11
llvm-svn: 355911
When there is a functor-like macro which is passed as parameter to another
"function" macro then its parameters are not listed at the place of expansion:
#define foo(x) int bar() { return x; }
#define hello(fvar) fvar(0)
hello(foo)
int main() { 1 / bar(); }
Expansion of hello(foo) asserted Clang, because it expected an l_paren token in
the 3rd line after "foo", since it is a function-like token.
Patch by Tibor Brunner!
Differential Revision: https://reviews.llvm.org/D57893
llvm-svn: 355903
In the commited testfile, macro expansion (the one implemented for the plist
output) runs into an infinite recursion. The issue originates from the algorithm
being faulty, as in
#define value REC_MACRO_FUNC(value)
the "value" is being (or at least attempted) expanded from the same macro.
The solved this issue by gathering already visited macros in a set, which does
resolve the crash, but will result in an incorrect macro expansion, that would
preferably be fixed down the line.
Patch by Tibor Brunner!
Differential Revision: https://reviews.llvm.org/D57891
llvm-svn: 355705
Asserting on invalid input isn't very nice, hence the patch to emit an error
instead.
This is the first of many patches to overhaul the way we handle checker options.
Differential Revision: https://reviews.llvm.org/D57850
llvm-svn: 355704
Summary:
When comparing a symbolic region and a constant, the constant would be
widened or truncated to the width of a void pointer, meaning that the
constant could be incorrectly truncated when handling symbols for
non-default address spaces. In the attached test case this resulted in a
false positive since the constant was truncated to zero. To fix this,
widen/truncate the constant to the width of the symbol expression's
type.
This commit does not consider non-symbolic regions as I'm not sure how
to generalize getting the type there.
This fixes PR40814.
Reviewers: NoQ, zaks.anna, george.karpenkov
Reviewed By: NoQ
Subscribers: xazax.hun, baloghadamsoftware, szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, jdoerfert, Charusso, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D58665
llvm-svn: 355592
This patch includes the necessary code for converting between a fixed point type and integer.
This also includes constant expression evaluation for conversions with these types.
Differential Revision: https://reviews.llvm.org/D56900
llvm-svn: 355462
Under the term "subchecker", I mean checkers that do not have a checker class on
their own, like unix.MallocChecker to unix.DynamicMemoryModeling.
Since a checker object was required in order to retrieve checker options,
subcheckers couldn't possess options on their own.
This patch is also an excuse to change the argument order of getChecker*Option,
it always bothered me, now it resembles the actual command line argument
(checkername:option=value).
Differential Revision: https://reviews.llvm.org/D57579
llvm-svn: 355297
#define f(y) x
#define x f(x)
int main() { x; }
This example results a compilation error since "x" in the first line was not
defined earlier. However, the macro expression printer goes to an infinite
recursion on this example.
Patch by Tibor Brunner!
Differential Revision: https://reviews.llvm.org/D57892
llvm-svn: 354806
FindLastStoreBRVisitor tries to find the first node in the exploded graph where
the current value was assigned to a region. This node is called the "store
site". It is identified by a pair of Pred and Succ nodes where Succ already has
the binding for the value while Pred does not have it. However the visitor
mistakenly identifies a node pair as the store site where the value is a
`LazyCompoundVal` and `Pred` does not have a store yet but `Succ` has it. In
this case the `LazyCompoundVal` is different in the `Pred` node because it also
contains the store which is different in the two nodes. This error may lead to
crashes (a declaration is cast to a parameter declaration without check) or
misleading bug path notes.
In this patch we fix this problem by checking for unequal `LazyCompoundVals`: if
their region is equal, and their store is the same as the store of their nodes
we consider them as equal when looking for the "store site". This is an
approximation because we do not check for differences of the subvalues
(structure members or array elements) in the stores.
Differential Revision: https://reviews.llvm.org/D58067
llvm-svn: 353943
Now, instead of passing the reference to a shared_ptr, we pass the shared_ptr instead.
I've also removed the check if Z3 is present in CreateZ3ConstraintManager as this function already calls CreateZ3Solver that performs the exactly same check.
Differential Revision: https://reviews.llvm.org/D54976
llvm-svn: 353371
This patch moves the ConstraintSMT definition to the SMTConstraintManager header to make it easier to move the Z3 backend around.
We achieve this by not using shared_ptr anymore, as llvm::ImmutableSet doesn't seem to like it.
The solver specific exprs and sorts are cached in the Z3Solver object now and we move pointers to those objects around.
As a nice side-effect, SMTConstraintManager doesn't have to be a template anymore. Yay!
Differential Revision: https://reviews.llvm.org/D54975
llvm-svn: 353370