We no longer need a reference to RangedConstraintManager, we call top
level `State->assume` functions.
Differential Revision: https://reviews.llvm.org/D113261
D103314 introduced symbol simplification when a new constant constraint is
added. Currently, we simplify existing equivalence classes by iterating over
all existing members of them and trying to simplify each member symbol with
simplifySVal.
At the end of such a simplification round we may end up introducing a
new constant constraint. Example:
```
if (a + b + c != d)
return;
if (c + b != 0)
return;
// Simplification starts here.
if (b != 0)
return;
```
The `c == 0` constraint is the result of the first simplification iteration.
However, we could do another round of simplification to reach the conclusion
that `a == d`. Generally, we could do as many new iterations until we reach a
fixpoint.
We can reach to a fixpoint by recursively calling `State->assume` on the
newly simplified symbol. By calling `State->assume` we re-ignite the
whole assume machinery (along e.g with adjustment handling).
Why should we do this? By reaching a fixpoint in simplification we are capable
of discovering infeasible states at the moment of the introduction of the
**first** constant constraint.
Let's modify the previous example just a bit, and consider what happens without
the fixpoint iteration.
```
if (a + b + c != d)
return;
if (c + b != 0)
return;
// Adding a new constraint.
if (a == d)
return;
// This brings in a contradiction.
if (b != 0)
return;
clang_analyzer_warnIfReached(); // This produces a warning.
// The path is already infeasible...
if (c == 0) // ...but we realize that only when we evaluate `c == 0`.
return;
```
What happens currently, without the fixpoint iteration? As the inline comments
suggest, without the fixpoint iteration we are doomed to realize that we are on
an infeasible path only after we are already walking on that. With fixpoint
iteration we can detect that before stepping on that. With fixpoint iteration,
the `clang_analyzer_warnIfReached` does not warn in the above example b/c
during the evaluation of `b == 0` we realize the contradiction. The engine and
the checkers do rely on that either `assume(Cond)` or `assume(!Cond)` should be
feasible. This is in fact assured by the so called expensive checks
(LLVM_ENABLE_EXPENSIVE_CHECKS). The StdLibraryFuncionsChecker is notably one of
the checkers that has a very similar assertion.
Before this patch, we simply added the simplified symbol to the equivalence
class. In this patch, after we have added the simplified symbol, we remove the
old (more complex) symbol from the members of the equivalence class
(`ClassMembers`). Removing the old symbol is beneficial because during the next
iteration of the simplification we don't have to consider again the old symbol.
Contrary to how we handle `ClassMembers`, we don't remove the old Sym->Class
relation from the `ClassMap`. This is important for two reasons: The
constraints of the old symbol can still be found via it's equivalence class
that it used to be the member of (1). We can spare one removal and thus one
additional tree in the forest of `ClassMap` (2).
Performance and complexity: Let us assume that in a State we have N non-trivial
equivalence classes and that all constraints and disequality info is related to
non-trivial classes. In the worst case, we can simplify only one symbol of one
class in each iteration. The number of symbols in one class cannot grow b/c we
replace the old symbol with the simplified one. Also, the number of the
equivalence classes can decrease only, b/c the algorithm does a merge operation
optionally. We need N iterations in this case to reach the fixpoint. Thus, the
steps needed to be done in the worst case is proportional to `N*N`. Empirical
results (attached) show that there is some hardly noticeable run-time and peak
memory discrepancy compared to the baseline. In my opinion, these differences
could be the result of measurement error.
This worst case scenario can be extended to that cases when we have trivial
classes in the constraints and in the disequality map are transforming to such
a State where there are only non-trivial classes, b/c the algorithm does merge
operations. A merge operation on two trivial classes results in one non-trivial
class.
Differential Revision: https://reviews.llvm.org/D106823
We can reuse the "adjustment" handling logic in the higher level
of the solver by calling `State->assume`.
Differential Revision: https://reviews.llvm.org/D112296
Initiate the reorganization of the equality information during symbol
simplification. E.g., if we bump into `c + 1 == 0` during simplification
then we'd like to express that `c == -1`. It makes sense to do this only
with `SymIntExpr`s.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D111642
Prior to this, the solver was only able to verify whether two symbols
are equal/unequal, only when constants were involved. This patch allows
the solver to work over ranges as well.
Reviewed By: steakhal, martong
Differential Revision: https://reviews.llvm.org/D106102
Patch by: @manas (Manas Gupta)
Summary:
`a % b != 0` implies that `a != 0` for any `a` and `b`. This patch
extends the ConstraintAssignor to do just that. In fact, we could do
something similar with division and in case of multiplications we could
have some other inferences, but I'd like to keep these for future
patches.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51940
Reviewers: noq, vsavchenko, steakhal, szelethus, asdenyspetrov
Subscribers:
Differential Revision: https://reviews.llvm.org/D110357
In this patch we store a reference to `RangedConstraintManager` in the
`ConstraintAssignor`. This way it is possible to call back and reuse some
functions of it. This patch is exclusively needed for its child patches,
it is not intended to be a standalone patch.
Differential Revision: https://reviews.llvm.org/D111640
In this patch we simply move the definition of RangeConstraintManager before
the definition of ConstraintAssignor. This patch is exclusively needed for it's
child patch, so in the child the diff would be clean and the review would be
easier.
Differential Revision: https://reviews.llvm.org/D110387
The solver's symbol simplification mechanism was not able to handle cases
when a symbol is simplified to a concrete integer. This patch adds the
capability.
E.g., in the attached lit test case, the original symbol is `c + 1` and
it has a `[0, 0]` range associated with it. Then, a new condition `c == 0`
is assumed, so a new range constraint `[0, 0]` comes in for `c` and
simplification kicks in. `c + 1` becomes `0 + 1`, but the associated
range is `[0, 0]`, so now we are able to realize the contradiction.
Differential Revision: https://reviews.llvm.org/D110913
There is an error in the implementation of the logic of reaching the `Unknonw` tristate in CmpOpTable.
```
void cmp_op_table_unknownX2(int x, int y, int z) {
if (x >= y) {
// x >= y [1, 1]
if (x + z < y)
return;
// x + z < y [0, 0]
if (z != 0)
return;
// x < y [0, 0]
clang_analyzer_eval(x > y); // expected-warning{{TRUE}} expected-warning{{FALSE}}
}
}
```
We miss the `FALSE` warning because the false branch is infeasible.
We have to exploit simplification to discover the bug. If we had `x < y`
as the second condition then the analyzer would return the parent state
on the false path and the new constraint would not be part of the State.
But adding `z` to the condition makes both paths feasible.
The root cause of the bug is that we reach the `Unknown` tristate
twice, but in both occasions we reach the same `Op` that is `>=` in the
test case. So, we reached `>=` twice, but we never reached `!=`, thus
querying the `Unknonw2x` column with `getCmpOpStateForUnknownX2` is
wrong.
The solution is to ensure that we reached both **different** `Op`s once.
Differential Revision: https://reviews.llvm.org/D110910
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in clang.
Differential Revision: https://reviews.llvm.org/D110808
This change is an extension to D103967 where I added dump methods for
(dis)equality classes of the State. There, the (dis)equality classes and their
contents are dumped in an ordered fashion, they are ordered based on their
string representation. This is very useful once we start to use FileCheck to
test the State dump in certain tests.
Differential Revision: https://reviews.llvm.org/D106642
https://bugs.llvm.org/show_bug.cgi?id=51109
When we merged two classes, `*this` became an obsolete representation of
the new `State`. This is b/c the member relations had changed during the
previous merge of another member of the same class in a way that `*this`
had no longer any members. (`mergeImpl` might keep the member relations
to `Other` and could dissolve `*this`.)
Differential Revision: https://reviews.llvm.org/D106285
../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2395:17: warning: 'clang::ento::ProgramStateRef {anonymous}::RangeConstraintManager::setRange(clang::ento::ProgramStateRef, {anonymous}::EquivalenceClass, clang::ento::RangeSet)' defined but not used [-Wunused-function]
../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2384:10: warning: 'clang::ento::RangeSet {anonymous}::RangeConstraintManager::getRange(clang::ento::ProgramStateRef, {anonymous}::EquivalenceClass)' defined but not used [-Wunused-function]
Differential Revision: https://reviews.llvm.org/D106063
This patch simplifies the way we deal with (dis)equalities.
Due to the symmetry between constraint handler and range inferrer,
we can have very similar implementations of logic handling
questions about (dis)equality and assumptions involving (dis)equality.
It also helps us to remove one more visitor, and removes uncertainty
that we got all the right places to put `trackNE` and `trackEQ`.
Differential Revision: https://reviews.llvm.org/D105693
The new component is a symmetric response to SymbolicRangeInferrer.
While the latter is the unified component, which answers all the
questions what does the solver knows about a particular symbolic
expression, assignor associates new constraints (aka "assumes")
with symbolic expressions and can imply additional knowledge that
the solver can extract and use later on.
- Why do we need it and why is SymbolicRangeInferrer not enough?
As it is noted before, the inferrer only helps us to get the most
precise range information based on the existing knowledge and on the
mathematical foundations of different operations that symbolic
expressions actually represent. It doesn't introduce new constraints.
The assignor, on the other hand, can impose constraints on other
symbols using the same domain knowledge.
- But for some expressions, SymbolicRangeInferrer looks into constraints
for similar expressions, why can't we do that for all the cases?
That's correct! But in order to do something like this, we should
have a finite number of possible "similar expressions".
Let's say we are asked about `$a - $b` and we know something about
`$b - $a`. The inferrer can invert this expression and check
constraints for `$b - $a`. This is simple!
But let's say we are asked about `$a` and we know that `$a * $b != 0`.
In this situation, we can imply that `$a != 0`, but the inferrer shouldn't
try every possible symbolic expression `X` to check if `$a * X` or
`X * $a` is constrained to non-zero.
With the assignor mechanism, we can catch this implication right at
the moment we associate `$a * $b` with non-zero range, and set similar
constraints for `$a` and `$b` as well.
Differential Revision: https://reviews.llvm.org/D105692
Prior to this patch, we always gave priority to constraints that we
actually know about symbols in question. However, these can get
outdated and we can get better results if we look at all possible
sources of knowledge, including sub-expressions.
Differential Revision: https://reviews.llvm.org/D105436
This reverts commit 6f3b775c3e.
Test fails flakily, see comments on https://reviews.llvm.org/D103967
Also revert follow-up "[Analyzer] Attempt to fix windows bots test
failure b/c of new-line"
This reverts commit fe0e861a4d.
Since RangeSet::Factory actually contains BasicValueFactory, we can
remove value factory from many function signatures inside the solver.
Differential Revision: https://reviews.llvm.org/D105005
Consider the code
```
void f(int a0, int b0, int c)
{
int a1 = a0 - b0;
int b1 = (unsigned)a1 + c;
if (c == 0) {
int d = 7L / b1;
}
}
```
At the point of divisiion by `b1` that is considered to be non-zero,
which results in a new constraint for `$a0 - $b0 + $c`. The type
of this sym is unsigned, however, the simplified sym is `$a0 -
$b0` and its type is signed. This is probably the result of the
inherent improper handling of casts. Anyway, Range assignment
for constraints use this type information. Therefore, we must
make sure that first we simplify the symbol and only then we
assign the range.
Differential Revision: https://reviews.llvm.org/D104844
Update `setConstraint` to simplify existing equivalence classes when a
new constraint is added. In this patch we iterate over all existing
equivalence classes and constraints and try to simplfy them with
simplifySVal. This solves problematic cases where we have two symbols in
the tree, e.g.:
```
int test_rhs_further_constrained(int x, int y) {
if (x + y != 0)
return 0;
if (y != 0)
return 0;
clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}
clang_analyzer_eval(y == 0); // expected-warning{{TRUE}}
return 0;
}
```
Differential Revision: https://reviews.llvm.org/D103314
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
Differential Revision: https://reviews.llvm.org/D103888
ImmutableSet doesn't seem like the perfect fit for the RangeSet
data structure. It is good for saving memory in a persistent
setting, but not for the case when the population of the container
is tiny. This commit replaces RangeSet implementation and
redesigns the most common operations to be more efficient.
Differential Revision: https://reviews.llvm.org/D86465
Additionally, this patch puts an assertion checking for feasible
constraints in every place where constraints are assigned to states.
Differential Revision: https://reviews.llvm.org/D98948
This patch fixes the situation when our knowledge of disequalities
can help us figuring out that some assumption is infeasible, but
the solver still produces a state with inconsistent constraints.
Additionally, this patch adds a couple of assertions to catch this
type of problems easier.
Differential Revision: https://reviews.llvm.org/D98341
Summary:
This commmit adds another relation that we can track separately from
range constraints. Symbol disequality can help us understand that
two equivalence classes are not equal to each other. We can generalize
this knowledge to classes because for every a,b,c, and d that
a == b, c == d, and b != c it is true that a != d.
As a result, we can reason about other equalities/disequalities of symbols
that we know nothing else about, i.e. no constraint ranges associated
with them. However, we also benefit from the knowledge of disequal
symbols by following the rule:
if a != b and b == C where C is a constant, a != C
This information can refine associated ranges for different classes
and reduce the number of false positives and paths to explore.
Differential Revision: https://reviews.llvm.org/D83286
Summary:
For the most cases, we try to reason about symbol either based on the
information we know about that symbol in particular or about its
composite parts. This is faster and eliminates costly brute force
searches through existing constraints.
However, we do want to support some cases that are widespread enough
and involve reasoning about different existing constraints at once.
These include:
* resoning about 'a - b' based on what we know about 'b - a'
* reasoning about 'a <= b' based on what we know about 'a > b' or 'a < b'
This commit expands on that part by tracking symbols known to be equal
while still avoiding brute force searches. It changes the way we track
constraints for individual symbols. If we know for a fact that 'a == b'
then there is no need in tracking constraints for both 'a' and 'b' especially
if these constraints are different. This additional relationship makes
dead/live logic for constraints harder as we want to maintain as much
information on the equivalence class as possible, but we still won't
carry the information that we don't need anymore.
Differential Revision: https://reviews.llvm.org/D82445
Summary:
* Add a new function to delete points from range sets.
* Introduce an internal generic interface for range set intersections.
* Remove unnecessary bits from a couple of solver functions.
* Add in-code sections.
Differential Revision: https://reviews.llvm.org/D82381
An old clang warns that the const object has no default constructor so it may
remain uninitialized forever. That's a false alarm because all fields
have a default initializer. Apply the suggested fixit anyway.
Summary:
Implemented RangeConstraintManager::getRangeForComparisonSymbol which handles comparison operators.
RangeConstraintManager::getRangeForComparisonSymbol cares about the sanity of comparison expressions sequences helps reasonably to branch an exploded graph.
It can significantly reduce the graph and speed up the analysis. For more details, please, see the differential revision.
This fixes https://bugs.llvm.org/show_bug.cgi?id=13426
Differential Revision: https://reviews.llvm.org/D78933
Summary:
New logic tries to narrow possible result values of the remainder operation
based on its operands and their ranges. It also tries to be conservative
with negative operands because according to the standard the sign of
the result is implementation-defined.
rdar://problem/44978988
Differential Revision: https://reviews.llvm.org/D80117
Summary:
Previously the current solver started reasoning about bitwise AND
expressions only when one of the operands is a constant. However,
very similar logic could be applied to ranges. This commit addresses
this shortcoming. Additionally, it refines how we deal with negative
operands.
rdar://problem/54359410
Differential Revision: https://reviews.llvm.org/D79434
Summary:
Previously the current solver started reasoning about bitwise OR
expressions only when one of the operands is a constant. However,
very similar logic could be applied to ranges. This commit addresses
this shortcoming. Additionally, it refines how we deal with negative
operands.
Differential Revision: https://reviews.llvm.org/D79336
Summary:
This change introduces a new component to unite all of the reasoning
we have about operations on ranges in the analyzer's solver.
In many cases, we might conclude that the range for a symbolic operation
is much more narrow than the type implies. While reasoning about
runtime conditions (especially in loops), we need to support more and
more of those little pieces of logic. The new component mostly plays
a role of an organizer for those, and allows us to focus on the actual
reasoning about ranges and not dispatching manually on the types of the
nested symbolic expressions.
Differential Revision: https://reviews.llvm.org/D79232
The `SubEngine` interface is an interface with only one implementation
`EpxrEngine`. Adding other implementations are difficult and very
unlikely in the near future. Currently, if anything from `ExprEngine` is
to be exposed to other classes it is moved to `SubEngine` which
restricts the alternative implementations. The virtual methods are have
a slight perofrmance impact. Furthermore, instead of the `LLVM`-style
inheritance a native inheritance is used here, which renders `LLVM`
functions like e.g. `cast<T>()` unusable here. This patch removes this
interface and allows usage of `ExprEngine` directly.
Differential Revision: https://reviews.llvm.org/D80548
Summary:
This fixes https://bugs.llvm.org/show_bug.cgi?id=41588
RangeSet Negate function shall handle unsigned ranges as well as signed ones.
RangeSet getRangeForMinusSymbol function shall use wider variety of ranges, not only concrete value ranges.
RangeSet Intersect functions shall not produce assertions.
Changes:
Improved safety of RangeSet::Intersect function. Added isEmpty() check to prevent an assertion.
Added support of handling unsigned ranges to RangeSet::Negate and RangeSet::getRangeForMinusSymbol.
Extended RangeSet::getRangeForMinusSymbol to return not only range sets with single value [n,n], but with wide ranges [n,m].
Added unit test for Negate function.
Added regression tests for unsigned values.
Differential Revision: https://reviews.llvm.org/D77802
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368942
Since rL335814, if the constraint manager cannot find a range set for `A - B`
(where `A` and `B` are symbols) it looks for a range for `B - A` and returns
it negated if it exists. However, if a range set for both `A - B` and `B - A`
is stored then it only returns the first one. If we both use `A - B` and
`B - A`, these expressions behave as two totally unrelated symbols. This way
we miss some useful deductions which may lead to false negatives or false
positives.
This tiny patch changes this behavior: if the symbolic expression the
constraint manager is looking for is a difference `A - B`, it tries to
retrieve the range for both `A - B` and `B - A` and if both exists it returns
the intersection of range `A - B` and the negated range of `B - A`. This way
every time a checker applies new constraints to the symbolic difference or to
its negated it always affects both the original difference and its negated.
Differential Revision: https://reviews.llvm.org/D55007
llvm-svn: 357167
Summary:
Removed the `GDM` checking what could prevent reports made by this visitor.
Now we rely on constraint changes instead.
(It reapplies 356318 with a feature from 356319 because build-bot failure.)
Reviewers: NoQ, george.karpenkov
Reviewed By: NoQ
Subscribers: cfe-commits, jdoerfert, gerazo, xazax.hun, baloghadamsoftware,
szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp
Tags: #clang
Differential Revision: https://reviews.llvm.org/D54811
llvm-svn: 356322