The `Profile` function was incorrectly implemented.
The `StreamErrorState` has an implicit `bool` conversion operator, which
will result in a different hash than faithfully hashing the raw value of
the enum.
I don't have a test for it, since it seems difficult to find one.
Even if we would have one, any change in the hashing algorithm would
have a chance of breaking it, so I don't think it would justify the
effort.
Depends on D127836, which uncovered this issue by marking the related
`Profile` function dead.
Reviewed By: martong, balazske
Differential Revision: https://reviews.llvm.org/D127839
Thanks @kazu for helping me clean these parts in D127799.
I'm leaving the dump methods, along with the unused visitor handlers and
the forwarding methods.
The dead parts actually helped to uncover two bugs, to which I'm going
to post separate patches.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127836
This is an initial step of removing the SimpleSValBuilder abstraction. The SValBuilder alone should be enough.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126127
This change specializes the LLVM RTTI mechanism for SVals.
After this change, we can use the well-known `isa`, `cast`, `dyn_cast`.
Examples:
// SVal V = ...;
// Loc MyLoc = ...;
bool IsInteresting = isa<loc::MemRegionVal, loc::GotoLabel>(MyLoc);
auto MRV = cast<loc::MemRegionVal>(MyLoc);
Optional<loc::MemRegionVal> MaybeMRV = dyn_cast<loc::MemRegionVal>(V)
The current `SVal::getAs` and `castAs` member functions are redundant at
this point, but I believe that they are still handy.
The member function version is terse and reads left-to-right, which IMO
is a great plus. However, we should probably add a variadic `isa` member
function version to have the same casting API in both cases.
Thanks for the extensive TMP help @bzcheeseman!
Reviewed By: bzcheeseman
Differential Revision: https://reviews.llvm.org/D125709
I'm trying to remove unused options from the `Analyses.def` file, then
merge the rest of the useful options into the `AnalyzerOptions.def`.
Then make sure one can set these by an `-analyzer-config XXX=YYY` style
flag.
Then surface the `-analyzer-config` to the `clang` frontend;
After all of this, we can pursue the tablegen approach described
https://discourse.llvm.org/t/rfc-tablegen-clang-static-analyzer-engine-options-for-better-documentation/61488
In this patch, I'm proposing flag deprecations.
We should support deprecated analyzer flags for exactly one release. In
this case I'm planning to drop this flag in `clang-16`.
In the clang frontend, now we won't pass this option to the cc1
frontend, rather emit a warning diagnostic reminding the users about
this deprecated flag, which will be turned into error in clang-16.
Unfortunately, I had to remove all the tests referring to this flag,
causing a mass change. I've also added a test for checking this warning.
I've seen that `scan-build` also uses this flag, but I think we should
remove that part only after we turn this into a hard error.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126215
I'm trying to remove unused options from the `Analyses.def` file, then
merge the rest of the useful options into the `AnalyzerOptions.def`.
Then make sure one can set these by an `-analyzer-config XXX=YYY` style
flag.
Then surface the `-analyzer-config` to the `clang` frontend;
After all of this, we can pursue the tablegen approach described
https://discourse.llvm.org/t/rfc-tablegen-clang-static-analyzer-engine-options-for-better-documentation/61488
In this patch, I'm proposing flag deprecations.
We should support deprecated analyzer flags for exactly one release. In
this case I'm planning to drop this flag in `clang-16`.
In the clang frontend, now we won't pass this option to the cc1
frontend, rather emit a warning diagnostic reminding the users about
this deprecated flag, which will be turned into error in clang-16.
Unfortunately, I had to remove all the tests referring to this flag,
causing a mass change. I've also added a test for checking this warning.
I've seen that `scan-build` also uses this flag, but I think we should
remove that part only after we turn this into a hard error.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126215
I've faced crashes in the past multiple times when some
`check::EndAnalysis` callback caused some crash.
It's really anoying that it doesn't tell which function triggered this
callback.
This patch adds the well-known trace for that situation as well.
Example:
1. <eof> parser at end of file
2. While analyzing stack:
#0 Calling test11
Note that this does not have tests.
I've considered `unittests` for this purpose, by using the
`ASSERT_DEATH()` similarly how we check double eval called functions in
`ConflictingEvalCallsTest.cpp`, however, that the testsuite won't invoke
the custom handlers. Only the message of the `llvm_unreachable()` will
be printed. Consequently, it's not applicable for us testing this
feature.
I've also considered using an end-to-end LIT test for this.
For that, we would need to somehow overload the `clang_analyzer_crash()`
`ExprInspection` handler, to get triggered by other events than the
`EvalCall`. I'm not saying that we could not come up with a generic way
of causing crash in a specific checker callback, but I'm not sure if
that would worth the effort.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127389
Depends on D126560. `getKnownValue` has been changed by the parent patch
in a way that simplification was removed. This is not correct when the
function is called by the Checkers. Thus, a new internal function is
introduced, `getConstValue`, which simply queries the constraint manager.
This `getConstValue` is used internally in the `SimpleSValBuilder` when a
binop is evaluated, this way we avoid the recursion into the `Simplifier`.
Differential Revision: https://reviews.llvm.org/D127285
A crash was seen in CastValueChecker due to a null pointer dereference.
The fix uses QualType::getAsString to avoid the null dereference
when a CXXRecordDecl cannot be obtained. A small reproducer is added,
and cast value notes LITs are updated for the new debug messages.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D127105
Aligned with the measures we had in D124674, this condition seems to be
unlikely.
Nevertheless, I've made some new measurments with stats just for this,
and data confirms this is indeed unlikely.
Differential Revision: https://reviews.llvm.org/D127190
Assume functions might recurse (see `reAssume` or `tryRearrange`).
During the recursion, the State might not change anymore, that means we
reached a fixpoint. In this patch, we avoid infinite recursion of assume
calls by checking already visited States on the stack of assume function
calls. This patch renders the previous "workaround" solution (D47155)
unnecessary. Note that this is not an NFC patch. If we were to limit the
maximum stack depth of the assume calls to 1 then would it be equivalent
with the previous solution in D47155.
Additionally, in D113753, we simplify the symbols right at the beginning
of evalBinOpNN. So, a call to `simplifySVal` in `getKnownValue` (added
in D51252) is no longer needed.
Fixes https://github.com/llvm/llvm-project/issues/55851
Differential Revision: https://reviews.llvm.org/D126560
I'm also hoisting common code from the existing specializations into a
common trait impl to reduce code duplication.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126801
Make the SimpleSValBuilder to be able to look up and use a constraint
for an operand of a SymbolCast, when the operand is constrained to a
const value.
This part of the SValBuilder is responsible for constant folding. We
need this constant folding, so the engine can work with less symbols,
this way it can be more efficient. Whenever a symbol is constrained with
a constant then we substitute the symbol with the corresponding integer.
If a symbol is constrained with a range, then the symbol is kept and we
fall-back to use the range based constraint manager, which is not that
efficient. This patch is the natural extension of the existing constant
folding machinery with the support of SymbolCast symbols.
Differential Revision: https://reviews.llvm.org/D126481
This reverts commit 3988bd1398.
Did not build on this bot:
https://lab.llvm.org/buildbot#builders/215/builds/6372
/usr/include/c++/9/bits/predefined_ops.h:177:11: error: no match for call to
‘(llvm::less_first) (std::pair<long unsigned int, llvm::bolt::BinaryBasicBlock*>&, const std::pair<long unsigned int, std::nullptr_t>&)’
177 | { return bool(_M_comp(*__it, __val)); }
One could reuse this functor instead of rolling out your own version.
There were a couple other cases where the code was similar, but not
quite the same, such as it might have an assertion in the lambda or other
constructs. Thus, I've not touched any of those, as it might change the
behavior in some way.
As per https://discourse.llvm.org/t/submitting-simple-nfc-patches/62640/3?u=steakhal
Chris Lattner
> LLVM intentionally has a “yes, you can apply common sense judgement to
> things” policy when it comes to code review. If you are doing mechanical
> patches (e.g. adopting less_first) that apply to the entire monorepo,
> then you don’t need everyone in the monorepo to sign off on it. Having
> some +1 validation from someone is useful, but you don’t need everyone
> whose code you touch to weigh in.
Differential Revision: https://reviews.llvm.org/D126068
This patch annotates the most important analyzer function APIs.
Also adds a couple of assertions for uncovering any potential issues
earlier in the constructor; in those cases, the member functions were
already dereferencing the members unconditionally anyway.
Measurements showed no performance impact, nor crashes.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126198
This patch adds a new descendant to the SymExpr hierarchy. This way, now
we can assign constraints to symbolic unary expressions. Only the unary
minus and bitwise negation are handled.
Differential Revision: https://reviews.llvm.org/D125318
Depends on D124758. That patch introduced serious regression in the run-time in
some special cases. This fixes that.
Differential Revision: https://reviews.llvm.org/D126406
Fixes https://github.com/llvm/llvm-project/issues/55546
The assertion mentioned in the issue is triggered because an
inconsistency is formed in the Sym->Class and Class->Sym relations. A
simpler but similar inconsistency is demonstrated here:
https://reviews.llvm.org/D114887 .
Previously in `removeMember`, we didn't remove the old symbol's
Sym->Class relation. Back then, we explained it with the following two
bullet points:
> 1) This way constraints for the old symbol can still be found via it's
> equivalence class that it used to be the member of.
> 2) Performance and resource reasons. We can spare one removal and thus one
> additional tree in the forest of `ClassMap`.
This patch do remove the old symbol's Sym->Class relation in order to
keep the Sym->Class relation consistent with the Class->Sym relations.
Point 2) above has negligible performance impact, empirical measurements
do not show any noticeable difference in the run-time. Point 1) above
seems to be a not well justified statement. This is because we cannot
create a new symbol that would be equal to the old symbol after the
simplification had happened. The reason for this is that the SValBuilder
uses the available constant constraints for each sub-symbol.
Differential Revision: https://reviews.llvm.org/D126281
Depends on D125892. There might be efficiency and performance
implications by using a lambda. Thus, I am going to conduct measurements
to see if there is any noticeable impact.
I've been thinking about two more alternatives:
1) Make `assumeDualImpl` a variadic template and (perfect) forward the
arguments for the used `assume` function.
2) Use a macros.
I have concerns though, whether these alternatives would deteriorate the
readability of the code.
Differential Revision: https://reviews.llvm.org/D125954
Depends on D124758. This is the very same thing we have done for
assumeDual, but this time we do it for assumeInclusiveRange. This patch
is basically a no-brainer copy of that previous patch.
Differential Revision:
https://reviews.llvm.org/D125892
Most clients only used these methods because they wanted to be able to
extend or truncate to the same bit width (which is a no-op). Now that
the standard zext, sext and trunc allow this, there is no reason to use
the OrSelf versions.
The OrSelf versions additionally have the strange behaviour of allowing
extending to a *smaller* width, or truncating to a *larger* width, which
are also treated as no-ops. A small amount of client code relied on this
(ConstantRange::castOp and MicrosoftCXXNameMangler::mangleNumber) and
needed rewriting.
Differential Revision: https://reviews.llvm.org/D125557
This new CTU implementation is the natural extension of the normal single TU
analysis. The approach consists of two analysis phases. During the first phase,
we do a normal single TU analysis. During this phase, if we find a foreign
function (that could be inlined from another TU) then we don’t inline that
immediately, we rather mark that to be analysed later.
When the first phase is finished then we start the second phase, the CTU phase.
In this phase, we continue the analysis from that point (exploded node)
which had been enqueued during the first phase. We gradually extend the
exploded graph of the single TU analysis with the new node that was
created by the inlining of the foreign function.
We count the number of analysis steps of the first phase and we limit the
second (ctu) phase with this number.
This new implementation makes it convenient for the users to run the
single-TU and the CTU analysis in one go, they don't need to run the two
analysis separately. Thus, we name this new implementation as "onego" CTU.
Discussion:
https://discourse.llvm.org/t/rfc-much-faster-cross-translation-unit-ctu-analysis-implementation/61728
Differential Revision: https://reviews.llvm.org/D123773
In some rare cases the type of an SVal might be interesting.
This introspection function exposes this information in tests.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D125532
BoolAssignment checker is now taint-aware and warns if a tainted value is
assigned.
Original author: steakhal
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D125360
This PR changes the `SymIntExpr` so the expression that uses a
negative value as `RHS`, for example: `x +/- (-N)`, is modeled as
`x -/+ N` instead.
This avoids producing a very large `RHS` when the symbol is cased to
an unsigned number, and as consequence makes the value more robust in
presence of casts.
Note that this change is not applied if `N` is the lowest negative
value for which negation would not be representable.
Reviewed By: steakhal
Patch By: tomasz-kaminski-sonarsource!
Differential Revision: https://reviews.llvm.org/D124658
Summary:
By evaluating both children states, now we are capable of discovering
infeasible parent states. In this patch, `assume` is implemented in the terms
of `assumeDuali`. This might be suboptimal (e.g. where there are adjacent
assume(true) and assume(false) calls, next patches addresses that). This patch
fixes a real CRASH.
Fixes https://github.com/llvm/llvm-project/issues/54272
Differential Revision:
https://reviews.llvm.org/D124758
In some cases a parent State is already infeasible, but we recognize
this only if an additonal constraint is added. This patch is the first
of a series to address this issue. In this patch `assumeDual` is changed
to clone the parent State but with an `Infeasible` flag set, and this
infeasible-parent is returned both for the true and false case. Then
when we add a new transition in the exploded graph and the destination
is marked as infeasible, the node will be a sink node.
Related bug:
https://github.com/llvm/llvm-project/issues/50883
Actually, this patch does not solve that bug in the solver, rather with
this patch we can handle the general parent-infeasible cases.
Next step would be to change the State API and require all checkers to
use the `assume*Dual` API and deprecate the simple `assume` calls.
Hopefully, the next patch will introduce `assumeInBoundDual` and will
solve the CRASH we have here:
https://github.com/llvm/llvm-project/issues/54272
Differential Revision: https://reviews.llvm.org/D124674
This patch restores the symmetry between how operator new and operator delete
are handled by also inlining the content of operator delete when possible.
Patch by Fred Tingaud.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D124845
It seems like multiple users are affected by a crash introduced by this
commit, thus I'm reverting it for the time being.
Read more about the found reproducers at Phabricator.
Differential Revision: https://reviews.llvm.org/D124658
This reverts commit f0d6cb4a5c.
There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users.
Reviewed By: #libc, Mordante, martong, ldionne
Differential Revision: https://reviews.llvm.org/D124708
This PR changes the `SymIntExpr` so the expression that uses a
negative value as `RHS`, for example: `x +/- (-N)`, is modeled as
`x -/+ N` instead.
This avoids producing a very large `RHS` when the symbol is cased to
an unsigned number, and as consequence makes the value more robust in
presence of casts.
Note that this change is not applied if `N` is the lowest negative
value for which negation would not be representable.
Reviewed By: steakhal
Patch By: tomasz-kaminski-sonarsource!
Differential Revision: https://reviews.llvm.org/D124658
Region store was not able to see through this case to the actual
initialized value of STRUCT ff. This change addresses this case by
getting the direct binding. This was found and debugged in a downstream
compiler, with debug guidance from @steakhal. A positive and negative
test case is added.
The specific case where this issue was exposed.
typedef struct {
int a:1;
int b[2];
} STRUCT;
int main() {
STRUCT ff = {0};
STRUCT* pff = &ff;
int a = ((int)pff + 1);
return a;
}
Reviewed By: steakhal, martong
Differential Revision: https://reviews.llvm.org/D124349
This is an extension to diff D99260. This adds an additional exception
for `std::__addressof` in `InnerPointerChecker`.
Patch By alishuja (Ali Shuja Siddiqui)!
Reviewed By: martong, alishuja
Differential Revision: https://reviews.llvm.org/D109467
Essentially, having a default member initializer for a constant member
does not necessarily imply the member will have the given default value.
Remove part of a2e053638b ([analyzer] Treat more const variables and
fields as known contants., 2018-05-04).
Fix#47878
Reviewed By: r.stahl, steakhal
Differential Revision: https://reviews.llvm.org/D124621
Remove unnecessary conversion to Optional<> and incorrect assumption
that BindExpr can return a null state.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D124681
Historically, exploded graph dumps were disabled in non-debug builds.
It was done so probably because a regular user should not dump the
internal representation of the analyzer anyway and the dump methods
might introduce unnecessary binary size overhead.
It turns out some of the users actually want to dump this.
Note that e.g. `LiveExpressionsDumper`, `LiveVariablesDumper`,
`ControlDependencyTreeDumper` etc. worked previously, and they are
unaffected by this change.
However, `CFGViewer` and `CFGDumper` still won't work for a similar
reason. AFAIK only these two won't work after this change.
Addresses #53873
---
**baseline**
| binary | size | size after strip |
| clang | 103M | 83M |
| clang-tidy | 67M | 54M |
**after this change**
| binary | size | size after strip |
| clang | 103M | 84M |
| clang-tidy | 67M | 54M |
CMake configuration:
```
cmake -S llvm -GNinja -DBUILD_SHARED_LIBS=OFF -DCMAKE_BUILD_TYPE=Release
-DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang
-DLLVM_ENABLE_ASSERTIONS=OFF -DLLVM_USE_LINKER=lld
-DLLVM_ENABLE_DUMP=OFF -DLLVM_ENABLE_PROJECTS="clang;clang-tools-extra"
-DLLVM_ENABLE_Z3_SOLVER=ON -DLLVM_TARGETS_TO_BUILD="X86"
```
Built by `clang-14.0.0`.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D124442
We ignored the cast if the enum was scoped.
This is bad since there is no implicit conversion from the scoped enum to the corresponding underlying type.
The fix is basically: isIntegralOrEnumerationType() -> isIntegralOr**Unscoped**EnumerationType()
This materialized in crashes on analyzing the LLVM itself using the Z3 refutation.
Refutation synthesized the given Z3 Binary expression (`BO_And` of `unsigned char` aka. 8 bits
and an `int` 32 bits) with the wrong bitwidth in the end, which triggered an assert.
Now, we evaluate the cast according to the standard.
This bug could have been triggered using the Z3 CM according to
https://bugs.llvm.org/show_bug.cgi?id=44030Fixes#47570#43375
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D85528
`g_memdup()` allocates and copies memory, thus we should not assume that
the returned memory region is uninitialized because it might not be the
case.
PS: It would be even better to copy the bindings to mimic the actual
content of the buffer, but this works too.
Fixes#53617
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D124436
The patch is straightforward except the tiny fix in BugReporterVisitors.cpp
that suppresses a default note for "Assuming pointer value is null" when
a note tag from the checker is present. This is probably the right thing to do
but also definitely not a complete solution to the problem of different sources
of path notes being unaware of each other, which is a large and annoying issue
that we have to deal with. Note tags really help there because they're nicely
introspectable. The problem is demonstrated by the newly added getenv() test.
Differential Revision: https://reviews.llvm.org/D122285
In the following example:
int va_list_get_int(va_list *va) {
return va_arg(*va, int); // FP
}
The `*va` expression will be something like `Element{SymRegion{va}, 0, va_list}`.
We use `ElementRegions` for representing the result of the dereference.
In this case, the `IsSymbolic` was set to `false` in the
`getVAListAsRegion()`.
Hence, before checking if the memregion is a SymRegion, we should take
the base of that region.
Analogously to the previous example, one can craft other cases:
struct MyVaList {
va_list l;
};
int va_list_get_int(struct MyVaList va) {
return va_arg(va.l, int); // FP
}
But it would also work if the `va_list` would be in the base or derived
part of a class. `ObjCIvarRegions` are likely also susceptible.
I'm not explicitly demonstrating these cases.
PS: Check the `MemRegion::getBaseRegion()` definition.
Fixes#55009
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D124239
This change adds an option to detect all null dereferences for
non-default address spaces, except for address spaces 256, 257 and 258.
Those address spaces are special since null dereferences are not errors.
All address spaces can be considered (except for 256, 257, and 258) by
using -analyzer-config
core.NullDereference:DetectAllNullDereferences=true. This option is
false by default, retaining the original behavior.
A LIT test was enhanced to cover this case, and the rst documentation
was updated to describe this behavior.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D122841
A recent review emphasized the preference to use DefaultBool instead of
bool for checker options. This change is a NFC and cleans up some of the
instances where bool was used, and could be changed to DefaultBool.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D123464
Under the hood this prints the same as `QualType::getAsString()` but cuts out the middle-man when that string is sent to another raw_ostream.
Also cleaned up all the call sites where this occurs.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123926
WG14 has elected to remove support for K&R C functions in C2x. The
feature was introduced into C89 already deprecated, so after this long
of a deprecation period, the committee has made an empty parameter list
mean the same thing in C as it means in C++: the function accepts no
arguments exactly as if the function were written with (void) as the
parameter list.
This patch implements WG14 N2841 No function declarators without
prototypes (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2841.htm)
and WG14 N2432 Remove support for function definitions with identifier
lists (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf).
It also adds The -fno-knr-functions command line option to opt into
this behavior in other language modes.
Differential Revision: https://reviews.llvm.org/D123955
Summary: Handle casts for ranges working similarly to APSIntType::apply function but for the whole range set. Support promotions, truncations and conversions.
Example:
promotion: char [0, 42] -> short [0, 42] -> int [0, 42] -> llong [0, 42]
truncation: llong [4295033088, 4295033130] -> int [65792, 65834] -> short [256, 298] -> char [0, 42]
conversion: char [-42, 42] -> uint [0, 42]U[4294967254, 4294967295] -> short[-42, 42]
Differential Revision: https://reviews.llvm.org/D103094
I recently evaluated ~150 of bug reports on open source projects relating to my
GSoC'19 project, which was about tracking control dependencies that were
relevant to a bug report.
Here is what I found: when the condition is a function call, the extra notes
were almost always unimportant, and often times intrusive:
void f(int *x) {
x = nullptr;
if (alwaysTrue()) // We don't need a whole lot of explanation
// here, the function name is good enough.
*x = 5;
}
It almost always boiled down to a few "Returning null pointer, which participates
in a condition later", or similar notes. I struggled to find a single case
where the notes revealed anything interesting or some previously hidden
correlation, which is kind of the point of condition tracking.
This patch checks whether the condition is a function call, and if so, bails
out.
The argument against the patch is the popular feedback we hear from some of our
users, namely that they can never have too much information. I was specifically
fishing for examples that display best that my contribution did more good than
harm, so admittedly I set the bar high, and one can argue that there can be
non-trivial trickery inside functions, and function names may not be that
descriptive.
My argument for the patch is all those reports that got longer without any
notable improvement in the report intelligibility. I think the few exceptional
cases where this patch would remove notable information are an acceptable
sacrifice in favor of more reports being leaner.
Differential Revision: https://reviews.llvm.org/D116597
Do import the definition of objects from a foreign translation unit if that's type is const and trivial.
Differential Revision: https://reviews.llvm.org/D122805
This change fixes an assert that occurs in the SMT layer when refuting a
finding that uses pointers of two different sizes. This was found in a
downstream build that supports two different pointer sizes, The CString
Checker was attempting to compute an overlap for the 'to' and 'from'
pointers, where the pointers were of different sizes.
In the downstream case where this was found, a specialized memcpy
routine patterned after memcpy_special is used. The analyzer core hits
on this builtin because it matches the 'memcpy' portion of that builtin.
This cannot be duplicated in the upstream test since there are no
specialized builtins that match that pattern, but the case does
reproduce in the accompanying LIT test case. The amdgcn target was used
for this reproducer. See the documentation for AMDGPU address spaces here
https://llvm.org/docs/AMDGPUUsage.html#address-spaces.
The assert seen is:
`*Solver->getSort(LHS) == *Solver->getSort(RHS) && "AST's must have the same sort!"'
Ack to steakhal for reviewing the fix, and creating the test case.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D118050
clang: <root>/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:727:
void assertEqualBitWidths(clang::ento::ProgramStateRef,
clang::ento::Loc, clang::ento::Loc): Assertion `RhsBitwidth ==
LhsBitwidth && "RhsLoc and LhsLoc bitwidth must be same!"'
This change adjusts the bitwidth of the smaller operand for an evalBinOp
as a result of a comparison operation. This can occur in the specific
case represented by the test cases for a target with different pointer
sizes.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D122513
Adds basic parsing/sema/serialization support for the
#pragma omp target parallel loop directive.
Differential Revision: https://reviews.llvm.org/D122359
This change fixes a crash in RangedConstraintManager.cpp:assumeSym due to an
unhandled BO_Div case.
clang: <root>clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp:51:
virtual clang::ento::ProgramStateRef
clang::ento::RangedConstraintManager::assumeSym(clang::ento::ProgramStateRef,
clang::ento::SymbolRef, bool):
Assertion `BinaryOperator::isComparisonOp(Op)' failed.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D122277
This is a NFC refactoring to change makeIntValWithPtrWidth
and remove getZeroWithPtrWidth to use types when forming values to match
pointer widths. Some targets may have different pointer widths depending
upon address space, so this needs to be comprehended.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D120134
Usages of makeNull need to be deprecated in favor of makeNullWithWidth
for architectures where the pointer size should not be assumed. This can
occur when pointer sizes can be of different sizes, depending on address
space for example. See https://reviews.llvm.org/D118050 as an example.
This was uncovered initially in a downstream compiler project, and
tested through those systems tests.
steakhal performed systems testing across a large set of open source
projects.
Co-authored-by: steakhal
Resolves: https://github.com/llvm/llvm-project/issues/53664
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D119601
Few weeks back I was experimenting with reading the uninitialized values from src , which is actually a bug but the CSA seems to give up at that point . I was curious about that and I pinged @steakhal on the discord and according to him this seems to be a genuine issue and needs to be fix. So I goes with fixing this bug and thanks to @steakhal who help me creating this patch. This feature seems to break some tests but this was the genuine problem and the broken tests also needs to fix in certain manner. I add a test but yeah we need more tests,I'll try to add more tests.Thanks
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D120489
Few weeks back I was experimenting with reading the uninitialized values from src , which is actually a bug but the CSA seems to give up at that point . I was curious about that and I pinged @steakhal on the discord and according to him this seems to be a genuine issue and needs to be fix. So I goes with fixing this bug and thanks to @steakhal who help me creating this patch. This feature seems to break some tests but this was the genuine problem and the broken tests also needs to fix in certain manner. I add a test but yeah we need more tests,I'll try to add more tests.Thanks
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D120489
The problem with leak bug reports is that the most interesting event in the code
is likely the one that did not happen -- lack of ownership change and lack of
deallocation, which is often present within the same function that the analyzer
inlined anyway, but not on the path of execution on which the bug occured. We
struggle to understand that a function was responsible for freeing the memory,
but failed.
D105819 added a new visitor to improve memory leak bug reports. In addition to
inspecting the ExplodedNodes of the bug pat, the visitor tries to guess whether
the function was supposed to free memory, but failed to. Initially (in D108753),
this was done by checking whether a CXXDeleteExpr is present in the function. If
so, we assume that the function was at least party responsible, and prevent the
analyzer from pruning bug report notes in it. This patch improves this heuristic
by recognizing all deallocator functions that MallocChecker itself recognizes,
by reusing MallocChecker::isFreeingCall.
Differential Revision: https://reviews.llvm.org/D118880
Since CallDescriptions can only be matched against CallEvents that are created
during symbolic execution, it was not possible to use it in syntactic-only
contexts. For example, even though InnerPointerChecker can check with its set of
CallDescriptions whether a function call is interested during analysis, its
unable to check without hassle whether a non-analyzer piece of code also calls
such a function.
The patch adds the ability to use CallDescriptions in syntactic contexts as
well. While we already have that in Signature, we still want to leverage the
ability to use dynamic information when we have it (function pointers, for
example). This could be done with Signature as well (StdLibraryFunctionsChecker
does it), but it makes it even less of a drop-in replacement.
Differential Revision: https://reviews.llvm.org/D119004
Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D120310
Since https://reviews.llvm.org/D120334 we shouldn't pass temporary LangOptions to Lexer.
This change fixes stack-use-after-scope UB in LocalizationChecker found by sanitizer-x86_64-linux-fast buildbot
and resolve similar issue in HeaderIncludes.
Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D120310
LocationContext::getDecl() isn't useful for obtaining the "farmed" body because
the (synthetic) body statement isn't actually attached to the (natural-grown)
declaration in the AST.
Differential Revision: https://reviews.llvm.org/D119509
This reverts commit 620d99b7ed.
Let's see if removing the two offending RUN lines makes this patch pass.
Not ideal to drop tests but, it's just a debugging feature, probably not
that important.
There was a typo in the rule.
`{{0}, ReturnValueIndex}` meant that the discrete index is `0` and the
variadic index is `-1`.
What we wanted instead is that both `0` and `-1` are in the discrete index
list.
Instead of this, we wanted to express that both `0` and the
`ReturnValueIndex` is in the discrete arg list.
The manual inspection revealed that `setproctitle_init` also suffered a
probably incomplete propagation rule.
Reviewed By: Szelethus, gamesh411
Differential Revision: https://reviews.llvm.org/D119129
Fixes the issue D118987 by mapping the propagation to the callsite's
LocationContext.
This way we can keep track of the in-flight propagations.
Note that empty propagation sets won't be inserted.
Reviewed By: NoQ, Szelethus
Differential Revision: https://reviews.llvm.org/D119128
Recently we uncovered a serious bug in the `GenericTaintChecker`.
It was already flawed before D116025, but that was the patch that turned
this silent bug into a crash.
It happens if the `GenericTaintChecker` has a rule for a function, which
also has a definition.
char *fgets(char *s, int n, FILE *fp) {
nested_call(); // no parameters!
return (char *)0;
}
// Within some function:
fgets(..., tainted_fd);
When the engine inlines the definition and finds a function call within
that, the `PostCall` event for the call will get triggered sooner than the
`PostCall` for the original function.
This mismatch violates the assumption of the `GenericTaintChecker` which
wants to propagate taint information from the `PreCall` event to the
`PostCall` event, where it can actually bind taint to the return value
**of the same call**.
Let's get back to the example and go through step-by-step.
The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>`
event, so it would 'remember' that it needs to taint the return value
and the buffer, from the `PostCall` handler, where it has access to the
return value symbol.
However, the engine will inline fgets and the `nested_call()` gets
evaluated subsequently, which produces an unimportant
`PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is
observed by the `GenericTaintChecker`, which will unconditionally mark
tainted the 'remembered' arg indexes, trying to access a non-existing
argument, resulting in a crash.
If it doesn't crash, it will behave completely unintuitively, by marking
completely unrelated memory regions tainted, which is even worse.
The resulting assertion is something like this:
Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion
`Arg < getNumArgs() && "Arg access out of range!"' failed.
The gist of the backtrace:
CallExpr::getArg(unsigned int) const
SimpleFunctionCall::getArgExpr(unsigned int)
CallEvent::getArgSVal(unsigned int) const
GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const
Prior to D116025, there was a check for the argument count before it
applied taint, however, it still suffered from the same underlying
issue/bug regarding propagation.
This path does not intend to fix the bug, rather start a discussion on
how to fix this.
---
Let me elaborate on how I see this problem.
This pre-call, post-call juggling is just a workaround.
The engine should by itself propagate taint where necessary right where
it invalidates regions.
For the tracked values, which potentially escape, we need to erase the
information we know about them; and this is exactly what is done by
invalidation.
However, in the case of taint, we basically want to approximate from the
opposite side of the spectrum.
We want to preserve taint in most cases, rather than cleansing them.
Now, we basically sanitize all escaping tainted regions implicitly,
since invalidation binds a fresh conjured symbol for the given region,
and that has not been associated with taint.
IMO this is a bad default behavior, we should be more aggressive about
preserving taint if not further spreading taint to the reachable
regions.
We have a couple of options for dealing with it (let's call it //tainting
policy//):
1) Taint only the parameters which were tainted prior to the call.
2) Taint the return value of the call, since it likely depends on the
tainted input - if any arguments were tainted.
3) Taint all escaped regions - (maybe transitively using the cluster
algorithm) - if any arguments were tainted.
4) Not taint anything - this is what we do right now :D
The `ExprEngine` should not deal with taint on its own. It should be done
by a checker, such as the `GenericTaintChecker`.
However, the `Pre`-`PostCall` checker callbacks are not designed for this.
`RegionChanges` would be a much better fit for modeling taint propagation.
What we would need in the `RegionChanges` callback is the `State` prior
invalidation, the `State` after the invalidation, and a `CheckerContext` in
which the checker can create transitions, where it would place `NoteTags`
for the modeled taint propagations and report errors if a taint sink
rule gets violated.
In this callback, we could query from the prior State, if the given
value was tainted; then act and taint if necessary according to the
checker's tainting policy.
By using RegionChanges for this, we would 'fix' the mentioned
propagation bug 'by-design'.
Reviewed By: Szelethus
Differential Revision: https://reviews.llvm.org/D118987
There is different bug types for different types of bugs but the **emitAdditionOverflowbug** seems to use bugtype **BT_NotCSting** but actually it have to use **BT_AdditionOverflow** .
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D119462
`CallDescriptions` for builtin functions relaxes the match rules
somewhat, so that the `CallDescription` will match for calls that have
some prefix or suffix. This was achieved by doing a `StringRef::contains()`.
However, this is somewhat problematic for builtins that are substrings
of each other.
Consider the following:
`CallDescription{ builtin, "memcpy"}` will match for
`__builtin_wmemcpy()` calls, which is unfortunate.
This patch addresses/works around the issue by checking if the
characters around the function's name are not part of the 'name'
semantically. In other words, to accept a match for `"memcpy"` the call
should not have alphanumeric (`[a-zA-Z]`) characters around the 'match'.
So, `CallDescription{ builtin, "memcpy"}` will not match on:
- `__builtin_wmemcpy: there is a `w` alphanumeric character before the match.
- `__builtin_memcpyFOoBar_inline`: there is a `F` character after the match.
- `__builtin_memcpyX_inline`: there is an `X` character after the match.
But it will still match for:
- `memcpy`: exact match
- `__builtin_memcpy`: there is an _ before the match
- `__builtin_memcpy_inline`: there is an _ after the match
- `memcpy_inline_builtinFooBar`: there is an _ after the match
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D118388
Sometimes when I pass the mentioned option I forget about passing the
parameter list for c++ sources.
It would be also useful newcomers to learn about this.
This patch introduces some logic checking common misuses involving
`-analyze-function`.
Reviewed-By: martong
Differential Revision: https://reviews.llvm.org/D118690
This reverts commit 9d6a615973.
Exit Code: 1
Command Output (stderr):
--
/scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Analysis/analyze-function-guide.cpp:53:21: error: CHECK-EMPTY-NOT: excluded string found in input // CHECK-EMPTY-NOT: Every top-level function was skipped.
^
<stdin>:1:1: note: found here
Every top-level function was skipped.
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Input file: <stdin>
Check file: /scratch/buildbot/bothome/clang-ve-ninja/llvm-project/clang/test/Analysis/analyze-function-guide.cpp
-dump-input=help explains the following input dump.
Input was:
<<<<<<
1: Every top-level function was skipped.
not:53 !~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match expected
2: Pass the -analyzer-display-progress for tracking which functions are analyzed.
>>>>>>
Sometimes when I pass the mentioned option I forget about passing the
parameter list for c++ sources.
It would be also useful newcomers to learn about this.
This patch introduces some logic checking common misuses involving
`-analyze-function`.
Reviewed-By: martong
Differential Revision: https://reviews.llvm.org/D118690
GenericTaintChecker now uses CallDescriptionMap to describe the possible
operation in code which trigger the introduction (sources), the removal
(filters), the passing along (propagations) and detection (sinks) of
tainted values.
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D116025
Summary: Produce SymbolCast for integral types in `evalCast` function. Apply several simplification techniques while producing the symbols. Added a boolean option `handle-integral-cast-for-ranges` under `-analyzer-config` flag. Disabled the feature by default.
Differential Revision: https://reviews.llvm.org/D105340
An identical declaration is present just a couple of lines above the
line being removed in this patch.
Identified with readability-redundant-declaration.
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.
This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Depends on D108478
Reviewed By: pcc, rjmccall
Differential Revision: https://reviews.llvm.org/D108479
Summary: Refactor return value of `StoreManager::attemptDownCast` function by removing the last parameter `bool &Failed` and replace the return value `SVal` with `Optional<SVal>`. Make the function consistent with the family of `evalDerivedToBase` by renaming it to `evalBaseToDerived`. Aligned the code on the call side with these changes.
Differential Revision: https://reviews.llvm.org/
This expands checking for more expressions. This will check underflow
and loss of precision when using call expressions like:
void foo(unsigned);
int i = -1;
foo(i);
This also includes other expressions as well, so it can catch negative
indices to std::vector since it uses unsigned integers for [] and .at()
function.
Patch by: @pfultz2
Differential Revision: https://reviews.llvm.org/D46081
Summary: Handle intersected and adjacent ranges uniting them into a single one.
Example:
intersection [0, 10] U [5, 20] = [0, 20]
adjacency [0, 10] U [11, 20] = [0, 20]
Differential Revision: https://reviews.llvm.org/D99797
This avoids an unnecessary copy required by 'return OS.str()', allowing
instead for NRVO or implicit move. The .str() call (which flushes the
stream) is no longer required since 65b13610a5,
which made raw_string_ostream unbuffered by default.
Differential Revision: https://reviews.llvm.org/D115374
Previously, the `SValBuilder` could not encounter expressions of the
following kind:
NonLoc OP Loc
Loc OP NonLoc
Where the `Op` is other than `BO_Add`.
As of now, due to the smarter simplification and the fixedpoint
iteration, it turns out we can.
It can happen if the `Loc` was perfectly constrained to a concrete
value (`nonloc::ConcreteInt`), thus the simplifier can do
constant-folding in these cases as well.
Unfortunately, this could cause assertion failures, since we assumed
that the operator must be `BO_Add`, causing a crash.
---
In the patch, I decided to preserve the original behavior (aka. swap the
operands (if the operator is commutative), but if the `RHS` was a
`loc::ConcreteInt` call `evalBinOpNN()`.
I think this interpretation of the arithmetic expression is closer to
reality.
I also tried naively introducing a separate handler for
`loc::ConcreteInt` RHS, before doing handling the more generic `Loc` RHS
case. However, it broke the `zoo1backwards()` test in the `nullptr.cpp`
file. This highlighted for me the importance to preserve the original
behavior for the `BO_Add` at least.
PS: Sorry for introducing yet another branch into this `evalBinOpXX`
madness. I've got a couple of ideas about refactoring these.
We'll see if I can get to it.
The test file demonstrates the issue and makes sure nothing similar
happens. The `no-crash` annotated lines show, where we crashed before
applying this patch.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D115149
Some projects [1,2,3] have flex-generated files besides bison-generated
ones.
Unfortunately, the comment `"/* A lexical scanner generated by flex */"`
generated by the tools is not necessarily at the beginning of the file,
thus we need to quickly skim through the file for this needle string.
Luckily, StringRef can do this operation in an efficient way.
That being said, now the bison comment is not required to be at the very
beginning of the file. This allows us to detect a couple more cases
[4,5,6].
Alternatively, we could say that we only allow whitespace characters
before matching the bison/flex header comment. That would prevent the
(probably) unnecessary string search in the buffer. However, I could not
verify that these tools would actually respect this assumption.
Additionally to this, e.g. the Twin project [1] has other non-whitespace
characters (some preprocessor directives) before the flex-generated
header comment. So the heuristic in the previous paragraph won't work
with that.
Thus, I would advocate the current implementation.
According to my measurement, this patch won't introduce measurable
performance degradation, even though we will do 2 linear scans.
I introduce the ignore-bison-generated-files and
ignore-flex-generated-files to disable skipping these files.
Both of these options are true by default.
[1]: https://github.com/cosmos72/twin/blob/master/server/rcparse_lex.cpp#L7
[2]: 22362cdcf9/sandbox/count-words/lexer.c (L6)
[3]: 11abdf6462/lab1/lex.yy.c (L6)
[4]: 47f5b2cfe2/B_yacc/1/y1.tab.h (L2)
[5]: 71d1bf9b1e/src/VBox/Additions/x11/x11include/xorg-server-1.8.0/parser.h (L2)
[6]: 3f773ceb13/Framework/OpenEars.framework/Versions/A/Headers/jsgf_parser.h (L2)
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D114510
This reverts commit f02c5f3478 and
addresses the issue mentioned in D114619 differently.
Repeating the issue here:
Currently, during symbol simplification we remove the original member
symbol from the equivalence class (`ClassMembers` trait). However, we
keep the reverse link (`ClassMap` trait), in order to be able the query
the related constraints even for the old member. This asymmetry can lead
to a problem when we merge equivalence classes:
```
ClassA: [a, b] // ClassMembers trait,
a->a, b->a // ClassMap trait, a is the representative symbol
```
Now let,s delete `a`:
```
ClassA: [b]
a->a, b->a
```
Let's merge ClassA into the trivial class `c`:
```
ClassA: [c, b]
c->c, b->c, a->a
```
Now, after the merge operation, `c` and `a` are actually in different
equivalence classes, which is inconsistent.
This issue manifests in a test case (added in D103317):
```
void recurring_symbol(int b) {
if (b * b != b)
if ((b * b) * b * b != (b * b) * b)
if (b * b == 1)
}
```
Before the simplification we have these equivalence classes:
```
trivial EQ1: [b * b != b]
trivial EQ2: [(b * b) * b * b != (b * b) * b]
```
During the simplification with `b * b == 1`, EQ1 is merged with `1 != b`
`EQ1: [b * b != b, 1 != b]` and we remove the complex symbol, so
`EQ1: [1 != b]`
Then we start to simplify the only symbol in EQ2:
`(b * b) * b * b != (b * b) * b --> 1 * b * b != 1 * b --> b * b != b`
But `b * b != b` is such a symbol that had been removed previously from
EQ1, thus we reach the above mentioned inconsistency.
This patch addresses the issue by making it impossible to synthesise a
symbol that had been simplified before. We achieve this by simplifying
the given symbol to the absolute simplest form.
Differential Revision: https://reviews.llvm.org/D114887
Add the capability to simplify more complex constraints where there are 3
symbols in the tree. In this change I extend simplifySVal to query constraints
of children sub-symbols in a symbol tree. (The constraint for the parent is
asked in getKnownValue.)
Differential Revision: https://reviews.llvm.org/D103317
Currently, during symbol simplification we remove the original member symbol
from the equivalence class (`ClassMembers` trait). However, we keep the
reverse link (`ClassMap` trait), in order to be able the query the
related constraints even for the old member. This asymmetry can lead to
a problem when we merge equivalence classes:
```
ClassA: [a, b] // ClassMembers trait,
a->a, b->a // ClassMap trait, a is the representative symbol
```
Now lets delete `a`:
```
ClassA: [b]
a->a, b->a
```
Let's merge the trivial class `c` into ClassA:
```
ClassA: [c, b]
c->c, b->c, a->a
```
Now after the merge operation, `c` and `a` are actually in different
equivalence classes, which is inconsistent.
One solution to this problem is to simply avoid removing the original
member and this is what this patch does.
Other options I have considered:
1) Always merge the trivial class into the non-trivial class. This might
work most of the time, however, will fail if we have to merge two
non-trivial classes (in that case we no longer can track equivalences
precisely).
2) In `removeMember`, update the reverse link as well. This would cease
the inconsistency, but we'd loose precision since we could not query
the constraints for the removed member.
Differential Revision: https://reviews.llvm.org/D114619
I just read this part of the code, and I found the nested ifs less
readable.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D114441
Make the SValBuilder capable to simplify existing
SVals based on a newly added constraints when evaluating a BinOp.
Before this patch, we called `simplify` only in some edge cases.
However, we can and should investigate the constraints in all cases.
Differential Revision: https://reviews.llvm.org/D113753
Make the SimpleSValBuilder capable to simplify existing IntSym
expressions based on a newly added constraint on the sub-expression.
Differential Revision: https://reviews.llvm.org/D113754
`CallDescriptions` have a `RequiredArgs` and `RequiredParams` members,
but they are of different types, `unsigned` and `size_t` respectively.
In the patch I use only `unsigned` for both, that should be large enough
anyway.
I also introduce the `MaybeUInt` type alias for `Optional<unsigned>`.
Additionally, I also avoid the use of the //smart// less-than operator.
template <typename T>
constexpr bool operator<=(const Optional<T> &X, const T &Y);
Which would check if the optional **has** a value and compare the data
only after. I found it surprising, thus I think we are better off
without it.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113594
Previously, CallDescription simply referred to the qualified name parts
by `const char*` pointers.
In the future we might want to dynamically load and populate
`CallDescriptionMaps`, hence we will need the `CallDescriptions` to
actually **own** their qualified name parts.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113593