Support for functions wmemcpy, wcslen, wcsnlen is added to the checker.
Documentation and tests are updated and extended with the new functions.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D130091
Summary: Introduce a new function 'clang_analyzer_value'. It emits a report that in turn prints a RangeSet or APSInt associated with SVal. If there is no associated value, prints "n/a".
This reverts commit 7c51f02eff because it
stills breaks the LLDB tests. This was re-landed without addressing the
issue or even agreement on how to address the issue. More details and
discussion in https://reviews.llvm.org/D112374.
Without this patch, clang will not wrap in an ElaboratedType node types written
without a keyword and nested name qualifier, which goes against the intent that
we should produce an AST which retains enough details to recover how things are
written.
The lack of this sugar is incompatible with the intent of the type printer
default policy, which is to print types as written, but to fall back and print
them fully qualified when they are desugared.
An ElaboratedTypeLoc without keyword / NNS uses no storage by itself, but still
requires pointer alignment due to pre-existing bug in the TypeLoc buffer
handling.
---
Troubleshooting list to deal with any breakage seen with this patch:
1) The most likely effect one would see by this patch is a change in how
a type is printed. The type printer will, by design and default,
print types as written. There are customization options there, but
not that many, and they mainly apply to how to print a type that we
somehow failed to track how it was written. This patch fixes a
problem where we failed to distinguish between a type
that was written without any elaborated-type qualifiers,
such as a 'struct'/'class' tags and name spacifiers such as 'std::',
and one that has been stripped of any 'metadata' that identifies such,
the so called canonical types.
Example:
```
namespace foo {
struct A {};
A a;
};
```
If one were to print the type of `foo::a`, prior to this patch, this
would result in `foo::A`. This is how the type printer would have,
by default, printed the canonical type of A as well.
As soon as you add any name qualifiers to A, the type printer would
suddenly start accurately printing the type as written. This patch
will make it print it accurately even when written without
qualifiers, so we will just print `A` for the initial example, as
the user did not really write that `foo::` namespace qualifier.
2) This patch could expose a bug in some AST matcher. Matching types
is harder to get right when there is sugar involved. For example,
if you want to match a type against being a pointer to some type A,
then you have to account for getting a type that is sugar for a
pointer to A, or being a pointer to sugar to A, or both! Usually
you would get the second part wrong, and this would work for a
very simple test where you don't use any name qualifiers, but
you would discover is broken when you do. The usual fix is to
either use the matcher which strips sugar, which is annoying
to use as for example if you match an N level pointer, you have
to put N+1 such matchers in there, beginning to end and between
all those levels. But in a lot of cases, if the property you want
to match is present in the canonical type, it's easier and faster
to just match on that... This goes with what is said in 1), if
you want to match against the name of a type, and you want
the name string to be something stable, perhaps matching on
the name of the canonical type is the better choice.
3) This patch could exposed a bug in how you get the source range of some
TypeLoc. For some reason, a lot of code is using getLocalSourceRange(),
which only looks at the given TypeLoc node. This patch introduces a new,
and more common TypeLoc node which contains no source locations on itself.
This is not an inovation here, and some other, more rare TypeLoc nodes could
also have this property, but if you use getLocalSourceRange on them, it's not
going to return any valid locations, because it doesn't have any. The right fix
here is to always use getSourceRange() or getBeginLoc/getEndLoc which will dive
into the inner TypeLoc to get the source range if it doesn't find it on the
top level one. You can use getLocalSourceRange if you are really into
micro-optimizations and you have some outside knowledge that the TypeLocs you are
dealing with will always include some source location.
4) Exposed a bug somewhere in the use of the normal clang type class API, where you
have some type, you want to see if that type is some particular kind, you try a
`dyn_cast` such as `dyn_cast<TypedefType>` and that fails because now you have an
ElaboratedType which has a TypeDefType inside of it, which is what you wanted to match.
Again, like 2), this would usually have been tested poorly with some simple tests with
no qualifications, and would have been broken had there been any other kind of type sugar,
be it an ElaboratedType or a TemplateSpecializationType or a SubstTemplateParmType.
The usual fix here is to use `getAs` instead of `dyn_cast`, which will look deeper
into the type. Or use `getAsAdjusted` when dealing with TypeLocs.
For some reason the API is inconsistent there and on TypeLocs getAs behaves like a dyn_cast.
5) It could be a bug in this patch perhaps.
Let me know if you need any help!
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D112374
CStringChecker is using getByteLength to get the length of a string
literal. For targets where a "char" is 8-bits, getByteLength() and
getLength() will be equal for a C string, but for targets where a "char"
is 16-bits getByteLength() returns the size in octets.
This is verified in our downstream target, but we have no way to add a
test case for this case since there is no target supporting 16-bit
"char" upstream. Since this cannot have a test case, I'm asserted this
change is "correct by construction", and visually inspected to be
correct by way of the following example where this was found.
The case that shows this fails using a target with 16-bit chars is here.
getByteLength() for the string literal returns 4, which fails when
checked against "char x[4]". With the change, the string literal is
evaluated to a size of 2 which is a correct number of "char"'s for a
16-bit target.
```
void strcpy_no_overflow_2(char *y) {
char x[4];
strcpy(x, "12"); // with getByteLength(), returns 4 using 16-bit chars
}
```
This change exposed that embedded nulls within the string are not
handled. This is documented as a FIXME for a future fix.
```
void strcpy_no_overflow_3(char *y) {
char x[3];
strcpy(x, "12\0");
}
```
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D129269
This reverts commit bdc6974f92 because it
breaks all the LLDB tests that import the std module.
import-std-module/array.TestArrayFromStdModule.py
import-std-module/deque-basic.TestDequeFromStdModule.py
import-std-module/deque-dbg-info-content.TestDbgInfoContentDequeFromStdModule.py
import-std-module/forward_list.TestForwardListFromStdModule.py
import-std-module/forward_list-dbg-info-content.TestDbgInfoContentForwardListFromStdModule.py
import-std-module/list.TestListFromStdModule.py
import-std-module/list-dbg-info-content.TestDbgInfoContentListFromStdModule.py
import-std-module/queue.TestQueueFromStdModule.py
import-std-module/stack.TestStackFromStdModule.py
import-std-module/vector.TestVectorFromStdModule.py
import-std-module/vector-bool.TestVectorBoolFromStdModule.py
import-std-module/vector-dbg-info-content.TestDbgInfoContentVectorFromStdModule.py
import-std-module/vector-of-vectors.TestVectorOfVectorsFromStdModule.py
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/45301/
Without this patch, clang will not wrap in an ElaboratedType node types written
without a keyword and nested name qualifier, which goes against the intent that
we should produce an AST which retains enough details to recover how things are
written.
The lack of this sugar is incompatible with the intent of the type printer
default policy, which is to print types as written, but to fall back and print
them fully qualified when they are desugared.
An ElaboratedTypeLoc without keyword / NNS uses no storage by itself, but still
requires pointer alignment due to pre-existing bug in the TypeLoc buffer
handling.
Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>
Differential Revision: https://reviews.llvm.org/D112374
Instead of dumping the string literal (which
quotes it and escape every non-ascii symbol),
we can use the content of the string when it is a
8 byte string.
Wide, UTF-8/UTF-16/32 strings are still completely
escaped, until we clarify how these entities should
behave (cf https://wg21.link/p2361).
`FormatDiagnostic` is modified to escape
non printable characters and invalid UTF-8.
This ensures that unicode characters, spaces and new
lines are properly rendered in static messages.
This make clang more consistent with other implementation
and fixes this tweet
https://twitter.com/jfbastien/status/1298307325443231744 :)
Of note, `PaddingChecker` did print out new lines that were
later removed by the diagnostic printing code.
To be consistent with its tests, the new lines are removed
from the diagnostic.
Unicode tables updated to both use the Unicode definitions
and the Unicode 14.0 data.
U+00AD SOFT HYPHEN is still considered a print character
to match existing practices in terminals, in addition of
being considered a formatting character as per Unicode.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D108469
Instead of dumping the string literal (which
quotes it and escape every non-ascii symbol),
we can use the content of the string when it is a
8 byte string.
Wide, UTF-8/UTF-16/32 strings are still completely
escaped, until we clarify how these entities should
behave (cf https://wg21.link/p2361).
`FormatDiagnostic` is modified to escape
non printable characters and invalid UTF-8.
This ensures that unicode characters, spaces and new
lines are properly rendered in static messages.
This make clang more consistent with other implementation
and fixes this tweet
https://twitter.com/jfbastien/status/1298307325443231744 :)
Of note, `PaddingChecker` did print out new lines that were
later removed by the diagnostic printing code.
To be consistent with its tests, the new lines are removed
from the diagnostic.
Unicode tables updated to both use the Unicode definitions
and the Unicode 14.0 data.
U+00AD SOFT HYPHEN is still considered a print character
to match existing practices in terminals, in addition of
being considered a formatting character as per Unicode.
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D108469
The functions 'mkdir', 'mknod', 'mkdirat', 'mknodat' return 0 on success
and -1 on failure. The checker modeled these functions with a >= 0
return value on success which is changed to 0 only. This fix makes
ErrnoChecker work better for these functions.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D127277
This updates StdLibraryFunctionsChecker to set the state of 'errno'
by using the new errno_modeling functionality.
The errno value is set in the PostCall callback. Setting it in call::Eval
did not work for some reason and then every function should be
EvalCallAsPure which may be bad to do. Now the errno value and state
is not allowed to be checked in any PostCall checker callback because
it is unspecified if the errno was set already or will be set later
by this checker.
Reviewed By: martong, steakhal
Differential Revision: https://reviews.llvm.org/D125400
Extend checker 'ErrnoModeling' with a state of 'errno' to indicate
the importance of the 'errno' value and how it should be used.
Add a new checker 'ErrnoChecker' that observes use of 'errno' and
finds possible wrong uses, based on the "errno state".
The "errno state" should be set (together with value of 'errno')
by other checkers (that perform modeling of the given function)
in the future. Currently only a test function can set this value.
The new checker has no user-observable effect yet.
Reviewed By: martong, steakhal
Differential Revision: https://reviews.llvm.org/D122150
Previously, system globals were treated as immutable regions, unless it
was the `errno` which is known to be frequently modified.
D124244 wants to add a check for stores to immutable regions.
It would basically turn all stores to system globals into an error even
though we have no reason to believe that those mutable sys globals
should be treated as if they were immutable. And this leads to
false-positives if we apply D124244.
In this patch, I'm proposing to treat mutable sys globals actually
mutable, hence allocate them into the `GlobalSystemSpaceRegion`, UNLESS
they were declared as `const` (and a primitive arithmetic type), in
which case, we should use `GlobalImmutableSpaceRegion`.
In any other cases, I'm using the `GlobalInternalSpaceRegion`, which is
no different than the previous behavior.
---
In the tests I added, only the last `expected-warning` was different, compared to the baseline.
Which is this:
```lang=C++
void test_my_mutable_system_global_constraint() {
assert(my_mutable_system_global > 2);
clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{TRUE}}
invalidate_globals();
clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
void test_my_mutable_system_global_assign(int x) {
my_mutable_system_global = x;
clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{TRUE}}
invalidate_globals();
clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
```
---
Unfortunately, the taint checker will be also affected.
The `stdin` global variable is a pointer, which is assumed to be a taint
source, and the rest of the taint propagation rules will propagate from
it.
However, since mutable variables are no longer treated immutable, they
also get invalidated, when an opaque function call happens, such as the
first `scanf(stdin, ...)`. This would effectively remove taint from the
pointer, consequently disable all the rest of the taint propagations
down the line from the `stdin` variable.
All that said, I decided to look through `DerivedSymbol`s as well, to
acquire the memregion in that case as well. This should preserve the
previously existing taint reports.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127306
Initially, I thought there is some fundamental bug here by not using the
bool fields, but it turns out D55425 split this checker into two
separate ones; making these fields dead.
Depends on D127836, which uncovered this issue.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127838
The `Profile` function was incorrectly implemented.
The `StreamErrorState` has an implicit `bool` conversion operator, which
will result in a different hash than faithfully hashing the raw value of
the enum.
I don't have a test for it, since it seems difficult to find one.
Even if we would have one, any change in the hashing algorithm would
have a chance of breaking it, so I don't think it would justify the
effort.
Depends on D127836, which uncovered this issue by marking the related
`Profile` function dead.
Reviewed By: martong, balazske
Differential Revision: https://reviews.llvm.org/D127839
Thanks @kazu for helping me clean these parts in D127799.
I'm leaving the dump methods, along with the unused visitor handlers and
the forwarding methods.
The dead parts actually helped to uncover two bugs, to which I'm going
to post separate patches.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D127836
This change specializes the LLVM RTTI mechanism for SVals.
After this change, we can use the well-known `isa`, `cast`, `dyn_cast`.
Examples:
// SVal V = ...;
// Loc MyLoc = ...;
bool IsInteresting = isa<loc::MemRegionVal, loc::GotoLabel>(MyLoc);
auto MRV = cast<loc::MemRegionVal>(MyLoc);
Optional<loc::MemRegionVal> MaybeMRV = dyn_cast<loc::MemRegionVal>(V)
The current `SVal::getAs` and `castAs` member functions are redundant at
this point, but I believe that they are still handy.
The member function version is terse and reads left-to-right, which IMO
is a great plus. However, we should probably add a variadic `isa` member
function version to have the same casting API in both cases.
Thanks for the extensive TMP help @bzcheeseman!
Reviewed By: bzcheeseman
Differential Revision: https://reviews.llvm.org/D125709
A crash was seen in CastValueChecker due to a null pointer dereference.
The fix uses QualType::getAsString to avoid the null dereference
when a CXXRecordDecl cannot be obtained. A small reproducer is added,
and cast value notes LITs are updated for the new debug messages.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D127105
I'm also hoisting common code from the existing specializations into a
common trait impl to reduce code duplication.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D126801
This patch adds a new descendant to the SymExpr hierarchy. This way, now
we can assign constraints to symbolic unary expressions. Only the unary
minus and bitwise negation are handled.
Differential Revision: https://reviews.llvm.org/D125318
In some rare cases the type of an SVal might be interesting.
This introspection function exposes this information in tests.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D125532
BoolAssignment checker is now taint-aware and warns if a tainted value is
assigned.
Original author: steakhal
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D125360
There are many more instances of this pattern, but I chose to limit this change to .rst files (docs), anything in libcxx/include, and string literals. These have the highest chance of being seen by end users.
Reviewed By: #libc, Mordante, martong, ldionne
Differential Revision: https://reviews.llvm.org/D124708
This is an extension to diff D99260. This adds an additional exception
for `std::__addressof` in `InnerPointerChecker`.
Patch By alishuja (Ali Shuja Siddiqui)!
Reviewed By: martong, alishuja
Differential Revision: https://reviews.llvm.org/D109467
Remove unnecessary conversion to Optional<> and incorrect assumption
that BindExpr can return a null state.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D124681
`g_memdup()` allocates and copies memory, thus we should not assume that
the returned memory region is uninitialized because it might not be the
case.
PS: It would be even better to copy the bindings to mimic the actual
content of the buffer, but this works too.
Fixes#53617
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D124436
The patch is straightforward except the tiny fix in BugReporterVisitors.cpp
that suppresses a default note for "Assuming pointer value is null" when
a note tag from the checker is present. This is probably the right thing to do
but also definitely not a complete solution to the problem of different sources
of path notes being unaware of each other, which is a large and annoying issue
that we have to deal with. Note tags really help there because they're nicely
introspectable. The problem is demonstrated by the newly added getenv() test.
Differential Revision: https://reviews.llvm.org/D122285
In the following example:
int va_list_get_int(va_list *va) {
return va_arg(*va, int); // FP
}
The `*va` expression will be something like `Element{SymRegion{va}, 0, va_list}`.
We use `ElementRegions` for representing the result of the dereference.
In this case, the `IsSymbolic` was set to `false` in the
`getVAListAsRegion()`.
Hence, before checking if the memregion is a SymRegion, we should take
the base of that region.
Analogously to the previous example, one can craft other cases:
struct MyVaList {
va_list l;
};
int va_list_get_int(struct MyVaList va) {
return va_arg(va.l, int); // FP
}
But it would also work if the `va_list` would be in the base or derived
part of a class. `ObjCIvarRegions` are likely also susceptible.
I'm not explicitly demonstrating these cases.
PS: Check the `MemRegion::getBaseRegion()` definition.
Fixes#55009
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D124239
This change adds an option to detect all null dereferences for
non-default address spaces, except for address spaces 256, 257 and 258.
Those address spaces are special since null dereferences are not errors.
All address spaces can be considered (except for 256, 257, and 258) by
using -analyzer-config
core.NullDereference:DetectAllNullDereferences=true. This option is
false by default, retaining the original behavior.
A LIT test was enhanced to cover this case, and the rst documentation
was updated to describe this behavior.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D122841
A recent review emphasized the preference to use DefaultBool instead of
bool for checker options. This change is a NFC and cleans up some of the
instances where bool was used, and could be changed to DefaultBool.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D123464
Under the hood this prints the same as `QualType::getAsString()` but cuts out the middle-man when that string is sent to another raw_ostream.
Also cleaned up all the call sites where this occurs.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D123926