This is a NFC refactoring to change makeIntValWithPtrWidth
and remove getZeroWithPtrWidth to use types when forming values to match
pointer widths. Some targets may have different pointer widths depending
upon address space, so this needs to be comprehended.
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D120134
Usages of makeNull need to be deprecated in favor of makeNullWithWidth
for architectures where the pointer size should not be assumed. This can
occur when pointer sizes can be of different sizes, depending on address
space for example. See https://reviews.llvm.org/D118050 as an example.
This was uncovered initially in a downstream compiler project, and
tested through those systems tests.
steakhal performed systems testing across a large set of open source
projects.
Co-authored-by: steakhal
Resolves: https://github.com/llvm/llvm-project/issues/53664
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D119601
Few weeks back I was experimenting with reading the uninitialized values from src , which is actually a bug but the CSA seems to give up at that point . I was curious about that and I pinged @steakhal on the discord and according to him this seems to be a genuine issue and needs to be fix. So I goes with fixing this bug and thanks to @steakhal who help me creating this patch. This feature seems to break some tests but this was the genuine problem and the broken tests also needs to fix in certain manner. I add a test but yeah we need more tests,I'll try to add more tests.Thanks
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D120489
Few weeks back I was experimenting with reading the uninitialized values from src , which is actually a bug but the CSA seems to give up at that point . I was curious about that and I pinged @steakhal on the discord and according to him this seems to be a genuine issue and needs to be fix. So I goes with fixing this bug and thanks to @steakhal who help me creating this patch. This feature seems to break some tests but this was the genuine problem and the broken tests also needs to fix in certain manner. I add a test but yeah we need more tests,I'll try to add more tests.Thanks
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D120489
The problem with leak bug reports is that the most interesting event in the code
is likely the one that did not happen -- lack of ownership change and lack of
deallocation, which is often present within the same function that the analyzer
inlined anyway, but not on the path of execution on which the bug occured. We
struggle to understand that a function was responsible for freeing the memory,
but failed.
D105819 added a new visitor to improve memory leak bug reports. In addition to
inspecting the ExplodedNodes of the bug pat, the visitor tries to guess whether
the function was supposed to free memory, but failed to. Initially (in D108753),
this was done by checking whether a CXXDeleteExpr is present in the function. If
so, we assume that the function was at least party responsible, and prevent the
analyzer from pruning bug report notes in it. This patch improves this heuristic
by recognizing all deallocator functions that MallocChecker itself recognizes,
by reusing MallocChecker::isFreeingCall.
Differential Revision: https://reviews.llvm.org/D118880
Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D120310
Since https://reviews.llvm.org/D120334 we shouldn't pass temporary LangOptions to Lexer.
This change fixes stack-use-after-scope UB in LocalizationChecker found by sanitizer-x86_64-linux-fast buildbot
and resolve similar issue in HeaderIncludes.
Add a checker to maintain the system-defined value 'errno'.
The value is supposed to be set in the future by existing or
new checkers that evaluate errno-modifying function calls.
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D120310
LocationContext::getDecl() isn't useful for obtaining the "farmed" body because
the (synthetic) body statement isn't actually attached to the (natural-grown)
declaration in the AST.
Differential Revision: https://reviews.llvm.org/D119509
There was a typo in the rule.
`{{0}, ReturnValueIndex}` meant that the discrete index is `0` and the
variadic index is `-1`.
What we wanted instead is that both `0` and `-1` are in the discrete index
list.
Instead of this, we wanted to express that both `0` and the
`ReturnValueIndex` is in the discrete arg list.
The manual inspection revealed that `setproctitle_init` also suffered a
probably incomplete propagation rule.
Reviewed By: Szelethus, gamesh411
Differential Revision: https://reviews.llvm.org/D119129
Fixes the issue D118987 by mapping the propagation to the callsite's
LocationContext.
This way we can keep track of the in-flight propagations.
Note that empty propagation sets won't be inserted.
Reviewed By: NoQ, Szelethus
Differential Revision: https://reviews.llvm.org/D119128
Recently we uncovered a serious bug in the `GenericTaintChecker`.
It was already flawed before D116025, but that was the patch that turned
this silent bug into a crash.
It happens if the `GenericTaintChecker` has a rule for a function, which
also has a definition.
char *fgets(char *s, int n, FILE *fp) {
nested_call(); // no parameters!
return (char *)0;
}
// Within some function:
fgets(..., tainted_fd);
When the engine inlines the definition and finds a function call within
that, the `PostCall` event for the call will get triggered sooner than the
`PostCall` for the original function.
This mismatch violates the assumption of the `GenericTaintChecker` which
wants to propagate taint information from the `PreCall` event to the
`PostCall` event, where it can actually bind taint to the return value
**of the same call**.
Let's get back to the example and go through step-by-step.
The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>`
event, so it would 'remember' that it needs to taint the return value
and the buffer, from the `PostCall` handler, where it has access to the
return value symbol.
However, the engine will inline fgets and the `nested_call()` gets
evaluated subsequently, which produces an unimportant
`PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is
observed by the `GenericTaintChecker`, which will unconditionally mark
tainted the 'remembered' arg indexes, trying to access a non-existing
argument, resulting in a crash.
If it doesn't crash, it will behave completely unintuitively, by marking
completely unrelated memory regions tainted, which is even worse.
The resulting assertion is something like this:
Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion
`Arg < getNumArgs() && "Arg access out of range!"' failed.
The gist of the backtrace:
CallExpr::getArg(unsigned int) const
SimpleFunctionCall::getArgExpr(unsigned int)
CallEvent::getArgSVal(unsigned int) const
GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const
Prior to D116025, there was a check for the argument count before it
applied taint, however, it still suffered from the same underlying
issue/bug regarding propagation.
This path does not intend to fix the bug, rather start a discussion on
how to fix this.
---
Let me elaborate on how I see this problem.
This pre-call, post-call juggling is just a workaround.
The engine should by itself propagate taint where necessary right where
it invalidates regions.
For the tracked values, which potentially escape, we need to erase the
information we know about them; and this is exactly what is done by
invalidation.
However, in the case of taint, we basically want to approximate from the
opposite side of the spectrum.
We want to preserve taint in most cases, rather than cleansing them.
Now, we basically sanitize all escaping tainted regions implicitly,
since invalidation binds a fresh conjured symbol for the given region,
and that has not been associated with taint.
IMO this is a bad default behavior, we should be more aggressive about
preserving taint if not further spreading taint to the reachable
regions.
We have a couple of options for dealing with it (let's call it //tainting
policy//):
1) Taint only the parameters which were tainted prior to the call.
2) Taint the return value of the call, since it likely depends on the
tainted input - if any arguments were tainted.
3) Taint all escaped regions - (maybe transitively using the cluster
algorithm) - if any arguments were tainted.
4) Not taint anything - this is what we do right now :D
The `ExprEngine` should not deal with taint on its own. It should be done
by a checker, such as the `GenericTaintChecker`.
However, the `Pre`-`PostCall` checker callbacks are not designed for this.
`RegionChanges` would be a much better fit for modeling taint propagation.
What we would need in the `RegionChanges` callback is the `State` prior
invalidation, the `State` after the invalidation, and a `CheckerContext` in
which the checker can create transitions, where it would place `NoteTags`
for the modeled taint propagations and report errors if a taint sink
rule gets violated.
In this callback, we could query from the prior State, if the given
value was tainted; then act and taint if necessary according to the
checker's tainting policy.
By using RegionChanges for this, we would 'fix' the mentioned
propagation bug 'by-design'.
Reviewed By: Szelethus
Differential Revision: https://reviews.llvm.org/D118987
There is different bug types for different types of bugs but the **emitAdditionOverflowbug** seems to use bugtype **BT_NotCSting** but actually it have to use **BT_AdditionOverflow** .
Reviewed By: steakhal
Differential Revision: https://reviews.llvm.org/D119462
GenericTaintChecker now uses CallDescriptionMap to describe the possible
operation in code which trigger the introduction (sources), the removal
(filters), the passing along (propagations) and detection (sinks) of
tainted values.
Reviewed By: steakhal, NoQ
Differential Revision: https://reviews.llvm.org/D116025
An identical declaration is present just a couple of lines above the
line being removed in this patch.
Identified with readability-redundant-declaration.
Control-Flow Integrity (CFI) replaces references to address-taken
functions with pointers to the CFI jump table. This is a problem
for low-level code, such as operating system kernels, which may
need the address of an actual function body without the jump table
indirection.
This change adds the __builtin_function_start() builtin, which
accepts an argument that can be constant-evaluated to a function,
and returns the address of the function body.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Depends on D108478
Reviewed By: pcc, rjmccall
Differential Revision: https://reviews.llvm.org/D108479
This expands checking for more expressions. This will check underflow
and loss of precision when using call expressions like:
void foo(unsigned);
int i = -1;
foo(i);
This also includes other expressions as well, so it can catch negative
indices to std::vector since it uses unsigned integers for [] and .at()
function.
Patch by: @pfultz2
Differential Revision: https://reviews.llvm.org/D46081
This avoids an unnecessary copy required by 'return OS.str()', allowing
instead for NRVO or implicit move. The .str() call (which flushes the
stream) is no longer required since 65b13610a5,
which made raw_string_ostream unbuffered by default.
Differential Revision: https://reviews.llvm.org/D115374
This patch replaces each use of the previous API with the new one.
In variadic cases, it will use the ADL `matchesAny(Call, CDs...)`
variadic function.
Also simplifies some code involving such operations.
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D113591
`CallDescriptions` deserve its own translation unit.
This patch simply moves the corresponding parts.
Also includes the `CallDescription.h` where it's necessary.
Reviewed By: martong, xazax.hun, Szelethus
Differential Revision: https://reviews.llvm.org/D113587
Replace variable and functions names, as well as comments that contain whitelist with
more inclusive terms.
Reviewed By: aaron.ballman, martong
Differential Revision: https://reviews.llvm.org/D112642
Due to a typo, `sprintf()` was recognized as a taint source instead of a
taint propagator. It was because an empty taint source list - which is
the first parameter of the `TaintPropagationRule` - encoded the
unconditional taint sources.
This typo effectively turned the `sprintf()` into an unconditional taint
source.
This patch fixes that typo and demonstrated the correct behavior with
tests.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D112558
It seems like protobuf crashed the `std::string` checker.
Somehow it acquired `UnknownVal` as the sole `std::string` constructor
parameter, causing a crash in the `castAs<Loc>()`.
This patch addresses this.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D112551
This patch adds a checker checking `std::string` operations.
At first, it only checks the `std::string` single `const char *`
constructor for nullness.
If It might be `null`, it will constrain it to non-null and place a note
tag there.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111247