This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Casting a pointer to a suitably large integral type by reinterpret-cast
should result in the same value as by using the `__builtin_bit_cast()`.
The compiler exploits this: https://godbolt.org/z/zMP3sG683
However, the analyzer does not bind the same symbolic value to these
expressions, resulting in weird situations, such as failing equality
checks and even results in crashes: https://godbolt.org/z/oeMP7cj8q
Previously, in the `RegionStoreManager::getBinding()` even if `T` was
non-null, we replaced it with `TVR->getValueType()` in case the `MR` was
`TypedValueRegion`.
It doesn't make much sense to auto-detect the type if the type is
already given. By not doing the auto-detection, we would just do the
right thing and perform the load by that type.
This means that we will cast the value to that type.
So, in this patch, I'm proposing to do auto-detection only if the type
was null.
Here is a snippet of code, annotated by the previous and new dump values.
`LocAsInteger` should wrap the `SymRegion`, since we want to load the
address as if it was an integer.
In none of the following cases should type auto-detection be triggered,
hence we should eventually reach an `evalCast()` to lazily cast the loaded
value into that type.
```lang=C++
void LValueToRValueBitCast_dumps(void *p, char (*array)[8]) {
clang_analyzer_dump(p); // remained: &SymRegion{reg_$0<void * p>}
clang_analyzer_dump(array); // remained: {{&SymRegion{reg_$1<char (*)[8] array>}
clang_analyzer_dump((unsigned long)p);
// remained: {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
clang_analyzer_dump(__builtin_bit_cast(unsigned long, p)); <--------- change #1
// previously: {{&SymRegion{reg_$0<void * p>}}}
// now: {{&SymRegion{reg_$0<void * p>} [as 64 bit integer]}}
clang_analyzer_dump((unsigned long)array); // remained: {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
clang_analyzer_dump(__builtin_bit_cast(unsigned long, array)); <--------- change #2
// previously: {{&SymRegion{reg_$1<char (*)[8] array>}}}
// now: {{&SymRegion{reg_$1<char (*)[8] array>} [as 64 bit integer]}}
}
```
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D136603
Previously, `LazyCompoundVal` bindings to subregions referred by
`LazyCopoundVals`, were not marked as //lazily copied//.
This change returns `LazyCompoundVals` from `getInterestingValues()`,
so their regions can be marked as //lazily copied// in `RemoveDeadBindingsWorker::VisitBinding()`.
Depends on D134947
Authored by: Tomasz Kamiński <tomasz.kamiński@sonarsource.com>
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D135136
To illustrate our current understanding, let's start with the following program:
https://godbolt.org/z/33f6vheh1
```lang=c++
void clang_analyzer_printState();
struct C {
int x;
int y;
int more_padding;
};
struct D {
C c;
int z;
};
C foo(D d, int new_x, int new_y) {
d.c.x = new_x; // B1
assert(d.c.x < 13); // C1
C c = d.c; // L
assert(d.c.y < 10); // C2
assert(d.z < 5); // C3
d.c.y = new_y; // B2
assert(d.c.y < 10); // C4
return c; // R
}
```
In the code, we create a few bindings to subregions of root region `d` (`B1`, `B2`), a constrain on the values (`C1`, `C2`, ….), and create a `lazyCompoundVal` for the part of the region `d` at point `L`, which is returned at point `R`.
Now, the question is which of these should remain live as long the return value of the `foo` call is live. In perfect a word we should preserve:
# only the bindings of the subregions of `d.c`, which were created before the copy at `L`. In our example, this includes `B1`, and not `B2`. In other words, `new_x` should be live but `new_y` shouldn’t.
# constraints on the values of `d.c`, that are reachable through `c`. This can be created both before the point of making the copy (`L`) or after. In our case, that would be `C1` and `C2`. But not `C3` (`d.z` value is not reachable through `c`) and `C4` (the original value of`d.c.y` was overridden at `B2` after the creation of `c`).
The current code in the `RegionStore` covers the use case (1), by using the `getInterestingValues()` to extract bindings to parts of the referred region present in the store at the point of copy. This also partially covers point (2), in case when constraints are applied to a location that has binding at the point of the copy (in our case `d.c.x` in `C1` that has value `new_x`), but it fails to preserve the constraints that require creating a new symbol for location (`d.c.y` in `C2`).
We introduce the concept of //lazily copied// locations (regions) to the `SymbolReaper`, i.e. for which a program can access the value stored at that location, but not its address. These locations are constructed as a set of regions referred to by `lazyCompoundVal`. A //readable// location (region) is a location that //live// or //lazily copied// . And symbols that refer to values in regions are alive if the region is //readable//.
For simplicity, we follow the current approach to live regions and mark the base region as //lazily copied//, and consider any subregions as //readable//. This makes some symbols falsy live (`d.z` in our example) and keeps the corresponding constraints alive.
The rename `Regions` to `LiveRegions` inside `RegionStore` is NFC change, that was done to make it clear, what is difference between regions stored in this two sets.
Regression Test: https://reviews.llvm.org/D134941
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Reviewed By: martong, xazax.hun
Differential Revision: https://reviews.llvm.org/D134947
`LazyCompoundVals` should only appear as `default` bindings in the
store. This fixes the second case in this patch-stack.
Depends on: D132142
Reviewed By: xazax.hun
Differential Revision: https://reviews.llvm.org/D132143
It turns out that in certain cases `SymbolRegions` are wrapped by
`ElementRegions`; in others, it's not. This discrepancy can cause the
analyzer not to recognize if the two regions are actually referring to
the same entity, which then can lead to unreachable paths discovered.
Consider this example:
```lang=C++
struct Node { int* ptr; };
void with_structs(Node* n1) {
Node c = *n1; // copy
Node* n2 = &c;
clang_analyzer_dump(*n1); // lazy...
clang_analyzer_dump(*n2); // lazy...
clang_analyzer_dump(n1->ptr); // rval(n1->ptr): reg_$2<int * SymRegion{reg_$0<struct Node * n1>}.ptr>
clang_analyzer_dump(n2->ptr); // rval(n2->ptr): reg_$1<int * Element{SymRegion{reg_$0<struct Node * n1>},0 S64b,struct Node}.ptr>
clang_analyzer_eval(n1->ptr != n2->ptr); // UNKNOWN, bad!
(void)(*n1);
(void)(*n2);
}
```
The copy of `n1` will insert a new binding to the store; but for doing
that it actually must create a `TypedValueRegion` which it could pass to
the `LazyCompoundVal`. Since the memregion in question is a
`SymbolicRegion` - which is untyped, it needs to first wrap it into an
`ElementRegion` basically implementing this untyped -> typed conversion
for the sake of passing it to the `LazyCompoundVal`.
So, this is why we have `Element{SymRegion{.}, 0,struct Node}` for `n1`.
The problem appears if the analyzer evaluates a read from the expression
`n1->ptr`. The same logic won't apply for `SymbolRegionValues`, since
they accept raw `SubRegions`, hence the `SymbolicRegion` won't be
wrapped into an `ElementRegion` in that case.
Later when we arrive at the equality comparison, we cannot prove that
they are equal.
For more details check the corresponding thread on discourse:
https://discourse.llvm.org/t/are-symbolicregions-really-untyped/64406
---
In this patch, I'm eagerly wrapping each `SymbolicRegion` by an
`ElementRegion`; basically canonicalizing to this form.
It seems reasonable to do so since any object can be thought of as a single
array of that object; so this should not make much of a difference.
The tests also underpin this assumption, as only a few were broken by
this change; and actually fixed a FIXME along the way.
About the second example, which does the same copy operation - but on
the heap - it will be fixed by the next patch.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D132142
Prior to this patch when the analyzer encountered a non-POD 0 length array,
it still invoked the constructor for 1 element, which lead to false positives.
This patch makes sure that we no longer construct any elements when we see a
0 length array.
Differential Revision: https://reviews.llvm.org/D131501
If a lazyCompoundVal to a struct is bound to the store, there is a policy which decides
whether a copy gets created instead.
This patch introduces a similar policy for arrays, which is required to model structured
binding to arrays without false negatives.
Differential Revision: https://reviews.llvm.org/D128064
Region store was not able to see through this case to the actual
initialized value of STRUCT ff. This change addresses this case by
getting the direct binding. This was found and debugged in a downstream
compiler, with debug guidance from @steakhal. A positive and negative
test case is added.
The specific case where this issue was exposed.
typedef struct {
int a:1;
int b[2];
} STRUCT;
int main() {
STRUCT ff = {0};
STRUCT* pff = &ff;
int a = ((int)pff + 1);
return a;
}
Reviewed By: steakhal, martong
Differential Revision: https://reviews.llvm.org/D124349
Essentially, having a default member initializer for a constant member
does not necessarily imply the member will have the given default value.
Remove part of a2e053638b ([analyzer] Treat more const variables and
fields as known contants., 2018-05-04).
Fix#47878
Reviewed By: r.stahl, steakhal
Differential Revision: https://reviews.llvm.org/D124621
Usages of makeNull need to be deprecated in favor of makeNullWithWidth
for architectures where the pointer size should not be assumed. This can
occur when pointer sizes can be of different sizes, depending on address
space for example. See https://reviews.llvm.org/D118050 as an example.
This was uncovered initially in a downstream compiler project, and
tested through those systems tests.
steakhal performed systems testing across a large set of open source
projects.
Co-authored-by: steakhal
Resolves: https://github.com/llvm/llvm-project/issues/53664
Reviewed By: NoQ, steakhal
Differential Revision: https://reviews.llvm.org/D119601
Summary: Add support of multi-dimensional arrays in `RegionStoreManager::getBindingForElement`. Handle nested ElementRegion's getting offsets and checking for being in bounds. Get values from the nested initialization lists using obtained offsets.
Differential Revision: https://reviews.llvm.org/D111654
Summary: Assuming that values of constant arrays never change, we can retrieve values for specific position(index) right from the initializer, if presented. Retrieve a character code by index from StringLiteral which is an initializer of constant arrays in global scope.
This patch has a known issue of getting access to characters past the end of the literal. The declaration, in which the literal is used, is an implicit cast of kind `array-to-pointer`. The offset should be in literal length's bounds. This should be distinguished from the states in the Standard C++20 [dcl.init.string] 9.4.2.3. Example:
const char arr[42] = "123";
char c = arr[41]; // OK
const char * const str = "123";
char c = str[41]; // NOK
Differential Revision: https://reviews.llvm.org/D107339
Summary: Fix a case when the extent can not be retrieved correctly from incomplete array declaration. Use redeclaration to get the array extent.
Differential Revision: https://reviews.llvm.org/D111542
Summary:
1. Improve readability by moving deeply nested block of code from RegionStoreManager::getBindingForElement to new separate functions:
- getConstantValFromConstArrayInitializer;
- getSValFromInitListExpr.
2. Handle the case when index is a symbolic value. Write specific test cases.
3. Add test cases when there is no initialization expression presented.
This patch implies to make next patches clearer and easier for review process.
Differential Revision: https://reviews.llvm.org/D106681
It turns out llvm::isa<> is variadic, and we could have used this at a
lot of places.
The following patterns:
x && isa<T1>(x) || isa<T2>(x) ...
Will be replaced by:
isa_and_non_null<T1, T2, ...>(x)
Sometimes it caused further simplifications, when it would cause even
more code smell.
Aside from this, keep in mind that within `assert()` or any macro
functions, we need to wrap the isa<> expression within a parenthesis,
due to the parsing of the comma.
Reviewed By: martong
Differential Revision: https://reviews.llvm.org/D111982
Summary: Move logic from CastRetrievedVal to evalCast and replace CastRetrievedVal with evalCast. Also move guts from SimpleSValBuilder::dispatchCast inside evalCast.
evalCast intends to substitute dispatchCast, evalCastFromNonLoc and evalCastFromLoc in the future. OriginalTy provides additional information for casting, which is useful for some cases and useless for others. If `OriginalTy.isNull()` is true, then cast performs based on CastTy only. Now evalCast operates in two ways. It retains all previous behavior and take over dispatchCast behavior. dispatchCast, evalCastFromNonLoc and evalCastFromLoc is considered as buggy since it doesn't take into account OriginalTy of the SVal and should be improved.
From this patch use evalCast instead of dispatchCast, evalCastFromNonLoc and evalCastFromLoc functions. dispatchCast redirects to evalCast.
This patch shall not change any behavior.
Differential Revision: https://reviews.llvm.org/D96090
Summary:
CompoundLiteralRegions have been properly modeled before, but
'getBindingForElement` was not changed to accommodate this change
properly.
rdar://problem/46144644
Differential Revision: https://reviews.llvm.org/D78990
The `SubEngine` interface is an interface with only one implementation
`EpxrEngine`. Adding other implementations are difficult and very
unlikely in the near future. Currently, if anything from `ExprEngine` is
to be exposed to other classes it is moved to `SubEngine` which
restricts the alternative implementations. The virtual methods are have
a slight perofrmance impact. Furthermore, instead of the `LLVM`-style
inheritance a native inheritance is used here, which renders `LLVM`
functions like e.g. `cast<T>()` unusable here. This patch removes this
interface and allows usage of `ExprEngine` directly.
Differential Revision: https://reviews.llvm.org/D80548
Summary:
This patch uses the new `DynamicSize.cpp` to serve dynamic information.
Previously it was static and probably imprecise data.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D69599
Summary:
This patch introduces a placeholder for representing the dynamic size of
regions. It also moves the `getExtent()` method of `SubRegions` to the
`MemRegionManager` as `getStaticSize()`.
Reviewed By: NoQ
Differential Revision: https://reviews.llvm.org/D69540
When implementation of the block runtime is available, we should not
warn that block layout fields are uninitialized simply because they're
on the stack.
Write tests for the actual crash that was found. Write comments and refactor
code around 17 style bugs and suppress 3 false positives.
Differential Revision: https://reviews.llvm.org/D66847
llvm-svn: 370246
If the global variable has an initializer, we'll ignore it because we're usually
not analyzing the program from the beginning, which means that the global
variable may have changed before we start our analysis.
However when we're analyzing main() as the top-level function, we can rely
on global initializers to still be valid. At least in C; in C++ we have global
constructors that can still break this logic.
This patch allows the Static Analyzer to load constant initializers from
global variables if the top-level function of the current analysis is main().
Differential Revision: https://reviews.llvm.org/D65361
llvm-svn: 370244
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
Differential revision: https://reviews.llvm.org/D66259
llvm-svn: 368942
Quotes around StringRegions are now escaped and unescaped correctly,
producing valid JSON.
Additionally, add a forgotten escape for Store values.
Differential Revision: https://reviews.llvm.org/D63519
llvm-svn: 363897
Include a unique pointer so that it was possible to figure out if it's
the same cluster in different program states. This allows comparing
dumps of different states against each other.
Differential Revision: https://reviews.llvm.org/D63362
llvm-svn: 363896
the assertion is in fact incorrect: there is a cornercase in Objective-C++
in which a C++ object is not constructed with a constructor, but merely
zero-initialized. Namely, this happens when an Objective-C message is sent
to a nil and it is supposed to return a C++ object.
Differential Revision: https://reviews.llvm.org/D60988
llvm-svn: 359262
Summary:
The existing CTU mechanism imports `FunctionDecl`s where the definition is available in another TU. This patch extends that to VarDecls, to bind more constants.
- Add VarDecl importing functionality to CrossTranslationUnitContext
- Import Decls while traversing them in AnalysisConsumer
- Add VarDecls to CTU external mappings generator
- Name changes from "external function map" to "external definition map"
Reviewers: NoQ, dcoughlin, xazax.hun, george.karpenkov, martong
Reviewed By: xazax.hun
Subscribers: Charusso, baloghadamsoftware, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, george.karpenkov, mgorny, whisperity, szepet, rnkovacs, a.sidorin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D46421
llvm-svn: 358968
Default RegionStore bindings represent values that can be obtained by loading
from anywhere within the region, not just the specific offset within the region
that they are said to be bound to. For example, default-binding a character \0
to an int (eg., via memset()) means that the whole int is 0, not just
that its lower byte is 0.
Even though memset and bzero were modeled this way, it didn't work correctly
when applied to simple variables. Eg., in
int x;
memset(x, 0, sizeof(x));
we did produce a default binding, but were unable to read it later, and 'x'
was perceived as an uninitialized variable even after memset.
At the same time, if we replace 'x' with a variable of a structure or array
type, accessing fields or elements of such variable was working correctly,
which was enough for most cases. So this was only a problem for variables of
simple integer/enumeration/floating-point/pointer types.
Fix loading default bindings from RegionStore for regions of simple variables.
Add a unit test to document the API contract as well.
Differential Revision: https://reviews.llvm.org/D60742
llvm-svn: 358722
RegionStore now knows how to bind a nonloc::CompoundVal that represents the
value of an aggregate initializer when it has its initial segment of sub-values
correspond to base classes.
Additionally, fixes the crash from pr40022.
Differential Revision: https://reviews.llvm.org/D59054
llvm-svn: 356222
This reverts commit r341722.
The "postponed" mechanism turns out to be necessary in order to handle
situations when a symbolic region is only kept alive by implicit bindings
in the Store. Otherwise the region is never scanned by the Store's worklist
and the binding gets dropped despite being live, as demonstrated
by the newly added tests.
Differential Revision: https://reviews.llvm.org/D57554
llvm-svn: 353350
As noted in https://bugs.llvm.org/show_bug.cgi?id=36651, the specialization for
isPodLike<std::pair<...>> did not match the expectation of
std::is_trivially_copyable which makes the memcpy optimization invalid.
This patch renames the llvm::isPodLike trait into llvm::is_trivially_copyable.
Unfortunately std::is_trivially_copyable is not portable across compiler / STL
versions. So a portable version is provided too.
Note that the following specialization were invalid:
std::pair<T0, T1>
llvm::Optional<T>
Tests have been added to assert that former specialization are respected by the
standard usage of llvm::is_trivially_copyable, and that when a decent version
of std::is_trivially_copyable is available, llvm::is_trivially_copyable is
compared to std::is_trivially_copyable.
As of this patch, llvm::Optional is no longer considered trivially copyable,
even if T is. This is to be fixed in a later patch, as it has impact on a
long-running bug (see r347004)
Note that GCC warns about this UB, but this got silented by https://reviews.llvm.org/D50296.
Differential Revision: https://reviews.llvm.org/D54472
llvm-svn: 351701
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636