Commit 6c24f3f

Merge pull request #407 from correctcomputation/merge-from-microsoft-20210127
Merge from Microsoft 2021-01-27
2 parents 009d992 + 6694a25 commit 6c24f3f

67 files changed

+2814
-1952
lines changed

clang/automation/README.txt

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-This directory contains scripts used for automated builds and testing
-of the Checked C clang compiler.
+This directory contains scripts used for automated builds and testing
+of the Checked C clang compiler.

clang/docs/DriverArchitecture.png

2 Bytes

clang/docs/PCHLayout.png

2 Bytes
Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
# Bounds Checking: Generalizing from Variables to LValues

## Problem Summary

The current bounds checking algorithm checks that the inferred bounds of each
in-scope variable imply the target bounds of the variable after each top-level
CFG statement. This supports bounds checking in multiple assignments.
For example:

```
void f(array_ptr<int> small : count(1),
       array_ptr<int> medium : count(2),
       array_ptr<int> large : count(3)) {
  // At the end of medium = small: medium's inferred bounds are count(1).
  // These inferred bounds do not imply the target bounds of count(2).
  // At the end of medium = large: medium's inferred bounds are count(3).
  // These inferred bounds imply the target bounds of count(2).
  // The bounds are not checked until the end of the statement, when
  // the inferred bounds of count(3) imply the target bounds of count(2).
  medium = small, medium = large;
}
```
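
The deferred check in the example above can be modeled outside the compiler. The following is a minimal C++ sketch, not the compiler's implementation: it assumes bounds are plain element counts, that inferred `count(n)` implies target `count(m)` exactly when `n >= m`, and that each variable starts the statement with its declared bounds. The names `BoundsMap`, `Assignment`, and `checkStatement` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model (assumption): a variable's bounds are just an element count.
using BoundsMap = std::map<std::string, unsigned>;

// One pointer assignment `LHS = RHS` inside a top-level statement.
using Assignment = std::pair<std::string, std::string>;

// Hypothetical helper: update the inferred bounds after every assignment,
// but validate them against the target bounds only once, at the end of
// the statement. Returns the variables whose inferred bounds do not imply
// their target bounds.
std::vector<std::string> checkStatement(const BoundsMap &Target,
                                        const std::vector<Assignment> &Stmt) {
  BoundsMap Observed = Target; // inferred bounds start as declared bounds
  for (const Assignment &A : Stmt)
    Observed[A.first] = Observed.at(A.second);

  std::vector<std::string> Failures;
  for (const auto &Entry : Target)
    if (Observed.at(Entry.first) < Entry.second)
      Failures.push_back(Entry.first);
  return Failures;
}

// Mirrors `medium = small, medium = large;` from the example above.
inline void exampleCommaExpression() {
  BoundsMap Target = {{"small", 1}, {"medium", 2}, {"large", 3}};
  std::vector<Assignment> Stmt = {Assignment("medium", "small"),
                                  Assignment("medium", "large")};
  assert(checkStatement(Target, Stmt).empty());
}
```

In this model, checking `medium = small` as its own statement reports `medium`, while the comma expression passes because validation only sees the final inferred bounds.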

For other kinds of lvalue expressions (`*p`, `a.f`, etc.), bounds checking
occurs immediately after checking an assignment to the expression. The goal
is to generalize the work that was previously done for variables so that the
bounds checking behavior is the same for all lvalue expressions.

## Bounds Checking for Variables

The compiler tracks the inferred bounds for each in-scope variable while
traversing expressions. The following data structures and methods are relevant
for tracking, updating, and using the inferred bounds:

- **Tracking:** the `ObservedBounds` member of the `CheckingState` class.
- **Updating:** the `TraverseCFG`, `GetIncomingBlockState`,
  `UpdateCtxWithWidenedBounds`, `GetDeclaredBounds`, `ResetKilledBounds`,
  and `UpdateAfterAssignment` methods.
- **Using:** the `ValidateBoundsContext` and `RValueCastBounds` methods.

## AbstractSet: LValue Expression Equality

### Requirements for Bounds Checking

For certain lvalue expressions `e1` and `e2`, updating the inferred bounds
of `e1` should also update the inferred bounds of `e2`. For example:

```
struct S {
  array_ptr<int> f : count(2);
};

void f(struct S *a, array_ptr<int> p : count(3)) {
  (*a).f = 0, p = a->f;
}
```

At the end of `(*a).f = 0`, the inferred bounds of `(*a).f` are `bounds(any)`.
At the end of `p = a->f`, the inferred bounds of `p` are the inferred bounds of
`a->f`. The inferred bounds of `a->f` should be the same as the inferred bounds
of `(*a).f`, since `(*a).f` and `a->f` are identical lvalue expressions.

In general, for lvalue expressions `e1` and `e2`, updating the inferred bounds
of `e1` should also update the bounds of `e2` if and only if `e1` and `e2`
are guaranteed to be **identical lvalue expressions**. That is, if:

1. `e1` and `e2` point to the same location in memory, and
2. `e1` and `e2` have the same range in memory.

For the initial planned work for lvalue generalization, we will only generalize
across lvalue expressions that we are able to determine as identical based on
the definition stated above. Further, while determining identical lvalue
expressions, we will initially ignore aliasing concerns and ignore lvalue
expressions that do not fully overlap in memory. We may consider aliasing
issues in future work.
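
The two conditions in the definition can be illustrated at run time. This is only a sketch of the definition, under the assumption that an lvalue is described by a start address and a byte size; the compiler has to establish the same fact statically. `LValueRegion`, `regionOf`, and `identicalLValues` are hypothetical names, and `S` uses a plain `int *` in place of `array_ptr<int>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of the memory that an lvalue designates.
struct LValueRegion {
  std::uintptr_t Start; // address of the first byte
  std::size_t Size;     // number of bytes covered
};

// Build the region for any lvalue passed by reference.
template <typename T> LValueRegion regionOf(const T &LValue) {
  return {reinterpret_cast<std::uintptr_t>(&LValue), sizeof(T)};
}

// Identical lvalues: same location in memory and same range in memory.
bool identicalLValues(const LValueRegion &A, const LValueRegion &B) {
  return A.Start == B.Start && A.Size == B.Size;
}

struct S {
  int *f;
  int g;
};

inline void exampleIdenticalLValues() {
  S Obj{nullptr, 0};
  S *a = &Obj;
  // (*a).f and a->f designate the same object.
  assert(identicalLValues(regionOf((*a).f), regionOf(a->f)));
  // *a starts at the same address as a->f but covers a larger range.
  assert(!identicalLValues(regionOf(*a), regionOf(a->f)));
}
```

The second assertion shows why both conditions are needed: `*a` and `a->f` share a start address but do not fully overlap, so they are not identical under the definition.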

### ObservedBounds Keys

Currently, the `ObservedBounds` map uses `VarDecl *` as its keys. This ensures
that updating the inferred bounds for a `DeclRefExpr *` `x` updates the
inferred bounds for all other `DeclRefExpr *` `y` where `x` and `y` have the
same `VarDecl *`. However, `VarDecl *` will not work as a key for general
lvalue expressions.

If the `ObservedBounds` map uses `Expr *` as its keys, then updating the
inferred bounds of `(*a).f` will not update the inferred bounds of `a->f`
since `(*a).f` and `a->f` are distinct expressions in the Clang AST.
Therefore, a different type is needed for the `ObservedBounds` keys.

To support the new keys for `ObservedBounds`, we introduce an AbstractSet API.
Given an lvalue expression `e`, the API should return a representation of the
set containing all lvalue expressions that are identical to `e`. This
representation will be the key in `ObservedBounds` that maps to the inferred
bounds of `e` (as well as all other lvalue expressions that are identical to
`e`).

The AbstractSet representation of an lvalue expression `e` may use a canonical
form of `e`. For example, the canonical form of `(*a).f` may be `a->f`. This
canonicalization may use the
[PreorderAST](https://github.com/microsoft/checkedc-clang/blob/master/clang/include/clang/AST/PreorderAST.h).
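
To make the keying idea concrete, here is a small C++ sketch on a toy expression tree. It does not use the Clang AST or the `PreorderAST`; the `Node` type, the helper constructors, and `canonicalKey` are all hypothetical, and the canonical form is assumed to be a string in which `(*b).f` is rewritten to `b->f`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression tree (assumption: not the Clang AST).
struct Node {
  enum Kind { Var, Deref, Member, Arrow } K;
  std::string Name;           // variable name or field name
  std::shared_ptr<Node> Base; // operand of Deref, Member, or Arrow
};
using NodePtr = std::shared_ptr<Node>;

NodePtr var(std::string N) {
  return std::make_shared<Node>(Node{Node::Var, std::move(N), nullptr});
}
NodePtr deref(NodePtr B) {
  return std::make_shared<Node>(Node{Node::Deref, "", std::move(B)});
}
NodePtr member(NodePtr B, std::string F) {
  return std::make_shared<Node>(Node{Node::Member, std::move(F), std::move(B)});
}
NodePtr arrow(NodePtr B, std::string F) {
  return std::make_shared<Node>(Node{Node::Arrow, std::move(F), std::move(B)});
}

// Hypothetical canonical form: identical lvalues get the same string.
std::string canonicalKey(const NodePtr &E) {
  switch (E->K) {
  case Node::Var:
    return E->Name;
  case Node::Deref:
    return "*(" + canonicalKey(E->Base) + ")";
  case Node::Arrow:
    return "(" + canonicalKey(E->Base) + ")->" + E->Name;
  case Node::Member:
    if (E->Base->K == Node::Deref) // rewrite (*b).f to b->f
      return "(" + canonicalKey(E->Base->Base) + ")->" + E->Name;
    return "(" + canonicalKey(E->Base) + ")." + E->Name;
  }
  return "";
}

inline void exampleCanonicalKeys() {
  NodePtr E1 = member(deref(var("a")), "f"); // (*a).f
  NodePtr E2 = arrow(var("a"), "f");         // a->f
  std::map<std::string, std::string> Observed; // key -> inferred bounds
  Observed[canonicalKey(E1)] = "bounds(any)";
  assert(E1 != E2);                                      // distinct nodes
  assert(Observed.at(canonicalKey(E2)) == "bounds(any)"); // shared entry
}
```

A map keyed by node pointers would hold separate entries for `E1` and `E2`; keying by the canonical form gives both expressions a single entry, which is the property the AbstractSet representation needs to provide.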

## Work Item Overview

1. Define the representation that the AbstractSet API returns. What will be
   the type of the representation? What are some examples of the representation
   for different kinds of expressions (`DeclRefExpr *`, `MemberExpr *`, etc.)?
   (As described above, the representation may use the `PreorderAST` for
   canonicalization. We may also consider using the comparison in
   [CanonBounds.cpp](https://github.com/microsoft/checkedc-clang/blob/master/clang/lib/AST/CanonBounds.cpp).)
2. Implement the AbstractSet API for `DeclRefExpr *`. This is necessary to
   maintain the current behavior for checking variable bounds.
3. Replace the current `VarDecl *` keys in `ObservedBounds` with the
   AbstractSet representation. This should result in no changes in compiler
   behavior (since only `DeclRefExpr *` have AbstractSet representations).
4. Implement AbstractSet representations for other kinds of lvalue
   expressions (see below for the suggested prioritized expression kinds).
   After this step, the compiler's bounds checking behavior for the
   implemented lvalue expressions should be identical to the current behavior
   for variables.
5. For each lvalue expression kind that has an AbstractSet representation,
   remove the current bounds checking behavior that deals with lvalue
   expressions that are not a `DeclRefExpr *`. For example, the
   `CheckBinaryOperator` method currently performs bounds checking for all
   assignments where the left-hand side is not a `DeclRefExpr *`.
6. Track equality for lvalue expressions. Certain kinds of lvalue expressions
   that read memory via a pointer, such as `*p` and `a->f`, are currently not
   allowed in `EquivExprs`. However, bounds checking relies on equality
   information. For example, in an assignment `a->f = e`, where `e` is an
   expression with inferred bounds of `(e, e + i)`, the compiler needs to know
   that `a->f` and `e` are equal.
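
Work item 6 can be pictured with a toy equality tracker. This sketch assumes that equality information is kept as a list of sets of expression spellings; it is not the compiler's `EquivExprs` implementation, and `EquivSets`, `recordAssignment`, and `knownEqual` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Assumption: each set holds the spellings of expressions known to be equal.
using EquivSets = std::vector<std::set<std::string>>;

// After `LHS = RHS`, the old value of LHS is gone, so LHS leaves every set
// it was in and then joins the set that contains RHS (or starts a new one).
void recordAssignment(EquivSets &Sets, const std::string &LHS,
                      const std::string &RHS) {
  for (auto &S : Sets)
    S.erase(LHS);
  for (auto &S : Sets)
    if (S.count(RHS)) {
      S.insert(LHS);
      return;
    }
  Sets.push_back({LHS, RHS});
}

// True if some set records that A and B produce the same value.
bool knownEqual(const EquivSets &Sets, const std::string &A,
                const std::string &B) {
  for (const auto &S : Sets)
    if (S.count(A) && S.count(B))
      return true;
  return false;
}

inline void exampleMemberEquality() {
  EquivSets Sets;
  recordAssignment(Sets, "a->f", "e"); // a->f = e
  // With this fact, inferred bounds (e, e + i) can be related to a->f.
  assert(knownEqual(Sets, "a->f", "e"));
}
```

The sketch ignores aliasing and right-hand sides that mention the left-hand side; it only shows why an lvalue such as `a->f` has to be admitted into the equality sets before bounds stated in terms of `e` can be checked against it.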

## LValue Expression Priorities

The lvalue generalization work will be done by incrementally adding AbstractSet
representations for each kind of lvalue expression. The bounds checker checks
the following kinds of lvalue expressions (in the `CheckLValue` method):

- `DeclRefExpr *`. Examples: `x`, `y`, `myvariable`.
- `UnaryOperator *`. Examples: `*p`, `*(p + 1)`.
- `ArraySubscriptExpr *`. Examples: `arr[0]`, `1[arr]`.
- `MemberExpr *`. Examples: `a.f`, `a->f`.
- `ImplicitCastExpr *`. Examples: `LValueBitCast(e)`.
- `CHKCBindTemporaryExpr *`. Examples: `TempBinding({ 0 })`.

The following priorities are proposed for AbstractSet implementations for each
kind of lvalue expression. Priority 0 and Priority 1 expressions should be
implemented before the April release of the Checked C compiler.

- Priority 0: `DeclRefExpr *`. This is necessary to maintain the current
  bounds checking behavior for variables. It will also serve to test the
  implementation of the AbstractSet API since implementing it for
  `DeclRefExpr *` should not result in any compiler behavior changes.
- Priority 1: `MemberExpr *`. This expression kind is more likely to be
  involved in bounds checking, since a `MemberExpr *` is more likely to have
  target bounds that are not `bounds(unknown)`.
- Priority 2: `UnaryOperator *` and `ArraySubscriptExpr *`. These are more
  likely than `MemberExpr *` to have unknown target bounds.
- Priority 3: `ImplicitCastExpr *` and `CHKCBindTemporaryExpr *`. These kinds
  are less common in example code.

clang/include/clang/Basic/DiagnosticSemaKinds.td

Lines changed: 20 additions & 0 deletions
@@ -10172,6 +10172,23 @@ def err_bounds_type_annotation_lost_checking : Error<
 "%select{assignment|decrement|increment|initialization|statement}0">,
 InGroup<CheckBoundsDeclsChecked>;

+def error_bounds_declaration_unprovable : Error<
+"it is not possible to prove that the inferred bounds of %1 "
+"imply the declared bounds of %1 after "
+"%select{assignment|decrement|increment|initialization|statement}0">;
+
+def note_free_variable_decl_or_inferred : Note<
+"the %select{declared|inferred}0 %select{lower |upper |}1bounds use the "
+"variable '%2' and there is no relational information involving '%2' "
+"and any of the expressions used by the %select{inferred|declared}0 "
+"%select{lower |upper |}1bounds">;
+
+def note_free_variable_in_expected_args : Note<
+"the %select{expected argument|inferred}0 %select{lower |upper |}1bounds use the "
+"variable '%2' and there is no relational information involving '%2' "
+"and any of the expressions used by the %select{inferred|expected argument}0 "
+"%select{lower |upper |}1bounds">;
+
 def error_bounds_declaration_invalid : Error<
 "declared bounds for %1 are invalid after "
 "%select{assignment|decrement|increment|initialization|statement}0">;
@@ -10203,6 +10220,9 @@ def err_bounds_type_annotation_lost_checking : Error<
 "argument must be a non-modifying expression because %ordinal0 parameter "
 "is used in a bounds expression">;

+def error_argument_bounds_unprovable : Error<
+"it is not possible to prove argument meets declared bounds for %ordinal0 parameter">;
+
 def warn_argument_bounds_invalid : Warning<
 "cannot prove argument meets declared bounds for %ordinal0 parameter">,
 InGroup<CheckBoundsDeclsUnchecked>;

clang/lib/AST/CanonBounds.cpp

Lines changed: 1 addition & 1 deletion
@@ -336,7 +336,7 @@ Result Lexicographic::CompareExpr(const Expr *Arg1, const Expr *Arg2) {
 #include "clang/AST/StmtNodes.inc"
 llvm_unreachable("cannot compare a statement");
 case Expr::PredefinedExprClass: Cmp = Compare<PredefinedExpr>(E1, E2); break;
-case Expr::DeclRefExprClass: return Compare<DeclRefExpr>(E1, E2);
+case Expr::DeclRefExprClass: Cmp = Compare<DeclRefExpr>(E1, E2); break;
 case Expr::IntegerLiteralClass: return Compare<IntegerLiteral>(E1, E2);
 case Expr::FloatingLiteralClass: return Compare<FloatingLiteral>(E1, E2);
 case Expr::ImaginaryLiteralClass: break;
