| 1 | +# Bounds Checking: Generalizing from Variables to LValues |
| 2 | + |
| 3 | +## Problem Summary |
| 4 | + |
| 5 | +The current bounds checking algorithm checks that the inferred bounds of each |
| 6 | +in-scope variable imply the target bounds of the variable after each top-level |
| 7 | +CFG statement. Deferring the check to the end of the statement supports statements that contain multiple assignments. |
| 8 | +For example: |
| 9 | + |
| 10 | +``` |
| 11 | +void f(array_ptr<int> small : count(1), |
| 12 | + array_ptr<int> medium : count(2), |
| 13 | + array_ptr<int> large : count(3)) { |
| 14 | + // At the end of medium = small: medium's inferred bounds are count(1). |
| 15 | + // These inferred bounds do not imply the target bounds of count(2). |
| 16 | + // At the end of medium = large: medium's inferred bounds are count(3). |
| 17 | + // These inferred bounds imply the target bounds of count(2). |
| 18 | + // The bounds are not checked until the end of the statement, when |
| 19 | + // the inferred bounds of count(3) imply the target bounds of count(2). |
| 20 | + medium = small, medium = large; |
| 21 | +} |
| 22 | +``` |
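
To make the end-of-statement check concrete, here is a minimal C++ sketch of the idea. It is not the compiler's implementation: bounds are reduced to a plain element count, and the names `Count`, `Assign`, `Implies`, and `CheckStatement` are hypothetical. Each assignment only updates the observed bounds, and the comparison against the target bounds happens once, after the last assignment in the statement.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumption: bounds are modeled as a bare element count, e.g. count(2) -> 2.
using Count = int;

// Inferred bounds imply target bounds when they cover at least as many elements.
bool Implies(Count inferred, Count target) { return inferred >= target; }

// One assignment `lhs = rhs` between two variables (hypothetical toy type).
struct Assign {
  std::string lhs;
  std::string rhs;
};

// Processes one top-level statement made of comma-separated assignments.
// Returns the variables whose inferred bounds do not imply their target
// bounds once the whole statement has been processed.
std::vector<std::string> CheckStatement(
    const std::map<std::string, Count>& target,
    const std::vector<Assign>& assignments) {
  // On entry, each variable's observed bounds are its target bounds.
  std::map<std::string, Count> observed = target;
  for (const Assign& a : assignments) {
    assert(target.count(a.lhs) == 1 && target.count(a.rhs) == 1);
    // The left-hand side takes on the inferred bounds of the right-hand side.
    // No check happens here; intermediate states may violate target bounds.
    observed[a.lhs] = observed.at(a.rhs);
  }
  // The check is deferred to the end of the statement.
  std::vector<std::string> failures;
  for (const auto& entry : observed) {
    if (!Implies(entry.second, target.at(entry.first)))
      failures.push_back(entry.first);
  }
  return failures;
}
```

With targets `small: 1`, `medium: 2`, `large: 3`, the statement `medium = small, medium = large` produces no failures in this sketch, while `medium = small` on its own reports `medium`.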
| 23 | + |
| 24 | +For other kinds of lvalue expressions (`*p`, `a.f`, etc.), bounds checking |
| 25 | +occurs immediately after checking an assignment to the expression. The goal |
| 26 | +is to generalize the work that was previously done for variables so that the |
| 27 | +bounds checking behavior is the same for all lvalue expressions. |
| 28 | + |
| 29 | +## Bounds Checking for Variables |
| 30 | + |
| 31 | +The compiler tracks the inferred bounds for each in-scope variable while |
| 32 | +traversing expressions. The following data structures and methods are relevant |
| 33 | +for tracking, updating, and using the inferred bounds: |
| 34 | + |
| 35 | +- **Tracking:** the `ObservedBounds` member of the `CheckingState` class. |
| 36 | +- **Updating:** the `TraverseCFG`, `GetIncomingBlockState`, |
| 37 | +`UpdateCtxWithWidenedBounds`, `GetDeclaredBounds`, `ResetKilledBounds`, |
| 38 | +and `UpdateAfterAssignment` methods. |
| 39 | +- **Using:** the `ValidateBoundsContext` and `RValueCastBounds` methods. |
| 40 | + |
| 41 | +## AbstractSet: LValue Expression Equality |
| 42 | + |
| 43 | +### Requirements for Bounds Checking |
| 44 | + |
| 45 | +For certain lvalue expressions `e1` and `e2`, updating the inferred bounds |
| 46 | +of `e1` should also update the inferred bounds of `e2`. For example: |
| 47 | + |
| 48 | +``` |
| 49 | +struct S { |
| 50 | + array_ptr<int> f : count(2); |
| 51 | +}; |
| 52 | + |
| 53 | +void f(struct S *a, array_ptr<int> p : count(3)) { |
| 54 | + (*a).f = 0, p = a->f; |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +At the end of `(*a).f = 0`, the inferred bounds of `(*a).f` are `bounds(any)`. |
| 59 | +At the end of `p = a->f`, the inferred bounds of `p` are the inferred bounds of |
| 60 | +`a->f`. The inferred bounds of `a->f` should be the same as the inferred bounds |
| 61 | +of `(*a).f`, since `(*a).f` and `a->f` are identical lvalue expressions. |
| 62 | + |
| 63 | +In general, for lvalue expressions `e1` and `e2`, updating the inferred bounds |
| 64 | +of `e1` should also update the bounds of `e2` if and only if `e1` and `e2` |
| 65 | +are guaranteed to be **identical lvalue expressions**. That is, if: |
| 66 | + |
| 67 | +1. `e1` and `e2` refer to the same location in memory, and |
| 68 | +2. `e1` and `e2` have the same range in memory. |
| 69 | + |
| 70 | +For the initial planned work for lvalue generalization, we will only generalize |
| 71 | +across lvalue expressions that we are able to determine as identical based on |
| 72 | +the definition stated above. Further, while determining identical lvalue |
| 73 | +expressions, we will initially ignore aliasing concerns and ignore lvalue |
| 74 | +expressions that do not fully overlap in memory. We may consider aliasing |
| 75 | +issues in future work. |
| 76 | + |
| 77 | +### ObservedBounds Keys |
| 78 | + |
| 79 | +Currently, the `ObservedBounds` map uses `VarDecl *` as its keys. This ensures |
| 80 | +that updating the inferred bounds for a `DeclRefExpr *` `x` updates the |
| 81 | +inferred bounds for all other `DeclRefExpr *` `y` where `x` and `y` have the |
| 82 | +same `VarDecl *`. However, `VarDecl *` will not work as a key for general |
| 83 | +lvalue expressions. |
| 84 | + |
| 85 | +If the `ObservedBounds` map uses `Expr *` as its keys, then updating the |
| 86 | +inferred bounds of `(*a).f` will not update the inferred bounds of `a->f` |
| 87 | +since `(*a).f` and `a->f` are distinct expressions in the Clang AST. |
| 88 | +Therefore, a different type is needed for the `ObservedBounds` keys. |
| 89 | + |
| 90 | +To support the new keys for `ObservedBounds`, we introduce an AbstractSet API. |
| 91 | +Given an lvalue expression `e`, the API should return a representation of the |
| 92 | +set containing all lvalue expressions that are identical to `e`. This |
| 93 | +representation will be the key in `ObservedBounds` that maps to the inferred |
| 94 | +bounds of `e` (as well as all other lvalue expressions that are identical to |
| 95 | +`e`). |
| 96 | + |
| 97 | +The AbstractSet representation of an lvalue expression `e` may use a canonical |
| 98 | +form of `e`. For example, the canonical form of `(*a).f` may be `a->f`. This |
| 99 | +canonicalization may use the |
| 100 | +[PreorderAST](https://github.com/microsoft/checkedc-clang/blob/master/clang/include/clang/AST/PreorderAST.h). |
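
One way to picture canonical keys is the C++ sketch below. It is only an illustration under stated assumptions: the `Expr` tree is a toy stand-in for the Clang AST, the key is a plain string, and `CanonicalKey`, `BoundsMap`, `UpdateBounds`, and `LookupBounds` are hypothetical names rather than the AbstractSet API or `PreorderAST`. The only rewrite it performs is `(*e).f` to `e->f`, and it ignores aliasing, as the initial work does.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy lvalue expression tree (hypothetical; not the Clang AST classes).
enum class Kind { Var, Deref, Member, Arrow };

struct Expr {
  Kind kind;
  std::string name;            // variable name, or field name for Member/Arrow
  std::shared_ptr<Expr> base;  // operand of Deref/Member/Arrow; null for Var
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr Make(Kind kind, std::string name, ExprPtr base) {
  return std::make_shared<Expr>(Expr{kind, std::move(name), std::move(base)});
}
ExprPtr MakeVar(std::string n) { return Make(Kind::Var, std::move(n), nullptr); }
ExprPtr MakeDeref(ExprPtr e) { return Make(Kind::Deref, "", std::move(e)); }
ExprPtr MakeMember(ExprPtr e, std::string f) {
  return Make(Kind::Member, std::move(f), std::move(e));
}
ExprPtr MakeArrow(ExprPtr e, std::string f) {
  return Make(Kind::Arrow, std::move(f), std::move(e));
}

std::string CanonicalKey(const ExprPtr& e);

// Parenthesizes any operand that is not a plain variable so keys stay unambiguous.
std::string Operand(const ExprPtr& e) {
  assert(e && "operand must not be null");
  return e->kind == Kind::Var ? e->name : "(" + CanonicalKey(e) + ")";
}

// Returns a string that is equal for expressions this sketch treats as identical.
std::string CanonicalKey(const ExprPtr& e) {
  assert(e && "expression must not be null");
  switch (e->kind) {
    case Kind::Var:
      return e->name;
    case Kind::Deref:
      return "*" + Operand(e->base);
    case Kind::Arrow:
      return Operand(e->base) + "->" + e->name;
    case Kind::Member:
      // Canonicalize (*x).f to x->f.
      if (e->base && e->base->kind == Kind::Deref)
        return Operand(e->base->base) + "->" + e->name;
      return Operand(e->base) + "." + e->name;
  }
  return "";
}

// Toy stand-in for a bounds map keyed by the canonical form.
using BoundsMap = std::map<std::string, std::string>;

void UpdateBounds(BoundsMap& m, const ExprPtr& e, const std::string& bounds) {
  m[CanonicalKey(e)] = bounds;
}

std::string LookupBounds(const BoundsMap& m, const ExprPtr& e) {
  auto it = m.find(CanonicalKey(e));
  return it == m.end() ? "bounds(unknown)" : it->second;
}
```

In this sketch, updating the bounds of `(*a).f` and then looking up `a->f` returns the updated bounds, because both expressions map to the key `a->f`. A string key is used only for brevity; the actual representation is the open question in the first work item.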
| 101 | + |
| 102 | +## Work Item Overview |
| 103 | + |
| 104 | +1. Define the representation that the AbstractSet API returns. What will be |
| 105 | + the type of the representation? What are some examples of the representation |
| 106 | + for different kinds of expressions (`DeclRefExpr *`, `MemberExpr *`, etc.)? |
| 107 | + (As described above, the representation may use the `PreorderAST` for |
| 108 | + canonicalization. We may also consider using the comparison in |
| 109 | + [CanonBounds.cpp](https://github.com/microsoft/checkedc-clang/blob/master/clang/lib/AST/CanonBounds.cpp).) |
| 110 | +2. Implement the AbstractSet API for `DeclRefExpr *`. This is necessary to |
| 111 | + maintain the current behavior for checking variable bounds. |
| 112 | +3. Replace the current `VarDecl *` keys in `ObservedBounds` with the |
| 113 | + AbstractSet representation. This should result in no changes in compiler |
| 114 | + behavior (since only `DeclRefExpr *` expressions have AbstractSet representations at this point). |
| 115 | +4. Implement AbstractSet representations for other kinds of lvalue |
| 116 | + expressions (see below for the suggested prioritized expression kinds). |
| 117 | + After this step, the compiler behavior for the implemented lvalue |
| 118 | + expressions should be identical to the current behavior for variables. |
| 119 | +5. For each lvalue expression kind that has an AbstractSet representation, |
| 120 | + remove the current bounds checking behavior that deals with lvalue |
| 121 | + expressions that are not a `DeclRefExpr *`. For example, the |
| 122 | + `CheckBinaryOperator` method currently performs bounds checking for all |
| 123 | + assignments where the left-hand side is not a `DeclRefExpr *`. |
| 124 | +6. Track equality for lvalue expressions. Certain types of lvalue expressions, |
| 125 | + e.g. expressions such as `*p` and `a->f` which read memory via a pointer, |
| 126 | + are currently not allowed in `EquivExprs`. However, bounds checking relies |
| 127 | + on equality information. For example, in an assignment `a->f = e`, where `e` |
| 128 | + is an expression with inferred bounds of `(e, e + i)`, the compiler needs |
| 129 | + to know that `a->f` and `e` are equal. |
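
The role of equality information in the last work item can be sketched as follows. This is a toy model, not the compiler's `EquivExprs` implementation: expressions are identified by string keys, and `EquivSets`, `RecordAssignment`, `KnownEqual`, `RangeBounds`, and `BoundsImply` are hypothetical names. It also ignores other expressions that mention the assigned lvalue, which a real implementation would need to invalidate.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Each set holds keys of expressions known to produce the same value.
using EquivSets = std::vector<std::set<std::string>>;

// Records that `lhs` equals `rhs` after the assignment `lhs = rhs`.
void RecordAssignment(EquivSets& equiv, const std::string& lhs,
                      const std::string& rhs) {
  assert(lhs != rhs);
  // The old value of lhs is overwritten, so drop its earlier equalities.
  for (auto& s : equiv) s.erase(lhs);
  for (auto& s : equiv) {
    if (s.count(rhs)) {
      s.insert(lhs);
      return;
    }
  }
  equiv.push_back(std::set<std::string>{lhs, rhs});
}

bool KnownEqual(const EquivSets& equiv, const std::string& x,
                const std::string& y) {
  if (x == y) return true;
  for (const auto& s : equiv)
    if (s.count(x) && s.count(y)) return true;
  return false;
}

// Models bounds(base, base + count).
struct RangeBounds {
  std::string base;
  int count;
};

// Inferred bounds imply target bounds only if the bases are known to be
// equal and the inferred range is at least as long as the target range.
bool BoundsImply(const EquivSets& equiv, const RangeBounds& inferred,
                 const RangeBounds& target) {
  return KnownEqual(equiv, inferred.base, target.base) &&
         inferred.count >= target.count;
}
```

Under this model, inferred bounds of `(e, e + 3)` cannot be shown to imply target bounds of `(a->f, a->f + 2)` until the assignment `a->f = e` has been recorded, which is why lvalue expressions such as `a->f` need to participate in equality tracking.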
| 130 | + |
| 131 | +## LValue Expression Priorities |
| 132 | + |
| 133 | +The lvalue generalization work will be done by incrementally adding AbstractSet |
| 134 | +representations for each kind of lvalue expression. The bounds checker checks |
| 135 | +the following kinds of lvalue expressions (in the `CheckLValue` method): |
| 136 | + |
| 137 | +- `DeclRefExpr *`. Examples: `x`, `y`, `myvariable`. |
| 138 | +- `UnaryOperator *`. Examples: `*p`, `*(p + 1)`. |
| 139 | +- `ArraySubscriptExpr *`. Examples: `arr[0]`, `1[arr]`. |
| 140 | +- `MemberExpr *`. Examples: `a.f`, `a->f`. |
| 141 | +- `ImplicitCastExpr *`. Example: `LValueBitCast(e)`. |
| 142 | +- `CHKCBindTemporaryExpr *`. Example: `TempBinding({ 0 })`. |
| 143 | + |
| 144 | +The following priorities are proposed for AbstractSet implementations for each |
| 145 | +kind of lvalue expression. Priority 0 and Priority 1 expressions should be |
| 146 | +implemented before the April release of the Checked C compiler. |
| 147 | + |
| 148 | +- Priority 0: `DeclRefExpr *`. This is necessary to maintain the current |
| 149 | + bounds checking behavior for variables. It will also serve to test the |
| 150 | + implementation of the AbstractSet API since implementing it for |
| 151 | + `DeclRefExpr *` should not result in any compiler behavior changes. |
| 152 | +- Priority 1: `MemberExpr *`. This expression kind is more likely to be |
| 153 | + involved in bounds checking, since a `MemberExpr *` is more likely to have |
| 154 | + target bounds that are not `bounds(unknown)`. |
| 155 | +- Priority 2: `UnaryOperator *` and `ArraySubscriptExpr *`. These are more |
| 156 | + likely than `MemberExpr *` to have unknown target bounds. |
| 157 | +- Priority 3: `ImplicitCastExpr *` and `CHKCBindTemporaryExpr *`. These kinds |
| 158 | + are less common in example code. |